I can't see much difference. After the 'n' you have your tongue on the alveolar ridge. Both /s/ and /t/ are unvoiced. To say the /s/, you have to remove your tongue from the alveolar ridge; this almost automatically makes a /t/ before the /s/.
In contrast, saying 'silenze' or 'fluenzy' doesn't present this problem.
I can just say 'silence' and 'fluency' without an interpolated /t/, but it takes an effort.