One way to understand the difference is by comparison with written language.
In written English, every word is made up of a combination of just 26 letters of the alphabet. But for each letter there are many ways to write it, e.g. upper-case A, lower-case a, and the same letter in countless different fonts (plus all the variations we find in handwriting) - but ALL of these different shapes belong to the single abstract category of 'A'. To 'decode' the written language, we need to know which of the 26 letters (categories) each shape belongs to, and we ignore the differences in fonts and handwriting.
Similarly, in spoken English every word is made up of a combination of just 44 sound categories (=phonemes) in British RP. But for each phoneme there are many variations (=allophones), e.g. /h/ is said differently in hot, hurt, heart and hit - but ALL of these different sounds belong to the abstract category of the phoneme /h/. The variations depend on the neighbouring sounds, the speaker's accent, and the speaker's own particular style of speaking. To 'decode' speech, we need to know which of the 44 phonemes each sound belongs to, and we ignore (and, as native speakers, rarely even notice) the different allophones.
So, phonemes are the meaningful categories that enable us to correctly identify which word the speaker is saying.
Allophones are the variations within each category that don't affect the meaning.
The challenge when we start learning a foreign language is that we hear many different sounds, but we don't know which variations in sound are important (phonemic) and which can be ignored (allophonic), because these categories are different in every language.