The human brain is a pattern recognition machine.
What we 'see' incorporates just a tiny proportion of the pixel data coming through our eyes.
At night there is much less data coming through our eyes, so we have to give extra weight to vague shapes, and the brain has to interpret what these shapes mean. That is why ghosts ONLY appear in badly lit conditions.
Auditory input requires a lot less 'data', as you can see when you compare the size of an MP3 file with that of a JPEG file.
And likewise it does not need much to trigger a sound memory.
Music is a case in point. In a few different office environments I have indulged in 'song seeding'. This is the game where you whistle or hum a well-known tune as you pass a colleague concentrating on their work. If this person is later observed singing or humming the song, you gain points. It only takes a few bars of the song to be successful.
Dreams and hallucinations occur when the brain is cut off from its primary input.
Because the pattern recognition parts of the brain are still active but are not getting any input from your eyes, the bits of noise from other electrical activity still occurring in the brain take on EXTRA significance.
Sounds require less data to activate a memory. That's why rustling trees in the dark can sound like whispering voices.