In this episode we will learn audio from the very basics to fairly advanced psychoacoustics. We will start with what Sound Pressure Level does to your eardrum, what makes the timbre of an instrument feel real, and how music is transferred from your headphones to you. Then we will talk about how your brain receives sound and how your ears shape it, which is what makes us so different from one another – answering the confusion of why we all hear differently. We will learn how headphones achieve soundstage, imaging and detail, and dive deep into Diffuse Field, HRTF and a lot more, which are the foundation for understanding headphone measurements. Along the way we will talk about our biological limitations and why we can't sense everything the driver produces, like auditory masking and the cone of confusion. At the end we will discuss some common myths around the audio hobby, why they are mostly illusions and marketing gimmicks, and how you can be smart with your finances while pursuing this hobby.
I made a Public Draft of the Handwritten Notes; you can suggest changes (simplify / explain) and download it and use it as per your liking - https://shorturl.at/3Fkmf
INDEX
Need for Scientific Understanding in Audiophile Hobby
Level 1 - Unfolding MUSIC
Level 2 - Unfolding EAR
Level 3 - Unfolding BRAIN
Level 4 - Unfolding LIMITS
Level 5 - Unfolding MYTHS
CONCLUSION
But let’s begin with -
WHY DO YOU NEED TO UNDERSTAND THE SCIENCE BEHIND WHAT WE HEAR?
I hope you learned something from my previous video/article, where I talked about the hidden factors that make music sound good or bad. In this episode, we will take a deep dive into how we can better understand what we are hearing. Needless to say, this video is extremely important for understanding headphone/earphone measurements, which I will cover in the next video/article.
Let's start with the main problem.
You are more confused than you know. Well, most of you are. Since you got into this hobby of portable music, reviewers have presented you with terms like Soundstage, Details and Imaging. While it is simple to learn what those words mean with a quick Google search, they often stand for something more complicated and twisted.
Often you hear someone say, "This headphone/IEM has very good imaging. The soundstage is like speakers." You buy that headphone and you simply do not understand why nothing like that is happening. Well, it's not your fault. Audio is a hobby and not something to be thought about critically or academically, right?
But wait a second. You are paying a premium for these things; you are spending money on something. So, you deserve to know the truth. In this video, I expose you to the truth, which can help you filter through hundreds of reviews to extract what is important. And not to mention, this is the fundamental part of learning measurements, so do pay attention in this episode.
Now to start things off, we need to understand a few things. If you don't understand anything even after trying, just let it go – at the end, everything will make sense.
LEVEL 1 - What is "SOUND"?
HOW DO WE HEAR THINGS? (YOU CAN SKIP THIS WHOLE LEVEL IF YOU DON'T WANT THE BORING BASICS)
What is sound? Sound is just a transfer of energy.
Think of it this way.
Air is made of tiny particles. When a headphone driver pushes those particles, what happens? When you push water in a swimming pool, what happens? It creates a wave, and it spreads. Sound works exactly like that, but the particles in the air don't travel with the wave. They just push their neighbouring particles and oscillate around their own position, transferring only the energy.
And what happens when you push something? It creates pressure fluctuations. That pressure travels through air and goes into your ears and pressurizes your eardrums, and we hear that as a sound. We call it Sound Pressure.
But how does this pressure make music and not a single short burst of sound? We will get to that now.
(The picture above is a very simplistic way of showcasing a waveform)
SINUSOIDAL WAVE
So, we understood what sound is and how it propagates like a wave (like a wave in water), but what is the simplest type of wave we can get? The answer is the "Sine Wave."
What is a sine wave? It's just a gradual up and down, that's it. It's as simple as it gets. When the driver moves outward, it creates high pressure; when it moves inward, it creates low pressure. How does this sound? You can try it on this website - https://www.szynalski.com/tone-generator/
Sine waves are also called Pure Tones, since they are the purest and most basic tones of the bunch. We will use this tone a lot to understand almost everything in the coming sections, because everything you hear can be built out of sine waves through a procedure called “Additive Synthesis.”
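If you want to generate a pure tone yourself instead of using the website, here is a minimal sketch in Python (assuming NumPy and SciPy are installed; the 440 Hz frequency and the file name are just example choices of mine):

import numpy as np
from scipy.io import wavfile

sample_rate = 48000          # samples per second
duration = 2.0               # seconds
freq = 440.0                 # frequency of the pure tone in Hz (example value)

t = np.arange(int(sample_rate * duration)) / sample_rate   # time axis
tone = 0.5 * np.sin(2 * np.pi * freq * t)                  # sine wave at half of full scale

wavfile.write("pure_tone_440hz.wav", sample_rate, (tone * 32767).astype(np.int16))

Play the resulting file and you hear exactly the kind of pure tone the tone-generator website produces.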
Now I hope the above two concepts (about the nature of sound) are somewhat clear in your mind. We discussed sound in two dimensions, time and pressure. Now let's dive into another dimension for a more practical view of real-world sound.
FREQUENCY
Frequency is just the number of waves that fit into one second. Remember that when a wave rises, falls, and comes back to zero again, we call that one cycle. This cycle is important.
1 cycle in a second means 1 Hz, 2 cycles in a second means 2 Hz. 20 cycles in a second means 20 Hz, which is the lower limit of our hearing; below 20 Hz we can still experience sound, but only as a tactile feeling of vibration rather than hearing through the eardrums. Our upper limit is 20 thousand cycles in a second, which means 20 kHz (1 kilo = 1000).
So, simply put, 20 Hz to 20 kHz is our audible range: everything in between we can hear through the eardrums. But your age and ear health determine how wide your audible range actually is. For me, it's around 19.3 kHz, since I am very young.
So, what did we learn? If a sound completes 10,000 cycles in 1 second (Up-Down-Zero), we can hear it as a 10KHz tone. Same with anything from 20Hz to 20KHz. It's just the repetition of the waves.
AMPLITUDE
Now what is Amplitude?
Amplitude is the height of the waves. The larger the height, the larger the amplitude. The shorter the height, the smaller the amplitude – that's it. But what is amplitude in real life? Volume. Yes, it simply means how loud the signal is. (As we are discussing things from a hobbyist POV, we won't go deep into intensity calculations – energy transferred per square metre of area.)
(You can see the waves are much taller in the loud signal and much shorter in the soft signal)
Now here is the tricky part: we don't use arbitrary numbers for volume. We use a specific ratio scale to measure it, known as dB or decibels. And we don't just say "volume" either, because that word can mean many different things and would muddy the basic understanding.
So, from now on we will use the term SPL (Sound Pressure Level). Yes, the exact sound pressure we discussed before, and the level of that wave/pressure is what we call SPL, which ultimately denotes the loudness of a wave.
Now, the scale we will be using is not a linear scale. (Linear means it increases or decreases steadily, following a simple rule of proportion. Example: 1 litre of water is 1 kg, 2 litres of water is 2 kg, 100 litres of water is 100 kg.)
Sound is measured on a logarithmic scale, which means equal steps on the scale correspond to equal ratios of pressure, not equal additions. Going one step up does not mean "a fixed amount more"; it means the pressure has been multiplied by a fixed factor. This might sound a bit confusing, so here is the easiest explanation.
Sound pressure is measured in Pascals. Our hearing spans roughly 20 micropascals to 20 Pascals (1 Pascal = 1 million micropascals), which means there are millions of values to work with if we stick to raw pressure. So instead we use a ratio scale that compresses those values into something easy to work with. Which is -
Decibels!
0 dB – 120 dB is our typical hearing range (0 dB being the quietest sound we can hear, 120 dB being about the loudest we can tolerate before it becomes painful and damaging). From now on we will use this decibel scale and the term SPL to denote loudness. [+10 dB is roughly a 2× jump in perceived loudness.]
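To make the Pascal-to-decibel relationship concrete, here is a tiny sketch of my own (an illustration, not something from the notes) that converts sound pressure in Pascals to dB SPL using the standard reference of 20 micropascals:

import numpy as np

P_REF = 20e-6  # reference pressure: 20 micropascals = 0 dB SPL

def pascals_to_db_spl(pressure_pa):
    """Convert sound pressure (Pa) to Sound Pressure Level (dB SPL)."""
    return 20 * np.log10(pressure_pa / P_REF)

print(pascals_to_db_spl(20e-6))   # ~0 dB   (threshold of hearing)
print(pascals_to_db_spl(1.0))     # ~94 dB  (a common calibration level)
print(pascals_to_db_spl(20.0))    # ~120 dB (around the threshold of pain)

Notice how a million-fold range of pressures collapses into a neat 0–120 range – that is the whole point of the decibel.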
TIME DOMAIN (IN SPEAKER DRIVERS)
What is another Dimension that we were talking about? It's the time domain. So, what is time domain and how is it helpful to us?
Have you heard about reviewers describing Attack and Decay? Well, it's that, but not as imaginary as most reviewers describe.
To understand that, we need to learn about a simple signal, the "impulse": a very, very short burst of signal that we can send to any speaker. Imagine the shortest possible slice of time, and we play a signal through the speaker for just that slice. What will the driver do? Say I send a 'tick'-sounding signal at 1 kHz and 80 dB to a dynamic driver (I explained all these terms in the previous points, so if you are unfamiliar, go back and take a look). What will the dynamic driver do? (A dynamic driver is the traditional speaker driver you have seen to this day.)
It will create a wave and then try to stop suddenly. But drivers are made of physical material that has mass; they are not ideal digital objects. So they will wiggle a bit before settling back to the zero position. Simply put, the time a driver takes to move and produce the frequency, plus the time it takes to settle back to its resting position, is what we analyse in the time domain: how much time it takes to go from completely still, to loud, to completely still again.
Which simply means attack and decay.
(You can see the wave rises, then dips down a bit too far, then adjusts itself until it reaches a stationary position. The rise is the attack, and the process of becoming stationary is the decay.)
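As a toy illustration of the "wiggle before settling" idea, here is a sketch of an exponentially decaying sine – a very simplified stand-in for a driver's impulse response (the 1 kHz ringing frequency and the decay time are made-up example numbers, not real driver data):

import numpy as np
import matplotlib.pyplot as plt

sample_rate = 48000
t = np.arange(int(0.01 * sample_rate)) / sample_rate   # 10 ms window

freq = 1000.0        # ringing frequency in Hz (example)
decay_time = 0.002   # time constant of the decay in seconds (example)

impulse_response = np.exp(-t / decay_time) * np.sin(2 * np.pi * freq * t)

plt.plot(t * 1000, impulse_response)
plt.xlabel("Time (ms)")
plt.ylabel("Driver displacement (arbitrary units)")
plt.title("Toy impulse response: attack, ringing and decay")
plt.show()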
Now, most people treat attack and decay as something separate from the frequency response curve (don't worry, we will get back to it).
Which is not true. Why? Because of a not-so-simple piece of mathematics called the "Fourier Transform." But before discussing Fourier, we would need to discuss other things like "Complex Signals and the Farina Method."
In the end you need to remember one thing: attack and decay are a crucial metric, but they are the frequency response seen through a different lens.
WHAT IS TIMBRE?
So, we understood sine waves, right? But do we hear sine waves or pure tones in music? Or anywhere in the world? The answer is no. So, what do we hear? We hear a combination of various sine waves. And that is a complex signal.
Let's understand by an example.
We do know what notes in music are. Like A, A#, B, C, C#, D, D# and so on.
What is A? It's a frequency after all. To be specific, a 440Hz signal. What is B? 494Hz. What is C? 523Hz. (Not accurate values, just approximation). Any note in music is nothing but a frequency.
But wait a minute. Play an A note on a piano and play the same A note on a guitar. Do they sound the same? Absolutely not. But they are playing the same frequency, so why not?
Because of a simple phenomenon called "Harmonics." When we play the note on both piano and guitar, both generate the same frequency at 440 Hz. But they also generate frequencies at 880 Hz, 1320 Hz, 1760 Hz and so on. Can you recognize the pattern? They are multiples of the base tone of 440 Hz: 440×2, 440×3, 440×4 and so on. Both piano and guitar produce harmonics like that, and we hear the collection of all those frequencies together, not a single frequency.
From here it's pretty simple: frequencies at even multiples (×2, ×4) are even-order harmonics (like 880 Hz, 1760 Hz), and frequencies at odd multiples (×3, ×5) are odd-order harmonics (like 1320 Hz). The base tone is called the fundamental.
(10 Hz – fundamental, 20 Hz – 2nd-order harmonic, 30 Hz – 3rd-order harmonic, 40 Hz – 4th-order harmonic)
So simply, we don't hear a single frequency when we hear some instrument play a note. We hear a lot of multiples of the fundamental/base frequency, which are also known as overtones. So, the Fundamental Frequency + Overtones gives character to the note from which we can distinguish instruments. Like Guitar/Piano, even if they play the same frequency.
So, it's not just which frequency they are playing; it is also the character of the overtones they produce that lets us hear them distinctly. This is what we call the "timbre" of an instrument.
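Here is a small sketch of that idea using additive synthesis: the same 440 Hz fundamental with two different, made-up sets of overtone strengths produces two different timbres (the amplitude values are purely illustrative, not measurements of a real guitar or piano):

import numpy as np

sample_rate = 48000
t = np.arange(sample_rate) / sample_rate   # 1 second of time
fundamental = 440.0                        # the "A" note

def make_tone(harmonic_amplitudes):
    """Sum the fundamental and its overtones with the given strengths."""
    tone = np.zeros_like(t)
    for n, amp in enumerate(harmonic_amplitudes, start=1):
        tone += amp * np.sin(2 * np.pi * fundamental * n * t)
    return tone / np.max(np.abs(tone))     # normalise to avoid clipping

timbre_a = make_tone([1.0, 0.5, 0.3, 0.2])   # one overtone balance (example values)
timbre_b = make_tone([1.0, 0.1, 0.6, 0.05])  # a different overtone balance (example values)
# Same note, same fundamental, different "character" – that difference is timbre.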
COMPLEX TONE
Now I'm coming back to complex tones. We understood that we are not hearing a single pure tone like a sine wave; we are hearing a lot of frequencies at once. So how can we hear all these frequencies at the same time? Is it some kind of magic? No – a transducer can reproduce many frequencies simultaneously, because the complex waveform it has to trace is simply the sum of all those sine waves. Hence headphones are rated 20 Hz – 20 kHz. Some are rated up to 40 kHz, but since that extends beyond what we can hear, in a practical sense it is mostly wasted. This is also why transducers can adapt to the TIMBRE of an instrument: they can produce the fundamental and the overtones precisely.
But it gets even more complex.
What happens when two instruments play together? It creates another complex summation of the two instruments' frequencies over time. What about three? The waveform becomes more complex; 4…5…6… and the complexity goes on. Yet we need to remember that everything is still happening in the 20 Hz – 20 kHz region.
(Red – Guitar, Blue – Piano, if both played together, we get - Purple)
Now you can experiment with different noises to play around with, like White Noise – which has similar energy in every band (a band here means a group of frequencies we can distinguish well enough) – and Pink Noise – white noise that is less prominent in the higher frequencies, more like a downward slope, which resembles natural hearing more closely. Why this is important: noise like this exercises the transducer across its whole range in a way that your music alone rarely does.
(The green line – the resultant waveform – is the product of adding all the harmonics of an instrument)
Now that we learned about the harmonics, it should be easier to understand harmonic distortions later in this series. So, stay tuned for upcoming videos. Now before talking about the last few fundamental things, we should talk a bit about Phase.
RELATIONSHIP BETWEEN TWO SOUND WAVES
Well, let's investigate the sine wave again. It has highs and it has lows. During the high-pressure half, the driver moves outward and transfers energy as high pressure; when it retracts, it creates low pressure.
But what will happen if we take two sine waves and play them simultaneously at a single place? Or simply add them?
The highs will get higher (2× the pressure) and the lows will get even lower. But why? Because, as I told you with complex signals, summing the signals creates a new signal that takes both of the previous waves as input.
In this case: (+1) + (+1) makes (+2), which is high pressure, and (-1) + (-1) makes (-2), which is low pressure.
So it is not the same as the original +1 and -1 signal we fed to the driver. It's now +2 and -2, which means the amplitude (discussed previously) gets a massive boost just by combining the signals. We call this "Constructive Interference," since the data is added and not lost. Note that this only works if we start both sine waves at the same time.
But what will happen if we delay the 2nd signal a bit?
(+1) + (-1) = 0 and (-1) + (+1) = 0.
Exactly – the signal becomes 0, and we can't hear anything because the waves cancel each other. We call this "Destructive Interference," since the data is lost rather than added.
(In this picture the audible signal is the result of constructive interference, and the quiet/silent signal is the result of destructive interference; on the right-hand side we can also see an "out of phase" situation, which creates irregularity in the signal.)
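A quick sketch of the two cases: summing two identical sine waves that start together doubles the amplitude, while delaying one by half a cycle (180°) cancels it. This is a pure illustration using a 100 Hz tone:

import numpy as np

sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
freq = 100.0

wave_1 = np.sin(2 * np.pi * freq * t)
wave_2_in_phase = np.sin(2 * np.pi * freq * t)            # starts at the same time
wave_2_inverted = np.sin(2 * np.pi * freq * t + np.pi)    # delayed by half a cycle

constructive = wave_1 + wave_2_in_phase   # peaks reach +2 / -2
destructive = wave_1 + wave_2_inverted    # essentially zero everywhere

print(np.max(np.abs(constructive)))   # ~2.0
print(np.max(np.abs(destructive)))    # ~0.0 (only tiny rounding error remains)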
These two interferences, destructive and constructive, play a huge role in acoustics. From in-ear monitors to speaker systems, every piece of equipment suffers from some kind of phase issue that has to be catered for – for example, in IEMs a crossover can cause bad time alignment, or in speakers the sound reflected by the room can interfere destructively with the sound coming directly from the speakers. It is a big problem that designers have to look out for.
Now, any phase cancellation shows up as irregularity in the output of the transducer, so it affects the quality of the music. It doesn't need extra steps to calculate; it can be read from the frequency response graph itself, because constructive or destructive interference causes irregularities in that graph. There is a lot more to say about phase, and what I explained is just an overview rather than the full picture – if you want a detailed video/article, let me know.
Now when the basics are covered, let’s jump right into the NEXT LEVEL
LEVEL 2
We will rarely refer back to LEVEL 1 from now on in this video/article, but it was crucial groundwork for the next topics, to avoid gaps in understanding. Let's start with how we hear sound – or, more accurately, how sound reaches the brain.
Before starting this discussion, let's talk about what frequency response is. Frequency response is a graph that shows each frequency with its respective loudness, or SPL. On the horizontal axis we plot the frequency, from 20 Hz to 20 kHz, and on the vertical axis we plot the loudness at that frequency. Now suppose we plot: 60 dB → 20 Hz, 60.01 dB → 21 Hz, 60.02 dB → 22 Hz, and continue to 50 Hz, 100 Hz, 200 Hz, 400 Hz, 1 kHz, 2 kHz, 3 kHz and so on up to 20 kHz – we get a line. A graph-like structure that tells us: at a certain frequency, the loudness is this. The picture shows an example of a frequency response graph with a flat line at 5 dB, which means the speaker is producing the same loudness at every frequency (Hz).
Now this is not possible because for various reasons a headphone can't produce an exact flat frequency. Then how does a typical speaker graph look?
Like this -
Yes, it's uneven and bizarre looking. But it is what it is. Don't worry about bass, midrange and treble in much detail right now – you will understand all of this better in the next episode. The fundamental regions are: roughly 20 Hz to 200–250 Hz is bass, 250 Hz to 4 kHz is the midrange, and 4 kHz to 20 kHz is the treble.
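If you want to see how such a graph is drawn, here is a minimal sketch that plots some invented SPL values against frequency on a logarithmic frequency axis, the way headphone graphs are normally shown (the numbers are made up purely for illustration):

import numpy as np
import matplotlib.pyplot as plt

# Made-up example data: frequency (Hz) vs loudness (dB SPL)
freqs = np.array([20, 50, 100, 200, 400, 1000, 2000, 3000, 5000, 10000, 20000])
spl_db = np.array([62, 61, 60, 60, 59, 60, 63, 66, 64, 58, 55])

plt.semilogx(freqs, spl_db)            # log frequency axis, linear dB axis
plt.xlabel("Frequency (Hz)")
plt.ylabel("SPL (dB)")
plt.title("Example frequency response graph (invented data)")
plt.grid(True, which="both")
plt.show()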
For now, skipping the basics of graph reading, let's discuss another important thing.
EAR ANATOMY
Now that we have a simple idea of how Frequency Response Graph works, let's jump right into our ear anatomy.
As simple as it gets, we hear with our ears. One left and one right. What we can see from the outside is the pinna – the fleshy part of the ear. In the real world the pinna is a very important filter in the ear structure for detecting the position of sound in 3D space (localization).
OUTER EAR
When a sound is in front of us, it goes into the ears directly. But when it's behind us, it first crashes into the pinna and then gets transferred to our eardrum. The pinna is very important for understanding the directionality of sound, especially its height – how we hear an object's height and how we hear its position in width. For now, just remember that the pinna is a crucial factor in how we hear things from different directions. When the sound then reaches the ear canal, the inner part of the ear that is not easily visible from the outside does some interesting things.
So, what is this part? It's simply the portion from the earhole to the eardrum (the part that connects the outer world to the eardrum). This section of the ear is the most problematic one, because this tube-like structure does so many peculiar things to the sound that it drives headphone/IEM designers mad. There are a few concepts you need to understand a bit, but before that we must appreciate how hard it is to predict something that depends on the tiniest parts of our hearing system. And as we all know, we don't look the same and we aren't built the same, so everything changes from one person to another: someone with a very different ear structure from yours will hear things differently from someone whose ear structure is similar to yours. We need to keep that in mind at all times.
EFFECTS OF EAR ANATOMY
What happens if you blow into a pipe? It sounds weird, honky and amplified. As I described briefly in the ear anatomy section, because of this unique structure we don't hear a perfectly flat sound even in nature.
Our ear anatomy produces a substantial boost in the midrange, so we never hear a completely flat sound: inherently, our ears boost the region from about 1 kHz upward. Whether due to human evolution or other reasons, the anatomy changes the sound a lot before it reaches our eardrums, and most of those changes happen in a region (of the frequency response of our perceived tonality) that we call the ear-gain region. It looks like this.
(Ignore the 5128DF right now, we will talk about that in the LEVEL 3)
EAR GAIN REGION
The outer part (the pinna) and the ear canal make certain parts of the frequency range extra sensitive to what we hear. Think of it this way: sound arrives, interacts with your ears, and has its tonality changed in real time before being processed by your brain. Most of this happens in the 1 kHz to 16 kHz region.
Which means the outer ear boosts this range by a lot. But why? It's believed that, because speech and animal calls sit in this region, the boost helped us hear those things clearly. Evolution happened, and now we are stuck with this amplification in that region.
THE MOST IMPORTANT FACTOR
But the most important thing about the ear-gain region is how it shapes the sound. If we look at the data, we can see that the pinna does a lot of the amplification in the 4 kHz to 8 kHz region, while the ear canal contributes the most in the 1–4 kHz region. Why is this important? Because headphones and earphones bypass parts of these structures and blast sound more directly toward your eardrum. Keep in mind, though, that the boosting is done by the pinna and the ear canal together, so it's an oversimplification to say "this specific region is boosted by the pinna" and "that specific region is boosted by the ear canal."
Not only that, your shoulders and body also amplify or cancel out frequencies just by reflecting, so it's not just a head thing – it is a head and torso effect.
Also, there are a few things to remember – there are terms that I need you to be familiar with before going further.
Free Field – this is essentially nature, the open atmosphere we are in. If a sound is generated out in the atmosphere, like someone shouting, we call it a free-field phenomenon. A speaker playing in a room is also, loosely, a free-field source in this sense.
EEP – short for Ear Entrance Point, or earhole: the point where the sound enters the ear canal.
DRP – which is the short form of Drum Reference Point or Eardrum, which helps us to detect the pressure fluctuations.
Now factor in some simple things that we need to talk about later:
- How many of you have exactly the same pinnae?
- How many of you have exactly the same earhole?
- How many of you have the same EEP to DRP distance?
- How many of you have the same width of the earhole?
- How many of you have the same stiffness of eardrum?
- How do you react to certain frequencies inherently?
By common sense, if we all have differently shaped ear parts, we are bound to hear different levels of the frequencies in the same sound. If we have different ear canal lengths, different ear canal widths, and different ear shapes, then each of us has a different auditory system from the next person.
Ultimately it comes down to this: it is almost like a fingerprint. So we need to accept that what each of us hears is not the same.
I will get back to this topic later when we discuss HRTF. But for now, let's understand how we hear a sound in a space/or in real world situation. How can you locate a drum or a guitar that is playing in front of you or left side or right side or on top of you? How do you understand the localization of a sound?
LOCALIZATION OF SOUND / IMAGING IN FREE FIELD
Let's say someone fired a gun at your 30° left, at 6ft above your ear level and 10ft away from you. How would you know that the gun was at your 30° left angle? Because of mainly three things: 1. ILD 2. ITD 3. IID. Let's discuss these topics in detail and understand how you can locate the gunshot (maybe in a game).
(The left ear is receiving a louder sound than the right ear)
INTERAURAL LEVEL DIFFERENCE
If a gunshot is fired exactly in front of you, both your ears receive exactly the same loudness. Which means if the gunshot is 100 dB, your left ear receives 100 dB and your right ear receives 100 dB. (Let's leave acoustic impedance aside to keep things digestible.)
But our gunshot was fired at 30° to the left. So naturally your left ear hears a louder sound than your right ear. Why? Because of something called head shadow: your head blocks the direct sound on the right side, and the right ear is slightly farther away too. What do we get from this, simply? A sound that is louder in the left ear appears on the left, and a sound that is louder in the right ear appears on the right.
Now a crucial thing here to remember is Interaural Level Difference is not just your typical panning in stereo recording. Interaural means there should be two ears and their difference. Anything that doesn't have two ears will not create interaural phenomena.
(Sound reaches your left ear earlier than your right ear)
INTERAURAL TIME DIFFERENCE
Now another thing happens at the same time. When the gunshot is fired, the direct sound wave hits your left ear first and then your right ear. The time difference is extremely small – at most around 650 microseconds for a human-sized head – but don't worry, our brain can interpret delays even smaller than that.
Now the time difference in both ears enables us to understand the location of the gunshot. If the gunshot gets fired in a room, it also helps us to detect along with the reflections.
Again, to keep in mind, to access the Interaural Time Difference phenomenon we need to have two ears, because it's "INTER-AURAL."
(The nearer gunshot is louder, the further gunshot is quieter)
INTERAURAL INTENSITY DIFFERENCE
You got the 2D placement of the object by using ITD and ILD. But how can you understand how far it is? Simple: by the loudness of the object. If something is quiet, it is happening far away; if something is close, it will sound loud.
People often count Interaural Intensity Difference and Interaural Level Difference as one and the same. I don't, so take this distinction as my framing – how you receive it is up to you.
In the end, because of these three things – IID, ITD and ILD – we can locate a sound source in 3D space.
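To put a number on that "extremely small" time difference, here is a sketch using the classic Woodworth spherical-head approximation of ITD (the head radius and speed of sound are typical textbook values, not a claim about any particular head):

import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, in air at room temperature
HEAD_RADIUS = 0.0875     # m, a commonly used average head radius

def itd_seconds(azimuth_deg):
    """Woodworth approximation of Interaural Time Difference for a distant source."""
    theta = np.radians(azimuth_deg)   # 0 deg = straight ahead, 90 deg = fully to one side
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + np.sin(theta))

print(itd_seconds(30) * 1e6)   # gunshot at 30 degrees -> roughly 260 microseconds
print(itd_seconds(90) * 1e6)   # fully to the side     -> roughly 650 microseconds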
(We can’t hear soundstage and imaging in headphones, instead we can perceive them through various different techniques which will be discussed, don’t worry)
BUT WAIT A SECOND, MY HEADPHONE IS WRAPPED AROUND MY EAR
Headphones and IEMs are not something that can give you a free field representation. So how do you perceive this localization in the music in headphones?
Well, headphones and IEMs can't do imaging and soundstage. SHOCKING? Yes, they can't. I really hope you watched my previous video/article (Link - https://youtu.be/1GT-ehGKGNw?si=5z2_IPfJdJ_DO9_m) to understand why – headphones, earphones and speakers do not produce the same music similarly.
But more importantly, we must remember that headphones and earphones don't have the capability to produce soundstage and imaging, but they do produce space and localization. How?
By Music.
IEMs and headphones are largely dependent on the music for soundstage and imaging. There is no reason to think something magical beyond tonality is causing headphones and earphones to have soundstage. Now, it would be a lie to say that no IEM or headphone produces a more spacious presentation or more clearly defined localization of objects than another – they do, and we will talk about that later. But ultimately the biggest chunk of soundstage and imaging depends on the track, not the transducer. So, to learn about those things, we need a very basic understanding of music production.
1st: let’s talk about how music is recorded
MONO AUDIO
Mono audio is simply audio with no difference between the two channels. (Channels in audio mostly mean the left and right parts of the playback system: the left speaker is the L channel, the right speaker is the R channel; the left side of a headphone/IEM is the left channel and the right side is the right channel.)
Mono audio simply means both channels carry the same information. There is no difference that can create panning, so nothing can sound as if it is coming from the left or the right. For example, a small Bluetooth speaker is a mono playback device with no left or right indication at all.
STEREO AUDIO
Stereo means there is a difference between the L and R channels – the music allows the two channels to behave differently. This is what creates the panning effect and the sense of localization, through level differences and time differences (not interaural differences, since there is no ear attached to the microphone). We will talk about interesting things like crossfeed later; that is reserved for the future. Most of the music you hear is a stereo recording – from popular tracks to almost everything you listened to in your childhood – made by producers working on a 2-channel speaker system in the studio.
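For example, a mix engineer "pans" a sound between the left and right channels purely with level differences. Here is a sketch of the common constant-power pan law (a standard mixing technique; the exact law a particular DAW uses may differ):

import numpy as np

def constant_power_pan(mono_signal, pan):
    """Pan a mono signal: pan = -1 is hard left, 0 is centre, +1 is hard right."""
    angle = (pan + 1) * np.pi / 4          # map [-1, 1] to [0, 90] degrees
    left = np.cos(angle) * mono_signal
    right = np.sin(angle) * mono_signal
    return left, right

t = np.arange(48000) / 48000
guitar = np.sin(2 * np.pi * 220 * t)             # stand-in mono "instrument"

left, right = constant_power_pan(guitar, -0.5)   # place it halfway to the left
# The level difference between the channels is what your ears read as position.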
BINAURAL AUDIO
Binaural audio is very different from stereo. Remember that stereo recordings don't have ITD, ILD and IID? Well, binaural audio does. It's just a recording process, not a different format, so an MP3 file can hold binaural audio if it is recorded in a specific way. So what is that way? How does binaural audio capture the ILD, ITD and IID that make us understand the 3D aspect of the music so much better?
Binaural audio is simply recorded with a dummy head: a head just like ours, but instead of eardrums exposed to the free field, there are microphones situated inside the ear canals, capturing the music in a room just like a human would. And what happens when we record music at the eardrum position (DRP) rather than in the free field? IID, ITD and ILD – which in audio language simply means 3D – get encoded into the music file itself.
So, when you hear someone review stereo music in terms of soundstage and imaging, consider what they are actually trying to describe. It's not magic; it's just different information wrapped in a different language. Who am I kidding – for most of my life I treated these things as absolute, quantifiable metrics. Are reviewers liars? No, absolutely not. Are they describing psychoacoustics, which is very subjective? Maybe. The most important thing is that you know what the difference is and can understand the review properly.
Now we have a basic understanding of how some things work. But before jumping into the next level, let's discuss another very crucial aspect of our ears: Acoustic Impedance.
ACOUSTIC IMPEDANCE
What is impedance? Impedance, in simple terms, is resistance to a flow of any kind. Most of you reading this will have heard of electrical impedance, measured in ohms. But what about air and sound?
INPUT IMPEDANCE
At the end of your ear canal there is a drum that detects the pressure fluctuations of the air. If we look at the anatomy, we can see three tiny bones behind the drum that turn those pressure fluctuations into what we experience as sound. (This is very important, because airborne pressure fluctuations are far too weak to be transmitted directly into the liquid of our inner ear.) Think of it like trying to move a paper boat that is submerged under water just by blowing at it.
Now it's clear that there is a certain resistance when we blast sound into the ear canal. But the question is: how much? How much does an ear resist the sound coming into it? That is basically decided by how long your ear canal is, how wide it is, how stiff your eardrum is, how much earwax is in your ear (EEWWW) and several more factors.
This is very tricky to measure since we all have differences, and our wideband acoustic impedance is heavily determined by our anatomy which makes it more like a fingerprint, unique.
(Resistance is the resistive nature of the ear; reflectance is the energy reflected by the eardrum that didn't get absorbed; amplitude simply means the loudness)
That issue needs to be solved to provide accurate measurements for a broad audience. Simply put: we all have different acoustic impedances, and we need some way of producing measurements that take this into account.
What we have talked about so far is the acoustic input impedance, which depends on our ear canal. But there is also an acoustic output impedance on the headphone side.
OUTPUT IMPEDANCE
What is the difference between open back headphones and closed back headphones or earphones? The answer is acoustic impedance.
OPEN BACK VS CLOSED BACK HEADPHONES
LOW ACOUSTIC IMPEDANCE
What happens in an open-back headphone is that the driver throws the sound into your ears, and the sound that is not absorbed by your ears (in other words, the reflectance) escapes out of the headphone. Which means:
1. The driver must work extra hard to get the ear to absorb enough energy in certain situations (this relates to what is known as reactance).
2. The driver also works with far fewer reflections coming back at it, which means less interference with the tonality (fewer constructive and destructive interference artefacts in the frequency response graph).
3. Because some of the reflected energy is thrown out instead of being absorbed by the eardrum, the amplitude is reduced at some frequencies (mainly the lower frequencies).
HIGH ACOUSTIC IMPEDANCE
Now, what happens with closed-back headphones (and sealed earphones)?
1. They seal the ear, so sound can't escape from the ear canal to the free field, and at the same time free-field sound is blocked from entering the ear canal.
2. The driver needs to do very little work to produce the same sound pressure. Example: earphones with tiny drivers get so loud precisely because sealing gives them a high acoustic impedance to work against – and a little bit of leakage can make the whole sound fall apart.
3. The driver experiences a lot of reflections, which cause build-up at certain frequencies (mainly the higher ones), so there is a noticeable lift or cancellation in amplitude that is easy to spot.
Measuring acoustic impedance is a very interesting topic in itself too, which I want to cover in another video, but as of now for basic understanding of the subject, this much is enough.
What do we learn from acoustic impedance? That we all have different acoustic impedances in our ears, and that how much a headphone's response changes from ear to ear depends on how its acoustic output impedance interacts with each ear's input impedance (HpTF still exists either way). We will talk about this properly in the HpTF part of the measurements video/article, so don't worry if you don't fully understand this topic right now.
(This is the variation in blocked-canal measurements of closed-back headphones, across 5 headphones)
LEVEL 3
Now let's start the real audio game: The Brain.
Do you remember what the free field is? It is simply the natural sound sources we hear around us. Now, as we already discussed in LEVEL 2, our ear anatomy changes the sound differently from person to person. But what do we call that difference, in simpler terms?
EAR TRANSFER FUNCTION
[What is a transfer function? In mathematics and physics, a transfer function describes how a system changes an input into an output. It goes very deep, but for this section we just need to understand that the way (and how much) a signal changes as it passes through a system is described by its transfer function. We give a system some input – the system can be anything: an ear, a robot, a computer, an instrument – and the system gives us an output. If the output is not the same as the input, we study how it changed, and for that we use the transfer function.]
Now, what is fed to our ears is not what we hear. We hear a different version of the sound, which our brain processes as normal. The way a sound is presented to us and the way it is perceived by us are simply different, and to analyse that we need to understand the Ear Transfer Function.
As we discussed before, our pinna and our ear canal change the sound (they boost certain frequencies). So we can divide our Ear Transfer Function into two parts: Number 1 is the Free Field to EEP transfer function, and Number 2 is the EEP to DRP transfer function. Number 1 + Number 2 makes up our ear's transfer function.
But why do we need to understand the pinna and ear canal effects separately? The simple reason is that modern systems – headphones, earphones and speakers – interact with different parts of this chain, and we need to understand those transfer functions to get information about their tonality.
INTRODUCTION TO MEASUREMENTS
Now in this section you will learn about the basics of headphone measurements.
There are simply two kinds of headphone measurements we can see:
- Blocked Canal (meaning the mic used to measure the headphone sits at the ear entrance point). This measurement helps us understand the effect of the pinna with headphones and speakers: because the ear canal is blocked, there is no way the canal is contributing any change to the frequency response. This is also known as the Free Field to EEP transfer function.
- Eardrum Reference Point – meaning there is a fixture that mimics the human ear shape, with a pinna and an ear canal, and we measure the headphone, earphone or speaker at the eardrum reference point. This allows us to take the canal contributions into account. This is also known as the EEP to DRP transfer function.
What these measurements allow us to do is isolate how much each part of the ear contributes: 1. we can measure how much the pinna contributes, 2. we can measure how much the ear canal contributes, and 3. we can measure how much both of them contribute together.
This gives us a detailed, high-resolution view of the Ear Transfer Function: we can break down which part of the ear is causing what.
HRTF (Head Related Transfer Function)
Until this point, we were talking about a single ear. How an ear changes sounds. But we do not have just a single ear. We have two ears. Which means both our head and body change sound in some way (discussed in Level 2 – EAR ANATOMY).
WHY IS HRTF IMPORTANT?
Remember I told you that we can't really hear soundstage and imaging from headphones themselves – that we mostly hear those things in the music/games? That's mostly because of HRTF.
It's simple enough to understand that the drivers in headphones and earphones don't float around in space. They sit in a fixed position and radiate music from there. So all the directionality, all the localization, happens through tonality.
POSITION IN SPACE = TONALITY DIFFERENCE
An instrument in front of you won't sound the same as when it plays behind you. Yes, IID, ITD and ILD help us understand where it is located, but here I am talking about the levels across the audible frequency range – the tonal signature a sound takes on when it comes from a particular point in 3D space.
So, as simply as it goes: how can you localise an instrument in headphones? Because the tonality changes. But how does it magically change without us noticing or doing anything? Because of, again, the pinna and ear canal – a.k.a. the Ear Transfer Function. In real life, the tonality arriving at your eardrum changes by itself when an instrument is positioned in different places, so it is natural that we can detect position in headphones too: the headphone just has to shape the sound the same way.
(R – Right, L – Left, F – Front, B – Back | The tonality changes with direction, and that is how we can hear instruments at different positions)
But this still seems a bit magical – "due to the tonality, something seems to be positioned in space." Shouldn't we just hear the coloration instead of the placement? Yes, and this is where our brain kicks in. When you know how an object sounds, or can estimate its tonality closely enough, your brain interprets that tonality change as spatial information rather than as a different instrument.
So, a guitar with its own overtones and harmonics won't sound like a piano, anywhere in the space around you.
BUT THERE IS A BIG BIG PROBLEM
Because HRTF depends on the Ear Transfer Function, and we all have different shapes of pinnae and ear canals, we all hear the localization of the object differently around us.
But before starting this discussion, we should understand how HRTF gets measured.
HOW TO MEASURE / KNOW YOUR HRTF?
So, we can't hear sound exactly as it exists in nature. But we can calculate how differently we hear it, using a simple method called MIRE (Mic In Real Ear). What does it do? As I mentioned before, it is a measurement at the eardrum – but on real humans. On you and me.
So, what is it? Simply the difference between what sound is thrown at us and what we are receiving. To understand it a bit better let's jump into how it is measured.
Suppose you want to measure your HRTF. You will be seated in a chair, and speakers will be placed in a spherical cage all around you at some distance – you will be surrounded by speakers. A very thin probe mic will be placed inside your ear, near your eardrum. Then each speaker in the sphere plays a sine sweep (a signal sweeping from 20 Hz to 20 kHz, i.e. the hearing range), and the mic records each speaker and its tonality at each point in 3D space: 5°, 10°, 15°…180°, in both azimuth (width) and elevation (height). After all the measurements are collected, a program processes them and presents you with your HRTF.
This is a simplified description – there are more layers to measuring an HRTF, with rotating chairs, weighted averaging and a lot more. So if you like this video, let me know and I will make a video about it soon.
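For the curious, the sine sweep itself is easy to generate. Here is a sketch using SciPy's chirp function (the sweep length and sample rate are arbitrary example choices of mine; a real HRTF rig also needs deconvolution, calibration and much more):

import numpy as np
from scipy.signal import chirp
from scipy.io import wavfile

sample_rate = 48000
duration = 5.0                                  # seconds (example)
t = np.arange(int(sample_rate * duration)) / sample_rate

# Logarithmic sweep from 20 Hz to 20 kHz – the full hearing range
sweep = chirp(t, f0=20.0, f1=20000.0, t1=duration, method="logarithmic")

wavfile.write("measurement_sweep.wav", sample_rate, (0.5 * sweep * 32767).astype(np.int16))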
NEUTRALITY
What you have now is a fingerprint of how sound changes in your ear, and this change helps you understand what NEUTRAL TONALITY means for you. As I already mentioned, we all have different ears and ear canals, so we all have different HRTFs – the definition of neutrality is different for everyone. You might have a big peak in the 10 kHz region that I don't; I might have a big dip around 7 kHz that you don't.
It's like a fingerprint; everyone has different definitions of neutrality in their head.
WHAT IS THE DIFFUSE FIELD (or DF)?
If you play random noise (white noise) through your headphones, how do you perceive it? Does it sound like it is coming from a room? No, because headphones sit on our ears.
The noise sounds like it is coming from inside your head; it wraps all around you. This is the natural presentation of a headphone: it throws music at you from all around, not like a speaker in a room. Because headphones don't involve the room/atmosphere in producing the sound, we associate them with the Diffuse Field – as in, the sound effectively arrives at the ear entrance point from all directions at once, projected by the headphone driver, rather than from one spot in a room.
Remember how we measured HRTF? There were speakers all around you, projecting sound at your ears from every direction. Is that a diffuse field? Essentially, yes. And HRTF can only be measured properly in an anechoic chamber, where no reflections are present.
So the graph we just saw is the DF-HRTF of your ear, which is different from anyone else's DF-HRTF.
(DF-HRTF is the average of all the data points we get in an HRTF measurement – in other words, if we take all the speakers around you into consideration, what is the average over all directions? That is the DF-HRTF.)
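A rough sketch of what that averaging looks like numerically: take the magnitude response measured from every direction and average the energy per frequency. This is a simplification with placeholder data – real diffuse-field averaging also applies direction weighting:

import numpy as np

# Hypothetical data: one magnitude response (in dB) per measured direction.
# Shape: (number_of_directions, number_of_frequency_points)
hrtf_magnitudes_db = np.random.normal(loc=0.0, scale=3.0, size=(72, 512))  # placeholder data

# Average in the power domain, then convert back to dB
power = 10 ** (hrtf_magnitudes_db / 10)
df_hrtf_db = 10 * np.log10(np.mean(power, axis=0))

print(df_hrtf_db.shape)   # one averaged curve: an (unweighted) diffuse-field HRTF estimate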
DF-HRTF IS NOT A TARGET
Now it is easy to assume that one's DF-HRTF is the most neutral to them and we should chase that. But this is not the case, as for various reasons, subjectively and objectively listeners prefer a downward tilt in their music listening devices along with a bass boost. Which means, although we can deduce a lot of things with DF-HRTF, it's not a target that one should aim for. We will discuss things like in-room response and equal loudness contours later, but as of now we just need to remember that DF-HRTF is not a target. (There is also something called Flat Speaker in Anechoic Chamber, which we will discuss in the measurement video).
WHY SAME HEADPHONES DON'T SOUND THE SAME TO EVERYONE
While it is true that your DF-HRTF is like a fingerprint, it is also true that we are humans and we have preferences. If something sounds neutral to you, it doesn't necessarily mean it matches your HRTF – your preferences are at work there too. It is almost impossible to know your HRTF without measuring it. Just because something sounds peaky or dark in some region doesn't necessarily mean your HRTF is the explanation; the preferences you formed while growing up and your music taste contribute as well.
IF WE ALL HAVE DIFFERENT HRTF, THEN HOW ARE HEADPHONES DEVELOPED?
So, yes, we all have different DF-HRTFs, but when a company designs a product for the mass market, it takes the average DF-HRTF into consideration. Which simply means a headphone can't be built for you specifically, but it can be built for a large population – and averaging over the population gives us a "population-average DF-HRTF."
And as we already discussed, it's all about positioning the objects. So even game engines, movie audio, music, immersive audio for headphones - everything is based upon generic DF-HRTF.
So what you get is a generic HRTF, not your personalized one. Companies like Sony and Apple are researching how to derive a personalized HRTF for your own ears, by analysing photos of your ears or putting microphones in their earbuds, and honestly the future of immersive audio looks very promising to me. If everything is tuned to your HRTF, you get better definition of instruments and better placement of objects in 3D space – a huge advantage for customers.
BUT IF EVERYTHING USES A GENERIC HRTF, THEN WHY CAN I LOCALIZE INSTRUMENTS IN MY CURRENT HEADPHONES?
Because of an incredible ability of our brain:
AUDITORY SPATIAL PLASTICITY
Our brain does an insane trick when exposed to a generic, non-personal HRTF: it adapts to it. It bends its interpretation so that it can identify the directionality of sound even through the "ears" of a different person, so to speak.
It takes time and some effort, conscious and unconscious, but after a while our brain adapts to the nature of that generic HRTF. Have you noticed, when listening to a new headphone or IEM, that after a month or two the directionality starts to sound more correct? And it's not only headphones – even if your ear is injured somehow, it takes just a few months to adjust to your new HRTF.
Now there is a magical thing that happens: whenever your brain adapts to another HRTF, it remembers it. For how long, I don't know, but if you switch headphones frequently, you don't forget how the previous headphone presented directionality. It becomes like muscle memory. That is why we call it Auditory Spatial Plasticity. There are papers on this that I would love to discuss in the future.
COCKTAIL PARTY EFFECT
Suppose you walked into a room full of people and your partner started to talk to you. You automatically cancel out the environment and focus on your partner's voice. This is the power of brain understanding and filtering information. In audio it is called the Cocktail Party Effect, which also plays a huge role in identifying the localization of a sound object.
NOW THERE ARE STILL LIMITATIONS
Tonality changes the perceived position of a sound object, and we adapt to it. Easy, right? Well, not really. Your brain is powerful, but it's not a 3D sound processor. There are a couple of things that can be very tricky to counter. I'm not going to go deep into them, but it's better to know they exist.
- Cone of Confusion – Although it might seem like we can map every centimetre of the 3D space around us, we do run into confusions. One of the main ones is the Cone of Confusion: sources lying on a cone extending sideways from the ear produce nearly identical ITD and ILD, so we have a hard time telling their locations apart.
- Front–Back Reversal – A classic problem where the brain gets confused about whether a sound is coming from behind the head or in front of it.
- There are also things like Localization Blur, MAMA, MAA which will be too much for this article/video thus skipping entirely.
Now, everything in the Auditory Spatial Plasticity is about how well you are trained.
If you are experienced with different headphones and earphones and can easily recognize the direction of sounds, you are well trained. If you are not aware of these factors and simply don't try to localize, you may be relatively untrained. And when you force yourself to locate something for a very long time, your brain gets rewired and starts processing the information differently. So, in the end, we can't ignore the brain's contribution to all of this.
MODERN TECH
At the end of this series we will talk a lot about how modern technologies like Dolby use techniques to upmix or downmix tracks – taking a stereo mix and making it immersive by applying all the things mentioned above. So stay tuned for the next video/article.
Before jumping to the next section, I want to say a few things about couplers. The frequency response of any headphone/IEM is measured on an ear simulator: a device that mimics the outer and middle ear. Now that we understand how hard it is for sound to get to our brain, imagine how hard it is to mimic that with a machine. Thanks to engineers and their continuous effort, we have managed it – maybe not perfectly accurately, but reasonably close.
Which means couplers also have an acoustic impedance, they also boost the regions our ears boost, and they care about the geometry of the pinna and ear canal. All of this affects how we measure headphones. All of that is reserved for the next article/video, "Headphone Measurements," but since we now understand the fundamentals of how sound reaches the ears, going through it will be a breeze.
LEVEL – 4
HUMAN LIMITATIONS
Now that the science of sound, the ear and the brain is covered, in this section we will talk about three things that unquestionably affect all of us – things we need to factor in before we start judging anything. Those three things are: 1. Auditory Masking, 2. Critical Bands and 3. Equal Loudness.
AUDITORY MASKING
What happens if you drop a pin in a quiet, empty room? You hear it ring – hence the phrase "pin-drop silence." What happens if the room has an AC unit and a fan running at full speed? Can you still hear the pin? Absolutely not. This effect is known as masking in audio, and we divide it into two types.
1. SIMULTANEOUS MASKING
Which simply means what I explained in the pin drop example. When we have a louder source of sound, the quieter source of the sound will sound even quieter. The louder sound will mask the quieter sound.
How much masking occurs, and when, is a separate topic. The main point is that while mixing, if the spectral balance is not taken into account and the spatial placement of objects is not done carefully, the masking effect becomes a lot more problematic.
We will need this topic in the upcoming video, so stay tuned for that.
2. TEMPORAL MASKING
Temporal masking happens in the time domain. It simply means that when a loud tone is generated, our auditory system needs a bit of time to recover from it. (This relates to what I explained in the 1st episode of Freq Out, where I talked about dynamic range compression.) So, right after a loud impulse, the next quieter tone gets masked. This is one reason why a driver's ringing after a transient is so difficult to hear.
If the quieter (masked) tone comes right after the loud one, we call it forward masking; if it comes just before it, we call it backward masking.
Why is this important? Because audio compression algorithms, like those used for MP3 encoding, take masking into consideration: the algorithm throws away the masked content to compress the data – content which, in theory, couldn't be heard anyway.
Now, we understand that we can't hear everything that is there radiating sound. But we need to talk about another fascinating thing our auditory system does.
DISCRIMINATION OF TONE
Think of it this way: even though the audible range is 20 Hz to 20 kHz and we can hear every frequency in that range, our ability to discriminate between nearby tones is limited. The ear groups neighbouring frequencies together, and we use the term "Critical Band" for these groups: we can only properly discriminate and compare the tonal character of sounds that fall into different critical bands. People sometimes treat this as the "resolution" of human hearing, which is not accurate, but critical bands are building blocks of how we hear music.
Tone and timbre depend a lot on these critical bands and their interactions. For example, when a lot of harmonics interact within one critical band, we perceive it as roughness. I suspect that BA drivers, which are known for producing 3rd-order harmonics, sometimes have those harmonics mix with the overtones of instruments, creating a roughness that many people love. Again, I have noticed this only a few times and it is not proven, so take it lightly – it's not scientific, but this is how I see it.
What is useful is that, when measuring, we will use these critical bands to smooth out graphs, which gives us a simpler, smoother graph to work with.
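As an illustration of that kind of smoothing, here is a sketch of simple fractional-octave smoothing: each point is replaced by the average of its neighbours within a band around it. (Real critical-band or 1/N-octave smoothing used by measurement software is more refined than this; the data here is invented.)

import numpy as np

def octave_smooth(freqs, magnitudes_db, fraction=3):
    """Smooth a response by averaging over a 1/fraction-octave window at each frequency."""
    smoothed = np.empty_like(magnitudes_db)
    half_width = 2 ** (1.0 / (2 * fraction))       # half of a 1/fraction octave, as a ratio
    for i, f in enumerate(freqs):
        mask = (freqs >= f / half_width) & (freqs <= f * half_width)
        smoothed[i] = np.mean(magnitudes_db[mask])
    return smoothed

freqs = np.geomspace(20, 20000, 400)                       # log-spaced frequency points
raw = 60 + np.random.normal(scale=2.0, size=freqs.size)    # noisy made-up measurement
smooth = octave_smooth(freqs, raw, fraction=3)             # 1/3-octave smoothing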
Now, the big father of Psychoacoustics comes into the picture.
EQUAL LOUDNESS CONTOURS
Basically, we don't perceive every frequency as equally loud. Different frequencies are perceived with different loudness at different playback levels. This topic is very complex and would do nothing but confuse you here, so these are the key things you need to know (otherwise the video/article would become unbelievably long):
- Equal Loudness Contours are the newer revision of the Fletcher–Munson curves, which are obsolete now. So refer to the Equal Loudness Contours, which use phons as the unit of loudness level.
- We are more insensitive to bass frequencies than midrange frequencies.
- We are most sensitive to middle frequencies.
- Music producers do take Equal Loudness Contour into account while mixing and it is not our duty to play with it.
- Equal loudness contours are made for simple tones and not complex tones like music.
- Music has dynamics; dialling equal-loudness compensation into your EQ without thinking is a big mistake in itself.
These three phenomena prove that we as humans are born with limitations. And no matter how good technology gets we will still be limited by our build. Now it's not something to be sad about but it's something to acknowledge the next time you are judging something.
Due to all these limitations, there is a gap between what is physically produced and what we actually perceive – and that is not a bug, it's a feature. And now you know how to account for it.
LEVEL 5
If you have reached this point after going through all the previous levels, congratulations – you are very hard to fool now. We have walked through the entire chain of hearing, from the very first movement of air molecules, to the resonances and reflections inside your ear canal, to the way the brain interprets those signals as a complete soundscape.
Step by step we built a map of how sound travels from source, through the medium, to the ear and finally to the brain. Now you know what matters, and how much – because everything contributes to your music.
In this level we are going to wrap up these concepts by busting a few myths.
FREQUENCY RESPONSE IS ALMOST EVERYTHING
There are two types of reviewers I see everywhere: people who take graphs into consideration and people who don't take graphs seriously. While both are right in their own ways, there is a third section of the audio hobby that believes graphs don't tell you anything – which is a very wrong and fundamentally flawed statement.
There are a lot of things we need to keep in mind while reading a graph that simply aren't explained to us beforehand, and that is what I am guiding you through here. Although the frequency response graph is not the last word in evaluating headphones, it is the most crucial quantifiable part of the experience we can get.
FREQUENCY RESPONSE IS NOT EVERYTHING
Headphone research is far from complete. I would go as far as saying we haven't even started the real experiments on most aspects of headphone sound.
Manufacturers and researchers are working to provide us with more and more evaluations and data, but ultimately it boils down to this: the frequency response graph is not everything, yet it is still the most valuable insight available for your audio understanding.
Where does it fall short? Mainly in two aspects.
1st – The analysis we draw from the frequency response is still at a grassroots level, and we have yet to understand which features of the graph produce which sonic characteristics.
2nd – Subjective evaluation is still very scattered and not explainable by a single simple metric. This is where a real understanding of audio plays a huge role.
Even with these constraints – along with the imperfections of measuring a headphone's frequency response and the psychoacoustic phenomena involved – the frequency response graph is still our only real friend when it comes to analysing a headphone.
FREQUENCY RESPONSE CAN’T TELL US SOUNDSTAGE/DETAILS… BLAH BLAH!
It is a myth that headphones themselves produce soundstage. As I explained in LEVELS 2 & 3, everything space-related that you perceive in headphones and IEMs is either hard-coded into the music or some variation of the frequency response. Housing, driver quality, damping – all of it ends up reflected in the frequency response of the headphone.
SOUNDSTAGE IS AN ILLUSION
Typically, there are tricks manufacturers use to increase the sense of spaciousness in headphones, like scooping out (lowering the volume of) the midrange, which pushes the midrange further away. This creates very contrasty shapes in the frequency response graph, which generally affects the timbre. It has nothing to do with the price of the headphone, nor is it something that can't be changed later. It's just a process of trial and error, and of how your head receives the sound.
DETAILS ARE AN ILLUSION
Your headphone driver can produce 20 Hz to 20 kHz simultaneously, so how can it "miss" any detail? The fundamental answer is that it doesn't. But because of the masking I mentioned above, and for various psychoacoustic and anatomical reasons, some frequency responses present details more clearly while others hide them. There is no reason to think that a cheap headphone can't produce a high level of detail, or that a pricier headphone automatically resolves better than a lower-priced one.
Now, I am not saying that driver quality and price never contribute to detail retrieval. I am saying that everything ultimately shows up in the frequency response: a higher-quality driver gives a more precise and controlled frequency response, and that is what makes something more resolving. There is nothing magical about it. The driver in your headphone is built to reproduce up to 20 thousand cycles per second no matter how costly or affordable the headphone is (assuming it is rated 20 Hz – 20 kHz).
So now you understand what is happening when someone says, "After using this headphone, I noticed things in the music that I never noticed before." It's the previous frequency response that was hiding the detail: there might have been a dip in the frequency response where that instrument resided, or it might have been masked by other instruments. It's the frequency response that made the instrument audible, not some increased "speed" of the driver. EQing a low-resolving driver can also bring out more detail, if we EQ it precisely and attentively.
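As an example of what "precise EQ" can mean in practice, here is a sketch of a single peaking EQ band using the well-known Audio EQ Cookbook (RBJ) biquad formulas. The centre frequency, gain and Q are arbitrary example values, not a recommendation for any headphone:

import numpy as np
from scipy.signal import lfilter

def peaking_eq_coeffs(f0, gain_db, q, sample_rate):
    """Biquad coefficients for a peaking EQ band (RBJ Audio EQ Cookbook)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / sample_rate
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

sample_rate = 48000
b, a = peaking_eq_coeffs(f0=3000.0, gain_db=4.0, q=1.4, sample_rate=sample_rate)

t = np.arange(sample_rate) / sample_rate
music = np.random.normal(scale=0.1, size=t.size)   # stand-in for a music signal
boosted = lfilter(b, a, music)                     # same driver, different frequency response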
FLAT IS BETTER
Whether a flat frequency response is better or not is a big topic in itself – and, most importantly, where it is better and where it is not.
We simply have to understand that the less deviation there is, the less abnormality we experience. The fun thing is, abnormality is not always bad: some boosted lower mids can sound beautiful with certain genres, and some bass elevation can blow us away with certain music.
So we need to understand "flat" and "neutral" a bit more collectively. Flat is the baseline: when you have a reference point, it's far easier to achieve what you want than to wander around doing trial and error. That is why the linearity of the frequency response matters a lot.
SUBJECTIVITY IS STILL THE KING
As much as I like to talk about objective measurements, what does it ultimately boil down to? Subjectivity.
Even if a headphone measures very "wrong," if it suits your taste and you like how it sounds, it is not worth chasing objective perfection.
Even if something causes problems in the literal sense, if the colouration/deviation is extremely favourable to you because of your own bias, there is nothing wrong with that.
Even setting bias and taste aside completely, you are not listening to test tones on your headphones; you are listening to music. And there are plenty of things that elevate the enjoyment of music that might look bad in objective testing.
So, in the end, you are your own judge. Nothing objective can dictate what you should or shouldn't prefer – even as a very objective reviewer, I have to admit that.
STOP GETTING BRAINWASHED
At this point you would’ve already known that most people in this hobby somewhat says something that mislead us into judging something. It can happen for various reasons. So, when you hear things like
- Expensive is always better
- This headphone stages like 2 channel speakers
- If you want to get details, you have to spend more than $2000
You know what to make of that. Don't fall for the traps made by influencers and marketing campaigns.
KNOWLEDGE IS EVERYTHING
I will always say this: understanding how a headphone works and how your ears perceive sound will save you tons and tons of money. There might be people with that kind of disposable income, but most of us are not in that position.
I still remember skipping meals to afford IEMs when I started in this hobby. Now I get sent products from across the globe to share my findings. It has been quite a journey, and the only thing that made it easier was knowledge and understanding of acoustics – nothing else. EQ is a godsend; learn to use it.
Stay curious, Save Money, Freq Out and Bye.
Sign Off…
Written By - Argha
YouTube - https://www.youtube.com/@audiowithargha
Discord - https://discord.gg/wCcusAAHfw
Credits
Peer Review
Audio Engineering Society, India. [ https://www.facebook.com/aesindia.org ]