Quibbles and Bits

Loudness

It was Lee Atwater, a controversial campaign strategist for Bush Sr.’s successful 1988 presidential campaign, who coined the phrase “Perception is Reality”.  Unfortunately for Bush (but more so for Atwater, it must be said), Atwater died before he could contribute to Bush’s unsuccessful 1992 campaign.  So the extent to which he would have thrived in today’s political theater of the absurd must remain a matter of speculation.  It is also a matter of speculation as to whether or not Atwater was an audiophile.  It’s likely that he was, because you can see how his trenchant political aphorism could have been a natural extension of his observations on the fundamentals of audiophilia.

Why do we commonly use the expression “seeing is believing”, but never “hearing is believing”?  It is as though we accept fundamentally that when we see something we are seeing the reality of that thing, but when we hear something, we only hear what our senses tell us we are hearing.  We must think there is a difference between the two, when in reality there isn’t.  We see with great precision, and hear with only slightly less.

The ear is an extraordinarily complex bio-mechanical transducer.  At the end of our ear canal lies the ear drum.  This vibrates in response to sound waves travelling down the ear canal.  These vibrations impinge upon an assembly of three interlinked tiny bones, the third of which then taps against another smaller ear-drum, at one end of an organ called the cochlea, causing it in turn to vibrate.  The cochlea is a fluid-filled organ shaped somewhat like an alien snail shell, and its interior surface is covered with approximately 16,000 tiny hairs, each connected to a nerve ending.  Each of these hairs detects vibrations in the cochlear fluid over a very narrow range of frequencies, and passes this information down the nerve cells to the brain.

The brain processes all this information, and the end result is what we perceive ourselves to hear.  Interestingly, it is clear that at no point does this system attempt to directly detect the actual sonic waveform.  It only ever detects the individual frequencies.  You could argue that it directly detects the Fourier Transform of the impinging sound wave!  But, interesting as that may be, I want to focus this column on the simplest auditory topic, that of how loud something appears to be when we listen to it, and the various implications of those observations.

Research on audibility had to await the ability to produce and measure calibrated sounds, and so it was not until the 1920s that it became clear that we perceive loudness on a logarithmic scale.  It is generally hard to meaningfully quantify human auditory perception at an experimental level, but the general result is that a linear increase in how loud we perceive something to be requires an exponential increase in the power needed to generate the signal.  So a relatively modest variation in perceived loudness requires our ears to possess a quite colossal dynamic range in order to fundamentally detect it.
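The logarithmic relationship is easiest to see in decibels, the usual logarithmic unit for power ratios.  The short Python sketch below is purely illustrative: the function names are invented for this example, and the 120 dB entry is there only to show how quickly the power ratio grows, not as a measurement of the ear.

```python
import math


def power_ratio_to_db(ratio):
    """Convert a power ratio (e.g. 100x the power) to decibels."""
    return 10.0 * math.log10(ratio)


def db_to_power_ratio(db):
    """Convert a level difference in decibels back to a power ratio."""
    return 10.0 ** (db / 10.0)


# Equal steps in decibels multiply the power by the same factor each time,
# so a linear climb in dB is an exponential climb in power.
for db in (0, 10, 20, 30, 60, 120):
    print(f"{db:>4} dB  ->  power ratio {db_to_power_ratio(db):.0e}")
```

Every additional 10 dB multiplies the required power by ten, which is why even a modest span on the decibel scale corresponds to an enormous span in power.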

The next avenue of research involved determining how perceived loudness varies with frequency.  Harvey Fletcher and Wilden Munson soon discovered that the ear was most sensitive at frequencies in the 3-4kHz range, and dropped off noticeably at both higher and lower frequencies.  This result was in line with what they had expected, but what surprised them was that the amount of drop-off fell noticeably as the volume increased.  In other words, if something was perceived to have a flat frequency spectrum at high volume, it would appear to lose its bass and treble progressively at lower volumes (giving rise to the so-called Fletcher-Munson equal loudness curves).

These findings had their first major implications in the emerging field of telephony, where there were severe limitations in the dynamic range that could be transmitted across a telephone line.  This meant that the resultant sounds would have not only a limited dynamic range but also an apparently limited frequency content, so that voices heard over the telephone exhibited the now-familiar strangled tonality that we have come to associate with the medium.  As an interesting aside, when digital telephony first came on-line in the 1980s, the possibility existed to create telephone conversations with a wider dynamic range and frequency content, resulting in more natural and clear-sounding conversations.  Sadly, focus groups (I could strangle the person who invented focus groups) showed that consumers were taken aback by what you might call ‘high-fidelity telephony’ as they had become so conditioned to 50 years of strangled-sounding voices that they expected all telephone conversations to sound that way.  Which is why even today voice quality over telephone is quite deliberately kept decidedly lo-fi.

Today, even the most minimalist audio systems still have volume control knobs, so we can adjust the volume to be whatever we want it to be.  But ever since the early days of mono there has been an argument about whether volume control was just a convenience factor, or whether there was just the one volume setting at which everything would sound the most natural, or “right”.  For the most part, the topic is not really of mainstream interest, but it does raise its ugly head at one critical juncture.  If we do an A-B comparison between two different audio setups, it is well known that the louder of the two will tend to be perceived to sound better.  An unscrupulous dealer (or even just an inept one) can sell you on what he wants to offload by the simple expedient of playing it louder on demo.

From personal experience, this factor becomes more pernicious and more dramatic the higher up the audio chain you go.  My own reference system these days inhabits somewhat stratospheric heights.  I was using it to compare a prototype Music Server playing back a file located on my NAS over an I2S link to my DAC, with the same file played through my Mac Mini over a USB link.  I was left somewhat awestruck by the magnitude of the improvement wrought by the combination of the Music Server and the I2S link.  It went way beyond anything I have ever heard before when it came to simply optimizing the digital transmission.  And, since I was sending unprocessed raw digital signals to the DAC, the volume levels would be inherently the same … wouldn’t they?  Ugh.  I had some time earlier been verifying BitPerfect’s compatibility with iTunes’ sound check feature following the latest macOS/iTunes update, and had managed to forget that I had left that feature enabled.  So the USB-delivered signal turned out to have had 3.1dB of attenuation applied to it.

But when I was A-B switching between the two I didn’t actually detect the volume difference per se, I only detected a wide swathe of qualitative differences related to imaging, soundstaging, microdynamics, tonal richness and the like.  At least that’s what I perceived I heard.  With sound check disabled and the volume discrepancy eliminated, the Music Server with the I2S link was still clearly preferable, but the magnitude of the differences – and, much more important, the nature of the differences – were more in line with what I would have expected from a direct comparison in the digital domain.  But the whole episode poses questions that I am going to have to return to and explore further at some point.  [Bearing that in mind, it is interesting that with this latest incarnation of my reference system I find myself being much less anal about getting the volume setting *just right*.  I find that if I want it louder or quieter, I want it a good 10dB louder or quieter.  The last 3-5dB seems to make little qualitative difference.  I’m not sure how to interpret that yet, but given my apparently contradictory observations above I feel honor-bound to mention it.]
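To put a number on a mismatch of that size, the sketch below converts an attenuation in dB into the factor by which the sample amplitudes are scaled, and then measures the level difference between two versions of the same signal.  It is a minimal illustration under my own assumptions: RMS stands in for "level", the 440 Hz test tone and the function names are invented for the example, and none of it describes how iTunes or any DAC actually applies its gain.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def db_to_amplitude(db):
    """Amplitude scale factor for a gain in dB (negative = attenuation)."""
    return 10.0 ** (db / 20.0)


def level_difference_db(a, b):
    """RMS level of b relative to a, in dB."""
    return 20.0 * math.log10(rms(b) / rms(a))


# One second of a 440 Hz test tone at 44.1 kHz (an arbitrary choice).
reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
attenuated = [x * db_to_amplitude(-3.1) for x in reference]

print(f"scale factor: {db_to_amplitude(-3.1):.3f}")
print(f"measured difference: {level_difference_db(reference, attenuated):.2f} dB")
```

An attenuation of 3.1 dB scales every sample to roughly 70% of its original amplitude.  Running a check like `level_difference_db` on both sources before an A-B session is one way to catch a mismatch before it colors the comparison.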

If performing A-B comparisons effectively was our biggest loudness-related problem we would all be a lot better off.  But it isn’t.  Not by far.  Most music is listened to on the sonic equivalent of transistor radios.  They have such poor dynamic range that the quiet portions of well-recorded tracks just disappear into the noise.  So to compensate, music producers (or, in practice, independent mastering specialists contracted by the labels) apply dynamic compression to the final mix.  This basically increases the apparent volume of the quiet parts.  Now, there’s nothing especially new about this.  LPs have limited dynamic range, and dynamic compression has been judiciously applied during LP mastering for decades.  However, back in the day there was some pretty sophisticated equipment used to perform that function, equipment that is now both hard to find and expensive when you do find it.  Compare that to today’s digital studios, where a digital compressor is just a mouse click away.
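For readers who want to see what that mouse click does, here is a deliberately crude sketch of a digital compressor.  It is an assumption-laden toy, not any product's algorithm: it works sample by sample with no attack or release smoothing, and the function name, the -20 dB threshold, the 4:1 ratio and the make-up gain are all values chosen only for illustration.

```python
import math


def compress(samples, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static downward compressor followed by make-up gain.

    Levels above threshold_db are reduced so that every `ratio` dB of
    input above the threshold yields only 1 dB of output above it.
    makeup_db is then added to everything, loud and quiet alike.
    """
    out = []
    for x in samples:
        if x == 0.0:
            out.append(0.0)  # silence stays silent
            continue
        level_db = 20.0 * math.log10(abs(x))
        if level_db > threshold_db:
            level_db = threshold_db + (level_db - threshold_db) / ratio
        level_db += makeup_db
        out.append(math.copysign(10.0 ** (level_db / 20.0), x))
    return out


# A full-scale sample (0 dB) and a quiet one (-40 dB): 40 dB apart going in.
loud, quiet = compress([1.0, 0.01], makeup_db=15.0)
gap_db = 20.0 * math.log10(loud / quiet)
print(f"gap after compression: {gap_db:.1f} dB")
```

With these settings the loud sample is pulled down by 15 dB and the make-up gain then lifts everything by 15 dB, so the peak ends up where it started while the quiet sample comes out 15 dB louder.  The 40 dB gap between them shrinks to 25 dB, which is the sense in which compression "increases the apparent volume of the quiet parts".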

While analog compression can sound remarkably pleasing to the ear – and is often applied for its sonic qualities alone without reference to the compression it applies – digital dynamic compression is well known for producing a poor sonic result.  This ends up having dire consequences … record label suits with MBAs understand that when two tracks are played one after another on the radio, the casual listener tends to prefer the one that sounds louder.  So the suit instructs his mastering engineer to ramp the slider on the dynamic compressor up, up, up, and further up, so that it sounds even louder – and will be ‘liked’ even more – than the other guy’s releases.  This state of affairs is referred to as the “loudness wars”, and while it was at its peak in the first decade of the millennium, it is still a plague even now.  Music which has had extreme digital compression applied is close to unlistenable on quality audio equipment, and consequently much popular music produced over the last 15-20 years has been plumbing some all-time depths of sound quality.

Not that dynamic compression is inherently a bad thing.  Even the most conscientious producers will apply judicious dynamic compression from time to time.  This was recently brought home to me when I downloaded my monthly album from B&W’s Society of Sound [a service which, by the way, I continue to wholeheartedly recommend – $60 a year for 24 high-quality high-resolution album downloads].  The album was a 2004 recording of Shostakovich’s 11th symphony (LSO, Mstislav Rostropovich) in 24/96 resolution.  It appears to be an experiment in capturing the full dynamic range of an orchestra and preserving it in its totality, something only a 24-bit recording can realistically hope to achieve.  I would suggest that there is quite possibly not a whit of dynamic compression in that recording.  With my system set to its normal (i.e. loud!) volume setting, the first ten minutes of the first movement were all but inaudible.  So much so that I felt obliged to load the album into Adobe Audition for further inspection.

What I saw confirmed my suspicion.  While the loudest passages do indeed reach full modulation, the quietest ones are very quiet indeed.  I’m pretty sure I don’t have another album in my ~3,200 album collection that displays such a marked dynamic contrast, and it immediately has me considering that all of my other classical recordings which I considered to be fully dynamic – and certainly all of my other recordings of the Shostakovich 11th – must in fact exhibit a significant degree of dynamic compression.  For those of you familiar with the Tischmeyer DR meter, this album measures as DR17 overall, with the four individual movements at DR18, DR17, DR18 and DR16, which are impressive enough numbers, although far from unique.  So the Tischmeyer metric is capturing the gist of the situation, but is far from telling us the whole story.

Finally, many listeners find it problematic that there are undesirable perceived loudness differences between different tracks within a large music library.  This is most evident when playing a random selection of tracks from across the library, and many listeners would like to be able to correct for it by having their system automatically apply a pre-set volume adjustment on a per-track basis during playback.  This involves making an assessment of the perceived loudness of each individual track, and then applying gain or attenuation to individual tracks so as to make their perceived loudness the same.  It is not a bad practice at all, and, depending on the nature of your music library, you may find it very valuable.  The trick is to find an algorithm which can analyze a track and make a one-size-fits-all assessment of its comparative perceived loudness.  This is much easier said than done, and is the subject of ongoing research.  When doing this, you have to bear in mind that you cannot apply so much gain to a track that you will drive its existing peaks into clipping.  For this reason, most loudness compensation schemes apply different levels of attenuation to each track, and very rarely any gain.  Surprisingly, iTunes’ “sound check” system, which has been around for a while but benefits from ongoing optimization, has quite a good algorithm.
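The clipping constraint is simple enough to sketch in a few lines of Python.  Please treat this as a toy under stated assumptions: plain RMS is used as a stand-in for perceived loudness (real schemes use far more refined measures), the -18 dB target and all the function names are invented for the example, and nothing here reflects how Sound Check itself works.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (a crude loudness proxy)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def peak_db(samples):
    """Highest absolute sample value in dB relative to full scale."""
    peak = max(abs(x) for x in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def playback_gain_db(samples, target_db=-18.0):
    """Gain that moves the track to target_db without clipping its peaks."""
    loudness = rms_db(samples)
    if loudness == float("-inf"):
        return 0.0  # a silent track needs no adjustment
    wanted = target_db - loudness   # positive = boost, negative = attenuate
    headroom = -peak_db(samples)    # largest boost before the peak hits full scale
    return min(wanted, headroom)


loud_track = [0.5, -0.5, 0.5, -0.5]    # dense and loud throughout
peaky_track = [1.0, 0.1, -0.1, 0.1]    # mostly quiet, one full-scale peak

print(f"loud track:  {playback_gain_db(loud_track):+.2f} dB")
print(f"peaky track: {playback_gain_db(peaky_track, target_db=-3.0):+.2f} dB")
```

The loud track simply gets attenuated down to the target.  The peaky track would need a boost to reach its target, but its full-scale peak leaves no headroom, so the gain is held at zero.  Choosing a low enough target means almost every track is attenuated rather than boosted, which sidesteps the clipping problem altogether.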

So I’m already well above my assigned word count … as usual.  I hope my perceived assessment of my dear readers’ level of patience is not out of whack … and that Editor Leebs’ indulgence is not being unduly tested! [No worries, Richard—pixels are free!—Ed.]