Audio Myths

I’ve been involved with audio technology since the 1960s, and I’m still surprised—and a bit dismayed—to see the same myths repeated again and again. How “wire” works is fully understood, yet I continue to see fantastic claims that defy basic audio science. Indeed, too much of what I read online and in the mainstream audio press is simply wrong. For example, the notion that audio gear can measure well but sound bad is a common belief that’s easy to disprove. Of course, this assumes the right things are measured.

Likewise, unless you believe pioneering mathematician Joseph Fourier was wrong, music is in fact composed entirely of sine waves, so measuring audio gear using sine waves is perfectly acceptable. But again, the right things must be measured. For example, you can’t measure only the distortion of a power amplifier outputting 1 watt at 1 kHz, as is common. Distortion often rises at lower frequencies, especially with tube amps that use transformers. It can also rise at higher or lower power levels, depending on the nature of the distortion.
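Fourier’s point is easy to demonstrate in a few lines of Python. A square wave is about as un-sine-like as a waveform gets, yet it is exactly a sum of odd-harmonic sine waves. Here’s a minimal sketch (the 1 kHz frequency and the 50-harmonic cutoff are arbitrary choices for illustration):

```python
import math

def square_partial(t, n_harmonics=50, f0=1000.0):
    """Fourier-series approximation of a 1 kHz square wave.

    A square wave is the sum of its odd harmonics, each sine wave
    scaled by 1/k for harmonic number k, times an overall 4/pi.
    """
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * (2 * k + 1) * f0 * t) / (2 * k + 1)
        for k in range(n_harmonics)
    )

# A quarter period into the wave, the sum of plain sine waves
# converges on the square wave's flat top at +1.0.
top = square_partial(0.00025)
```

Add more harmonics and the sum converges ever closer to a perfect square wave, which is exactly why sine waves are a legitimate basis for testing audio gear.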

Another place I see specs either misused or simply missing is loudspeaker isolation products. If you want to know how (or even if) the sound has improved after placing a speaker on an isolation pad, you need to measure the sound in the room using software meant for that purpose. Showing that a neoprene pad blocks vibration is meaningless if a speaker cabinet doesn’t vibrate enough to create any sound in the first place. Indeed, there are myriad ways people can be misled, either on purpose by equipment sellers, or by well-meaning but misinformed audio journalists.

Two common myths that have been soundly debunked, yet are still often repeated, relate to the need for sample rates higher than the CD standard of 44.1 kHz. Tsutomu Oohashi et al. reported in 2000 the results of experiments they claimed prove that people can perceive ultrasonic content, thus confirming for both audiophiles and sellers of “high definition” products that CD quality audio is inadequate. Unfortunately, they made a fatal mistake: They used only one loudspeaker to play several ultrasonic frequencies at once, so IM distortion in the tweeters created difference frequencies in the audible range. When the Oohashi experiment was repeated a year later by Shogo Kiryu and Kaoru Ashihara using six separate loudspeakers[1], none of the test subjects were able to distinguish the ultrasonic content. From their summary:

“When the stimulus was divided into six bands of frequencies and presented through six loudspeakers in order to reduce intermodulation distortions, no subject could detect any ultrasounds. It was concluded that addition of ultrasounds might affect sound impression by means of some nonlinear interaction that might occur in the loudspeakers.”
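The tweeter mistake is easy to reproduce numerically. In this sketch, two ultrasonic tones pass through a mildly nonlinear “driver,” and a clearly audible 2 kHz difference tone appears that was never in the source. The frequencies and the distortion coefficient here are arbitrary illustrations, not measurements of any real tweeter:

```python
import math

fs, n = 96000, 9600          # 0.1 second of audio, so DFT bins fall every 10 Hz
f1, f2 = 24000.0, 26000.0    # both tones well above the ~20 kHz limit of hearing

def tweeter(x):
    # hypothetical mildly nonlinear driver: a small second-order term
    return x + 0.1 * x * x

samples = [tweeter(math.sin(2*math.pi*f1*i/fs) + math.sin(2*math.pi*f2*i/fs))
           for i in range(n)]

def bin_mag(freq):
    # magnitude of one DFT bin, evaluated directly
    re = sum(s * math.cos(2*math.pi*freq*i/fs) for i, s in enumerate(samples))
    im = sum(s * math.sin(2*math.pi*freq*i/fs) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

# The second-order term multiplies the two tones together, creating an
# audible tone at the 2 kHz difference frequency.
audible = bin_mag(f2 - f1)
```

Play both tones through one speaker and listeners hear the speaker’s 2 kHz distortion product; split the tones across separate speakers, as Kiryu and Ashihara did, and there is nothing left to hear.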

Audio engineers have been investigating ultrasonics for decades, yet no legitimate tests have ever found that people can hear or otherwise perceive frequencies higher than around 20 kHz. Another researcher, Milind Kunchur, thought he found a different way to prove that high sample rates are needed: temporal resolution. He claimed that our ears can detect arrival time differences as small as 5-10 microseconds, which is true, but he wrongly concluded that reproducing such small timing offsets requires a sample rate higher than 44.1 kHz. What Dr. Kunchur didn’t consider is that bit depth also affects timing resolution, and 44.1 kHz at 16 bits is in fact perfectly adequate to resolve timing as finely as anyone can hear. This is elegantly proven in a video by Monty Montgomery of Xiph.org. The link below goes directly to that part of the video, though I encourage people to watch the entire video because it debunks several other common myths about digital audio:
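Kunchur’s timing figure can also be checked directly. In this sketch, the same 1 kHz tone is sampled at 44.1 kHz with and without a 5 microsecond shift, then quantized to 16 bits as on a CD (the tone frequency is an arbitrary choice):

```python
import math

fs = 44100
delay = 5e-6     # a 5 microsecond offset, at the limit Kunchur cites
f = 1000.0

def cd_sample(t):
    # sample the tone and round to a 16-bit code, as CD audio stores it
    return round(32767 * math.sin(2 * math.pi * f * t))

plain   = [cd_sample(i / fs) for i in range(100)]
shifted = [cd_sample(i / fs - delay) for i in range(100)]

# Nearly every 16-bit code changes, so the 5 microsecond shift is captured
# even though it's far smaller than the 22.7 microsecond sample period.
changed = sum(1 for a, b in zip(plain, shifted) if a != b)
```

The shift survives because sampling doesn’t round events to the nearest sample tick; it records the waveform’s amplitude at each tick, and the shift changes every amplitude.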


Another incorrect belief is that the 96 dB dynamic range of 16 bits is inadequate. I’ve been a professional audio engineer and musician for nearly 50 years. I’ve heard hiss from analog tape and cassettes plenty of times. I’ve heard surface noise and crackles on vinyl records. But I’ve never once noticed background noise from a CD. If you analyze recordings with audio editor software, you’ll see that the ambient room noise is usually the dominant factor. Sometimes circuit noise from the microphones or preamps is louder, but the source noise is always greater than the -96 dB noise floor of CDs. With most recordings, if you play a silent passage the VU meter reads around -70 to -80 at best, which is equivalent to only 11-13 bits. So there’s simply no audible benefit to 24 bits. The only thing bit depth affects is the noise floor. It doesn’t affect resolution, clarity, imaging, or anything else—only residual noise.
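The arithmetic behind those numbers is simple: each bit adds about 6 dB of dynamic range, so you can convert between a noise floor in dB and the number of bits it actually occupies. A quick sketch:

```python
import math

def dynamic_range_db(bits):
    # span in dB between digital full scale and the quantization noise floor
    return 20 * math.log10(2 ** bits)

def bits_occupied(noise_floor_db):
    # how many bits sit above a given analog noise floor
    return abs(noise_floor_db) / (20 * math.log10(2))

# 16 bits gives ~96.3 dB and 24 bits ~144.5 dB of theoretical range, while
# a typical -70 to -80 dB recording noise floor spans only ~11.6-13.3 bits.
```

Those extra bits below the recording’s own noise floor capture nothing but more finely rendered hiss.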

As you can imagine, some companies have a financial interest in proving that CDs are inadequate. But when tested properly, nobody has ever been shown to reliably identify a difference between CD quality audio and higher sample rates or bit depths. Again, this has been researched for many years, always with the same conclusion. As we skeptics say, “Extraordinary claims require extraordinary proof,” and so far there has been no such proof.

So what does affect audio fidelity? How can we know if a device offered for sale is really worthwhile or just snake oil and false promises? In the comments for another Copper Magazine article, I saw someone ask for a list of “what matters” with audio. I can tell you that only four parameters are needed to define everything that affects audio fidelity: Noise, frequency response, distortion, and time-based errors. But there are also subsets of these parameters.

Noise is the background hiss from analog tape or electronic circuits. A close cousin is dynamic range, which defines the span (expressed in decibels) between the background noise and the loudest level possible before the onset of gross distortion. CDs and DVDs have a very large dynamic range, so any noise you hear is either from the original analog tape, was added as a byproduct during production, or was present in the room and picked up by the microphones when the recording was made. Subsets of noise are AC hum and buzz, electronic crackling, vinyl record clicks and pops, cross-talk between channels, and even windows that rattle and buzz at high volume levels.

Frequency response is how uniformly an audio device passes a range of frequencies. Errors are heard as too much or too little bass, midrange, or treble. For most people, the audible range extends from about 20 Hz at the low end, to just shy of 20 kHz. Subsets of frequency response are physical microphonics, electronic ringing and oscillation, and acoustic ringing. Consumers have less need to understand these subsets, but they’re important to design engineers and acousticians.

Distortion is the common word for the more technical term nonlinearity, and it adds new frequency components that weren’t present in the original source. When music passes through a device that adds distortion, new frequencies are created that may or may not be pleasing to the ear. The design goal for high fidelity audio equipment is that all distortion be so low in level it won’t be heard. However, in some contexts a modest amount of distortion can sound pleasing, which is why phonograph records and tube-based electronics are still popular. Of course, distortion is tolerable and even desirable in guitar amplifiers, but that’s music creation, not high fidelity reproduction.

There are two basic types of distortion—harmonic and intermodulation—and both are usually present together. Harmonic distortion adds new frequencies that are musically related to the source. In layman’s terms, harmonic distortion adds a slightly thick or buzzy quality to music. All musical instruments create tones having harmonics, so harmonic distortion in a power amp merely alters the instrument’s character to some degree.
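Here’s a sketch of harmonic distortion in action: a pure 1 kHz tone passes through a gentle third-order nonlinearity (the 0.1 coefficient is an arbitrary illustration), and a new tone appears at exactly 3 kHz, the third harmonic:

```python
import math

fs, f0, n = 48000, 1000.0, 4800   # 0.1 second, so DFT bins fall every 10 Hz

def soft_clip(x):
    # hypothetical gentle overdrive: a small cubic term
    return x - 0.1 * x ** 3

samples = [soft_clip(math.sin(2 * math.pi * f0 * i / fs)) for i in range(n)]

def bin_mag(freq):
    # magnitude of one DFT bin, evaluated directly
    re = sum(s * math.cos(2*math.pi*freq*i/fs) for i, s in enumerate(samples))
    im = sum(s * math.sin(2*math.pi*freq*i/fs) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

# sin^3 expands to (3*sin(x) - sin(3x))/4, so the cubic term leaves 0.925
# at the 1 kHz fundamental and puts 0.025 at the 3 kHz third harmonic.
third = bin_mag(3 * f0)
```

Because this particular nonlinearity is symmetrical it produces only odd harmonics; an asymmetrical one would add even harmonics too. Either way, every new frequency is a whole-number multiple of the source, which is why harmonic distortion sounds “musical.”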

Intermodulation (IM) distortion requires two or more frequencies to be present, and it’s far more damaging because it creates new content that’s not related musically to the original. Even a small amount of IM distortion adds a dissonant quality that can be unpleasant to hear. Another type of distortion is called aliasing, and it’s unique to digital audio. Like IM distortion, aliasing creates new frequencies not harmonically related to the original, and so is unpleasant and irritating to hear. Fortunately, in most modern audio devices, all distortions are too soft to hear.
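Aliasing is easy to see numerically. Sampled at 44.1 kHz, a 30 kHz tone produces exactly the same sample values as a polarity-flipped 14.1 kHz tone, so once sampled the two are indistinguishable:

```python
import math

fs = 44100
# sample a 30 kHz tone, which lies above the 22.05 kHz Nyquist limit
hi = [math.sin(2 * math.pi * 30000 * i / fs) for i in range(200)]
# the alias it folds down to: 44100 - 30000 = 14100 Hz, inverted
folded = [-math.sin(2 * math.pi * 14100 * i / fs) for i in range(200)]

# the two sample sequences are identical to within floating-point error
max_err = max(abs(a - b) for a, b in zip(hi, folded))
```

This is why converters low-pass filter the signal before sampling: any content above half the sample rate would fold back down as an inharmonic alias.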

Time-based errors affect mainly pitch and tempo. When the hole in an LP isn’t quite centered, you’ll hear the pitch rise and fall with each revolution. That’s called wow. Analog tape recorders have a different type of pitch instability called flutter. Unlike the slow pitch change of wow, flutter is more rapid, giving a warbling effect. Digital audio has a unique type of timing deviation called jitter, but with all modern sound cards jitter is so much softer than the music that you’ll never hear it.

Room acoustics could be considered a fifth audio parameter, but it really isn’t. Nearby room boundaries can create frequency response errors due to wave reflections combining in the air. Reflections can also create audible echoes and reverb, but those are time-based phenomena that occur outside the equipment, so they don’t warrant their own category either. Likewise, with power amplifiers, maximum output power is important. But that’s not related to fidelity—it merely defines how loudly the amplifier can play.

The above parameters encompass everything that affects audio fidelity. If a device has noise and distortion too soft to hear, a response sufficient to uniformly accommodate the entire range of audible frequencies, and time-based errors too small to be heard, then that device will be transparent to music and other sound passing through it. However, clarity and stereo imaging are greatly affected by room acoustics. Without question, the room you listen in has far more effect on sound quality than any of the audio components.

One final myth I’ll address is the notion that there are aspects of audio that “science” doesn’t know about, or might miss when measuring. This too is easy to disprove using the null test. A null test compares any two audio sources by combining them at equal volume with the polarity of one source reversed. As one wave goes positive the other goes negative, so identical content cancels completely. After nulling the two sources, any residual signal that remains reveals their difference, and that includes artifacts you might not even think to look for. Nulling has been used to measure audio devices since the 1940s. If there really were some aspect of audio that was unknown, it would have shown up long ago in a null residual.
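A null test is trivial to express in code. In this sketch, one “device output” is a clean 1 kHz tone and the other is the same tone with a tiny amount of 60 Hz hum added (the -60 dB hum level is an arbitrary illustration). Subtracting one from the other exposes exactly the hum, and nothing else:

```python
import math

fs, n = 44100, 441   # 10 milliseconds of audio
clean = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
# the "device under test" adds hum at 1/1000 of full scale (-60 dB)
flawed = [s + 0.001 * math.sin(2 * math.pi * 60 * i / fs)
          for i, s in enumerate(clean)]

# invert one source and sum: the shared music cancels, only the
# difference between the two sources remains
residual = [a - b for a, b in zip(flawed, clean)]
peak = max(abs(r) for r in residual)
```

The residual peaks at 0.001, the hum itself, even though nobody told the test to look for hum. That’s the power of nulling: any difference of any kind, expected or not, survives the cancellation.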

[1] http://www.aes.org/e-lib/browse.cfm?elib=10005