Double blind ABX testing

September 14, 2021
 by Paul McGowan

8 comments on “Double blind ABX testing”

  1. As I’ve followed numerous audio product reviewers online, particularly those who review speakers and/or headphones, the thing I’ve realized the most is how bad our internal “reference sound” is and how easily we get used to any particular sound signature. It’s no wonder that you see a huge variety of responses to these comparisons. In other words, if we just sit down and listen to a system, it’s difficult for most of us to say if the system sounds good or bad because our brains can’t recall what “really good” sounds like, presuming we feel like we’ve heard that before! We might be able to pick out things that sound “off”, but it’s much harder to compare, even if there’s only a short pause between, like with blind AB tests. We quickly lose our reference and the more we listen to system X, the more we get used to it, regardless of its pros or cons.

    The only time I’ve found audio comparisons truly useful is when the transitions between systems are immediate (which I know can be difficult) and alternate with some known reference. In other words, let’s say we have some reference “R” and two other systems (“X” and “Y”) we’re comparing for their relative goodness (and I say “relative” because “goodness” is so subjective in audio). If you follow a pattern that has no breaks between playback, giving each system just long enough for you to get used to it, and alternates like RXRYRXRYR… or RXYRYXRXYRYXR… then I’ve found it much easier to pick out qualities you like or dislike between the systems because you’re constantly having the “reference” refreshing your memory. It almost doesn’t even matter what the reference is or even if it sounds great because you’re just looking for relative differences. If you have silent pauses between them of almost any duration, the brain forgets the qualities of what you just heard so quickly that the differences are much harder to discern.

    That all being said, and while I can’t prove it, I have to believe there are individuals who can train, or have trained, their ears to discern good vs bad qualities without needing an external reference. This is true for any industry that relies on one’s senses or abilities. Most of us don’t have this luxury, though, hence the less useful execution of ABX tests. My 2 cents.

    1. Indeed! And how could mp3 have been developed without ABX testing? This video finally makes clear Paul McGowan’s personal bias: he fears being tested when taking part in an ABX procedure, misunderstanding it as a test of his hearing ability or his personal preferences. And how true it is that we easily get used to a specific sound signature (the reviewer’s reference loudspeakers or my own loudspeakers). Before I had the financial resources to acquire high-end loudspeakers, I mainly listened to a good pair of headphones. Thus my first high-end loudspeaker was a spherical-horn speaker system that revealed the finest details of tracks I was familiar with. I couldn’t stand the sound of normal box loudspeakers. And it isn’t only the typical reviewer problem. Even more so, a loudspeaker designer who has voiced his designs with incredible effort will always love his babies and is thus totally biased!

  2. If I understand correctly, ABX testing is used to find out if a listener can perceive a difference between A and B. The listener knows that A is consistently one sample and B is consistently a second sample, while X is chosen randomly as A or B. ABX testing starts with the hypothesis that A and B will be perceived identically. This seems like a reliable way to achieve the objective, and an invaluable technique for scientific research of sound quality and human perception.

    Now suppose I’m auditioning loudspeakers; nobody expects that two different models will sound identical, so ABX isn’t the right approach. We want to know which is preferred. We know how easily we are misled by the reputation, cost and appearance of different products, even as experienced listeners, so it makes perfect sense to compare using a blind AB test. I’m glad that Paul recognizes the benefits of blind testing.

    1. Mark,
      Actually, this is precisely what Floyd Toole developed first in Canada and then working for Harman. He developed a “machine” that could change speakers very quickly to get them in the same position so that listeners could quickly choose or develop a preference.

      It was this work, or rather its findings, that has been instrumental in the development of almost all modern speaker designs. You should search for the “Harman curve” for trained and untrained listeners.

      Toole’s work was specifically about testing speakers “blindly”.

      By the way, Paul clearly doesn’t understand the concept of placebo. It is not the placebo that works; in some cases it is the mind of the person that does. In others you have regression to baseline. If a patient is enrolled during an exacerbation, they will get out of the exacerbation regardless. It is extremely common in studies to have regression to baseline or to the mean. Scientists know not to call this “placebo works”. The term preferred is active or control. In many studies you have two or more active arms and no placebo. Dose-response studies are a good example of this.

  3. Paul, for the first time, I’ve agreed with everything you’ve said. We may very well come to different conclusions about what we hear, but our approach is identical, and entirely because of the reasons you’ve cited.

  4. Thanks for the honest answer Paul.

    The most important use of blind AB testing is answering “is there really a difference?” Not so much about which is better.

    If you do detect a difference, then dig into longer listening sessions to see if you prefer one over the other.

    If you don’t hear a difference, then use your “emotional judgment.” (I tend to choose the simplest, least expensive, or easiest to use)

    I find blind A/B testing especially good for avoiding the “most expensive and/or exotic is the best” trap.

    Some people want to trust only their emotional judgments and some want to rely only on empirical data. In my experience, finding the place for both is rewarding.

    However, it doesn’t matter which you are inclined towards… as long as you enjoy the music.

  5. You make a lot of sense to me Paul. I particularly liked your explanation of why you don’t like the ‘X’ part of ABX testing. I can identify with having your stress level elevated when you don’t know whether it’s ‘A’ or ‘B’ to which you’re listening! I think introducing an increase in your stress level into the mix can skew the outcome.

    I thought of a similar phenomenon I’ve experienced with my golf game. When I was an angry (angrier? 🙂 ) young man, when my game deteriorated I used to keep trying harder which seemed to exacerbate the situation. I eventually realised that trying harder and harder was increasing my stress level, leading to worsening performance. I discovered that trying less hard seemed to improve matters, allowing me to rely more on my unconscious (natural or learned) ability. These days I try to remain relaxed and avoid moving into a stressful state of mind. I still have bad golf days but they’re generally not shockers! 😉
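
The ABX procedure described in comment 2 (X is randomly A or B, and the starting hypothesis is that the listener cannot tell them apart) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not a tool used by anyone in this thread: the function names, the simulated "listeners", and the choice of 16 trials are all hypothetical. It scores a run of trials and computes how likely that score would be from pure guessing.

```python
import random
from math import comb


def run_abx_trials(n_trials, listener, rng):
    """Run n_trials ABX trials and return the number of correct answers.

    `listener` is a hypothetical stand-in for a person: a function that
    receives the hidden identity of X and returns a guess, "A" or "B".
    """
    correct = 0
    for _ in range(n_trials):
        x = rng.choice(["A", "B"])  # X is secretly A or B, chosen at random
        if listener(x, rng) == x:
            correct += 1
    return correct


def p_value_at_least(correct, n_trials):
    """Chance of scoring `correct` or more out of n_trials by guessing alone.

    One-sided binomial tail with a 50% chance per trial, which is the
    hypothesis that A and B are perceived identically.
    """
    tail = sum(comb(n_trials, k) for k in range(correct, n_trials + 1))
    return tail / 2 ** n_trials


def guessing_listener(x, rng):
    """Simulated listener who hears no difference and guesses at random."""
    return rng.choice(["A", "B"])


def perfect_listener(x, rng):
    """Simulated listener who always identifies X correctly."""
    return x


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    n = 16  # assumed number of trials, for illustration only
    for name, listener in [("guessing", guessing_listener),
                           ("perfect", perfect_listener)]:
        score = run_abx_trials(n, listener, rng)
        print(f"{name}: {score}/{n} correct, "
              f"p = {p_value_at_least(score, n):.4f}")
```

Under these assumptions, 12 correct out of 16 would happen by guessing alone about 3.8% of the time (2517 of 65536 equally likely outcomes), which is why a handful of trials is rarely enough to claim an audible difference.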

Stop by for a tour:
Mon-Fri, 8:30am-5pm MST

4865 Sterling Dr.
Boulder, CO 80301

Join the hi-fi family