AES Show Spring 2021, Part Two: Psychology and the Human Perception of Audio

July 2, 2021 Issue 140 INDUSTRY NEWS

Written by John Seetoo

As a result of COVID-19, AES Show Spring 2021, named “Global Resonance,” was conducted online from Europe. This afforded me the rare opportunity to view a number of the presentations, which would have been otherwise impossible.

This show leaned more heavily on the academic side of audio technology than the New York-based AES Show Fall 2020. In Part One of Copper’s AES Show Spring 2021 coverage (Issue 139), I looked at presentations on binaural audio, audio mixing for residential television viewing environments, and an analysis of differences between Western and Chinese hip-hop music.

In Copper Issue 84, Jack Joseph Puig stated in an interview:

“Since the message is so similar in many cases, it’s the tonality of what you’re presenting in the song that is crucial in delivering the right intent in the communication.

When you think about it, tone is the universal connector – even more so than music. Tone can make people feel different ways. Tone is the carrier. If I go to a foreign country where people don’t speak my main language, which is English, they may not know the ‘F-word’ but if they hear me using the ‘F-word’ with harsh tones, they’ll know something’s not right.

I have always been a tone junkie; that’s where it’s at. Tone is the delivery system that the heart and soul of a lyric and melody ride on in a song.”

One of the more popular topics of AES Spring 2021 Europe was the effects of sound on perception and psychology. A trio of scholars from Yonsei University in Korea, Eunmi Oh, Jaeeun Lee (presenter), and Dayoung Lee, presented a video titled Mapping voice gender and emotion to acoustic properties of natural speech. More specifically, the presentation was a study on how tone of voice conveys a variety of emotions.

Interpreting emotion through the tone, timbre and nuance of speech is a key to communication. Psychology professor Albert Mehrabian, renowned for his work in nonverbal communication, attributed 38 percent of human communication to tone of voice, 55 percent to body language, and the remaining 7 percent to words.

The researchers decided to see if it was possible to map the basic acoustic parameters of natural speech samples to perceptions of gender image and affective attributes. Ms. Lee noted that a 2003 experiment that used speech synthesis to attempt a similar outcome ultimately proved inconclusive.

Screenshot from Mapping voice gender and emotion to acoustic properties of natural speech, courtesy of AES.

Ms. Lee and her colleagues decided that natural dialogue might prove a better model to use. Extracting 290 samples of actual recorded speech from an AI database of 1,000 hours of samples from 2,000 Korean speakers, the research team took 2 seconds from different parts of sentences, and matched them at -23 LUFS (Loudness Units relative to Full Scale) using a 16 kHz sampling rate. They included different emotional nuance excerpts in the samples, different depths of breath, and other parameters. In addition to gender, there were seven pairs of affective attributes the researchers sought to test for:

Stressed – Relaxed
Angry – Content
Hostile – Friendly
Happy – Sad
Interested – Bored
Formal – Intimate
Confident – Timid
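
For readers curious about the arithmetic behind the loudness matching and clip length described above, here is a minimal Python sketch. It is not the researchers' code. It assumes each clip's loudness has already been measured with a standards-compliant loudness meter, and the -18 LUFS reading is a made-up example. Because loudness units are decibel-equivalent, a static gain change shifts the measured loudness by approximately the same number of units.

```python
TARGET_LUFS = -23.0       # matching level reported in the presentation
SAMPLE_RATE_HZ = 16_000   # sampling rate reported in the presentation
CLIP_SECONDS = 2          # excerpt length reported in the presentation


def samples_per_clip(sample_rate_hz, seconds):
    """Number of audio samples in one excerpt."""
    return int(sample_rate_hz * seconds)


def gain_to_target(measured_lufs, target_lufs=TARGET_LUFS):
    """Return (gain in dB, linear amplitude factor) needed to reach the target."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


clip_len = samples_per_clip(SAMPLE_RATE_HZ, CLIP_SECONDS)
gain_db, factor = gain_to_target(-18.0)  # hypothetical clip measured at -18 LUFS
print(clip_len, gain_db, round(factor, 3))  # prints: 32000 -5.0 0.562
```

In this hypothetical case, a clip that measures 5 LU too loud is turned down by 5 dB, which means multiplying every sample by roughly 0.56.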

Because the test was meant to determine whether the attributes could be discerned from tone of voice alone, the experiment was conducted with 50 non-Korean speakers, who each listened to 20 voice samples per session on headphones. Each participant listened first to identify gender, and then indicated which of the seven attribute pairs best matched the sample, which end of that pair's scale the sample leaned toward, and to what degree.

Screenshot from Mapping voice gender and emotion to acoustic properties of natural speech, courtesy of AES.

Here are some of the results the tests revealed:

Wider pitch variations were described as more feminine whereas smaller pitch variations were identified by the majority as masculine.
Voices categorized as “intimate” were the ones most defined by pitch variation.
Shimmer, HNR (harmonics-to-noise ratio) and voice breaks (non-pitch information) were the predominant parameters in “timid” voices.
“Bored” voices contained a combination of both pitch and non-pitch elements.
In general, attributes considered more “positive” on the valence scale (e.g., happy, confident, content) correlated with higher maximum pitch and greater pitch variance than “negative” attributes (e.g., bored, sad, timid).
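
Two of the parameters named in these results, pitch variation and shimmer, can be illustrated with a short Python sketch. This is not the researchers' analysis. It assumes a pitch tracker has already produced a fundamental-frequency contour (in Hz) and per-cycle peak amplitudes, and all of the numbers below are invented for illustration. Shimmer is computed here in its common "local" form: the average absolute difference between the amplitudes of consecutive cycles, divided by the average amplitude. HNR is omitted because it requires the waveform itself.

```python
import math
import statistics


def pitch_variation_semitones(f0_hz):
    """Standard deviation of a pitch contour, in semitones around its mean."""
    mean_f0 = statistics.fmean(f0_hz)
    semitones = [12 * math.log2(f / mean_f0) for f in f0_hz]
    return statistics.pstdev(semitones)


def local_shimmer(peak_amplitudes):
    """Mean absolute cycle-to-cycle amplitude change, relative to mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(peak_amplitudes, peak_amplitudes[1:])]
    return statistics.fmean(diffs) / statistics.fmean(peak_amplitudes)


# Hypothetical pitch contours (Hz): one nearly monotone, one expressive.
flat_voice = [118, 120, 121, 119, 120]
lively_voice = [180, 220, 260, 200, 240]

# Hypothetical per-cycle peak amplitudes: one steady, one unstable.
steady = [1.00, 0.99, 1.01, 1.00]
shaky = [1.00, 0.80, 1.10, 0.85]

print(pitch_variation_semitones(flat_voice), pitch_variation_semitones(lively_voice))
print(local_shimmer(steady), local_shimmer(shaky))
```

Measuring pitch variation in semitones rather than hertz keeps low and high voices comparable, since the same musical interval spans more hertz at higher pitches.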

A spectrogram sample showed visual representations of the parameters that the subjects identified aurally:

Screenshot from Mapping voice gender and emotion to acoustic properties of natural speech, courtesy of AES.

On the other hand, emotional arousal was marked by non-pitch factors, with larger voice breaks and smaller HNR identified with active-sounding voices, and the opposite holding true for more passive-sounding voices.

With Mapping voice gender and emotion to acoustic properties of natural speech, Ms. Lee and her colleagues did an admirable job of scientifically quantifying and documenting the elusive quality of “tone” that Jack Joseph Puig so poetically described as the driving force in the music he produces, music that connects with listeners around the globe.

Screenshot from Mapping voice gender and emotion to acoustic properties of natural speech, courtesy of AES.

***

Along similar lines, the distorted, wailing guitar solo sound of Eric Clapton’s Cream-era Gibsons, using humbucking pickups with the treble rolled off through a cranked Marshall amp, was described by him as the “woman tone,” a reference to the vocal-like nuance he was able to elicit from his guitar rig with that setting. Given how the electric guitar has become such a force in the history of popular music, starting with people like Jack Miller, Eddie Durham and Charlie Christian’s recordings with Benny Goodman and continuing through blues and rock and roll, the next presentation helped to explain why we find the sound of the electric guitar so compelling.

Emotional and neurological responses to timbre in electric guitar and voice was presented by Sephra Scheuber. It explored the ways in which timbre affects the perception of a sound’s emotional content and people’s neurological reactions to sounds. Ms. Scheuber is a psychologist as well as an audio engineer, and her experiences with manipulating timbre when recording the electric guitar and how it can alter emotional effects, as well as her work in music therapy with melody, harmony, and non-linguistic sounds, formed the foundation of her research.

Ms. Scheuber explained that isolating timbre from other audio elements is the most difficult part of studying timbre and its effects, and that the resulting dearth of research into the subject was what prompted her project. The fact that most previous research into timbre is devoted almost exclusively to Western orchestral instruments and little, if any, to modern electric instruments was also a motivation for her. Another reason for her choice of electric guitar is that while most orchestral instruments have an intrinsic sound that is recognizable even with extreme amounts of equalization, the electric guitar has a well-known sound whose timbre most people are accustomed to hearing altered through a variety of electronic effects, whether they are conscious of it or not.

Finally, Ms. Scheuber incorporated vocal sounds into her study as a frame of reference, primarily due to the familiarity of emotional ranges expressed through the human voice. Due to the aforementioned issues regarding isolation of timbre, especially in pre-recorded material, Ms. Scheuber recorded her own sounds, using electric guitar and different effects pedals in addition to vocal performances by actors evoking various emotions.

Screenshot from Emotional and neurological responses to timbre in electric guitar and voice, courtesy of AES.

The participants were asked to listen to the recordings and indicate their impressions of the following:

Emotional categorization (happy, sad, angry, etc.)
Emotional intensity (weak to strong on a 5-point scale)

Among the results of the study:

80 percent of the participants agreed on the emotional categorizations for the vocal sounds.
Guitar sounds exhibited trends, albeit with subtle emotional identifications: some listeners might have interpreted a particular sound as “sad,” whereas others might have categorized it as “angry.”

Screenshot from Emotional and neurological responses to timbre in electric guitar and voice, courtesy of AES.

The most significant timbre feature common to the guitar and vocal recordings, among those on which participants concurred about the emotional content conveyed, was the attack slope (i.e., the speed of the sound's articulation). Angry sounds had the lowest attack slope, while happy sounds had the highest.

Ms. Scheuber also recorded EEG data from the participants to measure neurological activity in the brain while they listened to the vocal and guitar sounds, and to judge whether the responses were similar or dissimilar. To quantify the data, she extracted alpha (8 – 12 Hz) and theta (4 – 8 Hz) brain wave rhythms for analysis at Cz and Fz (the midline central and midline frontal electrode sites).
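
To give a sense of what extracting the alpha and theta rhythms involves, here is a minimal Python sketch that measures how much of a signal's power falls in each band. It is not Ms. Scheuber's analysis pipeline. The signal is synthetic, the sample rate and amplitudes are assumptions chosen for illustration, and a plain discrete Fourier transform stands in for the windowed spectral estimates and artifact rejection that real EEG work typically relies on.

```python
import cmath
import math


def band_power(samples, sample_rate, low_hz, high_hz):
    """Mean power of `samples` carried by frequencies in [low_hz, high_hz)."""
    n = len(samples)
    power = 0.0
    for k in range(1, n // 2):  # positive-frequency bins, skipping DC and Nyquist
        freq = k * sample_rate / n
        if low_hz <= freq < high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(samples))
            power += 2 * abs(coeff) ** 2 / n ** 2
    return power


# Synthetic stand-in for one EEG channel: a strong 10 Hz (alpha) component
# plus a weaker 6 Hz (theta) component, sampled at an assumed 128 Hz for 2 s.
sample_rate = 128
times = [i / sample_rate for i in range(sample_rate * 2)]
signal = [2.0 * math.sin(2 * math.pi * 10 * t) + 1.0 * math.sin(2 * math.pi * 6 * t)
          for t in times]

theta = band_power(signal, sample_rate, 4, 8)
alpha = band_power(signal, sample_rate, 8, 12)
print(f"theta: {theta:.2f}, alpha: {alpha:.2f}")  # prints: theta: 0.50, alpha: 2.00
```

A sine wave of amplitude A carries a mean power of A²/2, so the 10 Hz component (amplitude 2) shows up as 2.0 in the alpha band and the 6 Hz component (amplitude 1) as 0.5 in the theta band. The band edges are treated as half-open so that 8 Hz is counted once, in alpha.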

Screenshot from Emotional and neurological responses to timbre in electric guitar and voice, courtesy of AES.

The Cz (central) site displayed emotional reaction, with higher levels when listening to angry sounds and the lowest for happy sounds. From a therapeutic perspective, this study may indicate how music therapy featuring specific guitar sounds can be used for treatment of schizophrenia, depression, and other psychological disorders.

Ms. Scheuber’s study offers a scientifically validated depiction of how we react to guitar sounds in music, explaining why the fast attack of rhythm guitar masters like Nile Rodgers or Cory Wong is usually perceived as happy and joyful, while the slow blues of an Albert King or Funkadelic’s iconically mournful “Maggot Brain,” in which George Clinton instructed Eddie Hazel to play “as if his mother had just died,” is gripping and creates a much different mood.

***

Legendary producer and engineer Tom Dowd once said he had a “producer button” on his Neve console. Whenever a label executive would try to “armchair quarterback” Dowd during a mix session, Dowd would tell him to push the “producer button” to improve the sound. This would inevitably mollify the unsuspecting label rep, who was unaware that the button was not connected to anything in the console and came away convinced that it had enhanced the sound.

For a different perspective and approach to human perception in audio and how the brain might play tricks on us, Michael Lawrence spoke with audio engineer and author Ethan Winer in the presentation Audio mythology, human bias, and how not to get fooled.

Screenshot from Audio mythology, human bias, and how not to get fooled, courtesy of AES.

Winer’s book, The Audio Expert: Everything You Need to Know About Audio, touches on a few of the themes he discussed in his interview about audio mythology and how the brain sometimes fools our perceptions of audio with confirmation bias. A YouTube video from 2009 in which Winer demonstrated and debunked numerous audio myths has logged more than 450,000 views. Winer explains that the limits of auditory memory, the fact that audio nuances are much harder to perceive objectively than visual ones, and individual egos can all play into audio mythology and confirmation bias. While audiophiles all have subjective opinions on preferred equipment and can argue over the merits or shortcomings of particular features ad infinitum, Winer cites several credible examples of fallacious premises, such as:

Audiophiles and engineers will often form opinions based upon what someone else whose opinion they might admire has written, taught or said, rather than empirically testing and personally listening. While audio magazines and textbooks may usually be reliable information sources, experts can sometimes be wrong. “Argument from authority” can often create entrenched audio myths.
Basing opinions on listening to equipment under different circumstances in the past, and perhaps misremembering the actual sound due to emotional reactions that may have created stronger impressions at that time, can cloud the listener’s ability to listen impartially to the same equipment in the present day.
As little as a 4-inch difference in listening position in a small space can radically change listening responses. Acoustic changes can contribute to equipment sounding different to a greater extent than electronic (equipment-related) changes, such as changing cables or connectors. Acoustics are the elephant in the room but often the last consideration for some listeners, who don’t want to take the time and effort to take corrective action.

However, Winer's underlying premise that “if measurements match between disparate equipment units, then, ergo, they must sound the same,” is one that some people feel is much less defensible. He admits that “if someone rolled off 15 kHz, he probably couldn’t hear it at his age.” He also cites topics like digital graininess in reverb tails as “something people can only hear with the volume turned all the way up.” Perhaps it’s a case of diminished confidence in his own hearing capabilities, but the over-reliance on tests to try to objectify something that is tied to the human sense of hearing, which is intrinsically subjective, is a topic I explored in depth in Copper Issue 138 in the article, “How Much Do We Actually Hear When We Listen?”

Daniel von Recklinghausen, the renowned audio engineer at Electro Audio Dynamics and KLH, is famous for this old chestnut: “If it measures good and sounds bad, it is bad. If it sounds good and measures bad, you've measured the wrong thing.”

To his credit, Winer also noted that if someone points out where he might be wrong, he is happy to be informed about it. An open mind and acknowledgement of one’s own shortcomings are a good foundation for building wisdom.

For the final installment of our AES Show Spring 2021 coverage, we will be covering some less-esoteric subjects, such as audio streaming, video game sound, and other topics.
