Mixing – Headphones, loudspeakers...and subjectivity

November 11, 2013

Think of audio mixing and the image that comes to mind is one of an engineer with twenty fingers (or more) wound around a large ensemble of knobs, wearing a pair of large headphones that shut out any earthly influence on the mixing process. Now consider that last artifact – the headphones – and ask yourself whether there is a mismatch here: in the real world, the music being mixed will more often be played back on loudspeakers than on headphones, and loudspeakers offer a substantially different acoustic experience. So should one mix on headphones or on loudspeakers, given that loudspeakers are often the more expensive and elaborate proposition, whereas headphones make it easier to shut out extraneous noise? Or is there any difference worth considering at all?

This question has been explored a few times. Richard L. King, Brett Leonard, and Grzegorz Sikora recently did so in a paper titled “Loudspeakers and headphones: The effects of playback systems on listening test subjects” in the Proceedings of Meetings on Acoustics, June 2013, published by the Acoustical Society of America. What is interesting about this paper is that the authors apply no-nonsense methods of statistical hypothesis testing to determine whether there are genuine statistical differences between the results of mixing different types of music on headphones versus loudspeakers.

Hypothesis testing 101. To understand the authors' results, it's useful to have a basic understanding of statistical hypothesis testing. Suppose I had a visitor called Raju who claimed he was a native of the Indian state of Kashmir. Now I live in the South Indian state of Karnataka, and the people of this state look quite different from those from Kashmir. I see that Raju is distinctively fair, as I would expect a Kashmiri to be, but not quite as tall as I would expect. So I resort to a test. I dig out two pieces of demographic data: the average height µ of people in Kashmir, and its standard deviation σ. Knowing that heights tend to follow a Gaussian distribution, I reason that there is less than a 1% chance (about 0.13%, in fact) that a person from Kashmir would have a height of less than µ-3σ. In Raju's case I find that he is indeed shorter than that, so I have reason to doubt his assertion. I think it is very unlikely that he is from Kashmir (though he may be an exception). Here we assumed a Gaussian distribution; in general, where no such assumption can be made, so-called non-parametric methods can be used.
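The height test can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not anything taken from the paper: the mean, standard deviation, and Raju's height are hypothetical placeholder numbers, and the 1% threshold is simply the one used in the example.

```python
from statistics import NormalDist

def lower_tail_probability(height_cm, mean_cm, sd_cm):
    """Probability that a Gaussian height is at or below height_cm."""
    return NormalDist(mu=mean_cm, sigma=sd_cm).cdf(height_cm)

# Hypothetical demographic figures, chosen only for illustration.
MEAN_CM = 170.0   # assumed average height (mu)
SD_CM = 7.0       # assumed standard deviation (sigma)
RAJU_CM = 148.0   # assumed height of the visitor, just below mu - 3*sigma

# Chance of falling below mu - 3*sigma for any Gaussian (standard normal at z = -3).
p_at_3_sigma = NormalDist().cdf(-3.0)

# Chance of someone from the assumed population being as short as Raju or shorter.
p_raju = lower_tail_probability(RAJU_CM, MEAN_CM, SD_CM)

# Doubt the claim if that chance is below the 1% threshold.
doubt_claim = p_raju < 0.01

print(f"P(height < mu - 3*sigma) = {p_at_3_sigma:.5f}")
print(f"P(height <= Raju's height) = {p_raju:.5f}")
print("Doubt the claim" if doubt_claim else "No reason to doubt the claim")
```

A small probability does not prove Raju is lying; it only says that a height like his would be rare if his claim were true.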

Test setup. The test setup used by the authors was curiously simple. Subjects were asked to complete a basic – almost childish – mixing task: each subject was instructed to set the level of a stereo stem (group or submix) containing a lead instrument or vocal, presented along with a stereo stem containing the instrumental accompaniment to the excerpt. The stems were extracted from the full mix, not from the raw tracks. That's it – just one level to be adjusted for an already equalized and mixed recording, with the result compared to the actual level in the original recording. This was to be done on headphones and on loudspeakers. Couldn't be simpler. Do expert recording engineers set that level consistently?

For variety, different genres of music were used: one sample had a soprano voice with orchestra, a second had rock music with a solo vocalist, and a third had a jazz trumpet solo. A full-range monitoring system was arranged in the standard configuration recommended by the International Telecommunication Union in BS.775-2. A set of closed, supra-aural headphones was selected as being closest to the above loudspeaker response. Ten recording engineers with 10 years of formal musical training and over eight years of production experience were asked to mix on loudspeakers as well as on headphones.

Results. The authors found that the engineers exhibited substantial differences in the final levels between headphones and loudspeakers. The diagram below, reproduced from their paper, gives you the complete picture. The difference between headphone and loudspeaker levels for classical music, for example, is more than 3.6 dB. Also noticeable are the large standard deviations, typically 3-4 dB. More shocking, perhaps, is the large range of the levels chosen: for classical music, one engineer set the final level at -12.6 dB and another at +6 dB on headphones, with the corresponding figures for loudspeakers being -11.4 dB and +8.4 dB!
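To get a feel for how wide those ranges are, the quoted extremes can be turned into a spread in dB and an equivalent linear ratio. The short sketch below uses only the figures quoted above; the function names are my own, and the conversion assumes the levels are amplitude (gain) levels, so that a change of d dB corresponds to a ratio of 10^(d/20).

```python
def db_range(lowest_db, highest_db):
    """Spread in dB between the lowest and highest chosen level."""
    return highest_db - lowest_db

def db_to_amplitude_ratio(db):
    """Linear amplitude ratio for a level difference in dB (assumes 20*log10)."""
    return 10 ** (db / 20)

# Extremes quoted for the classical excerpt.
headphone_range = db_range(-12.6, 6.0)
loudspeaker_range = db_range(-11.4, 8.4)

for name, spread in [("headphones", headphone_range),
                     ("loudspeakers", loudspeaker_range)]:
    ratio = db_to_amplitude_ratio(spread)
    print(f"{name}: {spread:.1f} dB spread, amplitude ratio about {ratio:.1f} to 1")

# The 3.6 dB headphone-versus-loudspeaker difference, as a linear ratio.
print(f"3.6 dB is an amplitude ratio of about {db_to_amplitude_ratio(3.6):.2f} to 1")
```

Under that assumption, the spread between the most and least generous engineer is 18.6 dB on headphones and 19.8 dB on loudspeakers, which in both cases is a lead level differing by a factor of more than eight in amplitude.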

Conclusions. The authors sought statistical evidence of a substantial difference between mixing on headphones and mixing on loudspeakers. I believe they did that reasonably well, though at first sight the sample size of 10 may appear quite small. However, what is more disturbing to me is the range of results that popped out. The issue of range goes well beyond the central question of the paper – the use of headphones versus loudspeakers – and raises a much deeper one: if experienced engineers choose such wildly different levels in a controlled environment where only one level needs to be set, how subjective does the mixing process really get in practice? How often can we then say we are listening to a true reproduction of a performance, as opposed to a mix that reflects the mixing engineer's own coloured preferences? I think this issue, which is not highlighted in the paper's conclusions, needs a separate research study.
