Mixing audio in the digital realm has some inherent limitations. These limitations become more pronounced the less data there is to work with (low bit depths, low sample rates, and so on). We found an article by software developer and author Viktor T. Toth that addresses some of these issues.
In real life, when you hear audio from two sources simultaneously, what you hear is the sum of the signals. Therein lies our problem. If you hear a group of ten people singing, the result will be louder than the singing of one person. A giant choir of a thousand will be even louder. A hundred thousand people singing an anthem in a sports stadium can be outright deafening. The point: there is no upper limit; the more voices you mix, the higher the amplitude.
With digital audio, we have a limited dynamic range. Let’s say we use 8-bit sampling; that means that every data point in the audio stream is a value between 0 and 255. When we add two such values, the result may be anywhere between 0 and 510, which simply doesn’t fit within the allowable range of 0-255.
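To see the problem in a few lines of code, here is a minimal sketch (our own, not from the article) of two naive ways to squeeze the sum of two 8-bit samples back into range, clipping and averaging:

```python
# Naive 8-bit mixing: straight addition can land anywhere in 0-510,
# which overflows the 0-255 range. Two quick-and-dirty fixes are
# clipping (hard limit at 255) and averaging (divide the sum by 2).

def mix_clip(a: int, b: int) -> int:
    """Add two 8-bit samples and clip the result to 255."""
    return min(a + b, 255)

def mix_average(a: int, b: int) -> int:
    """Average two 8-bit samples; never overflows, but halves each signal."""
    return (a + b) // 2

print(mix_clip(200, 180))     # 255 -- the true sum, 380, gets flattened
print(mix_average(200, 180))  # 190 -- in range, but both inputs are now quieter
```

Clipping distorts loud passages, and averaging throws away half the already-scarce dynamic range, which is exactly the trade-off the article digs into.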
Mr. Toth’s article goes into detail about why normalizing as a mixing method doesn’t hold water, and how using it on low-resolution signals is an especially bad idea. The author busts out some math to dive into a couple of workarounds for mixing low-resolution audio without normalizing, and he’s got some pretty good ideas here.
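For the flavor of it, here is a sketch of the product-based mixing formula widely quoted from the article (the exact constants here are our reading of the commonly circulated version; see Viktor’s write-up for the real derivation). It treats 128 as the midpoint of the unsigned 8-bit stream and blends differently below and above it:

```python
def mix_product(a: int, b: int) -> int:
    """Mix two unsigned 8-bit samples without simple addition or averaging.

    128 is treated as the zero crossing (silence) of the unsigned stream.
    Below the midpoint the samples are multiplied; above it, the same
    curve is mirrored so loud inputs saturate gently instead of clipping.
    """
    if a < 128 and b < 128:
        return a * b // 128
    # Mirror the product curve around the midpoint, then clamp the
    # one-off rounding overshoot at the very top of the range.
    return min(2 * (a + b) - a * b // 128 - 256, 255)

print(mix_product(128, 128))  # 128 -- silence mixed with silence stays silent
print(mix_product(200, 180))  # louder than either input, but still in range
```

Note how silence (128) acts as an identity-ish element and the output grows with both inputs, which is closer to how real acoustic summing feels than a hard clip.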
Great read – be sure to check out the full article at Viktor’s site. If you’d like to chime in, feel free to do so in the comments.