Sample Rate Inflation
John Siau, the head designer and principal at Benchmark, writes white papers and posts them to the company’s site. As many of you know, I regard John as one of the industry’s real experts and I defer to his expertise more times than I care to admit. His latest post is about sample rates and the Nyquist Theorem. It’s well worth the read. He manages to explain the essence of digital sampling in a clear and simple-to-understand way. You can click here to read it.
The bottom line on traditional PCM sampling is that 96 kHz or 88.2 kHz provides more than enough samples. The filters necessary to remove any frequencies higher than the sample rate divided by 2 (the Nyquist frequency) are well within the abilities of current designers.
However, if you’ve been following Robert Stuart and the MQA initiative and other experts on the topic, the discussion turns to timing. There is a lot of talk about the inability of lower sample rates to capture the “timing” ability of human hearing, which is pegged at between 5 and 10 microseconds. The claim is that unless we increase the sample rate to at least 192 kHz, 384 kHz, or higher, music recordings are not delivering everything that our brains need to reconstruct a live audio event.
The need for these higher sample rates isn’t about making sure we include ultrasonics in our tracks but to subdivide time so fine that we don’t have any inter ear timing errors.
The arrival time of audio signals at our ears is tremendously important for several reasons. But we have to distinguish between microsecond and millisecond timing to understand these reasons. The millisecond timing differences are used by our ears for directionality. Our stereophonic hearing is very good at identifying where a sound is coming from.
The microsecond timing stuff is the domain of neuroscientists and according to the Meridian guys there is recent research regarding the importance of getting this area of timing right.
Then there’s getting the “transients” right and the related “pre and post” ringing out of our PCM digital filters. The higher the sample rate, the higher the frequency at which the ringing occurs. For traditional CDs, this means that the “pre-ringing” happens just past our hearing range. Maybe this type of error is audible or affects the sound of CDs, or maybe not. But if we raise the sample rate to 96 kHz it ceases to be a factor.
This is the gist of what John wrote when I asked him about the benefits of higher sampling rates on the timing aspect of digital audio. Here’s his response:
“Timing accuracy is not a function of sample rate. I tried to illustrate this in my paper using the examples of the runner and the rotating wheel. Nanosecond timing variations are easily resolved with a 44.1 kHz sample rate. This is easily demonstrated with a jitter signal of a few nanoseconds. Jitter would not be an issue if we could not resolve nanosecond timing variations. It would also be impossible to make binaural recordings, which are very dependent upon phase accuracy.
Pre-ringing occurs near the cut-off frequency (Nyquist Frequency). If the cut-off frequency is 48 kHz (96 kHz sampling rate), the ringing is not an issue. At 44.1 kHz, the 22 kHz pre-ringing may be an issue if the listener has any ability to hear 22 kHz. It is more likely that the 22 kHz ringing causes IMD in the tweeter producing audible distortion at frequencies below 20 kHz. It should be noted that the duration of the pre and post ring decreases as the sample rate increases. So a 2X sample rate will shorten the ring by a factor of 2 while increasing the ring frequency by a factor of 2. The combination of these two factors gives 96 kHz a significant advantage over 44.1 kHz (assuming that there is an audible defect at 44.1 kHz). The bottom line is that 44.1 kHz is so close to the limits of our hearing that it will cause audible problems unless the stars align perfectly. One small defect in a 44.1 kHz system can cause audible errors. In contrast, a 2X system has a much larger margin for error.
John Siau”
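John’s point that timing resolution is not tied to the sample period can be checked numerically. What follows is only a rough sketch under stated assumptions: the 1 kHz test tone, the 5 microsecond delay, and the helper name estimate_delay are all chosen for illustration and do not come from his paper, and the samples are unquantized floating-point values rather than 16-bit words. It samples a tone and a slightly delayed copy at 44.1 kHz, then recovers the delay from the phase difference between them.

```python
import numpy as np

FS = 44_100.0   # sample rate in Hz (sample period is about 22.7 microseconds)
F0 = 1_000.0    # test tone in Hz (assumed for illustration)
DELAY = 5e-6    # 5 microseconds, far shorter than one sample period


def estimate_delay(fs, f0, delay, n=44_100):
    """Recover a sub-sample delay from the phase of two sampled tones."""
    t = np.arange(n) / fs
    reference = np.sin(2 * np.pi * f0 * t)
    delayed = np.sin(2 * np.pi * f0 * (t - delay))
    # Project each signal onto a complex exponential at f0 to read its phase.
    # n is chosen so the record holds a whole number of cycles.
    probe = np.exp(-2j * np.pi * f0 * t)
    phase_ref = np.angle(np.sum(reference * probe))
    phase_del = np.angle(np.sum(delayed * probe))
    # A delay of d seconds shifts the phase by 2*pi*f0*d radians.
    return (phase_ref - phase_del) / (2 * np.pi * f0)


estimated = estimate_delay(FS, F0, DELAY)
print(f"true delay: {DELAY * 1e6:.3f} us, estimated: {estimated * 1e6:.3f} us")
```

The delay information lives in the sample values themselves, not in the spacing between samples, which is why a shift much smaller than one sample period is still recoverable.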
We’ll talk about IMD or intermodulation distortion soon.
You say Mark: “The bottom line is that 44.1 kHz is so close to the limits of our hearing that it will cause audible problems unless the stars align perfectly. One small defect in a 44.1 kHz system can cause audible errors. In contrast, a 2X system has a much larger margin for error.”
However, it seems that the theory regarding HRA is that it allows a better music listening experience than what CDs can deliver. On one hand, humans are not supposed to hear anything over 20 or 22 kHz but, on the other hand, it appears that higher frequencies play a role in HRA combined with other factors (very interesting subject). Furthermore, the electronics industry is now building equipment that can (they say) reproduce close to 100 kHz.
Therefore, at a 96 kHz sampling rate, pre- and post-ringing sits at around 48 kHz. On top of that, any IMD (and perhaps THD) that occurs could cut that frequency in half, meaning the tweeter error gets close to the hearing threshold.
So why not use the beloved “precautionary principle” and use 192 kHz sampling rate, just to eliminate that possibility and all the controversy around that factor?
John’s explanation is certainly very thorough, but I fear it is too long, and that length may increase the difficulty for some to make the leap from his examples to the final point he is trying to make.
“The need for these higher sample rates isn’t about making sure we include ultrasonics in our tracks but to subdivide time so fine that we don’t have any inter ear timing errors.”
I don’t know about inter ear timing errors, but I envision another issue: I’ve been told there isn’t a problem, but I still wonder about the ability of capacitors to perfectly reconstruct a sine wave using discrete voltage samples. So can a 22 kHz (or 16 kHz or 12 kHz) sine wave really be reconstructed using only one or slightly more samples for a complete cycle? And perhaps more importantly, can our ears tell the difference if the sine wave is distorted? The visual illustrations I’ve seen generally have the sine wave, of perhaps 1000 Hz, divided into 22 or so points, which visually define a sine wave pretty well. But I would be hard pressed to imagine the shape of a sine wave with 8 or fewer points, so I have to wonder how well different electronics really perform this function. Bear in mind that I am assuming that sounds are made of different sine wave frequencies superimposed upon each other.
Two samples per wave is enough…end of story.
The ellipsis replaces a lot of detail that I had hoped would be included in the answer; assuming there is an easy answer. My question (or comment) concerns whether building something using specified parts actually performs exactly according to theory; the part in this case is a capacitor; you would probably agree that reconstructing a sine wave properly is pretty important. There has been much discussion about whether two competent pieces of equipment performing the same function using the same specifications (a DAC, for example) can have performance differences. I don’t think you’ve said there can be no differences, but certainly some of your blog’s contributors have. I believe the selection of parts (having the same measured specs) can make a significant difference in the resulting sound; one that can be easily heard.
Yes two samples is enough. The mistake you are making is thinking the dots of samples are the waveform. They aren’t. Yes if you have 20 something points it looks more like a sine wave, but the actual wave is a sine even with two sample points.
I have taken a 4 kHz sine at a 192 kHz rate. It has 48 points per cycle and looks like a nice sine. I played that into an ADC recording at 48 kHz. The wave then shows only 12 sample points per cycle and looks rather blocky and not very sine-like. I played that 48 kHz file and recorded the result at 192 kHz. You get back just as nice a looking group of samples. You can subtract one from the other and have little left between the original and the sine going through a 48 kHz bottleneck other than very low level noise.
Watch this video. They take some very good analog equipment and show you the analog result of passing through an AD/DA stage. Quite simple to look at and understand. The other replies about Fourier analysis and such are all correct. But simply seeing these results in video answers the question with no need for other theory for the most part.
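The round trip described above can be imitated in software. This is only an idealized sketch, not the hardware test itself: it swaps the real ADC and DAC for FFT-based band-limited resampling of a periodic record, and the helper fft_resample is a hypothetical name written for this example. It creates a 4 kHz sine at 192 kHz (48 points per cycle), squeezes it down to 48 kHz (12 points per cycle), brings it back up to 192 kHz, and subtracts the result from the original.

```python
import numpy as np


def fft_resample(x, n_out):
    """Band-limited resampling of a periodic record using the FFT."""
    n_in = len(x)
    spec = np.fft.rfft(x)
    out_spec = np.zeros(n_out // 2 + 1, dtype=complex)
    k = min(len(spec), len(out_spec))
    out_spec[:k] = spec[:k]  # keep only the bins both rates can represent
    # irfft divides by n_out, so rescale to preserve the amplitude.
    return np.fft.irfft(out_spec, n=n_out) * (n_out / n_in)


FS_HI = 192_000   # high sample rate in Hz
F0 = 4_000        # test tone in Hz
N_HI = 19_200     # 0.1 s at 192 kHz, exactly 400 cycles of the tone
N_LO = 4_800      # the same 0.1 s at 48 kHz

original = np.sin(2 * np.pi * F0 * np.arange(N_HI) / FS_HI)
low = fft_resample(original, N_LO)   # 12 samples per cycle
back = fft_resample(low, N_HI)       # 48 samples per cycle again

residual = np.max(np.abs(back - original))
print(f"largest difference after the 48 kHz bottleneck: {residual:.2e}")
```

In this idealized version the leftover difference is at the level of floating-point rounding. A real converter pair adds its own noise and filter imperfections, which is the low level noise mentioned above.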
https://www.youtube.com/watch?v=cIQ9IXSUzuM
Dennis,
Thank you very much for putting up the link to this youtube video. Mark should permanently pin it to his homepage I think. I hope it helps a few people to overcome their misunderstanding of what ‘digital sound’ is. Although I’m sure the marketing people and the, so called, audiophile journalists will continue to spread misinformation in the interest of their pockets and their jobs.
Alan wrote: “John’s explanation is certainly very thorough, but I fear it is too long..”
I thought it was spot on.
Alan wrote: “.. The visual illustrations I’ve seen generally have the sine wave, of perhaps 1000Hz divided into 22 or so, points, which visually define a sine wave pretty well. But I would be hard pressed to imagine the shape of a sine wave with 8 or fewer points..”
You’re looking at it from a join-the-dots interpolation viewpoint. It doesn’t work that way: the waveforms are reconstructed using sinc (sin(x)/x) functions, and the sine wave can be reconstructed using just two samples per cycle.
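The sinc reconstruction mentioned above can be sketched in a few lines. Treat this as an illustration under stated assumptions rather than what any particular DAC does: the 10 kHz tone and the record length are picked for the example, and the ideal formula sums over infinitely many samples while this version truncates the sum to a finite record, so a small error remains. The tone has only about 4.4 samples per cycle at 44.1 kHz, well under the “8 or fewer points” in Alan’s question.

```python
import numpy as np

FS = 44_100.0   # sample rate in Hz
F0 = 10_000.0   # tone in Hz, only about 4.4 samples per cycle (assumed)
N = 4_000       # samples kept; the ideal sum is infinite, so this truncates it

n = np.arange(N)
samples = np.sin(2 * np.pi * F0 * n / FS)

# Evaluate the waveform BETWEEN the sample instants, near the middle of the
# record where the truncation error is smallest (20 steps per sample period).
t = (N / 2 + np.linspace(0.0, 20.0, 401)) / FS

# Whittaker-Shannon formula: x(t) = sum over n of x[n] * sinc(FS * t - n).
# np.sinc(x) is sin(pi * x) / (pi * x).
recon = np.sinc(FS * t[:, None] - n[None, :]) @ samples

truth = np.sin(2 * np.pi * F0 * t)
max_error = np.max(np.abs(recon - truth))
print(f"largest reconstruction error: {max_error:.2e}")
```

The reconstructed curve follows the true sine between the sparse dots instead of tracing straight lines from one dot to the next, and the remaining error shrinks as more samples are included in the sum.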
Thank you, Dave,
I appreciate the explanation. I’m not an electrical or electronics engineer and can use additional information among those who are. I read and participate in this blog to learn, since there is little that I can teach. I construct my questions to try to indicate that it’s not always “obvious.”
Separated from but still related to my previous comment, is a notion I’ve heard that we can listen to the digitally sampled voltages without additional interpolation or other reconstruction, and our hearing will integrate and reconstruct the original signals on its own. If so, a slightly distorted electronically reconstructed signal should be a piece of cake.
‘Transients, timing, inter-ear timing errors, time vs frequency domain’, whenever I hear or read any of these, in relation to digital sampling, the red alarm light lights telling me to switch off and disregard anything else the person is saying. The reason is that it is 100% certain he either has no clue what he’s talking about or he chooses to feed his audience marketing bull for the sole good of his bank account. The scientific theory goes that all components of sound are faithfully reconstructed below 1/2 the sampling frequency. It has been established and proven many decades ago. Any engineering problems related to implementation have also nothing to do with these ‘timing’ (non)-issues.
Enough said about this matter, it has been beaten to death. Whoever chooses to believe all this stuff, while scientific truth is there waiting for him to discover it, is doing it at his own ridicule and at his pocket’s peril. Mark, I don’t think there’s more point in beating this to death. Whoever wants to educate himself can do it very easily. Let the rest be flat earthers.
‘Interpolation, reconstruction etc etc’. There’s NO interpolation or other kind of ‘guessing’ going on in creating an analog wave from digital samples. Sine waves can be, mathematically, perfectly reconstructed as long as the sample rate is more than double their frequency. And complex waves can be ‘broken down’ to constituent sine waves. There are no steps, staircases, not-so-perfect, coarse or less-coarse sine waves and the rest. None, nada. End of story. People should try to grasp at least the basics of the Fourier transforms. And people should try to understand what the sinc function does. There’s widespread lack of fundamental understanding of the theory behind how the ADC / DAC process works. If one does not get the basic theory, one cannot go on to discuss implementation issues.
‘Sine waves can be, theoretically, perfectly…’ please replace ‘theoretically’ with ‘mathematically’ in my phrase above.
Agreed, people tend to look at the sampled waveform and think that the dots must be joined together and that’s the digitised waveform.
Hej Mark,
As always, Sweden stands united to the last man and woman in applauding your efforts no matter how in vain they might seem:)
But… about a year ago I asked you if there was any value to vinyl rips at higher than CD quality. Your answer explained so much, so well; namely, you don’t record more than what the vinyl originally offers and the original vinyl offering is by definition lower than what the cd recording enables. So far, so good…
But what to do with the potential IMD/THD problem of “pre-ringing” with digital rips of vinyl recordings, especially regarding my beloved and very expensive tweeters? I understand that ripping a CD at higher than Redbook specs does nothing, but what about digital recordings at 24/96 kHz of vinyl recordings… might they be better than 16/44.1 kHz, or does the pre-ringing not come into play when recording vinyl..? I suspect I am confusing several issues, but my question arises out of the following:
“It is more likely that the 22 kHz ringing causes IMD in the tweeter producing audible distortion at frequencies below 20 kHz. It should be noted that the duration of the pre and post ring decreases as the sample rate increases. So a 2X sample rate will shorten the ring by a factor of 2 while increasing the ring frequency by a factor of 2. The combination of these two factors gives 96 kHz a significant advantage over 44.1 kHz (assuming that there is an audible defect at 44.1 kHz). The bottom line is that 44.1 kHz is so close to the limits of our hearing that it will cause audible problems unless the stars align perfectly. One small defect in a 44.1 kHz system can cause audible errors. In contrast, a 2X system has a much larger margin for error.”
Tack så mycket,
bill dorsey
If I was to recommend a process for transferring vinyl to PCM digital, I would move to 96 kHz/24-bits for all of the reasons that you outlined. It’s not that there’s actual sound there but the margin is beneficial.
All this obsessing over sampling rates is beside the point. I attend a lot of concerts, everything from solo piano recitals up to gargantuan 20th-century orchestral spectaculars, so I can confirm that multichannel recordings faithfully convey everything you hear at a concert, minus the coughing, the riffling through the program notes, the futzing with the cell phone, and all the other distractions of the concert experience. I’ve heard fantastic recordings from SACD-DSD, 24/96 Blu-ray, and 24/192 Blu-ray that sound exactly like live performers in a concert hall. And I can verify this because I’ve sat in those very same halls.
Well-done multichannel SACD and Blu-ray recordings, which is the vast majority of them, give us 100% fidelity, PERIOD, whether they were recorded in 44/24, 48/24, 96/24, 192/24, DSD, or DXD. End of discussion.
I would lose the two lowest sample rates in your list…and I would never advocate for DSD.
Everyone knows more is always better. But more means extra material and labor, so it costs more. But that’s a good thing too, ’cause the more you have to spend to acquire it, the more affluent you must be, so you also have and are more.
No matter how you look at it more is better!
Mark, you just have to start recording at 32/1536 and out-MORE them all!
Thanks Sal…I’ll take your suggestion under advisement.