
Is bit depth about dynamic range or data?


audiojerry

Question

I thought after all this time I was correctly explaining bit depth and sample rates to my non-audiophile friends, but now I'm not so sure. I thought that bit depth or bit size determines how much information can be captured in a single sample taken from an analog signal. So if, for example, you are recording a symphony orchestra, there are lots of instruments creating a lot of complex tonal information and sound levels. This creates a complex analog waveform, and when you take a sample of this waveform, you are going to digitize it and store it in a file. This single sample of the waveform would obviously contain a lot of information about what was happening in the symphony orchestra in that instant of time. The larger the bit depth, the more information you can capture, and the better the quality of the file and of the resulting recording.

 

But now I'm hearing that bit depth is all about dynamic range. That seems too simplistic to me.  

Any experts out there who can set me straight?

 

 


Recommended Posts

  • 2
1 hour ago, Teresa said:

That is also the way I understand it. For a 16-bit/44.1 kHz music file, samples are taken 44,100 times each second. In this case each sample is a 16-bit word represented by 1s and 0s, though all that's needed is two distinguishable states, such as the lands and pits on a CD. That 16-bit word has to completely describe the music that occurred during that 44,100th of a second. If this is not correct I also would like to know.

A sample of audio describes the air pressure at the microphone at one instant. That is all it does. To describe tones or music, a sequence of samples is required. As shown in the sampling theorem, a sequence of samples at 44.1 kHz perfectly captures any combination of tones up to 22.05 kHz. Even the most complex music imaginable is, mathematically, nothing but a combination of tones, so a full orchestra is no more challenging than a solo flute.
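To make that concrete, here is a rough numpy sketch (the tone frequencies and levels are invented purely for illustration): a single sample is one pressure value, yet the sequence of samples captures every tone and its level exactly.

```python
import numpy as np

fs = 44100                          # samples per second
t = np.arange(fs) / fs              # one second of sample instants

# A toy "orchestra": several simultaneous tones, all below 22.05 kHz.
freqs = [220, 440, 587, 1320, 5000, 12000]      # Hz, invented values
amps  = [0.30, 0.20, 0.15, 0.10, 0.05, 0.02]
signal = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(amps, freqs))

print(signal[1000])                 # one sample: a single pressure value

# The spectrum of the whole sequence recovers every tone and its level.
mag = np.abs(np.fft.rfft(signal)) / (fs / 2)    # bin k corresponds to k Hz here
for f in freqs:
    print(f, round(mag[f], 3))      # prints each frequency with its amplitude
```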

 

The sampling theorem does, however, assume the samples are perfect with infinite resolution. At a fixed bit depth, the stored sample value consists of the true value plus a small quantisation error. When dither is used, this error appears as low-level noise. At 16-bit resolution the noise can be audible as a faint hiss if the volume is turned up very high while playing a suitable synthetic test signal; in real recordings other noise sources dominate. Increasing the bit depth lowers the noise level, allowing softer sounds to be recorded. Regardless of bit depth, any number of tones rising above the noise level are recorded equally well. While this may seem unintuitive, it is how these things work, both according to the maths and in actual practice.
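A rough sketch of that quantise-plus-dither behaviour (16 bits with triangular dither; the -6 dBFS, 1 kHz test tone is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, bits = 44100, 16
q = 2.0 / 2 ** bits                 # step size for a ±1.0 full-scale range

t = np.arange(fs) / fs
signal = 0.5 * np.sin(2 * np.pi * 1000 * t)     # a -6 dBFS, 1 kHz test tone

# TPDF dither: the difference of two uniform randoms, ±1 LSB in amplitude.
dither = (rng.random(fs) - rng.random(fs)) * q
stored = np.round((signal + dither) / q) * q    # the values that go in the file

error = stored - signal
print(20 * np.log10(np.std(error)))  # about -96 dBFS: a constant low-level hiss
```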

  • 0
4 minutes ago, fas42 said:

Bit depth is about the signal-to-noise ratio. If you reduce the depth, you get a higher level of random noise - tape hiss is the obvious analogue equivalent. A decent digital encoding can capture that tape hiss with ease, so "everything that matters" is being transferred.

 

This all assumes that the person who might be playing around with bit depth, while recording and/or mastering, knows how to apply the correct dither, at the correct point of operations ... get it wrong, and you can hear the mistake.

 

Human hearing can compensate for random loss of data, or excess noise, remarkably well - good handling of digital data can rely on that ability, to make even poor bit depth "sound OK".

 

Dynamic range is purely about mastering decisions - nothing to do with bit depth.

Thanks for the knowledgeable reply, but it leaves me confused on a couple of levels.

 

It seems like you are saying that bit depth is not relevant to capturing all the musical information needed for a good sounding recording. If so, then why bother going beyond 16 bits? Maybe some of my confusion goes back to computer programming, where a 16-bit word can contain only half the data of a 32-bit word.

 

As far as tape hiss is concerned, some of the best sounding recordings I've heard are full of tape hiss, but my ears ignore it, and only notice it when there is no music. The best sounding music has always come when listening to a reel-to-reel tape.  

  • 0

With regard to recording: if one could guarantee in advance that the sound levels of the music will only reach a precise maximum value, and the recording chain were adjusted beforehand to peak neatly at exactly that value, then 16 bits would be fine. But in the real world of music making this never works - musicians become passionate, and a transient peak easily goes much higher. You want headroom in recording, to make sure the recording never clips; digital is far less forgiving than analogue tape.

Reduce the gains in the recording chain to give some margin for this, and you are no longer using the full 16 bits to capture the performance. 24 bits is a nice number for the technical side to use; you will never use the full 24 bits, but the 16 bits you're really after can be comfortably surrounded by "insurance" bits.

And when you do the editing, you want maximum precision in the mathematical handling, so that no matter how much you attenuate or amplify to get the sound you're after, nothing gets lost or degraded in these operations. High precision is cheap these days, so the bigger the numbers used - 32, 64, etc. - the better.
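Back-of-envelope numbers for that headroom trade-off, using the usual figure of roughly 6 dB per bit (the 18 dB margin below is just an example):

```python
# Each bit of depth is worth roughly 6.02 dB of range below full scale.
def range_db(bits):
    return 6.02 * bits

headroom_db = 18     # hypothetical safety margin left for transient peaks

print(range_db(16) - headroom_db)   # ~78 dB left under the working level
print(range_db(24) - headroom_db)   # ~126 dB - the "insurance" bits in action
```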

 

Only when all that is done, and you are ready to produce the output which the consumer will see, do you produce the 16-bit version - this can be fit neatly into that range - and it will have all the musical detail that matters.

 

Agree about tape hiss - human hearing can discard its presence when the playback is "in the zone" - the very best reel-to-reel is only roughly equivalent to 12 or 13 bits of depth, which is why Dolby came into the picture.

  • 0
11 hours ago, audiojerry said:

I thought after all this time I was correctly explaining bit depth and sample rates to my non-audiophile friends, but now I'm not so sure. I thought that bit depth or bit size determines how much information can be captured in a single sample taken from an analog signal. So if, for example, you are recording a symphony orchestra, there are lots of instruments creating a lot of complex tonal information and sound levels. This creates a complex analog waveform, and when you take a sample of this waveform, you are going to digitize it and store it in a file. This single sample of the waveform would obviously contain a lot of information about what was happening in the symphony orchestra in that instant of time. The larger the bit depth, the more information you can capture, and the better the quality of the file and of the resulting recording.

 

But now I'm hearing that bit depth is all about dynamic range. That seems too simplistic to me.  

Any experts out there who can set me straight?

 

That is also the way I understand it. For a 16-bit/44.1 kHz music file, samples are taken 44,100 times each second. In this case each sample is a 16-bit word represented by 1s and 0s, though all that's needed is two distinguishable states, such as the lands and pits on a CD. That 16-bit word has to completely describe the music that occurred during that 44,100th of a second. If this is not correct I also would like to know.

I have dementia. I save all my posts in a text file I call Forums.  I do a search in that file to find out what I said or did in the past.

 

I still love music.

 

Teresa

  • 0
4 hours ago, Teresa said:

 

That is also the way I understand it. For a 16-bit/44.1 kHz music file, samples are taken 44,100 times each second. In this case each sample is a 16-bit word represented by 1s and 0s, though all that's needed is two distinguishable states, such as the lands and pits on a CD. That 16-bit word has to completely describe the music that occurred during that 44,100th of a second. If this is not correct I also would like to know.

Maybe this will help:

 

Digitizing an analog function (like the sequential voltages that comprise the waveform of moving electrons coming from the recording console) is like making a movie by stringing still images together.  The first motion pictures could only show jerky motions because they included so few stills per second. The image quality was poor because optics and film technology were both crude. Stability and reproducibility of the film path were inconsistent, so action might look too fast or slow - and you can definitely see jitter :) 

 

Movie film got bigger, moved faster, and recorded better images as it evolved.  A 16mm frame had a fraction of the information contained in a 35mm frame of the same scene shot on the same emulsion, and film technology advances improved accuracy of individual images - so quality improved because the “bit depth” increased.  The mechanical speed of film’s advance through camera and projector smoothed motion by increasing the visual sampling rate.

 

As a kid, you probably drew a series of cartoon-like sketches of a stick figure on the pages of a small pad of paper, moving one of its arms up a bit in each successive sketch. When you flipped the pages, you saw what looked like a “motion picture”.  The more pages in your pad, the more smoothly you could portray the motion by making the successive position changes smaller and cramming more of them into the same flip time.  This is another example of sampling rate.

 

You could use a pencil to make crude stick figures, or you could make more refined drawings. The less information you put into the sketches, the less like a person and the more like a bunch of moving lines the moving image looked.  If you drew artful images of a boy waving, your “movie” looked more like a boy waving - it was more accurate because there was more information in it, i.e. it had greater “bit depth”.

 

Bit depth determines the accuracy of the instantaneous value of the analog function being represented - and here, accuracy means how little the stored value deviates from the actual one. That parameter is voltage at most stages of audio ahead of the analog renderer that turns the signal back into air pressure waves so you can hear it (the usual output power metric is based on current). More accurate instantaneous values in the string mean more accurate capture, storage & reproduction of the analog input signal.
 

One argument against going above 16 bits for consumer audio files is that we "can't hear" the difference. This would be analogous to limiting the display resolution of optical media to the highest we can see. It's a tricky path without definitive answers. This may be too obscure or abstract to help visualize the concept, although I hope not. But the basics are simple: sampling rate is analogous to the number of stills in a second of movie film running at standard speed, and bit depth is a measure that reflects the accuracy of each still. The digital clock is analogous to the speed control of the motors that move film through the camera & projector. That speed has to be stable and exactly the same for both, or the movie won't look right. The same effects manifest in digital audio, for directly analogous reasons.

 

One major difference in this analogy is that a single static sample of the audio signal makes no sound - only the sequential confluence of all samples can generate audible output. But each frame of a movie is a picture in itself. So, unlike a single sample from an audio signal, it contains information usable in isolation from the rest of the reel.

  • 0
11 minutes ago, bluesman said:

One major difference in this analogy is that a single static sample of the audio signal makes no sound - only the sequential confluence of all samples can generate audible output. But each frame of a movie is a picture in itself. So, unlike a single sample from an audio signal, it contains information usable in isolation from the rest of the reel.

A sample of audio is more like a single pixel in an image than a still from a video.

  • 0
1 hour ago, mansr said:

A sample of audio is more like a single pixel in an image than a still from a video.

I'm offering a loose functional analogy meant to be illustrative for Teresa and not a literal description.  I thought the movie analogy was more useful because each frame is an instantaneous sample of the changing visual "signal" and the end product is a dynamic sequence of these samples. As the dynamics of motion picture production and control are similar in many ways to those of an audio waveform, it just made a lot more sense to me than your example. I could be wrong - I look forward to feedback in this thread to help me improve my communication skills.

 

I suppose that pixels in an image can illustrate the same concept in a different representation, the main differences being that pixels are not samples or "complete" representations of anything.  They're components that combine to form a static image just as linked dyes combine to form the color image on emulsion based film.  And there are many different kinds of pixels that vary in shape, size, ability to display color etc.

 

For Teresa et al: there are similarities that may help you understand the subject that started this discussion. Each pixel in an image can display multiple colors within a designated set, e.g. RGB or cyan-magenta-yellow-black (because not all pixels are functionally alike). An 8-bit color image can carry 2^8 = 256 colors - it's like having a box of 256 crayons that divide the entire visible color spectrum into 256 parts by frequency. No matter how many shades of color are in the source, the pixels in the screen will use the closest of the 256 colors they can display for each color in the source image. If the exact shade of red is between two in the "crayon box", the nearest one is used. A 10-bit image can display 1024 different colors, so it can render an image closer to the original in color composition.
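A tiny sketch of that "crayon box" rounding (pure illustration; the red value below is an arbitrary example):

```python
def nearest_crayon(value, bits):
    """Snap a colour component in [0.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** bits - 1
    return round(value * levels) / levels

red = 0.5                            # a shade exactly between two 8-bit crayons
print(nearest_crayon(red, 8))        # 0.50196... (128/255)
print(nearest_crayon(red, 10))       # 0.50049... (512/1023) - a closer match
```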

 

The accuracy of color rendition is somewhat analogous to the accuracy of voltage representation in a single sample of a digitized audio waveform, in that the exact value is limited to a given number of digits. So it's "rounded" up or down to fit within the limits of that digital field. The more bits available per sample, the more accurately the value can be recorded (i.e. the more significant digits it contains and the smaller the potential difference - no pun intended - between the actual value and its digital approximation).

  • 0

Bit depth does determine dynamic range. The CD format of 44.1 kHz sampling and 16-bit depth was driven by the commercial digital technology practicalities of the 80s... it works most of the time but fails on complex music and music with significant content at the dynamic range extremes. The later DVD format of 48/24 addressed the corner cases, but I find that the early DVD recordings were focused on multichannel and had poor stereo versions. Higher sampling rates like 96 kHz and 192 kHz seem to help in taming DAC digital artifacts.

 

Sampling rate × bit depth = bit density; a bigger number is nice, but when you get to media like DSD, file sizes can be huge. I've gravitated towards 96/24 as the sweet spot for the fewest DAC issues and reasonable file sizes. Many CDs from the 80s that I originally detested are very enjoyable now with modern DACs and streaming technology... it wasn't the CD format, it was errors introduced by the DAC and transport solution.
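That "bit density" arithmetic, sketched for raw, uncompressed stereo streams:

```python
def pcm_rate_mbps(sample_rate, bits, channels=2):
    """Raw PCM data rate in megabits per second."""
    return sample_rate * bits * channels / 1e6

print(pcm_rate_mbps(44_100, 16))    # CD:    ~1.41 Mbit/s
print(pcm_rate_mbps(96_000, 24))    # 96/24: ~4.61 Mbit/s
print(pcm_rate_mbps(2_822_400, 1))  # DSD64: ~5.64 Mbit/s (a 1-bit stream)
```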

 

Tape and reel can have much greater "bit density" because of an essentially infinite sample rate, but analog capture technologies have dynamic range challenges with loud music. The early Telarc digital LPs showcased how digital capture could handle dynamic range better in loud passages.

 

 

 

Regards,

Dave

 


  • 0
16 hours ago, mansr said:

Signal to noise ratio and dynamic range are the same thing.

That's true for the equipment but not the program material, which is an important functional difference.  Program material rarely has a DR equal to the SNR of the equipment through which it's being played.

 

Unless the DR of the program is equal to or greater than the SNR of the system (which is virtually unheard of today), the desired listening level will determine the effective SNR. If the recording is a quiet piece with limited DR, e.g. concerti for solo violin or guitar, the listener may turn the volume up enough that system background noise becomes intrusive. Tchaikovsky's 4th has a wide DR, so I set the volume control lower to avoid excessive peak SPL. This also lowers background noise, so the audible SNR is higher.

  • 0
40 minutes ago, davide256 said:

bit depth does determine dynamic range.

I'm not suggesting otherwise and apologize if I gave that impression.  I was just trying to convey in simpler terms that the way in which it does this is to enable more accurate capture of instantaneous signal levels in the source waveform, which obviously encompasses both the bottom and the top of the DR.  And that accuracy is, in large part, determined by errors resulting from fitting each sample to the size of the "word" (i.e. bit depth).  This is quantization error, if I remember this all correctly. 

 

The other common confusion I see stemming from this is failure to understand that the bit depth of the recorded file determines only the DR of the recording.  It determines the DR of the source file you're playing, not the SNR of your playback equipment.

  • 0
17 hours ago, fas42 said:

 

Yes, if one is talking about it as a purely technical concept - but dynamic range is thrown around these days as having subjective connotations - as in, "orchestral performances can't be captured by 16 bits, the sound is too big!" ... I was referring to this subjective take on the matter.

 

Except that's not a subjective take on the matter - it's just an imprecise one. If an orchestral performance's dynamic range is 96dB or less, then it can be fully captured by 16 bits, as per @mansr's point above that all sound pressure levels above the noise floor are captured equally accurately. If the dynamic range of the performance is more than 96dB, then it cannot be fully captured at 16 bits (although the use of noise-shaping dither combined with nonlinear human hearing sensitivity and the non-uniform frequency distribution of the musical sounds mean that in practice even a performance with a dynamic range greater than 96dB usually can be effectively fully captured at 16 bits).
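For the numbers behind those figures, the familiar rule of thumb for an ideally dithered N-bit channel:

```python
# Dynamic range of an ideally dithered N-bit channel, full-scale sine vs. noise.
def dynamic_range_db(bits):
    return 6.02 * bits + 1.76

print(dynamic_range_db(16))   # ~98.1 dB ("96 dB" is the simpler 6.02 * 16)
print(dynamic_range_db(24))   # ~146.2 dB
```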

  • 0
1 hour ago, bluesman said:

I'm offering a loose functional analogy meant to be illustrative for Teresa and not a literal description.  I thought the movie analogy was more useful because each frame is an instantaneous sample of the changing visual "signal" and the end product is a dynamic sequence of these samples. As the dynamics of motion picture production and control are similar in many ways to those of an audio waveform, it just made a lot more sense to me than your example. I could be wrong - I look forward to feedback in this thread to help me improve my communication skills.

 

I suppose that pixels in an image can illustrate the same concept in a different representation, the main differences being that pixels are not samples or "complete" representations of anything.  They're components that combine to form a static image just as linked dyes combine to form the color image on emulsion based film.  And there are many different kinds of pixels that vary in shape, size, ability to display color etc.

I don't like the motion picture analogy, since each frame there contains, in isolation, readily identified representations of the objects in the scene. In a single sample from an audio recording of an orchestra, there is not a part for the violin, another for the oboe, etc. There is just a single number, the sum of the air pressures contributed by each instrument at one instant.

 

If we want to properly compare sound recording to motion pictures, we must expand our view a little. The scene we wish to record as a motion picture can be thought of as a cuboid, one dimension representing time and the other two the spatial extent of a 2D projection (the view through the camera). Restricting ourselves to the black-and-white case, each point within this cuboid has a scalar value representing the light intensity at the corresponding place and time. When we record this scene, we obtain a set of samples regularly distributed like atoms in a crystal lattice. The spacing between samples in the two spatial dimensions determines the smallest details we can capture, while their spacing along the temporal dimension limits the rate of change we can represent. The familiar sampling theorem can be applied independently along each of the three dimensions.
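In code, that lattice is just a 3-D array of intensity samples (the frame rate and resolution below are arbitrary toy values):

```python
import numpy as np

fps, height, width, seconds = 24, 480, 640, 2        # invented toy dimensions
movie = np.zeros((fps * seconds, height, width))     # intensity(t, y, x)
print(movie.shape)          # (48, 480, 640): one scalar per lattice point

# Audio is the degenerate case: a 1-D line of pressure values along time.
fs = 44100
audio = np.zeros(fs * seconds)
print(audio.shape)          # (88200,)
```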

 

Going back to audio, the equivalent geometric view of what a microphone captures is a simple line (of zero thickness) extending along the time dimension. Each point along the line has a value representing the air pressure at the microphone. By sampling this value at regular intervals, we obtain a representation of all variations up to a maximum frequency.

 

Taking the graphical analogy even further, DSD is quite similar to half-tone printing, but that's straying from the topic.

 

1 hour ago, bluesman said:

For Teresa et al: there are similarities that may help you understand the subject that started this discussion. Each pixel in an image can display multiple colors within a designated set, e.g. RGB or cyan-magenta-yellow-black (because not all pixels are functionally alike). An 8-bit color image can carry 2^8 = 256 colors - it's like having a box of 256 crayons that divide the entire visible color spectrum into 256 parts by frequency. No matter how many shades of color are in the source, the pixels in the screen will use the closest of the 256 colors they can display for each color in the source image. If the exact shade of red is between two in the "crayon box", the nearest one is used. A 10-bit image can display 1024 different colors, so it can render an image closer to the original in color composition.

 

The accuracy of color rendition is somewhat analogous to the accuracy of voltage representation in a single sample of a digitized audio waveform, in that the exact value is limited to a given number of digits. So it's "rounded" up or down to fit within the limits of that digital field. The more bits available per sample, the more accurately the value can be recorded (i.e. the more significant digits it contains and the smaller the potential difference - no pun intended - between the actual value and its digital approximation).

Colour is a whole new level of complexity. As we know, light of any colour can be separated into its spectral components using a prism. Think of this as applying the Fourier transform to a sound, revealing all the constituent frequencies. Although the visible spectrum is continuous from red to violet, our eyes contain only three types of colour receptors, most sensitive to red, green, and blue light. Pure yellow light excites both red and green receptors, and our brain interprets this as yellow. The brain has no way of knowing why the red and green receptors are both signalling, so if we create light consisting of some pure red and some pure green, the eye reacts exactly the same way, and we perceive yellow. For this reason, colour displays get away with having only red, green, and blue components in each pixel. The precise wavelengths of each, together with their brightness, determine the range of colours (often referred to as the gamut) we can trick the brain into perceiving.

  • 0
4 minutes ago, mansr said:

I don't like the motion picture analogy, since each frame there contains, in isolation, readily identified representations of the objects in the scene. In a single sample from an audio recording of an orchestra, there is not a part for the violin, another for the oboe, etc. There is just a single number, the sum of the air pressures contributed by each instrument at one instant.

Very interesting thoughts - thanks!

 

I see the digitized waveform a bit differently, in that each and every instrument being played is present (if being played at the time of capture) in each and every sample. The single instantaneous value being captured is the summation of all values for all parts being played.  We can't separate them within an individual sample because there's no dynamic context - the samples by themselves contain data but no information, and are a perfect example of the difference between the two, in my opinion.

 

But sequenced as they were when captured, they define a complex waveform in which the individual parts can be identified by ear and in a Fourier transformation. And we could determine the contribution of each instrument to the value of that sample with a little (OK, more than a little...) mathematical manipulation.  Of the 1.3V in the 12,273,418th sample of a string trio piece, we might see that 0.2V were the violin, 0.4 were the viola, 0.5 the cello, 0.15 the natural intermodulation of the three, and 0.05 the cumulative noise.
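A sketch of that string-trio thought experiment, with steady tones standing in for the instruments (the frequencies and levels are invented):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs

# Stand-ins for the three parts: one steady tone each.
cello  = 0.5 * np.sin(2 * np.pi * 220 * t)
viola  = 0.4 * np.sin(2 * np.pi * 440 * t)
violin = 0.2 * np.sin(2 * np.pi * 660 * t)
mix = cello + viola + violin

print(mix[12345])           # one sample: a single number, no parts visible

# Over a stretch of samples, the Fourier transform separates the parts again.
mag = np.abs(np.fft.rfft(mix)) / (fs / 2)
for f in (220, 440, 660):
    print(f, round(mag[f], 2))      # -> 0.5, 0.4 and 0.2 at their frequencies
```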

 

Just thinkin'...........😉

  • 0
1 hour ago, The Computer Audiophile said:

 

The magic word in this is "available" (at 36 seconds into the video). Bit depth determines the dynamic range available for use during recording, i.e. the maximum DR of recordings captured by the system. This is independent of the source program itself and of playback equipment. I suspect you could record a rock band playing a song that has a DR of 4 dB (which is typical in some genres) at an average level of 0 dB with an 8-bit system and hear little if any difference compared to a 16- or 24-bit capture. The noise floor would still be over 40 dB below the signal in an 8-bit file, and the signal would be sufficiently loud and sufficiently compressed to render any differences in accuracy inaudible.

 

Low bit depths create an artificially high noise floor by "compressing" all signal that's within the lowest quantum level range in each sample to the same amplitude, which makes the noise as loud as any musical content that's also within that range. Signals above the top of that range are unaffected by the noise, although they too are compressed within their ranges (which is not mentioned in the video).
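A quick numerical check of that scenario (8 bits, with a loud compressed tone standing in for the band, and TPDF dither assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
fs, bits = 44100, 8
q = 2.0 / 2 ** bits

t = np.arange(fs) / fs
loud = 0.9 * np.sin(2 * np.pi * 200 * t)         # compressed, near full scale

dither = (rng.random(fs) - rng.random(fs)) * q   # TPDF dither, ±1 LSB
stored = np.round((loud + dither) / q) * q

noise = stored - loud
print(20 * np.log10(np.std(loud) / np.std(noise)))  # ~44 dB of signal over hiss
```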

  • 0

Remember me? I'm the guy who originated this discussion - LOL.

 

I am delighted that my query has generated so much fantastic and exceptionally intelligent dialogue. I admit that much of what has been said is over my head in terms of technical understanding, but it has been enlightening nonetheless. Please keep it going. To me it provides some insight into the complexities of capturing, recording, and producing a digitized version of music, and into why it has taken so long to finally arrive at an era where digital recordings actually sound great.

 

This discussion may also parallel ongoing controversy about the best methods for recording and playback surrounding PCM and DSD, and more recently, MQA.

  

I found Bluesman's analogy to movies easy to grasp and very helpful. Thank you sir.

 

But so far, I don't feel like my original question has been answered - at least to my benefit. As best as I can determine, bit depth captures both musical data and dynamic range - it is not exclusively one or the other. Part of my remaining confusion can be attributed to my conception of dynamic range. My simplistic view is that dynamic range is the range of loudness from the noise floor (zero) to the loudest peak in terms of SPL.

  • 0
2 minutes ago, audiojerry said:

But so far, I don't feel like my original question has been answered - at least to my benefit. As best as I can determine, bit depth captures both musical data and dynamic range - it is not exclusively one or the other.

I agree with you. Digital recording breaks the continuous amplitude range of the analog signal into quanta and compresses all levels within each quantum to a single value. This reduces dynamic contrast within each quantum and has to affect the liveliness of reproduction to some degree. Higher bit depth creates more, smaller quanta, so there should be less of this effect. I must admit that I'm not sure I can actually hear it in most recordings of the same material made at 16 and 24 bits - but theoretically, it makes sense to me.

  • 0
5 hours ago, davide256 said:

Bit depth does determine dynamic range. The CD format of 44.1 kHz sampling and 16-bit depth was driven by the commercial digital technology practicalities of the 80s... it works most of the time but fails on complex music and music with significant content at the dynamic range extremes. The later DVD format of 48/24 addressed the corner cases, but I find that the early DVD recordings were focused on multichannel and had poor stereo versions. Higher sampling rates like 96 kHz and 192 kHz seem to help in taming DAC digital artifacts.

 

 

 

And this is why I deliberately used the term "dynamic range" the way I did in my post above - there is a quite common belief that somehow 16 bits can't contain "high dynamic range", but this is purely an implementation problem in typical playback chains. The "failure" occurs because of, yes, digital artifacts - but this has nothing to do with the measurable technical performance of the DAC and associated circuitry, and everything to do with anomalies caused by inadequate engineering of the overall component and system. These are unfortunately audible, and lead many people to give, say, CD replay the thumbs down ... luckily, it turns out that careful tweaking can attenuate these issues sufficiently, but very few people are motivated to go to the lengths needed to achieve this. The other solutions are to buy very expensive kit which can deliver the "dynamic range" now - or just wait until everyday audio components are engineered well enough to do the job.

  • 0
2 hours ago, audiojerry said:

 

But so far, I don't feel like my original question has been answered - at least to my benefit. As best as I can determine, bit depth captures both musical data and dynamic range - it is not exclusively one or the other. Part of my remaining confusion can be attributed to my conception of dynamic range. My simplistic view is that dynamic range is the range of loudness from the noise floor (zero) to the loudest peak in terms of SPL.

 

 

Where digital "gets it wrong" for many people in the real world of playback is that critical information encoded at relatively quiet levels, compared to the maximum signals occurring at the same time, is too distorted by imperfections in the playback chain to be easily discerned by the listening mind. People hear this all the time in sub-par systems: a track which is a complex mix of sounds is played, and it "sounds a mess!" ... The dynamic range is there, as a technical, measurable characteristic, but the distortion of low-level information is too great - and subjectively "you can't hear what's going on" ...

 

I recently posted a clip of a track from a Ry Cooder album, and the response was that it "just collapses into a bowl of mush" - this is a classic symptom of inadequate effective resolution in the playback chain; subjectively, the "dynamic range" is not good enough ... and this has absolutely nothing to do with the encoding using only 16 bits.

