Is bit depth about dynamic range or data?


audiojerry

1 hour ago, Teresa said:

That is also the way I understand it. For a 16-bit 44.1 kHz music file, samples are taken 44,100 times each second. In this case each sample is a 16-bit word represented by 1s and 0s, though it only needs two distinguishable states, such as the lands and pits on a CD. That 16-bit word has to completely describe the music that occurred during that 44,100th of a second. If this is not correct I would also like to know.

A sample of audio describes the air pressure at the microphone at one instant. That is all it does. To describe tones or music, a sequence of samples is required. As shown in the sampling theorem, a sequence of samples at 44.1 kHz perfectly captures any combination of tones up to 22.05 kHz. Even the most complex music imaginable is, mathematically, nothing but a combination of tones, so a full orchestra is no more challenging than a solo flute.
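To make this concrete, here is a small NumPy sketch of my own (the note frequencies are whole-number approximations of C4, E4, G4, chosen so they land exactly on 1 Hz FFT bins). A three-note chord sampled at 44.1 kHz is, to the maths, just a sum of tones, and the spectrum of the sample sequence shows exactly those tones:

```python
import numpy as np

fs = 44100                    # samples per second
t = np.arange(fs) / fs        # one second of sample instants
tones = [262, 330, 392]       # ~C4, E4, G4, rounded to whole Hz

# "Complex" music is, mathematically, nothing but a sum of tones.
x = sum(np.sin(2 * np.pi * f * t) for f in tones)

# The spectrum of the sampled sequence reveals exactly those tones.
spectrum = np.abs(np.fft.rfft(x)) / (fs / 2)   # normalise to amplitude
freqs = np.fft.rfftfreq(fs, d=1 / fs)          # 1 Hz bin spacing
print(freqs[spectrum > 0.5])                   # [262. 330. 392.]
```

Swap in as many tones as you like and the FFT still separates them, which is the sense in which a full orchestra is no harder to capture than a solo flute.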

 

The sampling theorem does, however, assume the samples are perfect, with infinite resolution. At a fixed bit depth, the stored sample value consists of the true value plus a small quantisation error. When dither is used, this error appears as low-level noise. At 16-bit resolution the noise can be audible as a faint hiss if the volume is turned up very high while playing a suitable synthetic test signal; in real recordings other noise sources dominate. Increasing the bit depth lowers the noise level, allowing softer sounds to be recorded. Regardless of bit depth, any number of tones rising above the noise level are recorded equally well. While this may seem unintuitive, it is how these things work, both according to the maths and in actual practice.
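As a sanity check on those numbers, here's another quick sketch of my own (using TPDF dither, one common choice) that quantises a 1 kHz sine to 16 bits and measures the resulting noise floor:

```python
import numpy as np

fs, n = 44100, 44100 * 10
t = np.arange(n) / fs
x = 0.9 * np.sin(2 * np.pi * 1000 * t)   # the "infinite resolution" signal

q = 1 / 32768                            # 16-bit step size (full scale = +/-1)
rng = np.random.default_rng(0)
dither = (rng.random(n) - rng.random(n)) * q   # TPDF dither, +/-1 LSB
x16 = np.round((x + dither) / q) * q     # store the nearest 16-bit value

err = x16 - x                            # quantisation error, now noise-like
print(10 * np.log10(np.mean(err ** 2)))  # about -96 dB re full scale
```

That roughly 96 dB noise floor is the figure usually quoted for dithered 16-bit audio, and it is why the hiss only becomes audible at very high playback levels.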

11 minutes ago, bluesman said:

One major difference in this analogy is that a single static sample of the audio signal makes no sound - only the sequential confluence of all samples can generate audible output. But each frame of a movie is a picture in itself. So, unlike a single sample from an audio signal, it contains information usable in isolation from the rest of the reel.

A sample of audio is more like a single pixel in an image than a still from a video.

1 hour ago, bluesman said:

I'm offering a loose functional analogy meant to be illustrative for Teresa and not a literal description.  I thought the movie analogy was more useful because each frame is an instantaneous sample of the changing visual "signal" and the end product is a dynamic sequence of these samples. As the dynamics of motion picture production and control are similar in many ways to those of an audio waveform, it just made a lot more sense to me than your example. I could be wrong - I look forward to feedback in this thread to help me improve my communication skills.

 

I suppose that pixels in an image can illustrate the same concept in a different representation, the main differences being that pixels are not samples or "complete" representations of anything.  They're components that combine to form a static image just as linked dyes combine to form the color image on emulsion based film.  And there are many different kinds of pixels that vary in shape, size, ability to display color etc.

I don't like the motion picture analogy since each frame there contains, in isolation, readily identifiable representations of the objects in the scene. In a single sample from an audio recording of an orchestra, there is not a part for the violin, another for the oboe, and so on. There is just a single number, the sum of the air pressures contributed by each instrument at one instant.

 

If we want to properly compare sound recording to motion pictures, we must expand our view a little. The scene we wish to record as a motion picture can be thought of as a cuboid, one dimension representing time and the other two the spatial extent of a 2D projection (the view through the camera). Restricting ourselves to the black-and-white case, each point within this cuboid has a scalar value representing the light intensity at the corresponding place and time. When we record this scene, we obtain a set of samples regularly distributed like atoms in a crystal lattice. The spacing between samples in the two spatial dimensions determines the smallest details we can capture, while their spacing along the temporal dimension limits the rate of change that can be represented. The familiar sampling theorem can be applied independently along each of the three dimensions.
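For the geometrically inclined, here is a toy sketch of that lattice view (the intensity function and the spacings are invented purely for illustration):

```python
import numpy as np

def intensity(x, y, t):
    """Light intensity of the scene at place (x, y) and time t:
    here, an invented moving bright spot."""
    return np.exp(-((x - t) ** 2 + y ** 2))

dx = dy = 0.1        # spatial spacing -> limits the finest detail
dt = 1 / 24          # temporal spacing (24 fps) -> limits rate of change

x = np.arange(0, 4, dx)
y = np.arange(-2, 2, dy)
t = np.arange(0, 1, dt)

# One scalar sample per lattice point, like atoms in a crystal.
X, Y, T = np.meshgrid(x, y, t, indexing="ij")
samples = intensity(X, Y, T)
print(samples.shape)  # (40, 40, 24): width x height x frames
```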

 

Going back to audio, the equivalent geometric view of what a microphone captures is a simple line (of zero thickness) extending along the time dimension. Each point along the line has a value representing the air pressure at the microphone. By sampling this value at regular intervals, we obtain a representation of all variations up to a maximum frequency.
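And because the samples capture everything up to that maximum frequency, the pressure curve between the sample instants can be recovered, at least in principle. A toy sketch of my own using Whittaker-Shannon (sinc) interpolation, with made-up numbers and a finite number of samples, so the match is close rather than perfect:

```python
import numpy as np

fs = 100.0                              # toy sample rate, Hz
n = np.arange(256)
x = np.sin(2 * np.pi * 13.7 * n / fs)   # a tone well below fs/2

def value_at(t, samples, fs):
    """Band-limited signal value at an arbitrary time t (seconds)."""
    k = np.arange(len(samples))
    return np.sum(samples * np.sinc(fs * t - k))

t = 0.7234                              # an instant between samples
print(value_at(t, x, fs))               # ~ sin(2*pi*13.7*t)
print(np.sin(2 * np.pi * 13.7 * t))
```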

 

Taking the graphical analogy even further, DSD is quite similar to half-tone printing, but that's straying from the topic.
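Since I mentioned it, a minimal first-order sigma-delta modulator shows the parallel: the 1-bit stream approximates the signal with dense or sparse pulses the way half-tone dots approximate grey levels. (This is my own toy version; real DSD runs at 2.8 MHz with higher-order modulators.)

```python
import numpy as np

def sigma_delta(x):
    """First-order sigma-delta: turn a signal in [-1, 1] into a +/-1
    bitstream whose local density of +1s tracks the signal level."""
    acc, y = 0.0, -1.0        # integrator state; previous output (arbitrary start)
    bits = np.empty_like(x)
    for i, v in enumerate(x):
        acc += v - y          # integrate the error between input and output
        y = 1.0 if acc >= 0 else -1.0
        bits[i] = y
    return bits

x = 0.5 * np.sin(2 * np.pi * np.arange(256) / 256)  # slow, oversampled tone
bits = sigma_delta(x)
smooth = np.convolve(bits, np.ones(16) / 16, mode="same")
print(np.round(x[32:40], 2))       # the tone...
print(np.round(smooth[32:40], 2))  # ...roughly recovered from 1-bit pulses
```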

 

1 hour ago, bluesman said:

For Teresa et al: there are similarities that may help you understand the subject that started this discussion. Each pixel in an image can display multiple colors within a designated set, e.g. RGB or cyan-magenta-yellow-black (because not all pixels are functionally alike). An 8-bit color image can carry 2^8 colors - it's like having a box of 256 crayons that divide the entire visible color spectrum into 256 parts by frequency. No matter how many shades of color are in the source, the pixels in the screen will use the closest of the 256 colors they can display for each color in the source image. If the exact shade of red is between two in the "crayon box", it will use the closest one. A 10-bit image can display 1024 different colors, so it can render an image closer to the original in color composition.

 

The accuracy of color rendition is somewhat analogous to the accuracy of voltage representation in a single sample of a digitized audio waveform, in that the exact value is limited to a given number of decimal places.  So it's "rounded" up or down to fit within the limits of that digital field.  The more bits available per sample, the more accurately the value can be recorded (i.e. the more significant digits it contains and the smaller the potential difference - no pun intended - between the actual value and its digital approximation).

Colour is a whole new level of complexity. As we know, light of any colour can be separated into its spectral components using a prism. Think of this as applying the Fourier transform to a sound, revealing all the constituent frequencies. Although the visible spectrum is continuous from red to violet, our eyes contain only three types of colour receptors, most sensitive to red, green, and blue light. Pure yellow light excites both the red and green receptors, and our brain interprets this combination as yellow. The brain has no way of knowing why the red and green receptors are both signalling. Thus, if we create light consisting of some pure red and some pure green, the eye reacts exactly the same way, and we perceive a yellow colour. For this reason, colour displays get away with having only red, green, and blue components in each pixel. The precise wavelengths of each, together with their brightness, determine the range of colours (often referred to as the gamut) we can trick the brain into perceiving.
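A toy numerical sketch of that trick (the Gaussian sensitivity curves below are invented stand-ins; real cone responses are broader and asymmetric): a pure 580 nm yellow and a tuned red + green mixture produce nearly the same receptor responses, so the brain cannot tell them apart.

```python
import numpy as np

wl = np.arange(400, 701).astype(float)        # visible wavelengths, nm

def receptor(peak, width=40.0):
    """Assumed Gaussian sensitivity curve peaking at `peak` nm."""
    return np.exp(-((wl - peak) / width) ** 2)

L, M, S = receptor(570), receptor(540), receptor(445)  # "red/green/blue"

def response(spectrum):
    """How strongly each receptor type reacts to a light spectrum."""
    return np.array([np.sum(r * spectrum) for r in (L, M, S)])

yellow = (wl == 580).astype(float)            # monochromatic yellow
red    = (wl == 620).astype(float)            # monochromatic red
green  = (wl == 530).astype(float)            # monochromatic green

# Solve for red/green intensities that reproduce yellow's L and M responses.
basis = np.column_stack([response(red)[:2], response(green)[:2]])
a, b = np.linalg.solve(basis, response(yellow)[:2])

print(response(yellow))               # e.g. [0.94 0.37 0.  ]
print(response(a * red + b * green))  # nearly identical -> seen as yellow
```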

