Is bit depth about dynamic range or data?


audiojerry

1 hour ago, Teresa said:

That is also the way I understand it. For a 16-bit 44.1 kHz music file, samples are taken 44,100 times each second. In this case each sample is a 16-bit word represented by 1s and 0s, though it only needs two distinguishable states, such as the lands and pits on a CD. That 16-bit word has to completely describe the music that occurred during that 44,100th of a second. If this is not correct I would also like to know.

A sample of audio describes the air pressure at the microphone at one instant. That is all it does. To describe tones or music, a sequence of samples is required. As shown in the sampling theorem, a sequence of samples at 44.1 kHz perfectly captures any combination of tones up to 22.05 kHz. Even the most complex music imaginable is, mathematically, nothing but a combination of tones, so a full orchestra is no more challenging than a solo flute.
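To make this concrete, here is a small NumPy sketch of my own (the note frequencies are whole-number approximations of C4, E4, G4, chosen so they land exactly on 1 Hz FFT bins). A three-note chord sampled at 44.1 kHz is, to the maths, just a sum of tones, and the spectrum of the sample sequence shows exactly those tones:

```python
import numpy as np

fs = 44100                    # samples per second
t = np.arange(fs) / fs        # one second of sample instants
tones = [262, 330, 392]       # ~C4, E4, G4, rounded to whole Hz

# "Complex" music is, mathematically, nothing but a sum of tones.
x = sum(np.sin(2 * np.pi * f * t) for f in tones)

# The spectrum of the sampled sequence reveals exactly those tones.
spectrum = np.abs(np.fft.rfft(x)) / (fs / 2)   # normalise to amplitude
freqs = np.fft.rfftfreq(fs, d=1 / fs)          # 1 Hz bin spacing
print(freqs[spectrum > 0.5])                   # [262. 330. 392.]
```

Swap in as many tones as you like and the FFT still separates them, which is the sense in which a full orchestra is no harder to capture than a solo flute.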

 

The sampling theorem does, however, assume the samples are perfect, with infinite resolution. At a fixed bit depth, the stored sample value consists of the true value plus a small quantisation error. When dither is used, this error appears as low-level noise. At 16-bit resolution the noise can be audible as a faint hiss if the volume is turned up very high while playing a suitable synthetic test signal; in real recordings other noise sources dominate. Increasing the bit depth lowers the noise level, allowing softer sounds to be recorded. Regardless of bit depth, any number of tones rising above the noise level are recorded equally well. While this may seem unintuitive, it is how these things work, both according to the maths and in actual practice.
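As a sanity check on those numbers, here's another quick sketch of my own (using TPDF dither, one common choice) that quantises a 1 kHz sine to 16 bits and measures the resulting noise floor:

```python
import numpy as np

fs, n = 44100, 44100 * 10
t = np.arange(n) / fs
x = 0.9 * np.sin(2 * np.pi * 1000 * t)   # the "infinite resolution" signal

q = 1 / 32768                            # 16-bit step size (full scale = +/-1)
rng = np.random.default_rng(0)
dither = (rng.random(n) - rng.random(n)) * q   # TPDF dither, +/-1 LSB
x16 = np.round((x + dither) / q) * q     # store the nearest 16-bit value

err = x16 - x                            # quantisation error, now noise-like
print(10 * np.log10(np.mean(err ** 2)))  # about -96 dB re full scale
```

That roughly 96 dB noise floor is the figure usually quoted for dithered 16-bit audio, and it is why the hiss only becomes audible at very high playback levels.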

11 minutes ago, bluesman said:

One major difference in this analogy is that a single static sample of the audio signal makes no sound - only the sequential confluence of all samples can generate audible output. But each frame of a movie is a picture in itself. So, unlike a single sample from an audio signal, it contains information usable in isolation from the rest of the reel.

A sample of audio is more like a single pixel in an image than a still from a video.

1 hour ago, bluesman said:

I'm offering a loose functional analogy meant to be illustrative for Teresa and not a literal description.  I thought the movie analogy was more useful because each frame is an instantaneous sample of the changing visual "signal" and the end product is a dynamic sequence of these samples. As the dynamics of motion picture production and control are similar in many ways to those of an audio waveform, it just made a lot more sense to me than your example. I could be wrong - I look forward to feedback in this thread to help me improve my communication skills.

 

I suppose that pixels in an image can illustrate the same concept in a different representation, the main differences being that pixels are not samples or "complete" representations of anything.  They're components that combine to form a static image just as linked dyes combine to form the color image on emulsion based film.  And there are many different kinds of pixels that vary in shape, size, ability to display color etc.

I don't like the motion picture analogy since each frame there contains, in isolation, readily identifiable representations of the objects in the scene. In a single sample from an audio recording of an orchestra, there is not a part for the violin, another for the oboe, and so on. There is just a single number, the sum of the air pressures contributed by each instrument at one instant.

 

If we want to properly compare sound recording to motion pictures, we must expand our view a little. The scene we wish to record as a motion picture can be thought of as a cuboid, one dimension representing time and the other two the spatial extent of a 2D projection (the view through the camera). Restricting ourselves to the black-and-white case, each point within this cuboid has a scalar value representing the light intensity at the corresponding place and time. When we record this scene, we obtain a set of samples regularly distributed like atoms in a crystal lattice. The spacing between samples in the two spatial dimensions determines the smallest details we can capture, while their spacing along the temporal dimension limits the rate of change that can be represented. The familiar sampling theorem can be applied independently along each of the three dimensions.
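For the geometrically inclined, here is a toy sketch of that lattice view (the intensity function and the spacings are invented purely for illustration):

```python
import numpy as np

def intensity(x, y, t):
    """Light intensity of the scene at place (x, y) and time t:
    here, an invented moving bright spot."""
    return np.exp(-((x - t) ** 2 + y ** 2))

dx = dy = 0.1        # spatial spacing -> limits the finest detail
dt = 1 / 24          # temporal spacing (24 fps) -> limits rate of change

x = np.arange(0, 4, dx)
y = np.arange(-2, 2, dy)
t = np.arange(0, 1, dt)

# One scalar sample per lattice point, like atoms in a crystal.
X, Y, T = np.meshgrid(x, y, t, indexing="ij")
samples = intensity(X, Y, T)
print(samples.shape)  # (40, 40, 24): width x height x frames
```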

 

Going back to audio, the equivalent geometric view of what a microphone captures is a simple line (of zero thickness) extending along the time dimension. Each point along the line has a value representing the air pressure at the microphone. By sampling this value at regular intervals, we obtain a representation of all variations up to a maximum frequency.
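And because the samples capture everything up to that maximum frequency, the pressure curve between the sample instants can be recovered, at least in principle. A toy sketch of my own using Whittaker-Shannon (sinc) interpolation, with made-up numbers and a finite number of samples, so the match is close rather than perfect:

```python
import numpy as np

fs = 100.0                              # toy sample rate, Hz
n = np.arange(256)
x = np.sin(2 * np.pi * 13.7 * n / fs)   # a tone well below fs/2

def value_at(t, samples, fs):
    """Band-limited signal value at an arbitrary time t (seconds)."""
    k = np.arange(len(samples))
    return np.sum(samples * np.sinc(fs * t - k))

t = 0.7234                              # an instant between samples
print(value_at(t, x, fs))               # ~ sin(2*pi*13.7*t)
print(np.sin(2 * np.pi * 13.7 * t))
```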

 

Taking the graphical analogy even further, DSD is quite similar to half-tone printing, but that's straying from the topic.
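Since I mentioned it, a minimal first-order sigma-delta modulator shows the parallel: the 1-bit stream approximates the signal with dense or sparse pulses the way half-tone dots approximate grey levels. (This is my own toy version; real DSD runs at 2.8 MHz with higher-order modulators.)

```python
import numpy as np

def sigma_delta(x):
    """First-order sigma-delta: turn a signal in [-1, 1] into a +/-1
    bitstream whose local density of +1s tracks the signal level."""
    acc, y = 0.0, -1.0        # integrator state; previous output (arbitrary start)
    bits = np.empty_like(x)
    for i, v in enumerate(x):
        acc += v - y          # integrate the error between input and output
        y = 1.0 if acc >= 0 else -1.0
        bits[i] = y
    return bits

x = 0.5 * np.sin(2 * np.pi * np.arange(256) / 256)  # slow, oversampled tone
bits = sigma_delta(x)
smooth = np.convolve(bits, np.ones(16) / 16, mode="same")
print(np.round(x[32:40], 2))       # the tone...
print(np.round(smooth[32:40], 2))  # ...roughly recovered from 1-bit pulses
```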

 

1 hour ago, bluesman said:

For Teresa et al: there are similarities that may help you understand the subject that started this discussion. Each pixel in an image can display multiple colors within a designated set, e.g. RGB or cyan-magenta-yellow-black (because not all pixels are functionally alike). An 8-bit color image can carry 2^8 colors - it's like having a box of 256 crayons that divide the entire visible color spectrum into 256 parts by frequency. No matter how many shades of color are in the source, the pixels in the screen will use the closest of the 256 colors they can display for each color in the source image. If the exact shade of red is between two in the "crayon box", it will use the closest one. A 10-bit image can display 1024 different colors, so it can render an image closer to the original in color composition.

 

The accuracy of color rendition is somewhat analogous to the accuracy of voltage representation in a single sample of a digitized audio waveform, in that the exact value is limited to a given number of decimal places.  So it's "rounded" up or down to fit within the limits of that digital field.  The more bits available per sample, the more accurately the value can be recorded (i.e. the more significant digits it contains and the smaller the potential difference - no pun intended - between the actual value and its digital approximation).

Colour is a whole new level of complexity. As we know, light of any colour can be separated into its spectral components using a prism. Think of this as applying the Fourier transform to a sound, revealing all the constituent frequencies. Although the visible spectrum is continuous from red to violet, our eyes contain only three types of colour receptors, most sensitive to red, green, and blue light. Pure yellow light excites both the red and green receptors, and our brain interprets this combination as yellow. The brain has no way of knowing why the red and green receptors are both signalling. Thus, if we create light consisting of some pure red and some pure green, the eye reacts exactly the same way, and we perceive a yellow colour. For this reason, colour displays get away with having only red, green, and blue components in each pixel. The precise wavelengths of each, together with their brightness, determine the range of colours (often referred to as the gamut) we can trick the brain into perceiving.
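A toy numerical sketch of that trick (the Gaussian sensitivity curves below are invented stand-ins; real cone responses are broader and asymmetric): a pure 580 nm yellow and a tuned red + green mixture produce nearly the same receptor responses, so the brain cannot tell them apart.

```python
import numpy as np

wl = np.arange(400, 701).astype(float)        # visible wavelengths, nm

def receptor(peak, width=40.0):
    """Assumed Gaussian sensitivity curve peaking at `peak` nm."""
    return np.exp(-((wl - peak) / width) ** 2)

L, M, S = receptor(570), receptor(540), receptor(445)  # "red/green/blue"

def response(spectrum):
    """How strongly each receptor type reacts to a light spectrum."""
    return np.array([np.sum(r * spectrum) for r in (L, M, S)])

yellow = (wl == 580).astype(float)            # monochromatic yellow
red    = (wl == 620).astype(float)            # monochromatic red
green  = (wl == 530).astype(float)            # monochromatic green

# Solve for red/green intensities that reproduce yellow's L and M responses.
basis = np.column_stack([response(red)[:2], response(green)[:2]])
a, b = np.linalg.solve(basis, response(yellow)[:2])

print(response(yellow))               # e.g. [0.94 0.37 0.  ]
print(response(a * red + b * green))  # nearly identical -> seen as yellow
```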

