Here is the definition of timbre from Wikipedia: “In music, timbre, also known as tone color, is the quality of a musical note or sound or tone that distinguishes different types of sound production, such as voices and musical instruments, string instruments, wind instruments, and percussion instruments. The physical characteristics of sound that determine the perception of timbre include spectrum and envelope. In psychoacoustics, timbre is also called tone quality and tone color.” I suggest reading the Wikipedia article, as timbre has both subjective and objective attributes; this post discusses both in detail.
From a sound reproduction perspective, if one's goal is to reproduce music as faithfully as possible, then timbre (with all of its subjective and objective attributes) is a significant factor. I consider room acoustics the worst offender for destroying timbre (i.e. tone quality). If you are into the scientific research, there are a number of references in the Audio Engineering Society's E-Library. Here are a couple: “Natural Timbre in Room Correction Systems (Part II)” and “The Influence of the Room and of Loudspeaker Position on the Timbre of Reproduced Sound in Domestic Rooms”.
Professional acousticians and audio engineers routinely take acoustic measurements as part of their everyday job. If you have been doing it full-time for a career, you can read an acoustic measurement graph and hear the sound in your head, the same way a musician can read notes off sheet music and hum the tune in their head (some with perfect pitch).
While every acoustic space is unique, there are a couple of basic tenets that hold true for small room acoustics, the category most of our listening rooms fall into: controlling room resonances and controlling overall room decay time (i.e. RT60). These tenets are based on a large body of knowledge specific to small room acoustics. Here is a quick overview with a few reference links.
Just like electronic engineers use circuit diagrams and parts lists (BOMs) to communicate the designs and sonic signatures of audio amplifiers, acousticians use time, energy, and frequency information to communicate the sonic signature of an acoustic environment (i.e. both speakers and room). In this article, you will be able to correlate what you see with what you hear (literally) and vice versa.
The Design Process
I am going to analyze my listening room and come up with two designs to improve the timbre of the room acoustics, one passive, and the other active. Here are a couple of pics of my very live, untreated room.
I try to balance my analysis 50/50 between what I hear and what I measure. Based on that analysis, I will design and implement passive acoustic treatments, then take another set of measurements and a binaural recording of the speaker/room combo with the acoustic treatments in place.
Next comes Digital Room Correction (DRC). I will fine-tune the frequency response using a “target” or “designed” frequency response to reproduce the best-effort tonal balance, and fine-tune the impulse response (i.e. timing) for the best possible timbre from the speaker/room combo.
In other words, the best possible tone quality (i.e. timbre) is ultimately limited by the physical dimensions of the listening room. Given that we can digitally manipulate all three dimensions of sound (amplitude, frequency, and time), we can create any sonic signature we want; the only hard limit is the physical room itself. Technically, this is called a transfer function. A transfer function at this level encompasses everything that makes up the sonic signature of the speaker/room combo.
Thanks to digital audio, we can design and implement our own transfer functions (i.e. sonic signatures) in software, with distortion and noise levels far below what we can perceive and with correction resolution far finer than our ears (read: brain) can discriminate.
Historically it was thought that we could only discriminate to 1/3 of an octave (hence the 1/3-octave analog equalizer). Later research has determined that we can discriminate somewhat closer to 1/6 of an octave. So when viewing acoustical frequency response graphs, 1/6-octave smoothing is the preferred resolution, as it is the most accurate representation of how our ears hear (or, more technically correct, how the brain interprets the electrical signals).
In the digital domain, we have digital filters that can have 65,535 “bands” (or more). Compared to a 31-band 1/3-octave analog equalizer, that's a revolution.
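For the curious, here is roughly what fractional-octave smoothing does to a measured response. This is a minimal Python sketch, assuming a simple power average over a rectangular 1/n-octave window around each measured point; REW and other tools use their own smoothing windows, so treat the function names and the averaging choice as illustrative only.

```python
import math


def octave_band_edges(center_hz: float, fraction: int) -> tuple[float, float]:
    """Lower and upper edges of a 1/fraction-octave band centred on center_hz."""
    half_width = 2 ** (1 / (2 * fraction))
    return center_hz / half_width, center_hz * half_width


def smooth_fractional_octave(freqs, levels_db, fraction=6):
    """Power-average each point with every point inside its 1/fraction-octave band.

    freqs and levels_db are parallel lists (Hz, dB). The point itself is always
    inside its own band, so the average is never empty.
    """
    smoothed = []
    for fc in freqs:
        lo, hi = octave_band_edges(fc, fraction)
        # Convert dB to power, average, convert back to dB.
        in_band = [10 ** (db / 10) for f, db in zip(freqs, levels_db) if lo <= f <= hi]
        smoothed.append(10 * math.log10(sum(in_band) / len(in_band)))
    return smoothed
```

A narrow spike that is much skinnier than 1/6 octave gets averaged down toward its neighbours, which is why smoothed graphs look calmer than raw ones.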
I chose a linear phase filter (as opposed to minimum phase) as this produces the best phase coherence and time alignment. Not only is the sound “time aligned”, but some early reflections are reduced so that the phase coherence holds together long enough to hear the depth on the recording before the 3D image is destroyed by comb filtering effects of the room. Comb filtering is the root of all evil for an audiophile.
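Some quick arithmetic on those “bands” and on the price of linear phase. The sketch below assumes a 65,536-tap FIR at 44.1 kHz (numbers chosen for illustration, not any product's actual settings). The frequency spacing is the sample rate divided by the tap count, and a symmetric (linear-phase) FIR delays every frequency by the same (N - 1)/2 samples, which is exactly what keeps everything time aligned.

```python
def fir_bin_spacing_hz(num_taps: int, sample_rate: int) -> float:
    """Spacing between adjacent frequency points of an N-tap FIR's DFT."""
    return sample_rate / num_taps


def linear_phase_latency_ms(num_taps: int, sample_rate: int) -> float:
    """A symmetric FIR delays every frequency by (N - 1) / 2 samples."""
    return (num_taps - 1) / 2 / sample_rate * 1000


def is_symmetric(taps, tol: float = 1e-12) -> bool:
    """Symmetric coefficients are sufficient for exactly linear phase."""
    return all(abs(a - b) <= tol for a, b in zip(taps, reversed(taps)))


if __name__ == "__main__":
    taps, fs = 65536, 44100  # assumed filter length and sample rate
    print(f"{fir_bin_spacing_hz(taps, fs):.2f} Hz per frequency point")
    print(f"{linear_phase_latency_ms(taps, fs):.0f} ms of latency")
```

At those settings the spacing is about 0.67 Hz versus 31 fixed bands, and the latency is about 743 ms. That is no problem for music playback, but it would need compensating if the audio has to stay in sync with video.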
Reducing early sound reflections (and diffusing later reflections) is critical to the realistic reproduction of any stereo recording and to achieving the best possible timbre (i.e. tone quality). You want to hear enough of the recording, for long enough, that its phase coherence or soundstage is heard before the room takes over and its comb filtering and competing “location” cues blur the (depth of the) image and color the sound with the tone of the room.
But first, this is what we need to listen for. It is a bit of science, hopefully presented in a fun and easy to hear manner as it is important to understand what is happening and especially what it sounds like. We all listen to it, but can we hear it?
Pretty cool, the Haas effect. “The Haas effect is a psychoacoustic effect, described in 1949 by Helmut Haas in his Ph.D. thesis. It is often equated with the underlying precedence effect (or law of the first wavefront).”
“Haas found that humans localize sound sources in the direction of the first arriving sound despite the presence of a single reflection from a different direction. A single auditory event is perceived. A reflection arriving later than 1 millisecond after the direct sound increases the perceived level and spaciousness (more precisely the perceived width of the sound source). A single reflection arriving within 5 to 30 milliseconds can be up to 10 dB louder (My note: that’s twice as loud!) than the direct sound without being perceived as a secondary auditory event (echo). This time span varies with the reflection level. If the direct sound is coming from the same direction the listener is facing, the reflection's direction has no significant effect on the results. A reflection with attenuated higher frequencies expands the time span that echo suppression is active. Increased room reverberation time also expands the time span of echo suppression.”
Key concept. It is amazing how much width a 5 millisecond delay can create. The majority of rock and pop (and most mono multi-track) recordings use the Haas effect extensively, along with more digital delays, reverbs, stereo expanders, etc. If you listen to rock and pop, or any other multi-track recording built from mono sources, it is fake stereo. It's all an illusion and fools our brain every time (speaking as someone who spent over 10,000 hours in the recording/mixing chair doing exactly that). Personally, I don't care. When I crank up SRV's Tin Pan Alley (DR 15) on my rock and roll audiophile system and it feels like I am at Buddy Guy's Legends night club in Chicago, the illusion is complete for me.
A bit more physics, as this is directly related to speaker location and listening position. Sound travels roughly 1 foot per millisecond. The wavelength of a 20 kHz tone is about 0.68 of an inch. If my stereo's equilateral triangle is off by even an inch, I have already destroyed some of the high-frequency image (especially depth of field), because the misaligned triangle creates comb filtering at high frequencies.
The lesson here is that time alignment of everything is critical, because of the Haas effect and its role in reproducing proper timbre. The better aligned the equilateral triangle, the more phase-coherent the reproduced image, which is one of the key attributes of reproducing the most realistic timbre. This is also why early reflections need to be tamed, typically to 15 dB below the direct signal, so the Haas effect doesn't blur the time alignment of the stereo image (especially depth).
My design approach to modern room tuning uses passive acoustic treatment to minimize room resonances, early reflections, and overall room decay time (RT60). I also use state-of-the-art DRC software to trim the frequency response for best-effort tonal balance, to time-align the signal so that the waveform (all frequencies) arrives at the listening area at the same time, and to minimize early reflections, enhancing the depth and overall phase coherence of the stereo image before comb filtering destroys the recorded illusion.
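If you want to hear the Haas trick for yourself, here is a minimal Python sketch using only the standard library: it takes a mono signal, puts a copy delayed by 5 ms into the right channel, and writes a 16-bit stereo WAV into memory. The 440 Hz test burst, the 48 kHz sample rate, and the function names are assumptions for illustration; save `wav_bytes` to a file and play it over speakers or headphones.

```python
import io
import math
import struct
import wave


def haas_stereo(mono, sample_rate, delay_ms):
    """Return (left, right) frames where right is a delayed copy of left."""
    delay = round(delay_ms * sample_rate / 1000)
    left = list(mono) + [0.0] * delay   # pad the end so both channels match
    right = [0.0] * delay + list(mono)  # delayed copy
    return list(zip(left, right)), delay


def write_stereo_wav(frames, sample_rate):
    """Encode float frames in [-1, 1] as 16-bit stereo WAV bytes in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(b"".join(
            struct.pack("<hh", int(l * 32767), int(r * 32767)) for l, r in frames
        ))
    return buf.getvalue()


fs = 48000
# A 100 ms, 440 Hz burst at half amplitude to leave headroom.
burst = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 10)]
frames, delay_samples = haas_stereo(burst, fs, delay_ms=5)
wav_bytes = write_stereo_wav(frames, fs)
```

Even though both channels carry identical material at equal level, the sound should localize toward the left speaker and seem wider than a centered mono source.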
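Here is the same physics as arithmetic, assuming a speed of sound of about 1,125 ft/s (roughly room temperature). When two equal-level copies of a sound arrive with a path difference d, cancellations (comb filter notches) fall at odd multiples of c/(2d). The helper names are hypothetical; this is a back-of-the-envelope sketch, not a measurement tool.

```python
SPEED_OF_SOUND_FT_S = 1125.0  # assumed, roughly room temperature


def wavelength_in(freq_hz: float) -> float:
    """Wavelength in inches."""
    return SPEED_OF_SOUND_FT_S / freq_hz * 12


def path_delay_ms(extra_path_ft: float) -> float:
    """Arrival delay caused by extra_path_ft of additional travel."""
    return extra_path_ft / SPEED_OF_SOUND_FT_S * 1000


def comb_notches_hz(extra_path_ft: float, max_hz: float = 20000.0):
    """Notch frequencies for two equal-level arrivals extra_path_ft apart."""
    delay_s = extra_path_ft / SPEED_OF_SOUND_FT_S
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)  # odd multiples of 1 / (2 * delay)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


if __name__ == "__main__":
    print(wavelength_in(20000))     # inches
    print(comb_notches_hz(1 / 12))  # one inch of path difference
```

For a one-inch error, the first notch lands at 6,750 Hz, right in the presence region, with the next one just above 20 kHz.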
Acoustic Analysis and Design
Fellow CA readers, I am the recipient of the 2nd worst possible sounding room award, beaten only by a room shaped like a cube. This is because the length of the room is almost twice the width. Additionally, my stereo is set up off center in the room. So how do I know it is the 2nd worst possible sounding room? I am using Bob Gold’s room mode calculator, which produces a nice graphic display of the room modes given the dimensions of my room.
According to the calculator, my room’s Schroeder cutoff frequency is 92 Hz. This is my room’s transition frequency: below it, the room behaves as a resonator; above it, as a diffuser/reflector. The transition is far from smooth: the room resonates below the cutoff and rings (like a filter) above it. Just like blowing air across the mouth of a nearly empty Coke bottle, every room resonates a tone that rides on top of all low-frequency notes. A bad enough room ratio, like mine, produces what is sometimes called a “one note” bass tone: the room’s resonant frequencies are so dominant (i.e. have so much amplitude) that all the bass notes (and sometimes drums) sound like just one note is playing. This is also called “room boom”.
You too can work out your room’s resonant frequencies using this calculator. Here is a frequency response measurement of my room to see if it correlates with the model. Many thanks to JohnM for his most excellent REW measurement software.
This measurement correlates well with the model: major peaks and valleys from 92 Hz to 300 Hz. That’s the ultimate challenge, isn’t it? The 2nd worst possible sounding room from an acoustic perspective. If I can make this room sound good… Note that the blue horizontal line is mine, added to help delineate the problem areas. The circled midrange area also represents a problem area. It initially looks like too much amplitude, but the real culprit for the raised amplitude is midrange reverb buildup in the room. We need to look at another view to see it.
This brings up a story I feel is worth sharing so you can understand where I am coming from on this. As mentioned elsewhere on my blog, I had the good fortune to work as a live sound and recording/mixing engineer for 10 years. SQ was of major importance to me, and I worked extra hours to ensure the artist/group got the best possible sound I could come up with. I worked in several state-of-the-art acoustic spaces, with this one below sounding so good that I gave up on my home system.
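The math behind a room mode calculator is simple enough to sketch. For an ideal rectangular room, mode frequencies follow f = (c/2)·sqrt((nx/L)² + (ny/W)² + (nz/H)²), and the commonly quoted Schroeder frequency is 2000·sqrt(T60/V) with V in cubic metres. The 30 × 15 × 8 ft dimensions and 0.7 s RT60 below are hypothetical, and calculators such as Bob Gold's may use different formulas or inputs, so don't expect this sketch to reproduce the 92 Hz figure above.

```python
import itertools
import math

SPEED_OF_SOUND_FT_S = 1125.0  # assumed, roughly room temperature


def room_modes(length_ft, width_ft, height_ft, max_hz=300.0):
    """All rectangular-room modes (axial, tangential, oblique) up to max_hz."""
    dims = (length_ft, width_ft, height_ft)
    # Highest index per dimension whose axial mode is still <= max_hz.
    max_n = [int(2 * max_hz * d / SPEED_OF_SOUND_FT_S) for d in dims]
    modes = []
    for n in itertools.product(*(range(m + 1) for m in max_n)):
        if n == (0, 0, 0):
            continue
        f = SPEED_OF_SOUND_FT_S / 2 * math.sqrt(
            sum((k / d) ** 2 for k, d in zip(n, dims))
        )
        if f <= max_hz:
            modes.append((round(f, 1), n))
    return sorted(modes)


def schroeder_hz(rt60_s, volume_ft3):
    """Schroeder frequency 2000 * sqrt(T60 / V), converting V from ft^3 to m^3."""
    volume_m3 = volume_ft3 * 0.0283168
    return 2000 * math.sqrt(rt60_s / volume_m3)


if __name__ == "__main__":
    for f, n in room_modes(30, 15, 8, max_hz=100):  # hypothetical room
        print(f, n)
    print(round(schroeder_hz(0.7, 30 * 15 * 8)))
```

With the length exactly twice the width, the second length mode and the first width mode land on the same 37.5 Hz, the kind of pile-up that makes a 2:1 room sound “one note”.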
The studio control room facilities I worked in were designed from the ground up acoustically to be state of the art. The rooms sounded incredible. Perfect neutral timbre. If you ever get a chance to visit a properly designed studio control room and listen to some music... I got so used to state of the art sound, that no matter what I did in my home stereo it paled in comparison to the sound of the state of the art control rooms. And I am not talking about the gear.
The biggest difference between working in the control room and listening at home was the timbre (i.e. tone quality) of the rooms. The studio control room is designed so that the engineer sitting behind the console hears the sound of the music picked up by the mic and the room of the studio before the sound of the control room can be heard. This is known as a reflection free zone (RFZ). RFZ control room design is based on knowledge of the Haas effect.
That meant obtaining a reflection free zone at the mix position and ensuring that any room timbre (i.e. tone quality and all of its subjective and objective attributes) was as neutral sounding as possible, i.e. no Coke bottle resonance effects, no boxiness, etc. If you saw the blueprints for one of these control rooms, you would see that no two surfaces are parallel. The surfaces are designed so that early reflections do not enter the RFZ and later reflections are thoroughly diffused, so any room sound is perceived as a neutral-sounding extension that makes the room sound a bit bigger than it really is. A very neat psychoacoustic trick.
As mentioned, the point was to hear the direct sound from the mic in the studio, plus its early reflections (i.e. tonal colorations), before you could hear the sound (i.e. timbre) of the control room. That way, when you were placing mics and EQ'ing, you were not making decisions based on hearing the tonal colorations of the control room mixed in with the sound from the studio.
When I compared the acoustics of my home listening space with the state-of-the-art control room I was working in 8+ hours a day, the timbre gap was so great that I gave up on a traditional speaker setup at home. Mostly I listened on headphones. Sometimes I invited the boys over to the studio when it wasn't busy and we would listen to tunes there.
While looking at some programming sites, I came across a few Digital Signal Processing (DSP) articles. One of them showed how you can use a well-known DSP technique called convolution to digitally mix (i.e. convolve in real time) the “bit-perfect” music signal with a digital filter (acting in both the frequency and time domain) that was the inverse (well, really the output of algorithms) of the measured room response. Convolving the signal with a filter's impulse response applies that filter's transfer function.
JRiver MC has a state-of-the-art convolution engine to host these designed digital filters. What can be done in software far exceeds what can be done in hardware and the analog domain. Every modern consumer and pro A/D and D/A converter is already performing DSP on the audio signal with digital filters (in conjunction with analog filters). “The precision offered by Media Center's 64bit audio engine is billions of times greater than the best hardware can utilize. In other words, it is bit-perfect on all known hardware”.
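To make “convolve” concrete, here is direct-form FIR convolution in plain Python: every input sample adds a scaled copy of the filter's impulse response into the output. It is a teaching sketch only; real-time engines typically use FFT-based (fast) convolution for filters with tens of thousands of taps, since the direct form costs len(signal) × len(filter) multiplies.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution: y[n] = sum over k of h[k] * x[n - k].

    Output length is len(signal) + len(impulse_response) - 1.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        # Each input sample scatters a scaled copy of the impulse response.
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out
```

Feeding a single click (an impulse) through `convolve` returns the filter's impulse response itself, which is why a DRC filter can be stored and shared simply as a list of taps.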
A bit more searching and I found a few DRC software products that used this filter design for audio. One is called Audiolense. I downloaded the demo and ran it on my crappy Logitech G51 computer speakers. If it can make those sound good… As soon as I heard it, I knew that someone (Bernt!) had figured this out in the digital domain, which is a revolution compared to what we can do in hardware/analog audio. This is what I was waiting for.
For me, it is a new ball game and gave me the opportunity to get back into listening to music the way I heard it in those acoustically (near) perfect rooms, or at least come a lot closer than ever before.
Back to the passive acoustic filter design. First, I need bass traps with good absorption from 92 to 300 Hz. When I was in the pro audio industry, I used ASC Tube Traps (and RPG products) extensively with good success. Unfortunately, I don’t have budgets like that anymore, but I think I have found a reasonably priced bass trap that should do the job.
It is a corner trap, and it should go directly behind the speakers in the corners. Because of my room’s offset, the best I can do is directly behind the speakers in a sort-of corner. The idea here is twofold. One is to dampen the low-end sound coming off the back of the speaker cabinet so the reflection off the wall and back to the listening position is minimized. That reflection arrives about 4 or 5 milliseconds after the direct sound. Remember the Haas effect video on what a 5 millisecond delay sounded like? That’s roughly 5 feet of extra distance: after the main sound wave arrives, a secondary wave arrives off the wall behind the speakers and confuses my brain about location. In this case, it destroys the image from front to back. Depth of field, due to early reflections (and comb filtering), is the first thing to go. It is the green circled portion in the graph below.
With the bass traps in place, they should help dampen those resonances/ringing from 92 to 300 Hz, plus dampen the energy coming off the back of the speaker. This should result in a tighter (i.e. more transient) bass sound with a minimal reflection at 5 milliseconds, so it does not blur the (depth of the) image. This is captured on the binaural recording. We can also measure this with an Energy Time Curve (ETC).
Technically, we can measure the room’s early reflections with an ETC, typically from 0 to 40 or 50 milliseconds. That’s roughly 40 to 50 feet of travel after the direct sound arrives at the listening position. That way we can inspect anywhere along the time curve and, with the wavelength calculator, turn that into distance. This allows us to figure out where the early reflections are coming from and to either dampen or diffuse them accordingly.
The time reading (in milliseconds) of each spike on the graph can be translated into feet using the wavelength calculator. Then measure from the mic position to the point of reflection to identify where passive acoustic treatments should go.
The treatments are mostly of the same types: bass trapping in the corners to tame the room’s resonant/ringing frequencies, then diffusion or absorption of the early reflections off the floor, ceiling, and side walls, plus the back wall and the front wall (with the windows). The windows may benefit from heavy velour curtains. Ideally, the speakers would be mounted in soffits, like in recording studio control rooms, but it’s just my living room, so it’s a design tradeoff (ha ha).
It is pretty easy to correlate: take a tape measure, a string, or a laser distance meter and measure from the mic, use a mirror to find the reflection points, and correlate them to the ETC using the wavelength calculator.
This is an ETC measurement of my untreated room. I can label the reflections by translating them to physical distances in the room. As it stands, it is not too bad, as the rule of thumb is that all early reflections should be 15 dB or more below the main signal amplitude. I am almost there. This is simply by virtue of my listening position being as far away from any reflecting surfaces as possible, given the constraints of my room.
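The ETC bookkeeping can be sketched like this, assuming you read each spike's delay and level relative to the direct sound. Note that the result is the extra path length (speaker to surface to mic, minus speaker to mic), which is what you then match with a tape measure. The spike values in the usage example are hypothetical, and the -15 dB threshold is the rule of thumb used in this article.

```python
SPEED_OF_SOUND_FT_S = 1125.0  # assumed, roughly room temperature


def reflection_report(spikes, threshold_db=-15.0):
    """Turn ETC spikes into extra path length and a pass/fail flag.

    spikes: iterable of (delay_ms, level_db), both relative to the direct sound.
    Returns a list of (delay_ms, extra_path_ft, level_db, passes) tuples.
    """
    report = []
    for delay_ms, level_db in spikes:
        extra_path_ft = delay_ms / 1000 * SPEED_OF_SOUND_FT_S
        report.append((delay_ms, extra_path_ft, level_db, level_db <= threshold_db))
    return report


if __name__ == "__main__":
    # Hypothetical spikes read off an ETC: (ms after direct sound, dB re direct).
    for d, path, lvl, ok in reflection_report([(1.8, -9.0), (4.5, -13.0), (12.0, -18.0)]):
        print(f"{d:5.1f} ms  {path:5.1f} ft  {lvl:6.1f} dB  {'ok' if ok else 'treat'}")
```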
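The mirror trick is the image-source method: mirror the speaker through the wall, and the straight line from that image to the listener crosses the wall at the reflection point. Here is a minimal 2D sketch (feet, one flat wall along y = wall_y). The example geometry is hypothetical: an elevation view with tweeter and ears both 3 ft above the floor and 8 ft apart, which models the floor bounce.

```python
import math

SPEED_OF_SOUND_FT_S = 1125.0  # assumed, roughly room temperature


def first_reflection(speaker, listener, wall_y=0.0):
    """Image-source reflection off a flat wall along the line y = wall_y.

    speaker and listener are (x, y) points in feet on the same side of the wall.
    Returns (reflection_point, delay_ms relative to the direct sound).
    """
    sx, sy = speaker
    lx, ly = listener
    image = (sx, 2 * wall_y - sy)  # speaker mirrored through the wall
    # Fraction of the way from the image to the listener where the line meets the wall.
    t = (wall_y - image[1]) / (ly - image[1])
    point = (image[0] + t * (lx - image[0]), wall_y)
    direct = math.dist(speaker, listener)
    reflected = math.dist(image, listener)  # equals speaker -> wall -> listener
    return point, (reflected - direct) / SPEED_OF_SOUND_FT_S * 1000


if __name__ == "__main__":
    print(first_reflection((0.0, 3.0), (8.0, 3.0)))
```

In that example the floor bounce lands 4 ft in front of the speaker and arrives about 1.8 ms after the direct sound, after 2 ft of extra path.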
Check out this waterfall graph showing which frequency ranges are producing the long decay times. My room is very lively, as the carpet is the only real absorbent material in a room that is otherwise all drywall, glass, tile, and hardwood (on top of having the 2nd worst room ratio).
What you are seeing here is sound measured in three dimensions: the vertical scale is level (amplitude) in decibels, the horizontal scale is frequency in hertz, and the z scale is time in milliseconds. In my case, the time scale runs from 0 to 300 milliseconds, meaning that by the end of the measurement the sound has travelled roughly 300 feet in the room (10x its length and 20x its width). That way we capture the sound of the room and its decay and display it as a 3D graph.
I have circled the two problem areas. The one on the left shows the room resonances, with the peaks and valleys I identified earlier. The one in the lower middle shows the long midrange decay times, which build up more than other frequencies and caused me to incorrectly compensate by lowering the DRC "target" frequency response by 3 dB at 2 kHz. More on that later.
Let’s look at the shape of the decay over time. The ITU, IEC, ISO, BBC, and other standards bodies specify the reverb time (spec’d as RT60), or more properly the early decay time, for critical listening environments with a minimum volume of 2500 cubic feet. The specified or preferred range is a 0.4 to 0.6 second decay across the frequency band, with some rise allowed in the bottom end. That’s 400 to 600 milliseconds max.
I am definitely over the 0.6 second mark in the midrange, as circled in the graph (it turns out to be 0.7 seconds). So this design calls for some broadband absorbers with good midrange absorption. These should be mounted at the first reflection point on the ceiling and on the rear wall to not only reduce early reflections, but also further dampen the “bright” and “boxy” sound of the untreated room. If my room happened to be the opposite, i.e. dead sounding, I would put diffuser panels on the ceiling and rear wall instead, with that 0.4 to 0.6 second decay as the target RT60.
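To get a rough idea of how much absorption is called for, the classic Sabine equation in imperial units is RT60 = 0.049·V/A, with V in cubic feet and A in sabins (square feet of perfect absorber). Below is a sketch assuming a hypothetical 3,600 cubic foot room. Sabine assumes a diffuse sound field, so in small rooms treat the result as a ballpark, not a spec.

```python
def sabine_rt60(volume_ft3: float, absorption_sabins: float) -> float:
    """Sabine estimate in imperial units: RT60 = 0.049 * V / A."""
    return 0.049 * volume_ft3 / absorption_sabins


def sabins_to_add(volume_ft3: float, current_rt60: float, target_rt60: float) -> float:
    """Extra absorption (sabins) needed to go from current_rt60 to target_rt60."""
    current_a = 0.049 * volume_ft3 / current_rt60
    target_a = 0.049 * volume_ft3 / target_rt60
    return target_a - current_a


if __name__ == "__main__":
    print(sabins_to_add(3600, 0.7, 0.5))  # hypothetical room volume
```

Going from 0.7 to 0.5 seconds in that room works out to about 100 sabins. At an assumed midrange absorption coefficient of 0.9, a 2 × 4 ft panel adds about 7 sabins, so that is on the order of 14 panels.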
That’s the analysis of my room acoustics and some basic acoustic design, not only based on measurements, but extended hours listening for early reflections, room modes, and midrange comb filtering. My design is to dampen the back of that pounding 15” woofer and the room modes at the cutoff frequency and harmonic ringing. In addition, absorb broadband midrange due to bare walls, plus take care of the early reflections (floor has carpet, the ceiling gets the absorber) to get rid of that “boxy” sound. We will see if it is enough or not. As a last resort, I can hang heavy (velour) curtains over the front windows plus a good portion of the wall.
Adding Passive Acoustical Room Treatments
Every listening room has a fundamental resonant frequency (plus harmonics) that will need some taming; it is simply a function of the room’s physical dimensions. How “live” or “dead” the room sounds determines the number of diffusers and/or absorbers needed to achieve the recommended RT60 in that particular space. The ideal design has all frequencies decaying at the same rate, landing in the middle of the 0.4 to 0.6 second RT60 specification.
Every critical listening environment could benefit from this basic passive acoustic filter design pattern. A more encompassing design pattern may look like this:
I used this design pattern (and portions thereof) extensively and successfully when I was in the pro audio business.
Here are a few measurements to see how the passive acoustic treatments helped out the acoustics, even though I can hear the difference just standing in the room. These overlays are to compare before and after acoustic treatments. I have zoomed in the vertical scale to 2 dB per division to show detail, which exaggerates the "un-smoothness" of the frequency response.
The acoustic treatments significantly dampen the circled areas, by almost 5 dB at 200 Hz and 3 dB through the midrange. A 3 dB drop halves the acoustic power, so the passive acoustic treatments cut the room gain in the identified problem areas by half or more. That's significant.
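The decibel arithmetic behind “half the power” is quick to check: power ratios use 10^(dB/10) and sound pressure ratios use 10^(dB/20).

```python
def db_to_power_ratio(db: float) -> float:
    """Power (energy) ratio for a level change in dB."""
    return 10 ** (db / 10)


def db_to_pressure_ratio(db: float) -> float:
    """Sound pressure (amplitude) ratio for a level change in dB."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    for cut in (-3, -5):
        print(cut, "dB ->", round(db_to_power_ratio(cut), 3), "of the power")
```

So a 3 dB cut leaves about 50% of the power, and a 5 dB cut leaves about 32%.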
The early reflection in the 4 to 5 millisecond range has been reduced considerably as a result of the bass traps placed behind the speakers and reducing the reflection off the wall behind the speakers. This is key to the kick drum having definition and hearing all the bass notes from the bass guitar at equal loudness, both in the frequency and time domain.
Compare the two 3D waterfall graphs above, the first one before treatments and the latter after. The mid-range decay times (the boxiness sound) have been reduced as circled. Also note, the 200 Hz peak and decay has also been reduced 5 dB. I was going to screen cast switching between the two graphs so you could get a real good sense of the passive acoustic filters at work as it is much more than just the circled points, the overall sound is further diffused.
Because of the passive acoustic treatments, my room's RT60 is now within the 0.4 to 0.6 second specification across the frequency range. If I were to add any more absorbent material, I might add a couple more ceiling absorbers right over the listening position to reduce the comb filtering effects of the couch, or add heavy velour drapes to the windows at the front of the room.
Based on my listening tests, the bottom end and midrange are much more tightly defined, as is the overall stereo 3D image: an overall improvement in frequency response smoothness, with tighter definition, imaging, and timing. It sounds more focused. It is easier to hear the tone quality change towards the end. It seems I am right in there for the proper decay time.
The sonic improvements that I hear line right up with what I measure and vice versa. So from a timbre perspective, I am pretty happy with the end result.
Analysis and Design Part 2
After living with the acoustic treatments for a week and listening every day, I can say they have made a major improvement in tone quality, dampening the “one note” bass room mode and the “boxiness” comb filtering in the mids. The decay time is within specification, as evidenced by both the measurements and the binaural recordings, in which you can hear the timbre (or tone quality) improvement yourself.
What further improvements can I make to the speaker/room interface? How can I further improve the timbre? There seems to be more room for improvement, especially given that the frequency response still deviates by 14 dB, when it should be in the ±3 dB range across the frequency band. Even then, 1 dB either way is audible. How do I further smooth the frequency response?
Also, what about phase coherence and timing at the listening position? Can I improve that? I remember owning Thiel CS 3.6 time-aligned speakers in the consumer world, and when I was recording/mixing I used UREI 813C Time Align monitors. I can hear time alignment, and I can measure it. So how do I improve the time alignment (my speakers don’t have that feature built in; many don’t, as it is hard and therefore expensive to do), and how do I further minimize early reflections to get the best image possible at the listening position?
Basically, I need both frequency and time alignment capabilities. Just as every piece of audio gear has a sonic signature, the revolution that is digital audio provides a way to correct the sound in the digital domain at high resolution (64-bit data path) and low distortion. Given the computing power and sophisticated DSP software we have today, there is a class of software called Digital Room Correction (DRC) software.
Therefore, I can easily create any sonic signature I want, since software like this gives me more control over the frequency and time domain than my ears can discriminate. With the software, you can use default digital filters or create your own using a Designer. This is designing the transfer function for the speaker/room combo; in this case, a linear phase FIR filter.
Digital Room Correction
How do we do this? We design the digital filter using a “target” frequency response, one that we design in software. If time domain correction is enabled, which it is in my case, then the impulse response also changes with the target frequency response. The best impulse response can be achieved by matching the target's high-frequency roll-off with the natural roll-off of the tweeter's filtered frequency response.
For me, this tunes the filter to yield the best possible timbre for the speaker/room combo. When this is tuned properly, the timbre tunes in like a guitar string being brought into tune. I have guitars, mics, A/D converter, so I can compare live and recorded timbre of the guitars, plus shakers, tambourines, triangles, etc.
Here is an example of a "designed target" frequency response using Audiolense. I draw or enter in the data points of the frequency response curve I want (red dots).
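Conceptually, a designed target is just a handful of control points joined into a curve. Here is a minimal sketch, assuming straight-line interpolation on a log-frequency axis (which is how these curves look on a graph). The control points are hypothetical, and this is not necessarily how Audiolense interpolates internally.

```python
import bisect
import math


def interpolate_target(points, freq_hz):
    """Target level (dB) at freq_hz, interpolated linearly in log-frequency.

    points: list of (frequency_hz, level_db) control points sorted by frequency.
    Frequencies outside the range hold the end values.
    """
    freqs = [f for f, _ in points]
    if freq_hz <= freqs[0]:
        return points[0][1]
    if freq_hz >= freqs[-1]:
        return points[-1][1]
    i = bisect.bisect_right(freqs, freq_hz)
    (f0, db0), (f1, db1) = points[i - 1], points[i]
    frac = math.log(freq_hz / f0) / math.log(f1 / f0)  # position on a log axis
    return db0 + frac * (db1 - db0)


if __name__ == "__main__":
    # Hypothetical target: small bass lift, gentle high-frequency tilt.
    target = [(20, 3.0), (100, 3.0), (1000, 0.0), (10000, -3.0), (20000, -6.0)]
    for f in (50, 316, 2000, 15000):
        print(f, round(interpolate_target(target, f), 2))
```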
Once I treated my room acoustically, I no longer needed to drop the target frequency response by 3 dB at 2 kHz. That was a lesson for me. Actually, a lesson relearned, as I remember reading this in Don Davis's excellent book, Sound System Engineering: "You can't effectively (digital) eq a reverberant field".
Here is another view of the target plus the uncorrected frequency response of my speakers in the main form view of Audiolense. Note how the target's frequency extremes match the speakers' natural roll-offs.
I snuck in a little bottom-end lift on the target; given the Klipsch QB3 alignment of the ported bass bin, I can tuck in a little more low end and still have the bass sound tight without overtaxing the amplifiers.
Now I can have Audiolense generate the digital FIR filter (which is almost an inverse of the uncorrected response, I say almost because there are other algorithms at play here):
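At its simplest, the magnitude part of an “almost inverse” filter is the target minus the measurement at each frequency, with limits on how much it may boost or cut. The sketch below assumes a 6 dB boost limit and a 20 dB cut limit (hypothetical values). Limiting boost is one common reason a correction is only approximately an inverse: boosting into a deep cancellation null burns amplifier headroom without filling the null. Audiolense's actual algorithms, including its time domain correction, are more involved than this.

```python
def correction_gains_db(measured_db, target_db, max_boost_db=6.0, max_cut_db=-20.0):
    """Per-frequency correction = target - measured, clamped to boost/cut limits.

    measured_db and target_db are parallel lists on the same frequency grid.
    """
    gains = []
    for m, t in zip(measured_db, target_db):
        g = t - m
        gains.append(min(max(g, max_cut_db), max_boost_db))
    return gains
```

For a room peak the gain is simply negative; for a deep null it stops at the boost limit instead of chasing the hole.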
Here is the resultant corrected frequency response:
The uncorrected frequency response is on top and the corrected frequency response is on the bottom (along with the target). In addition to the acoustical treatments, and short of building a state of the art critical listening room from the ground up, I know of no other way to achieve this level of timbre correction, given my awful room ratio.
Let’s look at a few measurements.
Frequency response. I have zoomed way in on the amplitude scale again to show detail. The DRC is able to take a 14 dB swing and reduce it to ±2 dB deviation. The spectral response is similar to preferred spectral responses as described in B&K's paper (Figure 5) and Dr. Sean Olive's paper (slide 24).
The ETC looks to be in spec, as almost all early reflections are 15 dB or more below the main signal arrival. The early reflection at around 2 milliseconds is the first reflection off the floor to the listening position. Other than mounting the speakers in soffits, not much can be done there. The good news is how diffuse the later reflections are, which means the room adds little tone color to the music reproduced through the speaker/room combo.
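Checking a ±2 dB claim against a target is simple once the measurement and the target share the same frequency points. A minimal sketch, assuming both are lists of dB values on a common frequency grid (the values in the test are hypothetical):

```python
def deviation_from_target(freqs, measured_db, target_db, lo_hz=20.0, hi_hz=20000.0):
    """Return (lowest, highest) of measured - target within [lo_hz, hi_hz]."""
    diffs = [m - t for f, m, t in zip(freqs, measured_db, target_db) if lo_hz <= f <= hi_hz]
    if not diffs:
        raise ValueError("no frequency points inside the requested band")
    return min(diffs), max(diffs)
```

A result of (-2.0, 2.0) means the corrected response stays within ±2 dB of the target across the chosen band; the peak-to-peak swing is simply the difference between the two numbers.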
The blue waterfall graph is as good as it is going to get given my room ratio. I can play with the decay time of the 50 to 60 Hz wave by adjusting a parameter in the time domain window in Audiolense’s Correction Procedure Designer as a next step to tune this back a bit, but I don’t notice it too much in the sound.
I had a lot of fun doing this. The timbre changes between the untreated room, treated room, and with DRC, are definitely audible.
If achieving the best possible timbre from your audiophile system is of interest to you, then this article and my previous article, Speaker to Room Calibration Walkthrough, may be of some use.