As of today, my immersive audio system is 99% selected, and much of it has arrived. In this update I'm focusing on digital to analog conversion and decoding immersive audio. There are two ways to accomplish the decoding and conversion, and they are quite different. Which way did I decide to go? Both. At least for now.
The immersive audio format with the largest market share for music is Dolby Atmos, by far. Apple Music, Tidal, and Amazon all stream Dolby Atmos music. Some Blu-ray Discs contain lossless Atmos as well. As I wrote last week, Auro-3D is a format with a smaller presence overall but a large presence among boutique HiFi labels. DTS:X is very similar to Dolby Atmos, but I've yet to see it grab any reasonable market share for music where there isn't also an Atmos mix. DTS:X is used for many movies, but that isn't the focus of my immersive efforts. I'm a music listener who loves theater of the mind. Neither movies nor screens are allowed in my listening room.
Both Atmos and Auro-3D require decoding and conversion to analog for playback. An undecoded album would deliver traditional multichannel audio such as 5.1 and 7.1, without any height information. Decoding these formats is what provides music beyond 8 channels and expands the soundstage into height channels. This is how we get Atmos playback at 7.1.4 and beyond, and Auro-3D playback at 9.1 and beyond.
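For anyone new to the layout shorthand, here is a minimal sketch of how the numbers add up. It assumes the common convention where the digits mean ear-level speakers, subwoofer (LFE) channels, and height speakers, and `channel_count` is a hypothetical helper written for illustration, not part of any decoder. Auro-3D's 9.1 is written in the two-part form even though four of its nine speakers are heights, but the total works out the same way.

```python
def channel_count(layout: str) -> int:
    """Total discrete channels for a layout written as 'bed.lfe' or 'bed.lfe.height'."""
    parts = [int(p) for p in layout.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'bed.lfe' or 'bed.lfe.height', got {layout!r}")
    # Every digit group is a count of discrete channels, so the total is the sum.
    return sum(parts)


for layout in ("5.1", "7.1", "7.1.4", "9.1"):
    print(layout, "->", channel_count(layout), "channels")
```

A 7.1.4 layout comes out to 12 channels, which is the number of DAC and amplifier channels a system like mine needs.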
Decoding Atmos and Auro-3D can be done either in an A/V processor fed via HDMI or on a computer. I've decided to go both routes to start. How can one decide which is the best fit for a system, without hearing and using both? It would be very difficult, and far less fun. Along this journey, I plan to write extensively about the differences between the two methods, to arm members of our community with information that may help them on their own immersive audio journeys.
Using an A/V processor is the traditional method of decoding and converting the digital signal to analog audio. Getting content to the processor is done via Blu-ray Disc player, streaming device (AppleTV, Roku, etc.), or output from a computer. Pretty standard stuff, with the possible exception of using a computer. The computer in this scenario is used as a transport to output bit-perfect audio for decoding by the processor. An app like Kodi or JRiver can be used for local content, as long as the file formats are supported by the app. A Mac can also be used with Apple Music, but a Windows PC can't because Apple Music doesn't support Atmos on Windows.
Either way, content must be output to the processor via HDMI. No exceptions, even for the processors that support AES67 audio over Ethernet. Those processors don't route audio into the decoder via anything other than HDMI.
The absolute best part of using an A/V processor for immersive audio is ease of use. There aren't many things easier than putting a Blu-ray Disc into a drive and tapping play. AppleTV into a processor is nearly as simple. Using a computer is a little more complex, but outputting bit-perfect audio via HDMI is fairly straightforward.
In addition to the aforementioned playback simplicity, these processors have room correction / digital signal processing built in. This can be both good and bad, but when it comes to ease of use, it's tough to beat. Room correction in A/V processors isn't created equal. Most are limited by low-power onboard chips, while some run a full-blown computer onboard. This is the subject of at least one or two future articles because it's very important and the results vary widely.
Another huge benefit for users of A/V processors is support. They are closed systems with fairly standard processes for resolving difficulties. Some even have fantastic support teams who can connect to one's system remotely to configure it or make adjustments. In other words, there's someone to call. This is vastly different from the usual online support many of us are used to these days.
The first A/V processor I will have in my immersive 7.1.4 audio system will be the Trinnov Altitude 32. I've done extensive research into processors and believe the Trinnov units are among the best on the market. An Altitude 16 would have also done the trick because I have 12 channels, but getting ahold of one right now is next to impossible. The AL16 handles up through 24/96 whereas the AL32 handles up through 24/192, but this isn't the type of specification on which I'd base an A/V processor decision. It's one data point that might make a difference. My Atmos content is all 24/48, and some Auro-3D content is 24/96. The big differentiator between the Altitudes is likely the number of supported channels.
The Altitude 32 should arrive at my place not long after I return from the Munich high end show. Once it's here, I'll work with Trinnov to fine tune the DSP for my room, then dig deep into the unit. I know several people with these processors and I hear nothing but great things about both the hardware and the people at Trinnov.
If A/V processors are this good, why would anyone go another route? Some of us just aren't satisfied until we've at least tried to squeeze every ounce of performance out of an audio system. We view possible limitations with A/V processors such as their onboard digital to analog conversion, limited DSP capabilities, and HDMI requirements as items to attempt to improve. This attempt requires a completely different route.
Every audiophile I know uses a standalone DAC to convert digital audio to analog audio. That doesn't make it right, or the only option, but it's the option that provides the best sonic results for music playback. Put another way, I don't know a single audiophile, myself included, who'd elect to route digital audio signals through an A/V processor for conversion to analog, if there were an option to use a dedicated DAC. Previously there wasn't an option for TrueHD Atmos and Auro-3D playback. Decoding chips inside A/V processors were the only game in town. Now, the game has changed.
As I wrote about previously, decoding Auro-3D, lossy Dolby Digital Plus, and lossless Dolby TrueHD Atmos on a computer is entirely possible. Apple Music streams DD+ Atmos and routes it through the built-in macOS Dolby decoder. The Dolby Reference Player decodes TrueHD Atmos for output to any DAC with enough channels, or even to WAV files for playback using Audirvana or JRiver. Once we have the decoded PCM stream, whether that's 11, 12, or 16 channels or more, we can do what we'd like with the audio. This opens up a world of options.
The first option is digital signal processing that's only limited by the power of today's fastest computers. Outputting a 12 channel 7.1.4 audio signal to an app like HQPlayer for room correction via convolution and upsampling to high rate DSD is possible. It's also possible to output to Mitch Barnett's forthcoming multichannel version of Hang Loose Convolver for room correction that requires very little processing power, but like HQP, can take advantage of a crazy number of filter taps. I use 65,000 tap filters for room correction. This is well beyond the capabilities of A/V processors.
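To give a feel for what a "tap" is and why 65,000 of them is a lot, here is a minimal sketch of convolution in its simplest textbook (direct) form, plus a rough cost estimate. This is not how HQPlayer or Hang Loose Convolver work internally; real convolvers typically use much faster FFT-based methods. The function names are hypothetical, and the 48 kHz rate and 12 channels are assumptions taken from my 24/48 Atmos content and 7.1.4 layout.

```python
def fir_filter(signal, taps):
    """Direct-form FIR convolution: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, coeff in enumerate(taps):
            if n - k < 0:
                break  # no input samples exist before the start of the signal
            acc += coeff * signal[n - k]
        out.append(acc)
    return out


def direct_macs_per_second(num_taps, sample_rate_hz, channels):
    """Multiply-accumulates per second if every tap were applied directly."""
    return num_taps * sample_rate_hz * channels


# Toy example: a single impulse passed through a 3-tap filter
# comes back out as the filter's own coefficients.
print(fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2]))

# Assumed workload: 65,000 taps, 48 kHz, 12 channels.
print(direct_macs_per_second(65_000, 48_000, 12))
```

The second number is 37,440,000,000 multiply-accumulates per second for the naive approach, which is why long filters are run with smarter algorithms on a general-purpose computer rather than brute force.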
The digital to analog conversion stages in DACs are critically important and have a major effect on the sound of music in one's system. Because I'm also using a computer to decode immersive audio, I've selected the new Merging Technologies HAPI Mk2 with two DA8P options cards as my DAC. I'll have 16 channels of D to A conversion using ESS ES9028PRO chips, supporting up through DSD256 and 32 bit / 384 kHz, and a "typical Dynamic Range of 125 dB (> 127 dB A-weighted) and THD+N in excess of - 116 dB (this latest number being actually close to the measurement limit of our Audio Precision equipment maxing at 118 dB THD+N)." This is the type of DAC many of us audiophiles are used to for two channel playback. It will be great to use such a high caliber unit for immersive audio as well.
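To put the quoted 125 dB figure in perspective, here is a small sketch using the standard textbook rule of thumb for an ideal converter driven by a full-scale sine wave, where dynamic range is roughly 6.02 dB per bit plus 1.76 dB. The function names are hypothetical and this is back-of-the-envelope arithmetic, not a measurement of the HAPI Mk2.

```python
def ideal_dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal N-bit converter (full-scale sine)."""
    return 6.02 * bits + 1.76


def equivalent_bits(dynamic_range_db):
    """Invert the rule of thumb: how many ideal bits a dB figure corresponds to."""
    return (dynamic_range_db - 1.76) / 6.02


print(round(ideal_dynamic_range_db(24), 2))  # ideal 24-bit ceiling in dB
print(round(equivalent_bits(125), 2))        # quoted 125 dB expressed in bits
```

By this estimate 125 dB works out to roughly 20.5 bits of real-world resolution, against a theoretical ceiling of about 146 dB for a perfect 24 bit converter.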
Moving beyond the HDMI requirement of an A/V processor, I'll use the Ravenna protocol to send audio over my Ethernet network from a computer to the Merging DAC. Ravenna is as rock solid as it gets. When the 129 members of the Berliner Philharmoniker are playing live, there aren't any second chances. Same goes for live broadcasts. Ravenna blows UPnP/DLNA performance out of the water on all levels. It also enables me to place the computer anywhere on my network, rather than within reach of an HDMI cable.
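For a rough idea of how much network traffic this involves, here is a minimal sketch that computes the raw PCM payload rate as channels times bit depth times sample rate. It deliberately ignores packet headers and any protocol overhead, so actual traffic on the wire will be somewhat higher. `pcm_payload_mbps` is a hypothetical helper, and the two scenarios are assumptions based on my 12 channel 24/48 Atmos content and the DAC's 16 channel, 32 bit / 384 kHz maximum.

```python
def pcm_payload_mbps(channels, bit_depth, sample_rate_hz):
    """Raw audio payload in megabits per second, excluding all network overhead."""
    return channels * bit_depth * sample_rate_hz / 1_000_000


# Assumed scenario 1: 7.1.4 Atmos decoded to 12 channels of 24/48 PCM.
print(pcm_payload_mbps(12, 24, 48_000))

# Assumed scenario 2: all 16 DAC channels at 32 bit / 384 kHz.
print(pcm_payload_mbps(16, 32, 384_000))
```

The first case is under 14 Mbit/s and the second is just under 200 Mbit/s, so even the extreme case is a fraction of what a gigabit Ethernet link can carry.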
Once the system is set up, I'll also have the iOS apps for JRiver and Audirvana at the ready, to browse my collection of immersive audio and control playback. No physical media required, once it's ripped to my NAS. Apple Music playback is a little trickier due to the way Apple only shows certain content on the phone app, but it's by no means a showstopper. There are solutions to little issues like this.
Decoding and Playback Wrap-up
In order to do my immersive system justice, and help educate audiophiles about immersive audio playback at the highest levels of performance, it's necessary for me to use both an A/V processor and a computer-based decoder. I'm sure I'll prefer one system over the other once I start using and listening to them both. But, life isn't one size fits all. There will be many pros and cons to both systems. I can't wait to wear them out by running hours and hours of music through them and trying every configuration possible. Stay tuned for the fun stuff. I'm getting excited :~)
My Immersive System So Far
- Source: MacBook Pro and QNAP NAS
- Playback Software: Audirvana, JRiver Media Center, Dolby Reference Player, Auro-3D Plugin
- Digital Signal Processing: HQPlayer or Hang Loose Convolver
- Room Correction / Custom Convolution Filters: 65,000 tap convolution filter from Accurate Sound
- DAC: Merging Technologies HAPI Mk2 with two DA8P options cards
- A/V Processor: Trinnov Altitude 32
- Amplifiers: 5 x Mytek Brooklyn AMP+, 2 x Constellation Audio Mono 1.0 / Monoblock Power Amplifiers
- Loudspeakers Front: Wilson Audio Alexia Series 2
- Loudspeaker Center: Wilson Audio WATCH Convergent Synergy
- Loudspeaker Surround / Atmos: Wilson Audio Alida x8