  • bluesman

    The Value Proposition In Audio: Voice Control For Audiophiles (You Can't Buy Too Soon - You Can Only Sell Too Late)

     

     


     

     

    image1.jpg

     

     

     

    I’M AN OLD DOG – I CAN SPOT AN OLD TRICK A MILE AWAY

     

    For me to write an intelligent and useful article on the subject, I thought it was important to understand the AS community’s thoughts, attitudes, wants, needs, and turn-offs about voice control.  Many of you read (and some responded to) my General Forum post seeking answers to a 5 question poll about voice control for audiophiles.  Well……….I got what I wanted, although it took me a while to reconcile what I learned with what I (and Chris, since he asked specifically for this article) believed.  I actually had to rewrite a large portion of this to put it all in context for me, as well as for you.  

     

    About half of all AS responders said that they have no interest at all in voice control for their audio systems.  

     

    My initial response was pure shock.  But within minutes, I found the perspective from which to make sense of this.  And from that perspective, I’ll draw some conclusions that have immediate importance to the industry and longer term impact on the audiophile community.

     

    SURVEY SAYS……...

     

    TOPIC 1: LEVEL OF INTEREST

     

    orange.png  I definitely want voice control for at least one of my audio systems.

    yellow.png  I'd only consider voice control for my audio system(s) if it does everything I now do manually.

    green.png  I'd use voice control for my audio system(s) if it controls simple basics like program & volume.

    burgundy.png  I have no interest in voice control for my audio system(s).

     

     

    Topic 1.png

     

     

     

     

    TOPIC 2: KNOWLEDGE & EXPERIENCE WITH VC

     

    orange.png  I own and regularly use voice recognition / control / activation in at least one device.

    yellow.png  I own devices with embedded voice recognition / control / activation but rarely use the feature.

    green.png  I've tried voice recognition/control/activation in the past, but it didn't work well enough for me.

    burgundy.png  I've never used voice recognition / control / activation.

     

     

    Topic 2.png

     

     

     

     

     

    TOPIC 3: HOW CURRENT IS YOUR KNOWLEDGE AND EXPERIENCE WITH VC

     

    orange.png  I use current voice actuated software in at least one device.

    yellow.png  I've only used older apps/devices, eg early smart home devices, Dragon Naturally Speaking, etc.

    green.png  I've never owned or regularly used any voice actuated devices or software.

     

     

    Topic 3.png

     

     

     

     

     

    TOPIC 4: IF YOU’D CONSIDER VOICE CONTROL, HOW WOULD YOU PREFER TO USE IT?

     

    orange.png  In a separate control device, e.g. hand held remote control

    yellow.png  As an app for my mobile device(s)

    green.png  As software embedded in players, control points etc

    burgundy.png  Embedded with other front end controls in my electronics (preamp, integrated amp, DAC etc)

    lightblue.png  Embedded in smart speakers

     

     

    Topic 4.png

     

     

     

     

    TOPIC 5:  DEAL BREAKERS FOR ME

     

    orange.png  It has to be a comprehensive control center, 99+% accurate, and consistently reliable.

    yellow.png  It has to come preinstalled in hardware and/or software.

    green.png  It has to be easily upgradeable, regardless of the form or device in which it comes to me.

    burgundy.png  Others in my home or family have to be able to use it with little or no instruction.

     

     

    Topic 5.png

     

     

     

     

    EXECUTIVE SUMMARY AND A PRÉCIS OF THIS ARTICLE’S ORGANIZATION

     

    For me, the easiest way to learn about and understand the current spectrum of voice control for audiophiles is to start by categorizing what’s now available, from simple to complex, before moving to what’s coming and what’s possible beyond that.  The simplest place to start is with those virtual assistants already available and in use by at least half of us for something (based on the poll I ran a few weeks ago).  Then we’ll move on to currently available audio devices with voice recognition and control built in, before advancing to the standalone voice control apps out there for adaptation to audiophile use. For a peek into the future, check out josh – there’s some serious effort in high end VC.

     

    Amazingly enough, there are well over 20 virtual / digital assistants out there today, some of which are welcome in my home and happily living there today.  As with any other stranger in our midst, security is critical for self-protection and privacy – but this is not as big a problem as some would have you believe (more later in the text).  For starters, here’s a table of the top virtual assistants in use and under development today, with a summary of what they can do right now:

     

    executive summary.jpg

     

     

     

    Several of these are already in widespread active use, but ongoing development is actively pushing the envelope around their utility, practicality, and potential.  The combination of voice recognition / synthesis / understanding with artificial intelligence may well be the most exciting direction for consumer products since the industrial revolution.  The above table documents the tip of the iceberg of startups joining with long standing research and development programs of the world’s leading enterprises.  Next we’ll discuss how the above systems can be acquired, adopted, or otherwise brought into our lives today, before an overview of what an audiophile can do with them and why we should all become more familiar with voice-based AI while it’s still simple to grasp and adopt.

     

    The world of virtual assistants (along with the number of them) is growing rapidly, and those provided by the major houses (Amazon, Apple, Google, and Microsoft) already offer basic control over music playback.  By themselves, most offer almost no direct control over audio devices other than those in which they’re embedded.  They give you access to music on their corporate servers if you subscribe to one of their services, but they offer limited management of your own files through front end controls, e.g. volume, source material, play mode.  You can stream from the usual web sources if you have an account, e.g. Tidal, Spotify, Amazon Music, Apple Music etc.

     

     

    SOME DECENT AUDIO WITH VOICE CONTROL THAT YOU CAN USE RIGHT NOW

     

    For the impatient, here’s a summary table to point you toward goal directed devices and systems you can buy and use right now.  At present, I know of no voice control system, AI application, and/or dedicated device that will (or can be adapted to) replace all the knobs, switches and icons we now use to control our audio systems.  Voice controlled power switching is easy to achieve, and there are decent receivers on the market today that have some level of voice control built in with one of the name “digital assistants”.  But comprehensive voice control of an existing audiophile system is simply not yet available.

     

    There are a few immediately available options for the adventurous that will let you experience voice control over your audio sources.  For example, Braina is a VC/AI app and system for PC that can control the VLC media player right now and will give you program and playback control over several streaming sources with a little setup.  It’s free, and it truly defines the term “work in progress” – it’s crude, rough and inconsistent.  But it does enough well enough to let you feel how powerful VC will be when ready for prime time.  Be aware that Braina sometimes does some weird things – just laugh!

     

    The most practical voice controlled options for audio right now are in the smart device category, most of which are powered by Alexa, Google Assistant, or Siri.  Google devices will play 24/96 FLACs, and their high end devices have embedded Chromecast so they can be used as DLNA zones in JRiver. If you want to have voice control over JRiver playing through a Google device as a zone, you’ll have to use Alexa. If she’s not sharing a device with the GA, you can link her from an Amazon device or a third party host using an app like Helea Smart.

     

    The entry level devices are all small, relatively low powered, and a bit short of SQ that could be considered good by any stretch of the imagination.  But even the better low end pieces (e.g. the latest Echo Dot) are impressive in stereo pairs with a bit of EQ applied through the Alexa app.  And the high end lines from Amazon, Google, and Apple contain some decent active speaker systems with impressive parts, useful DSP, and sound quality that’s good enough to keep many audiophiles happy for noncritical listening (especially to streamed mid-res files of music you want to audition before buying, or can’t buy at all).

     

    You can buy multiple Echo devices right now, set them up on your WLAN, and immediately stream standard res music through them all by telling Alexa what you want to hear and from which speaker(s).  The same applies to Google devices, except for your choice of streaming sources – there are several alternatives through the Google Assistant, including YouTube Music, Spotify, and a host of others. You can also do this just as easily with the Apple units, but you can only play music from Apple Music or iTunes (if you still have that set up on a networked device).

     

    You can read more about all this further on.  But for the impatient, you can immediately put a voice controlled system together quickly and easily, synch many stereo pairs in many rooms without audible delay or other ill effect, and enjoy casual listening that’s not too shabby.   Here’s a quick guide to getting started with VC in several easy ways:

     

    devices.jpg

     

     

     

    image15.jpeg

     

     

    Some things of which you should be aware (details later in the text):

     

    • You cannot control a serious audio system today using only your voice 
      • You can control selected functions of some equipment right now
      • You can use development platforms to create new functions if you’re up to it
      • You probably won’t be waiting long for voice control to advance enough to please you
    • You can use your voice to select & play music right now with comprehensive control
      • You’ll have to do it with currently available smart speakers and a streaming source
      • Alexa, Siri, Google Assistant et al will select your choice if it’s available in the repository used by your choice of assistant (e.g. Amazon Music for Alexa) and control basic playback functions
      • It’s easy and works great 
      • With a little effort, e.g. activating a skill for Alexa, you can control programs like JRiver Media Center with your voice and play any file or source accessible to you through it
        • You cannot integrate JRMC control with other audio functions, e.g. synch playback among multiple devices with JRMC
    • There are many decent smart speakers from which to choose, but so far none reaches the performance level of a true audiophile product (except perhaps a very few like the $3000 B&O A9)
      • These are appliances, not audio equipment – reasonable expectations avoid disappointment
        • Even current entry level smart speakers like the Echo Dot and the little Google thingies sound half decent.  A pair of $20 Echo Dots (3rd gen, now on sale cheap because the 4th gen just came out) will play most music well and loudly enough not to offend
          • Even entry level smart devices from Amazon, Google, and Apple can now be set up as stereo pairs
        • The latest speakers from Amazon, Apple, etc are surprisingly good and easy to use
          • Technophobes and non-audiophiles in your life will love you for getting them a pair if they have any desire at all to listen to music but can’t / won’t use your system
          • If you’re already a subscriber (e.g. Amazon Prime, Apple Music), you only have to ask for the music you want and (if it’s available from your plan) it’ll play
          • You can ask by composer, performer, genre, track or album title, etc – it’s easy
      • No matter which smart devices you buy, the assistant within is identical for all.
        • The $3000 B&O may sound much better than the $29 Google Nest.  But the Google Assistant within them is one and the same.
    • There’s a lot of very good HT audio equipment with serious voice control embedded or through one (or more) of the major assistants
    • There’s a fledgling Internet of Audio Things (IoAuT) that will probably give birth to a new generation of voice controlled audiophile equipment very soon
    • There’s a young industry combining voice recognition and control with artificial intelligence that has the potential to replace your knobs and buttons with your voice 
      • You can play with this today by downloading one of the development platforms and setting it up on your computer 
        • For example, check out Braina for a peek into the gestation of the revolution
          • This AI/VC software can control VLC player on your PC out of the box
          • Yes, it’s crude and no, it’s not even close to being sufficiently accurate and consistent for routine use

    But……..the combo of AI and VC is exciting and promising – don’t sell it short!

     

     

    WHAT GOES AROUND COMES AROUND

     

    ...and that’s the real message in this work.  Think of all the things that were soundly dismissed when introduced but are now taken for granted – things we can’t imagine living without.  Among the many staples of modern life are initial failures like Nintendo, Wheaties, and Dyson vacuum cleaners.  Apple was a hair’s breadth from bankruptcy in 1997.  Walt Disney’s first animation studio went bankrupt within 2 years.  Milton Hershey couldn’t give away his candy when he started his first two companies – they both failed.  Only on the third try did the Hershey Bar make it out of the starting gate.

     

    Dr Seuss’s first book was rejected by 27 publishers before finding one willing to take a chance on it.  Van Gogh sold one painting in his lifetime.  When Rovio started offering mobile games in 2003, they couldn’t generate enough interest to raise an eyebrow let alone an investor.  Fast forward 6 years to their introduction of Angry Birds, by which time there was enough demand to yank them off the starting block and into the winner’s circle.

     

    How could such great ideas come so close to failure?  For many, it was a technical hurdle that hadn’t yet been overcome.  Three critical breakthroughs that pushed mobile device games over the hump at the dawn of the 21st century were the Wireless Application Protocol (WAP) that enabled mobile phones to connect to the internet, mobile Java (2002), and color phone displays (2003).  Before this, mobile gaming was crude and slow even on devices at the state of the art.  Until a physical product and the software that controls it reach maturity and can fulfill a serious chunk of the product’s promise, it’ll only sell to the early adopters.  For others, it’s just a matter of preconceived notions and fickle consumers.  Many products and concepts that eventually succeed start off with nothing but the lonely enthusiasm of their inventors behind them amid a loud chorus of “Why would anybody want that?” - until someone who matters gives it a try and discovers greatness within it.

     

    The best advice I got in business school was to underpromise and overdeliver.  The inability to subdue enthusiasm often results in the introduction of a new concept or product as a done deal when much of the inventor’s vision and promise has yet to be realized.  It only takes one big disappointment to sour potential buyers on a second chance.  Many of us started early and learned quickly that voice recognition software was far from user friendly at its inception.  I was an early user of Naturally Speaking (1998), adopting it only when it improved on the initial product’s requirement that you enunciate each and every word separately (a limitation with which I couldn’t live).  I used it to dictate office notes after patient visits, and it was about as accurate as my transcription service had been.  That was no great achievement, since transcription inaccuracy was the reason I went to voice recognition software in the first place, and I still had to proof every page just as I did for transcribed correspondence.

     

    Even today, VR software is less accurate than I’d like.  We have voice control in our Xfinity cable box remotes that works pretty well for me.  I get 90+% of what I ask for, but my wife gets either the wrong channel or a prompt to repeat her request at least 25% of the time.  It’s just not as good as it needs to be.  On the other hand, Alexa turns on my espresso machine and audio systems, plays music from my collection over Amazon devices, controls JRiver on my computers, sets wake-up alarms, reminds us of tasks and appointments, tells us the weather forecast and current temperature outside, and does it all with 95-98% accuracy.

     

    So the promise of voice control is great for audiophiles, and I believe it will gain wide acceptance as it approaches 99+% accuracy and shows up in more devices and programs. Based on progress to date, I expect VC with AI to reach the level of function, reliability and ease that we all expect from our audio equipment.  I like it a lot already and look forward to the future.  Let’s get to it!
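
To put the accuracy figures quoted above in perspective, here is a minimal sketch of how per-command accuracy compounds over a listening session. It rests on a simplifying assumption that is mine, not the article's: every command succeeds or fails independently with the same probability. The function names and the 20-command session length are hypothetical choices for illustration.

```python
def all_succeed_probability(accuracy, commands):
    """Chance that every one of `commands` requests is handled correctly,
    assuming each succeeds independently with probability `accuracy`."""
    return accuracy ** commands


def expected_failures(accuracy, commands):
    """Average number of misfires over `commands` requests."""
    return commands * (1.0 - accuracy)


# Accuracy levels mentioned in the text: 75%, 90%, 95% and the 99% target.
for accuracy in (0.75, 0.90, 0.95, 0.99):
    p = all_succeed_probability(accuracy, 20)
    misses = expected_failures(accuracy, 20)
    print(f"{accuracy:.0%} accurate: {p:.1%} chance of 20 clean commands, "
          f"about {misses:.1f} misfires expected")
```

Under that independence assumption, a 95% accurate assistant gets through 20 commands without a single misfire only about 36% of the time, while a 99% accurate one manages it about 82% of the time. That gap is one way to see why 99+% is the level at which voice control starts to feel as dependable as a knob.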

     

     

    THE ELEPHANT IN THE ROOM:  ACCURACY

     

    There are a few well done tests out there comparing the accuracy of both recognition and response among Alexa, the Google Assistant, and Siri.  This one from Loup Ventures is a good example and very interesting.  The results are helpful in determining how accurate and useful each can be for audiophile use right now.  Each was asked 800 questions.  Google won with 100% understanding and 93% correct answers.  Siri was 2nd (99.8% / 83%) and Alexa  was 3rd (99.9% / 79.8%).  The same study was done a year before, and the sequential results show some limits and some progress: the order was the same a year ago but the results were 86%, 79%, and 61% correct responses.  

     

    Google seems to have nailed it in the second round, with a 7 point gain in correct answers (86% to 93%, roughly an 8% relative improvement) on top of 100% understanding.  But Alexa could only do the right thing about 80% of the time even after gaining nearly 19 points (roughly a 31% relative improvement), and Siri only beat out Alexa by about 3 points after a year of development that resulted in a fairly weak 4 point (about 5%) increase in answering correctly.
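
Year-over-year gains like these can be stated either in percentage points or as a relative improvement, and the two are easy to mix up. The sketch below computes both from the correct-answer rates quoted in this section (86%, 79%, and 61% in the first round; 93%, 83%, and 79.8% in the second). It uses only those rounded figures, so the results may differ slightly from anything in the original study; the function and variable names are mine.

```python
# Correct-answer rates (percent) quoted in the text: (first round, second round).
results = {
    "Google Assistant": (86.0, 93.0),
    "Siri": (79.0, 83.0),
    "Alexa": (61.0, 79.8),
}


def improvement(before, after):
    """Return (gain in percentage points, relative gain in percent)."""
    points = after - before
    relative = 100.0 * points / before
    return points, relative


for name, (before, after) in results.items():
    points, relative = improvement(before, after)
    print(f"{name}: {before:.1f}% -> {after:.1f}% "
          f"(+{points:.1f} points, +{relative:.1f}% relative)")
```

Alexa made by far the largest jump by either measure and still finished last, which shows how far behind it started.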

     

    At present, the Google Assistant is the smartest, with Siri & Alexa far enough behind to matter.

     

     

    FROM REX TO ALEXA – A BRIEF HISTORY OF VOICE RECOGNITION / CONTROL

     

    The first voice activated device to be historically documented is Radio Rex from 1911.

     

    image16.jpeg

    Rex would (sometimes) leap out of his dog house when called by name.  He was held in place by an electromagnet whose energizing circuit was tuned to a resonance of about 500 Hz.  When the right voices said “Rex” loudly enough (or any other sound source with enough energy in the 500 Hz range went off), the spectral content in that range would somehow interrupt the power to the magnet.  When the magnet’s power was interrupted, Rex was pushed out of his house by a spring.  I can’t find out how this worked, but it suggests an early “Clapper”.  And, like the Clapper, it was activated by sound and not specifically a voice - it reacted to any sounds within its sensitivity range. As both a pet and a device, Rex was inconsistent and unreliable. But he was the first – and his weaknesses set the bar much higher for market acceptance of subsequent voice activated devices.  This is a lesson we’re learning again!
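
The reason Rex answered to any sufficiently loud sound near 500 Hz can be illustrated with a modern digital analogy. This is emphatically not a model of Rex's electromechanical circuit, which is not documented here. It is a minimal sketch of a detector that measures signal energy near one frequency (using the Goertzel algorithm) and "jumps" when that energy crosses a threshold. The sample rate, threshold, and function names are all assumptions chosen for illustration.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def tone(freq_hz, amplitude=1.0, sample_rate=8000, duration_s=0.1):
    """Generate a pure sine tone as a list of samples."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def rex_jumps(samples, sample_rate=8000, threshold=1000.0):
    """True when there is enough energy near 500 Hz, whatever the source."""
    return goertzel_power(samples, sample_rate, 500.0) > threshold


print(rex_jumps(tone(500)))                  # loud 500 Hz tone, no voice needed
print(rex_jumps(tone(2000)))                 # loud, but energy is in the wrong band
print(rex_jumps(tone(500, amplitude=0.05)))  # right band, but too quiet
```

The detector has no idea whether it is hearing a name, a hand clap, or a slammed door; it only knows how much energy sits in one narrow band. That is the difference between sound activation and voice recognition.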

     

    The next voice activated toys to hit the market with any success at all were Jill and Julie (late 1950s).

     

    image17.jpeg image18.jpeg

    These lovely ladies came from TI and had both speech recognition and voices of their own.  Julie (right) was about 3 feet tall and responded to words like pretend, hungry, yes and no.  Sadly, neither would put your vinyl on the turntable or dial up an FM station, but it’s not a stretch to call them the founding mothers of Alexa, Siri et al.

     

     

     

     

    FROM TOY TO TEMPTRESS

     

    The lovely Audrey was born in 1952 to her proud Bell Labs parents (whose names were Davis, Biddulph, and Balashek for those who care).  She was far from the fastest chip on the board, but she had a great personality!  Audrey was the sultry seductress who spawned generations of progressively smarter progeny, some of whom live and work among us today.  We have Audrey to thank for our friends Alexa, Bixby, Siri, Cortana, and the poor little Google Assistant who never got a name.

     

    Audrey was the first documented system that could recognize human speech.  She was a bit of an idiot savant – her only skill was the ability to recognize spoken digits with 97-99% accuracy if spoken to her by a voice on which she’d been trained.  She could have been useful in telecommunications, e.g. as a voice activated interface for long distance dialing or in a high end telephone (one of the reasons for her creation).  But she died alone because she was a high maintenance woman and a very expensive date.

     

    Audrey filled a full height 19” rack and sucked power like mint juleps at the Kentucky Derby.  She was a little slow as a child, and she never realized her potential.  Even rotary dialing was as fast as Audrey, and she was absolutely no match for the touch tone system invented when she was only a toddler.  By 1958, touch tone phones were in active development and Audrey was obsolete.  Then John Karlin (a psychologist at Bell Labs) drove the last nail into her coffin when he invented the keypad we now use for telephony and a million other things.

     

    When Audrey was 10, IBM debuted the Shoebox at the Seattle World’s Fair.  This device could recognize 16 English words and the numbers 0 to 9.  But it wasn’t until the early 1970s that the next generation of voice recognition technology was born.  DARPA funded research at Carnegie Mellon that bore fruit in the form of a device called Harpy, which (who?) could recognize the vocabulary of the average 3 year old.  Harpy proved that there was a “finite state-network of possible sentences” that was the key to better identification and accuracy.  At the same time, Bell Labs made advances that enabled software to recognize and interpret multiple voices.  They threw early AI into the mix and created the foundations of today’s voice recognition and activation software – and the race is on!
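
The idea of a finite state network of possible sentences can be shown with a toy example. The sketch below is not Harpy's actual network; the grammar, state names, and vocabulary are hypothetical. The point it illustrates is that at each step only a handful of words are legal, so a recognizer has far fewer candidates to choose among than the full vocabulary.

```python
# Toy finite-state network: state -> {word: next state}. Hypothetical grammar.
NETWORK = {
    "start": {"play": "verb", "stop": "end"},
    "verb": {"the": "article", "some": "article"},
    "article": {"album": "end", "track": "end", "music": "end"},
}


def allowed_next_words(state):
    """Words the network permits from this state (the recognizer's shortlist)."""
    return sorted(NETWORK.get(state, {}))


def accepts(words):
    """True if the word sequence is a complete sentence in the network."""
    state = "start"
    for word in words:
        transitions = NETWORK.get(state, {})
        if word not in transitions:
            return False
        state = transitions[word]
    return state == "end"


print(allowed_next_words("start"))
print(accepts(["play", "the", "album"]))
print(accepts(["album", "play", "the"]))
```

A sequence like "album play the" is rejected outright, however clearly each word was heard, because no path through the network produces it.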

     

     

    LET’S LOOK BEHIND DOOR #1 INTO THE ASSISTANTS’ LOUNGE 

     

    image20.jpeg image19.jpeg image21.jpeg  

     

     

    The first route to voice control of music playback is through the “digital assistants” already living in your smart home devices.  There are only 3 teams in the major leagues right now: Amazon, Apple, and Google.  Samsung’s a comer, and they’re pursuing the market hard - Bixby has about 2 years under his belt, but he’s still a long way from the playoffs for audio.  There’s now a Samsung Bixby Marketplace from which to explore and download Bixby Capsules, which are Samsung’s equivalent to Alexa’s Skills.

     

    Microsoft has Cortana, but she’s never been much of a help around the house (maybe because MS never put much effort into her development).  And they recently announced that they’re scaling back mobile and home uses in favor of integration with Microsoft 365 products.  They’re sequentially pulling the plug on all Cortana skills and apps over the next year, although she’ll apparently continue living in MS PCs and helping us use Outlook for the foreseeable future.  Continued integration with Surface buds and ‘phones is projected, although a reason for maintaining this is not obvious to me.

     

    Cortana will no longer live in HK’s Invoke (the MS answer to the Amazon Echo, the Google Nest, and Apple’s HomePod) - as of now, HK is planning to send a $50 voucher to every owner because their smart speakers will be rendered deaf and dumb when Cortana moves out! Interestingly enough, MS has started offering development tools for creating Alexa skills with their Azure bot framework.  And it appears that further collaboration with Amazon is ongoing to advance Alexa – so MS may not be out of the game, but they’ve changed roles from team owner to trainer.

     

     

    ALEXA AND SIRI HAVE THEIR HEADS IN THE CLOUDS

     

    image22.jpeg

     

    The deus is not in the machina.  In this version of Oz, the nerdy wizard is a series of algorithms and AI living in millions of lines of code in the server grid of the Amazon Web Services cloud.  Pictured above is probably one of many server farms around the world comprising AWS, which represents a huge chunk of the world’s business computing power and storage. For security, their locations are kept as far under the radar as possible.  It’s known that the first location was in northern Virginia, and hundreds of investigative reporters have scoured the media for clues to exact locations since that one opened about 15 years ago.

     

    The relevance of the last paragraph to voice control for audiophiles is strong.  Your voice and everything else heard by the microphones in your devices will go to the cloud and back, traversing an unknown number of servers and storage devices along the way – voice responsive assistants can do nothing on their own.  Alexa, Siri, et al require internet access even to turn on a light.  More complex tasks like making JRiver play a specific tune from your library can hurl a lot of data back and forth across the ether, using bandwidth and leaving tracks.

     

    When I ask Alexa to play music by Bill Evans using JRiver, I’m actually asking an AWS server or three to recognize, understand, and act on my request.  The involved communication channels resemble a neural network - my input travels to the cloud over the internet as the afferent signal (going to the “brain”).  The response triggers that are generated by a pretty sophisticated system of AI, predictive analytics, etc return as efferent signals that will be processed by my own system(s) into actions.

     

    Everything heard by every voice-driven virtual assistant is archived in cloud servers, as are the responses.  So security is obviously a major issue, for which there are many good protective solutions that only work if you use them.  But by reading this far, you’ve probably figured out that every microphone in every device you own can be used to listen in on you, even without your knowledge or authorization.  Caution is essential, but risk is low if you do the right things. We’ll get to how this affects use of a true home hub later.  But here’s a hint: your nagging suspicion is correct that a WiFi-activated door lock or a security system on your WLAN could easily be hacked if you don’t make every effort to secure everything.

     

     

    THE FORECAST CALLS FOR CLOUDS

     

    Alexa has absolutely no idea what JRiver is or does.  She needs an assist from a piece of third party software called a “skill”, a chunk of code that we used to call middleware back when programmers were skinny and computers were fat.  Other such systems use their own versions of middleware to interact with networked devices.  Some systems are part of the IoT, which requires a “hub” connected to a WAN for integration of users’ devices and LANs with the cloud-based computers that make them do what they do.  Some operate directly within a LAN or from point to point, and others connect devices directly to the cloud via the internet.  The overall architecture of a given system 

     

    Fortunately, the community of developers of Alexa skills is huge.  There are already well over 100,000 skills, each vetted by Amazon and available through the Alexa app.  Admittedly, many of them do some pretty silly stuff, although I suspect that the lovers of what we consider silly stuff think that audio applications are wasting their bandwidth, to which I can only say à chacun son goût.

     

    In any case, those skills also live on cloud servers.  In the course of turning what you say into what you hear (or see or feel or whatever else you’ve asked your virtual assistant to make happen for you), the data representing your voice must get to the appropriate skill software located somewhere on that long strange trip from your lips to your ears, as guided by the Wizard of Amazon.  In reality, many of the skill sources probably use the AWS cloud too, but it’s hard to know which cloud is which and it’s irrelevant for most consumer applications.  

     

    It should be obvious that there’s a lot of A-D, D-D, and D-A converting going on to let your voice activate an outlet, make your music louder or softer, change the track, etc.  And I’m sure that the methods and equipment chosen by each provider affect the speed, accuracy, and versatility of the “assistant” you choose.  I’ll offer a few brief comparisons among Alexa, Siri, and services that link the two (so you can do things like use Alexa on an Apple Watch and control Samsung devices from an iPhone – I’m doing both).  The intentional thwarting of cross-platform use by direct competitors has real world performance consequences.  You can leap those barriers with an iron will, a creative spirit, and a little patience.  The world of voice control and touchless device-human interaction is in its infancy, and these issues will be overcome.  With closer integration, performance becomes better and better and the entry barriers will fall.  As they do, the attractiveness of voice control for audio will grow and more of us will adopt it over time.

     

    Here’s an example of the effort needed to make Alexa control JRiver.  To interact with JRMC, Alexa uses a skill called House Band from a developer named Philosophical Creations (a source of several skills, only one of which is for audio AFAIK).  And here’s where the primitive nature of voice control for audio starts to appear.  You first have to tell Alexa to use the skill she’ll need to complete your task for you.  You can’t just tell her to play Kind of Blue on JRiver – you have to start by telling her to “launch House Band”.  When she asks what you want to do next, instruct her to “ask House Band to play the album Kind of Blue by Miles Davis”.  The music actually plays in response to this, although for now you can only make JRMC play on the zones it recognizes – and it doesn’t recognize any Amazon devices because none is DLNA compliant.
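
The two-part phrasing described above (name the skill first, then say what you want it to do) can be sketched as a simple parsing step. This is only an illustration of the structure of the spoken request. It is not how Alexa's language understanding or the House Band skill is actually implemented, and the regular expressions, function name, and dictionary keys are all hypothetical.

```python
import re

# Hypothetical pattern for the phrasing "ask <skill> to <request>".
INVOCATION = re.compile(r"^ask (?P<skill>.+?) to (?P<request>.+)$", re.IGNORECASE)
# Hypothetical pattern for one request type: "play the album <album> by <artist>".
PLAY_ALBUM = re.compile(r"^play the album (?P<album>.+) by (?P<artist>.+)$",
                        re.IGNORECASE)


def parse_utterance(text):
    """Split an utterance into the skill being addressed and what it is asked to do.

    Returns None when no skill is named, mirroring the point in the text that
    a bare request is not routed to the skill at all.
    """
    match = INVOCATION.match(text.strip())
    if match is None:
        return None
    result = {"skill": match.group("skill"), "request": match.group("request")}
    play = PLAY_ALBUM.match(result["request"])
    if play:
        result["intent"] = "play_album"
        result["album"] = play.group("album")
        result["artist"] = play.group("artist")
    return result


print(parse_utterance("ask House Band to play the album Kind of Blue by Miles Davis"))
print(parse_utterance("play Kind of Blue on JRiver"))
```

The first request names the skill and yields an album and artist that could be handed on to the player; the second names no skill and returns nothing to act on, which is the behavior the extra "launch" or "ask" step exists to work around.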

     

    All that data transmission, activity and dependence on so many remote servers and services combine to make the early adopter curve for voice controlled audio quite steep – so interest has been weak.   This AS thread on Alexa and JRMC was started about 5 years ago by Arkonovs and got one (count ‘em – one!) reply.  Although it’s easier now than it was in Feb 2016 to use voice control, it takes enough effort and commitment to keep many from trying it.  In a true sense, the physical path from your voice to JRiver is analogous to the philosophical and emotional paths to adoption of this technology – it’s a long and winding road with a lot of bumps, and it’s still under construction.  But advances are happening rapidly, so let’s look at the major players to find the best of today’s breed.

     

    DOOR #2: THE NATIVE HABITAT OF THE BEST KNOWN DIGITAL ASSISTANTS

     

    Enter the smart speaker, an elemental concept that combines a microphone, a powered speaker, and a processor in a single device that’s listening for your commands 24/7/365.  Over the last few years, manufacturers have managed to take the assistant out of the box and stick him or her into a wide variety of devices that includes audio and video equipment.  We’ll get to those itinerants after we discuss the basic boxes in which they were whelped and weaned.

     

    Each assistant from one of the major players has his or her own little home. Alexa lives in the Echo or the Show, Siri lives in iStuff, and Google Assistant lives in Google and Nest hubs.  You have to buy at least one proprietary device to bring Alexa, Siri, Bixby et al home, and you have to have an account with the parent company for your new buddy to function.  Once you’ve bought a ticket to Oz, you can expand your assistant’s reach to other devices, like your PC or mobiles. But if you don’t already have the necessary account and you buy an appliance with an assistant in it, you’ll have to set up an account before you can use the voice assist function.  That means opening an Amazon account for Alexa, an Apple ID for Siri, a Samsung account for Bixby etc. There’s no way around this, even if you buy one tiny speaker for bedside music and wakeup alarms.  If you want into the wacky world of Alexa et al, you have to join their ranks.

     

    Alexa lives in thousands of products now and is also available as an app for your phone, TV etc.  Siri has a smaller but considerable array of homes away from home, starting with every iPhone and iPad in the world running any recent OS from Apple.  Bixby lives in a host of Samsung products from phones and tablets to sound bars and other audio products, and Google Assistant inhabits Nest smart products, Android TVs, and a growing list of other stuff in addition to the ever expanding line of Google-branded products. Some devices now come with 2 assistants, so you can use each for things the other won’t do.

     

    A full review and comparison of the major platforms for audiophile use will be a serious undertaking requiring a fair amount of equipment at considerable expense.  It’s certainly a worthwhile endeavor and I’m working on a plan to bring in enough stuff to do it well - I already have Amazon and Samsung platforms, plus a few Siri-loaded devices. So work is progressing, but it won’t be valuable to audiophiles without inclusion of the best smart speakers and their embedded assistants.  For that, I’m going to need assistance in securing enough units to evaluate and compare.

     

    DOOR #3: HOME THEATER

     

    Along with Alexa, Siri, and Bixby, I have 4 Samsung Smart TVs now, all with embedded media players. With a decent sound system, this is also a fine way to enjoy music – and it’s one of the easiest and most readily available platforms today for voice controlled home audio.  You can read a primer on HT for multichannel audio in my last AS article, “Entering Multichannel at the Ground Floor”.  Just find the header “CONSIDER THE HOME THEATER RECEIVER FOR MULTICHANNEL AUDIO” and start reading my suggestions for a few value-priced receivers with fine audio performance and enough flexibility to make most of us happy for both audio and HT. More and more of these come with an assistant built in.

     

    You can now buy decent AV equipment from Denon, Yamaha, Marantz and others with VC assistants inside.  Some limit you to their own choice, e.g. Alexa.  But others, like the $3300 Marantz 8015 receiver, work with “all the major voice agents” and do pretty much everything the average audiophile could want.  As an example, the 8015 streams major services, plays ALAC (Apple Lossless), DSD, FLAC & WAV files, has serious DSP, digital inputs (coax x2, optical x2, HDMI x7), a full complement of line level unbalanced outs, 3 audio zones, and MC capability (11.2) well beyond what most of us would ever use for home audio.  With HEOS wireless, BT, WiFi and ethernet connectivity, such devices are an integral part of your smart home and can be controlled fairly well with voice commands.

     

    Only a few smart TVs include good voice control.  Sony Bravia Smart TVs come with Alexa inside. All LG OLED, Super UHD and 4K UHD TVs with AI ThinQ® come with the Google Assistant embedded.  Amazingly enough,  Amazon and Google get along well enough in LG products to let them control Alexa’s devices as well.  The range of commands and possibilities grows daily, so LG is definitely worth watching closely despite their lack of audiophile electronics. Their TVs are easily integrated with almost anybody’s audio through intelligent controls and connections. 

     

    Cloud integration of diverse sources, processors, delivery systems, and local networks makes seamless use of devices from multiple vendors easier and more effective than ever.  We have to remember that the same Rube Goldberg touch that strings Alexa, House Band, JRiver, and your system devices together for pure audio is still required to add voice control to TV and HT setups whether you do it yourself or buy it that way.  For example, many 2017 and newer Sony 4k HDR TVs running Android OS can be updated to add Alexa, who will then use skills to fulfill your desires and to do anything else Alexa can do with whatever devices and systems you already have.  This is where those 100k+ skills come in, as you can activate any or all of them to let Alexa control your entire home, whether she lives in your TV, your thermostat, your phone, or your car (yes, your car – but auto audio’s another article in the works).

     

     

    image23.jpeg

     

    Any audiophile in the market for a good HT system should consider a voice-responsive device.  These are the easiest and most consistently accurate approach to voice controlled audio available today.  But, as with everything we love, the technology is advancing quickly enough to make obsolescence likely sooner rather than later, especially for those who pursue the state of the art.  SQ is already excellent in the better HT receivers, but voice control is still primitive and will undoubtedly be faster, more accurate, and more versatile every time you turn around for the next few years.

     

     

    NOT YET READY FOR AUDIOPHILES:  SMART HOMES AND HOME HUBS

     

    The concept of a smart home is not new.  With networked devices, you can control everything from entry locks and HVAC to lighting and curtains with your voice.  You can tell your assistant to feed the dog and warm up your espresso machine through your phone while driving home from work.  And the same systems offer varying degrees of control over your audio system.  However, I know of no existing  home hub or other smart home approach that’s capable of controlling all the functions of even the simplest legacy 2 channel stereo system.

     

    Some smart devices can be controlled entirely within your home, but most require a hub with an internet connection because of all the data streams and systems integration needed to execute a simple voice command. The hub can be integral to a single smart device on the LAN, distributed among all devices on the LAN, or embedded in a stand-alone hub device.

     

    The big guys each offer their own home hubs.  In some systems (like Amazon’s), the function of a hub is distributed among one or all devices that carry the digital assistant – no standalone device is needed.  All of the controlled devices are on your WLAN and all communicate with their cloud server(s) through your network router to the internet.  In others, like Samsung’s, you need a dedicated home hub connected to your LAN via ethernet and connected to your devices by WiFi.

     

    Interestingly, the brand new generation of Amazon smart devices includes an embedded Zigbee hub.  Zigbee is an old line (at least in the IoT world) global organization that establishes standards for wireless communication among devices on the IoT and certifies compliance of hard and soft wares for such use.  Members of the Zigbee Alliance include serious players like Apple, Google, Comcast, Lutron, Huawei, Samsung, TI, and many others.  There are about 150 current nonmember participants and about the same number of adopters.  But none of the audio industry seems to be involved, apart from big guys who include audio in their product offerings, e.g. Panasonic and Toshiba.

     

    Google hubs have been sold in both Google and Nest devices, and to tell you the truth I can’t keep track of which is which.  A Google hub is the portal through which the Google Assistant makes stuff happen on your Google-enabled devices.  And, like Alexa, that hub function is distributed across all the devices in which the Google Assistant lives – no standalone device is necessary to perform the hub functions.

     

    Similarly, Apple’s home hub function comes in multiple devices, e.g. Apple TV, iPads, and the Apple HomePod (which is another smart speaker device similar to the Amazon Echo products).  But unlike Alexa, who demands no specific hub setup beyond using the Alexa app, Siri does require that a hub be set up on one of the above devices.  If you choose to use an iPad as the Apple home hub, you have to leave it at home, powered on and connected to your WLAN 24/7/365.  Then you can use any of the big collection of Apple HomeKit devices from multiple manufacturers in your smart home.

     

    Unfortunately, smart home hubs are not yet up to controlling audio systems directly, completely, and well.  As part of my research for this article, I now have Amazon, Samsung, Google Home, SmartLife, Smart Things, Zigbee, Z-wave, TP-Link, and Kasa hubs set up to control anything in our home that makes a sound, moves, emits energy, or does anything else but sit there.  None of these is worth the cost and effort purely for audiophile use. The biggest contribution to our audio enjoyment from home automation per se is the ability to turn things on and off.  This does not include smart speakers, which we’ll discuss again below as renderers and control points.  In these roles, there’s good reason to consider them.

     

     

    MEDIA HUBS

     

    Although I haven’t bought any, there are standalone hubs that will integrate multiple devices and platforms into voice controlled media playback in your home.  Logitech’s Harmony Hub is one such device, and it’s apparently quite fine – but it’s also expensive enough to keep me from buying one just to evaluate it for this report.  Logitech is a long time player by now, and their products have always been good enough or better when I’ve used them. Their computer peripherals are great value - I use several of their wireless keyboards and mice, and I love their Digital Crayon on my iPad.  So I suspect their hubs are pretty fine.

     

    But using a Logitech hub for voice control of your audio system is another daisy chain of independent systems and functions.  And it’s easy to confuse one of those systems for another, e.g. if you use Alexa to control the hub, is the palette of functions available to you determined by Alexa or by the Harmony Hub?  Their hubs integrate with Alexa and Google Assistant, in that you can ask either assistant to tell the hub what you want done.  But the hub can only do what the hub can do, even if Alexa or GA can do more on other devices with other skills.  

     

    The Harmony Hub can be linked to Alexa using the Harmony skill, but the hub’s dedicated remote control will not work if you set up another device to accept your vocal commands (like an iPhone, an Echo or other smart speaker).  Instead, you have to set up Harmony Express as a video service provider – and if you do that, you can only use one such remote at a time.  Does it all work?  Yes.  Does it work as well as we’d want it to?  No.  It’ll be there some day, but it’s still primitive and I do not recommend this approach for other than an educational experience and an introduction to what will someday be great.

     

    Another sign of less than seamless integration is the simple disclaimer from Logitech that “[t]he Express Integration skill has commands which may differ from other Harmony skills and does not support the use of friendly Favorite channel names or Alexa Routines”.  In other words, you get a few from column A and a few from column B – but Logitech makes that choice for you.

     

    Like virtually all other products in this arena, everything from corporate relations to device compatibility is constantly changing because business and technology often pursue disparate paths.  Features are introduced, refined, removed, and altered frequently as the industry tries to find out what we want and what we’ll pay for it.  Like most others, the Harmony Hub website clearly describes the fluid state of interactivity with the classic disclaimer:  “Supported devices and brands are subject to change without notice”.  Yes, Virginia, you may wake up tomorrow to find that your beloved AV receiver no longer responds to your trusted assistant – this happened with the Blackberry Assistant, early versions of Google Now, and others.

     

    Then there’s Caavo, a voice controlled home hub that integrates multimedia control of “everything connected to your TV”.  Its voice controlled remote uses their proprietary system, but the box is supposed to work with Alexa and Google Assistant much as the Logitech Harmony Hub does.  This relationship is apparently still in the early courtship stage – one Verge reviewer puts it right out there: “Caavo also has integrations with Alexa and Google Assistant, although these are pretty hit or miss. I was never able to make the Alexa integration work at all, and the Google Assistant integration was so spotty I stopped trying after a while”. ‘nuf said.

     

    The Caavo gets decent reviews for general TV and HT use, but even for plain vanilla use it’s far from refined.  From the Verge review, the Caavo “... isn’t perfect, by any means [but]it’s the first remote I’ve used that even attempts to build a new foundation for how all the stuff connected to your TV should work together, and it’s a no-brainer if you’re juggling between a cable box and streaming devices connected to your TV”.  So, once again, the potential’s there and we’ll probably love the 3rd or 4th generation.  It’s only a matter of time before successful integration of what are now independent functions and products will make today’s voice control seem as primitive as Audrey (remember her?).

     

     

    SMART SPEAKERS

     

     

    image24.jpeg

     

     

    The world of smart speakers / devices and voice control for audiophiles is about to explode over the next few years.  The little ones are getting better with each new version – the current Echo Dot is a better and smarter speaker than the first generation big Echo was, and the Apple HomePod is good enough for many of us to enjoy in secondary settings like offices and background settings right now.  Like humans, each generation of smart speakers is better than its parents. I’ll provide an overview of the most popular devices, systems, and “assistants” a few paragraphs further down, and I’m planning a true comparison of the top ones just as soon as they’re good enough to be taken seriously (which will not be far in the future, if I read the tea leaves right).  But first, let’s get down to brass tacks.

     

    image25.jpeg

    There are smart speakers at all price points from $30 to over $2k, with decent products available from B&O (for $2250), HK, Bose, Sonos, Sony, Audio Pro, Devialet and others.  Most come with Alexa, Google Assistant, and/or Siri – but most smart speakers aren’t any smarter than the dumb ones when it comes to audio.  They can turn up your lights, lock your doors, start your car, and order dinner – but for audio you’re limited to the same palette of skills regardless of the device in which your assistant lives.  But to be honest, even if the smart speakers from B&O have the IQ of an Echo Dot, they are absolutely gorgeous (see below).  If they sound as fine as they look, I could actually see buying them.

     

    A comparison of smart speakers as audio equipment is beyond the scope of this article, both because it’s a large scale work in itself and because assembling a collection of them requires either a large budget or industry connections, neither of which I have.  I’m working on a way to audition the better ones and hope to be able to do this in the not too distant future.  The $300 Apple unit has the best reviews for SQ among the big 3 (Amazon, Google, and Apple), although it seems like there are new models with expanded quality and capabilities coming out daily.  I haven’t seen any reviews yet of the latest $300+ units.

     

    I’ve heard many of the better devices from multiple makers (although I haven’t heard truly top line pieces like the newest $3k B&O).  Of those I have heard, none will replace either of my main systems.  In fact, none could even replace my desktop studio system (iFi Nano DSD into JBL 305s) for overall balance, presentation, articulation, clarity, and general sound quality.  I could happily live with several of the better smart speakers in multi-room use, for background and casual listening – and, in fact, I do.

     

    When you throw in voice control, which I did about a year ago, it’s easier to enjoy those smart speakers in every room and overlook their sonic shortcomings.  Add the ability to get Alexa to both control JRiver and play to Echo devices via BT, and your music becomes a constant companion responsive to conversational input.  Just say “Alexa, tell House Band to play Kind of Blue” and Miles emerges from the speakers in your selected JRMC zone(s).  It really is a wonderful advance in my enjoyment of music at home, and I use it every day while writing, researching, reading etc.  You should try it!

     

     

    A FEW WORDS ABOUT SONOS & BOSE

     

    image26.jpeg

    Sonos pioneered wireless multi-room audio in 2002, although it was more concept than execution when they started.  The technical limitations were huge, e.g. dial-up AOL was the most popular ISP in the US, WiFi was in its infancy, and there were no WiFi drivers in Linux.  There was no hardware and no software to do what they wanted to do.  But the concept was as simple & brilliant as their first ads for it.

     

    And they created what I believe was the first true multi-room wireless audio network on the market, shipping their first product in 2005 IIRC.  There was growing industry interest in WiFi for home media distribution when they launched – Jobs introduced the Airport Extreme (on the blazingly fast 802.11g standard!) about a year before Sonos sold product #1.  But ironically, it was Apple who enabled true mobile control of Sonos by introducing the iPhone in 2007 and the App Store in 2008.  The Sonos app that soon followed let us use an iPhone for control, and the modern era of multi-room wireless audio began.

     

    Equally ironic is the fact that Amazon then made voice control of Sonos systems easy and practical by giving us Alexa, for whom there is a Sonos skill similar to the House Band skill used to control JRMC.  You can buy Sonos products in every guise from a small single smart speaker to sophisticated amplifiers and other electronics to drive your own speakers – and you can control every basic function with your voice.  Here’s a summary of the things Alexa can make Sonos do for you.  And here’s a link to info on the many Sonos devices available today.  I’m told (but haven’t confirmed) that the Google Assistant is now also usable with both.  But as far as I know, there’s still no way to get Google Assistant to control JRMC or other SW media player – so Alexa is the clear choice for them today.

     

    Sonos and Bose are probably the best known and most popular brands of smart speakers and systems from the audio industry (OK – maybe I’m being a bit generous to them both in calling them audio products).  Most audiophiles have heard of Sonos and Bose – and few (at least few that I know) would consider either for anything but background music.  Almost all think they’re too expensive for that use.  After listening to several of them, I don’t disagree about value received (although all of the smart speakers and associated products from the audio industry cost too much in my opinion).  

     

    You do get audio components that look less like audio components when you spring for Bose, Sonos, and others of their ilk – but that’s probably not a big consideration for most audiophiles.  Even if you or your significant other is sensitive to interior décor, the pedestrian Amazon Echo, Google Home, and Apple smart devices are as easy to camouflage as anything I’ve seen from the audio industry.

     

    Sonos equipment sounds better to me than the equivalent Bose models for about the same price.  A Bose 500 smart speaker and a 700 sub will set you back about $1k.  Even the entry level Sonos smart speaker is $200, which is coincidentally now the price of the entry level Bose too.  I can find no actual performance spec for either one – amplifier power and performance are a mystery for both, and there are no specs at all on which to base a “paper” evaluation.  From the limited listening comparisons I’ve been able to do in friends’ homes, Sonos is more pleasing to me.  But I haven’t heard either line’s latest and best products.  To be honest, the Amazon Echo Studio and an Echo Sub ($330 list for the pair) sound as good or better to me than any other smart speaker I’ve heard at any price.

     

     

    THE BOTTOM LINE:  WHAT I USE, PLUS A BIT ABOUT VOICE CONTROL OF POWER

     

    I’m a “balanced” early adopter – I want to try everything, but I only buy into new technology when the price no longer outweighs my desire to try it.  And my wife is what I call an adopter of convenience – if something will make her life easier and I can convince her of that fact (with the latter being the bigger hurdle), she’s all in.  The key is that it has to appeal to a complete technophobe who has ten thumbs, deep anxiety about using new things, and strong historical resistance to any ideas generated by me.

     

    The need to please someone else figures strongly in the planning of many audiophiles.  My wife, like the spouses and significant others of so many of us, simply won’t go near most of our equipment because she believes it’s too hard for her to use.  Sometimes she’s even right, although she (and probably most other SOs around her) can do a lot more than she thinks she can or wants to learn. As soon as video projectors became home sized and home priced, I was ready to try one.  Of course, she claimed to prefer a big flat TV to a projected image despite never having seen a home theater of any kind.  I finally dragged her to a showroom while we were out buying something she wanted, and a single glance at the 8’ image on the wall changed her mind.  

     

    Now there’s an Amazon Echo Dot in every room of our home except the bathrooms – and there are stereo pairs in the living room and master bedroom.  She’s thrilled to be able to say “Alexa, play music by Cat Stevens” and “Alexa, make it louder”, and she’s so spoiled by it that she’ll never use hardware controls to listen to music again.   With devices in several rooms, we also discovered (accidentally) that adding a sub in one central location really improves SQ everywhere (since the lows are nondirectional). 

     

    The current crop of virtual assistants is willing to learn, but some are smarter than others when it comes to our audio systems.  We adopted the lovely Alexa when the first Echo Dots came out, and we now have stereo pairs in the living room and our bedroom plus singletons in kitchen and den.  She turns both the living room and den audio systems on and off through their power strips.  This is particularly helpful with the LR system (on a small rack on the floor, under the grand piano against the back wall).  I can’t tell you how wonderful it is to say “Alexa, turn on the stereo” instead of crawling under the piano to turn on the tube electronics – the back of my head has healed completely.  She also controls some home appliances, most importantly my espresso machine. 

     

    A smart outlet does not degrade SQ in any way in my systems to my ears.  I have power cables on the DAC / preamp and the power amp that would choke a horse, and I’ve compared the system with and without the smart outlet often enough to be certain that it’s sonically transparent to me.  I’m using Samsung receptacles because I started my home automation efforts with a Samsung Smart Hub (more about which later).  I’m not sure I’d go Samsung again, but we already had 3 Samsung “smart TVs” (which is something of a misnomer, again to be discussed later) so I figured I’d stick with one system for reliability and simplicity.  It ain’t necessarily so.

     

    As stated earlier, I’ve tried multiple platforms for voice driven home automation and audio control in the course of researching this article.  Alexa, Google Home, Kasa, TP-Link, Samsung and the rest do an equally fine job of power control.  Be aware that most smart outlets and power strips are 15A devices or less.  There are some excellent 20+A units that work fine – just be sure you buy the capacity you need or you’ll be in for a spot of bother.  I can recommend ConnectSense, whose products are well made but a bit pricey ($100 for the in-wall 20A duplex).  
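
If you're not sure what capacity you need, the arithmetic is simple enough to sketch. The snippet below is a back-of-the-envelope estimate, not an electrical design tool. The wattages are made-up placeholders, the 120 V mains figure assumes North American power, and the 80% headroom is a common rule of thumb for continuous loads that is adopted here as an assumption. It also ignores turn-on inrush current, which can be substantial with big power amps.

```python
# Hypothetical nameplate power draws in watts; substitute the figures
# printed on the back of your own equipment.
EXAMPLE_SYSTEM_WATTS = {
    "tube power amp": 450,
    "DAC / preamp": 40,
    "streamer": 15,
}


def total_current_amps(loads_watts, mains_volts=120.0):
    """Approximate current draw, treating every load as resistive (I = P / V)."""
    return sum(loads_watts.values()) / mains_volts


def outlet_is_adequate(loads_watts, outlet_amps, mains_volts=120.0, headroom=0.8):
    """True if the estimated draw stays within `headroom` of the outlet's rating."""
    return total_current_amps(loads_watts, mains_volts) <= outlet_amps * headroom
```

With the placeholder figures above, `outlet_is_adequate(EXAMPLE_SYSTEM_WATTS, 15)` comes out true with room to spare. A single 1800 W load on the same 15A outlet would not pass, which is where the 20A units earn their keep.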

     

    Aeotec offers a full line of smart products based on the Z-Wave system (another hub-based IoT platform like Kasa, TP-Link, Samsung etc).  Their 40 amp smart switch is hard wired between the branch circuit and the device being controlled, but they also offer plug-in smart outlets and a host of Z-Wave system devices.  Aeotec also offers a strong differentiating extra that I consider a sustainable competitive advantage (at least, for now)  with great appeal for audiophiles: the ability to create and host an automation hub entirely within your LAN.   This enhances security and makes a lot of sense to me.  The Aeotec Z-Stick 5+, which is the device that lets you do this, works very well with a Raspberry Pi 4, so I’m exploring design of an audio control system using this combo.

     

    SECURITY CONCERNS WITH VOICE CONTROL, THE IoT, AND WiFi IN GENERAL

     

    From the day I posted my AS survey on audiophile interest in, experience with, and concerns about voice control for our systems, I started getting emails and messages of concern over security.  This is a valid and major issue for anyone whose digital footprint extends beyond the case of the device he or she is using.  Your privacy and security are at risk whenever you’re on the internet (no matter how you connect to it), on a LAN or WLAN, or using Bluetooth or any other form of communication with potential exposure of content and/or access to your devices / network.

     

    Everything on the IoT is potentially vulnerable, as are the LANs and networked devices to which each has access.  Every device with an energy based entrance pathway is potentially vulnerable, be it sound energy, light energy, RF, or even thermal.  Smart door locks, auto systems, and home security systems are regularly hacked and defeated unless protective precautions are taken.  

     

    Yes, Alexa and Siri hear everything you hear.  They also access your email, messages, contacts, and any other information on your devices and networks to which you don’t actively deny them access.  And there have been some serious issues.  So-called smart technology has been oblivious to some serious security threats over the years. Anyone could pick up a locked iPhone 4S, launch Siri by pressing the “home” button, and gain control of the phone through voice-activated commands.  Alexa and Google Assistant have their own issues, as do WiFi, Bluetooth, and every other form of connectivity.  And if there’s a way in, someone will find and exploit it.

     

    You can protect yourself and your devices with a little effort.  The basics are easy to describe but often overlooked, starting with setup.  DO NOT JUST ACCEPT ALL DEFAULTS!  Read everything before clicking anything.  Deny your digital assistant access to anything that won’t help it do what you want it to do.  Email access is unnecessary for digital assistants unless you want to send emails using only voice control. If you have financial information on any networked device, your digital assistant does not need access unless you plan to buy things with your voice (which would be inconsistent with serious concern for security issues).   We don’t want Alexa to do anything for us but control the devices we use regularly – so she can’t, because she has no access to those apps and data sources.

     

    image27.jpeg

    You can disable the microphones in smart devices, and you can delete all recordings made by them using the parent app.  There are now devices that will physically block or disconnect the microphones on smart speakers, e.g. the Paranoid Home pictured here:  

     

     

    If you give a credit card number over the phone within earshot of a smart device, it will record what you say and send it to the parent servers.  So don’t do stuff like that!  Remember that most digital assistants use your WLAN for all communication, so you also have to secure your networks and other networked devices by doing the following and more:

     

    • maintain all software and firmware with current updates – these include security patches
    • use your router’s firewall; many ISPs include good security – check to see what you have
    • do not use any default names for networks or any other digital entity
      • change every one of them to complex names devoid of personal names, locations etc
      • if possible, change the default administrator account name from “admin” etc
      • change your LAN’s IP range from its default to a common alternative (e.g. 10.0 to 192.168)
        • this can hide your router’s brand and model, making it harder to hack
    • use strong account passwords and change them regularly
    • use multi-factor authentication whenever available
    • encrypt everything you can
    • use strong WiFi security with complex access keys
      • do not broadcast your SSID
      • turn off WPS
    • keep your WiFi router toward the center of your house and away from windows
    • use fixed IP addresses for every device that allows it
    • disable DHCP on your networks, leaving only as many available IP addresses on your subnet as you need for devices that won’t let you set a fixed IP
    • disable remote access to your network
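
The addressing items in the list above (a non-default private range, fixed IPs, and a tiny leftover pool for devices that insist on DHCP) can be planned on paper before touching the router. Here is a minimal sketch using Python's standard `ipaddress` module. The `plan_addresses` function, the device names, and the convention of giving the router the first host address are all assumptions for illustration. It is meant for small home subnets such as a /24, and it changes nothing on your network.

```python
import ipaddress


def plan_addresses(subnet, static_devices, dhcp_slots):
    """Lay out fixed IPs and a minimal DHCP pool inside a private subnet.

    subnet         -- e.g. "10.20.30.0/24" (intended for small home subnets)
    static_devices -- names of devices that accept a fixed IP
    dhcp_slots     -- how many addresses to leave for devices that don't
    """
    network = ipaddress.ip_network(subnet)
    if not network.is_private:
        raise ValueError(f"{subnet} is not a private address range")

    hosts = list(network.hosts())
    # Assumption: the router takes the first usable address in the subnet.
    router, usable = hosts[0], hosts[1:]
    if len(static_devices) + dhcp_slots > len(usable):
        raise ValueError("subnet is too small for this plan")

    # Hand out fixed addresses in order, then reserve the next few for DHCP.
    static_map = {name: str(addr) for name, addr in zip(static_devices, usable)}
    pool = usable[len(static_devices):len(static_devices) + dhcp_slots]
    return {
        "router": str(router),
        "static": static_map,
        "dhcp_pool": (str(pool[0]), str(pool[-1])) if pool else None,
    }
```

Calling `plan_addresses("10.20.30.0/24", ["stereo outlet", "Raspberry Pi", "TV"], 2)` gives each named device its own fixed address and leaves a two-address DHCP pool right after them. You would then enter those values by hand in the router's admin pages.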

     

    As our audio equipment becomes networked on the IoT, it joins cameras, refrigerators, dishwashers, Cadillacs, etc on a web of vulnerability.  If you take great pains to protect yourself, voice control for audio is far less risky than buying on line.  Just be careful.  Here’s some great information on securing your digital self found on the FTC’s website and here’s the FTC website on protecting your IoT devices. ‘nuf said!

     

     

    THE CUTTING EDGE

     

    I love voice control for daily listening control.  My wife and I both use it happily all day every day, although we occasionally still feel a little silly talking to a device.  I’m having great fun trying to develop a system that integrates voice control with artificial intelligence and translating its digital outputs into actions in my own audio systems.  As you might expect from my prior articles, I’m using a Raspberry Pi as a development mule and focusing on controlling a simple music player completely and accurately.  It’s not working yet, but I will prevail!

     

    I also set up my Apple Watch for voice control.  I can now manage JRiver using an app called Voice in a Can.  As you know, these digital assistants can be pretty snooty – Alexa won’t talk to Siri, and the Google Assistant doesn’t even acknowledge either of the ladies in many systems and contexts.  As you might expect, there are now apps that can open communication channels among our faithful digital staff – and Voice in a Can is one of them.  

     

    The easiest way to describe what it does is that (metaphorically speaking) it’s a simultaneous translator between Siri and Alexa.  So I can now do everything by talking to my watch that I can do by addressing Alexa in one of her residences.  Voice in a Can lets Siri tell Alexa to open the Alexa skill named House Band so it can tell JRiver to play whatever I want to hear through the zone of my choice.  FWIW, I can also set my wake-up alarm, start my espresso machine, or turn the stereo systems in the living room and library on and off by asking my watch.  Yes, this is a ridiculously complex chain of events – but it’s a start, and major advances in functionality and simplicity are just around the corner.  I really do love it!

     

    The future is bright for voice control for the audiophile.  It’s only a matter of time before the daisy chain of VC / AI / middleware / actionable output is integrated into simpler and more efficient systems.  I mentioned josh (with a lower case j) early in this piece.  This is a system aimed at the high end market for comprehensive voice control using AI, and it’s probably the tip of that iceberg.  josh seems intent on eventually doing everything you ever want to do with no input other than vocal.  MAVIS (Multimedia Audio Visual Interface System) is another advanced approach to voice control.  Here’s a web page about some of the things you can do with MAVIS right now, and audio editing is among them. You can download a MAVIS app called “Connect the Dots” from the iStore and play with it as an introduction.

     

    Be of good cheer regarding voice control for audiophiles – it’s coming and it’ll be fantastic.  Like the electric car, it has a few limitations that won’t be overcome until technical solutions are found to get around downstream barriers.  But once it matures a bit, I believe it will become integral to good audio equipment.  One reason I say this is that the switches and other hard controls now used in most electronic devices are simply not as solid and reliable as the ones we took for granted in analog equipment.  Bubble switches feel cheap, break often, and look tacky.  Tiny digital displays fail frequently and annoyingly.  Many remote controls for otherwise excellent products are flimsy, tacky, and imprecise.  Being able to converse with a DAC will be far preferable to trying to read the half of its display that still lights up.  

     

    I hope you enjoyed this and found it worthwhile to read.  Stay safe and enjoy every minute of every day!

     

     

     




    User Feedback

    Recommended Comments



    1 hour ago, The Computer Audiophile said:

    This is fantastic @bluesman

     

    I've been following josh.ai for a while now. I think josh is the company to watch in the high end space for sure. 

    Josh is cool, for sure - but they seem to be using current methods and tools to achieve something for which current methods and tools are not ideally suited.  Unless they're the ones to come out of the garage or basement with the next big thing in AI coding and output modalities, they'll be looked on as crude when someone else finally succeeds.  I know of no current platform that integrates the various functions necessary to achieve accurate and efficient voice control over devices in disparate systems, including what I know of Josh (which admittedly isn't a lot at the design level).

     

    Right now, there are too many data bouncing 'way too far over 'way too many jury-rigged networks to do this smoothly.  And platform integration is not in the cards for an industry that profits largely from differentiation, so there's not likely to be one approach shared by all. This is a lot like the world of electronic medical records.  The most they hope for is "interoperability" - and that has us unable to share data universally across all healthcare institutions plus payers and the scientific community.  All Epic users can share their data if they wish, as can users of several other major EMR platforms.  But these programs aren't written in the same languages and they run on different architectures.  So if your hospital is on Epic and the one where you ended up unconscious in the ER because you fell off the train is on Cerner, you're out of luck unless you carry your medical records around on a USB drive or a CD.

     

    It's a lot like needing home hubs for Z-wave, Zigbee, Google Home, Samsung SmartThings, SmartLife, and Apple Home because you have a few devices that work on each platform.  There are several smart speakers that require you to control some audio functions from Alexa and some from Google Assistant - this is no way to run a railroad.  As I was just saying to my watch, "Siri, tell Alexa to open House Band; Siri, tell Alexa to tell House Band to tell JRiver to play music by Wayne Henderson in the master bedroom; Siri, tell Alexa to make the music louder."
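    To show how many hand-offs are buried in one of those spoken commands, here is a minimal Python sketch that unwraps the "tell X to ..." layers. It is purely illustrative: `unwrap_relay` is a hypothetical helper, and none of the assistants or skills named here actually parse commands this way as far as I know.

```python
import re

# Matches one relay layer: "tell <someone> to <everything else>"
RELAY = re.compile(r"^tell (.+?) to (.+)$", re.IGNORECASE)

def unwrap_relay(command):
    """Split 'Siri, tell A to tell B to <action>' into
    (wake_word, [relays in order], final_action)."""
    wake, _, rest = command.partition(",")
    rest = rest.strip()
    relays = []
    while (match := RELAY.match(rest)):
        relays.append(match.group(1))   # who gets told next
        rest = match.group(2)           # what remains to be passed along
    return wake.strip(), relays, rest

wake, relays, action = unwrap_relay(
    "Siri, tell Alexa to tell House Band to tell JRiver to play music "
    "by Wayne Henderson in the master bedroom"
)
print(wake, relays, action)
```

    Three relays sit between the wake word and the music in that one sentence, which is the "ridiculously complex chain" in a nutshell.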


    Good article. I use five Harmony Hubs (only $70 each at Amazon) at home, each tied to a different Gmail and Amazon account. This enables me to voice control one of the most annoying functions for my wife and visitors:  how to turn on a system and switch it in a particular room for watching TV, or listening to 2-channel or multichannel music. There is essentially no audiophile product I’ve purchased that this setup can’t simplify because Harmony has such a deep database of electronics products.  The separate accounts allow asking Alexa or Google to turn on stereo, for example, and it will only impact the system in that particular room without turning on stereos throughout the house. 
     

    I tried HouseBand with JRiver early on and found it frustratingly difficult to get it to work. Sounds like I should give it another go, perhaps tied to my Apple Watch. JCR 


    39 minutes ago, jrobbins50 said:

    Good article. I use five Harmony Hubs (only $70 each at Amazon) at home, each tied to a different Gmail and Amazon account.

    Thanks!
     

    Your willingness to use multiple accounts and devices to “integrate” functions says that you’re flexible and adventurous, like me.  But we’re in the minority by far - most people would think we’re a bit daft to go that far.......and we shouldn’t have to.  I believe it won’t be long before there are more universal platforms and approaches available to us.  But until then, let’s stretch the envelope to see how much it can hold 😁


    Very professional article! Piqued my interest so did a bit of Googling:

     

    | System | Platform/Hardware | Notes |
    |---|---|---|
    | aido | Robot with GUI | cameras, multiple CPU's and GPU's, not available yet, price? |
    | athena | | Open source software project written in Python |
    | bixby | Samsung mobile devices | Samsung's Google assistant |
    | hound | Automotive | |
    | jibo | Robot for healthcare and education | |
    | josh | | interfaces in posh homes to Lutron lighting, Sonos, Crestron thermostat, home security, smart TV's, home theatre |
    | mycroft | Runs on Raspberry Pi etc. | Open source software project (Python?), Mk I was $180 now sold out, Mk II coming soon |
    | ubi ucic | Runs on Android and Linux | Ubi Kit is free for developers, supports Google assistant and Alexa |
    | viv.ai | | Viv is an artificial intelligence platform - intelligent personal assistant software created by the developers of Siri, bought by Samsung. Now to be integrated into Bixby 2.0 |


    3 hours ago, blue2 said:

    Very professional article! Piqued my interest so did a bit of Googling:

     


    Thanks for your time & comments!  There's so much potential here that I'm amazed the audio industry hasn't recognized how important VC & AI are for development and sales of future products.  We all experience mechanical controller failures in everything we use, from audio to cars to coffee machines.  Tiny touch screens, bubble switches, touch sensitive controls etc are only pseudoelectronic - they still have physical parts that fail too often.  We could eliminate most of those last century pieces and concepts by integrating excellent voice recognition and synthesis with AI. Imagine no more noisy pots, no cracked or dented bubble switches, no broken or lost knobs, minimal internal wiring, etc.

     

    Then imagine being able to control and monitor every audio parameter of interest to us in real time using voice input and synthesized voice response.  Throw in AI's ability to monitor real time performance and identify impending failures by detecting as yet inaudible changes in everything from voltage & current stability at various points to distortion to early ID of asymmetry in channel outputs.  In addition to telling your system what you want to hear (and how and where and when...), you could ask for a status check and get a verbal response plus a downloadable log report.  You could set up spontaneous verbal warnings when voltage, temperature, and other metrics go out of spec.  You could alter or switch amplifier operating characteristics to A-B changes in SQ.

     

    How about a status report when powering up, e.g.  "Good morning, Bob - your system is in perfect operating condition and ready to play"?  Get vocal alerts as needed - "THD in your left channel has increased to 105% of the right channel.  Diagnostics show early failure of V4 with no other abnormality.  Replace tube."
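    A rough Python sketch of the kind of checks behind alerts like these follows. Everything in it is assumed for illustration: the metric names, the spec limits, the 105% threshold and the readings are made up, and a real system would need actual measurement hardware feeding it.

```python
from dataclasses import dataclass

@dataclass
class Limit:
    low: float
    high: float

def check_metrics(readings, limits):
    """Return one alert string for every reading outside its spec window."""
    alerts = []
    for name, value in readings.items():
        limit = limits.get(name)
        if limit is not None and not (limit.low <= value <= limit.high):
            alerts.append(f"{name} is out of spec at {value}")
    return alerts

def channel_imbalance(left_thd, right_thd, threshold_pct=105.0):
    """Compare left-channel THD to the right channel, as a percentage."""
    if right_thd <= 0:
        raise ValueError("right channel THD must be positive")
    ratio = 100.0 * left_thd / right_thd
    if ratio >= threshold_pct:
        return (f"THD in your left channel has increased to "
                f"{ratio:.0f}% of the right channel.")
    return None

# Hypothetical limits and readings.
limits = {"heatsink_temp_c": Limit(0, 75), "rail_voltage_v": Limit(47, 49)}
readings = {"heatsink_temp_c": 82.0, "rail_voltage_v": 48.1}
print(check_metrics(readings, limits))
print(channel_imbalance(0.22, 0.20))
```

    The strings these functions return are what a speech synthesizer would read aloud; the same values could be written to the downloadable log.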

     

    This is cool stuff!  I can't wait to play with it all as it develops.


    25 minutes ago, bluesman said:

    Thanks for your time & comments!  There's so much potential here that I'm amazed the audio industry hasn't recognized how important VC & AI are for development and sales of future products.  We all experience mechanical controller failures in everything we use, from audio to cars to coffee machines.  Tiny touch screens, bubble switches, touch sensitive controls etc are only pseudoelectronic - they still have physical parts that fail too often.  We could eliminate most of those last century pieces and concepts by integrating excellent voice recognition and synthesis with AI. Imagine no more noisy pots, no cracked or dented bubble switches, no broken or lost knobs, minimal internal wiring, etc.

     

    Then imagine being able to control and monitor every audio parameter of interest to us in real time using voice input and synthesized voice response.  Throw in AI's ability to monitor real time performance and identify impending failures by detecting as yet inaudible changes in everything from voltage & current stability at various points to distortion to early ID of asymmetry in channel outputs.  In addition to telling your system what you want to hear (and how and where and when...), you could ask for a status check and get a verbal response plus a downloadable log report.  You could set up spontaneous verbal warnings when voltage, temperature, and other metrics go out of spec.  You could alter or switch amplifier operating characteristics to A-B changes in SQ.

     

    How about a status report when powering up, e.g.  "Good morning, Bob - your system is in perfect operating condition and ready to play"?  Get vocal alerts as needed - "THD in your left channel has increased to 105% of the right channel.  Diagnostics show early failure of V4 with no other abnormality.  Replace tube."

     

    This is cool stuff!  I can't wait to play with it all as it develops.

    Agree 100%
     

    It’s all cool stuff and it’s helpful (system status alerts etc...). Getting cool and helpful, not gimmicky is key. 
     

    I’d love to change a setting without navigating an endless menu that I haven’t used in 6 months. VC makes it easy. 


    Quote

    Google devices will play 24/96 FLACs, and their high end devices have embedded Chromecast so they can be used as DLNA zones in Jriver.  If you want to have voice control over JRiver playing through a Google device as a zone, you’ll have to add the 

    The last sentence needs an ending.

     

    Note that you don't need JRiver to use DLNA with Chromecast Audio. I use my CCA devices with two QNAP controllers (Music Station and QMusic), and with BubbleUPNP. I believe there are others too (maybe MConnect and Kazoo?). No voice control though, which I don't care about. 

     


    1 hour ago, audiobomber said:

    The last sentence needs an ending.

     

    Note that you don't need JRiver to use DLNA with Chromecast Audio. I use my CCA devices with two QNAP controllers (Music Station and QMusic), and with BubbleUPNP. I believe there are others too (maybe MConnect and Kazoo?). No voice control though, which I don't care about. 

     

    Whoops!  Chris was having some problems with the formatting of the original document I sent (a conversion from odt to docx).  When I converted it to a pdf for him, I must have converted the wrong draft.  Here's what it should have said:

     

    "If you want to have voice control over JRiver playing through a Google device as a zone, you’ll have to use Alexa. If she’s not sharing a device with the GA, you can link her from an Amazon device or a third party host using an app like Helea Smart."  [Chris, if you can drop this in, it will save others the irritation of the typo.]

     

    I'm sorry if I gave the erroneous impression that you had to use JRMC in order to cast to CCAs in general.  My point was that if you use Google smart speakers but want to have voice control over JRMC playing to them, you have to use Alexa either from an Alexa-enabled device or with a 3rd party integration app. Google speakers will show up as DLNA zones in JRMC if you have BubbleUPnP etc running along with JRMC, so there's some functional integration there among JRMC, Alexa and the GA (albeit crude integration).  But it's not ideal, and it's one reason we chose Amazon / Alexa for our primary smart platform.


    16 minutes ago, bluesman said:

    [Chris, if you can drop this in, it will save others the irritation of the typo.]

    Ahhhh. Done. 


    7 minutes ago, The Computer Audiophile said:

    Ahhhh. Done. 

    Thanks!  Sorry about that - I should have caught it.


    For an audiophile system with voice control, can't you use a Bluesound Node 2i with digital output to an external DAC which feeds your stereo?  You can then use Google Assistant or Alexa to choose a song to play on the Node 2i.  I would assume that Tidal Connect to the Node 2i could be controlled with Google Assistant and Alexa as well.


    1 hour ago, palpatine242 said:

    For an audiophile system with voice control, can't you use a Bluesound Node 2i with digital output to an external DAC which feeds your stereo?  You can then use Google Assistant or Alexa to choose a song to play on the Node 2i.  I would assume that Tidal Connect to the Node 2i could be controlled with Google Assistant and Alexa as well.

    Yes you can.  There are several streamers like this, but a comparison was far beyond the scope of this article. Yamaha makes 2 models with which I'm familiar, Denon has the Heos system and devices, etc.  Voice control in all of these is limited by what Alexa, GA etc can do.

     

    From the Bluesound website, "...you can use voice commands to play saved playlists, select your favorite radio station, adjust volume levels, or even group Players together".  The Node 2i will respond to Alexa using the Bluesound skill and to Google Assistant using middleware called Blue Voice. As long as you've set up everything correctly from the DAC downstream, you can control the stated functions from the Node - but you can't control any other element in your system.  There are enough downsides to this to deter me from using it.

     

    For example, your power amp gain control has to be set high enough to encompass the loudest playback you'll ever use, if Alexa's controlling your system volume at the streamer.  This leaves your speakers vulnerable to any transients generated in your front end but not attenuated by the variable gain stage, e.g. at turn-on and turn-off or when switching sources.  If you have an uncontrollable pop with any function, your voice controller can't cut the gain before executing it.

     

    The BS Node is marketed to "...instantly [breathe] new life into your decades-old stereo equipment" by adding network and web streaming sources. The only analog inputs I see are in the combo 3.5mm optical / line level jack, which must be how one connects a turntable or CD player.  I can't tell if you can stream the output over the LAN or WLAN (I assume not) but the 2 way BT should let you drive BT speakers with any source.  The USB input is only for storage devices - you can't connect a USB turntable or other real time USB source.  There are many limitations to this approach to voice control, although it does work within the limits of system and technological constraints.  Stay tuned - it will get better!


    Excellent article, bluesman.  Voice control in audio for the general public will probably have a better chance at success than for audiophiles.  We're just too damn picky. Sure I'd like to say "play music," then the stereo turns on and starts playing music where it left off.  I can currently do that by pressing play from my PC, iPad or iPhone using KEF's LS50 Wireless speakers via Roon.  No power on or off needed for those.  But when it comes to the nitty gritty stuff, I want to search through Roon to find what I want to listen to at that moment.  Sure I could say "play such and such," but as an audiophile, I'll want a specific version of a song, in a certain format and then that will dictate what I want to hear next.  So for simple tasks maybe it'll be ok, but for me I'd rather do things from my iPad or PC.  Status updates or changes to your system would definitely be a cool feature though.

     

    Using Comcast's remote (or voice control for any TV/Cable/Video in general) makes more sense since you're watching something that might take from 30 minutes to 3 hours to watch.  Pick up the remote, say what you want and sit back.  It's still faster for me to punch in a 3 digit channel and hit enter (ok button) than it is to bring the remote to my mouth, press the voice control button, say what I want to watch (about 70% correct for me) and wait for the channel to change.  

     

    The other thing for me, it would seem weird to say things out loud while listening to music.  If I'm rocking out, I'd have to turn down the volume or yell into the room to change the song being played or add something to the queue.  If I'm with guests, the last thing I want to do is tell them what they're going to hear next by saying it.  Some of the nostalgia of listening to music with my friends is playing something that will surprise them, and seeing their reaction. 

     

    Just my 2 cents,

    Shawn


    A great article.  

     

    I have voice control in my car and on my  Xfinity remote control.  Seldom use the features.  I was going to buy a faraday (sp?) envelope for the remote control.  But since the TV knows more about you than the IRS, I decided it was not worth the expense. 

     

    I never have understood why people would put active espionage equipment in their home.  Alexa, I am looking at you!

     

    I have tape over my computer camera.  I never use location services unless I am lost.  I have every possible option turned off on both phone and camera.  No Facebook (pure evil), no Twitter, no social media at all.  BTW, I do not consider Audiophile Style to be social media.  Audiophile Style is an information source and you are all my friends, right?

     

    I know all of the above is worthless, but if I can irritate some random data collection AI somewhere for at least 3 nanoseconds it is worth it.  Oh, and I log into Rolls Royce.com at least 3 times a day just to screw up the ad tracker on my browser......

     

    Regards.  George Orwell

     

     


    5 hours ago, NOMBEDES said:

    A great article.  

     

    I have voice control in my car and on my  Xfinity remote control.  Seldom use the features.  I was going to buy a faraday (sp?) envelope for the remote control.  But since the TV knows more about you than the IRS, I decided it was not worth the expense. 

     

    I never have understood why people would put active espionage equipment in their home.  Alexa, I am looking at you!

     

    I have tape over my computer camera.  I never use location services unless I am lost.  I have every possible option turned off on both phone and camera.  No Facebook (pure evil), no Twitter, no social media at all.  BTW, I do not consider Audiophile Style to be social media.  Audiophile Style is an information source and you are all my friends, right?

     

    I know all of the above is worthless, but if I can irritate some random data collection AI somewhere for at least 3 nanoseconds it is worth it.  Oh, and I log into Rolls Royce.com at least 3 times a day just to screw up the ad tracker on my browser......

     

    Regards.  George Orwell

     

    Great post. 🙂

     

    I'm also not interested in voice control. My Mac Mini doesn't have a microphone and that is the only thing I have connected to the internet.

     

    I also don't use any social media, no facebook and no twitter, etc.

     

    My HDTV is not connected to the internet, I use an indoor antenna which picks up 33 stations in my city.

     

    My only phone is a corded landline.

     

    I only use prepaid non-reloadable debit cards to purchase stuff on the internet.

     

    When my 8-year-old computer dies, I'm not replacing it and I will go back to checking emails and Audiophile Style from the library once a week. The library offers free use for 30 minutes. At that point I will have to go back to purchasing everything in person or by old-fashioned mail order, and my apartment will not be connected to anything online or in the cloud.


    Personally, I use voice control all of the time.  I focus on using 'Lady A' (a.k.a., Alexa) for simple on/off tasks.  I found that using the Logitech Harmony remote/hub was the best method to control multiple devices.  I use the simplest version; the 'Harmony Companion.'  If the device or component has a remote, then Harmony can control it.  And, there is a Harmony skill for Alexa.  So, turn on TV, mute TV, etc. are natural commands.  Likewise, in the audio room, mute music is a fantastic voice command.  And, if you have smart lights (I use Philips Hue) then turning the lights on/off or just right for music is (again) fantastic.

     

    Personally, I do not need or want to control every button, knob or setting through voice control; unlikely that I would ever remember all the commands.  Then I would need the Cliff Note for voice control :-P   


    All I wish is for Siri to work with Roon. Though I have examples of Alexa and Amazon's assistant (gifts), I never plug them in because I don't want them listening to my life 24/7.   Google and Facecrook are nothing but data rapers, and I care not to align with them.   I have been an Apple customer since the first iPhone...I think Apple is the least intrusive into my life via Siri.

     

    I use an iPad for Roon Control, and it works seamlessly...I'm already spoiled ...Roon with Voice control would just be icing on the music listening cake! 


    33 minutes ago, LarryMagoo said:

    Roon with Voice control would just be icing on the music listening cake

    The new crop of VC/AI programs will almost certainly be able to control Roon.  What's needed is the ability to send the same call to the processors in response to a voice command that's generated by clicking on a Roon icon. It's not as simple as that sounds, but it's doable today.  I've been playing with this using Braina, but I've not yet succeeded.
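    In outline, the idea is that a recognized phrase and a clicked icon should both end up at the same handler. Here is a small Python sketch of that routing. It does not use Roon's real API (I'm not assuming anything about how Roon exposes its controls); `Player`, the button ids and the phrase list are all hypothetical stand-ins.

```python
class Player:
    """Hypothetical stand-in for a music player's control layer."""
    def __init__(self):
        self.log = []                  # records which actions were executed

    def play(self):
        self.log.append("play")

    def pause(self):
        self.log.append("pause")

    def next_track(self):
        self.log.append("next")

def build_dispatch(player):
    # One table of actions, shared by the GUI and the voice front end.
    actions = {"play": player.play, "pause": player.pause, "next": player.next_track}
    # Several spoken phrases can map onto the same action key.
    phrases = {
        "play": "play", "resume": "play",
        "pause": "pause", "stop the music": "pause",
        "next track": "next", "skip": "next",
    }

    def on_click(button_id):
        actions[button_id]()           # what clicking an icon would trigger

    def on_voice(utterance):
        key = phrases.get(utterance.strip().lower())
        if key is None:
            return False               # phrase not recognized, nothing executed
        actions[key]()                 # exactly the same call the click makes
        return True

    return on_click, on_voice

player = Player()
on_click, on_voice = build_dispatch(player)
on_click("play")
on_voice("Resume")
print(player.log)
```

    The hard part in practice is not this table but getting access to the player's internal calls in the first place, which is where my own experiments have stalled.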

     

    Siri is almost certainly capable of doing this now - there are many custom business applications out there resulting from licensed use of Siri technology.  But I suspect that Apple's not about to support any platforms at their own expense that don't augment their revenue stream beyond their costs.


    I am definitely not saying it's simple....I don't even know how, when I have my iPhone and my iPad sitting close together and I ask Siri a question, "they" decide which device answers me..??   I would be happy to pay for such a feature to pair with Roon!   It works great when in my car or walking the dog with headphones....(I'm wearing the 'phones not the dog! 🤣)




