  • bluesman

    The Value Proposition In Audio: Voice Control For Audiophiles (You Can't Buy Too Soon - You Can Only Sell Too Late)

     

     

    THE VALUE PROPOSITION IN AUDIO:  VOICE CONTROL FOR AUDIOPHILES

    (YOU CAN’T BUY TOO SOON – YOU CAN ONLY SELL TOO LATE)

     

     

    image1.jpg

     

     

     

    I’M AN OLD DOG – I CAN SPOT AN OLD TRICK A MILE AWAY

     

    For me to write an intelligent and useful article on the subject, I thought it was important to understand the AS community’s thoughts, attitudes, wants, needs, and turn-offs about voice control.  Many of you read (and some responded to) my General Forum post seeking answers to a 5 question poll about voice control for audiophiles.  Well……….I got what I wanted, although it took me a while to reconcile what I learned with what I (and Chris, since he asked specifically for this article) believed.  I actually had to rewrite a large portion of this to put it all in context for me, as well as for you.  

     

    About half of all AS responders said that they have no interest at all in voice control for their audio systems.  

     

    My initial response was pure shock.  But within minutes, I found the perspective from which to make sense of this.  And from that perspective, I’ll draw some conclusions that have immediate importance to the industry and longer term impact on the audiophile community.

     

    SURVEY SAYS……...

     

    TOPIC 1: LEVEL OF INTEREST

     

    orange.png  I definitely want voice control for at least one of my audio systems.

    yellow.png  I'd only consider voice control for my audio system(s) if it does everything I now do manually.

    green.png  I'd use voice control for my audio system(s) if it controls simple basics like program & volume.

    burgundy.png  I have no interest in voice control for my audio system(s).

     

     

    Topic 1.png

     

     

     

     

    TOPIC 2: KNOWLEDGE & EXPERIENCE WITH VC

     

    orange.png  I own and regularly use voice recognition / control / activation in at least one device.

    yellow.png  I own devices with embedded voice recognition / control / activation but rarely use the feature.

    green.png  I've tried voice recognition/control/activation in the past, but it didn't work well enough for me.

    burgundy.png  I've never used voice recognition / control / activation.

     

     

    Topic 2.png

     

     

     

     

     

    TOPIC 3: HOW CURRENT IS YOUR KNOWLEDGE AND EXPERIENCE WITH VC

     

    orange.png  I use current voice actuated software in at least one device.

    yellow.png  I've only used older apps/devices, eg early smart home devices, Dragon Naturally Speaking, etc.

    green.png  I've never owned or regularly used any voice actuated devices or software.

     

     

    Topic 3.png

     

     

     

     

     

    TOPIC 4: IF YOU’D CONSIDER VOICE CONTROL, HOW WOULD YOU PREFER TO USE IT?

     

    orange.png  In a separate control device, e.g. hand held remote control

    yellow.png  As an app for my mobile device(s)

    green.png  As software embedded in players, control points etc

    burgundy.png  Embedded with other front end controls in my electronics (preamp, integrated amp, DAC etc)

    lightblue.png  Embedded in smart speakers

     

     

    Topic 4.png

     

     

     

     

    TOPIC 5:  DEAL BREAKERS FOR ME

     

    orange.png  It has to be a comprehensive control center, 99+% accurate, and consistently reliable.

    yellow.png  It has to come preinstalled in hardware and/or software.

    green.png  It has to be easily upgradeable, regardless of the form or device in which it comes to me.

    burgundy.png  Others in my home or family have to be able to use it with little or no instruction.

     

     

    Topic 5.png

     

     

     

     

    EXECUTIVE SUMMARY AND A PRÉCIS OF THIS ARTICLE’S ORGANIZATION

     

    For me, the easiest way to learn about and understand the current spectrum of voice control for audiophiles is to start by categorizing what’s now available, from simple to complex, before moving to what’s coming and what’s possible beyond that.  The simplest place to start is with those virtual assistants already available and in use by at least half of us for something (based on the poll I ran a few weeks ago).  Then we’ll move on to currently available audio devices with voice recognition and control built in, before advancing to the standalone voice control apps out there for adaptation to audiophile use. For a peek into the future, check out Josh – there’s some serious effort in high end VC.

     

    Amazingly enough, there are well over 20 virtual / digital assistants out there today, some of which are welcome in my home and happily living there today.  As with any other stranger in our midst, security is critical for self-protection and privacy – but this is not as big a problem as some would have you believe (more later in the text).  For starters, here’s a table of the top virtual assistants in use and under development today, with a summary of what they can do right now:

     

    executive summary.jpg

     

     

     

    Several of these are already in widespread active use, but ongoing development is actively pushing the envelope around their utility, practicality, and potential.  The combination of voice recognition / synthesis / understanding with artificial intelligence may well be the most exciting direction for consumer products since the industrial revolution.  The above table documents the tip of the iceberg of startups joining with long standing research and development programs of the world’s leading enterprises.  Next we’ll discuss how the above systems can be acquired, adopted, or otherwise brought into our lives today, before an overview of what an audiophile can do with them and why we should all become more familiar with voice-based AI while it’s still simple to grasp and adopt.

     

    The world of virtual assistants (along with the number of them) is growing rapidly, and those provided by the major houses (Amazon, Apple, Google, and Microsoft) already offer basic control over music playback.  By themselves, most offer almost no direct control over audio devices other than those in which they’re embedded.  They give you access to music on their corporate servers if you subscribe to one of their services, but they offer limited management of your own files through front end controls, e.g. volume, source material, play mode.  You can stream from the usual web sources if you have an account, e.g. Tidal, Spotify, Amazon Music, Apple Music etc.

     

     

    SOME DECENT AUDIO WITH VOICE CONTROL THAT YOU CAN USE RIGHT NOW

     

    For the impatient, here’s a summary table to point you toward goal directed devices and systems you can buy and use right now.  At present, I know of no voice control system, AI application, and/or dedicated device that will (or can be adapted to) replace all the knobs, switches and icons we now use to control our audio systems.  Voice controlled power switching is easy to achieve, and there are decent receivers on the market today that have some level of voice control built in with one of the name “digital assistants”.  But comprehensive voice control of an existing audiophile system is simply not yet available.

     

    There are a few immediately available options for the adventurous that will let you experience voice control over your audio sources.  For example, Braina is a VC/AI app and system for PC that can control the VLC media player right now and will give you program and playback control over several streaming sources with a little setup.  It’s free, and it truly defines the term “work in progress” – it’s crude, rough and inconsistent.  But it does enough well enough to let you feel how powerful VC will be when ready for prime time.  Be aware that Braina sometimes does some weird things – just laugh!

     

    The most practical voice controlled options for audio right now are in the smart device category, most of which are powered by Alexa, Google Assistant, or Siri.  Google devices will play 24/96 FLACs, and their high end devices have embedded Chromecast so they can be used as DLNA zones in JRiver. If you want to have voice control over JRiver playing through a Google device as a zone, you’ll have to use Alexa. If she’s not sharing a device with the GA, you can link her from an Amazon device or a third party host using an app like Helea Smart.

     

    The entry level devices are all small, relatively low powered, and a bit below the range of SQ that can be considered good by any stretch of the imagination.  But even the better low end pieces (e.g. the latest Echo Dot) are impressive in stereo pairs with a bit of EQ applied through the Alexa app.  And the high end from Amazon, Google, and Apple contain some decent active speaker systems with impressive parts, useful DSP, and sound quality that’s good enough to keep many audiophiles happy for noncritical listening (especially to streamed mid-res files of music you want to audition before buying, or can’t buy at all).

     

    You can buy multiple Echo devices right now, set them up on your WLAN, and immediately stream standard res music through them all by telling Alexa what you want to hear and from which speaker(s).  The same applies to Google devices, except for your choice of streaming sources – there are several alternatives through the Google Assistant, including YouTube Music, Spotify, and a host of others. You can also do this just as easily with the Apple units but you can only play music from Apple Music or iTunes (if you still have that set up on a networked device).

     

    You can read more about all this further on.  But for the impatient, you can immediately put a voice controlled system together quickly and easily, synch many stereo pairs in many rooms without audible delay or other ill effect, and enjoy casual listening that’s not too shabby.   Here’s a quick guide to getting started with VC in several easy ways:

     

    devices.jpg

     

     

     

    image15.jpeg

     

     

    Some things of which you should be aware (details later in the text):

     

    • You cannot control a serious audio system today using only your voice 
      • You can control selected functions of some equipment right now
      • You can use development platforms to create new functions if you’re up to it
      • You probably won’t be waiting long for voice control to advance enough to please you
    • You can use your voice to select & play music right now with comprehensive control
      • You’ll have to do it with currently available smart speakers and a streaming source
      • Alexa, Siri, Google Assistant et al will select your choice if it’s available in the repository used by your choice of assistant (e.g. Amazon Music for Alexa) and control basic playback functions
      • It’s easy and works great 
      • With a little effort, e.g. activating a skill for Alexa, you can control programs like JRiver Media Center with your voice and play any file or source accessible to you through it
        • You cannot integrate JRMC control with other audio functions, e.g. synch playback among multiple devices with JRMC
    • There are many decent smart speakers from which to choose, but so far none reaches the performance level of a true audiophile product (except perhaps a very few like the $3000 B&O A9)
      • These are appliances, not audio equipment – reasonable expectations avoid disappointment
        • Even current entry level smart speakers like the Echo Dot and the little Google thingies sound half decent.  A pair of $20 Echo Dots (3rd gen, now on sale cheap because the 4th gen just came out) will play most music well and loudly enough not to offend
          • Even entry level smart devices from Amazon, Google, and Apple can now be set up as stereo pairs
        • The latest speakers from Amazon, Apple, etc are surprisingly good and easy to use
          • Technophobes and non-audiophiles in your life will love you for getting them a pair if they have any desire at all to listen to music but can’t / won’t use your system
          • If you’re already a subscriber (e.g. Amazon Prime, Apple Music), you only have to ask for the music you want and (if it’s available from your plan) it’ll play
          • You can ask by composer, performer, genre, track or album title, etc – it’s easy
      • No matter which smart devices you buy, the assistant within is identical for all.
        • The $3000 B&O may sound much better than the $29 Google Nest.  But the Google Assistant within them is one and the same.
    • There’s a lot of very good HT audio equipment with serious voice control embedded or through one (or more) of the major assistants
    • There’s a fledgling Internet of Audio Things (IoAuT) that will probably give birth to a new generation of voice controlled audiophile equipment very soon
    • There’s a young industry combining voice recognition and control with artificial intelligence that has the potential to replace your knobs and buttons with your voice 
      • You can play with this today by downloading one of the development platforms and setting it up on your computer 
        • For example, check out Braina for a peek into the gestation of the revolution
          • This AI/VC software can control VLC player on your PC out of the box
          • Yes, it’s crude and no, it’s not even close to being sufficiently accurate and consistent for routine use

    But……..the combo of AI and VC is exciting and promising – don’t sell it short!

     

     

    WHAT GOES AROUND COMES AROUND

     

    ...and that’s the real message in this work.  Think of all the things that were soundly dismissed when introduced but are now taken for granted – things we can’t imagine living without.  Among the many staples of modern life are initial failures like Nintendo, Wheaties, and Dyson vacuum cleaners.  Apple was a hair’s breadth from bankruptcy in 1997.  Walt Disney’s first animation studio went bankrupt within 2 years.  Milton Hershey couldn’t give away his candy when he started his first two companies – they both failed.  Only on the third try did the Hershey Bar make it out of the starting gate.

     

    Dr Seuss’s first book was rejected by 27 publishers before finding one willing to take a chance on it.  Van Gogh sold one painting in his lifetime.  When Rovio started offering mobile games in 2003, they couldn’t generate enough interest to raise an eyebrow let alone an investor.  Fast forward 6 years to their introduction of Angry Birds, by which time there was enough demand to yank them off the starting block and into the winner’s circle.

     

    How could such great ideas come so close to failure?  For many, it was a technical hurdle that hadn’t yet been overcome.  Three critical breakthroughs that pushed mobile device games over the hump at the dawn of the 21st century were the Wireless Application Protocol (WAP) that enabled mobile phones to connect to the internet, mobile Java (2002), and color phone displays (2003).  Before this, mobile gaming was crude and slow even on devices at the state of the art.  Until a physical product and the software that controls it reach maturity and can fulfill a serious chunk of the product’s promise, it’ll only sell to the early adopters.  For others, it’s just a matter of preconceived notions and fickle consumers.  Many products and concepts that eventually succeed start off with nothing but the lonely enthusiasm of their inventors behind them amid a loud chorus of “Why would anybody want that?” - until someone who matters gives it a try and discovers greatness within it.

     

    The best advice I got in business school was to underpromise and overdeliver.  The inability to subdue enthusiasm often results in the introduction of a new concept or product as a done deal when much of the inventor’s vision and promise has yet to be realized.  It only takes one big disappointment to sour potential buyers on a second chance.  Many of us started early and learned quickly that voice recognition software was far from user friendly at its inception.  I was an early user of Naturally Speaking (1998), adopting it only when it improved from the initial product’s requirement that you enunciate each and every word separately (a limitation with which I couldn’t live).  I used it to dictate office notes after patient visits, and it was about as accurate as my transcription service had been, which was no great compliment: transcription inaccuracy was the reason I went to voice recognition software in the first place.  So I still had to proof every page, just as I did for transcribed correspondence.

     

    Even today, VR software is less accurate than I’d like.  We have voice control in our Xfinity cable box remotes that works pretty well for me.  I get about 90+% of what I ask for, but my wife gets either the wrong channel or a prompt to repeat her request at least 25% of the time.  It’s just not as good as it needs to be.  On the other hand, Alexa turns on my espresso machine and audio systems, plays music from my collection over Amazon devices, controls JRiver on my computers, sets wake-up alarms, reminds us of tasks and appointments, tells us the weather forecast and current temperature outside, and does it all with 95-98% accuracy.   

     

    So the promise of voice control is great for audiophiles, and I believe it will gain wide acceptance as it approaches 99+% accuracy and shows up in more devices and programs. Based on progress to date, I expect VC with AI to reach the level of function, reliability and ease that we all expect from our audio equipment.  I like it a lot already and look forward to the future.  Let’s get to it!

     

     

    THE ELEPHANT IN THE ROOM:  ACCURACY

     

    There are a few well done tests out there comparing the accuracy of both recognition and response among Alexa, the Google Assistant, and Siri.  This one from Loup Ventures is a good example and very interesting.  The results are helpful in determining how accurate and useful each can be for audiophile use right now.  Each was asked 800 questions.  Google won with 100% understanding and 93% correct answers.  Siri was 2nd (99.8% / 83%) and Alexa was 3rd (99.9% / 79.8%).  The same study was done a year before, and the sequential results show some limits and some progress: the order was the same a year ago but the results were 86%, 79%, and 61% correct responses.

     

    Google seems to have nailed it in the second round with a 7-point improvement (86% to 93%, about an 8% relative gain) that put it within the confidence interval of perfection.  But Alexa could only do the right thing about 80% of the time even after gaining nearly 19 points (roughly a 31% relative improvement), and Siri only beat out Alexa by about 3 points after a year of development that resulted in a fairly weak 4-point (5% relative) increase in answering correctly.
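A year-over-year change like this can be read either as percentage points or as a relative gain, and the two readings can differ a lot.  Here is a quick check using the rounded correct-answer rates quoted above (the function name is mine, purely for illustration):

```python
# Correct-answer rates quoted in the text (percent): previous year -> latest test.
scores = {
    "Google Assistant": (86.0, 93.0),
    "Siri": (79.0, 83.0),
    "Alexa": (61.0, 79.8),
}


def improvement(before, after):
    """Return (gain in percentage points, relative gain in percent)."""
    points = after - before
    relative = 100.0 * points / before
    return points, relative


for name, (before, after) in scores.items():
    points, relative = improvement(before, after)
    print(f"{name}: +{points:.1f} points, {relative:.1f}% relative")
```

Because the inputs are rounded, the relative figures are only good to about a percentage point.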

     

    At present, the Google Assistant is the smartest, with Siri & Alexa far enough behind to matter.

     

     

    FROM REX TO ALEXA – A BRIEF HISTORY OF VOICE RECOGNITION / CONTROL

     

    The first voice activated device to be historically documented is Radio Rex from 1911.

     

    image16.jpeg

    Rex would (sometimes) leap out of his dog house when called by name.  He was held in place by an electromagnet whose energizing circuit was tuned to a resonance of about 500 Hz.  When the right voices said “Rex” loudly enough (or any other sound source with enough energy in the 500 Hz range went off), the spectral content in that range would somehow interrupt the power to the magnet.  When the magnet’s power is interrupted, Rex is pushed out of his house by a spring.  I can’t find out how this works, but it suggests an early “Clapper”.  And, like the Clapper, it’s activated by sound and not specifically a voice - it reacts to any sounds within its sensitivity range. As both a pet and a device, Rex was inconsistent and unreliable. But he was the first – and his weaknesses set the bar much higher for market acceptance of subsequent voice activated devices.  This is a lesson we’re learning again!
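In modern terms, Rex behaves like a narrow-band energy detector rather than a speech recognizer.  As a rough software analogy (not a model of the toy’s actual mechanism, which I couldn’t document), the sketch below uses the Goertzel algorithm to measure signal power near 500 Hz and “releases the dog” when it crosses a threshold.  The sample rate, threshold, and function names are all my own assumptions:

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Normalized signal power near target_hz, via the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin to the target
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return power / (n * n)  # a full-scale sine on the bin gives about 0.25


def tone(freq_hz, sample_rate=8000, duration_s=0.1, amplitude=1.0):
    """Generate a pure sine tone as a list of samples."""
    n = round(sample_rate * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def rex_jumps(samples, sample_rate=8000, threshold=0.05):
    """True when there is enough energy around 500 Hz (assumed threshold)."""
    return goertzel_power(samples, sample_rate, 500.0) > threshold


print(rex_jumps(tone(500.0)))                 # loud sound at the resonance
print(rex_jumps(tone(2000.0)))                # loud sound away from it
print(rex_jumps(tone(500.0, amplitude=0.2)))  # right pitch, too quiet
```

Nothing in the detector knows or cares whether the sound is a voice, which is exactly Rex’s weakness.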

     

    The next voice activated toys to hit the market with any success at all were Jill and Julie (late 1950s).

     

    image17.jpeg image18.jpeg

    These lovely ladies came from TI and had both speech recognition and voices of their own.  Julie (right) was about 3 feet tall and responded to words like pretend, hungry, yes and no.  Sadly, neither would put your vinyl on the turntable or dial up an FM station, but it’s not a stretch to call them the founding mothers of Alexa, Siri et al.

     

     

     

     

    FROM TOY TO TEMPTRESS

     

    The lovely Audrey was born in 1952 to her proud Bell Labs parents (whose names were Davis, Biddulph, and Balashek for those who care).  She was far from the fastest chip on the board, but she had a great personality!  Audrey was the sultry seductress who spawned generations of progressively smarter progeny, some of whom live and work among us today.  We have Audrey to thank for our friends Alexa, Bixby, Siri, Cortana, and the poor little Google Assistant who never got a name.

     

    Audrey was the first documented system that could recognize human speech.  She was a bit of an idiot savant – her only skill was the ability to recognize spoken digits with 97-99% accuracy if spoken to her by a voice on which she’d been trained.  She could have been useful in telecommunications, e.g. as a voice activated interface for long distance dialing or in a high end telephone (one of the reasons for her creation).  But she died alone because she was a high maintenance woman and a very expensive date.

     

    Audrey filled a full height 19” rack and sucked power like mint juleps at the Kentucky Derby.  She was a little slow as a child, and she never realized her potential.  Even rotary dialing was as fast as Audrey, and she was absolutely no match for the touch tone system invented when she was only a toddler.  By 1958, touch tone phones were in active development and Audrey was obsolete.  Then John Karlin (a psychologist at Bell Labs) drove the last nail into her coffin when he invented the keypad we now use for telephony and a million other things.

     

    When Audrey was 10, IBM debuted the Shoebox at the Seattle World’s Fair.  This device could recognize 16 English words and the numbers 0 to 9.  But it wasn’t until the early 1970s that the next generation of voice recognition technology was born.  DARPA funded research at Carnegie Mellon that bore fruit in the form of a device called Harpy, which (who?) could recognize the vocabulary of the average 3 year old.  Harpy proved that there was a “finite state-network of possible sentences” that was the key to better identification and accuracy.  At the same time, Bell Labs made advances that enabled software to recognize and interpret multiple voices.  They threw early AI into the mix and created the foundations of today’s voice recognition and activation software – and the race is on!
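The “finite state-network of possible sentences” idea is easier to see in miniature.  At each point in a sentence, only a handful of words can legally come next, so the recognizer has far fewer candidates to tell apart.  The toy network below is invented for illustration (the states and vocabulary are mine, not Harpy’s):

```python
# Each state maps an allowed next word to the state that word leads to.
NETWORK = {
    "start": {"play": "verb", "stop": "end"},
    "verb": {"the": "article", "some": "article"},
    "article": {"album": "end", "radio": "end", "jazz": "end"},
    "end": {},
}


def allowed_next(state):
    """Words the recognizer needs to consider from this state."""
    return sorted(NETWORK[state])


def accepts(sentence):
    """True if the sentence follows a complete path through the network."""
    state = "start"
    for word in sentence.lower().split():
        if word not in NETWORK[state]:
            return False
        state = NETWORK[state][word]
    return state == "end"


print(allowed_next("verb"))
print(accepts("play some jazz"))
print(accepts("play jazz"))
```

After “play”, this network only has to choose between two words instead of searching its whole vocabulary, which is the kind of constraint that improves identification and accuracy.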

     

     

    LET’S LOOK BEHIND DOOR #1 INTO THE ASSISTANTS’ LOUNGE 

     

    image20.jpeg image19.jpeg image21.jpeg  

     

     

    The first route to voice control of music playback is through the “digital assistants” already living in your smart home devices.  There are only 3 teams in the major leagues right now: Amazon, Apple, and Google.  Samsung’s a comer, and they’re pursuing the market hard - Bixby has about 2 years under his belt, but he’s still a long way from the playoffs for audio.  There’s now a Samsung Bixby Marketplace from which to explore and download Bixby Capsules, which are Samsung’s equivalent to Alexa’s Skills.

     

    Microsoft has Cortana, but she’s never been much of a help around the house (maybe because MS never put much effort into her development).  And they recently announced that they’re scaling back mobile and home uses in favor of integration with Microsoft 365 products.  They’re sequentially pulling the plug on all Cortana skills and apps over the next year, although she’ll apparently continue living in MS PCs and helping us use Outlook for the foreseeable future.  Continued integration with Surface buds and ‘phones is projected, although a reason for maintaining this is not obvious to me.  

     

    Cortana will no longer live in HK’s Invoke (the MS answer to the Amazon Echo, the Google Nest, and Apple’s HomePod) - as of now, HK is planning to send a $50 voucher to every owner because their smart speakers will be rendered deaf and dumb when Cortana moves out! Interestingly enough, MS has started offering development tools for creating Alexa skills with their Azure bot framework.  And it appears that further collaboration with Amazon is ongoing to advance Alexa – so MS may not be out of the game, but they’ve changed roles from team owner to trainer.

     

     

    ALEXA AND SIRI HAVE THEIR HEADS IN THE CLOUDS

     

    image22.jpeg

     

    The deus is not in the machina.  In this version of Oz, the nerdy wizard is a series of algorithms and AI living in millions of lines of code in the server grid of the Amazon Web Services cloud.  Pictured above is probably one of many server farms around the world comprising AWS, which represents a huge chunk of the world’s business computing power and storage. For security, their locations are kept as far under the radar as possible.  It’s known that the first location was in northern Virginia, and hundreds of investigative reporters have scoured the media for clues to exact locations since that one opened about 15 years ago.

     

    The relevance of the last paragraph to voice control for audiophiles is strong.  Your voice and everything else heard by the microphones in your devices will go to the cloud and back, traversing an unknown number of servers and storage devices along the way – voice responsive assistants can do nothing on their own.  Alexa, Siri, et al require internet access even to turn on a light.  More complex tasks like making JRiver play a specific tune from your library can hurl a lot of data back and forth across the ether, using bandwidth and leaving tracks.

     

    When I ask Alexa to play music by Bill Evans using JRiver, I’m actually asking an AWS server or three to recognize, understand, and act on my request.  The involved communication channels resemble a neural network - my input travels to the cloud over the internet as the afferent signal (going to the “brain”).  The response triggers that are generated by a pretty sophisticated system of AI, predictive analytics, etc return as efferent signals that will be processed by my own system(s) into actions.  

     

    Everything heard by every voice-driven virtual assistant is archived in cloud servers, as are the responses.  So security is obviously a major issue, for which there are many good protective solutions that only work if you use them.  But by reading this far, you’ve probably figured out that every microphone in every device you own can be used to listen in on you, even without your knowledge or authorization.  Caution is essential, but risk is low if you do the right things. We’ll get to how this affects use of a true home hub later.  But here’s a hint: your nagging suspicion is correct that a WiFi-activated door lock or a security system on your WLAN could easily be hacked if you don’t make every effort to secure everything.

     

     

    THE FORECAST CALLS FOR CLOUDS

     

    Alexa has absolutely no idea what JRiver is or does.  She needs an assist from a piece of third party software called a “skill”, a chunk of code that we used to call middleware back when programmers were skinny and computers were fat.  Other such systems use their own versions of middleware to interact with networked devices.  Some systems are part of the IoT, which requires a “hub” connected to a WAN for integration of users’ devices and LANs with the cloud-based computers that make them do what they do.  Some operate directly within a LAN or from point to point, and others connect devices directly to the cloud via the internet.  The overall architecture of a given system 

     

    Fortunately, the community of developers of Alexa skills is huge.  There are already well over 100,000 skills, each vetted by Amazon and available through the Alexa app.  Admittedly, many of them do some pretty silly stuff, although I suspect that the lovers of what we consider silly stuff think that audio applications are wasting their bandwidth, to which I can only say à chacun son goût.

     

    In any case, those skills also live on cloud servers.  In the course of turning what you say into what you hear (or see or feel or whatever else you’ve asked your virtual assistant to make happen for you), the data representing your voice must get to the appropriate skill software located somewhere on that long strange trip from your lips to your ears, as guided by the Wizard of Amazon.  In reality, many of the skill sources probably use the AWS cloud too, but it’s hard to know which cloud is which and it’s irrelevant for most consumer applications.  

     

    It should be obvious that there’s a lot of A-D, D-D, and D-A converting going on to let your voice activate an outlet, make your music louder or softer, change the track, etc.  And I’m sure that the methods and equipment chosen by each provider affect the speed, accuracy, and versatility of the “assistant” you choose.  I’ll offer a few brief comparisons among Alexa, Siri, and services that link the two (so you can do things like use Alexa on an Apple Watch and control Samsung devices from an iPhone – I’m doing both).  The intentional thwarting of cross-platform use by direct competitors has real world performance consequences.  You can leap those barriers with an iron will, a creative spirit, and a little patience.  The world of voice control and touchless device-human interaction is in its infancy, and these issues will be overcome.  With closer integration, performance becomes better and better and the entry barriers will fall.  As they do, the attractiveness of voice control for audio will grow and more of us will adopt it over time.

     

    Here’s an example of the effort needed to make Alexa control JRiver.  To interact with JRMC, Alexa uses a skill called House Band from a developer named Philosophical Creations (a source of several skills, only one of which is for audio AFAIK).  And here’s where the primitive nature of voice control for audio starts to appear.  You first have to tell Alexa to use the skill she’ll need to complete your task for you.  You can’t just tell her to play Kind of Blue on JRiver – you have to start by telling her to “launch House Band”.  When she asks what you want to do next, instruct her to “ask House Band to play the album Kind of Blue by Miles Davis”.  The music actually plays in response to this, although for now you can only make JRMC play on the zones it recognizes – and it doesn’t recognize any Amazon devices because none is DLNA compliant.
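The “ask House Band to ...” phrasing makes more sense if you picture the request being split in two: first the skill’s name is picked out, then the rest of the sentence is handed to that skill to interpret.  The sketch below mimics that split with regular expressions.  It is only an illustration of the idea, not Amazon’s parser or the House Band skill’s code, and the function name, intent label, and field names are hypothetical:

```python
import re

# "ask <skill> to <request>" - the skill name has to come first.
ASK_PATTERN = re.compile(r"^ask (?P<skill>.+?) to (?P<request>.+)$", re.IGNORECASE)

# One request shape the (hypothetical) skill understands.
PLAY_ALBUM = re.compile(
    r"^play the album (?P<album>.+?)(?: by (?P<artist>.+))?$", re.IGNORECASE
)


def route(utterance):
    """Split an utterance into a skill name and a structured request."""
    outer = ASK_PATTERN.match(utterance.strip())
    if outer is None:
        return None  # no skill named, so nothing knows how to handle it
    inner = PLAY_ALBUM.match(outer.group("request"))
    if inner is None:
        return {"skill": outer.group("skill"), "intent": "unknown"}
    return {
        "skill": outer.group("skill"),
        "intent": "play_album",
        "album": inner.group("album"),
        "artist": inner.group("artist"),
    }


print(route("ask House Band to play the album Kind of Blue by Miles Davis"))
print(route("play Kind of Blue on JRiver"))
```

The second call returns None, which is the toy equivalent of Alexa not knowing what JRiver is until a skill is named.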

     

    All that data transmission, activity and dependence on so many remote servers and services combine to make the early adopter curve for voice controlled audio quite steep – so interest has been weak.   This AS thread on Alexa and JRMC was started about 5 years ago by Arkonovs and got one (count ‘em – one!) reply.  Although it’s easier now than it was in Feb 2016 to use voice control, it takes enough effort and commitment to keep many from trying it.  In a true sense, the physical path from your voice to JRiver is analogous to the philosophical and emotional paths to adoption of this technology – it’s a long and winding road with a lot of bumps, and it’s still under construction.  But advances are happening rapidly, so let’s look at the major players to find the best of today’s breed.

     

    DOOR #2: THE NATIVE HABITAT OF THE BEST KNOWN DIGITAL ASSISTANTS

     

    Enter the smart speaker, an elemental concept that combines a microphone, a powered speaker, and a processor in a single device that’s listening for your commands 24/7/365.  Over the last few years, they managed to take the assistant out of the box and stick him or her into a wide variety of devices that includes audio and video equipment.  We’ll get to those itinerants after we discuss the basic boxes in which they were whelped and weaned.

     

    Each assistant from one of the major players has his or her own little home. Alexa lives in the Echo or the Show, Siri lives in iStuff, and Google Assistant lives in Google and Nest hubs.  You have to buy at least one proprietary device to bring Alexa, Siri, Bixby et al home, and you have to have an account with the parent company for your new buddy to function.  Once you’ve bought a ticket to Oz, you can expand your assistant’s reach to other devices, like your PC or mobiles. But if you don’t already have the necessary account and you buy an appliance with an assistant in it, you’ll have to set up an account before you can use the voice assist function.  That means opening an Amazon account for Alexa, an Apple ID for Siri, a Samsung account for Bixby etc. There’s no way around this, even if you buy one tiny speaker for bedside music and wakeup alarms.  If you want into the wacky world of Alexa et al, you have to join their ranks.

     

    Alexa lives in thousands of products now and is also available as an app for your phone, TV etc.  Siri has a smaller but considerable array of homes away from home, starting with every iPhone and iPad in the world running any recent OS from Apple.  Bixby lives in a host of Samsung products from phones and tablets to sound bars and other audio products, and Google Assistant inhabits Nest smart products, Android TVs, and a growing list of other stuff in addition to the ever expanding line of Google-branded products. Some devices now come with 2 assistants, so you can use each for things the other won’t do.

     

    A full review and comparison of the major platforms for audiophile use will be a serious undertaking requiring a fair amount of equipment at considerable expense.  It’s certainly a worthwhile endeavor and I’m working on a plan to bring in enough stuff to do it well - I already have Amazon and Samsung platforms, plus a few Siri-loaded devices. So work is progressing, but it won’t be valuable to audiophiles without inclusion of the best smart speakers and their embedded assistants.  For that, I’m going to need assistance in securing enough units to evaluate and compare.

     

    DOOR #3: HOME THEATER

     

    Along with Alexa, Siri, and Bixby, I have 4 Samsung Smart TVs now, all with embedded media players. With a decent sound system, this is also a fine way to enjoy music – and it’s one of the easiest and most readily available platforms today for voice controlled home audio.  You can read a primer on HT for multichannel audio in my last AS article, “Entering Multichannel at the Ground Floor”.  Just find the header “CONSIDER THE HOME THEATER RECEIVER FOR MULTICHANNEL AUDIO” and start reading my suggestions for a few value-priced receivers with fine audio performance and enough flexibility to make most of us happy for both audio and HT. More and more of these come with an assistant built in.

     

    You can now buy decent AV equipment from Denon, Yamaha, Marantz and others with VC assistants inside.  Some limit you to their own choice, e.g. Alexa.  But others, like the $3300 Marantz 8015 receiver, work with “all the major voice agents” and do pretty much everything the average audiophile could want.  As an example, the 8015 streams major services, plays ALAC (Apple Lossless), DSD, FLAC & WAV files, has serious DSP, digital inputs (coax x2, optical x2, HDMI x7), a full complement of line level unbalanced outs, 3 audio zones, and MC capability (11.2) well beyond what most of us would ever use for home audio.  With HEOS wireless, BT, WiFi and ethernet connectivity, such devices are an integral part of your smart home and can be controlled fairly well with voice commands.

     

    Only a few smart TVs include good voice control.  Sony Bravia Smart TVs come with Alexa inside. All LG OLED, Super UHD and 4K UHD TVs with AI ThinQ® come with the Google Assistant embedded.  Amazingly enough, Amazon and Google get along well enough in LG products to let them control Alexa’s devices as well.  The range of commands and possibilities grows daily, so LG is definitely worth watching closely despite their lack of audiophile electronics. Their TVs are easily integrated with almost anybody’s audio through intelligent controls and connections.

     

    Cloud integration of diverse sources, processors, delivery systems, and local networks makes seamless use of devices from multiple vendors easier and more effective than ever.  We have to remember that the same Rube Goldberg touch that strings Alexa, House Band, JRiver, and your system devices together for pure audio is still required to add voice control to TV and HT setups whether you do it yourself or buy it that way.  For example, many 2017 and newer Sony 4k HDR TVs running Android OS can be updated to add Alexa, who will then use skills to fulfill your desires and to do anything else Alexa can do with whatever devices and systems you already have.  This is where those 100k+ skills come in, as you can activate any or all of them to let Alexa control your entire home, whether she lives in your TV, your thermostat, your phone, or your car (yes, your car – but auto audio’s another article in the works).

     

     

    image23.jpeg

     

    Any audiophile in the market for a good HT system should consider a voice-responsive device.  These are the easiest and most consistently accurate approach to voice controlled audio available today.  But, as with everything we love, the technology is advancing quickly enough to make obsolescence likely sooner rather than later, especially for those who pursue the state of the art.  SQ is already excellent in the better HT receivers, but voice control is still primitive and will undoubtedly be faster, more accurate, and more versatile every time you turn around for the next few years.

     

     

    NOT YET READY FOR AUDIOPHILES:  SMART HOMES AND HOME HUBS

     

    The concept of a smart home is not new.  With networked devices, you can control everything from entry locks and HVAC to lighting and curtains with your voice.  You can tell your assistant to feed the dog and warm up your espresso machine through your phone while driving home from work.  And the same systems offer varying degrees of control over your audio system.  However, I know of no existing  home hub or other smart home approach that’s capable of controlling all the functions of even the simplest legacy 2 channel stereo system.

     

    Some smart devices can be controlled entirely within your home, but most require a hub with an internet connection because of all the data streams and systems integration needed to execute a simple voice command. The hub can be integral to a single smart device on the LAN, distributed among all devices on the LAN, or embedded in a stand-alone hub device.

     

    The big guys each offer their own home hubs.  In some systems (like Amazon’s), the function of a hub is distributed among one or all devices that carry the digital assistant – no standalone device is needed.  All of the controlled devices are on your WLAN and all communicate with their cloud server(s) through your network router to the internet.  In others, like Samsung’s, you need a dedicated home hub connected to your LAN via ethernet and connected to your devices by WiFi.

     

    Interestingly, the brand new generation of Amazon smart devices includes an embedded Zigbee hub.  Zigbee is an old line (at least in the IoT world) global organization that establishes standards for wireless communication among devices on the IoT and certifies compliance of hard and soft wares for such use.  Members of the Zigbee Alliance include serious players like Apple, Google, Comcast, Lutron, Huawei, Samsung, TI, and many others.  There are about 150 current nonmember participants and about the same number of adopters.  But none of the audio industry seems to be involved, apart from big guys who include audio in their product offerings, e.g. Panasonic and Toshiba.

     

    Google hubs have been sold in both Google and Nest devices, and to tell you the truth I can’t keep track of which is which.  A Google hub is the portal through which the Google Assistant makes stuff happen on your Google-enabled devices.  And, like Alexa, that hub function is distributed across all the devices in which the Google Assistant lives – no standalone device is necessary to perform the hub functions.

     

    Similarly, Apple’s home hub function comes in multiple devices, e.g. Apple TV, iPads, and the Apple HomePod (which is another smart speaker device similar to the Amazon Echo products).  But unlike Alexa, who demands no specific hub setup beyond using the Alexa app, Siri does require that a hub be set up on one of the above devices.  If you choose to use an iPad as the Apple home hub, you have to leave it at home, powered on and connected to your WLAN 24/7/365.  Then you can use any of the big collection of Apple HomeKit devices from multiple manufacturers in your smart home.

     

    Unfortunately, smart home hubs are not yet up to controlling audio systems directly, completely, and well.  As part of my research for this article, I now have Amazon, Samsung, Google Home, SmartLife, Smart Things, Zigbee, Z-wave, TP-Link, and Kasa hubs set up to control anything in our home that makes a sound, moves, emits energy, or does anything else but sit there.  None of these is worth the cost and effort purely for audiophile use. The biggest contribution to our audio enjoyment from home automation per se is the ability to turn things on and off.  This does not include smart speakers, which we’ll discuss again below as renderers and control points.  In these roles, there’s good reason to consider them.

     

     

    MEDIA HUBS

     

    Although I haven’t bought any, there are standalone hubs that will integrate multiple devices and platforms into voice controlled media playback in your home.  Logitech’s Harmony Hub is one such device, and it’s apparently quite fine – but it’s also expensive enough to keep me from buying one just to evaluate it for this report.  Logitech is a long time player by now, and their products have always been good enough or better when I’ve used them. Their computer peripherals are great value - I use several of their wireless keyboards and mice, and I love their Digital Crayon on my iPad.  So I suspect their hubs are pretty fine.

     

    But using a Logitech hub for voice control of your audio system is another daisy chain of independent systems and functions.  And it’s easy to confuse one of those systems for another, e.g. if you use Alexa to control the hub, is the palette of functions available to you determined by Alexa or by the Harmony Hub?  Their hubs integrate with Alexa and Google Assistant, in that you can ask either assistant to tell the hub what you want done.  But the hub can only do what the hub can do, even if Alexa or GA can do more on other devices with other skills.  

     

    The Harmony Hub can be linked to Alexa using the Harmony skill, but the hub’s dedicated remote control will not work if you set up another device to accept your vocal commands (like an iPhone, an Echo or other smart speaker).  Instead, you have to set up Harmony Express as a video service provider – and if you do that, you can only use one such remote at a time.  Does it all work?  Yes.  Does it work as well as we’d want it to?  No.  It’ll be there some day, but it’s still primitive and I do not recommend this approach for other than an educational experience and an introduction to what will someday be great.

     

    Another sign of less than seamless integration is the simple disclaimer from Logitech that “[t]he Express Integration skill has commands which may differ from other Harmony skills and does not support the use of friendly Favorite channel names or Alexa Routines”.  In other words, you get a few from column A and a few from column B – but Logitech makes that choice for you.

     

    Like virtually all other products in this arena, everything from corporate relations to device compatibility is constantly changing because business and technology often pursue disparate paths.  Features are introduced, refined, removed, and altered frequently as the industry tries to find out what we want and what we’ll pay for it.  Like most others, the Harmony Hub website clearly describes the fluid state of interactivity with the classic disclaimer:  “Supported devices and brands are subject to change without notice”.  Yes, Virginia, you may wake up tomorrow to find that your beloved AV receiver no longer responds to your trusted assistant – this happened with the Blackberry Assistant, early versions of Google Now, and others.

     

    Then there’s Caavo, a voice controlled home hub that integrates multimedia control of “everything connected to your TV”.  Its voice controlled remote uses their proprietary system, but the box is supposed to work with Alexa and Google Assistant much as the Logitech Harmony Hub does.  This relationship is apparently still in the early courtship stage – one Verge reviewer puts it right out there: “Caavo also has integrations with Alexa and Google Assistant, although these are pretty hit or miss. I was never able to make the Alexa integration work at all, and the Google Assistant integration was so spotty I stopped trying after a while”. ‘nuf said.

     

    The Caavo gets decent reviews for general TV and HT use, but even for plain vanilla use it’s far from refined.  From the Verge review, the Caavo “... isn’t perfect, by any means [but] it’s the first remote I’ve used that even attempts to build a new foundation for how all the stuff connected to your TV should work together, and it’s a no-brainer if you’re juggling between a cable box and streaming devices connected to your TV”.  So, once again, the potential’s there and we’ll probably love the 3rd or 4th generation.  It’s only a matter of time before successful integration of what are now independent functions and products will make today’s voice control seem as primitive as Audrey (remember her?).

     

     

    SMART SPEAKERS

     

     

    image24.jpeg

     

     

    The world of smart speakers / devices and voice control for audiophiles is about to explode over the next few years.  The little ones are getting better with each new version – the current Echo Dot is a better and smarter speaker than the first generation big Echo was, and the Apple HomePod is good enough for many of us to enjoy in secondary settings like offices and background settings right now.  Like humans, each generation of smart speakers is better than its parents. I’ll provide an overview of the most popular devices, systems, and “assistants” a few paragraphs further down, and I’m planning a true comparison of the top ones just as soon as they’re good enough to be taken seriously (which will not be far in the future, if I read the tea leaves right).  But first, let’s get down to brass tacks.

     

    image25.jpeg

    There are smart speakers at all price points from $30 to over $2k, with decent products available from B&O (for $2250), HK, Bose, Sonos, Sony, Audio Pro, Devialet and others.  Most come with Alexa, Google Assistant, and/or Siri – but most smart speakers aren’t any smarter than the dumb ones when it comes to audio.  They can turn up your lights, lock your doors, start your car, and order dinner – but for audio you’re limited to the same palette of skills regardless of the device in which your assistant lives.  But to be honest, even if the smart speakers from B&O have the IQ of an Echo Dot, they are absolutely gorgeous (see below).  If they sound as fine as they look, I could actually see buying them.

     

    A comparison of smart speakers as audio equipment is beyond the scope of this article, both because it’s a large scale work in itself and because assembling a collection of them requires either a large budget or industry connections, neither of which I have.  I’m working on a way to audition the better ones and hope to be able to do this in the not too distant future.  The $300 Apple unit has the best reviews for SQ among the big 3 (Amazon, Google, and Apple), although it seems like there are new models with expanded quality and capabilities coming out daily.  I haven’t seen any reviews yet of the latest $300+ units.

     

    I’ve heard many of the better devices from multiple makers (although I haven’t heard truly top line pieces like the newest $3k B&O).  Of those I have heard, none will replace either of my main systems.  In fact, none could even replace my desktop studio system (iFi Nano DSD into JBL 305s) for overall balance, presentation, articulation, clarity, and general sound quality.  I could happily live with several of the better smart speakers in multi-room use, for background and casual listening – and, in fact, I do.

     

    When you throw in voice control, which I did about a year ago, it’s easier to enjoy those smart speakers in every room and overlook their sonic shortcomings.  Add the ability to get Alexa to both control JRiver and play to Echo devices via BT, and your music becomes a constant companion responsive to conversational input.  Just say “Alexa, tell HouseBand to play Kind of Blue” and Miles emerges from the speakers in your selected JRMC zone(s).  It really is a wonderful advance in my enjoyment of music at home, and I use it every day while writing, researching, reading etc.  You should try it!

     

     

    A FEW WORDS ABOUT SONOS & BOSE

     

    image26.jpeg

    Sonos pioneered wireless multi-room audio in 2002, although it was more concept than execution when they started.  The technical limitations were huge, e.g. dial-up AOL was the most popular ISP in the US, WiFi was in its infancy, and there were no WiFi drivers in Linux.  There was no hardware and no software to do what they wanted to do.  But the concept was as simple & brilliant as their first ads for it.

     

    And they created what I believe was the first true multi-room wireless audio network on the market, shipping their first product in 2005 IIRC.  There was growing industry interest in WiFi for home media distribution when they launched: Jobs introduced the Airport Extreme (on the blazingly fast 802.11g standard!) about a year before Sonos sold product #1.  But ironically, it was Apple who enabled true mobile control of Sonos by introducing the iPhone and the App Store in 2007.  The Sonos app that soon followed let us use an iPhone for control, and the modern era of multi-room wireless audio began.

     

    Equally ironic is the fact that Amazon then made voice control of Sonos systems easy and practical by giving us Alexa, for whom there is a Sonos skill similar to the HouseBand skill used to control JRMC.  You can buy Sonos products in every guise from a small single smart speaker to sophisticated amplifiers and other electronics to drive your own speakers – and you can control every basic function with your voice.  Here’s a summary of the things Alexa can make Sonos do for you.  And here’s a link to info on the many Sonos devices available today.  I’m told (but haven’t confirmed) that the Google Assistant is now also usable with both.  But as far as I know, there’s still no way to get Google Assistant to control JRMC or other SW media player – so Alexa is the clear choice for them today.

     

    Sonos and Bose are probably the best known and most popular brands of smart speakers and systems from the audio industry (OK – maybe I’m being a bit generous to them both in calling them audio products).  Most audiophiles have heard of Sonos and Bose – and few (at least few that I know) would consider either for anything but background music.  Almost all think they’re too expensive for that use.  After listening to several of them, I don’t disagree about value received (although all of the smart speakers and associated products from the audio industry cost too much in my opinion).

     

    You do get audio components that look less like audio components when you spring for Bose, Sonos, and others of their ilk – but that’s probably not a big consideration for most audiophiles.  Even if you or your significant other is sensitive to interior décor, the pedestrian Amazon Echo, Google Home, and Apple smart devices are as easy to camouflage as anything I’ve seen from the audio industry.

     

    Sonos equipment sounds better to me than the equivalent Bose models for about the same price.  A Bose 500 smart speaker and a 700 sub will set you back about $1k.  Even the entry level Sonos smart speaker is $200, which is coincidentally now the price of the entry level Bose too.  I can find no actual performance spec for either one – amplifier power and performance are a mystery for both, and there are no specs at all on which to base a “paper” evaluation.  From the limited listening comparisons I’ve been able to do in friends’ homes, Sonos is more pleasing to me.  But I haven’t heard either line’s latest and best products.  To be honest, the Amazon Echo Studio and an Echo Sub ($330 list for the pair) sound as good or better to me than any other smart speaker I’ve heard at any price.

     

     

    THE BOTTOM LINE:  WHAT I USE, PLUS A BIT ABOUT VOICE CONTROL OF POWER

     

    I’m a “balanced” early adopter – I want to try everything, but I only buy into new technology when the price no longer outweighs my desire to try it.  And my wife is what I call an adopter of convenience – if something will make her life easier and I can convince her of that fact (with the latter being the bigger hurdle), she’s all in.  The key is that it has to appeal to a complete technophobe who has ten thumbs, deep anxiety about using new things, and strong historical resistance to any ideas generated by me.

     

    The need to please someone else figures strongly in the planning of many audiophiles.  My wife, like the spouses and significant others of so many of us, simply won’t go near most of our equipment because she believes it’s too hard for her to use.  Sometimes she’s even right, although she (and probably most other SOs around her) can do a lot more than she thinks she can or wants to learn. As soon as video projectors became home sized and home priced, I was ready to try one.  Of course, she claimed to prefer a big flat TV to a projected image despite never having seen a home theater of any kind.  I finally dragged her to a showroom while we were out buying something she wanted, and a single glance at the 8’ image on the wall changed her mind.  

     

    Now there’s an Amazon Echo Dot in every room of our home except the bathrooms – and there are stereo pairs in the living room and master bedroom.  She’s thrilled to be able to say “Alexa, play music by Cat Stevens” and “Alexa, make it louder”, and she’s so spoiled by it that she’ll never use hardware controls to listen to music again.   With devices in several rooms, we also discovered (accidentally) that adding a sub in one central location really improves SQ everywhere (since the lows are nondirectional). 

     

    The current crop of virtual assistants is willing to learn, but some are smarter than others when it comes to our audio systems.  We adopted the lovely Alexa when the first Echo Dots came out, and we now have stereo pairs in the living room and our bedroom plus singletons in kitchen and den.  She turns both the living room and den audio systems on and off through their power strips.  This is particularly helpful with the LR system (on a small rack on the floor, under the grand piano against the back wall).  I can’t tell you how wonderful it is to say “Alexa, turn on the stereo” instead of crawling under the piano to turn on the tube electronics – the back of my head has healed completely.  She also controls some home appliances, most importantly my espresso machine. 

     

    A smart outlet does not degrade SQ in any way in my systems to my ears.  I have power cables on the DAC / preamp and the power amp that would choke a horse, and I’ve compared the system with and without the smart outlet often enough to be certain that it’s sonically transparent to me.  I’m using Samsung receptacles because I started my home automation efforts with a Samsung Smart Hub (more about which later).  I’m not sure I’d go Samsung again, but we already had 3 Samsung “smart TVs” (which is something of a misnomer, again to be discussed later) so I figured I’d stick with one system for reliability and simplicity.  It ain’t necessarily so.

     

    As stated earlier, I’ve tried multiple platforms for voice driven home automation and audio control in the course of researching this article.  Alexa, Google Home, Kasa, TP-Link, Samsung and the rest do an equally fine job of power control.  Be aware that most smart outlets and power strips are 15A devices or less.  There are some excellent 20+A units that work fine – just be sure you buy the capacity you need or you’ll be in for a spot of bother.  I can recommend ConnectSense, whose products are well made but a bit pricey ($100 for the in-wall 20A duplex).  
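
    To check whether a 15A smart outlet is enough, add up what the gear draws and divide by the line voltage. Here’s a minimal sketch of that arithmetic. Everything specific in it is an assumption for illustration: the 120V mains figure, the wattages in the sample system, the function names, and the 80% headroom factor (a common rule of thumb for continuous loads). It also ignores power factor and turn-on inrush, so treat the result as a rough guide rather than an electrical spec.

```python
def circuit_load_amps(watts_by_device: dict[str, float],
                      mains_volts: float = 120.0) -> float:
    """Approximate steady-state current: I = P / V (power factor ignored)."""
    return sum(watts_by_device.values()) / mains_volts


def outlet_is_adequate(load_amps: float, outlet_rating_amps: float,
                       headroom: float = 0.8) -> bool:
    """True if the load fits within the rating after applying headroom.

    The 0.8 default is an assumed rule-of-thumb margin, not a quoted spec.
    """
    return load_amps <= outlet_rating_amps * headroom


# Hypothetical system; substitute the nameplate wattage of your own gear.
system = {"tube power amp": 450.0, "DAC / preamp": 40.0, "streamer": 15.0}

amps = circuit_load_amps(system)
print(f"Estimated draw: {amps:.1f} A")
print("OK on a 15A smart outlet:", outlet_is_adequate(amps, 15.0))
```

    If the estimate comes out close to the limit, that’s the cue to look at one of the 20A units rather than hoping for the best.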

     

    Aeotec offers a full line of smart products based on the Z-Wave system (another hub-based IoT platform like Kasa, TP-Link, Samsung etc).  Their 40 amp smart switch is hard wired between the branch circuit and the device being controlled, but they also offer plug-in smart outlets and a host of Z-Wave system devices.  Aeotec also offers a strong differentiating extra that I consider a sustainable competitive advantage (at least, for now)  with great appeal for audiophiles: the ability to create and host an automation hub entirely within your LAN.   This enhances security and makes a lot of sense to me.  The Aeotec Z-Stick 5+, which is the device that lets you do this, works very well with a Raspberry Pi 4, so I’m exploring design of an audio control system using this combo.

     

    SECURITY CONCERNS WITH VOICE CONTROL, THE IoT, AND WiFi IN GENERAL

     

    From the day I posted my AS survey on audiophile interest in, experience with, and concerns about voice control for our systems, I started getting emails and messages of concern over security.  This is a valid and major issue for anyone whose digital footprint extends beyond the case of the device he or she is using.  Your privacy and security are at risk whenever you’re on the internet (no matter how you connect to it), on a LAN or WLAN, or using Bluetooth or any other form of communication with potential exposure of content and/or access to your devices / network.

     

    Everything on the IoT is potentially vulnerable, as are the LANs and networked devices to which each has access.  Every device with an energy based entrance pathway is potentially vulnerable, be it sound energy, light energy, RF, or even thermal.  Smart door locks, auto systems, and home security systems are regularly hacked and defeated unless protective precautions are taken.  

     

    Yes, Alexa and Siri hear everything you hear.  They also access your email, messages, contacts, and any other information on your devices and networks to which you don’t actively deny them access.  And there have been some serious issues.  So-called smart technology has been oblivious to some serious security threats over the years. Anyone could pick up a locked iPhone 4S, launch Siri by pressing the “home” button, and gain control of the phone through voice-activated commands.  Alexa and Google Assistant have their own issues, as do WiFi, Bluetooth, and every other form of connectivity.  And if there’s a way in, someone will find and exploit it.

     

    You can protect yourself and your devices with a little effort.  The basics are easy to describe but often overlooked, starting with setup.  DO NOT JUST ACCEPT ALL DEFAULTS!  Read everything before clicking anything.  Deny your digital assistant access to anything that won’t help it do what you want it to do.  Email access is unnecessary for digital assistants unless you want to send emails using only voice control. If you have financial information on any networked device, your digital assistant does not need access unless you plan to buy things with your voice (which would be inconsistent with serious concern for security issues).   We don’t want Alexa to do anything for us but control the devices we use regularly – so she can’t, because she has no access to those apps and data sources.

     

    image27.jpeg

    You can disable the microphones in smart devices, and you can delete all recordings made by them using the parent app.  There are now devices that will physically block or disconnect the microphones on smart speakers, e.g. the Paranoid Home pictured here:  

     

     

    If you give a credit card number over the phone within earshot of a smart device, it will record what you say and send it to the parent servers.  So don’t do stuff like that!  Remember that most digital assistants use your WLAN for all communication, so you also have to secure your networks and other networked devices by doing the following and more:

     

    • maintain all software and firmware with current updates – these include security patches
    • use your router’s firewall; many ISPs include good security – check to see what you have
    • do not use any default names for networks or any other digital entity
      • change every one of them to complex names devoid of personal names, locations etc
      • if possible, change the default administrator account name from “admin” etc
      • change your LAN’s IP range from its default to a common alternative (e.g. 10.0 to 192.168)
        • this can hide your router’s brand and model, making it harder to hack
    • use strong account passwords and change them regularly
    • use multi-factor authentication whenever available
    • encrypt everything you can
    • use strong WiFi security with complex access keys
      • do not broadcast your SSID
      • turn off WPS
    • keep your WiFi router toward the center of your house and away from windows
    • use fixed IP addresses for every device that allows it
    • disable DHCP on your networks, or restrict its pool to only as many IP addresses on your subnet as you need for devices that won’t let you set a fixed IP
    • disable remote access to your network
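
    Two of the items above (complex access keys and a tightly limited DHCP pool) are easy to sketch with nothing but the Python standard library. This is a minimal illustration, not a router configuration tool: the function names, the 24-character key length, the symbol set, the sample subnet, and the choice to start the pool at host .100 are all assumptions. You’d still enter the results by hand in your own router’s admin pages.

```python
import ipaddress
import secrets
import string


def make_wifi_key(length: int = 24) -> str:
    """Generate a random WiFi passphrase from letters, digits, and symbols."""
    alphabet = string.ascii_letters + string.digits + "!#$%&*+-=?@^_"
    return "".join(secrets.choice(alphabet) for _ in range(length))


def small_dhcp_pool(subnet: str, dynamic_devices: int,
                    first_host_offset: int = 100) -> tuple[str, str]:
    """Return (first, last) addresses of a pool just big enough for the
    devices that cannot be given a fixed IP."""
    if dynamic_devices < 1:
        raise ValueError("need at least one dynamic device")
    network = ipaddress.ip_network(subnet)
    if not network.is_private:
        raise ValueError("expected a private (LAN) address range")
    start = network.network_address + first_host_offset
    end = start + dynamic_devices - 1
    # The pool must stay inside the subnet and below the broadcast address.
    if end >= network.broadcast_address:
        raise ValueError("pool does not fit in this subnet")
    return str(start), str(end)


print("New WiFi key:", make_wifi_key())
print("DHCP pool:", small_dhcp_pool("10.20.30.0/24", 5))
```

    With five devices that insist on DHCP, the sample call hands out exactly five addresses and nothing more, so an uninvited device has nowhere to land.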

     

    As our audio equipment becomes networked on the IoT, it joins cameras, refrigerators, dishwashers, Cadillacs, etc on a web of vulnerability.  If you take great pains to protect yourself, voice control for audio is far less risky than buying on line.  Just be careful.  Here’s some great information on securing your digital self found on the FTC’s website and here’s the FTC website on protecting your IoT devices. ‘nuf said!

     

     

    THE CUTTING EDGE

     

    I love voice control for daily listening control.  My wife and I both use it happily all day every day, although we occasionally still feel a little silly talking to a device.  I’m having great fun trying to develop a system that integrates voice control with artificial intelligence and translating its digital outputs into actions in my own audio systems.  As you might expect from my prior articles, I’m using a Raspberry Pi as a development mule and focusing on controlling a simple music player completely and accurately.  It’s not working yet, but I will prevail!

     

    I also set up my Apple Watch for voice control.  I can now manage JRiver using an app called Voice in a Can.  As you know, these digital assistants can be pretty snooty – Alexa won’t talk to Siri, and the Google Assistant doesn’t even acknowledge either of the ladies in many systems and contexts.  As you might expect, there are now apps that can open communication channels among our faithful digital staff – and Voice in a Can is one of them.  

     

    The easiest way to describe what it does is that (metaphorically speaking) it’s a simultaneous translator between Siri and Alexa.  So I can now do everything by talking to my watch that I can do by addressing Alexa in one of her residences.  Voice in a Can lets Siri tell Alexa to open the Alexa skill named House Band so it can tell JRiver to play whatever I want to hear through the zone of my choice.  FWIW, I can also set my wake-up alarm, start my espresso machine, or turn the stereo systems in the living room and library on and off by asking my watch.  Yes, this is a ridiculously complex chain of events – but it’s a start, and major advances in functionality and simplicity are just around the corner.  I really do love it!

     

    The future is bright for voice control for the audiophile.  It’s only a matter of time before the daisy chain of VC / AI / middleware / actionable output is integrated into simpler and more efficient systems.  I mentioned josh (with a lower case j) early in this piece.  This is a system aimed at the high end market for comprehensive voice control using AI, and it’s probably the tip of that iceberg.  josh seems intent on eventually doing everything you ever want to do with no input other than vocal.  MAVIS (Multimedia Audio Visual Interface System) is another advanced approach to voice control.  Here’s a web page about some of the things you can do with MAVIS right now, and audio editing is among them. You can download a MAVIS app called “Connect the Dots” from the iStore and play with it as an introduction.

     

    Be of good cheer regarding voice control for audiophiles – it’s coming and it’ll be fantastic.  Like the electric car, it has a few limitations that won’t be overcome until technical solutions are found to get around downstream barriers.  But once it matures a bit, I believe it will become integral to good audio equipment.  One reason I say this is that the switches and other hard controls now used in most electronic devices are simply not as solid and reliable as the ones we took for granted in analog equipment.  Bubble switches feel cheap, break often, and look tacky.  Tiny digital displays fail frequently and annoyingly.  Many remote controls for otherwise excellent products are flimsy, tacky, and imprecise.  Being able to converse with a DAC will be far preferable to trying to read the half of its display that still lights up.  

     

    I hope you enjoyed this and found it worthwhile to read.  Stay safe and enjoy every minute of every day!

     

     

     



    User Feedback

    Recommended Comments



    43 minutes ago, LarryMagoo said:

    when I ask Siri a question how do "they" decide which device answers me..??

    That's a great question!  One of these days, I'm going to start a conversation between Alexa and Siri to see where it goes.


    looking forward to the day it happens. 
     

    As a challenge, I previously spent a lot of time trying to get Alexa to play songs on my system. It was more a personal challenge than an attempt to import my full library, but I managed to get a few albums working. 
     

    I used IFTTT: Alexa voice > IFTTT > a module set up to send a command to my Raspberry Pi, which runs home automation software (Alexa to the IFTTT website and back to my home, about a 2 sec delay). From that software I was able to set up the received IFTTT command to trigger the following (effectively 1 command to play one song from my NAS to my streamers); it did work, long process to get just 10 albums, but was my favourite albums at the time. As I said, it was just something I wanted to achieve. 
     

    I’ve not used it in years now. 
     

    POST /AVTransport/ctrl HTTP/1.1\x0D\x0AHOST: 192.168.1.190:8080\x0D\x0ASOAPACTION: "urn:schemas-upnp-org:service:AVTransport:1#SetAVTransportURI"\x0D\x0ACONTENT-TYPE: text/xml; charset="utf-8"\x0D\x0AContent-Length: 550\x0D\x0A\x0D\x0A<?xml version="1.0" encoding="utf-8"?>\x0D\x0A<s:Envelope s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">\x0D\x0A   <s:Body>\x0D\x0A      <u:SetAVTransportURI xmlns:u="urn:schemas-upnp-org:service:AVTransport:1">\x0D\x0A         <InstanceID>0</InstanceID>\x0D\x0A         <CurrentURI>http://192.168.1.100:9000/disk/NON-DLNA/O0$1$8I19248142.aif</CurrentURI>\x0D\x0A         <CurrentURIMetaData />\x0D\x0A      </u:SetAVTransportURI>\x0D\x0A   </s:Body>\x0D\x0A</s:Envelope>
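    For anyone who wants to experiment with a request like the one above, here is a minimal Python sketch that assembles the same kind of UPnP AVTransport SetAVTransportURI message. The function name is made up for illustration, and the host, control path and media URI are simply copied from the example above; your own renderer will use different values. The sketch only builds the bytes and does not send them.

```python
from xml.sax.saxutils import escape

SERVICE = "urn:schemas-upnp-org:service:AVTransport:1"


def build_set_uri_request(renderer_host: str, control_path: str, media_uri: str) -> bytes:
    """Build a raw HTTP/SOAP request asking a renderer to load `media_uri`."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>\r\n'
        '<s:Envelope s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" '
        'xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">\r\n'
        "   <s:Body>\r\n"
        f'      <u:SetAVTransportURI xmlns:u="{SERVICE}">\r\n'
        "         <InstanceID>0</InstanceID>\r\n"
        # Escape &, < and > so an unusual URI cannot break the XML.
        f"         <CurrentURI>{escape(media_uri)}</CurrentURI>\r\n"
        "         <CurrentURIMetaData />\r\n"
        "      </u:SetAVTransportURI>\r\n"
        "   </s:Body>\r\n"
        "</s:Envelope>"
    ).encode("utf-8")
    headers = (
        f"POST {control_path} HTTP/1.1\r\n"
        f"HOST: {renderer_host}\r\n"
        f'SOAPACTION: "{SERVICE}#SetAVTransportURI"\r\n'
        'CONTENT-TYPE: text/xml; charset="utf-8"\r\n'
        # Content-Length is computed from the body rather than typed in by hand.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body


# Values taken from the example request above; replace them with your own.
request = build_set_uri_request(
    "192.168.1.190:8080",
    "/AVTransport/ctrl",
    "http://192.168.1.100:9000/disk/NON-DLNA/O0$1$8I19248142.aif",
)
print(request.decode("utf-8"))
```

    Computing Content-Length in code avoids having to recount the bytes every time the track URI changes. Renderers typically also need a separate Play action before any sound comes out.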

     

     

    40 minutes ago, ASRMichael said:

    it did work, long process to get just 10 albums, but was my favourite albums at the time

    ...and that's how progress is made.  It's easier now, and before long it'll be the status quo.  Check out the latest AI/VC efforts on line - they're really pretty cool, even though they're still add-ons.  Once some genius figures out how to integrate them into SoCs, we'll be in fat city!


    Had a quick look at some plug-ins which support http commands. The software I use for home automation is called Demopad, similar to Crestron I suppose. I have Hue lights, heating, and an AV/Hifi setup. It all works nicely on an iPad. Being able to create your own visual look is pretty cool, and being able to use multiple IFs is great, although very time consuming. 
     

    Years ago I discussed voice control with Demopad, including the fact that not all home automation can trigger http. Being able to scan album covers on the web page of the streamer library would be great: visual recognition of album covers, with a tag assigned to each cover captured. As soon as you have the tag, it’s a breeze to control via a voice control app, via Siri. What we need is AI to recognise album cover art. 
     

    As I said It was many years ago I was doing this, probably 6-7 years ago. 
     

    It’s just a hobby I’ve learnt over time with help from the internet with regard to understanding protocols and having APIs available. And let’s not forget the good old HEX for remote control. 
     

    AI is moving so fast at the moment. I’m currently working with a company on a vision system for our food manufacturing factory. Effectively, the vision system company is using AI to program the vision system for thousands of variables. They said it’s not perfect yet, but it currently cuts down time by 90%. With it moving so fast, I cannot think what the future holds...

     

    cheers



