Whose voice is it anyway?
Some years ago, I achieved the distinction of (probably) being the only evangelist to start a conference talk with: “last weekend, I farted”.
Taking a slightly dramatic pause for breath after that statement, giving the audience time to ponder what was coming next, I revealed that not only had a bodily function taken place; somehow a technical function did too — my computer fired up iTunes and started playing The Eagles.
It’s a lovely notion to think that my subconscious was passing judgement on “Hotel California” but in truth, what took place was a classic “false positive”. Violently false as it happens, but let’s not go into that any further.
Farting, however, has become something of a theme when it comes to thinking about voice interaction. Tweets like this one are surprisingly common:
We could pick over the bones of that phrase and revisit an old annoyance of mine: the notion that people buy “an Alexa” rather than the device which houses her, which could be anything from an Echo to a refrigerator. There is a whole separate article later on the notion of assistants taking on a physical presence.
In any early-adopter tech, there is a level of tolerance as things come together, and this kind of issue is going to happen. Early adopters are forgiving, but they are also finite.
Once less tech-savvy users get this sort of thing in their heads, they start to think the assistant has its own personality (especially in the hype-driven world of “AI”) and maybe even a propensity for toilet humour; and of course, the inevitable thing happens.
“Alexa, can you fart?”
In the UK at least, when you utter this phrase, you are met with the offer of a particular Alexa Skill which cuts the mustard (and the cheese), and thus you are asked to enable it. Once that initial cycle is done, you meet with a random fart noise.
However, that’s where the fun begins. Let’s say you have friends round (and anecdotally, I know many people who have done this, including a friend’s elderly neighbour who wasn’t remotely amused) — the obvious next step is repeating the phrase, and so:
Most people will either be amused or bemused, but either way, they’ll remember. Those with a positive impression are very likely to share the experience, simply because that is what people do.
The impression these people are spreading is that Alexa can fart. She is perceived to have both personality and capability, when in fact all that’s happening is that Alexa acts as a developer platform and allows third parties to deliver this kind of service.
For the sake of discovery, which has long been an issue for such platforms, the Skill has been made a bit easier to find by enabling a shortcut. As mentioned, the first use of the skill (and any use on an Echo Show or Spot device, where the screen comes into play) makes this clear; but the perception remains.
There is a broader point of debate here, around whether ‘assistants’ should be just that, with their own capabilities and personalisation, or whether third-party apps should carry the weight. I’ll return to that in a future post.
For today, assistants are somewhere in between, with a mixture of internal capability, third party capability, and some routing between the two ends to make things easier. However, it does lead to the problem of whose voice is active.
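To make that "somewhere in between" concrete, here is a minimal sketch of the three paths a request might take: an explicit skill invocation, a shortcut phrase that quietly hands off to a third party, and a built-in fallback. Everything in it is hypothetical (the tables, the `Route` type, the `route` function and the "Example Fart Skill" name); it illustrates the idea and is not how Alexa or any real platform routes requests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    source: str            # "assistant" or "third_party"
    skill: Optional[str]   # third-party skill name, if any
    explicit: bool         # True if the user named the skill themselves


# Hypothetical lookup tables, for illustration only.
SKILLS = {"skyscanner": "Skyscanner", "example fart skill": "Example Fart Skill"}
SHORTCUTS = {"can you fart": "Example Fart Skill"}  # phrase -> third-party skill


def route(utterance: str) -> Route:
    """Decide whose voice answers: the assistant's own or a third party's."""
    text = utterance.lower().strip(" ?.!")
    # Drop the wake word if present.
    if text.startswith("alexa"):
        text = text[len("alexa"):].lstrip(" ,")
    # Explicit invocation: "open <skill>" or "ask <skill> ..."
    for prefix in ("open ", "ask "):
        if text.startswith(prefix):
            rest = text[len(prefix):]
            for key, name in SKILLS.items():
                if rest.startswith(key):
                    return Route("third_party", name, explicit=True)
    # Shortcut: the user never named a skill, but a third party answers anyway.
    if text in SHORTCUTS:
        return Route("third_party", SHORTCUTS[text], explicit=False)
    # Everything else is treated as the assistant's own capability.
    return Route("assistant", None, explicit=False)
```

The interesting case is the middle one: `route("Alexa, can you fart?")` comes back as third-party content with `explicit=False`, which is exactly the situation where the user has no cue that anyone other than the assistant is speaking.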
When I was developing my first skill in 2015, working with an experienced voice designer, this was one of the first things we came up against. What was the actual source, nature and context of the conversation? Allow me to illustrate:
“Alexa, open Skyscanner.”
“Welcome to Skyscanner flight search; we can find prices for one-way and return flights between two cities, where do you want to go?”
“We found flights starting at £180; the cheapest is with easyJet.”
The above phrase was our first interpretation, having spent much time in apps and on the web. “We” meant the app developer, despite Alexa reading the answer.
A more direct version of invoking the skill, such as “Alexa, ask Skyscanner for…” would have given a clue that the assistant is asking a trusted friend for the answer, and delivering the response, or perhaps acquiring a skill from Skyscanner in order to answer.
It’s never 100% clear though; Alexa is not responsible for the answer and shouldn’t be. However, let’s say that particular skill is given a “fart”-style shortcut: what does the user perceive? Could it be fixed with deliberate ambiguity such as “there are flight prices…” or more assertively, “Skyscanner has flight prices”? Possibly, but if the assistant had routed you from a direct query and you hadn’t mentioned Skyscanner, would that last answer be a surprise?
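The three phrasings above can be put side by side in a small sketch. The `phrase` helper and its style names are hypothetical, and `choose_style` encodes just one possible policy (first person when the user named the skill, explicit attribution when they were routed there); it is an assumption for illustration, not a recommendation from any platform, and it does not settle the question of whether the attributed answer would surprise the user.

```python
# Hypothetical templates mirroring the three phrasings discussed above.
TEMPLATES = {
    "first_person": "We found {finding}.",    # "we" = the skill developer
    "ambiguous": "There are {finding}.",      # nobody is named
    "attributed": "{skill} has {finding}.",   # the source is named
}


def phrase(style: str, skill: str, finding: str) -> str:
    """Render the same finding in one of the three voices."""
    return TEMPLATES[style].format(skill=skill, finding=finding)


def choose_style(user_named_skill: bool) -> str:
    """One possible policy (an assumption): attribute when the user was routed."""
    return "first_person" if user_named_skill else "attributed"


finding = "flight prices starting at £180"
for style in TEMPLATES:
    print(style, "->", phrase(style, "Skyscanner", finding))
```

Printing all three for the same finding makes the trade-off easy to hear: the content is identical, and only the implied speaker changes.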
For balance, this isn’t just about Alexa, where I’ve spent most of my time; it’s a debate for all assistant platforms. Witness this famous example seen in Google Assistant:
The most blatant example of this is a headline which continues to be seen and shared across social media. Note the caveat that the linked article doesn’t say what this tweet claims, but that hasn’t stopped people sharing the claim long afterwards:
Think about that for a moment: “she can now give you financial advice”. She really can’t: a skill exists which (with some setup and permissions) can do something of the sort. However, once more, perception says the assistant is, well, assisting.
There are of course rigorous approval and certification processes for these services to be enabled, and there are T&Cs which cover any reactive confusion. However, at this early adopter stage of assistants, trust and security are of primary importance, and it would be a shame for a false perception to cause problems early on.
Similar problems with mobile apps persist to this day; it’s commonplace for a travel search app like Skyscanner to be blamed, with a 1-star review in the app store, for a bumpy flight it ‘recommended’. Never underestimate the power of perception, where 2+2 often equals 5. The more comfortable products are to use, the more they become a compliant tool, or the path of least resistance when someone is looking for something to blame.
So next time you ask your assistant to fart and celebrate the fact: stop and think what’s happening. Think about whose voice (or emission) you’re enabling, and if you’re a skill builder, think about the different perceptions of the content you deliver.
I’ll return to various spin-off subjects arising from this in later blogs.