With Google Home now available in Canada and receiving more useful features, and Amazon's Alexa expected to launch here in the near future, many companies find themselves pondering how to have a totally different kind of conversation with their customers.

A conversation that can be started at any time, is automated, and coordinates with an AI assistant to push relevant data out to an endpoint device in real time. In most cases, it's unsupported by any visual content and unprompted by a push notification. Sound like a challenge?

Well, you're not alone. Amazon's Alexa service may not have officially launched in Canada, and Google's third-party Actions aren't available yet, but that hasn't stopped developers from perfecting their approach to building voice-first interactions. Toronto-based TribalScale and Calgary-based Splice Software serve as examples of developer shops that have already coordinated with U.S. clients to build Alexa Skills. TribalScale is on Amazon's list of Alexa Skill builders, and Splice's Dialog Suite cloud platform allows clients to build interactive voice experiences across a range of devices.

ITBusiness.ca spoke with Tara Kelly, CEO of Splice, and Josh Wilks, vice-president of engineering at TribalScale about their best practices for building voice-first experiences.

Do the backend infrastructure work

Just like businesses that wanted to serve their customers with mobile apps had to rejig their infrastructure from a desktop-oriented stance, voice-first applications will require some work on the backend. Unlike a mobile app that would have some client-based component, voice-first applications reside entirely in the cloud.

At TribalScale, Wilks says this is often solved by gathering information asynchronously and bringing it together at a single endpoint. Using AWS Lambda to host the cloud-based Skills, network calls are made to collect the necessary information, which is then cached in the application for the next time a user requests it.

“With Amazon Alexa, because you’re talking about a lot of data points, they’re usually from different end points on a server,” he says. “Any latency is bad because the user has to watch the Alexa light spin while that’s happening, so we bring data from multiple endpoints into a single endpoint.”

Delivering data in this sort of real-time and on-demand way is an exciting notion, says Kelly. “This is going to drive towards better architecture about how to store that information so you’re ready for that real time relationship with your customer.”

Building an API that links to your data and pushes it out at the request of Alexa or Actions on Google is the final step in completing the infrastructure overhaul, surfacing the right details when customers ask for them.
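A minimal sketch of what that API layer looks like on the Alexa side: AWS Lambda invokes a handler with the Skill request JSON, and the handler returns speech in the Alexa Skills Kit response format. The `LeaderIntent` name and the reply text are illustrative assumptions, not part of any real Skill.

```python
def lambda_handler(event, context):
    """Entry point AWS Lambda invokes for each Alexa request.
    Inspects the recognized intent and returns the Alexa Skills Kit
    response JSON containing the speech to play back."""
    request = event["request"]
    if (request["type"] == "IntentRequest"
            and request["intent"]["name"] == "LeaderIntent"):
        speech = "Player A leads at twelve under par."
    else:
        speech = "Sorry, I can't help with that yet."

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

Because the handler is stateless, all the real work of fetching and caching data happens behind it; the handler just translates an intent into speech.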

Write a script, map the conversation

Since you won't be able to proactively push information out through Alexa or Google Home, it's necessary to anticipate what the user will ask your voice-first application. Make sure the answers are up to date, because that's what customers will expect, Kelly says.

To determine what information you should be putting in your Alexa Skill, look no further than the frequently asked questions (FAQ) section on your website, she advises. Similarly, if you operate a call centre and know what the most commonly asked questions are there, you've got the right idea. But if a question is too complex to answer concisely, without visuals, then it's not the best choice for a voice-first app.

At TribalScale, the team creates a voice user interface (VUI) document that defines all the interactions that will occur with an app. This is how it approached its first client project, the Professional Golfers' Association (PGA) Tour Skill. It's similar to the wireframing process for a mobile app, Wilks says.

“We came up with our own structure of how we create voice user interfaces and how we iterate on them,” he says. “How will the user journey develop and how will the dialogue go?”


That document anticipates user errors and can support multiple conversational flows, so the experience feels more like having a real conversation than interacting with a robot. Do some real-world user testing to understand how customers expect the experience to work, Wilks says.
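A VUI document of the kind Wilks describes can be modelled as a small state table. This is a hypothetical sketch, not TribalScale's format: each node holds the prompt spoken to the user and the intents it accepts, and an unrecognized reply triggers a reprompt instead of an error, which is what lets the dialogue recover gracefully.

```python
# Hypothetical conversation map: each node lists its spoken prompt and
# which intents move the dialogue to the next node.
FLOW = {
    "welcome": {
        "prompt": "Would you like scores or the schedule?",
        "accepts": {"ScoresIntent": "scores", "ScheduleIntent": "schedule"},
    },
    "scores": {"prompt": "Player A leads at twelve under.", "accepts": {}},
    "schedule": {"prompt": "The next round starts Sunday.", "accepts": {}},
}

def next_turn(state, intent):
    """Advance the conversation one turn. Unknown intents keep the
    current state and prepend an apology to the original prompt."""
    node = FLOW[state]
    if intent in node["accepts"]:
        new_state = node["accepts"][intent]
        return new_state, FLOW[new_state]["prompt"]
    return state, "Sorry, I didn't catch that. " + node["prompt"]
```

Keeping the flow in data rather than code also makes it easy to iterate on the script with non-developers, much like revising a wireframe.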

Personalize and advertise

Just like you’d spend a lot of time considering the visuals you use to communicate your brand on a website or even in an ad, think carefully about what the audio experience will reflect about your brand, Kelly says. At Splice, customers can select from dozens of different voices.

"Similar to scent, people recognize the sound of someone's voice as a key point to their identity," she says. "I can't imagine why brands would feel comfortable letting another brand own their voice with the customer."

At TribalScale, the development team lets Alexa provide the voice for its clients' apps. But it's looking at other ways to add a personalized touch, Wilks says. For example, with its PGA Tour Skill, it could be useful if the user were able to save their favourite player and always hear an update about them when checking in with the Skill.

“We’re trying to build personalization aspects so that users can get to different aspects of the applications much faster and easier,” he says.

TribalScale gets around the challenge of not having any client-side installation to work with by using the device's user ID. The Skill can store persistent data keyed to that ID and offer a dynamic configuration as a result.
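The personalization idea above can be sketched simply. This is an assumption-laden illustration: the in-memory dictionary stands in for whatever persistent store a real Skill would use (such as a database keyed by the user ID Alexa supplies with each request), and `save_favourite` and `greeting_for` are hypothetical helper names.

```python
# Stand-in for a persistent store keyed by the user ID that arrives
# with each Alexa request; a real Skill would use a database.
_prefs = {}

def save_favourite(user_id, player):
    """Remember the user's favourite player across sessions."""
    _prefs[user_id] = player

def greeting_for(user_id):
    """Personalize the opening response when a favourite is stored,
    falling back to a generic greeting for new users."""
    player = _prefs.get(user_id)
    if player:
        return f"Welcome back. Here's the latest on {player}."
    return "Welcome to the tour update."
```

Because the ID persists across sessions, the Skill can skip straight to the content a returning user cares about, which is exactly the "faster and easier" access Wilks describes.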
