Designing experiences for the car can be challenging. The environmental context, where the car is, influences the experience. Can the experience be the same deep in the wilderness as in the heart of a capital? And is the experience the same when it’s only the driver versus when there are passengers on board? During the Voice of the Car Summit on the seventh and eighth of April, several companies talked about how they are tackling this very issue.
Over time, more and more features have been added to ease the commute. Yet each feature demands attention in one way or another. That makes the car an ideal scenario for voice interfaces: the driver can keep their attention on the road while operating each feature as they wish. But even though the car is a closed environment, the driver is not always heard. This can be due to fellow passengers, or to music that is playing too loud. By placing a camera inside the car, different voices can be told apart at the same time: the system tracks faces and can differentiate what each person is saying, so that the voice assistant listens only to the driver. Hi Auto is the company behind this piece of technology, and it can deliver a major contribution to voice in the car. The driver makes the decisions, not the front-seat passenger or the screaming children in the back. The driver stays in control and can have a meaningful experience with voice in the car.
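The idea can be boiled down to a simple filter: once an upstream audio/vision model has attributed each utterance to a seat (the hard part, which Hi Auto's technology handles), the assistant only acts on the driver's speech. A minimal sketch, with data structure and field names of my own invention:

```python
# Illustrative sketch of camera-assisted speaker separation from the
# assistant's point of view. Attributing speech to a seat is assumed to be
# done by an upstream model; here we just keep the driver's utterances.
# The transcript format and field names are hypothetical.

def driver_commands(utterances):
    """Filter a diarized transcript down to what the driver said."""
    return [u["text"] for u in utterances if u["speaker"] == "driver"]

transcript = [
    {"speaker": "driver", "text": "navigate home"},
    {"speaker": "back_seat", "text": "are we there yet?"},
    {"speaker": "driver", "text": "play some music"},
]
```

With this transcript, `driver_commands(transcript)` keeps only `"navigate home"` and `"play some music"`; the back-seat question never reaches the assistant.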
Having the voice assistant inside the car listen to your commands is one thing. The other is to create meaningful experiences, as P.ota.to puts it. They do this with the well-known Wizard of Oz testing, performed inside the car: one tester plays the voice assistant and may only say the written lines of the script, while the driver can ask all the questions. This shows whether the voice skill/action works. Test it in various environments to know if it works just as well in the busy city as on the calm country road. This creates a user-centered voice skill/action.
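A Wizard of Oz session can be captured in a few lines: the wizard may only answer with pre-written script lines, and each run is tagged with the environment it was tested in, so results from the busy city can be compared with the calm country road. Everything here (script lines, topics, environments) is illustrative, not P.ota.to's actual material:

```python
# Minimal sketch of logging a Wizard of Oz session. The "wizard" (a human
# playing the assistant) may only reply with lines from SCRIPT; anything
# the script does not cover falls back to FALLBACK and is flagged, so the
# designers learn which requests they did not anticipate.

SCRIPT = {
    "navigate": "Starting navigation.",
    "music": "Playing your music now.",
}
FALLBACK = "Sorry, I can't help with that yet."

def run_session(environment, driver_requests):
    """Record the scripted reply (or fallback) for each driver request."""
    log = []
    for topic in driver_requests:
        log.append({
            "environment": environment,
            "request": topic,
            "reply": SCRIPT.get(topic, FALLBACK),
            "covered": topic in SCRIPT,  # did the script anticipate this?
        })
    return log
```

Running `run_session("busy city", ["music", "weather"])` would show that "weather" was not covered by the script, which is exactly the kind of gap this testing is meant to surface.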
Creating experiences inside the car that enhance customer loyalty takes a while. SoundHound therefore showed their approach, one that I would say applies to all voice experiences, not just those inside the car. The approach consists of three parts. It starts with delivering the core functionality: what is the essence of the product, what does it really need to do in order to work? Make this work first, in a way that lets the user reach their end goal fast. Say Spotify is making a voice action/skill. The first step in the delivery phase would be being able to start the music by naming an artist or album. The second part is differentiating with unique capabilities and understanding follow-ups. For the Spotify example this would mean the user can ask for music for a certain moment of the day, such as ‘play music for during dinner’. Likewise, when a user asks ‘play music by Beyoncé’, the voice assistant could ask ‘from her latest album, or a list of all her hits?’. By asking a follow-up question for more specific information, the assistant seems to really get the user. The third and final step is to delight: give the consumer above-and-beyond experiences by handling complexity, exceptions and filtering, like ‘play the latest duet with Beyoncé and put the song in my morning commute playlist’. Each of these steps is built up slowly. You get to know what the user wants by having that conversation with them; maybe they ask for things you initially did not imagine. That’s why you should start by delivering the core functionality, not by delighting the user immediately with a huge voice skill/action. Build it up slowly.
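The three tiers (deliver, differentiate, delight) could be sketched as a hypothetical intent handler. All intent names, slots and return shapes below are my own illustration, not SoundHound's or Spotify's actual API:

```python
# Minimal sketch of the deliver -> differentiate -> delight tiers for a
# hypothetical music voice skill. Intent names and slots are illustrative.

def handle_request(intent, slots):
    # 1. Deliver: core functionality, reaching the end goal fast.
    if intent == "play_artist":
        artist = slots["artist"]
        # 2. Differentiate: ask a follow-up instead of guessing.
        if "album" not in slots:
            return {
                "prompt": f"From {artist}'s latest album, or a list of all her hits?",
                "expect_followup": True,
            }
        return {"action": "play", "artist": artist, "album": slots["album"]}

    # Also differentiate: music for a moment of the day.
    if intent == "play_moment":
        return {"action": "play", "playlist": f"music for {slots['moment']}"}

    # 3. Delight: handle complexity and compound requests in one go.
    if intent == "play_and_save":
        return {
            "action": "play",
            "query": slots["query"],               # e.g. "latest duet with Beyoncé"
            "add_to_playlist": slots["playlist"],  # e.g. "morning commute"
        }

    return {"prompt": "Sorry, I didn't get that."}
```

For example, `handle_request("play_artist", {"artist": "Beyoncé"})` does not guess; it returns the follow-up prompt from the second tier, while the compound `play_and_save` request shows the delight tier combining playback with a playlist edit.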
Besides, to know whether a voice action/skill is designed correctly, it needs to be tested. Bespoken therefore developed an automated testing system for voice experiences. If you can type out your test, you can automate it very easily, they say: you only have to write it once, and you can run it perpetually. It works as follows. The system plays the test audio through a speaker at a test device and waits for the device to answer. It captures the audio response with a microphone and the graphical response with a camera, then converts the audio to text with speech-to-text and the image to text with machine vision/OCR. For testing in the car, extra noise can be mixed into the audio file before it is played, to simulate every possible situation. This removes the need for manual testers: write a test once and keep adapting it to various situations.
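The pipeline described above could be sketched, in simplified form, like this. The playback, capture, speech-to-text and OCR functions are placeholders for whatever hardware and services a real rig would use; this is my reconstruction of the flow, not Bespoken's actual code:

```python
# Simplified sketch of an automated voice test loop: play an utterance
# (optionally with noise mixed in), capture the spoken and on-screen
# responses, convert both to text, and check them against expectations.
# All I/O functions are injected placeholders.

def mix_noise(utterance_audio, noise_audio):
    """Overlay background noise (e.g. traffic) to simulate in-car conditions."""
    return [a + n for a, n in zip(utterance_audio, noise_audio)]

def run_voice_test(test_case, play, record, capture_screen, speech_to_text, ocr):
    # Optionally add noise so one written test covers many environments.
    audio = test_case["utterance"]
    if test_case.get("noise"):
        audio = mix_noise(audio, test_case["noise"])

    play(audio)                          # speaker -> test device
    spoken = speech_to_text(record())    # microphone -> text
    shown = ocr(capture_screen())        # camera -> text

    return {
        "spoken_ok": test_case["expected_speech"] in spoken,
        "shown_ok": test_case["expected_screen"] in shown,
    }
```

Because the noise is just another input, the same written test can be rerun with city traffic, highway hum, or a quiet cabin without touching the test logic, which is the "write once, run perpetually" idea.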
In conclusion, to build and test a voice experience for the car, you need to take into account what happens in the car and what kind of commute it is. One of my favorite examples of a true car experience is an app called Road Tales. The app can only be used when driving on the highway. Its makers mapped all the highways and the important landmarks next to them, such as a tower or a gas station, and combined that with children’s stories. The stories only work while the car is driving on the highway, and when the car passes a landmark, the story weaves it in in a special way, all to entertain the children in the back of the car. This shows that they really looked at who was in the commute, which function needed to be fulfilled, and how to fulfill it while keeping it exciting.
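The location-triggered storytelling idea behind Road Tales can be sketched as a simple rule: no highway, no story; near a known landmark, the landmark enters the narration. This is my reconstruction of the concept, with made-up coordinates, landmarks and story lines, not their implementation:

```python
# Sketch of location-triggered storytelling. Coordinates, landmarks and
# narration lines are invented for illustration.

LANDMARKS = {
    (52.0, 5.1): "a tall tower",
    (52.3, 4.9): "a gas station",
}

def near(pos, landmark, tolerance=0.05):
    """Crude proximity check on (lat, lon) pairs."""
    return (abs(pos[0] - landmark[0]) <= tolerance
            and abs(pos[1] - landmark[1]) <= tolerance)

def story_line(on_highway, position):
    if not on_highway:
        return None  # the app only works on the highway
    for coords, landmark in LANDMARKS.items():
        if near(position, coords):
            return f"Look! The heroes just flew past {landmark}!"
    return "The story continues..."
```

Off the highway the function returns nothing, and between landmarks the story simply carries on; only a nearby landmark changes the narration.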