Siri Usability Testing Study

“Hey Siri” 👋

To better understand what users can accomplish with Apple’s virtual assistant, Siri, my team and I at the University of Washington conducted a usability study.

Read on to see our process, what we found, and our top design recommendations.

Date: October 2021 - December 2021
Duration: 8 Weeks
Team: Annie Liu, Carol Lei, Lyle Hamm, Kayli Ly
Role: Researcher
Skills: Usability Testing, Interviewing, Documentation, Technical Support
Tools: UserTesting, Siri, Zoom, Figma

01. Background

Apple’s intelligent assistant, Siri, is designed to help Apple device users accomplish everyday tasks with their voice. Siri automates tasks for its users, such as setting timers, checking the weather, making search engine queries, and more. With Siri, our team saw an opportunity to evaluate a world-class product and make tangible usability recommendations that could improve the experiences of many.

02. Exploratory Study

Survey

Before usability testing, our team conducted an exploratory study. Using Google Forms, we developed a survey to learn how people currently use Siri in everyday life. We chose a survey to gather feedback from a broad range of Siri users, and its findings would inform the design of our usability test.

The survey covered three focus areas, chosen because they were the aspects of Siri we most wanted to investigate and expected to yield valuable information about the product: Siri Usage, Language & Culture, and Perception of Siri.

- When did you first use Siri?
- How often do you use Siri?
- What language(s) do you speak to Siri with?
- On which devices do you use Siri?
- On which device do you use Siri most often?
- How do you primarily activate Siri?
- What do you use Siri for?
- What command(s) do you ask Siri most often?
- How accurate do you consider Siri's transcription? (how well Siri understands what you said)
- How effective do you consider Siri to be when assisting voice commands?
- How conversational do you consider Siri to be?
- How comfortable do you feel using Siri in public?
- How has your perception of Siri's capabilities changed since you first used Siri?
- Are there any features you wish Siri had?

Sample of Survey Questions Asked

🙋🏻‍♀️ Participant Criteria

Participants must have some experience using Siri.

💻 Distribution

Our team distributed this survey in highly trafficked online communities, including subreddits such as r/surveyexchange and r/smarthome and several University of Washington-related Discord and Slack servers. The survey was live from October 20 to 24, 2021, and collected 150 responses.

📊 Analysis Methods

We analyzed qualitative responses through affinity mapping, coding them by the three focus areas, and analyzed quantitative responses through data visualizations.

Survey Results

Here’s what 150 survey responses taught us about Siri.

Siri Usage

Participants used Siri on the following devices: iPhone (70.7%), Apple Watch (12.7%), and HomePod (12.7%).
More participants activated Siri by voice command (57.3%) than by pressing and holding a button (40%).
Many participants used Siri for daily task completion (76.7%), knowledge questions (62%), and making calls or sending texts (52.7%).

Language & Culture

The survey respondents we reached primarily interact with Siri in English (90%), with 10% in other languages, such as Spanish, French and German.
Participants reported that they wished Siri had better language understanding and intention detection:
- “Better understanding of my language especially when searching for songs”
- “Greater support for local languages and dialects; but that seems to be an umbrella problem for Language Tasks in general.”
A few participants expressed interest in Siri understanding more languages, such as Hindi and Portuguese.

Perception of Siri

Participants rated the following areas on a scale of 1-5 (1 being a low/negative rating and 5 being a high/positive rating).

The results are the averages from all responses. A value below 3 indicates a negative perception and a value above 3 indicates a positive perception.
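As a minimal sketch of the averaging rule described above, the Python snippet below computes the mean rating for each area and labels it relative to the scale midpoint of 3. The area names and ratings are hypothetical placeholders, not our survey data, and treating an average of exactly 3 as "neutral" is our own assumption, since the rule above only covers values above or below 3.

```python
from statistics import mean

# Hypothetical example data: each list holds 1-5 ratings for one survey area.
# These numbers are illustrative only; they are NOT the study's survey results.
ratings = {
    "Transcription accuracy": [4, 3, 5, 2, 4],
    "Effectiveness": [3, 3, 4, 2, 3],
    "Conversational": [2, 1, 3, 2, 2],
}

NEUTRAL = 3  # midpoint of the 1-5 scale


def perception(avg: float) -> str:
    """Classify an average rating relative to the neutral midpoint (assumed rule for exactly 3)."""
    if avg > NEUTRAL:
        return "positive"
    if avg < NEUTRAL:
        return "negative"
    return "neutral"


def summarize(data: dict[str, list[int]]) -> dict[str, tuple[float, str]]:
    """Return (average rating, perception label) for each rated area."""
    summary = {}
    for area, scores in data.items():
        avg = mean(scores)
        summary[area] = (avg, perception(avg))
    return summary


for area, (avg, label) in summarize(ratings).items():
    print(f"{area}: {avg:.2f} ({label})")
```

Averaging keeps each area to a single comparable number, though it hides how spread out the individual ratings are.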

Survey Insights & Recommendations

See how the survey results shaped the design of the usability test.

01. Users wish Siri were more conversational.

Siri often cannot continue a conversation after completing a task, and it directs users to a resource instead of talking about it. Users are frustrated when Siri responds with “Here’s what I found on the internet.” instead of speaking the requested information aloud. Users also expressed frustration with multi-task execution: when they have several tasks for Siri, they would rather list them all in one request than state each task one after the other.

Our first insight reveals a desire to speak with Siri in a more natural flow.

Usability Test Recommendation: Study user interactions with multi-task execution. We’ll explore how current users verbally structure multi-task commands and how they expect Siri to respond.

02. Users wish Siri had more app integration.

After sorting through our affinity map, we found that codes related to app integration came up most often (13 of 71 codes). Respondents wished Siri worked with more apps outside Apple’s ecosystem, such as Spotify, Amazon Music, and Uber Eats.

Our second insight shows that Siri’s functionality is limited when users want it to execute tasks in third-party applications.

Usability Test Recommendation: Determine which third-party applications users want Siri to integrate with and how users verbally construct these commands. We’ll ask participants to give a command that requires a third-party application, to observe possible gaps between what Siri delivers and what the user expects from the experience.

03. Users wish Siri had improved transcription capabilities.

Survey results show that respondents who spoke to Siri in a language other than English gave fewer 5s on “How accurate do you consider Siri's transcription?” than those who spoke to Siri in English. Siri currently supports 21 languages, but it has difficulty understanding different accents and dialects within a language. Respondents also wished Siri could learn their speaking patterns and the words they commonly use.

The third insight reveals that better transcription would improve Siri’s overall usability. Users expressed frustration over Siri’s inability to learn from previous verbal interactions.

Usability Test Recommendation: Study how users want Siri to learn from them. We’ll observe how users interact with Siri to correct an error after being misunderstood.


03. Usability Testing

My team and I conducted eight remote usability test sessions over Zoom between November 15 and 24, 2021. We used the think-aloud protocol and held pre- and post-test interviews. In each session, participants were asked to complete six tasks using Siri, which we designed with our survey insights in mind.

Participants were recruited through UserTesting and encouraged to join from a quiet, undisturbed location where they could speak freely and keep their camera on. During each session, participants shared their iPhone screen to the Zoom call so our team could observe Siri’s responses. With each participant’s consent, all sessions were recorded, stored in UserTesting, and referenced for note-taking.

Each test session had a moderator, two notetakers, and technical support. I acted as technical support for all sessions and took notes when technical assistance wasn’t needed.

For more details, see our test kit, which includes a screening questionnaire, reminder email, consent form, moderated test script, pre-test questionnaire, post-test questionnaire, interview question debrief, and note-taking template.

Participants were guided with the following presentation during the usability test.

04. Analysis & Findings

While designing the usability test, we coded each task by three main areas of focus: Multitasking, Continued Conversation, and Transcription. These codes shaped how we analyzed our qualitative and quantitative data after the eight usability test sessions.

Qualitative Methods

💬 Dialogue Examination

We examined the dialogue between Siri and the participant. Using sticky notes, we mapped each interaction by what the participant asked and linked it to Siri’s response.

This allowed us to group common phrases and see how Siri’s responses differed. Different participants worded the same commands slightly differently, and Siri relayed the same information in slightly different ways.

📝 Affinity Mapping

We analyzed the post-test interview transcripts through affinity mapping. We put key phrases onto sticky notes on a Miro board and grouped them into similar categories. This affinity map helped us capture what users thought of Siri.

Qualitative Findings

Users face difficulties speaking with Siri.
- Siri often cut participants off mid-sentence or stopped listening.
Users avoid prompting Siri with multitask questions.
- Users expect that Siri won’t be able to handle them successfully.
Users are unaware of the extent of Siri’s abilities.
- Many participants stated that they didn’t know Siri could do things like send emails or check their calendar.

Quantitative Methods

During the usability test, our notetakers timed how long it took each participant to complete each task. The following images illustrate our analysis of this quantitative data.

Quantitative Findings

Siri performs considerably worse on tasks coded as writing and transcription.
- About 20% worse on average than continued conversation tasks or multitasking tasks
The longer a task took, the less likely it was to be completed.
- Users spent the most time on writing and transcription tasks, averaging 27.90 seconds per task.
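For readers curious how this kind of comparison can be tallied, here is a minimal Python sketch, assuming each timed attempt is recorded with its focus-area code, its duration, and whether it was completed. The `Attempt` record format and every number in it are hypothetical; the sketch only computes per-code completion rates and average times, plus average time for completed versus incomplete attempts, and does not reproduce our actual analysis.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    """One participant's attempt at one task (hypothetical record format)."""
    code: str        # focus-area code assigned to the task
    seconds: float   # time on task, as timed by a notetaker
    completed: bool  # whether the participant completed the task


# Illustrative, made-up attempts; these are NOT the study's measurements.
attempts = [
    Attempt("Transcription", 35.0, False),
    Attempt("Transcription", 20.0, True),
    Attempt("Multitasking", 12.0, True),
    Attempt("Multitasking", 18.0, True),
    Attempt("Continued Conversation", 10.0, True),
    Attempt("Continued Conversation", 26.0, False),
]


def summarize_by_code(rows):
    """Return {code: (completion_rate, mean_seconds)} for each focus-area code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.code].append(row)
    return {
        code: (
            sum(r.completed for r in group) / len(group),  # share of attempts completed
            mean(r.seconds for r in group),                # average time on task
        )
        for code, group in groups.items()
    }


def mean_time_by_outcome(rows):
    """Compare average time on completed vs. incomplete attempts."""
    done = [r.seconds for r in rows if r.completed]
    not_done = [r.seconds for r in rows if not r.completed]
    return mean(done), mean(not_done)


for code, (rate, secs) in summarize_by_code(attempts).items():
    print(f"{code}: {rate:.0%} completed, {secs:.2f} sec/task on average")
print(mean_time_by_outcome(attempts))
```

Grouping by the task codes mirrors how the tasks were coded during test design; with only eight sessions, differences like these are descriptive rather than statistically tested.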

05. Recommendations

Based on these findings, our team makes the following design recommendations to improve Siri’s usability:

01.

Improve the Siri onboarding experience to showcase the extent of Siri’s capabilities.

After the study, users said they would use Siri more now that they realized it was more capable than they had thought.

Several participants mentioned that they hadn’t realized Siri could execute tasks like sending emails or text messages. They were pleased that Siri could do this for them and said they would start using Siri more in the future. If we increase users’ understanding of what Siri can do, we expect Siri usage to increase.

02.

Strengthen Siri and third-party application compatibility.

Siri defaults to executing tasks in applications developed by Apple. Users who prefer or exclusively use third-party apps for certain tasks (such as email) do not use Siri for those tasks.

Even when participants specified a third-party application, as in “Hey Siri, send an email with Gmail,” Siri still opened the Mail app. Several participants said they don’t use Apple’s apps for these tasks and use Outlook or Gmail instead. Because Siri isn’t compatible with those applications, these users can’t complete certain everyday tasks with Siri. Making those applications Siri-compatible would give users a more powerful experience.

03.

Have Siri ask users if they would like to set default applications for everyday tasks.

When users ask to schedule an event, set a reminder, or send an email, Siri should ask which application they prefer for that task and remember the preference.

04.

Shorten Siri’s response time after the user says “Hey Siri,” and give users more control over when Siri stops listening.

Users often repeat themselves because Siri either begins listening too late after activation or cuts them off before they finish speaking.

If Siri responded faster and better detected when the user has stopped talking, users would experience less frustration when interacting with Siri, and overall efficiency would improve.

Our final Presentation Slides can be viewed here.

For the full Usability Test Results Report, please contact me directly at annielili.liu@gmail.com.

Reflection

Conducting Usability Tests: The Importance of Planning & Data Analysis

Looking back, I believe planning is the most important stage of a usability study. Planning is where we make sure the test is robust and covers everything we want to examine; once all the test sessions are complete, it’s too late to test something we overlooked. I also wish our team had planned more for how we would analyze the data. Data analysis can be challenging, so knowing what data we’re collecting, why we’re collecting it, and how we’ll derive meaning from it is far more useful before collection than after. Data analysis is more than just affinity mapping.

My Contributions to This Project

As a team member, I worked to keep our workflow organized. I formatted and made final edits to all of our team’s plans and reports, created supplemental note-taking materials for usability testing, and designed all of our presentations (including the slides used during the usability test). My familiarity with both macOS and Windows allowed me to act as technical support during test sessions. I also had a large impact on our exploratory study: my individual efforts to distribute the survey brought in a significant majority of the 150 respondents.

…

Overall, I enjoyed this study and look forward to conducting more usability tests in the future! 😄
