Behavioral Prototype (HCDE 451 Wizard of Oz)

Context

This was a behavioral prototyping test to explore conversational interaction scenarios within a COVID-19 chatbot scheduling service. As a group of three, we split the roles of facilitator, wizard, and scribe as follows:

  • Rylie Sweem — Facilitator: Directed the testing, communicated with the user, and orchestrated the session. Rylie gathered participants and created an introduction and storyline for the tests.
  • Generous Yeh — Wizard: The “wizard behind the curtain,” manipulating the prototype out of the user’s sight to react to their actions in real time. Generous acted as the chatbot and communicated with participants using responses from our prototype.
  • Sarah Le — Scribe: Someone to capture notes and record the user’s actions. Sarah also communicated with Generous over Slack, discreetly telling him when testing would begin and keeping him in the loop.

Both Sarah and Rylie attended the Zoom meetings to interact with participants, while Generous (a.k.a. the wizard) was on standby until participants began the chatbot process, at which point he took over, using the prototype’s scripted interactions to act as the chatbot. We had two participants total.

Below we discuss how we created our chatbot, how the testing went, and the changes that testing informed.

Prototype

Using LucidChart, we prototyped our “chatbot” that registered the user for a COVID-19 test. Our process began with a flow chart mapping out the ideal scenario in which the user followed every text. Then we accounted for possible errors and accommodations. Below is the initial diagram for the chatbot that we followed during testing (the updated chart is at the bottom):

The users were prompted to text “COVID-19 TEST” to a third team member who acted as the wizard to start the interactive experience with the chatbot. Participants were briefed that our class had provided chatbots for us to test and that no real scheduling would occur; if a test was successfully scheduled, the team would cancel the appointment for the participant. This added to the believability of the prototype.

Evaluation sessions were held on Zoom, where the participant met with the facilitator and the scribe. The wizard was off the call, texting the participant in real time. The team stayed in contact through Slack messages, and after testing, the wizard joined the Zoom call to reveal the magic and wizardry to the participant. Our demo video is below.

A constraint our team encountered was that the user testing had to be conducted virtually, so we were unable to observe the participant’s body language or phone screen throughout the process. However, we asked each participant to think out loud so the scribe could capture their thought processes and reactions, and to record their phone screen so we could capture the interactions they had.

Videos from the wizard’s point of view can also be found here (screens only): https://drive.google.com/drive/u/1/folders/15aLTCE7dlnmKl533Mxp73s24iSiFK2GU

Analysis

What worked well in our Wizard of Oz prototype, as noted by participants, was that it was intuitive to follow and the language was clear. Participants were not bogged down by unnecessary details and were able to schedule a test successfully. Additionally, both participants believed the chatbot was real, which shows that our prototype effectively hid its actual fidelity.

On the flip side, one hiccup from the Wizard of Oz tests was that both participants were confused by the process of selecting a time and date for their tests. These chatbot questions were too open-ended, leaving participants wondering how to respond and delaying the overall process. Users also wondered whether they needed to capitalize their replies to the chatbot, further lengthening the scheduling process as they stopped to think about this a few times; however, they quickly learned that capitalization didn’t matter. One participant also noted after the test that they wanted more information about the insurance process and would have liked some guidance within the chatbot.

The feedback showed a few adjustments we could make to our prototype. In a few of our questions, we asked the user to reply “YES” or “NO” when the user could have skipped this step and answered directly, so we removed “REPLY YES/NO” from the dialogues. By leaving yes-or-no questions more open-ended, we created more pathways depending on the user’s response and mitigated the confusion around capitalization.
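As a hypothetical sketch (not part of our actual prototype, which was driven by a human wizard), a real chatbot could accept open-ended yes/no replies regardless of capitalization or punctuation by normalizing the input before matching it against a set of accepted phrasings; the keyword sets below are illustrative assumptions:

```python
# Illustrative keyword sets; a production chatbot would likely use a larger
# vocabulary or an intent classifier.
YES_WORDS = {"yes", "y", "yeah", "yep", "sure", "ok", "okay"}
NO_WORDS = {"no", "n", "nope", "nah"}

def classify_reply(text: str) -> str:
    """Return 'yes', 'no', or 'unclear' for a free-form user reply."""
    # Normalize so "YES", "yes.", and " Yep! " all match the same keyword.
    normalized = text.strip().lower().rstrip(".!")
    if normalized in YES_WORDS:
        return "yes"
    if normalized in NO_WORDS:
        return "no"
    return "unclear"  # re-prompt the user instead of failing

print(classify_reply("YES"))    # -> yes
print(classify_reply("Nope."))  # -> no
print(classify_reply("maybe"))  # -> unclear
```

Returning an explicit “unclear” state lets the bot re-prompt gracefully, which matches the extra error-handling pathways we added to our flow chart.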

Another spot for improvement was the way we presented information. We were told that splitting up our text could improve readability and reduce confusion. While we did not encounter this problem in our two tests, we saw the merit: when presenting options (e.g., available times), we could put each option on a new line, like a list, for enhanced readability.

Although we tested only two participants, both followed a similar procedure, and the conversation moved smoothly without much error to address. We could not explicitly ask them to try “errors” without risking giving away that something was going on. With further testing, we would expect to discover more error cases to address, extending our flow chart and possible dialogue to include options the user might say “no” to, such as middle-of-the-night time slots. The user testing feedback is summarized in the following lists:

Things about the Wizard of Oz prototype that worked well:

  • Language was clear to participants
  • Participants believed that the prototype was real
  • Participants thought the process was straightforward and the questions were easy to understand

Things about the Wizard of Oz prototype that could be improved:

  • Choosing a date and time was too open-ended and could have had an easier flow
  • Accept user responses consistently regardless of capitalization to eliminate confusion
  • Elaborate on insurance information so the user feels prepared and knows what to do after successfully scheduling an appointment
  • Break big blocks of text into smaller messages to ease the reader’s eyes

Effectiveness of overall design

  • Design was mostly effective, as both participants followed the happy path and successfully “scheduled” a test
  • Participants both said they would use it
  • The introduction explaining how the user testing would go was not included in the video, but we made sure to give detailed instructions and give the participant the opportunity to ask questions before we began the testing

Iterations

We addressed feedback from our user testing and class critique to modify our prototype. Here are a few examples of the things we changed:

  • Breaking up long messages into smaller lines of text
  • Removing some of the suggested responses
  • Adding more error states

Revised chatbot flow chart after testing, analysis, and reflection below!
