Speech-to-Text Transcription Tool Design

Crafting a user-friendly transcription solution to enhance web accessibility

Internship @ Berkeley IT
UX Design
Idea Validation
0 to 1 Web App
MY ROLE
UX Designer
Responsible for defining the design brief, synthesizing research insights, facilitating user interviews, prototyping, and usability testing
TEAM
UX Lead
Web Access Evangelist
Front-end Engineer
2 Back-end Engineers
TIMELINE
Summer 2023 (10 weeks)
TOOLS
Figma
Miro
Adobe CC
background

Inefficient transcription practices were a barrier to UC Berkeley's compliance with its DOJ consent decree on web accessibility

Under the consent decree between UC Berkeley and the Department of Justice (DOJ), video and audio content on UC Berkeley’s website and subdomains must conform to WCAG 2.0 Level AA. This includes providing transcripts, closed and live captions, audio description, alternative text, resizable text, and more.

Despite these requirements, discussions with the Web Access Evangelist uncovered a significant gap in existing transcription solutions, which had led to the removal of over 20,000 videos of classroom lectures and podcasts from public access.

opportunity framing

With Whisper, OpenAI's open-source automatic speech recognition (ASR) system, there is an opportunity to enhance transcription accuracy on the backend.

What if we developed a home-grown transcription tool that combines the high accuracy of outsourced human captioning with the quick turnaround of automated captioning tools to better serve the Berkeley campus?

project scope

Idea Validation

  • User Research: Conduct comprehensive user research to understand the pain points of existing transcription tools and identify core features aligned with Berkeley campus needs and priorities

  • MVP Design & Functionality Assessment: Design a testable prototype representing the minimum viable product (MVP) of the transcription tool and assess the potential utility through usability testing

what i delivered

✨ Defined and prototyped 4 core features

  • Conducted 9 user interviews to uncover major pain points associated with existing transcription tools

  • Utilized insights gathered from user interviews to identify, prioritize, and prototype 4 core functionalities essential for addressing user needs

✨ Established proof of concept & informed product roadmap

  • Demonstrated the viability of a home-grown transcription tool through 9 usability tests

  • Used insights from usability testing to inform the phased development of the transcription tool, guiding feature prioritization and refinement in alignment with user needs and priorities

project outcome

🎉 We are currently in the active design and development phase of the first iteration of the transcription tool!

Following idea validation, we opted for phased releases. In this initial phase, our focus is on letting users export auto-generated transcripts/captions in over 10 formats. This feature, accompanied by informative prompts that suggest the best export format for each use case, aims to serve diverse transcription needs across campus.

here's HOW I TACKLED THE CHALLENGE

01

Discover

UNVEILING USER PERSPECTIVES

Conducting interviews with 9 staff members who were active transcription tool users

Goal 1

Understand the primary goals and tasks users aim to accomplish with current transcription tools

Goal 2

Understand user preferences and workflow patterns when using current transcription tools

Goal 3

Identify valuable features, pain points, and missing functionalities of current transcription tools

9 x 45-min user interviews

  • Director of Academic Technology

  • Publication Management System Coordinator

  • Business Analyst for Accessibility Remediation

  • Education Technology Specialist

  • Accessible Technology Coordinator, and others

key findings

4 major challenges with existing transcription tools

Inaccurate Transcription

The perceived accuracy of automated transcripts varied significantly across existing tools. Common sources of error include technical terminology, proper nouns, acronyms, homophones, multilingual content, and speaker accents.

Poor Punctuation & Formatting

Current transcription tools fall short in maintaining accurate punctuation and capitalization. Some tools omit punctuation and capitalization entirely, while others include only basic punctuation such as commas and periods, which hinders readability.

Difficulty in Identifying Speakers

Speaker identification capabilities varied among tools. Some lacked them entirely, requiring users to differentiate and label speakers manually. Some tools seemed better at distinguishing lower-pitched voices.

Cumbersome Editing Workflow

Users found editing transcripts cumbersome and time-consuming. Common issues with existing tools include the absence of built-in editing capabilities, insufficient formatting, inadequate error handling, limited keyboard shortcuts, and limited collaboration features.

02

Ideate

Ideating features

Ideating and prioritizing core features for MVP development

Drawing from user research insights, I led the ideation of essential features to address each of the four major pain points for MVP development. Through collaborative sessions with our engineering team, each proposed feature was carefully evaluated for feasibility and potential impact. Using a prioritization matrix, we ensured the MVP focused on critical features, aligning with user needs and optimizing resource allocation.

designing key features

Designing 4 key features based on feasibility assessment to address the four major pain points identified in user research

user flow

Crafting essential user flows for the MVP design with strategic feature integration

With the primary objective of designing a minimum viable product (MVP), I focused on two fundamental user flows critical to our transcription tool’s core functionality: (1) uploading audio and video files for transcription and (2) editing and downloading generated transcripts. I then integrated the proposed features into these user flows.

Maintaining a balance between feature richness and MVP integrity was critical throughout this process. For instance, recognizing the high implementation effort required for a fully integrated editing solution, I explored leveraging Google Docs as an alternative platform. This approach aimed to deliver a seamless editing experience while upholding the MVP concept.

03

Refine

usability testing

Validating and iterating the design through 9 usability tests

I translated the user flow into an interactive Figma prototype and conducted 9 usability sessions with the same participants previously interviewed. This iterative process aimed to evaluate the effectiveness of the proposed features in addressing user pain points, ensuring that the design enhancements aligned with user needs and expectations.

The central challenge throughout the iteration process was, again, finding the right balance between keeping the MVP technically feasible and delivering enough valuable features to foster adoption of the product.

Pain point 1: Lengthy transcription setting workflow

Before

  • Extended Workflow: Users found the transcription setting options useful, but the overall workflow was longer than what they were used to with current tools

  • Confusion Over Options: Users wanted to be able to visually explore and test these different options

After

  • Redesigned Workflow: Integrated some of the transcription setting options directly into the editing workflow, so users can test and adjust these settings while reviewing their transcripts

Pain point 2: Uncertainty among less technical users in selecting suitable file formats for export

Before

  • Less technical users were often unsure which file format best suited their needs when exporting a transcript

After

  • Informative Prompt: Added help text for each format option, clarifying its purpose and compatibility to help users make informed choices
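
As a rough illustration of how such per-format help text could be stored and looked up, here is a minimal Python sketch. The format list, the wording of each description, and the names `EXPORT_FORMAT_HELP` and `describe_format` are all assumptions made for this example, not the tool's actual copy or code.

```python
# Hypothetical catalog of export formats and their help text.
# Both the formats and the wording are illustrative assumptions.
EXPORT_FORMAT_HELP = {
    "srt": "Caption file with timestamps; widely accepted by video players and hosting platforms.",
    "vtt": "WebVTT captions for web video players that load caption tracks.",
    "txt": "Plain-text transcript without timestamps; suited to reading or publishing alongside media.",
}


def describe_format(extension: str) -> str:
    """Return the help text for an export format, or a fallback message."""
    # Normalize input such as ".SRT" or "Srt" to the catalog key "srt".
    key = extension.lower().lstrip(".")
    return EXPORT_FORMAT_HELP.get(key, "No guidance available for this format yet.")
```

Keeping the copy in one lookup table means the help text shown next to each export option can be revised after further usability testing without touching the export logic.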

04

Final Deliverables

Transcribe your audio/video in 4 easy steps

  • Utilize custom vocabulary to define specialized terms for enhanced accuracy

  • Pre-define transcription settings, such as the number of speakers, to enhance speaker recognition

Simplifying transcript editing

  • Easily assign speaker labels for seamless speaker identification

  • Explore additional transcription settings for enhanced customization

  • Choose from diverse export options to suit various transcription use cases
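
To make the export step more concrete, the sketch below shows one way speaker-labeled transcript segments could be written out as two common caption formats, SRT and WebVTT. It is an illustration under stated assumptions: the case study does not list the formats the tool supports, and the `Segment` type and the `to_srt` and `to_vtt` helpers are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; times are in seconds."""
    start: float
    end: float
    text: str
    speaker: str = ""  # optional speaker label assigned during editing


def _timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: numbered cues with comma-separated milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        line = f"{seg.speaker}: {seg.text}" if seg.speaker else seg.text
        timing = f"{_timestamp(seg.start, ',')} --> {_timestamp(seg.end, ',')}"
        blocks.append(f"{index}\n{timing}\n{line}")
    return "\n\n".join(blocks) + "\n"


def to_vtt(segments: list[Segment]) -> str:
    """Render segments as WebVTT: a WEBVTT header, dot-separated milliseconds."""
    cues = []
    for seg in segments:
        # WebVTT can carry the speaker label in a voice span: <v Name>text
        line = f"<v {seg.speaker}>{seg.text}" if seg.speaker else seg.text
        timing = f"{_timestamp(seg.start, '.')} --> {_timestamp(seg.end, '.')}"
        cues.append(f"{timing}\n{line}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

For example, `to_srt([Segment(0.0, 1.5, "Hello")])` produces a single cue numbered 1 with the timing line `00:00:00,000 --> 00:00:01,500`. Sharing one segment structure across formats would let manually assigned speaker labels carry through to every export.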

05

Impact

DEFINING SUCCESS THROUGH USER SENTIMENTS

8 out of 9 usability testing participants expressed their readiness to utilize this transcription tool

“The tool seems very user-friendly and straightforward. It already includes the two features I would ask for, which is custom vocabulary and manual speaker identification, so it’s awesome.”

Web Accessibility Coordinator, UCOP

“The design is clean, succinct, and easy to understand. The UI is really good. If I am to create a transcription, this tool will be great!”

Business Analyst for Accessibility Remediation, UC Berkeley

takeaways

Navigating through ambiguity

Without a well-defined design brief, I had to take the initiative to start conversations with my supervisor and the Web Access Team to collaboratively define and refine the project’s scope. The experience also taught me the importance of self-directed learning to fill in gaps: conducting independent secondary research throughout the project helped me better understand the problem space.