Under the consent decree between UC Berkeley and the Department of Justice (DOJ), video and audio content on UC Berkeley’s website and subdomains must conform to WCAG 2.0 Level AA. This includes providing transcripts, closed and live captions, audio description, alternative text, resizable text, and more.
Despite these requirements, discussions with the campus Web Accessibility Evangelist uncovered a significant gap in existing transcription solutions, one that contributed to the removal of more than 20,000 classroom lecture videos and podcasts from public access.
With Whisper, OpenAI’s open-source automatic speech recognition (ASR) system, there is an opportunity to improve transcription accuracy on the backend.
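As a rough illustration of how such a backend could work, here is a minimal sketch using the open-source openai-whisper Python package. The model size and file name are illustrative placeholders, not a production configuration.

```python
# Minimal sketch: transcribing a lecture recording with the open-source
# openai-whisper package (pip install openai-whisper).
import whisper

model = whisper.load_model("medium.en")   # larger models trade speed for accuracy
result = model.transcribe("lecture.mp4")  # illustrative file name

print(result["text"])  # full plain-text transcript
for seg in result["segments"]:  # timestamped segments, usable for captions
    print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")
```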
User Research: Conduct comprehensive user research to understand the pain points of existing transcription tools and identify core features aligned with Berkeley campus needs and priorities
MVP Design & Functionality Assessment: Design a testable prototype representing the minimum viable product (MVP) of the transcription tool and assess its potential utility through usability testing
Conducted 9 user interviews to uncover major pain points associated with existing transcription tools
Utilized insights gathered from user interviews to identify, prioritize, and prototype 4 core functionalities essential for addressing user needs
Demonstrated the viability of a home-grown transcription tool through 9 usability testing sessions
Utilized insights gathered from usability testing to inform the phased development of the transcription tool, guiding feature prioritization and refinement in alignment with user needs and priorities.
Understand the primary goals and tasks users aim to accomplish with current transcription tools
Understand user preferences and workflow patterns when using current transcription tools
Identify valuable features, pain points, and missing functionalities of current transcription tools
Director of Academic Technology
Publication Management System Coordinator
Business Analyst for Accessibility Remediation
Education Technology Specialist
Accessible Technology Coordinator, among others
The perceived accuracy of automated transcripts varied significantly across existing tools. Common error-inducing factors include technical terminology, proper nouns, acronyms, homophones, multilingual content, and speaker accents
Current transcription tools fall short in maintaining accurate punctuation and capitalization. Some tools omit punctuation and capitalization entirely, while others include only basic punctuation such as commas and periods, hindering readability
Speaker identification capabilities varied among tools. Some lacked them entirely, necessitating manual differentiation and input, while others seemed better at distinguishing lower-pitched voices
Users found editing transcripts cumbersome and time-consuming. Common issues include the absence of built-in editing capabilities, insufficient formatting options, inadequate error handling, limited keyboard shortcuts, and limited collaboration features
Drawing from user research insights, I led the ideation of essential features to address each of the four major pain points for MVP development. Through collaborative sessions with our engineering team, each proposed feature was carefully evaluated for feasibility and potential impact. Using a prioritization matrix, we ensured the MVP focused on critical features, aligning with user needs and optimizing resource allocation.
With the primary objective to design a minimum viable product (MVP), I focused on conceiving two fundamental user flows critical to our transcription tool’s core functionality: (1) uploading audio and video files for transcription and (2) editing and downloading generated transcripts. Then, I proceeded to integrate the proposed features into the user flows.
Maintaining a balance between feature richness and MVP integrity was critical throughout this process. For instance, recognizing the high implementation effort required for a fully integrated editing solution, I explored leveraging Google Docs as an alternative platform. This approach aimed to deliver a seamless editing experience while upholding the MVP concept.
I translated the user flow into an interactive Figma prototype and conducted 9 usability sessions with the same participants previously interviewed. This iterative process aimed to evaluate the effectiveness of the proposed features in addressing user pain points, ensuring that the design enhancements aligned with user needs and expectations.
The central challenge throughout the iteration process again lay in finding the optimal balance between an MVP that was technically feasible and one that delivered enough value to users to foster adoption of the product.
Extended Workflow: Transcription setting options were useful, but the overall workflow was longer than what users were accustomed to with current tools
Confusion Over Options: Users wanted to be able to visually explore and test these different options
Redesigned Workflow: Integrate some of the transcription setting options directly into the editing workflow. Users can conveniently test and adjust these settings while reviewing their transcripts
Less technical users frequently felt uncertain when selecting the most suitable file format for their needs during the export phase of the transcription process
Informative Prompt: Include help text for each format option, clarifying the purpose and compatibility of each format to assist users in making informed choices
Utilize custom vocabulary to define specialized terms for enhanced accuracy (see the sketch after this list)
Pre-define transcription settings, such as the number of speakers, to enhance speaker recognition
Easily assign speaker labels for seamless speaker identification
Explore additional transcription settings for enhanced customization
Choose from diverse export options to suit various transcription use cases
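Whisper does not ship a dedicated custom-vocabulary API, so one plausible approximation, sketched below, is to seed the decoder with domain terms through the initial_prompt parameter of transcribe(). The glossary terms here are hypothetical; true speaker identification would additionally require a separate diarization library, which Whisper alone does not provide.

```python
import whisper

# Hypothetical campus glossary; the terms below are illustrative only.
CUSTOM_VOCAB = ["bCourses", "CalNet", "EECS", "GSI", "RRR week"]

model = whisper.load_model("medium.en")
result = model.transcribe(
    "lecture.mp4",
    # Seeding the decoder with domain terms biases it toward the
    # intended spellings of names, acronyms, and jargon.
    initial_prompt="UC Berkeley lecture. Glossary: " + ", ".join(CUSTOM_VOCAB),
)
```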
“The tool seems very user-friendly and straightforward. It already includes the two features I would ask for, which is custom vocabulary and manual speaker identification, so it’s awesome.”
Web Accessibility Coordinator, UCOP
“The design is clean, succinct, and easy to understand. The UI is really good. If I am to create a transcription, this tool will be great!”
Business Analyst for Accessibility Remediation, UC Berkeley
Following idea validation, we decided to opt for phased releases. In this initial phase, our focus is on empowering users to export auto-generated transcripts and captions in over 10 formats. Accompanied by informative prompts suggesting the best export format for each use case, this feature aims to serve diverse transcription needs across campus, enhancing accessibility and usability.
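As one concrete example of an export path, the sketch below converts Whisper’s timestamped segments into SubRip (.srt), one of the caption formats in scope. The openai-whisper package also bundles writers for several formats (whisper.utils.get_writer), but a hand-rolled converter keeps the example self-contained.

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:02,345."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments ({'start', 'end', 'text'}) as SubRip."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Usage with the result of model.transcribe(...):
# open("lecture.srt", "w", encoding="utf-8").write(segments_to_srt(result["segments"]))
```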
The absence of a well-defined design brief required me to take the initiative to establish conversations with my supervisor and the Web Access Team to collaboratively define and refine the project’s scope. It also made me recognize the importance of self-directed learning to fill in the gaps: conducting independent secondary research throughout the project allowed me to better understand the problem space.