Audio and Video Accessibility: Captions, Transcripts and More

Introduction

  • As part of the Aims Digital Accessibility Initiative, we adhere to the Web Content Accessibility Guidelines (WCAG) 2.1, Level AA, as required by Title II of the Americans with Disabilities Act and Colorado House Bill 21-1110, with compliance deadlines in April 2026. 
    • Following these guidelines ensures best practices for many types of content, including audio and video, and improves the user experience for everyone who uses Aims D2L courses. 
  • For audio and video accessibility specifically, captions, transcripts, and in some cases audio descriptions help users process audio or video information at their own pace: 
    • They ensure that everyone, including people with a wide range of disabilities, can perceive, understand, and interact with this digital content.
  • Captions and transcripts can determine whether or not someone is able to consume the audio or video content you upload or embed within a D2L course. 
    • This applies both to content you create and to third-party content you source, such as YouTube videos. 

Summary

  • Captions and transcripts are core methods for achieving media accessibility.
  • Providing them is a requirement of the Web Content Accessibility Guidelines (WCAG 2.1, Level AA).
  • Prerecorded audio-only content (e.g., podcasts) must be accompanied by a transcript. 
  • Prerecorded videos with audio must include accurate closed captions and, if key visual information is not spoken, audio descriptions must also be provided.
  • Both live audio and video streaming events require live captions.
  • AI tools can be used to generate captions, transcripts, and audio descriptions, but faculty need to review them for accuracy, spelling, and proper context.
  • If a video is hosted on the YuJa Enterprise Video platform and embedded into D2L, Panorama checks for the presence of captions.
  • Media players must be fully operable by keyboard, not just with a mouse, and must support display of captions, transcripts, and audio descriptions.
  • Avoid video content that flashes rapidly; it can trigger seizures in users with photosensitive epilepsy.
  • Avoid auto-playing animated GIFs. Let the user choose to play or pause the content.
  • Treat an interactive simulation that has an audio track like a video: captions must be provided to help people with hearing impairments interpret the content and context of the simulation.
  • For an interactive simulation that has no audio track, provide a text version of what occurs within the simulation and what data is shared - adjacent to the simulation.

Benefits of audio and video accessibility for people with disabilities

  • People who are hard of hearing or deaf: 
    Benefit from captions and transcripts to understand spoken content.
  • People who have low vision or blindness: 
    Benefit from audio descriptions (AD) in videos to hear key visual information, and from transcripts that screen readers can read aloud to convey the dialogue or narration in audio files.
  • People who have cognitive difficulties like ADD, dyslexia, ADHD: 
    Benefit from transcripts for review and comprehension of information at their own speed.
  • People with mobility or sensory-motor difficulties who can’t use a mouse: 
    Benefit when media player buttons can be navigated with a keyboard, so they can access media and control when to play or pause.

A note for everyone

  • Captions allow viewing of videos in different environments like a busy restaurant or a quiet library reading room.
  • Transcripts can also help people learning languages to understand spoken language and learn vocabulary.

Audio accessibility

High-quality recorded audio and clear speech benefit everyone.

Two types of audio files

  • Prerecorded audio-only files, such as streamed podcasts or any other type of audio file in D2L.
    • This applies to podcasts created at Aims and those from third-party sources.
    • If the podcast player has functionality to add a transcript, provide one there.
    • If the player does not support transcripts, generate a transcript for the podcast, check it for accuracy, upload it into D2L, and place it adjacent to the link to the podcast player.
  • Live audio-only streaming
    • Provide live captions for all live audio-only streaming content.
    • Once the live stream has ended, prepare the recording for posting online with a transcript, as described above.

Video accessibility

High-quality recorded video with audio benefits everyone. 

Two types of video

  • Prerecorded video with audio
    • A closed caption file (WebVTT or SRT format) must be provided along with the video. 
      • Videos can be uploaded to the YuJa Enterprise Video platform, where they are auto-captioned, and you can link to them within D2L.
    • If the captions do not describe key visual information, audio descriptions are also needed. 
      • Write these to describe that key visual information and add them manually via the YuJa video platform. 
  • Live video with audio streaming
    • Provide live captions for live video streaming events, typically via CART (Communication Access Realtime Translation) services.
    • A sign-language interpreter can also be considered.
    • After the event, the recorded video can be uploaded to YuJa Enterprise Video platform. 
      • The video will be auto-captioned. Check for accuracy.
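For reference, a caption file is plain text with timed cues. Below is a minimal WebVTT sketch (the speaker names and dialogue are invented for illustration) showing timing, speaker identification, and a non-speech sound:

```
WEBVTT

1
00:00:00.000 --> 00:00:03.500
[upbeat intro music]

2
00:00:03.500 --> 00:00:07.000
DR. SMITH: Welcome to the Allied Health tour.

3
00:00:07.000 --> 00:00:10.000
STUDENT: Where do we start?
```

Each cue pairs a start and end time with the text displayed during that interval; SRT files follow a very similar structure.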

About captions and transcripts

Why are these important?

  • Vital for people with hearing difficulties or deafness. They provide access to dialogue and explain non-speech sounds like music and laughter that occur in the media. 
  • Also help identify speakers, overcome audio challenges like background noise or distinguishing people’s accents, and help to ensure overall access to media content. 

Captions

  • There are two main types:
    • Closed Captions (CC) can be turned on or off in a media player when provided with the video.
    • Open Captions (OC) are permanently embedded into the video.
  • Captions must be accurate and properly synchronized with the video and audio.
    • Inaccurate captions can lead to miscomprehension of video content, especially with complex or specialized terminology like math and science equations, formulas, etc.
  • Should include all dialogue, crucial sounds, and meaningful non-speech information like music.
  • Captions need to clearly identify a speaker, especially when there are multiple speakers.
  • Help to improve comprehension for individuals with cognitive impairments, and they can enhance focus, boost retention, and support diverse learning styles.
  • Captions are different from subtitles: 

Subtitles are used for language translation; they assume the user can hear the audio but may not understand the language, so they typically translate only the dialogue.

More about captions

  • If a video is hosted on the YuJa Enterprise Video platform and embedded into D2L, then Panorama in D2L checks for the presence of captions.
  • When you upload a video to the YuJa video platform, it is automatically captioned. The first screenshot below shows the Media Details > Accessibility page. The second screenshot shows the Video Editor being used to edit captions for accuracy, spelling errors, and context.
Media Details auto-captioning set to default.

Video Editor to edit captions.

If a video has no captions

  • Generate a transcript and post it in D2L or on a media server, linking to it next to the video link in D2L. Use a program like NoteGPT or another AI tool to generate the transcript. 
  • To use NoteGPT with YouTube, install its Google Chrome browser extension: https://notegpt.io/

Example of a YouTube video without captions: the content provider did not supply them.

Transcripts

  • Transcripts are text versions of media content (dialogue, actions).
  • Provide a complete written record of the content, allowing for self-paced reading, review, and easy searching for keywords or specific sections of text.

Two types of transcripts 

  • Static transcripts (HTML or text files): Include all necessary speech and non-speech audio information.
  • Interactive transcripts include the same information and work within an HTML5 media player. 
    The player highlights words as they are spoken and allows users to click text to jump to that point in the media.
  • If a media player has interactive transcript functionality, enable that feature so transcripts are available.
  • The YuJa video platform generates transcripts at the same time it auto-captions a video, so you can use these as well as the captions.
  • For audio-only content (like a podcast) a transcript is the primary accessibility requirement.
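Because a caption file already contains all of the spoken text, a static transcript can be derived from one. Here is a minimal sketch in Python, assuming a simple WebVTT file with no styling or voice tags (the function name is illustrative, not part of any platform's API):

```python
def vtt_to_transcript(vtt_text: str) -> str:
    """Strip the WEBVTT header, numeric cue identifiers, and timing
    lines, keeping only the caption text as a plain transcript."""
    kept = []
    for line in vtt_text.splitlines():
        s = line.strip()
        # Skip blank lines, the file header, cue numbers like "1",
        # and timing lines like "00:00:00.000 --> 00:00:02.000".
        if not s or s == "WEBVTT" or s.isdigit() or "-->" in s:
            continue
        kept.append(s)
    return "\n".join(kept)
```

The result still needs the same human review as the captions themselves: accuracy, spelling, and speaker context.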

Aims video example: Let's Tour Allied Health and Sciences! Transcript file. Note that the example transcript contains errors, which shows why transcripts must always be checked for accuracy.

Let's Tour Allied Health Sciences! video example with a transcript

AI tools and audio/video accessibility

  • Generative AI tools, like Google Gemini, ChatGPT and others, can be used by students and instructors. Provide prompts to have the tool automatically analyze a transcript and list summaries and key takeaways, helping users to process and review content more efficiently.
  • Also when given prompts, these tools can analyze video content, identify key visual elements, and automatically generate audio descriptions.

AI-generated captions and transcripts can pose accessibility challenges

  • These text files need human review to fine-tune the descriptions and ensure accuracy, correct spelling, proper context, and that nuances in emotional tone are explained.
  • Also check the timing of captions to make sure they are not out of sync with the audio and video.

Audio descriptions

  • Audio description (AD) is a secondary audio track that narrates key visual information for people who are blind or who have low vision.
  • It narrates essential visual information not conveyed through the captioned dialogue (e.g., scenes, actions, settings), ensuring full comprehension.
  • After writing or generating an audio description, set it up manually in the YuJa video platform, which can automatically pause the video to insert a description of a complex scene with key visual information.

Media details audio description

Accessible media players

  • All media player buttons and controls must be labeled and accessible by keyboard so that they are screen reader compatible and users do not face a barrier to access. 
    • Many people cannot use a mouse or choose not to, relying entirely on keyboard navigation.
    • Users must be able to adjust playback speed and the appearance of captions (size, color).
  • A media player needs to function reliably across different browsers and devices.
  • Avoid using autoplay for audio or video content. Give users full control to play/pause media.
  • The player should be able to display captions and interactive transcripts, and play audio descriptions when needed and available.
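As a sketch of what caption and description support looks like in markup, the native HTML5 video element accepts timed text tracks via the standard track element; the file names below are placeholders:

```
<!-- Native controls are keyboard operable in modern browsers. -->
<video controls preload="metadata" width="640">
  <source src="tour.mp4" type="video/mp4" />
  <!-- Closed captions the viewer can toggle on or off -->
  <track kind="captions" src="tour.vtt" srclang="en" label="English captions" default />
  <!-- Text descriptions of key visual information -->
  <track kind="descriptions" src="tour-descriptions.vtt" srclang="en" label="Descriptions" />
</video>
```

Platforms like YuJa and players like Able Player manage this wiring for you, but the underlying mechanism is the same: the caption file ships alongside the video, not burned into it.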

Example of Able Player, an accessible media player with captions, interactive transcripts, and AD functionality.

Able player media player with interactive transcript and audio descriptions

Flashing content in video

  • Avoid video content that flashes too quickly (more than three times per second); it can trigger seizures in users with photosensitive epilepsy. Avoiding flashing content entirely is best.
  • Photosensitive epilepsy is a type of epilepsy where seizures are triggered by flashing lights, patterns, or other visual stimuli.
  • When creating video content, avoid any rapid changes in brightness or color that might be perceived as a flash.
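The three-flashes-per-second rule can be roughly checked in code. The sketch below is not the full WCAG 2.3.1 algorithm (which also accounts for the flash area and saturated-red flashes); it is a simplified heuristic over per-frame relative luminance values, with an invented function name:

```python
def estimate_flash_rate(luminances, fps):
    """Rough flashes-per-second estimate from per-frame relative
    luminance values in [0, 1]. Simplified heuristic only: the full
    WCAG 2.3.1 test also considers flash area and saturated red."""
    if len(luminances) < 2 or fps <= 0:
        return 0.0
    # Count frame-to-frame luminance jumps of 10% or more.
    transitions = sum(
        1 for a, b in zip(luminances, luminances[1:]) if abs(b - a) >= 0.1
    )
    duration_seconds = (len(luminances) - 1) / fps
    return transitions / duration_seconds
```

A rate above roughly three suggests the clip needs editing; a steady image scores zero.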

Animated GIFs

  • Avoid auto-playing animated gifs. Let the user choose to play/pause.
  • The GIF must not contain anything that flashes more than three times in any one-second period.
  • Descriptive GIFs: If the GIF is essential to understanding the content (e.g., showing a step-by-step process), the alt text must describe every meaningful frame or action.
  • Illustrative GIFs (Memes): If the GIF is purely for illustrative or emotional effect, the alt text should convey that emotion.

Interactive Simulations

  • For an interactive simulation that has an audio track - treat this like a video: captions must be provided to help people with hearing impairments interpret the content and context of the simulation.
  • If a user can't use a mouse, and uses a keyboard to navigate, they must be able to navigate using the Tab, Enter, and Arrow keys.
    • When a keyboard user tabs to buttons or links, focus indicators (a visible highlight) should display on any button or link within the simulation.
  • Provide "Help" instructions that explain how to navigate the simulation with a keyboard.
  • Avoid time limits, or allow the user to pause/extend the time.
  • For an interactive simulation that has no audio track, provide a text version of what occurs within the simulation and what data is shared - adjacent to the simulation.
    • Example: "In this simulation, the user can drag a magnet toward a coil. As the magnet enters the coil, the voltmeter needle moves to the right, indicating a positive current."

Conclusion

  • Captions and transcripts are core methods for achieving media accessibility.
  • Providing them is a requirement of the Web Content Accessibility Guidelines (WCAG 2.1, Level AA).
  • Prerecorded audio-only content (e.g., podcasts) must be accompanied by a transcript. 
  • Prerecorded videos with audio must include accurate closed captions and, if key visual information is not spoken, audio descriptions must also be provided.
  • Both live audio and video streaming events require live captions.
  • AI tools can be used to generate captions, transcripts, and audio descriptions, but faculty need to review them for accuracy, spelling, and proper context.
  • If a video is hosted on the YuJa Enterprise Video platform and embedded into D2L, Panorama checks for the presence of captions.
  • Media players must be fully operable by keyboard, not just with a mouse, and must support display of captions, transcripts, and audio descriptions.
  • Avoid video content that flashes rapidly; it can trigger seizures in users with photosensitive epilepsy.
  • Avoid auto-playing animated GIFs. Let the user choose to play or pause the content.
  • Treat an interactive simulation that has an audio track like a video: captions must be provided to help people with hearing impairments interpret the content and context of the simulation.
  • For an interactive simulation that has no audio track, provide a text version of what occurs within the simulation and what data is shared - adjacent to the simulation.