Whisper and WhisperX for podcast transcription

I just realized that I hadn’t posted this. Several months ago, I read about whisper.cpp and started playing with it. To quote from their docs, whisper.cpp is a

High-performance inference of OpenAI’s Whisper automatic speech recognition (ASR) model:

  • Plain C/C++ implementation without dependencies
  • Apple Silicon first-class citizen – optimized via ARM NEON, Accelerate framework, Metal and Core ML
https://github.com/ggerganov/whisper.cpp

In other words, a fast and free speech transcription app that runs on your laptop. Damn!

In fact, it’s so efficiently written that you can transcribe on your iOS phone. Or browser. Haven’t tried those yet.

Anyway, that gave me an idea: a couple of friends of mine run a podcast called TGN. They’ve been at it for a few years, and have around 250 episodes of an hour each. Could I use whisper.cpp to produce a complete set of episode transcripts? If I did, would that be useful?

(Insert a few months of side project hacking, nights and weekends.)

It works, pretty well. For podcasts, however, you end up with a wall of text because Whisper doesn’t do what’s called ‘speaker diarization,’ that is, identifying one voice or another. It’s on their roadmap, though.

I was sharing the progress on the TGN slack when an employee of the company OctoML DM’d me. They have a WhisperX image that does diarization, and he offered to help me use it for the project.

(More nights and weekends. Me finding bugs for OctoML. Adding a second podcast. Getting help from a couple of friends, including the a-ha mkdocs idea from David.)

Voila! May I present:

The key useful bits include

  • Full text searching
  • Local mirrored copies of the podcast MP3, raw transcript text and (in progress attempts) to mirror the episode web page.
  • Using mkdocs to build static websites that look OK, and can render Markdown into HTML.

Lots of features yet to build but it’s been a really fun side project. Source code is all on GitHub, right now I’m in the prefect branch, trying out a workflow rewrite using Prefect.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.