In this project I built a semi-automated workflow to download podcasts, create transcripts, and generate insights from them. The insights can be emailed or distributed in any other way.

## Transcribing Podcasts (or any audio file!) for free

Nvidia published a multilingual speech-to-text model called [Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3), and it's open source. It was originally built for Nvidia's own CUDA architecture (which means you would need an Nvidia GPU to use it), but thanks to the [parakeet-mlx](https://github.com/senstella/parakeet-mlx) project, it also works perfectly on Apple's M-series chips on a Mac. It runs so well that you can quickly transcribe any audio file with a high level of accuracy.

Compared to the transcription APIs offered by various AI vendors, Parakeet is free - you only pay for electricity (well...and the Mac).

I used Claude Code to create a little Python script that takes any audio file, transcribes it using parakeet-mlx, and outputs a markdown file (a minimal sketch of the core call is included at the end of this post). This script can serve as the foundation for other, more complex use cases, like a local transcription web app (such as my own [EchoScribe](https://github.com/michaeldiestelberg/EchoScribe)) or generating summaries and insights with AI. The script is [available on GitHub](https://github.com/michaeldiestelberg/local-podcast-transcription).

![[local-audio-transcription.png]]

### Using the script in Raycast

Since I'm a big fan of Raycast, I also created a copy of the script with the required configuration to serve as a script command in Raycast. This lets you start a transcription from anywhere just by typing "transcribe" and providing the path to the file you want to transcribe. More detailed setup instructions can be found in the README file inside the GitHub repo (a skeleton of the script command is also shown at the end of this post).

![[podcast transcription raycast.png]]

## Extracting Insights from Podcasts

I'm using various AI models to extract insights from the transcribed podcasts. Instead of listening to those podcasts for many hours, I get a summary and a list of insights that I can read in a few minutes.

I was hoping to use local AI models for this job, but even on a modern M4 Pro chip from Apple, those models struggle with large amounts of content due to limited memory. It's not that a local model like Qwen3 couldn't do the job at all - the insights generated by a large model like GPT-5 or Claude Opus were simply much better (even when using smaller variants like GPT-5 mini).

Since my ultimate goal is to automate the entire process from podcast transcription to insight generation, I built a little AI CLI tool that lets me process the podcast transcripts (or anything else, really) on the command line. I published the project on GitHub for everyone to use or extend: [AI CLI on GitHub](https://github.com/michaeldiestelberg/ai-cli).

![[ai-cli.png]]
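For the curious, the core of the transcription script boils down to just a few lines. Here's a minimal sketch, assuming the `from_pretrained`/`transcribe` API documented in the parakeet-mlx README; the model identifier and the markdown output are illustrative, and the actual script in my repo does more:

```python
# Minimal transcription sketch. Assumptions: the from_pretrained/transcribe
# API as documented in the parakeet-mlx README; the model ID may differ.
from pathlib import Path

from parakeet_mlx import from_pretrained

audio_path = Path("episode.mp3")

# Load the model (weights are fetched from Hugging Face on first use).
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Transcribe the audio and write the result next to it as markdown.
result = model.transcribe(str(audio_path))
out_path = audio_path.with_suffix(".md")
out_path.write_text(f"# Transcript: {audio_path.name}\n\n{result.text}\n")
print(f"Saved transcript to {out_path}")
```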
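The Raycast version is essentially the same script with a metadata header that Raycast parses to register it as a script command. A Python skeleton might start like this - the header fields follow Raycast's script command convention, while the body is a stand-in for the actual transcription code:

```python
#!/usr/bin/env python3

# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title Transcribe
# @raycast.mode fullOutput

# Optional parameters:
# @raycast.icon 🎙️
# @raycast.argument1 { "type": "text", "placeholder": "Path to audio file" }

import sys
from pathlib import Path

# Raycast passes the typed argument as argv[1].
audio = Path(sys.argv[1]).expanduser()
if not audio.exists():
    sys.exit(f"File not found: {audio}")

print(f"Transcribing {audio.name} ...")
# ... run the same transcription code as the standalone script here ...
```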
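And to illustrate where the automation is headed, here is a hypothetical glue script chaining the two steps. The script name, CLI flags, and prompt below are placeholders, not the actual interfaces of my repos:

```python
# Hypothetical glue code for the end-to-end flow; names and invocations
# are illustrative stand-ins, not the real interfaces of the two tools.
import subprocess
from pathlib import Path

audio = Path("episode.mp3")
transcript = audio.with_suffix(".md")

# Step 1: transcribe locally with the Parakeet script (placeholder name).
subprocess.run(["python", "transcribe.py", str(audio)], check=True)

# Step 2: generate insights with the AI CLI (placeholder invocation),
# feeding the transcript on stdin and capturing the model's answer.
prompt = "Summarize this podcast transcript and list the key insights."
result = subprocess.run(
    ["ai", prompt],
    input=transcript.read_text(),
    capture_output=True,
    text=True,
    check=True,
)

# Step 3: save the insights so they can be emailed or shared.
Path("insights.md").write_text(result.stdout)
```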