
I first discovered Whisper, an AI transcription tool, through a Thomas Frank video demonstrating how to automate transcribing and summarizing voice recordings using Dropbox, Pipedream, Whisper, ChatGPT, and Notion. Intrigued by the idea, I decided to build a similar system with Whisper running locally to convert my Apple Voice Memos into text that I can summarize with ChatGPT or just copy into Apple Notes. My process is more manual than his, but that suits the way I work.
To set up Whisper, I researched the required steps and installed the necessary components. Below are the steps I found to be specific to working with Apple Voice Memos:
1. Remove spaces from the M4A filenames. When naming recordings on an iPhone, it is easiest to use spaces, but they caused issues when I processed the files with FFmpeg and Whisper. My approach was to replace the spaces with underscores.
2. Convert the M4A file to MP3. To do this, run FFmpeg from the command line, giving it the path of the M4A file and the name of the MP3 file to create.
3. Run the MP3 file through Whisper to transcribe the audio into text. I've tried both the base and medium models. The medium model produced better punctuation but took significantly longer to run, and the base model is typically good enough, so I use base most of the time.
4. Summarize the transcribed text by copying it into ChatGPT. At the time of writing, I have ChatGPT Plus and access to GPT-4, which I prefer for summarizing because it is the most up-to-date model, but that means this step is manual for now.
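Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration rather than the exact commands I ran: the function names are hypothetical, and it assumes the `ffmpeg` and `whisper` command-line tools are installed and on your PATH.

```python
import subprocess
from pathlib import Path


def underscore_name(path: Path) -> Path:
    """Return the same path with spaces in the filename replaced by underscores."""
    return path.with_name(path.name.replace(" ", "_"))


def ffmpeg_command(m4a: Path) -> list:
    """Build an FFmpeg command that writes an MP3 next to the M4A file."""
    return ["ffmpeg", "-i", str(m4a), str(m4a.with_suffix(".mp3"))]


def whisper_command(mp3: Path, model: str = "base") -> list:
    """Build a Whisper command that writes a .txt transcript next to the MP3."""
    return [
        "whisper", str(mp3),
        "--model", model,            # "base" is faster, "medium" punctuates better
        "--output_format", "txt",
        "--output_dir", str(mp3.parent),
    ]


def transcribe_memo(memo: Path, model: str = "base") -> None:
    """Rename, convert, and transcribe a single voice memo (hypothetical helper)."""
    clean = underscore_name(memo)
    if clean != memo:
        memo.rename(clean)
    subprocess.run(ffmpeg_command(clean), check=True)
    subprocess.run(whisper_command(clean.with_suffix(".mp3"), model), check=True)
```

Calling `transcribe_memo(Path("Team meeting notes.m4a"))` would rename the file to `Team_meeting_notes.m4a`, create the MP3, and leave a text transcript in the same folder. Because the arguments are passed as a list, no shell splits them on spaces, so the rename matters most when you type the commands by hand.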
To further automate the process, I sought to create a script that would allow me to drop my audio files into a folder and run a batch transcription process. I leveraged ChatGPT to write a script to get me started. The script required some troubleshooting and back-and-forth with ChatGPT, but I eventually arrived at a working solution. The [GitHub Repo](https://github.com/your_repo_url) linked here houses the code that automates the following tasks:
* Searches for input files in a specified directory and replaces spaces in their filenames with underscores to prevent issues with Whisper and FFmpeg.
* Runs FFmpeg to create an MP3 version of each M4A file.
* Passes the MP3 file to Whisper AI for transcription.
* Deletes the MP3 file and moves the M4A file to another directory to prevent processing it again.
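The batch flow described in the list above might look roughly like this. It is a sketch under stated assumptions, not the code from the repo: `process_folder`, the `inbox` and `done` folders, and the injectable `run` parameter are all hypothetical names chosen for illustration, and the `ffmpeg` and `whisper` commands are assumed to be installed.

```python
import shutil
import subprocess
from pathlib import Path


def process_folder(inbox: Path, done: Path, model: str = "base", run=subprocess.run) -> list:
    """Transcribe every M4A in `inbox`, then move it to `done`.

    Returns the (underscored) names of the files that were processed.
    """
    done.mkdir(parents=True, exist_ok=True)
    processed = []
    # sorted() materializes the listing first, so renaming inside the loop is safe.
    for memo in sorted(inbox.glob("*.m4a")):
        # 1. Replace spaces in the filename with underscores.
        clean = memo.with_name(memo.name.replace(" ", "_"))
        if clean != memo:
            memo.rename(clean)

        # 2. Convert the M4A to MP3 (-y overwrites a leftover MP3 from a failed run).
        mp3 = clean.with_suffix(".mp3")
        run(["ffmpeg", "-y", "-i", str(clean), str(mp3)], check=True)

        # 3. Transcribe the MP3; the .txt transcript is written into the inbox.
        run(["whisper", str(mp3), "--model", model,
             "--output_format", "txt", "--output_dir", str(inbox)], check=True)

        # 4. Delete the MP3 and move the M4A so it is not processed again.
        mp3.unlink(missing_ok=True)
        shutil.move(str(clean), str(done / clean.name))
        processed.append(clean.name)
    return processed
```

With this shape, `process_folder(Path("inbox"), Path("done"))` handles whatever has been dropped into the folder. Since `check=True` raises on a failed command, a memo whose conversion or transcription fails stays in the inbox for the next run.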
This entire process took only two days, from discovering Whisper to having a fully functional transcription tool at my disposal. Being able to transcribe voice memos without having to listen to them and type them up is incredibly convenient, and it makes Voice Memos practical as a regular part of my workflow. The speed at which I was able to build this tool with the help of ChatGPT shows how much AI tools can accelerate a small project like this.
I don't usually post about code-based projects, but this one has been so useful that I wanted to share it. I hope you find this guide useful and wish you the best in your endeavors!
Links Below
Thomas Frank Video - How I Use AI to take perfect notes...without typing