Transcriptions for the Progression Project Podcast

sflinux · April 12, 2024, 1:40am

I would like to contribute to the Progression Project Podcast by sharing transcriptions of the podcasts.  
My hope is that Erik won't have to put any effort into the transcriptions, so he can focus on what he does well.
The plan is to use OpenAI Whisper to transcribe them. It isn't perfect, but it does a pretty good job, although it doesn't identify who is speaking.
My original goal was to be able to search the podcasts to make it easier to find things. Now I would like to share the fruits of my labor.
My question to Erik and the forum is: where would be the best place for the transcriptions to live?
Here in this thread?  
Or somewhere else?
I started with the Stacy Peralta episode, but I figure it makes more sense to post them chronologically, so I am starting back at the beginning.
I can have OpenAI Whisper transcribe a few podcasts a day, so it will likely take a few months to catch up.
If others want to help crowdsource, we can do it faster.
Below is what OpenAI Whisper transcribed from the Stacy Peralta interview (I don't know how to hyperlink on this forum):
Ep 132 Stacy Peralta = https://app.box.com/s/8xi3snn4bt4p8oyscxppohncjz0l8yuy
If someone wants to proofread the file generated by OpenAI Whisper and edit it by hand, I can update the original file.
It would be nice if these were living documents, like Wikipedia articles.
The plan is to upload the transcriptions into this folder:
[Progression Project Podcast Transcription Folder] = https://app.box.com/s/wfyu0dut0dingm31gfm4hya4ec6vxicd
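
Since the original goal was to make the podcasts searchable, here is a minimal sketch of a keyword search over transcript segments that returns timestamps you could jump to in a player. It assumes each segment is a dict with "start" and "end" (in seconds) and "text" keys, which is the shape of the entries Whisper's `transcribe()` returns under `result["segments"]`. The function names `search_segments` and `format_timestamp` are hypothetical, not part of Whisper or any existing tool.

```python
import re
from typing import Iterable


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as H:MM:SS for easy seeking in a podcast player."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def search_segments(segments: Iterable[dict], query: str) -> list[str]:
    """Return 'H:MM:SS  text' lines for every segment containing the query.

    Matching is case-insensitive and on whole words only, so a search for
    "board" does not match "skateboarding".
    """
    pattern = re.compile(rf"\b{re.escape(query)}\b", re.IGNORECASE)
    hits = []
    for seg in segments:
        text = seg["text"].strip()  # Whisper segments usually start with a space
        if pattern.search(text):
            hits.append(f"{format_timestamp(seg['start'])}  {text}")
    return hits
```

For example, a made-up segment starting at 3725.2 seconds whose text mentions foiling would come back as `1:02:05  ...` when searching for "foiling".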

Erik · April 12, 2024, 1:51am

I love it! I could start a new forum category, which would be easy to manage. If you could do the back catalog, I can upload transcripts for the new episodes. I’m using an editor now that creates transcripts, so I’ll have them moving forward.

Thanks! E

Matt · April 12, 2024, 9:16am

I think it's worth doing, especially with the way the software is improving.

@sflinux your example would be better if it distinguished speakers; that makes it 10x easier to read. It's really hard to read at the moment.

Erik · April 12, 2024, 12:47pm

The way I’m doing it now, it’s broken up by speaker. I will post full transcripts of future episodes.

sflinux · April 13, 2024, 4:35am

@Matt,
Thanks for the feedback.
I believe the technical term is Speaker Diarization.
The script WhisperNote is what we need:
“A simple Python script to Transcribe audio and perform Speaker Diarization using OpenAI’s Whisper and pyannote.audio”

I will test it out.
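
For anyone curious how a Whisper plus pyannote.audio combination can label speakers, one common approach is to give each transcribed segment the speaker whose diarization turn overlaps it the most. The sketch below shows only that merge step, under assumptions: segments are dicts with "start", "end" and "text" (as in Whisper's `result["segments"]`), and speaker turns are `(start, end, label)` tuples, such as could be collected from a pyannote diarization result via `itertracks(yield_label=True)`. The helper names are hypothetical, and this is not necessarily how WhisperNote itself does it.

```python
from typing import Iterable


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the overlap between two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: Iterable[dict],
                    turns: list[tuple[float, float, str]]) -> list[dict]:
    """Label each transcript segment with the speaker whose turn overlaps it most.

    Segments that overlap no speaker turn at all get the label "UNKNOWN".
    """
    labeled = []
    for seg in segments:
        best_label, best_overlap = "UNKNOWN", 0.0
        for start, end, label in turns:
            ov = overlap(seg["start"], seg["end"], start, end)
            if ov > best_overlap:
                best_label, best_overlap = label, ov
        labeled.append({**seg, "speaker": best_label})  # copy; input is not mutated
    return labeled


def to_dialogue(labeled: list[dict]) -> str:
    """Join consecutive segments from the same speaker into one readable paragraph."""
    lines: list[str] = []
    current = None
    for seg in labeled:
        text = seg["text"].strip()
        if seg["speaker"] == current:
            lines[-1] += " " + text
        else:
            current = seg["speaker"]
            lines.append(f"{current}: {text}")
    return "\n".join(lines)
```

The speaker labels are anonymous, so mapping them to Erik and the guest would still take a human pass.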

Erik · April 13, 2024, 11:42am

Did you guys see that Apple Podcasts now provides transcripts? Pretty cool feature. But I still think it’s a good idea to post them here.

sflinux · April 13, 2024, 3:54pm

That's a cool new feature I wasn't aware of. Thanks for sharing, Erik!

You can search within an episode.
I just wish the “Select All” feature worked on a Mac, so I could easily copy the text and export it.

Erik · April 25, 2024, 3:12pm

Just launched the Transcripts category and posted the last episode.
