Transcriptions for the Progression Project Podcast

I would like to contribute to the Progression Project Podcast by sharing transcriptions of the episodes.  
My hope is that Erik won't have to put any effort into the transcriptions, so he can focus on what he does best.  
The plan is to use OpenAI Whisper for transcription. It isn't perfect, but it does a pretty good job; the main limitation is that it doesn't distinguish who is speaking.  
My original goal was to make the podcasts searchable so it would be easier to find things, and now I'd like to share the fruits of that labor.
My question to Erik and the forum is: where would be the best place for the transcriptions to live?  
Here in this thread?  
Or somewhere else?
I started with the Stacy Peralta episode, but I figure it makes more sense to post them chronologically, so I'm starting back at the beginning.  
I can have OpenAI Whisper transcribe a few podcasts a day, so it will likely take a few months to catch up.  
If others want to help crowdsource this, we can finish faster.  
Below is what OpenAI Whisper transcribed from the Stacy Peralta interview (I don't know how to hyperlink on this forum).  
Ep 132 Stacy Peralta = https://app.box.com/s/8xi3snn4bt4p8oyscxppohncjz0l8yuy
If someone wants to proofread and hand-edit the file generated by OpenAI Whisper, I can update the original OpenAI file.  
It would be nice if these were living documents, like Wikipedia.
The plan is to upload the transcriptions into this folder:
[Progression Project Podcast Transcription Folder] = https://app.box.com/s/wfyu0dut0dingm31gfm4hya4ec6vxicd
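For anyone who wants to help with the back catalog, here is a minimal sketch of one way to turn an episode into a timestamped text file, assuming the open-source `openai-whisper` package (plus ffmpeg) is installed. `whisper.load_model()` and `model.transcribe()` are that package's calls, and `transcribe()` returns a dict whose `"segments"` entries carry `"start"`, `"end"` and `"text"`. The helper names `format_timestamp`, `segments_to_text` and `transcribe_episode`, and the `"base"` model choice, are hypothetical choices for illustration, not necessarily what was used for the Stacy Peralta file.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments into one timestamped line per segment."""
    lines = []
    for seg in segments:
        stamp = format_timestamp(seg["start"])
        # Whisper segment text usually starts with a space, so strip it.
        lines.append(f"[{stamp}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_episode(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one audio file with openai-whisper and format the result."""
    import whisper  # assumption: `pip install openai-whisper` and ffmpeg on PATH

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_text(result["segments"])
```

Usage would be something like `print(transcribe_episode("episode-001.mp3"))`, where the file name is made up. Timestamps make it easy to jump back to the audio when proofreading, and larger models trade speed for accuracy.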



I love it! I could start a new forum category, which would be easy to manage. If you could do the back catalog, I can upload transcripts for the new episodes. I'm now using an editor that creates transcripts, so I'll have them moving forward.

Thanks! E


I think it's worth doing, especially with how quickly the software is improving.

@sflinux, your example would be better if it distinguished speakers; that makes it 10x easier to read. It's really hard to read at the moment.

The way I'm doing it now, it's broken up by speaker. I will post full transcripts of future episodes.


@Matt,
Thanks for the feedback.
I believe the technical term is Speaker Diarization.
The script WhisperNote is what we need:
“A simple Python script to Transcribe audio and perform Speaker Diarization using OpenAI’s Whisper and pyannote.audio”

I will test it out.
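I haven't checked how WhisperNote does it internally, but a common approach is: Whisper supplies timed text segments, pyannote.audio supplies timed speaker turns (for example from the diarization result's `itertracks(yield_label=True)`, whose segments have `.start` and `.end`), and each text segment is labeled with the speaker whose turns overlap it the most. Below is a minimal sketch of just that merge step, assuming both outputs have already been converted to plain `(start, end, text)` and `(start, end, speaker)` tuples. The function names and the `"UNKNOWN"` fallback label are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

# (start_seconds, end_seconds, text) taken from Whisper's result["segments"]
Segment = Tuple[float, float, str]
# (start_seconds, end_seconds, speaker_label) taken from a diarization pipeline
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each text segment with the speaker who talks most during it."""
    labeled = []
    for seg_start, seg_end, text in segments:
        totals = defaultdict(float)
        for turn_start, turn_end, speaker in turns:
            totals[speaker] += overlap(seg_start, seg_end, turn_start, turn_end)
        if totals and max(totals.values()) > 0:
            best = max(totals, key=totals.get)
        else:
            best = "UNKNOWN"  # no diarization turn overlaps this segment
        labeled.append((best, text.strip()))
    return labeled


def merge_consecutive(labeled: List[Tuple[str, str]]) -> str:
    """Join adjacent segments from the same speaker into one paragraph."""
    blocks: List[Tuple[str, List[str]]] = []
    for speaker, text in labeled:
        if blocks and blocks[-1][0] == speaker:
            blocks[-1][1].append(text)
        else:
            blocks.append((speaker, [text]))
    return "\n\n".join(f"{speaker}: {' '.join(texts)}" for speaker, texts in blocks)
```

Merging consecutive segments from the same speaker is what makes the output read like a conversation rather than a wall of short lines. The generic labels (e.g. `SPEAKER_00`) would still need to be swapped for names like Erik and the guest by hand.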

Did you guys see that Apple Podcasts now provides transcripts? Pretty cool feature. But I still think it's a good idea to post them here.

That is a cool new feature I was unaware of; thanks for sharing, Erik.

You can search within the episode.
I do wish the "select all" feature worked on the Mac, so the text could be copied and exported easily.

Just launched the Transcripts category and posted the last episode.
