Martuka/speech-to-text

speech-to-text

This repo adapts some of the examples Google provides for its Speech-to-Text API.

It also includes a short reminder on how to get set up before using it.


  1. Ensure you have gcloud set up and that you have created and activated a service account. Follow the Quickstart guide to set it up and get your credential key. You will need a credit card. The first hour is free; after that it costs around $1.44 per hour. Keep in mind that the first time you use a Google API, you can get $300 of free credits and one year to spend them. Don't miss this opportunity if you create a Google API project.
    Once you have downloaded your credential key (a .json file), you need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of that file. You could set it in your .bashrc file, but that could cause conflicts if you use several paid Google API services. A simple alternative is to set it directly in your Python script, which is what we do here.
    At the beginning of each script, replace the following line
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'REPLACE_WITH_PATH_TO_CREDENTIALS'

with the right value. For instance, in my case it would be

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/Users/me/Developer/nlp/STT/CCC-media-transcription.json'
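
A slightly more defensive variant of this assignment, as a minimal sketch: it checks that the key file exists before setting the variable, so a wrong path fails immediately rather than at the first API call. The helper name `use_credentials` is hypothetical and is not part of the scripts in this repo.

```python
import os
from pathlib import Path


def use_credentials(key_path: str) -> str:
    """Set GOOGLE_APPLICATION_CREDENTIALS to key_path and return the path used.

    Hypothetical helper, not part of the repo's scripts.
    Raises FileNotFoundError if the key file does not exist.
    """
    path = Path(key_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Credential key not found: {path}")
    # Setting the variable here only affects the current Python process,
    # so other projects using different keys are not disturbed.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

Calling `use_credentials('REPLACE_WITH_PATH_TO_CREDENTIALS')` at the top of a script would then play the same role as the plain assignment.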
  2. Now that everything is prepared, we can use the example code provided by Google to transcribe speech. There are three kinds of scripts in this example: one for audio files up to one minute long, one for audio files up to 180 minutes long, and one for streaming input, when the source is a microphone. For audio files longer than one minute, you need to upload the file to a Google Cloud Storage bucket and pass its URI to the transcribe_async.py script.
    You can find all the details and differences in the Google documentation.
    We will only cover the case where audio files are shorter than one minute. You must first transcode your audio file to FLAC with a single channel (mono). You can do this easily with ffmpeg:
ffmpeg -i your_audio_file.wav -ac 1 output_file.flac

Then, use the transcribe.py script to get a transcript of your audio file. It prints the result to the console and also writes it to a result.txt file.
Finally, run

python3 transcribe.py output_file.flac

or

python3 transcribe_async.py gs://your_bucket_name/your_file.flac

for a longer file, and enjoy!
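
The choice between the two scripts can be summed up in a small sketch. It assumes the one-minute limit quoted in this README and only builds the command lines; nothing is executed. `ffmpeg_command` and `transcribe_command` are hypothetical helper names, the bucket name is a placeholder, and the file is assumed to have been uploaded to the bucket already. Cloud Storage URIs use the `gs://` scheme.

```python
from typing import List, Optional

SYNC_LIMIT_SECONDS = 60  # limit for transcribe.py, as stated in this README


def ffmpeg_command(src: str, dst: str = "output_file.flac") -> List[str]:
    """Build the ffmpeg call that transcodes src to mono FLAC."""
    return ["ffmpeg", "-i", src, "-ac", "1", dst]


def transcribe_command(flac_name: str, duration_seconds: float,
                       bucket: Optional[str] = None) -> List[str]:
    """Pick transcribe.py for short audio, transcribe_async.py otherwise."""
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return ["python3", "transcribe.py", flac_name]
    if bucket is None:
        raise ValueError("audio longer than one minute needs a Cloud Storage bucket")
    # Assumes flac_name was already uploaded to the given bucket.
    return ["python3", "transcribe_async.py", f"gs://{bucket}/{flac_name}"]
```

Either list could be passed to `subprocess.run(..., check=True)` to actually run the step.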

About

Speech to Text Google API example
