Asterisk Voicemail Transcription via IBM Cloud Speech-to-Text API
Update June 2022: I updated this guide to use the new IBM speech-to-text model. They have replaced the old “en-US_NarrowbandModel” model with a new “en-US_Telephony” model. Simply replace the old model name in your API request with the new model name. The old model is deprecated and scheduled to be removed from service on September 15, 2022.
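If an existing script still references the old model, a substitution along these lines should update it. This is only a sketch: the function name `update_stt_model` is hypothetical, and the path you pass in is whichever script holds your API request.

```shell
#!/bin/sh
# Replace the deprecated model name with the new one in a script file.
# update_stt_model is a hypothetical helper; pass the path of your own script.
update_stt_model() {
    script="$1"
    tmp="${script}.tmp.$$"
    # Write to a temporary file first so a failed sed never truncates the script.
    sed 's/en-US_NarrowbandModel/en-US_Telephony/g' "$script" > "$tmp" && \
        cat "$tmp" > "$script"   # cat keeps the script's existing mode and owner
    rm -f "$tmp"
}
```

For example, `update_stt_model /usr/sbin/sendmailmp3` would rewrite the script used later in this guide, assuming that is where your request lives.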
Updated February 2020: I recently updated this guide to reflect setup on a new Asterisk server running FreePBX 14 and Asterisk 13. Our current Asterisk server runs on a small Vultr Cloud Compute server (2 CPU, 4GB RAM, 80GB SSD) for $20/month. Vultr gives you the option to upload a custom ISO (e.g. the latest FreePBX distro) or choose an ISO from their library (e.g. a recent FreePBX distro). Pro tip: after you set up your server, don’t forget to remove the ISO (aka CD image) from your server configuration so it does not keep booting to the ISO after each reboot.
Introduction
This guide briefly explains how to configure Asterisk PBX to send each voicemail as an email with the message as an MP3 attachment and a text transcription obtained via the IBM Cloud Speech-to-Text API. IBM provides 500 minutes of Speech-to-Text free per month, then charges $0.02 for each additional minute.
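As a rough illustration of that pricing, the following sketch estimates a monthly bill from the total minutes transcribed. The function name `stt_monthly_cost` is hypothetical, and the free allowance and per-minute rate are simply the figures quoted in this guide, so check IBM's pricing page for current values.

```shell
#!/bin/sh
# Estimate the monthly Speech-to-Text cost from the rates quoted in this guide.
# stt_monthly_cost is a hypothetical helper name, not part of any IBM tool.
stt_monthly_cost() {
    minutes="$1"
    awk -v m="$minutes" 'BEGIN {
        free = 500      # free minutes per month (figure from this guide)
        rate = 0.02     # dollars per additional minute (figure from this guide)
        billable = (m > free) ? m - free : 0
        printf "%.2f\n", billable * rate
    }'
}

# 650 minutes in a month: 150 billable minutes at $0.02 each.
stt_monthly_cost 650
```

Any usage at or below the free allowance prints 0.00.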
I was using this Asterisk Transcriptions with Google (backup link) script, but moved to the IBM service because Google has discontinued V1 of their Speech Recognition API and Google seems to charge for V2 of their API. I did not use AWS speech to text because their API does not provide an immediate transcription response. You have to upload the job and keep checking for results.
Sending MP3-formatted Voicemail Attachments
Go to the article Asterisk setup voicemail to send email with mp3 attachment and follow Nicolas Bernaerts’ instructions. You must successfully implement sending emails with MP3 attachments via his custom script (/usr/sbin/sendmailmp3) before you proceed.
Get IBM Cloud Credentials for the Speech-to-Text service
Go to the IBM Speech to Text page for current rates. Click “Get Started Free” to sign up. Once signed up, go to your IBM Resource List and open your Speech to Text service to view your IBM credentials. Make note of your API KEY and your API URL.
Sending Transcription with Voicemail Attachments
- We will use the “curl” command to send your voicemail file to IBM and retrieve the transcription results. If the command is not already installed, install it now.
# Debian or Ubuntu OS
apt-get install curl
# Redhat or CentOS
yum install curl
- Test IBM Cloud Speech-to-Text API by running the following command. Replace API_PASSWORD and API_URL with your IBM Cloud Credentials. Specify the full path to an existing recording from your Asterisk mailboxes (/var/spool/asterisk/voicemail/default/) or your Asterisk custom recordings (/var/lib/asterisk/sounds/custom/).
curl -X POST -u apikey:API_PASSWORD --header "Content-Type: audio/wav" --data-binary @/var/lib/asterisk/sounds/en/cdir-transferring-further-assistance.wav "API_URL/v1/recognize?model=en-US_Telephony&smart_formatting=true"
The example above should produce a result similar to the following:
{
"results": [
{
"alternatives": [
{
"confidence": 0.9182463884353638,
"transcript": "we are now transferring you out of the company directory please hold on for further assistance "
}
],
"final": true
}
],
"result_index": 0
}
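To check a response like the one above from the command line, the sketch below pulls the transcript and confidence out of a saved response using grep, sed, and awk. The function names are hypothetical, and it assumes the pretty-printed layout shown above, with one key per line.

```shell
#!/bin/sh
# Extract fields from a saved IBM Speech-to-Text response.
# Assumes the pretty-printed layout shown above (one key per line).
# get_transcript and get_confidence are hypothetical helper names.
get_transcript() {
    grep '"transcript"' "$1" | sed -e 's/^.*"transcript": "//' -e 's/ *"$//'
}

get_confidence() {
    # Print the confidence as a whole-number percentage.
    grep '"confidence"' "$1" | sed -e 's/^.*"confidence": //' -e 's/,$//' |
        awk '{ printf "%.0f\n", $1 * 100 }'
}

# Save the sample response from this guide to a temporary file.
resp=$(mktemp)
cat > "$resp" <<'EOF'
{
"results": [
{
"alternatives": [
{
"confidence": 0.9182463884353638,
"transcript": "we are now transferring you out of the company directory please hold on for further assistance "
}
],
"final": true
}
],
"result_index": 0
}
EOF

get_transcript "$resp"
get_confidence "$resp"
```

Line-based matching is fragile if the response layout changes; a JSON-aware tool would be more robust, but this keeps the dependencies to what the rest of the guide already uses.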
- Go to the article Asterisk Voicemail with Speech Recognition using Google API (backup link) and download/install Nicolas Bernaerts’ updated sendmail script. This will replace your existing WAV2MP3 script with a new script that can fetch transcriptions.
Note: You do NOT need to install the “sox” or “flac” packages they mention. Asterisk records voicemails in wav audio format. In the guide above, the files had to be converted to “.flac” format for Google, but IBM can use the original “.wav” audio file.
- Replace the following lines:
# convert wav file to flac compatible for Google speech recognition
sox stream.part3.wav -r 16000 -b 16 -c 1 audio.flac vad reverse vad reverse lowpass -2 2500
# call Google Voice Recognition sending flac file as POST
curl --data-binary @audio.flac --header 'Content-type: audio/x-flac; rate=16000' 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&pfilter=0&lang='$LANGUAGE'&maxresults=1' 1>audio.txt
# extract the transcript and confidence results
FILETOOBIG=`cat audio.txt | grep "<HTML>"`
TRANSCRIPT=`cat audio.txt | cut -d"," -f3 | sed 's/^.*utterance\":\"\(.*\)\"$/\1/g'`
CONFIDENCE=`cat audio.txt | cut -d"," -f4 | sed 's/^.*confidence\":0.\([0-9][0-9]\).*$/\1/g'`
With these new lines:
CURL_OPTS=""
API_PASSWORD="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
API_URL="https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
# Send WAV to IBM Cloud Speech to Text API. Use the "Telephony" model (formerly "Narrowband"), since the Asterisk WAV is sampled at 8 kHz.
curl -s $CURL_OPTS -k -u apikey:$API_PASSWORD -X POST \
--limit-rate 40000 \
--header "Content-Type: audio/wav" \
--data-binary @stream.part3.wav \
"${API_URL}/v1/recognize?model=en-US_Telephony&smart_formatting=true" 1>audio.txt
# Extract transcript results from JSON response
TRANSCRIPT=`cat audio.txt | grep transcript | sed 's#^.*"transcript": "##g' | sed 's# "$##g'`
- Optional – If you use a proxy server, you may need to specify curl options similar to those shown below.
CURL_OPTS="-x squid.example.org:3128"
- Optional Tip – I added the following option to my CURL_OPTS on an older server that was receiving an SSL error. Forcing TLSv1 resolved the issue.
CURL_OPTS="--tlsv1 -x squid.example.org:3128"
- You may want to remove the extra lines related to FILETOOBIG and CONFIDENCE since we do not use these with IBM Cloud.
- Call into your Asterisk PBX and leave a message. Asterisk should now send you an email with a transcription!
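Longer messages may come back as several results, each with its own "transcript" line, and a silent or failed recording may return none at all. The sketch below is one possible way to handle both cases: it joins every transcript line into a single line of text and falls back to a placeholder when nothing was recognized. The function name `build_transcript` and the placeholder wording are hypothetical and are not part of Nicolas Bernaerts’ script.

```shell
#!/bin/sh
# Join every "transcript" line in a saved response into one line of text,
# and fall back to a placeholder when nothing was recognized.
# build_transcript and the placeholder wording are hypothetical.
build_transcript() {
    text=$(grep '"transcript"' "$1" 2>/dev/null |
        sed -e 's/^.*"transcript": "//' -e 's/ *"$//' |
        paste -sd ' ' -)
    if [ -z "$text" ]; then
        text="(transcription not available)"
    fi
    printf '%s\n' "$text"
}
```

Inside the sendmail script this could stand in for the extraction line, for example `TRANSCRIPT=$(build_transcript audio.txt)`.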