Asterisk Voicemail Transcription via IBM Cloud Speech-to-Text API

Updated February 2020 – I recently updated this guide to reflect setup on a new Asterisk server running FreePBX 14 and Asterisk 13. Our current Asterisk server runs on a small Vultr Cloud Compute server (2 CPU, 4GB RAM, 80GB SSD) for $20/month. Vultr gives you the option to upload a custom ISO (e.g. the latest FreePBX distro) or to choose an ISO from their library (e.g. a recent FreePBX distro). Pro Tip: After you set up your server, don’t forget to remove the ISO (aka CD image) from your server configuration so it does not keep booting to the ISO after each reboot.

Introduction

This guide briefly explains how to configure Asterisk PBX to email each voicemail with the message as an MP3 attachment and a text transcription from the IBM Cloud Speech-to-Text API. IBM provides 500 minutes of Speech-to-Text free per month, then charges $0.02 for each additional minute.

I was using this Asterisk Transcriptions with Google script, but moved to the IBM service because Google has discontinued V1 of their Speech Recognition API and seems to charge for V2. I did not use AWS speech-to-text because their API does not return the transcription immediately: you have to submit a job and keep checking for results.

Sending MP3-formatted Voicemail Attachments

Go to the article Asterisk setup voicemail to send email with mp3 attachment and follow Nicolas Bernaerts’ instructions. You must successfully implement sending emails with MP3 attachments via his custom script (/usr/sbin/sendmailmp3) before you proceed.

Get IBM Cloud Credentials for the Speech-to-Text service

Go to the IBM Speech to Text page for current rates. Click “Get Started Free” to sign up. Once signed up, go to your IBM Resource List and open your Speech to Text service to view your IBM credentials. Make note of your API key and your API URL.

Sending Transcription with Voicemail Attachments

  1. We will use the “curl” command to send your voicemail file to IBM and retrieve the transcription results.  If the command is not already installed, install it now.
    # Debian or Ubuntu OS
    apt-get install curl
    # Redhat or CentOS
    yum install curl
  2. Test the IBM Cloud Speech-to-Text API by running the following command. Replace API_PASSWORD with your API key and API_URL with your API URL from your IBM Cloud credentials. Specify the full path to an existing WAV recording, for example one from your Asterisk mailboxes (/var/spool/asterisk/voicemail/default/), one of your Asterisk custom recordings (/var/lib/asterisk/sounds/custom/), or a stock Asterisk prompt as in the command shown here.

    curl -X POST -u apikey:API_PASSWORD --header "Content-Type: audio/wav" --data-binary @/var/lib/asterisk/sounds/en/cdir-transferring-further-assistance.wav "API_URL/v1/recognize?model=en-US_NarrowbandModel&smart_formatting=true"

    The example above should produce a result similar to the following:

    {
      "results": [
        {
          "alternatives": [
            {
              "confidence": 0.9182463884353638,
              "transcript": "we are now transferring you out of the company directory please hold on for further assistance "
            }
          ],
          "final": true
        }
      ],
      "result_index": 0
    }
  3. Go to the article Asterisk Voicemail with Speech Recognition using Google API and download/install Nicolas Bernaerts’ updated sendmail script. This will replace your existing WAV2MP3 script with a new script that can fetch transcriptions.

    Note: You do NOT need to install the “sox” or “flac” packages they mention. Asterisk records voicemails in wav audio format. In the guide above, the files had to be converted to “.flac” format for Google, but IBM can use the original “.wav” audio file.
  4. Replace the following lines:
    # convert wav file to flac compatible for Google speech recognition
    sox stream.part3.wav -r 16000 -b 16 -c 1 audio.flac vad reverse vad reverse lowpass -2 2500
    # call Google Voice Recognition sending flac file as POST
    curl --data-binary @audio.flac --header 'Content-type: audio/x-flac; rate=16000' 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&pfilter=0&lang='$LANGUAGE'&maxresults=1' 1>audio.txt
    # extract the transcript and confidence results
    FILETOOBIG=`cat audio.txt | grep "<HTML>"`
    TRANSCRIPT=`cat audio.txt | cut -d"," -f3 | sed 's/^.*utterance\":\"\(.*\)\"$/\1/g'`
    CONFIDENCE=`cat audio.txt | cut -d"," -f4 | sed 's/^.*confidence\":0.\([0-9][0-9]\).*$/\1/g'`


    With these new lines:

    CURL_OPTS=""
    API_PASSWORD="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    API_URL="https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

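    # Send WAV to IBM Cloud Speech to Text API. Must use the "Narrowband" (aka 8k) model since the WAV is sampled at 8 kHz.
    # Note: -k tells curl to skip TLS certificate verification; drop it if your server's CA certificates are up to date.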
    curl -s $CURL_OPTS -k -u apikey:$API_PASSWORD -X POST \
    --limit-rate 40000 \
    --header "Content-Type: audio/wav" \
    --data-binary @stream.part3.wav \
    "${API_URL}/v1/recognize?model=en-US_NarrowbandModel&smart_formatting=true" 1>audio.txt

    # Extract transcript results from JSON response
    TRANSCRIPT=$(grep '"transcript"' audio.txt | sed -e 's#^.*"transcript": "##' -e 's# *"$##')

  5. Optional – If you use a proxy server, you may need to specify curl options similar to those shown below.
    CURL_OPTS="-x squid.example.org:3128"
  6. Optional Tip – On an older server that was receiving an SSL error, I added the following option to CURL_OPTS. Forcing TLSv1 resolved the issue.
    CURL_OPTS="--tlsv1 -x squid.example.org:3128"
  7. You may want to remove the extra lines related to FILETOOBIG and CONFIDENCE since we do not use these with IBM Cloud.
  8. Call into your Asterisk PBX and leave a message.  Asterisk should now send you an email with a transcription!
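
The sed extraction in step 4 assumes a single "transcript" line. Longer voicemails can come back as several entries in "results", each with its own "transcript" line, and a failed request contains no transcript at all. Below is a minimal sketch of a more defensive extraction, assuming the response is pretty-printed with one key per line as in the step 2 sample. The helper name extract_transcript, the fallback text, and the sample JSON are illustrative assumptions, not part of Nicolas Bernaerts’ script or real API output.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original sendmailmp3 script).
# Prints every "transcript" value found in the given JSON file, joined on one line.
extract_transcript() {
    grep '"transcript":' "$1" \
        | sed -e 's#^.*"transcript": *"##' -e 's#",\{0,1\} *$##' -e 's# *$##' \
        | tr '\n' ' ' \
        | sed -e 's#  *# #g' -e 's# $##'
}

# Made-up sample response with two results, only to demonstrate the helper.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.91,
          "transcript": "hello this is a test "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 0.88,
          "transcript": "please call me back "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
EOF

DEMO_TRANSCRIPT=$(extract_transcript "$SAMPLE")

# Fall back to a placeholder so the email body is never silently empty.
if [ -z "$DEMO_TRANSCRIPT" ]; then
    DEMO_TRANSCRIPT="(transcription unavailable)"
fi

echo "$DEMO_TRANSCRIPT"
rm -f "$SAMPLE"
```

In the sendmail script itself, the equivalent call would be extract_transcript audio.txt, with the result assigned to TRANSCRIPT.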

5 thoughts on “Asterisk Voicemail Transcription via IBM Cloud Speech-to-Text API”

  1. We are using FreePBX distro SHMZ6.6.
    Spent about 3 hours on this last night but got it working nicely on FreePBX 13! I own an IT company with 7 employees and this helps so much.

    Your instructions were spot on, just ran into some technical issues with the backend and figuring out IBM’s website. The instructions for the sendmailmp3 were a bit off for my distro but I made it work.

    It works perfectly! I’m so happy you wrote this!

  2. Great Article, Jason. It works pretty well..

    A couple of questions….

    I am using the IBM Speech to Text API. If I want the best possible transcript quality, would it be better to convert the wav file to flac? I guess it would not help, since sox would already lose quality, and Asterisk (afaik) does not allow recording directly to flac…

    And… if I am detecting both speakers, is there any way to easily have the transcript split by speaker? Your “sed” command is fantastic, but if I have a complex JSON file with speaker detection, it does not give me the speaker of each sentence…

    1. Hello Andres! Glad to hear this worked for you. Thank you for the positive feedback.

      Your original WAV file recording will give you the best possible quality. Converting the audio file to another format will only give you the same or WORSE audio quality! For example, attempting to convert the standard “narrowband” (8khz) file to a “wideband” (16khz) file will generally cause quality loss. I would only recommend converting the file as a last resort if the transcription service could NOT accept the original file.

      I believe you would need a much better recording quality to detect multiple speakers. Traditional “narrowband” VoIP audio quality (e.g. g711) is probably not sufficient for multiple speaker detection. Many newer VoIP handsets support “HD Voice” or “HD VoIP” (e.g. g722) “wideband” protocols that may be able to deliver high enough audio quality. If both callers are using HD phones and an HD protocol, it may be possible for Asterisk to record voicemail (or call) audio as a “wideband” (16khz) WAV instead of the default “narrowband” (8khz) WAV.

      Someone else had a similar (unanswered) question here: Record Voicemail in HD-Audio (16khz). I recommend posting details to their question in case someone is able to help both of you.

      Learn more about HD VoIP here: High Definition VoIP on Asterisk-FreePBX.

      Good luck! Jason

  3. This tutorial worked amazingly for me; however, I noticed I get back a weird marker such as
    %HESITATION this is a test voicemail %HESITATION if I have pauses or the word “umm”

    Is there a way to modify the script to just remove %HESITATION before building the email body?

    1. Kevin,

      Glad you asked. I also wondered about eliminating these placeholders, so I reviewed the Speech-to-Text API documentation and found a “smart_formatting” option that eliminated those markings. This option also did a better job of formatting the voicemail text.

      Old URL
      https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=en-US_NarrowbandModel

      New URL
      https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=en-US_NarrowbandModel&smart_formatting=true

      You’ll notice I also removed the “continuous” query parameter. When I manually queried the API, it said this parameter is no longer required.

      Jason
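
      If you prefer to keep your existing query string, or a marker still slips through, another option is to filter the text before building the email body. This is only a sketch: it assumes the marker appears literally as %HESITATION, as in Kevin's example, and strip_hesitations is a hypothetical helper name, not something in the original script.

```shell
#!/bin/sh
# Hypothetical helper (an assumption): removes literal %HESITATION markers
# and tidies up the spaces they leave behind.
strip_hesitations() {
    printf '%s\n' "$1" \
        | sed -e 's#%HESITATION##g' -e 's#  *# #g' -e 's#^ ##' -e 's# $##'
}

# Example input modeled on the comment above.
CLEANED=$(strip_hesitations "%HESITATION this is a test voicemail %HESITATION if I have pauses")
echo "$CLEANED"
```

      In the sendmail script this would be applied right after the transcript is extracted, for example TRANSCRIPT=$(strip_hesitations "$TRANSCRIPT").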
