Asterisk Voicemail Transcription via IBM Bluemix Speech-to-Text API

This guide briefly explains how to configure Asterisk PBX to email each voicemail with the message attached as an MP3 and a text transcription produced by the IBM Bluemix Speech-to-Text API.  I chose the IBM service because Google has discontinued V1 of its Speech Recognition API and appears to charge for V2.  IBM provides 1000 minutes of Speech-to-Text for free, then charges $0.02/minute.

Sending MP3-formatted Voicemail Attachments

Go to “Asterisk setup voicemail to send email with mp3 attachment” and follow the instructions there. You must be able to send emails with MP3 attachments via their custom script (/usr/sbin/sendmailmp3) before you proceed.

Get a Bluemix ID and Credentials for the Speech-to-Text service

Follow the instructions at “Obtaining Bluemix Credentials” to set up credentials for IBM’s Speech-to-Text API.
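
Once you have credentials, it can help to confirm that the service accepts them before touching the mail script. The following is only a sketch: it assumes curl is installed (covered in the next section), that the service still accepts basic authentication on the same host used later in this guide, and that listing models via /v1/models is enough to prove the credentials work. The function names stt_http_status and describe_status are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for checking Speech-to-Text credentials.
# The function names are illustrative, not part of any IBM tooling.
STT_URL="https://stream.watsonplatform.net/speech-to-text/api/v1/models"

# Print only the HTTP status code returned for the given username/password.
# curl prints 000 when it could not get an HTTP response at all.
stt_http_status() {
    curl -s -o /dev/null -w '%{http_code}' -u "$1:$2" "$STT_URL"
}

# Translate a status code into a short human-readable verdict.
describe_status() {
    case "$1" in
        200) echo "credentials accepted" ;;
        401) echo "credentials rejected" ;;
        000) echo "no response (network or proxy problem)" ;;
        *)   echo "unexpected HTTP status $1" ;;
    esac
}

# Usage (this makes a real network request):
#   describe_status "$(stt_http_status API_USERNAME API_PASSWORD)"
```

If the verdict is anything other than "credentials accepted", fix that first; the transcription steps below will fail in the same way.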

Sending Transcription with Voicemail Attachments

  1. We will use the “curl” command to send your voicemail file to IBM and retrieve the transcription results.  If the command is not already installed, install it now.
    # Debian or Ubuntu OS
    apt-get install curl
    # Redhat or CentOS
    yum install curl
  2. Test IBM Bluemix Speech-to-Text API.  Replace API_USERNAME and API_PASSWORD with your Bluemix Credentials.  Specify the full path to an existing recording from your Asterisk mailboxes (/var/spool/asterisk/voicemail/default/) or your Asterisk custom recordings (/var/lib/asterisk/sounds/custom/).
    curl -k -u API_USERNAME:API_PASSWORD -X POST \
        --limit-rate 40000 \
        --header "Content-Type: audio/wav" \
        --data-binary @/var/lib/asterisk/sounds/cdir-transferring-further-assistance.wav \
        "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=en-US_NarrowbandModel"

    The example above produced the following result:

    {
     "results": [
      {
       "alternatives": [
       {
        "confidence": 0.9182463884353638, 
        "transcript": "we are now transferring you out of the company directory please hold on for further assistance "
       }
       ], 
       "final": true
      }
     ], 
     "result_index": 0
    }
  3. Go to “Asterisk Voicemail with Speech Recognition using Google API” and download/install their updated sendmail script. Note: You do NOT need to install the “sox” or “flac” packages they mention. Asterisk records voicemails in WAV audio format. In that guide, the files had to be converted to “.flac” format for Google, but IBM can use the original “.wav” audio file.
  4. Replace the following lines:
    # convert wav file to flac compatible for Google speech recognition
    sox stream.part3.wav -r 16000 -b 16 -c 1 audio.flac vad reverse vad reverse lowpass -2 2500
    
    # call Google Voice Recognition sending flac file as POST
    curl --data-binary @audio.flac --header 'Content-type: audio/x-flac; rate=16000' 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&pfilter=0&lang='$LANGUAGE'&maxresults=1' 1>audio.txt
    
    # extract the transcript and confidence results
    FILETOOBIG=`cat audio.txt | grep "<HTML>"`
    TRANSCRIPT=`cat audio.txt | cut -d"," -f3 | sed 's/^.*utterance\":\"\(.*\)\"$/\1/g'`
    CONFIDENCE=`cat audio.txt | cut -d"," -f4 | sed 's/^.*confidence\":0.\([0-9][0-9]\).*$/\1/g'`

    With these new lines:

    CURL_OPTS=""
    API_USERNAME="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    API_PASSWORD="XXXXXXXXXXXX"
    
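    # Send WAV to Watson Speech to Text API. Must use the "Narrowband" (8 kHz) model since the WAV is sampled at 8 kHz.
    # $CURL_OPTS is left unquoted on purpose so that multiple options split into separate arguments.
    curl -s $CURL_OPTS -k -u "$API_USERNAME:$API_PASSWORD" -X POST \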
        --limit-rate 40000 \
        --header "Content-Type: audio/wav" \
        --data-binary @stream.part3.wav \
        "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=en-US_NarrowbandModel" 1>audio.txt
    
    # Extract transcript results from JSON response
    TRANSCRIPT=`cat audio.txt | grep transcript | sed 's#^.*"transcript": "##g' | sed 's# "$##g'`
  5. If you use a proxy server, you may need to specify curl options similar to those shown below.
    CURL_OPTS="-x squid.example.org:3128"
  6. You may want to remove the extra lines related to FILETOOBIG and CONFIDENCE since they are no longer used.
  7. Call into your Asterisk PBX and leave a message.  Asterisk should now send you an email with a transcription!
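
Because the request uses continuous=true, a longer voicemail can come back as several result blocks, each with its own "transcript" line, and a failed request returns no transcript at all. The sketch below handles both cases using the same grep/sed approach as step 4. It assumes the response is pretty-printed with one field per line, as in the sample from step 2. The function names and the placeholder text are illustrative and not part of the original sendmail script.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the original sendmail script).

# Print every "transcript" value found in the JSON file $1, joined by spaces.
# Assumes one JSON field per line, as in the sample response from step 2.
extract_transcript() {
    grep '"transcript"' "$1" 2>/dev/null \
        | sed -e 's#^.*"transcript": "##' -e 's# *"[, ]*$##' \
        | tr '\n' ' ' \
        | sed -e 's# *$##'
}

# Print the transcript, or a placeholder when nothing could be extracted
# (for example when the service returned an error instead of results).
transcript_or_placeholder() {
    text=$(extract_transcript "$1")
    if [ -n "$text" ]; then
        printf '%s\n' "$text"
    else
        printf '%s\n' "(transcription not available)"
    fi
}

# Usage inside the mail script, in place of the single TRANSCRIPT= line:
#   TRANSCRIPT=$(transcript_or_placeholder audio.txt)
```

With the placeholder in place, the email still goes out with the MP3 attached when transcription fails, and the body makes it clear that no text was returned.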

3 thoughts on “Asterisk Voicemail Transcription via IBM Bluemix Speech-to-Text API”

  1. We are using FreePBX distro SHMZ6.6.
    Spent about 3 hours on this last night but got it working nicely on FreePBX 13! I own an IT company with 7 employees and this helps so much.

    Your instructions were spot on, just ran into some technical issues with the backend and figuring out IBM’s website. The instructions for the sendmailmp3 were a bit off for my distro but I made it work.

    It works perfectly! I’m so happy you wrote this!

  2. Great article, Jason. It works pretty well.

    A couple of questions…

    I am using the IBM Speech to Text API. If I want the best possible transcript quality, would it be better to convert the wav file to flac? I guess it would not help, since sox would already lose quality, and Asterisk (afaik) does not allow recording directly to flac…

    And if I am detecting both speakers, is there any way to easily have the transcript split by speaker? Your “sed” commands are fantastic, but if I have a complex JSON file with speaker detection, they do not give me the speaker of each sentence…

    1. Hello Andres! Glad to hear this worked for you. Thank you for the positive feedback.

      Your original WAV file recording will give you the best possible quality. Converting the audio file to another format will only give you the same or WORSE audio quality! For example, attempting to convert the standard “narrowband” (8khz) file to a “wideband” (16khz) file will generally cause quality loss. I would only recommend converting the file as a last resort if the transcription service could NOT accept the original file.

      I believe you would need a much better recording quality to detect multiple speakers. Traditional “narrowband” VoIP audio quality (e.g. g711) is probably not sufficient for multiple speaker detection. Many newer VoIP handsets support “HD Voice” or “HD VoIP” (e.g. g722) “wideband” protocols that may be able to deliver high enough audio quality. If both callers are using HD phones and an HD protocol, it may be possible for Asterisk to record voicemail (or call) audio as a “wideband” (16khz) WAV instead of the default “narrowband” (8khz) WAV.

      Someone else had a similar (unanswered) question here: Record Voicemail in HD-Audio (16khz). I recommend posting details to their question in case someone is able to help both of you.

      Learn more about HD VoIP here: High Definition VoIP on Asterisk-FreePBX.

      Good luck! Jason
