Asterisk Voicemail Transcription via IBM Bluemix Speech-to-Text API

This guide briefly explains how to configure Asterisk PBX to email each voicemail with the message as an MP3 attachment and a text transcription produced by the IBM Bluemix Speech-to-Text API.  I chose the IBM service because Google has discontinued V1 of its Speech Recognition API and appears to charge for V2.  IBM provides 1000 minutes of Speech-to-Text for free, then charges $0.02/minute.

Sending MP3-formatted Voicemail Attachments

Follow the instructions in the guide “Asterisk setup voicemail to send email with mp3 attachment”. Emails with MP3 attachments must be working via that guide's custom script (/usr/sbin/sendmailmp3) before you proceed.

Get a Bluemix ID and Credentials for the Speech-to-Text service

Follow the instructions at “Obtaining Bluemix Credentials” to set up credentials for IBM’s Speech-to-Text API.

Sending Transcription with Voicemail Attachments

  1. We will use the “curl” command to send your voicemail file to IBM and retrieve the transcription results.  If the command is not already installed, install it now.
    # Debian or Ubuntu OS
    apt-get install curl
    # Redhat or CentOS
    yum install curl
  2. Test the IBM Bluemix Speech-to-Text API.  Replace API_USERNAME and API_PASSWORD with your Bluemix credentials.  Specify the full path to an existing recording, for example one from your Asterisk mailboxes (/var/spool/asterisk/voicemail/default/), your Asterisk custom recordings (/var/lib/asterisk/sounds/custom/), or a stock Asterisk sound file as used below.
    curl -k -u API_USERNAME:API_PASSWORD -X POST \
        --limit-rate 40000 \
        --header "Content-Type: audio/wav" \
        --data-binary @/var/lib/asterisk/sounds/cdir-transferring-further-assistance.wav \
        "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=en-US_NarrowbandModel"

    The example above produced the following result:

    {
     "results": [
      {
       "alternatives": [
       {
        "confidence": 0.9182463884353638, 
        "transcript": "we are now transferring you out of the company directory please hold on for further assistance "
       }
       ], 
       "final": true
      }
     ], 
     "result_index": 0
    }
  3. Go to “Asterisk Voicemail with Speech Recognition using Google API” and download and install their updated sendmail script. Note: You do NOT need to install the “sox” or “flac” packages they mention. Asterisk records voicemails in WAV audio format. In that guide, the files had to be converted to “.flac” format for Google, but IBM accepts the original “.wav” audio file.
  4. Replace the following lines:
    # convert wav file to flac compatible for Google speech recognition
    sox stream.part3.wav -r 16000 -b 16 -c 1 audio.flac vad reverse vad reverse lowpass -2 2500
    
    # call Google Voice Recognition sending flac file as POST
    curl --data-binary @audio.flac --header 'Content-type: audio/x-flac; rate=16000' 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&pfilter=0&lang='$LANGUAGE'&maxresults=1' 1>audio.txt
    
    # extract the transcript and confidence results
    FILETOOBIG=`cat audio.txt | grep "<HTML>"`
    TRANSCRIPT=`cat audio.txt | cut -d"," -f3 | sed 's/^.*utterance\":\"\(.*\)\"$/\1/g'`
    CONFIDENCE=`cat audio.txt | cut -d"," -f4 | sed 's/^.*confidence\":0.\([0-9][0-9]\).*$/\1/g'`

    With these new lines:

    CURL_OPTS=""
    API_USERNAME="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    API_PASSWORD="XXXXXXXXXXXX"
    
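    # Send WAV to Watson Speech to Text API. Must use the "Narrowband" (aka 8k) model since the WAV is sampled at 8 kHz.
    # Credentials are quoted so special characters in the password are not interpreted by the shell.
    curl -s $CURL_OPTS -k -u "$API_USERNAME:$API_PASSWORD" -X POST \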
        --limit-rate 40000 \
        --header "Content-Type: audio/wav" \
        --data-binary @stream.part3.wav \
        "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=en-US_NarrowbandModel" 1>audio.txt
    
    # Extract transcript results from JSON response
    TRANSCRIPT=$(grep transcript audio.txt | sed 's#^.*"transcript": "##g' | sed 's# "$##g')
  5. If you use a proxy server, you may need to specify curl options similar to those shown below.
    CURL_OPTS="-x squid.example.org:3128"
  6. You may also want to remove any remaining lines in the script that reference FILETOOBIG and CONFIDENCE, since those variables are no longer set.
  7. Call into your Asterisk PBX and leave a message.  Asterisk should now send you an email with a transcription!
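
The transcript extraction from step 4 can be tried offline before editing the sendmail script. The following is a minimal sketch, assuming the response has the same layout as the sample in step 2, with each "transcript" entry on its own line. The function name extract_transcript and the temporary sample file are hypothetical and are not part of the original script. As an added assumption, it joins several transcript lines into one string, since a longer message may come back as more than one result.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print the transcript text
# found in a Watson JSON response file, joined onto a single line.
# Assumes one "transcript": "..." entry per line, as in the step 2 sample.
extract_transcript() {
    grep '"transcript":' "$1" \
        | sed -e 's#^.*"transcript": "##' -e 's# *",*[[:space:]]*$##' \
        | tr '\n' ' ' \
        | sed 's# *$##'
}

# Sample response copied from step 2, written to a temporary file for the demo.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
{
 "results": [
  {
   "alternatives": [
   {
    "confidence": 0.9182463884353638,
    "transcript": "we are now transferring you out of the company directory please hold on for further assistance "
   }
   ],
   "final": true
  }
 ],
 "result_index": 0
}
EOF

TRANSCRIPT=$(extract_transcript "$SAMPLE")
rm -f "$SAMPLE"
echo "$TRANSCRIPT"
```

If this prints the expected sentence, the same pipeline should behave the same way on the audio.txt file written by the curl command in step 4.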
