Articles

Asterisk Voicemail Transcription via IBM Cloud Speech-to-Text API

Update June 2022: I updated this guide to use the new IBM speech-to-text model. They have replaced the old “en-US_NarrowbandModel” model with a new “en-US_Telephony” model. Simply replace the old model name in your API request with the new model name. The old model is deprecated and scheduled to be removed from service on September 15, 2022.

Updated February 2020: I updated this guide to reflect setup on a new Asterisk server running FreePBX 14 and Asterisk 13. Our current Asterisk server runs on a small Vultr Cloud Compute server (2 CPU, 4GB RAM, 80GB SSD) for $20/month. Vultr gives you the option to upload a custom ISO (e.g. the latest FreePBX distro) or choose an ISO from their library (e.g. a recent FreePBX distro). Pro Tip: After you set up your server, don’t forget to remove the ISO (aka CD image) from your server configuration so it does not keep booting to the ISO after each reboot.

Introduction

This guide briefly explains how to configure Asterisk PBX to send each voicemail as an email with the message attached as an MP3 and a text transcription generated by the IBM Cloud Speech-to-Text API. IBM provides 500 minutes of Speech-to-Text for free per month, then charges $0.02 for each additional minute.

I was using this Asterisk Transcriptions with Google (backup link) script, but moved to the IBM service because Google has discontinued V1 of their Speech Recognition API and Google seems to charge for V2 of their API.  I did not use AWS speech to text because their API does not provide an immediate transcription response. You have to upload the job and keep checking for results.

Sending MP3-formatted Voicemail Attachments

Go to the article Asterisk setup voicemail to send email with mp3 attachment and follow Nicolas Bernaerts’ instructions. You must successfully implement sending emails with MP3 attachments via his custom script (/usr/sbin/sendmailmp3) before you proceed.

Get IBM Cloud Credentials for the Speech-to-Text service

Go to the IBM Speech to Text page for current rates. Click “Get Started Free” to sign up. Once signed up, go to your IBM Resource List and open your Speech to Text service to view your IBM credentials. Make note of your API key and your API URL.

Sending Transcription with Voicemail Attachments

  1. We will use the “curl” command to send your voicemail file to IBM and retrieve the transcription results.  If the command is not already installed, install it now.
    # Debian or Ubuntu OS
    apt-get install curl
    # Redhat or CentOS
    yum install curl
  2. Test the IBM Cloud Speech-to-Text API by running the following command.  Replace API_PASSWORD with your API key and API_URL with your API URL from your IBM Cloud credentials.  Specify the full path to an existing recording from your Asterisk mailboxes (/var/spool/asterisk/voicemail/default/) or your Asterisk custom recordings (/var/lib/asterisk/sounds/custom/).

    curl -X POST -u apikey:API_PASSWORD --header "Content-Type: audio/wav" --data-binary @/var/lib/asterisk/sounds/en/cdir-transferring-further-assistance.wav "API_URL/v1/recognize?model=en-US_Telephony&smart_formatting=true"

    The example above should produce a result similar to the following:

    {
      "results": [
        {
          "alternatives": [
            {
              "confidence": 0.9182463884353638,
              "transcript": "we are now transferring you out of the company directory please hold on for further assistance "
            }
          ],
          "final": true
        }
      ],
      "result_index": 0
    }
  3. Go to the article Asterisk Voicemail with Speech Recognition using Google API (backup link) and download/install Nicolas Bernaerts’ updated sendmail script. This will replace your existing WAV2MP3 script with a new script that can fetch transcriptions.

    Note: You do NOT need to install the “sox” or “flac” packages they mention. Asterisk records voicemails in wav audio format. In the guide above, the files had to be converted to “.flac” format for Google, but IBM can use the original “.wav” audio file.
  4. Replace the following lines:
    # convert wav file to flac compatible for Google speech recognition
    sox stream.part3.wav -r 16000 -b 16 -c 1 audio.flac vad reverse vad reverse lowpass -2 2500
    # call Google Voice Recognition sending flac file as POST
    curl --data-binary @audio.flac --header 'Content-type: audio/x-flac; rate=16000' 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&pfilter=0&lang='$LANGUAGE'&maxresults=1' 1>audio.txt
    # extract the transcript and confidence results
    FILETOOBIG=`cat audio.txt | grep "<HTML>"`
    TRANSCRIPT=`cat audio.txt | cut -d"," -f3 | sed 's/^.*utterance\":\"\(.*\)\"$/\1/g'`
    CONFIDENCE=`cat audio.txt | cut -d"," -f4 | sed 's/^.*confidence\":0.\([0-9][0-9]\).*$/\1/g'`


    With these new lines:

    CURL_OPTS=""
    API_PASSWORD="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    API_URL="https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

    # Send WAV to IBM Cloud Speech to Text API. Must use the "en-US_Telephony" model (the successor to the 8k "Narrowband" model) since the WAV is sampled at 8k.
    curl -s $CURL_OPTS -k -u apikey:$API_PASSWORD -X POST \
    --limit-rate 40000 \
    --header "Content-Type: audio/wav" \
    --data-binary @stream.part3.wav \
    "${API_URL}/v1/recognize?model=en-US_Telephony&smart_formatting=true" 1>audio.txt

    # Extract transcript results from JSON response
    TRANSCRIPT=`cat audio.txt | grep transcript | sed 's#^.*"transcript": "##g' | sed 's# "$##g'`

  5. Optional – If you use a proxy server, you may need to specify curl options similar to those shown below.
    CURL_OPTS="-x squid.example.org:3128"
  6. Optional Tip – I added the following option to CURL_OPTS on an older server that was receiving an SSL error. Forcing TLSv1 resolved the issue.
    CURL_OPTS="--tlsv1 -x squid.example.org:3128"
  7. You may want to remove the extra lines related to FILETOOBIG and CONFIDENCE since we do not use these with IBM Cloud.
  8. Call into your Asterisk PBX and leave a message.  Asterisk should now send you an email with a transcription!
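
The grep/sed pipeline in step 4 works on the pretty-printed response shown in step 2, which has a single "transcript" line. For longer messages the response may contain several entries under "results", each with its own transcript (an assumption based on the response format above). The following is a minimal sketch that joins them into one line. The function name extract_transcript is hypothetical and is not part of Nicolas Bernaerts’ script.

```shell
#!/bin/sh
# Hypothetical helper: print every "transcript" value found in a
# pretty-printed response file (one JSON key per line), joined by spaces.
extract_transcript() {
    grep '"transcript":' "$1" \
        | sed -e 's#^.*"transcript": "##' -e 's# *"[, ]*$##' \
        | tr '\n' ' ' \
        | sed -e 's# *$##'
}
```

Inside the sendmail script it could be used as TRANSCRIPT=$(extract_transcript audio.txt) in place of the single-line extraction.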

MacBook Early 2011 OSX Cheat Sheet

These OSX startup key combinations are valid on a MacBook Pro (Early 2011).  I had the misfortune of OSX 10.10 upgrade problems and had to use various commands to recover and perform a clean install of OSX 10.10.

  • Startup Manager (Press “Option” key during startup) will let you choose to boot from a specific OSX volume, USB volume, or network volume.
  • Internet Recovery (Command + Option + R) will reinstall the OS version that originally came with the computer.
  • Recovery Partition (Command + R) will reinstall the OS version that is currently installed on your hard disk.
  • Safe Mode (Press “Shift” during startup)
  • Verbose Mode (Command + V) will display boot messages.
  • Safe + Verbose Mode (Shift + Command + V)
  • Single User Mode (Command + S) will boot to a single user shell
  • Diagnostics (Option + D) will download and run Apple Hardware Test
  • Reset NVRAM (Option + Command + P + R)
  • Reset SMC (Shift + Control + Option + Power)

More key combinations are listed in Apple article 201255, such as:

  • Boot CD (Press “C” key during startup) will start from bootable CD/DVD/USB.
  • Eject CD (Press “Eject” or “F12” or hold mouse button or hold trackpad button) to force MacBook to eject disc.
  • Net Boot (Press “N” key during startup) will attempt to start from a network server.
  • Net Boot (Option + N) will start from a network server using the default boot image.
  • Target Disk Mode (Press “T” key during startup) will allow the hard disk in this computer to be used as an external hard disk in another computer, when the two computers are connected via FireWire.

The help article above also includes the following tip:  “For the best experience with startup keys, press the keys immediately after the startup tone plays.”

Proper Mail Date Header Formatting (RFC 4021, RFC 2822, RFC 822) and Analysis of 132k Date Headers

Emails generated by our applications were displaying universal time instead of a user’s local time when viewed in Thunderbird.  Our servers use universal time.  The emails we send to our users include a date header with UTC.  Other email clients automatically convert this to the user’s local time.

I compared the date header in our messages to date headers in other messages in my inbox.  I found a variety of formats and wasn’t sure which one was correct, so I turned to the RFCs.  I found RFC 4021 first, which referred to RFC 2822 and RFC 822.  The proper “date-time” syntax was originally defined in RFC 822 section 5.1 and later clarified in RFC 2822 section 3.3.

Even the clarifications in RFC 2822 allowed several different formats, so I analyzed 132,037 date headers on one of our mail servers, hoping to determine if a specific format was most common.  I found that the following format is by far the most common:

Date: Tue, 18 Nov 2014 15:57:11 +0000

 

The day of month and hour of day are both two digits.  The time zone is a 4-digit offset prefixed with either a “+” or “-”.  In the example above, the server is set to GMT or UTC, so the offset is “+0000”.  Be aware that “+0000” and “-0000” are not handled the same!  RFC 2822 section 3.3 says that offset “-0000” should be treated as an unknown timezone.

The following PHP code will output the format above:

echo "Date: " . date("D, d M Y H:i:s O");
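
The same format can be produced from a shell script. This is a minimal sketch, assuming a date command that supports %z (GNU and BSD date both do). The function name date_header is hypothetical.

```shell
#!/bin/sh
# Print a Date header in the most common format found in the analysis,
# e.g. "Date: Tue, 18 Nov 2014 15:57:11 +0000".
# LC_ALL=C keeps the day and month names in English regardless of locale.
date_header() {
    printf 'Date: %s\n' "$(LC_ALL=C date '+%a, %d %b %Y %H:%M:%S %z')"
}

date_header
```

On a server set to UTC the offset will print as “+0000”.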

It also seems acceptable to place the time zone or a comment in parentheses after the date.  Here are a few examples:

Date: Fri, 03 Dec 2010 16:02:30 -0600 (CST)
Date: Fri, 9 Sep 2005 16:38:47 -0400 (added by postmaster@attrh1i.attrh.att.com)

I have attached the data file that contains 130,606 date headers (after removing 2k of mangled records that included other email contents) in case you would like to perform additional analysis.

smtp-header-dates-20141117.txt
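
If you want to tally the formats yourself, one possible approach (not necessarily the one used for the analysis above) is to reduce each header to a “shape” and count the shapes. This is a minimal sketch, assuming the file holds one header per line. The function name tally_date_formats is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace every digit with 9 and every letter with A,
# then count how often each resulting shape occurs, most common first.
#   "Date: Tue, 18 Nov 2014 15:57:11 +0000"
# becomes
#   "AAAA: AAA, 99 AAA 9999 99:99:99 +9999"
tally_date_formats() {
    LC_ALL=C sed -e 's/[0-9]/9/g' -e 's/[A-Za-z]/A/g' "$1" \
        | sort \
        | uniq -c \
        | sort -rn
}
```

For example, tally_date_formats smtp-header-dates-20141117.txt | head would list the most frequent shapes.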

Squid3 error: “swap.state.new: (122) Disk quota exceeded” RESOLVED

We maintain a small outbound Squid proxy/cache server for our VPN users.  The Squid3 service on Debian 7 (Wheezy) died.  When we restarted the service, it would immediately terminate with these errors:

swap.state.new: (122) Disk quota exceeded
FATAL: storeDirOpenTmpSwapLog: Failed to open swap log.
Squid Cache (Version 3.1.20): Terminated abnormally.

The Squid service runs in a small Linux Vserver VPS with various CPU, memory, and disk restrictions.  I found that the VPS had reached its inode limit and could not create any new files.  Increasing the inode limit and restarting the VPS guest immediately resolved the problem.
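
Inode exhaustion is easy to miss because plain df can still show free space. This is a minimal sketch for spotting it, assuming GNU df, where df -Pi prints inode usage with the percentage in the fifth column. The function name inodes_over is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `df -Pi` output on stdin and print the mount
# point and IUse% of any filesystem at or above the given percentage.
inodes_over() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)
        if (pct != "-" && pct + 0 >= limit + 0) print $6, $5
    }'
}

# Typical use on a live system:
#   df -Pi | inodes_over 90
```

Mount points that contain spaces are not handled by this sketch.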

Full error log output:

sudo tail /var/log/squid3/cache.log

2014/11/10 22:52:37| Starting Squid Cache version 3.1.20 for x86_64-pc-linux-gnu...
2014/11/10 22:52:37| Process ID 25908
2014/11/10 22:52:37| With 1024 file descriptors available
2014/11/10 22:52:37| Initializing IP Cache...
2014/11/10 22:52:37| DNS Socket created at [::], FD 7
2014/11/10 22:52:37| DNS Socket created at 0.0.0.0, FD 8
2014/11/10 22:52:37| Adding domain dnihost.net from /etc/resolv.conf
2014/11/10 22:52:37| Adding nameserver X.X.X.X from /etc/resolv.conf
2014/11/10 22:52:37| Adding nameserver Y.Y.Y.Y from /etc/resolv.conf
2014/11/10 22:52:37| helperOpenServers: Starting 5/5 'digest_pw_auth' processes
2014/11/10 22:52:37| Unlinkd pipe opened on FD 23
2014/11/10 22:52:37| Local cache digest enabled; rebuild/rewrite every 3600/3600 sec
2014/11/10 22:52:37| Store logging disabled
2014/11/10 22:52:37| Swap maxSize 6291456 + 16384 KB, estimated 485218 objects
2014/11/10 22:52:37| Target number of buckets: 24260
2014/11/10 22:52:37| Using 32768 Store buckets
2014/11/10 22:52:37| Max Mem size: 16384 KB
2014/11/10 22:52:37| Max Swap size: 6291456 KB
2014/11/10 22:52:37| /cache/squid/swap.state.new: (122) Disk quota exceeded
FATAL: storeDirOpenTmpSwapLog: Failed to open swap log.
Squid Cache (Version 3.1.20): Terminated abnormally.
CPU Usage: 0.010 seconds = 0.005 user + 0.005 sys
Maximum Resident Size: 28096 KB
Page faults with physical i/o: 0

How to increase Linux Vserver VPS guest inode limit:

# View existing setting (500k files)
cat /etc/vservers/GUEST-NAME/dlimits/root/inodes_total
500000
# Edit setting
sudo vi /etc/vservers/GUEST-NAME/dlimits/root/inodes_total
# View new setting (1M files)
cat /etc/vservers/GUEST-NAME/dlimits/root/inodes_total
1000000
# Restart VPS guest
sudo vserver GUEST-NAME restart

Debian Wheezy + Postfix Cluebringer (Policyd v2) + IPv6

I enabled Postfix Cluebringer (policyd v2) on Debian Wheezy so that I could place restrictions on customer outbound mail traffic.  I disabled all modules except core and quota.  I configured two quota rules (X messages/day and X MB/day).  Mail was flowing.  The quota status was being tracked in SQL.  Everything seemed to be working fine.

However, several customers were unable to send messages.  The affected customers were connecting to our SMTP servers via IPv6.  I can now confirm that postfix-cluebringer v2.0.10 does *NOT* include support for IPv6.  The following message was logged to the cbpolicyd.log file each time a customer tried to send a message via SMTP over IPv6:

[2014/03/09-17:49:19 - 4009] [CBPOLICYD] ERROR: Protocol data validation error, required parameter 'client_address' was not found or invalid format

I confirmed that policyd v2.0.14 does NOT support IPv6 either, so don’t waste time trying to upgrade v2.0.10 to v2.0.14.
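
To gauge how many submissions are affected, you can count these errors per day. This is a minimal sketch, assuming log lines start with a timestamp in the format shown above. The function name count_client_address_errors is hypothetical, and the log path is passed as an argument because its location depends on your configuration.

```shell
#!/bin/sh
# Hypothetical helper: count 'client_address' validation errors per day
# in a cbpolicyd log file given as the first argument.
count_client_address_errors() {
    grep 'Protocol data validation error' "$1" \
        | grep 'client_address' \
        | sed -e 's#^\[\([0-9/]*\)-.*#\1#' \
        | sort \
        | uniq -c
}
```

Each output line is a count followed by a date such as 2014/03/09.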

I manually upgraded to the v2.1.x pre-release (policyd v2.1.x-201310261831), which DOES support IPv6.  The SQL schema had MANY minor changes between v2.0.x and v2.1.x.  I compared schemas and determined that the SQL upgrade guide was incomplete.  Rather than trying to deal with upgrade issues, I created a new database and manually recreated our quota records/limits.  If you can’t afford to lose your customer quota status/history (see quotas_tracking table), be sure to back up that data and restore those records after the upgrade.  Good luck.