Telephony

DialoX bots can work as interactive, smart IVR systems on traditional phone lines. Contact us for more information about this.

The dial statement

By using dial, you can forward the current phone call to a new destination. In the calling PBX this is usually implemented using a SIP INVITE.

The phone number given to dial must be a full E.164 phone number, i.e. including the country code and the + sign.

dialog main do
  say "Please hold."
  dial "+31201234567"
end

The bot stays in control of the call. If the called party does not pick up, the dialog continues where it left off. The event.payload variable is then filled with a string that describes why the call was not connected:

dialog main do
  say "Please hold."
  dial "+31201234567"

  branch event.payload do
  "BUSY" ->
    say "The line is busy."

  "NOANSWER" ->
    say "There was no answer."

  true ->
    say "We are unable to connect you due to an unknown error."
  end
end

The following call denial reasons are supported:

event.payload  Description
CHANUNAVAIL    Channel unavailable (for example, the SIP channel is unreachable when qualify= is used in sip.conf)
BUSY           Returned busy
NOANSWER       No answer (e.g. a SIP 480 or 604 response)
ANSWER         Call was answered
CANCEL         Call attempt cancelled (i.e. the user hung up before the call connected)
DONTCALL       Privacy manager: don’t call
CONGESTION     Congestion, or any other error while setting up the call

Dial timeout

The default timeout of the dial command is 10 seconds, so the call will be tried for 10 seconds before control is returned to the script. It is possible to override this using the timeout: option:

dialog main do
  say "Please hold."
  dial "+31201234567", timeout: 2000
end

The timeout value is specified in milliseconds.

The refer statement

By using refer, you can forward the current phone call to a new destination, usually implemented using a SIP REFER. This removes the bot from the call.

The phone number given to refer must be a full E.164 phone number, i.e. including the country code and the + sign.

dialog main do
  say "Please hold."
  refer "+31201234567"
end

The refer statement does not return, as the conversation is stopped once the forwarding takes place.

Speech adaptation

The phone backend uses Google Speech to Text for recognizing a user's voice.

As such, it is possible to hint the recognizer about certain words, phrases or entities that should be recognized better. This process is called speech adaptation.

The adapter takes hints that are extracted from the expecting: part of ask. Certain references to entities that are contained in the expecting clause are automatically converted to their corresponding class token:

Bubblescript entity  Class token
[amount_of_money]    $MONEY
[phone_number]       $FULLPHONENUM
[number]             $OOV_CLASS_DIGIT_SEQUENCE
[date]               $FULLDATE
[time]               $TIME

So when you have a script like this:

dialog main do
  ask "When is your birthdate?", expecting: entity(match: "[time]")
end

The phone adapter will automatically add $TIME to its speech context.
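
The conversion can be pictured as a simple lookup. The following Python sketch only illustrates the table above; it is not the phone adapter's actual implementation, and the function name class_tokens_for and the regular expression used to find entity references are assumptions made for this example.

```python
import re

# Mapping copied from the entity / class token table above.
ENTITY_CLASS_TOKENS = {
    "amount_of_money": "$MONEY",
    "phone_number": "$FULLPHONENUM",
    "number": "$OOV_CLASS_DIGIT_SEQUENCE",
    "date": "$FULLDATE",
    "time": "$TIME",
}


def class_tokens_for(match_pattern: str) -> list[str]:
    """Return the class tokens for the [entity] references in a match pattern.

    Hypothetical helper: entities without a known class token are skipped,
    mirroring the rule that other entities are not turned into hints.
    """
    names = re.findall(r"\[([a-z_]+)\]", match_pattern)
    return [ENTITY_CLASS_TOKENS[n] for n in names if n in ENTITY_CLASS_TOKENS]


print(class_tokens_for("[time]"))
```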

Other entities, plain text and labels are not automatically used as speech hints. For those, define explicit speech hints with the speech_hints: option described below.

speech_hints: option to ask

It is also possible to override these speech hints by providing a speech_hints: option to ask:

dialog main do
  ask "Where do you live?", speech_hints: ["i live at $POSTALCODE"]
end

In this case, you would use the class tokens directly as part of the sample phrases.

Special speech hints

To make the bot wait longer before it decides that the user has finished speaking, so that it can capture full sentences, add "PATIENTLY" as a speech hint:

dialog main do
  ask "Please explain the issue", speech_hints: ["PATIENTLY"]
end

DTMF and quick replies

Whenever an ask with quick_replies is encountered, these quick replies are automatically mapped onto the DTMF numbers.

So in the following case:

dialog main do
  ask "Do you want to continue?", expecting: ["Yes", "No"]
end

Pressing 1 would automatically select "Yes" as the answer.

Note that no hint ("press 1 for yes") is spoken, and the options (the quick replies) are not read aloud either; you have to implement this yourself in Bubblescript.
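
The mapping itself amounts to a positional lookup. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that key 2 selects the second reply and so on, and that keys without a matching reply are ignored (the text above only states the behaviour for key 1).

```python
from typing import Optional


def quick_reply_for_dtmf(digit: str, quick_replies: list[str]) -> Optional[str]:
    """Map a DTMF key to a quick reply: "1" is the first reply, "2" the second.

    Assumption: "0", "*", "#" and keys beyond the number of replies map to None.
    """
    if len(digit) != 1 or digit not in "123456789":
        return None
    index = int(digit) - 1
    if index < len(quick_replies):
        return quick_replies[index]
    return None


print(quick_reply_for_dtmf("1", ["Yes", "No"]))
```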

Automatic replacements

In some cases the text-to-speech engine consistently mispronounces certain names or other words. The phone adapter can be configured to automatically replace strings in the SSML output. Note that this replacement is done before any Speech Markdown processing.

To do this, create a voice_config YAML file with the following contents:

output_translate:
  $i18n: true
  en:
    - { from: "Hi", to: "Hai" }
    - { from: "Arjan", to: '(Arjan)[sub:"Aryuhn"]' }

This would replace all occurrences of the word Hi with the word Hai, and, more realistically, annotate the name Arjan with a Speech Markdown sub tag, so that it is pronounced correctly.
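
To make the effect concrete, here is a small Python sketch that applies the same rules to a sentence. It is not the adapter's code: the function name is hypothetical, and matching on whole words (so that "This" is left alone) is an assumption based on the description above.

```python
import re

# Mirrors the `en` list of the voice_config example above.
OUTPUT_TRANSLATE_EN = [
    {"from": "Hi", "to": "Hai"},
    {"from": "Arjan", "to": '(Arjan)[sub:"Aryuhn"]'},
]


def apply_output_translate(text: str, rules: list[dict]) -> str:
    """Apply each from/to rule in order, matching whole words only (assumption)."""
    for rule in rules:
        pattern = r"\b" + re.escape(rule["from"]) + r"\b"
        # A function replacement keeps the "to" text literal (no backslash escapes).
        text = re.sub(pattern, lambda _match, to=rule["to"]: to, text)
    return text


print(apply_output_translate("Hi, this is Arjan", OUTPUT_TRANSLATE_EN))
```

The result still contains Speech Markdown, which is consistent with the replacement running before Speech Markdown processing.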

Turn taking

The bot automatically starts recognizing speech as soon as it finishes its sentence.

Turn taking on voice is quite tricky and there are several timeouts involved. When speech hints are given, the timeouts are shortened because a speech hint in the prompt means that we expect a closed answer. The following table shows how the timings are configured:

When             Idle timeout  Active timeout
Open question    8 seconds     1 second
Closed question  4 seconds     300 ms
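
Expressed as code, the selection could look like the Python sketch below. The names TurnTimeouts and turn_timeouts are hypothetical, the values are taken from the table above, and the sketch deliberately ignores special hints such as "PATIENTLY".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnTimeouts:
    idle_ms: int    # "Idle timeout" column, in milliseconds
    active_ms: int  # "Active timeout" column, in milliseconds


def turn_timeouts(speech_hints: list[str]) -> TurnTimeouts:
    """Pick the timeouts: any speech hint means a closed question is expected."""
    if speech_hints:
        return TurnTimeouts(idle_ms=4000, active_ms=300)
    return TurnTimeouts(idle_ms=8000, active_ms=1000)


print(turn_timeouts(["yes", "no"]))
```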

To make it clearer that the bot expects the user to say something, the user can be played short "beep" tones: a low beep just before the user is supposed to speak, and a high beep once the bot finishes speech recognition. This setting can be enabled in the settings of the phone connector in the DialoX studio.

Webhook API

After connecting a bot to the phone channel in the settings in the studio, a new webhook endpoint is available for your bot which should be called from the PBX voice adapter.

The endpoint is called:

POST https://bsqd.me/phone/webhook/<identifier>/<callerid>/<channelid>

The URL parameters are:

  • identifier - the bot ID (when "connect through bot ID" is chosen in the settings), or otherwise <pool>-<number>.
  • callerid - The E.164-formatted phone number of the person who is calling (no leading + sign). In the case of an anonymous call, it should be a random string starting with x_.
  • channelid - A unique string identifying the current call. The channel ID must stay the same for the duration of the call.

The webhook POST body is a JSON-formatted payload, identical to the Chat REST API chat input payload.
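
As an illustration of how a voice adapter might assemble the endpoint URL, consider the following Python sketch. The helper names are hypothetical, and percent-encoding each path segment and the length of the random anonymous ID are assumptions; only the URL layout, the missing + sign and the x_ prefix come from the description above.

```python
import secrets
from urllib.parse import quote

WEBHOOK_BASE = "https://bsqd.me/phone/webhook"


def anonymous_caller_id() -> str:
    """Random caller ID for anonymous calls; it must start with "x_"."""
    return "x_" + secrets.token_hex(8)


def webhook_url(identifier: str, caller_id: str, channel_id: str) -> str:
    """Build the webhook URL; the caller ID is sent without a leading + sign."""
    caller = caller_id.lstrip("+")
    parts = (identifier, caller, channel_id)
    return WEBHOOK_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


print(webhook_url("mybot", "+31201234567", "chan-1"))
```

The same channel_id has to be reused for every request that belongs to one call.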

Webhook API request payload examples

Most chatbot conversations start with an initial, empty request:

{
}

Request which includes recognized speech:

{
  "action": {
    "type": "user_message",
    "payload": {
      "text": "Hello my name is John",
      "input_type": "voice"
    }
  }
}

Request which includes recognized speech plus the user's recorded voice:

{
  "action": {
    "type": "user_message",
    "payload": {
      "text": "Hello my name is John",
      "data": "data:audio/mp3;base64,JLKjflsdjfiowajefwua8ssdu9af9uwe898ufo...",
      "input_type": "voice"
    }
  }
}
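
The two speech payloads above can be produced by one small helper. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes the recording is available as MP3 bytes that are embedded as a base64 data URI, as in the example.

```python
import base64
import json
from typing import Optional


def voice_message(text: str, recording_mp3: Optional[bytes] = None) -> dict:
    """Build a user_message action for recognized speech.

    When recording_mp3 is given, it is embedded as a base64 data URI in "data".
    """
    payload = {"text": text, "input_type": "voice"}
    if recording_mp3 is not None:
        encoded = base64.b64encode(recording_mp3).decode("ascii")
        payload["data"] = "data:audio/mp3;base64," + encoded
    return {"action": {"type": "user_message", "payload": payload}}


# The JSON body that would be POSTed to the webhook endpoint.
print(json.dumps(voice_message("Hello my name is John"), indent=2))
```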

DTMF request which includes a single DTMF character (e.g. for 'press 1 to...'); only use when get_dtmf was NOT set:

{
  "action": {
    "type": "user_message",
    "payload": {
      "text": "1",
      "input_type": "dtmf"
    }
  }
}

Request as a response to the get_dtmf request:

{
  "action": {
    "type": "user_message",
    "payload": {
      "text": "1234",
      "data": "1234",
      "type": "numeric"
    }
  }
}

Request as a response to the get_audio request:

{
  "action": {
    "type": "user_attachment",
    "payload": {
      "url": "data:audio/mp3;base64,JLKjflsdjfiowajefwua8ssdu9af9uwe898ufo...",
      "type": "audio",
      "metadata": {
        "content_type": "audio/mp3"
      }
    }
  }
}

The following requests send events instead of user messages:

The $no_input event is used to indicate to the bot that no speech was captured while it was listening:

{
  "action": {
    "type": "user_event",
    "payload": {
      "name": "$no_input"
    }
  }
}

The $dial_return event is used to indicate the result of a previous dial command; e.g. when the called party does not respond or is busy:

{
  "action": {
    "type": "user_event",
    "payload": {
      "name": "$dial_return",
      "payload": "BUSY"
    }
  }
}

The $hangup event is used to indicate that the user has hung up the phone. The payload is used to indicate the reason for the hangup.

{
  "action": {
    "type": "user_event",
    "payload": {
      "name": "$hangup",
      "payload": "user hangup"
    }
  }
}
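
All three events share the same envelope, so an adapter could build them with one helper. In the Python sketch below the function names are hypothetical, and rejecting an unknown dial status with a ValueError is a design choice of this example, not documented behaviour.

```python
from typing import Optional

# Dial results listed for the dial statement.
DIAL_STATUSES = {
    "CHANUNAVAIL", "BUSY", "NOANSWER", "ANSWER",
    "CANCEL", "DONTCALL", "CONGESTION",
}


def user_event(name: str, payload: Optional[str] = None) -> dict:
    """Build a user_event action; the event payload is omitted when not given."""
    event = {"name": name}
    if payload is not None:
        event["payload"] = payload
    return {"action": {"type": "user_event", "payload": event}}


def dial_return_event(status: str) -> dict:
    """Build the $dial_return event for one of the listed dial results."""
    if status not in DIAL_STATUSES:
        raise ValueError("unknown dial status: " + status)
    return user_event("$dial_return", status)


print(dial_return_event("BUSY"))
print(user_event("$no_input"))
print(user_event("$hangup", "user hangup"))
```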

Webhook API response payload

The https://bsqd.me/phone/webhook/<identifier>/<callerid>/<channelid> endpoint has the following return value:

{
  // whether or not this is the final response
  "is_final": false,

  // the locale in which the bot speaks
  "locale": "nl",

  // the text that the bot says, as SSML.
  "ssml": "<speak><s>Hello...</s></speak>",

  // the configured Text-to-speech voice (optional)
  "voice": {
    "type": "google",
    "name": "nl-NL-Standard-A",
    "gender": "FEMALE",
    "locale": "nl"
  },

  // Text-to-speech payload only available via phone channel
  "tts": {
    // the MP3 URL to the text-to-speech file that needs to be played
    "url": "https://...",
    // Informational display text to be displayed while speech is playing
    "display_text": "Hello..."
  },

  // Any speech context elements to be passed into Google's speech-to-text `speechContexts` object (optional)
  "speechContexts": [{
    "phrases": ["I want a cookie"]
  }],

  // if call_control is set, is_final is true and the adapter must act on the contents of call_control.
  "call_control": {
    // see below
  },

  // when beep is true, play short beeps around user speech-to-text capturing
  "beep": true,

  // when record is true, the user's speech needs to be recorded while doing speech-to-text
  "record": true,

  // when we request DTMF input
  "get_dtmf": {
    // the maximum nr of digits
    "num_digits": 10,
    // send when pressing this char (result is sent excluding this char)
    "finish_on_key": "#"
  },

  // when we request audio recording
  "get_audio": {
    // stop recording when pressing this char
    "finish_on_key": "#"
  }
}
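
One way for an adapter to decide what to do after playing the speech is sketched below in Python. This is an illustration under assumptions: the function name and the returned step names are hypothetical, and the order in which the fields are checked is a guess, since the response description does not define a precedence.

```python
def next_step(response: dict) -> str:
    """Decide what the adapter does after playing the bot's speech.

    Assumed precedence: call_control, then get_dtmf, then get_audio,
    then is_final, and otherwise listen for user speech.
    """
    call_control = response.get("call_control")
    if call_control:
        return "call_control:" + call_control["type"]
    if response.get("get_dtmf"):
        return "collect_dtmf"
    if response.get("get_audio"):
        return "record_audio"
    if response.get("is_final"):
        return "end_call"
    return "listen"


print(next_step({"is_final": True, "call_control": {"type": "hangup"}}))
```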

Call control - dial

Set up a new call leg with an INVITE and, once it is picked up, connect the first leg of the call to it. The timeout parameter (in milliseconds) tells how long to wait for the party to pick up before returning the call to the bot.

"call_control": {
  "type": "dial",
  "number": "+3164123456",
  "timeout": 10000
}

If the call is returned to the bot because it was unsuccessful or the second party hung up, a $dial_return event (see above) must be sent to the bot to indicate that the bot is back in control.

Dial with announce

"call_control": {
  "type": "dial",
  "number": "+3164123456",
  "announce": {
    "ssml": "<s>Hello this is the announcement</s>"
  }
}

Before the called party gets connected to the calling party, the given announcement message should be played.
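
A PBX adapter typically needs the dial parameters in its own units. The Python sketch below shows one way to unpack a dial call_control object; the names DialRequest and parse_dial are hypothetical, and falling back to 10000 ms when timeout is absent is an assumption based on the 10 second default of the dial statement.

```python
from typing import NamedTuple, Optional

# Assumption: use the documented 10 second default when no timeout is sent.
DEFAULT_DIAL_TIMEOUT_MS = 10_000


class DialRequest(NamedTuple):
    number: str
    timeout_seconds: float
    announce_ssml: Optional[str]


def parse_dial(call_control: dict) -> DialRequest:
    """Extract number, timeout (converted to seconds) and optional announcement."""
    if call_control.get("type") != "dial":
        raise ValueError("not a dial call_control")
    timeout_ms = call_control.get("timeout", DEFAULT_DIAL_TIMEOUT_MS)
    announce = call_control.get("announce") or {}
    return DialRequest(
        number=call_control["number"],
        timeout_seconds=timeout_ms / 1000,
        announce_ssml=announce.get("ssml"),
    )


print(parse_dial({"type": "dial", "number": "+3164123456", "timeout": 10000}))
```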

Call control - refer

Refer the current call leg (SIP REFER) to the given number. The bot goes "out of the loop" and is no longer in control of the call.

"call_control": {
  "type": "refer",
  "number": "+3164123456"
}

Call control - hangup

Hang up the call.

"call_control": {
  "type": "hangup"
}