The NLP pipeline

NLP stands for Natural Language Processing: the automated analysis of user messages.

Whenever a message created by the user enters the system, whether it is a text message or a transcribed speech utterance, it goes through a sequence of processing steps called the NLP pipeline.

Each step in the pipeline has its own responsibility: for instance, language detection, message tokenization, or intent classification.

Since DialoX release 2.52, this pipeline is configurable per bot, so that it can be tuned to the bot's specific use case. This is done by creating an nlp YAML file in the root of the bot, or in a subdirectory when building a skill.

The default pipeline YAML looks like the following. Each individual pipeline step is documented below.

# This is the default NLP pipeline
pipeline_steps:
- step: pattern_ignore
  options:
    pattern: "^[.]"
- step: markup_stripper
- step: auto_translator
- step: spacy_tokenizer
- step: duckling_entity_extractor
- step: bml_intent_matcher
- step: dialogflow_intent_classifier
  options:
    agent_from: bot
- step: qna_intent_classifier
- step: dialogflow_intent_classifier
  options:
    agent_from: defaults
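
As an illustration of per-bot tuning, the sketch below shows what a customized pipeline might look like for a bot that needs no automatic translation and only uses its own intents. It reuses only step names and options that appear in the default pipeline above. The choice of steps to drop and the modified ignore pattern are assumptions made for this example, not a recommended configuration, and whether a given step depends on an earlier one should be checked against that step's documentation.

```yaml
# Hypothetical custom NLP pipeline (illustrative only)
pipeline_steps:
- step: pattern_ignore
  options:
    # Assumed variation on the default: also ignore messages starting with "#"
    pattern: "^[.#]"
- step: markup_stripper
# auto_translator omitted: this example bot is assumed to be single-language
- step: spacy_tokenizer
- step: duckling_entity_extractor
- step: bml_intent_matcher
- step: dialogflow_intent_classifier
  options:
    agent_from: bot
# qna_intent_classifier and the "defaults" Dialogflow agent are left out,
# on the assumption that this bot should only match its own intents
```

Because the steps run in sequence, the order of the entries under pipeline_steps is significant; in this sketch the order of the retained steps is kept the same as in the default pipeline.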