AI (Artificial Intelligence)

is a type of computer system that is able to perform tasks that normally require human intelligence: decision-making, speech recognition and understanding, translation between languages, and more.

Automatic Speech Recognition (ASR)

is a technology that converts spoken language into text, allowing users of information systems to speak entries rather than punching numbers on a keypad.

Alternative paths (unexpected conversational turns)

unexpected turns in a conversation that represent exceptions, for example when the user sends answers that the chatbot cannot process. Suppose a bot delivers only within Germany and the user enters an address outside Germany: no delivery can take place. Such edge cases should be considered in advance. For example, the bot can tell the user that it only delivers within Germany, or give them the option to enter the address again in case of a misunderstanding.
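A minimal Python sketch of handling this alternative path. It assumes an earlier step has already resolved the address to an ISO country code; the function name and bot replies are hypothetical.

```python
# Hypothetical rule: the bot only delivers within Germany (ISO country code "DE").
SUPPORTED_COUNTRY = "DE"

def handle_address(country_code: str) -> str:
    """Return the bot's reply for a delivery address in the given country."""
    if country_code.strip().upper() == SUPPORTED_COUNTRY:
        # Happy path: the address is deliverable, so the order continues.
        return "Great, we can deliver there. When would you like your order?"
    # Alternative path: explain the limitation and let the user try again.
    return ("Sorry, we only deliver within Germany. "
            "Would you like to enter a different address?")

print(handle_address("de"))
print(handle_address("AT"))
```

Offering a retry instead of ending the conversation covers the case where the bot simply misunderstood the address.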



is a computer program that acts as an intelligent intermediary between people, digital systems, and Internet-enabled things. It can interpret text, hand gestures, images, or video provided by users in real time and respond to their questions.



Channel

the place where your chatbot will exist. These days, chatbots can operate in almost any channel where a two-way conversation is possible, so pick one where your target audience spends its time. Examples of channels are Facebook Messenger, WhatsApp, Telegram, a web widget, etc.


Chatbot

is a computer program that simulates human conversation, either written or spoken, allowing users to interact with digital devices as if they were communicating with a real person. Chatbots often use a combination of click commands and keywords (such as asking a customer to choose a topic, e.g. money transfer or balance check) and machine learning to help resolve problems or direct customers to a live agent for further troubleshooting and resolution.

Chat widget

a graphical user interface (GUI) component that enables users to interact with a conversational AI system through a chat-based interface. It is a user-facing element that provides a platform for users to input their queries or messages and receive responses from the AI system in a conversational format.

The chat widget is typically embedded within a website, mobile app, or other digital platforms, allowing users to have interactive conversations with the conversational AI system.

Conversational AI

is a dialogue system that interacts with users based on the principles of human-to-human communication. This communication usually happens through voice or text messages, or through some non-verbal signal available on the device (e.g., gestures). The key to this interaction is that the speaker and the machine can understand each other and hold a conversation on topics the AI has learned about. Developers use Conversational AI to build conversational user interfaces, chatbots, and virtual assistants for a variety of use cases and integrate them into chat interfaces such as messaging platforms, smart devices, social networks, and websites.

Conversational Script

is the set of dialogues that make up a conversation, usually written by the Conversational Designer before development starts.

Conversational Flow

has two meanings: a) the way the conversation is going, b) a kind of flowchart that represents parts of the whole conversation. In the Playbook, the term is used primarily in meaning (b). Since a bot is technically a state machine, every flow should show the small, independent bot states, the user’s reactions as input, the bot’s output, and the connections between all of these parts. Each state is a stage after some user input during which the bot performs some logic or reacts in some way; after that, the bot moves internally to the next state.
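A minimal Python sketch of a flow as a state machine, loosely following the pizza sample dialogue later in this glossary. The state names, replies, and the `run` helper are hypothetical; real platforms describe states in their own configuration formats.

```python
from typing import Callable, Dict, List, Tuple

# A handler receives the user's input and returns (bot_reply, next_state).
Handler = Callable[[str], Tuple[str, str]]

def start(user_input: str) -> Tuple[str, str]:
    return "Hi! Would you like to order a pizza?", "ask_order"

def ask_order(user_input: str) -> Tuple[str, str]:
    if user_input.strip().lower().startswith(("yes", "sure")):
        return "What kind of pizza would you like?", "ask_kind"
    return "No problem. Have a nice day!", "end"

def ask_kind(user_input: str) -> Tuple[str, str]:
    return f"One {user_input.strip()} pizza, coming up!", "end"

STATES: Dict[str, Handler] = {"start": start, "ask_order": ask_order, "ask_kind": ask_kind}

def run(user_inputs: List[str]) -> Tuple[List[str], str]:
    """Feed user inputs through the flow; stop once the 'end' state is reached."""
    state, replies = "start", []
    for text in user_inputs:
        if state == "end":
            break
        reply, state = STATES[state](text)
        replies.append(reply)
    return replies, state

replies, final_state = run(["hello", "Yes, sure!", "margherita"])
print(replies, final_state)
```

Each handler is one small, independent state: it reads the user's input, produces the bot's output, and names the next state, which mirrors the boxes and arrows of a flowchart.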


parts of the conversational script that assure chatbot users their input was received. “Sure”, “Okay”, “Excellent”, and “I see” are all ways the bot can acknowledge the user’s input and make them feel heard. They also add a touch of humanity to the bot and build trust with the user. Confirmation can be implicit or explicit.

Customer agent platform

is a centralised platform that helps businesses manage their customer communication touchpoints through various channels.


Data labeling

the process of annotating or tagging data to provide labels or annotations that describe specific aspects or elements of the data. Data labeling is a critical step in training machine learning models for conversational AI, as it helps the models understand and learn from the labeled data, improving their ability to comprehend and generate appropriate responses during conversations. Data labeling typically involves annotating different components of a conversation, such as user utterances, system responses, intents, entities, dialogue acts, sentiment, or other relevant attributes.
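One possible shape for a labeled training example, sketched in Python. It assumes entities are annotated as character offsets with a label, which is a common convention but not a specific tool's format. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EntitySpan:
    start: int  # character offset where the entity starts (inclusive)
    end: int    # character offset where the entity ends (exclusive)
    label: str  # entity type, e.g. "date"

@dataclass
class LabeledUtterance:
    text: str
    intent: str
    entities: List[EntitySpan] = field(default_factory=list)

    def validate(self) -> None:
        """Reject empty spans and spans that fall outside the text."""
        for span in self.entities:
            if not 0 <= span.start < span.end <= len(self.text):
                raise ValueError(f"invalid span {span} for text of length {len(self.text)}")

    def entity_values(self) -> List[Tuple[str, str]]:
        """Return (label, annotated text) pairs for a quick visual check."""
        return [(s.label, self.text[s.start:s.end]) for s in self.entities]

example = LabeledUtterance(
    text="Book a table for Friday",
    intent="book_table",
    entities=[EntitySpan(start=17, end=23, label="date")],
)
example.validate()
print(example.entity_values())
```

Validating spans catches a common labeling mistake, offsets that drift off the intended word, before the data reaches model training.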

Dual-tone multi-frequency (DTMF)

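refers to the tones generated by a telephone when its keys are pressed. Each key produces two simultaneous tones, one from a low-frequency group and one from a high-frequency group. DTMF is used in older voicebots that operate over phone calls.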
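The keypad-to-frequency mapping below follows the standard DTMF grid. The sample rate and tone duration in `dtmf_samples` are illustrative choices. In practice a telephony platform usually detects these tones for the voicebot rather than the bot synthesizing them.

```python
import math
from typing import List, Tuple

# Standard DTMF keypad: each row shares a low tone, each column a high tone (Hz).
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (low, high) frequency pair for a single keypad key."""
    if len(key) != 1:
        raise ValueError("expected exactly one key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key.upper())
        if col != -1:
            return ROW_FREQS[row], COL_FREQS[col]
    raise ValueError(f"not a DTMF key: {key!r}")

def dtmf_samples(key: str, duration_s: float = 0.1, sample_rate: int = 8000) -> List[float]:
    """Synthesize the key's tone as the sum of its two sine waves, scaled to [-1, 1]."""
    low, high = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low * i / sample_rate)
               + math.sin(2 * math.pi * high * i / sample_rate))
        for i in range(n)
    ]

print(dtmf_frequencies("5"))
```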



Entities

are data buckets that contain words and phrases with similar characteristics. They can be fields, data, or text describing just about anything: time, place, person, item, number, etc. Entities make it easy to extract important information from the user’s utterances, for example a phone number, e-mail address, name, or type of insurance.
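A minimal sketch of rule-based entity extraction in Python, using regular expressions for two of the examples above (e-mail address and phone number). The patterns are deliberately simplified for illustration and would miss or over-match some real-world formats. NLU platforms typically combine rules like these with trained models.

```python
import re
from typing import Dict, List

# Deliberately simplified patterns, for illustration only.
ENTITY_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}

def extract_entities(utterance: str) -> Dict[str, List[str]]:
    """Return every match for each entity type found in the utterance."""
    return {name: pattern.findall(utterance) for name, pattern in ENTITY_PATTERNS.items()}

print(extract_entities("Mail me at anna@example.com or call +49 30 1234567"))
```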

Explicit confirmation

is a situation in which a bot asks the user to confirm by explicitly repeating parts of the query. It is useful when the bot’s confidence in recognising the intent is not high enough or when the stakes are high. For example, if a task involves transferring large sums of money or sending a message to many contacts, it helps to check the data again to make sure there is no confusion.
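A sketch of the decision described above, in Python. The confidence threshold, the high-stakes amount, the currency, and all names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8   # hypothetical minimum NLU confidence
HIGH_STAKES_AMOUNT = 1000.0  # hypothetical amount (EUR) that always needs confirmation

@dataclass
class TransferRequest:
    amount: float
    recipient: str
    confidence: float  # NLU confidence in the recognised intent, from 0 to 1

def needs_explicit_confirmation(req: TransferRequest) -> bool:
    """Confirm explicitly when the bot is unsure or the stakes are high."""
    return req.confidence < CONFIDENCE_THRESHOLD or req.amount >= HIGH_STAKES_AMOUNT

def confirmation_prompt(req: TransferRequest) -> str:
    """Repeat the key parts of the query back to the user."""
    return (f"Just to confirm: you want to send {req.amount:.2f} EUR "
            f"to {req.recipient}. Is that right?")

req = TransferRequest(amount=5000, recipient="Alex", confidence=0.95)
if needs_explicit_confirmation(req):
    print(confirmation_prompt(req))
```

Here the bot is confident about the intent, but the amount alone is enough to trigger the explicit check.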


Fallback (CatchAll)

is a common term for the bot’s reaction to “No match” or “No input” events. A fallback is a designed state, not a system state, so CUI/UX designers should be responsible for creating fallbacks.

Example #1 (No match):

User: Do you know where the kangaroo lives?

Bot: I didn’t get that. Can you please rephrase?

Example #2 (No input):

User: [speech not recognised]

Bot: Can you please repeat?
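The two examples above can be sketched as a single Python function. It assumes the NLU step has already run and returns `None` when no intent matched. The function name and signature are hypothetical.

```python
from typing import Optional

NO_INPUT_REPLY = "Can you please repeat?"
NO_MATCH_REPLY = "I didn't get that. Can you please rephrase?"

def fallback_reply(user_input: Optional[str], matched_intent: Optional[str]) -> Optional[str]:
    """Return a fallback reply, or None when a designed intent should handle the turn."""
    if user_input is None or not user_input.strip():
        return NO_INPUT_REPLY   # "No input": nothing was recognised or typed
    if matched_intent is None:
        return NO_MATCH_REPLY   # "No match": input received, but no intent fits
    return None

print(fallback_reply("Do you know where the kangaroo lives?", None))
print(fallback_reply(None, None))
```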

Foundation Model

a base or fundamental model that serves as the starting point for the development of more specialized or advanced models in the field of artificial intelligence. In the context of large language models, a foundation model is a powerful and comprehensive language model that is pre-trained on a vast amount of text data to understand and generate human-like language. It serves as the building block or baseline for creating more specific models tailored to particular tasks or domains.


Generative AI

Generative AI refers to a class of AI models and algorithms that are designed to generate new content, such as text, images, music, or even entire videos. These models are trained on large datasets and learn patterns and structures to create new content that is similar to the data they were trained on. Generative AI models use techniques like neural networks, deep learning, and reinforcement learning to understand and mimic the underlying patterns in the training data and produce novel outputs. While conversational AI involves understanding and generating human-like conversations, generative AI is broader in scope and can generate content beyond just conversational interactions. Generative AI can be used for creative purposes, such as generating artwork or music, as well as for other applications like data synthesis, content creation, and even deepfakes.

Global intent

an intent that, according to the bot’s flow structure and code implementation, the user can trigger directly at any moment during the dialogue.


Happy Path (golden path, main path)

is a kind of conversational flow in which users achieve their goal in the simplest and most obvious way: they engage with the bot, write, click on, or say the right thing, and follow through until the desired outcome, whatever it is, is achieved. For example, if you have a pizza restaurant chatbot, the Happy Path is that the customer orders a pizza from your restaurant. If you’re using your chatbot internally for HR or onboarding purposes, the Happy Path is that the user successfully receives the information they need.


Implicit confirmation

is one that does not require confirmation from the user, but also leaves the option open for the user to confirm or deny. It makes the conversation a lot more natural, and closer to how humans talk with each other.


Intent

the main idea or purpose of the user’s utterance. Bots are usually built around a set of intents whose scope represents the users’ needs and the bot’s possible replies. Each intent is defined by a finite set of phrases. A banking NLU system, for example, should be able to respond to “Can you show me my bank account?” or “Can you send money to someone?”, and in this example both phrases correspond to one intent, “banking account”.

IVR (Interactive Voice Response)

a technology that allows users to interact with a computerised system using voice and telephone keypad inputs. It is commonly used in telephone-based systems to provide automated self-service options and route calls to the appropriate resources or departments. IVR systems are designed to handle a large volume of incoming calls and provide a seamless and efficient user experience. They use pre-recorded voice prompts and menus to guide callers through a series of options and gather information or provide automated responses. IVR systems often integrate with speech recognition technology to enable voice-based input from callers.


LLMs (Large Language Models)

sophisticated artificial intelligence models that have been trained on vast amounts of text data to understand and generate human-like language. These models leverage deep learning techniques, particularly using neural networks with many layers, to process and analyze textual information. LLMs are designed to understand and generate text in a way that resembles human language patterns, allowing them to perform a wide range of natural language processing tasks. They can comprehend and generate coherent sentences, understand context, detect sentiment, translate languages, answer questions, summarize documents, and even engage in conversational interactions. One of the most well-known examples of an LLM is OpenAI’s GPT (Generative Pre-trained Transformer) series, including models like GPT-3. These models have achieved remarkable language generation capabilities, demonstrating the potential for various applications in content creation, virtual assistants, chatbots, language translation, and more.

Local intent

in contrast to a global intent, a local intent is available only within certain dialogue branches: after certain conditions or circumstances are met, or, for example, after slot filling.
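A minimal Python sketch of how global and local intents might be scoped by dialogue state. The intent and state names are hypothetical, and real platforms express this scoping in their own flow configuration.

```python
from typing import Dict, Set

# Hypothetical intents that are available in every state of the dialogue.
GLOBAL_INTENTS: Set[str] = {"help", "cancel", "talk_to_agent"}
# Hypothetical local intents, keyed by the dialogue state in which they become available.
LOCAL_INTENTS: Dict[str, Set[str]] = {
    "choose_delivery_time": {"pick_time"},
    "confirm_order": {"confirm", "deny"},
}

def available_intents(state: str) -> Set[str]:
    """Global intents plus whatever the current state unlocks."""
    return GLOBAL_INTENTS | LOCAL_INTENTS.get(state, set())

print(sorted(available_intents("confirm_order")))
```

Restricting local intents to their state keeps a short reply such as “yes” from being misread in an unrelated part of the conversation.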


Mock object

a simulated or imitation object that is created for testing and development purposes. Mock objects are used to mimic the behaviour of real objects or components within a conversational AI system, allowing developers to test and validate their code in isolation.
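A short example using Python's standard `unittest.mock.Mock`. The `WeatherSkill` class and its `get_forecast` dependency are hypothetical stand-ins for a skill that calls an external service.

```python
from unittest.mock import Mock

class WeatherSkill:
    """Hypothetical skill that depends on an external weather service."""

    def __init__(self, weather_client):
        self.weather_client = weather_client

    def reply(self, city: str) -> str:
        forecast = self.weather_client.get_forecast(city)
        return f"In {city} it will be {forecast['condition']}, {forecast['temp_c']} °C."

# Replace the real service with a mock so the skill can be tested in isolation.
fake_client = Mock()
fake_client.get_forecast.return_value = {"condition": "sunny", "temp_c": 21}

skill = WeatherSkill(fake_client)
print(skill.reply("Berlin"))
fake_client.get_forecast.assert_called_once_with("Berlin")
```

The mock returns canned data and records how it was called, so the test checks the skill's logic without a network connection.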

Multimodal interface (also mixmodal interface)

an interface that processes various user input modes, such as speech, text, touch, and hand gestures, and supports various bot outputs. It combines modalities in a way that replicates interpersonal human interaction. Multimodal interfaces allow users to switch flexibly between different types of interaction, for example using voice as the input mechanism and a graphical user interface (GUI) as the output.


NER (Named Entity Recognition)

a subtask of natural language processing (NLP) that focuses on identifying and classifying named entities in text. Named entities are specific objects, locations, names of people, organizations, dates, quantities, and other elements that carry semantic meaning. NER helps extract important information from user queries or statements, enabling the system to understand and respond appropriately. By identifying and categorising named entities, NER provides context and relevance to the conversation, facilitating more meaningful interactions.

NLG (Natural Language Generation)

the component or process of a conversational AI system that focuses on generating human-like, natural language responses to interact with users. By generating coherent and contextually appropriate responses, NLG enhances the user experience, improves user engagement, and creates a more interactive and satisfying conversation. NLG is a key technology in transforming structured data and system prompts into meaningful and engaging human language responses, allowing conversational AI systems to have dynamic and interactive conversations with users.

NLP (Natural Language Processing)

a branch of artificial intelligence that focuses on the interaction between computers and human language. It involves the analysis, understanding, and generation of natural language to enable effective communication and interaction between humans and machines.

NLU (Natural Language Understanding)

the component or process of a conversational AI system that focuses on interpreting and comprehending the natural language input provided by users. NLU is responsible for extracting the meaning, intent, and context from user queries, commands, or statements, enabling the conversational AI system to understand and respond appropriately. It involves a range of techniques and algorithms that analyse and process the user’s input to derive actionable insights. NLU plays a critical role in conversational AI systems as it bridges the gap between human language and machine understanding. By accurately interpreting user input, NLU enables the conversational AI system to provide relevant and contextually appropriate responses, improving the overall user experience.

NLU Engine

is an AI-powered solution for extracting information from utterances in a human language (the user’s utterances) for use in the rest of the dialogue. A good NLU system is designed to keep the conversation going smoothly even when it doesn’t receive enough information from the client. In its most simplified form, the process of “understanding” a language consists of three major steps: preprocessing the query text; classifying the request by matching it to one of the classes known to the system (intent definition); and retrieving the query parameters (entity retrieval).
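The three steps can be sketched in Python as follows. This is a toy under stated assumptions. A simple word-overlap score stands in for a real classification model. The intents, training phrases, and the amount pattern are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical intents, each described by a few training phrases.
TRAINING_PHRASES: Dict[str, List[str]] = {
    "check_balance": ["show me my balance", "how much money do i have"],
    "transfer_money": ["send money to someone", "transfer money to my friend"],
}
AMOUNT = re.compile(r"\b(\d+(?:\.\d{1,2})?)\s*(?:eur|euro|euros)\b")

def preprocess(text: str) -> List[str]:
    """Step 1: lowercase the query and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def classify(tokens: List[str]) -> Optional[str]:
    """Step 2: pick the intent whose phrases share the most words with the query."""
    query = set(tokens)
    best, best_score = None, 0
    for intent, phrases in TRAINING_PHRASES.items():
        score = max(len(query & set(preprocess(p))) for p in phrases)
        if score > best_score:
            best, best_score = intent, score
    return best  # None means "No match"

def extract(text: str) -> Dict[str, str]:
    """Step 3: retrieve query parameters (here only a money amount)."""
    match = AMOUNT.search(text.lower())
    return {"amount": match.group(1)} if match else {}

def understand(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    return classify(preprocess(text)), extract(text)

print(understand("Please send 50 euros to Anna"))
```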

No match

an event in which a dialogue system cannot match the user’s utterance to any of the predesigned states in the bot’s code.

No input

an event in which a dialogue system doesn’t detect any input (voice, text, etc.) from the user.



Pattern

a predefined structure or template used to recognise and interpret user input or to generate system responses. Patterns play a crucial role in natural language understanding and generation: by defining and recognising patterns in user input, conversational AI systems can interpret user intent, extract relevant information, and generate contextually appropriate responses. Patterns serve as a foundational building block for designing conversational AI systems that can hold meaningful and effective conversations with users.
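A small Python illustration of both uses: a regular expression as an input pattern and a format string as a response pattern. The pizza-order pattern and the reply template are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical input pattern: "<size> <pizza name> pizza", e.g. "a large Margherita pizza".
ORDER_PATTERN = re.compile(
    r"\b(?P<size>small|medium|large)\s+(?P<name>\w+)\s+pizza\b",
    re.IGNORECASE,
)
# Hypothetical response pattern filled with the values captured above.
RESPONSE_TEMPLATE = "One {size} {name} pizza, coming up!"

def match_order(utterance: str) -> Optional[Dict[str, str]]:
    """Return the captured values if the utterance fits the pattern, else None."""
    m = ORDER_PATTERN.search(utterance)
    return {k: v.lower() for k, v in m.groupdict().items()} if m else None

def respond(utterance: str) -> str:
    values = match_order(utterance)
    if values is None:
        return "I didn't get that. Can you please rephrase?"
    return RESPONSE_TEMPLATE.format(**values)

print(respond("I'd like a large Margherita pizza, please"))
```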


Prompter

a component or system that assists users in formulating or refining their input during a conversation with a conversational AI system. The purpose of a prompter is to guide users, suggest possible options, or provide contextually relevant prompts to facilitate a smoother and more effective conversation. Prompters play a crucial role in improving user engagement, reducing user effort, and ensuring successful interactions between users and conversational AI systems. They are designed to make conversations more intuitive, efficient, and user-friendly, ultimately leading to a more satisfying conversational AI experience.


Sample dialogues

is a conversational design technique that helps the team understand how the future CUI will work.

Example:

Bot: Hi! This is a Pizza bot. Would you like to order a huge cheesy pizza?

User: Yes, sure!

Bot: What kind of pizza would you like to order?


Script

a predefined sequence of instructions or dialogue that guides the behaviour of a conversational AI system during a conversation with a user. A script outlines the flow, structure, and specific responses of the system based on different user inputs or system prompts.

Scripts are commonly used in conversational AI to design and control conversational interactions, ensuring a consistent and effective user experience.

Skill discovery

is a crucial element in making virtual assistants more effective and humanlike. A skill is a Conversational AI component that contains the dialogue configuration. A skill usually covers one piece of functionality, e.g. “weather forecasting” or “applying forms”. Skill discovery is the process of informing the user about the bot’s capabilities.


Slots

are chunks of information contained in the user’s speech. While speaking or typing, some people give all the information up front, while others provide it piece by piece. Two people might say the same thing in two different ways, so to make your bot flexible, it is important to teach the dialogue system to recognise which information a request already contains. Designed slots help define which bits of information (slots) the bot still needs to request from the user. Slots are usually entities.


Slot filling

the process of filling slots during a bot-human conversation. Scheduling an appointment is a good use case: an appointment requires a date, a time, and a location. The user can say “I need an appointment for Elm on Tuesday at 3 p.m.” (filling all slots at once) or “I need an appointment” (filling no slots, so the bot has to ask additional questions until all slots are filled). CUI/UX specialists should predesign all possible variants, including fully and partially filled slots.
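A minimal Python sketch of the appointment example. It assumes an NLU step has already extracted slot values from each utterance. The slot names, prompts, and helper functions are hypothetical.

```python
from typing import Dict, Optional

REQUIRED_SLOTS = ("date", "time", "location")
QUESTIONS = {  # hypothetical prompts for each missing slot
    "date": "On which day would you like the appointment?",
    "time": "At what time?",
    "location": "At which location?",
}

def update_slots(slots: Dict[str, str], extracted: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Merge newly extracted values into the slots, ignoring empty ones."""
    merged = dict(slots)
    merged.update({name: value for name, value in extracted.items() if value})
    return merged

def next_question(slots: Dict[str, str]) -> Optional[str]:
    """Ask for the first missing slot, or return None when all are filled."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return QUESTIONS[name]
    return None

# All slots filled at once: "I need an appointment for Elm on Tuesday at 3 p.m."
full = update_slots({}, {"location": "Elm", "date": "Tuesday", "time": "3 p.m."})
print(next_question(full))     # nothing left to ask

# No slots filled: "I need an appointment", so the bot keeps asking.
partial = update_slots({}, {})
print(next_question(partial))  # asks for the date first
partial = update_slots(partial, {"date": "Tuesday"})
print(next_question(partial))  # then for the time
```

The same loop handles the fully filled, partially filled, and empty cases, which is why designers only need to predesign one question per slot.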

Speech synthesis (text-to-speech, TTS)

this is the technology and process of converting written text into spoken audio. It involves generating human-like speech from textual input, allowing conversational AI systems to interact with users using natural-sounding voices.


State

a term that comes from systems analysis and the state-machine model. Each step in a dialogue system is a state. Examples of states are waiting for user input, performing some internal logic, reacting to a user’s request, moving to the next state, and storing or processing client or external-system data.


Training phrases

are a set of example user utterances. It is best to collect or create a variety of training phrases for each intent, according to the chosen NLU classification model. The best sources of training phrases are saved dialogues with users and with customer support agents.


Script token

a specific unit or element within a script that represents a meaningful component of a conversation. Script tokens are used to break down the dialogue or instructions into smaller, manageable parts, allowing for easier processing, analysis, and manipulation by conversational AI systems. In addition to their role in model training, script tokens are also important in conversation management, allowing developers to modify or extend the conversation flow by manipulating or replacing specific tokens. This flexibility enables conversational AI systems to adapt to different use cases, handle various scenarios, and provide personalised and dynamic conversations.



Utterance

is a continuous piece of speech beginning and ending with a clear pause. As speakers take turns to produce utterances, they strive to make them recognisable: that is, not only for the counterpart to distinguish words and sentences, but also to interpret the meaning of the utterance and react in the desired way. For example, when a user states “I’d like to order a pizza please”, the entire sentence is the utterance. There is no strict rule about what an utterance comprises. It can be a sentence, but it does not need to be a complete sentence. It can also consist of multiple sentences. Fun fact: even a simple sigh can be understood as an utterance.

UX (user experience)

the term used to describe the whole user’s interaction with a system and the user’s emotional reaction. User experience designers take utmost care to ensure that systems are intuitive and easy to comprehend. Conversational User Experience is a user experience that combines chat, voice, or any other natural language-based technology to mimic a human conversation.

UI (user interface)

any graphical interface that interacts with people, usually via a display, using a touchscreen or peripheral devices (buttons, mouse). Examples include modal windows on websites, mobile applications, and devices with hardware buttons.



Voicebot

is a bot that communicates through voice replies. Voicebots can be smart (with NLU) or rule-based (with DTMF). Voicebots usually run on IVR and smart IVR systems, and as smart assistants in smart speakers. A voicebot can imitate specific human voice patterns: pauses, accents, etc.


Wake word

is the gateway between you and your digital assistant. Common wake words follow the form “Hey, [bot name]”. The phrase causes the bot to begin recording the end user’s request so that it can be sent for processing. When the bot detects its wake word, it records the next spoken request, sends the recording for intent processing, and then sends back a response or initiates an action.


Webhook

a mechanism or integration method that allows real-time communication and data transfer between two applications or systems. Specifically, it enables communication between a conversational AI platform (such as a chatbot or virtual assistant) and external services or applications. Webhooks enable conversational AI systems to interact with various external services and systems seamlessly. They can be used for a wide range of purposes, such as querying databases, fetching real-time data, integrating with APIs, performing calculations, or connecting with other applications.
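A minimal webhook endpoint sketched with Python's standard `http.server`. It assumes the conversational platform POSTs a JSON body with hypothetical `action` and `order_id` fields and expects a JSON reply. Real platforms define their own request and response formats. The business logic is kept in `handle_webhook` so it can be tested without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_webhook(payload: dict) -> dict:
    """Business logic for a hypothetical 'get_order_status' request from the bot platform."""
    if payload.get("action") == "get_order_status":
        order_id = payload.get("order_id", "unknown")
        # A real service would query a database or an external API here.
        return {"reply": f"Order {order_id} is on its way."}
    return {"reply": "Sorry, I can't help with that yet."}

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the JSON body that the conversational platform POSTs to this endpoint.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_webhook(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8080) -> None:
    """Start a blocking local server; call this manually to try the endpoint."""
    HTTPServer(("localhost", port), WebhookHandler).serve_forever()
```

After calling `serve()`, POSTing `{"action": "get_order_status", "order_id": "42"}` to `http://localhost:8080/` returns `{"reply": "Order 42 is on its way."}`.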
