MARCO
Introduction
MARCO is a MultiAgent Real-time Chat Orchestration framework for automating tasks using LLMs. MARCO addresses key challenges in utilizing LLMs for complex, multi-step task execution. It incorporates robust guardrails to steer LLM behavior, validate outputs, and recover from errors that stem from inconsistent output formatting, function and parameter hallucination, and lack of domain knowledge.
Environment Aspects
Such tasks require complex interactions with the environment:
- planning,
- tool usage,
- reasoning,
- interaction with humans.
Challenges
- LLMs are probabilistic next-token prediction systems and, by design, non-deterministic. This can introduce inconsistencies in output generation that prove challenging for features like function calling and parameter value grounding.
- Domain-specific knowledge poses its own challenge: an LLM's prior knowledge can be an advantage and a disadvantage at the same time.
- LLMs have inherent biases that can lead to hallucinations; at the same time, they may lack the internal domain-specific context that must be supplied explicitly to get the expected results.
Framework Emphasis
- Multi-turn Interface for:
  - User conversation to execute tasks, and
  - Executing tools with deterministic graphs, providing status updates, intermediate results, and requests to fetch additional inputs or clarifications from the user.
- Controllable Agents guided by a symbolic plan, the natural-language Task Execution Procedure (TEP), which steers the agents through the conversation and the steps required to solve the task.
- Shared Hybrid Memory structure, with long-term memory shared across agents that stores complete context information: Agent TEPs, tool updates, dynamic information, and conversation turns (see the sketch after this list).
- Guardrails to ensure correctness of tool invocations, recover from common LLM error conditions using reflection, and ensure the general safety of the system.
- Evaluation mechanism for different aspects and tasks of a multi-agent system.
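As a rough illustration of the shared hybrid memory, the sketch below models long-term memory as a single store that every agent reads from and writes to; the class and field names are assumptions made here for exposition, not MARCO's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MemoryEntry:
    source: str   # e.g. "user", "agent:RefundAgent", "tool:get_order" (hypothetical names)
    kind: str     # "conversation" | "tool_update" | "dynamic_info"
    content: Any

@dataclass
class SharedMemory:
    """Long-term memory shared across agents (sketch)."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, source: str, kind: str, content: Any) -> None:
        self.entries.append(MemoryEntry(source, kind, content))

    def context(self) -> list[MemoryEntry]:
        # Every agent sees the same store; an agent's own TEP and common
        # instructions would be added separately when its prompt is built.
        return list(self.entries)
```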
Problem Statement
Problem Description
A task is associated with an Intent, which falls into one of the following categories (illustrated in code after this list):
- Out-Of-Domain (OOD): defined as any user query which is not in scope of the system, such as a malicious query to jailbreak the system, foul language, or unsupported requests.
- Info: defined as getting information from predefined data-sources and indexed documents.
- Action: defined as performing a usecase-related task, which involves following a series of instructions/steps (the Task Execution Procedure, TEP) defined for the usecase and accordingly invoking the right set of tools/functions with the identified required parameters for each function.
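As a rough sketch (using hypothetical names), the three intent classes and the routing they imply could be modeled as:

```python
from dataclasses import dataclass
from enum import Enum

class Intent(Enum):
    OOD = "out_of_domain"   # malicious, foul-language, or unsupported queries -> refused
    INFO = "info"           # answered via RAG over predefined data-sources / indexed documents
    ACTION = "action"       # executes a usecase by following its TEP and calling tools

@dataclass
class ClassifiedQuery:
    text: str
    intent: Intent
    usecase: str | None = None   # set only for ACTION queries, e.g. "refund_order" (hypothetical)
```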
Objective
The objective of a task automation system (see the sketch after this list) is to:
- interpret the user intent for each query,
- identify the relevant usecase,
- understand the steps mentioned in its TEP,
- accordingly call the right sequence of tools with the required parameters,
- correlate the TEP, tool responses and requirements, and conversation context to communicate back with the user, and
- be fast and responsive for a real-time chat.
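Putting the pieces of this objective together, a minimal top-level handler might look like the sketch below; every helper named here is a hypothetical stub standing in for a MARCO component, not the framework's actual API.

```python
def handle_message(user_msg: str, chat_history: list[str]) -> str:
    """Hypothetical end-to-end flow of the task automation system (sketch)."""
    intent = classify_intent(user_msg, chat_history)    # Intent Classifier (IC)
    if intent == "OOD":
        return "Sorry, I can't help with that request."
    if intent == "Info":
        return answer_with_rag(user_msg, chat_history)  # RAG over indexed documents
    # Action: MARS follows the usecase TEP, calling tools with grounded parameters.
    return run_mars(user_msg, chat_history)

# Stubs so the sketch is self-contained; the real components are LLM/RAG-backed.
def classify_intent(msg, history):
    return "Action"

def answer_with_rag(msg, history):
    return "Answer retrieved from indexed documents."

def run_mars(msg, history):
    return "Executed the usecase TEP and called the required tools."
```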
Components
1. Intent Classifier
The Intent Classifier's (IC) primary role is to understand the intent behind an incoming user message, considering the conversation context, and to seamlessly orchestrate between RAG for answering informational queries and the Multi-Agent Reasoner and Orchestrator (MARS) for executing supported tasks.
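One plausible realization of the IC is a single LLM call over the recent conversation that returns one of the three labels; the prompt wording and the `llm` callable below are assumptions for illustration, not the paper's prompt.

```python
IC_PROMPT = """You are an intent classifier. Given the conversation and the latest
user message, reply with exactly one label: OOD, Info, or Action.

Conversation:
{history}

User message: {message}
Label:"""

def classify_intent(message: str, history: list[str], llm=None) -> str:
    """Hypothetical IC: OOD is refused, Info goes to RAG, Action goes to MARS."""
    prompt = IC_PROMPT.format(history="\n".join(history), message=message)
    if llm is None:                 # stub path so the sketch runs without a model
        return "Action"
    label = llm(prompt).strip()
    return label if label in {"OOD", "Info", "Action"} else "OOD"
```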
2. Multi-Agent Reasoner and Orchestrator (MARS)
MARS is responsible for:
- understanding the user's request and tool responses in the chat context,
- planning and reasoning about the next action according to the Task Execution Procedure (TEP) steps,
- selecting the relevant LLM Agent for the task, and
- invoking the relevant tools/tasks with their required parameters (see the sketch after this list).
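A heavily simplified view of one such orchestration step, assuming a JSON-formatted agent reply; the prompt layout and action names are illustrative, not MARS's actual protocol.

```python
import json

def mars_step(tep: str, tools: dict, context: str, llm):
    """One MARS step (sketch): reason over the TEP and shared context, then act."""
    prompt = (
        f"Task Execution Procedure:\n{tep}\n\n"
        f"Context (shared memory):\n{context}\n\n"
        'Reply as JSON: {"action": "call_tool" | "ask_user" | "respond", '
        '"name": "...", "arguments": {}, "message": "..."}'
    )
    decision = json.loads(llm(prompt))              # parsing failures -> formatting guardrail
    if decision["action"] == "call_tool":
        tool = tools[decision["name"]]              # unknown name -> function-hallucination guardrail
        return tool(**decision.get("arguments", {}))
    return decision["message"]                      # "ask_user" / "respond" go back to the user
```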
Info
The key components of MARS are the LLM Agents, which we call Task Agents. These Task Agents comprise their own TEP steps, tools/functions (also known as Deterministic Tasks), SubTask-Agents (dependent Task Agents), and common instructions for reasoning and output formatting.
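A Task Agent's composition could be captured roughly as follows; the field names and the example agents are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TaskAgent:
    name: str
    tep_steps: list[str]                                               # natural-language TEP
    tools: dict[str, Callable] = field(default_factory=dict)           # Deterministic Tasks
    sub_agents: dict[str, "TaskAgent"] = field(default_factory=dict)   # dependent SubTask-Agents
    common_instructions: str = "Reason step by step and reply in the required output format."

# Hypothetical composition: a parent agent delegating one Sub-Task to a SubTask-Agent.
refund_agent = TaskAgent(
    name="RefundAgent",
    tep_steps=["1. Verify the order.", "2. Check refund eligibility.", "3. Issue the refund."],
)
support_agent = TaskAgent(
    name="SupportAgent",
    tep_steps=["1. Identify the customer's issue.", "2. Delegate refund requests to RefundAgent."],
    sub_agents={"RefundAgent": refund_agent},
)
```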
Deterministic Tasks
Task Execution Procedure (TEP) steps can be very complex, with multiple instructions and steps to follow for a given usecase scenario. While some of these steps require high judgement and reasoning (understanding natural language to parse required arguments and intents, or performing checks defined in plain text without writing explicit code), most of the steps in a TEP are a deterministic sequence of API calls, processing and propagating the output gathered from one API to the next, and so on. Such a sequence of deterministic steps can be encapsulated as a single tool exposed to the LLM Agent, which, when called, performs the sequence of deterministic steps and communicates with the agent intermittently, providing updates or requesting any high-judgement reasoning or inputs required by the underlying APIs.
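For instance, a fixed chain of API calls can be wrapped as one tool so the agent sees a single function rather than every intermediate call; the APIs and the refund usecase below are hypothetical.

```python
def check_refund_eligibility(order_id: str) -> dict:
    """Hypothetical Deterministic Task: a fixed sequence of API calls exposed
    to the LLM Agent as a single tool."""
    order = get_order(order_id)                       # API 1
    policy = get_refund_policy(order["category"])     # API 2, fed by API 1's output
    eligible = order["days_since_delivery"] <= policy["window_days"]
    # The agent only receives this summary; the intermediate steps stay deterministic.
    return {"order_id": order_id, "eligible": eligible, "window_days": policy["window_days"]}

# Stub APIs so the sketch is self-contained.
def get_order(order_id):
    return {"category": "electronics", "days_since_delivery": 5}

def get_refund_policy(category):
    return {"window_days": 30}
```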
Task Agents
A usecase TEP can be divided into multiple Sub-Tasks, which are logical abstractions of complex steps inside the TEP; each Sub-Task can be handled by a dedicated SubTask-Agent.
3. Guardrails
We introduce guardrails to identify issues and prompt the LLM-Agents to reflect on their mistakes, correcting their responses. Common issues and proposed guardrail solutions are:
- Incorrect Output Formatting: Generating incorrect formats despite detailed instructions, causing parsing issues. If parsing fails, a reflection prompt is added to the Agent's chat history and the request is sent for a retry (see the sketch after this list).
- Function Hallucination: Hallucinating non-existent function names, even when prompted to use only existing tools. Our guardrail checks whether the generated function name exists among the available tools and Sub-Agents; if not, a reflection prompt is added.
- Function Parameter Value Hallucination: When making function calls with required parameters, LLMs sometimes hallucinate parameter values instead of asking the user relevant questions.
- Lack of Domain Knowledge: Although pre-trained LLMs possess good general world knowledge, they may lack certain domain-specific knowledge, especially in lesser-known domains.
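A sketch of the reflection-style retry loop behind the first two guardrails above (output formatting and function hallucination); the reflection prompts and message format are assumptions, not MARCO's exact wording.

```python
import json

REFLECT_FORMAT = "Your last reply was not valid JSON. Re-read the format instructions and answer again."
REFLECT_FUNCTION = "The function '{name}' does not exist. Choose only from: {available}."

def guarded_generate(llm, messages, available_tools, max_retries=2):
    """Validate the agent's output; on failure, append a reflection prompt and retry (sketch)."""
    for _ in range(max_retries + 1):
        reply = llm(messages)
        try:
            action = json.loads(reply)                          # output-formatting guardrail
        except json.JSONDecodeError:
            messages = messages + [{"role": "user", "content": REFLECT_FORMAT}]
            continue
        name = action.get("name")
        if name is not None and name not in available_tools:   # function-hallucination guardrail
            reflection = REFLECT_FUNCTION.format(name=name, available=sorted(available_tools))
            messages = messages + [{"role": "user", "content": reflection}]
            continue
        return action
    raise RuntimeError("Agent response failed guardrail checks after retries.")
```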