Legacy input and output agent schema

Note

Databricks recommends migrating to the ResponsesAgent schema to author agents. See Author AI agents in code.

AI agents must adhere to specific input and output schema requirements to be compatible with other features on Databricks. This page explains how to use the legacy agent authoring interfaces and signatures: the ChatAgent interface, the ChatModel interface, the SplitChatMessagesRequest input schema, and the StringResponse output schema.

Author a legacy ChatAgent agent

The MLflow ChatAgent interface is similar to, but not strictly compatible with, the OpenAI ChatCompletion schema.

ChatAgent makes it easy to wrap existing agents so they are compatible with Databricks.

To learn how to create a ChatAgent, see the examples in the following section and MLflow documentation - What is the ChatAgent interface.

To author and deploy agents using ChatAgent, install the following:

  • databricks-agents 0.16.0 or above
  • mlflow 2.20.2 or above
  • Python 3.10 or above.
    • To meet this requirement, you can use serverless compute or Databricks Runtime 13.3 LTS or above.
%pip install -U -qqqq databricks-agents>=0.16.0 mlflow>=2.20.2

What if I already have an agent?

If you already have an agent built with LangChain, LangGraph, or a similar framework, you don't need to rewrite your agent to use it on Databricks. Instead, just wrap your existing agent with the MLflow ChatAgent interface:

  1. Write a Python wrapper class that inherits from mlflow.pyfunc.ChatAgent.

    Inside the wrapper class, keep your existing agent as an attribute self.agent = your_existing_agent.

  2. The ChatAgent class requires you to implement a predict method to handle non-streaming requests.

    predict must accept:

    • messages: list[ChatAgentMessage], a list of ChatAgentMessage objects, each with a role (like "user" or "assistant"), the message content, and an ID.

    • (Optional) context: Optional[ChatContext] and custom_inputs: Optional[dict] for extra data.

    import uuid
    
    # input example
    [
      ChatAgentMessage(
        id=str(uuid.uuid4()),  # Generate a unique ID for each message
        role="user",
        content="What's the weather in Paris?"
      )
    ]
    

    predict must return a ChatAgentResponse.

    import uuid
    
    # output example
    ChatAgentResponse(
      messages=[
        ChatAgentMessage(
          id=str(uuid.uuid4()),  # Generate a unique ID for each message
          role="assistant",
          content="It's sunny in Paris."
        )
      ]
    )
    
  3. Convert between formats

    In predict, convert the incoming messages from list[ChatAgentMessage] into the input format your agent expects.

    After your agent generates a response, convert its output to one or more ChatAgentMessage objects and wrap them in a ChatAgentResponse.
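
    For example, a minimal conversion helper (similar to the _convert_messages_to_dict helper used in the streaming example later on this page) might look like the following sketch, which assumes your wrapped agent accepts OpenAI-style role/content dictionaries:

    def _convert_messages_to_dict(self, messages: list[ChatAgentMessage]) -> list[dict]:
      # Assumption: the wrapped agent accepts OpenAI-style {"role": ..., "content": ...} dicts.
      # Keep only the fields the underlying agent needs.
      return [{"role": m.role, "content": m.content} for m in messages]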

Tip

Convert LangChain output automatically

If you are wrapping a LangChain agent, you can use mlflow.langchain.output_parsers.ChatAgentOutputParser to automatically convert LangChain outputs into the MLflow ChatAgentMessage and ChatAgentResponse schema.
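
The following is a minimal, hedged sketch of how the parser might be attached to a LangChain pipeline; prompt and llm are placeholders for components you have already defined, and you should consult the MLflow docs for the parser's exact behavior:

from mlflow.langchain.output_parsers import ChatAgentOutputParser

# Illustrative only: compose the parser onto an existing LangChain chain so the
# chain's output is emitted in a ChatAgent-compatible form. `prompt` and `llm`
# are placeholders for components defined elsewhere.
output_parser = ChatAgentOutputParser()
chain = prompt | llm | output_parser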

The following is a simplified template for converting your agent:

from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatAgentChunk
import uuid


class MyWrappedAgent(ChatAgent):
  def __init__(self, agent):
    self.agent = agent

  def predict(self, messages, context=None, custom_inputs=None):
    # Convert messages to your agent's format
    agent_input = ... # build from messages
    agent_output = self.agent.invoke(agent_input)
    # Convert output to ChatAgentMessage
    return ChatAgentResponse(
      messages=[ChatAgentMessage(role="assistant", content=agent_output, id=str(uuid.uuid4()))]
    )

  def predict_stream(self, messages, context=None, custom_inputs=None):
    # If your agent supports streaming
    for chunk in self.agent.stream(...):
      yield ChatAgentChunk(delta=ChatAgentMessage(role="assistant", content=chunk, id=str(uuid.uuid4())))
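
To sanity-check the wrapper locally before logging it, you can call predict directly. The following is an illustrative sketch; your_existing_agent is a placeholder for the agent object you are wrapping:

import uuid
from mlflow.types.agent import ChatAgentMessage

wrapped_agent = MyWrappedAgent(your_existing_agent)  # your_existing_agent is a placeholder
response = wrapped_agent.predict(
  messages=[
    ChatAgentMessage(id=str(uuid.uuid4()), role="user", content="What's the weather in Paris?")
  ]
)
print(response.messages[-1].content)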

For complete examples, see the notebooks in the following section.

ChatAgent examples

The following notebooks show how to author streaming and non-streaming ChatAgents using popular libraries such as OpenAI, LangGraph, AutoGen, and DSPy.

LangGraph

LangGraph tool-calling agent

Get notebook

OpenAI

OpenAI tool-calling agent

Get notebook

OpenAI Responses API tool-calling agent

Get notebook

OpenAI chat-only agent

Get notebook

AutoGen

AutoGen tool-calling agent

Get notebook

DSPy

DSPy chat-only agent

Get notebook

To learn how to expand the capabilities of these agents by adding tools, see AI agent tools.

Streaming ChatAgent responses

Streaming agents deliver responses in a continuous stream of smaller, incremental chunks. Streaming reduces perceived latency and improves user experience for conversational agents.

To author a streaming ChatAgent, define a predict_stream method that returns a generator yielding ChatAgentChunk objects; each ChatAgentChunk contains a portion of the response. Read more about ideal ChatAgent streaming behavior in the MLflow docs.

The following code shows an example predict_stream function. For complete examples of streaming agents, see ChatAgent examples:

def predict_stream(
  self,
  messages: list[ChatAgentMessage],
  context: Optional[ChatContext] = None,
  custom_inputs: Optional[dict[str, Any]] = None,
) -> Generator[ChatAgentChunk, None, None]:
  # Convert messages to a format suitable for your agent
  request = {"messages": self._convert_messages_to_dict(messages)}

  # Stream the response from your agent
  for event in self.agent.stream(request, stream_mode="updates"):
    for node_data in event.values():
      # Yield each chunk of the response
      yield from (
        ChatAgentChunk(**{"delta": msg}) for msg in node_data["messages"]
      )
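
When you invoke predict_stream locally, you can reassemble the streamed chunks into a single response. The following is an illustrative sketch; agent and input_messages are placeholders for your ChatAgent instance and a list of ChatAgentMessage objects:

streamed_content = ""
for chunk in agent.predict_stream(messages=input_messages):
  # Each ChatAgentChunk carries a partial ChatAgentMessage in its delta field
  streamed_content += chunk.delta.content or ""
print(streamed_content)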

Author a legacy ChatModel agent

Important

Databricks recommends the ChatAgent interface for creating agents or gen AI apps. To migrate from ChatModel to ChatAgent, see MLflow documentation - Migrate from ChatModel to ChatAgent.

ChatModel is a legacy agent authoring interface in MLflow that extends OpenAI's ChatCompletion schema, allowing you to maintain compatibility with platforms supporting the ChatCompletion standard while adding custom functionality. See MLflow: Getting Started with ChatModel for additional details.

Authoring your agent as a subclass of mlflow.pyfunc.ChatModel provides the following benefits:

  • Enables streaming agent output when invoking a served agent (by passing {stream: true} in the request body).
  • Automatically enables AI Gateway inference tables when your agent is served, providing access to enhanced request log metadata, such as the requester name.
  • Allows you to write agent code compatible with the ChatCompletion schema using typed Python classes.
  • MLflow automatically infers a chat completion-compatible signature when logging the agent, even without an input_example. This simplifies the process of registering and deploying the agent. See Infer Model Signature during logging.

The following code is best run in a Databricks notebook. Notebooks provide a convenient environment for developing, testing, and iterating on your agent.

The MyAgent class extends mlflow.pyfunc.ChatModel, implementing the required predict method. This ensures compatibility with Mosaic AI Agent Framework.

The class also includes the optional methods _create_chat_completion_chunk and predict_stream to handle streaming outputs.

# Install the latest version of mlflow
%pip install -U mlflow
dbutils.library.restartPython()
import re
from typing import Optional, Dict, List, Generator
from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
  # Non-streaming helper classes
  ChatCompletionRequest,
  ChatCompletionResponse,
  ChatCompletionChunk,
  ChatMessage,
  ChatChoice,
  ChatParams,
  # Helper classes for streaming agent output
  ChatChoiceDelta,
  ChatChunkChoice,
)

class MyAgent(ChatModel):
  """
  Defines a custom agent that processes ChatCompletionRequests
  and returns ChatCompletionResponses.
  """
  def predict(self, context, messages: list[ChatMessage], params: ChatParams) -> ChatCompletionResponse:
    last_user_question_text = messages[-1].content
    response_message = ChatMessage(
      role="assistant",
      content=(
        f"I will always echo back your last question. Your last question was: {last_user_question_text}. "
      )
    )
    return ChatCompletionResponse(
      choices=[ChatChoice(message=response_message)]
    )

  def _create_chat_completion_chunk(self, content) -> ChatCompletionChunk:
    """Helper for constructing a ChatCompletionChunk instance for wrapping streaming agent output"""
    return ChatCompletionChunk(
      choices=[ChatChunkChoice(
        delta=ChatChoiceDelta(
          role="assistant",
          content=content
        )
      )]
    )

  def predict_stream(
    self, context, messages: List[ChatMessage], params: ChatParams
  ) -> Generator[ChatCompletionChunk, None, None]:
    last_user_question_text = messages[-1].content
    yield self._create_chat_completion_chunk("Echoing back your last question, word by word.")
    for word in re.findall(r"\S+\s*", last_user_question_text):
      yield self._create_chat_completion_chunk(word)

agent = MyAgent()
model_input = ChatCompletionRequest(
  messages=[ChatMessage(role="user", content="What is Databricks?")]
)
response = agent.predict(context=None, messages=model_input.messages, params=None)
print(response)
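
You can exercise the streaming path in the same way. The following is an illustrative sketch that reuses the agent and model_input objects defined above:

# Stream the response and print each chunk's text as it arrives
for chunk in agent.predict_stream(context=None, messages=model_input.messages, params=None):
  print(chunk.choices[0].delta.content, end="")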

While you define the agent class MyAgent in one notebook, Databricks recommends creating a separate driver notebook. The driver notebook logs the agent to the Model Registry and deploys the agent using Model Serving.

This separation follows the workflow recommended by Databricks for logging models using MLflow's Models from Code methodology.
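
For example, a driver notebook might log the agent with the Models from Code approach. The following is a minimal sketch; it assumes the agent code above is saved as a file named agent.py that ends with mlflow.models.set_model(MyAgent()):

import mlflow

with mlflow.start_run():
  logged_agent_info = mlflow.pyfunc.log_model(
    artifact_path="agent",
    python_model="agent.py",  # path to the agent's code file, not a pickled object
    pip_requirements=["mlflow"],
  )
print(logged_agent_info.model_uri)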

SplitChatMessagesRequest input schema (deprecated)

SplitChatMessagesRequest allows you to pass the current query and history separately as agent input.

question = {
  "query": "What is MLflow",
  "history": [
    {
      "role": "user",
      "content": "What is Retrieval-augmented Generation?"
    },
    {
      "role": "assistant",
      "content": "RAG is"
    }
  ]
}
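
If you prefer typed objects over raw dictionaries, the equivalent request can be built from the dataclasses in mlflow.models.rag_signatures. This is a hedged sketch; class and field names may vary between MLflow versions:

from mlflow.models.rag_signatures import Message, SplitChatMessagesRequest

# Illustrative typed equivalent of the dictionary example above
request = SplitChatMessagesRequest(
  query="What is MLflow",
  history=[
    Message(role="user", content="What is Retrieval-augmented Generation?"),
    Message(role="assistant", content="RAG is"),
  ],
)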

StringResponse output schema (deprecated)

StringResponse allows you to return the agent's response as an object with a single string content field:

{"content": "This is an example string response"}