Note
Databricks recommends migrating to the ResponsesAgent schema to author agents. See Author AI agents in code.
AI agents must adhere to specific input and output schema requirements to be compatible with other features on Databricks. This page explains how to use the legacy agent authoring signatures and interfaces: the ChatAgent interface, the ChatModel interface, the SplitChatMessagesRequest input schema, and the StringResponse output schema.
Author a legacy ChatAgent agent
The MLflow ChatAgent interface is similar to, but not strictly compatible with, the OpenAI ChatCompletion schema.

To learn how to create a ChatAgent, see the examples in the following section and MLflow documentation - What is the ChatAgent interface.
To author and deploy agents using ChatAgent, install the following:

- databricks-agents 0.16.0 or above
- mlflow 2.20.2 or above
- Python 3.10 or above. To meet this requirement, you can use serverless compute or Databricks Runtime 13.3 LTS or above.
%pip install -U -qqqq databricks-agents>=0.16.0 mlflow>=2.20.2
What if I already have an agent?
If you already have an agent built with LangChain, LangGraph, or a similar framework, you don't need to rewrite your agent to use it on Databricks. Instead, just wrap your existing agent with the MLflow ChatAgent interface:
- Write a Python wrapper class that inherits from mlflow.pyfunc.ChatAgent.
- Inside the wrapper class, keep your existing agent as an attribute: self.agent = your_existing_agent.
- The ChatAgent class requires you to implement a predict method to handle non-streaming requests. predict must accept:
  - messages: list[ChatAgentMessage], which is a list of ChatAgentMessage objects, each with a role (like "user" or "assistant"), the prompt, and an ID.
  - (Optional) context: Optional[ChatContext] and custom_inputs: Optional[dict] for extra data.
import uuid

# input example
[
    ChatAgentMessage(
        id=str(uuid.uuid4()),  # Generate a unique ID for each message
        role="user",
        content="What's the weather in Paris?"
    )
]
predict must return a ChatAgentResponse.

import uuid

# output example
ChatAgentResponse(
    messages=[
        ChatAgentMessage(
            id=str(uuid.uuid4()),  # Generate a unique ID for each message
            role="assistant",
            content="It's sunny in Paris."
        )
    ]
)
Convert between formats

In predict, convert the incoming messages from list[ChatAgentMessage] into the input format your agent expects. After your agent generates a response, convert its output to one or more ChatAgentMessage objects and wrap them in a ChatAgentResponse.
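For example, a minimal pair of conversion helpers might look like the following. This is only a sketch: the helper names are hypothetical, and it assumes your agent consumes and returns OpenAI-style message dictionaries. Adapt it to whatever format your framework expects.

import uuid
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse

def to_agent_input(messages: list[ChatAgentMessage]) -> list[dict]:
    # Convert ChatAgentMessage objects into plain {"role": ..., "content": ...} dicts
    return [{"role": m.role, "content": m.content} for m in messages]

def to_chat_agent_response(text: str) -> ChatAgentResponse:
    # Wrap the agent's final text answer in a ChatAgentResponse
    return ChatAgentResponse(
        messages=[
            ChatAgentMessage(
                id=str(uuid.uuid4()),  # Each message needs a unique ID
                role="assistant",
                content=text,
            )
        ]
    )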
Tip

Convert LangChain output automatically

If you are wrapping a LangChain agent, you can use mlflow.langchain.output_parsers.ChatAgentOutputParser to automatically convert LangChain outputs into the MLflow ChatAgentMessage and ChatAgentResponse schema.
The following is a simplified template for converting your agent:
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatAgentChunk
import uuid

class MyWrappedAgent(ChatAgent):
    def __init__(self, agent):
        # Keep your existing agent as an attribute
        self.agent = agent

    def predict(self, messages, context=None, custom_inputs=None):
        # Convert messages to your agent's format
        agent_input = ...  # build from messages
        agent_output = self.agent.invoke(agent_input)
        # Convert output to ChatAgentMessage and wrap it in a ChatAgentResponse
        return ChatAgentResponse(
            messages=[ChatAgentMessage(role="assistant", content=agent_output, id=str(uuid.uuid4()))]
        )

    def predict_stream(self, messages, context=None, custom_inputs=None):
        # If your agent supports streaming
        for chunk in self.agent.stream(...):
            yield ChatAgentChunk(delta=ChatAgentMessage(role="assistant", content=chunk, id=str(uuid.uuid4())))
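After wrapping, you can instantiate and smoke-test the agent directly in a notebook. The following is a quick sketch; your_existing_agent is a placeholder for your framework's agent object:

import uuid
from mlflow.types.agent import ChatAgentMessage

# `your_existing_agent` is hypothetical: substitute the agent object
# from LangChain, LangGraph, or whichever framework you use.
wrapped_agent = MyWrappedAgent(your_existing_agent)

response = wrapped_agent.predict(
    messages=[
        ChatAgentMessage(id=str(uuid.uuid4()), role="user", content="Hello!")
    ]
)
print(response.messages[0].content)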
For complete examples, see the notebooks in the following section.
ChatAgent examples

The following notebooks show how to author streaming and non-streaming ChatAgents using popular libraries such as OpenAI, LangGraph, AutoGen, and DSPy.
LangGraph
If you are wrapping a LangChain agent, you can use mlflow.langchain.output_parsers.ChatAgentOutputParser to automatically convert LangChain outputs into the MLflow ChatAgentMessage and ChatAgentResponse schema.
LangGraph tool-calling agent
OpenAI
OpenAI tool-calling agent
OpenAI Responses API tool-calling agent
OpenAI chat-only agent
AutoGen
AutoGen tool-calling agent
DSPy
DSPy chat-only agent
To learn how to expand the capabilities of these agents by adding tools, see AI agent tools.
Streaming ChatAgent responses
Streaming agents deliver responses in a continuous stream of smaller, incremental chunks. Streaming reduces perceived latency and improves user experience for conversational agents.
To author a streaming ChatAgent, define a predict_stream method that returns a generator that yields ChatAgentChunk objects. Each ChatAgentChunk contains a portion of the response. Read more about ideal ChatAgent streaming behavior in the MLflow docs.

The following code shows an example predict_stream function. For complete examples of streaming agents, see ChatAgent examples:
def predict_stream(
    self,
    messages: list[ChatAgentMessage],
    context: Optional[ChatContext] = None,
    custom_inputs: Optional[dict[str, Any]] = None,
) -> Generator[ChatAgentChunk, None, None]:
    # Convert messages to a format suitable for your agent
    request = {"messages": self._convert_messages_to_dict(messages)}

    # Stream the response from your agent
    for event in self.agent.stream(request, stream_mode="updates"):
        for node_data in event.values():
            # Yield each chunk of the response
            yield from (
                ChatAgentChunk(**{"delta": msg}) for msg in node_data["messages"]
            )
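On the consuming side, a caller typically collects the deltas and reassembles the full response. The following is a minimal sketch; agent and messages are placeholders for your wrapped ChatAgent instance and its list of ChatAgentMessage inputs:

# Hypothetical consumption of the stream: concatenate each chunk's delta content.
full_response = ""
for chunk in agent.predict_stream(messages=messages):
    full_response += chunk.delta.content or ""
print(full_response)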
Author a legacy ChatModel agent
Important
Databricks recommends the ChatAgent interface for creating agents or gen AI apps. To migrate from ChatModel to ChatAgent, see MLflow documentation - Migrate from ChatModel to ChatAgent.

ChatModel is a legacy agent authoring interface in MLflow that extends OpenAI's ChatCompletion schema, allowing you to maintain compatibility with platforms supporting the ChatCompletion standard while adding custom functionality. See MLflow: Getting Started with ChatModel for additional details.
Authoring your agent as a subclass of mlflow.pyfunc.ChatModel provides the following benefits:

- Enables streaming agent output when invoking a served agent (by passing {stream: true} in the request body).
- Automatically enables AI Gateway inference tables when your agent is served, providing access to enhanced request log metadata, such as the requester name.
- Allows you to write agent code compatible with the ChatCompletion schema using typed Python classes.
- MLflow automatically infers a chat completion-compatible signature when logging the agent, even without an input_example. This simplifies the process of registering and deploying the agent. See Infer Model Signature during logging.
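For example, a client can request streaming output from a served agent by including stream: true in the request body. The following is a minimal sketch of such a payload; how you send it (SDK, REST client, and endpoint name) depends on your deployment:

# Hypothetical request body for a served agent. Setting "stream": True asks the
# endpoint to return incremental chunks instead of a single complete response.
payload = {
    "messages": [{"role": "user", "content": "What is Databricks?"}],
    "stream": True,
}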
The following code is best run in a Databricks notebook. Notebooks provide a convenient environment for developing, testing, and iterating on your agent.
The MyAgent class extends mlflow.pyfunc.ChatModel, implementing the required predict method. This ensures compatibility with Mosaic AI Agent Framework.

The class also includes the optional methods _create_chat_completion_chunk and predict_stream to handle streaming outputs.
# Install the latest version of mlflow
%pip install -U mlflow
dbutils.library.restartPython()

import re
from typing import Optional, Dict, List, Generator
from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
    # Non-streaming helper classes
    ChatCompletionRequest,
    ChatCompletionResponse,
    ChatCompletionChunk,
    ChatMessage,
    ChatChoice,
    ChatParams,
    # Helper classes for streaming agent output
    ChatChoiceDelta,
    ChatChunkChoice,
)

class MyAgent(ChatModel):
    """
    Defines a custom agent that processes ChatCompletionRequests
    and returns ChatCompletionResponses.
    """
    def predict(self, context, messages: list[ChatMessage], params: ChatParams) -> ChatCompletionResponse:
        last_user_question_text = messages[-1].content
        response_message = ChatMessage(
            role="assistant",
            content=(
                f"I will always echo back your last question. Your last question was: {last_user_question_text}. "
            )
        )
        return ChatCompletionResponse(
            choices=[ChatChoice(message=response_message)]
        )

    def _create_chat_completion_chunk(self, content) -> ChatCompletionChunk:
        """Helper for constructing a ChatCompletionChunk instance for wrapping streaming agent output"""
        return ChatCompletionChunk(
            choices=[ChatChunkChoice(
                delta=ChatChoiceDelta(
                    role="assistant",
                    content=content
                )
            )]
        )

    def predict_stream(
        self, context, messages: List[ChatMessage], params: ChatParams
    ) -> Generator[ChatCompletionChunk, None, None]:
        last_user_question_text = messages[-1].content
        yield self._create_chat_completion_chunk("Echoing back your last question, word by word.")
        for word in re.findall(r"\S+\s*", last_user_question_text):
            yield self._create_chat_completion_chunk(word)

agent = MyAgent()
model_input = ChatCompletionRequest(
    messages=[ChatMessage(role="user", content="What is Databricks?")]
)
response = agent.predict(context=None, messages=model_input.messages, params=None)
print(response)
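You can also exercise the streaming path locally by iterating over predict_stream with the same model_input. A quick sketch:

# Stream the response chunk by chunk; each chunk wraps a partial assistant message.
for chunk in agent.predict_stream(context=None, messages=model_input.messages, params=None):
    print(chunk.choices[0].delta.content, end="")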
While you define the agent class MyAgent in one notebook, we recommend creating a separate driver notebook. The driver notebook logs the agent to Model Registry and deploys the agent using Model Serving.

This separation follows the workflow recommended by Databricks for logging models using MLflow's Models from Code methodology.
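A driver notebook for this workflow might look like the following sketch. It assumes the MyAgent definition above is saved as a file named agent.py that calls mlflow.models.set_model(agent), and that catalog.schema.my_agent is a Unity Catalog model name you can write to; adjust both to your environment.

import mlflow
from databricks import agents

# Register models to Unity Catalog
mlflow.set_registry_uri("databricks-uc")

UC_MODEL_NAME = "catalog.schema.my_agent"  # hypothetical Unity Catalog model name

# Log the agent with MLflow's Models from Code approach:
# `python_model` points at the file that defines the agent, not at an object.
with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="agent.py",  # assumed path to the agent definition file
        registered_model_name=UC_MODEL_NAME,
    )

# Deploy the newly registered model version with Model Serving
agents.deploy(UC_MODEL_NAME, logged_agent_info.registered_model_version)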
SplitChatMessagesRequest input schema (deprecated)

SplitChatMessagesRequest allows you to pass the current query and history separately as agent input.
question = {
    "query": "What is MLflow",
    "history": [
        {
            "role": "user",
            "content": "What is Retrieval-augmented Generation?"
        },
        {
            "role": "assistant",
            "content": "RAG is"
        }
    ]
}
StringResponse output schema (deprecated)
StringResponse allows you to return the agent's response as an object with a single string content field:
{"content": "This is an example string response"}