In this article, you learn how to write query requests for foundation models that are optimized for chat tasks and send them to your model serving endpoint.
The examples in this article apply to querying foundation models that are made available using either:
- Foundation Model APIs, which are referred to as Databricks-hosted foundation models.
- External models, which are referred to as foundation models hosted outside of Databricks.
Requirements
- See Requirements.
- Install the appropriate package on your cluster based on the querying client option you choose, as shown in the example after this list.
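As a sketch, the following notebook command installs the client packages used by the examples in this article (databricks-sdk, mlflow, openai, and databricks-langchain). You only need the package for the client you choose, and the versions required for your environment may differ.

%pip install databricks-sdk mlflow openai databricks-langchain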
Query examples
The examples in this section show how to query the Meta Llama 3.3 70B Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct, using the different client options.
For a batch inference example, see Perform batch LLM inference using AI Functions.
OpenAI client
To use the OpenAI client, specify the model serving endpoint name as the model input.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Get an OpenAI-compatible client preconfigured for your workspace's serving endpoints
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        }
    ],
    max_tokens=256
)
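The endpoint returns a standard OpenAI chat completion object, so you can read the generated text from the first choice, for example:

print(response.choices[0].message.content)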
To query foundation models from outside your workspace, you must use the OpenAI client directly. You also need your Databricks workspace instance URL to connect the OpenAI client to Databricks. The following example assumes that you have a Databricks API token and openai installed on your compute.
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?",
        }
    ],
    max_tokens=256
)
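Because the endpoint implements the OpenAI chat completions API, you can also request a streamed response so tokens are printed as they are generated. The following sketch reuses the client configured above and assumes the endpoint supports streaming.

stream = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "What is a mixture of experts model?"}],
    max_tokens=256,
    stream=True,  # return incremental chunks instead of a single response
)

for chunk in stream:
    # Each chunk carries an incremental delta of the assistant message
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")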
SQL
Important
The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change.
SELECT ai_query(
  "databricks-meta-llama-3-3-70b-instruct",
  "Can you explain AI in ten words?"
)
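You can also apply ai_query to a table column for quick, ad hoc scoring from SQL; for production batch workloads, see the batch inference article referenced above. In the following sketch, my_catalog.my_schema.reviews and its review column are placeholder names:

SELECT
  review,
  ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Summarize the following review in one sentence: " || review
  ) AS summary
FROM my_catalog.my_schema.reviews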
REST API
Important
The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?"
      }
    ]
  }' \
  https://<workspace_host>.databricks.com/serving-endpoints/databricks-meta-llama-3-3-70b-instruct/invocations
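If you have jq installed, you can extract just the generated text from the JSON response, which follows the chat completion response format shown later in this article:

curl -s \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is a mixture of experts model?"}]}' \
  https://<workspace_host>.databricks.com/serving-endpoints/databricks-meta-llama-3-3-70b-instruct/invocations \
  | jq -r '.choices[0].message.content'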
MLflow Deployments SDK
Important
The following example uses the predict() API from the MLflow Deployments SDK.
import os
import mlflow.deployments

# Only required when running this example outside of a Databricks notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    inputs={
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            },
            {
                "role": "assistant",
                "content": "Hello! How can I assist you today?"
            },
            {
                "role": "user",
                "content": "What is a mixture of experts model?"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)
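predict() returns a dictionary in the chat completion format shown later in this article, so you can read the generated text like this:

print(chat_response["choices"][0]["message"]["content"])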
Databricks Python SDK
This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from an Azure Databricks notebook.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()

response = w.serving_endpoints.query(
    name="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")
LangChain
To query a foundation model endpoint using LangChain, you can use the ChatDatabricks ChatModel class and specify the endpoint.
%pip install databricks-langchain

from langchain_core.messages import HumanMessage, SystemMessage
from databricks_langchain import ChatDatabricks

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is a mixture of experts model?"),
]

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")
llm.invoke(messages)
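ChatDatabricks also accepts common model parameters such as temperature and max_tokens, so a slightly more complete sketch looks like the following; the values shown are illustrative only:

llm = ChatDatabricks(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    temperature=0.1,  # lower temperature for more deterministic answers
    max_tokens=256,   # cap the length of the generated response
)
llm.invoke(messages)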
As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.
{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}
The following is an expected response format for a request made using the REST API:
{
  "model": "databricks-meta-llama-3-3-70b-instruct",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
Supported models
See Foundation model types for supported chat models.