Use foundation models

In this article, you learn which options are available to write query requests for foundation models and how to send them to your model serving endpoint. You can query foundation models that are hosted by Databricks and foundation models hosted outside of Databricks.

For query requests for traditional ML or Python models, see Query serving endpoints for custom models.

Mosaic AI Model Serving supports Foundation Models APIs and external models for accessing foundation models. Model Serving uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize foundation models for production across supported clouds and providers.

Query options

Mosaic AI Model Serving provides the following options for sending query requests to endpoints that serve foundation models:

  • OpenAI client: Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the model input. Supported for chat, embeddings, and completions models made available by Foundation Model APIs or external models.
  • SQL function: Invoke model inference directly from SQL using the ai_query SQL function. See Example: Query a foundation model.
  • Serving UI: Select Query endpoint from the Serving endpoint page. Insert model input data in JSON format and click Send Request. If the model has an input example logged, use Show Example to load it.
  • REST API: Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints serving multiple models, see Query individual models behind an endpoint.
  • MLflow Deployments SDK: Use the MLflow Deployments SDK's predict() function to query the model.
  • Databricks Python SDK: A layer on top of the REST API that handles low-level details, such as authentication, making it easier to interact with the models.

Requirements

Important

As a security best practice for production scenarios, Databricks recommends machine-to-machine OAuth tokens for authentication.

For testing and development, Databricks recommends using personal access tokens that belong to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.

Install packages

After you select a querying method, install the appropriate package on your cluster.

OpenAI client

To use the OpenAI client, install the databricks-sdk[openai] package on your cluster. The Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook or local terminal:

!pip install 'databricks-sdk[openai]>=0.35.0'

The following command is required only when you install the package in a Databricks notebook:

dbutils.library.restartPython()
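With the package installed, the Databricks SDK can construct an OpenAI client with authentication wired up automatically. The sketch below is a minimal example, not the definitive pattern: the endpoint name is a placeholder (substitute any chat endpoint from your workspace), and the network call is guarded behind an environment check so the snippet also runs where no workspace credentials are configured.

```python
import os

# Hypothetical endpoint name -- substitute a chat endpoint from your workspace.
ENDPOINT_NAME = "databricks-meta-llama-3-3-70b-instruct"

# OpenAI-style chat payload; the serving endpoint name goes in the `model` field.
payload = {
    "model": ENDPOINT_NAME,
    "messages": [{"role": "user", "content": "What is Mosaic AI Model Serving?"}],
    "max_tokens": 128,
}

# Querying requires workspace credentials, so the call is guarded.
if os.environ.get("DATABRICKS_HOST"):
    from databricks.sdk import WorkspaceClient

    openai_client = WorkspaceClient().serving_endpoints.get_open_ai_client()
    completion = openai_client.chat.completions.create(**payload)
    print(completion.choices[0].message.content)
```

Because the client is OpenAI-compatible, the same payload shape works for any chat model served by Foundation Model APIs or an external model endpoint.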

REST API

Access to the Serving REST API is available in Databricks Runtime for Machine Learning.

MLflow Deployments SDK

!pip install mlflow

The following command is required only when you install the package in a Databricks notebook:

dbutils.library.restartPython()
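With mlflow installed, queries go through the Deployments SDK's predict() function. A minimal sketch, assuming a chat endpoint in your workspace (the endpoint name is a placeholder); the call is guarded behind an environment check so the snippet runs without credentials:

```python
import os

# Hypothetical endpoint name -- substitute one from your workspace.
endpoint = "databricks-meta-llama-3-3-70b-instruct"

# Chat-style input payload for the endpoint.
inputs = {
    "messages": [{"role": "user", "content": "Summarize what an LLM is."}],
    "max_tokens": 128,
}

# The query needs workspace credentials, so it is guarded here.
if os.environ.get("DATABRICKS_HOST"):
    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")
    response = client.predict(endpoint=endpoint, inputs=inputs)
    print(response)
```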

Databricks Python SDK

The Databricks SDK for Python is already installed on all Azure Databricks clusters that use Databricks Runtime 13.3 LTS or above. For Azure Databricks clusters that use Databricks Runtime 12.2 LTS and below, you must install the Databricks SDK for Python first. See Databricks SDK for Python.
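As a sketch of how the SDK wraps the REST query path: the endpoint name below is a placeholder, and the call is guarded behind an environment check so the snippet runs where no workspace credentials are configured.

```python
import os

# Hypothetical endpoint name and prompt; substitute your own.
endpoint_name = "databricks-meta-llama-3-3-70b-instruct"
prompt = "What is a foundation model?"

# The query needs workspace credentials, so it is guarded here.
if os.environ.get("DATABRICKS_HOST"):
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

    w = WorkspaceClient()  # authentication is resolved from the environment
    response = w.serving_endpoints.query(
        name=endpoint_name,
        messages=[ChatMessage(role=ChatMessageRole.USER, content=prompt)],
        max_tokens=128,
    )
    print(response.choices[0].message.content)
```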

Foundation model types

The following table summarizes the supported foundation models based on task type.

Task type Description Supported models When to use? Recommended use cases
Chat Models designed to understand and engage in natural, multi-turn conversations. They are fine-tuned on large datasets of human dialogue, which enables them to generate contextually relevant responses, track conversational history, and provide coherent, human-like interactions across various topics. The following are supported Databricks-hosted foundation models:

The following are supported external models:
  • OpenAI GPT and o series models
  • Anthropic Claude models
  • Google Gemini models
Recommended for scenarios where natural, multi-turn dialogue and contextual understanding are needed:
  • Virtual assistants
  • Customer support bots
  • Interactive tutoring systems
Embeddings Embedding models are machine learning systems that transform complex data—such as text, images, or audio—into compact numerical vectors called embeddings. These vectors capture the essential features and relationships within the data, allowing for efficient comparison, clustering, and semantic search. The following are supported Databricks-hosted foundation models:

The following are supported external models:
  • OpenAI text embedding models
  • Cohere text embedding models
  • Google text embedding models
Recommended for applications where semantic understanding, similarity comparison, and efficient retrieval or clustering of complex data are essential:
  • Semantic search
  • Retrieval augmented generation (RAG)
  • Topic clustering
  • Sentiment analysis and text analytics
Vision Models designed to process, interpret, and analyze visual data, such as images and videos, so machines can "see" and understand the visual world. The following are supported Databricks-hosted foundation models:

The following are supported external models:
  • OpenAI GPT and o series models with vision capabilities
  • Anthropic Claude models with vision capabilities
  • Google Gemini models with vision capabilities
  • Other external foundation models with vision capabilities that are OpenAI API compatible are also supported.
Recommended wherever automated, accurate, and scalable analysis of visual information is needed:
  • Object detection and recognition
  • Image classification
  • Image segmentation
  • Document understanding
Reasoning Advanced AI systems designed to simulate human-like logical thinking. Reasoning models integrate techniques such as symbolic logic, probabilistic reasoning, and neural networks to analyze context, break down tasks, and explain their decision-making. The following are supported Databricks-hosted foundation models:

The following are supported external models:
  • OpenAI models with reasoning capabilities
  • Anthropic Claude models with reasoning capabilities
  • Google Gemini models with reasoning capabilities
Recommended wherever multi-step reasoning, planning, and explainable decision-making are needed:
  • Code generation
  • Content creation and summarization
  • Agent orchestration
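Because all of these task types are served through the same OpenAI-compatible surface, they differ mainly in payload shape. The sketch below shows hypothetical embeddings and vision request payloads (both endpoint names and the image URL are placeholders); the embeddings call is guarded behind an environment check so the snippet runs without workspace credentials.

```python
import os

# Hypothetical endpoint names -- substitute endpoints from your workspace.
CHAT_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"
EMBEDDING_ENDPOINT = "databricks-gte-large-en"

# Embeddings request: plain text in, a numeric vector out.
embedding_request = {
    "model": EMBEDDING_ENDPOINT,
    "input": "Mosaic AI Model Serving supports embedding models.",
}

# Vision request: an OpenAI-style message whose content mixes text and an image URL.
vision_request = {
    "model": CHAT_ENDPOINT,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
}

# Sending the embeddings request needs workspace credentials, so it is guarded.
if os.environ.get("DATABRICKS_HOST"):
    from databricks.sdk import WorkspaceClient

    client = WorkspaceClient().serving_endpoints.get_open_ai_client()
    result = client.embeddings.create(**embedding_request)
    print(len(result.data[0].embedding))
```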

Function calling

Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs and serving endpoints that serve external models. For details, see Function calling on Azure Databricks.
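Since function calling follows the OpenAI format, a tool is declared as a JSON schema and passed via the tools parameter. A minimal sketch: get_weather is a hypothetical tool (not a real API), the endpoint name is a placeholder, and the call is guarded behind an environment check so the snippet runs without workspace credentials.

```python
import os

# A hypothetical tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical helper, not a real API
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# The query needs workspace credentials, so it is guarded here.
if os.environ.get("DATABRICKS_HOST"):
    from databricks.sdk import WorkspaceClient

    client = WorkspaceClient().serving_endpoints.get_open_ai_client()
    response = client.chat.completions.create(
        model="databricks-meta-llama-3-3-70b-instruct",  # hypothetical endpoint
        messages=[{"role": "user", "content": "What is the weather in Paris?"}],
        tools=tools,
        tool_choice="auto",
    )
    # If the model decides to call the tool, the structured call appears here.
    print(response.choices[0].message.tool_calls)
```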

Structured outputs

Structured outputs is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs. For details, see Structured outputs on Azure Databricks.
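In the OpenAI-compatible format, structured outputs are requested by attaching a JSON schema through the response_format parameter. A minimal sketch: the schema name and fields are made up for illustration, the endpoint name is a placeholder, and the call is guarded behind an environment check so the snippet runs without workspace credentials.

```python
import json
import os

# A hypothetical JSON schema constraining the model's output.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "movie_review",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "rating": {"type": "integer"},
            },
            "required": ["title", "rating"],
        },
    },
}

# The query needs workspace credentials, so it is guarded here.
if os.environ.get("DATABRICKS_HOST"):
    from databricks.sdk import WorkspaceClient

    client = WorkspaceClient().serving_endpoints.get_open_ai_client()
    response = client.chat.completions.create(
        model="databricks-meta-llama-3-3-70b-instruct",  # hypothetical endpoint
        messages=[{"role": "user", "content": "Review the movie Inception."}],
        response_format=response_format,
    )
    # The model's reply is a JSON string matching the schema.
    print(json.loads(response.choices[0].message.content))
```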

Chat with supported LLMs using AI Playground

You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Azure Databricks workspace.

