Query an embedding model

In this article, you learn how to write query requests for foundation models that are optimized for embeddings tasks and send them to your model serving endpoint.

The examples in this article apply to querying foundation models that are made available using either:

- Foundation Model APIs
- External models

Requirements

Query examples

The following examples show an embeddings request for the gte-large-en model, made available by Foundation Model APIs pay-per-token, using the different client options.

OpenAI client

To use the OpenAI client, specify the model serving endpoint name as the model input.


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)
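
The client returns a response in the OpenAI embeddings format, so you can read the vector back from response.data; a minimal sketch:


# The embedding vector for the first (and only) input string.
embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")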

To query foundation models outside your workspace, you must use the OpenAI client directly, as demonstrated below. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance URL to connect the OpenAI client to Databricks.


import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",  # your Databricks personal access token
    base_url="https://<workspace_host>.databricks.com/serving-endpoints"  # your workspace instance
)

response = client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)
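
The input parameter also accepts a list of strings, matching the request format shown later in this article; a short sketch of embedding several texts in one request:


# Batch several texts in one request; the endpoint returns one embedding per input string.
batch_response = client.embeddings.create(
    model="databricks-gte-large-en",
    input=["what is databricks", "what is model serving"]
)
print(len(batch_response.data))  # one embedding object per input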

SQL

Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change.


SELECT ai_query(
    "databricks-gte-large-en",
    "Can you explain AI in ten words?"
  )

REST API

Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models or external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.


curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d  '{ "input": "Embed this sentence!"}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-gte-large-en/invocations

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


import os
import mlflow.deployments

# Set the workspace host and token as environment variables
# so the Databricks deployment client can authenticate.
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

embeddings_response = client.predict(
    endpoint="databricks-gte-large-en",
    inputs={
        "input": "Here is some text to embed"
    }
)
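
Assuming the returned payload mirrors the embeddings response format shown later in this article, you can read the vector out of it; a minimal sketch:


# Assumption: the payload follows the documented embeddings response format.
embedding = embeddings_response["data"][0]["embedding"]
print(len(embedding))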

Databricks Python SDK


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-gte-large-en",
    input="Embed this sentence!"
)
print(response.data[0].embedding)

LangChain

To use a Databricks Foundation Model APIs model in LangChain as an embedding model, import the DatabricksEmbeddings class and specify the endpoint parameter as follows:

%pip install databricks-langchain

from databricks_langchain import DatabricksEmbeddings

embeddings = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
embeddings.embed_query("Can you explain AI in ten words?")
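
DatabricksEmbeddings implements the standard LangChain Embeddings interface, so embed_documents is also available for embedding a batch of texts; a short sketch:


# Embed several documents at once; returns one vector per input text.
vectors = embeddings.embed_documents(
    ["Databricks is a data and AI company.", "Model serving hosts foundation models."]
)
print(len(vectors), len(vectors[0]))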

The following is the expected request format for an embeddings model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.


{
  "input": [
    "embedding text"
  ]
}

The following is the expected response format:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}

Supported models

See Foundation model types for supported embedding models.

Check whether embeddings are normalized

Use the following function to check whether the embeddings generated by your model are normalized.


import numpy as np

def is_normalized(vector: list[float], tol: float = 1e-3) -> bool:
    # An embedding is normalized when its L2 norm (magnitude) is 1, within tolerance.
    magnitude = np.linalg.norm(vector)
    return abs(magnitude - 1) < tol
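
For example, you can pass it a vector returned by one of the query examples above (endpoint name reused from the Databricks Python SDK example):


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-gte-large-en",
    input="Embed this sentence!"
)

# Check the first returned embedding.
print(is_normalized(response.data[0].embedding))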

Additional resources