In this article, you learn how to write query requests for foundation models optimized for vision tasks, and send them to your model serving endpoint.
Mosaic AI Model Serving provides a unified API to understand and analyze images using a variety of foundation models, unlocking powerful multimodal capabilities. This functionality is available through select Databricks-hosted models as part of Foundation Model APIs and serving endpoints that serve external models.
Requirements
- See Requirements.
- Install the appropriate package on your cluster based on the querying client option you choose.
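The query examples below use the OpenAI Python client together with `httpx` for downloading images; a minimal setup sketch (your environment may already provide these):

```shell
# Install the OpenAI client and httpx used by the examples below
pip install openai httpx
```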
Query examples
from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# Encode the image as base64
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")

# OpenAI request
completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
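The same request shape works with a local image file instead of a downloaded one; a minimal sketch, where the helper name and the file path are illustrative placeholders:

```python
import base64

def encode_image(path: str) -> str:
    """Read a local image file and return its base64 text for use in a data URL."""
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

# Usage (the path is a placeholder for any JPEG on disk):
# image_data = encode_image("ant.jpg")
# {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}}
```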
The Chat Completions API supports multiple image inputs, allowing the model to analyze each image and synthesize information from all inputs to generate a response to the prompt.
from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# Encode multiple images
image1_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image1_data = base64.standard_b64encode(httpx.get(image1_url).content).decode("utf-8")

image2_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image2_data = base64.standard_b64encode(httpx.get(image2_url).content).decode("utf-8")

# OpenAI request
completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are in these images? Is there any difference between them?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image1_data}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image2_data}"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
Supported models
See Foundation model types for supported vision models.
Input image requirements
This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.
Multiple images per request
- Up to 20 images per request in the claude.ai interface
- Up to 100 images per API request
- All provided images are processed in a request, which is useful for comparing or contrasting them.
Size limitations
- Images larger than 8000×8000 px are rejected.
- If more than 20 images are submitted in one API request, the maximum allowed size per image is 2000×2000 px.
Image resizing recommendations
- For optimal performance, resize images before uploading if they are too large.
- If an image's long edge exceeds 1568 pixels, or its size exceeds ~1,600 tokens, it is _automatically scaled down_ while preserving aspect ratio.
- Very small images (under 200 pixels on any edge) may degrade performance.
- To reduce latency, keep images within 1.15 megapixels and at most 1568 pixels in both dimensions.
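The resizing guidance above can be sketched as a small helper that computes target dimensions before you resize; a minimal sketch (the function name is illustrative, and the Pillow usage shown in comments is one possible way to apply it, not part of the Databricks API):

```python
def fit_within(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Scale (width, height) down so the long edge is at most max_edge pixels,
    preserving aspect ratio. Returns the dimensions unchanged if already small enough."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    scale = max_edge / long_edge
    return round(width * scale), round(height * scale)

# Possible usage with Pillow before encoding:
# from PIL import Image
# img = Image.open("photo.jpg")
# img = img.resize(fit_within(*img.size))
```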
Image quality considerations
- Supported formats: JPEG, PNG, GIF, WebP.
- Clarity: Avoid blurry or pixelated images.
- Text in images:
- Ensure text is legible and not too small.
- Avoid cropping out key visual context just to enlarge the text.
Calculate costs
This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.
Each image in a request to a foundation model adds to your token usage.
Token counts and estimates
If no resizing is needed, estimate tokens with: tokens = (width px × height px) / 750
Approximate token counts for different image sizes:
| Image size | Tokens |
|---|---|
| 200×200 px (0.04 MP) | ~54 |
| 1000×1000 px (1 MP) | ~1334 |
| 1092×1092 px (1.19 MP) | ~1590 |
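The estimates above follow directly from the formula; a quick sketch that rounds up to the nearest token (the function name is illustrative):

```python
import math

def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Estimate token usage for an image that needs no resizing: (width × height) / 750."""
    return math.ceil(width_px * height_px / 750)

print(estimate_image_tokens(200, 200))    # 54
print(estimate_image_tokens(1000, 1000))  # 1334
print(estimate_image_tokens(1092, 1092))  # 1590
```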
Limitations of image understanding
This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.
Claude models on Databricks have the following image understanding limitations:
- People identification: Cannot identify or name people in images.
- Accuracy: May misinterpret low-quality, rotated, or very small images (<200 px).
- Spatial reasoning: Struggles with precise layouts, such as reading analog clocks or chess positions.
- Counting: Provides approximate counts, but may be inaccurate for many small objects.
- AI-generated images: Cannot reliably detect synthetic or fake images.
- Inappropriate content: Blocks explicit or policy-violating images.
- Healthcare: Not suited for complex medical scans (for example, CTs and MRIs). It's not a diagnostic tool.
Review all outputs carefully, especially for high-stakes use cases. Avoid using Claude for tasks requiring perfect precision or sensitive analysis without human oversight.