Hey Muhammad Umer!
It looks like you’re digging into Azure OpenAI’s brand-new Responses API. Below is a quick primer on what it is, why you might choose it over plain Chat Completions or the (soon-to-retire) Assistants API, plus a ready-to-paste code snippet and a few gotchas that trip folks up on day one.
- What the heck is the Responses API?
- Stateful by design. Instead of re-sending the entire conversation on every turn, you post `input` once and Azure returns a `response_id`. Pass that ID as `previous_response_id` in the next call and the service stitches the history together for you. Conversation state lives for 30 days (or until you `delete()` it).
- One API ≈ two APIs. It merges the best bits of Chat Completions and the Assistants API—tool calling, JSON-formatted function output, vector-store search, streaming, even the new computer-use-preview model—under a single `/responses` route.
- Token savings. Because you’re not shipping the full transcript every time, you often cut prompt-side tokens by 70-90%.
- Preview-only (for now). You must hit the `api-version=preview` endpoint and use an SDK release dated May 2025 or later. Older clients will 404.
- Where and with which models can I use it?
| Region (Aug 2025) | Core text models* | Special models |
|---|---|---|
| australiaeast, eastus, eastus2, francecentral, japaneast, norwayeast, polandcentral, southindia, swedencentral, switzerlandnorth, uaenorth, uksouth, westus, westus3 | gpt-4o (2024-11-20 / 08-06 / 05-13), gpt-4.1, gpt-4o-mini, gpt-4.1-nano/mini, o1, o3, o3-mini, o4-mini | computer-use-preview, gpt-image-1 |
*Not every model is lit up in every region, so always check your portal first.
- Quick-start (Python, SDK ≥ 1.23.0)
```python
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# 1️⃣ Auth
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    base_url="https://<your-resource>.openai.azure.com/openai/v1/",
    azure_ad_token_provider=token_provider,
    api_version="preview",
)

# 2️⃣ First turn – create a response
first = client.responses.create(
    model="gpt-4o",  # your deployment name
    input="Summarise the plot of Dune in a single sentence."
)
print(first.output_text)  # ➜ “In a desert future…”

# 3️⃣ Follow-up turn – reference the previous response
second = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input=[{
        "role": "user",
        "content": "Now explain why spice is so valuable."
    }]
)
print(second.output_text)
```
Tip: If you store `first.id` somewhere (Redis, Cosmos DB…), you can pick up the conversation later without re-feeding the transcript.
- Known limitations (August 2025)
- Web Search tool isn’t wired up yet on Azure (still OpenAI-only).
- Image generation editing/streaming: multi-turn edits arrive “soon.”
- File upload: PDFs are okay, but `user_data` uploads and raw image-file inputs are still blocked.
- Retention: 30-day hard limit; export before that if you need long-tail analytics.
- SDK parity: Python, Java, and .NET already support `responses`. Node’s `AzureOpenAI` client gets it in v1.11; until then, call REST or pass `extra_headers={"ms-azure-ai-type": "openai"}`.
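If you do fall back to raw REST, the request shape looks roughly like this. This is a sketch only: the URL pattern is an assumption mirrored from the `base_url` in the quick-start above, and the `api-key` header is the non-Entra auth option, so verify both against your resource before relying on them.

```python
import json


def build_responses_request(resource: str, api_key: str, body: dict) -> tuple[str, dict, str]:
    """Assemble URL, headers, and JSON payload for a raw POST to /responses.

    Assumed URL shape, mirroring the quick-start's base_url.
    """
    url = f"https://{resource}.openai.azure.com/openai/v1/responses?api-version=preview"
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers, json.dumps(body)


url, headers, payload = build_responses_request(
    "my-resource", "<key>", {"model": "gpt-4o", "input": "Hello"}
)
# POST with any HTTP client, e.g.:
# requests.post(url, headers=headers, data=payload)
```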
- When should I not use Responses API?
| Use case | Better choice |
|---|---|
| One-shot Q&A or stateless function call | Chat Completions (less overhead) |
| Need first-class vector search & RAG orchestration today | Assistants API (until mid-2026) |
| Heavy multistep tool invocation with long-lived memory across many threads | Wait for Agents SDK + Responses GA (slated 1H 2026) |
- Troubleshooting checklist
- 404 or 409 “OperationNotAllowed” – ensure you’re in a Responses-enabled region and your resource has the preview feature registered.
- `previous_response_id` ignored – are you passing it and re-using the same model deployment? Cross-model chaining isn’t supported yet.
- Tool-call schema errors still show up as 400/500. Validate with `pydantic` locally before sending.
- Slow first token on `o4-mini`? Enable `stream=True` to pipe bytes while the full JSON builds.
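On the streaming tip, here is a sketch of the consumer side. With `stream=True` the Python SDK yields typed events; the accumulator below assumes the `response.output_text.delta` event type used by the current openai SDK and stitches the deltas back into the full reply:

```python
def accumulate_stream(events) -> str:
    """Concatenate text deltas from a Responses stream into the full reply.

    Assumes each delta event has type "response.output_text.delta"
    and a .delta attribute; other event types are ignored.
    """
    parts: list[str] = []
    for event in events:
        if getattr(event, "type", "") == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Real usage (sketch):
# stream = client.responses.create(model="o4-mini", input="...", stream=True)
# reply = accumulate_stream(stream)
```

Printing each delta as it arrives (instead of collecting them) is what gives users the fast first token.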
Hope this clears up the mystery around the Responses API and gets you building quickly. Shout if you hit any other snags!
Best regards,
Jerald Felix