Responses API in Azure OpenAI

Muhammad Umer 0 Reputation points
2025-08-04T21:36:43.8766667+00:00

Hi

I have been banging my head for a couple of days trying to figure out what I am missing. Azure recently announced support for the Responses API on Azure-hosted OpenAI models.

I have a GPT-4o model deployed, and I'm trying to use the following code to query it via the Responses API, but I always get a 404 "Deployment not found" error.

If I switch to Chat Completions, it works fine.

I am now at a point where I have no clue what's wrong. It is publicly stated that this model supports the Responses API.

      const modelConfig = {
        endpoint: 'https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/',
        azureADTokenProvider,
        deployment: deploymentName,
        apiVersion: this.config.apiVersion,
      };

      this.client = new AzureOpenAI(modelConfig);

this.client.responses.create({
  // model: 'gpt-4o', // Empty when using Azure OpenAI deployment
  input: request.input,
  text: promptSchema,
  instructions: this.config.systemPrompt,
  max_output_tokens:
    request.max_output_tokens || this.config.model.maxResponseTokens,
  temperature: request.temperature || this.config.temperature,
});


Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. Jerald Felix 4,450 Reputation points
    2025-08-05T00:47:11.6433333+00:00

    Hey Muhammad Umer!

    It looks like you’re digging into Azure OpenAI’s brand-new Responses API. Below is a quick primer on what it is, why you might choose it over plain Chat Completions or the (soon-to-retire) Assistants API, plus a ready-to-paste code snippet and a few gotchas that trip folks up on day one.


    1. What the heck is the Responses API?
    • Stateful by design. Instead of re-sending the entire conversation on every turn, you post input once and Azure returns a response_id. Pass that ID as previous_response_id in the next call and the service stitches the history together for you. Conversation state lives for 30 days (or until you delete() it).

    • One API ≈ two APIs. It merges the best bits of Chat Completions and the Assistants API (tool calling, JSON-formatted function output, vector-store search, streaming, even the new computer-use-preview model) under a single /responses route.

    • Token savings. Because you’re not shipping the full transcript every time, you often cut prompt-side tokens by 70-90%.

    • Preview-only (for now). You must hit the api-version=preview endpoint and use an SDK release dated May 2025 or later. Older clients will 404.
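    To make the stateful design concrete, here is a minimal sketch of a conversation wrapper. This is a hypothetical helper class, not part of any SDK: `client` is assumed to be an `AzureOpenAI` instance from the `openai` Python SDK, and `client.responses.delete()` is assumed available for cleaning up state before the 30-day expiry.

```python
from typing import Optional


class Conversation:
    """Minimal stateful wrapper: only the last response ID is kept
    locally; the transcript itself lives server-side for up to 30 days.
    (Hypothetical helper; `client` is duck-typed, e.g. openai.AzureOpenAI.)"""

    def __init__(self, client, deployment: str):
        self.client = client
        self.deployment = deployment
        self.last_id: Optional[str] = None

    def say(self, text: str) -> str:
        kwargs = {"model": self.deployment, "input": text}
        if self.last_id:  # chain onto the stored server-side history
            kwargs["previous_response_id"] = self.last_id
        resp = self.client.responses.create(**kwargs)
        self.last_id = resp.id
        return resp.output_text

    def end(self) -> None:
        """Delete server-side conversation state before the 30-day expiry."""
        if self.last_id:
            self.client.responses.delete(self.last_id)
            self.last_id = None
```

    Each turn ships only the new user text plus a response ID, which is where the token savings come from.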


    2. Where and with which models can I use it?

    Regions (Aug 2025): australiaeast, eastus, eastus2, francecentral, japaneast, norwayeast, polandcentral, southindia, swedencentral, switzerlandnorth, uaenorth, uksouth, westus, westus3
    Core text models*: gpt-4o (2024-11-20 / 08-06 / 05-13), gpt-4.1, gpt-4o-mini, gpt-4.1-nano/mini, o1, o3, o3-mini, o4-mini
    Special models: computer-use-preview, gpt-image-1

    *Not every model is lit up in every region, so always check your portal first.
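    One quick way to check what your resource actually exposes is to list models programmatically. A sketch (the `list_model_ids` helper is hypothetical; `client.models.list()` is a standard `openai` SDK call):

```python
def list_model_ids(client):
    """Return the IDs of every model the resource reports.
    `client` is any object exposing .models.list(), e.g. openai.AzureOpenAI."""
    return [m.id for m in client.models.list()]


if __name__ == "__main__":
    # Assumes the openai + azure-identity packages and a real resource.
    from openai import AzureOpenAI
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider

    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    client = AzureOpenAI(
        base_url="https://<your-resource>.openai.azure.com/openai/v1/",
        azure_ad_token_provider=token_provider,
        api_version="preview",
    )
    print(list_model_ids(client))
```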


    3. Quick-start (Python, SDK ≥ 1.23.0)
    python
    from openai import AzureOpenAI
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    
    # 1️⃣ Auth
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    client = AzureOpenAI(
        base_url="https://<your-resource>.openai.azure.com/openai/v1/",
        azure_ad_token_provider=token_provider,
        api_version="preview"
    )
    
    # 2️⃣ First turn – create a response
    first = client.responses.create(
        model="gpt-4o",            # your deployment name
        input="Summarise the plot of Dune in a single sentence."
    )
    print(first.output_text)       # ➜  “In a desert future…”
    
    # 3️⃣ Follow-up turn – reference the previous response
    second = client.responses.create(
        model="gpt-4o",
        previous_response_id=first.id,
        input=[{
            "role": "user",
            "content": "Now explain why spice is so valuable."
        }]
    )
    print(second.output_text)
    

    Tip: If you store first.id somewhere (Redis, Cosmos DB…), you can pick up the conversation later without re-feeding the transcript.
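    The tip above works with any key-value store; here is a minimal file-based stand-in (hypothetical helper names, standard library only):

```python
import json
from pathlib import Path

STATE_FILE = Path("conversation_state.json")  # stand-in for Redis/Cosmos DB


def save_last_response_id(conversation_key: str, response_id: str) -> None:
    """Persist the latest response ID so the conversation can resume later."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    state[conversation_key] = response_id
    STATE_FILE.write_text(json.dumps(state))


def load_last_response_id(conversation_key: str):
    """Return the stored response ID, or None if this conversation is new."""
    if not STATE_FILE.exists():
        return None
    return json.loads(STATE_FILE.read_text()).get(conversation_key)
```

    On the next turn, pass the loaded ID as previous_response_id — keeping in mind the 30-day retention limit mentioned above.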


    4. Known limitations (August 2025)
    • Web Search tool isn’t wired up yet on Azure (still OpenAI-only).

    • Image generation editing/streaming: multi-turn edits arrive “soon.”

    • File upload: PDFs are okay, but user_data uploads and raw image-file inputs are still blocked.

    • Retention: 30-day hard limit; export before that if you need long-tail analytics.

    • SDK parity: Python, Java, and .NET already support responses. Node’s “AzureOpenAI” client gets it in v1.11; until then, call REST or pass extra_headers={"ms-azure-ai-type": "openai"}.
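    If your SDK doesn’t expose responses yet, you can hit the route over plain REST. A sketch of the request shape, reusing the /openai/v1/ base path and api-version=preview from the quick-start (the `build_responses_request` helper is hypothetical; the api-key header assumes key auth — swap in a bearer token for Entra ID):

```python
import json
import urllib.request


def build_responses_request(resource: str, api_key: str, deployment: str, text: str):
    """Construct (but do not send) a raw Responses API request.
    URL shape follows the /openai/v1/ base path shown in the quick-start."""
    url = f"https://{resource}.openai.azure.com/openai/v1/responses?api-version=preview"
    body = json.dumps({"model": deployment, "input": text}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Send with: urllib.request.urlopen(build_responses_request(...))
```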


    5. When should I not use the Responses API?

    Use case → Better choice
    One-shot Q&A or stateless function call → Chat Completions (less overhead)
    Need first-class vector search & RAG orchestration today → Assistants API (until mid-2026)
    Heavy multistep tool invocation with long-lived memory across many threads → Wait for Agents SDK + Responses GA (slated 1H 2026)

    6. Troubleshooting checklist

    • 404 or 409 “OperationNotAllowed” – ensure you’re in a Responses-enabled region and your resource has the preview feature registered.

    • previous_response_id ignored – are you passing it and re-using the same model deployment? Cross-model chaining isn’t supported yet.

    • Tool call schema errors still show up as 400/500. Validate with pydantic locally before sending.

    • Slow first token on o4-mini? Enable stream=True to pipe bytes while the full JSON builds.
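    For that last point, a sketch of consuming a Responses stream (the `collect_stream_text` helper is hypothetical; the `response.output_text.delta` event type follows the OpenAI streaming format — verify the exact names against your SDK version):

```python
def collect_stream_text(events) -> str:
    """Accumulate text deltas from a Responses streaming iterator, e.g.
    events = client.responses.create(model="gpt-4o", input="...", stream=True).
    Deltas could equally be flushed to stdout as they arrive for a
    typewriter effect instead of being buffered."""
    chunks = []
    for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            chunks.append(event.delta)
    return "".join(chunks)
```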


    Hope this clears up the mystery around the Responses API and gets you building quickly. Shout if you hit any other snags!

    Best regards,

    Jerald Felix

