Import an OpenAI-compatible Google Gemini API

APPLIES TO: All API Management tiers

This article shows you how to import an OpenAI-compatible Google Gemini API to access models such as gemini-2.0-flash. For these models, Azure API Management can manage an OpenAI-compatible chat completions endpoint.
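
Because the endpoint is OpenAI-compatible, any OpenAI client library can call it directly. As a minimal sketch (assuming the openai Python package is installed and your Gemini API key is in a GEMINI_API_KEY environment variable), a direct call to the Gemini endpoint looks like this:

    import os
    from openai import OpenAI

    # Point the OpenAI SDK at Gemini's OpenAI-compatible base URL.
    # The SDK sends the key as "Authorization: Bearer <key>" automatically.
    client = OpenAI(
        base_url="https://generativelanguage.googleapis.com/v1beta/openai",
        api_key=os.environ["GEMINI_API_KEY"],
    )

    response = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": "How are you?"}],
    )
    print(response.choices[0].message.content)

Importing the API into API Management puts your gateway in front of this same endpoint.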

Learn more about managing AI APIs in API Management.

Prerequisites

  • An existing API Management instance. Create one if you haven't already.
  • A Gemini API key.

Import an OpenAI-compatible Gemini API using the portal

  1. In the Azure portal, navigate to your API Management instance.

  2. In the left menu, under APIs, select APIs > + Add API.

  3. Under Define a new API, select Language Model API.

    Screenshot of creating a passthrough language model API in the portal.

  4. On the Configure API tab:

    1. Enter a Display name and optional Description for the API.

    2. In URL, enter the following base URL from the Gemini OpenAI compatibility documentation: https://generativelanguage.googleapis.com/v1beta/openai

    3. In Path, append a path that your API Management instance uses to route requests to the Gemini API endpoints.

    4. In Type, select Create OpenAI API.

    5. In Access key, enter the following:

      1. Header name: Authorization.
      2. Header value (key): Bearer followed by a space and your Gemini API key (for example, Bearer <your-gemini-api-key>).

    Screenshot of importing a Gemini LLM API in the portal.

  5. On the remaining tabs, optionally configure policies to manage token consumption, semantic caching, and AI content safety. For details, see Import a language model API.

  6. Select Review.

  7. After settings are validated, select Create.

API Management creates the API and configures the following:

  • A backend resource and a set-backend-service policy that direct API requests to the Google Gemini endpoint.
  • Access to the LLM backend using the Gemini API key you provided. The key is protected as a secret named value in API Management.
  • Optionally, any policies you configured to help you monitor and manage the API.
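
With the API in place, clients call your gateway instead of Google's endpoint, and the gateway attaches the stored Gemini key for you. The following is a minimal sketch using the openai Python package, not part of the article's own steps: the gateway hostname, the path, and the APIM_SUBSCRIPTION_KEY environment variable are placeholders, and it assumes subscription-key authentication is enabled on the API (API Management's default, sent in the Ocp-Apim-Subscription-Key header):

    import os
    from openai import OpenAI

    client = OpenAI(
        # Placeholders: your gateway hostname and the path you set during import.
        base_url="https://<your-apim-instance>.azure-api.net/<your-gemini-path>",
        api_key="unused",  # the gateway injects the Gemini key; this only satisfies the SDK
        default_headers={
            # Assumption: subscription-key auth is enabled on the API.
            "Ocp-Apim-Subscription-Key": os.environ["APIM_SUBSCRIPTION_KEY"],
        },
    )

    response = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "How are you?"},
        ],
        max_tokens=50,
    )
    print(response.choices[0].message.content)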

Test Gemini model

After importing the API, you can test the chat completions endpoint for the API.

  1. Select the API that you created in the previous step.

  2. Select the Test tab.

  3. Select the POST Creates a model response for the given chat conversation operation, which is a POST request to the /chat/completions endpoint.

  4. In the Request body section, enter the following JSON to specify the model and an example prompt. In this example, the gemini-2.0-flash model is used.

    {
        "model": "gemini-2.0-flash",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant"
            },
            {
                "role": "user",
                "content": "How are you?"
            }
        ],
        "max_tokens": 50
    }
    

    When the test is successful, the backend responds with a successful HTTP response code and the model's completion. Token usage data is appended to the response to help you monitor and manage your language model token consumption. To run the same request outside the portal, see the sketch after these steps.

    Screenshot of testing a Gemini LLM API in the portal.
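
You can run the same test from your own machine. This sketch sends the request body shown above using only the Python standard library; the gateway hostname, path, and subscription key header are the same placeholders and assumptions as in the earlier sketch:

    import json
    import os
    import urllib.request

    # Placeholders: your gateway hostname and the path you set during import.
    url = "https://<your-apim-instance>.azure-api.net/<your-gemini-path>/chat/completions"

    body = {
        "model": "gemini-2.0-flash",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "How are you?"},
        ],
        "max_tokens": 50,
    }

    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            # Assumption: subscription-key auth is enabled (the API Management default).
            "Ocp-Apim-Subscription-Key": os.environ["APIM_SUBSCRIPTION_KEY"],
        },
    )

    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)

    print(data["choices"][0]["message"]["content"])
    print(data.get("usage"))  # token usage data appended for monitoring consumption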