2026, Jan 04 03:00

How to Update Azure OpenAI Finetuned GPT Deployment Capacity via ARM API in Python (and Fix 200 OK No-Change)

Learn how to update Azure OpenAI finetuned GPT deployment capacity via the ARM API in Python and troubleshoot 200 OK no-change responses after a backend fix.

Updating the capacity of a finetuned GPT deployment on Azure via the ARM management API is a routine operation, yet it recently started returning 200 OK without actually changing the capacity. Below is a concise walkthrough of the workflow, the observed behavior, and what changed.

Problem overview

The goal is to change the tokens-per-minute capacity of an existing Azure OpenAI deployment using Python. The approach relies on fetching the current deployment configuration, preserving its properties, and issuing a PUT with an updated sku.capacity. Authentication is performed with a bearer token obtained via az account get-access-token.
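The token can also be obtained programmatically. A minimal sketch, assuming the Azure CLI is installed and you are already logged in, that shells out to `az account get-access-token` and parses the JSON it prints (the helper names are illustrative, not part of any SDK):

```python
import json
import subprocess

def parse_access_token(cli_output: str) -> str:
    """Extract the bearer token from `az account get-access-token` JSON output."""
    return json.loads(cli_output)["accessToken"]

def get_arm_token() -> str:
    """Shell out to the Azure CLI for an ARM management-plane token."""
    result = subprocess.run(
        ["az", "account", "get-access-token",
         "--resource", "https://management.azure.com"],
        capture_output=True, text=True, check=True,
    )
    return parse_access_token(result.stdout)
```

Keeping the JSON parsing in its own function makes the flow easy to unit-test without the CLI present.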

Code example that reproduces the issue

The following script demonstrates the full flow: read the current deployment, modify only the capacity, and send a PUT to update the deployment. Replace placeholders with your values and pass a valid bearer token.

import json
import requests
target_capacity = 3  # 3 means 3000 tokens/minute.
# Auth and resource identifiers
bearer_token = "YOUR_BEARER_TOKEN"  # Use token from `az account get-access-token`
sub_id = ""
rg_name = ""
account_name = ""
deploy_name = ""
# ARM API params and headers
api_params = {"api-version": "2023-05-01"}
req_headers = {
    "Authorization": f"Bearer {bearer_token}",
    "Content-Type": "application/json"
}
# Get current deployment configuration
arm_url = (
    f"https://management.azure.com/subscriptions/{sub_id}/resourceGroups/{rg_name}"
    f"/providers/Microsoft.CognitiveServices/accounts/{account_name}/deployments/{deploy_name}"
)
resp = requests.get(arm_url, params=api_params, headers=req_headers, timeout=30)
if resp.status_code != 200:
    print(f"Failed to get current deployment: {resp.status_code}")
    print(resp.reason)
    try:
        print(resp.json())  # ARM error responses carry a JSON body when present
    except ValueError:
        print(resp.text)
    raise SystemExit(1)
existing_cfg = resp.json()
# Preserve sku.name and properties, update only capacity
patch_body = {
    "sku": {
        "name": existing_cfg["sku"]["name"],
        "capacity": target_capacity
    },
    "properties": existing_cfg["properties"]
}
print("Updating deployment capacity...")
resp = requests.put(
    arm_url,
    params=api_params,
    headers=req_headers,
    json=patch_body,  # requests serializes the body to JSON
    timeout=30,
)
print(f"Status code: {resp.status_code}")
print(f"Reason: {resp.reason}")
try:
    print(resp.json())
except ValueError:
    print(resp.text)

What was going on

The flow above used to complete in seconds and update sku.capacity as expected. It later began returning a 200 OK while the deployment’s capacity remained unchanged, effectively failing silently. The root cause was a bug on the provider side, not an issue in the client code or API usage.

Resolution

The backend bug has been fixed. Updating the capacity of a finetuned GPT model on Azure using this Python approach now works again. No changes to the request shape, API version, or workflow are required; rerunning the same sequence—GET the deployment, keep properties intact, PUT with the new sku.capacity—should apply the capacity update.
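Given that the earlier failure mode was a 200 OK with no actual effect, it is prudent to re-read the deployment after the PUT and confirm the new capacity took hold. A small hypothetical helper working on the same GET response shape used above:

```python
def capacity_applied(deployment_cfg: dict, expected_capacity: int) -> bool:
    """Check whether a GET /deployments/{name} response reflects the requested sku.capacity."""
    sku = deployment_cfg.get("sku") or {}
    return sku.get("capacity") == expected_capacity

# Usage against the GET response from the script above (sketch):
# cfg = requests.get(arm_url, params=api_params, headers=req_headers).json()
# if not capacity_applied(cfg, target_capacity):
#     print("Capacity not yet updated; the change may still be provisioning.")
```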

Why this matters

When managing Azure OpenAI deployments at scale, consistency between control-plane acknowledgments and actual state changes is crucial. A 200 OK that doesn’t reflect the requested update can cause confusion in automation and capacity planning. Recognizing that such behavior can stem from a transient backend issue helps avoid unnecessary refactors on the client side.

Takeaways

The practical path for updating capacity remains the same: authenticate using a bearer token from az account get-access-token, read the current deployment via the ARM endpoint, and submit a PUT with the unchanged properties and an updated sku.capacity. If you ever see 200 OK with no effective change, consider the possibility of a temporary service-side problem before revisiting your code structure.
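In automation, that check can be wrapped in a short poll with backoff so a silent no-change surfaces as a failure rather than passing unnoticed. A sketch, where fetch_deployment stands in for the GET request shown earlier:

```python
import time

def wait_for_capacity(fetch_deployment, expected_capacity: int,
                      attempts: int = 5, delay_s: float = 2.0) -> bool:
    """Poll the deployment until sku.capacity matches, with linear backoff.

    fetch_deployment: zero-argument callable returning the GET response dict.
    Returns True once the capacity is observed, False after exhausting attempts.
    """
    for attempt in range(attempts):
        cfg = fetch_deployment()
        if (cfg.get("sku") or {}).get("capacity") == expected_capacity:
            return True
        time.sleep(delay_s * (attempt + 1))  # back off between polls
    return False
```

Returning a boolean (instead of raising) leaves the caller free to decide whether an unapplied update is fatal for the pipeline.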

With the provider-side fix in place, the workflow above should again reflect the requested capacity changes promptly.