
Claude API — HTTP Interface Reference

Everything you need to make your first Claude API call: authentication, the messages endpoint, rate limits by tier, and which SDK saves the most boilerplate.

Rate-limit reminder

The Claude API enforces both requests-per-minute and tokens-per-minute limits. Both apply simultaneously — hitting the TPM ceiling on a large Opus prompt is the more common blocker for teams that send long contexts frequently.

Authentication and API key handling

Every Claude API request requires an x-api-key header carrying your account's API key. Keys are issued from the Anthropic developer console and have no expiry by default — rotation is a manual process, and best practice is to treat keys as secrets with the same discipline you apply to database passwords. Store them in environment variables or a secrets manager; never paste them into source files.

There is no OAuth flow for the Claude API in standard tiers. The single-header bearer model is straightforward: generate a key, store it, pass it with every request. Enterprise tiers layer on workspace-level key management and audit logging, but the request format stays the same. If you are behind a corporate proxy, the standard HTTPS_PROXY environment variable is honoured by both official SDKs without extra configuration.
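As a concrete sketch of that discipline, a raw-HTTP client might resolve the key at startup like this. ANTHROPIC_API_KEY is the same variable the official SDKs read; the helper name and error message are illustrative.

```python
import os

def load_api_key() -> str:
    # Read the key from the environment, never from source files.
    # ANTHROPIC_API_KEY is the variable the official SDKs also pick up.
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or fetch it "
            "from your secrets manager before starting the process"
        )
    return key
```

Failing loudly at startup is deliberate: a missing key surfaces immediately rather than as a 401 deep inside request handling.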

Core endpoints

The Claude API surface is intentionally narrow. Most developers spend their entire integration lifetime on a single endpoint. The POST /v1/messages route accepts a model identifier, a messages array in chat format, an optional system string, and a max_tokens ceiling. The response is a JSON object with a content array holding the model's reply. That is the whole loop for a non-streaming call.
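For a raw-HTTP caller, that loop can be sketched with the standard library alone. The model identifier below is a placeholder, not a guaranteed-current name — query GET /v1/models for live identifiers — and the anthropic-version header value shown is the commonly documented one; check the current API reference before relying on it.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_message_request(prompt: str,
                          model: str = "claude-sonnet-example",  # placeholder id
                          max_tokens: int = 1024) -> urllib.request.Request:
    # Assemble the non-streaming body: model, messages array, max_tokens ceiling.
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )

# Sending it and reading the reply (network call, shown for shape only):
# with urllib.request.urlopen(build_message_request("Hello")) as resp:
#     reply = json.loads(resp.read())
#     print(reply["content"][0]["text"])
```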

Streaming is enabled by adding "stream": true to the request body. The API then returns server-sent events, each carrying a content delta, until the final message_stop event signals completion. The official Python and Node SDKs wrap this in an async iterator, so you do not parse raw SSE unless you are rolling your own HTTP client. A GET /v1/models endpoint lists available models with their identifiers, which is useful for programmatically selecting the newest available checkpoint without hardcoding a version string.
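If you are rolling your own HTTP client, the per-event work the SDKs hide looks roughly like this: skip non-data lines, decode each JSON payload, accumulate text deltas, and stop at message_stop. The content_block_delta event shape shown here is an assumption to verify against the streaming docs.

```python
import json

def extract_text(sse_lines) -> str:
    # Collect incremental text from raw SSE lines -- a sketch of what the
    # official SDKs' async iterators do for you.
    chunks = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # SSE comments, blank keep-alives, "event:" lines
        event = json.loads(line[len("data: "):])
        if event.get("type") == "content_block_delta":
            chunks.append(event["delta"].get("text", ""))
        elif event.get("type") == "message_stop":
            break  # final event: the reply is complete
    return "".join(chunks)
```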

Endpoints at a glance

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /v1/messages | POST | Send a message and receive a model response |
| /v1/messages (stream) | POST | Same as above with SSE streaming enabled |
| /v1/models | GET | List available model identifiers |
| /v1/messages/batches | POST | Submit an async batch of message requests |
| /v1/messages/batches/{id} | GET | Retrieve batch job status and results |

Rate limits and tier behaviour

The Claude API enforces two independent rate-limit axes: requests per minute and tokens per minute. Both apply simultaneously, and you can hit either ceiling independently. A workload that sends many short prompts might exhaust RPM first; a workload with long contexts will usually hit TPM. Current limits are tiered by account plan — trial accounts start low, and limits scale up as account age and spend history grow.
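A back-of-envelope check makes the two-axis point concrete: multiply request rate by average tokens per request and compare each product against its own ceiling. The limit figures below are illustrative only, not real tier values.

```python
def binding_limit(requests_per_min: float, tokens_per_request: float,
                  rpm_limit: float, tpm_limit: float) -> str:
    # A steady workload draws down both budgets at once; report which runs out.
    rpm_frac = requests_per_min / rpm_limit
    tpm_frac = (requests_per_min * tokens_per_request) / tpm_limit
    if max(rpm_frac, tpm_frac) < 1.0:
        return "neither"
    return "tpm" if tpm_frac >= rpm_frac else "rpm"

# Long-context workload: few requests, many tokens -> TPM binds first.
# binding_limit(20, 50_000, rpm_limit=50, tpm_limit=400_000) -> "tpm"
```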

Opus carries lower RPM limits than Sonnet or Haiku because each Opus request consumes more compute. Teams running mixed-model pipelines often find it effective to route bulk, short tasks to Haiku and reserve the Opus RPM headroom for latency-sensitive deep analysis. The batch endpoint side-steps per-minute limits entirely for async workloads — you submit a batch, poll for completion, and the response arrives without consuming your interactive RPM quota. For academic context on rate-limiting patterns in large inference systems, the NIST AI resource library catalogues relevant published work.
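The submit-then-poll half of that batch workflow can be sketched with a pluggable status fetcher — any callable that GETs /v1/messages/batches/{id} and returns the parsed JSON. The processing_status field name and its "ended" value are assumptions to check against the batch documentation.

```python
import time

def wait_for_batch(fetch_status, poll_seconds: float = 5.0, max_polls: int = 720):
    # Poll until the job reports completion, then hand back the final status.
    # fetch_status: callable performing GET /v1/messages/batches/{id} -> dict.
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("processing_status") == "ended":  # assumed field/value
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("batch did not finish within the polling budget")
```

Injecting the fetcher keeps the loop testable without a network and lets you swap in the official SDK's batch-retrieval call later.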

SDKs and client libraries

Anthropic maintains official SDKs for Python and TypeScript. Both handle retry logic on transient errors (408, 429, 5xx), streaming via async iterators, and type-safe request construction. The Python SDK is available via pip and works with both sync and async codebases. The TypeScript SDK ships as an npm package and integrates cleanly with Next.js, Express, and vanilla Node. If you prefer raw HTTP calls — perhaps in a language without an official SDK — the request format is straightforward JSON over HTTPS and the response is equally simple to parse.

Community-maintained clients exist for Go, Ruby, Java, and Rust. Quality varies, but the Go and TypeScript community variants are widely used in production. The key behaviour to verify in any third-party client is correct retry handling on 529 (overloaded) responses, which the official SDKs back off on exponentially. Missing that behaviour causes thundering-herd effects at scale.
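The backoff behaviour to verify can be sketched in a few lines: full jitter over an exponentially growing window, applied to any retryable status. The status set below mirrors the codes the official SDKs are described above as retrying, plus 529; the callable-returning-a-tuple shape is an assumption for illustration.

```python
import random
import time

RETRYABLE = {408, 429, 500, 502, 503, 529}

def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0):
    # call: a zero-argument function returning (status_code, body).
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        # Full jitter: sleep anywhere in [0, base_delay * 2**attempt) so that
        # many clients retrying at once do not synchronise into a herd.
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status, body  # budget exhausted; surface the last response
```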

"The Claude API reference here gave us a clean mental model before we touched the SDK. We shipped our first integration in an afternoon instead of a day."
— Anouk L. Fontaine, Frontend Lead · Verdemore Atelier · Paris

Frequently asked questions about the Claude API

How do I authenticate with the Claude API?

Pass your API key in the x-api-key request header on every call. Keys are generated in the Anthropic developer console. Store them as environment variables — never in source files. The official SDKs pick up the key automatically from the ANTHROPIC_API_KEY environment variable.

What is the main endpoint for the Claude API?

POST /v1/messages is the primary endpoint. You pass a model identifier, a messages array, and optional system prompt. The response returns a content array with the model output. A streaming variant is available by adding "stream": true to the request body.

What rate limits does the Claude API enforce?

The API enforces both RPM (requests per minute) and TPM (tokens per minute) limits simultaneously. Limits vary by plan tier and model. Opus carries lower RPM limits than Sonnet or Haiku. The batch endpoint bypasses interactive rate limits for async workloads.

Which SDKs are available for the Claude API?

Official SDKs exist for Python and TypeScript/Node.js — both handle retries, streaming, and type-safe requests. Community SDKs cover Go, Ruby, Java, and Rust. For any client, verify that it handles 529 (overloaded) responses with exponential backoff.

Does the Claude API support streaming responses?

Yes. Set "stream": true in the request body and the API returns server-sent events with incremental content deltas. The official Python and TypeScript SDKs wrap the SSE stream in async iterators so you do not need to parse raw events.

Related topics

The API pricing page covers how token costs are calculated, including the prompt caching discount that applies when you re-use a large system prompt. The Claude models overview shows which model identifiers are current and how Opus, Sonnet, and Haiku differ on context and throughput. If you are starting from scratch, the Claude Code CLI provides a higher-level entry point that calls the Claude API on your behalf. The Claude AI free page is worth checking before you build — the free tier gives enough quota to validate a prototype without a billing commitment.

For teams evaluating the full developer stack, the install Claude Code guide sets up the CLI alongside the API key, and the Claude Code skills reference explains how skills can wrap API calls into reusable patterns. The Anthropic Claude page provides brand and model-naming context for teams that need to document their technology choices. For a broader comparison across the model line, see the Claude Opus reference, which details the highest-tier model's trade-offs in depth.

Check current API pricing before you build

Token costs and caching discounts change — the pricing page keeps current figures so your cost model stays accurate.

Open the pricing reference