Build with frontier multi-model LLMs
vipcloud.ai is an OpenAI-compatible gateway for Kimi, Qwen, DeepSeek, MiniMax, and GLM. One API key, USD billing, edge-routed via Cloudflare. Drop into any OpenAI SDK in 30 seconds.
Quickstart
Get a response in three steps.
1. **Get an API key**

   Join the waitlist. We'll send you a key of the form `vc-sk-...` within 24 hours.
2. **Install the OpenAI SDK**

   ```shell
   # Python
   pip install openai

   # Node
   npm install openai
   ```
3. **Make your first call**

   Python:

   ```python
   from openai import OpenAI

   client = OpenAI(
       api_key="vc-sk-...",
       base_url="https://vipcloud.ai/v1",
   )
   resp = client.chat.completions.create(
       model="deepseek-chat",
       messages=[{"role": "user", "content": "Hello!"}],
   )
   print(resp.choices[0].message.content)
   ```

   Node.js:

   ```javascript
   import OpenAI from "openai";

   const client = new OpenAI({
     apiKey: "vc-sk-...",
     baseURL: "https://vipcloud.ai/v1",
   });
   const resp = await client.chat.completions.create({
     model: "deepseek-chat",
     messages: [{ role: "user", content: "Hello!" }],
   });
   console.log(resp.choices[0].message.content);
   ```

   curl:

   ```shell
   curl https://vipcloud.ai/v1/chat/completions \
     -H "Authorization: Bearer vc-sk-..." \
     -H "Content-Type: application/json" \
     -d '{
       "model": "deepseek-chat",
       "messages": [{"role": "user", "content": "Hello!"}]
     }'
   ```
Authentication
Send your API key as a Bearer token. Identical to OpenAI's auth scheme.
```
Authorization: Bearer vc-sk-...
```
Keys are scoped to your account and can be rotated any time from your dashboard. Never commit them to git; use an environment variable (`VIPCLOUD_API_KEY`) or a secret manager.
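As a sketch of what any client sends on the wire, the same header can be built by hand with only the Python standard library. The key below is a placeholder, read from the environment as recommended above:

```python
import json
import os
import urllib.request

# Read the key from the environment instead of hardcoding it.
api_key = os.environ.get("VIPCLOUD_API_KEY", "vc-sk-placeholder")

req = urllib.request.Request(
    "https://vipcloud.ai/v1/chat/completions",
    data=json.dumps({
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send the request; omitted here.
```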
Chat Completions
POST /v1/chat/completions — fully OpenAI-compatible. Same request/response shape.
Request
| Field | Type | Description |
|---|---|---|
| model | string | Model alias. See Models. |
| messages | array | Conversation history with role and content. |
| temperature | number | 0–2. Default 1. |
| max_tokens | integer | Max completion tokens. |
| stream | boolean | If true, server-sent events. See Streaming. |
| tools | array | Function calling. Provider support varies — see Models. |
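For illustration, here is a hypothetical request body that exercises the optional fields above (the values are arbitrary):

```python
import json

# Illustrative request body combining the optional fields from the table.
body = {
    "model": "deepseek-chat",
    "messages": [
        {"role": "system", "content": "Answer in one word."},
        {"role": "user", "content": "Capital of France?"},
    ],
    "temperature": 0.2,  # 0-2; lower is more deterministic
    "max_tokens": 16,    # cap on completion tokens
    "stream": False,     # True switches the response to SSE
}
payload = json.dumps(body)
```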
Response
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1745604000,
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello!" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11 }
}
```
Streaming (SSE)
Set stream: true for token-by-token delivery. Server-Sent Events frames are OpenAI-shaped — your existing SDK code works unchanged.
```python
# Python streaming
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```
Frames look like `data: {"choices":[{"delta":{"content":"Hello"}}]}`. The stream ends with `data: [DONE]`. The final chunk before `[DONE]` includes `usage` when the upstream supports it.
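For consumers not using an SDK, that framing can be decoded with a short standard-library sketch. The frames below are illustrative; no network is involved:

```python
import json

def parse_sse_content(lines):
    """Collect delta content from OpenAI-shaped SSE frames, stopping at [DONE]."""
    out = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and keep-alives
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content") or ""
        out.append(delta)
    return "".join(out)

frames = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_content(frames))  # Hello
```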
Models
Pass any of these IDs as the model field. We route to the right upstream and translate where needed.
| Model ID | Provider | Status | Notes |
|---|---|---|---|
| deepseek-chat | DeepSeek | live | V3.2 chat. 64K ctx, function calling. |
| deepseek-reasoner | DeepSeek | live | Reasoning model. 64K ctx. |
| deepseek-v3.2 | DeepSeek | live | Alias of deepseek-chat. |
| glm-4-flash | Zhipu BigModel | free | Permanently free upstream. Powers our demo. |
| glm-4-air | Zhipu BigModel | live | Fast, cheap, 128K ctx. |
| glm-4-plus | Zhipu BigModel | live | Top-tier, 128K ctx, vision-capable. |
| glm-4 | Zhipu BigModel | live | General-purpose chat. |
| kimi-k2 | Moonshot | soon | 200K ctx, agentic tools. |
| qwen3-max | Alibaba | soon | 119 languages, vision. |
| minimax-text-01 | MiniMax | soon | Text + voice + video roadmap. |
Live list: GET https://vipcloud.ai/v1/models
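Being OpenAI-compatible, the endpoint should return the usual list envelope (`{"object": "list", "data": [...]}`). A sketch of pulling the IDs out of such a payload, using an abbreviated made-up sample rather than a live call:

```python
def model_ids(models_payload):
    """Return the model IDs from a /v1/models-shaped response."""
    return [m["id"] for m in models_payload["data"]]

# Abbreviated, illustrative sample of what GET /v1/models might return.
sample = {
    "object": "list",
    "data": [
        {"id": "deepseek-chat", "object": "model"},
        {"id": "glm-4-flash", "object": "model"},
    ],
}
print(model_ids(sample))  # ['deepseek-chat', 'glm-4-flash']
```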
Errors
Errors return a JSON envelope matching OpenAI's shape.
```json
{
  "error": {
    "message": "Invalid API key.",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
```
| Status | Code | Meaning |
|---|---|---|
| 400 | bad_request | Malformed JSON or missing required field. |
| 401 | invalid_api_key | Missing or wrong Bearer token. |
| 404 | model_not_found | Unknown model ID. See Models. |
| 429 | rate_limited | Too many requests. Honor the `Retry-After` header. |
| 501 | not_implemented | Provider/feature not yet wired (e.g. soon-models). |
| 502 | upstream_error | Upstream returned an error. Retry with backoff. |
| 504 | upstream_timeout | Upstream took too long. Default 5min cap on streaming. |
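One way to act on the table: parse the envelope and retry only the transient statuses (429, 502, 504). A minimal sketch, with a hypothetical `classify_error` helper:

```python
import json

RETRYABLE = {429, 502, 504}  # rate_limited, upstream_error, upstream_timeout

def classify_error(status, body):
    """Return (code, message, retryable) from an error response body."""
    err = json.loads(body)["error"]
    return err.get("code"), err.get("message"), status in RETRYABLE

code, msg, retry = classify_error(
    401,
    '{"error":{"message":"Invalid API key.",'
    '"type":"invalid_request_error","code":"invalid_api_key"}}',
)
```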
Migrate from OpenAI
Two changes. That's it.
```python
# Before
client = OpenAI(
    api_key="sk-openai-...",
)
client.chat.completions.create(model="gpt-4o", ...)

# After
client = OpenAI(
    api_key="vc-sk-...",
    base_url="https://vipcloud.ai/v1",  # NEW
)
client.chat.completions.create(model="deepseek-chat", ...)  # model swap
```
Streaming, function calling, and JSON mode work without code changes; we translate provider quirks behind the scenes. Vision and audio are model-specific; check Models.
Rate Limits
Defaults below. Need higher? Email [email protected] with your use case.
| Tier | RPM | RPD | Concurrency |
|---|---|---|---|
| Free demo (no key) | 5/min/IP | 2,000 / day total | 1 |
| Pay-as-you-go | 60/min | unlimited | 10 |
| Pro / Enterprise | 600+/min | unlimited | 100+ |
429 responses include Retry-After in seconds. Use exponential backoff with jitter.
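A minimal delay schedule implementing that advice, as a sketch: exponential backoff with full jitter, deferring to `Retry-After` when the server sends one:

```python
import random

def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    # Full jitter: uniform between 0 and the capped exponential step.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```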
Pricing
Per 1M tokens, USD. Indicative — final rates locked when billing goes live (May 2026).
Markup: ~15% over upstream list price. Pre-paid credits, no monthly minimum.
| Model | Input | Output | Notes |
|---|---|---|---|
| glm-4-flash | $0.00 | $0.00 | free |
| deepseek-chat | $0.31 | $1.27 | V3.2, 64K ctx |
| deepseek-reasoner | $0.63 | $2.53 | reasoning |
| glm-4-air | $0.16 | $0.16 | 128K ctx |
| glm-4-plus | $8.05 | $8.05 | flagship |
| kimi-k2 | $0.69 | $2.88 | 200K ctx (soon) |
| qwen3-max | $1.38 | $5.52 | multilingual (soon) |
Free demo on the homepage uses glm-4-flash. Token usage is reported in the usage object on every response.
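The cost of a call follows directly from the `usage` object and the per-1M-token rates above (indicative figures, as noted):

```python
def call_cost(usage, input_per_m, output_per_m):
    """USD cost of one call from its usage object and per-1M-token rates."""
    return (usage["prompt_tokens"] * input_per_m
            + usage["completion_tokens"] * output_per_m) / 1_000_000

# Example: the quickstart response (9 tokens in, 2 out) on deepseek-chat
# at the indicative $0.31 / $1.27 rates.
usage = {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}
cost = call_cost(usage, 0.31, 1.27)
```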
FAQ
Do I need to sign up with each upstream provider?
No. We hold the upstream accounts. You sign up with email, pay in USD by card, get a vc-sk- key. That's it.
Where is the gateway hosted?
Cloudflare edge worldwide, with origin in Hong Kong (Tencent Lighthouse). Median latency under 200ms from US/EU. Tunneled — no public origin IP exposed.
Is OpenAI's full API surface supported?
Today: /v1/chat/completions (streaming + non-streaming) and /v1/models. Embeddings, image gen, and audio are on the roadmap when matching upstreams ship.
What about data residency?
Your prompt is forwarded vipcloud → upstream provider for inference. Each upstream's terms of service apply to your content. We do not cache or train on your data. For sensitive workloads, review each provider's published data policy and pick the one that matches your compliance posture.
Can I self-host or get a dedicated instance?
Yes — Enterprise tier offers dedicated upstream pools, custom rate limits, SLA, and BYO-key (you bring your own DeepSeek/etc. keys). Email [email protected].
What's the SLA?
99.5% gateway uptime targeted (we tunnel via Cloudflare, so edge is theirs). Upstream availability follows each provider — we route around brief outages where possible. Status page coming.