Build with frontier multi-model LLMs
vipcloud.ai is an OpenAI-compatible gateway for Kimi, Qwen, DeepSeek, MiniMax, and GLM. One API key, USD billing, edge-routed via Cloudflare. Drop into any OpenAI SDK in 30 seconds.
Quickstart
Get a response in three steps.
1. **Get an API key**

   Join the waitlist. We'll send you a key of the form `vc-sk-...` within 24 hours.
2. **Install the OpenAI SDK**

   ```shell
   # Python
   pip install openai

   # Node
   npm install openai
   ```
3. **Make your first call**

   Python:

   ```python
   from openai import OpenAI

   client = OpenAI(
       api_key="vc-sk-...",
       base_url="https://vipcloud.ai/v1",
   )
   resp = client.chat.completions.create(
       model="deepseek-chat",
       messages=[{"role": "user", "content": "Hello!"}],
   )
   print(resp.choices[0].message.content)
   ```

   Node.js:

   ```javascript
   import OpenAI from "openai";

   const client = new OpenAI({
     apiKey: "vc-sk-...",
     baseURL: "https://vipcloud.ai/v1",
   });
   const resp = await client.chat.completions.create({
     model: "deepseek-chat",
     messages: [{ role: "user", content: "Hello!" }],
   });
   console.log(resp.choices[0].message.content);
   ```

   curl:

   ```shell
   curl https://vipcloud.ai/v1/chat/completions \
     -H "Authorization: Bearer vc-sk-..." \
     -H "Content-Type: application/json" \
     -d '{
       "model": "deepseek-chat",
       "messages": [{"role": "user", "content": "Hello!"}]
     }'
   ```
Authentication
Send your API key as a Bearer token. Identical to OpenAI's auth scheme.
```
Authorization: Bearer vc-sk-...
```
Keys are scoped to your account and can be rotated any time from your dashboard. Never commit them to git; use an environment variable (`VIPCLOUD_API_KEY`) or a secret manager.
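As a sketch of what any client sends on the wire, the same header can be built by hand with only the Python standard library. The key below is a placeholder, read from the environment as recommended above:

```python
import json
import os
import urllib.request

# Read the key from the environment instead of hardcoding it.
api_key = os.environ.get("VIPCLOUD_API_KEY", "vc-sk-placeholder")

req = urllib.request.Request(
    "https://vipcloud.ai/v1/chat/completions",
    data=json.dumps({
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Hello!"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send the request; omitted here.
```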
Chat Completions
POST /v1/chat/completions — fully OpenAI-compatible. Same request/response shape.
Request
| Field | Type | Description |
|---|---|---|
| model | string | Model alias. See Models. |
| messages | array | Conversation history with role and content. |
| temperature | number | 0–2. Default 1. |
| max_tokens | integer | Max completion tokens. |
| stream | boolean | If true, server-sent events. See Streaming. |
| tools | array | Function calling. Provider support varies — see Models. |
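For illustration, here is a hypothetical request body that exercises the optional fields above (the values are arbitrary):

```python
import json

# Illustrative request body combining the optional fields from the table.
body = {
    "model": "deepseek-chat",
    "messages": [
        {"role": "system", "content": "Answer in one word."},
        {"role": "user", "content": "Capital of France?"},
    ],
    "temperature": 0.2,  # 0-2; lower is more deterministic
    "max_tokens": 16,    # cap on completion tokens
    "stream": False,     # True switches the response to SSE
}
payload = json.dumps(body)
```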
Response
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1745604000,
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello!" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11 }
}
```
Streaming (SSE)
Set stream: true for token-by-token delivery. Server-Sent Events frames are OpenAI-shaped — your existing SDK code works unchanged.
```python
# Python streaming
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```
Frames look like `data: {"choices":[{"delta":{"content":"Hello"}}]}`. The stream ends with `data: [DONE]`. The final chunk before `[DONE]` includes `usage` when the upstream supports it.
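For consumers not using an SDK, that framing can be decoded with a short standard-library sketch. The frames below are illustrative; no network is involved:

```python
import json

def parse_sse_content(lines):
    """Collect delta content from OpenAI-shaped SSE frames, stopping at [DONE]."""
    out = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and keep-alives
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content") or ""
        out.append(delta)
    return "".join(out)

frames = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_content(frames))  # Hello
```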
Models
Pass any of these IDs as the model field. We route to the right upstream and translate where needed.
| Model ID | Provider | Status | Notes |
|---|---|---|---|
| deepseek-chat | DeepSeek | live | V3.2 chat. 64K ctx, function calling. |
| deepseek-reasoner | DeepSeek | live | Reasoning model. 64K ctx. |
| deepseek-v3.2 | DeepSeek | live | Alias of deepseek-chat. |
| glm-4-flash | Zhipu BigModel | free | Permanently free upstream. Powers our demo. |
| glm-4-air | Zhipu BigModel | live | Fast, cheap, 128K ctx. |
| glm-4-plus | Zhipu BigModel | live | Top-tier, 128K ctx, vision-capable. |
| glm-4 | Zhipu BigModel | live | General-purpose chat. |
| kimi-k2 | Moonshot | soon | 200K ctx, agentic tools. |
| qwen3-max | Alibaba | soon | 119 languages, vision. |
| minimax-text-01 | MiniMax | soon | Text + voice + video roadmap. |
Live list: GET https://vipcloud.ai/v1/models
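Being OpenAI-compatible, the endpoint should return the usual list envelope (`{"object": "list", "data": [...]}`). A sketch of pulling the IDs out of such a payload, using an abbreviated made-up sample rather than a live call:

```python
def model_ids(models_payload):
    """Return the model IDs from a /v1/models-shaped response."""
    return [m["id"] for m in models_payload["data"]]

# Abbreviated, illustrative sample of what GET /v1/models might return.
sample = {
    "object": "list",
    "data": [
        {"id": "deepseek-chat", "object": "model"},
        {"id": "glm-4-flash", "object": "model"},
    ],
}
print(model_ids(sample))  # ['deepseek-chat', 'glm-4-flash']
```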
Errors
Errors return a JSON envelope matching OpenAI's shape.
```json
{
  "error": {
    "message": "Invalid API key.",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
```
| Status | Code | Meaning |
|---|---|---|
| 400 | bad_request | Malformed JSON or missing required field. |
| 401 | invalid_api_key | Missing or wrong Bearer token. |
| 404 | model_not_found | Unknown model ID. See Models. |
| 429 | rate_limited | Too many requests. Honor the `Retry-After` header. |
| 501 | not_implemented | Provider/feature not yet wired (e.g. soon-models). |
| 502 | upstream_error | Upstream returned an error. Retry with backoff. |
| 504 | upstream_timeout | Upstream took too long. Default 5min cap on streaming. |
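One way to act on the table: parse the envelope and retry only the transient statuses (429, 502, 504). A minimal sketch, with a hypothetical `classify_error` helper:

```python
import json

RETRYABLE = {429, 502, 504}  # rate_limited, upstream_error, upstream_timeout

def classify_error(status, body):
    """Return (code, message, retryable) from an error response body."""
    err = json.loads(body)["error"]
    return err.get("code"), err.get("message"), status in RETRYABLE

code, msg, retry = classify_error(
    401,
    '{"error":{"message":"Invalid API key.",'
    '"type":"invalid_request_error","code":"invalid_api_key"}}',
)
```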
Migrate from OpenAI
Two changes. That's it.
```python
# Before
client = OpenAI(
    api_key="sk-openai-...",
)
client.chat.completions.create(model="gpt-4o", ...)

# After
client = OpenAI(
    api_key="vc-sk-...",
    base_url="https://vipcloud.ai/v1",  # NEW
)
client.chat.completions.create(model="deepseek-chat", ...)  # model swap
```
Streaming, function calling, and JSON mode work without code changes; we translate provider quirks behind the scenes. Vision and audio are model-specific; check Models.
Rate Limits
Defaults below. Need higher? Email [email protected] with your use case.
| Tier | RPM | RPD | Concurrency |
|---|---|---|---|
| Free demo (no key) | 5/min/IP | 2,000 / day total | 1 |
| Pay-as-you-go | 60/min | unlimited | 10 |
| Pro / Enterprise | 600+/min | unlimited | 100+ |
429 responses include Retry-After in seconds. Use exponential backoff with jitter.
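A minimal delay schedule implementing that advice, as a sketch: exponential backoff with full jitter, deferring to `Retry-After` when the server sends one:

```python
import random

def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    # Full jitter: uniform between 0 and the capped exponential step.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```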
Pricing
Per 1M tokens, USD. Indicative — final rates locked when billing goes live (May 2026).
Markup: ~15% over upstream list price. Pre-paid credits, no monthly minimum.
| Model | Input | Output | Notes |
|---|---|---|---|
| glm-4-flash | $0.00 | $0.00 | free |
| deepseek-chat | $0.31 | $1.27 | V3.2, 64K ctx |
| deepseek-reasoner | $0.63 | $2.53 | reasoning |
| glm-4-air | $0.16 | $0.16 | 128K ctx |
| glm-4-plus | $8.05 | $8.05 | flagship |
| kimi-k2 | $0.69 | $2.88 | 200K ctx (soon) |
| qwen3-max | $1.38 | $5.52 | multilingual (soon) |
Free demo on the homepage uses glm-4-flash. Token usage is reported in the usage object on every response.
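The cost of a call follows directly from the `usage` object and the per-1M-token rates above (indicative figures, as noted):

```python
def call_cost(usage, input_per_m, output_per_m):
    """USD cost of one call from its usage object and per-1M-token rates."""
    return (usage["prompt_tokens"] * input_per_m
            + usage["completion_tokens"] * output_per_m) / 1_000_000

# Example: the quickstart response (9 tokens in, 2 out) on deepseek-chat
# at the indicative $0.31 / $1.27 rates.
usage = {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}
cost = call_cost(usage, 0.31, 1.27)
```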
FAQ
Do I need to sign up with each upstream provider?
No. We hold the upstream accounts. You sign up with email, pay in USD by card, get a vc-sk- key. That's it.
Where is the gateway hosted?
Cloudflare edge worldwide, with origin in Hong Kong (Tencent Lighthouse). Median latency under 200ms from US/EU. Tunneled — no public origin IP exposed.
Is OpenAI's full API surface supported?
Today: /v1/chat/completions (streaming + non-streaming) and /v1/models. Embeddings, image gen, and audio are on the roadmap when matching upstreams ship.
What about data residency?
Your prompt is forwarded vipcloud → upstream provider for inference. Each upstream's terms of service apply to your content. We do not cache or train on your data. For sensitive workloads, review each provider's published data policy and pick the one that matches your compliance posture.
Can I self-host or get a dedicated instance?
Yes — Enterprise tier offers dedicated upstream pools, custom rate limits, SLA, and BYO-key (you bring your own DeepSeek/etc. keys). Email [email protected].
What's the SLA?
99.5% gateway uptime targeted (we tunnel via Cloudflare, so edge is theirs). Upstream availability follows each provider — we route around brief outages where possible. Status page coming.