vipcloud.ai Docs

Build with frontier multi-model LLMs

vipcloud.ai is an OpenAI-compatible gateway for Kimi, Qwen, DeepSeek, MiniMax, and GLM. One API key, USD billing, edge-routed via Cloudflare. Drop into any OpenAI SDK in 30 seconds.

OpenAI drop-in
Same SDK; just swap base_url.
No upstream account
Pay in USD. We handle the upstream accounts.
Edge-routed
Cloudflare edge → Hong Kong origin.

Quickstart

Get a response in three steps.

  1. Get an API key

    Join the waitlist. We'll send you a key of the form vc-sk-... within 24h.

  2. Install the OpenAI SDK
    # Python
    pip install openai
    // Node
    npm install openai
  3. Make your first call
    Python
    from openai import OpenAI
    
    client = OpenAI(
        api_key="vc-sk-...",
        base_url="https://vipcloud.ai/v1",
    )
    
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)
    Node.js
    import OpenAI from "openai";
    
    const client = new OpenAI({
      apiKey: "vc-sk-...",
      baseURL: "https://vipcloud.ai/v1",
    });
    
    const resp = await client.chat.completions.create({
      model: "deepseek-chat",
      messages: [{ role: "user", content: "Hello!" }],
    });
    console.log(resp.choices[0].message.content);
    curl
    curl https://vipcloud.ai/v1/chat/completions \
      -H "Authorization: Bearer vc-sk-..." \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-chat",
        "messages": [{"role":"user","content":"Hello!"}]
      }'

Authentication

Send your API key as a Bearer token. Identical to OpenAI's auth scheme.

Authorization: Bearer vc-sk-...

Keys are scoped to your account. Rotate any time from your dashboard. Never commit them to git — use environment variables (VIPCLOUD_API_KEY) or a secret manager.
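For example, the key can be loaded from the VIPCLOUD_API_KEY environment variable mentioned above. A minimal sketch (the helper name and its validation are ours, not part of the API):

```python
import os

def load_api_key(env_var: str = "VIPCLOUD_API_KEY") -> str:
    """Read the vipcloud.ai key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the client")
    if not key.startswith("vc-sk-"):
        raise RuntimeError(f"{env_var} does not look like a vipcloud.ai key (vc-sk-...)")
    return key

# Then: client = OpenAI(api_key=load_api_key(), base_url="https://vipcloud.ai/v1")
```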

Chat Completions

POST /v1/chat/completions — fully OpenAI-compatible. Same request/response shape.

Request

Field       | Type    | Description
model       | string  | Model alias. See Models.
messages    | array   | Conversation history with role and content.
temperature | number  | 0–2. Default 1.
max_tokens  | integer | Max completion tokens.
stream      | boolean | If true, server-sent events. See Streaming.
tools       | array   | Function calling. Provider support varies — see Models.
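The tools field takes OpenAI-style function-calling schemas. A sketch of a request body using it (the helper name and the get_weather example are illustrative, not part of the API; check Models for which providers honor tools):

```python
def tool_request(model: str, messages: list, tools: list) -> dict:
    """Assemble a /v1/chat/completions body with function-calling tools attached."""
    return {"model": model, "messages": messages, "tools": tools}

# Hypothetical tool definition in OpenAI's function-calling schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = tool_request(
    "deepseek-chat",
    [{"role": "user", "content": "Weather in Paris?"}],
    [weather_tool],
)
```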

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1745604000,
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello!" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11 }
}

Streaming (SSE)

Set stream: true for token-by-token delivery. Server-Sent Events frames are OpenAI-shaped — your existing SDK code works unchanged.

# Python streaming
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

Frames look like data: {"choices":[{"delta":{"content":"Hello"}}]}. The stream ends with data: [DONE]. The final pre-DONE chunk includes usage when the upstream supports it.
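When you are not using an SDK, the frames above can be folded by hand. A minimal sketch (function name is ours) that accumulates the content deltas and captures the final usage object when the upstream sends one; it assumes each frame arrives as a `data: `-prefixed line:

```python
import json

def fold_sse(lines):
    """Fold OpenAI-shaped SSE frames into (full_text, usage_or_None)."""
    parts, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {}).get("content")
            if delta:
                parts.append(delta)
        if chunk.get("usage"):
            usage = chunk["usage"]  # present in the final pre-DONE chunk, if supported
    return "".join(parts), usage
```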

Models

Pass any of these IDs as the model field. We route to the right upstream and translate where needed.

Model ID          | Provider       | Status | Notes
deepseek-chat     | DeepSeek       | live   | V3.2 chat. 64K ctx, function calling.
deepseek-reasoner | DeepSeek       | live   | Reasoning model. 64K ctx.
deepseek-v3.2     | DeepSeek       | live   | Alias of deepseek-chat.
glm-4-flash       | Zhipu BigModel | free   | Permanently free upstream. Powers our demo.
glm-4-air         | Zhipu BigModel | live   | Fast, cheap, 128K ctx.
glm-4-plus        | Zhipu BigModel | live   | Top-tier, 128K ctx, vision-capable.
glm-4             | Zhipu BigModel | live   | General-purpose chat.
kimi-k2           | Moonshot       | soon   | 200K ctx, agentic tools.
qwen3-max         | Alibaba        | soon   | 119 languages, vision.
minimax-text-01   | MiniMax        | soon   | Text + voice + video roadmap.

Live list: GET https://vipcloud.ai/v1/models
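Since some models are still marked "soon", the live list can drive a fallback at startup. A sketch over the OpenAI-shaped /v1/models response (the helper name is ours):

```python
def pick_model(models_response: dict, preferred: list) -> str:
    """Return the first preferred model ID present in the GET /v1/models data."""
    available = {m["id"] for m in models_response.get("data", [])}
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise ValueError(f"None of {preferred} are available")

# With the SDK: pick_model({"data": [{"id": m.id} for m in client.models.list().data]}, [...])
```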

Errors

Errors return a JSON envelope matching OpenAI's shape.

{
  "error": {
    "message": "Invalid API key.",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
Status | Code             | Meaning
400    | bad_request      | Malformed JSON or missing required field.
401    | invalid_api_key  | Missing or wrong Bearer token.
404    | model_not_found  | Unknown model ID. See Models.
429    | rate_limited     | Too many requests. Honor Retry-After.
501    | not_implemented  | Provider/feature not yet wired (e.g. models marked "soon").
502    | upstream_error   | Upstream returned an error. Retry with backoff.
504    | upstream_timeout | Upstream took too long. Streaming requests are capped at 5 minutes by default.
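Given the envelope above, client-side handling can key off the HTTP status: retry 429/502/504, surface the rest. A sketch (names are ours):

```python
# Statuses worth retrying per the error table: rate limits and upstream hiccups.
RETRYABLE_STATUSES = {429, 502, 504}

def describe_error(status: int, body: dict):
    """Return (human-readable message, whether the request is worth retrying)."""
    err = body.get("error", {})
    message = f'{status} {err.get("code", "unknown")}: {err.get("message", "")}'
    return message, status in RETRYABLE_STATUSES
```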

Migrate from OpenAI

Two changes. That's it.

# Before
client = OpenAI(
    api_key="sk-openai-...",
)
client.chat.completions.create(model="gpt-4o", ...)

# After
client = OpenAI(
    api_key="vc-sk-...",
    base_url="https://vipcloud.ai/v1",  # NEW
)
client.chat.completions.create(model="deepseek-chat", ...)  # model swap

Streaming, function calling, and JSON mode work without code changes — we translate provider quirks behind the scenes. Vision and audio are model-specific; check Models.

Rate Limits

Defaults below. Need higher? Email [email protected] with your use case.

Tier               | RPM      | RPD             | Concurrency
Free demo (no key) | 5/min/IP | 2,000/day total | 1
Pay-as-you-go      | 60/min   | unlimited       | 10
Pro / Enterprise   | 600+/min | unlimited       | 100+

429 responses include Retry-After in seconds. Use exponential backoff with jitter.
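A delay schedule that honors Retry-After and otherwise uses exponential backoff with full jitter might look like this (a sketch; the function name and defaults are ours):

```python
import random

def retry_delay(attempt: int, retry_after=None, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to sleep before retry number `attempt` (0-based).

    Uses the server's Retry-After value when present; otherwise picks a
    uniform-random delay in [0, min(cap, base * 2**attempt)] (full jitter).
    """
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```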

Pricing

Per 1M tokens, USD. Indicative — final rates locked when billing goes live (May 2026).

Markup: ~15% over upstream list price. Pre-paid credits, no monthly minimum.

Model             | Input | Output | Notes
glm-4-flash       | $0.00 | $0.00  | free
deepseek-chat     | $0.31 | $1.27  | V3.2, 64K ctx
deepseek-reasoner | $0.63 | $2.53  | reasoning
glm-4-air         | $0.16 | $0.16  | 128K ctx
glm-4-plus        | $8.05 | $8.05  | flagship
kimi-k2           | $0.69 | $2.88  | 200K ctx (soon)
qwen3-max         | $1.38 | $5.52  | multilingual (soon)

Free demo on the homepage uses glm-4-flash. Token usage is reported in the usage object on every response.
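The usage object makes per-request cost estimates straightforward. A sketch using the indicative rates above (rates are not final, and the helper name is ours):

```python
# USD per 1M tokens (input, output), copied from the indicative pricing table.
RATES = {
    "glm-4-flash": (0.00, 0.00),
    "deepseek-chat": (0.31, 1.27),
    "deepseek-reasoner": (0.63, 2.53),
    "glm-4-air": (0.16, 0.16),
}

def estimate_cost_usd(model: str, usage: dict) -> float:
    """Approximate USD cost of one response from its usage object."""
    rate_in, rate_out = RATES[model]
    return (usage["prompt_tokens"] * rate_in
            + usage["completion_tokens"] * rate_out) / 1_000_000
```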

FAQ

Do I need to sign up with each upstream provider?

No. We hold the upstream accounts. You sign up with email, pay in USD by card, get a vc-sk- key. That's it.

Where is the gateway hosted?

Cloudflare edge worldwide, with origin in Hong Kong (Tencent Lighthouse). Median latency under 200ms from US/EU. Tunneled — no public origin IP exposed.

Is OpenAI's full API surface supported?

Today: /v1/chat/completions (streaming + non-streaming) and /v1/models. Embeddings, image gen, and audio are on the roadmap when matching upstreams ship.

What about data residency?

Your prompt is forwarded vipcloud → upstream provider for inference. Each upstream's terms of service apply to your content. We do not cache or train on your data. For sensitive workloads, review each provider's published data policy and pick the one that matches your compliance posture.

Can I self-host or get a dedicated instance?

Yes — Enterprise tier offers dedicated upstream pools, custom rate limits, SLA, and BYO-key (you bring your own DeepSeek/etc. keys). Email [email protected].

What's the SLA?

99.5% gateway uptime targeted (we tunnel via Cloudflare, so edge is theirs). Upstream availability follows each provider — we route around brief outages where possible. Status page coming.