The online version runs Naia's own models in the cloud, so you can talk to Naia from a browser or app without your own GPU.
⏳ The full cloud service is planned (coming soon). For now, use the 4.4 Offline version (your own GPU) and the free 4.3 Live Demo. The developer API described under "How it works" below is the same gateway Realtime API that the demo already uses.
Plan
- Tier B — rented cloud GPU (planned): $0.33 / hour (planned price, billed per active minute); time-reservation under review. Not available yet.
- A full online service opens once a GPU pool is secured. (Cloud capacity on RunPod/Vast is temporarily suspended for preparation and experimentation; the demo and the offline version run on local GPUs.)
- Until then, the main paths are the offline (owned) $10/month subscription for all Naia models (individuals only; B2B by separate arrangement) + the free demo.
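To make the planned Tier B rate concrete, here is a minimal arithmetic sketch. It assumes billing counts whole active minutes at 1/60 of the planned hourly price; the rounding of partial minutes and of the final amount is not stated above, so the function name and the whole-minute input are illustrative assumptions only.

```python
from decimal import Decimal

# Planned price from the list above; not a live tariff.
PLANNED_HOURLY_USD = Decimal("0.33")


def tier_b_cost_usd(active_minutes: int) -> Decimal:
    """Estimated cost for a number of whole active minutes (hypothetical helper)."""
    if active_minutes < 0:
        raise ValueError("active_minutes must be non-negative")
    # Per-minute billing: hourly price * minutes / 60, kept exact with Decimal.
    return PLANNED_HOURLY_USD * active_minutes / 60
```

Under these assumptions, 45 active minutes come to 0.33 × 45 / 60 = $0.2475, and a full hour comes to $0.33.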
Models
- naia-0.9-omni-24g (available now — offline / demo)
- naia-0.9-coding-24g · naia-0.9-omni-48g (planned)
Full pricing & lineup: 4.1 Model pricing.
How it works — gateway Realtime API (developers)
The gateway is the shared entry point for online (cloud), the web demo, and naia-os. naia-0.9-omni-24g is served over an OpenAI Realtime API–compatible WebSocket. The client connects to the gateway, which admits it onto a backend (a local GPU slot or, when cloud is enabled, a cloud Pod).
1. Issue an API key
Create one on the dashboard API Keys page. (The live demo auto-issues a short-lived key.)
2. Connect + authenticate
Endpoint (prod):
wss://gateway.nextain.io/v1/realtime?model=naia-0.9-omni-24g&instance=<userId>:<random>
- Always connect over wss:// (TLS); your key and audio travel over it.
- instance = a stable session id. Reusing it across reconnects re-attaches to the same reservation. (Native clients may send X-Naia-OS-Instance; browsers can't send headers, so use the query param.)
Right after the socket opens, send credentials as the first message (browsers can't send headers):
{ "setup": { "apiKey": "<your key>", "locale": "en", "instanceId": "<userId>:<random>" } }
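The connect step can be sketched as two small builders: one for the endpoint URL and one for the first message. This is a minimal sketch, not the gateway's client library; the helper names and the choice of 8 random bytes for the `<random>` part are assumptions. The instance id should be generated once and reused on every reconnect.

```python
import json
import secrets
from urllib.parse import urlencode

GATEWAY_URL = "wss://gateway.nextain.io/v1/realtime"  # prod endpoint from above
MODEL = "naia-0.9-omni-24g"


def make_instance_id(user_id: str) -> str:
    """Build '<userId>:<random>'; the random length is an assumption."""
    return f"{user_id}:{secrets.token_hex(8)}"


def build_realtime_url(instance_id: str, model: str = MODEL) -> str:
    """Endpoint with the instance passed as a query param (works in browsers too)."""
    # safe=":" keeps the colon literal so the URL matches the documented form.
    query = urlencode({"model": model, "instance": instance_id}, safe=":")
    return f"{GATEWAY_URL}?{query}"


def build_setup_message(api_key: str, instance_id: str, locale: str = "en") -> str:
    """First message to send right after the socket opens."""
    return json.dumps(
        {"setup": {"apiKey": api_key, "locale": locale, "instanceId": instance_id}}
    )
```

A client would open `build_realtime_url(instance_id)` with any WebSocket library, send `build_setup_message(...)` as the first frame, and keep `instance_id` for later reconnects.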
3. Configure the session — persona / reference voice
After the server sends session.created, configure with session.update.
{
  "type": "session.update",
  "model": "naia-0.9-omni-24g",
  "session": {
    "modalities": ["text", "audio"],
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "instructions": "<persona instructions>",
    "turn_detection": { "type": "server_vad" },
    "ref_audio_url": "<reference voice sample URL (optional)>"
  }
}
- instructions = persona only. Output formatting is enforced by the server.
- ref_audio_url = a URL of a reference voice to mimic (optional; not a file upload).
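The session.update payload above can be assembled programmatically. The sketch below assumes that leaving out ref_audio_url is acceptable when no reference voice is wanted (it is marked optional above); the function name is illustrative.

```python
import json
from typing import Optional


def build_session_update(
    instructions: str,
    ref_audio_url: Optional[str] = None,
    model: str = "naia-0.9-omni-24g",
) -> str:
    """Serialize a session.update message to send after session.created."""
    session = {
        "modalities": ["text", "audio"],
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "instructions": instructions,  # persona only
        "turn_detection": {"type": "server_vad"},
    }
    if ref_audio_url:
        # A URL to a reference voice sample, not uploaded file content.
        session["ref_audio_url"] = ref_audio_url
    return json.dumps({"type": "session.update", "model": model, "session": session})
```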
4. Messages
Client → Server
| Purpose | Message |
|---|---|
| Voice input | {"type":"input_audio_buffer.append","audio":"<base64 PCM16 24kHz>"} (server VAD auto-detects end of speech) |
| Text input | {"type":"conversation.item.create", ...} then {"type":"response.create"} |
| Interrupt | {"type":"response.cancel"} |
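For voice input, captured audio has to be converted to base64 PCM16 at 24 kHz before it is appended. The sketch below assumes mono, little-endian samples (as in the OpenAI Realtime API that the gateway is compatible with) and a 100 ms chunk size; neither the byte order nor the chunk size is specified above, so treat both as assumptions.

```python
import base64
import json
import struct
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 24_000  # from the table above
BYTES_PER_SAMPLE = 2     # PCM16


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM (assumed byte order)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def audio_append_messages(pcm: bytes, chunk_ms: int = 100) -> Iterator[str]:
    """Yield input_audio_buffer.append messages, one per chunk of PCM16 audio."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps(
            {
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(chunk).decode("ascii"),
            }
        )
```

Because server VAD detects the end of speech, the client only needs to keep sending these messages while the microphone is active.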
Server → Client
| Message | Meaning |
|---|---|
| response.audio.delta | base64 PCM16 24kHz audio chunk |
| response.audio_transcript.delta / response.text.delta | reply text (streaming) |
| conversation.item.input_audio_transcription.completed | transcription of your spoken turn |
| response.done | end of a turn |
| emotion.updated | emotion tag (Naia extension) |
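A client typically folds these server messages into per-turn state. The sketch below assumes the payload field names of the OpenAI Realtime API (`delta` for streamed chunks, `transcript` for the completed transcription); this page does not list the fields, and the payload of emotion.updated is not documented here, so that event is stored unparsed. `TurnState` and `handle_server_message` are illustrative names.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TurnState:
    audio: bytearray = field(default_factory=bytearray)  # decoded PCM16 reply audio
    reply_text: List[str] = field(default_factory=list)  # streamed reply text pieces
    user_transcript: Optional[str] = None
    emotion_event: Optional[Dict[str, Any]] = None       # raw Naia extension event
    done: bool = False


def handle_server_message(raw: str, turn: TurnState) -> TurnState:
    """Update the turn state from one server message; unknown types are ignored."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "response.audio.delta":
        turn.audio.extend(base64.b64decode(event["delta"]))  # assumed field name
    elif kind in ("response.audio_transcript.delta", "response.text.delta"):
        turn.reply_text.append(event["delta"])               # assumed field name
    elif kind == "conversation.item.input_audio_transcription.completed":
        turn.user_transcript = event.get("transcript")       # assumed field name
    elif kind == "emotion.updated":
        turn.emotion_event = event
    elif kind == "response.done":
        turn.done = True
    return turn
```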
5. Admission / queue / sold-out — status contract (clients must read)
The outcome arrives as a status event (JSON) + a WebSocket close code. Judge by the JSON event that precedes the close, not the close code alone.
| Server → client event | close | Meaning / what the client does |
|---|---|---|
| session.created | (none) | Admitted (active). Configure via session.update |
| session.queued | 4503 | The slot is busy; wait in line (NOT an error). Carries position (0-based, render +1), queue_len (total waiters incl. you), reservation_token. eta_s is optional: the local queue does not send it, so the client estimates place × session-seconds (the cloud path adds eta_s + provider). Show "N waiting · your place · ~T s" and auto-reconnect with the same instance (backoff 5→60s) until session.created arrives once the slot frees. (Demo = one local GPU, rotated) |
| session.preparing | 4503 | (Currently inactive; cloud suspended.) Cloud Pod cold-start. The voice tier (naia-0.9-omni-24g) is local-only today, so this event does not occur. When cloud is enabled it carries the same fields as queued (position, queue_len, reservation_token, plus eta_s and provider) and is handled the same way (auto-reconnect) |
| session.sold_out | 4503 | No slot available (e.g. local slot down). Carries retry_after_s + tier_a_hint (local-model hint). Retry, or fall back to the local model (Naia OS) |
| session.consent_required | 4409 | Account already has a live session. Choose a branches option (replace/add) |
| session.error | 4503 | Narrow case: a backend was assigned but its URL couldn't be resolved (no endpoint). Generic internal errors arrive as type:"error" + close 4500, not session.error |
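The waiting line for session.queued can be rendered from the event fields. In this sketch, `session_seconds` is a hypothetical client-side setting (the length of one session is not given on this page), and "place" is taken to be position + 1, matching the "render +1" note in the table.

```python
from typing import Any, Dict


def queue_status_text(event: Dict[str, Any], session_seconds: int) -> str:
    """Format 'N waiting · your place · ~T s' from a session.queued/preparing event."""
    place = event["position"] + 1   # position is 0-based
    eta_s = event.get("eta_s")      # optional; the local queue does not send it
    if eta_s is None:
        # Client-side estimate: place x session-seconds.
        eta_s = place * session_seconds
    return f"{event['queue_len']} waiting · your place: {place} · ~{int(eta_s)} s"
```

With position 0, queue_len 3 and an assumed 120-second session, this yields "3 waiting · your place: 1 · ~120 s"; when the event carries eta_s, that value is shown instead of the estimate.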
WebSocket close codes
| Code | Meaning |
|---|---|
| 4001 | Auth failed (missing/invalid api_key) |
| 4002 | Superseded: same account connected from another device/tab (last-wins; this connection yields) |
| 4003 | Insufficient credits |
| 4409 | Consent required |
| 4500 | Server internal error (carries a type:"error" message) |
| 4503 | Queued / preparing / sold-out / no-endpoint (session.error) / transient unavailable; disambiguate via the preceding JSON event |
⚠️ Never render 4503 as "closed" or an error. On session.queued/preparing, show a "waiting" UX and auto-reconnect with the same instance. (A bare 4503 with no preceding event = transient unavailable → reconnect, not an error.)
Reconnect rule: reconnecting with the same instance re-attaches to the held reservation (reservation_token). Retry with backoff (5s → 60s max); once it's your turn the reconnect yields session.created.
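The status contract and the reconnect rule can be sketched as a small decision function plus a backoff schedule. The JSON status event is checked first and the close code only afterwards, as required above. The action labels are hypothetical names for client behaviour, and doubling the delay between 5 s and 60 s is an assumption: the page gives only the two bounds.

```python
from typing import Iterator, Optional

# Status event preceding the close -> client action (labels are illustrative).
EVENT_ACTIONS = {
    "session.queued": "wait_and_reconnect",
    "session.preparing": "wait_and_reconnect",
    "session.sold_out": "retry_or_use_local_model",
    "session.consent_required": "ask_consent",
    "session.error": "report_error",
    "error": "report_error",
}

# Fallback when no status event preceded the close.
CLOSE_ACTIONS = {
    4001: "fix_credentials",
    4002: "stop_superseded",
    4003: "top_up_credits",
    4409: "ask_consent",
    4500: "report_error",
    4503: "wait_and_reconnect",  # bare 4503 = transient unavailable
}


def next_action(status_event: Optional[str], close_code: Optional[int]) -> str:
    """Decide what to do, judging by the JSON event before the close code."""
    if status_event in EVENT_ACTIONS:
        return EVENT_ACTIONS[status_event]
    if close_code in CLOSE_ACTIONS:
        return CLOSE_ACTIONS[close_code]
    if status_event == "session.created":
        return "configure"  # admitted: send session.update
    return "report_error"


def backoff_delays(first_s: float = 5.0, max_s: float = 60.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Endless reconnect delays from 5 s up to a 60 s cap (doubling is assumed)."""
    delay = first_s
    while True:
        yield delay
        delay = min(delay * factor, max_s)
```

A reconnect loop would sleep for the next delay whenever `next_action` returns "wait_and_reconnect", then reconnect with the same instance until session.created arrives.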
6. Check balance
GET /v1/profile/balance → {"success": true, "data": {"balance": <credits>}}
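Reading the balance response can be sketched as follows. Only parsing is shown, because the host and the authentication scheme for this HTTP endpoint are not stated on this page; the function name and the choice to raise on a non-success body are assumptions.

```python
import json
from typing import Union


def parse_balance(body: str) -> Union[int, float]:
    """Extract the credit balance from a /v1/profile/balance response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        # The failure shape is not documented here; surface the raw payload.
        raise ValueError(f"balance request failed: {payload!r}")
    return payload["data"]["balance"]
```

Checking the balance before connecting gives the client a chance to warn the user ahead of a 4003 (insufficient credits) close.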
A full working implementation is on the 4.3 Live Demo page.