Naia Model Online (planned)

The online version runs Naia's own models in the cloud, so you can talk to Naia from a browser or app without your own GPU.

⏳ The full cloud service is planned (coming soon). For now, use the 4.4 Offline version (your own GPU) and the free 4.3 Live Demo. The developer API described under "How it works" below is the same gateway Realtime API that the demo already uses.

Plan

Tier B, rented cloud GPU (planned): $0.33 / hour (planned price, billed per active minute). Time reservation is under review. Not available yet.
The full online service opens once a GPU pool is secured. (Cloud hosting on RunPod/Vast is temporarily suspended for preparation and experimentation; the demo and offline versions run on local GPUs.)
Until then, the main options are the offline (owned) subscription at $10/month for all Naia models (individuals only; B2B by separate arrangement) and the free demo.
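As a rough illustration of per-minute billing at the planned rate, the arithmetic can be sketched as below. This assumes each active minute costs exactly 1/60 of the hourly price with no rounding or minimum charge. The source does not specify those rules, and `planned_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Planned Tier B price from this page; it may change before launch.
PLANNED_RATE_PER_HOUR = Decimal("0.33")


def planned_cost(active_minutes: int) -> Decimal:
    """Estimated cost in USD for a number of active minutes.

    Assumption: linear per-minute pricing (hourly rate / 60), no rounding.
    """
    return PLANNED_RATE_PER_HOUR / Decimal(60) * Decimal(active_minutes)
```

Under these assumptions, 90 active minutes would come to about $0.495.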

Models

naia-0.9-omni-24g (available now — offline / demo)
naia-0.9-coding-24g · naia-0.9-omni-48g (planned)

Full pricing & lineup: 4.1 Model pricing.

How it works — gateway Realtime API (developers)

The gateway is the shared entry point for the online (cloud) service, the web demo, and naia-os. naia-0.9-omni-24g is served over an OpenAI Realtime API–compatible WebSocket. The client connects to the gateway, which admits it onto a backend: a local GPU slot or, when cloud is enabled, a cloud Pod.

1. Issue an API key

Create one on the dashboard API Keys page. (The live demo auto-issues a short-lived key.)

2. Connect + authenticate

Endpoint (prod):

wss://gateway.nextain.io/v1/realtime?model=naia-0.9-omni-24g&instance=<userId>:<random>

Always connect over wss:// (TLS) — your key and audio travel over it.
instance = a stable session id. Reusing it across reconnects re-attaches to the same reservation. (Native clients may send X-Naia-OS-Instance; browsers can't send headers, so use the query param.)
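A minimal sketch of building the connection URL with a stable `instance` value. The helper names and the 8-byte hex suffix are illustrative assumptions; the page only says the value has the shape `<userId>:<random>`. The query string is percent-encoded, so the `:` appears as `%3A`.

```python
import secrets
from urllib.parse import urlencode

GATEWAY_URL = "wss://gateway.nextain.io/v1/realtime"  # prod endpoint from this page


def make_instance_id(user_id: str) -> str:
    """Return '<userId>:<random>'. The random suffix format is an assumption."""
    return f"{user_id}:{secrets.token_hex(8)}"


def build_realtime_url(instance_id: str, model: str = "naia-0.9-omni-24g") -> str:
    """Build the wss:// URL with the model and instance query parameters."""
    query = urlencode({"model": model, "instance": instance_id})
    return f"{GATEWAY_URL}?{query}"
```

Generate the instance id once per session and keep it, so that every reconnect reuses the same value.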

Right after the socket opens, send credentials as the first message (browsers can't send headers):

{ "setup": { "apiKey": "<your key>", "locale": "en", "instanceId": "<userId>:<random>" } }
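The first message can be serialized as in the sketch below. `build_setup_message` is a hypothetical helper; the field names are taken from the example above, and the instance id should be the same one used in the URL.

```python
import json


def build_setup_message(api_key: str, instance_id: str, locale: str = "en") -> str:
    """Serialize the credentials message sent right after the socket opens."""
    return json.dumps(
        {
            "setup": {
                "apiKey": api_key,
                "locale": locale,
                "instanceId": instance_id,  # same value as the `instance` query param
            }
        }
    )
```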

3. Configure the session — persona / reference voice

After the server sends session.created, configure with session.update.

{
  "type": "session.update",
  "model": "naia-0.9-omni-24g",
  "session": {
    "modalities": ["text", "audio"],
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "instructions": "<persona instructions>",
    "turn_detection": { "type": "server_vad" },
    "ref_audio_url": "<reference voice sample URL (optional)>"
  }
}

instructions = persona only. Output formatting is enforced by the server.
ref_audio_url = the URL of a reference voice sample to mimic (optional; a URL, not a file upload).
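The configuration message above could be assembled as follows. This is a sketch: `build_session_update` is a hypothetical helper, and leaving out `ref_audio_url` when no reference voice is wanted is an assumption about how the server treats the optional field.

```python
import json
from typing import Optional


def build_session_update(
    instructions: str,
    ref_audio_url: Optional[str] = None,
    model: str = "naia-0.9-omni-24g",
) -> str:
    """Serialize a session.update message with the fields shown on this page."""
    session = {
        "modalities": ["text", "audio"],
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "instructions": instructions,  # persona only
        "turn_detection": {"type": "server_vad"},
    }
    if ref_audio_url is not None:
        # Assumption: the key is omitted entirely when no reference voice is used.
        session["ref_audio_url"] = ref_audio_url
    return json.dumps({"type": "session.update", "model": model, "session": session})
```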

4. Messages

Client → Server

Purpose	Message
Voice input	`{"type":"input_audio_buffer.append","audio":"<base64 PCM16 24kHz>"}` (server VAD auto-detects end of speech)
Text input	`{"type":"conversation.item.create", ...}` then `{"type":"response.create"}`
Interrupt	`{"type":"response.cancel"}`
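For voice input, each audio chunk has to be converted to PCM16 and base64-encoded before it is wrapped in `input_audio_buffer.append`. The sketch below assumes little-endian mono samples and float input in the range -1.0 to 1.0. The page specifies only "PCM16 24kHz", and the helper names are illustrative.

```python
import base64
import json
import struct
from typing import Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def build_audio_append(pcm16: bytes) -> str:
    """Wrap raw PCM16 bytes (24 kHz) in an input_audio_buffer.append message."""
    audio_b64 = base64.b64encode(pcm16).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64})
```

The capture side has to resample to 24 kHz before calling these helpers; that step is not shown here.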

Server → Client

Message	Meaning
`response.audio.delta`	base64 PCM16 24kHz audio chunk
`response.audio_transcript.delta` / `response.text.delta`	reply text (streaming)
`conversation.item.input_audio_transcription.completed`	transcription of your spoken turn
`response.done`	end of a turn
`emotion.updated`	emotion tag (Naia extension)
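One way to consume the server messages above is a small dispatcher that accumulates a turn. The payload field names `delta` and `transcript` are assumptions based on the OpenAI Realtime API that the gateway is compatible with. The page does not list them, and the shape of `emotion.updated` is unknown, so the whole message is stored as-is.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnState:
    """Everything collected for one reply turn."""
    audio: bytearray = field(default_factory=bytearray)  # raw PCM16, 24 kHz
    reply_text: str = ""
    user_transcript: str = ""
    emotion: Optional[dict] = None
    done: bool = False


def handle_server_message(state: TurnState, raw: str) -> None:
    """Update the turn state from one JSON message sent by the server."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "response.audio.delta":
        state.audio.extend(base64.b64decode(msg["delta"]))  # assumed field name
    elif kind in ("response.audio_transcript.delta", "response.text.delta"):
        # Assumes the server streams one of the two per response, not both.
        state.reply_text += msg["delta"]
    elif kind == "conversation.item.input_audio_transcription.completed":
        state.user_transcript = msg["transcript"]  # assumed field name
    elif kind == "emotion.updated":
        state.emotion = msg  # Naia extension; payload shape not documented here
    elif kind == "response.done":
        state.done = True
```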

5. Admission / queue / sold-out — status contract (clients must read)

The outcome arrives as a status event (JSON) + a WebSocket close code. Judge by the JSON event that precedes the close, not the close code alone.

Server → client event	close	Meaning / what the client does
`session.created`	—	Admitted (active). Configure via `session.update`
`session.queued`	4503	The slot is busy, so wait in line (NOT an error). Carries `position` (0-based; render +1), `queue_len` (total waiters, including you), and `reservation_token`. `eta_s` is optional: the local queue does not send it, so the client estimates `place × session-seconds`; the cloud path adds `eta_s` and `provider`. Show "N waiting · your place · ~T s" and auto-reconnect with the same `instance` (backoff 5→60s); the reconnect yields `session.created` once the slot frees. (Demo = one local GPU, rotated)
`session.preparing`	4503	(Currently inactive — cloud suspended.) Cloud Pod cold-start. The voice tier (`naia-0.9-omni-24g`) is local-only today, so this event does not occur. When cloud is enabled it carries the same fields as `queued` (`position`·`queue_len`·`reservation_token`, + `eta_s`·`provider`) and is handled the same (auto-reconnect)
`session.sold_out`	4503	No slot available (e.g. local slot down). Carries `retry_after_s` + `tier_a_hint` (local-model hint). Retry, or fall back to the local model (Naia OS)
`session.consent_required`	4409	Account already has a live session. Choose a `branches` option (replace/add)
`session.error`	4503	Narrow case — assigned but the backend URL couldn't resolve (no endpoint). Generic internal errors arrive as `type:"error"` + close 4500, not `session.error`
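The waiting text for `session.queued` (and `session.preparing`, when it is enabled) can be derived as sketched below. `session_seconds` is a hypothetical client-side setting for the expected length of one session, since the page gives no value for it. Reading "place" as the 1-based `position + 1` is also an assumption.

```python
def queue_status_text(event: dict, session_seconds: float) -> str:
    """Render 'N waiting · your place · ~T s' from a queued/preparing event."""
    place = event["position"] + 1  # position is 0-based on the wire
    eta = event.get("eta_s")
    if eta is None:
        # Local queue: no eta_s, so estimate place x session-seconds.
        eta = place * session_seconds
    return f"{event['queue_len']} waiting · your place {place} · ~{round(eta)} s"
```

For example, an event with `position` 0 and `queue_len` 3 and a 60-second session estimate renders as "3 waiting · your place 1 · ~60 s".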

WebSocket close codes

Code	Meaning
`4001`	Auth failed (missing/invalid api_key)
`4002`	Superseded: same account connected from another device/tab (last wins; this connection yields)
`4003`	Insufficient credits
`4409`	Consent required
`4500`	Server internal error (carries a `type:"error"` message)
`4503`	Queued / preparing / sold-out / no-endpoint (`session.error`) / transient unavailable — disambiguate via the preceding JSON event

⚠️ Never render 4503 as "closed" or an error. On session.queued/preparing, show a "waiting" UX and auto-reconnect with the same instance. (A bare 4503 with no preceding event = transient unavailable → reconnect, not an error.)
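The rule "judge by the preceding JSON event, not the close code alone" can be sketched as a two-step lookup. The action labels returned here are hypothetical names for client-side behaviour, not part of the protocol.

```python
from typing import Optional

# Status events decide the outcome first.
EVENT_ACTIONS = {
    "session.queued": "wait_and_reconnect",
    "session.preparing": "wait_and_reconnect",
    "session.sold_out": "retry_or_fallback",
    "session.consent_required": "ask_consent",
    "session.error": "no_endpoint",
}

# The close code is only a fallback when no status event preceded the close.
CODE_ACTIONS = {
    4001: "auth_failed",
    4002: "superseded",
    4003: "insufficient_credits",
    4409: "ask_consent",
    4500: "internal_error",
    4503: "reconnect",  # bare 4503 = transient unavailable, not an error
}


def classify_close(close_code: int, last_event_type: Optional[str] = None) -> str:
    """Map the last status event and close code to a client action label."""
    if last_event_type in EVENT_ACTIONS:
        return EVENT_ACTIONS[last_event_type]
    return CODE_ACTIONS.get(close_code, "unknown")
```

Because the event is checked first, a 4503 that follows `session.sold_out` is handled differently from one that follows `session.queued`, even though the close code is identical.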

Reconnect rule: reconnecting with the same instance re-attaches to the held reservation (reservation_token). Retry with backoff (5s → 60s max); once it's your turn the reconnect yields session.created.
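A possible backoff schedule for the reconnect rule is shown below. The page fixes only the bounds (5 s to 60 s max); doubling between attempts is an assumption, and the function names are illustrative.

```python
from itertools import islice
from typing import Iterator, List


def backoff_delays(
    initial: float = 5.0, maximum: float = 60.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield reconnect delays in seconds, growing from `initial` up to `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def first_delays(n: int) -> List[float]:
    """Return the first n delays of the default schedule."""
    return list(islice(backoff_delays(), n))
```

With the defaults the schedule is 5, 10, 20, 40, 60, 60, ... seconds. Each attempt must reuse the same `instance` so that it re-attaches to the held reservation.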

6. Check balance

GET /v1/profile/balance  →  {"success": true, "data": {"balance": <credits>}}
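A minimal sketch of reading the response body shown above. The host and the way the request is authenticated are not specified on this page, so only the parsing step is illustrated. `parse_balance` is a hypothetical helper.

```python
import json


def parse_balance(body: str) -> float:
    """Extract the credit balance from a /v1/profile/balance response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("balance request was not successful")
    return payload["data"]["balance"]
```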

A full working implementation is on the 4.3 Live Demo page.
