naia-0.9-omni-24g uses the same interface as an omni model, but it isn't, strictly, an omni model. It's a cascade module that naia builds by weaving several models together, aiming to be a realtime multimodal "brain." Today it starts by listening to your voice in real time and replying in voice; as versions advance, it expands toward seeing, remembering, and thinking more richly.
What it aims to be
The goal is for naia to grow beyond "listening and speaking" into a realtime cognitive module that sees, remembers, and understands context.
- Now: realtime voice conversation — listen → think → speak
- Ahead: we plan to add cognitive abilities such as image input (current LLMs already handle images, so this can come relatively soon), long-term memory (naia-memory) integration, and retrieval augmentation (naia-agent RAG). What gets added, and in what order, is not fixed.
- Direction: a realtime multimodal cognitive module that brings vision, memory, and other abilities together.
The name naia-0.9-omni-24g refers not to a single voice capability but to one realtime multimodal endpoint that will carry all of these.
Looks like an omni model — but it's actually a cascade
naia-0.9-omni-24g is served in the same place, in the same way, as an omni (unified multimodal) model. To a client it's exactly like calling a single omni model. But internally it's not one model — it's a cascade.
What is a cascade
A cascade builds the whole capability by chaining proven, role-specific parts in sequence instead of doing everything with one giant model. naia-0.9-omni-24g chains speech recognition (STT) → language model (LLM) → speech synthesis (TTS), with voice activity detection (VAD) and emotion handling in between.
voice in → speech recognition (STT) → language model (LLM) → speech synthesis (TTS) → voice out
Because each stage is independent, multiple models can be used together. Choosing the best model for each stage is itself a kind of orchestration — a cascade isn't just a chain, it's an assembly that weaves several models into a better result.
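To make the chaining concrete, here is a minimal Python sketch of a cascade whose stages can be swapped independently. It is an illustration only: the `Cascade` class, the stage signatures, and the `fake_*` stand-ins are hypothetical names invented for this example, not naia's actual internals, and VAD and emotion handling are left out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for this sketch).
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class Cascade:
    """Chains three independent stages: STT -> LLM -> TTS."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        text_in = self.stt(audio_in)   # voice in -> text
        text_out = self.llm(text_in)   # text -> reply text
        return self.tts(text_out)      # reply text -> voice out


# Toy stand-ins so the sketch runs without any real model.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def shouting_llm(prompt: str) -> str:
    return prompt.upper()


cascade = Cascade(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply = cascade.respond(b"hello")

# Swapping one stage leaves the other two untouched.
swapped = Cascade(stt=fake_stt, llm=shouting_llm, tts=fake_tts)
swapped_reply = swapped.respond(b"hello")
```

Because each stage is just a callable with a fixed input and output type, replacing the language model does not require touching recognition or synthesis, which is the property the cascade design relies on.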
Single omni model vs cascade
| | Single omni model | Cascade (naia's way) |
|---|---|---|
| Composition | one unified model | assembled role-specific parts |
| Changing ability | fixed once trained — new ability needs retraining | swap/add parts anytime — improve without retraining |
| Speed | unified, so fast (low latency) | extra stages may add a little latency |
| Multimodal growth | retrain from scratch | plug parts into input/output |
| Part choice | locked as one | pick & swap proven parts |
| Using models | one fixed model | multiple models together — best per stage, orchestration in itself |
→ naia chose a cascade to add abilities quickly and safely. The smooth omni-like experience is delivered through one standard endpoint, while the inside grows as a flexible assembly.
Other characteristics
- Barge-in: interrupt mid-sentence and start over. This faithfully reproduces the natural barge-in feel of a live omni model.
- A single 24GB GPU (RTX 3090 / 4090 / A5000) tier. The -24g suffix means exactly this: it's designed from the start to run on a single GPU in a personal PC (the cloud just rents the same setup).
- Exposed as one single endpoint: the client never needs to know if the backend is a local or cloud GPU. This single endpoint stays even as it expands to images, video, and memory.
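The barge-in behaviour listed above can be pictured as cancelling an in-flight reply when new speech arrives. The sketch below uses Python's `asyncio` task cancellation to show the idea. It is an assumption about one way to structure this, not a description of how naia implements it; `speak` and `demo_barge_in` are hypothetical names, and the sleeps stand in for audio playback and for the moment the user starts talking.

```python
import asyncio


async def speak(chunks: list[str], spoken: list[str]) -> None:
    """Pretend TTS playback: emit one chunk at a time."""
    for chunk in chunks:
        spoken.append(chunk)
        await asyncio.sleep(0.05)  # stands in for audio playback time


async def demo_barge_in() -> list[str]:
    spoken: list[str] = []
    playback = asyncio.create_task(
        speak(["one", "two", "three", "four"], spoken)
    )

    await asyncio.sleep(0.12)  # the user starts talking partway through
    playback.cancel()          # barge-in: stop the current reply
    try:
        await playback
    except asyncio.CancelledError:
        pass                   # cancellation is the expected outcome here

    # Start over with a fresh reply to the new input.
    await speak(["new reply"], spoken)
    return spoken


result = asyncio.run(demo_barge_in())
```

In this toy run the interrupted reply never reaches its last chunk, and the list ends with the fresh reply. A real pipeline would also need to flush any audio already buffered for playback.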
Why this shape grows into a "realtime multimodal brain"
- Realtime, bidirectional: data flows both ways without breaks while the connection is open. Input is processed the moment it arrives; responses stream the moment they're generated. It's not one-question-one-answer — it exchanges anything in real time while the conversation is alive.
- Cascade (modular) structure: input (perception), thinking (LLM), and output (expression) are separated, so adding an image/video encoder on input or a new expression on output is just plugging a part in. That's how it grows to take in anything and answer with anything, in real time.
- Instead of retraining one monolithic model, it swaps in proven parts to add abilities quickly and safely.
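The realtime, bidirectional flow described above can be sketched with two in-memory queues: one carrying input chunks, one carrying response chunks. This is a simplified illustration in Python's `asyncio`, not naia's transport. The queue names, the `END` sentinel, and the uppercase "reply" are assumptions made up for the example.

```python
import asyncio

END = None  # sentinel marking the end of a stream (an assumption of this sketch)


async def client_sends(inbox: asyncio.Queue, log: list[str]) -> None:
    for chunk in ["hel", "lo"]:
        await inbox.put(chunk)
        log.append(f"sent {chunk}")
        await asyncio.sleep(0)  # yield so the other side can react
    await inbox.put(END)


async def server_replies(
    inbox: asyncio.Queue, outbox: asyncio.Queue, log: list[str]
) -> None:
    # Handle each chunk as soon as it arrives, without waiting for the rest.
    while (chunk := await inbox.get()) is not END:
        await outbox.put(chunk.upper())
        log.append(f"replied {chunk.upper()}")
    await outbox.put(END)


async def client_receives(outbox: asyncio.Queue, received: list[str]) -> None:
    while (chunk := await outbox.get()) is not END:
        received.append(chunk)


async def session() -> tuple[list[str], list[str]]:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    log: list[str] = []
    received: list[str] = []
    await asyncio.gather(
        client_sends(inbox, log),
        server_replies(inbox, outbox, log),
        client_receives(outbox, received),
    )
    return log, received


log, received = asyncio.run(session())
```

The point of the sketch is the interleaving: the reply to the first chunk is produced before the second chunk is sent, rather than after the whole input has been collected.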
Usage
naia-0.9-omni-24g is designed from the start to run standalone on a single 24GB GPU. Pricing and how to use it are covered on dedicated pages:
- Free 1-minute trial → 4.3 Live Demo
- Run on your own GPU (offline · $10/month subscription, individuals only) → 4.4 Offline version
- Use via the cloud (online · planned) → 4.5 Online version
- Full pricing & lineup → 4.1 Model pricing
Easiest way — Live Demo
A 1-minute web demo to experience naia-0.9-omni-24g's voice quality with the free credits issued at sign-up (a quality preview, not the full service). Mic/speaker status, persona changes, reference-voice (URL) changes, and text input are supported.
Using it in naia-os
In Settings > AI, select naia-0.9-omni-24g from the model list. No API key needed — it runs on your Naia account credits.
Developers: to call the API directly (the gateway Realtime API), see 4.5 Online version.