Naia
· Luke· 19

Introducing Naia-0.9-Omni-24g: Voice Cloning, Real-Time Conversation, and Skills on an RTX 3090 (Naia v0.1.5)

naia-osnaia-omnivoice-aicascadeopen-sourcev0.1.5

Hello, I'm Luke, the maker of the AI OS Naia. Following my previous post where I hurriedly shared our launch news and asked for your help, this time — together with the v0.1.5 update — I'm bringing you the most important news of all: not just the AI OS, but a new AI omni model, specifically the Naia-Omni module (naia-0.9-omni-24g).

Naia-0.9-Omni-24g
Naia-0.9-Omni-24g

▎What you can experience firsthand in this v0.1.5 release of Naia-OS

  • Real-time voice conversation with a 3D avatar (voice cloning included) — the star of this release.
  • Embedded browser — the AI sees the screen with you and lends a hand.
  • Skill system — extend functionality with built-in skills, and connect MCP (Model Context Protocol) tools.
  • App panels — add the app panels you want right inside Naia.
  • Your model, your key, as-is — connect the AI provider and key you prefer, or run it on a local model.
  • Privacy — local execution is the default, and your inputs and outputs are never used to train models.
  • 14-language interface support

Naia isn't merely a Shell that calls itself an OS. The heart of this update is the naia-0.9-omni-24g module, which delivers 30 languages along with ChatGPT- and Gemini-class full-duplex real-time conversation on a consumer GPU (RTX 3090/4090/5090, with 24GB VRAM).

This is a video of me connecting to Naia-Omni installed directly on my own PC and talking with it using my real voice. (Please note that my microphone voice was not recorded in the video.)

▎Why a 'cascade' module

I once mentioned that I was porting MiniCPM-4.5-omni to vLLM — that was precisely to bring it into Naia. The biggest advantage of an omni model is that it enables extremely natural conversation in real time. However, that model required more than 28GB of VRAM and only supported Chinese and English. To improve it for Korean, I pursued several lines of research in parallel, including Talker fine-tuning (FT), but I ultimately hit a technical ceiling — and the other omni models were far heavier still.

After exploring alternatives, what I arrived at was naia-0.9-omni-24g, a 'cascade' approach rather than a single model. It's a module that orchestrates and optimizes multiple AI models appropriately; while it isn't quite perfect token-by-token processing, it boasts speed on par with that and a far more flexible configuration. It is effectively the only option that supports 30 languages, full duplex, and real-time voice cloning all on a consumer 24GB GPU. We provide guidance so you can use it at the API level, just like a single standalone model, and you can experience its far smarter and clearer voice-cloning ability for yourself.

Single omni modelcascade (the Naia approach)
CompositionOne unified modelAssembled from role-specific parts
Changing capabilitiesFixed once trained — new abilities require retrainingSwap or add parts anytime — improve without retraining
SpeedFast because unified (low latency)A little more latency may arise due to staging
Multimodal expansionTrain again from scratchPlug parts into the inputs and outputs
Part selectionBundled as a wholePick, use, and replace proven parts
Model usageLocked to one modelMultiple models together — the optimal model at each stage

The resulting naia-0.9-omni-24g runs smoothly on consumer PC setups with an RTX 3090/4090/5090 (24GB).

⚙️ How to use it and what makes it special

We've reorganized how this update is offered, taking real-world infrastructure conditions into account, as follows.

Local container provided (a Basic subscriber benefit) — We've opened it up so that anyone can download it as a container onto their personal PC and build their own applications directly.

podman pull ghcr.io/nextain/naia-0.9-omni-24g:latest

That said, since I too need to make a living while sustaining this open-source project, I've made it available as 1 Copy each for Basic subscription (monthly $10) users. I'd be grateful if you'd think of it as the minimum support needed for the Naia open-source ecosystem to keep growing steadily. → Naia model download and activation manual

Web demo trial (a 60s taste) — Right now I'm running it directly on a single personal PC of mine to offer a 60-second taster experience each time. Since resources are limited, a queue may form depending on the number of people connecting, so I kindly ask for your understanding. → Live demo

Naia live demo screen Cloud usage (in preparation) — The $0.33-per-hour model I originally planned is, unfortunately, switching to "in preparation" for the time being. With my current shortage of capital and equipment, it's hard to maintain a GPU pool on constant standby, and I can't bear the burden of covering the roughly 15 minutes of server cost it takes to boot up after allocation free of charge. I had already developed a remote RTX 3090-based system usable domestically, but I judged that smooth service would be difficult amid the GPU shortage. Once capital is secured down the road, I will absolutely deliver a comfortable, wait-free cloud service.
Naia live demo screen

High compatibility — We support the OpenAI Realtime API to maximize compatibility with existing environments.

EndpointPurpose
GET /healthReadiness status {"ready":true} (no authentication required)
GET /v1/modelsModel list
WS /v1/realtimeReal-time voice session (VAD, barge-in, emotion)
POST /v1/chat/completionsChat (streaming supported)
POST /v1/audio/speechSpeech synthesis (TTS)
POST /v1/audio/transcriptionsSpeech recognition (STT)
POST /v1/embeddingsEmbeddings

Detailed Naia model usage (developer manual)

Safety and privacy — Because the module is so capable, there's a risk it could be misused for things like voice phishing, so we've applied audio watermark technology to provide traceability, prepared so you can use it with peace of mind.

🧠 The future Naia envisions (Naia Cognitive)

The ultimate direction Naia pursues is to realize 'my AI inside my own computer' and to build a development and deployment ecosystem for applications you use together with AI. The multimodality I envision isn't simply data input and output — it aims at cognitive ability (Naia Cognitive) whereby the AI experiences, remembers, and expresses things on its own.

The next version has updates waiting in the wings, including the agent (Naia-agent), long-term memory (Naia-memory), and the framework (Naia-ADK). The next version, which will ship alongside Naia-0.9-Omni-48g, is being researched and developed with the goal of local profiles and environment setups capable of coding work, and an AI that remembers the user and works alongside them via real-time voice — literally aiming for something like 'Iron Man's Jarvis.'

48GB is still an area that's a struggle no matter how hard you cram it into 24GB, but since all you need is to plug in two older RTX 3090 cards, I think it's well within reach for individuals to dream about, too.

🎮 Naia OS is an AI that remembers me, and works and plays with me

Here's the picture Naia-OS is painting.

  • A Steam (Bazzite)/Windows-based gaming machine + an intelligent agent that remembers me + a 3D-avatar-based natural-language UI + VRAM-specific optimization profiles

Recently, MS and Nvidia made big headlines for building compact RTX-based AI devices. Honestly, there's no need to wait. After all, Naia-OS — which does gaming and AI alike — is aiming for exactly that spot. Personally, I'd choose a Naia-OS build over that new product — its price is higher even than the upper-tier line we recommend, its game compatibility is uncertain, and its unified memory is slow. By contrast, Naia-OS, which plugs in proven consumer GPUs, nails gaming, AI, and expandability all at once.

It's not that I'm deliberately putting off the Mac environment — I simply haven't been able to get to it yet due to a shortage of equipment and time. There are also people who've agreed to help out together.

Naia-OS isn't a single module but is made up of open-source repositories with separate roles, and we're piecing the framework together step by step. You might think, "That's an awful lot of stuff," but it's possible precisely because this is the 'AI-driven development era.'

If the OS of the past was a management tool for applications, I believe an AI OS will be software — and a robot — that communicates in natural language, remembers, and handles tasks on its own. This isn't mere bluffing; I'll prove it one result at a time.

🤝 Let's do this together — offline meetup & Discord

Sometime next week, I'm thinking of holding an offline meetup to introduce what I've been working on and to find people who'd like to exchange help with one another. Do download the module and give it a try, and if you're interested, please come find us on the Discord below. I hope it grows into a full-fledged open-source community.

Join the Naia Discord

Please give lots of encouragement and attention to the journey Nextain and Naia will build going forward.

#Nextain #Naia

Popular Posts

CC BY-NC-SA 4.0This post is licensed under CC BY-NC-SA 4.0.

Comments

You can comment without signing in

...