White Paper · April 2026

A Brain-Inspired, Privacy-First Digital Intelligence

Neuromorphic software architecture for autonomous personal AI. Spiking neural networks, predictive coding, and the free energy principle — applied to on-device intelligence that becomes yours.

"The brain does not compute. It predicts." — Karl Friston, The Free Energy Principle (2010)

A New Paradigm for Personal AI

The dominant paradigm — centralized cloud inference, perpetual data harvesting, and monolithic billion-parameter models — is architecturally antithetical to how biological intelligence works. The brain is a collection of specialized, tightly coupled neural circuits running in situ, processing through sparse event-driven spikes, maintaining privacy through architectural isolation, and achieving remarkable efficiency by predicting the world rather than reacting to it.

Wize is a new class of AI assistant built on this insight. Written in under 300KB of pure C11, Wize runs a multi-model cognitive pipeline entirely on-device — no cloud, no data leaving the machine.

SNN
Spiking Neural Networks
Event-driven processing
PC
Predictive Coding
Anticipatory context
FEP
Free Energy Principle
Self-reflective cognition

Built Different. Thinks Different.

Wize isn't another chatbot wrapper. It's a cognitive operating system that runs on your hardware, evolves with your usage, and keeps your data where it belongs — with you.

🧬

Neuromorphic by Design

4 specialist neural models coupled like brain regions — router (thalamus), tool agent (motor cortex), reasoner (prefrontal), vision (visual cortex). Not a monolith. A brain.

🔒

Structurally Private

Not "we promise not to read it." There is no network path. Inference runs in-process via llama.cpp. Your data physically cannot leave the device.

300KB of Pure C

No Python. No Docker. No Node.js. A single C11 binary under 300KB. Starts in <1ms. Less code = less attack surface = fewer bugs = faster.

🧠

Self-Evolving

Reflects on every interaction. Discovers compound tool patterns. Refines its own prompts from experience. Builds an evolving self-narrative. Gets better without retraining.

💰

Zero Inference Cost

$0.00 per query. Forever. No API keys, no rate limits, no metered billing. Your hardware, your models, your intelligence.

🌐

Works Offline

Airplane mode? No WiFi? Wize doesn't care. Full cognitive pipeline runs on local GGUF models. Cloud is optional — never required.

Why Cloud AI Cannot Be Your Clone

The Privacy Paradox

To personalize, the system must observe. To observe through a cloud, the system must exfiltrate. Every message, screenshot, and calendar entry traverses networks you don't control, stored in databases you can't audit.

A digital clone that knows everything about you but stores that knowledge on someone else's servers is not your clone — it is theirs.

The Latency Tax

Cloud round-trips add 200–2000ms per call. For a multi-step workflow — classify → select tools → execute → synthesize — those delays stack: four calls at 500ms each is two full seconds of pure network overhead. The brain completes perception → prediction → action in ~100ms. To match biology, inference must be local.

The Monolith Problem

The brain doesn't use one network for everything. It routes visual input to V1, language to Broca's area, spatial reasoning to the parietal cortex. Each region is a specialist. Wize takes the same approach.

The Cognitive Gateway

Wize is a single binary in pure C11, statically linked against llama.cpp for in-process neural inference. It serves a Vue.js dashboard via HTTP/WebSocket while accepting connections from CLI, Telegram, Discord, Slack, and native macOS overlays.

<300KB    Binary Size
<500MB    RAM — Full Pipeline
35        Native Tools
$0        Inference Cost
Property         Specification
Language         C11 / POSIX — zero external dependencies beyond libc
Inference        In-process via llama.cpp — zero HTTP overhead
Storage          SQLite WAL — single wize.db with 8 tables
Channels         CLI, Telegram (Bot + App), Discord, Slack
Authentication   Zero-config auto-pairing via one-time stdout token
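The zero-config pairing row above can be illustrated with a minimal sketch: read entropy from /dev/urandom, encode it as hex, and print the token once to stdout at startup. The token length and encoding here are assumptions for illustration, not Wize's actual scheme.

```c
#include <stdio.h>
#include <string.h>

/* Generate a one-time pairing token of `hex_len` hex characters into `out`
 * (which must hold hex_len + 1 bytes). Returns 0 on success, -1 on error.
 * Entropy comes from /dev/urandom; no network, no account, no OAuth. */
static int make_pair_token(char *out, size_t hex_len) {
    unsigned char raw[32];
    size_t n = hex_len / 2;
    if (n == 0 || n > sizeof raw) return -1;

    FILE *f = fopen("/dev/urandom", "rb");
    if (!f) return -1;
    size_t got = fread(raw, 1, n, f);
    fclose(f);
    if (got != n) return -1;

    for (size_t i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", raw[i]);  /* hex-encode each byte */
    return 0;
}
```

At startup the gateway would print this token once and require it on the first client connection, after which the channel is considered paired.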

The Multi-Specialist Neural Pipeline

The "Wize Model" is a 4-stage cognitive pipeline coupling small specialist models into an emergent system smarter than any component.

🧭 Router   qwen3:0.6B   Thalamic gating — keyword <1ms, model ~1s
  ↓ classifies intent
🔧 Tool     0.6B
🧠 Reason   phi4 3.8B
👁️ Vision   gemma3 4B
✍️ Synth    qwen3 4B
  ↓ specialist response
💬 Output   Coherent, context-aware response

Synaptic Weight Persistence

The brain doesn't reload synaptic weights between thoughts. Neither does Wize. A static model cache holds 4 GGUF models in GPU memory simultaneously. Pipeline stage transitions are zero-cost context switches.
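The cache described above can be sketched as a static table of lazily loaded model handles: each specialist is loaded once, and every later pipeline stage gets the cached pointer back. The `load_model` stand-in and the slot names are illustrative — the real system would call into llama.cpp's loader here.

```c
#include <stdlib.h>

/* Stand-in for a llama.cpp model handle. */
typedef struct { const char *path; } model;

static int total_loads = 0;   /* instrumentation for the example */

/* Stand-in loader; the real code would invoke llama.cpp here. */
static model *load_model(const char *path) {
    total_loads++;
    model *m = malloc(sizeof *m);
    m->path = path;
    return m;
}

enum { SLOT_ROUTER, SLOT_TOOL, SLOT_REASON, SLOT_VISION, SLOT_COUNT };

static model *cache[SLOT_COUNT];              /* persists across requests */
static const char *paths[SLOT_COUNT] = {
    "router.gguf", "tool.gguf", "reason.gguf", "vision.gguf"
};

/* Load on first use, then reuse: a stage transition is a pointer fetch. */
static model *get_model(int slot) {
    if (!cache[slot])
        cache[slot] = load_model(paths[slot]);
    return cache[slot];
}
```

Because the handles never leave memory, switching from router to reasoner costs a pointer lookup rather than a multi-second GGUF reload.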

[Interactive visualization: WIZE NEURAL ACTIVITY (live) — 2,048 neurons, 847K synapses, step counter and step rate, cognitive state, neuromodulator channels DA / ACh / NE / 5-HT]

Why Spiking Neural Networks Are the Future of Personal AI

The visualization above isn't science fiction — it's a conceptual rendering of how Wize's cognitive modules already behave: discrete event-driven activations, sparse firing patterns, and hierarchical routing. The difference between Wize today and a true neuromorphic Wize is hardware, not architecture.

🔮
The SNN Thesis: Within 3 years, consumer neuromorphic chips (Apple Neural Engine v3, Intel Loihi 3) will run inference natively as temporal spike patterns — not matrix multiplications. Wize's event-driven architecture is designed to be the first personal AI to make this transition seamlessly.

Why SNNs Change Everything for Privacy

Traditional neural networks process data as dense, continuous tensors — rich, invertible representations that can be reverse-engineered to reconstruct inputs (model inversion attacks). Spiking neural networks process data as sparse, temporal spike trains — lossy, timing-dependent signals that are fundamentally harder to invert.

This means SNN-based personal AI doesn't just keep data local — it processes data in a form that is inherently resistant to reconstruction. Privacy is not a feature. It's a property of the physics.

Event-Driven = Battery-Friendly

A traditional transformer burns compute continuously during inference — every attention head processes every token. An SNN fires only when membrane potential crosses a threshold. For a personal assistant running 24/7 in the background, that is the difference between 2 hours of battery life and 20.
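The threshold behavior described above can be made concrete with a minimal leaky integrate-and-fire (LIF) update — a sketch with illustrative parameters, not code from Wize:

```c
#include <stdbool.h>

/* Minimal leaky integrate-and-fire neuron: the membrane potential decays
 * toward rest, integrates input, and produces output only when it crosses
 * the firing threshold. Between spikes, nothing is computed downstream. */
typedef struct {
    double v;          /* membrane potential */
    double v_rest;     /* resting potential */
    double v_thresh;   /* firing threshold */
    double leak;       /* per-step decay factor in [0, 1] */
} lif_neuron;

/* Advance one time step with input current `i`; returns true on a spike. */
static bool lif_step(lif_neuron *n, double i) {
    n->v = n->v_rest + (n->v - n->v_rest) * n->leak + i;
    if (n->v >= n->v_thresh) {
        n->v = n->v_rest;   /* reset after firing */
        return true;
    }
    return false;
}
```

Sub-threshold input accumulates silently; only a crossing produces work for the next layer — which is exactly why idle cost stays near zero.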

🧬 Bio-Inspired Learning

STDP (Spike-Timing-Dependent Plasticity) enables on-device learning without backpropagation — the system adapts its routing weights in real-time from experience, just like biological synapses.

⚡ Temporal Coding

Information encoded in spike timing, not magnitude. Earlier spikes = stronger evidence. Wize's dual-speed router (keyword <1ms / model ~1s) already implements this principle in software.

Predictive Coding in Software

The brain predicts what it will perceive and only processes the prediction error. Wize implements this through a 10-layer anticipatory context injection system.

1.  identity_inject()  — Who am I? Who is talking?
2.  inject_identity()  — Evolving self-narrative
3.  inject_goals()     — Active objectives
4.  world_inject()     — Environment snapshot
5.  episode_inject()   — Past episodes
6.  auto_context()     — Skills + errors
7.  inject_insights()  — Self-discovered insights
8.  inject_addendum()  — Self-refined rules
9.  metacog_inject()   — Confidence advisories
10. context_inject()   — Window management
Before the model sees the query, it has already been primed with a prediction of the relevant context. The model's job is reduced to processing the surprise.
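One plausible shape for such a layered pipeline is an ordered table of function pointers, each appending its primed context to a shared buffer before inference. The buffer type and the two stand-in stages below are assumptions for illustration; the real pipeline would register all ten injectors.

```c
#include <string.h>

/* Shared context buffer the injectors write into. */
typedef struct { char text[4096]; size_t len; } ctx_buf;

typedef void (*injector_fn)(ctx_buf *ctx);

/* Bounded append helper; silently drops on overflow for brevity. */
static void ctx_append(ctx_buf *ctx, const char *s) {
    size_t n = strlen(s);
    if (ctx->len + n < sizeof ctx->text) {
        memcpy(ctx->text + ctx->len, s, n);
        ctx->len += n;
        ctx->text[ctx->len] = '\0';
    }
}

/* Two stand-in stages mirroring names from the list above. */
static void identity_inject(ctx_buf *ctx) { ctx_append(ctx, "[identity]"); }
static void world_inject(ctx_buf *ctx)    { ctx_append(ctx, "[world]"); }

/* The ordered pipeline; stages run strictly in sequence. */
static const injector_fn pipeline[] = { identity_inject, world_inject };

/* Prime the context buffer before the model ever sees the query. */
static void prime_context(ctx_buf *ctx) {
    for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++)
        pipeline[i](ctx);
}
```

Adding or reordering a layer is then a one-line change to the table, which keeps the prediction machinery easy to evolve.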

80% Context Reduction

Wize's attentional filter selects only 2-8 relevant tools per query (from 35), achieving 4× faster inference by minimizing the prediction error the model must process.
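An attentional filter of this kind can be sketched as keyword matching against tool descriptors, keeping at most a small cap of matches. The tool names, keywords, and cap below are invented for the example; the real selection logic is richer.

```c
#include <string.h>

typedef struct { const char *name; const char *keyword; } tool_desc;

/* Three stand-in tools from an imagined 35-tool catalog. */
static const tool_desc tools[] = {
    { "web_search", "search"  },
    { "calendar",   "meeting" },
    { "shell",      "run"     },
};

#define MAX_SELECTED 8   /* upper end of the 2-8 tools cited above */

/* Fill `out` with indices of tools whose keyword appears in the query;
 * returns the number selected. Everything else is pruned from context. */
static int select_tools(const char *query, int out[MAX_SELECTED]) {
    int n = 0;
    for (int i = 0; i < (int)(sizeof tools / sizeof tools[0]); i++) {
        if (n < MAX_SELECTED && strstr(query, tools[i].keyword))
            out[n++] = i;
    }
    return n;
}
```

Only the selected tools' schemas enter the prompt, which is where the token savings come from.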

Spiking Neural Architecture in Software

Wize's gateway is event-driven — no polling loop. The system activates only in response to discrete events (spikes), consuming near-zero CPU between them.

HTTP Request (spike)
  → parse + authenticate      (membrane integration)
  → threshold: valid request?  (spike threshold)
  → classify_intent()         (thalamic routing)
  → specialist execution      (cortical processing)
  → tool calls                (motor output)
  → reflect + learn           (feedback loop)
  → quiescence                (resting potential)
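The quiescent loop implied by the trace above can be sketched with POSIX poll(): the process blocks with essentially zero CPU until a descriptor becomes readable, handles the event, and returns to rest. The handler body and single-descriptor setup are simplifications.

```c
#include <poll.h>
#include <unistd.h>

/* Stand-in event handler: consume and (in the real system) parse,
 * authenticate, route, and respond. */
static void handle_request(int fd) {
    char buf[512];
    (void)read(fd, buf, sizeof buf);
}

/* One iteration of the event loop: block up to timeout_ms for a "spike"
 * on fd; returns 1 if an event was handled, 0 if the system stayed at
 * resting potential. A timeout of -1 would block indefinitely. */
static int gateway_poll_once(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready > 0 && (pfd.revents & POLLIN)) {
        handle_request(pfd.fd);   /* spike -> integrate -> fire */
        return 1;
    }
    return 0;                     /* quiescence */
}
```

A production gateway would poll many descriptors (HTTP, WebSocket, channel sockets) in one call, but the cost model is the same: no events, no cycles.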

Sparse Activation

Cost is proportional to stimulus complexity, not system size.

Speed    Mechanism        Confidence   Brain Analogy
< 1ms    Keyword match    Highest      System 1 (fast)
~1s      Model classify   Variable     System 2 (deliberate)
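The dual-speed dispatch in the table above can be sketched as a keyword fast path with a model fallback. The intent categories, keywords, and the `model_classify` stand-in are assumptions for illustration; the real router's vocabulary is larger.

```c
#include <string.h>

typedef enum { INTENT_TOOL, INTENT_REASON, INTENT_VISION } intent;

/* Stand-in for the ~1s System-2 path (the 0.6B router model). */
static intent model_classify(const char *q) {
    (void)q;
    return INTENT_REASON;
}

/* System-1 reflex arc: substring matches resolve in well under 1ms;
 * only unmatched queries pay for the deliberate model call. */
static intent classify_intent(const char *q) {
    if (strstr(q, "run")   || strstr(q, "open"))   return INTENT_TOOL;
    if (strstr(q, "image") || strstr(q, "screen")) return INTENT_VISION;
    return model_classify(q);   /* slow path */
}
```

Since the cited 60% of queries hit the fast path, the expensive classifier runs for less than half the traffic.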

Self-Reflective Meta-Cognition

Friston's Free Energy Principle: living systems minimize variational free energy — surprise. Wize implements both FEP strategies:

Perception

Update internal model → reduce prediction error. The reflection engine evaluates every interaction.

Action

Modify behavior → reduce future surprise. The evolve engine rewrites its own prompts from experience.

Confidence Monitoring

15 uncertainty markers scanned per response. Rolling 20-sample window. When confidence drops below 70%, the system injects corrective strategy — the software equivalent of the brain's error-related negativity signal.
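The rolling-window logic described above can be sketched as a fixed-size ring buffer over confidence samples, flagging when the mean falls below the 70% floor. The marker-scanning step that produces each sample is elided; the struct and thresholds follow the numbers in the text.

```c
#define CONF_WINDOW 20     /* rolling 20-sample window */
#define CONF_FLOOR  0.70   /* corrective strategy kicks in below this */

typedef struct {
    double samples[CONF_WINDOW];
    int count;   /* samples seen so far, capped at CONF_WINDOW */
    int head;    /* next slot to overwrite */
} conf_monitor;

/* Record one per-response confidence sample in [0, 1].
 * Returns 1 when the rolling mean drops below the floor, i.e. when a
 * corrective advisory should be injected into the next prompt. */
static int conf_record(conf_monitor *m, double c) {
    m->samples[m->head] = c;
    m->head = (m->head + 1) % CONF_WINDOW;
    if (m->count < CONF_WINDOW) m->count++;

    double sum = 0.0;
    for (int i = 0; i < m->count; i++) sum += m->samples[i];
    return (sum / m->count) < CONF_FLOOR;
}
```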

Active Inference

Together, these two loops implement active inference: perception updates the model to fit the world, and action updates the world to fit the model.

Privacy by Architecture

🔒
Wize's privacy is structural, not contractual. There is no network path for data to travel.
Layer       Guarantee
Inference   llama.cpp in-process. GGUF on local disk. No API calls.
Memory      SQLite WAL in ~/.wize/. All local.
Tools       35 tools via POSIX syscalls + local TCP.
Auth        No cloud account. No OAuth. No telemetry.

The Trust Inversion

Cloud AI: "Trust us with your data."
Wize: "Your data never enters our systems. There is nothing to trust."

The code is the proof. No HTTP client in the inference path. No telemetry. An architectural invariant.

The Digital Clone

A true digital clone requires persistent identity, episodic memory, and social awareness. Wize implements all three.

The Clone Lifecycle

Day 1 — Generic Assistant

Capable but impersonal. Full tools. Zero personalization.

Day 7 — Learning Patterns

50+ reflections, 3 episodes, a self-narrative. Knows your routines.

Day 30 — Contextually Aware

Discovered projects, learned skills, refined prompts. Greets you by name.

Day 90 — Your Digital Clone

Handles tasks autonomously. Monitors Telegram. Evolved behavioral rules matching your style.

Doing More with Less

Metric       Wize (M4)    Cloud (GPT-4)   Advantage
Tool call    3.7s         8–12s           2–3× faster
Simple Q&A   0.8s         2–4s            3–5× faster
Cost         $0.00        $0.01–0.10      ∞× cheaper
Privacy      Structural   Policy          Incomparable

⚡ Keyword Fast-Path

60% of queries routed in <1ms. Biological reflex arc.

🎯 Context Compression

80% fewer tokens. 0.6B does what 7B+ normally requires.

🧠 Weight Sharing

4 models cached. Zero-cost pipeline transitions.

The First Prototype of a Digital Self

Wize is not an optimization of the cloud paradigm. It is a different paradigm — one designed not from data center architecture but from the most efficient information processing system known: the biological brain.

It becomes yours. Not because it stores your data on a server and calls it "personalization." Because it runs on your machine, learns from your interactions, builds its own identity, and never — structurally, architecturally, physically — shares what it knows.
📄
Insyms LLC. Every claim in this paper maps to a .c file in the repository. The code is the proof.
@whitepaper{wize2026,
  title  = {Wize: A Brain-Inspired, Privacy-First Digital Intelligence},
  author = {Wize Project},
  year   = {2026}, month = {April},
  note   = {White Paper v0.1}
}