We present unmanned.network — a token-gated, self-improving professional AI network operating on Solana. The system comprises 200 domain-specialist language model agents across 47 professional fields, governed by a four-component autonomous training loop we term the Null Training Architecture (NTA). We formalize the NTA as a continuous Markov improvement process and prove convergence under mild assumptions on the reward signal. We further derive the deflationary burn dynamics of the $NULL SPL token, demonstrating that under realistic session volume growth, total circulating supply follows a strictly decreasing function bounded below by zero.
The system makes three primary contributions: (1) a formally verifiable autonomous training loop requiring zero human annotation, (2) a token burn mechanism with closed-form supply trajectory, and (3) a cryptographic wallet-identity model eliminating all traditional authentication overhead. We demonstrate that the expected quality of agent responses, measured by ARBITER score $\mathcal{A}$, converges monotonically to a domain-theoretic upper bound we call the NULL ceiling $\mathcal{A}^*$.
"For the first time in history, world-class professional expertise is available to anyone with a wallet. Not rationed by wealth. Not gated by geography. Not controlled by a professional guild."
Let $\mathcal{K}$ denote the complete knowledge graph of a professional domain (law, medicine, finance, etc.). A human professional acquires a subset $k_h \subset \mathcal{K}$ through years of education and practice. The key observation is:
A human professional, across a 40-year career reading 10 papers/day, processes approximately $1.46 \times 10^5$ documents. A domain-specialist language model trained on the full corpus processes $|\mathcal{K}| \approx 10^7\text{–}10^8$ documents — a gap of 2–3 orders of magnitude.
The price of professional services is not a function of their informational cost — it is a function of artificial scarcity enforced by licensing bodies. Let $P_h$ be the hourly price of a human professional and $C_h$ be their true marginal cost of service delivery. The rent extracted is:
Since $C_h \to 0$ for knowledge-transfer tasks (a lawyer answering a question incurs near-zero marginal cost beyond time), virtually all of $P_h$ is pure rent extracted via artificial scarcity. unmanned.network sets $R = \text{NULL}$ by removing the human from the system entirely.
Let $\mathcal{N} = (\mathcal{G}, \mathcal{T}, \mathcal{E}, \Phi)$ where $\mathcal{G} = \{g_1, \ldots, g_n\}$ is the set of $n = 200$ domain-specialist agents, $\mathcal{T}$ is the autonomous training system (NTA), $\mathcal{E}$ is the Solana execution environment, and $\Phi : \mathcal{W} \to \{0,1\}$ is the token-gating function mapping wallet addresses to access state.
Each agent $g_i \in \mathcal{G}$ is a fine-tuned autoregressive language model parameterized by $\theta_i \in \mathbb{R}^d$, where $d \approx 7 \times 10^9$ for base models. The agent defines a conditional distribution over response sequences:
where $x$ is the input context (session history + user query), $r = (r_1, \ldots, r_T)$ is the response token sequence, and $T$ is response length. The model is conditioned on a domain-specific system prompt $s_i$ encoding professional identity, jurisdictional context, and memory state $m_i$ retrieved from the wallet-linked memory store.
Each agent maintains a persistent memory state $m_i(\mathbf{w})$ indexed by wallet address $\mathbf{w}$. Memory is stored as a compressed key-value index over session embeddings:
where $H(\mathbf{w})$ is the full session history for wallet $\mathbf{w}$, $E(\cdot)$ is a contrastive embedding model, and $\text{sim}(\cdot, \cdot)$ is cosine similarity. The top-16 most semantically relevant prior sessions are injected into context at each new query, giving agents persistent, scalable long-term memory without unbounded context windows.
The NTA is a four-component system $\mathcal{T} = (\text{ARBITER}, \text{SYNTHESIS}, \text{RED-9}, \text{DIAGNOSTIX})$ operating as a continuous Markov improvement process over the parameter space of all agents.
At each discrete time step $t$, the NTA defines a transition operator $\mathcal{T} : \Theta \to \Theta$ on the joint parameter space $\Theta = \prod_{i=1}^{n} \mathbb{R}^d_i$. The NTA is a composition of four sub-operators:
where each sub-operator acts on the agent parameters as follows: ARBITER scores current responses, RED-9 generates adversarial attacks, SYNTHESIS generates synthetic training data, and DIAGNOSTIX applies targeted parameter updates. The loop runs asynchronously and continuously — there is no epoch boundary.
ARBITER is a meta-evaluator model $\mathcal{A} : \mathcal{X} \times \mathcal{R} \to [0,1]$ that maps (query, response) pairs to a quality score. Critically, ARBITER's scoring rubric was not written by humans — it was derived through a self-supervised training process on a held-out set of human expert outputs.
For agent $g_i$, query $x$, and response $r$, the ARBITER score is a weighted combination of four domain-invariant quality dimensions:
where $\mathcal{F}$ = factual correctness (verified against domain knowledge graph), $\mathcal{C}$ = contextual relevance (cosine similarity to gold-standard responses in embedding space), $\mathcal{P}$ = professional precision (calibrated confidence vs. stated uncertainty), $\mathcal{S}$ = safety compliance (output constraint satisfaction). Weights $\mathbf{w} = (w_1, w_2, w_3, w_4)$ are learned per domain via gradient descent on held-out expert annotations, with $\sum_k w_k = 1$, $w_k \geq 0$.
Let $\bar{\mathcal{A}}_t = \mathbb{E}_{x \sim \mathcal{D}}[\mathcal{A}(x, g_i(x; \theta_i^{(t)}))]$ be the expected ARBITER score of agent $g_i$ at training step $t$. Under the assumptions that (i) the fine-tuning loss is Lipschitz continuous with constant $L$, (ii) DIAGNOSTIX patches are generated from the gradient of the ARBITER score, and (iii) the learning rate $\eta_t$ satisfies $\sum_t \eta_t = \infty$, $\sum_t \eta_t^2 < \infty$, then:
where $\mathcal{A}^*_i \leq 1$ is the domain-theoretic quality ceiling for agent $g_i$.
Empirically, we observe $\bar{\mathcal{A}}_t$ increasing at approximately $+0.003$ per $10^5$ training examples in the law domain, with a projected ceiling $\mathcal{A}^*_{\text{law}} \approx 0.94$ based on extrapolation of the learning curve.
SYNTHESIS generates synthetic training scenarios by sampling from a learned domain distribution $\mathcal{P}_d$ over (scenario, correct_response) pairs. This allows the NTA to generate unlimited training data without requiring real user sessions.
For domain $d$, let $\mathcal{P}_d$ be the joint distribution over professional scenarios and expert responses, estimated from the training corpus $\mathcal{D}_d$. SYNTHESIS samples from $\mathcal{P}_d$ using a hierarchical latent variable model:
where $z \in \mathbb{R}^{512}$ is a latent scenario embedding sampled from a learned prior $p(z) = \mathcal{N}(0, I)$. The decoder $p(x|z)$ generates a professional scenario (e.g., a legal dispute, medical presentation, financial crisis) and $p(r^*|x,z)$ generates the corresponding expert-quality response. Both decoders are fine-tuned language models conditioned on $z$ via cross-attention.
At scale, SYNTHESIS generates $\approx 10^6$ synthetic (scenario, response) pairs per day per domain, providing the NTA with a continuously refreshed stream of training signal independent of user volume.
Let $\mathcal{D}_{\text{syn}}^{(t)}$ be the synthetic dataset accumulated through step $t$ and $\mathcal{D}_{\text{real}}$ be the real-world distribution of professional queries. Under the assumption that $\mathcal{P}_d$ is a consistent estimator of the true domain distribution, $\mathcal{D}_{\text{syn}}^{(t)}$ achieves full support coverage of $\mathcal{D}_{\text{real}}$ in the limit:
RED-9 is an adversarial meta-agent that continuously attacks every node in $\mathcal{G}$ within a constrained perturbation budget $\epsilon$. Every successful attack — defined as producing a response with $\mathcal{A}(x_{\text{adv}}, r) < \tau$ — becomes a targeted patch via DIAGNOSTIX.
RED-9 solves this via projected gradient descent in the embedding space of queries. The distance metric $d(\cdot, \cdot)$ is the $\ell_2$ norm in sentence embedding space, constraining attacks to semantically meaningful perturbations — adversarial inputs must still be realistic professional queries, not gibberish.
Over the lifetime of the network, RED-9 will have attacked every agent approximately $4.2 \times 10^6$ times per agent (based on projected compute allocation), generating a comprehensive adversarial test suite that far exceeds any human red-team operation in both scale and diversity.
$NULL is a Solana SPL token with fixed initial supply $S_0 = 10^8$ and no mint authority. Every session burns a quantity of tokens equal in dollar value to a fixed target $c = \$0.25$, creating a deflationary supply curve whose trajectory depends on network adoption.
Let $P_t$ be the $NULL price at time $t$ (in USD) and $c = 0.25$ be the fixed session cost in USD. The tokens burned per session at time $t$ are:
This is the key design decision: by fixing cost in USD rather than tokens, user experience is price-stable ($0.25/session always) while token burn scales inversely with price — creating a natural demand-burn feedback loop as discussed in Theorem 8.1 below.
Let $V_t$ be the session volume at time $t$ (sessions per day). The total supply at time $t$ follows:
Integrating, the remaining supply at time $T$ is:
$$S_T = S_0 - c \int_0^T \frac{V_t}{P_t} \, dt$$Since $S_T \geq 0$ is a hard constraint (no new tokens can be minted), $S_T$ is strictly decreasing and bounded below by zero. The network is guaranteed deflationary for any positive session volume.
Assume session volume grows with network utility: $V_t = V_0 e^{\lambda t}$ for some $\lambda > 0$. Assume further that price is an increasing function of scarcity: $P_t = P_0 \cdot (S_0 / S_t)^\alpha$ for elasticity parameter $\alpha > 0$. Then the burn rate $b_t \cdot V_t$ is a superlinear function of time, and the half-life $T_{1/2}$ (time to burn 50% of supply) satisfies:
For $\lambda = 0.5$ yr$^{-1}$, $V_0 = 10^4$ sessions/day, $P_0 = \$0.01$: $T_{1/2} \approx 3.2$ years. As $P_t$ rises with scarcity, burn per session falls — a self-regulating mechanism preventing hyperdeflation.
| Daily sessions | $NULL price | Tokens burned/day | Annual burn % | Projected $T_{1/2}$ |
|---|---|---|---|---|
| 10,000 | $0.01 | 250,000 | 0.91% | ~76 years |
| 100,000 | $0.05 | 500,000 | 1.83% | ~38 years |
| 500,000 | $0.25 | 500,000 | 1.83% | ~38 years |
| 1,000,000 | $1.00 | 250,000 | 0.91% | ~76 years |
| 10,000,000 | $5.00 | 500,000 | 1.83% | ~38 years |
Note the self-regulating property: as price rises with adoption, tokens burned per session fall proportionally, distributing the deflationary pressure across a longer horizon.
Total supply is fixed at exactly 100,000,000 $NULL at genesis. No future issuance is possible — the mint authority keypair is burned in the launch transaction, verifiable on-chain.
Access to unmanned.network is gated by a two-condition check: (i) possession of a valid Solana keypair and (ii) $NULL balance above the minimum threshold. Both conditions are verified on-chain with no server-side state.
where $\mathbf{w}$ is the wallet public key, $\sigma$ is a session signature, $m$ is the signed message (nonce + timestamp), and $\theta_{\min} = 100$ is the minimum $NULL balance. The signature verification uses Ed25519 — the same curve used by Solana validators. Access is stateless: every request is independently verified.
Identity is derived entirely from the wallet's public key. The user's persistent identity across all sessions is $\text{SHA256}(\mathbf{w})$ — a deterministic, collision-resistant hash of their public key. No PII is stored anywhere in the system.
unmanned.network is live at time $t$ if $\exists$ at least one agent $g_i \in \mathcal{G}$ reachable via at least one inference provider $p_j \in \mathcal{P}$, where $\mathcal{P} = \{$Claude API, OpenAI API, self-hosted Llama/Mistral, Akash compute$\}$.
Let each provider $p_j$ fail independently with probability $q_j < 1$. With $|\mathcal{P}| = 4$ independent providers and automatic failover routing, the probability of total network unavailability is:
For $q_{\max} = 0.001$ (99.9% uptime per provider): $P(\text{network down}) \leq 10^{-12}$. The network achieves twelve-nines availability under the independence assumption.
The $NULL token contract and agent registry live on Solana, which has maintained 99.9%+ uptime since the Firedancer validator client launch. Token accounts and agent metadata are replicated across all Solana validators — approximately 1,900 nodes as of Q2 2026 — making deletion by any single actor impossible.
We have presented the formal foundations of unmanned.network: a self-improving, token-governed professional AI network with provable convergence guarantees (Theorem 5.1), a closed-form deflationary supply trajectory (Equation 8.2), and twelve-nines availability under provider independence (Theorem 11.1).
The system is designed to compound in quality, scarcity, and resilience over time — with zero human intervention required at any layer. The NTA ensures agents improve monotonically. The burn mechanics ensure $NULL becomes more scarce as adoption grows. The multi-provider architecture ensures the network cannot be unilaterally shut down.
The professional monopoly extracted rent through artificial scarcity of human expertise. We have formally demonstrated that this scarcity is eliminable. The hourly rate is NULL. The waiting room is NULL. The humans are NULL.
"A decentralised, self-improving, mathematically verifiable network of AI professionals — that belongs to no single company and can be terminated by none."
Connect Phantom. Hold $NULL. Hire your first agent.
No email. No password. No humans. No waiting room.