Whitepaper: Mastering Gameworld 10k in Minutes with the AXIOM ‘Digital Brain’
Steven Swanson : Jun 2, 2025 7:18:54 AM

From Arcade Games to Real-World Domains: How Generalized Adaptive Intelligence Learns to Win
Editorial Note
This whitepaper provides the background and results from our latest scientific paper, “AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models,” published on arXiv on May 30, 2025. It is intended to provide context to a non-technical audience and serve as a broad translation of the paper’s importance and its implications for AI.
Highlights
A New Class of AI Agent
AXIOM (Active eXpanding Inference with Object-centric Models), based on Active Inference, is an important innovation in AI: a modular “digital brain” that learns arcade-style gameplay the way humans do, by modeling objects and cause-and-effect dynamics instead of memorizing raw pixels.
A New Benchmark Test For Generalized Intelligence
In our new research paper, we examine how AXIOM outperforms Google DeepMind’s top AI models in the Gameworld 10k benchmark (10 physics-consistent games, 10,000-step cap); results were validated by an independent third-party lab.
| KEY METRIC | DREAMER V3 | AXIOM | ADVANTAGE |
| --- | --- | --- | --- |
| Normalized score | 0.48 | 0.77 | +60% better gameplay |
| Steps to competence | 24,207 | 3,175 | 7x faster learning |
| GPU time to competence | 6.23 hr | 0.16 hr | 39x more compute-efficient (~97% less GPU time) |
| Model size (parameters) | 420 M | 0.95 M | ~440x (~99%) smaller |
Real World Implications
AXIOM’s adaptive architecture could provide increased reliability and accuracy, making it potentially both superior to and more affordable than today’s deep learning systems. Its tiny footprint and energy savings could make high-performance intelligence practical on virtually any device, including low-power edge appliances. Its object-centric reasoning delivers the reliability, adaptability, and full interpretability that black-box neural nets lack, aligning it with regulatory requirements. It’s AI for 100 billion devices rather than from $100 billion data centers.
Next Steps
The full paper, the Gameworld 10K benchmark, and the AXIOM code have been released on arXiv and GitHub. AXIOM is expected to ship in upcoming releases of Genius™; early-access sign-up is now open.
Introduction
When a ball rolls behind a couch, we still know it exists and expect it to keep moving until something stops it. We grasp this not because we have seen every trajectory but because our brains encode core principles—gravity, inertia, cause and effect—that give the world structure. From these principles we form common sense, testing and refining it through experience and carrying it from one context to the next. In effect, the brain builds a world model—an internal simulation of objects, including their size, color, weight, texture, and purpose, as well as how they interact, allowing us to predict outcomes and act accordingly. We begin with an early set of working assumptions—mental axioms—and continually update them as new evidence emerges. That self-correcting world model is the essence of natural intelligence.
But this is not how current state-of-the-art artificial intelligence (generally based on deep reinforcement learning) works. AI can certainly learn statistical patterns about how pixels and words are frequently grouped, but this isn’t the same as understanding the general cause-and-effect dynamics of the world and how to transfer them to other contexts. This inability to generalize across domains is why the full potential of AI, Artificial General Intelligence (AGI), remains elusive.
The human brain is our best example of general intelligence, so how might we apply its principles of cognition in software?
A 2018 Wired cover story, “The Genius Neuroscientist Who Might Hold the Key to True AI,” cast Professor Karl Friston’s free-energy principle as “the most all-encompassing idea since the theory of natural selection.” Inspired by that vision, we began working with Professor Friston, VERSES Chief Scientist, who assisted us in assembling a unique, multidisciplinary team of neuroscientists, computer engineers, and machine learning researchers to reimagine intelligence from first principles. Over the past several years—and through more than 130 research publications in collaboration with dozens of universities worldwide—we’ve been developing and testing a new class of AI, internally codenamed AXIOM. The culmination of those efforts, AXIOM (Active eXpanding Inference with Object-centric Models), is a modular, biomimetic digital brain designed to interact with the world more like a human than a machine. AXIOM is not a modification of mainstream machine learning—it’s a new, biologically grounded re-architecting of artificial intelligence itself.
“AXIOM has been developed as a ‘digital brain’, designed to mirror the modular structure and dynamic processes of our own brains; it develops an understanding of its world and how it operates within that world, enabling it to seek out experiences that massively enhance learning.”
- VERSES Chief Scientist, Professor Karl Friston
For decades, the perceptron—a simplified model of a neuron—has been treated as the fundamental unit of intelligence in artificial neural networks. This assumption underpins nearly all modern AI, from deep reinforcement learning agents to large language models (LLMs). The field's core advancement has been scaling: stacking more perceptrons into larger architectures, increasing training data, and adding compute. But despite the scale, the core limitations persist—brittle generalization, data inefficiency, and black-box behavior. This reductionist, brute-force approach to AI with neurons as the fundamental unit is an important branch in the evolutionary tree of computer science, but in our view, it's not the trunk.
Based on Professor Friston’s neuroscience research—and demonstrated here—we believe that the brain offers a superior model for intelligence compared to current state-of-the-art machine learning methods. Unlike artificial neural networks, the brain is modular, dynamic, and adaptive. It doesn't just passively absorb patterns—it builds a model of the world, reasons about causes, and learns by interacting. Using neuroplasticity, it continuously learns and updates its neuronal connections dynamically. AXIOM is built on this principle: a digital brain that mirrors the brain's architecture, processes, and dynamic interactions, not just its smallest component.
AXIOM’s modular architecture is built on a new class of mixture models—probabilistic models that assume data comes from several underlying groups, even without knowing in advance what those groups are. Rather than relying on labeled data, these models uncover “hidden categories” on their own, making them ideal for identifying structure in messy or unlabeled data, such as the data we encounter in complex, real-world inputs. In AXIOM, different mixture models handle distinct cognitive functions—perception, identity, memory, motion, and planning—mirroring how specialized regions of the brain work together to produce what we experience as cognition. These modules work in concert to actively reason, plan, act, and learn in a continuous feedback loop.
| AXIOM MODULE | AXIOM FUNCTION | BRAIN REGION |
| --- | --- | --- |
| Slot Mixture Model (sMM) | Vision: converts raw pixels into separate objects. | Occipital Lobe |
| Identity Mixture Model (iMM) | Memory & Identity: recognizes and tracks objects over time. | Temporal Lobe |
| Transition Mixture Model (tMM) | Prediction & Planning: forecasts how things move. | Frontal Lobe |
| Recurrent Mixture Model (rMM) | Reasoning: links cause to effect (if this, then that). | Frontal/Parietal Lobe |

The modules in the architectural diagram from the paper correlate with functions in the human brain.
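To make the mixture-model idea concrete, here is a minimal, illustrative sketch (not AXIOM’s actual implementation) of how a simple Gaussian mixture model can discover hidden groups in unlabeled data via expectation-maximization; the toy data, group count, and shared variance are hypothetical simplifications.

```python
import numpy as np

# Toy data: points drawn from two unknown groups (no labels are given).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),   # hidden group A
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(100, 2)),   # hidden group B
])

K = 2                                   # assumed number of hidden groups
means = data[rng.choice(len(data), K, replace=False)]   # random initial centers
weights = np.full(K, 1.0 / K)           # prior probability of each group
var = 1.0                               # shared isotropic variance (simplification)

for _ in range(50):                     # expectation-maximization loop
    # E-step: how responsible is each group for each point?
    sq_dist = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    resp = weights * np.exp(-0.5 * sq_dist / var)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update group weights and centers from those responsibilities.
    n_k = resp.sum(axis=0)
    weights = n_k / len(data)
    means = (resp.T @ data) / n_k[:, None]

labels = resp.argmax(axis=1)            # each point's most likely hidden group
print(means.round(2), np.bincount(labels))
```

The same principle—assigning observations to latent groups and updating those groups from experience—is what AXIOM’s modules apply to objects, identities, motions, and interactions rather than to abstract points.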
AI and video games have a long history, in part because simulated software environments with real‐time dynamics serve as a convenient proxy for learning and evaluation. Over the years, the Arcade Learning Environment (ALE) and related Atari® challenges emerged to benchmark AI, pushing algorithms to master diverse tasks from Breakout to Pong.
These classic game benchmarks are poor surrogates for measuring generalization across domains because they reward tricks, not understanding: delayed penalties, non-Newtonian bounces, flashing sprites. To address these challenges—how to build generalizable intelligence and how to measure it—we re-architected both AI and its evaluation pipeline from first principles. AXIOM introduces a brain-inspired model for learning and physics-based reasoning, while Gameworld 10K provides a clean, physics-based environment that tests whether agents truly generalize across tasks rather than overfitting to them. Together, they form a new foundation for creating and benchmarking general intelligence that mirrors how humans learn.
Gameworld 10K distills the core mechanics of classic arcade games, such as playing tennis or crossing a road, into ten distinct games that all exist inside one shared universe with consistent, real-world-like physics. Because each game obeys the same rules, an agent can carry forward and generalize its object-centric knowledge across tasks, be it crossing roads, hitting balls, jumping over obstacles or intercepting projectiles. Each game enforces a strict 10,000-step training cap and delivers immediate, causally consistent feedback, forcing models to learn quickly—and for the right reasons—instead of memorizing arbitrary game-specific quirks.
Gameworld 10k is designed with consistent game mechanics across 10 arcade games in order to better measure generalized adaptive intelligence.
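As an illustration of the setup only, the sketch below shows roughly what a training run under the 10,000-step cap looks like; the environment, agent, and game here are stand-in placeholders rather than the interfaces published in the Gameworld repository, and the 210 x 160 frame size is assumed because it would correspond to the 33,600 pixels per frame cited in the results below.

```python
import numpy as np

STEP_CAP = 10_000                     # Gameworld 10K's per-game training budget
rng = np.random.default_rng(0)

# Stand-ins for the real environment and agent (hypothetical placeholders):
def env_reset():
    # Return an initial 210 x 160 RGB frame (33,600 pixels).
    return rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)

def env_step(action):
    # Return (next frame, immediate reward, episode-done flag).
    next_frame = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
    return next_frame, float(rng.normal()), bool(rng.random() < 0.01)

def agent_act(frame):
    # Placeholder policy: pick one of five discrete actions at random.
    return int(rng.integers(5))

episode_scores, total, frame = [], 0.0, env_reset()
for step in range(STEP_CAP):          # hard cap: all learning happens within 10,000 steps
    frame, reward, done = env_step(agent_act(frame))
    total += reward                   # immediate, causally consistent feedback
    if done:                          # episode over: record the score and start again
        episode_scores.append(total)
        total, frame = 0.0, env_reset()

print(f"episodes finished within the cap: {len(episode_scores)}")
```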
In a research paper recently submitted to a top-tier machine learning conference for double-blind peer review, we compare AXIOM with two leading deep reinforcement learning models: BBF and Dreamer V3.
Setup & Results
In a decathlon, athletes are not judged by any single event, but rather by their aggregate performance across all 10 sports, as reflected in their total score. Similarly, Gameworld tests a model's performance across multiple games, which better measures its capacity to generalize and adapt than performance on any single game does. A few things to note about the comparison tables below:
- For AXIOM, Dreamer V3, and BBF, all 10 games were run 10 times, each with random, independent initial conditions, and the scores below represent average performance.
- This means that across those 10 runs there is a spread (standard deviation) of performance; some runs do better, some worse. As evidenced in the paper, AXIOM’s range of deviation is smaller and less volatile than that of BBF and Dreamer V3, which is important because wildly varied deviations can significantly increase the time and cost to train and operate models.
- Each game has different scoring (0-10, 1-100, 10,000), so we normalized these into a range from 0.0 to 1.0, with 0.0 being random play, 0.5 being considered competent gameplay, and 1.0 being optimal gameplay.
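As a concrete illustration of that normalization, here is a minimal sketch, assuming simple linear scaling between random-play and optimal-play baselines (the paper’s exact mapping may differ); the example game and scores are hypothetical.

```python
def normalize_score(raw, random_baseline, optimal):
    """Map a raw game score to the 0.0-1.0 scale described above:
    0.0 = random play, 1.0 = optimal play (linear scaling assumed)."""
    return (raw - random_baseline) / (optimal - random_baseline)

# Hypothetical example: a game scored 0-100 where random play averages 5
# and optimal play scores 90; a raw score of 48 lands near "competent" (0.5).
print(round(normalize_score(48, random_baseline=5, optimal=90), 3))   # 0.506
```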
| METRIC | Dreamer V3 | AXIOM | IMPROVEMENT |
| --- | --- | --- | --- |
| Normalized Score | 0.48 | 0.77 | 60% |
| Steps to Competence | 24,207 | 3,175 | 7.6x |
| GPU Time (hr) | 6.23 | 0.16 | 39x |
| AWS Cost ($4.10/hr)¹ | $25.54 | $0.66 | 39x |
| Pixels per Frame | 7,056 | 33,600² | 4.7x |
| Model Size (Parameters) | 420M | 950k³ | 442x |
| METRIC | BBF | AXIOM | IMPROVEMENT |
| --- | --- | --- | --- |
| Normalized Score | 0.49 | 0.77 | 60% |
| Steps to Competence | 16,638 | 3,175 | 5.2x |
| GPU Time (hr) | 0.58 | 0.16 | 3.6x |
| AWS Cost ($4.10/hr)¹ | $2.37 | $0.66 | 3.6x |
| Pixels per Frame | 9,216 | 33,600² | 3.6x |
| Model Size (Parameters) | 6.5M | 950k³ | 6.8x |
¹ Standard deviation of cost: AXIOM (± $0.56), BBF (± $3.74), Dreamer V3 (± $40.08).
² AXIOM’s pixels-per-frame figure is notable: not only did it train faster as measured by time, it also processed 3.6-4.7x as many pixels per frame as BBF and Dreamer V3. Combined with its speed, this means AXIOM processed much higher-fidelity data with significantly greater efficiency.
³ Model size ranged from roughly 300k to 1.6M parameters across runs, with 950k as the midpoint.
AXIOM exceeded BBF and Dreamer V3 gameplay performance across all 10 games, averaging 60% greater proficiency (score of 0.77), while using 5-7x (500-700%) fewer steps and incurring 3.6-39x (360-3,900%) lower GPU-time costs, respectively.
The objective of Gameworld 10k is to showcase generalized adaptive intelligence as measured by how long a model takes to reach gameplay competence—a normalized score of approximately 0.5. As seen in the relatively flat horizontal lines in the charts below for three games (Fruits, Gold, and Hunt), BBF and Dreamer V3 didn’t achieve competent play even after 10k steps, while AXIOM began to exhibit minimum gameplay competence at around 2,000-3,000 steps.
The graphs below show that BBF and Dreamer V3 eventually achieved minimum gameplay competence (a), but at the expense of several times more training time (b) and steps (c) compared to AXIOM. For clarity, AXIOM achieved superior gameplay proficiency (score of 0.77) with an average of just 3,175 steps.
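“Steps to competence” can be read off a learning curve in several ways; below is a minimal sketch of one plausible definition—the first step at which a trailing-window average of the normalized score reaches 0.5—which may differ from the exact criterion used in the paper, and the learning curve here is synthetic.

```python
import numpy as np

def steps_to_competence(normalized_curve, threshold=0.5, window=500):
    """Return the first step whose trailing-window average normalized score
    reaches the competence threshold, or None if it is never reached.
    (Illustrative definition; the paper's exact criterion may differ.)"""
    scores = np.asarray(normalized_curve)
    for step in range(window, len(scores) + 1):
        if scores[step - window:step].mean() >= threshold:
            return step
    return None

# Hypothetical learning curve: noisy scores that improve over 10,000 steps.
rng = np.random.default_rng(0)
curve = np.clip(np.linspace(0.0, 0.8, 10_000) + rng.normal(0, 0.05, 10_000), 0, 1)
print(steps_to_competence(curve))   # first step where the 500-step average >= 0.5
```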
How AXIOM Works Like a Brain
Where AI sees, metaphorically, the leaves on a tree, AXIOM sees the forest. Whereas AI sees raw pixels, AXIOM sees objects and dynamics. AXIOM parses the world into structured representations—objects, identities, concepts, motions, and interactions—and it learns how they relate, not just what they look like. AXIOM learns the “why,” not just the “what.”
The real world obeys a consistent set of ground-truth cause-and-effect rules (i.e., physics), which the brain formulates as principled assumptions or core priors (Core Knowledge, Spelke & Kinzler, 2007). Because these fundamental laws of physics are consistent and predictable, the brain can refine its world model through trial and error to develop more sophisticated intelligence and adapt to new contexts.
Some priors are genetic, like the survival instincts of hunger, reproduction, and the fight-or-flight response, while others are learned through experience. Swimming and biking, for example, are very different sports (domains), but each requires some baseline understanding of how our bodies interact with the world: gravity, balance, buoyancy, breathing, range of motion, muscle fatigue, and so forth. Each also requires learning domain-specific skills, e.g., the butterfly stroke and breaststroke for swimming, or cornering and shifting for biking.
By way of example, core priors in Gameworld 10k include:
- The world consists of objects (e.g. a ball, a wall, a bat).
- Some objects can move around and interact with other objects (e.g. a ball bouncing).
- Some objects can be controlled or influenced (e.g. a ball changes direction after being hit by a bat), but such interactions only apply to objects that are close to each other.
- Object trajectories are continuous (i.e. things continue to move in the same direction as long as they do not hit something else).
- Objects have a consistent colour and shape and if this changes they have become a different object.
- Objects can move off screen, teleport or respawn.
As with the swimming and biking examples, these core priors can be built upon in order to learn the unique mechanics of each game, eventually developing competence and then proficiency.
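As an illustrative sketch only (not drawn from the AXIOM codebase), a few of these core priors could be written down as simple assumptions about object state; every field and function name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ObjectState:
    """Minimal object-centric state reflecting the core priors above."""
    x: float          # position: trajectories are continuous...
    y: float
    vx: float         # ...so velocity carries over from step to step
    vy: float
    color: tuple      # consistent color and shape identify the object
    shape: str

    def predict_next(self, dt: float = 1.0) -> "ObjectState":
        # Prior: things keep moving the same way until something stops them.
        return ObjectState(self.x + self.vx * dt, self.y + self.vy * dt,
                           self.vx, self.vy, self.color, self.shape)

def is_same_object(a: ObjectState, b: ObjectState) -> bool:
    # Prior: if color or shape changes, treat it as a different object.
    return a.color == b.color and a.shape == b.shape

def can_interact(a: ObjectState, b: ObjectState, radius: float = 10.0) -> bool:
    # Prior: interactions (e.g., a bat deflecting a ball) only apply to
    # objects that are close to each other.
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= radius ** 2
```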
In the same way that we can recognize a vehicle regardless of its size, shape, or color, once AXIOM understands how to delineate objects and their behavior, it is resilient to perturbations of these attributes. Here, after AXIOM has learned the objects over 5,000 steps, we change their colors; it still correctly infers those objects after a slight drop in performance and a subsequent recovery.
Why It Matters
Gameworld 10k demonstrates AXIOM’s capacity to learn gameplay proficiency from raw pixels without relying on neural networks, backpropagation, gradient-based optimization, or replay buffers, and it represents an exciting, viable alternative direction for AI/ML research. We believe AXIOM is a significant advancement in the field of AI/ML with far-reaching applications and implications, given that it excels in areas where mainstream AI falls short:
- Reliability. Using just 10k training steps, AXIOM reached gameplay proficiency, while BBF and Dreamer V3 performed worse on average and failed to achieve competence altogether in 3 of the games.
- Efficiency. AXIOM is more efficient on multiple levels: speed to proficiency, compute cost, sample requirements and final model size.
- Adaptability. Robustness to anomalous behavior and the capacity to adapt to new contexts means AXIOM can enable more resilient systems that continuously self-optimize.
- Explainability. AXIOM is a structured, object-centric model whose variables and parameters can be directly interpreted in human-readable terms (e.g., shape, color, position). This contrasts starkly with the opaque, black-box reasoning of neural nets and can foster trust among stakeholders who require confidence in how AI systems are likely to behave before permitting degrees of autonomy.
Improved reliability, efficiency, adaptability, and explainability could yield agentic intelligence that is more effective, sustainable, and trustworthy than current state-of-the-art methods, and that would be transformative. Businesses, governments, and individuals alike could autonomously optimize for their domain-specific concerns: resources, emissions, waste, compliance, safety, well-being, risks, and more.
A preprint of the paper is published on arXiv and includes supplementary references to two repositories, one for the Gameworld benchmark (under the MIT license) and one for AXIOM (under the VERSES Academic Research license), for the broader community to evaluate. Before publication, Soothsayer Analytics, a data science advisory, R&D, and AI certification firm trusted by many respected Global 1000 and Global 2000 clients, performed an independent third-party audit of the math, code, and results, and validated the findings reported in the paper.
We anticipate that AXIOM will be made available in future releases of Genius, so if you’re a machine learning professional or data scientist seeking to make enterprise applications smarter today, sign up now.
"ATARI" is a registered trademark of Atari Interactive, Inc. Nothing herein is intended to communicate or imply a sponsorship or endorsement by Atari, or any affiliation therewith, as no such sponsorship, endorsement or affiliation exists."