Genius™ has four modules:
SENSE perceives the world more like we do, integrating sensory information that gives computers the ability to truly “see,” process, and understand.
THINK serves as its digital brain, with modules for memory, prediction, and reasoning that work together to continuously refine an internal model of the world.
ACT allows robots and agents to learn new tasks quickly in physical and digital worlds, without the extensive pre-training that conventional systems require.
SHARE enables our agents, from traffic signals to drones to lunar rovers, to collaborate securely and share not just knowledge but also skills—what one learns can be distributed instantly.
Together, SENSE, THINK, ACT, and SHARE form the sight, brain, body, and ecosystem of an AI that perceives, learns, adapts, and improves with experience.
SENSE makes live vision feasible on edge devices—from drones and autonomous vehicles to robotics—where real-time perception is critical. It treats perception more like our own senses, integrating vision, motion, touch, and acceleration into a coherent picture of what’s happening. The power isn’t just in collecting signals but in making sense of them.
This ability to “see” comes from inferring what’s behind sensor readings—what in the world is generating the signals. Any modality can provide evidence: a camera gives you color and shape, a microphone gives you timing and echoes, a depth sensor gives you structure. The core task is to integrate incoming signals and update the system’s understanding of what objects are present and how they are arranged. Variational Bayes Gaussian Splatting (VBGS) is one concrete implementation of this idea.
Instead of training a model through endless passes over data, VBGS updates its internal beliefs sequentially with each new observation, sharpening its 3D representation on the fly. Every signal—color, shape, position—sharpens the existing model and enables SENSE to maintain a map of the world—in other words, a world model that gets more accurate with every glimpse. But SENSE also needs a way to interpret, prioritize, and act on what it perceives, which is where hierarchical active inference comes in.
Hierarchical active inference unifies two functions of the human brain that are usually treated separately: control, which pushes goals downward into concrete actions, and planning, which adjusts how strongly each goal should be pursued as conditions shift. These systems work together constantly, balancing competing demands rather than following rigid rules.
Because SENSE is coupled to the agent’s control and decision-making, its signals immediately shape behavior. If it detects an obstacle, movement adjusts automatically; if something unexpected appears, higher-level processes reassess goals. At every level, sensing and acting form a continuous feedback loop.
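Under heavy simplification, that loop looks something like the sketch below (a point robot with hand-written goals; this is illustrative only, not the Genius control stack): a high level re-weights goals whenever the senses report something unexpected, and a low level turns the weighted goals into the next movement.

```python
# A heavily simplified two-level loop (a point robot with hand-written goals;
# illustrative only, not the Genius control stack): the high level re-weights
# goals as conditions change, and the low level turns the weighted goals into
# the next movement.
import numpy as np

def replan_goal_weights(obstacle_distance):
    """High level: decide how strongly each goal should be pursued right now."""
    weights = {"reach_target": 1.0, "avoid_obstacle": 1.0}
    if obstacle_distance < 1.0:          # something unexpected is close
        weights["avoid_obstacle"] = 5.0  # escalate safety
        weights["reach_target"] = 0.5    # temporarily relax the task goal
    return weights

def control_step(position, target, obstacle, weights, gain=0.2):
    """Low level: blend goal-directed attraction and obstacle repulsion into one move."""
    to_target = target - position
    away = position - obstacle
    distance = np.linalg.norm(away) + 1e-6
    action = weights["reach_target"] * to_target + weights["avoid_obstacle"] * away / distance**2
    return position + gain * action

target = np.array([5.0, 0.0])
obstacle = np.array([2.5, 0.1])

# Far from anything unexpected: default weights, and the move heads for the target.
far = np.array([0.0, 0.0])
w = replan_goal_weights(np.linalg.norm(far - obstacle))
print("far from obstacle:", w, "->", np.round(control_step(far, target, obstacle, w), 2))

# The obstacle is suddenly close: the high level escalates avoidance before the low level moves.
near = np.array([2.0, 0.0])
w = replan_goal_weights(np.linalg.norm(near - obstacle))
print("near obstacle:   ", w, "->", np.round(control_step(near, target, obstacle, w), 2))
```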
Picture an autonomous car. Traditional sensors only fuse what they can currently “see,” so if a pedestrian is hidden behind a van, the system has no data to work with. SENSE behaves differently: it maintains hypotheses about what might be present, predicting possible trajectories while integrating every new signal to update its beliefs. It fuses not just sensor readings but expectations, uncertainty, and goals.
Instead of treating sensors as isolated modules, hierarchical active inference provides a shared goal engine. Goals guide what each sensor should attend to. Sensory signals update the system’s expectations. The result is a kind of sensor fusion: a unified, context-aware model of the world that adapts in real time and continuously shapes action.
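As a toy illustration of this kind of fusion, take the hidden-pedestrian case with assumed likelihood numbers (none of them from VERSES): the belief that someone may be behind the van is kept alive and updated by every new signal via Bayes’ rule, and the goal of passing safely decides what to do with that belief.

```python
# A toy sensor-fusion sketch of the occluded-pedestrian example, with assumed
# likelihood numbers: the system keeps a belief that a pedestrian may be present
# even while no sensor can currently see one, and every new signal updates that
# belief via Bayes' rule.

def bayes_update(prior, likelihood_if_present, likelihood_if_absent):
    """Posterior P(pedestrian present) after one piece of evidence."""
    joint_present = prior * likelihood_if_present
    joint_absent = (1.0 - prior) * likelihood_if_absent
    return joint_present / (joint_present + joint_absent)

# Prior: a crosswalk and a stopped van make a hidden pedestrian plausible.
belief = 0.30

# The camera sees nothing, but occlusion makes "no detection" weak evidence:
# a hidden pedestrian would also produce no detection most of the time.
belief = bayes_update(belief, likelihood_if_present=0.9, likelihood_if_absent=1.0)

# A microphone picks up faint footsteps near the van (assumed numbers).
belief = bayes_update(belief, likelihood_if_present=0.6, likelihood_if_absent=0.1)

print(f"P(pedestrian behind the van) ≈ {belief:.2f}")
# The goal (pass safely) then uses this belief: slow down while uncertainty is high.
```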
VBGS, developed by VERSES, builds a 3D world from billions of tiny probabilistic Gaussians, each storing color, shape, and position. As new evidence arrives, these blobs shift and refine, letting the system update its map in real time without overwriting what it knows.
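To make the idea concrete, here is a minimal sketch assuming a single Gaussian “blob” and a simple running-statistics update (the real VBGS performs variational Bayesian updates over a large mixture): each new observation nudges the blob’s mean and shrinks its uncertainty, with no retraining passes over old data.

```python
# Toy illustration of sequential belief updating in the spirit of VBGS (not the
# actual VERSES implementation): one Gaussian "blob" keeps running sufficient
# statistics over position and color, and every observation sharpens the belief
# without any retraining passes over old data.
import numpy as np

class GaussianBlob:
    """A probabilistic primitive holding a belief over position (x, y, z) and color (r, g, b)."""

    def __init__(self, dim=6, prior_count=1.0):
        self.mean = np.zeros(dim)        # current best estimate
        self.count = prior_count         # pseudo-observations behind the estimate
        self.scatter = np.eye(dim)       # running sum of squared deviations

    def update(self, obs):
        """Fold one new observation into the belief (Welford-style running update)."""
        obs = np.asarray(obs, dtype=float)
        delta = obs - self.mean
        self.count += 1.0
        self.mean += delta / self.count
        self.scatter += np.outer(delta, obs - self.mean)

    def covariance(self):
        """Current uncertainty; it shrinks as evidence accumulates."""
        return self.scatter / self.count

# Feed noisy observations of the same point (x, y, z, r, g, b) one at a time.
rng = np.random.default_rng(0)
true_state = np.array([1.0, 2.0, 0.5, 0.8, 0.1, 0.1])
blob = GaussianBlob()
for _ in range(200):
    blob.update(true_state + rng.normal(scale=0.05, size=6))

print("posterior mean:", np.round(blob.mean, 3))
print("remaining uncertainty (trace of covariance):", round(float(np.trace(blob.covariance())), 4))
```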
We recently unveiled what we believe is the world's first digital brain, AXIOM. Designed to mirror the modular structure of the human brain, AXIOM develops an understanding of its world and how to operate within it.
Learning is central to its design. Over time, AXIOM’s models grow more efficient, pruning what they no longer need while continuously adapting. The result is a system that becomes smarter, more efficient, and more reliable with experience.
More reliable because our models learn over time. More efficient because they can learn from fewer interactions, the way humans only need to see something new a few times to recognize it, or the way a toddler repeatedly dropping a spoon might be refining their understanding of gravity.
This capacity to simplify mirrors natural intelligence. Recent benchmarking results showed AXIOM to be up to 60% more reliable, 97% more efficient, and 39 times faster at learning than Google DeepMind’s DreamerV3.
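The pruning mentioned above can be pictured with a simple sketch, assuming a usage-based criterion (AXIOM’s actual criterion may be more principled than this): mixture components that stop explaining observations are dropped and the rest renormalized, so the model grows smaller as it grows smarter.

```python
# A minimal sketch of the pruning idea, with an assumed usage-based criterion:
# mixture components that stop explaining observations are removed, so the model
# becomes simpler and cheaper to run as it learns.
import numpy as np

def prune_components(weights, usage_counts, min_usage=5):
    """Drop components that explained too few recent observations,
    then renormalize the remaining mixture weights."""
    keep = usage_counts >= min_usage
    kept_weights = weights[keep]
    return kept_weights / kept_weights.sum(), keep

weights = np.array([0.40, 0.35, 0.20, 0.05])     # mixture weights of 4 components
usage_counts = np.array([120, 90, 30, 2])        # observations each explained recently

weights, keep = prune_components(weights, usage_counts)
print("kept components:", np.flatnonzero(keep))  # -> [0 1 2]
print("renormalized weights:", np.round(weights, 3))
```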
| AXIOM MODULE | AXIOM FUNCTION | BRAIN REGION |
| --- | --- | --- |
| Slot Mixture Model (sMM) | Vision: converts raw pixels into separate objects. | Occipital Lobe |
| Identity Mixture Model (iMM) | Memory & identity: recognizes and tracks objects over time. | Temporal Lobe |
| Transition Mixture Model (tMM) | Prediction & planning: forecasts how things move. | Frontal Lobe |
| Recurrent Mixture Model (rMM) | Reasoning: links cause to effect (if this, then that). | Frontal/Parietal Lobe |
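The sketch below shows how the four modules in the table could hand information to one another; the function bodies are placeholders, and only the order of the hand-offs reflects the table above.

```python
# A schematic data-flow sketch of the four AXIOM modules. Function bodies are
# placeholders; only the order of the hand-offs reflects the description above.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ObjectSlot:
    identity: Optional[str]        # filled in by the iMM
    position: Tuple[float, float]  # inferred by the sMM from pixels
    velocity: Tuple[float, float]  # predicted by the tMM

def slot_mixture_model(frame):        # sMM: pixels -> object slots
    return [ObjectSlot(identity=None, position=(3.0, 1.0), velocity=(0.0, 0.0))]

def identity_mixture_model(slots):    # iMM: recognize and track objects over time
    for slot in slots:
        slot.identity = "ball"
    return slots

def transition_mixture_model(slots):  # tMM: forecast how each object moves
    for slot in slots:
        slot.velocity = (0.5, 0.0)
    return slots

def recurrent_mixture_model(slots):   # rMM: link cause to effect and pick an action
    ball = slots[0]
    return "move_left" if ball.velocity[0] > 0 else "stay"

frame = "raw pixels"                  # stand-in for an image
action = recurrent_mixture_model(
    transition_mixture_model(identity_mixture_model(slot_mixture_model(frame))))
print("chosen action:", action)
```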
AXIOM mirrors the brain with mixture models that uncover hidden structure, enabling perception, identity, memory, motion, and planning to work together in a continuous loop of reasoning, action, and learning.
In August 2025, we published results of our robotics model, which outperformed other models on Meta’s Habitat benchmark simulation without any pre-training.
Across three tasks—tidying a room, preparing groceries, and setting a table—the VERSES model achieved a 67% success rate, surpassing the previous best alternative of 55%. Unlike a deep-learning robotics model that required imitation-based pre-training with more than 1.3 billion steps to acquire these skills, the VERSES model adapted and learned in real time.
With ACT, it thinks on its feet. Just as a human might enter a room, assess its architecture, and map where items in a kitchen might be located based on models of how the world is typically arranged, the VERSES model tidied a room and set the dining table without pre-training.
This breakthrough is “exciting…offering an alternative approach,” said Sean Wallingford, former CEO and President of Swisslog, a leading logistics automation company. “If we can deploy robots without training, they will be viable in a wide range of activities, from factories and warehouses to domestic and commercial applications.”
The VERSES robot recognizes that objects exist, that one object can be inside another (food in a fridge), and that bumping into things (a couch, a ball on the floor) is bad. It updates its 3D model of the world and figures out what actions will get it closer to its goal.
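A toy version of that last step might look like the sketch below, with assumed geometry (none of this is the Genius planner): the robot scores a handful of candidate moves against its current world model, the goal position plus known obstacles, and picks the cheapest one. The real planner is far richer, but the loop is the same: model, score, act.

```python
# A toy version of the planning step, with assumed geometry: the robot scores
# candidate moves against its world model (goal plus known obstacles) and picks
# the cheapest. Illustrative only, not the Genius planner.
import numpy as np

def expected_cost(position, goal, obstacles, collision_radius=0.5):
    """Cost of standing at `position`: distance left to the goal, plus a heavy
    penalty for bumping into anything in the world model."""
    cost = np.linalg.norm(goal - position)
    for obstacle in obstacles:
        if np.linalg.norm(obstacle - position) < collision_radius:
            cost += 100.0          # bumping into things (a couch, a ball) is bad
    return cost

def choose_action(position, goal, obstacles, step=0.5):
    """One-step lookahead over a small set of candidate moves."""
    moves = {
        "forward":       np.array([step, 0.0]),
        "forward_left":  np.array([step, step]),
        "forward_right": np.array([step, -step]),
        "left":          np.array([0.0, step]),
        "right":         np.array([0.0, -step]),
        "back":          np.array([-step, 0.0]),
        "stay":          np.array([0.0, 0.0]),
    }
    return min(moves, key=lambda name: expected_cost(position + moves[name], goal, obstacles))

position = np.array([0.0, 0.0])
goal = np.array([3.0, 0.0])               # e.g. the table it should be setting
obstacles = [np.array([0.5, 0.0])]        # a ball on the floor directly ahead
print("best first move:", choose_action(position, goal, obstacles))  # -> forward_left (sidesteps the ball)
```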
Brains don’t just build models—they test them in the real world through action. That connection between perception and movement becomes essential as more AI and robotics systems operate in physical space, where teamwork will require skill-sharing.
To facilitate this, VERSES helped lead the creation of the Spatial Web standards, enabling AI agents to collaborate securely across devices and environments.
In live deployments, Genius SHARE has shown what’s possible: agents using this architecture autonomously reduced building energy use and emissions by 15 to 20 percent, a capability that scales from rooms to entire cities. The same framework lets robots share skills instantly and multiple agents cooperate; what one learns, all can learn.
In a simulation of the lunar surface, NASA’s Jet Propulsion Laboratory applied the Spatial Web standards to coordinate rovers and teams, tackling the problem of transforming data into a unified shared model and enabling entities using different technologies to collaborate.
As the rovers exchanged information, a kind of technical community emerged. Each machine was aware of the others, each contributed what it knew, and each adjusted based on what the group discovered. When one virtual rover got stuck in a crater, it shared real-time data with the other rovers, demonstrating how the standards can support cooperation and coordination on the moon.
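That exchange can be pictured with a schematic sketch using made-up message fields (the Spatial Web standards define their own schemas; this is not them): one rover reports a hazard it has discovered, and every rover folds the report into its local map before planning its next route.

```python
# A schematic sketch of the rover exchange with made-up message fields (not the
# Spatial Web schemas): one rover reports a hazard it discovered, and every
# rover folds the report into its own local map.
from dataclasses import dataclass, field

@dataclass
class HazardReport:
    reporter: str
    location: tuple            # (x, y) cell on a shared lunar-surface grid (assumed frame)
    kind: str                  # e.g. "crater"

@dataclass
class Rover:
    name: str
    hazards: list = field(default_factory=list)

    def observe_hazard(self, location, kind):
        """Record a newly discovered hazard locally and produce a report to broadcast."""
        report = HazardReport(reporter=self.name, location=location, kind=kind)
        self.receive(report)
        return report

    def receive(self, report):
        """Merge another rover's discovery into this rover's own world model."""
        if report.location not in [h.location for h in self.hazards]:
            self.hazards.append(report)

rovers = [Rover("rover_a"), Rover("rover_b"), Rover("rover_c")]

# rover_a gets stuck in a crater and broadcasts what it learned.
report = rovers[0].observe_hazard(location=(12, 7), kind="crater")
for other in rovers[1:]:
    other.receive(report)

print([len(r.hazards) for r in rovers])  # -> [1, 1, 1]: all three now route around the crater
```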
Over time, these standards will allow Genius to connect seamlessly with other systems—much like Wi-Fi or Bluetooth—creating a foundation for intelligent, interoperable ecosystems.
Shared knowledge, a cooperative lexicon, networked agents: a NASA simulation demonstrated how lunar rovers can establish a communication relay by perceiving the moonscape, sharing 3D geometry and imagery, and planning the best routes to collaborate autonomously across challenging terrain.