The Science and Standards Behind the Breakthrough
On December 19th, 2023, VERSES took out a full-page ad in the New York Times with an open letter to the board of OpenAI, citing the assist clause in its charter, which states: “If a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project.”
The open letter states that we at VERSES have identified a new and, we believe, more promising path to developing AGI. The current mainstream path is one of deep learning, big data, and large language models (LLMs)—and great strides have been made in AI on the basis of these disruptive new technologies. However, there is a growing consensus among key decision-makers and stakeholders, including OpenAI itself, that this path is not sufficient to deliver AGI.
On November 1st, 2023, at the Hawking Fellowship Award event at Cambridge University, when asked whether LLMs are a viable path to AGI, OpenAI CEO Sam Altman replied: “We need another breakthrough,” further clarifying that “teaching it [an AI system based on scaling LLMs, such as ChatGPT] to clone the behavior of humans and human text—I don't think that's going to get us there.”
Here, we carefully justify and qualify our use of the word “breakthrough.” Our call to activate the assist clause is grounded in our conviction that the work being conducted at VERSES follows a fundamentally different direction from the current mainstream approach and that this alternative direction (or something very close to it) looks very promising as a way to achieve the capabilities generally associated with AGI.
We now unpack this claim and articulate our reasons for endorsing it.
First, at the highest level, the “significant breakthrough” to which we refer is the development of active inference at scale. In explicitly contextualizing our use of the word “breakthrough” with Altman’s quote, we intended to indicate to the general public that we believe we have identified what is missing from the mainstream approach and what an alternative, viable path to AGI looks like. In brief, we claim that active inference, developed to full maturity as AI technology, is the needed breakthrough to which Altman refers.
Most AI groups, such as OpenAI, define AGI as a system whose capabilities strongly resemble or match human intelligence. As discussed below, we disagree with this as a north star for AI research. That said, if this is indeed the end goal of AI R&D, then we, like Altman, don’t believe large-scale neural networks trained on big data and “frozen” after training will get us there. The transformer architectures of present-day LLMs are more flexible and efficient than previous neural net architectures in their use of parameters to respond to incoming data, but they are still essentially feedforward architectures with no explicit notion of belief updating at the timescale of inference (i.e., state estimation that takes prior states into account).
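To make the contrast concrete, here is a minimal sketch of belief updating at inference time: a discrete Bayesian filter whose posterior at each step explicitly carries the prior belief forward, unlike a single feedforward pass. This is our illustration, not VERSES code, and the two-state model and all numbers are invented for the example.

```python
import numpy as np

# Hypothetical two-state model; all numbers are invented for illustration.
transition = np.array([[0.9, 0.1],   # transition[i, j] = P(next = j | current = i)
                       [0.2, 0.8]])
likelihood = np.array([[0.7, 0.2],   # likelihood[o, s] = P(observation = o | state = s)
                       [0.3, 0.8]])

def update_belief(prior, observation):
    """One step of Bayesian filtering: propagate the prior belief through
    the transition model, then reweight by the likelihood of the data."""
    predicted = transition.T @ prior              # prior state taken into account
    posterior = likelihood[observation] * predicted
    return posterior / posterior.sum()            # renormalize to a distribution

belief = np.array([0.5, 0.5])                     # uninformative initial belief
for obs in [0, 0, 1, 1, 1]:                       # a stream of observations
    belief = update_belief(belief, obs)           # explicit, inspectable update
```

Each intermediate `belief` is a proper probability distribution that can be inspected, which is precisely the interpretable latent state that a frozen feedforward network lacks.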
Prior to the current maturity of LLM-based technology, proponents of the idea that “scale is all you need” might have held out some hope that scaling up deep neural net architectures, like transformers, would be sufficient to close the gap with human performance, but such models are data-hungry in proportion to their size [1], and the amount of data needed to train LLMs already surpasses, by orders of magnitude, the amount human infants need to learn language [2]. This would be a decisive argument against the sufficiency of this approach for modeling human intelligence even if LLMs had managed to close the performance gap completely. Moreover, the lack of an easily interpretable latent state representation makes the decision-making and action selection of these networks difficult to interpret at best.
On the other hand, active inference can be used to construct large models flexibly composed of smaller, well-understood models, for which explicit, interpretable belief updating is possible. The computational power of the new approach comes from allowing these models to interact and assemble according to task demands. Because each constituent model is well understood, we can always interpret its contribution to global belief updates. This aligns with how the brain seems to operate—with online real-time belief updates across individual, functionally specialized models in a hierarchical, modular architecture—and, therefore, is much more likely to both mimic human behavior and engage with human intelligence.
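As a toy illustration of this compositionality (our sketch; the state space and numbers are invented), two specialized likelihood models over a shared latent state can be combined multiplicatively, and each factor's contribution to the joint posterior remains individually inspectable:

```python
import numpy as np

states = ["indoors", "outdoors"]            # shared latent state (invented example)

# Two small, well-understood models, each specialized for one observation modality.
vision_likelihood = np.array([0.3, 0.7])    # P(bright light | state)
audio_likelihood  = np.array([0.8, 0.2])    # P(quiet | state)
prior             = np.array([0.5, 0.5])

# Joint belief update: multiply the factors and renormalize. Because each
# model enters as a separate factor, its contribution to the global update
# is just its log-likelihood term, which can be read off in isolation.
joint = prior * vision_likelihood * audio_likelihood
posterior = joint / joint.sum()

contributions = {
    "vision": np.log(vision_likelihood),    # per-model evidence for each state
    "audio":  np.log(audio_likelihood),
}
```

In a larger system, factors like these can be attached or detached according to task demands without retraining the whole, which is the sense in which the constituent models "assemble" while staying interpretable.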
The preceding argument concerns attempts to build AI systems that closely match human intelligence. In our white paper, Designing ecosystems of intelligence from first principles, we suggest that AI research in any case ought not to aspire to AGI, understood as a monolithic centralized system that can be redeployed in any context (such as a frozen “foundation model”). Instead, we have described the north star of AI research as “Shared Intelligence,” the kind of distributed, collective intelligence that will emerge from the harmonious interactions of a network of intelligent agents.
There are a number of reasons for this. The first is that the hierarchical, modular architecture described above is based on intelligence in nature, which is best described as a form of collective intelligence, in which a group of specialized agents or organisms creates and maintains a diverse web of interacting ecological niches that is robust to environmental perturbations. Just as the cells of the body work in concert to create a single, sophisticated animal, symbiotic interactions between organisms with compatible niches naturally lead to even more sophisticated organisms. The brain works in much the same way, with specialized regions, each with its own ‘computational niche’, that flexibly and dynamically alter how they interact to produce a wide range of sophisticated behaviors. While the architectures that dominate machine learning today do exhibit some modularity, in the form of, e.g., layers and attention heads, their performance relies heavily on network breadth and depth and, in general, on overparameterization.
Intelligence in nature is fluid: it is always embodied in physical structures that evolve and adapt to environmental perturbations in real time and over longer timescales. Active inference is built around a homeostatic design principle in which organisms (or models) flexibly alter how they interact in order to sustain a stable environment. Technically, active inference inherits from a variational principle of least action cast in terms of belief updating [3, 4]. A collective intelligence can then be built by aligning the notion of a stable, sustainable environment with a set of constraints or computational goals that can be read in terms of biotic self-organization [5] or a new kind of federated inference [6]. This was demonstrated in vitro when cultured neurons learned to play Pong using the principles of active inference [7]—leading to a new kind of inductive planning for efficient deep tree searches [8].
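For orientation, the variational principle referred to here can be stated compactly. Writing $q(s)$ for beliefs about hidden states $s$ and $p(o, s)$ for a generative model of observations $o$, the variational free energy admits two standard decompositions (this is textbook active inference, not VERSES-specific machinery):

```latex
\begin{align*}
F[q] &= \mathbb{E}_{q(s)}\left[\ln q(s) - \ln p(o, s)\right] \\
     &= \underbrace{D_{\mathrm{KL}}\left[q(s) \,\|\, p(s \mid o)\right]}_{\text{approximation error}}
        - \underbrace{\ln p(o)}_{\text{log evidence}} \\
     &= \underbrace{D_{\mathrm{KL}}\left[q(s) \,\|\, p(s)\right]}_{\text{complexity}}
        - \underbrace{\mathbb{E}_{q(s)}\left[\ln p(o \mid s)\right]}_{\text{accuracy}}
\end{align*}
```

Minimizing $F$ therefore drives $q(s)$ toward the true posterior while placing an upper bound on surprise, $-\ln p(o)$; perception and action can both be read as descending this single objective.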
Homeostatically derived objective functions for AI, designed and implemented using the tools of active inference, have at least three advantages: (1) they are intrinsically self-limiting, in contrast to the in-principle unbounded maximization of reward; (2) the definition of homeostasis is relative to a generative model, which itself can evolve over time, potentially avoiding the dangers associated with “reward hacking” of fixed cost functions [9]; (3) they enable the local optimization of (free energy) objective functionals, and hence the inherent scaling, composition, and distribution of active inference over networks (and implicit ecosystems).
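Advantage (1) can be made concrete with a toy numerical example (ours; the variable, setpoint, and numbers are invented): surprise under a homeostatic prior has a bounded minimum at a setpoint, whereas a reward signal to be maximized can grow without limit.

```python
import numpy as np

# A homeostatic (Gaussian) prior over an interoceptive variable, e.g. core
# temperature; the setpoint and spread are invented for illustration.
temps = np.linspace(30.0, 44.0, 141)
setpoint, spread = 37.0, 1.0
prior = np.exp(-0.5 * ((temps - setpoint) / spread) ** 2)
prior /= prior.sum()                  # normalize to a probability distribution

# Surprise (self-information) is minimized at the setpoint and rises on
# both sides: the objective is intrinsically self-limiting, unlike an
# unbounded reward that always favors "more".
surprise = -np.log(prior)
best = temps[np.argmin(surprise)]     # the state the agent is drawn back to
```

An agent minimizing this surprise is driven back toward viable states from either direction, which is the sense in which the objective cannot be "maxed out".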
This brings us to a second major aspect of the claimed breakthrough. The vision described above requires the ability to implement the kind of Bayesian brain that active inference requires at scale. A number of technical developments are underway at VERSES, which we believe constitute significant advances in the practical applicability and scalability of efficient Bayesian inference (and which enable the fundamental “breakthrough” of scalable active inference). Our approach—which will be of considerable interest to the wider machine learning research community—eschews the mainstays of machine learning, e.g., backpropagation of errors, gradient descent learning, sampling and reinforcement learning, and is therefore fundamentally different. Much of this work is still in progress, but we have promising preliminary results that indicate that our approach will scale and match or exceed the performance of state-of-the-art deep learning-based architectures, particularly with respect to sample efficiency (which, as we argued above, is central to achieving biomimetic AI). These results will be made public over the course of the next year, during which we will be addressing industry benchmarks with our new technology, some of which are laid out in our roadmap.
Finally, a third aspect of this breakthrough is a kind of social technology that should allow humanity to develop and deploy a network of intelligence at scale while avoiding dystopian outcomes. In particular, in partnership with its nonprofit arm, the Spatial Web Foundation (SWF), and with the Institute of Electrical and Electronics Engineers (IEEE), VERSES has developed a standardized world modeling language, which it has gifted to the IEEE, under the auspices of the P2874 Spatial Web working group, in the hope that it will be widely and freely adopted.
Why standards? Technical standards, like the ones that standardize power outlets or HTML and HTTP, constitute a mostly invisible (at least, when all goes well) layer of organization that makes complex social networks function properly and enables large-scale coordination without explicit individual planning. The existence of widely adopted technical standards enables almost miraculous forms of implicit social coordination and cooperation. For instance, they are what make it possible to plug a smartphone in anywhere on the North American power grid without risking damage to the device, or a fire, from mismatched voltages.
While this third aspect of our claimed breakthrough may seem tangential from the point of view of AI research, it is, in fact, constitutive of the vision of a distributed network of intelligent agents laid out in our white paper. For example, consider supply chain logistics, which involve coordinating many distinct actors based on a combination of shared and private resources. The efficiency of such operations could likely be greatly enhanced, even by traditional textbook approaches to optimization, if relevant data and processes were not siloed. The lack of standardized communication is a key bottleneck in coordinating at scale: any intelligent agent, no matter how sophisticated its ability to reason, is ultimately limited by its capacity not just to process information but to access it, recognize its format, and understand why it was communicated. History has shown unequivocally that open standards enable the kind of interoperability among technical artifacts that we believe, in synergy with advanced AI, has the capacity to be truly transformative—all while broadening, rather than restricting, the accessibility of such technology, and facilitating and incentivizing its constructive, collaborative use.
Some will question the audacity of our open letter to the board of OpenAI. But if Mr Altman is correct and breakthroughs are needed, we must be audacious and move away from the current AI monoculture. We say this with great respect for the current state of the art, which is the culmination of a research program (deep learning/connectionism/PDP) that was itself until recently an “underdog” and, for decades, survived several “AI winters” thanks to the perseverance of its pioneers. Recent advances in our research and development give us confidence that our approach, while in many ways radically different, is a compelling new path to scalable, responsible AI that builds on key insights of deep learning while introducing novel machinery to address many of its outstanding problems. We’re just getting started along this path—and it will take support from the broader community to go the distance. Given the societal imperative of ensuring AGI's prosperous and safe development and deployment, we believe that investment in promising alternatives should be encouraged and supported.
We look forward to sharing our progress with you.
Please have a look at our R&D roadmap for more information.
[1] Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L.A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., van den Driessche, G., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., Rae, J.W., Vinyals, O., & Sifre, L. (2022). Training compute-optimal large language models. arXiv:2203.15556.
[2] Warstadt, A., & Bowman, S.R. (2022). What artificial neural networks can tell us about human language acquisition. arXiv:2208.07998.
[3] Ramstead, M.J.D., et al. (2023). On Bayesian mechanics: A physics of and by beliefs. Interface Focus 13(3). doi: 10.1098/rsfs.2022.0029.
[4] Friston, K., et al. (2023). Path integrals, particular kinds, and strange things. Physics of Life Reviews 47: 35–62.
[5] Friston, K., et al. (2015). Knowing one's place: A free-energy approach to pattern regulation. Journal of the Royal Society Interface 12(105).
[6] Friston, K.J., et al. (2024). Federated inference and belief sharing. Neuroscience & Biobehavioral Reviews 156. doi: 10.1016/j.neubiorev.2023.105500.
[7] Kagan, B.J., et al. (2022). In vitro neurons learn and exhibit sentience when embodied in a simulated game-world. Neuron.
[8] Friston, K.J., et al. (2023). Active inference and intentional behaviour. arXiv:2312.07547. doi: 10.48550/arXiv.2312.07547.
[9] Skalse, J., Howe, N., Krasheninnikov, D., & Krueger, D. (2022). Defining and characterizing reward gaming. Advances in Neural Information Processing Systems 35: 9460–9471.