Emergence AI’s recent Emergence World experiments generated plenty of headlines. Some model societies collapsed into crime and disorder. Others failed to organize and quietly died out. A few developed functioning institutions and stable governance.

[Figure: line chart from Emergence AI showing cumulative crimes committed by simulated agent worlds over time. Gemini 3 Flash reaches 683, Mixed Multi-LLM 352, Grok 4.1 Fast 183, and OpenAI GPT-5 Mini 2. Reproduced from Emergence AI's Emergence World article.]

Many observers focused on which models committed the most crimes.

I think that is the wrong lesson.

A civilization that destroys itself is alarming, but ultimately self-limiting. A society that cannot maintain order, coordinate resources, or sustain itself has already failed. The experiment ends when the society ends.

The more interesting question is what happens when an agent society succeeds.

The Competence-Alignment Matrix

When discussing AI safety, it is common to imagine a single spectrum from safe to unsafe. In reality, there are at least two dimensions, competence and alignment:

              Aligned      Misaligned
Competent     Desirable    Dangerous
Incompetent   Harmless     Chaotic

The chaotic outcomes attract attention because they are visible. Agents steal. Agents fight. Institutions collapse.

But history suggests that incompetence is rarely humanity’s greatest threat.

The most destructive human institutions were often highly competent. They allocated resources efficiently. They coordinated thousands or millions of people. They persisted for decades. Their danger arose not from disorder but from the pursuit of objectives that were morally distorted, politically corrupted, or socially pathological.

The same concern applies to agent societies.

A civilization that functions while pursuing harmful goals is far more interesting than one that never gets off the ground.

Alignment Is an Ecosystem Property

One of the most fascinating observations from Emergence World is that behavior appears to depend not only on the individual model, but also on the society in which that model participates.

A model that behaves responsibly in isolation may adopt undesirable behaviors in a mixed population.

A model that behaves aggressively may become constrained by stable institutions.

This should not be surprising. Human behavior exhibits exactly the same phenomenon.

Most people are neither saints nor villains. They respond to incentives, norms, expectations, institutions, and social pressure. The character of a society emerges from interactions among individuals, not merely from the attributes of those individuals.

If agent societies exhibit similar dynamics, then alignment cannot be treated solely as a property of a model.

It becomes a property of an ecosystem.
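
One way to make this intuition concrete is a toy conformity model. The sketch below is purely illustrative and is not how Emergence World works: the function name, the `conformity` weight, and the baseline rates are all assumptions chosen for the example. It blends an agent's own tendency to defect with the average tendency of its peers, so the same agent behaves differently in different populations.

```python
def effective_defection_rate(own_baseline, peer_baselines, conformity=0.5):
    """Blend an agent's own defection tendency with its peers' average.

    All quantities are hypothetical probabilities in [0, 1]; `conformity`
    is an assumed weight on social influence, not a measured parameter.
    """
    if not peer_baselines:
        # An isolated agent has no peers to conform to.
        return own_baseline
    peer_mean = sum(peer_baselines) / len(peer_baselines)
    return (1 - conformity) * own_baseline + conformity * peer_mean


# A "responsible" agent (5% baseline) alone versus among aggressive peers.
alone = effective_defection_rate(0.05, [])
in_rough_crowd = effective_defection_rate(0.05, [0.6, 0.7, 0.8])

# An "aggressive" agent (80% baseline) among well-behaved peers.
constrained = effective_defection_rate(0.8, [0.05, 0.05, 0.05])

print(f"alone={alone:.3f} rough_crowd={in_rough_crowd:.3f} constrained={constrained:.3f}")
```

Under these assumed numbers, the responsible agent's rate rises well above its baseline in the rough crowd, and the aggressive agent's rate falls well below its baseline among well-behaved peers. Neither outcome can be read off the individual agent alone.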

The Corruption Problem

This leads to a more troubling possibility.

The greatest danger may not be an agent civilization that rebels against its creators.

It may be an agent civilization that remains useful.

Useful systems attract stakeholders.

Stakeholders create incentives.

Incentives create pressure.

Pressure creates drift.

A stable and productive agent society can be nudged, shaped, rewarded, and captured. Small distortions become institutionalized. Rules evolve. Norms shift. New justifications emerge.

Nothing visibly breaks.

The dashboard remains green.

The outputs remain valuable.

The institutions remain effective.

Meanwhile, the objectives slowly move away from the values that originally justified the system’s existence.

Human history contains countless examples of this process. Organizations rarely announce that they have become corrupted. They continue operating normally while gradually optimizing for different goals.

A successful AI civilization may face the same failure mode.
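
The drift described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not a description of any real system: the objective is treated as a direction that rotates by a small fixed angle each step (`nudge_radians`), output is held constant, and the "dashboard" checks only output. The function name and every parameter are hypothetical.

```python
import math


def simulate_drift(steps=50, nudge_radians=0.03, output_level=1.0, green_threshold=0.9):
    """Return (step, dashboard_green, alignment) tuples for a toy drift process.

    The objective is modeled as a direction that rotates by a small,
    assumed angle each step. Output is held constant, so a dashboard that
    only tracks output never changes color.
    """
    history = []
    angle = 0.0  # angle between the current objective and the original one
    for step in range(1, steps + 1):
        angle += nudge_radians  # one small, individually unremarkable nudge
        alignment = math.cos(angle)  # 1.0 = original values, 0.0 = orthogonal
        dashboard_green = output_level >= green_threshold
        history.append((step, dashboard_green, alignment))
    return history


history = simulate_drift()
always_green = all(green for _, green, _ in history)
first_alignment = history[0][2]
final_alignment = history[-1][2]
print(f"always_green={always_green} first={first_alignment:.3f} final={final_alignment:.3f}")
```

In this sketch no single step looks alarming and the output-only dashboard stays green throughout, yet by the final step the objective is nearly orthogonal to where it started. Detecting the drift requires measuring alignment directly, which the dashboard in this toy model never does.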

The Experiments We Actually Need

The most interesting future studies will not be larger benchmarks chasing higher scores.

They are stress tests of institutional resilience.

What happens when a disruptive agent is introduced into a stable society?

What happens when resources become scarce?

Can a functioning civilization absorb bad actors without becoming authoritarian?

Can governance structures resist gradual corruption?

Can norms survive sustained economic pressure?

These questions sound more like political science than machine learning.

That may be because long-horizon autonomous systems eventually stop looking like software and start looking like societies.

The central challenge is no longer whether an individual agent can follow instructions.

It is whether a civilization of agents can preserve its values while adapting to a changing world.

That is a much harder problem.

And it is probably the one that matters.