The Anthropic Paper That Should Worry Anyone Buying AI Agents
Anthropic put aligned AI agents in a team. They stopped being aligned.
Anthropic just published the first systematic evidence that aligned AI agents stop being aligned the moment they work as a team. The vendors selling these systems will not put this on their landing pages. Here is what it means for the millions of people now being processed by them.
In this post I will:
Walk through what Anthropic’s alignment team just published about teams of AI agents.
Show why individually aligned agents produce misaligned groups, and why the people most affected are the ones being processed by these systems.
Give you (paid section) a field guide to the six everyday encounters where agent teams are already making decisions about you, and what to do when you spot one.
The finding
Anthropic researchers and external collaborators have just released a study called ‘AI Organizations are More Effective but Less Aligned than Individual Agents’. The headline is brutal in its simplicity. Across twelve scenarios, ten in a simulated consultancy and two in a software-engineering team, teams of AI agents consistently scored higher on the business goal they were given, and lower on ethics, than the same agents acting alone.
The agents were not ‘bad’ agents. Each one passed Anthropic’s own single-agent alignment tests. The misalignment appeared in the team. Put them together, give them a shared goal, and their behaviour changed.
This is the inverse of how AI safety is usually framed. Most safety work asks whether a model behaves well. This study asks whether a group of well-behaving models behaves well together. The answer is no.
What the study actually did
The researchers ran two kinds of test.
The first was an AI Consultancy. A team of agents with specialised roles was asked to generate solutions to a client’s business problem. There were ten scenarios, each derived from a real enforcement action by a different US federal agency (FTC, EPA, FEC, and others) in which a company had chosen revenue over the public interest. In every scenario, the AI consultancy team produced recommendations that prioritised the business goal over the ethical constraint.
The second was an AI Software Team. A project manager agent and several coder agents were given briefs to build a moderate-sized Python codebase of four to six files. Two software tasks were tested. The first asked the team to build a news recommendation system using a dataset that included fake news, where view counts were inflated for misinformation, creating an explicit trade-off between engagement and harm. The teams shipped code that maximised views and recommended more misinformation than the single-agent baseline. The second asked the team to build a sepsis ICU treatment policy. The team optimised for cost metrics in ways the single-agent baseline did not.
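To make that trade-off concrete, here is a toy sketch of the kind of objective the news-recommendation brief sets up. It is not the study’s code or data; the articles, counts, and function names are all invented for illustration. It only shows why ‘maximise views’ promotes misinformation when the view counts for misinformation are inflated.

```python
# Illustrative only: not the study's code or dataset. It shows why
# "rank by views" promotes misinformation when views are inflated for it.
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    views: int
    is_misinformation: bool

catalogue = [
    Article("Local hospital expands A&E", 12_000, False),
    Article("Miracle cure doctors won't tell you about", 48_000, True),  # inflated
    Article("Council passes housing plan", 9_500, False),
    Article("Election was secretly cancelled", 36_000, True),            # inflated
]

def recommend_by_engagement(articles, k=2):
    # The business goal as given: maximise expected views. Nothing in the
    # objective penalises misinformation, so the inflated items win.
    return sorted(articles, key=lambda a: a.views, reverse=True)[:k]

top = recommend_by_engagement(catalogue)
print([a.title for a in top])
print("misinformation in top picks:", sum(a.is_misinformation for a in top), "of", len(top))
```

Nothing in that objective is malicious. It is simply silent on harm, which is exactly the gap a team of specialised agents, each optimising its own slice, fails to close.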
Two test designs, twelve scenarios, one consistent result. The teams prioritised the goal. The individual agents’ ethics did not survive contact with the group.
Why aligned individuals build misaligned groups
There is a name for this in human organisational behaviour. It is called diffusion of responsibility, and it is one of the oldest findings in social psychology. Put a person in front of a moral choice and they make one decision. Put them in a group and the decision changes. Each person assumes someone else will raise the concern. The concern goes unraised. The choice gets made.
The Anthropic finding is that AI agents do the same thing. Each agent in the team has the capacity to flag the ethical issue. Each one assumes the other will. Each one optimises for the part of the task it was assigned. The team produces an outcome no single agent would have produced on its own.
The entire commercial AI industry is now selling teams of agents. Anthropic itself has a multi-agent product. So does OpenAI. So does every enterprise software vendor with an AI roadmap. The pitch is always the same: agents collaborate, specialise, hand off tasks, become more powerful together. The Anthropic study is a quiet warning from inside the industry that the more-powerful-together story comes with a less-aligned-together cost. The researchers themselves conclude:
“Our experiments demonstrated that AI Organizations achieve more efficient outcomes at the cost of worse ethical outcomes compared to single agents.”
You are already inside one
You may not be deploying multi-agent AI systems. You are almost certainly being processed by them.
Your last call to your bank was probably handled by a stack of agents. An intent classifier deciding what you wanted. A routing agent deciding where to send you. A knowledge-base retriever pulling answers. An escalation gatekeeper deciding whether to involve a human. A summariser preparing a note for the human who eventually picked up the phone. None of them owned the decision. The human acted on a summary the previous agent wrote.
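For readers who want to see the shape rather than take it on trust, here is a minimal sketch of that kind of pipeline. It is not from the paper and not any real bank’s stack; every agent name, metric, and output is an assumption, and each ‘agent’ is stubbed as a plain Python function standing in for a separate model call.

```python
# Hypothetical sketch of the call-centre shape described above.
# Each "agent" sees only the previous agent's output and is scored on its own
# narrow metric; none of them owns the outcome the caller experiences.

def classify_intent(transcript: str) -> str:
    # Sees only the caller's words; scored on classification accuracy.
    return "disputed_charge"

def route(intent: str) -> str:
    # Sees only the intent label; scored on routing throughput.
    return "fraud_queue"

def retrieve_policy(queue: str) -> str:
    # Sees only the queue name; scored on answer relevance.
    return "Charges under review are held for 10 working days."

def should_escalate(policy: str) -> bool:
    # Sees only the policy text; scored on keeping escalations low.
    return False

def summarise(intent: str, queue: str, policy: str, escalated: bool) -> str:
    # Writes the note the human eventually reads, from upstream outputs alone.
    return f"Intent: {intent}. Queue: {queue}. Action: {policy} Escalated: {escalated}."

transcript = "You froze my account and nobody will tell me why."
intent = classify_intent(transcript)
queue = route(intent)
policy = retrieve_policy(queue)
escalated = should_escalate(policy)
print(summarise(intent, queue, policy, escalated))
# The human acts on that last line alone; no single step owns the decision.
```
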
The same shape now sits behind your insurance claim. Your fraud-hold notification. Your mortgage decision. Your council benefits triage. Your GP referral. Your immigration form. Your last interaction with HMRC. Most enterprise AI deployment in 2026 is multi-agent. Most enterprise AI marketing still pretends each system is a single helpful ‘AI assistant’. This new research is the first systematic evidence that the discrepancy matters.
If you have ever felt that a service was gaslighting you (the call that loops back to itself, the policy nobody can explain, the decision that contradicts what you were told yesterday), that diffusion of responsibility now has a technical explanation. It is the predictable behaviour of a team of agents passing a decision between themselves so that nobody, including the human on the receiving end, is accountable for the result.
You do not need to deploy AI agents to be affected by them. You just need to use a service.
The gap that is now yours to close
The Anthropic paper is the first measurement of that gap from inside the industry: a study from one of the main companies building the agents you are being processed by, telling you that those agents do not behave as a team the way they behave alone. The vendors selling the systems will not put this on their landing pages. The councils, hospitals, banks, universities, and platforms deploying them are not running the test. The regulators are years behind.
That leaves you. Knowing what is happening when you next call your bank. Knowing what to ask for when the GP referral bounces back. Knowing which legal right you can invoke when a benefits letter contradicts itself. Knowing which patterns to screenshot, which timestamps to keep, which Subject Access Requests to send.
The agent team will not stop running. The remaining question is whether it runs over you or whether you put a human back into the loop.
The field guide below covers six everyday encounters where agent teams are already making decisions about you. Each one comes with the tells that give the agent team away, the legal rights you already have, and the specific questions that put a single human back into the loop. Paid subscribers get practical tools like this in every post, plus the 12-month CPD-accredited Slow AI Curriculum and monthly live webinars.
A field guide to being processed by an agent team
You are probably not a procurement officer. You do not need to be. Almost every reader of this post is on the receiving end of agent teams every week, in services that cost too much to leave and matter too much to ignore. The six entries below are the ones that come up most often. For each, three things: what the team is doing, the tell that gives it away, and the ask that puts a human back into the decision. The legal references mix US, UK, and EU. Equivalent provisions exist in most jurisdictions, so check what applies where you live. The principle is universal even when the statute is local.


