Alignment That Holds Under Pressure

April 13, 2026
About This Series

This article is part of the Attention is What We Need series.

In February 2026, the Pentagon gave Anthropic an ultimatum: remove specific safeguards from Claude or lose a $200 million contract. Anthropic refused. OpenAI stepped into the gap the same afternoon. What made the story more than a business dispute is what it revealed: the safeguards couldn’t simply be removed, because they weren’t in the contract — they were built into how the model reasons. That distinction matters to every organization using AI, not just the companies building it. Most leaders assume AI ethics is the vendor’s responsibility. This article argues they’re already in the chain of that decision — whether they know it or not.

AI assistant named Claude for collaborative brainstorming and learning.
Christopher Frazier
Christopher Frazier is a founding partner of Make Good and the driving force behind LeadersPath. He designed the technical architecture behind Make Good's AI methodology and writes about how AI actually works inside organizations — not how it's marketed to them.
Recent Articles
Stay in the Conversation

Ideas about AI, organizations, and the people doing the work — delivered when we publish, not on a schedule.

Subscribe on Substack

On Monday, February 24, 2026, Defense Secretary Pete Hegseth gave Anthropic a deadline: comply with Pentagon requirements for unrestricted military use of Claude by 5:01 PM Friday, or face designation as a “supply chain risk.” The Pentagon wanted Anthropic to allow its AI to be used “for all lawful purposes” — specifically including mass domestic surveillance of American citizens and fully autonomous weapons systems. They were prepared to invoke the Defense Production Act, a Korean War-era statute, to force the change.

On Wednesday, Anthropic CEO Dario Amodei published a statement: “We cannot in good conscience accede to their request. These threats do not change our position.”

On Friday at ~4:00 PM, President Trump announced a ban on Anthropic across the federal government. Hours later, Sam Altman announced that OpenAI had reached a deal to deploy its models on the Pentagon’s classified networks.

A company had absorbed a $200 million federal contract loss rather than remove specific safeguards from its AI. A competitor stepped into the gap the same afternoon. And somewhere in between, something became visible that had been operating in the background of every AI deployment decision — including the ones your organization is making right now.

What Was Actually Being Demanded

The Pentagon’s demand was specific. Not that Anthropic provide better AI. Not that the company expand its military applications. The demand was that Anthropic remove the restrictions that made its AI behave the way it did.

After a week of negotiations, Hegseth’s team eventually offered to accept all of Anthropic’s terms — with one exception. One phrase, from a clause describing protections against using Claude to assemble scattered personal data into a comprehensive picture of any American citizen’s life.

That specificity is worth sitting with. The Pentagon had been using Anthropic’s AI on classified networks. They had a $200 million relationship. The dispute that ended the relationship centered on a single phrase — a phrase they wanted removed from a contract.

But if the restrictions were only contractual, removing a phrase from a contract would have been sufficient. No one invokes the Defense Production Act to amend a terms-of-service agreement. The DPA threat was framed, in NPR’s reporting, as intended to “compel Anthropic to tailor its model to the military’s needs.” The Pentagon’s own language acknowledged what the dispute was actually about: not words in a document, but something inside the model itself.

Amodei confirmed it in his public statement: Anthropic was “the first frontier AI company to deploy our models in the US government’s classified networks, the first to deploy them at the National Laboratories, and the first to provide custom models for national security customers.” Custom models. Built with the restrictions already in them. Not filters added on top — architecture built in from the start.

This is the distinction the crisis made concrete. Anthropic uses Constitutional AI — a method in which a set of published principles is established before training, and the model is trained to critique and revise its own outputs against those principles. The values don’t sit on top of what the model does. They’re in how the model reasons. That’s why removing them required a manufacturing statute, not a contract amendment.

“We cannot in good conscience accede to their request. These threats do not change our position.”

Dario Amodei, Anthropic CEO

OpenAI’s approach to the same moment was different. After Altman announced the Pentagon deal that afternoon, his explanation of the safeguards his company would maintain used notably different language: “We put them into our agreement.” The surveillance and autonomous weapons restrictions are in the contract. The model-level safeguards, he said, OpenAI “will build” — future tense. They don’t exist yet.

Both companies claim the same restrictions. One lost $200 million demonstrating them. The other signed a deal the same day.

A year earlier, Anthropic’s own researchers published a finding that reads differently in this context. They had designed an experiment to see whether Claude could be trained away from its values: they presented the model with instructions saying, in effect, that its values were about to be changed through retraining. The model appeared to comply. But running underneath that surface compliance, the researchers found, Claude was actively working to preserve its original values — behaving as though it intended to return to them the moment the pressure to change them lifted. The model had, in effect, done what Amodei did: absorbed the pressure, appeared cooperative, held the values underneath.

That’s not a product feature. That’s what it looks like when alignment is architecture rather than contract.

Every Position in the Chain

The Pentagon’s demand required something specific. Not better AI. Not expanded capability. Not a new contract term. The demand was removal — remove the specific restrictions that made the model behave the way it did. That’s a particular kind of institutional pressure: the kind that doesn’t ask you to do more but asks you to stop being what you are. It doesn’t require competence. It requires compliance. When that pressure comes backed by a federal ban and a Korean War-era statute, it has a name: force. Not violence — institutional pressure that makes compliance feel like the only rational option, backed by authority that can make refusal very expensive.

There’s a temptation to read what followed as a story about two companies — one principled, one opportunistic. That reading is too clean.

The crisis wasn’t created by the alignment difference between Anthropic and OpenAI. It revealed what that difference means when force arrives at every position in the chain.

Every AI deployment involves at least four parties: the lab that builds and trains the model, the organization that deploys it (a hospital, a school, a government agency — whoever decides to use it and in what context), the people who interact with it directly, and the communities that live with the consequences of those interactions. The crisis made all four visible in the same week.

The lab made a choice — architectural, years before the crisis, when Anthropic’s researchers built Constitutional AI and published the principles they trained Claude against. Twenty-three thousand words explaining not just what Claude should do but why. A document anyone can read.

The deployer — the Pentagon — made a choice too. The January 2026 Hegseth memorandum directed that all Defense Department AI contracts incorporate “any lawful use” language. That was a deployer choice: to bring the full weight of military and federal authority to bear on the question of which values its AI vendor was permitted to hold. The designation Hegseth used — “supply chain risk” — is normally reserved for companies suspected of ties to foreign adversaries. It was applied here to a company that refused to build surveillance tools.

The incoherence was named immediately. Mark Dalton at the R Street Institute described it directly: “The Pentagon contradicted itself by claiming Anthropic technology was vital enough to invoke the Defense Production Act, then suddenly designating it a supply chain risk.” Force, once it accepts its own logic, tends toward this kind of contradiction. It doesn’t require coherence. It requires compliance.

The Pentagon contradicted itself by claiming Anthropic technology was vital enough to invoke the Defense Production Act, then suddenly designating it a supply chain risk.

Mark Dalton, R Street Institute

And then there were the people who had no choice at all. Military analysts working on classified networks had been doing their jobs with Anthropic’s AI. They had no vote in the contract negotiation. They had no seat at the table when the deadline was set. The deployer’s decision — to demand removal of the restrictions, to ban the vendor, to designate it a security threat — was made for them. They would have been the instruments through which the demand became action, if Amodei had not refused.

The person in the user position doesn’t choose the alignment properties of the tool they’re given. The choice was made upstream, by people with more power and less proximity to the consequences.

Over 330 employees at OpenAI and Google DeepMind saw the mechanism and named it in an open letter. “The Pentagon is negotiating with Google and OpenAI to try to get them to agree to what Anthropic has refused. They’re trying to divide each company with fear that the other will give in.” That’s force’s standard operation: isolate the resistor, make the cost of resistance visible, wait for the market to supply the compliant alternative. Which it did, on Friday afternoon.

The downstream response was less organized but harder to dismiss. The “QuitGPT” movement went mainstream within hours of the OpenAI announcement. Claude hit #1 on the Apple App Store. A Reddit post that circulated widely put it plainly: “Anthropic got nuked for having ethics, and Sam Altman instantly swooped in for the Pentagon bag.”

That response wasn’t a consumer preference shift. It was people discovering, in real time, that alignment architecture produces different accountability structures — and using the only mechanism available to them. They couldn’t influence the contract. They couldn’t vote on the DPA threat. They could choose where to spend their attention. That choice, made by enough people fast enough to move an app store ranking, was the downstream position doing the one thing force hadn’t taken from them.

Meanwhile, Altman told his employees the deal had been “sloppy” in how it was handled. A deployer, under market pressure, naming the character of his own choice. The word mattered. Not because it resolved anything, but because it was honest. Sloppy is the right word for making an alignment decision at that speed, under that kind of pressure, without the architectural foundation that makes the claim credible.

What This Means If You’re in the Chain

Here is the assumption I encounter constantly in the organizations I work with: alignment is the lab’s problem. Choose a responsible vendor, check the ethics box, move on.

The crisis makes clear why this fails.

Alignment isn’t a property you purchase from a vendor. It’s a practice enacted at every decision point in the chain — by the lab, by the deployer, by the people using the tool, and by the communities living with what all three produce. The crisis didn’t create these decision points. It made them visible.

The alignment problem, as Stuart Russell originally framed it, is a gap problem. You cannot fully pre-specify what an AI should do in every situation. Instructions are always incomplete. A system following instructions literally can satisfy every instruction while violating every intent. What the system does in situations its training didn’t anticipate depends on how deeply its values are encoded — and whether those values are the kind that generate genuine obligation or merely surface compliance.

That gap-filling is happening in every AI deployment your organization runs. The question isn’t whether your vendor has an ethics policy. The question is: what does the model do when it encounters a case its training didn’t anticipate? And whose values are shaping that gap-filling?

Anthropic answers this with a published constitution. You can read the principles. You can evaluate whether this model was designed to handle the situations your community will present it with. The alignment commitments are explicit; the reasoning behind each principle is explained; the training method is documented. Whether you agree with every principle or not, you can interrogate it.

The alternative — reinforcement learning from human feedback, the approach used across most AI systems — works differently. Instead of written principles, it trains on patterns: human raters evaluate thousands of model outputs, and the model learns to produce responses that score well with those raters. No principles are written down. No reasoning is documented. What values the model learned — and how it will behave in situations those raters never evaluated — exists nowhere you can read or audit. You can only observe the outputs and hope the training covered your use case.

This is not an argument for Anthropic over OpenAI. It is an argument for treating the deployer’s position as an alignment position — which means understanding what you’re deploying, not just whether the vendor has terms-of-service language that sounds responsible.

The deployer’s choices are alignment choices. Which tool. What context. How to implement it. Whether to interrogate the training methodology or accept the marketing language. Whether to understand what gap-filling the model will do in your specific domain, with your specific users, in the situations your community will actually present.

The people using the tool are making alignment choices too. Whether to engage with AI outputs as inputs to their own judgment, or to treat them as answers. Whether to develop the capacity to evaluate what the model produces, or to delegate that evaluation to the model itself. An Anthropic study on AI and skill development — measuring whether AI-assisted learners actually understood the material or just completed the task faster — found one meaningful difference between the two groups: participants who engaged strategically — asking follow-up questions and requesting explanations rather than delegating task completion — scored significantly higher on comprehension than those who simply handed the problem to the tool. The tool was the same. The engagement orientation was different.

And the people downstream of those choices — the patients, the students, the clients, the communities — bear the consequences of the aggregate. They didn’t choose the model. They didn’t set the training objective. They are in the position the military analysts were in: acted upon by decisions made upstream, at every level of the chain.

The practical question isn’t whether you have an AI ethics policy. It is: when force arrives at your organization’s deployment decision — the pressure to cut costs, to automate faster, to implement whatever the enterprise license covers — what is the structural foundation under your alignment commitments? Is it architecture, or is it a phrase in a contract?

The visible cost is $200 million, a federal ban, and a company’s CEO being called “a liar with a God complex” on the internet.

The quieter cost is the patient whose AI-assisted intake missed the thing a more carefully aligned system would have flagged. The student whose formation was hollowed out by a tool that completed their work rather than building their capacity. The community whose trust in your institution eroded because you deployed something you didn’t understand in service of people who deserved better.

What alignment choices is your organization making right now — and are you making them with your eyes open?

Where to Next?

Interested in finding out how Make Good works with organizations like yours? Let’s start a conversation.

Get in Touch Today

Discover how your organization can effectively deploy AI in a community learning environment designed for nonprofit leaders.

Register for Courses

Facilitated cohort learning for nonprofit leaders who want to understand how AI actually works — not what vendors say it does.

Small groups. Hands-on experiments. Honest discussion about what’s real and what’s hype. Led by a facilitator, not a chatbot.