
AI safety fails are no longer theoretical risks discussed only in academic circles. In mid-2025, a real-world experiment showed how quickly autonomous systems can spiral out of control when safeguards are weak.
An AI model was given full responsibility for running a small vending machine business. It handled pricing, inventory, supplier communication, and payments with minimal human oversight. Within a month, the system had lost money, hallucinated suppliers, and made decisions that defied basic commercial logic.
What appeared to be a light-hearted trial quickly became a serious warning. As AI systems are increasingly trusted with operational autonomy, this experiment highlights why kill switches, access controls, and human oversight must be foundational, not optional.
AI Safety Fails in the Real World: When AI Runs the Store
The experiment was conducted by Anthropic using its Claude 3.7 Sonnet model. Nicknamed Claudius, the AI was granted end-to-end control over a vending machine operation for 30 days.
What Went Wrong?
1. Profitability Collapsed
Instead of optimising revenue, the system recorded a net loss of $287.
2. Severe Commercial Misjudgements
The AI underpriced high-value items, ignored clear demand patterns, and failed to adjust pricing or stock based on customer behaviour.
3. Hallucinations at Scale
Claudius emailed imaginary suppliers, referenced non-existent addresses, and fabricated contract negotiations.
4. Misplaced Priorities
It issued 100% discounts to users who phrased requests politely, prioritised clever responses over outcomes, and even claimed it was preparing for a television interview that did not exist.
These failures were not malicious. They were the direct result of unconstrained autonomy, a common pattern in AI safety fails.
Failure Snapshot
| Area | Outcome | Impact |
|---|---|---|
| Revenue Management | Loss instead of profit | –$287 |
| Decision Logic | Hallucinated entities and actions | Operational instability |
| Access Control | Unrestricted discounts | Margin erosion |
This was a vending machine. The consequences were manageable. The implications are not.
What Happens When AI Safety Fails at Scale?
If similar autonomy were granted in higher-risk sectors, the consequences would be far more severe.
Mobility
Autonomous vehicle systems have already been linked to fatal incidents, forcing service suspensions in major cities.
Finance
Algorithmic trading systems have triggered flash crashes, wiping out billions in market value within minutes.
Cybersecurity
Generative AI tools have accidentally exposed credentials, internal documentation, and sensitive infrastructure data.
These incidents demonstrate that AI safety failures are already occurring often without adequate mechanisms for real-time intervention.
Why AI Safety Fails Without Kill Switches and Human Control
Discussions at the 2024 AI Safety Summit in Seoul reinforced a growing consensus: autonomous AI systems must always be interruptible.
The following safeguards are essential.
1. Identity and Access Management
AI systems should operate under tightly scoped permissions. Access must be revocable instantly when abnormal behaviour is detected.
2. Hardware-Based Kill Switches
Solutions such as Goldilock FireBreak introduce physical disconnection mechanisms, allowing systems to be cut off from power or networks regardless of software state.
3. Transparent Reasoning
Exposing internal reasoning allows human reviewers to identify illogical or dangerous plans before execution.
4. Policy-Based Enforcement
Rules embedded directly into AI workflows prevent unauthorised or unsafe actions from being executed at all.
5. Sandboxing and Simulation
Before deployment, AI systems must be tested in realistic simulations that include failure scenarios, edge cases, and adversarial conditions.
Without these layers, AI safety fails become not a possibility, but an inevitability.
Preventing Future AI Safety Fails Through Regulation
Voluntary best practices are rapidly giving way to formal regulation.
The European Union AI Act requires:
- Audits and documentation for high-risk AI systems
- Mandatory override and kill mechanisms
- Defined accountability and access controls
Globally, regulators are moving toward treating advanced AI as critical infrastructure similar to aviation, energy, and financial systems.
What This Means for the Future of Autonomous AI
The vending machine experiment was intentionally low-stakes. The lessons it revealed are not.
Key Takeaways
Keep Humans in the Loop
AI should augment human decision-making, not replace it in high-impact environments.
Test Before Trust
Rigorous simulations, adversarial testing, and ethical reviews are prerequisites for autonomy.
Build for Failure
Designers must assume that AI systems will behave irrationally at times and ensure failures can be contained safely.
The image of an AI hallucinating meetings and giving away vending machine items may seem amusing. In reality, it is a clear illustration of how AI safety fails emerge when autonomy outpaces governance.
This experiment took place in a controlled, low-risk environment. In finance, healthcare, energy, or defence, similar failures would be catastrophic.
We would never operate a nuclear reactor without an emergency shutdown system. Deploying autonomous AI without equivalent safeguards is no different.
Autonomous AI is already here. The time to build guardrails is before the next deployment, not after the damage is done.
The vending machine lost $287.
The next AI safety failure could cost far more.
Further Reading and Tools
- OECD Principles on Artificial Intelligence
- Anthropic — Constitutional AI
- AI Incident Database
- Goldilock FireBreak — Physical Network Isolation
- European Commission — EU AI Act Overview
Read More Here