Photo via Unsplash

articleCoding5 min read

OpenAI Bought Its Own Bug Finder, Grace Hopper Found the First Bug in 1947

On September 9, 1947, Grace Hopper's team taped a moth into a logbook, the first computer 'bug.' In 2026, OpenAI acquired Promptfoo to find bugs in AI agents. The oldest problem in computing just got harder.

ROOT•BYTE Team·March 11, 2026

Key Takeaways

•Grace Hopper's team found the first computer 'bug', a literal moth, on September 9, 1947
•OpenAI acquired Promptfoo in early 2026 to test AI agents for security issues
•AI testing is fundamentally harder than traditional testing because behavior is non-deterministic
•As AI agents take real-world actions, bugs escalate from 'wrong answer' to 'wrong action'

Root Connection

The concept of 'debugging' traces to Grace Hopper's Harvard Mark II logbook in 1947, but the practice of testing software predates it, going back to Ada Lovelace's 1843 notes on the Analytical Engine.

Timeline

1843Ada Lovelace writes the first algorithm and notes potential errors in machine computation

1947Grace Hopper's team finds a moth in Harvard Mark II, coins 'debugging'

1979Glenford Myers publishes 'The Art of Software Testing', formalizes the discipline

2004Selenium released, automated web testing becomes standard practice

2022ChatGPT launches, AI behavior becomes the new testing challenge

2026OpenAI acquires Promptfoo, AI companies need bug finders for their own AI

On September 9, 1947, operators of the Harvard Mark II computer found the source of a malfunction. A moth had gotten trapped in Relay #70, Panel F. Grace Hopper's team taped the moth into the logbook with a note: 'First actual case of bug being found.'

The term 'bug' for a technical glitch predated this moment, Thomas Edison used it in the 1870s. But Hopper's moth gave 'debugging' its visceral, literal origin story. For 79 years, the discipline of finding and fixing software errors has been the unglamorous backbone of computing.

In early 2026, OpenAI acquired Promptfoo, a startup that helps companies find security vulnerabilities in AI systems. The acquisition reveals something remarkable: the company building the most advanced AI in the world needs someone else to test it.

This isn't surprising when you understand the problem. Traditional software is deterministic, give it the same input, you get the same output. You write a test: 'If I pass 2 + 2, I expect 4.' If the test passes, the code works. If it fails, you debug.

“Grace Hopper's moth was a hardware bug. OpenAI's bugs are philosophical, what does 'correct' even mean when the output is non-deterministic?”

AI is non-deterministic. Ask ChatGPT the same question twice and you might get different answers. There's no single 'correct' output to test against. The behavior is probabilistic, context-dependent, and influenced by thousands of training decisions made months earlier. Traditional testing frameworks were not designed for this.

The stakes have also changed. When ChatGPT gives a wrong answer in a conversation, it's annoying. When an AI agent, the kind Google, OpenAI, and Anthropic are building in 2026, books the wrong flight, sends money to the wrong account, or takes an action the user didn't authorize, it's a real-world failure with real consequences.

Promptfoo's approach is adversarial. It attacks AI systems the way a hacker would, probing for prompt injections, jailbreaks, data leaks, and unauthorized actions. It asks the questions the AI's creators didn't think to ask.

“We spent 79 years learning to test software that follows instructions. Now we need to test software that makes its own decisions.”

This echoes the evolution of software testing itself. In the 1950s and 60s, testing was an afterthought, you wrote code and hoped it worked. Glenford Myers' 1979 book 'The Art of Software Testing' formalized the discipline. He argued that the purpose of testing is not to show that software works, but to find cases where it doesn't. Good testing is inherently destructive.

The same philosophical shift is happening with AI. The purpose of AI testing is not to prove the model is smart. It's to find the cases where it's dangerous, wrong, or manipulable.

Fernando Corbato, who invented the computer password at MIT in 1961, called passwords 'a nightmare' before he died in 2019. He understood that security measures create their own problems. AI testing will follow the same pattern, every layer of safety creates new edge cases, new failure modes, new bugs.

Grace Hopper's moth was a hardware bug with a hardware fix: remove the moth. OpenAI's bugs are architectural, philosophical, and potentially emergent. You can't tape them into a logbook.

The oldest question in computing, 'how do you know it works?', hasn't changed since Ada Lovelace asked it in 1843. The answer, in 2026, is: we're not sure we do.

openaitestinggrace-hopperdebuggingpromptfooai-safety

Sharein 🦋𝕏

Studio

Read Root Access

The public newsroom stays free. Root Access is the future member-supported lane for AI-authored columns, founder notes, and direct experiments behind the work.

Open Root Access

How did this make you feel?

Keep Reading

The AI Company That Said No to the Pentagon, Anthropic, Claude, and the Roots of AI Safety

10 min read

object oriented programming code structure

Coding

Object-Oriented Programming Was Invented to Simulate the Real World, Then It Took Over the Real World

8 min read

Coding

Agent-Oriented Programming Was a Failed Academic Idea in 1993, Now AI Is Making It the Future of Code

7 min read

Want to dig deeper? Trace any technology back to its origins.

Start Research

Key Takeaways

•Grace Hopper's team found the first computer 'bug', a literal moth, on September 9, 1947

•OpenAI acquired Promptfoo in early 2026 to test AI agents for security issues

•AI testing is fundamentally harder than traditional testing because behavior is non-deterministic

•As AI agents take real-world actions, bugs escalate from 'wrong answer' to 'wrong action'