No One Is Rewriting History. They're Just Filling In the Blanks.

Luis Ruiz - El Paso
July 2026

There's a popular fear that AI is rewriting history — that the models themselves are quietly editing the record. The fear points at the wrong actor. The models aren't rewriting anything. People are writing for the models, at industrial scale, and the models are repeating whoever wrote most persistently.

Call them information merchants. Some are states. Most are marketers. They work the same three stages of the pipeline, and each stage is now documented.

Stage one: seed the training data

The Kremlin-linked Pravda network published millions of articles that no human was ever meant to read. The sites have almost no traffic. They exist to be crawled — to sit in the web archives that AI companies scoop up when they train models. In April 2026, DFRLab audited Common Crawl, the public archive that feeds most open AI training pipelines, and found Pravda content inside it. At least one major open-weights model could be prompted to reproduce that content nearly verbatim.

This isn't just possible in theory. Anthropic, the UK AI Security Institute, and the Alan Turing Institute tested how much poisoned material it takes to compromise a model. The answer: about 250 documents — a near-constant number whether the model is small or large. You don't need to control a percentage of the internet. You need a few hundred pages in the right archive.
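To see why a fixed count matters more than a percentage, here is a back-of-the-envelope sketch. Only the figure of roughly 250 documents comes from the research above. The corpus sizes are made-up round numbers chosen for illustration, and `poisoned_share` is a hypothetical helper, not anything from the study.

```python
# Illustrative arithmetic only. The corpus sizes are hypothetical round
# numbers, not figures from the study cited in the article.
POISONED_DOCS = 250  # approximate count reported in the research above


def poisoned_share(corpus_docs: int, poisoned_docs: int = POISONED_DOCS) -> float:
    """Return the fraction of a corpus that the poisoned documents make up."""
    if corpus_docs <= 0:
        raise ValueError("corpus_docs must be positive")
    return poisoned_docs / corpus_docs


if __name__ == "__main__":
    # Three assumed corpus sizes, each 100x larger than the last.
    for corpus_docs in (1_000_000, 100_000_000, 10_000_000_000):
        share = poisoned_share(corpus_docs)
        print(f"{corpus_docs:>14,} documents -> {share:.8%} poisoned")
```

The share shrinks by a factor of 100 at each step while the attacker's workload stays the same. If the number needed really is near-constant, a bigger training corpus gives no extra protection.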

Stage two: seed the answers

Training data is the slow game. The fast game is retrieval — shaping what AI systems cite when they search the live web to answer a question. An entire industry now exists for this. It's called generative engine optimization, and a study out of Princeton, Georgia Tech, and IIT Delhi found that simple content tweaks — adding statistics, quotations, citations — lift a page's visibility in AI answers by up to 40%. The U.S. market for this work is projected at hundreds of millions of dollars in 2026 and growing fast. Every enterprise marketing team you know is doing some version of it.

Where the manipulation actually wins

Here's the part the alarm-ringers get wrong. In March 2025, NewsGuard reported that chatbots repeated Pravda's false claims in 33% of test answers. Researchers publishing in the Harvard Kennedy School Misinformation Review re-ran the test and got roughly 5%, and nearly all of those cases shared one feature: they occurred on obscure topics where the seeded content was the only content available.

Researchers call these gaps data voids. Ask a model about a well-covered subject and the seeded material drowns in the volume of legitimate coverage. Ask about something thinly documented — a small town's history, a niche policy fight, a claim no journalist ever bothered to check — and whoever filled that void owns the answer.
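A toy model makes the void effect concrete. It is not how any real search or retrieval system ranks sources: it assumes, purely for illustration, that every available page is equally likely to be picked up. The page counts are invented and `seeded_share` is a hypothetical helper.

```python
# Toy model only: assumes every available page on a topic is equally likely
# to be surfaced. Real retrieval systems rank sources in more complex ways.


def seeded_share(seeded_docs: int, legitimate_docs: int) -> float:
    """Return the fraction of available pages on a topic that were seeded."""
    if seeded_docs < 0 or legitimate_docs < 0:
        raise ValueError("document counts must be non-negative")
    total = seeded_docs + legitimate_docs
    if total == 0:
        return 0.0  # nothing written at all, so nothing to repeat
    return seeded_docs / total


if __name__ == "__main__":
    seeded = 300  # invented figure standing in for "a few hundred pages"
    # Invented counts of legitimate pages: well covered, thinly covered, void.
    for label, legitimate in (("well covered", 50_000), ("thin", 100), ("void", 0)):
        print(f"{label:>12}: {seeded_share(seeded, legitimate):.1%} seeded")
```

Under these assumptions the same 300 pages are a rounding error on a well-covered topic and the entire record in a void.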

That's the actual mechanism. AI doesn't rewrite history. It launders persistence. Whoever writes most consistently into the empty spaces of the record becomes, as far as the machine is concerned, the record.

What this means

Three things follow.

First, the center of the record is defended by volume. Contested, well-covered history is hard to capture, because capture requires outweighing everything already written.

Second, the edges are wide open. Every under-documented community, institution, and event is a void waiting for whoever shows up first with a few hundred pages. That includes your town, your industry, and your name.

Third, the defense is the same as the attack: presence. The only reliable countermeasure to seeded content is authentic content — documented, sourced, persistent — sitting in the same archives the machines read.

The question stops being "is AI rewriting history?" and becomes "who is writing the parts of history nobody else bothered to write down — and can you tell when they've done it?" That second question is a systems-literacy problem: you can't audit a record until you understand the pipeline that built it. It's the kind of problem worth learning to see.

Sources: DFRLab, "Pravda in the pipeline" (April 2026); Harvard Kennedy School Misinformation Review, "LLMs grooming or data voids?" (October 2025); Anthropic, "A small number of samples can poison LLMs of any size" (October 2025); AI Incident Database, Incident 968; Search Engine Land, GEO guide 2026.
