How to Catch a Liar With Math
What You'll Learn
A bizarre mathematical law that lets you detect fake data — used by real fraud investigators, the IRS, and forensic accountants — and how to build your own lie detector in Python.
Part 1
A Strange Question
Quick — think of a number. Any number. The population of a city. The price of a house. The number of followers someone has. The distance to a star.
Now look at the first digit of that number. Is it a 1? A 5? A 9?
You'd think every digit (1 through 9) would show up equally often as the first digit — about 11% each. That seems logical, right?
It's completely wrong.
In the real world, the digit 1 appears as the first digit about 30% of the time. The digit 2 appears about 18%. The digit 9? Less than 5%.
This isn't a coincidence. It's a mathematical law discovered over 100 years ago, and it's so reliable that investigators use it to catch people who fake their numbers. It's called Benford's Law.
Part 2
Seeing It With Real Data
Let's look at the first digits of real-world numbers. We'll start with country populations:
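Here is a minimal sketch of that first experiment. The population figures are rough, rounded values in millions from around 2021, included only for illustration, and the names `populations` and `digit_counts` are our own choices.

```python
# Approximate country populations in millions (rounded figures from
# around 2021, included only for illustration).
populations = {
    "China": 1412, "India": 1408, "United States": 332,
    "Indonesia": 274, "Pakistan": 231, "Brazil": 214,
    "Nigeria": 213, "Bangladesh": 169, "Russia": 143,
    "Mexico": 127, "Japan": 126, "Ethiopia": 120,
    "Philippines": 114, "Egypt": 109, "Vietnam": 97,
}

# One counter for each possible first digit.
digit_counts = {digit: 0 for digit in range(1, 10)}

for country, pop in populations.items():
    first = int(str(pop)[0])   # 1412 -> "1412" -> "1" -> 1
    digit_counts[first] += 1

# Print a tiny text bar chart, one row per digit.
for digit, count in digit_counts.items():
    print(f"{digit}: {'#' * count} ({count})")
```

With this small list, 9 of the 15 populations start with a 1.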
What Just Happened
str(pop)[0] converts the number to text, then grabs the first character. So 1412 → "1412" → "1".
int(str(pop)[0]) converts it back to a number so we can use it as a dictionary key.
Look at the result: the digit 1 dominates. This is real data from real countries. The pattern is already visible.
Part 3
The Law
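Benford's Law predicts how often each first digit should appear: 1 about 30.1% of the time, 2 about 17.6%, 3 about 12.5%, 4 about 9.7%, 5 about 7.9%, 6 about 6.7%, 7 about 5.8%, 8 about 5.1%, and 9 about 4.6%.
The pattern is dramatic: 1 is about six and a half times as likely as 9 to be the first digit. This seems impossible, but it shows up in data set after data set across many different fields.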
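The predicted percentages all come from one formula: the chance that the first digit is d equals log10(1 + 1/d). A short sketch that prints them (the function name `benford_expected` is our own):

```python
import math

def benford_expected(digit):
    """Expected share, in percent, of numbers whose first digit is `digit`."""
    return math.log10(1 + 1 / digit) * 100

for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1f}%")

# Prints:
# 1: 30.1%
# 2: 17.6%
# 3: 12.5%
# 4: 9.7%
# 5: 7.9%
# 6: 6.7%
# 7: 5.8%
# 8: 5.1%
# 9: 4.6%
```

The nine percentages add up to 100%, so every number is accounted for.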
But Why?
Here's the intuition: think about counting upward from 1. After 9 you have to pass through all of 10-19 before any number starts with 2, and after 99 you have to pass through all of 100-199 before reaching the 200s. Numbers starting with 1 come first in every new block (1, 10-19, 100-199, 1000-1999), so wherever the counting stops, the 1s have usually had their full turn. Numbers starting with 9 come last (9, 90-99, 900-999), right before the next power of 10, so they are often cut short. The digit 1 has more "room" to appear.
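You can check this intuition by counting. The sketch below (with a helper name, `share_starting_with`, that we made up) stops counting at different points and compares how many numbers so far start with 1 and how many start with 9.

```python
def share_starting_with(digit, upper):
    """Fraction of the whole numbers 1..upper whose first digit is `digit`."""
    hits = sum(1 for n in range(1, upper + 1) if str(n)[0] == str(digit))
    return hits / upper

for upper in (999, 1999, 4999, 9999):
    ones = share_starting_with(1, upper) * 100
    nines = share_starting_with(9, upper) * 100
    print(f"Counting to {upper}: starts with 1 = {ones:.1f}%, starts with 9 = {nines:.1f}%")
```

At each of these stopping points the 1s are at least tied with the 9s, and at 1999 more than half of all the numbers start with 1. This is only an illustration of the "room" idea, not a proof of the law.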
Part 4
Building a Benford Detector
Let's build a function that analyzes any list of numbers and checks if they follow Benford's Law:
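One possible version of such a detector is sketched below. The names `first_digit`, `analyze`, and `verdict` are our own, the score is the average gap between observed and expected percentages, and the threshold of 3.0 is an arbitrary choice for illustration, not an official cutoff.

```python
import math

# Expected first-digit percentages from Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) * 100 for d in range(1, 10)}

def first_digit(number):
    """Return the first non-zero digit of a number, or None if it has none."""
    for ch in str(abs(number)):
        if ch in "123456789":
            return int(ch)
    return None

def analyze(numbers):
    """Return (observed percentages per digit, deviation score)."""
    counts = {d: 0 for d in range(1, 10)}
    for number in numbers:
        digit = first_digit(number)
        if digit is not None:
            counts[digit] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-zero number")
    observed = {d: counts[d] / total * 100 for d in counts}
    # Average gap, in percentage points, between observed and expected.
    score = sum(abs(observed[d] - BENFORD[d]) for d in counts) / 9
    return observed, score

def verdict(score, threshold=3.0):
    """Turn a score into a message. The threshold is an assumed value."""
    if score < threshold:
        return "looks consistent with Benford's Law"
    return "deviates from Benford's Law, worth a closer look"

# Try it on the first 200 powers of 2.
powers_of_two = [2 ** k for k in range(1, 201)]
observed, score = analyze(powers_of_two)
for d in range(1, 10):
    print(f"{d}: observed {observed[d]:5.1f}%  expected {BENFORD[d]:5.1f}%")
print(f"Deviation score: {score:.2f} -> {verdict(score)}")
```

Small data sets give noisy scores, so a sketch like this is more trustworthy with hundreds of numbers than with a handful.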
Part 5
Testing Different Data Sets
Let's test Benford's Law on different kinds of numbers. Some should follow the law. Some shouldn't. Can we tell the difference?
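The sketch below runs four data sets through a compact deviation score (the average gap, in percentage points, between observed and expected first-digit shares). The data set choices, sizes, and the seed are assumptions made for this example.

```python
import math
import random

BENFORD = {d: math.log10(1 + 1 / d) * 100 for d in range(1, 10)}

def deviation(numbers):
    """Average gap between observed and Benford first-digit percentages."""
    digits = [int(str(n)[0]) for n in numbers if n > 0]
    gaps = [abs(digits.count(d) / len(digits) * 100 - BENFORD[d]) for d in BENFORD]
    return sum(gaps) / 9

def fibonacci(count):
    """Return the first `count` Fibonacci numbers."""
    a, b = 1, 1
    out = []
    for _ in range(count):
        out.append(a)
        a, b = b, a + b
    return out

rng = random.Random(42)  # fixed seed so the example is repeatable
datasets = {
    "Powers of 2": [2 ** k for k in range(1, 201)],
    "Fibonacci numbers": fibonacci(200),
    "Uniform random 1-999": [rng.randint(1, 999) for _ in range(2000)],
    "Test scores 0-100": [min(100, max(0, round(rng.gauss(75, 10)))) for _ in range(2000)],
}

results = {name: deviation(values) for name, values in datasets.items()}
for name, score in results.items():
    print(f"{name:22} deviation {score:5.1f}")
```

The first two data sets should score low, while the uniform random numbers and the simulated test scores should score much higher.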
What Follows Benford's Law?
Follows: Populations, financial transactions, street addresses, river lengths, powers of 2, Fibonacci numbers, stock prices, file sizes on your computer.
Doesn't follow: Numbers assigned by humans (phone numbers, zip codes, ID numbers), numbers from a narrow range (test scores 0-100), measurements on a compressed scale that only covers a few values (earthquake magnitudes), truly uniform random numbers.
The key pattern: Benford's Law applies to data that spans many orders of magnitude (ones, tens, hundreds, thousands, etc.) and arises from natural processes like growth, rather than being assigned or capped by people.
Part 6
Catching a Liar
Here's where it gets exciting. When people make up numbers (to fake financial records, cheat on taxes, or fabricate research), they tend to spread first digits much more evenly than real data does. Most of them don't know about Benford's Law. And that's how they can get caught.
Let's simulate this. We'll create "real" data and "fake" data, then see if our detector can tell the difference:
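A minimal sketch of that simulation follows. Both data sets are simulated: the "real" amounts are spread evenly across orders of magnitude from 10 to 100,000, and the "fake" amounts are picked evenly between 100 and 9,999, the way a careless faker might. The function name `simulate` and the threshold of 3.0 are assumptions for this example.

```python
import math
import random

BENFORD = {d: math.log10(1 + 1 / d) * 100 for d in range(1, 10)}

def deviation(numbers):
    """Average gap between observed and Benford first-digit percentages."""
    digits = [int(str(n)[0]) for n in numbers]
    gaps = [abs(digits.count(d) / len(digits) * 100 - BENFORD[d]) for d in BENFORD]
    return sum(gaps) / 9

def simulate(count=2000, seed=None):
    """Return (real_score, fake_score) for one simulated experiment."""
    rng = random.Random(seed)
    # "Real" amounts: spread evenly across orders of magnitude.
    real = [round(10 ** rng.uniform(1, 5), 2) for _ in range(count)]
    # "Fake" amounts: every value from 100 to 9999 is equally likely.
    fake = [rng.randint(100, 9999) for _ in range(count)]
    return deviation(real), deviation(fake)

real_score, fake_score = simulate()
for label, score in (("Real", real_score), ("Fake", fake_score)):
    flag = "SUSPICIOUS" if score > 3.0 else "looks natural"
    print(f"{label} data: deviation {score:.2f} -> {flag}")
```

Because no seed is passed, each run produces new random data, yet the fake set should score far higher than the real one every time.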
Run it several times. The detector consistently flags the fake data. Auditors and fraud investigators, reportedly including the IRS, use this kind of test to flag suspicious tax returns and financial reports for further investigation.
Part 7
Seeing the Difference
Let's build a side-by-side visual that shows exactly where fake data deviates from Benford's Law:
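One way to do this without any plotting library is a text chart. In the sketch below, `comparison_chart` is a made-up helper, each `#` stands for about 2 percentage points, and both data sets are simulated.

```python
import math
import random

def first_digit_shares(numbers):
    """Percentage of numbers starting with each digit 1-9."""
    digits = [int(str(n)[0]) for n in numbers]
    return {d: digits.count(d) / len(digits) * 100 for d in range(1, 10)}

def comparison_chart(real, fake, scale=2):
    """Return text lines comparing two data sets with Benford's prediction."""
    real_pct = first_digit_shares(real)
    fake_pct = first_digit_shares(fake)
    lines = [f"{'digit':5} {'Benford':>7}  {'real':<18} fake"]
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d) * 100
        real_bar = "#" * round(real_pct[d] / scale)
        fake_bar = "#" * round(fake_pct[d] / scale)
        lines.append(f"{d:<5} {expected:6.1f}%  {real_bar:<18} {fake_bar}")
    return lines

rng = random.Random(7)  # fixed seed so the chart is repeatable
real = [int(10 ** rng.uniform(1, 5)) for _ in range(2000)]   # simulated "real" data
fake = [rng.randint(100, 9999) for _ in range(2000)]         # simulated "fake" data
chart = comparison_chart(real, fake)
print("\n".join(chart))
```

The real column should slope downward from digit 1 to digit 9, while the fake column stays almost flat.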
Part 8
The Complete Data Forensics Tool
Here's everything from this lesson combined into one interactive forensics tool. Choose a dataset — real city populations, simulated transactions, a fake expense report someone clearly made up, or your own numbers — and the tool applies Benford's Law analysis and gives you a verdict.
Option 5 is especially revealing: it generates a clean dataset and a fraudulent one side by side so you can see exactly what the difference looks like in the numbers.
Try all five options. Option 4 lets you type your own numbers — try entering random numbers you make up and see if the detector catches you!
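For the "type your own numbers" option, the typed text has to be turned into numbers first. The sketch below shows one possible way to do that; `parse_numbers` is a hypothetical helper, not the lesson tool's actual code.

```python
import math

def parse_numbers(text):
    """Pull positive numbers out of text separated by spaces or commas."""
    numbers = []
    for token in text.replace(",", " ").split():
        try:
            value = float(token)
        except ValueError:
            continue  # skip anything that is not a number
        if value > 0 and math.isfinite(value):
            numbers.append(value)
    return numbers

# In an interactive tool this text would come from input("Enter numbers: ").
typed = "120, 45 abc 3300 -7 0"
numbers = parse_numbers(typed)
print(numbers)
```

Words, zero, and negative values are skipped here, so only 120, 45, and 3300 would be passed on to the analyzer.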
Part 9
Real-World Fraud Detection
Benford's Law isn't just a math curiosity. It's used in real investigations:
Real Cases
Tax fraud: The IRS is widely reported to use Benford's Law as a screening tool. If someone's reported income, expenses, or deductions don't follow the expected distribution, the return can be flagged for a closer look.
Election fraud: Researchers have analyzed vote counts from elections worldwide. When vote totals don't follow Benford's Law, it can indicate manipulation — though this is just one piece of evidence, not proof by itself.
Accounting fraud: Enron — one of the biggest corporate frauds in history — had financial statements that deviated significantly from Benford's Law. Forensic accountants now routinely use this technique.
Scientific fraud: Researchers who fabricate experimental data often get caught because their numbers are "too random" — they lack the natural patterns Benford's Law predicts.
Part 10
When Benford's Law Doesn't Apply
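Like any tool, Benford's Law has limits. It works best on large data sets that span several orders of magnitude. It is unreliable for small samples, narrow ranges, and numbers that people assign, such as phone numbers or zip codes.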
Critical Thinking
Benford's Law is a powerful tool, but it's not proof of fraud by itself. Data that deviates from Benford's Law could be fake, or it could just be the wrong kind of data.
A good data detective doesn't jump to conclusions. They use Benford's Law as a screening tool — a reason to investigate further, not a final verdict. This is the same caution scientists use: one piece of evidence is a clue, not a conclusion.
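To see how honest data can fail the test, the sketch below simulates adult heights in centimeters. The average of 170 cm and the spread of 10 cm are assumed values chosen for illustration.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the example is repeatable

# Simulated heights in centimeters: nothing here is faked by a person,
# but almost every value falls between 100 and 199.
heights_cm = [round(rng.gauss(170, 10)) for _ in range(1000)]

digits = [int(str(h)[0]) for h in heights_cm]
share_of_ones = digits.count(1) / len(digits) * 100
expected_ones = math.log10(2) * 100

print(f"Heights starting with 1: {share_of_ones:.1f}%")
print(f"Benford's Law expects:   {expected_ones:.1f}%")
```

A detector would flag these heights as wildly off, even though nobody lied. The data simply covers too narrow a range for Benford's Law to apply.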
Part 11
What You Just Did
You learned a mathematical law that most adults don't know about — and you built a tool that applies it. You can now analyze any set of numbers and check if they follow the natural pattern of real-world data.
This lesson combines everything you've learned: lists and dictionaries to store data, loops to process it, functions to organize the analysis, and statistical thinking (from Lesson 14) to interpret the results.
The Big Idea
The real world has hidden patterns. Numbers that look random actually follow mathematical laws. When those patterns are broken — when someone makes up data — the math can catch them. The ability to spot what's real and what's fake is one of the most valuable skills you can have.
Thinking Questions
- Why don't human-invented numbers follow Benford's Law? What does this tell us about how we think about randomness?
- If someone knew about Benford's Law, could they fake data that passes the test? Would it be easy or hard?
- Why is it important to know when Benford's Law does NOT apply? What could happen if you used it on the wrong kind of data?
Challenges
Level Up
Challenge 1 · Test Your Friends
Ask 5 friends to each write down 20 "random" numbers between 1 and 10,000. Run each friend's numbers through the Benford analyzer. Do human-generated numbers follow the law? Compare their results to the computer's random numbers.
Challenge 2 · Smart Faker
Write a program that generates fake data that actually follows Benford's Law. Hint: instead of picking numbers uniformly, weight your random choices so digit 1 appears 30% of the time, digit 2 appears 17.6%, etc. Then test it with the analyzer — can you fool your own detector?
Challenge 3 · Second Digit Analysis
Benford's Law also predicts the distribution of the second digit (it's more even than the first, but still not uniform — 0 is most common at 12%). Build an analyzer for second digits and test it on the same data sets. Does it catch the same fake data?
Summary
What You Learned
Thinking Skill
Data Forensics — using mathematical patterns to verify if data is genuine. Ask: "Does this data follow the patterns we'd expect from real-world numbers?"
Key Concept
Benford's Law: In naturally occurring data, the first digit is 1 about 30% of the time, 2 about 17.6%, declining to 9 at about 4.6%.
Data that deviates from this pattern may be fabricated — but always investigate further before concluding fraud.
Python Concepts
Extracting digits — int(str(number)[0])
Distribution analysis — counting occurrences, calculating percentages
Deviation scoring — measuring how far observed data is from expected
Data forensics pipeline — extract → count → compare → verdict
Combining all skills — lists, dicts, loops, functions, f-strings, conditionals