How Machine Learning Made me a Better Philosopher

Eldar Sarajlic


Ever since I was a kid, I had this irritating urge to understand why things were done the way they were.

I remember my grandmother driving me crazy with traditions she couldn’t explain. One of them was that I was never allowed to keep both hands behind my back when lying down. She had zero reasons for this. She would always say: “That’s how it’s done, do it and don’t complain.”

My ten-year-old brain couldn’t let it go. Of course, the rule felt arbitrary, but what bothered me wasn’t the rule itself; it was the absence of an explanation.

That frustration never really left me. I think it eventually pushed me toward philosophy.

Years later, I got a PhD and the official label of somebody who teaches critical thinking. And yet, oddly enough, that original feeling remained: the sense that something important about good thinking was still missing.

I devoured books, arguments, and different philosophies. But I didn’t feel sharper in the way I expected.

Then one day, I stumbled into machine learning, and my thinking changed forever.

Not because it gave me some hidden answers I never knew I needed, or because it revealed ultimate secrets about life in the universe.

It was much simpler, yet somehow more profound.

What it gave me was razor-sharp clarity about what constitutes good critical thinking.

Now, mind you, I am a professor of philosophy. I’ve been teaching critical thinking for years, but very little in my academic training as a philosopher taught me how to actually develop clarity of thought.

Machine learning, as I discovered, is remarkably good at showing where human thinking usually fails and what we can do to prevent those failures. This surprising realization came slowly but affected me more than most philosophy books I have read: machine learning is critical thinking on steroids, a practice with an obsessive focus on reliability, usefulness, and the minimization of error.

If you think machine learning is only about computers and artificial intelligence, you’re missing its most interesting feature. At its core, it’s about understanding the limitations of thinking and then designing methods that work despite those limitations. It forces you to be explicit about assumptions, careful about evidence, and honest about failure.

This article is a short attempt to explain what I learned from machine learning about critical thinking by drawing analogies between learning algorithms and ordinary human reasoning.

My main claim is simple: most concepts in machine learning are basically critical thinking concepts, and learning more about what they are and how they work can help you become a better thinker. You don’t have to be an engineer or a philosopher to benefit: basic curiosity and willingness to stretch your mind should be enough.

Data and Experience

The most basic idea in machine learning is deceptively simple: you take existing information about the world and try to extract insight from it.

That information might be housing prices in a city, medical scans from a hospital, or patterns of customer behavior. Machine learning looks for regularities in these data, patterns that allow us to predict, classify, or decide better than we could by inspection alone. It might tell us which houses sell for the most money, or which lung-scan patterns are associated with higher cancer risk.

If you’ve never seen how this works in practice, I encourage you to visit kaggle.com and browse a bit. It’s a community of people who play with different datasets and try to distill patterns from them to answer different questions. They host competitions, where the most successful pattern-recognizers can earn money. Some of the methods they use there are built into the foundation of the AI technology we now use almost every day.

What drew me, as a philosopher, to this was not the technology but the discipline. Machine learning forces you to take seriously a constraint that philosophy discusses abstractly but routinely ignores in practice: you cannot think better than the information you use. The most famous dictum in the machine learning world is: garbage in, garbage out. The quality of the initial data matters far more than how fancy your thinking methods are.

This reminded me of a very old philosophical debate about the origin of knowledge. Some philosophers, like René Descartes, argued that knowledge rests on innate structures of the mind. Others, most famously John Locke, claimed that the mind begins as a blank slate, shaped by experience.

If we generalize this debate, we can recognize the contours of a fundamental dynamic in machine learning: the pull between the data and the models. That is, between the information and the ways of processing it. Some practitioners believe that if the models are sophisticated enough, no amount of bad data will prevent us from discovering useful insight. Others stress that no matter how fancy your methods are, if the data are crap, you’ll get nothing.

Now, machine learning doesn’t really care about this debate (I doubt most machine learning engineers ever read Descartes or Locke — correct me if I’m wrong), but it makes one thing impossible to deny: the data matter in a precise, operational way. Any system trained on distorted, incomplete, or biased data will behave accordingly, regardless of how elegant its internal structure is.

In machine learning, this lesson is not optional. You can design sophisticated algorithms and tune them carefully, but if the data are bad, the results will be bad. No amount of intelligence compensates for poor input.

So, in a weird way, machine learning reconciles this debate, without even trying to. Data without structure are useless. Structure without data is empty. Both are necessary, and neither is sufficient.

The harder question, of course, is how to choose the right structures. How do we know which model is best?

Models as Explanations

This is where machine learning becomes, for me, indistinguishable from philosophy.

A model is simply a structured way of mapping inputs to outputs. More concretely, it is a framework for explaining what we observe and anticipating what might happen next. In human terms, models show up as moral theories, political ideologies, economic explanations, or everyday intuitions about how people behave.

Your model determines how you interpret what you see.

For example, you witness a petty crime in your neighborhood grocery store. One explanation frames it as an individual moral failure. Another treats it as the result of policy choices or social conditions. The data are the same. The interpretation depends on the model. If you follow the news, you’ll recognize this instantly: the same set of facts will often elicit completely opposing explanations.

We often forget that data alone never tell us what to think. We always bring a structure to them.

The trouble is that most people rely on fixed models. Once adopted, these models become lenses that never get adjusted, even when they face data that do not match. Ideologies are especially prone to this failure, because they promise explanatory completeness. Everything begins to look like a variation of the same underlying story.

Philosophers are not immune to this either. That’s because we always start with the model, and we almost never think about data. Our entire academic training is just one model after another: we end up choosing and identifying with the one we intuitively think is best.

But that’s wrong.

I found out that machine learning teaches a more demanding, but more accurate lesson: a model succeeds only if its structure matches the structure of the data.

Every model (be it an explanation of community crime or a fancy machine learning algorithm) comes with built-in commitments about complexity. That is, it rests on a certain underlying structure that determines how granular the model (or the explanation) will be: how many parts it will have, how detailed it will be, what basic things will be assumed to be true, and so on.

In machine learning, these commitments are called “hyperparameters”, settings that determine how flexible or rigid a model is allowed to be. In our everyday reasoning, they show up as assumptions about how many causes we are willing to entertain, how much uncertainty we tolerate, and how fine-grained our explanations should be.
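To see what such a knob looks like in practice, here is a small hypothetical sketch in Python. The data are invented, and the degree of the polynomial plays the role of the hyperparameter: a complexity commitment we choose before fitting anything.

```python
import numpy as np

# Invented noisy observations: a sine-shaped relationship plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# The polynomial degree is a hyperparameter: a commitment, chosen before
# fitting, about how flexible the model is allowed to be.
train_error = {}
for degree in (1, 3, 12):
    model = np.polynomial.Polynomial.fit(x, y, deg=degree)
    train_error[degree] = np.mean((model(x) - y) ** 2)
    print(f"degree {degree:2d}: error on the data we fit: {train_error[degree]:.3f}")
```

Notice that turning the knob up always shrinks the error on the data we already have. That is exactly the trap: a more flexible model always looks better on the past, which says nothing yet about whether it will do better on the future.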

Models can be simple, or complex.

For example, when explaining crime, a low-complexity model would focus only on a single cause, such as poverty, race, or culture. A high-complexity model would look at several causal factors, not just one.

I’m sure you know people who stick to a single explanatory model and apply it to everything. These are usually ideological folks, like Marxists, who view everything through the lens of the class struggle, or racists, who think everything boils down to race.

But if you want to be a good critical thinker, you shouldn’t have a single model to explain everything in the world. As you already know, some situations are more complex than others.

To truly understand what’s happening, our models must be appropriate for the given situation, instead of rigid frameworks we hold on to for emotional or ideological reasons.

So, how do we build such models?

Underfitting, Overfitting, and the Goldilocks Problem

Machine learning experts are really good at tuning the hyperparameters of their models. That is, they spend a lot of time adjusting the complexity of their models to the uniqueness of the data and the problem they’re trying to solve.

They do that because they are careful not to make two of the most serious mistakes of modeling: underfitting and overfitting. Pay attention to the word “fit” here. It is used purposefully because the success of explaining the world, and the success in creating good machine learning algorithms, depends on whether our models fit well with the data.

For example, when we apply a simple model, with just one knob, to a highly complex situation, we commit the mistake of underfitting. Our model is not complex enough to capture all the important details in the world. It’s too simple and not helpful. It’s like explaining rain by saying it’s caused by humidity in the air, or explaining poverty by saying it’s caused by Elon Musk supposedly not paying enough taxes. It’s very simplistic.

Sometimes, however, we overfit by creating a model that is much more complex than it needs to be to explain some situation. It’s like explaining what happens online by connecting every viral post, algorithm tweak, political event, and shadowy actor into one grand conspiracy that perfectly explains the past, but can’t tell you what will happen tomorrow. All conspiracy theories are, basically, overfit explanations.

A good explanation lives between these extremes. This is the Goldilocks problem of modeling: not too simple, not too complex.
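The two failure modes can be sketched with a toy example (all data invented): fit a deliberately too-simple model, a reasonable one, and a deliberately too-flexible one to the same noisy observations, then see how each fares on fresh data from the same source.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # Hypothetical data source: a sine-shaped world observed with noise
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_seen, y_seen = sample(15)   # the data we build the model on
x_new, y_new = sample(30)     # the data the world sends next

seen_err, new_err = {}, {}
for degree in (1, 4, 14):
    model = np.polynomial.Polynomial.fit(x_seen, y_seen, deg=degree)
    seen_err[degree] = np.mean((model(x_seen) - y_seen) ** 2)
    new_err[degree] = np.mean((model(x_new) - y_new) ** 2)
    print(f"degree {degree:2d}: error on seen data {seen_err[degree]:.3f}, "
          f"on unseen data {new_err[degree]:.3f}")
```

The degree-1 model underfits: it does poorly everywhere, because a straight line cannot capture a wave. The degree-14 model overfits: with 15 data points it can thread through every observation, explaining even the noise perfectly, and then typically stumbles on fresh data. The middle setting is the Goldilocks zone.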

In philosophy, it is notoriously difficult to know when you’ve crossed either boundary. Elegant theories can survive for decades without ever being seriously tested against reality. Just take any paper from a philosophy journal or read a recently published book and you’ll see. That always bothered me about philosophy.

But machine learning found an ingenious way to bypass this problem.

Testing, Generalization, and Reality

Unlike philosophy and other theoretical disciplines, machine learning is a practical and iterative process. It’s never a one-shot affair.

Models are never created without data, and they are almost never made in a single attempt. When we create models or explanations, we usually do it on the basis of some information, be it experience or formal data.

But merely creating a model based on some data is not the end of our job. No model should be accepted as final until it is actually tested on reality.

Let’s take a simple hypothetical example. Say you notice an increase in petty crime in the summer months and a decrease in the winter. You create an explanation: the rise in temperature causes petty crime. This explanation is based on the data you already observed. It will be good only if it helps you predict what will happen in the future. Come next summer, if petty crime doesn’t tick up, your explanation is worthless. You have to go back to the drawing board.

Seems simple and intuitive, right? But you’d be surprised to learn that most people don’t test and adjust their theories like this.

To test whether their models pick up the underlying truth or overfit by explaining every crease in the fabric of reality, machine learning experts divide data into two parts: one used for creating the model itself, called the training set, and the other for testing its predictive power, called the test set.

During the model design process, the test set is kept hidden. Only after the tuning phase is the model applied to it. A model is good only if it performs well on data it hasn’t seen before.
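A minimal sketch of this discipline, with invented data: the model is fitted on the training rows only, and then judged on the rows it never saw.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented dataset: 100 observations of one input and one outcome
x = rng.uniform(0, 1, 100)
y = 3 * x + 1 + rng.normal(scale=0.1, size=100)

# Split: 80 rows to build the model, 20 kept hidden for the final exam
order = rng.permutation(100)
train, test = order[:80], order[80:]

# Fit a line on the training set only
slope, intercept = np.polyfit(x[train], y[train], deg=1)

# Judge the model on the data it has never seen
test_error = np.mean((slope * x[test] + intercept - y[test]) ** 2)
print(f"fitted line: y ≈ {slope:.2f}·x + {intercept:.2f}")
print(f"squared error on held-out data: {test_error:.4f}")
```

The crucial line is the split itself. Everything before it is allowed to shape the model; everything after it is a verdict the model cannot argue with.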

This is probably my favorite feature of machine learning, because it does something we rarely see in other intellectual disciplines: it exposes our theories about how the world works to challenges from the real world.

This distinction between training and testing embodies a profound epistemic lesson: an explanation that cannot handle novelty is not a good explanation. No matter how fancy, elegant, or emotionally appealing it is, if it can’t explain information outside the initial training set (or experience), it is worthless.

Intellectual Hygiene

I’ve only scratched the surface here (I had to, otherwise this would have been a book).

Machine learning contains many more techniques (like baseline comparison, regularization, cross-validation, stopping rules, etc.) that all reinforce the same underlying message: thinking is a fragile process, and it requires discipline to keep it reliable.
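To give just one of these a concrete face: cross-validation rotates the role of the hidden test set, so that every observation gets a turn at challenging the model. A toy sketch, again with invented data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented dataset: 60 noisy observations of a sine-shaped relationship
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=60)

def cv_error(degree, k=5):
    """Average held-out error of a degree-`degree` polynomial over k folds."""
    folds = np.array_split(rng.permutation(60), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(60), fold)   # everything except this fold
        model = np.polynomial.Polynomial.fit(x[train], y[train], deg=degree)
        errors.append(np.mean((model(x[fold]) - y[fold]) ** 2))
    return float(np.mean(errors))

cv = {}
for degree in (1, 4, 15):
    cv[degree] = cv_error(degree)
    print(f"degree {degree:2d}: cross-validated error {cv[degree]:.3f}")
```

It is the train-test idea applied systematically: no single lucky split can flatter a model, because the model must earn its score on every slice of the data in turn.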

For me, as a philosopher, machine learning became a lesson in intellectual hygiene. It taught me to care about the quality of my inputs, to treat my explanatory frameworks as adjustable rather than sacred, and to test my ideas against the world instead of protecting them from it.

Machine learning may be the backbone of modern AI, but its deeper value lies elsewhere. It exposes the structure of good critical thinking in a way that could benefit anybody willing to spare some time to learn it.

It can teach us how to think more clearly, especially in the modern world, in which noise is everywhere and clarity of thought is obscured at every corner.

I don’t know about you, but I will continue learning about it. The advance of AI, a technology built on its principles, is one more reason to get into it sooner rather than later. I’ll continue sharing the lessons I learn, and I’d love it if you did the same.

See you around.

Context

This article is part of the School of Critical Thinking's unique curriculum.

Some ideas are developed further in books. Others through guided instruction.

If this way of thinking feels unfamiliar, there is a reason for that.

Start here →