The Relentless Correlation Science of the 21st Century

Most of the work that flies under the banner of “data science” is correlation, or “pattern matching” if you prefer the plain English. Who will churn, who will click, which claim looks risky, how many units we will sell. That’s useful, but it runs out of steam quickly. The decisions we argue about - price changes, rollouts, treatments, targeting - live in a different neighbourhood, where the question is counterfactual.

But if that’s true, then why did most of the corporate sector build a decision-making culture on top of tools that are almost exclusively correlation-driven? Well, in this post I want to make the case that answering this question doesn’t require a technical explanation - it requires a cultural one.

The 2010s normalised speed as a virtue and treated rigour as a tax. Mark Zuckerberg’s famous “move fast and break things” became a management default across tech, and then across the companies that copied tech’s playbook. Amazon had their own version with “Bias for Action” among their 16 leadership principles (it is rumoured that when you master all 16, the Amazon mothership summons you to the homeland, where no-one defecates themselves on the factory floor). Jokes aside, in training materials Amazon “Bar Raisers” (a hiring role within the company) tell the story of two employees: one makes ten quick calls (five right, five wrong), the other makes one careful, correct call. The fast mover is celebrated for delivering five wins; the careful one is slow, and a loser (probably). The problem with that story isn’t the arithmetic; it’s that it ignores that mistakes differ in severity. A single bad call on lending, safety, or health can wipe out the gains from a dozen quick wins. But in a culture that worships motion, speed reads as intelligence and caution reads as ineptitude.

“Move Fast and Break Things” shaped what “data-driven” came to mean in the broader business landscape. Predictive use cases were easy to ship and easy to show on a slide. Leaders saw fast feedback and concluded that the same workflow - point a model at history, get a score - should guide interventional choices too. Vendors encouraged it. Hiring pipelines rewarded leaderboard performance on static datasets. Inside teams, responsibilities split: one group built models, another ran a handful of A/B tests, a third owned the product roadmap. With that split, nobody had clear accountability for whether a model-informed policy caused a worthwhile change. The question was never really articulated in those terms outside of a few companies with a robust experimentation culture.

Causal inference as a discipline didn’t vanish in this period; it just didn’t fit the tempo. To answer counterfactual questions you have to think about the data-generating process and you have to keep holdouts - even when that feels like “leaving money on the table.” You need to accept that sometimes you roll out in phases so you can compare treated and untreated units fairly. These steps clashed with the status markers of the era: fast launches, frequent PRs, green dashboards. The result was predictable: correlation stood in as a proxy for causation.

But who is to blame? Some say it’s the leaders. Others point the finger at the data scientists and analysts instead. After all, plenty of us knew better and still shipped recommendations without insisting on designs that could substantiate them, or shouted success metrics from the mountaintops. Underneath it all, a lot of our training treated causality like an elective rather than a core skill, so teams got very good at optimising predictive scores and not nearly as good at building credible structural models. Most of the time this wasn’t malice; it was incentives, deadlines, career risk, and the relief of having numbers that moved in the right direction.

Is it likely to change? Only where the culture changes with it. The places that make progress don’t treat causal questions and experiments as a side project or a hurdle to clear on the way to a launch. They build experimentation into the product so comparisons are part of how features roll out. They keep standing holdouts after launch, not because they enjoy withholding value, but because they want a live baseline when the world drifts. They log eligibility and exposure as first-class data so they can tell “no effect” from “no chance to be affected.” They let prediction do what it’s good at - targeting and prioritising - without pretending that a good score proves a policy works. And importantly, they change incentives: promotions and budgets follow measured effects.
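
To make two of those practices concrete, here is a minimal sketch in Python. The function names, the 5% holdout share, and the “new_pricing” feature are all made up for illustration; the point is just that a standing holdout can be assigned deterministically per unit, and that eligibility and exposure get logged as first-class events rather than reconstructed after the fact.

```python
import hashlib
from datetime import datetime, timezone

HOLDOUT_SHARE = 0.05  # illustrative: keep 5% of units as a live baseline after launch

def in_standing_holdout(unit_id: str, feature: str) -> bool:
    """Deterministic assignment: the same unit always lands in the same bucket."""
    digest = hashlib.sha256(f"{feature}:{unit_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF < HOLDOUT_SHARE

def log_exposure(log: list, unit_id: str, feature: str, eligible: bool, exposed: bool) -> None:
    """Record eligibility and exposure separately, so 'no effect' can later be
    told apart from 'no chance to be affected'."""
    log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "unit_id": unit_id,
        "feature": feature,
        "eligible": eligible,
        "exposed": exposed,
    })

# Usage at the decision point in the product (names are hypothetical):
events = []
unit = "customer_123"
eligible = True  # the unit met the targeting rule
exposed = eligible and not in_standing_holdout(unit, "new_pricing")
log_exposure(events, unit, "new_pricing", eligible, exposed)
```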

My experience has taught me the following strategy: if you’re working inside a team that still prizes speed over learning, start by triaging decisions by cost of error. Many choices really are cheap; move quickly on those and learn as you go. The expensive ones - pricing, eligibility, safety, anything with real downsides - deserve designs that can separate correlation from causation. Write the decision in concrete terms, sketch what affects what, and decide how you’ll compare treated and untreated units in a way a sceptical colleague would accept. Keep some holdout, even a small one, and measure exposure. When you can’t randomise, say what assumptions you’re making and test the ones you can.
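
If you kept that small holdout and logged exposure, the comparison step can start as something very simple: a difference in means between eligible units outside the holdout and the eligible units you held back, with a rough normal-approximation interval. The numbers below are placeholders, and a real analysis would add covariate adjustment and a power check; this is just a sketch of the shape of the calculation.

```python
import math

def diff_in_means(treated: list[float], control: list[float]) -> tuple[float, float]:
    """Return the estimated effect and its standard error."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    effect = mean(treated) - mean(control)
    se = math.sqrt(var(treated) / len(treated) + var(control) / len(control))
    return effect, se

# Usage: outcomes for eligible units only, split by holdout assignment.
rolled_out = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0]  # eligible, outside the holdout
held_out   = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # eligible, kept as the baseline
effect, se = diff_in_means(rolled_out, held_out)
print(f"estimated effect: {effect:+.3f} ± {1.96 * se:.3f} (95% CI, normal approximation)")
```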

That’s the story I see when I look back at the last decade: a culture that rewarded motion built a decision stack on correlation because it was fast, visible, and easy to defend with the right people in the room. The fix isn’t a new buzzword. It’s adjusting the culture so that learning is part of shipping, and making room for the slower work where the stakes justify it.
