I really don’t care how any given A/B test turns out.

That’s right. Not one bit.

But wait, how do I double or triple conversion rates without caring how a test performs?

I actually care about the whole SYSTEM of testing. All the pieces need to fit together just right. If not, you’ll waste a ton of time A/B testing without getting anywhere. This is what happens to most teams.

But if you do it right. If you play by the right rules. And you get all the pieces to fit just right, it’s simply a matter of time before you triple conversions at any step of your funnel.

I set up my system so that the more I play, the more I win. I stack enough wins on top of each other that conversion rates triple. And any given test can fail along the way. I don’t care.

What does my A/B testing strategy look like? It’s pretty simple.

  • Cycle through as many tests as possible to find a couple of 10-40% wins.
  • Stack those wins on top of each other in order to double and triple conversion rates.
  • Avoid launching any false winners that drag conversions back down.

For all this to work, you’ll need to follow 7 very specific rules. Each of them is critical. Skip one and the whole system breaks down. Follow them and you’ll drive your funnel relentlessly up and to the right.

Rule 1: Above all else, the control stands

I look at A/B tests very differently from most people.

Usually, when someone runs a test, they’ll consider each of their variants as equals. The control and the variant are both viable and their goal is to see which one is better.

I can’t stand that approach.

We’re not here for a definitive answer. We’re here to cycle through tests to find a couple of big winners that we can stack on top of each other.

If there’s a 2% difference between the variant and the control, I really don’t care which one is the TRUE winner. Yes, yes, yes, I’d care about a 2% win if I had enough data to hit statistical significance on those tests (more on this in a minute). But unless you’re Facebook or Amazon, you probably don’t have that kind of volume. I’ve worked on multiple sites with more than 1 million visitors/month and it’s exceedingly rare to have enough data hitting a single asset in order to detect those kinds of changes.

In order for this system to work, you have to approach the variant and control differently. Unless a variant PROVES itself as a clear winner, the control stands. In other words, the control is ALWAYS assumed to be the winner. The burden of proof is on the variant. No changes unless the variant wins.

This ensures that we’re only making positive changes to assets going forward.

Rule 2: Get 2000+ people through the test within 30 days

So you don’t have any traffic? Then don’t A/B test. It’s that simple. Do complete revamps on your assets and then eyeball it.

Remember, we need the A/B testing SYSTEM working together. And we’re playing the long game. Which means we need a decent volume of data so we can cycle through a bunch of different test ideas. If it takes you 6 months to run a single test, you’ll never be able to run enough tests to find the few winners.

In general, I look for 2000 or more people hitting the asset that I’m testing within 30 days. So if you want to A/B test your homepage, it better get 2000 unique visitors every month. I prefer 10K-20K people, but I’ll get started with as little as 2000/month. Anything less than that and it’s just not worth it.

Rule 3: Always wait at least a week

Inside of a week, data is just too volatile. I’ve had tests with 240% improvements at 99% certainty within 24 hours of launching the test. This is NOT a winner. It always comes crashing down. Best-case scenario, it’s really just a 30-40% win. Worst case, it flip-flops and is actually a 20% decline.

It also captures a full weekly cycle’s worth of data. Visitors don’t always behave the same on weekends as they do during the week. So a solid week’s worth of data gives you a much more consistent sample set.

Here’s an interesting result from one of my tests. Right out of the gate, it looked like I had a 10% lift. After a week of running the test, it did a COMPLETE flip-flop on me and became a 10% loser (at 99% certainty too):

Flip-Flop A/B Test

One of my sneaking suspicions is that most of the 250% lift case studies floating around the interwebs are just tests that had extreme results in the first few days. And if they had run a bit longer, they would have come down to a modest gain. Some of them would even flip-flop into losers. But because people declare winners too soon, they run around on Twitter declaring victory.

Rule 4: Only launch variants at 99% statistical significance

Wait, 99%? What happened to 95%?

If you’ve done an A/B test, you’ve probably run across the recommendation that you should wait until you hit 95% significance. That way, you’ll only pick a false winner 1 out of every 20 tests. And none of us want to pick losers so we typically follow this advice.

You’ve run a bunch of A/B tests. You find a bunch of wins. You’re proud of those wins. You feel a giant, happy A/B testing bubble of pride.

Well, I’m going to pop your A/B testing bubble of pride.

Your results didn’t mean anything. You picked a lot more losers than just 1 in 20. Sorry.

Let’s back up a minute. Where does the 95% statistical significance rule come from?

Dig up any academic or scientific journal that has quantitative research and you’ll find 95% statistical significance everywhere. It’s the gold standard.

When marketers started running tests, it was a smart move to use this same standard to see if our data actually told us anything. But we forgot a key piece along the way.

See, you can’t just run a measure of statistical confidence on your test after it’s already running. You need to determine your sample size first. We do this by deciding the minimum improvement that we want to detect. Something like 5% or 10%. Then we can figure out the statistical power needed and, from there, determine our sample size. Confused yet? Yeah, you kind of need to know some statistics to do this stuff. I need to look it up in a textbook each time it comes up.
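
For reference, here’s a minimal sketch of that calculation using the standard two-proportion formula (a generic illustration, not any particular tool’s math). The baseline conversion rate and minimum lifts are made-up numbers, just to show the scale of traffic involved:

```python
# Minimal sketch of the classic two-proportion sample-size calculation.
# The baseline rate, minimum detectable lift, alpha, and power below are
# hypothetical values for illustration.
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, min_lift, alpha=0.05, power=0.8):
    """Visitors needed in EACH variant to detect a relative lift of
    `min_lift` over `baseline` at the given alpha and power."""
    p1 = baseline
    p2 = baseline * (1 + min_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# With a 5% baseline conversion rate:
print(sample_size_per_variant(0.05, 0.10))  # 10% lift -> roughly 31,000 per variant
print(sample_size_per_variant(0.05, 0.02))  # 2% lift  -> roughly 750,000 per variant
```

That second number is why chasing 2% wins is a non-starter for most of us. Only sites with Facebook- or Amazon-level traffic can feed a test that hungry.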

So what happens if we skip all the fancy shmancy stats stuff and just run tests to 95% confidence without worrying about it? You come up with false positives WAY more frequently than just 1 out of 20 tests.
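
Don’t take my word for it. Here’s a rough simulation sketch (my own illustration with made-up traffic numbers, not a formal proof): both variants are identical, yet if you peek every day and stop the first time the test crosses 95% confidence, you’ll declare a “winner” far more often than 1 in 20 times:

```python
# Simulation sketch: an A/A test where both variants are truly identical.
# We "peek" once a day and declare a winner the first time the two-sided
# p-value dips below 0.05. Traffic and conversion rates are made up.
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(z))

def peeking_test(rate=0.05, daily_visitors=200, days=30):
    """Return True if a daily peek ever declares a (false) winner."""
    conv_a = conv_b = n_a = n_b = 0
    for _ in range(days):
        conv_a += sum(random.random() < rate for _ in range(daily_visitors))
        conv_b += sum(random.random() < rate for _ in range(daily_visitors))
        n_a += daily_visitors
        n_b += daily_visitors
        if p_value(conv_a, n_a, conv_b, n_b) < 0.05:
            return True  # we'd have shipped a variant that does nothing
    return False

random.seed(1)
runs = 2000
false_winners = sum(peeking_test() for _ in range(runs))
print(false_winners / runs)  # typically several times higher than 0.05
```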

Here’s an example test I ran. In the first two days, we got a 58.7% increase in conversions at 97.7% confidence:

Chasing Statistical Significance with A/B Tests - 2 Day Results

That’s more than good enough for most marketers. Most people I know would have called it a winner, launched it, and moved on.

Now let’s fast-forward 1 week. That giant 58.7% win? Gone. We’re at a 17.4% win with only 92% confidence:

Chasing Statistical Significance with A/B Tests - 1 Week Results

And the results after 4 weeks? Down to an 11.7% win at 95.7% certainty. We’ve gone from a major win to a marginal win in a couple of weeks. It might stabilize here. It might not.

Chasing Statistical Significance with A/B Tests - 4 Week Results

We have tests popping in and out of significance as they collect data. This is why determining your required sample size is so important. You want to make sure that a test doesn’t trick you early on.

But Lars! It still looks like a winner even if it’s a small winner! Shouldn’t we still launch it? There are two problems with launching early:

  1. There’s no guarantee that it would have turned out a winner in the long run. If we had kept running the test, it might have dropped even further. And every once in a while, it’ll flip-flop on you to become a loser. Then we’ve lost hard-earned wins from previous winners.
  2. We would have vastly over-inflated the expected impact on the business. 60% wins move mountains. They crush your metrics and eat board decks for breakfast. 11% wins, on the other hand, have a much gentler impact on your growth. They give your metrics a soothing spa package and nudge them a bit in the right direction. Calling that early win at 60% gets the whole team way too excited. Those same hopes and dreams get crushed in the coming weeks when growth is far more modest. Do that too many times and people stop trusting A/B test results. They’ll also take the wrong lessons from it and start focusing on elements that don’t have a real impact on the business.

So what do we do if 95% statistical significance is unreliable?

There’s an easier way to do all this.

While I was at Kissmetrics, I worked with Will Kurt, our Growth Engineer at the time. He’s a wicked smart guy who runs his own statistics blog now.

We modeled out a bunch of A/B testing strategies over the long term. There’s a blog post that goes over all our data and I also did a webinar on it. How does a super disciplined academic research strategy compare to the fast and loose 95% online marketing strategy? What if we bump it to 99% statistical significance instead?

We discovered that you’d get very similar results over the long term if you just used a 99% statistical significance rule. It’s just as reliable as the academic research strategy without needing to do the heavy stats work for each test. And using 95% statistical significance without a required sample size isn’t as reliable as most people think it is.

The 99% rule is the cornerstone of my A/B testing strategy. I only make changes at 99% statistical significance. Any less than that and I don’t change the control. This reduces the odds of launching false winners to a more manageable level and allows us to stack wins on top of each other without accidentally negating our wins with a bad variant.

Rule 5: If a test drops below a 10% lift, kill it.

Great, we’re now waiting for 99% certainty on all our tests.

Doesn’t that dramatically increase the time it takes to run all our tests? Indeed it does.

Which is why this is my first kill rule.

Again, we care about the whole system here. We’re cycling to find the winners. So we can’t just let a 2-5% test run for 6 months.

What would you rather have?

  • A confirmed 5% winner that took 6 months to reach
  • A 20% winner after cycling through 6-12 tests in that same 6-month period

To hell with that 5% win, give me the 20%!

So the longer we let a test run, the higher our opportunity costs stack up. If we wait too long, we’re forgoing serious wins that we could have found by launching other tests.

If a test drops below a 10% lift, it’s now too small to matter. Kill it. Shut it down and move on to your next test.

What if we have an 8% projected win at 96% certainty? It’s SO close! Or what if we have enough data to find 5% wins quickly?

Then we ask ourselves one very simple question: will this test hit certainty within 30 days? If you’re 2 weeks into the test and close to 99% certainty, let it run a bit longer. I do this myself.
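
To make that judgment call less of a gut feeling, here’s a back-of-the-envelope sketch (my own rough approach, with hypothetical numbers): treat the lift you’ve seen so far as if it were real, estimate how many visitors per variant you’d need to confirm it at 99%, and compare that against the traffic left before day 30:

```python
# Back-of-the-envelope check with hypothetical inputs: if the observed lift
# were the true lift, could we reach 99% significance before day 30?
import math
from statistics import NormalDist

def visitors_needed(baseline, observed_lift, alpha=0.01, power=0.8):
    """Visitors per variant to confirm `observed_lift` at 99% significance."""
    p1, p2 = baseline, baseline * (1 + observed_lift)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

def will_finish_in_time(baseline, observed_lift, visitors_so_far,
                        daily_visitors_per_variant, days_left):
    still_needed = visitors_needed(baseline, observed_lift) - visitors_so_far
    return still_needed <= daily_visitors_per_variant * days_left

# Two weeks in: 8% observed lift on a 20% baseline, 3,500 visitors per variant
# so far, 250/day per variant, 16 days left until the 30-day cutoff.
print(will_finish_in_time(0.20, 0.08, 3_500, 250, 16))  # False -> kill it
```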

What happens at day 30? That leads us to our next kill rule.

Rule 6: If no winner after 1 month, kill it.

Chasing A/B test wins can be addictive. JUST. ONE. MORE. DAY. OF. DATA.

We’re emotionally invested in our idea. We love the new page that we just launched. And IT’S SO CLOSE TO WINNING. Just let it run a bit longer? PLEEEEEASE?

I get it, each of these tests becomes a personal pet project. And it’s heartbreaking to give up on it.

If you have a test that’s trending towards a win, let it keep going for the moment. But we have to cut ourselves off at some point. The problem is that many of these “small-win” tests are mirages. First they look like 15% wins. Then 10%. Then 5%. Then 2%. The more data you collect, the more the variant converges with your control.

CUT YOURSELF OFF. We need a rule that keeps our emotions in check. You gotta do it. Kill that flop of a test and move on to your next idea.

That’s why I have a 30-day kill rule. If the variant doesn’t hit 99% certainty by day 30, we kill it. Even if it’s at 98%, we shut it down on the spot and move on.

Rule 7: Build your next test while waiting for your data

Cycling through tests as fast as we can is the name of the game. We need to keep our testing pipeline STACKED.

There should be absolutely NO downtime between tests. How long does it take you to build a new variant? Starting with the initial idea, how long until it goes live? 2 weeks? 3 weeks? Maybe even an entire month?

If you wait to start on the next test until the current test is finished, you’ve wasted enough data for 1-2 other tests. That’s 1-2 other chances to find that 20% win to stack on top of your other wins.

Do not waste data. Keep those tests running at full speed.

As soon as one test comes down, the next test goes up. Every time.

Yes, you’ll need to get a team in place that’s dedicated to A/B tests. This is not a trivial amount of work. You’ll be launching A/B tests full time. And your team will need to be moving at full speed without any barriers.

If it were easy, everyone would be doing it.

Follow All 7 A/B Testing Rules to Consistently Drive Conversions Up and to the Right

Follow the system with discipline and it’s a matter of time before you double or triple your conversion rates. The longer you play, the more likely you are to win.

Here are all the rules in one spot:

  1. Above all else, the control stands
  2. Get 2000+ people through the test within 30 days
  3. Always wait at least a week
  4. Only launch variants at 99% certainty
  5. If a test drops below a 10% lift, kill it.
  6. If no winner after 1 month, kill it.
  7. Build your next test while waiting for your data


Do those live chat tools actually help your business? Will they get you more customers by allowing your visitors to chat directly with your team?

Like most tests, you can come up with theories that sound great for both sides.

Pro Live Chat Theory: Having a live chat tool helps people answer questions faster, see the value of your product, and will lead to more signups when people see how willing you are to help them.

Anti Live Chat Theory: It’s one more element on your site that will distract people from your primary CTAs so conversions will drop when you add it to your site.

These aren’t the only theories either, we could come up with dozens on both sides.

But which is it? Do signups go up or down when you put a live chat tool on the marketing site of your SaaS app?

It just so happens I ran this exact test while I was at Kissmetrics.

How We Set Up the Live Chat Tool Test

Before we ran the test, we already had Olark running on our pricing page. The Sales team requested it and we launched without running it through an A/B test. Anecdotally, it seemed helpful. An occasional high-quality lead would come through and it would help our SDR team disqualify poor leads faster.

Around September 2014, the Sales team started pushing to have Olark across our entire marketing site. Since I had taken ownership of signups, our marketing site, and our A/B tests, I pushed back. We weren’t just going to launch it; it needed to go through an A/B test first. I was pro-Olark at this point but wanted to make sure we weren’t cannibalizing our funnel by accident.

We got it slotted for an A/B test in Oct 2014 and decided to test it on 3 core pages of our marketing site: our Features, Customers, and Pricing pages.

Our control didn’t have Olark running at all. This means that we stripped it from our pricing page for the control. Only the variant would have Olark on any pages.

Here’s what our Olark popup looked like during business hours:

Kissmetrics Olark Popup Business Hours

And here it is after-hours:

Kissmetrics Olark Popup After Hours

Looking at the popups now, I wish I had done a once-over on the copy. It’s pretty bland and generic. Sharper copy might have gotten us better results. At the time, I decided to test whatever Sales wanted since this test was coming from them.

Setting up the A/B test was pretty simple. We used an internal tool to split visitors into variants randomly (this is how we ran most of our A/B tests at Kissmetrics). Half our visitors randomly got Olark, the other half never saw it. Then we tagged each group with Kissmetrics properties and used our own Kissmetrics A/B Test Report to see how conversions changed in our funnel.
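
I can’t show the internal tool itself, but the general approach is easy to sketch. The function below is a generic, hypothetical illustration (not our actual code): hash a stable visitor ID so the same visitor always lands in the same bucket, then record that bucket as a property on your analytics events so the funnel can be segmented by variant later:

```python
# Generic sketch of a deterministic 50/50 split (hypothetical names, not the
# actual Kissmetrics tool). Hashing a stable visitor ID means the same
# visitor gets the same variant on every page load.
import hashlib

def assign_variant(visitor_id: str, test_name: str = "olark_sitewide") -> str:
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable number in 0..99
    return "variant_olark" if bucket < 50 else "control_no_olark"

# Same visitor, same assignment, every time:
print(assign_variant("visitor-123"))
print(assign_variant("visitor-123"))
```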

So how did the data play out anyway?

Not great.

Our Live Chat A/B Test Results

Here’s what Olark did to our signups:

Live Chat Tool Impact on Signup Conversions

A decrease of 8.59% at 81.38% statistical significance. I can’t say that we have a confirmed loser at this point. I prefer 99% statistical significance for those kinds of claims. But that data is not trending towards a winner.
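
If you’re wondering how a figure like “8.59% decrease at 81.38% statistical significance” gets computed, most A/B test reports boil down to a standard two-proportion z-test. The raw counts aren’t shown here, so the visitor and signup numbers below are made up purely to illustrate the calculation:

```python
# Illustration only: hypothetical counts run through the standard
# two-proportion z-test to produce a relative lift and a confidence level.
from statistics import NormalDist

def ab_summary(visitors_a, conversions_a, visitors_b, conversions_b):
    rate_a = conversions_a / visitors_a        # control
    rate_b = conversions_b / visitors_b        # variant
    lift = (rate_b - rate_a) / rate_a          # relative change vs. control
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = abs(rate_b - rate_a) / se
    confidence = 2 * NormalDist().cdf(z) - 1   # two-sided "certainty"
    return lift, confidence

# Hypothetical: 10,000 visitors per group, 500 control signups vs. 457 variant.
lift, confidence = ab_summary(10_000, 500, 10_000, 457)
print(f"{lift:+.2%} at {confidence:.1%} confidence")  # about -8.6% at ~85%
```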

How about activations? Did it improve signup quality and get more people to install Kissmetrics? That step of the funnel looked even worse:

Live Chat Tool Impact on Activations

A 22.14% decrease in activations at 97.32% statistical significance. Most marketers would declare this a confirmed loser since we hit the 95% statistical significance threshold. Even if you push for 99% statistical significance, the results are not looking good at this point.

What about customers? Maybe it increased the total number of new customers somehow? I can’t share that data but the test was inconclusive that far down the funnel.

The Decision – Derailed by Internal Politics

So here’s what we know:

  • Olark might decrease signups by a small amount.
  • Olark is probably decreasing Kissmetrics installs.
  • The impact on customer counts is unknown.

Seems like a pretty straightforward decision, right? We’re looking at possible hits on signups and activations, then a complete roll of the dice on customers. These aren’t the kind of odds I like to play with. Downside at the top of the funnel with a slim chance of success at the bottom. We should have taken it down, right?

Unfortunately, that’s not what happened.

Olark is still live on the Kissmetrics site 9 months after we did the test. If you go to the pricing page, it’s still there:

Kissmetrics Live Chat Tool on Pricing Page

Why wouldn’t we kill a bad test? Why would we let a bad, risky variant live on?

Internal politics.

Here’s the thing: just because you have data doesn’t mean that decisions get made rationally.

I took these test results to one of our Sales directors at the time and said that I was going to take Olark off the site completely. That caused a bit of a firestorm. Alarms got passed up the Sales chain and I found myself in a meeting with the entire Sales leadership.

I wanted Olark gone. Sales was 100% against me.

Live chat is considered a best practice (or at least it was a best practice at one point). It’s a safe choice for any SaaS leadership team. I have no idea HOW it became a best practice considering the data I found, but that’s not the point. There are plenty of best practices that sound great but actually make things worse.

Here’s what the head of Sales told me: “Salesforce uses live chat so it should work for us too.”

But following tactics from industry leaders is the fastest path to mediocrity for a few reasons:

  • They might be testing it themselves to see if it works. You don’t know if it’s still mid-test or a win they’ve decided to keep.
  • They might not have tested it at all. They could be following best practices themselves with no idea if it actually helps.
  • They may have gotten bad data but decided to keep it because of internal politics.
  • Even if it does work for them, there’s no guarantee that it’ll work for you. I’ve actually found most tactics to be very situational. There are a few cases where a tactic helps immensely, but most of the time it’s a waste of effort and has no impact.

It’s also difficult to understand how a live chat tool would decrease conversions. Maybe it’s a distraction, maybe not. But when you’re an SDR watching good opportunities come in through chat and help you hit your qualified lead quotas, it’s not easy to separate that anecdotal experience from data on the entire system.

But none of this mattered. Sales was completely adamant about keeping it.

The ambiguity on customer counts didn’t help either. As long as it was an unknown, arguments could still be made in favor of Olark.

Why didn’t I let the test run longer and get enough data on how it impacted new customer counts? With how close the data was, we would have needed to run the test for several months before getting anywhere close to an answer. Since I had several other tests in my pipeline, I faced serious opportunity costs if I let the test run. Running one test for 3 months means not running 3-4 other tests that have a chance at being major wins.

So I faced a choice. I could have removed Olark if I was stubborn enough. My team had access to the marketing site; Sales didn’t. But standing my ground would start an internal battle between Marketing and Sales. It’d get escalated to our CEO and I’d spend the next couple of weeks arguing in meetings instead of trying to find other wins for the company. Regardless of the final decision, the whole ordeal would fray relationships between the teams. I’d also burn a lot of social capital if I decided to push my decision through. With the decrease in trust, there would be all sorts of long-term costs that would prevent us from executing effectively on future projects.

I pushed back and luckily got agreement not to launch it on the Features or Customers pages. But Sales wouldn’t budge on the Pricing page. I chose to let it drop and it lives to this day.

That’s how I launched a variant that decreased conversions.

Should You Use a Live Chat Tool on Your Site?

Could a live chat tool increase the conversions on your site? Possibly. Just because it didn’t work for me doesn’t mean it won’t work for you.

Are there other places that I would place a live chat tool? Maybe a support site or within a product? Certainly. There are plenty of cases where acquisition matters less than helping people as quickly as possible.

Would I use a live chat tool at an early-stage startup to collect every possible bit of feedback I could? Regardless of what it did to signups? Most definitely. Any qualitative feedback at this stage is immensely valuable as you iterate to product/market fit. Sacrificing a few signups is well worth the cost of being able to chat with prospects.

If I was trying to increase conversions to signups, activations, and customers, would I launch a live chat tool on a SaaS marketing site without A/B testing it first? Absolutely not. Since this test didn’t go well, I wouldn’t launch a live chat tool without conclusive data proving that it helped conversions.

Olark and the rest of the live chat companies have great products. There’s definitely ways for them to add a ton of value. Getting lots of qualitative feedback at an early stage startup is probably the strongest use case that I see. But if your goal is to increase signups, activations, and customers, I’d be very careful with assuming that a live chat tool will help you.


You’re being lied to.

Well, not intentionally.

We’re constantly being pinged with stories of companies that have rocketed to success. Especially in tech, there’s always another $1 billion unicorn around the corner. Uber, Facebook, Airbnb, Slack, Zenefits, Box, Shopify, yadda yadda yadda.

At the same time, rock-solid companies seem to lose their way and crater.

We’re all desperate to know why.

Makes sense. We want to replicate the crazy success and avoid failure.

This is where we all get sucked into the nonsense narratives. They’ll give you false hope on how to produce success.

Here’s a good example: should a company expand into different products, industries, or markets? We’ll answer this question in a minute.

But first, who loves LEGO? I DO. My favorite childhood toy by far. You know how people will buy huge mansions and a dozen sports cars if they ever hit it big? I’ll just buy every LEGO set and fill an entire room with them. These days, they even have a Batwing with Joker steamroller set. How cool is THAT?


As it turns out, LEGO is a great case study for how delusional we can be about what produces successful companies.

Go check out LEGO’s 2014 annual report. In 2014, their net profit increased by about 15% to over 7 billion Danish krone (DKK). At current exchange rates, that’s about USD$1 billion. Back in 2011, they pulled DKK 4 billion in net profit. So they’ve had similar growth rates since 2011 and have nearly doubled their net profit. Not shabby at all.

Why is LEGO doing so well? Management gives the credit to expansions beyond its core business: they crushed it with the LEGO Movie and the new line of LEGO sets released with it. Right on page 5 of the annual report: “new products make up approximately 60% of the total sales each year.” They’ve also seen a lot of growth from the toy market in Asia.

So we’ve answered our question, right? If we want to keep growing, we’ll want to expand beyond our core product and market base at a certain point, right?

Well, wait a minute. Our story isn’t that simple.

Go back to 2004, when LEGO nearly went bankrupt. Their COO, Poul Plougmann, got sacked and the business press lambasted the company for poor results. They caught a ton of flak for releasing a LEGO Harry Potter line (apparently, sales slowed when there was a gap between some of the Harry Potter movie releases), experimenting with new toy products, jumping into video games, launching a failed TV show, and trying to go beyond its core brand. The consensus was that they should get back to their core base and stop messing around by trying to innovate into new products.

Wait, which is it? In 2014, product expansion from the LEGO Movie helps push the company to new heights. In 2004, the LEGO Harry Potter line, TV shows, and the first attempt at video games nearly push it to bankruptcy. In each period, we push narratives and recommendations that contradict each other. Go back to your core base! Wait, never mind! Expand into new products!

I can’t take credit for this insight or finding the LEGO story. It’s one of the case studies used in The Halo Effect by Phil Rosenzweig.

Rosenzweig shows how narratives are twisted to explain results after they occur. He wrote the original version of his book back in 2007 (there’s a new 2014 edition that you should grab if you haven’t yet). Then after the book is published, LEGO turns around and we start attributing their success to LEGO’s constrained innovation:

LEGO went back to its base. Innovation trashed the company in 2004 because it was highly unprofitable and expanded beyond its core strengths. Now LEGO has entered another golden era by constraining innovation.

But LEGO just had another huge year by expanding into its first movie. Hard to get further from its product base than that. A decade ago, the LEGO TV show got part of the blame when LEGO struggled. Now the LEGO Movie gets the credit for the turnaround in profits.

Again, which is it? Innovation? Constrained innovation? Innovation as long as you do these 7 simple steps? Maybe all of the above? Reducing a business to a simple narrative for a blog post or interview is incredibly difficult. And you’ll want to be careful of any source that attempts to do so.

To be fair, David Robertson and Bill Breen wrote a book that dives into the LEGO story. I’m hoping they capture the nuance of what went into LEGO’s turnaround. I haven’t read the book myself but it’s on my to-read list.

We’re all exceptionally good at rationalizing any argument. If things go well, we’ll cherry pick some attributes and credit them for the company’s success. Then when things go sideways, we take the same attributes to explain the failure. It all sounds nice and tidy. Too bad it’s a poor reflection of reality.

Phil Rosenzweig calls this habit of ours the Halo Effect. When things go well, we attribute success to whatever attributes stand out at the company. When things go poorly, we attribute bad results to those exact same attributes. It’s one of the 9 delusions that he covers in his book. Let’s go through each of them.

The Halo Effect

The tendency to look at a company’s overall performance and make attributions about its culture, leadership, values, and more. In fact, many things we commonly claim drive company performance are simply attributions based on prior performance.

This is what happened to LEGO. In 2004, it’s skewered by the press for trying to expand beyond its core business. Now it can’t get enough praise as it drives growth into new markets and product lines.

This happens to companies, teams, and you. When things go well, the quirks get credit for success. When things go poorly, those same quirks get the blame. Our stories search for what’s convenient, not what’s true.

Remember this when you’re in your next team meeting. Someone will float a story for how you got to this point. If it sounds good, the story will spread and your whole organization will start shifting in response to it. And a nonsense story means nonsense changes. There are two things you can do to limit these nonsense stories:

  • Chase causality as often as you can (more on this in a moment). The better your team understands how your systems really work, the closer your stories will be to the truth.
  • Realize that your stories are typically nonsense. It’s your goal to test the validity of that story as fast as you can.

The Delusion of Correlation and Causality

Two things may be correlated, but we may not know which one causes which. Does employee satisfaction lead to high performance? The evidence suggests it’s mainly the other way around — company success has a stronger impact on employee satisfaction.

We’ve all heard the adage “correlation, not causation.” But when you’re about to come up short on a monthly goal, how easy is it to remember correlation versus causation? It’s not. We all break and reach for the closest story we can. Even if we avoid throwing blame around, we still grasp for any story that will guide our way through the madness.

Proving causality is one of the most difficult bars to reach. Very few variables truly impact our goals in a meaningful way. How do we deal with this?

If you only rely on after-the-fact data, you never move beyond correlation. Every insight and every bump in a metric is, at best, a correlation. The only way to establish any degree of causality (and we’re never 100% sure) is to run a controlled experiment of some kind. You’ve got to split your market into two groups and see what happens when you isolate variables.

This is why I push so hard for A/B tests and get really strict with data quality. They allow us to break past the constraints of correlation and gain a glimpse of causation.

If you limit your learning to just correlation, you’ll get crushed by those chasing causality. They’ll have a much deeper understanding of your environment than you do. You won’t be able to keep up.

And remember, the business myths, stories, best practices, and press rarely look at correlation versus causation. It’s all just correlation.

The Delusion of Single Explanations

Many studies show that a particular factor — strong company culture or customer focus or great leadership — leads to improved performance. But since many of these factors are highly correlated, the effect of each one is usually less than suggested.

Data is messy, markets are messy, customers are messy. The complexities of these systems vastly exceed our ability to understand or adequately measure them. Variables interact and compound in limitless ways.

Whenever someone gives you a nice, tidy explanation for why a business succeeded or failed, assume it’s nonsense.

You can’t depend on a single variable to drive your business forward. World-class teams have mastered countless business functions, everything from employee benefits to market research. The hottest New York Times bestseller may give you a 5 step process on how to conquer the world with nothing other than whatever flavor-of-the-month strategy everyone loves at the moment. But that’s a single variable among many.

Remember that your business moves within an endlessly complex system. Not only are you trying to change this system, you’ll be pushed around by it.

The Delusion of Connecting the Winning Dots

If we pick a number of successful companies and search for what they have in common, we’ll never isolate the reasons for their success, because we have no way of comparing them with less successful companies.

Good ol’ survivorship bias. We can’t just look at winners. We need to find a batch of losers and look for the differences between the two groups. Otherwise, we’re just pulling out commonalities that don’t mean anything.

The tech “unicorn” fad has succumbed to this delusion. Everyone’s combing through the recent $1 billion tech startups, trying to find the patterns so they can build their own unicorn. But they’re doing many things in exactly the same way as all the startups that blow up or stall out. We just don’t hear about those failures. And if we do, those stories aren’t deconstructed in the same level of detail as the unicorns. So we get a picture of what amazing companies look like but a very limited view of how they differ from their failed counterparts.

Study the failures just as deeply as the successes.

The Delusion of Rigorous Research

If the data aren’t of good quality, it doesn’t matter how much we have gathered or how sophisticated our research methods appear to be.

Rosenzweig takes a shot at Jim Collins with this one. Collins has written several renowned books like Good to Great, Built to Last, and Great by Choice. He and his team do a ton of historical research to figure out which attributes separate great companies from average companies. As Rosenzweig points out, most of this research is based on flawed business journalism that suffers from the Halo Effect. So the raw data behind Collins’ research is horribly flawed, which means his books aren’t as solid as many people think.

Regardless of how you feel about Collins’ books, this is still a critical delusion to remember. It doesn’t really matter how sophisticated you are with modeling, data science, research, or analytics if your data sucks. Fix your data first before trying anything fancy.

This is where I start with every business I work with. Before jumping into growth experiments, A/B testing, or building out channels, I always make sure I can trust my data. Data’s never 100% perfect but there needs to be a low margin of error. The quality of your insights depends on the quality of your data.

The Delusion of Lasting Success

Almost all high-performing companies regress over time. The promise of a blueprint for lasting success is attractive but not realistic.

You will regress to the mean. Crazy success is an outlier by default. Sooner or later, results come back down to typical averages.

Mutual funds prove this point perfectly. In any 2 year period, you can find mutual funds that crush the S&P 500. Wait another 5-10 years and those same mutual funds have fallen back to earth. Your company is in the same boat. If things go crazy well, it’s a matter of time before you come back down. Take advantage of your outlier while it lasts.

This is particularly dangerous with individual or team performance. Is it really talent or are you just an outlier? Sooner or later, you’ll have some campaign or project that takes off. Well… if you launch enough stuff, you’re bound to get lucky. The real question is how long you can sustain it. Can you repeat that success? And since we all regress to the mean eventually, how can you use your current success to get through the eventual decline?

All channels decline, all products decline, all markets decline, all businesses decline. You will decline. What are you doing now to plan for it?

The Delusion of Absolute Performance

Company performance is relative, not absolute. A company can improve and fall further behind its rivals at the same time.

You’re graded on a curve whether you like it or not. Even if you’re improving, customers won’t care if your competitor is improving faster than you are. You’ll need to stay ahead of the pack no matter how fast the pack is already moving.

Otherwise, it’s a matter of time before you’ve lost the market. Your success isn’t determined in isolation. Just because you did a great job doesn’t mean you’ll achieve greatness.

This stems from a basic psychological principle: as humans, we do a terrible job at perceiving absolute value. This applies to pricing, customer service, product value, and every trait around us. In order to gauge how good or bad something is, we always look for something to compare it to. It really doesn’t matter if you cut prices by 50% if your competitor found a way to cut them by 60%. You’re still considered too expensive.

Your work will always be judged in relation to the work of your peers.

The Delusion of the Wrong End of the Stick

It may be true that successful companies often pursued a highly focused strategy, but that doesn’t mean highly focused strategies often lead to success.

Another shot at Good to Great with this one.

One of the core concepts in Good to Great is hedgehog versus fox companies. Hedgehog companies focus relentlessly on one thing. Foxes dart from idea to idea. According to Collins, amazing companies are all hedgehogs with ruthless focus.

But we don’t have the full picture of the risk/reward trade-off. It’s a lot like gambling or investing. You COULD throw your entire life savings into a single stock (hedgehog) and if that stock takes off… you’ll make a fortune. But if it doesn’t? You’ve lost everything. Investors that diversify (foxes) won’t reap extreme gains but they also won’t expose themselves to extreme losses.

Companies might work very similarly. Yes, hugely successful companies may tend to be hedgehogs. They made big bets and won. But that might not be the best strategy for your company if it means taking on substantial amounts of risk. Most importantly, we can’t say for sure what the risk/reward trade-offs look like without a larger data set of companies. Even if great companies outperform average companies when they’re hedgehogs, there could be just as many hedgehog companies that weren’t so lucky.

The Delusion of Organizational Physics

Company performance doesn’t obey immutable laws of nature and can’t be predicted with the accuracy of science — despite our desire for certainty and order.

Physics is beautiful and elegant. Business is not.

No matter what you do, you cannot remove uncertainty in business like you can with physics. Books, consultants, blog posts, and pithy tweets will all try to convince you that a simple step-by-step process will take your business to glory. As much as we’d all like to have simple rules to follow, that’s not how this game is played. Business cannot be reduced to fundamental laws or rules.

And sometimes, the outcome is completely outside your control. Even if you do everything right, follow all the right strategies, use the best frameworks, hire the best people, and build something amazing, the whole business can still go sideways on you. We can’t remove uncertainty from the system. All we can do is stack the odds in our favor. Fundamentally, business and careers are endless games of probability.

Recap Time! The 9 Delusions From the Halo Effect

Here are all 9 delusions in a nice list for you:

  • The Halo Effect
  • The Delusion of Correlation and Causality
  • The Delusion of Single Explanations
  • The Delusion of Connecting the Winning Dots
  • The Delusion of Rigorous Research
  • The Delusion of Lasting Success
  • The Delusion of Absolute Performance
  • The Delusion of the Wrong End of the Stick
  • The Delusion of Organizational Physics

Don’t get sucked into the delusional narratives of success. Embrace the uncertainty.