
Our Flutter app started in 7 seconds. The fix wasn't to make it faster.

2026-05-20

How we went from 7.4s cold start to 294ms — by removing scaffolding instead of optimizing code. A story about the wrong plan, five bumps along the way, and what we learned to look for next time.

Until recently, opening our Flutter app meant waiting 7.4 seconds before you could see your training screen. That's a long time. Long enough to lose users. Long enough that we knew the problem was bad. Long enough that the ticket for it had been bumped to the next sprint so many times that nobody remembered when it first appeared.

We finally fixed it. Cold start is now 294 milliseconds. Twenty-five times faster.

The fix wasn't a performance optimization. It was learning that most of the work the app was doing on startup wasn't startup work — it was scaffolding we had kept building because we'd forgotten we could stop.

What this post is

A story about the wrong plan we started with, the five bumps we hit along the way, the sentence that ended the search, and what we'd want you to take from it — even if you don't write code.

The technical detail is the texture, not the point.

What we thought we were going to do

The branch was called perf/parallel-main-init. The plan was in the name: take the long sequential chain of work the app did when it opened — initialize services, hydrate caches, talk to the server, set up payments — and run as much of it as possible in parallel. Modern phones have eight CPU cores. We were using one. Surely we could do more than one thing at a time.

This was a defensible plan. It would have worked, in the sense that the code would have been faster. It also would have been completely wrong.

I'll come back to why.

The first bump: the tests passed before, and failed after — but nothing the user could see had changed

The startup pipeline had tests. Hundreds of them. They asserted things like "step #3 is wrapped in this kind of helper", "the factory returns the right configuration for guest users", and "step #7 fires before step #2".

These tests passed. They had always passed.

When I started moving pieces around to parallelize them, the tests began to fail. None of the failures were about what the user observed — they were about how the code was laid out. The user's experience hadn't changed; the structure of the code had.

This is the first thing I had to admit out loud: the tests we had weren't testing what we thought they were testing. They were a museum of the architecture, not a safety net for the behavior. Every time we tried to refactor anything, they screamed. So we stopped refactoring. So the architecture ossified. So things kept getting slower.

The fix wasn't to make the tests pass. The fix was to rewrite them to assert what the user sees, not how the code is arranged. "The user lands on the training screen within 500 milliseconds of opening the app, on a cached state" is a test that survives any refactor that keeps the contract. "Step #3 is this specific kind of helper" is a test that breaks every time you try to improve anything.
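To make the contrast concrete, here is a minimal sketch of a behavior-level check. It is illustrative only, not a test from the app: `Destination`, `ResolveDestination`, and `expectTrainingWithinBudget` are hypothetical names, and the 500 ms budget is taken from the sentence above.

```dart
/// The three places a cold start can land (hypothetical names).
enum Destination { training, login, onboarding }

/// Hypothetical stand-in for the startup entry point: it resolves where the
/// user should land, however the code behind it happens to be arranged.
typedef ResolveDestination = Future<Destination> Function();

/// Behavior-level check: a returning user with cached state reaches the
/// training screen within [budget]. It says nothing about which helpers,
/// factories, or step order produced that result.
Future<void> expectTrainingWithinBudget(
  ResolveDestination resolve, {
  Duration budget = const Duration(milliseconds: 500),
}) async {
  final stopwatch = Stopwatch()..start();
  final destination = await resolve();
  stopwatch.stop();

  if (destination != Destination.training) {
    throw StateError('Expected the training screen, got $destination');
  }
  if (stopwatch.elapsed > budget) {
    throw StateError('Startup took ${stopwatch.elapsedMilliseconds} ms');
  }
}
```

A check shaped like this keeps passing when steps are reordered, merged, or deleted, and fails only when the user would notice.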

I'm still slightly embarrassed I didn't notice this earlier. The pattern is general — it applies to a lot more than code. Process documents, runbooks, even contracts: the more they describe how something is done, the more they fight changes to how it's done. Tests that police arrangement are exactly the same trap.

The second bump: Firebase was hiding in three places I didn't know it was

There's a line you write in basically every Flutter app: await Firebase.initializeApp(). It takes about 200 milliseconds. It blocks startup. So in my refactored version, I moved it off the critical path — fire it once, don't wait for it, let the app render and let Firebase be ready by the time anything actually needs it. Done. The app should be faster now.

The app was not faster.

I spent a day trying to understand why. The function I'd moved was no longer being awaited. The trace showed it firing. So why was startup still blocking on Firebase?

It turned out Firebase was hiding in three different places I didn't know it was. Three separate services, buried deep in the dependency-injection wiring, were silently asking for Firebase the moment they were constructed. And they were being constructed during startup. So even though I'd "moved" Firebase off the critical path, three of its tendrils were still on it, waiting for it to be ready, blocking everything behind them.

I had moved one Firebase. There were four.

This is the lesson that took me a day to learn and that's been worth a hundred. Things you think are isolated rarely are. In any system of meaningful size, the thing you're trying to change has been depended on, accidentally and silently, by other things that have been around so long you'd forgotten about them. This is true of code, but it's also true of decisions, agreements, deprecated features, and people. The fix is almost never to make the moved thing faster. The fix is to find every place that depended on the old behavior, and only ask for the new behavior when something actually needs it.
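One way to express "only ask when something actually needs it" is a lazily started, shared future. This is a sketch under assumptions, not the wiring from the app: `LazyAsync` and `AnalyticsService` are hypothetical names, and the initializer is passed in so the example stays free of plugin imports.

```dart
/// Runs an async initializer at most once, and only when first asked for.
class LazyAsync<T> {
  LazyAsync(this._create);

  final Future<T> Function() _create;

  // A `late final` field with an initializer is evaluated on first read,
  // so nothing runs when the LazyAsync itself is constructed.
  late final Future<T> value = _create();
}

/// Hypothetical service that needs the SDK only when it actually does work.
class AnalyticsService {
  // Constructing the service does not touch the SDK.
  AnalyticsService(this._sdk);

  final LazyAsync<void> _sdk;
  final List<String> sent = [];

  Future<void> logEvent(String name) async {
    // The first caller triggers initialization; later callers reuse it.
    await _sdk.value;
    sent.add(name); // Stand-in for handing the event to the SDK.
  }
}
```

In an app, the initializer handed to `LazyAsync` could be something like `() => Firebase.initializeApp()`. Services built during startup then cost nothing until their first real call.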

The third bump: a method called init was secretly calling a server

There's a class called ExerciseLocalDb. It has a method called init(). The name suggests, very clearly, that the method initializes a local database. Nothing about that name suggests network.

It does, however, talk to the server. Or rather — it does both. It opens a local database, and then it quietly tries to synchronize with the API.

For a returning user on good WiFi, this was effectively free: a 50-millisecond round-trip, then the cache got refreshed. For a brand-new user, it was an unauthenticated request that the server rejected with an error, and the client kept retrying it until it timed out. Twenty seconds of retries. The splash screen sat there, looping its animation, while a method called init quietly tried to reach a server it didn't have permission to reach.

I won't tell you how long it took us to find this. I'll tell you what we did the moment we did: we changed the contract. init() is local-only. synchronizeAll() is the explicit network call. The method's name has to match the method's behavior. Always.

Code that lies about itself costs more than slow code. Slow code shows up in a profiler — you can see it. Lying code shows up as inexplicable failures in conditions you didn't expect to test. It's not just code, either. Anything in an organization that says it does one thing while quietly doing another — meetings that are really demos, "syncs" that are really decisions, "policies" that are really preferences — has the same shape, and costs in the same ways.
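The split between init() and synchronizeAll() might look roughly like this. Only the class and method names come from the story above; the in-memory list, the injected `fetchRemote` function, and the `exercises` getter are assumptions made to keep the sketch self-contained.

```dart
/// Sketch of the init()/synchronizeAll() split. Storage is an in-memory list
/// here; a real implementation would open an on-disk database.
class ExerciseLocalDb {
  /// [_fetchRemote] is a hypothetical stand-in for the API client.
  ExerciseLocalDb(this._fetchRemote);

  final Future<List<String>> Function() _fetchRemote;
  final List<String> _rows = [];
  bool _isOpen = false;

  /// Local-only: opens storage and returns. Never touches the network.
  Future<void> init() async {
    _isOpen = true;
  }

  /// The explicit network call. Callers decide when to pay for it.
  Future<void> synchronizeAll() async {
    if (!_isOpen) {
      throw StateError('Call init() before synchronizeAll().');
    }
    final remote = await _fetchRemote();
    _rows
      ..clear()
      ..addAll(remote);
  }

  /// Read-only view of whatever is currently stored locally.
  List<String> get exercises => List.unmodifiable(_rows);
}
```

With this shape, a brand-new user who calls init() never issues a request at all; the network is reached only when something calls synchronizeAll() on purpose.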

The fourth bump: the two-second apology

This is the bump I'll remember longest. Buried at the bottom of main.dart, after every initialization had completed, there was this:

The smoking gun

main.dart
Future<void>.delayed(const Duration(milliseconds: 2000), () { 
  FlutterNativeSplash.remove();
});

A hardcoded two-second delay before removing the splash screen.

I don't know who wrote it. I know it predated me by a long time. I can guess the story: at some point, startup didn't always finish before the app tried to render its first frame, and the user would briefly see something broken — a flash of white, an unstyled state, a missing image. So someone added two seconds of padding to make the bad case impossible. Forever.

Every user paid those two seconds, on every cold start, every time, for years.

This is the bump I think about most. The two-second delay wasn't an act of ignorance. It was an apology for a problem that probably no longer existed — a debt one engineer had taken on to keep a release from going out broken, then forgotten about because the symptom had stopped showing up. Someone, somewhere, had stopped trusting their own code, and that mistrust had been baked into the user's experience for the rest of time.

Every long-lived system has these. Not just codebases. Every team, every product, every company has its Future.delayed(2000ms) equivalents — defensive scars from a panic that was justified at the time, kept in place because they hadn't hurt enough to revisit. They were never decisions to keep. They were decisions nobody made to remove.

You don't find them by reading the code. You find them by asking "wait, why is this here?" of every line, every meeting, every policy that doesn't immediately explain itself.
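For the splash screen specifically, the alternative to a fixed delay is to tie removal to an actual readiness signal. The sketch below is one hedged way to write that, not the code that shipped: `removeSplashWhenReady` and its parameters are hypothetical, and `removeSplash` is injected so the example needs no plugin import (in the app it would wrap FlutterNativeSplash.remove()).

```dart
/// Waits for [firstScreenReady] and then removes the splash, instead of
/// sleeping for a fixed duration.
Future<void> removeSplashWhenReady({
  required Future<void> firstScreenReady,
  required void Function() removeSplash,
}) async {
  try {
    await firstScreenReady;
  } finally {
    // Remove the splash even if startup failed, so an error screen can show.
    // Any error from [firstScreenReady] still propagates to the caller.
    removeSplash();
  }
}
```

The splash now stays up exactly as long as the first screen needs, which on a fast path is far less than two seconds.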

The fifth bump: the experiment that won, and was never cleaned up

Six months before I started this work, we had run an A/B test on the app's guest-mode flow. The test had a winner. The losing variant was supposed to be removed.

The code didn't know that.

The entry point still ran a feature-flag check that — due to a hardcoded true that someone had added when they declared the test over — always returned the winning variant. But the losing variant was still there. Both variants still had their own factory. Both factories still had their own pipeline. The entire scaffolding was being maintained, tested, type-checked, and shipped to production — and exactly half of it was dead code that hadn't been load-bearing in six months.

This was the moment the refactor changed character. We weren't optimizing startup anymore. We were removing things that had been pretending to matter.

The reviewer's most common feedback on the pull request was a single word: borrar. Delete. Five separate times: delete this, delete this, this method isn't used anywhere, delete, this wrapper exists already and does the same thing. The final pull request deleted 4,466 lines and added 7,569. The lines added were a tight architecture. The lines deleted were the A/B test that nobody had remembered to clean up.

I think about this one a lot, too. Most teams have A/B tests they never finished retiring. The platform celebrates launching them. Nobody celebrates retiring them. The cost compounds, and the longer the experiment is "officially over but still in the code", the more decisions you make on top of it that you'd have to unravel to remove it.

The pattern isn't specific to A/B tests, either. It's any temporary thing — a feature flag, a workaround, a fallback for a vendor you no longer use — that quietly became permanent because removing it never made it to anyone's roadmap. The hardest cleanup work isn't writing the deletion. It's giving yourself permission to do it.

After a week of "okay but why is this slow too", I stopped trying to parallelize anything and asked a different question:

The question that changed everything

What does the app actually need before it can show the user anything?

The answer surprised me. On every launch, the app has to make exactly one of three decisions: take the user to their training, take them to a login screen, or take them to onboarding. That's it. Everything else (exercise data, payment status, leaderboard rankings, analytics, notification setup, animation runtimes) is either not needed for the first frame, or already cached on the device from the last session.

The routing decision is a piece of data, not a process. It doesn't need the network. It only needs information the device already has on disk, from the last time the user opened the app. We could answer "where does this user go?" in about 50 milliseconds. Everything else could happen in the background, after the user was already looking at something useful.
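"A piece of data, not a process" can be sketched as a pure function over cached facts. The three destinations come from the paragraph above; the `CachedSession` fields and the exact rules are assumptions for illustration, not the app's real logic.

```dart
/// The three possible first screens.
enum Destination { training, login, onboarding }

/// Hypothetical snapshot of what the device already has on disk.
class CachedSession {
  const CachedSession({
    required this.hasAuthToken,
    required this.hasCompletedOnboarding,
  });

  final bool hasAuthToken;
  final bool hasCompletedOnboarding;
}

/// Pure function: no network and no services, just cached facts in and a
/// destination out. A missing snapshot is treated as a first launch.
Destination resolveDestination(CachedSession? session) {
  if (session == null || !session.hasCompletedOnboarding) {
    return Destination.onboarding;
  }
  return session.hasAuthToken ? Destination.training : Destination.login;
}
```

Because the function has no dependencies, its cost is dominated by reading the snapshot from disk, and it can be tested exhaustively in a few lines.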

This is the only sentence in this post that I really want you to remember:

The reframe

The goal stopped being "do startup faster". It became "do less startup."

That's the difference between a performance optimization and an architectural realization. The first one makes the wrong thing faster. The second one notices that most of the wrong thing didn't need to be there.

The deeper why: offline-first wasn't optional

"Do less startup" was the practical reframe. The deeper one was this: the app had no business requiring the network to show the user anything in the first place.

Think about where people actually open a fitness app. The gym basement where the WiFi goes to die. The park at the edge of cell coverage. The hotel room with the captive portal that refuses to authenticate. The flight where someone wanted to log a workout offline. The morning where home WiFi has decided to take a five-minute break and the user is standing in the kitchen holding a phone, wondering why this app of all apps is the one asking them to wait.

Half the time someone opens a fitness app, the network is either bad or absent. A fitness app that only works on good WiFi is broken half the time.

This isn't a UX nicety. It's the contract:

The contract

If the user has used the app before, the app should work — fully, immediately — without the network.

Network is for refreshing what's already there, never for showing it for the first time.

That contract changes what belongs on the critical path. Exercise data, training history, leaderboard rankings — none of those have any business depending on a network round-trip to display. If they did, the app would stop working the moment the network became unreliable. The cache isn't a performance optimization. It's the source of truth at startup. The network is the secondary, eventual-consistency layer that quietly catches the cache up later, when it's ready.

So the question of "what goes on the critical path" stopped being an engineering question and became a product question: what does the user have a right to see, even when everything else has failed? The answer, every time, was: "the thing they were already using yesterday." Yesterday's data, served from disk, in 50 milliseconds, while a background sync quietly tries to bring it up to date.
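The contract (cache first, network only to refresh) can be sketched as a small stream-based helper. This is an illustration under assumptions, not the app's data layer: `CacheFirst` and its three injected functions are hypothetical names.

```dart
/// Serves the cached value immediately and refreshes it afterwards.
class CacheFirst<T> {
  CacheFirst({
    required this.readCache,
    required this.fetchRemote,
    required this.writeCache,
  });

  final Future<T?> Function() readCache;
  final Future<T> Function() fetchRemote;
  final Future<void> Function(T value) writeCache;

  /// Emits the cached value first (if any), then the refreshed value if the
  /// network call succeeds. A network failure is swallowed when a cached
  /// value was already emitted, and surfaced only when there is nothing
  /// to show at all.
  Stream<T> watch() async* {
    final cached = await readCache();
    if (cached != null) yield cached;
    try {
      final fresh = await fetchRemote();
      await writeCache(fresh);
      yield fresh;
    } catch (_) {
      if (cached == null) rethrow;
    }
  }
}
```

A screen listening to `watch()` renders yesterday's data as soon as the disk read returns, and simply re-renders if and when the refresh lands.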

Speed was the byproduct. Reliability was the goal.

This is the move most performance work misses. It treats slowness as a constraint to optimize around, when often slowness is just the most visible part of an unreliable dependency. Remove the dependency from the critical path, and the failure mode goes with it. The speed comes for free, because the work isn't there anymore.

Why we chose this shape and not parallelization

The final design has four phases. Each one only blocks the user for the work that actually has to finish before the next thing they see.

We could have parallelized the old pipeline. It would have been faster. But it still would have done the same total amount of work, just in less time. The user would still have had to wait for everything before they saw anything.

This shape is different. The user sees the screen the moment the cache can resolve a destination. Everything else happens around them, after they're already inside the app. The app stops asking the user to wait for things they don't yet need.
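In code, the difference from parallelization is which work gets awaited at all. The sketch below is not the actual pipeline and does not reproduce its phases; `coldStart` and its parameters are hypothetical, and it shows only the split between the one awaited step and everything deferred.

```dart
import 'dart:async';

/// Blocks only on [resolveDestination]. Everything in [deferred] starts
/// after the first screen is shown and is never awaited by the caller.
Future<void> coldStart<D>({
  required Future<D> Function() resolveDestination,
  required void Function(D destination) showFirstScreen,
  required List<Future<void> Function()> deferred,
}) async {
  final destination = await resolveDestination();
  showFirstScreen(destination);

  for (final task in deferred) {
    // Fire and forget: a failing background task must not take down startup.
    // Future.sync also captures errors thrown before the task's first await.
    unawaited(Future<void>.sync(task).catchError((Object _) {}));
  }
}
```

Parallelizing would shorten the wait for the whole list. This shape removes the list from the wait entirely, so a slow or failing sync never shows up as a slow launch.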

That's the realization underneath the whole refactor: the work the user has to wait for is a design choice, not a constraint. Most performance work treats it as a constraint. The bigger leverage is treating it as a choice.

The numbers (at the end, where they belong)

Same device, same user, same build mode. Real-device benchmarks, not synthetic.

| Metric | Before | After |
| --- | --- | --- |
| Cold start to home (iPhone) | 7,415 ms | 294 ms |
| Cold start to home (Pixel 8) | 7,415 ms | 478 ms |
| main.dart size | 264 lines | ~100 lines |

Net code change: +7,569 / −4,466 lines

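The 25× number is real: 7,415 ms down to 294 ms on the iPhone. On the Pixel 8, 7,415 ms down to 478 ms works out to about 15.5×. Each comparison was measured on the same physical phone, with the same user, on the same build. It's not a benchmark game. It's what the user sees the moment they tap the icon.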

But the 25× isn't the story. The story is everything that happened to get there.

What I'd want you to take from this — even if you don't write code

Five things, in order of usefulness.

The plan you start with is sometimes the trap

"Parallelize startup" was a defensible plan. It would have made the wrong thing faster. The bigger move was to ask whether all that work needed to happen at all. This is true of refactors, but it's also true of strategies, products, even careers. The first framing you arrive at is rarely the one that survives contact with what's actually going on.

Things that have been there forever are not load-bearing by default

They're frequently the opposite — they're there because nobody had a reason to remove them, which is not the same as having a reason to keep them. This is the most under-applied insight in any organization I've worked in. Most cleanups don't require new judgment. They require permission to act on old judgment that was never followed through.

The thing you're slightly embarrassed about is a clue, not a flaw

The two-second apology in main.dart wasn't a sign of bad engineering. It was a sign of a system that didn't have the cycles to come back and revisit it. Codebases, products, processes, relationships — they all accumulate small acts of self-defense that nobody remembers were temporary. The fastest way to find them is to look for the thing that makes you wince a little when you read it.

Anything that describes how something is arranged will fight changes to how it's arranged

Tests that assert layout. Documentation that names specific tools. Org charts that grow into job descriptions. The more they describe the current shape, the more they resist the next shape. Describe behavior and outcomes instead. They survive reorganization.

The work nobody will notice is the work worth doing

Nobody opens a fitness app and says thanks for the 25× startup. There's no launch announcement for "we removed scaffolding." But the team is faster. The startup path is smaller. The app stops asking the user to wait. That's a worthwhile thing to do quietly. Most worthwhile things are.