Turning Production Performance Data Into Wisdom

Data literacy is one of the more underrated parts of the software engineering skillset. When you’re dealing with a complex, dynamic, evolving system, being able to reason about data is at times more important than institutional knowledge, which tends to become outdated. Understanding a single library or subsystem really well often isn’t good enough. And when you transition to engineering leadership, grow your team, and focus more on the big picture, keeping up with every technology change isn’t feasible.

In this post, I’ll share a few patterns I look for and what they tell you about how a feature or subsystem is performing.

Let’s use FooService as an example. When you look at the source code for FooService for the first time, you’ll probably be very confused. There’s a config object being passed in with a mysterious flag. It’s difficult to reason about exactly what each code path is doing and how often it’s followed. How can we even begin to reason about performance characteristics in production? Let’s take a look at a few examples.

The Blip

You see a perf regression that looks like noise at first. The week-over-week graph shows that it’s actually a periodic regression, and it doesn’t correlate with a periodic increase in app usage. What’s going on? This is often an indication that you have a warm/cold dynamic somewhere in your codebase. When some part of your application is updated, the initial session for every client experiences degraded performance, followed by completely normal performance. When you see a blip, try to find a pattern rather than treating it as random noise.

Examples

When you ship a new version of your web app, it probably references resources that don’t exist in the cache yet, which slows down every page load and causes a blip on every release.
When you release a new version of your mobile app, your users will experience a cold start: the app loads into memory, a new process is created, and app initialization code runs. If you’re paying attention to startup time, you might notice a blip.

Seeing a blip is not necessarily a bad thing, but it’s important not to treat blips as random noise, since they represent bottlenecks and optimization opportunities.
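One way to check whether a blip is periodic rather than random is to look at the autocorrelation of the metric at different lags. The sketch below is a minimal, standard-library-only illustration; the function names and the daily latency series are hypothetical and not taken from any real system.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    if lag >= len(series):
        return 0.0
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a flat series has no periodic structure
    num = sum(
        (series[i] - mu) * (series[i + lag] - mu)
        for i in range(len(series) - lag)
    )
    return num / denom


def dominant_period(series, max_lag):
    """Return (lag, score) for the lag in 1..max_lag with the highest autocorrelation."""
    scores = {lag: autocorrelation(series, lag) for lag in range(1, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical data: daily median page load time in ms over 8 weeks,
# with a regression every 7th day (for example, a weekly release).
latency = [200 + (80 if day % 7 == 0 else 0) for day in range(56)]

period, score = dominant_period(latency, max_lag=14)
print(period, round(score, 3))
```

A strong score at one lag suggests a recurring event worth lining up against your release or deploy calendar. Real data is noisier than this, so treat the result as a hint about where to look, not as proof.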

The Multi-modal Distribution

The histogram reveals multiple modes. In other words, there isn’t a single most common value but multiple common values that are far apart. There’s no discernible pattern when you slice by demographic. What’s going on? This is often a sign of one or more really expensive code paths being executed part of the time. It doesn’t necessarily mean that something is wrong but, unless the user experience is significantly different for each mode, there’s probably an optimization opportunity here.

Examples

An AB test where behavior differs for the experiment and control groups
Classes and modules with different performance characteristics being used conditionally
One code path is used for new users, who tend to be less computationally intensive, and another is used for experienced users
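To make the idea of multiple modes concrete, here is a rough sketch that buckets samples into fixed-width bins and reports the bins that are local peaks. The bin width, the `min_share` cutoff, and the sample data are all assumptions chosen for illustration; a real analysis would likely need smoothing or a proper density estimate.

```python
from collections import Counter


def find_modes(samples, bin_width, min_share=0.05):
    """Return the lower edge of every histogram bin that is a local peak.

    Bins holding less than `min_share` of all samples are ignored so that
    stray values are not reported as modes.
    """
    counts = Counter(int(s // bin_width) for s in samples)
    total = len(samples)
    modes = []
    for b, c in sorted(counts.items()):
        if c / total < min_share:
            continue
        # A peak is taller than its left neighbor and at least as tall as its right.
        if c > counts.get(b - 1, 0) and c >= counts.get(b + 1, 0):
            modes.append(b * bin_width)
    return modes


# Hypothetical request latencies in ms: a cheap code path near 120 ms
# and an expensive one near 910 ms.
samples = [80] * 5 + [120] * 30 + [170] * 10 + [860] * 3 + [910] * 12 + [960] * 4

modes = find_modes(samples, bin_width=50)
print(modes)
```

Once you know where the modes are, the next step is to tag each request with the code path, experiment group, or config flag it used and see which tag lines up with which mode.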

The Puzzling Outlier

The CDF flattens sharply at p99, indicating a big increase in page load times at that percentile. Load times climb even more steeply when you zoom in at p99.9. Is your instrumentation broken somehow? Is this the result of a runaway query or zombie process somewhere in the system? What’s going on? This is often an indication that you have big fish or celebrities in your system with massively degraded performance.

Apps are often designed for “normal” people or use cases. If you’re writing a feature or service that assumes normal usage, you’re going to have a bad time. Or at the very least, your p99.9 use case is going to have a bad time. So what’s the big deal? That’s not that many users, right? Well, p99.9 problems often affect your most important customers, since they have the resources or influence to stretch your infra to the max in the first place. In other words, sometimes these outliers are actually your most important customers and deserve more attention.

Examples

Your CEO, the ultimate power user, is testing all the things
Rihanna or Bieber have started using your app as part of their marketing strategy
Legitimate use cases that you didn’t anticipate
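A quick way to tell broken instrumentation from a big fish is to compute the tail threshold and then count which accounts own the samples above it. The sketch below uses a nearest-rank percentile; the `(account_id, load_time_ms)` record shape and the account names are hypothetical.

```python
import math
from collections import Counter


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def tail_owners(records, p=99.0):
    """Return (threshold, Counter of accounts with samples strictly above it).

    `records` is a list of (account_id, load_time_ms) tuples.
    """
    times = sorted(t for _, t in records)
    threshold = percentile(times, p)
    owners = Counter(acct for acct, t in records if t > threshold)
    return threshold, owners


# Hypothetical data: 990 ordinary page loads plus 10 very slow loads
# that all belong to a single heavy account.
records = [(f"user{i}", 200 + i % 50) for i in range(990)]
records += [("big_fish", 5000 + i) for i in range(10)]

threshold, owners = tail_owners(records, p=99.0)
print(threshold, owners.most_common(3))
```

If the tail is spread evenly across many accounts, suspect the system or the instrumentation. If a handful of accounts own most of it, you are probably looking at usage patterns the feature was never designed for.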
