Metawork for Engineering Leaders Part 3 – Wrapping Up Projects

See part 1, which discusses setting the stage for effective collaboration, and part 2, which discusses managing healthy projects.

Seasoned leaders understand that wrapping up a project involves much more than choosing what to work on next. Part of the project's impact for the team and the broader org is what you learned along the way. Every initiative, whether it ends in fanfare or failure, produces knowledge other people can benefit from. Socializing that knowledge among the groups that can benefit the most is a vital step for any non-trivial project.

Impact

What was the metrics impact?
How did we do relative to the goals we set?

Learning

Do we understand what went well and what didn’t?
Are we using what we learned to inform decision making for future projects?

Communication

Do stakeholders understand that the project has concluded?
Do stakeholders and the team understand the project’s impact and what we learned?
Are you recognizing the team appropriately for what they contributed to the project?

Metawork for Engineering Leaders Part 2 – Managing Healthy Projects

See part 1, which discusses setting the stage for effective collaboration.

After the team agrees on a goal and formulates an initial plan of attack, a team leader’s primary value is tracking project status, supporting people in achieving their goals, and helping the team course correct as necessary. The following is a non-comprehensive list of questions you might ask yourself to quickly determine whether the project is on track. On many teams, project leads (tech leads, PMs, and engineering managers) will shoulder the burden of identifying and removing roadblocks. However, the best teams have a shared sense of project health and distribute responsibility for ensuring the project is healthy.

Execution

Are there clear milestones and a plan to achieve them? Do we use new information to revise the plan and goals?

For long projects (a month or more), setting intermediate milestones helps people understand what should be done when and manage their energy more effectively. When people are given a big goal and a deadline far in the future, it’s difficult to independently determine how much should be done when. When necessary, revise milestone dates if the team is ahead of or behind schedule. When deciding whether to adjust milestones, the primary constraints are the team’s energy and morale and how urgently the business needs you to ship.

Is the team shipping at a sustainable pace?

A sustainable pace is one that the team can maintain indefinitely. For example, working two-week sprints made up of 12-hour days, 5 days a week, is not sustainable for most people. Working two-week sprints that consist of 2 days of planning and 8 days of 8-hour workdays is sustainable (and many people can comfortably work more hours than that). The pace at which your team works best depends on the individuals and the type of work you’re doing.

Are there large crunches at the end of sprints or before milestones? This is a sign that you’ve committed to too much work or the team is not using their time wisely earlier in the sprint.
Are there large gaps of unstructured time before milestones? This is a sign that the team is underutilized or there isn’t enough work being planned in advance.
If the team were to continue at this pace, would they get burned out? Bored? This question assesses utilization in a slightly different way. Assuming the team is excited about the work they’re doing, if you find the team complaining about being burned out, they’re likely either unfocused (context switches take up a lot of energy) or overutilized. If they’re complaining about being bored, they’re underutilized.

Are we assessing the quality of the project at regular intervals?

Setting goals before the project begins helps us quantify the project’s expected impact. It’s also important to do some continual qualitative assessment, which is largely a matter of collecting feedback from the team and stakeholders. That may take the form of assessing metrics impact, design reviews, product reviews, or periodic informal walkthroughs you do yourself; the approach depends on the type of project. This is a complement to milestones: complete a chunk of work and use that as an opportunity to assess project quality.

Prioritization

Are we prioritizing working on the most important tasks?

Most projects can be broken down into smaller milestones that deliver value to the user or help the team validate assumptions. For example, product development teams commonly narrow product requirements down to a minimum viable product to determine whether their feature is useful without building out the full vision. Within that milestone there may be several more milestones (e.g. prototyping, feature complete, ship ready). Periodically reassessing task priority in the context of the next milestone is important for helping the team stay focused on the most important tasks.

Do we assess task priority with the correct stakeholders?

Prioritization should not be done in isolation. The product development team owns execution of the product, but input from all stakeholders should weigh heavily in that exercise.

Impact

Is there new information that affects expected impact? And are we using that information to course correct?

It’s easy to course correct when new information arises that affects the project’s timeline: you either work harder or change the deadline. Responding to changes in expected impact is trickier. In the best case, you should be prepared to ask for more resources and/or time. In the worst case, you should be prepared to change the team’s mission or disband the team altogether.

Can the team see the impact they are having along the way? Is the team excited about their progress?

This is a complement to setting goals and intermediate milestones. When the team reaches a milestone, find a way to give them tangible indicators of progress (e.g. metrics, a lightning talk, a product walkthrough). It’s also important to make a big deal out of finishing milestones, even when they’re not completed on time. Anytime the team expends energy to achieve a goal, ensure that there’s a positive emotional payoff.

Project Management

Does the team understand the process for getting things done (status meetings, milestones, triage, etc.)?

Giving people visibility into how their work (especially metawork) fits into the bigger picture ensures that it feels like a good use of their time. Without a thorough understanding of the bigger picture, most people won’t have sufficient motivation to update Jira tickets, complete intermediate milestones, or clean up tech debt in the middle of shipping. Also, the team’s process is collectively owned, and without understanding all of the moving parts the team won’t know enough to help you iterate on it.

Are all team members clear on how we’re making decisions?

Setting an expectation that everyone’s ideas are welcome often isn’t sufficient for helping the team make decisions collectively. Giving people a clear idea of how decisions are made (e.g. who owns the decision, what the right forums for feedback are, when the decision is being made) gives them a greater sense of autonomy and empowers them to participate in the decision-making process.

Do we understand how well we’re executing relative to our time estimates? Are they realistic?

Understanding how well you’re executing against your estimates is one of the quickest ways to tell whether you need to adjust your plan or approach.

How much metawork is there?

If you feel like you’re spending too much time each week on metawork like removing roadblocks or assessing the health of the project, that’s a sign that you either 1) haven’t planned things in sufficient detail or 2) don’t have the right tools and processes.
To address the planning problem, it’s often useful to set up a weekly or biweekly cadence for task discovery, prioritization, and sequencing among project leadership (although of course these meetings should be open to everyone). Spending 30-60 minutes per week on those tasks often saves an order of magnitude more of the team’s collective time.
For tooling and processes, it’s not unusual to spend 20-30 minutes each day at the beginning of the project refining your approach to tracking and managing work. That initial time investment will save you a lot more time down the road that you would otherwise spend asking questions in meetings, digging through commit logs, or scanning Jira tasks.

Communication

Are you giving timely updates to stakeholders?

There are several types of critical updates you should be funneling to your stakeholders: project goals, strategic changes, design/implementation decisions, staffing shifts, and achievement of milestones.

If you’re working on anything non-trivial, frequent course corrections are expected and necessary. Course corrections become disruptive when they’re not communicated in a timely fashion with adequate context. Stakeholders shouldn’t be surprised by major developments in the project, and you should be actively managing their perception of project status and success. Making the “right” decisions is often not enough. The team needs to do the legwork to get buy-in for every major change in the plan between goal-setting and shipping.

Are we giving people a forum for feedback and using that feedback to inform our decision making?

Sometimes expectations become misaligned because the team is not receiving some critical piece of feedback that would help them course correct. A simple example: you set a goal at the beginning of the project to deliver something for another team, and partway through the project, due to changes in the market, they discover that they need something slightly different. Changes in project requirements are not inherently bad. Discovering them too late or not responding appropriately is the only failure. Often the change is much more nuanced and difficult to discover than the example above, which is why it’s important to establish clear forums for feedback and tight feedback loops, especially with your most important stakeholders.

Code Quality

Are we shipping code quickly without causing major customer impacting issues?

All else being equal, faster execution is better. However, there’s often a trade off between execution speed and execution quality. Keeping an eye on the number of customer impacting issues and bugs is one of many ways to assess execution quality. If the team is shipping quickly but committing lots of bugs, it’s a sign that you should work with them on testing or code design. If the team is moving slowly, and shipping no bugs or seeking code perfection, push them to make technical decisions more expeditiously by being a bit more lenient in their design reviews or seeking help from their tech lead or domain experts within the eng team.

Do we have clear ways to continually assess code quality?

It’s important to form your own opinion about code quality and to organize conversations about it among members of the team. Reading through design docs and commits to ensure that people are considering the right risks and anticipating future use cases of their code is probably table stakes. As a leader, you should be producing other leaders, so in general organizing conversations among the team is preferred. For example, write and review design docs as a group, have the team give lightning talks about hard problems they solved recently, or ask people to proactively identify tech debt and devise mitigation strategies.

Is technical debt accumulating? Are we spending excess time cleaning up after earlier shortcuts?

Introducing technical debt into the codebase is not inherently bad as long as you’re intentional about it. That means building an understanding of the risks of your technical decisions and formulating a plan for mitigating or eliminating those risks. Your goal should almost always be to leave the codebase as good or better than you found it. The rare exception is projects with tight time constraints and a huge expected impact on the business – in those cases you may be willing to take on some technical debt indefinitely. When unplanned technical debt begins to accumulate, it’s a sign that you should be more hands on with planning and perhaps set aside more time during the sprint for cleanup and/or discovery.

Are we using the right patterns and technology for the problem we’re solving?

It’s often tempting to use a new project as an opportunity to try out new frameworks or work with technologies you’re unfamiliar with. For projects where maintainability, performance, stability, or efficiency is a concern (which covers pretty much everything outside of rapid prototyping), it’s important to consider the long term implications of your technology decisions. As one example, choosing to redesign a feature and migrate to a new framework is often risky since, toward the end of the project, when you’re trying to assess the impact on business metrics you’ll have trouble separating the impact of a different technology from the impact of a different design.

Are we writing an appropriate number of tests and documentation?

Different types of projects call for different approaches to testing and documentation. At one end of the spectrum, there are projects with murky requirements or questionable impact for the business (e.g. a new blue “Foo” to increase revenue?). The primary goal of these projects is learning. Investing a lot of time in long-term maintainability is often wasted effort, since you don’t even know whether you’re shipping something of value. At the other end of the spectrum, there are projects with clear requirements and impact on the business (e.g. infrastructure that stores and retrieves private user data securely). The primary goal of these projects is delivering value, and investing a lot of time in long-term maintainability is paramount.

In other words, projects with well-defined, valuable, reproducible outcomes require you to think more about long-term maintainability and stability, while projects that emphasize learning generally require less of that thinking for the first iteration.

Metawork for Engineering Leaders Part 1 – Setting the Stage for Effective Collaboration

When I first started growing tech leads, my biggest challenge was giving people sufficient support without being overbearing and diminishing their perceived ownership of their project. So I devised a list of questions to ask my TLs to give them a better sense of my expectations and help me quickly assess the health of the project. That document formed the basis for Metawork for Engineering Leaders. Over the years, I kept iterating on that list and it eventually became a list of all the things I expect project leaders to handle. Today, I use this list to train tech leads, distribute responsibilities among tech leads, managers, and PMs (product or project managers), and make role expectations clear to the entire project team so everyone can hold each other accountable.

When leading a project, your goals should be to help the team be maximally effective and ensure everyone (the team and stakeholders) feels good about the time investment afterwards, regardless of the outcome. You can accomplish that in many different ways: keeping people motivated, ensuring they have impactful things to work on, building consensus on the team’s goals, and much more. The more intricate the project’s dependencies (meaning dependencies among tasks, individuals, and teams), the more time you need to invest upfront to ensure smooth execution. Without setting the stage beforehand, expectations for project cost and ROI will be misaligned, which causes decision churn, inefficient execution, and a lot of wasted time as people struggle to build consensus on the fly.

Below are some of the questions that project leadership (tech lead, PM, eng manager) and the team should answer before embarking on a large chunk of work to ensure that team members can be maximally effective. If you’re unable to answer any of these questions satisfactorily, it’s a sign that you should dig in further to avoid problems further down the line. And if you’re able to answer all of these, you’ll have peace of mind and be better equipped to communicate the project’s status to the team and other stakeholders around the company.

Lifecycle

Do we have a general understanding of what failure looks like?

Once you’ve spent a lot of time working on a project, finishing may start to feel more important than meeting your goals. Without defining what failure for the project looks like, the project may turn into a zombie – one that slowly limps along without sufficient team motivation or organizational support. Failure can mean the project cost is much higher than anticipated, we discovered that metrics aren’t going to move as much as we thought, or the company’s priorities have shifted. Killing projects never feels good. Having a clear definition of failure up front allows the team to have an intellectually honest conversation about whether to continue investing time in any given project.

Do we have a clear end state in mind?

Even highly successful initiatives will eventually reach a point of diminishing returns. At that point it’s important to take a step back and determine whether it’s time to shift the team’s focus. Without a clearly defined end state, it’s easy to be swept up in the excitement of shipping while your impact diminishes.

Impact

Is it clear how the project fits into the company’s strategy?

Having a clear connection for a given project to the company’s current strategy and objectives is important for keeping the team motivated, giving other teams a clear idea of when and how to include you in discussions, and ensuring continued organizational support.

Are there SMART goals for the project (specific, measurable, attainable, realistic, time-bounded)?

Defining clear goals up front gives both the team and stakeholders clear success metrics. A lack of clarity here leads to lack of focus and disagreement about the project’s outcome – when you don’t communicate metrics for success up front, people will invent their own.

Have we identified clear risks for the project? Do we have mitigation strategies?

Even with well-defined goals and a solid plan, no project will be a guaranteed success. If you’re able to identify all risks ahead of time and devise mitigation strategies (which may include ending the project prematurely), you’ll maximize the project’s chances of success and, even if you don’t hit your goals, you can sleep well knowing you did your best work as a leader.

Stakeholders

Have clear stakeholders been identified (other engineering teams, other job functions, etc.)?

Understanding who is affected by your work helps you continually collect feedback and course correct as necessary.

Have we set expectations for the roles of the stakeholders in the project (code review, feedback, etc)?

Setting clear expectations for stakeholders upfront helps you avoid scrambling to convince people to devote their time and expertise to the project later on down the line.

Team

Are there clear owners for each part of the project (including people on other teams or orgs – design, PM, tech leads, engineers on other teams)?

When tasks or decisions lack clear owners, decision-making and execution tend to be less efficient – you risk having some areas be neglected or multiple owners arising organically, which means there’s no clear way to resolve disagreements.

Does the team understand the project lifecycle, expected impact, stakeholders, and communication plan?

Ensuring the team understands your plan for running the project gives them confidence the plan will succeed and empowers them to aid you in identifying risks and roadblocks.

Does the team understand how their work fits into the bigger picture?

Everyone should have a clear understanding of how their individual efforts tie into the larger mission of the team, the engineering org, and the company. Anytime you ask them to do something it should feel like a good use of their time.

Does the team know each other?

The best performing teams respect each other’s skillset and bond over something other than work. The importance of the former is obvious – trusting your teammates’ competency and reliability is paramount for collaborating on non-trivial tasks. The latter may be non-obvious – forming a bond that transcends the project’s goals is important for sharing authentically among the team, having hard conversations, and committing to shared goals even when unanimity is lacking.

Does the team have the right skills to deliver the project?

Starting the project with the right skillset within the team is ideal. Learning on the fly can also work well if the team is senior enough or you identify the right support (mentors and collaborators) up front. You should avoid situations where you discover part way through the project that you’ll require additional expertise from another team to do your project well.

Does the team have enough people to deliver in time?

Obviously, missing the deadlines you set reduces trust in your team over time. Establishing unrealistic goals or workloads also hurts morale and, at worst, burns people out. Make sure that you’re committing to a reasonable amount of work for a team of your size working at a realistic pace.

Communication plan

Is there a communication plan?

Formulate a plan for keeping stakeholders and the team in the loop. Defining and sharing your communication plan ensures that:

Communication is flowing among the teams and people that can help maximize the project’s success.
People feel good about the team’s work.

The bigger the organization, the more important it is to devise a clear communication plan.

Do stakeholders understand when and how to give feedback to the team?

Stakeholder feedback is often the difference between project success and failure. Without clear forums for feedback (meetings, Slack channels, docs, etc.), misalignment grows over time and trust in the team erodes.

Who owns communication for different aspects of the project?

Having an owner for communication ensures that people know who to talk to about project updates and feedback. That person serves as a steward for the quality of the communication channels and source of truth for their area. This person is often a tech lead, PM, or manager.

The Importance of Healthy Iterative Loops

Early on in my career, I assumed that being a great leader was about having the right answer all the time. After seeing several projects succeed and fail and reflecting on the reasons, I started to see things much differently. The most successful projects are almost never headed in the right direction and set up perfectly at the beginning. And the biggest train wrecks don’t always fail because of lack of knowledge. People often talk about good communication, project management, or execution being more important than ideas. To me, these are different ways of saying the same thing: establishing a healthy iterative loop is more important than having the right answer on day one.

When I worked at Facebook, I took over a high priority project (reporting directly to the CEO) that wasn’t doing well. When I started managing the team, I had 1:1s with my manager, new reports, and dozens of other people collaborating with us – literally everyone told me the project was doomed. As a result, the team was shrinking, confidence in the project was waning, and everyone had essentially given up. Fast-forward to 9 months later and we hit 3 out of 4 goals we set for the year and had a well-established, sustainable team to own the problem space long-term. How? We established an iterative loop that helped us build confidence and, ultimately, deliver results that surpassed any of our expectations.

Learn. I had a fair amount of theoretical knowledge about the problems we were solving but everything else was new to me. I committed to learning everything possible about the project including the technical, organizational, and business challenges. Diving into a murky, ambiguous problem space was probably the scariest part, but being comfortable with doing that is an important skill to build.
Set expectations. I acknowledged the difficulty of the situation and long road ahead while assuring people that, given time, we’d right the ship.
Devise a plan. I worked with the team to identify the biggest keys to the team’s success (including things like educating other engineers at the company, recruiting, persuading other teams to devote resources to the project, and concrete deliverables), vetted that with the team and our collaborators, and distributed the work among several teams.
Execute. Sequence work, track progress, and keep people motivated.
Win. That actually worked!!

The above cycle repeated itself as new challenges presented themselves. But each time we returned to “learning” again we were more confident in our ability to win and build on previous successes.

If you’re standing at the base of a mountainous challenge, consider how you might define a feedback loop to claw your way to the summit.

Turning Production Performance Data Into Wisdom

Data literacy is one of the more underrated parts of the software engineering skillset. When you’re dealing with a complex, dynamic, evolving system, being able to reason about data is at times more important than institutional knowledge, which tends to become outdated. Understanding a single library or subsystem really well often isn’t good enough. And when you transition to engineering leadership, grow your team, and focus more on the big picture, keeping up with every technology change isn’t feasible.

In this post, I’ll share a few patterns I look for and what they tell you about how a feature or subsystem is performing.

Let’s use FooService as an example. When you look at the source code for FooService for the first time, you’ll probably be very confused. There’s a config object being passed in with a mysterious flag. It’s difficult to reason about exactly what each code path is doing and how often it’s followed. How can we even begin to reason about performance characteristics in production? Let’s take a look at a few examples.

The Blip

You see a perf regression that looks like noise at first. The week over week graph shows that it’s actually a periodic regression. It doesn’t correlate with a periodic increase in app usage. What’s going on? This is often an indication that you have a warm/cold dynamic somewhere in your codebase. When some part of your application is updated, the initial session for every client experiences degraded performance followed by completely normal performance. When you see a blip, try to find a pattern rather than treating it as random noise.

Examples

When you ship a new version of your web app, it probably references resources that aren’t in your users’ caches yet, which slows down the first page load after every release and causes a blip.
When you release a new version of your mobile app, your users will experience a cold start: the app loads into memory, a new process is created, and app initialization code runs. If you’re paying attention to startup time, you might notice a blip.

Seeing a blip is not necessarily a bad thing, but it’s important not to treat blips as random noise, since they represent bottlenecks and optimization opportunities.
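One way to stop treating a blip as noise is to line it up against your release history. Below is a minimal Python sketch of that idea. It assumes you can export latency samples as (timestamp, latency_ms) pairs and that you know your deploy timestamps. The names blip_ratios and window are hypothetical and for illustration only; they don't come from any particular monitoring tool. For each deploy, the sketch compares median latency shortly after the release to the median of everything else.

```python
from statistics import median


def blip_ratios(samples, deploy_times, window):
    """Compare latency just after each deploy to the steady-state baseline.

    samples: list of (timestamp, latency_ms) pairs (hypothetical export format).
    deploy_times: release timestamps, in the same units as the samples.
    window: how long after a deploy a sample is treated as "cold".
    Returns one ratio per deploy (post-deploy median / baseline median),
    or None when no samples fall inside that deploy's window.
    """

    def after_some_deploy(t):
        return any(d <= t < d + window for d in deploy_times)

    # Baseline: every sample that isn't inside any post-deploy window.
    baseline = median(ms for t, ms in samples if not after_some_deploy(t))

    ratios = []
    for d in deploy_times:
        cold = [ms for t, ms in samples if d <= t < d + window]
        ratios.append(median(cold) / baseline if cold else None)
    return ratios


if __name__ == "__main__":
    # Synthetic data: 100 ms normally, 300 ms for 3 ticks after deploys at t=10 and t=50.
    samples = [(t, 300 if (10 <= t < 13 or 50 <= t < 53) else 100) for t in range(100)]
    print(blip_ratios(samples, [10, 50], 3))
```

If the ratio sits well above 1.0 after nearly every release, that points to a warm/cold dynamic rather than random noise. How large a ratio counts as meaningful, and how long the window should be, are judgment calls that depend on your system.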

The Multi-modal Distribution

The histogram reveals multiple modes. In other words, there isn’t a single most common value but multiple values that are far apart. There’s no discernible pattern when you slice by demographic. What’s going on? This is often a sign of one or more really expensive code paths being executed part of the time. It doesn’t necessarily mean that something is wrong, but unless the user experience is significantly different for each mode, there’s probably an optimization opportunity here.

Examples

An A/B test where behavior differs for the experiment and control groups
Classes and modules with different performance characteristics being used conditionally
One code path is used for new users, whose usage tends to be less computationally intensive, and another is used for experienced users
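Before slicing by code path, it can help to confirm programmatically that the distribution really has more than one peak. The Python sketch below is deliberately naive. It buckets latencies into fixed-width bins and reports each bin that is a local peak. The function name, bin_width, and the min_share cutoff are hypothetical choices for illustration. Results are sensitive to bin width, and a kernel density estimate is a common, more robust alternative.

```python
from collections import Counter


def histogram_modes(latencies_ms, bin_width, min_share=0.05):
    """Return the lower edge of every histogram bin that is a local peak.

    A bin counts as a peak when it holds more samples than the bin to its
    left, at least as many as the bin to its right, and at least `min_share`
    of all samples (to ignore tiny bumps in the tail).
    """
    counts = Counter(int(x // bin_width) for x in latencies_ms)
    total = len(latencies_ms)
    modes = []
    for b in sorted(counts):
        c = counts[b]  # Counter returns 0 for missing neighbor bins
        if c > counts[b - 1] and c >= counts[b + 1] and c / total >= min_share:
            modes.append(b * bin_width)
    return modes


if __name__ == "__main__":
    # Synthetic data: a cheap path around 100 ms and an expensive one around 900 ms.
    data = [100] * 50 + [105] * 30 + [900] * 40 + [910] * 10
    print(histogram_modes(data, bin_width=50))
```

When this finds two or more peaks, a natural next step is to rerun it separately for each value of a suspect dimension, such as experiment group, the config flag that picks a code path, or new vs. experienced users. Then check whether each slice collapses to a single mode.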

The Puzzling Outlier

The CDF levels off sharply at p99, indicating a big jump in page load times for the slowest 1% of page loads. The jump is even sharper when you zoom in on p99.9. Is your instrumentation broken somehow? Is this the result of a runaway query or zombie process somewhere in the system? What’s going on? This is often an indication that you have big fish or celebrities in your system experiencing massively degraded performance.

Apps are often designed for “normal” people or use cases. If you’re writing a feature or service that assumes normal usage, you’re going to have a bad time, or at the very least your p99.9 use case is going to have a bad time. So what’s the big deal? That’s not that many users, right? Well, p99.9 problems often affect your most important customers, since they have the resources or influence to stretch your infra to the max in the first place. In other words, sometimes these outliers are actually your most important customers and deserve more attention.

Examples

Your CEO, the ultimate power user, is testing all the things
Rihanna or Bieber have started using your app as part of their marketing strategy
Legitimate use cases that you didn’t anticipate
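To check whether a long tail comes from a handful of big fish, you can compute the tail percentiles and then look at who owns the slowest requests. Below is a minimal Python sketch under a few assumptions. Requests are exported as (account_id, latency_ms) pairs, and percentiles use the nearest-rank definition; real dashboards may interpolate, so numbers can differ slightly. The names tail_report and tail_accounts are hypothetical.

```python
import math
from collections import Counter


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def tail_report(latencies_ms):
    """Median and tail percentiles for a batch of latency samples."""
    values = sorted(latencies_ms)
    return {p: percentile(values, p) for p in (50, 99, 99.9)}


def tail_accounts(requests, p=99.9):
    """Count which accounts own the requests at or above the p-th percentile.

    `requests` is a list of (account_id, latency_ms) pairs (hypothetical format).
    """
    threshold = percentile(sorted(ms for _, ms in requests), p)
    return Counter(account for account, ms in requests if ms >= threshold)


if __name__ == "__main__":
    requests = [("a", 100)] * 989 + [("b", 2000)] * 9 + [("celebrity", 30000)] * 2
    print(tail_report([ms for _, ms in requests]))
    print(tail_accounts(requests))
```

If a few accounts dominate the count, you’re likely looking at power users or celebrities rather than broken instrumentation. That tells you to look at how your feature behaves for their specific usage patterns.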