Tuesday, July 22, 2014

Work: Enterprise Defect Management (rant)

I have worked with defects my entire professional life in support, integration, project management, and release management. I have found that the size of the team makes a large difference in how defects are managed. In smaller companies, few factors determine the priority of a defect: simply put, the client's most urgent issue is the highest priority. This is because typically only a single person works on each project, and each project has a single client. Although a person could have multiple projects, project priorities are typically managed by the boss (i.e., the manager).

As companies grow in size, there are more people involved and more priorities to manage. Management of defects moves to a separate team that does not have easy access to development, testing, and deployment planning. Unlike in a smaller company, which typically has only 2-3 roles involved, I now have to deal with several parties. Depending on the size of the projects, there may be up to 10 different groups, each with its own resources and priorities.

One of the biggest problems that I run into is finding a single method for managing defects. The largest factor in my problem is that there is no single entity that makes the final call. Because of this, there are a lot of delays, since the work cannot "start" until the decision is made. Although the official start has not been determined, some teams take it upon themselves to work on certain items. This actually causes more problems for managing the defects: just because a change is simple for a development group does not mean that the testing or the deployment is simple.

A simple change in a more fundamental code base could trigger a lot of required tests. The lab has only a single instance, whereas production has several environments, so a production deployment may have to be rolled out over a span of time. And so far, this is only a single path for a single change.

If multiple development groups provide multiple solutions, a lot of resource conflicts can occur across many teams, especially testing. For example, if ChangeA is put into testing and requires 2 weeks to deploy to production, ChangeB cannot be put into testing until ChangeA has completed testing. ChangeC cannot be put into testing until ChangeB has completed testing, but ChangeB cannot leave testing until ChangeA is completed in production.
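To make the knock-on effect concrete, here is a rough sketch of that queue in Python. It is only a toy model under my own assumptions: the one-week test duration is made up, the two-week production rollout from the ChangeA example is applied to every change, and the names `Schedule` and `plan` are hypothetical, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    name: str
    test_start: int  # week the change enters the single test environment
    test_end: int    # week the change is allowed to leave testing
    prod_done: int   # week the production rollout completes


def plan(changes, test_weeks=1, deploy_weeks=2):
    """Serialize changes through one test environment and one rollout path.

    test_weeks is an assumed value; deploy_weeks reuses the 2 weeks
    mentioned for ChangeA and assumes it holds for every change.
    """
    schedules = []
    test_free = 0       # week the test environment is next available
    prev_prod_done = 0  # week the previous change finishes in production
    for name in changes:
        test_start = test_free
        # A change cannot leave testing before its own tests finish, nor
        # before the previous change has completed its production rollout.
        test_end = max(test_start + test_weeks, prev_prod_done)
        prod_done = test_end + deploy_weeks
        schedules.append(Schedule(name, test_start, test_end, prod_done))
        test_free = test_end
        prev_prod_done = prod_done
    return schedules


if __name__ == "__main__":
    for s in plan(["ChangeA", "ChangeB", "ChangeC"]):
        print(s)
```

With these assumed numbers, ChangeB needs only one week of testing but occupies the test environment for two, and ChangeC does not reach the end of its production rollout until week 7.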

So far this is manageable, although it does not yet address how the backlog can be caught up. But another factor, issues arising from deployment or testing, has not been considered at this point. If ChangeA faces a problem in production and needs to be backed out, then what should be done with ChangeB? Depending on the criticality, ChangeB could be backed out or merged with ChangeA. If the problem with ChangeA cannot be replicated in test, then what is the path forward?

If ChangeA works but ChangeB is unable to pass testing, ChangeC is now on hold. How should the group handle it if a highly critical issue now comes up? Should everything be backed out and then readdressed? Should the fix be merged into the current change?

So far I have only addressed the main actors in the SDLC, but there are other actors that make this management more complex: management and product planning. They have their own plans, and because resources are now being used to handle unexpected work that comes up, budgets and promises need to be considered.

This brings up another factor in resource consumption. Regular/planned/scheduled releases are impacted by, or have an impact on, defect deployments. Although rules are made about which changes should go into an emergency/point release or a planned/scheduled release, there are always exceptions that throw the process for a loop.

There seem to be several red flags in the chaotic process that I am currently in.

  1. The most obvious request from release management is to skip a few weeks of releases. And obviously, no one can stand missing fixes for 2-4 weeks. If the product cannot remain stable for a month, there is something seriously wrong with the product (IMO). What customer wants a solution that requires weekly updates? I get quite irritated with weekly updates for changes that do not even impact me to begin with!
  2. Similar defects keep showing up. No one wants to manually fix out-of-sync issues, yet no one is able to replicate or completely fix the problem. There is something seriously wrong fundamentally with the product if no one is able to figure out syncing issues that keep coming up for over 3 months (for us, two years, even with some fixes that didn't do anything).
  3. Every team is requesting additional resources or time. Obviously something is wrong with planning if everyone (or even the majority of groups) cannot accomplish the simplest of tasks within a regular plan.
  4. Always in catch-up mode. There is something seriously wrong if the product/application planner cannot realign the plan given the current situation.
  5. No COO. There is not a single point person who can just put their foot down and move forward. I do not know how many times we spent more time "negotiating" with everyone when even the worst plan would still have been faster than the "planning" of the changes.
Rant over