Thursday, May 1, 2014

Work Life: Enterprise Release Management for Hardware Dependent Solutions

I see a greater need for release management. Now that agile development is becoming the norm for corporate/enterprise-level solutions, there is an even greater need to validate what is going out into production.

To me, the main driving force is that software solutions have finally deviated (in my opinion) from the traditional workflow of hardware development, which is a very traditional waterfall model. Even agile development follows a cyclical version of the waterfall model.

In the past, software was deployed as its own instance on each client's system. Each client could therefore be on a different version, which produces overlapping waterfall models if one builds the model at the instance level. In other words, if we tracked each individual client, we could draw a waterfall for that client, even though the SDLC is shared among the clients on the same version.

But with the introduction of the cloud, there is now only a single solution, or a single install, that every client shares. Apply this at a corporate/enterprise level and you have many development cycles that all need to be deployed to a single instance.

With a single instance, a software company can no longer just push out changes whenever it wants to. Planning and scheduling are also more complex. Basically, software releases now must follow a more traditional waterfall process while software development is moving towards a more agile process. There are two bottlenecks in an enterprise cloud solution: system testing and production deployment.

System testing has to run regression tests continuously, but as enterprise solutions grow larger, regression testing takes longer and longer. Once a full regression run takes two weeks or more to complete, it becomes a major bottleneck for deploying changes on a normal agile schedule.

With more components, there are also more defects. More defects mean more emergency change windows, and more change windows mean more regression tests are needed. Not only do changes take longer, they also become more frequent.

A simple solution for non-hardware-dependent solutions is to create more replicas of the production environment. This allows multiple regression runs to overlap. Although this does not decrease the cycle time, it allows a higher frequency of deployments into system test.
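The effect of extra replicas can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: every regression run takes the same number of days, all the changes are already queued on day 0, and every replica is identical. The function name and the numbers are made up for illustration.

```python
import heapq


def completion_days(cycle_days, environments, queued_changes):
    """Return the day each queued change finishes its regression run.

    Assumptions (hypothetical, for illustration only): every run takes
    cycle_days, all changes are waiting on day 0, and each environment
    runs one regression at a time.
    """
    # Min-heap of the day on which each environment becomes free.
    free_at = [0] * environments
    heapq.heapify(free_at)

    finished = []
    for _ in range(queued_changes):
        start = heapq.heappop(free_at)   # earliest available environment
        end = start + cycle_days         # the cycle time itself never shrinks
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


if __name__ == "__main__":
    print(completion_days(14, 1, 4))  # one environment
    print(completion_days(14, 2, 4))  # two replicas
```

In this model a single run still takes the full 14 days no matter how many replicas exist, but the queue of four changes drains in 28 days with two replicas instead of 56 days with one.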

Putting all the changes into a single "patch" can alleviate a lot of stress on deployment but adds stress to production support. With more changes, it becomes more difficult for production support to know the source of a problem: whether it already existed or came from one of the changes. Conversely, if each deployment is limited to only a few select changes, a queue will build up.

Priorities also require different expectations and management when multiple changes are bundled together. If one of the changes is high priority but a low-priority change fails during production deployment, all of the changes should be rolled back. General management must then decide either to go ahead with the remaining changes, even though the regression test did not cover that specific scenario, or to roll back a high-priority change, which may delay other work by several days if not weeks.

The system test environment will be under even more stress than production, because production only has to maintain the latest completed and tested code. System test has to maintain three different environments. At an enterprise level, this causes a lot of complications.

First, it is difficult to financially maintain three instances of essentially the same thing. One is needed for the latest major release, one to test emergency changes (aka point releases, patch releases, etc.), and one to stay on the same version as production in order to replicate errors and bugs.

Second, because there are three different environments, vendors become a constraint: not all vendor products are capable of running as multiple instances on the same network, much less as different versions. If you then want to run different versions of a vendor's application, more labs will be needed.

Third, changes are coming from many different groups. Agile causes even more iterations to be introduced into system test. Eventually, a quiet period (aka code freeze, soak period, etc.) is required to do a comprehensive regression test. Rarely will there be no issues, and then scheduling becomes a problem, because the fixes cannot go in without breaking the quiet period.

This does not even include all the politics that occur in enterprise-level solutions, nor the management of defects across different groups, nor the different policies of different groups. There are also many things outside the control of the process, like OS maintenance updates, vendors dropping support for older versions, bugs in vendor software (e.g., Heartbleed), and so many other variables that interrupt a process that is already very stressed.

As is evident from current software development and the trend toward agile development, more formal processes are nearly impossible to implement, much less enforce. I believe there will be a greater need for skilled engineers and technical managers, because they are needed to maintain processes that require less common abilities. Formal processes have corrupted the workforce with personnel who cannot keep up with the pace.

In corporations, processes have been so dumbed down (i.e., idiot-proofed) that people who can merely follow instructions call themselves engineers. These people are unable to keep up with the changes. Sadly, they are so tenured and senior in the company that they are also the source of the inertia against any change to the culture.

The age of corporations treating human resources as a commodity is in recession. This is why the technical sector has a need for "skilled" engineers while unemployment still remains relatively the same. This is why small companies can take over corporations.

Returning to the topic of release management: it is needed to make sure that the changes introduced into the different environments are maintainable. For example, if there is a backlog of changes waiting to go into production, new changes should not be introduced into system test. If they must be, then dependencies must be evaluated. If there is a dependency, then many other factors must be considered.
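The gating rule above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the Change record, the must_proceed flag, and the three outcomes (ADMIT, HOLD, ESCALATE) are hypothetical names, and the "many other factors" are left to a human review and not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ADMIT = "admit into system test"
    HOLD = "hold until the production backlog clears"
    ESCALATE = "dependency found, needs release management review"


@dataclass
class Change:
    # Hypothetical record; a real change ticket carries far more detail.
    name: str
    depends_on: set = field(default_factory=set)  # names of other changes
    must_proceed: bool = False                    # business says it cannot wait


def gate(change, production_backlog):
    """Decide whether a new change may enter system test."""
    backlog_names = {c.name for c in production_backlog}
    if not backlog_names:
        return Decision.ADMIT      # nothing is waiting on production
    if not change.must_proceed:
        return Decision.HOLD       # default: do not pile onto a backlog
    if change.depends_on & backlog_names:
        return Decision.ESCALATE   # depends on something not yet deployed
    return Decision.ADMIT          # evaluated, no dependency on the backlog


if __name__ == "__main__":
    backlog = [Change("billing-fix")]
    print(gate(Change("report-ui"), backlog))
    print(gate(Change("invoice-api", {"billing-fix"}, True), backlog))
```

The point of keeping the gate this small is that the decision itself is simple; the hard part is knowing the backlog and the dependencies accurately, which is exactly what release management has to maintain.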