Wednesday, May 19, 2010

About capacity planning

On March 31, ComputerWorld ran an article "How to develop an effective capacity planning process"

(http://www.computerworld.com/s/article/9174422/How_to_develop_an_effective_capacity_planning_process)
Excerpt:
* 1. Select an appropriate capacity planning process owner.
* 2. Identify the key resources to be measured.
* 3. Measure the utilizations or performance of the resources.
* 4. Compare utilizations to maximum capacities.
* 5. Collect workload forecasts from developers and users.
* 6. Transform workload forecasts into IT resource requirements.
* 7. Map requirements onto existing utilizations.
* 8. Predict when the shop will be out of capacity.
* 9. Update forecasts and utilizations.
"

It's all very true, but only in an ideal world. In particular, Item 5 is unlikely to happen in the real world, for a number of reasons.

First, workload forecasts are normally derived from the overall amount of work in the corporation - private information usually kept close to the chest by the corporate business wizards, since knowledge of it may be classified as insider information.

Second, developers and other IT users will at best have some understanding of why such information is being collected, but it sits on the farthest back burner of their minds and is typically forgotten right after you talk to them about it: they have more pressing issues to worry about, and developers want to squeeze the most out of the resources they already have anyway.

Third, transforming workload forecasts into capacity utilization is an ill-defined problem, and the only possible general answer to it is "it depends".

Even such a seemingly trivial problem as finding the correlation between corporate headcount and utilization of Exchange server capacity (storage, performance, or both) will never follow a single formula; it will depend on a number of conditions, from Exchange server configuration and allocation to corporate policies, and so on.

When the company is using a SAN cloud, MapReduce in any of its incarnations, or any other distributed system, the transformation of developer workload forecasts into corporate IT utilization speculations very quickly degenerates into an exercise in Professor McGonagall's art of transfiguration and results in what Scott Armstrong called "Bold Freehand Extrapolation".

Such forecasts cannot really be used to do serious data-driven capacity planning.

The right approach is to collect actual usage data from the IT network and, once a sufficient amount of data has been collected, to compute a Time-Series-Analysis (TSA) forecast for each piece of hardware being monitored.
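To make this concrete, here is a minimal Python sketch of what such a data-driven forecast might look like, assuming we already have a history of utilization samples pulled from monitoring. Holt's linear-trend method stands in here for whatever TSA tool is actually used; the sample numbers, the smoothing constants, and the 85% threshold are all hypothetical.

def holt_forecast(series, alpha=0.5, beta=0.3, horizon=26):
    """Fit Holt's linear-trend method to `series` and forecast `horizon` steps ahead."""
    # Initialize level and trend from the first two observations.
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        last_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    # Project the fitted level and trend forward.
    return [level + (h + 1) * trend for h in range(horizon)]

# Hypothetical weekly disk-utilization history for one server, in percent.
disk_usage = [41, 42, 44, 43, 46, 48, 47, 50, 52, 53, 55, 58]

forecast = holt_forecast(disk_usage)
capacity_limit = 85  # e.g. the threshold at which the volume must be expanded

for week, value in enumerate(forecast, start=1):
    if value >= capacity_limit:
        print("projected to cross %d%% about %d weeks out" % (capacity_limit, week))
        break
else:
    print("no capacity breach projected within the forecast horizon")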

This approach is characterized by five very important features:

1. It reduces the guesswork to a bare minimum and makes it far less arbitrary: forecasts become data-driven.
2. Assuming the right tools are used for forecasting, it produces accurate and reliable forecasts.
3. It makes its user aware of the uncertainty inherent in forecasting: when we simply assert that we will use X amount of a resource, we choose to ignore that uncertainty; when we use a statistical tool to come up with X, the uncertainty surfaces and we have to deal with it - an inconvenience, but a good thing, because it keeps us and our partners honest (see the sketch after this list).
4. It gives us a way to judge the quality of the forecast.
5. It lends itself easily to multi-level forecasting, allowing us to build up forecasts for higher levels of aggregation from forecasts at lower levels of granularity.
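To illustrate points 3 through 5, here is another hypothetical Python sketch: a naive prediction interval built from the spread of one-step residuals in each server's history, and a roll-up of the per-server forecasts into a datacenter-level figure. The server names and numbers are made up, and the interval math assumes the servers behave independently - a real tool would do better, but the point is that the uncertainty becomes explicit.

import math

def naive_interval(history, point_forecast, z=1.96):
    """Widen a point forecast into a rough 95% interval using one-step residuals."""
    residuals = [b - a for a, b in zip(history, history[1:])]
    mean = sum(residuals) / len(residuals)
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (len(residuals) - 1))
    return point_forecast - z * sd, point_forecast + z * sd

# Hypothetical next-quarter point forecasts (GB) for three file servers,
# each paired with its recent usage history.
servers = {
    "fs01": ([410, 415, 421, 430, 433], 470.0),
    "fs02": ([120, 124, 123, 130, 135], 150.0),
    "fs03": ([890, 900, 915, 925, 940], 1010.0),
}

total_point = 0.0
total_var = 0.0
for name, (history, point) in servers.items():
    low, high = naive_interval(history, point)
    print("%s: %.0f GB, interval [%.0f, %.0f]" % (name, point, low, high))
    total_point += point
    half_width = (high - low) / 2
    total_var += (half_width / 1.96) ** 2  # treat servers as independent

# Datacenter-level forecast built up from the server-level ones.
total_half = 1.96 * math.sqrt(total_var)
print("datacenter total: %.0f GB +/- %.0f GB (assuming independence)" % (total_point, total_half))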

All of this together is a very convincing reason not to rely on users' and developers' guesses in capacity planning, but to use real, data-driven forecasts instead.
