Friday, July 30, 2021

Introducing buffers and slack in organisational processes

Typically, when estimating how long a task will take in a project, you see managers multiplying the estimate by some factor. This leads to estimates inflating like a balloon. Estimation is, however, typically done based on a deficient understanding of the problem. The effort put into estimation is also limited, often to something best measured in minutes for tasks taking days. Surprises and new circumstances arise as the work is done. I would argue, however, that these additions to the time a task takes are not necessarily linear in nature, as multiplying effort estimates by a factor would assume.

Perhaps an alternative way of inflating estimates would be to apply an exponent to the initial effort, but this means even bigger estimates that will not be easy to sell. It would be a better fit if delays, compared to the initial effort estimates, grow exponentially, yet even that may not reflect the true nature of the problems with estimates. Could applying a random, unbounded exponent to estimates produce a value that better reflects reality? Perhaps the conclusion is that random values would be best, since you then save the effort of estimating altogether... Work, however, exhibits the behaviour that it takes as long as or longer than estimated; people tend to use the allocated time to the full extent. Padding processes and estimates therefore slows everything down and wastes time as a result.
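To make the difference concrete, here is a minimal sketch in Python of the two padding schemes mentioned above: multiplying by a fixed factor and raising to an exponent. The factor and exponent values are invented purely for illustration.

```python
# Two hypothetical ways of padding an estimate: a fixed factor (linear)
# versus an exponent (superlinear). Values below are illustrative assumptions.

def pad_by_factor(estimate_days: float, factor: float = 1.5) -> float:
    """Linear padding: every estimate grows by the same proportion."""
    return estimate_days * factor

def pad_by_exponent(estimate_days: float, exponent: float = 1.3) -> float:
    """Superlinear padding: larger tasks accumulate disproportionately more risk."""
    return estimate_days ** exponent

for days in (1, 2, 5, 10, 20):
    print(f"{days:>3} d estimated -> factor: {pad_by_factor(days):5.1f} d, "
          f"exponent: {pad_by_exponent(days):5.1f} d")
```

The exponent punishes large tasks more than small ones, but as noted above, neither scheme is obviously a good model of how overruns actually behave.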

Another alternative is not to inflate the estimated duration of individual tasks, but to add a shared buffer covering all the tasks to be performed. Each task overrunning its initial estimate can then consume a fraction of this buffer. The remaining buffer gives a good indication of the state of the overall project: when the buffer approaches zero, you know the project is in trouble.
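A minimal sketch of the idea, with task data and buffer size invented for illustration: each overrun draws from a shared buffer, and the remaining buffer signals the health of the project.

```python
# Shared project buffer consumed by task overruns (illustrative numbers only).
tasks = [
    {"name": "design",    "estimate": 5, "actual": 7},
    {"name": "implement", "estimate": 8, "actual": 8},
    {"name": "test",      "estimate": 4, "actual": 6},
]

project_buffer = 6  # days of slack shared by all tasks

consumed = sum(max(0, t["actual"] - t["estimate"]) for t in tasks)
remaining = project_buffer - consumed

print(f"Buffer consumed: {consumed} d, remaining: {remaining} d")
if remaining <= 0:
    print("Buffer exhausted: the project is in trouble.")
```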

When organisations mandate that the utilisation of people processing work is kept very high, any delay will have a severe impact. Full utilisation in a non-deterministic system means that everything grinds to a halt. In an entirely deterministic system full utilisation can be seen as efficient, but entirely deterministic systems are rare, and if demand exceeds capacity even a deterministic system will cause delays. According to the Universal Scalability Law, adding a node (person or machine) to process some work can degrade the performance of the system as a whole, due to coordination costs between the nodes.
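The Universal Scalability Law expresses this as relative capacity C(N) = N / (1 + σ(N − 1) + κN(N − 1)), where σ models contention and κ the coordination (coherency) cost between nodes. A small sketch, with coefficient values assumed only for illustration:

```python
# Universal Scalability Law (Gunther): relative capacity of N nodes.
# sigma = contention cost, kappa = coordination (coherency) cost.
# Coefficient values are assumptions chosen to show the effect.

def usl_capacity(n: int, sigma: float = 0.05, kappa: float = 0.02) -> float:
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

for n in (1, 2, 4, 8, 16):
    print(f"{n:>2} nodes -> relative capacity {usl_capacity(n):.2f}")

# With a non-zero kappa, capacity peaks and then declines:
# adding another node can make the system as a whole slower.
```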

Failure demand is also seldom included in the model used for estimating work. Depending on your circumstances, some work will be needed to correct mistakes, misunderstandings, communication gaps, variation-related issues and systematic errors in the outcome of the work. When failure demand is high, predictability will in any case be close to none. Addressing failure demand can be a first step towards some form of predictability, but failure demand never goes away completely, and time must always be allocated to addressing issues. Reducing failure demand, especially the common causes arising from the attributes of the system itself, can in the long run free up some time, but it requires an investment of work, temporarily increasing the workload.
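As a rough illustration of why failure demand erodes both capacity and predictability: if a fraction of completed work comes back as rework, the total work grows as a geometric series. The rework fractions below are assumptions made for the sake of the example.

```python
# Total work when a fraction of completed work returns as rework:
# work * (1 + r + r^2 + ...) = work / (1 - r). Fractions are illustrative.

def total_work(initial_work: float, rework_fraction: float) -> float:
    """Initial work plus the rework it spawns, plus the rework of that rework..."""
    return initial_work / (1 - rework_fraction)

for r in (0.1, 0.3, 0.5):
    print(f"rework fraction {r:.0%}: 10 days of planned work "
          f"becomes ~{total_work(10, r):.1f} days in total")
```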

Time buffers are clearly needed whenever you have tasks performed by people. So what if the individual person has time to take a pause, as long as the system as a whole runs smoothly? Efficiency of the individual steps in a process, or of the individual doing the work, does not mean that the process overall is running efficiently. Efficiency of the overall process depends on each node having the bandwidth to address needs as they arise. At high individual utilisation this is impossible.
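A simple queueing model illustrates the point. Assuming an M/M/1 queue (a deliberate simplification of real knowledge work), the time a task spends in the system grows as 1/(1 − utilisation), so waiting explodes as utilisation approaches 100%:

```python
# M/M/1 queue: mean time in system = service_time / (1 - utilisation).
# The one-day service time is an illustrative assumption.

service_time_days = 1.0  # average time to handle one task

for utilisation in (0.5, 0.8, 0.9, 0.95, 0.99):
    time_in_system = service_time_days / (1 - utilisation)
    print(f"utilisation {utilisation:.0%}: a one-day task takes "
          f"~{time_in_system:.0f} days door to door")
```

The slack that looks like waste at the level of the individual is exactly what keeps the response time of the whole system bounded.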