HOW TO BUILD A BALANCED SCORECARD

Part 3: Selecting Scorecard Metrics*

Arthur M. Schneiderman

A balanced scorecard contains a concise set of strategically important measures. They capture the vital few drivers of the organization’s future success. I’ve called these scorecard measures “metrics” and defined them as:

“Metrics are a subset of measures of those processes whose improvement is critical to the success of the organization”

Once we have identified those processes, we face the challenge of selecting this subset from a seemingly endless list of possibilities. Usually this decision is based on what measures are already available or can easily be obtained, benchmarking studies, or executive edict. But there is a much better way of doing it.

Classifying Measures

Measures of a process come in two flavors: I call them “results measures” and “process measures,” although each has many aliases:

Results Measures	Process Measures
Output	Input
Outcome	Driving
Lagging	Leading
External	Internal
Reactive	Predictive
Static	Dynamic
Effect	Causal
Retrospective	Prospective
Dependent	Independent

Whichever set of names you choose, there is a very important difference between them:

Results measures characterize the output of the process. They are the consequences of actions taken within it. Since they are descriptors of the output, they relate directly or indirectly to things that a customer of that process can sense or measure.

Process measures, on the other hand, are the internal measures from within the process that determine these results. In most cases, the customer has little or no interest in or knowledge of them.

The SIPOC Method

One very useful model for generating candidate measures is called the SIPOC method. SIPOC stands for

Supplier → Input → Process → Output → Customer.

In using this model, we usually start by identifying all of the customers of the process and determine their complete sets of requirements. Here, customers include both the external purchaser of the final product or service as well as other internal processes that are part of the organization’s value creating activities (or, as we say in TQM: “The next process step is the customer.”). Through a process called “Voice of the Customer” we translate these requirements into results measures that characterize the output of the process in terms that are both meaningful to and measurable by the process executors. This translation is necessary because the customer often describes their requirements in words that do not have a direct process counterpart.

Next, we reverse this procedure by identifying all of the external inputs that we need in order to execute the process, define our requirements for each of these inputs, and ideally working with our suppliers, translate them back into a set of specifications that are expressed in the supplier’s own language (“Voice of the Supplier”).

Output measures and their associated quantifiable customer requirements (Output → Customer) are clearly results measures. Measures associated with steps internal to the process (Process) are obviously process measures. But what about input and supplier measures (Supplier → Input)? Symmetry would suggest that since they are results measures of the supplier’s value creation process, they must also be results measures for our process. But is that necessarily so? In other words, can a measure that is a results measure for an upstream process be a process measure in a subsequent step? The answer here is a little bit tricky.

What is different about Supplier → Input measures is that we cannot improve them directly from within our own process. We can only do so indirectly by changing specifications, or suppliers, or through the redesign of our product and/or process (“design for x-ability”). Their actual improvement is directly controllable only by the supplier of that input. Often we have a limited ability to affect our supplier’s control or improvement efforts (through partnering, for example) or to redesign our products and/or processes. If that is the case, then we need to treat that measure as a given (that is, a constant) and that measure’s classification into the results or process category then becomes moot.

Generally speaking, to indirectly change an input measure requires the exercise of different internal processes within our organization: the supplier selection process by which we choose suppliers, and/or the product and process design processes. Even in that case, it is difficult to argue that they are anything but results measures. In other words, unless we include within our process sub-processes for supplier selection and product/process redesign, we must view these measures as the results measures of other internal or external processes.

Any given process is part of a system of interacting processes. This is one of the important reasons why it’s critical to have sponsorship of all improvement efforts by someone who is in a position to set appropriate boundaries and constraints to that effort.

The Math of Metrics

From a mathematical point of view, the last alias-pair is the traditional choice of terms. For each results measure, we can write a symbolic equation that relates this dependent measure (or more correctly, “dependent variable”) to the independent ones:

$$y_i = f_i(x_1, x_2, \ldots, x_n)$$

In words, this equation simply states that the dependent measure, y_i, is a function of (i.e. depends on, or is determined by) all of the independent measures: x₁, x₂, up to x_n, where n is the total number of independent process measures.

For example, if the process were baking a cake, then one dependent measure would be the “lightness” (in the Language of the Customer) of the resulting cake, measured by its density (in the Language of the Process) in grams per cubic centimeter. Here, y₁ would be cake density and the goal is for it to be in a specific range: not too light, not too heavy, but just right. What about the x’s? The list would include oven temperature, cooking time, amounts used of the various ingredients, freshness of ingredients, etc. These are the measures that are included or implied in a clear recipe (or Standard Operating Procedure (SOP)). Other dependent measures would include moistness, sweetness, and flavor for example and we could create instruments that would measure each of them, as well as establishing each of their associated target ranges. Each dependent measure would depend on one or more of the many independent measures.

Determining drivers of change

In general, we are trying to limit variation of and/or improve dependent measures in order to make our product more attractive to its customers. So let’s look at how this equation changes with changes in the independent measures:

$$\Delta y_i = \sum_{j=1}^{n} a_{ij}\,\Delta x_j$$

The symbol “Δ” stands for a small change in the measure. So this equation says that the change in a dependent measure is the sum of the weighted changes in all of the independent measures. For very small changes in the measures, mathematicians can show that this simple additive relationship holds in most practical cases. The weights a_ij are sometimes called “influence coefficients” or “impact parameters.”¹ They represent the effect that a small change in the j-th independent measure has on the i-th dependent measure. If a_ij is zero, then small changes in that independent measure have no effect on that dependent measure. If the value of a_ij is large compared to the other coefficients, then the dependent measure is very sensitive to changes in that independent measure. It’s these influential independent measures that are usually the targets for both process control and process improvement and are therefore candidate scorecard metrics.

In process control, they are called “critical nodes.” By “locking” them, we assure that variation in the dependent measures that they affect will be maintained within a range that’s acceptable (but not necessarily satisfactory) to the customer. For process improvement they indicate the likely root causes of the gap between current and target results.
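When the function relating a results measure to its process measures can be evaluated (from theory or from a fitted model), the influence coefficients can be estimated numerically. The sketch below is only an illustration: the `cake_density` function, its coefficients, and the operating point are invented for the example and are not a real baking model.

```python
from typing import Callable, List, Sequence


def influence_coefficients(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    rel_step: float = 1e-6,
) -> List[float]:
    """Estimate a_j = df/dx_j at the point x by central finite differences."""
    coeffs = []
    for j in range(len(x)):
        h = rel_step * max(abs(x[j]), 1.0)  # step scaled to the size of x_j
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        coeffs.append((f(up) - f(down)) / (2.0 * h))
    return coeffs


def cake_density(x: Sequence[float]) -> float:
    """Hypothetical results measure (g/cm^3); the numbers are made up."""
    oven_temp_c, bake_minutes, flour_g = x
    return 0.9 - 0.001 * oven_temp_c - 0.002 * bake_minutes + 0.0005 * flour_g


names = ["oven_temp_c", "bake_minutes", "flour_g"]
operating_point = [180.0, 35.0, 250.0]  # assumed current settings
a = influence_coefficients(cake_density, operating_point)

# Rank the independent measures by the magnitude of their coefficient.
ranked = sorted(zip(names, a), key=lambda pair: abs(pair[1]), reverse=True)
for name, coeff in ranked:
    print(f"{name}: {coeff:+.4f}")
```

Raw coefficients depend on the units of each measure, so in practice it is usually more meaningful to compare each coefficient multiplied by a realistic change in its measure.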

Unfortunately, in practice, for large changes in the measures, this simple model is often limited by two phenomena: “non-linearity” and “interaction.” Non-linearity causes the influence coefficients to change (increase or decrease) for large changes in their independent measure. Interaction occurs when interdependencies develop between the various independent measures (they lose their independence).
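A small numerical illustration of this limitation, assuming a hypothetical results measure that is simply the product of two independent measures (the simplest form of interaction). The additive model predicts the change well for small steps and increasingly poorly for large ones.

```python
def y(x1: float, x2: float) -> float:
    """Hypothetical results measure with an interaction term."""
    return x1 * x2


def linear_prediction(x1: float, x2: float, dx1: float, dx2: float) -> float:
    """Additive model: a1*dx1 + a2*dx2, with a1 = dy/dx1 = x2 and a2 = dy/dx2 = x1."""
    a1, a2 = x2, x1
    return a1 * dx1 + a2 * dx2


def approximation_error(x1: float, x2: float, dx1: float, dx2: float) -> float:
    """Actual change in y minus the change predicted by the additive model."""
    actual = y(x1 + dx1, x2 + dx2) - y(x1, x2)
    return actual - linear_prediction(x1, x2, dx1, dx2)


# Same starting point, small versus large changes in both measures.
small = approximation_error(0.9, 0.8, 0.01, 0.01)
large = approximation_error(0.9, 0.8, 0.10, 0.10)
print(f"error for small changes: {small:.6f}")
print(f"error for large changes: {large:.6f}")
```

For this particular function the error equals dx1 times dx2, so a tenfold larger step produces a hundredfold larger error.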

Some Simple Examples

The exact mathematical function takes on different forms for different dependent measures and processes. Here are some examples:

Example 1:

The time required to execute a process from its start to its finish is called its cycle time. If the various x’s are the cycle times, t_j, for the internal process steps that lie on the “critical path,” then the total cycle time, t_T, is

$$t_T = \sum_{j} t_j$$
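As a sketch of Example 1, the code below finds the critical path through a small process and sums its step cycle times. The step names, durations, and precedence links are hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical process: step -> (cycle time in hours, predecessor steps)
steps = {
    "receive_order": (1.0, []),
    "credit_check": (4.0, ["receive_order"]),
    "pick_stock": (2.0, ["receive_order"]),
    "pack": (1.5, ["credit_check", "pick_stock"]),
    "ship": (0.5, ["pack"]),
}


@lru_cache(maxsize=None)
def earliest_finish(step: str) -> float:
    """Earliest time at which a step can finish, given its predecessors."""
    duration, preds = steps[step]
    return duration + max((earliest_finish(p) for p in preds), default=0.0)


def critical_path(last_step: str) -> list:
    """Walk back from the last step, always following the latest-finishing predecessor."""
    path = [last_step]
    while steps[path[-1]][1]:
        path.append(max(steps[path[-1]][1], key=earliest_finish))
    return list(reversed(path))


path = critical_path("ship")
total_cycle_time = sum(steps[s][0] for s in path)
print(" -> ".join(path), f"= {total_cycle_time} hours")
```

In this made-up process, `pick_stock` is off the critical path: shortening it changes nothing, and it only starts to matter if it grows longer than `credit_check`.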

Example 2:

The overall yield of a process depends on the sequential yield of the internal sub-process steps. Let’s say that if the process were perfect (no internal yield loss), it would produce 100 output units. If the actual yield in the first step is 90%, then only 90 potential outputs survive it to the next step. If that step’s yield were 80%, then only 80% of those 90, or 72, would make it to the next step, etc. Therefore, the overall yield is given by:

$$Y_T = Y_1 \times Y_2 \times \cdots \times Y_n = \prod_{j=1}^{n} Y_j$$

For Example 1 above, all of the influence coefficients have a constant value of 1; that is, any increase or decrease in a critical path cycle time simply adds or subtracts that change from the total cycle time. We could include non-critical path sub-process cycle times, but their influence coefficients would all be zero (until they became long enough to enter the critical path). On the other hand, for Example 2 it is straightforward to show that the influence coefficient is inversely proportional to that sub-process’ yield (a_Tj = Y_T/Y_j). What this means is that improving a low yielding process step by one percentage point (for example from 25% to 26%) has a greater impact on total yield than that same one-point improvement in a high yielding process step (going from say 95% to 96%). In other words, lower yield process steps have larger influence coefficients.
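The yield comparison can be checked with a few lines of arithmetic. This sketch assumes a two-step process whose step yields are the 25% and 95% figures used above; the function names are illustrative.

```python
from math import prod


def total_yield(step_yields):
    """Overall yield: the product of the sequential step yields."""
    return prod(step_yields)


def yield_influence(step_yields):
    """Influence coefficient of each step on total yield: a_Tj = Y_T / Y_j."""
    y_total = total_yield(step_yields)
    return [y_total / y_j for y_j in step_yields]


step_yields = [0.25, 0.95]  # one low-yield step, one high-yield step
base = total_yield(step_yields)
coeffs = yield_influence(step_yields)

# Improve each step by one percentage point and compare the gain in total yield.
gain_low = total_yield([0.26, 0.95]) - base
gain_high = total_yield([0.25, 0.96]) - base

print(f"total yield: {base:.4f}")
print(f"influence coefficients: {[round(c, 4) for c in coeffs]}")
print(f"gain from improving the low-yield step:  {gain_low:.4f}")
print(f"gain from improving the high-yield step: {gain_high:.4f}")
```

The gain from each one-point improvement is 0.01 times that step's influence coefficient, which is why the low-yield step dominates.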

In many manufacturing environments, process or manufacturing engineers know the mathematical relationships between the dependent and independent measures. Usually this knowledge is based on a physical or chemical theory of what’s happening in the process. When this is the case, these experts can help in the selection of those independent measures that are the principal drivers of change in any given results metric. Once identified, these process metrics generally represent the primary targets for improvement efforts and are tracked on the appropriate scorecard.

Empirically Determining Process Metrics

As a rule-of-thumb, low influence coefficient independent measures vastly outnumber the critical few (see Why Do Root Cause Analysis?). So trial-and-error is not a viable option. Finding the process metrics in practice often ends up requiring a mixture of both art and science.

When a theoretical equation does not exist or is not known, we need to resort to empirical observation. Total Quality Management (TQM) employs teams that apply the scientific methodology (the PDCA Cycle and the 7-Step Method) and basic analysis tools (the 7 QC Tools) for identification of the root causes (process metrics) of undesirable outcomes (results metrics). I’ve explained this process in more detail in my article “Are There Limits to TQM?”

The vast majority of process improvements can be discovered using these simple scientific tools. For more complex situations, three additional approaches are sometimes used: heuristic techniques, design of experiments (DOE), and simulation modeling.

Heuristic Methods

I once assisted a team trying to reduce defects in welded pipe used in the oil industry. The particular defect was called “hook cracks” since they had the shape of a fishhook. In stratifying defect data by shift, I discovered that one crew had significantly lower defect levels than the others. I narrowed it down to the welder operator and interviewed him in the hopes of documenting his “secret” so that this best practice could be shared with the others. Each welder setting was specified with a range determined by the industrial engineers. I asked him how he chose a setting from within these ranges and his answer was “I can tell by the sound the welder makes.” The other operators just tried to pick the mid-point. The IE’s response: “Sound has nothing to do with weld quality.”

A few months later I visited an identical pipe mill in Japan where the operators relied on an additional meter to adjust the mill settings. Using a microphone placed near the weld site and connected to a measuring instrument (a spectrum analyzer), their IEs had determined that if the sound frequency was within a certain range, a perfect weld was produced. Outside that range, the resulting product was defective. What was the defect? No one remembered at first since the discovery had been made several years before. Finding an old-timer, they came back with the answer: “Something called hook cracks.” Why should a good weld have a certain pitch to the sound it made? There was no accepted theoretical explanation; it simply worked. The Japanese IEs were willing to accept this heuristic observation while their American counterparts had discarded it as scientifically baseless.

As another example, Kano² observed an important non-linearity in the dependent measure that we call customer satisfaction. He classified the independent attributes that drive customer satisfaction (such as particular product features, price, availability, reliability, etc.) into four categories:

Independent measure type	Influence coefficient	Implication
One-dimensional (satisfier)	Constant	The more the better, the less the worse
Attractive (delighter)	Zero below a threshold, then positive and increasing with increasing x	Absence does not dissatisfy, increasing presence produces significantly greater satisfaction
Must-be (dissatisfier)	Negative and decreasing with increasing x to a value of zero at a threshold	Presence does not satisfy, increasing absence produces significantly greater dissatisfaction
Neutral (indifferent)	Relatively small or zero	Don’t care whether it’s there or not

To place each independent measure into one of these categories, Kano developed a structured multiple-choice survey tool. He then created a heuristic “decoder ring” for determining the measure type from the responses to paired questions. By understanding current performance and the type of measure, the user could then rank all of the independent measures by their improvement’s impact on customer satisfaction.
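A minimal sketch of how such a decoder can work, assuming the commonly published form of the Kano evaluation table. The table is not reproduced from the source, answer wording varies between publications, and the codes R (reverse) and Q (questionable) go beyond the four categories listed above.

```python
from collections import Counter

# Answer scale assumed for both questions of each pair:
#   functional:    "How do you feel if the feature is present?"
#   dysfunctional: "How do you feel if the feature is absent?"
ANSWERS = ["like", "expect", "neutral", "tolerate", "dislike"]

# Rows: functional answer; columns: dysfunctional answer (same order as ANSWERS).
# A = attractive, O = one-dimensional, M = must-be, I = indifferent,
# R = reverse, Q = questionable (contradictory answers).
EVALUATION_TABLE = [
    ["Q", "A", "A", "A", "O"],  # like
    ["R", "I", "I", "I", "M"],  # expect
    ["R", "I", "I", "I", "M"],  # neutral
    ["R", "I", "I", "I", "M"],  # tolerate
    ["R", "R", "R", "R", "Q"],  # dislike
]


def classify(functional: str, dysfunctional: str) -> str:
    """Decode one respondent's paired answers into a category code."""
    row = ANSWERS.index(functional)
    col = ANSWERS.index(dysfunctional)
    return EVALUATION_TABLE[row][col]


def dominant_category(responses) -> str:
    """Most frequent category across all respondents for one attribute."""
    counts = Counter(classify(f, d) for f, d in responses)
    return counts.most_common(1)[0][0]


# Hypothetical survey answers for a single product attribute.
responses = [
    ("like", "dislike"),
    ("like", "dislike"),
    ("like", "neutral"),
    ("neutral", "dislike"),
    ("like", "dislike"),
]
print(dominant_category(responses))
```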

In general, heuristic methods are based on empirical observation, not on any underlying mathematical theory. They are often discovered through gut feel or what I’ve called the “ins”: instinct, intuition, insight, inspiration, innovation, invention, etc. Their justification is therefore based on the fact that they simply work in practice. Although we preach “management by fact” it is important to also acknowledge that in many instances, and through mechanisms that we do not even understand, some people are able to see through process complexity and identify the underlying drivers.

Design of Experiments and the Taguchi Method

Another way to determine the influence coefficients would be to vary each of the independent measures over an appropriate range while holding all of the others constant and observing its effect on the dependent measure. By doing this we could also identify their range of independence. But in many instances, the number of required experiments would be impractical in both time and cost. Fortunately mathematicians have devised efficient experimental sequences in which we can vary more than one independent measure at the same time. The first to do this was Euler (1783) in what are called Latin Squares. Today such experimental schemes go under the name “Design of Experiments” or DOE. DOE is a popular tool used by six-sigma practitioners, and facility with it is usually a prerequisite for black-belt certification.
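To illustrate varying several independent measures at once, here is a sketch of a two-level full factorial experiment with main-effect estimates. The `response` function stands in for running the real process and is entirely hypothetical, as are its coefficients.

```python
from itertools import product


def main_effects(response, n_factors: int):
    """Run every combination of low (-1) and high (+1) settings and return,
    for each factor, mean response at high minus mean response at low."""
    runs = list(product([-1, 1], repeat=n_factors))
    results = [response(run) for run in runs]
    effects = []
    for j in range(n_factors):
        high = [r for run, r in zip(runs, results) if run[j] == 1]
        low = [r for run, r in zip(runs, results) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


def response(levels):
    """Hypothetical process: factor a matters most, c not at all,
    and a and b interact slightly."""
    a, b, c = levels
    return 10.0 + 3.0 * a - 1.0 * b + 0.0 * c + 0.5 * a * b


effects = main_effects(response, 3)
for name, effect in zip("abc", effects):
    print(f"main effect of {name}: {effect:+.2f}")
```

Every one of the eight runs contributes to every effect estimate, which is where the efficiency comes from. Fractional factorial designs, Latin squares, and Taguchi's orthogonal arrays reduce the number of runs further, at the cost of confounding some effects with others.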

Genichi Taguchi has attempted to demystify DOE by creating a somewhat simplified procedure that, although not as mathematically rigorous, usually gives an adequate answer. In doing so, he followed the example set by Walter Shewhart in his pioneering efforts (c. 1930) to bring statistical techniques to the shop floor environment.

Most statistical software packages now include a DOE and/or Taguchi Method capability (see for example Minitab, which is used in several six sigma initiatives). However, even with current software support, their use is beyond the capability of most improvement team members and requires expert assistance (e.g. staff statisticians or six sigma black belts). Fortunately, the vast majority of improvement efforts do not require this level of analysis in order to uncover the relevant independent measures.

Simulation Modeling

In a process simulation, we attempt to dynamically reproduce the important characteristics of a process in a computer model. By “running” the model, we can understand the complex interrelationships that exist within the process and test the effect of changes. Simple simulations are often done using spreadsheets such as Microsoft Excel or Lotus 1-2-3. For example, the columns in the spreadsheet might represent sequential times (e.g. months or quarters) while the formulas for each period’s cells depend on several results calculated for an earlier period. Many software packages have specialized structures that make them particularly suitable for certain types of process simulations.
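The spreadsheet pattern described here, in which each period's cells are computed from the previous period's results, can be sketched in a few lines. The backlog model, the order quantities, and the capacity figure below are hypothetical.

```python
def simulate_backlog(orders_per_period, capacity, initial_backlog=0.0):
    """Period-by-period model: each row depends on the backlog carried
    over from the previous row, like a spreadsheet column referencing
    the column to its left."""
    rows = []
    backlog = initial_backlog
    for period, orders in enumerate(orders_per_period, start=1):
        available = backlog + orders         # work waiting this period
        shipped = min(capacity, available)   # limited by capacity
        backlog = available - shipped        # carried into the next period
        rows.append(
            {"period": period, "orders": orders, "shipped": shipped, "backlog": backlog}
        )
    return rows


rows = simulate_backlog([80, 120, 150, 90], capacity=100)
for row in rows:
    print(row)
```

Changing one input, such as `capacity`, and rerunning the model is the code equivalent of editing a cell and watching the downstream columns recalculate.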

Flowcharting is an essential step in process improvement. Several of the current flowcharting software packages also include a simulation capability (I use Scitor Process) that is very helpful in finding internal leverage points, particularly when there are complex process flows and/or random variation is important.

Example 3:

A biotech company’s product involved a new medical procedure that required special approval from the patient’s insurance company for reimbursement. Long average approval times were having a serious adverse financial impact on the company. Furthermore, the variation (standard deviation) in approval times was also unacceptably high. What could they do to improve this result metric? There were many theories as to the root cause, most of which involved problems in someone else’s area. The process flow was complicated by many alternate paths and frequent “resubmittal loops.” A simulation of the process (using the Monte Carlo method), based on probable paths at each process node explained both the average and variation in approval time and pointed directly at the independent measures whose improvement would have the greatest impact.
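A Monte Carlo simulation of this kind can be sketched as follows. This is not the company's model: the single resubmittal loop, the exponential step times, and every probability and duration are assumptions made up for illustration.

```python
import random
import statistics


def simulate_approval_days(
    rng: random.Random,
    p_resubmit: float = 0.3,        # assumed chance a review ends in resubmittal
    review_mean_days: float = 10.0,  # assumed mean insurer review time
    rework_mean_days: float = 5.0,   # assumed mean time to prepare a resubmittal
    max_loops: int = 20,
) -> float:
    """Simulate one approval request through a review/resubmittal loop."""
    total = 0.0
    for _ in range(max_loops):
        total += rng.expovariate(1.0 / review_mean_days)
        if rng.random() >= p_resubmit:
            return total  # approved on this pass
        total += rng.expovariate(1.0 / rework_mean_days)
    return total  # safety cap on the number of loops


def summarize(p_resubmit: float, n_cases: int = 20000, seed: int = 1):
    """Mean and standard deviation of approval time over many simulated cases."""
    rng = random.Random(seed)
    samples = [simulate_approval_days(rng, p_resubmit=p_resubmit) for _ in range(n_cases)]
    return statistics.mean(samples), statistics.stdev(samples)


baseline_mean, baseline_sd = summarize(p_resubmit=0.3)
improved_mean, improved_sd = summarize(p_resubmit=0.1)
print(f"baseline: mean {baseline_mean:.1f} days, sd {baseline_sd:.1f} days")
print(f"improved: mean {improved_mean:.1f} days, sd {improved_sd:.1f} days")
```

Rerunning the model while changing one input at a time (here, the resubmittal probability) shows which independent measure moves both the average and the variation most.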

For complex processes that contain time lags as well as subjective variables, System Dynamics modeling can also be very valuable (here I use Vensim). System Dynamics modeling has the advantage that it can easily accommodate both non-linearity and interdependencies, although its successful use does take considerable modeling skill.

Example 4:

To successfully compete in a new market segment, an electronics company needed major improvements in its delivery performance. Stratification of late shipment data showed that the late-shipment rate was significantly higher in the last week of the quarter. Again, there were many theories as to why. A system dynamics model of the entire order fulfillment process (order receipt to payment by customer) uncovered the answer, and it was closely related to a similar phenomenon known as the end-of-quarter revenue “hockey stick.”

Shipment linearity implies that with constant revenues, one-thirteenth of the quarterly total accumulates each week. In many organizations, there is a shortfall and the revenue falls below this linear goal. Miraculously, in the last week or two of the quarter, a few heroes appear and through their superhuman efforts the target is achieved and they are appropriately rewarded. The shape of the resulting weekly cumulative revenue curve is much like that of a hockey stick, whence its name.

The model explained what was happening. The added revenue at the end of the quarter came from early shipments of large dollar orders not due until the first few weeks of the following quarter. With limited capacity, this was at the expense of many small orders due in that hectic end-of-quarter period. Even worse, once started, this practice triggered a perpetual cycle where only small quantity unfilled orders were due for shipment at the start of the next quarter thus creating that initial revenue shortfall. The solution: just as the cycle was started by a one-time action, it needed to be ended the same way -- just stop doing it! Unfortunately, this results in a temporary sales shortfall that only goes unnoticed if it is hidden by rapid revenue growth. By phasing the practice out over several quarters, the adverse revenue impact was minimized.

Without the use of a simulation model, it would have been difficult to identify either the root cause or a palatable corrective action plan.

Choosing the Scorecard Metric

If improving a particular results measure is a strategic goal, then improvement efforts should be focused on the process measures that will have the highest impact on its improvement. They are usually the process measures with the largest influence coefficients. What does that imply about choosing scorecard metrics?

Most scorecards that I’ve seen are heavily populated with results metrics. No doubt this results from the all too common management attitude: “I don’t care how you do it, just do it!” I strongly believe that ALL scorecard metrics must be directly actionable by their owner. Therefore, it’s the underlying process metrics, not the results metrics, that belong on a scorecard. If the improvement goals for the process metrics are achieved, then we can be assured that the desired results will follow, assuming we have identified these drivers correctly.

For example, dieters often tend to focus on their body weight (a results metric) rather than its independent measures: exercise along with calorie, protein, fat, and carbohydrate consumption. Nutritionists now believe that successful diets involve lifestyle (aka process) changes that act on these independent measures. Get them right and over time you will achieve and sustain your weight goal. I wonder to what extent this results focus explains the statistic that 95% of dieters fail to maintain their weight loss.

I would argue that results metrics only belong on a scorecard when their associated process metrics are on two or more subordinate scorecards. In this case, the job of the owner of the results metric is not its improvement, but sponsorship of the subordinate scorecards. That sponsorship includes guidance, monitoring and diagnosis, organizational troubleshooting, resourcing, communicating, etc. for the individuals and teams responsible for the subordinate scorecards. There is an important place for results measures, but it is mainly in the detection step in process control, not improvement.

The Japanese have a saying “Focus on process, not on results.” In no case is this truer than in the selection of scorecard metrics. The key to linking strategy to action is not the balanced scorecard itself; it is this underlying process focus.

1 The influence coefficient a_ij is given by the partial derivative of f_i with respect to x_j.

2 See for example: Shoji Shiba, Alan Graham, and David Walden, “A New American TQM: Four Practical Revolutions in Management” Productivity Press Inc., January 1993, ISBN: 1563270323, pg. 221.

Last modified: August 13, 2006