How many bidders does a typical public procurement procedure attract? The question seems straightforward, and the answer — an average — feels satisfyingly precise. In Country B, competitive procedures receive 3.40 bids on average; in Country A, just 2.35. Case closed: Country B has healthier procurement markets.
Except that this conclusion is almost certainly wrong.
The Category Problem Hiding in Plain Sight
The number of bids a procedure receives — 0, 1, 2, 3, or more — is not a continuum. These values represent qualitatively different events. Zero bids means market failure: no competition occurred, and the procedure failed to attract any response. One bid is often suspicious, suggesting the procedure may have been tailored to a specific firm. Two or more bids represents actual competition, though even here the intensity varies considerably.
Averaging across these categories is a category error dressed up as measurement. It is like computing the "average health status" of patients by assigning 0 to deceased, 1 to critical, 2 to stable, and 3 to healthy, then reporting that "average health is 2.1." The number is meaningless because it blends categorically distinct states.
Return to our country comparison. Country A shows 15% of procedures with zero bids, 10% with one bid, and 75% with two or more (averaging 3 bids in this competitive class). Country B shows 20% with zero bids, 15% with one bid, and only 65% with two or more — but when competition occurs, it averages 5 bids. Country B's higher overall average (3.40 vs 2.35) conceals the fact that it has substantially more market failures and suspicious single-bid procedures. Which market is healthier? The average refuses to tell us.
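The arithmetic is easy to verify. Here is a minimal Python sketch using the illustrative shares above (the country figures are hypothetical, as in the text) that computes each country's overall mean alongside the categorical breakdown it conceals:

```python
# Illustrative figures from the comparison above (hypothetical data).
# Each country: (share of 0-bid, share of 1-bid, share of 2+ bid procedures,
#                mean bids within the competitive 2+ class)
countries = {
    "Country A": (0.15, 0.10, 0.75, 3.0),
    "Country B": (0.20, 0.15, 0.65, 5.0),
}

for name, (p0, p1, p2plus, comp_mean) in countries.items():
    # The overall mean blends the three categories into one number...
    overall = p0 * 0 + p1 * 1 + p2plus * comp_mean
    # ...while the decomposition keeps them distinct.
    print(f"{name}: mean bids = {overall:.2f} | "
          f"zero-bid {p0:.0%}, single-bid {p1:.0%}, competitive {p2plus:.0%}")
```

Run it and Country B wins on the mean (3.40 vs 2.35) while losing on both failure categories: the whole argument in a few lines of arithmetic.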
Why the Average Persists
If the average is so misleading, why does it dominate procurement analysis? The answer lies in what I call the certainty illusion — and in a deeper institutional dynamic.
An average feels like a fact. A distribution feels like uncertainty. Kahneman's distinction between System 1 (fast, intuitive) and System 2 (slow, analytical) thinking is useful here: averages allow System 1 processing, while distributions demand the effortful engagement of System 2. We experience something akin to ambiguity aversion: we prefer a known point estimate to acknowledged variance, even when the variance is precisely what matters. The average offers the comfort of a single, comparable number that can be tracked, benchmarked, and placed on a dashboard. "Competition increased from 3.1 to 3.4 bids" is reportable. "The share of suspicious single-bid procedures dropped from 22% to 18% while zero-bid failures rose from 8% to 11%" requires explanation, nuance, and uncomfortable trade-offs.
But there is something deeper at work. The average solves a problem that the disaggregated data refuses to solve: it produces a ranking. Country A is better on failure rates; Country B is better on competitive intensity. The average cuts this Gordian knot by imposing a commensuration. That is not confusion — it is a decision dressed up as a calculation. (Herbert Simon's concept of bounded rationality applies here, but at the institutional rather than individual level.)
Here lies the hidden politics of the mean. By collapsing categories, the average smuggles in an implicit value judgment: that five bids on one procedure can compensate for another procedure receiving zero. Is that true? From a policy perspective, zero-bid procedures are not simply "low competition" — they are potential system failures that extra bids elsewhere cannot redeem. The average embeds a weighting scheme that nobody voted for and nobody made explicit.
A Better Path: Longitudinal Tracking Within Categories
If cross-sectional comparison demands an indefensible commensuration, what is the alternative? Policy makers, after all, cannot be expected to set explicit weights — let alone to justify them politically.
The more constructive approach is to track performance within each category over time. Rather than asking "Is Country A better than Country B?" — a question that forces false trade-offs — we ask "Is Country A improving on each dimension?" That question can be answered separately for each class without requiring us to collapse them into a scalar.
This approach has practical virtues. Trajectory is harder to game than level. If you are judged on the average, you can inflate it by attracting more bids to already-competitive procedures while ignoring the pathological cases. But if you are tracked on change in each category, improving your zero-bid rate from 20% to 15% is visible and distinct from improving your competitive intensity from 3 to 4 bids. The decomposition makes selective effort transparent.
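As a rough illustration of what this tracking involves, consider the sketch below. The bid counts are hypothetical and the `category_profile` helper is invented for the example; it summarises each year's procedures into the three classes and reports the change in each:

```python
def category_profile(bid_counts):
    """Summarise per-procedure bid counts into the three classes."""
    n = len(bid_counts)
    competitive = [b for b in bid_counts if b >= 2]
    return {
        "zero-bid rate": sum(1 for b in bid_counts if b == 0) / n,
        "single-bid rate": sum(1 for b in bid_counts if b == 1) / n,
        "competitive share": len(competitive) / n,
        "mean bids (2+ class)": (sum(competitive) / len(competitive)
                                 if competitive else float("nan")),
    }

# Hypothetical bid counts for one country in two consecutive years.
year_1 = [0, 0, 1, 2, 3, 3, 4, 0, 1, 5]
year_2 = [0, 1, 2, 3, 4, 4, 5, 2, 1, 6]

before, after = category_profile(year_1), category_profile(year_2)
for metric in before:
    # Each metric is reported as its own trajectory, never blended.
    print(f"{metric}: {before[metric]:.2f} -> {after[metric]:.2f}")
```

Each line of output is a separate trajectory; none can silently subsidise another, which is exactly the transparency the decomposition is meant to buy.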
Most importantly, this reframing transforms measurement into diagnosis. "Raise the average from 3 to 4 bids" is an outcome target with no causal pathway attached. It tells you what success looks like but nothing about what is wrong or what to do. A minister can nod at it and have no idea where to intervene.
By contrast, "We have 35% zero-bid procedures" is actionable. It immediately provokes the right questions: Are these concentrated in certain sectors? Is it a publicity problem? A specification problem? An unrealistic timeline? Each answer points to a different intervention. The categorical breakdown parses the problem into components that correspond to levers someone can actually pull.
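To make the diagnostic step concrete, here is one possible first cut (the records and sector names are hypothetical), grouping zero-bid procedures by sector to see whether the failures cluster:

```python
from collections import defaultdict

# Hypothetical records: (sector, number of bids received).
procedures = [
    ("construction", 0), ("construction", 0), ("construction", 1),
    ("IT services", 3), ("IT services", 4), ("IT services", 0),
    ("medical supplies", 2), ("medical supplies", 5), ("medical supplies", 3),
]

totals, failures = defaultdict(int), defaultdict(int)
for sector, bids in procedures:
    totals[sector] += 1
    if bids == 0:
        failures[sector] += 1

# A zero-bid rate concentrated in one sector points at a sector-specific
# cause (specifications, timelines, publicity) rather than a system-wide one.
for sector in totals:
    print(f"{sector}: {failures[sector] / totals[sector]:.0%} zero-bid")
```

In this toy data the failures cluster in construction, which suggests a sector-specific intervention; a flat rate across sectors would instead point to a system-wide cause.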
The Seduction We Ought to Resist
The cognitive seduction of the average is, at bottom, the seduction of false simplicity. It offers the appearance of rigour while obscuring the value judgments that meaningful comparison requires. It provides a number that everyone can agree on precisely because it avoids the hard questions about what we actually care about.
Stephen Jay Gould, facing a cancer diagnosis with a median survival time of eight months, wrote a celebrated essay — The Median Isn't the Message — about resisting the tyranny of summary statistics. He understood that he was not the median patient, and that the distribution's shape mattered far more than its central tendency. Most people lack Gould's statistical sophistication. But in public procurement, we have no such excuse. We know better. The question is whether we have the institutional courage to act on what we know.
In public procurement, as in so many domains, the path forward lies not in finding a better single metric but in accepting that some comparisons are irreducibly multidimensional — and that tracking progress within each dimension, over time, is both more honest and more useful than collapsing everything into a mean that means less than it appears.


