Security Metrics: Getting Aggregation Right
Lately I’ve been working a good bit on security metrics, and I wanted to share a short post on some lessons I’ve learned along the way.
First off, security metrics may not be super exciting to work on, but I do see them as important work. They make it possible for leadership and product teams alike to understand security posture without having to go talk to a subject matter expert directly, so they’re an effective way of scaling security.
But if we’re going to rely on them as a way of scaling security, it’s important to make sure they’re accurate. One specific pitfall I recently ran into is the question of how to aggregate security metrics, and how getting this wrong can lead to some dangerously misleading conclusions.
The Problem with Averaging Risk Scores
Let’s talk about a common scenario. You’ve got multiple services in your environment, each with its own risk score. Maybe you’re using CVSS, a custom framework, or something simple like High/Medium/Low translated to numbers (say 10, 5, and 1).
The temptation is to take the average. Your dashboard proudly displays: “Average Risk Score: 2.5 (LOW)” and everyone sleeps well at night.
But wait a minute. What if you have five nearly-perfect services with risk scores of 1, and one critical service with a score of 10? The arithmetic mean gives you $(1+1+1+1+1+10)/6 = 2.5$, a low risk score.
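To make that concrete, here’s a minimal sketch of the naive aggregation. The service names and the High=10/Medium=5/Low=1 mapping are just hypothetical examples:

```python
# Hypothetical risk ratings for six services, using the
# High=10 / Medium=5 / Low=1 mapping mentioned above.
RATING_TO_SCORE = {"high": 10, "medium": 5, "low": 1}

service_ratings = {
    "billing": "low",
    "search": "low",
    "profile": "low",
    "notifications": "low",
    "reporting": "low",
    "auth": "high",  # the one critical outlier
}

scores = [RATING_TO_SCORE[rating] for rating in service_ratings.values()]

# Naive arithmetic mean: (1 + 1 + 1 + 1 + 1 + 10) / 6
print(sum(scores) / len(scores))  # -> 2.5
```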
But intuitively, that doesn’t feel right. Attackers aren’t purely stochastic: they look for high-risk services and exploit their vulnerabilities. Adding more low-risk services doesn’t actually reduce overall risk. So the average risk score is actually quite misleading!
This is the problem with simple averages for risk: they obscure the outliers, and in security, outliers are exactly what we care about most. The weakest link in the chain determines the overall strength.
A Better Approach: The Generalized Mean
Instead of the arithmetic mean, I’ve been experimenting with the generalized mean as a more accurate way of aggregating risk. The generalized mean is defined as:
$$M_p(x_1, x_2, \ldots, x_n) = \left(\frac{1}{n} \sum_{i=1}^{n} x_i^p\right)^{1/p}$$
When $p=1$, this is just the familiar arithmetic mean. But as $p$ grows, the mean weights higher values more heavily, which is exactly what we want for risk scores.
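Here’s a minimal sketch of that formula in Python (`generalized_mean` is just a name I’m using for illustration):

```python
def generalized_mean(scores: list[float], p: float = 3.0) -> float:
    """Generalized (power) mean: (1/n * sum(x_i^p)) ** (1/p).

    p=1 recovers the arithmetic mean; larger values of p weight
    high scores more heavily. Assumes non-negative scores and
    a nonzero p.
    """
    if not scores:
        raise ValueError("need at least one score")
    return (sum(x ** p for x in scores) / len(scores)) ** (1 / p)
```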
Let’s look at our previous example, where we have five low-risk services and one high-risk service:
- Arithmetic mean ($p=1$): 2.5
- Generalized mean ($p=3$): ≈ 5.5
That higher score much better reflects the reality that we have a significant risk in our environment. And of course, the value of $p$ can be tuned here: a higher value of $p$ puts a higher emphasis on high-risk services.
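To get a feel for that tuning knob, here’s a quick sweep over a few values of $p$, reusing `scores` and the helper from above:

```python
for p in (1, 2, 3, 5, 10):
    print(f"p={p}: {generalized_mean(scores, p):.2f}")

# p=1:  2.50  (plain average; the outlier is washed out)
# p=2:  4.18
# p=3:  5.51
# p=5:  6.99
# p=10: 8.36  (approaching the max score of 10)
```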
Conclusion
Ultimately this comes down to the fact that security isn’t about averages; it’s about weakest links. By using the generalized mean with $p=3$ (or experimenting with different values of $p$ based on your risk tolerance), you get a much more accurate representation of your overall security posture.