Why don't investors measure the success of strategic asset allocation policy?

Institutional investors like to measure things. They measure the performance of their investment managers pretty much continuously. They measure portfolio positions. They measure holdings. They measure risk exposures. So why don’t they measure the success of strategic asset allocation policy, the decision that more than any other determines the long-term success or failure of an investment program?

Strategic asset allocation policy involves a trade-off

The explanation begins, I believe, with the observation that the strategic asset allocation policy decision is based on a trade-off. When choosing between (say) policy A and policy B, the investor considers three things: first, how the policies compare in terms of expected return; second, how they compare in terms of risk; and, finally, which combination of expected risk and return better fits what the investor is trying to achieve. We can label these three considerations return, risk, and risk tolerance.

The decision is made under conditions of uncertainty, relying on imperfect assumptions about the behavior of capital markets. This means that we may select policy A over policy B based on what we know at the time, but – as with most investment decisions – our model of the world is unlikely to prove exactly correct. Which is where post-hoc measurement comes in.

Hindsight is NOT always 20-20

Suppose we could fast forward to the end of the story and look back on how things turned out. It might seem like this would remove the uncertainty and make it possible to tell how good the strategic allocation decision was: did policy A turn out to be a better decision than policy B after all?

Hindsight is certainly clear when it comes to the first of the three considerations that went into the decision: return. After the event, we do indeed know with certainty how well a strategic asset allocation decision performed in terms of the outcome generated.

But when it comes to risk, we are only somewhat better informed after the event than before. Hindsight is not 20-20 in this case. A retroactive assessment of risk, just like a prospective assessment of risk, requires a risk model and some assumptions. So, unlike the measurement of returns, the measurement of risk remains subjective even with the benefit of hindsight. For example, if we elect to measure the volatility of a return series as a gauge of risk, then we are making the implicit assumption that the bumpiness of the path is a reliable indicator of the uncertainty of the destination, even though the two are different things.[1]
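The gap between the bumpiness of the path and the destination can be illustrated with a small sketch. The two return series below are made-up numbers, chosen only so that both paths end in the same place; the function name `terminal_wealth` is likewise just for illustration.

```python
import statistics


def terminal_wealth(returns, start=1.0):
    """Compound a series of periodic returns into ending wealth."""
    wealth = start
    for r in returns:
        wealth *= 1.0 + r
    return wealth


# Hypothetical periodic returns (illustrative only, not real data).
smooth_path = [0.0, 0.0, 0.0, 0.0]
bumpy_path = [0.25, -0.20, 0.25, -0.20]

# Realized volatility: sample standard deviation of the periodic returns.
smooth_vol = statistics.stdev(smooth_path)
bumpy_vol = statistics.stdev(bumpy_path)

print(f"smooth: vol={smooth_vol:.3f}, ending wealth={terminal_wealth(smooth_path):.3f}")
print(f"bumpy:  vol={bumpy_vol:.3f}, ending wealth={terminal_wealth(bumpy_path):.3f}")
```

Both paths finish where they started, yet one shows zero realized volatility and the other a great deal. Realized volatility describes the path that happened to be taken; treating it as a measure of how uncertain the destination was is a modeling assumption, not an observation.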

And hindsight is even less useful in assessing the third component of the decision: risk tolerance. As Don Ezra and I have written: “everything that has actually happened seems very obvious – well nigh inevitable in fact – from today’s perspective. And we cannot unknow what we know today when we attempt to judge how obvious these events were before they happened.”[2] Hence, our recollection of the crash of 2008 is inevitably colored by the subsequent recovery, which distorts how we remember the depths of uncertainty that investors felt in March 2009. In short, after the event we basically know no more – possibly even less – about our risk tolerance than we did when the original decision was made.

Can anything be done?

That’s my take on why strategic asset allocation decisions tend not to get evaluated. Others would point to the absence (in most cases) of an obvious neutral benchmark to which decisions can be compared, which is another complicating feature. So this is certainly a more difficult task than, say, evaluating the success of an active management mandate. But that doesn’t necessarily mean that there’s nothing that can be done. Perhaps the way forward could be based on the principles underpinning Brier scores, which are used to measure the success of forecasting in other fields.
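To make the Brier score idea concrete, here is a minimal sketch in Python. It uses the standard binary form of the score: the mean squared difference between a forecast probability and the 0/1 outcome, where 0 is a perfect score and lower is better. How such a score would best be mapped onto asset allocation decisions is an open question; the forecasts below (for instance, the probability assigned at decision time that policy A beats policy B over some horizon) and their outcomes are hypothetical numbers, not data.

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecast_probs) != len(outcomes):
        raise ValueError("forecast_probs and outcomes must have the same length")
    if not forecast_probs:
        raise ValueError("at least one forecast is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecast_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical: probabilities recorded when each decision was made...
forecasts = [0.7, 0.6, 0.8, 0.3]
# ...and whether the forecast event then occurred (1) or not (0).
outcomes = [1, 0, 1, 0]

score = brier_score(forecasts, outcomes)

# Reference point: a forecaster who always says 50%.
baseline = brier_score([0.5] * len(outcomes), outcomes)

print(f"Brier score: {score:.3f} (always-50% baseline: {baseline:.3f})")
```

With these made-up inputs the score comes out at about 0.145, against 0.25 for the always-50% baseline. The appeal of this approach is that it judges the quality of the probabilities stated before the event, rather than only the single outcome that happened to occur.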

That wouldn’t be easy, but it’s worth giving thought to. For example, I posted a few weeks ago about the widespread reliance on peer group league tables for target date fund (TDF) evaluation and how “peer group pressure comes not from the formal objectives of funds, but rather as an indirect side-effect of the way funds are, in practice, evaluated.” Peer group comparison looks only at returns, not at risk or at the effectiveness of the trade-off that is built into the glide path decision. But the asset allocation component of a TDF (which takes the form of a glide path) is a huge element in eventual success or failure. To do a better job of measuring TDFs, we need to do a better job of measuring the asset allocation decision.

It will be interesting to see how this plays out in the coming years.

[1] Another line of argument – which may appeal to some readers – runs as follows: before the event there are many possible paths, but in practice only one is actually taken. So, except in the case where that one outcome was not even recognized as a possibility, you do not necessarily know more after the event than you did before about what the range of possibilities was. Those who take that line might even conclude that, after the event, considerations of risk or risk tolerance become irrelevant: the actual outcome tells you only what path was taken, and nothing about what else could have happened.
[2] Collie, B. and D. Ezra (2007). “Resist the amygdala! Institutional investment decision making.” Russell Investments Research.