Distributions-3

Distributions: Part III ... continuing from Part II

thanks to Jerry S. for his comments

In Part I, we assumed that the Variance of a SUM was equal to the SUM of the Variances and, although that's a good assumption for uncorrelated/independent returns, we were considering a sum like: z₁+z₁z₂+z₁z₂z₃+...
(where the numbers z are annual Gain Factors like z = e^q > 0).

Alas, the terms z₁, z₁z₂, z₁z₂z₃ etc. are NOT uncorrelated (even if the individual gain factors like z₁, z₂, z₃ ARE uncorrelated).

>Is that why your so-called safe withdrawal rates were ~~lousy~~ so high, compared to Monte Carlo?
I think so.

>So, change it!

Okay, but recall (from earlier investigations) our magic formulas for determining the Survival Rate as a function of the Withdrawal Rate:

Assume that our Portfolio is invested in some asset(s) with Mean and Standard Deviation of returns of M and S and
the Inflation Rate is i (so I=1+i is the Inflation Factor) and
g₁, g₂, g₃ are the annual asset Gain Factors (each of the form 1+Return), so that, after n years,
the asset has increased by a factor g₁g₂...g_n (which is then the value of a $1.00 buy-and-hold Portfolio after n years).
Starting with a $1.00 portfolio and an initial withdrawal rate of W (increasing with inflation), the value of our portfolio after n withdrawals is
g₁g₂...g_n[1 - W*gMS(n)] where gMS(n) = I/g₁ + I²/g₁g₂ + I³/g₁g₂g₃ + ... + Iⁿ/g₁g₂...g_n.
Hence W*gMS(n) is the reduction factor for a buy-and-hold Portfoio ... reduced by virtue of our withdrawals.
Note that, in order to survive n years, we'll need gMS(n) < 1/W.
Let
[a] M = I e^{-(M - S²/2)} the Mean of I / g = (InflationFactor)/(GainFactor)
[b] S² = M²[e^S²-1] the Variance of I / g
[c] Mean = M (1 - Mⁿ) / (1 - M) the Mean of I/g₁+I²/g₁g₂+...+Iⁿ/g₁g₂...g_n = z₁+z₁z₂+z₁z₂z₃+...+z₁z₂...z_n
[d] SD = ? the Standard Deviation of z₁+z₁z₂+z₁z₂z₃+...+z₁z₂...z_n
Assuming a lognormal distribution of values for gMS(n), we calculate the probability that gMS(n) < 1/W and get the probability that our portfolio will survive n years with an initial withdrawal rate of W, (increasing with inflation), namely N[X] where N is the "standard" normal cumulative distribution function and
[e] X = log(1/W) - (1/2)log[Mean²/{ 1+SD² / Mean² }]

SQRT{log[1+SD²/Mean²]}
... and log is the natural logarithm.
If, for example, W = 0.05 (meaning a 5% withdrawal rate), then 1/W = 20 and Figure 0 displays typical results (depending upon n, the number of years of withdrawals).
The dots are on the gMS(n) cumulative distribution, evaluated at 1/W = 20.

Figure 0

We want to modify our formula for SD.

Here's what we'll do (reproducing, in part, what we got earlier):

Assume that the Mean and Standard Deviation of the variables z are M and S (so the Variance is S², as in [a] and [b], above).
Then the Mean of z₁z₂z₃...z_n is Mⁿ ... that's okay, for independent z-variables
The Mean of z₁ + z₁z₂ + z₁z₂z₃ + ... + z₁z₂z_n is M + M² + ... + Mⁿ (giving [c], above) ... that, too, is okay for independent z-variables
The Variance of z₁z₂z₃...z_n is given by (M²+S²)ⁿ - M²ⁿ ... we got this before and it's also okay for independent z-variables
BUT, what's SD², theVariance of z₁ + z₁z₂ + z₁z₂z₃ + ... + z₁z₂...z_n ???

>What did you get before?
We just added the Variances and got SD² = Σ [(M²+S²)^k - M^2k] the sum going from k = 1 to k = n.

>So what's a better formula for that Variance?
Okay. Remember that VAR[x] = (the Average Square) - (the Square of the Average) = Mean[x²] - Mean²[x] see stat stuff #6.

In particular, we can write:
[1] Mean[x²] = VAR[x] + Mean²[x].

Let's now consider VAR[x+y]. We have (see stat stuff #7):

VAR[x+y] = (the Average of (x+y)²) - (the Square of the Average of (x+y))
= VAR[x] + VAR[y] + 2Mean[xy] -2Mean[x]Mean[y]

If x and y were independent, then Mean[xy] = Mean[x]Mean[y] and we'd get our Variance of a Sum = Sum of Variances.

>You mean VAR[x+y] is NOT equal to VAR[x]+VAR[y]?
Not in our case.
We'll use

[2] VAR[x+y] = VAR[x] + VAR[y] + 2Mean[xy] -2Mean[x]Mean[y]

with x = z₁+z₁z₂+...+z₁z₂...z_k-1 and y = z₁z₂z_k-1...z_k.
This x and y ... they aren't independent!
Anyway, that'll give us a prescription for finding VAR[x+y] = VAR[z₁ + z₁z₂ +...+ z₁z₂...z_k-1 + z₁z₂...z_k]

>zzzZZZ

First, notice that the CoVariance of x and y is (see stat stuff #4):

COVAR[x,y] = Mean[xy] - Mean[x]Mean[y]

(See also Pearson Correlation )

Then we can rewrite [2] as:

[3] VAR[x+y] = VAR[x] + VAR[y] + 2COVAR[x,y]
or
[3A] SD²[x+y] = SD²[x] + SD²[y] + 2r SD[x]SD[y]

where r is the Pearson Correlation between x and y.

>Huh? If r = 1 then ... uh ...
Then SD²[x+y] = SD²[x] + SD²[y] + 2SD[x]SD[y] = (SD[x] + SD[y])²
so SD[x+y] = SD[x] + SD[y] (as in stat stuff #9).

>The Standard Deviation of a Sum is the Sum of the Standard Deviations?
Yes. Neat, eh? That's for perfect correlation and, if we look at some examples of a thousand uncorrelated random returns z₁, z₂, z₃ etc. we see that the correlation between z₁ and z₁z₂ and between z₁+z₁z₂ and z₁z₂z₃ etc. etc. ... we find that the Pearson Correlations are very high.

>What's "very high"?
I whipped out my Excel, generated a thousand random returns (which happened to be lognormally distributed) to represent the z-variables, calculated the Pearson Correlation between z₁ and z₁z₂, then between z₁+z₁z₂ and z₁z₂z₃ etc. etc.
and I did this again and again and again I got this
(which gives five examples of what I got):
Notice that, although the correlation between z₁ and z₂ is small, the correlation between x = z₁+z₁z₂+...+z₁z₂...z_k-1 and y = z₁z₂z_k-1...z_k is about 80%.

Figure 1

>So, what's your next attempt at the Standard Deviation of z₁+z₁z₂+z₁z₂z₃+... ?
I'm going to add the Standard Deviations of each term. Here's an example
where I generate a thousand random z₁, z₂, z₃ and z₄ and, from these,
calculate a thousand values for z₁+z₁z₂ and z₁+z₁z₂+z₁z₂z₃ and z₁+z₁z₂+z₁z₂z₃+z₁z₂z₃z₄
then compare the Standard Deviation of the Sum to the Sum of the Standard Deviations
(for all thousand sums).
>Sure, but can't you find an exact result?
Yes, but it'll be messy. Let's start with
Standard Deviation of a Sum = the Sum of the Standard Deviations.
After all, the proof ...
>Yeah, yeah. The proof is in the pudding.

Figure 2
Aah, but now my problem is to add the Standard Deviations:
SQRT[(M²+S²) - M²] +SQRT[(M²+S²)² - M⁴] +SQRT[(M²+S²)³ - M⁶] +...+SQRT[(M²+S²)ⁿ - M²ⁿ]

>Can you get a magic formula for that?
I'm thinking, but in the meantime ...

If you recall, from here, we used Variance of Sum = Sum of Variances ... and got :
SD²(gMS(n)) = SD²(z₁ + z₁z₂ + z₁z₂z₃ + ... +z₁z₂z₃...z_n) = Σ (M²+S²)^k - ΣM^2k
We even said, at that time: "If we don't like what we get, we'll change it."

>Now you'll change it, eh?
Yes. Now we'll try Standard Deviation of a Sum = Sum of the Standard Deviations.

That would give us:
[4] SD(gMS(n)) = SQRT[(M²+S²) - M²] +SQRT[(M²+S²)² - M⁴] +SQRT[(M²+S²)³ - M⁶] +...+SQRT[(M²+S²)ⁿ - M²ⁿ]

We could also write:
[4A] SD(gMS(n)) = M SQRT[(1+S²/M²) - 1] +M² SQRT[(1+S²/M²)² - 1] +...+Mⁿ SQRT[(1+S²/M²)ⁿ - 1].

Let's look (again!) at the relative size of these terms, for
M = 0.09 (that's a 9% return) and S = 0.15 (that's a 15% SD) and I = 1.03 (that's an inflation rate of i = 3%) we get:
M = I e^-(M-S²/2) = 0.9520
S² = I²e^-2(M-S²/2)[e^S²-1] = 0.0206

So M is about 0.95 and 1+S²/M² is about 1.02 and, if we look at the graph of the SUM
0.95 SQRT[1.02-1]+0.95² SQRT[1.02²-1]+0.95³ SQRT[1.02³-1]+...+0.95ⁿ SQRT[1.02ⁿ-1],
we get Figure 3 which shows the size of the terms ... and their SUM.
Our problem is to approximate the SUM graph, for n up to 40 (the number of years of withdrawals)
... as in Figure 4.

Figure 4

Figure 3

>Good luck!
Well, since S²/M² is small, we can approximate (1+S²/M²)^k-1 by kS²/M² ... since (1+x)^k = 1+kx, for x small.

>For k as large as 40?
Well, not really, but the the proof ...

>Forget the pudding!
Anyway, [4A] then becomes:
SD(gMS(n)) = M SQRT[S²/M²] +M² SQRT[2S²/M²] +...+Mⁿ SQRT[nS²/M²].
or
[4B] SD(gMS(n)) = S { 1+M SQRT[2]+M² SQRT[3]+...+M^n-1SQRT[n] }
This gives an approximation which is a mite too small
(as seen in Figure 5 where we used the earlier parameters) so we'll use, instead:
[4C] SD(gMS(n)) = {S/M} { 1+M SQRT[2]+M² SQRT[3]+...+M^n-1SQRT[n] }

Figure 5
Now our problem is to approximate this SUM.
You wouldn't happen to recognize the function f(x,n) = 1+2^1/2x + 3^1/2x² + ... +n^1/2x^n-1

>Are you kidding? Anyway, have you tried using 4A to compare ... uh ...?
To 4C? Yes. Figure 6 gives a comparison of withdrawal / survival rates using Equation 4A, 4B and 4C and, for comparison, using a Monte Carlo simulation with randomly selected annual returns (from the S&P 500 returns from 1928 to 2000) as well as randomly selected returns from a normal or lognormal distribution with the same Mean and Standard Deviation. (For the former I used the spreadsheet described here and, for the latter, I used the spreadsheet described here.)
>An average return of 12.7%?
For the S&P 500? Yes.
>And the Monte Carlo stuff?
That's the result of 5000 Monte Carlo simulations with the same Mean and SD ...
>The S&P curve is crooked.
Well, I used 5000 simulations for each withdrawal rate.
It'd be smoother if I had used 100,000 simulations and ...
>Are you happy?
Hardly. Our survival rates are too high and ...

Figure 6

>And the Monte Carlo curves don't even agree with the S&P curve!
Yes. The S&P curve uses the actual returns (selected at random, without assuming anything about their distribution) whereas the other Monte Carlo curves assume a particular distribution.

>I'd say all your approximations are lousy! The proof is in the pudding? Well ...
Yeah. Don't remind me.
But I'm still thinking ...

Okay, let's look (again!) at that magic formula for the Survival Rate, namely N[X] with:
[A] X = log(1/W) - (1/2)log[Mean²/{ 1+SD² / Mean² }]

SQRT{log[1+SD²/Mean²]}
where W = 0.04 (meaning a 4% Withdrawal Rate), n = years = 30
... and "typical" values of Mean and/or SD
(as represented by the S&P500) ... to see how that Survival Rate depends upon these parameters.
Note that (according to Figure 7) the Survival Rate decreases as Mean or SD increase.
Since our formula gives Survival Rates that are too large, we can decrease our calculated Rates by multiplying each of Mean and SD by some number, say C > 1. Then the modified formula becomes:
[B] X = log(1/W) - logC - (1/2)log[Mean²/{ 1+SD² / Mean² }]

SQRT{log[1+SD²/Mean²]}

Since log(1/W) - logC = log(1/CW), this has the effect of changing W to a larger value: CW.
For example, if we use [B] instead of [A] (and [4C] to provide a value for SD)
with C = 1.20, we get Figure 8 (which we can compare with Figure 6).
>You picked your C to get good results, eh?
Why not?

Figure 7

Figure 8

>Are you sure that the gMS distribution is lognormal?
Okay. I did a 5000 iteration Monte Carlo simulation, randomly selecting either Lognormal or Normal returns to calculate gMS(30)
(with 3% inflation) and I got Figure 9 ... which shows a lognormal distribution with the same Mean and Standard Deviation as gMS(30).

Figure 9

>You did 5000 simulations. Is that enough?
Here's a picture with 10,000 simulations
(compliments of Jerry S.)

>Just what are you plotting?
In Figure 10? It's the distribution of 10,000 values of gMS(30).
Note that the percentage of gMS values less than 1/W = 1/0.04 = 25
(for a 4% withdrawal rate) gives the survival rate. In Figure 10, it's 93%.

>So you're happy with that assumption. I mean, assuming that ...
That the distribution is lognormal? Yes, I'm happy about that.

Figure 10

>So, is there a spreadsheet?
Uh ... yes, such as it is ... but it's not finished ... I guess. It uses [4B], I think.
However, if you'd like to play, just RIGHT-click and Save Target here to download the .ZIPd file.

Recently Walt D. sent me a 1999 paper by McCabe who considers survival rates when n = infinity (meaning you live forever).
In particular, McCabe had a slick way of calculating SD for infinite n.
Following McCabe we write:
[5] z₁ + z₁z₂ + z₁z₂z₃ + ... = z₁(1 + z₂ + z₂z₃ + z₂z₃z₄ + ... ) two infinite series

Note that the right side of [5] has the form xy with x = z₁ and y = 1 + z₂ + z₂z₃ + ...
Note, too, that x and y are uncorrelated.

We'll refer to the Mean and Variance of x and y as M[x], M[y] and VAR[x], VAR[y].
For these series, VAR[xy] = VAR[z₁ + z₁z₂ + z₁z₂z₃ + ... ]
is the same as VAR[z₂ + z₂z₃ + z₂z₃z₄ +... ] 'cause they'r the same infinite series when it comes to distributions
is the same as VAR[1 + z₂ + z₂z₃ + + z₂z₃z₄ + ... ] see stat stuff #2
which is VAR[y]. We make a note of this:
[6A] VAR[xy] = VAR[y].

Compare this to:
[6B] VAR[xy] = M²[x]VAR[y] + M²[y]VAR[x] + VAR[x]VAR[y] see stat stuff #10

>I like 6A!
Yes, me too. The simpler version [6A] is a result of the fact that we're dealing with an infinite series
so the distribution of z₁ + z₁z₂ + ... is the same as the distribution of z₂ + z₂z₃ +...

Proceeding, we put [6A] into [6B] and solve for VAR[xy] = VAR[y] giving the neat result:
[6C] VAR[y] = M²[y]VAR[x] / (1 - M²[x] - VAR[x])

Using the notations in [a], [b], [c] and [d] above:

M[x] = M[z₁] = M the Mean of the terms in gMS
M[y] = 1+M[z₂ + z₂z₃ + ...] = 1 + Mean note that Mean is the Mean of gMS
VAR[x] = S² the Variance of the terms in gMS
VAR[y] = SD² the Variance of gMS

Substituting into [6C], we get:

[7] SD² =

(1+Mean)²S²

1 - M² - S²

But S² = M²[e^S²-1] from [b], above, so 1 - M² - S² = 1 - M² e^S²
so

[7A] SD² =

(1+Mean)²S²

1 - M² e^S²

If we again take our survival rate as N[X], with X defined by [e], namely:
X = log(1/W) - (1/2)log[Mean²/{ 1+SD² / Mean² }]

SQRT{log[1+SD²/Mean²]}
and we let n infinity so that
Mean = M (1 - Mⁿ) / (1 - M) M / (1 - M) we can compare the survival rates for large values of n (like n = 60) and n = infinity ... as in Figure 11A.

>Is the red curve McCabe's formula.
No. It's our N[X] using [7A] for SD. The other coloured curves use [4B].
McCabe's formula gives the thin black line in Figure 11A.

>Why use [4B]?
Look at Figure 6. There's little to choose between [4A], [4B] and [4C] and ...

>And [4B] is simplest?
Yes. It approximates (1+x)ⁿ by 1+nx which isn't a good approximation for large n, but then the large-n terms contribute very little to gMS.

>And for larger withdrawal rates?
See Figure 11B for n = 30 years.

>So, is there a spreadsheet?
Soon ...

Figure 11A

Figure 11B

... don't trust nothin' - yet but see more Monte stuff.