Standard Deviation

Standard Deviation ... some thoughts

suggested by John B.

When doing the calculations necessary to generate Bollinger Bands one normally ...

>Bollinger?
Don't you remember? From the above link:

Collect stock prices for the past N days: P₁, P₂, ... P_N
Calculate M, their Mean (or Average) via:
M = (1/N) (P₁ + P₂ + ... + P_N) = (1/N)ΣP_m
where Σ means Sum the terms and P₁ is the Price N days ago and P_N the most recent Price
Calculate SD, the Standard Deviation of this set of Prices via:
Variance = SD² = (1/N)(P₁² + P₂² + ... + P_N²) - M² = (1/N)ΣP_m² - [(1/N)ΣP_m]²
(the average square) minus (the square of the average) ... which is the same as (1/N)Σ(P_m - M)²
Pick a small number k (example k = 1 or maybe 1.5 or maybe 2) and calculate:
the Upper Bollinger Band via: BU = M + k SD
the Lower Bollinger Band via: BL = M - k SD
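
The recipe above can be sketched in a few lines of Python. This is only an illustration: the function name `bollinger` and the five sample prices are made up here (they are not GE data), and the population SD (divide by N) is used because that's what the formulas above use.

```python
import math

def bollinger(prices, k=2.0):
    """Return (mean, sd, upper, lower) for one window of prices.

    Uses the population SD (divide by N), matching the formulas above.
    """
    n = len(prices)
    mean = sum(prices) / n
    # (the average square) minus (the square of the average)
    variance = sum(p * p for p in prices) / n - mean ** 2
    sd = math.sqrt(max(variance, 0.0))  # guard against tiny negative round-off
    return mean, sd, mean + k * sd, mean - k * sd

# Hypothetical 5-day window, for illustration only (not GE data)
window = [10.0, 10.5, 10.2, 10.8, 11.0]
m, sd, bu, bl = bollinger(window, k=2.0)

# The "which is the same as" claim: variance as the mean squared deviation
alt_variance = sum((p - m) ** 2 for p in window) / len(window)
print(m, sd, bu, bl, alt_variance)
```

For a moving band, the same function would be applied to each successive N-day window.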

Here's an example: GE stock, from November, 2000 to July, 2003 and ...

>So? N = ? k = ?
Patience! In Figure 1, I used N = 20 days and k = 2.0 Standard Deviations.
>And the stock price bounces off the Bollinger Bands, eh?
That's the idea.
To see the geometry more clearly (in this example), we can blow up a part of the chart:

Figure 2

Figure 1

>Yeah, but isn't that unusual? I mean, the Standard Deviation of Stock Prices?
Yes, but it works pretty well, eh? It suggests when to BUY or SELL.

However, one normally calculates (and plays with) the Standard Deviation of Stock Returns, not Stock Prices, but ...

>Don't tell me! That's the purpose of this tutorial, right?
Right. To investigate the relationship between the SD of Stock Prices and the SD of Stock Returns. For the GE example above, we get Figure 3.

In what follows, we'll be considering the daily stock Gain Factors rather than stock Returns.
By Gain Factors I mean g₁ = P₁/P₀ and g₂ = P₂/P₁ etc.

Figure 3

Let's suppose that:

The Stock Price, N+1 days ago, was P₀
Let the daily Gain Factors over the past N days be g₁, g₂, ... g_N
Let G_m be the cumulative Gain Factor over the first m days, so
G_m = g₁g₂...g_m.
Then the Stock Prices are:
P₁ = P₀ g₁ = P₀ G₁
P₂ = P₀ g₁g₂ = P₀ G₂
P₃ = P₀ g₁g₂g₃ = P₀ G₃
...
P_N = P₀ g₁g₂...g_N = P₀ G_N
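
As a small sketch of the bookkeeping above, here is how the daily Gain Factors and the cumulative Gain Factors might be pulled out of a price list, and how the prices come back from P₀ and the G's. The prices are hypothetical, chosen only for illustration.

```python
from itertools import accumulate
from operator import mul

# Hypothetical prices P0..P4, for illustration only
prices = [10.0, 10.1, 9.9, 10.3, 10.4]

# Daily gain factors g_m = P_m / P_(m-1)
gains = [b / a for a, b in zip(prices, prices[1:])]

# Cumulative gain factors G_m = g_1 * g_2 * ... * g_m
cumulative = list(accumulate(gains, mul))

# Rebuild P_1..P_N from P0 and the cumulative gain factors
rebuilt = [prices[0] * G for G in cumulative]
print(gains)
print(rebuilt)
```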

The question is:

If we know the Mean and SD of the g's, what are the Mean and SD of g₁ + g₁g₂ + g₁g₂g₃ + ... + g₁g₂g₃...g_N ?

>Huh? Why is that the question?
Because, whereas the set g₁, g₂, g₃, ... are the daily Stock Gain Factors, the daily Stock Prices depend upon g₁, g₁g₂, g₁g₂g₃, ... and we're interested in determining the relationship between ...

>Yeah, I remember: the relationship between the g's and the G's.
Right, because the G's are products of the g's, like g₁g₂g₃...

We assume we know the parameters for the daily Gain Factors, g₁, g₂, g₃, ...assumed to be random variables:

Mean(g) = g
SD(g) = s

Okay, now we have the following magic formula:

As noted here:
{ Mean(xy) - Mean(x) Mean(y) } = SD(x)SD(y) R
where R is the Pearson correlation between the random variables x and y.

We'll assume that our Gain Factors g_m are independent, so R = 0 and Mean(g_i g_j) = Mean(g_i) Mean(g_j) for i ≠ j
... and that extends to a product with umpteen terms. That is, the Mean of a Product is the Product of the Means.

>The g's are independent? Can you assume that?
Why not? We could incorporate all the cross-correlations but day-to-day gains (usually) have small correlation.
For example, the Pearson correlation between successive daily gains, for the GE example, above, is about -1.6% so it seems reasonable (and greatly simplifies the math) ...
>Yeah, yeah. Sounds like a math gimmick to me.
A math gimmick, eh? Yes, analysts ~~always~~ often use such gimmicks, like assuming a Lognormal distribution. It makes life easier.
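
To check the "small correlation" claim on any series, one could compute the Pearson correlation between each day's gain factor and the next day's, using the magic formula rearranged as R = {Mean(xy) - Mean(x)Mean(y)} / (SD(x)SD(y)). The sketch below assumes population SDs; the helper names and the eight gain factors are hypothetical (they are not the GE data, so don't expect -1.6%).

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))  # population SD

def pearson(xs, ys):
    """R = {Mean(xy) - Mean(x)Mean(y)} / (SD(x) SD(y))."""
    mxy = mean([x * y for x, y in zip(xs, ys)])
    return (mxy - mean(xs) * mean(ys)) / (sd(xs) * sd(ys))

# Hypothetical daily gain factors, for illustration only (not GE data)
gains = [1.004, 0.991, 1.012, 0.998, 1.003, 0.995, 1.007, 1.001]

# Correlation between each day's gain factor and the next day's
lag1 = pearson(gains[:-1], gains[1:])
print(lag1)
```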

Anyway, if the g's are independent, we can say that Mean(g₁g₂... g_m) = Mean(g₁) Mean(g₂)...Mean(g_m)
(where the g's are random numbers selected from some distribution).
Then: Mean(g₁g₂... g_m) = g^m.
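
Here's one way to convince yourself of Mean(g₁g₂...g_m) = g^m without any random sampling: pick a small discrete distribution for the daily Gain Factor and enumerate every possible m-day sequence exactly. The three-point distribution below is an assumption made purely for this sketch.

```python
from itertools import product
from math import prod

# Hypothetical distribution for one day's gain factor: value -> probability
dist = {0.98: 0.25, 1.00: 0.25, 1.02: 0.50}
g = sum(v * p for v, p in dist.items())  # Mean(g)

def mean_of_product(m):
    """Exact Mean(g1*g2*...*gm) for m independent draws from dist."""
    total = 0.0
    for combo in product(dist.items(), repeat=m):
        value = prod(v for v, _ in combo)  # g1*g2*...*gm for this sequence
        prob = prod(p for _, p in combo)   # independence: probabilities multiply
        total += value * prob
    return total

for m in (1, 2, 3, 4):
    print(m, mean_of_product(m), g ** m)
```

The two printed columns should agree up to floating-point round-off.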

Hence, the expected Mean of N successive cumulative Gain Factors = Mean{(1/N)[g₁ + g₁g₂ + g₁g₂g₃ + ... + g₁g₂g₃...g_N]} = (1/N)[g + g² + g³ + ... + g^N]
and we have a formula for the sum of that series:
Mean of N successive Gain Factors = (1/N) Mean(g₁ + g₁g₂ + g₁g₂g₃ + ... + g₁g₂g₃...g_N) = (1/N)[g(g^N - 1)/(g - 1)]
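
The series formula is easy to check numerically. The sketch below sums the terms one by one and compares with the closed form; the function names are made up for this example, and the special case g = 1 (where the closed form would divide by zero) is handled separately.

```python
def mean_gain_direct(g, n):
    """(1/N)[g + g^2 + ... + g^N], summed term by term."""
    return sum(g ** j for j in range(1, n + 1)) / n

def mean_gain_closed(g, n):
    """(1/N)[g(g^N - 1)/(g - 1)]; the g = 1 case is handled separately."""
    if g == 1.0:
        return 1.0  # every term equals 1, so the average is 1
    return g * (g ** n - 1.0) / (g - 1.0) / n

g, n = 1.01, 20
direct = mean_gain_direct(g, n)
closed = mean_gain_closed(g, n)
print(direct, closed)
```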

Okay, now on to the Standard Deviation:
For independent random variables x and y, we have another magic formula:
SD²(x*y) = Mean²(x)SD²(y) + Mean²(y)SD²(x) + SD²(x)SD²(y)
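
This second magic formula can be verified exactly for small discrete distributions, by listing every (x, y) pair and weighting it by P(x)P(y). The two distributions below are hypothetical and chosen only to keep the enumeration short.

```python
# Two hypothetical independent discrete random variables: value -> probability
x_dist = {0.9: 0.3, 1.0: 0.4, 1.2: 0.3}
y_dist = {0.95: 0.5, 1.10: 0.5}

def mean(dist):
    return sum(v * p for v, p in dist.items())

def var(dist):
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

# Exact moments of the product x*y, using independence: P(x, y) = P(x)P(y)
pairs = [(x * y, px * py) for x, px in x_dist.items() for y, py in y_dist.items()]
mean_xy = sum(v * p for v, p in pairs)
var_xy = sum(p * (v - mean_xy) ** 2 for v, p in pairs)

# SD²(x*y) = Mean²(x)SD²(y) + Mean²(y)SD²(x) + SD²(x)SD²(y)
formula = (mean(x_dist) ** 2 * var(y_dist)
           + mean(y_dist) ** 2 * var(x_dist)
           + var(x_dist) * var(y_dist))
print(var_xy, formula)
```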

We can write this as:
[*] SD²(g₁g₂...g_{m-1}g_m) = Mean²(g₁g₂...g_{m-1}) SD²(g_m) + Mean²(g_m) SD²(g₁g₂...g_{m-1}) + SD²(g₁g₂...g_{m-1}) SD²(g_m)
or
[*] SD²(g₁g₂...g_{m-1}g_m) = (g²+s²) SD²(g₁g₂...g_{m-1}) + g^{2m-2} s²

>This looks awfully familiar.
Yes, we did it here, but we'll repeat it just to avoid going back and forth:
Using [*] and proceeding step-by-step, we get:

[!!]

Product          Mean²     SD²
g₁               g²        (g²+s²) - g² = s²
g₁g₂             g⁴        (g²+s²)² - g⁴
g₁g₂g₃           g⁶        (g²+s²)³ - g⁶
...              ...       ...
g₁g₂g₃...g_m     g^{2m}    (g²+s²)^m - g^{2m}
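
The step-by-step use of [*] can be replayed in code: start with a variance of zero (for the "empty" product, which is just the constant 1), apply the recursion m times, and compare with the closed form in the last row of the table. The values of g and s are hypothetical.

```python
g, s = 1.01, 0.02  # hypothetical Mean(g) and SD(g)

var_prev = 0.0     # SD² of the empty product (the constant 1)
rows = []
for m in range(1, 11):
    # [*]: SD²(G_m) = (g² + s²) SD²(G_(m-1)) + g^(2m-2) s²
    var_m = (g * g + s * s) * var_prev + g ** (2 * m - 2) * s * s
    closed = (g * g + s * s) ** m - g ** (2 * m)
    rows.append((m, var_m, closed))
    var_prev = var_m

for m, step, closed in rows:
    print(m, step, closed)
```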

To get the Variance (or SD²) for N cumulative Gain Factors, we could consider adding the variances for each (in the right-most column, above) but ...

>I assume that the variance of a sum is the sum of the variances?
Only for independent random variables, but the terms g₁, g₁g₂, g₁g₂g₃ etc. are NOT independent.
Finally, then, we have: SD²(g₁ + g₁g₂ + g₁g₂g₃ + ... + g₁g₂g₃...g_N) = Σ (g²+s²)^j - Σ g^{2j} (j running from j = 1 to j = N)

>And these g₁g₂ and g₁g₂g₃ ... they're all independent?
Uh ... not really, but for starters we'll assume that they are.

>Don't you have an "altogether now"?
Note that, when multiplied by the starting Stock Price P₀, the cumulative Gain Factors G₁ = g₁, G₂ = g₁g₂ etc. are just the successive daily Stock Prices.

Altogether now:

[A]
If the daily stock Gain Factors, g₁, g₂, ... are uncorrelated random variables with Mean(g) = g and SD(g) = s
then:
Mean of N successive Stock Prices = (P₀/N)[g(g^N-1)/(g-1)]
and
Variance of N successive Stock Prices = P₀² [ (g²+s²) {(g²+s²)^N - 1} / (g²+s² - 1) - g² {g^{2N} - 1} / (g² - 1) ]
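
For anyone who'd rather not punch [A] into a calculator, here is a minimal sketch of the two formulas as Python functions. The function names are made up for this example, and both assume g ≠ 1 (and g²+s² ≠ 1) so that the denominators don't vanish.

```python
def mean_price(p0, g, n):
    """(P0/N)[g(g^N - 1)/(g - 1)], assuming g != 1."""
    return (p0 / n) * g * (g ** n - 1.0) / (g - 1.0)

def variance_price(p0, g, s, n):
    """P0² [a(a^N - 1)/(a - 1) - b(b^N - 1)/(b - 1)], with a = g² + s², b = g².

    Assumes a != 1 and b != 1.
    """
    a = g * g + s * s
    b = g * g
    return p0 ** 2 * (a * (a ** n - 1.0) / (a - 1.0) - b * (b ** n - 1.0) / (b - 1.0))

# Hypothetical inputs: P0 = $10, g = 1.01, s = 0.01, N = 20 days
mp = mean_price(10.0, 1.01, 20)
vp = variance_price(10.0, 1.01, 0.01, 20)
print(round(mp, 2), vp)
```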

>You've used a magic formula to add up Σ (g²+s²)^j - Σ g^{2j} ?
Yes.

>I figure there's a lot of hand-waving there. Can you provide some real life ...?
Let's check out the efficacy of these magic formulas, okay?
>Efficacy?
Pay attention.

We generate a set of 20 randomly selected daily returns
from a lognormal distribution with Mean = 1% and SD = 1%.
Using the magic formula above,
namely (P₀/N)[g(g^N - 1)/(g - 1)]
where P₀ = $10, N = 20 days and daily Gain Factor g = 1+DailyReturn = 1.01
we get a (formula-generated) Mean Stock Price as:
Mean Stock Price (over 20 days) = (10/20)[1.01(1.01²⁰ - 1)/(1.01 - 1)] = $11.12
Now we calculate the actual Mean of this set of 20 Stock Prices (starting at P₀ = $10.00).
Finally, we calculate the percentage error between the actual Mean and the formula-generated Mean of $11.12
Then we repeat all of the above steps umpteen times ... to see how good (or bad) the formula is.
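
The experiment just described might be sketched like this. Two things are assumptions of the sketch, not taken from the tutorial: the daily Gain Factor (1 + return) is treated as lognormal with mean 1.01 and SD 0.01, and the percentage error is measured as (formula - actual)/actual. The seed is fixed only so that a rerun gives the same dozen numbers.

```python
import math
import random

P0, N, G, S = 10.0, 20, 1.01, 0.01

# Assumption: the gain factor (1 + return) is lognormal with mean G and SD S
sigma = math.sqrt(math.log(1.0 + (S / G) ** 2))
mu = math.log(G) - 0.5 * sigma ** 2

formula_mean = (P0 / N) * G * (G ** N - 1.0) / (G - 1.0)

def one_trial(rng):
    """Simulate N daily prices and return the % error of the formula mean."""
    price, prices = P0, []
    for _ in range(N):
        price *= rng.lognormvariate(mu, sigma)
        prices.append(price)
    actual_mean = sum(prices) / N
    return 100.0 * (formula_mean - actual_mean) / actual_mean

rng = random.Random(12345)  # fixed seed so the run is repeatable
errors = [one_trial(rng) for _ in range(12)]
print(round(formula_mean, 2))
print([round(e, 2) for e in errors])
```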

Figure 4

For a dozen sets of 20-day Stock Prices, we get Figure 4.
>Yeah, so the formula gives a reasonable estimate for the Mean of Stock Prices, but what about GE and what about SD and ... ?
Patience, but I should point out that the formula for the SD is ... uh, lousy.
Look at Figure 5, for the GE example we started with. It shows, in blue, the actual moving 20-day Average Stock Price and, in red, the Average according to the above formula. That is, we take the Mean daily Gain Factor for the past 20 days (that's g) then we use the formula to estimate the average stock Price, namely:
(P₀/N)[g(g^N - 1)/(g - 1)]
>Huh? That's g? What's g?
Yes. If the Stock Price 21 days ago was P₀ = $39.40 and the average daily gain (over the past
N = 20 days) is 0.234%, then we use (for that 20-day period)
g = 1+AverageGain = 1.00234 and (P₀/N)[g(g^N - 1)/(g - 1)]
and plot the point (39.40/20)[1.00234(1.00234²⁰ - 1)/(1.00234- 1)] = $40.38 ... in red.
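
The day-by-day procedure behind the red curve might look like the sketch below: for each day, take the last N+1 prices, average the N daily Gain Factors to get g, take the first price in that window as P₀, and apply the formula. The function names are hypothetical; the single point computed at the end uses the numbers quoted above.

```python
def formula_mean(p0, g, n):
    """(P0/N)[g(g^N - 1)/(g - 1)]; falls back to P0 when g is exactly 1."""
    if g == 1.0:
        return p0
    return (p0 / n) * g * (g ** n - 1.0) / (g - 1.0)

def rolling_formula_mean(prices, n=20):
    """For each day t >= n, estimate the n-day average price from the formula.

    P0 is the first price in the (n+1)-price window ending on day t;
    g is the average of the n daily gain factors in that window.
    """
    out = []
    for t in range(n, len(prices)):
        window = prices[t - n:t + 1]  # n + 1 prices -> n gain factors
        gains = [b / a for a, b in zip(window, window[1:])]
        g = sum(gains) / n
        out.append(formula_mean(window[0], g, n))
    return out

# The single point worked out above: P0 = $39.40, g = 1.00234, N = 20
point = formula_mean(39.40, 1.00234, 20)
print(round(point, 2))
```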

Figure 5

Figure 6

>Mamma mia! Something smooth, like the 20-day moving average, turns into ...
Something scary ... which explains why the SD = Volatility formula needs some work.
>So what does it look like, for the GE example?
Can't you see Figure 6? There, I looked at the actual SD (for Prices) over the previous 20 days and compared it with the formula-generated SD (using, each day, the 20-day moving average daily Gain Factor g ... and SD = SQRT[Variance] from [A], above)
Although the Formula Mean stays close to the actual Mean (over 20 days, as in Figure 5), those wee oscillations are pretty wild and they generate a wild and wooly volatility.
>But the volatility in Prices is pretty wild too.
True, so maybe trying to generate a formula which mimics the actual Price volatility is pointless.

>So is it always that way or maybe with some other stock or ...
Okay. Consider these:

Figure 7

>Aha! Look at the S&P 500! The formula gives a smoother SD!
Hmmm. Interesting, eh?

>And how do Bolli Bands work ... for the S&P?

Figure 8 shows the S&P and 20-day, 2-SD Bollinger Bands and ...

>Yeah, so why can't you conjure up a formula which generates actual Price volatilities?
Uh ... senility?

>Besides, I thought you wanted to compare two Volatilities: Gains and Prices.
Instead you're comparing your Price-Volatility formula with the actual Price-Volatility.
Oh yeah ... I forgot.

Figure 8

Figure 9

Okay, let's assume the Standard Deviation of Gains is s and we stare intently at the formula for the Standard Deviation of Prices (from [A], above), namely:

SD(Prices) = P₀ SQRT[ (g²+s²) {(g²+s²)^N - 1} / (g²+s² - 1) - g² {g^{2N} - 1} / (g² - 1) ]

If we pick a few daily Gain Factors (averaged over 20 days ... that's g) and pick P₀ = $1.00 then see how SD(Prices) varies with s... that's SD(Gains) ... we get Figure 9.

Remember, g is the average Gain Factor, namely 1+AverageReturn.

>It looks linear, eh?
Yes, for these particular parameters. We're talking about DAILY parameters
... and g is close to "1" and s is small.

>But it doesn't change much with the average daily gain. All the curves are ...
Close together? Yes, so let's analyze.

If we put g = 1+R, where R is the (small) 20-day average daily return, then we can use the fact that, for small values of x and y,
(1+x)^m = 1+mx (approximately) and (1+x)(1+y) = 1+x+y (approximately).

We use these in the magic formula, putting:

g^m = (1+R)^m = 1+mR (approximately)
g²+s² = 1+2R+s² (approximately)
(g²+s²)^N = (1+2R+s²)^N = 1+2NR+Ns² (approximately).

The magic formula:
SD(Prices) = P₀ SQRT[ (g²+s²) {(g²+s²)^N - 1} / (g²+s² - 1) - g² {g^{2N} - 1} / (g² - 1) ]
then becomes:
SD(Prices) = P₀SQRT[ (1+2R+s²) (2NR+Ns²)/(2R+s²) - (1+2R)(2NR)/(2R)]
or
SD(Prices) = P₀SQRT[N(1+2R+s²) - N(1+2R)]
or
SD(Prices) = P₀SQRT[Ns²]
or
SD(Prices) = P₀SQRT[N]s = P₀SQRT[N]SD(Gains)
so
SD(Prices) / SD(Gains) = P₀SQRT[N]
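
A first-order shortcut like this is worth a numerical sanity check, because the neglected terms get divided by small quantities such as 2R + s². The sketch below evaluates the magic formula in its un-summed form, Σ[(g²+s²)^j - g^{2j}], for a hypothetical R = 0.1% per day and a few values of s, then compares it with P₀SQRT[N]s and with an alternative multiplier, SQRT[N(N+1)/2]. That alternative is not from the tutorial: it is what one gets by keeping the next term in each power, (g²+s²)^j - g^{2j} ≈ j s² for g near 1.

```python
import math

def sd_prices_exact(p0, g, s, n):
    """SD(Prices) from [A], written as the sum it came from."""
    a, b = g * g + s * s, g * g
    return p0 * math.sqrt(sum(a ** j - b ** j for j in range(1, n + 1)))

P0, N, g = 1.00, 20, 1.001   # g = 1 + R, with a hypothetical R = 0.1% per day

results = []
for s in (0.005, 0.010, 0.020):
    exact = sd_prices_exact(P0, g, s, N)
    first_order = P0 * math.sqrt(N) * s                  # the result derived above
    alternative = P0 * s * math.sqrt(N * (N + 1) / 2)    # keeps the next-order term
    results.append((s, exact / s, exact / first_order, exact / alternative))

for row in results:
    print(row)
```

In this sketch the ratio SD(Prices)/s hardly moves as s changes, so the proportionality to SD(Gains) holds up well. The multiplier is another matter: for these parameters the exact expression comes out roughly three times larger than P₀SQRT[N]s and within a couple of percent of P₀ s SQRT[N(N+1)/2], which suggests the first-order expansion loses terms that matter. Treat the SQRT[N] factor as a rough scale rather than a precise constant.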

Moral?
We might expect that the Standard Deviation of Prices is a multiple of the Standard Deviation of Gains, the multiplier being proportional to the square root of N (the number of days, say N = 20) and the initial Price P₀ (that'd be 21 days ago).

>I thought we were talking 20 days, not 21.
Uh, you're right. It's 20 days ago, but we're considering a total of 21 days 'cause you need 21 prices to calculate 20 daily gains.

>Aah, I see. And the average daily gain g disappears.
It's not a daily gain, it's a daily gain factor, but yes, in this small-return-approximation it cancels out which explains why the various graphs in Figure 9 are so close together.

Figure 10
>So the volatility of Prices is proportional to the volatility of Gains. Do you believe that?
Well, it changes day to day because that starting price, P₀, changes.
>And that's it? You've finished?
Well, suppose we look at our formula-generated Price Volatility and plot it against the actual Gain Volatility, except we'll "normalize" the Price Volatility by dividing (at each daily calculation) by P₀SQRT[N] ... so the changing prices (as we move day to day) don't influence the comparison. Then, for the GE example, we get Figure 10.
>I haven't the faintest idea of what Figure 10 is saying!
Then have a nap. I think I will ...

>But 98% correlation! Wow!
Don't get too excited. That's the correlation between our SD formula for Prices (divided by
P₀SQRT[N]) and the Standard Deviation of the actual Stock Gains, namely s ... and we've already shown that they're (almost) proportional so we'd expect ...
>But what about the correlation between actual SD(Prices) and actual ...?
Like Figure 11?
>You're comparing formulas for Gain volatilities and Price volatilities and actual Gain volatilities and Price volatilities. It's confusing!
Then have a nap.
>zzzZZZ

Figure 11
Of course, y'all can play with a spreadsheet which looks like this.
