Variance Stuff

Variance of a Special SUM

We assume that g₁, g₂, g₃, ... are independent random variables with Mean and Variance (= StandardDeviation²) given by:

M[g_i] = g
VAR[g_i] = V = s²   for every i = 1, 2, 3, ...

We want to determine the Variance of a Special Sum, namely:

[1a] SUM(n) = g₁ + g₁g₂ + g₁g₂g₃ + ... + g₁g₂g₃...g_n

>What's so special about ...?
Pay attention.
We'll assume that the gs are the daily Gain Factors for some stock price.

If the gs are daily Gain Factors, that is g₁ = P₁/P₀, g₂ = P₂/P₁, ... g_n = P_n/P_(n-1) where the P's are the stock prices, then SUM(n) is the sum of stock prices over the past n days, assuming the price started at $1.00 n days ago ... or it's the sum of the prices over the next n days, if today's price is $1.00. In what follows we'll assume that the starting price P₀ = $1.00.

For convenience, we'll set: G_m = g₁g₂g₃...g_m for m = 1, 2, 3, ... n (This corresponds to the price after m days: G_m = P_m/P₀)
We can then write:

[1b] SUM(n) = G₁ + G₂ + G₃ + ... + G_n
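
To make [1a] and [1b] concrete, here is a minimal Python sketch. The three gain factors are made-up numbers for illustration only, and the function names are my own, not part of the original write-up.

```python
from itertools import accumulate
from operator import mul


def cumulative_gain_factors(gains):
    """Return [G_1, ..., G_n] where G_m = g_1 * g_2 * ... * g_m."""
    return list(accumulate(gains, mul))


def special_sum(gains):
    """SUM(n) = G_1 + G_2 + ... + G_n, as in [1b]."""
    return sum(cumulative_gain_factors(gains))


# Hypothetical gain factors for three days (illustration only).
gains = [1.5, 2.0, 0.5]
print(cumulative_gain_factors(gains))  # the prices after 1, 2, 3 days if P0 = $1.00
print(special_sum(gains))              # the sum of those prices
```

With P₀ = $1.00, the list of G's is just the price history, so SUM(n) is the sum of the prices.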

Since we're assuming that the gs are independent, then the Mean of G_m = g₁g₂g₃...g_m is g^m.

>Huh?
Okay, we'll recall some magic Stat Stuff regarding the Mean, Variance, Standard Deviation and Covariance of random variables (which we'll call M[x], VAR[x], S[x] and COVAR[x,y]):
Stat Stuff
If x, y, x₁, x₂ etc. are random variables and C is a constant, then:

1. M[x+y] = M[x] + M[y] and M[x+C] = M[x] + C since M[C] = C
2. VAR[x] = S²[x] = M[(x-M[x])²] = M[x²] - (M[x])² so VAR[x-M[x]] = VAR[x+C] = VAR[x]
3. COVAR[x,y] = M[xy] - M[x] M[y] = COVAR[x+C, y]
   and notice that COVAR[x,x] = VAR[x]
4. VAR[x+y] = VAR[x] + VAR[y] + 2 COVAR[x,y] = VAR[x] + VAR[y] + 2 r(x,y) S[x]S[y]
   where r(x,y) = COVAR[x,y] / (S[x]S[y]) is the Correlation Coefficient
5. VAR[x₁+x₂+...+x_m] = ΣVAR[x_i] + 2 ΣCOVAR[x_j, x_k]
   i = 1 to m, k = 2 to m and j < k
6. COVAR[x₁+x₂+...+x_n, y] = COVAR[x₁,y]+COVAR[x₂,y]+...+COVAR[x_n,y]
If x and y (or x₁, x₂, ... x_m) are independent, so that COVAR[x,y] = 0 and r(x,y) = 0, then:

M[xy] = M[x] M[y] and M[x₁x₂...x_m] = M[x₁]M[x₂]...M[x_m]
VAR[x+y] = VAR[x] + VAR[y] and VAR[x₁+x₂+...+x_m] = VAR[x₁]+VAR[x₂]+...+VAR[x_m]
VAR[xy] = M²[x]VAR[y] + M²[y]VAR[x] + VAR[x]VAR[y]
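
The product rule VAR[xy] = M²[x]VAR[y] + M²[y]VAR[x] + VAR[x]VAR[y] is the least familiar of these, so here is a small exact check. It is only a sketch: the two discrete distributions are invented for the illustration, x and y are assumed independent, and exact fractions are used so there is no rounding to argue about.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """Mean of a discrete distribution given as a list of (value, probability)."""
    return sum(p * v for v, p in dist)


def var(dist):
    """Variance of a discrete distribution given as a list of (value, probability)."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist)


def product_dist(dx, dy):
    """Distribution of x*y when x and y are independent."""
    return [(vx * vy, px * py) for (vx, px), (vy, py) in product(dx, dy)]


# Hypothetical distributions (illustration only).
x = [(Fraction(1), Fraction(1, 2)), (Fraction(3), Fraction(1, 2))]
y = [(Fraction(2), Fraction(1, 4)), (Fraction(6), Fraction(3, 4))]

lhs = var(product_dist(x, y))
rhs = mean(x) ** 2 * var(y) + mean(y) ** 2 * var(x) + var(x) * var(y)
print(lhs, rhs, lhs == rhs)
```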

In addition, we'll need some other magic formulas:
Magic Formulas

1. (1 + x)ⁿ = 1 + nx approximately, for x small and n not too large (so that nx is small).
2. 1 + 2 + 3 + ... + n = n(n + 1)/2
3. 1 + x + x² + ... + x^(m-1) = (x^m - 1) / (x - 1)
4. 1 + 2x + 3x² + ... + (m-1)x^(m-2) = [(m-1)x^m - m x^(m-1) + 1] / (x - 1)²
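
If you don't trust magic, the two geometric-type formulas can be checked term by term. This is a minimal sketch with function names of my own choosing. The value x = 101/100 is an arbitrary test value, and exact fractions are used so the comparisons are exact.

```python
from fractions import Fraction


def geometric_direct(x, m):
    """1 + x + x^2 + ... + x^(m-1), summed term by term."""
    return sum(x ** k for k in range(m))


def geometric_closed(x, m):
    """Closed form (x^m - 1)/(x - 1)."""
    return (x ** m - 1) / (x - 1)


def weighted_direct(x, m):
    """1 + 2x + 3x^2 + ... + (m-1)x^(m-2), summed term by term."""
    return sum(k * x ** (k - 1) for k in range(1, m))


def weighted_closed(x, m):
    """Closed form [(m-1)x^m - m x^(m-1) + 1]/(x - 1)^2."""
    return ((m - 1) * x ** m - m * x ** (m - 1) + 1) / (x - 1) ** 2


x = Fraction(101, 100)  # arbitrary test value
for m in (2, 5, 12):
    print(m,
          geometric_direct(x, m) == geometric_closed(x, m),
          weighted_direct(x, m) == weighted_closed(x, m))

# The first formula is only an approximation: compare (1+x)^n with 1+nx.
small, n = 0.0004, 40
print((1 + small) ** n, 1 + n * small)
```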

>Can I just bypass the math and go directly to the result ... please?
Well ... okay. Click here.

Continuing ... to get the Variance of the SUM(n) we'll use Stat Stuff #5:

[2a] VAR[G₁ + G₂ + ... + G_n] = ΣVAR[G_i ]+ 2 ΣCOVAR[G_j, G_k] where i goes from 1 to n and the latter sum is for j < k and k goes from 2 to n

>Huh? Do you really expect me to ...?
Okay, in all its grandeur, it looks like:
VAR[G₁ + G₂ + ... + G_n] = VAR[G₁]+VAR[G₂]+...+VAR[G_n]
+ 2COVAR[G₁, G₂]
+ 2COVAR[G₁, G₃]+ 2COVAR[G₂, G₃]
+ 2COVAR[G₁, G₄]+ 2 COVAR[G₂, G₄]+ 2COVAR[G₃, G₄]
...
+ 2COVAR[G₁, G_n]+ 2COVAR[G₂, G_n]+ ...+ 2COVAR[G_(n-1), G_n]

Consider COVAR[G_j,G_k]. Remember that k = 2, 3, ... n and j < k.
From Stat Stuff #3:
COVAR[G_j,G_k] = M[G_jG_k] - M[G_j]M[G_k] (Mean of the Product - the Product of the Means)
But the g's are independent, so that
M[G_k] = M[g₁g₂...g_k] = M[g₁]M[g₂]...M[g_k] = g^k (Mean of a Product = the Product of the Means)
So we can rewrite: COVAR[G_j,G_k] = M[G_jG_k] - g^(j+k) so ...

>If Mean of a Product equals the Product of the Means, why isn't that COVAR[G_j,G_k] zero?
Because G_j and G_k aren't independent since G_k contains all the factors of G_j ... and more!

>Huh?
Pay attention:
Consider the term M[G_jG_k] = M[(g₁g₂...g_j)*(g₁g₂...g_k)] = M[(g₁g₂...g_j)² g_(j+1)g_(j+2)...g_k] noting that j is less than k.

Now we use "Mean of a Product equals the Product of the Means" because (g₁g₂...g_j)² and g_(j+1)g_(j+2)...g_k are independent:
M[(g₁g₂...g_j)² g_(j+1)g_(j+2)...g_k] = M[(g₁g₂...g_j)²] M[g_(j+1)g_(j+2)...g_k]

However, for the second factor we have:
M[g_(j+1)g_(j+2)...g_k] = M[g_(j+1)]M[g_(j+2)]...M[g_k] = g^(k-j).

For the first factor we use Stat Stuff #2: Mean[x²] = (Mean[x])²+ VAR[x] with x = G_j = g₁g₂...g_j and get:
Mean[(g₁g₂...g_j)²] = (Mean[g₁g₂...g_j])² + VAR[g₁g₂...g_j] = (g^j)² + VAR[g₁g₂...g_j]
where (again!) the Mean of a Product = the Product of the Means (since the g's are independent).

It might look more elegant if we rewrite this like so:
Mean[G_j²] = (Mean[G_j])² + VAR[G_j] = g^(2j) + VAR[G_j]

>Let's forget elegance, okay?
Putting it all together:
COVAR[G_j,G_k] = M[G_jG_k] - M[G_j]M[G_k]
= M[(g₁g₂...g_j)² g_(j+1)g_(j+2)...g_k] - g^(j+k)
= M[G_j²] M[g_(j+1)g_(j+2)...g_k] - g^(j+k)
= [g^(2j) + VAR[G_j]] g^(k-j) - g^(j+k)
= VAR[G_j] g^(k-j)
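
The result COVAR[G_j,G_k] = VAR[G_j] g^(k-j) is exact (no approximation yet), so it can be checked exactly. The sketch below assumes each daily gain factor takes one of two made-up values with equal probability. That distribution is purely hypothetical, chosen so that every outcome can be enumerated.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical two-point distribution for each daily gain factor: (value, probability).
DIST = [(Fraction(9, 10), Fraction(1, 2)), (Fraction(6, 5), Fraction(1, 2))]


def expectation(func, days):
    """Exact M[func(g_1..g_days)], enumerating every outcome of independent draws."""
    total = Fraction(0)
    for outcome in product(DIST, repeat=days):
        values = [v for v, _ in outcome]
        prob = prod(p for _, p in outcome)
        total += prob * func(values)
    return total


def G(values, m):
    """Cumulative gain factor G_m = g_1 * g_2 * ... * g_m."""
    return prod(values[:m])


def covar_G(j, k):
    """COVAR[G_j, G_k] = M[G_j G_k] - M[G_j] M[G_k], computed by brute force."""
    m_jk = expectation(lambda v: G(v, j) * G(v, k), k)
    m_j = expectation(lambda v: G(v, j), k)
    m_k = expectation(lambda v: G(v, k), k)
    return m_jk - m_j * m_k


def var_G(j):
    """VAR[G_j] = COVAR[G_j, G_j]."""
    return covar_G(j, j)


g = sum(p * v for v, p in DIST)  # the Mean of a single gain factor
j, k = 2, 4
print(covar_G(j, k), var_G(j) * g ** (k - j))
```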

So far we have:

VAR[G₁ + G₂ + ... + G_n] = ΣVAR[G_i] + 2 ΣVAR[G_j] g^(k-j)

But we know those VAR[G_i] for each i = 1, 2, 3, ... n
>We do?
Yes, we did it here and it looks like this:

[3] VAR[G_m] = VAR[g₁g₂g₃...g_m] = (g²+s²)^m - g^(2m)

For typical parameters, namely daily Gain Factors and Standard Deviations, we'd have g = 1+r with r small (r is the daily return, say 0.01 or less) so g is close to "1" and s small (say 0.02 or less) ... so s/g is small ... so we can use Magic Formula #1, like so:

VAR[G_m] = (g²+s²)^m - g^(2m) = g^(2m)[ (1+s²/g²)^m - 1 ] ≈ g^(2m)[ (1+ms²/g²) - 1 ] = m g^(2m-2) s²

This says that (approximately), the Standard Deviation of m-day gains is SQRT(m g^(2m-2) s²) = SQRT(m) g^(m-1) s.
That's just the 1-day Standard Deviation, s, increased by a factor: the square root of the time period SQRT(m) ... a familiar result.

>And increased by g^(m-1), too.
Yes. That's like applying the average 1-day Gain Factor m-1 times.
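
To see how much the approximation costs, here is a short sketch comparing the exact [3] with the approximate m g^(2m-2) s². The parameter values g = 1.01 and s = 0.02 are just the "typical" sizes mentioned above, and the function names are mine.

```python
def var_G_exact(m, g, s):
    """Exact VAR[G_m] from [3]: (g^2 + s^2)^m - g^(2m)."""
    return (g * g + s * s) ** m - g ** (2 * m)


def var_G_approx(m, g, s):
    """Approximate VAR[G_m] = m g^(2m-2) s^2, via (1+x)^m = 1 + m x."""
    return m * g ** (2 * m - 2) * s * s


g, s = 1.01, 0.02  # typical daily values, as assumed in the text
for m in (1, 5, 20, 50):
    exact = var_G_exact(m, g, s)
    approx = var_G_approx(m, g, s)
    print(m, exact, approx, approx / exact)
```

The ratio in the last column drifts below 1 as m grows, which is why the final formula is only claimed for n not too large.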

Anyway, [2a] becomes ...

>We're talking approximation, right?
Yes, but I won't keep repeating that word. Anyway, [2a] becomes ... approximately:
VAR[G₁ + G₂ + ... + G_n] = ΣVAR[G_i] + 2 ΣCOVAR[G_j, G_k]
= ΣVAR[G_i] + 2 ΣVAR[G_j] g^(k-j)
= Σ[ i g^(2i-2) s² ] + 2 Σ[ j g^(2j-2) s² ] g^(k-j)
= (s²/g²) Σ[ i g²ⁱ ] + 2(s²/g²) Σ[ j g^(k+j) ] where i = 1 to n, k = 2 to n and j < k

[!] VAR[G₁ + G₂ + ... + G_n] = ΣVAR[G_i] + 2 ΣVAR[G_j] g^(k-j) = (s²/g²) Σ[ i g²ⁱ ] + 2(s²/g²) Σ[ j g^(k+j) ] approx
i from 1 to n, k from 2 to n and j < k (meaning j = 1, 2, ... k-1)

From [!], we have two sums to evaluate: Σ[ i g²ⁱ ] and Σ[ j g^(k+j) ]

Evaluating Σ[ i g²ⁱ ]

We have:
Σ[ i g²ⁱ ] = g² + 2g⁴ + 3g⁶ + ... + n g²ⁿ
= x + 2x² + 3x³ + ... + n xⁿ = x (1 + 2x + 3x² + ... + n x^(n-1)) where x = g² ... and we have a magic formula for that sum
= x [n xⁿ⁺¹ - (n+1)xⁿ + 1]/(x-1)²
= g² [n g²ⁿ⁺² - (n+1)g²ⁿ + 1]/(g²-1)²

Magic Formula #4 was used (with m = n+1): 1 + 2x + 3x² + ... + n x^(n-1) = [n xⁿ⁺¹ - (n+1)xⁿ + 1] / (x-1)²
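
A quick check of this first sum, as a sketch: the closed form is compared with brute-force addition, using exact fractions and an arbitrary test value g = 101/100 (my choice, not from the text).

```python
from fractions import Fraction


def first_sum_direct(n, g):
    """g^2 + 2 g^4 + 3 g^6 + ... + n g^(2n), added term by term."""
    return sum(i * g ** (2 * i) for i in range(1, n + 1))


def first_sum_closed(n, g):
    """g^2 [n g^(2n+2) - (n+1) g^(2n) + 1] / (g^2 - 1)^2."""
    x = g * g
    return x * (n * x ** (n + 1) - (n + 1) * x ** n + 1) / (x - 1) ** 2


g = Fraction(101, 100)  # arbitrary test value
for n in (1, 2, 10, 40):
    print(n, first_sum_direct(n, g) == first_sum_closed(n, g))
```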

Evaluating Σ[ j g^(k+j) ]

>That looks awful. How about in all its grandeur, eh?
Okay, in all its grandeur, it looks like:
Σ[ j g^(k+j) ] = [g³] + [g⁴+2g⁵] + [g⁵+2g⁶+3g⁷] + [g⁶+2g⁷+3g⁸+4g⁹] + ... + [gⁿ⁺¹+2gⁿ⁺²+3gⁿ⁺³+...+(n-1)g^(2n-1)]
= g³ + g⁴[1+2g] + g⁵[1+2g+3g²] + ... + gⁿ⁺¹[1+2g+3g²+...+(n-1)g^(n-2)] for n-1 terms
= (g³+g⁴+g⁵+...+gⁿ⁺¹) + (g⁴+g⁵+g⁶+...+gⁿ⁺¹)2g + (g⁵+g⁶+...+gⁿ⁺¹)3g² + ... + gⁿ⁺¹(n-1)g^(n-2)
where we've collected the terms multiplying 1 then 2g then 3g² etc. ... ending with the term multiplying (n-1)g^(n-2)
= g³[1+g+g²+...+g^(n-2)] + 2g⁵[1+g+g²+...+g^(n-3)] + 3g⁷[1+g+g²+...+g^(n-4)] + ... + (n-1)g^(2n-1)[1]
= g³[(g^(n-1) - 1)/(g-1)] + 2g⁵[(g^(n-2) - 1)/(g-1)] + 3g⁷[(g^(n-3) - 1)/(g-1)] + ... + (n-1)g^(2n-1)[(g-1)/(g-1)]
where we've used another magic formula: 1 + x + x² + ... + x^(m-1) = (x^m - 1)/(x-1)
= [ [gⁿ⁺²-g³] + [2gⁿ⁺³-2g⁵] + [3gⁿ⁺⁴-3g⁷] + [4gⁿ⁺⁵-4g⁹] + ... + [(n-1)g²ⁿ - (n-1)g^(2n-1)] ] / (g-1)
= [ gⁿ⁺²[1+2g+3g²+...+(n-1)g^(n-2)] - g³[1+2g²+3g⁴+...+(n-1)g^(2n-4)] ] / (g-1)
where we've collected like terms
= [ gⁿ⁺²[(n-1)gⁿ - n g^(n-1) + 1] / (g-1)² - g³[(n-1)g²ⁿ - n g^(2n-2) + 1] / (g²-1)² ] / (g-1)
where we've used Magic Formula #4 with x = g and again with x = g²
= [ { (n-1) g²ⁿ⁺² - n g²ⁿ⁺¹ + gⁿ⁺² } (g+1)² - (n-1) g²ⁿ⁺³ + n g²ⁿ⁺¹ - g³ ] / [ (g - 1)(g² - 1)² ]
where we've taken a (g² - 1)² out, to the right
= ...
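
With this much algebra it's worth a sanity check. The sketch below compares the last closed form for Σ[ j g^(k+j) ] with the double sum added up directly (k from 2 to n, j from 1 to k-1). Exact fractions are used, and g = 101/100 is an arbitrary test value of my own.

```python
from fractions import Fraction


def double_sum_direct(n, g):
    """Sum of j * g^(k+j) for k = 2..n and j = 1..k-1, added term by term."""
    return sum(j * g ** (k + j) for k in range(2, n + 1) for j in range(1, k))


def double_sum_closed(n, g):
    """The closed form reached at the end of the derivation."""
    numerator = (
        ((n - 1) * g ** (2 * n + 2) - n * g ** (2 * n + 1) + g ** (n + 2)) * (g + 1) ** 2
        - (n - 1) * g ** (2 * n + 3)
        + n * g ** (2 * n + 1)
        - g ** 3
    )
    return numerator / ((g - 1) * (g * g - 1) ** 2)


g = Fraction(101, 100)  # arbitrary test value
for n in (2, 3, 10, 40):
    print(n, double_sum_direct(n, g) == double_sum_closed(n, g))
```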

>Can't you just give the final result?!
Okay, we've calculated both sums ... so here it is:

[!!!] VAR[G₁ + G₂ + ... + G_n] = s² [n g²ⁿ⁺² - (n+1)g²ⁿ + 1]/(g²-1)²
+ 2(s²/g²) [ {(n-1) g²ⁿ⁺² - n g²ⁿ⁺¹ + gⁿ⁺²} (g+1)² - (n-1) g²ⁿ⁺³ + n g²ⁿ⁺¹ - g³ ] / [ (g-1)(g²-1)² ]
where
g₁, g₂, ... g_n are random Gain Factors over n days,
they are from a distribution with Mean = g and Standard Deviation = s,
G_m = g₁g₂...g_m are the cumulative Gain Factors
and the formula is good for daily gains and n not too large (say n < 50)
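
Since nobody will evaluate [!!!] by hand, here it is as a Python sketch. It is compared with the variance obtained from [2a] when the exact VAR[G_m] of [3] and the exact COVAR[G_j,G_k] = VAR[G_j] g^(k-j) are used instead of the approximation. The function names are mine, and g = 1.01, s = 0.02 are just typical daily values.

```python
def var_sum_formula(n, g, s):
    """Approximate VAR[G_1 + ... + G_n] from [!!!]."""
    first = s * s * (n * g ** (2 * n + 2) - (n + 1) * g ** (2 * n) + 1) / (g * g - 1) ** 2
    bracket = (
        ((n - 1) * g ** (2 * n + 2) - n * g ** (2 * n + 1) + g ** (n + 2)) * (g + 1) ** 2
        - (n - 1) * g ** (2 * n + 3)
        + n * g ** (2 * n + 1)
        - g ** 3
    )
    second = 2 * (s * s / (g * g)) * bracket / ((g - 1) * (g * g - 1) ** 2)
    return first + second


def var_sum_exact(n, g, s):
    """[2a] using the exact VAR[G_m] from [3] and COVAR[G_j,G_k] = VAR[G_j] g^(k-j)."""
    def var_G(m):
        return (g * g + s * s) ** m - g ** (2 * m)

    total = sum(var_G(i) for i in range(1, n + 1))
    total += 2 * sum(var_G(j) * g ** (k - j) for k in range(2, n + 1) for j in range(1, k))
    return total


g, s = 1.01, 0.02  # typical daily values (an assumption for this illustration)
for n in (1, 10, 20, 40):
    approx = var_sum_formula(n, g, s)
    exact = var_sum_exact(n, g, s)
    print(n, approx, exact, approx / exact)
```

Note that the formula divides by (g-1), so it can't be used as written when g is exactly 1.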

>Isn't there something more elegant?
You said to forget elegance. Besides, we won't be using it ... not with pencil and paper. We'll use a spreadsheet and ...
>How good ... uh, how bad is it?
Okay, here's what we'll do (again!):

1. Generate n daily Gain Factors: g₁, g₂, ... g_n.

2. With these, construct the numbers G₁=g₁, G₂=g₁g₂, ... G_n=g₁g₂...g_n.

3. Calculate the SUM(n) = G₁ + G₂ + ... + G_n.

4. Repeat steps 1, 2 and 3 ten thousand times and calculate the Variance of the 10,000 numbers SUM(n).

5. Repeat steps 1, 2, 3 and 4 for n = 1, 2, 3, ... 40.

6. Compare the Variances obtained (using this actual data) with the formula [!!!].
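
The steps above can be sketched in Python instead of a spreadsheet. One loud assumption: the text doesn't say which distribution the Gain Factors are drawn from, so this sketch draws them from a normal distribution with Mean 1.01 and Standard Deviation 0.02. The function name, the seed and the choice of n values printed are mine.

```python
import random
import statistics


def simulate_sum_variance(n, mean_gain=1.01, sd_gain=0.02, trials=10_000, seed=0):
    """Monte Carlo estimate of VAR[G_1 + ... + G_n].

    ASSUMPTION: gain factors are independent normal draws; the original text
    does not specify the distribution.
    """
    rng = random.Random(seed)
    sums = []
    for _trial in range(trials):
        price = 1.0   # P0 = $1.00
        total = 0.0
        for _day in range(n):
            price *= rng.gauss(mean_gain, sd_gain)  # steps 1 and 2: G_m = G_(m-1) * g_m
            total += price                          # step 3: running SUM(n)
        sums.append(total)
    return statistics.pvariance(sums)               # step 4: Variance of the sums


# Step 5, for a few values of n rather than all of 1..40.
for n in (1, 10, 40):
    variance = simulate_sum_variance(n)
    print(n, variance, variance ** 0.5)
```

The printed Variances are what you would line up against [!!!] in step 6.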

The result is shown below where we also plot the Standard Deviation = SQRT(Variance).
It assumes an average daily return of 1% and a Standard Deviation (of daily returns) of 2%:

Figure 2

Notice an interesting thing, in [!!!].
The Variance of the sum of Gain Factors for the past n days is proportional to the Variance of the Returns, namely s².

That means that the Standard Deviation of this Special Sum is proportional to s.
It looks like:

SD[G₁ + G₂ + ... + G_n] = f(n,g)s.
If we assume that the starting stock price was P₀, n days ago, then we have:

SD[P₁ + P₂ + ... + P_n] = P₀f(n,g)s.

>Yeah, so what good is it?
Some time ago I was looking for the Variance of stock prices over the past n days, in connection with Bollinger Bands, here.

>I remember. You got a lousy result.
Uh ... yes, thanks. I took "the Variance of a Sum = the Sum of the Variances" as an approximation and ...

>That's your creeping senility ... again?
Yes, thanks ... again.