Bollinger Bands revisited
A continuation of Standard Deviation ... of Prices and Returns.

Bollinger Bands: Introduction

Years ago, when I first ran across Bollinger Bands, I thought they were pretty neat ... stock prices bouncing between two curves, the Upper and Lower boundaries of the "band" and ...
>Remind me. Bollinger bands?
We look at the last n stock prices P1, P2, ... Pn (where we include Pn, today's price, and where P0 is the price n days ago) and we calculate their average, Pav, and their Standard Deviation, SD:

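      P_{av} = \frac{1}{n}(P_1 + P_2 + \cdots + P_n) = \frac{1}{n}\sum_{k=1}^{n} P_k
      SD^2 = \frac{1}{n}\left[(P_1 - P_{av})^2 + (P_2 - P_{av})^2 + \cdots + (P_n - P_{av})^2\right] = \frac{1}{n}\sum_{k=1}^{n} P_k^2 - P_{av}^2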
(See SD stuff.)

Then, each day, we plot the two points:

[a]       U = Pav + k SD
[b]       L = Pav - k SD
These points trace out two curves and we see the current stock price bounce between the two curves, U and L, as in Figure 1.
>So what are n and k?
You can pick anything, but we'll choose n = 20 days and k = 2 standard deviations.
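If you'd like to compute the bands yourself, here's a minimal Python sketch of [a] and [b]. It assumes the prices arrive as a plain list, oldest first. The function name bollinger is our own (hypothetical), not taken from any charting library. It divides by n, as in the SD formula above.

```python
from statistics import fmean, pstdev

def bollinger(prices, n=20, k=2.0):
    """Return (Pav, SD, U, L) computed from the last n prices (oldest first).

    SD is the population standard deviation (divide by n), matching
    SD^2 = (1/n) * sum((Pk - Pav)^2).
    """
    if len(prices) < n:
        raise ValueError("need at least n prices")
    window = prices[-n:]            # P1 ... Pn, where Pn is today's price
    p_av = fmean(window)
    sd = pstdev(window, p_av)       # pass the mean we already computed
    return p_av, sd, p_av + k * sd, p_av - k * sd

# Usage with made-up prices (n = 4 just to keep the numbers small)
p_av, sd, upper, lower = bollinger([10.0, 11.0, 12.0, 13.0], n=4, k=2.0)
print(p_av, sd, upper, lower)
```

Each day you'd call it on the price history up to that day, and plot upper and lower.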
>So you buy at L and sell at U?
I didn't say that!

Figure 1
>So what are you saying?
I just want to look at Bollinger Bands, again, because although one often calculates the SD (or Volatility) of stock returns, it's strange to see the SD of stock prices and ...
>As in Bolli bands?
Yes, as in Bollinger Bands. We did, at one time, try to find a relationship between the statistical properties of returns and of prices, here. What we want to do now is investigate WHY one would expect stock prices to oscillate between U and L.

When we consider the SD of daily returns, we often assume they have a Normal distribution ... in which case it's unlikely that returns will lie too far from the Mean return. In fact, we would expect most returns to lie within 2 Standard Deviations of the Mean return: if they were Normally distributed, the probability that a return lies within two SDs of the Mean is about 95%. However, if we consider prices, do we also expect them to lie (mostly) within 2 Standard Deviations of the Mean price Pav?
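For reference, here's a quick Python check of the "within k SDs" probabilities for a Normal distribution. It uses the standard library's statistics.NormalDist. The helper name prob_within is our own.

```python
from statistics import NormalDist

def prob_within(k):
    """Probability that a Normal variable lies within k SDs of its Mean."""
    z = NormalDist()                # standard Normal: Mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))   # 1 0.6827, 2 0.9545, 3 0.9973
```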
>That'd be like choosing k = 2, eh?
Exactly! When today's price is larger than U or smaller than L, then it's outside that 2 SD band centred on Pav ... so we might expect tomorrow's price to return to the band. That says something about tomorrow's price, eh?
>And the last n = 20 prices are Normally distributed?
Ay, there's the rub! Are they?
>Huh? You're asking me?
That was a rhetorical question ... but it indicates what we want to investigate in this tutorial.

The Distribution of Stock Prices

Suppose that, over the last n days, the daily Gain Factors are g1, g2, g3, ... gn.
>Gain Factors?
Yes, if a stock price goes from $P to $Pg in a day, then g is the Gain Factor for that day.
For example, g = 1.056 corresponds to a 5.6% daily return.

Then n successive daily stock prices (after the starting price of $P0) are P0 g1, P0 g1 g2, P0 g1 g2 g3, ... P0 g1 g2 g3 ... gn
... the last being today's stock price.
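Moving between prices and Gain Factors takes only a few lines. Here's a sketch in Python, with function names of our own (hypothetical). Each Gain Factor is just the ratio of consecutive prices.

```python
from itertools import accumulate
import operator

def prices_from_gains(p0, gains):
    """Return [P0*g1, P0*g1*g2, ..., P0*g1*...*gn]."""
    # accumulate(..., initial=p0) yields p0 first, so drop it
    return list(accumulate(gains, operator.mul, initial=p0))[1:]

def gains_from_prices(p0, prices):
    """Return the daily Gain Factors g_k = P_k / P_(k-1), with P_0 = p0."""
    previous = [p0] + list(prices[:-1])
    return [p / q for p, q in zip(prices, previous)]

prices = prices_from_gains(100.0, [1.056, 0.98, 1.01])
print(prices)                          # approximately [105.6, 103.488, 104.52288]
print(gains_from_prices(100.0, prices))
```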

So here's the question:
If the g's are selected from a Lognormal distribution,
what's the distribution of the numbers g1g2g3...gn ??

>Lognormal? I thought you wanted Normal?
Well, it's common practice to consider daily gains (or, in our case, Gain Factors) to be Lognormally distributed.
Besides, it makes the math easier.  

Note that, if the g's are Lognormally distributed, then y = log(g) is Normally distributed.
We'll let F(x) be the cumulative Normal distribution function for the y's.
That means that the probability that a randomly chosen y is less than some x is F(x) (as in Figure 2). That's for a single random Gain Factor.

What we now want is to consider the product of n such (random) Gain Factors:
For any x and n random g's, what is the probability that g1g2g3...gn < x ?
That requires that log(g1g2g3...gn) < log(x).
That requires that log(g1)+log(g2)+...+log(gn) < log(x).


Figure 2
But, as we've said, if the g's are Lognormally distributed, then y = log(g) is Normally distributed.
Hence, we set log(gk) = yk and ask:
What is the probability that, for a randomly selected set of y's:
     y1 + y2 + ... + yn < log(x)   where the y's are Normally distributed ?
Let's label the statistical properties of the y's:
[1a]       Mean[y] = Mean[log(g)] = \mu
[1b]       Variance[y] = Variance[log(g)] = \sigma^2   the square of the standard deviation, \sigma, of log(g)

Now, if the g's are independent random variables (which, in particular, means zero correlation between the g's), then the y's will also be independent, in which case the sum \sum_k y_k will be Normally distributed with Mean and Variance (= SD^2) given by:
[2a]       Mean[\sum_k y_k] = \sum_k Mean[y_k] = n\mu   the Mean of a Sum = the Sum of the Means
[2b]       Variance[\sum_k y_k] = \sum_k Variance[y_k] = n\sigma^2   the Variance of a Sum = the Sum of the Variances (for zero correlation)
>The Mean of a sum equals the sum of the Means?
Yes, and that goes for Variance, too ... if there's zero correlation between the y's.
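Here's a small Monte Carlo check of [2a] and [2b]. It assumes independent Normal y's with made-up daily values \mu = 0.0005 and \sigma = 0.02. These numbers are ours, not taken from any stock.

```python
import random
from statistics import fmean, pvariance

def simulate_log_sums(mu, sigma, n, trials, seed=1):
    """Sample y1 + ... + yn many times, each y independent Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]

mu, sigma, n = 0.0005, 0.02, 20
sums = simulate_log_sums(mu, sigma, n, trials=20_000)
print(fmean(sums), n * mu)              # sample Mean vs n*mu
print(pvariance(sums), n * sigma ** 2)  # sample Variance vs n*sigma^2
```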

Okay, we now see that, since log(g1g2g3...gn) = Σyk is Normally distributed, then g1g2g3...gn is Lognormally distributed.
>Isn't g1g2g3...gn today's stock price?
Well, yes ... after multiplying by the starting stock price n days ago, namely P0.

Okay, our assumptions and results are, so far:
      Results
  1. The price n days ago is given as P0.
  2. The daily Gain Factors, g, have a Lognormal distribution with Mean[g] = M and Variance[g] = S^2.
          These are calculated from historical returns.
  3. The logarithms of the g's have Mean[log(g)] = \mu and Variance[log(g)] = \sigma^2   that's [1].
  4. The relation between the Mean and Variance of g and those of its logarithm, y, is given by:
          Mean[log(g)] = \mu = \log(M) - \tfrac{1}{2}\sigma^2
          Variance[log(g)] = \sigma^2 = \log(1 + S^2/M^2)
          (This assumes a Lognormal g-distribution: see this.)
  5. The n-day Gain Factor is Lognormally distributed, so the distribution of today's price, Pn,
    is Lognormal with (assuming independent daily gains):
          Mean[Pn/P0] = Mean[g_1 g_2 \cdots g_n] = Mean[g_1] Mean[g_2] \cdots Mean[g_n] = M^n   the Mean of a Product = the Product of the Means
          Variance[Pn/P0] = Variance[g_1 g_2 \cdots g_n] = (M^2 + S^2)^n - M^{2n}     See this
          while \log(Pn/P0) = \sum_k y_k has Mean n\mu and Variance n\sigma^2   that's [2].
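Results #4 and #5 translate directly into Python. This sketch assumes hypothetical daily values M = 1.0005 and S = 0.02, and the function names are our own. The asserts cross-check #5 against the standard Lognormal moment formulas with log-Mean n\mu and log-Variance n\sigma^2.

```python
from math import exp, log, sqrt

def log_params(M, S):
    """Results #4: Mean mu and SD sigma of log(g), from Mean M and SD S of g."""
    var_log = log(1.0 + (S / M) ** 2)
    return log(M) - 0.5 * var_log, sqrt(var_log)

def n_day_moments(M, S, n):
    """Results #5: Mean and Variance of Pn/P0 = g1*g2*...*gn for independent g's."""
    return M ** n, (M * M + S * S) ** n - M ** (2 * n)

M, S, n = 1.0005, 0.02, 20          # hypothetical daily Gain Factor statistics
mu, sigma = log_params(M, S)
mean_n, var_n = n_day_moments(M, S, n)
print(mu, sigma, mean_n, var_n)

# Cross-check against the Lognormal formulas with log-Mean n*mu, log-Variance n*sigma^2
assert abs(mean_n - exp(n * mu + 0.5 * n * sigma ** 2)) < 1e-12
assert abs(var_n - (exp(n * sigma ** 2) - 1.0) * exp(2 * n * mu + n * sigma ** 2)) < 1e-12
```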
>The "distribution" of today's price? But we KNOW today's price so why ...?
It gives the distribution of prices (for today), assuming a given price n days ago (that's P0) and n random Gain Factors over the past n days. What we want is to determine the probability that today's price is within, say, 2 Standard Deviations of the average price over the last n days (that's Pav). In other words, we want the probability that today's price is inside the Bollinger Bands ... then we'll look up today's price and see where it's at.

Probability that Today's Price lies within some interval centred on the n-day Mean: Pav

Notice that, if P0 = $1.00, then the Gain Factors g1, g1g2, g1g2g3 etc., are just the subsequent stock prices.
We'll assume that's the case.
>Huh? What's the case?
That P0 = $1.00 so the products g1g2...gk are the stock prices. We'll stick P0 in our formulas ... later.

Okay, we have that today's price Pn = g1 g2 ... gn is Lognormally distributed with a known Mean and Variance as given in Results #5.

Now we ask:
If the random variable G has a Lognormal distribution with given Mean and Variance,
what is the probability that G < x ... for a given number x ??

>You're asking me?
That was a rhetorical question. Now pay attention. We've been here before.

  • If G < x then log(G) < log(x)
  • Since G is Lognormal then log(G) is Normal
  • The distribution of log(G) is then described by N[u,Mean,SD]
        the Normal cumulative distribution function
        and Mean and Standard Deviation are the Mean and Standard Deviation of log(G) ... not of G itself!
  • The probability that log(G) < log(x) is then N[log(x),Mean,SD]
  • But log(G) < log(x) is the same condition as G < x so the probability is the same:   N[log(x),Mean,SD]
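The argument in those bullet points is easy to check numerically. This sketch assumes made-up values Mean = 0.01 and SD = 0.09 for log(G). It draws the G's with Python's random.lognormvariate, whose logarithm is Normal with the given mu and sigma.

```python
import random
from math import log
from statistics import NormalDist

m, s = 0.01, 0.09      # hypothetical Mean and SD of log(G)
x = 1.05

# Formula: Prob[G < x] = N[log(x), Mean, SD]
formula = NormalDist(m, s).cdf(log(x))

# Simulation: draw Lognormal G's directly and count how many fall below x
rng = random.Random(42)
draws = [rng.lognormvariate(m, s) for _ in range(100_000)]
simulated = sum(G < x for G in draws) / len(draws)
print(formula, simulated)
```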

>What about our stock prices?
Yes, of course. I'm sure you've recognized our G. It's today's stock price assuming the starting price was P0 = $1.00, n days ago.
In fact, G = Pn / P0 = g1g2...gn and, as we've said, it's Lognormally distributed and we know its Mean and SD so ...
>We know its Mean and ...?
Haven't you been listening!
We have:
Given the stock price P0 = $1.00, n days ago, and assuming n random daily Gain Factors
which are Lognormally distributed with Mean = M and Standard Deviation = S
then the probability that today's price will be less than U' is given by:
      Prob[P < U'] = N[\log(U'), n\mu, \sqrt{n}\,\sigma]
      where N[x, Mean, SD] is the Normal cumulative distribution function
      and \mu = \log(M) - \tfrac{1}{2}\sigma^2
      and \sigma^2 = \log(1 + S^2/M^2)

>So the chances of being in that Bollinger band are ... what?
If A is the probability of being less than U and B is the probability of being less than L, then ...
>It's B - A, eh?
Actually, it's A - B as in: N[\log(U), n\mu, \sqrt{n}\,\sigma] - N[\log(L), n\mu, \sqrt{n}\,\sigma]

Note:

  • Remember that we're talking about the probability that the n-day Gain Factor lies between two numbers.
  • Don't confuse Gain Factors with daily returns.
  • In fact, a Gain Factor is 1 + (daily return).
  • The Mean of the Gain Factors, that's M, is "1" greater than the mean of the daily returns.
  • If the Mean of the daily returns is 0.0123 (that's 1.23%), then M = 1.0123.

>When are you going to insert some other starting price ... P0?
Other than $1.00? Right now.
The numbers U and L given in [a] and [b] assume an arbitrary P0 value.
To generate the appropriate numbers for the case P0 = $1.00, we'd divide each of U and L by P0.
Assume we've divided U and L by P0. We'll call these U' = U/P0 and L' = L/P0, okay?
Now we're talking about the case where P0 = $1.00 (as we did above).
As we've seen above, the probability that G = Pn/P0 < U' is N[\log(U'), n\mu, \sqrt{n}\,\sigma]
But U' = U/P0 so Pn/P0 < U' is the same condition as Pn < U.
If we then want the probability that P lies within the Bollinger Band for an arbitrary starting price, then ...
>Why don't you just give the result, okay?
Here it is:
Magic Formula
Given the stock price n days ago, P0, and assuming n random daily Gain Factors
which are Lognormally distributed with Mean = M and Standard Deviation = S,
the probability that today's price, P, will lie within the Bollinger band is given by:
      Prob[L < P < U] = N[\log(U/P0), n\mu, \sqrt{n}\,\sigma] - N[\log(L/P0), n\mu, \sqrt{n}\,\sigma]
      where N[x, Mean, SD] is the Normal cumulative distribution function, and
      \mu = \log(M) - \tfrac{1}{2}\sigma^2
      \sigma^2 = \log(1 + S^2/M^2)
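Here's the Magic Formula as a short Python function. It's a sketch that assumes independent, Lognormal daily Gain Factors. The names prob_in_band and log_params are hypothetical, and statistics.NormalDist plays the role of N[x, Mean, SD].

```python
from math import exp, log, sqrt
from statistics import NormalDist

def log_params(M, S):
    """Results #4: Mean and SD of log(g), given Mean M and SD S of the Gain Factor g."""
    var_log = log(1.0 + (S / M) ** 2)
    return log(M) - 0.5 * var_log, sqrt(var_log)

def prob_in_band(L, U, P0, M, S, n):
    """Prob[L < P < U] for today's price P, given P0 (n days ago) and n daily Gain Factors."""
    mu, sigma = log_params(M, S)
    N = NormalDist(n * mu, sqrt(n) * sigma)   # distribution of log(P/P0)
    return N.cdf(log(U / P0)) - N.cdf(log(L / P0))

# Check: a band centred (in log terms) on n*mu with half-width 2*sqrt(n)*sigma
M, S, n, P0 = 1.0005, 0.015, 20, 40.0     # hypothetical inputs
mu, sigma = log_params(M, S)
half = 2 * sqrt(n) * sigma
p = prob_in_band(P0 * exp(n * mu - half), P0 * exp(n * mu + half), P0, M, S, n)
print(round(p, 4))   # 0.9545
```

The check works because a band that is symmetric in log terms reduces the Magic Formula to the "within 2 SDs" probability of a Normal variable.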
Remember: the probability (in the Magic Formula) is really the probability that the Gain Factor (over n days) lies between L/P0 and U/P0.
That's like: Prob[L/P0 < P/P0 < U/P0].
If, for example, both L and U are less than the starting Price of P0 (n days ago), then you're asking for the probability that, after n random gains, the price today has dropped to some range of lower values. Similarly, if both L and U are greater than the starting Price ...
>Do you have an example?
Okay, we'll consider the stock prices of GE over the past n = 20 days.
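The GE prices themselves aren't reproduced here. As a stand-in, here's a sketch of the whole calculation on a made-up, simulated price history. It estimates M and S from the daily Gain Factors over the whole history. It takes U and L from the last n = 20 prices, and P0 is the price n days ago. All names and numbers are hypothetical.

```python
import random
from math import log, sqrt
from statistics import NormalDist, fmean, pstdev

def log_params(M, S):
    """Results #4: Mean and SD of log(g) from the Mean M and SD S of g."""
    var_log = log(1.0 + (S / M) ** 2)
    return log(M) - 0.5 * var_log, sqrt(var_log)

def prob_in_band(L, U, P0, M, S, n):
    """Magic Formula: Prob[L < P < U] given P0 (n days ago) and n daily Gain Factors."""
    mu, sigma = log_params(M, S)
    N = NormalDist(n * mu, sqrt(n) * sigma)   # distribution of log(P/P0)
    return N.cdf(log(U / P0)) - N.cdf(log(L / P0))

# Made-up history: 250 days of simulated Lognormal daily gains from $40
rng = random.Random(7)
prices = [40.0]
for _ in range(250):
    prices.append(prices[-1] * rng.lognormvariate(0.0003, 0.015))

# Mean M and SD S of the daily Gain Factors, estimated from the whole history
gains = [b / a for a, b in zip(prices, prices[1:])]
M, S = fmean(gains), pstdev(gains)

n, k = 20, 2.0
window = prices[-n:]          # P1 ... Pn, with Pn = today's price
P0 = prices[-n - 1]           # the price n days ago
p_av, sd = fmean(window), pstdev(window)
U, L = p_av + k * sd, p_av - k * sd

prob = prob_in_band(L, U, P0, M, S, n)
print(f"P0 = {P0:.2f}   L = {L:.2f}   U = {U:.2f}   today = {prices[-1]:.2f}")
print(f"Prob[L < P < U] = {prob:.3f}")
```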

Do you see how it works?
>Yeah, so what is today's GE stock price?
It's $P.
>Will you buy, hold or sell?
I don't own any GE stock.
>So where's the spreadsheet?
I'm workin' on it ...
>I have a question.
Shoot.
>Do I buy at L and sell at U?
You're asking me?
>I think I'd look at today's price, calculate the probability that it's where it's at and ...
And if it's unlikely to be there, because it's either too low or too high, then ...
>Then I'd buy or sell.
Good luck.
>How about some simple, rule-of-thumb formula. That one above is pretty scary and ...
Well, we could assume that the mean daily return, say r, is small, so that M = 1 + r is close to "1",
and \sigma^2 = \log(1 + S^2/M^2) \approx S^2/M^2 \approx S^2   ... since \log(1+x) \approx x for small x, and M is close to 1,
and \mu = \log(1 + r) - \tfrac{1}{2}\sigma^2 \approx r - S^2/2   ... again using \log(1+x) \approx x and M \approx 1.

We'd then get: Prob[L < P < U] \approx N[\log(U/P0), n(r - S^2/2), \sqrt{n}\,S] - N[\log(L/P0), n(r - S^2/2), \sqrt{n}\,S]
>That's simpler?
Well, r - S^2/2 is (approximately) the compound daily growth rate, so ...
>Huh?
If we were talking about annual parameters r and S (rather than daily ones), then r - S^2/2 would be (approximately) the annualized compound return.
>Something simpler ... please.
Okay, suppose that U = P0(1 + up) and L = P0(1 + dn), so \log(U/P0) = \log(1 + up) \approx up and \log(L/P0) = \log(1 + dn) \approx dn, for small up and dn (meaning the Upper and Lower Bollinger bands are not that much different from the starting price, n days ago).
>Huh?
The numbers up and dn are the changes from that starting price, n days ago, to the current Upper and Lower bands. If, for example, the Lower band is 5.6% less than P0, then dn = - 0.056 and if the Upper ...
>So what's the simpler formula?
It's:

Prob[P0(1+dn) < P < P0(1+up)] \approx N[up, n(r - S^2/2), \sqrt{n}\,S] - N[dn, n(r - S^2/2), \sqrt{n}\,S]
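To see how much the rule of thumb gives up, here's a sketch in Python that compares it with the full Magic Formula. The inputs are hypothetical: bands 5% below and 6% above P0, r = 0.05% per day, S = 1.5% per day, and n = 20. The function names are our own.

```python
from math import log, sqrt
from statistics import NormalDist

def prob_exact(dn, up, r, S, n):
    """Magic Formula with L = P0(1+dn), U = P0(1+up) and M = 1 + r."""
    M = 1.0 + r
    var_log = log(1.0 + (S / M) ** 2)
    mu = log(M) - 0.5 * var_log
    N = NormalDist(n * mu, sqrt(n * var_log))
    return N.cdf(log(1.0 + up)) - N.cdf(log(1.0 + dn))

def prob_simple(dn, up, r, S, n):
    """Rule of thumb: log(1+x) ~ x and M ~ 1, so sigma ~ S and mu ~ r - S^2/2."""
    N = NormalDist(n * (r - 0.5 * S * S), sqrt(n) * S)
    return N.cdf(up) - N.cdf(dn)

args = (-0.05, 0.06, 0.0005, 0.015, 20)
print(round(prob_exact(*args), 3), round(prob_simple(*args), 3))
```

For bands this close to P0 the two agree to within a few thousandths. The gap grows as up and dn get larger, because \log(1+x) \approx x gets worse.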

Future Stuff

Have you noticed that, if P0 were the price today, then the above result gives the probability that the stock price will lie in some interval, n days into the future?
>Huh?
Assuming, of course, that future daily Gain Factors have a Lognormal distribution and are uncorrelated and ...
>Yeah, so what'll GE be worth, next month?
Let me check.
>Be serious. What about ...?
Okay, the current GE price is $ so the probability that GE will lie between $A and $B, in n days, is:
      Prob[A < P < B] = N[\log(B/$), n\mu, \sqrt{n}\,\sigma] - N[\log(A/$), n\mu, \sqrt{n}\,\sigma]
>A picture is worth a thousand ...
Here's a picture: Figure 3
If we use the same parameters as before (based upon a few past years' stock performance),
then for A = $ and B = $$ it shows the probabilities Prob[A < P < B] for various n.    
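The figure isn't reproduced here, but a Python sketch can produce a table of the same kind. Its inputs are hypothetical (P0 = 40, A = 38, B = 44, M = 1.0004, S = 0.016), not the GE parameters, and prob_between is a name of our own.

```python
from math import log, sqrt
from statistics import NormalDist

def prob_between(A, B, P0, M, S, n):
    """Prob[A < P < B] for the price P, n days after a price of P0 (Magic Formula)."""
    var_log = log(1.0 + (S / M) ** 2)
    mu = log(M) - 0.5 * var_log
    N = NormalDist(n * mu, sqrt(n * var_log))
    return N.cdf(log(B / P0)) - N.cdf(log(A / P0))

P0, A, B = 40.0, 38.0, 44.0
M, S = 1.0004, 0.016
table = {n: prob_between(A, B, P0, M, S, n) for n in (1, 5, 10, 20, 40, 60, 120)}
for n, p in table.items():
    print(f"n = {n:3d}   Prob[A < P < B] = {p:.3f}")
```

As n grows, the distribution of log(P/P0) widens, so a fixed price range captures less and less probability.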
>That'll help me if I buy options, right?
Yes, I guess it would. In fact, the Magic Formula involves the difference between two Normal distribution functions, just like the Black-Scholes option pricing formula.
>So, should we buy a call option?
You're asking me?
>I was asking one of those rhetorical questions.

Figure 3
About options:
Suppose you're buying a call option with a strike price of $K, you've paid $C for the option, and the option expires in n days. Then the stock had better end up at a price bigger than $(K+C) in order to make money. The probability that the stock will be less than $(K+C) is N[\log((K+C)/P0), n\mu, \sqrt{n}\,\sigma], so the probability that it's greater than $(K+C) is
  1 - N[\log((K+C)/P0), n\mu, \sqrt{n}\,\sigma]   where P0 is today's price.
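Here's a Python sketch of that break-even probability. Like the text, it ignores interest and trading costs. The option numbers are hypothetical (P0 = $40, K = $42, C = $1.50, 30 days), and so is the function name.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def prob_above_breakeven(K, C, P0, M, S, n):
    """Probability that the price n days from now exceeds K + C (upper tail of the Magic Formula)."""
    var_log = log(1.0 + (S / M) ** 2)
    mu = log(M) - 0.5 * var_log
    N = NormalDist(n * mu, sqrt(n * var_log))   # distribution of log(P/P0)
    return 1.0 - N.cdf(log((K + C) / P0))

p_call = prob_above_breakeven(42.0, 1.5, 40.0, 1.0004, 0.016, 30)
print(round(p_call, 3))
```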
>But it only has to be greater than K+C sometime in the next n days, right?
Right, so look at the probability for all values of n less than the number of days to expiry.

Figure 4a

>Okay, what's the probability that the price will be exactly $40?
Zero. However, if we ask for Prob[A < P < B] where B is very close to A, then we'd get (using that Magic Formula):
Prob[A < P < B] = N[\log(B/$), n\mu, \sqrt{n}\,\sigma] - N[\log(A/$), n\mu, \sqrt{n}\,\sigma]
\approx f[\log(A/$), n\mu, \sqrt{n}\,\sigma] \log(B/A)
where f is the Normal probability density function, as in Figure 4.
>The density function?
Yes, f is the slope of the cumulative distribution, N. Its maximum occurs at the Mean n\mu, so, in n days, the log of the n-day Gain Factor, \log(P/P0), is most likely near n\mu. However, if you want B = A (hence an exact price), then \log(B/A) = \log(1) = 0 and ...
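Here's a numerical check of that density approximation in Python. It assumes hypothetical inputs P0 = $40, n = 20, M = 1.0004 and S = 0.016, and a narrow price range from $40.00 to $40.10.

```python
from math import log, sqrt
from statistics import NormalDist

P0, n, M, S = 40.0, 20, 1.0004, 0.016
var_log = log(1.0 + (S / M) ** 2)
mu = log(M) - 0.5 * var_log
N = NormalDist(n * mu, sqrt(n * var_log))      # distribution of log(P/P0)

A, B = 40.0, 40.10                             # a narrow price range
exact = N.cdf(log(B / P0)) - N.cdf(log(A / P0))
approx = N.pdf(log(A / P0)) * log(B / A)       # density times log-width
print(exact, approx)
```

Shrinking B toward A drives both numbers toward zero together.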
>Probability = 0, eh?
You got it.

Figure 4
>I have a question.
Shoot.
>Are U and L each Bollinger Bands ... or is it the space between them?
Uh ... I have no idea.