another Distribution: Part II ... continued from Part I

In Part I we considered a modification to the Normal distribution that would ...
>Give fat tails!
Yes. We want to find a distribution which satisfies certain conditions like ...
>Are you starting again?
Yes. Pay attention! We're going to simplify. To start, we'll assume that, for our new distribution function, the Mean is 0.

We want a distribution function f(x) which satisfies the conditions:
  1. When x is close to 0, f(x) ≈ B e^(-x²/2)   just like the Normal distribution
  2. When x is far from 0, f(x) ≈ C e^(-|x|)   so it decreases less dramatically than a Normal distribution
    We'll call this Simple Exponential Decay
  3. And the necessary condition for a probability distribution:   ∫ f(x) dx = 1   (integrating over all x)
Look at Figure 1:
It shows half of such a function f(x) ... just for x > 0.
Condition 3 means that the TOTAL area under the curve should be 1.
Hence half of the area (as shown) should be (1/2).

Figure 1

>What're B and C?
They're constants, but I don't know what they are ... yet.
We're going to look at:
[1A]       f(x) = A g(x)     ... where g(x) is the function we constructed in Part I; it satisfies 1 and 2, above (as we noted in Part I)

The cumulative distribution is then:
[1B]       F(x) = ∫ f(u) du = A ∫ g(u) du     ... integrating from -∞ to x: the area to the left of x

Compare to the Normal distribution, where the corresponding area is (1/√(2π)) ∫ e^(-u²/2) du     ... again integrating from -∞ to x

Condition 3 (above) requires that:
[1C]       F(∞) = A ∫ g(x) dx = 1     ... integrating over all x

This condition gives us the value of A, obtained numerically ...

>Huh?
Yeah, we just evaluate that integral in [1C] numerically ... to a few decimal places.
However, it's easier to simply determine (when we need it) the A-value which makes F(∞) = 1.
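If you want to see that numerical step spelled out, here's a minimal Python sketch. Since the exact candidate function from Part I isn't reproduced on this page, the g(x) below, g(x) = e^(-x²/(2+|x|)), is only an illustrative stand-in that behaves like conditions 1 and 2 (near 0 it looks like e^(-x²/2), far away like a constant times e^(-|x|)); the point is just the recipe: integrate numerically, then set A = 1/(that integral).

    # A minimal sketch: find the constant A that makes the total area equal 1.
    # NOTE: g(x) = exp(-x^2/(2+|x|)) is only an illustrative stand-in for the
    # Part I candidate; it behaves like exp(-x^2/2) near 0 and like exp(-|x|) far away.
    import numpy as np
    from scipy.integrate import quad

    def g(x):
        return np.exp(-x**2 / (2.0 + abs(x)))

    area, _ = quad(g, -np.inf, np.inf)   # evaluate the integral numerically
    A = 1.0 / area                       # choose A so that F(infinity) = 1
    print(f"integral of g = {area:.4f}, so A = {A:.4f}")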
>Numerically? Why don't you just evaluate it ... exactly?
Are you familiar with the function F, in [1B]?
>Never seen it before.
Well, maybe it's a new function. Let's call it Sam.
>Can you do that?
Are you familiar with the sine function?
>Sure. Trigonometry. I remember that from high school.
Well, before Joe Sine invented it, gave it a name, tabulated it and showed it to be useful in math and science, it was unknown ... just like our Sam function.
>Joe Sine? You're kidding, right?
Yes, I'm kidding ... but I've made my point, no?
>No.
Well, don't worry about it.
>Could you remind me why you're doing this?
We're trying to get fat tails.
Once upon a time we talked about Mandelbrot's fractal technique
... but I tired of that and ended the tutorial abruptly.
Then I thought I might talk about Jump-diffusion models, to incorporate the occasional large returns
(for those fat tails) ... but it was much too complicated.
So I decided to try ...
>The Sam function?
Yes.
>But that function has Mean = 0 ... and what about the Standard Deviation?
If we've got some function f(x) then we can shift it left or right by considering f(x-m) and expand (or contract) the horizontal scale by writing f( (x-m)/s ) ... as indicated in Figure 2.
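For example: whatever value f takes at 0, f((x-m)/s) takes at x = m, and whatever value f takes at 1, f((x-m)/s) takes at x = m + s ... so m slides the curve along the x-axis and s stretches (or squeezes) it.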

>Them's the Mean and Standard Deviation?
Yes. For example, we could rewrite [1A] so that (shifting by m and rescaling by s) we'd get:
[2]       f(x) = A g((x - m)/s)

>Is it any good?
Does it give fat tails? Yes ... more than you'd like !!
Here's a picture: I downloaded about 1700 daily prices for GE, calculated the distribution, then compared it to our f-function, choosing m to be the actual Mean (from the downloaded data) and fiddling with s to get a good fit.


Figure 3

>Fiddled?
Fiddled.

>That's pretty crude, don't you think? Do you expect people to fiddle? I think ...

Figure 2

We'll make one last attempt at "another" distribution.
>No! I think you've done enough!
Okay, here it is:
Other Distribution
  • We'll use the actual historical Mean m and Standard Deviation s.
  • For x close to the Mean, we'll use the Normal Distribution.
  • When x is sufficiently far from the Mean, we'll switch to Simple Exponential Decay.
  • We'll measure the distance of x from its mean by introducing y = (x - m)/s.
  • We'll let the user specify "How far from the Mean" we do the switch by specifying some number k.
    k measures the number of Standard Deviations from the Mean.
  • If |y| < k we'll use f(y) = e^(-y²/2)   ... that's the Normal distribution, close to the Mean
  • If |y| > k we'll use f(y) = e^(k²/2) e^(-k|y|)   ... Simple Exponential Decay, farther than k Standard Deviations from the Mean
  • Then, to guarantee a total area of 1, we'll compute A = ∫ f(y) dx and divide by A to get the final distribution (see the sketch below)
  • f(x) = (1/A) f(y)   where (remember): y = (x - m)/s
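Here's a rough Python sketch of that recipe (the function names are mine, and the m, s and k values are just the S&P figures used later on this page); it evaluates the piecewise f(y), computes A numerically, then divides by it:

    # A sketch of the "Other Distribution": Normal behaviour within k Standard
    # Deviations of the Mean, Simple Exponential Decay beyond that.
    import numpy as np
    from scipy.integrate import quad

    def other_shape(x, m, s, k):
        y = (x - m) / s                                  # distance from the Mean, in Standard Deviations
        if abs(y) < k:
            return np.exp(-y**2 / 2)                     # Normal-shaped part
        return np.exp(k**2 / 2) * np.exp(-k * abs(y))    # exponential tail, matched at |y| = k

    m, s, k = 0.0066, 0.043, 1.5                         # e.g. the S&P figures, switching at 1.5 Standard Deviations
    A, _ = quad(lambda x: other_shape(x, m, s, k), -np.inf, np.inf)

    def f_final(x):
        return other_shape(x, m, s, k) / A               # f(x) = (1/A) f(y), so the total area is 1

    print(f"A = {A:.4f}, f(m) = {f_final(m):.2f}")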

>I recognize the Normal distribution, but why the funny factor out in front ... for that "other" part?
That's so the two pieces meet (at y = k or y = -k) smoothly, with the same value and the same slope.
>For example?
Here's an example, where I've generated a thousand random returns from a Normal distribution (in blue) and a thousand random returns from our "other" distribution (in green) ... switching to the Simple Decay at k = 3 Standard Deviations.


Figure 4A

>Three Standard Deviations? That's pretty far from the Mean, eh?
Yes, so it's mostly the Normal distribution. However, if we switch to Simple Decay at, say, k = 1 Standard Deviation we get those ...
>Fat tails?
Yes, fat tails.


Figure 4B

>Wow! Look at those fat tails! Up to 100% returns.
Yeah, a little too fat for me.
>So pick a fatter k!
Good idea ... but there's another problem. Look carefully at Figure 5.
It shows the distribution of monthly S&P 500 returns over the past 40 years.
It also shows a Normal distribution and our Other Distribution (as given above), each having the same Mean (0.66%) and Standard Deviation (4.3%) as the actual S&P returns.
(The "other" distribution has k = 1.5 standard deviations as indicated by the red circles.)

Do you see anything sad?
>Sad? No.
Both the Normal and that "other" don't have as many returns near the Mean as does the S&P distribution.
Further, as you move away from the Mean, both have more of those remote returns.

>Yeah, especially that other distribution!
Yes ... and that's sad because a return of R followed (later) by a return of -R gives a combined Gain Factor of (1 + R)(1 - R) = 1 - R² and that's a return of -R²
... and that's a loss, eh?
Indeed, it's a BIG loss if R is large.
(Note: by a return of R = 0.123 I mean a 12.3% return.)

Figure 5

Since our distributions are so symmetrical, there's the same probability of getting a return r above the Mean as below the Mean.
Two such returns are m + r and m - r, for a combined Gain Factor of (1 + m + r)(1 + m - r) = 1 + 2m + m² - r².
That's a return of 2m + m² - r², which is a negative return when r² > 2m + m².
For example, for our S&P 500, the Mean is 0.66% and that means m = 0.0066
and that means r² > 2(0.0066) + (0.0066)² = 0.01324
and that means r > 0.115 or 11.5%
and that means ...

>And that means that, for two returns farther than 11.5% from the mean, you'd get a negative return, right?
Yes, if one is negative and one is positive.
So if we avoid returns NEAR the Mean and exaggerate those FAR from the Mean (as our "other" distribution does) then ...

>Then you get lots of bad returns ... but how bad is it?
Assume I start with a $90K portfolio and withdraw $300 per month, increasing with inflation at the rate of 3% per year.
I select monthly returns at random to get an estimate of the probability of having the portfolio survive for 30 years.
That's Monte Carlo simulation.
The returns are distributed according to one of the three distributions of Figure 5.
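To make that procedure concrete, here's a bare-bones Python sketch of one such Monte Carlo run; sample_return() is a stand-in for whichever of the three distributions is being sampled (below it's just a Normal with the S&P Mean and Standard Deviation), and the dollar figures are the ones above.

    # A bare-bones Monte Carlo survival estimate: $90K portfolio, $300/month
    # withdrawals growing at 3% per year, monthly returns drawn at random.
    import random

    def survives_30_years(sample_return):
        portfolio, withdrawal = 90_000.0, 300.0
        for month in range(30 * 12):
            portfolio = portfolio * (1 + sample_return()) - withdrawal
            if portfolio <= 0:
                return False                   # the portfolio died before 30 years
            if (month + 1) % 12 == 0:
                withdrawal *= 1.03             # bump the withdrawal once a year, for 3% inflation
        return True

    def survival_probability(sample_return, trials=5000):
        return sum(survives_30_years(sample_return) for _ in range(trials)) / trials

    # e.g. Normally distributed monthly returns with the S&P Mean and Standard Deviation
    print(survival_probability(lambda: random.gauss(0.0066, 0.043)))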
Here's what I got:

Distribution                                      Probability of surviving 30 years
Using the actual S&P 500 returns                  76%
Using the Normal distribution of returns          48%
Using that "Other" distribution of returns        20%

>That's sad.
Yes.

>What're you gonna do about it?
I'm thinking ...

Sample portfolio evolutions

>Maybe you should give up the Normal distribution ... near the Mean.
Because it misses too many near-Mean returns? Maybe, but I have another idea.
>No, please, spare me another ...
Here's what we'll do:

  • If |y| < k we'll use f(y) = e^(-a y²)   ... that's Normal distribution behaviour, close to the Mean
  • If |y| > k we'll use f(y) = B e^(-b|y|)   ... Exponential Decay again, farther than k standard deviations from the Mean
  • So that they're equal when y = k, we'll need:
    (1)       e^(-a k²) = B e^(-b k)
  • So that their slopes are equal when y = k, we'll need:
    (2)       (-2ak) e^(-a k²) = -b B e^(-b k)
  • Equations (1) and (2) give (checked in the sketch below):
          b = 2ak
          B = e^(a k²)
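If you'd like to check that bit of algebra, here's a quick sympy verification (purely optional, and not part of the distribution itself); it confirms that b = 2ak and B = e^(a k²) make the two pieces agree in value and slope at y = k.

    # Check the matching conditions (1) and (2) symbolically.
    import sympy as sp

    a, k, y = sp.symbols('a k y', positive=True)
    b, B = 2*a*k, sp.exp(a*k**2)            # the claimed solution

    near = sp.exp(-a*y**2)                  # the |y| < k piece
    far  = B * sp.exp(-b*y)                 # the |y| > k piece (on the y > 0 side)

    print(sp.simplify((near - far).subs(y, k)))                          # prints 0: equal values
    print(sp.simplify((sp.diff(near, y) - sp.diff(far, y)).subs(y, k)))  # prints 0: equal slopes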
We then get yet another "other" distribution, namely:
Another "other" Distribution
  1. If |y| < k define f(y) = e^(-a y²)
  2. If |y| > k define f(y) = e^(a k²) e^(-2ak|y|)
  3. Then, to guarantee a total area of 1, we'll compute A = ∫ f(y) dx and divide by A to get the final
  4. f(x) = (1/A) f(y)   where y = (x - m)/s

    so

    f(x) = (1/A)f((x - m)/s)
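And here's a hedged Python sketch of this final recipe (the helper names and parameter values are mine, chosen just for illustration); it prints the peak value f(m) = 1/A for a few choices of a.

    # A sketch of the final distribution: e^(-a y^2) within k standard deviations
    # of the Mean, the matched exponential decay beyond that, all divided by A.
    import numpy as np
    from scipy.integrate import quad

    def shape(y, a, k):
        if abs(y) < k:
            return np.exp(-a * y**2)
        return np.exp(a * k**2) * np.exp(-2 * a * k * abs(y))

    def make_pdf(m, s, k, a):
        A, _ = quad(lambda x: shape((x - m) / s, a, k), -np.inf, np.inf)
        return lambda x: shape((x - m) / s, a, k) / A     # f(x) = (1/A) f((x - m)/s)

    # example values only: the S&P Mean and Standard Deviation, switching at k = 1.5
    for a in (0.5, 1.0, 2.0):
        f = make_pdf(m=0.0066, s=0.043, k=1.5, a=a)
        print(f"a = {a}: peak f(m) = {f(0.0066):.1f}")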


>But this "other" distribution has the value "1" at y = 0, no? I mean, f(0) = 1, right?
Wrong. It's true that the unnormalized f(y) equals 1 at y = 0 (that is, at x = m), but the final distribution (from 4, above) has f(m) = 1/A ... and that depends upon our choice of a.
>Pictures?
Here are some pictures:

Figure 6

Note: If we choose a larger value for a, the decay is faster and we get fewer of those returns far from the Mean m.
That means the area under the curve, far from the Mean, is smaller.
However, since the TOTAL area under the distribution must equal "1", the area near the Mean must be larger.
That means that the "peak" is higher ... when we increase the value of a.


Figure 7

>Why do you sometimes plot a percentage like 3% or 4% (in Fig. 6) and sometimes a number like 20 or 30 (in Fig 7)?
I sometimes plot the percentage of returns lying in a small interval (about 1% in width) about some return-value
... and sometimes I plot the number of returns in that interval (of the total of some 480 monthly returns, over 40 years).
You can just divide the numbers (like 20) by 480 to get the percentage (like 4.2%).

>And is this other "other" distribution better than that ... uh, "other" distribution?
Oh, yes. I can pick an a-value so that, for a given set of historical returns (like 40 years of monthly GE returns), this other "other" distribution gives the same survival rate as the actual returns (using Monte Carlo simulation).

>And if your objective is to get the same distribution and survival rate, why don't you just use the actual returns?

Why didn't I think of that?

>You did ... here.

See also Risk stuff.