Gauss and the bell curve
The distribution of heights in an adult population gives a good idea of what such a distribution looks like.
This type of distribution is ruled by the tyranny of the mean:
- the vast majority of the sample is concentrated around the average, at the top of the bell: 68% of values lie within one standard deviation of the mean (between 1.75 m and 1.90 m in our example);
- the proportion of extreme values decreases exponentially with the distance from the average: there is a one-in-a-billion chance of meeting a giant taller than 2.30 m.
Extremes are moderate, rare and of little influence. The law of large numbers predicts that the average measured on any (large) sample is a very good approximation of the theoretical mean, because no individual in the sample carries enough weight to distort the measurement significantly. This is "Mediocristan", as Nassim Nicholas Taleb calls it [1]: the home of statistics such as the age of a population, the number of tails when you toss a coin, the number of people per household, etc.
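To make the "little influence" of extremes concrete, here is a minimal sketch. It assumes heights are exactly Gaussian, with a mean of 182.5 cm and a standard deviation of 7.5 cm read off the 1.75 m to 1.90 m interval quoted above; the function names and the sample size are mine.

```python
import random
import statistics

MU_CM = 182.5     # assumed: midpoint of the 1.75 m - 1.90 m interval
SIGMA_CM = 7.5    # assumed: half-width of that interval

def gaussian_sample(n, seed=0):
    """Draw n simulated heights (in cm) from a Gaussian."""
    rng = random.Random(seed)
    return [rng.gauss(MU_CM, SIGMA_CM) for _ in range(n)]

def share_within_one_sigma(values):
    """Fraction of values lying within one standard deviation of the mean."""
    inside = sum(1 for v in values if abs(v - MU_CM) <= SIGMA_CM)
    return inside / len(values)

def influence_of_maximum(values):
    """Relative change of the mean when the single largest value is dropped."""
    full = statistics.fmean(values)
    trimmed = statistics.fmean(sorted(values)[:-1])
    return abs(full - trimmed) / full

heights = gaussian_sample(20_000)
print(share_within_one_sigma(heights))   # close to 0.68
print(influence_of_maximum(heights))     # tiny: the tallest person barely matters
```

In this simulated Mediocristan, removing the tallest individual moves the average by a negligible fraction, which is exactly why the sample mean can be trusted.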
Pareto: a journey into the realm of Extremistan
Studying the distribution of wealth in the Italian population at the end of the nineteenth century, Vilfredo Pareto discovered a different kind of random distribution, in which 20% of the population earns 80% of a country's income. Since then inequality has decreased a little, and the richest 20% now earn more like 40% of income in France and the United States (or is it because the income declared to the tax authorities does not reflect the real value of high incomes? ;-), but we still call it the 80-20 rule.
This type of distribution, "wild" as Benoit Mandelbrot calls it, turns up in many areas. Here are some examples, found among others on Gerard Villemin's site:
- less than 1% of car rental companies account for more than 25% of rental hours;
- 30% of websites account for 90% of visits;
- 17% of the world's population (those in rich countries) consume 80% of medicines, etc.;
- in biology, insects account for a million of the 1.8 million species described so far, followed at a distance by higher plants (270,000), which are themselves ahead of the molluscs (85,000). But all this is nothing compared with bacteria, which soundly beat all other forms of life on the tree of life (figure borrowed from Wikipedia; the bacteria are in blue) and represent more than half of the biomass on Earth!
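The 80-20 rule can be tied to a single number. As a minimal sketch, assume incomes follow an exact Pareto law with tail exponent alpha greater than 1 (an idealisation, not something the figures above establish). Under that assumption the richest fraction p of the population holds a share p^(1 - 1/alpha) of the total. The function names are mine.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p under a Pareto law (alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the total is infinite when alpha <= 1")
    return p ** (1 - 1 / alpha)

def alpha_for_rule(p, share):
    """Tail exponent for which the top fraction p holds exactly `share` of the total."""
    ratio = math.log(share) / math.log(p)   # equals 1 - 1/alpha
    return 1 / (1 - ratio)

alpha_80_20 = alpha_for_rule(0.20, 0.80)    # the classic 80-20 rule
alpha_40_20 = alpha_for_rule(0.20, 0.40)    # "the richest 20% earn 40%"

print(round(alpha_80_20, 2), round(alpha_40_20, 2))   # prints 1.16 2.32
# The rule nests: the top 20% of the top 20% (4% of people) hold 80% of 80%.
print(round(top_share(0.04, alpha_80_20), 2))         # prints 0.64
```

The smaller the exponent, the wilder the distribution: going from "20% earn 80%" to "20% earn 40%" doubles alpha in this idealised model.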
The law of large numbers: repealed!
Here we are at the opposite of Mediocristan. Extreme values are certainly rare, but they are so spectacular that their effect on the average is not at all negligible. Take for example the size of the 36,500 communes (municipalities) of France: 1,722 inhabitants on average. If you set aside the extreme values of the 113 cities with more than 50,000 inhabitants, your average falls to 1,324 people! About 0.3% of the data therefore account for nearly a quarter of the mean. So there is no question of applying the law of large numbers, because there is a good chance that your sample will not be truly representative of the extreme values. The same goes for the standard deviation.
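The arithmetic behind the communes example can be checked from the four numbers quoted above alone. The sketch below derives the population of the 113 largest cities and their share of the total; the variable names are mine, and the results are only as precise as the rounded inputs.

```python
N_COMMUNES = 36_500          # number of communes quoted in the text
MEAN_ALL = 1_722             # average population over all communes
N_LARGE = 113                # cities with more than 50,000 inhabitants
MEAN_WITHOUT_LARGE = 1_324   # average once those cities are removed

total_population = N_COMMUNES * MEAN_ALL
population_small = (N_COMMUNES - N_LARGE) * MEAN_WITHOUT_LARGE
population_large = total_population - population_small

share_of_communes = N_LARGE / N_COMMUNES
share_of_population = population_large / total_population

print(f"{share_of_communes:.2%} of communes")      # about 0.31%
print(f"{share_of_population:.1%} of population")  # about 23.4%
```

A few dozen observations out of tens of thousands thus carry close to a quarter of the total, which is why a random sample that happens to miss them gives a badly biased average.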
I had fun looking at the history of the Nasdaq since 1971 (the data can be downloaded here). The differences are obvious when one compares the distribution of the Nasdaq's daily fluctuations with that of a Gaussian with the same mean (0.27) and the same standard deviation (27):
To give an idea of the impact of these days of madness: over forty years the Nasdaq has oscillated between 5,060 points (its maximum, in March 2000) and 54 points (its minimum, in October 1974). Yet the 11 most turbulent days in the history of the Nasdaq alone account for a cumulative change of more than 3,000 points, or 60% of the overall net change!
2) At the other end of the scale, there are many more days on the stock exchange when nothing happens at all, or almost nothing: 2,000 days without any variation, where a Gaussian would have only 150. So much so that I had to cut off the top of the vertical axis so that the rest of the curve remains visible. Paradoxically, one is bored far more often in Extremistan! Changes happen a bit like a jacket being opened by pulling its two sides apart without bothering to unbutton it: long periods of immobility followed by violent jerks each time a button gives way.
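The combination of "more dead calm and more madness" is typical of heavy-tailed distributions. As a rough illustration, the sketch below compares a Gaussian with a Student-t distribution with 3 degrees of freedom, rescaled to the same variance. The Student-t is only a stand-in chosen for convenience: it is not the Nasdaq data and not a model proposed in this post, and the thresholds (0.1 and 4 standard deviations) are arbitrary.

```python
import math
import random

def student_t(rng, df):
    """Draw one Student-t variate: a Gaussian divided by a random scale."""
    chi2 = rng.gammavariate(df / 2, 2)   # chi-square with df degrees of freedom
    return rng.gauss(0, 1) / math.sqrt(chi2 / df)

def quiet_and_wild(values, quiet=0.1, wild=4.0):
    """Fractions of values smaller than `quiet` and larger than `wild` in size."""
    n = len(values)
    calm = sum(1 for v in values if abs(v) < quiet) / n
    crazy = sum(1 for v in values if abs(v) > wild) / n
    return calm, crazy

rng = random.Random(42)
N_DAYS = 50_000
DF = 3                              # assumed; the variance of t(3) is df/(df-2) = 3
scale = math.sqrt(DF / (DF - 2))    # rescale so both series have unit variance

gaussian_days = [rng.gauss(0, 1) for _ in range(N_DAYS)]
wild_days = [student_t(rng, DF) / scale for _ in range(N_DAYS)]

print("Gaussian   :", quiet_and_wild(gaussian_days))
print("heavy tail :", quiet_and_wild(wild_days))
```

For the same variance, the heavy-tailed series shows both more near-motionless days and far more extreme days than the Gaussian one.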
Random fractal ...
These extreme distributions have another characteristic: their strange rules hold whatever the scale at which they are viewed. To return to the example of the communes, 23% of the population is concentrated in about 0.3% of the municipalities (the 113 largest cities), but the same hyper-concentration is found among the 60 largest cities, where 40% of the population is concentrated in the top 6. And Paris alone weighs almost half of those 6 mega-cities.
The same goes for the Nasdaq, whose curve looks eerily similar whether it is observed over 15 years or over 12 months.
And scale invariance means... fractals! Whereas in a Gaussian distribution the fluctuations become imperceptible as we step back, this is not the case for these distributions, which look just as irregular whatever the scale at which they are viewed (2).
The underlying reason why the 80-20 rule holds at all scales is that all these distributions involve a self-reinforcing effect for extreme values, of the "winner takes all" kind: wealth attracts wealth (for the distribution of income), fame reinforces fame (for traffic on the Web), cities attract people, and the stock market is known for its herd behaviour in times of panic or euphoria.
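This reinforcing effect can be simulated in a few lines. The sketch below is a toy model of my own choosing (a Simon-type "rich get richer" process), not a model taken from this post: each newcomer founds a new city with a small probability, and otherwise settles in an existing city chosen in proportion to its population. The 5% founding probability and the population size are arbitrary assumptions.

```python
import random

def grow_cities(n_people, p_new_city=0.05, seed=0):
    """Return city sizes after n_people arrivals with size-proportional attraction."""
    rng = random.Random(seed)
    sizes = [1]   # sizes[i] = population of city i
    home = [0]    # home[j] = city of person j
    for _ in range(n_people - 1):
        if rng.random() < p_new_city:
            sizes.append(1)
            home.append(len(sizes) - 1)
        else:
            # Picking a random resident selects a city with probability
            # proportional to its size: big cities attract more newcomers.
            city = home[rng.randrange(len(home))]
            sizes[city] += 1
            home.append(city)
    return sizes

def top_share(sizes, fraction=0.2):
    """Share of the total population living in the largest `fraction` of cities."""
    ranked = sorted(sizes, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

cities = grow_cities(20_000)
print(len(cities), "cities; top 20% hold", round(top_share(cities), 2))
```

No city is given any intrinsic advantage here: the concentration comes entirely from the reinforcement rule.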
When rank dictates size...
We have already met this scale invariance in this post on Benford's law: it presupposes that the distribution follows a power law $P(x \ge h) = h^{-\alpha}$, $\alpha$ being a fixed parameter. When such a distribution applies to phenomena that can be ranked by size (the length of rivers, for example), there is always a direct relationship between the rank of a phenomenon and its size: the phenomenon of rank r has a size proportional to $(K/r)^{1/\alpha}$, where K and $\alpha$ are constants characteristic of the distribution.
[A phenomenon of size h has rank $r = K h^{-\alpha}$.
The phenomenon of rank r has size $h = (K/r)^{1/\alpha}$.]
A host of natural and social phenomena obey this geometric link between rank and size:
- the magnitude of earthquakes: this is the Gutenberg-Richter law;
- the frequency of words in a text: this is Zipf's law;
- the size of rivers, lakes or mountains, and in general everything related to the topography of the landscape. This is not very surprising, since the coast of Brittany is THE fractal figure par excellence: its ragged outline looks the same whatever the scale at which it is viewed.
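A linear correlation between logarithms is equivalent to a power law:
$\log_{10}(\text{area}) = 4.67 - 1.2 \log_{10}(\text{rank})$ amounts to $S = 47000\, r^{-1.2}$.
Assuming circular lakes, their size is therefore $L = 122\, r^{-0.6}$ (strictly speaking this is the radius $\sqrt{S/\pi}$ rather than the full width).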
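The passage from the straight line in log-log coordinates to the power law can be checked numerically. The sketch below uses only the two fitted numbers quoted above (intercept 4.67 and slope -1.2, with base-10 logarithms assumed); the function names are mine, and the units are whatever the original fit used.

```python
import math

LOG10_INTERCEPT = 4.67   # intercept of the fit quoted above
SLOPE = -1.2             # slope of the fit quoted above

def area_from_rank(rank):
    """Area S of the lake of a given rank, from log10(S) = 4.67 - 1.2 log10(rank)."""
    return 10 ** (LOG10_INTERCEPT + SLOPE * math.log10(rank))

def rank_from_area(area):
    """Inverse relation: rank of a lake of a given area."""
    return 10 ** ((math.log10(area) - LOG10_INTERCEPT) / SLOPE)

def radius_from_rank(rank):
    """Radius of a circular lake with the area predicted for this rank."""
    return math.sqrt(area_from_rank(rank) / math.pi)

print(round(area_from_rank(1)))     # prefactor of the area law, close to 47000
print(round(radius_from_rank(1)))   # prefactor of the length law, close to 122
print(radius_from_rank(10) / radius_from_rank(1))   # equals 10 ** -0.6
```

Taking the square root of the area halves the exponent, which is where the -0.6 comes from.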
The tale of the little lakes
But do not think that such determinism is of any help in predicting the size of extreme events. The statistics of lakes inspired a nice tale on this topic from Mandelbrot, the pope of fractals (3). The story takes place in a foggy country that explorers set out to conquer. The country is dotted with bodies of water, some huge (it is even said that there is an ocean 300 km wide), others reduced to mere ponds a kilometre wide. Our explorers have no map, but they are whizzes at statistics (or else they have read the Webinet). So they know that the lakes are 2.5 km wide on average and that the lake of rank r has size $122\, r^{-0.6}$.
Once they have set out by boat on a lake, the fog prevents them from seeing the other shore if it is more than a kilometre away. The crew is reduced to speculating on its chances of arriving soon. If after three kilometres they still have not seen the opposite bank, the calculations indicate that there are on average five more kilometres to cover. If they still see nothing after ten kilometres, they must prepare to cover ten more.
In the world of fractals, everything is decided at the start. If a project planned to take a year in total needs two months instead of one to complete its first step, it risks arriving not one month late, but a year late! On the bright side, if on the day of its release a film sells five times more tickets than another, it has a good chance of being five times more successful overall. This is probably why Apple puts so much effort into the promotional launch of the iPad, even if it is sure of the product's success.
[For algebraists only: this strange property, the expected distance growing in proportion as we move away from the shore, can be demonstrated.
If $P(L \ge x) = x^{-\alpha}$ (this is our assumption, remember), the conditional probability that $L \ge x$ given that $L \ge h$ is, for $x \ge h$:
$P(L \ge x \mid L \ge h) = P(L \ge x) / P(L \ge h) = (x/h)^{-\alpha}$
If h is fixed (e.g. 5 km), the probability density is
$p(x \mid x \ge h) = \alpha h^{\alpha} x^{-\alpha-1}$ (the derivative, up to sign, of the expression we have just written)
and the expectation $E(x \mid x \ge h)$ is the integral between h and $+\infty$ of this density multiplied by x:
$E(x \mid x \ge h) = \int_h^{+\infty} \alpha h^{\alpha} x^{-\alpha-1}\, x\, dx$
The calculation gives (for $\alpha > 1$)
$E(x \mid x \ge h) = \alpha h / (\alpha - 1)$, that is to say: $E(x - h \mid x \ge h) = h / (\alpha - 1)$
This barbaric equation reads as follows: the distance still to be covered once we have already travelled a distance h is proportional to h, with a factor $1/(\alpha-1)$.]
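The proportionality can be tried out numerically. The sketch below makes two assumptions of mine: that the rank law $122\, r^{-0.6}$ of the tale translates into a tail exponent alpha = 1/0.6, and that the smallest lake is 1 km wide. It evaluates the closed-form expectation and checks by simulation that the chance of a lake being at least twice as wide as the distance already covered does not depend on that distance.

```python
import random

ALPHA = 1 / 0.6   # assumed: tail exponent implied by the rank law 122 * r**-0.6

def expected_total(h, alpha=ALPHA):
    """E(x | x >= h) = alpha * h / (alpha - 1), valid only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the expectation is infinite when alpha <= 1")
    return alpha * h / (alpha - 1)

def expected_remaining(h, alpha=ALPHA):
    """E(x - h | x >= h) = h / (alpha - 1): proportional to the distance covered."""
    return expected_total(h, alpha) - h

def conditional_survival(samples, h, factor=2.0):
    """Among lakes at least h wide, fraction that are at least factor * h wide."""
    reached = [x for x in samples if x >= h]
    return sum(1 for x in reached if x >= factor * h) / len(reached)

rng = random.Random(1)
# random.paretovariate(alpha) draws x >= 1 with P(X >= x) = x ** -alpha
lakes = [rng.paretovariate(ALPHA) for _ in range(200_000)]

print(expected_total(1))       # mean lake width when the smallest is 1 km: 2.5
print(expected_remaining(3))   # expected distance left after 3 km: 4.5
print(conditional_survival(lakes, 1), conditional_survival(lakes, 5))
print(2 ** -ALPHA)             # theoretical value, independent of h
```

Under these assumptions the mean width comes out at 2.5 km, as in the tale. The simulation checks probabilities rather than averages on purpose: with alpha below 2 the variance is infinite, so sample averages converge very slowly.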
A macroscopic version of the uncertainty principle
Except that... Mandelbrot's tale also tells us that we are always sure to be surprised, a paradox that is not without spice.
In short, if Gaussian extrapolations are irrelevant, the predictions of fractal statistics cannot do much better. All they manage to show us is how vain our efforts at prediction are in many fields. This principle of "inevitable surprises" (like quantum indeterminacy?) is ultimately the only positive certainty we have. Mind you, I find it rather pleasing to be certain in advance that nature has many more astonishing things in store for us.
Sources:
(2) ... to a certain extent, but the problem is you never know which one.
(3) Benoit Mandelbrot, Fractales, hasard et finance (1997). I changed the data of the original story because they do not fit my own statistics (taken from Wikipedia, on the lakes of Europe) and, what is more, they seem inconsistent to me (Mandelbrot's law $T(r) = 100/\sqrt{r}$ does not agree with the average size of 5 km that he gives in his book).
Related posts
Logarithms: again! On Benford's law and other curiosities of logarithmic distributions
The Queen, the Mad and the Tree shows the fractal nature of biological and technological evolution.