Next: The largest firms Up: Converting aggregated data Previous: Converting aggregated data

Non-equidistant bins

The size data in the 1992 U.S. economic census come in non-equidistant bins. For example, we obtain the number of establishments with annual sales above 25 000 k$, between 10 000 k$ and 25 000 k$, etc. For an accumulated function, such as Fig. 5 (right), these data can be used directly. For distributions, such as Fig. 5 (left), they need to be normalized. We have done this in the following way:

(1) We first divide by the weight of each bin, which is its width. In the above example, we would divide by (25 000 k$ - 10 000 k$) = 15 000 k$. Note that this immediately implies that we cannot use the data for the largest companies, since we do not know where that bin ends.

(2) For the log-normal distribution \( \rho(x) \propto \frac{1}{x} \exp\left[ -\left( \ln(x) - \ln(\mu) \right)^2 \right] \) (note the factor 1/x), one typically uses logarithmic bins, since then the factor 1/x cancels out. This corresponds to weighting each data point by x.

(3) Now we have to decide where to plot the data for a specific bin. We used the arithmetic mean of the lower and the upper end of the bin; in our example, 17 500 k$.

(4) In summary, say the number of establishments between \( s_i \) and \( s_{i+1} \) is \( N_i \). Then the transformed number \( \tilde{N}_i \) is calculated according to

\[ \tilde{N}_i = \frac{N_i}{s_{i+1} - s_i} \cdot \frac{s_i + s_{i+1}}{2} . \]
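The transformation described in steps (1) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the figures: the function name `transform_bins`, its argument names, and the count of 300 establishments are hypothetical, and the open-ended top bin is assumed to have been dropped by the caller, since its width is unknown.

```python
def transform_bins(edges, counts):
    """Rescale counts from non-equidistant bins.

    edges  -- ascending bin boundaries s_0 < s_1 < ... < s_n (closed bins only)
    counts -- number of establishments N_i in each bin [s_i, s_{i+1}]

    Returns (positions, transformed): the plot position of each bin
    (arithmetic mean of its ends) and the transformed number
    N_i / (s_{i+1} - s_i) * (s_i + s_{i+1}) / 2.
    """
    if len(counts) != len(edges) - 1:
        raise ValueError("need exactly one count per closed bin")

    positions = []
    transformed = []
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        if hi <= lo:
            raise ValueError("bin edges must be strictly ascending")
        width = hi - lo            # step (1): weight of the bin
        midpoint = (lo + hi) / 2   # step (3): where the bin is plotted
        positions.append(midpoint)
        # steps (1) and (2) combined: divide by the width, multiply by x
        transformed.append(n * midpoint / width)
    return positions, transformed


# Hypothetical example: 300 establishments with sales between
# 10 000 k$ and 25 000 k$ (the count is made up for illustration).
example_positions, example_values = transform_bins([10_000, 25_000], [300])
```

For the example bin, the width is 15 000 k$ and the plot position is 17 500 k$, as in the text; the made-up count of 300 is rescaled to 300 × 17 500 / 15 000 = 350.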




Tue May 9 13:55:49 CEST 2000