Non-equidistant bins

The size data in the 1992 U.S. economic census come in non-equidistant bins. For example, we obtain the number of establishments with annual sales above 25000 k$, between 10000 k$ and 25000 k$, etc. For an accumulated function, such as Fig. 5 (right), this is straightforward to use. For distributions, such as Fig. 5 (left), the counts need to be normalized. We have done this in the following way:

(1) We first divide by the weight of each bin, which is its width. In the above example, we would divide by $25\,000\,\mathrm{k\$} - 10\,000\,\mathrm{k\$} = 15\,000\,\mathrm{k\$}$. Note that this immediately implies that we cannot use the data for the largest companies, since we do not know where that bin ends.

(2) For the log-normal distribution $\rho(x) \propto \frac{1}{x} \exp\left[ - (\ln(x) - \ln(\mu))^2 \right]$ (note the factor $1/x$), one typically uses logarithmic bins, since then the factor $1/x$ cancels out. This corresponds to a weight of $x$ for each data point, so each width-normalized value is multiplied by $x$.

(3) Now we have to decide where to plot the data for a specific bin. We used the arithmetic mean of the lower and the upper end; in our example case, $17\,500\,\mathrm{k\$}$.

(4) In summary, say the number of establishments between $s_i$ and $s_{i+1}$ is $N_i$. Then the transformed number $\tilde{N}_i$ is calculated according to

$$\tilde{N}_i = \frac{N_i}{s_{i+1} - s_i} \cdot \frac{s_i + s_{i+1}}{2} .$$
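The normalization in steps (1) to (4) can be sketched in a few lines of Python. This is only an illustration, not the code used for the figures: the function name `transform_bins`, its arguments, and the count of 300 establishments are assumptions made up for the example. The open-ended top bin is assumed to have been dropped by the caller, as step (1) requires.

```python
def transform_bins(edges, counts):
    """Return (plot position, transformed count) for each closed bin.

    edges  -- bin boundaries s_0 < s_1 < ... < s_n (same unit, e.g. k$)
    counts -- number of establishments N_i in each bin [s_i, s_{i+1}]
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    points = []
    for lower, upper, n in zip(edges[:-1], edges[1:], counts):
        width = upper - lower            # step (1): weight of the bin
        midpoint = (lower + upper) / 2   # step (3): arithmetic mean
        # steps (2) and (4): divide by the width, multiply by x = midpoint
        points.append((midpoint, n * midpoint / width))
    return points


# Hypothetical count of 300 establishments in the 10000-25000 k$ bin.
print(transform_bins([10000, 25000], [300]))  # [(17500.0, 350.0)]
```

Each returned pair gives the horizontal plot position (the bin midpoint) and the transformed number for that bin, so the pairs can be plotted directly on logarithmic axes.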

Tue May 9 13:55:49 CEST 2000