The size data in the 1992 U.S. economic census comes in
non-equidistant bins. For example, we obtain the number of
establishments with annual sales above 25000 k$, between
10000 k$ and 25000 k$, etc. For a cumulative function, such
as Fig. 5 (right), these counts can be used directly. For
distributions, such as Fig. 5 (left), the counts need to be
normalized. We have done this in the following way:
(1) We first divide by the weight of each bin, which is its width.
In the above example, we would divide the count for the bin between
10000 k$ and 25000 k$ by its width, 25000 - 10000 = 15000 (in k$). Note that this
immediately implies that we cannot use the data for the largest
companies since we do not know where that bin ends.
(2) For the log-normal distribution
For the largest firms (but not for the large establishments), the census also gives the combined sales of the four (eight, twenty, fifty) largest firms. We used the combined sales of the four largest firms divided by four as a (bad) proxy for the sales of each of these four companies. We then subtracted the sales of the four largest firms from the sales of the eight largest firms, divided the difference by four again, and so on for the remaining groups. Those data points should thus be seen as an indication only, and this procedure probably explains the ``kink'' near in Fig. 5.
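As an illustration only, the bin-width normalization of step (1) and the proxy for the largest firms can be sketched in a few lines of Python. This is a minimal sketch and not the code used for Fig. 5: the function names `bin_densities` and `top_firm_proxies` are hypothetical, all numbers are made up for the example, and we assume that the open-ended top bin has already been dropped and that each difference is divided by the number of firms in the corresponding rank group (4, 4, 12, 30).

```python
def bin_densities(edges, counts):
    """Divide each bin count by its bin width (step 1).

    `edges` holds the boundaries of the closed bins only, so it has one
    more entry than `counts`. The open-ended top bin must be left out
    by the caller, since its width is unknown.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    return [c / (hi - lo) for c, lo, hi in zip(counts, edges[:-1], edges[1:])]


def top_firm_proxies(ranks, combined_sales):
    """Average sales per firm inside successive rank groups.

    `ranks` is e.g. [4, 8, 20, 50]; `combined_sales[i]` is the combined
    sales of the `ranks[i]` largest firms. The first proxy is the top-4
    total divided by 4, the next is (top-8 total - top-4 total) / 4, etc.
    """
    proxies = []
    prev_rank, prev_sales = 0, 0.0
    for rank, sales in zip(ranks, combined_sales):
        proxies.append((sales - prev_sales) / (rank - prev_rank))
        prev_rank, prev_sales = rank, sales
    return proxies


if __name__ == "__main__":
    # Hypothetical bin edges in k$ and hypothetical establishment counts.
    edges = [1000, 2500, 5000, 10000, 25000]
    counts = [600, 500, 400, 300]
    print(bin_densities(edges, counts))

    # Hypothetical combined sales of the 4, 8, 20 and 50 largest firms.
    print(top_firm_proxies([4, 8, 20, 50], [400.0, 600.0, 900.0, 1200.0]))
```

Each proxy is a group average, so every firm within a rank group is assigned the same sales value. This is one reason why these points should be read as an indication only.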