Performance prediction for Portland (Oregon)

For the Dallas case, we did not just run the above iterations; we also used the set-up for extensive studies of computational performance. These included a systematic derivation of an equation for the computational performance that accounts for CPU time, communication time (start-up and bandwidth), competition for bandwidth, etc. (see Appendix). That investigation, together with the results of the graph partitioning investigations in Sec. 5.2, was used to make estimates for our Portland problem.

The result, for the 200000 links network, can be seen in Fig. 12. The figure shows predictions for the so-called real time ratio, which says how much faster than reality the simulation runs. This is of interest, for example, for real time applications, since one wants the traffic forecast to be completed before the fact. The thick lines in the figure refer to a 250 MHz SUN Enterprise 4000 (dark) and to a hypothetical machine consisting of the same CPUs but with a two-dimensional communications topology (light). The Enterprise 4000 has a fast backplane, but it is still a bus communications system; its curve therefore levels out at about 50 CPUs and never gets faster than about twice real time. The 2-d communications topology does not have this problem, and speed-up is nearly linear. (Note, however, that the Enterprise 4000 does not accept more than 14 CPUs; Enterprise computers which accept more CPUs also have faster backplanes.) The other lines in Fig. 12 refer to parallel computers which again have a bus topology, but faster CPUs, higher network bandwidth, or both.
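The qualitative difference between the two thick curves can be illustrated with a toy model. This is a minimal sketch and not the performance equation derived in the Appendix: it assumes that the compute time per simulated second is divided evenly among the CPUs, that a bus adds a communication time that does not shrink with the number of CPUs, and that communication on the scalable topology can be neglected. The function names and the constants `t_cpu` and `t_bus` are hypothetical, chosen only so that the result is roughly in line with the numbers quoted in the text (a single-CPU real time ratio of about 0.1 and a bus limit of about 2).

```python
def rtr_bus(p, t_cpu=10.0, t_bus=0.5):
    """Toy real time ratio on a shared bus.

    t_cpu: assumed seconds of CPU work per simulated second on one CPU.
    t_bus: assumed seconds of bus time per simulated second; it does not
           shrink with p because all CPUs share the same bus.
    """
    return 1.0 / (t_cpu / p + t_bus)


def rtr_scalable(p, t_cpu=10.0):
    """Toy real time ratio on a topology whose communication time is neglected."""
    return p / t_cpu


if __name__ == "__main__":
    for p in (1, 14, 50, 500):
        print(p, round(rtr_bus(p), 2), round(rtr_scalable(p), 2))
```

In this sketch the bus curve can never exceed `1 / t_bus`, however many CPUs are added, while the scalable curve keeps growing linearly. The real model contains further terms (start-up latency, competition for bandwidth) that are omitted here.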

The prediction means the following: Let us assume we want to look at 24 hours of traffic after 50 iterations, i.e. 50 simulated days in total. Using one CPU, it would take 500 days to get the desired result. Using 500 CPUs and a 2-d communications topology, it would still take a day.
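The arithmetic behind these two numbers can be written out as a short sketch. The real time ratios used below are not quoted in the text; they are the values implied by it, under the assumption that each iteration simulates the full 24 hours: 500 days for 50 simulated days gives a single-CPU ratio of 0.1, and a near-linear speed-up on 500 CPUs gives a ratio of about 50. The function name `wall_clock_days` is hypothetical.

```python
def wall_clock_days(simulated_hours, iterations, real_time_ratio):
    """Wall-clock days needed to simulate `simulated_hours` of traffic
    `iterations` times at the given real time ratio."""
    simulated_days = simulated_hours * iterations / 24.0
    return simulated_days / real_time_ratio


if __name__ == "__main__":
    # Assumed ratios, implied by the 500-day and 1-day figures in the text.
    print(wall_clock_days(24, 50, 0.1))   # one CPU
    print(wall_clock_days(24, 50, 50.0))  # 500 CPUs, 2-d topology
```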

Fig. 13 shows preliminary measurements of the real time ratio for a situation similar to the one that was predicted. However, the network for Fig. 13 is the 20024 links network, i.e. a factor of 10 smaller than the network for which the predictions were made. The SGI Origin 2000 is a supercomputer with a more powerful communications technology than our Enterprise 4000. In fact, the Origin 2000 uses a hypercube communications topology, which is even more powerful than the 2-dimensional topology assumed for our computational speed predictions. Fig. 13 indeed confirms the predicted near-linear speed-up. The results in Fig. 13 are ``preliminary'' because, besides using the smaller network, the simulations on the Origin 2000 were run without any vehicles. In our experience, the addition of actual vehicles does not slow down the code enormously; in fact, on the Enterprise 4000 the ``Apr 1999'' runs (with vehicles) were faster than the ``Jun 1998'' runs without vehicles, owing to additional code optimizations made in the meantime. We were not able to do the corresponding runs with vehicles on the Origin 2000 because our computing privileges had been revoked. Runs on the 200000 links network were not available when this article was written.

**Figure 12:** Performance predictions for the 200000 links network for Portland. Real time ratio as a function of the number of CPUs. The dark gray curve labeled ``now'' refers to the communication architecture of an Enterprise 4000, but extrapolated to a higher number of CPUs than that machine can have. The light gray curve labeled ``(2-D grid)'' refers to a hypothetical machine where everything is the same except that the communication topology is a 2-d grid. ``5*performance'' refers to a hypothetical machine where everything is the same as in ``now'' except that the CPUs are five times faster. ``5*c_net'' refers to a hypothetical machine where everything is the same as in ``now'' except that the bandwidth of the communications network is five times higher. The last entry refers to a hypothetical machine where both improvements are combined. As is well known, really large scale computing is only possible with a communications technology that scales better than a bus. From [].
$\includegraphics[angle=-90,width=\hsize]{performance-gz.eps}$

**Figure 13:** Performance measurements on the Sparc Enterprise 4000 and on the Origin 2000. Note that these runs were done on the 20024 links network, an order of magnitude smaller than the one for which the predictions in Fig. 12 were made. The computational speeds from Apr 1999 are higher than those from Jun 1998 because of code optimizations.
$\includegraphics[angle=-90,width=\hsize]{rtr-gpl.eps}$