For the Dallas case, we did more than run the iterations described above: we also used the set-up for extensive studies of computational performance. These included a systematic derivation of an equation for the computational performance, with terms for CPU time, communication time (start-up and bandwidth), competition for bandwidth, etc. (see Appendix). That investigation, together with the results of the graph partitioning investigations in Sec. 5.2, was used to make estimates for our Portland problem.
The result, for the 200000-link network, can be seen in Fig. 12. The figure shows predictions for the so-called real time ratio, which says how much faster than reality the simulation runs. This matters, for example, in real time applications, where a traffic forecast must be completed before the traffic it predicts occurs. The thick lines in the figure refer to a 250 MHz SUN Enterprise 4000 (dark) and to a hypothetical machine consisting of the same CPUs but with a two-dimensional communications topology. The Enterprise 4000 has a fast backplane, but it is still a bus communications system; its performance therefore levels out at about 50 CPUs and never exceeds about twice real time. The 2-d communications topology does not have this problem, and speed-up is nearly linear. (Note, however, that the Enterprise 4000 does not accept more than 14 CPUs; Enterprise computers which accept more CPUs also have faster backplanes.) The other lines in Fig. 12 refer to parallel computers which again have a bus topology, but faster CPUs, faster network bandwidth, or both.
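The qualitative difference between the two thick curves can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the performance equation derived in the Appendix: the function names, the parameter values, and the simplification that a bus serializes all boundary traffic while a distributed topology exchanges it in parallel are all hypothetical. Start-up latency and the dependence of boundary size on the partitioning are ignored.

```python
# Toy model (illustrative only, NOT the equation from the Appendix).
# t_cpu : assumed wall-clock seconds of computation per simulated second on one CPU
# t_comm: assumed wall-clock seconds of boundary communication per simulated second
# Both default values are hypothetical and chosen only to show the curve shapes.
# For simplicity the communication term is kept even for p = 1.

def bus_rtr(p, t_cpu=10.0, t_comm=0.5):
    """Real time ratio when all boundary traffic is serialized on one shared bus."""
    return 1.0 / (t_cpu / p + t_comm)

def distributed_rtr(p, t_cpu=10.0, t_comm=0.5):
    """Real time ratio when the p CPUs exchange their boundary data in parallel."""
    return 1.0 / (t_cpu / p + t_comm / p)

for p in (1, 14, 50, 500):
    print(f"{p:4d} CPUs: bus {bus_rtr(p):6.2f}   distributed {distributed_rtr(p):6.2f}")
```

In this sketch the bus curve can never exceed 1 / t_comm, however many CPUs are added, while the distributed curve grows linearly with the number of CPUs.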
To illustrate what the prediction means, assume we want to look at 24 hours of traffic after 50 iterations, i.e. 50 simulated days in total. Using one CPU, it would take 500 days to get the desired result. Using 500 CPUs and a 2-d communications topology, it would still take a day.
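The arithmetic behind these two numbers can be spelled out as follows. This is a sketch, assuming that each iteration simulates the full 24 hours and that the 2-d topology gives ideal linear speed-up; the function and variable names are illustrative only.

```python
def wall_clock_days(simulated_days, iterations, real_time_ratio):
    """Wall-clock days needed to simulate `simulated_days` of traffic `iterations` times."""
    return simulated_days * iterations / real_time_ratio

SIMULATED_DAYS = 1   # 24 hours of traffic
ITERATIONS = 50

# Single-CPU real time ratio implied by the text: 50 simulated days in 500 days.
single_cpu_rtr = (SIMULATED_DAYS * ITERATIONS) / 500

# Assumption: ideal linear speed-up on 500 CPUs with the 2-d topology.
rtr_500_cpus = single_cpu_rtr * 500

print(f"1 CPU:    {wall_clock_days(SIMULATED_DAYS, ITERATIONS, single_cpu_rtr):.1f} days")
print(f"500 CPUs: {wall_clock_days(SIMULATED_DAYS, ITERATIONS, rtr_500_cpus):.1f} days")
```

The implied single-CPU real time ratio is 0.1, i.e. ten times slower than reality, which is why even 500 CPUs with near-linear speed-up still need about one day.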
Fig. 13 shows preliminary measurements of the actual real time ratio for a situation similar to the one that was predicted. However, the network for Fig. 13 is the 20024-link network, i.e. a factor of 10 smaller than the network for which the predictions were made. The SGI Origin 2000 is a supercomputer with a more powerful communications technology than our Enterprise 4000. In fact, the Origin 2000 uses a hypercube communications topology, which is even more powerful than the 2-dimensional topology assumed for our computational speed predictions. Fig. 13 indeed confirms the predicted near-linear speed-up. The results in Fig. 13 are ``preliminary'' because, besides using the smaller network, the simulations on the Origin 2000 were run without vehicles. In our experience, the addition of actual vehicles does not slow down the code enormously; in fact, on the Enterprise 4000 the ``Apr 1999'' runs (with vehicles) were faster than the ``Jun 1998'' runs without vehicles. This is due to additional code optimizations. We were not able to do the corresponding runs with vehicles on the Origin 2000 because our computing privileges had been revoked. Runs on the 200000-link network were not available when this article was written.