next up previous
Nächste Seite: EXPERIENCES WITH TRANSIMS (VERSION Aufwärts: ch Vorherige Seite: RESULTS


COMPUTATIONAL ISSUES

A metropolitan region can consist of 10 million or more inhabitants, the simulation of whom causes considerable demands on computational performance. This demand on computation is made worse by the repeated execution of the relaxation iterations. And in contrast to simulations in the natural sciences, traffic particles ($=$ travelers, vehicles) have internal intelligence. This internal intelligence translates into rule-based code, which does not vectorize and therefore does not run efficiently on traditional vector supercomputers, such as the Cray series. It does however run well on modern workstation architectures, which makes traffic simulations ideally suited for clusters of PCs, also called Beowulf clusters. One uses domain decomposition, that is, each CPU obtains a patch of the geographical region. Information and vehicles are exchanged between the patches via message passing, for example using MPI (Message Passing Interface, MPI (27)).

Two important numbers to judge the performance of a parallel simulation are speed-up and real time ratio. They are defined as follows:

The limiting factor for tightly coupled simulations, as is a parallel traffic simulation, is in practice the latency of the communication hardware, which is the time that is needed to initiate a message. For example, 100 Mbit Ethernet has a latency of about 0.5 msec. Since each domain has in the average six neighbors, and typically two messages are sent and received per time step, about 6 msec per time step are spent on communication. This sets an upper limit on parallel simulation speed of $1~\hbox{sec}/6~\hbox{msec} \approx 167$ time steps per second for this type of hardware. If the simulation has a time step of one second, this translates into an RTR of 167. Unfortunately, latency of standard communication hardware has not significantly increased over the last decade, so that this limitation will not go away by itself. It is however possible to use Myrinet (www.myri.com), a communications technology specifically developed for cluster computing.

Fig. 5 shows measured and predicted computing speeds as a function of the number of CPUs for the queue micro-simulation and the Switzerland scenario. Both figures refer to the simulation of the morning peak, as explained earlier in this paper. The top figure shows the real time ratio (RTR); the bottom figure shows the speed-up. As one can see, the plots are related by a vertical shift of the data, which means a multiplication with a constant multiplier because of the logarithmic scale. That multiplier is the RTR of the simulation on a single CPU; in our case, the RTR on a single CPU is about eight. The differences between RTR and speed-up would become important if one changed the scenario size: In the RTR plot, the graph would be shifted to the left or right for smaller or larger scenarios, respectively, meaning that the maximally reachable RTR would not change, but would be shifted to a different number of CPUs. In the speed-up plot, the graph would be shifted down or up, respectively, meaning that the maximally reachable speed-up depends on the size of the scenario. That is, for both plots the results depend on the scenario size; it is impossible to make parallel performance predictions without knowledge about the scenario size.

High computing speeds matter: For real time applications, one wants the simulation to be considerably faster than reality so that the prediction is finished much before reality catches up. For transportation planning, about 50 iterations between simulation and plans generation are necessary, meaning $50 \times 24 = 1200$ hours of simulated traffic for a 24-hour scenario. With our real time ratio of 200, the computing time for this would still be 6 hours for the micro-simulation alone.

Abbildung 5: Real time ratio (top) and speed-up (bottom) for the 6-9 Scenario on single and dual CPU machines, using Ethernet or Myrinet. The x-axis refers to the number of CPUs. The solid lines give the predictions according to the computing time $T_{cmp}$ and the latency term for Ethernet. As one can see, real time ratio and speed-up are related by a simple vertical shift in the logarithmic plot.
\includegraphics[width=0.8\hsize]{queue-rtr-gpl.eps}
\includegraphics[width=0.8\hsize]{queue-speedup-gpl.eps}

Besides the micro-simulation, also the feedback mechanism consumes computing time; including re-routing and agent database operations, it currently takes roughly 45 min per iteration for the morning peak Switzerland scenario (15). This is clearly the bottleneck of the current approach; better implementations are under investigation.

In summary, an iteration using our current implementation takes less than one hour. Running fifty iterations thus takes about two days.


next up previous
Nächste Seite: EXPERIENCES WITH TRANSIMS (VERSION Aufwärts: ch Vorherige Seite: RESULTS
Bryan Raney 2003-03-05