 
 
 
 
 
   
It is possible to make the mobility simulation parallel, i.e. to give different pieces of it to different CPUs (Central Processing Units). One good option is to do this via domain decomposition; that is, the geographical area is decomposed into domains, and each CPU computes traffic within that domain. These domains need to exchange information at their borders, which can be achieved by messages. For messages, existing message passing software such as MPI (Message Passing Interface, www.mcs.anl.gov/mpi/) or PVM (Parallel Virtual Machine, www.epm.ornl.gov/pvm) can be used.
In this situation, simulation time per time step is a sum of the time
spent on computation and the time spent on communication,
 
 is the number of CPUs.  For the purposes of an intuitive
understanding, let us assume that
 is the number of CPUs.  For the purposes of an intuitive
understanding, let us assume that
 
 is the time a
single CPU needs. For the communication, it turns out that the main
component is the latency
 is the time a
single CPU needs. For the communication, it turns out that the main
component is the latency  of the communication.  Latency
refers to the amount of time that is necessary to prepare a message
before it can be sent away.  Most of that time is caused by the
hardware, such as Ethernet, and the corresponding Internet protocols,
such as TCP/IP.  For 100 Mbit Ethernet, this latency time is of the
order of
 of the communication.  Latency
refers to the amount of time that is necessary to prepare a message
before it can be sent away.  Most of that time is caused by the
hardware, such as Ethernet, and the corresponding Internet protocols,
such as TCP/IP.  For 100 Mbit Ethernet, this latency time is of the
order of  .  Since in two-dimensional systems each domain has
in the average six neighbors, and we need two messages per time step,
communication time can be approximated as
.  Since in two-dimensional systems each domain has
in the average six neighbors, and we need two messages per time step,
communication time can be approximated as
 
 
Let us define the real time ratio as how much faster than
reality the simulation is.  If we assume that one simulation time step
corresponds to one second, then we obtain
 , and therefore
, and therefore  for
 for  .  In words:
.  In words: 
![\fbox{%%
\begin{minipage}[c]{0.9\hsize}
\emph{With 100~Mbit Ethernet, the best p...
...ic simulation with a one-second time step is
approximately 170.}
\end{minipage}}](img56.png) 
The actual performance measurements in Fig. 5(a) from our implementation (dots) demonstrate that these concerns are justified. The line is a fit to those performance measurements, based on the mathematical form of Eq. (2). The results of Fig. 5 refer to runs of the Switzerland 6-9 scenario as explained above. Only with a faster communication hardware, Myrinet (www.myri.com), this bottleneck can be overcome and much faster simulations can be achieved. Our best performance currently is an RTR of nearly 800. This means that a whole day of all car traffic in Switzerland, which has about 7 mio inhabitants, can be simulated in less than 2 minutes. This makes now large scale learning studies possible, although it should be noted that these performance numbers are obtained without taking data input and output into account.
Fig. 5(b) also shows the speedup for the same runs.
Speedup is defined as the performance ratio between a multi-CPU and a
single-CPU run:
 
| Feb 2003: 
[]
 ![\includegraphics[width=0.5\hsize]{rtr-gpl.eps}](img58.png) [] ![\includegraphics[width=0.5\hsize]{speedup-gpl.eps}](img59.png)  | 
Figure 6 depicts the cumulative contributions of the major steps in the feedback system to the total execution time of each iteration. The left figure shows the implementation of the feedback mechanism with MySQL (www.mysql.org), a file-based database. As explained in Sec. 4, the agent database keeps track of agent strategies and their scores. The right figure shows an implementation where all relevant data (i.e. strategies) are permanently kept in computer memory. Rather than using slower scripting languages for each step in the iteration (48), the new implementation is written in C++ and combines all operations into one program. The figure shows computational performance results for the scenario described in Sec. 5; the contributions shown in the figure are:
One can see that in the left figure, on average each iteration takes about an hour to execute, with the feedback system, with strategy-related steps totaling about 45 minutes of that time. The overall result is that we can run a metropolitan scenario with 1 million agents, including 50 learning iterations, in about two days.
For the right figure, the total time is cut in half, with about 20 minutes spent on strategy-related steps.
| ![\includegraphics[width=0.49\hsize]{db_execution_time-gpl.eps}](img60.png) ![\includegraphics[width=0.49\hsize]{brain_execution_time-gpl.eps}](img61.png)  | 
 
 
 
 
