An important advantage of the CA is that it lends itself to a parallel and local simulation update: the state at time step $t+1$ depends only on information from time step $t$, and only on neighboring cells. (To be completely correct, one would have to consider our sub-time-steps.) This makes domain decomposition for parallelization straightforward: one communicates the boundaries for time step $t$, then locally on each CPU performs the update from $t$ to $t+1$, and then exchanges boundary information again.
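To make the pattern concrete, the following is a minimal sketch (not code from Transims1999) of such an exchange-then-update loop for a one-dimensional strip of CA cells, written directly against MPI for brevity. The cell count, ghost-cell layout, and time loop are illustrative assumptions; the actual simulation decomposes a road network rather than a single array.
\begin{verbatim}
// Minimal sketch: one CA domain per MPI rank.  Each rank stores its own
// cells plus a 5-cell "ghost" region on each side; per time step it first
// exchanges the boundaries, then updates its own cells locally.
#include <mpi.h>
#include <vector>

const int GHOST = 5;   // interaction range of the CA driving rules (cells)

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nLocal = 1000;                        // cells owned by this rank
    std::vector<int> cells(nLocal + 2 * GHOST, -1); // -1 denotes an empty cell

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int t = 0; t < 3600; ++t) {
        // 1) communicate the boundaries for time step t
        MPI_Sendrecv(&cells[GHOST], GHOST, MPI_INT, left, 0,
                     &cells[nLocal + GHOST], GHOST, MPI_INT, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&cells[nLocal], GHOST, MPI_INT, right, 1,
                     &cells[0], GHOST, MPI_INT, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // 2) locally perform the update from t to t+1
        for (int i = GHOST; i < nLocal + GHOST; ++i) {
            /* apply the CA rules using cells[i-GHOST .. i+GHOST] */
        }
    }
    MPI_Finalize();
    return 0;
}
\end{verbatim}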
Domain decomposition means that the geographical region is decomposed into several domains of similar size (Fig. 25.1), and each CPU of the parallel computer computes the simulation dynamics for one of these domains. Traffic simulations fulfill two conditions which make this approach efficient: the domains can be chosen so that they carry a similar computational load, and the interactions are local, so that only information near the domain boundaries needs to be exchanged.
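As an illustration of the load-balancing aspect only, the sketch below greedily assigns links to CPUs so that the accumulated computational weight per domain stays roughly equal. It deliberately ignores the geographic adjacency that the actual decomposition (Fig. 25.1) preserves, and the struct and function names are invented for this example.
\begin{verbatim}
// Illustrative sketch only (not the actual partitioner): distribute the
// links of the network over nCpu domains of similar computational size.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Link { int id; double weight; };   // weight ~ number of CA cells

std::vector<int> decompose(std::vector<Link> links, int nCpu) {
    // Greedy "largest first" balancing; a real partitioner would also
    // keep neighboring links on the same CPU to reduce communication.
    std::sort(links.begin(), links.end(),
              [](const Link& a, const Link& b) { return a.weight > b.weight; });
    std::vector<double> load(nCpu, 0.0);
    std::vector<int> domainOfLink(links.size());
    for (const Link& l : links) {
        int cpu = std::min_element(load.begin(), load.end()) - load.begin();
        domainOfLink[l.id] = cpu;
        load[cpu] += l.weight;
    }
    return domainOfLink;
}

int main() {
    std::vector<Link> links = {{0, 120}, {1, 80}, {2, 200}, {3, 60}};
    std::vector<int> d = decompose(links, 2);
    for (std::size_t i = 0; i < d.size(); ++i)
        std::printf("link %zu -> CPU %d\n", i, d[i]);
}
\end{verbatim}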
In the implementation, each divided link is fully represented on both CPUs, but each CPU is responsible for only one half of it. In order to maintain consistency between the CPUs, each CPU sends information about the first five cells of ``its'' half of the link to the other CPU; five cells is the interaction range of all CA driving rules on a link. The other CPU thus knows enough about what is happening on the far half of the link to compute consistent traffic.
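A possible way to organize such a split link is sketched below; the class layout and the MPI call are assumptions made for this illustration, not the TRANSIMS data structures. Which five cells count as the ``first'' ones of a half depends on the driving direction; the sketch simply exchanges the five cells on each side of the cut.
\begin{verbatim}
// Sketch of a split link: the whole link is mirrored on both CPUs, each
// CPU owns one half, and once per time step the five cells next to the
// cut are exchanged so that both CPUs can apply the CA rules consistently.
#include <mpi.h>
#include <vector>

const int RANGE = 5;   // interaction range of the CA driving rules

struct SplitLink {
    std::vector<int> cells;   // full link, mirrored on both CPUs
    int mid;                  // index of the cut between the two halves
    bool ownsUpstreamHalf;    // which half this CPU is responsible for
    int otherCpu;             // rank holding the other half
};

// Exchange the five cells adjacent to the cut: send ours, receive theirs.
// Assumes mid >= RANGE and mid + RANGE <= cells.size().
void exchangeBoundary(SplitLink& l) {
    int* sendBuf;
    int* recvBuf;
    if (l.ownsUpstreamHalf) {                  // we own cells [0, mid)
        sendBuf = &l.cells[l.mid - RANGE];     // last 5 cells of our half
        recvBuf = &l.cells[l.mid];             // 5 cells of the other half
    } else {                                   // we own cells [mid, end)
        sendBuf = &l.cells[l.mid];
        recvBuf = &l.cells[l.mid - RANGE];
    }
    MPI_Sendrecv(sendBuf, RANGE, MPI_INT, l.otherCpu, 0,
                 recvBuf, RANGE, MPI_INT, l.otherCpu, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
\end{verbatim}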
The resulting simplified update sequence on the split links is as follows (Fig. 25.3):
The implementation uses the so-called master-slave approach: the simulation is started by a master, which spawns the slaves, distributes the workload among them, and keeps control of the general scheduling. Master-slave approaches often do not scale well to large numbers of CPUs, since the workload of the master stays the same or even grows as CPUs are added. For that reason, in Transims1999 the master has nearly no tasks except initialization and synchronization; even the output to file is done in a decentralized fashion. With the numbers of CPUs that we have tested in practice, the master has never been the bottleneck of the parallelization.
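Schematically, and assuming an MPI launch (the function names and the output file pattern are made up for this sketch), the division of labor looks as follows: the master does little beyond initialization and synchronization, and every slave writes its own output file, so output cannot turn the master into an I/O bottleneck.
\begin{verbatim}
// Hedged sketch of the master-slave layout: rank 0 acts as master,
// all other ranks simulate their own domain and write their own output.
#include <mpi.h>
#include <cstdio>

void runMaster(int nSlaves);   // initialization and bookkeeping only
void runSlave(int rank);       // simulate own domain, write own file

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        runMaster(size - 1);
    else
        runSlave(rank);

    MPI_Barrier(MPI_COMM_WORLD);   // global synchronization point
    MPI_Finalize();
    return 0;
}

void runMaster(int nSlaves) {
    // The heavy work stays on the slaves; the master only coordinates.
    std::printf("master: coordinating %d slaves\n", nSlaves);
}

void runSlave(int rank) {
    char name[64];
    std::snprintf(name, sizeof(name), "snapshot.%03d.txt", rank);
    std::FILE* out = std::fopen(name, "w");   // decentralized output
    /* ... run the traffic simulation for this domain, write to 'out' ... */
    if (out) std::fclose(out);
}
\end{verbatim}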
The actual implementation was done by deriving C++ classes from the base classes provided by a Parallel Toolbox. The underlying communication library has interfaces to both PVM (Parallel Virtual Machine (98)) and MPI (Message Passing Interface (76)). The toolbox implementation is not specific to transportation simulations and is thus beyond the scope of this paper; more information can be found in (100).
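The following is only a hypothetical illustration of that pattern; the class and method names are invented here and are not the interface documented in (100). The point is that the toolbox base classes hide whether PVM or MPI is used underneath, while the transportation-specific behavior lives in the derived classes.
\begin{verbatim}
// Hypothetical toolbox base class (names invented for illustration).
class SimulationNode {
public:
    virtual ~SimulationNode() = default;
    virtual void exchangeBoundaries() = 0;  // communication: PVM or MPI inside
    virtual void update() = 0;              // one local time step
};

// Transportation-specific descendant supplied by the traffic simulation.
class TrafficNode : public SimulationNode {
public:
    void exchangeBoundaries() override { /* pack/send the 5-cell regions */ }
    void update() override { /* apply the CA driving rules on local links */ }
};
\end{verbatim}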