next up previous contents
Next: Adaptive Load Balancing Up: Parallel computing Previous: Micro-simulation parallelization: Domain decomposition   Contents


Graph partitioning

Once we are able to handle split links, we need to partition the whole transportation network graph in an efficient way. Efficient means several competing things: Minimize the number of split links; minimize the number of other domains each CPU shares links with; equilibrate the computational load as much as possible.

One approach to domain decomposition is orthogonal recursive bi-section. Although less efficient than METIS (explained below), orthogonal bi-section is useful for explaining the general approach. In our case, since we cut in the middle of links, the first step is to accumulate computational loads at the nodes: each node gets a weight corresponding to the computational load of all of its attached half-links. Nodes are located at their geographical coordinates. Then, a vertical straight line is searched so that, as much as possible, half of the computational load is on its right and the other half on its left. Then the larger of the two pieces is picked and cut again, this time by a horizontal line. This is recursively done until as many domains are obtained as there are CPUs available, see Fig. 25.4. It is immediately clear that under normal circumstances this will be most efficient for a number of CPUs that is a power of two. With orthogonal bi-section, we obtain compact and localized domains, and the number of neighbor domains is limited.

Another option is to use the METIS library for graph partitioning (see (129) and references therein). METIS uses multilevel partitioning. What that means is that first the graph is coarsened, then the coarsened graph is partitioned, and then it is uncoarsened again, while using an exchange heuristic at every uncoarsening step. The coarsening can for example be done via random matching, which means that first edges are randomly selected so that no two selected links share the same vertex, and then the two nodes at the end of each edge are collapsed into one. Once the graph is sufficiently collapsed, it is easy to find a good or optimal partitioning for the collapsed graph. During uncoarsening, it is systematically tried if exchanges of nodes at the boundaries lead to improvements. ``Standard'' METIS uses multilevel recursive bisection: The initial graph is partitioned into two pieces, each of the two pieces is partitioned into two pieces each again, etc., until there are enough pieces. Each such split uses its own coarsening/uncoarsening sequence. $k$-METIS means that all $k$ partitions are found during a single coarsening/uncoarsening sequence, which is considerably faster. It also produces more consistent and better results for large $k$.

METIS considerably reduces the number of split links, $N_{spl}$, as shown in Fig. 25.5. The figure shows the number of split links as a function of the number of domains for (i) orthogonal bi-section for a Portland network with 200000 links, (ii) METIS decomposition for the same network, and (iii) METIS decomposition for a Portland network with 20024 links. The network with 200000 links is derived from the TIGER census data base, and will be used for the Portland case study for TransimsThe network with 20024 links is derived from the EMME/2 network that Portland is currently using. An example of the domains generated by METIS can be seen in Fig. 25.6; for example, the algorithm now picks up the fact that cutting along the rivers in Portland should be of advantage since this results in a small number of split links.

We also show data fits to the METIS curves, $N_{spl} = 250 \,
p^{0.59}$ for the 200000 links network and $N_{spl} =
140 \, p^{0.59} - 140$ for the 20024 links network, where $p$ is the number of domains. We are not aware of any theoretical argument for the shapes of these curves for METIS. It is however easy to see that, for orthogonal bisection, the scaling of $N_{spl}$ has to be $\sim p^{0.5}$. Also, the limiting case where each node is on a different CPU needs to have the same $N_{spl}$ both for bisection and for METIS. In consequence, it is plausible to use a scaling form of $p^{\alpha}$ with $\alpha > 0.5$. This is confirmed by the straight line for large $p$ in the log-log-plot of Fig. 25.5. Since for $p=1$, the number of split links $N_{spl}$ should be zero, for the 20024 links network we use the equation $A \,
p^{\alpha} - A$, resulting in $N_{spl} =
140 \, p^{0.59} - 140$ . For the 200000 links network, the resulting fit is so bad that we did not add the negative term. This leads to a kink for the corresponding curves in Fig. 25.12.

Such an investigation also allows to compute the theoretical efficiency based on the graph partitioning. Efficiency is optimal if each CPU gets exactly the same computational load. However, because of the granularity of the entities (nodes plus attached half-links) that we distribute, load imbalances are unavoidable, and they become larger with more CPUs. We define the resulting theoretical efficiency due to the graph partitioning as

\begin{displaymath}
e_{dmn} := {\hbox{load on optimal partition} \over \hbox{load on largest
partition}} \ ,
\end{displaymath} (25.1)

where the load on the optimal partition is just the total load divided by the number of CPUs. We then calculated this number for actual partitionings of both of our 20024 links and of our 200000 links Portland networks, see Fig. 25.7. The result means that, according to this measure alone, our 20024 links network would still run efficiently on 128 CPUs, and our 200000 links network would run efficiently on up to 1024 CPUs.

Figure 25.4: Orthogonal bi-section for Portland 20024 links network.
\includegraphics[width=0.8\hsize]{ob-plot-gpl.eps}

Figure 25.5: Number of split links as a function of the number of CPUs. The top curve shows the result of orthogonal bisection for the 200000 links network. The middle curve shows the result of METIS for the same network - clearly, the use of METIS results in considerably fewer split links. The bottom curve shows the result for the Portland 20024 links network when again using METIS. The theoretical scaling for orthogonal bisection is $N_{spl} \sim \sqrt {p}$, where $p$ is the number of CPUs. Note that for $p \to N_{links}$, $N_{spl}$ needs to be the same for both graph partitioning methods.
\includegraphics[width=0.8\hsize]{splitedges-gpl.eps}

Figure 25.6: Partitioning by METIS. Compare to Fig. 25.4.
\includegraphics[width=0.8\hsize]{metis-plot-gpl.eps}

Figure 25.7: Top: Theoretical efficiency for Portland network with 20024 links. Bottom: Theoretical efficiency for Portland network with 200000 links. ``OB'' refers to orthogonal bisection. ``METIS (k-way)'' refers to an option in the METIS library.
\includegraphics[width=0.8\hsize]{efficiency-e2-gpl.eps} \includegraphics[width=0.8\hsize]{efficiency-allstr-gpl.eps}


next up previous contents
Next: Adaptive Load Balancing Up: Parallel computing Previous: Micro-simulation parallelization: Domain decomposition   Contents
2004-02-02