Once we are able to handle split links, we need to partition the whole transportation network graph in an efficient way. Efficient here means balancing several competing goals: minimizing the number of split links; minimizing the number of other domains each CPU shares links with; and equilibrating the computational load as much as possible.
One approach to domain decomposition is orthogonal recursive bisection. Although less efficient than METIS (explained below), orthogonal bisection is useful for explaining the general approach. In our case, since we cut in the middle of links, the first step is to accumulate computational loads at the nodes: each node gets a weight corresponding to the computational load of all of its attached half-links. Nodes are located at their geographical coordinates. Then, a vertical straight line is sought such that, as much as possible, half of the computational load lies to its right and the other half to its left. The larger of the two pieces is then picked and cut again, this time by a horizontal line. This is done recursively until as many domains are obtained as there are CPUs available, see Fig. 25.4. It is immediately clear that under normal circumstances this will be most efficient for a number of CPUs that is a power of two. With orthogonal bisection, we obtain compact and localized domains, and the number of neighbor domains is limited.
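As an illustration of this recursive cutting, the following Python sketch partitions a set of weighted nodes by repeatedly cutting the piece with the largest accumulated load, alternating between vertical and horizontal lines. The data layout (tuples of x coordinate, y coordinate, and accumulated half-link load) and the function names are assumptions made for the example; this is not the actual Transims implementation.
\begin{verbatim}
# Sketch of orthogonal recursive bisection on node weights
# (illustrative Python, not the original implementation).

def split(nodes, vertical):
    """Cut a list of (x, y, weight) nodes by a vertical or horizontal
    line so that roughly half of the total weight is on each side."""
    axis = 0 if vertical else 1
    nodes = sorted(nodes, key=lambda n: n[axis])
    half, acc, cut = sum(n[2] for n in nodes) / 2.0, 0.0, 0
    for i, n in enumerate(nodes):
        acc += n[2]
        cut = i + 1
        if acc >= half:
            break
    return nodes[:cut], nodes[cut:]

def orthogonal_bisection(nodes, n_cpus):
    """Repeatedly pick the piece with the largest load and cut it,
    alternating between vertical and horizontal lines, until there
    are as many pieces (domains) as CPUs."""
    pieces = [(nodes, True)]        # (node list, next cut vertical?)
    while len(pieces) < n_cpus:
        # pick the piece with the largest accumulated load
        idx = max(range(len(pieces)),
                  key=lambda i: sum(n[2] for n in pieces[i][0]))
        piece, vertical = pieces.pop(idx)
        left, right = split(piece, vertical)
        pieces += [(left, not vertical), (right, not vertical)]
    return [p for p, _ in pieces]

# Example: nodes as (x, y, load of attached half-links), for 4 CPUs.
nodes = [(0.1, 0.2, 3.0), (0.9, 0.8, 1.0),
         (0.4, 0.7, 2.0), (0.6, 0.1, 2.5)]
domains = orthogonal_bisection(nodes, 4)
\end{verbatim}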
Another option is to use the METIS library for graph partitioning (see (129) and references therein). METIS uses multilevel partitioning. What that means is that first the graph is coarsened, then the coarsened graph is partitioned, and then it is uncoarsened again, while using an exchange heuristic at every uncoarsening step. The coarsening can for example be done via random matching, which means that edges are randomly selected such that no two selected links share the same vertex, and then the two nodes at the ends of each selected edge are collapsed into one. Once the graph is sufficiently collapsed, it is easy to find a good or optimal partitioning for the collapsed graph. During uncoarsening, it is systematically checked whether exchanges of nodes at the boundaries lead to improvements. ``Standard'' METIS uses multilevel recursive bisection: The initial graph is partitioned into two pieces, each of the two pieces is partitioned into two pieces again, etc., until there are enough pieces. Each such split uses its own coarsening/uncoarsening sequence. $k$-METIS means that all $k$ partitions are found during a single coarsening/uncoarsening sequence, which is considerably faster. It also produces more consistent and better results for large $k$.
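To illustrate the coarsening step, the sketch below performs one level of coarsening by random matching on an undirected graph stored as an adjacency dictionary. This is only a schematic rendering of the idea described above; METIS's actual coarsening is far more elaborate, and the data structures and function name here are assumptions for the example.
\begin{verbatim}
import random

# One coarsening level by random matching (illustrative sketch).
# adj: symmetric adjacency dict {node: set(neighbors)}, no self-loops.

def random_matching_coarsen(adj):
    """Collapse randomly matched edges; return (coarse adj, mapping)."""
    nodes = list(adj)
    random.shuffle(nodes)           # visit nodes in random order
    matched = {}                    # fine node -> coarse node id
    coarse_id = 0
    for u in nodes:
        if u in matched:
            continue
        # match u with a still-unmatched neighbor, if one exists
        partner = next((v for v in adj[u] if v not in matched), None)
        matched[u] = coarse_id
        if partner is not None:
            matched[partner] = coarse_id
        coarse_id += 1
    # rebuild adjacency between coarse nodes (internal edges vanish)
    coarse_adj = {c: set() for c in range(coarse_id)}
    for u, nbrs in adj.items():
        for v in nbrs:
            cu, cv = matched[u], matched[v]
            if cu != cv:
                coarse_adj[cu].add(cv)
                coarse_adj[cv].add(cu)
    return coarse_adj, matched
\end{verbatim}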
METIS considerably reduces the number of split links, $N_{spl}$, as shown in Fig. 25.5. The figure shows the number of split links as a function of the number of domains for (i) orthogonal bisection for a Portland network with 200000 links, (ii) METIS decomposition for the same network, and (iii) METIS decomposition for a Portland network with 20024 links. The network with 200000 links is derived from the TIGER census data base and will be used for the Portland case study for Transims. The network with 20024 links is derived from the EMME/2 network that Portland is currently using. An example of the domains generated by METIS can be seen in Fig. 25.6; for example, the algorithm now picks up the fact that cutting along the rivers in Portland is advantageous, since this results in a small number of split links.
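For concreteness, counting the split links of a given decomposition amounts to counting the links whose two end nodes are assigned to different domains; a minimal helper, with assumed data structures, could look as follows.
\begin{verbatim}
# Count split links for a given partition: a link is split when its
# two end nodes lie in different domains (illustrative helper).

def count_split_links(links, domain_of):
    """links: iterable of (from_node, to_node);
    domain_of: mapping node -> domain id."""
    return sum(1 for a, b in links if domain_of[a] != domain_of[b])

# Example:
links = [("a", "b"), ("b", "c"), ("c", "d")]
domain_of = {"a": 0, "b": 0, "c": 1, "d": 1}
n_spl = count_split_links(links, domain_of)   # -> 1 (the link b-c)
\end{verbatim}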
We also show data fits to the METIS curves, one for the 200000 links network and one for the 20024 links network, where $p$ is the number of domains. We are not aware of any theoretical argument for the shapes of these curves for METIS. It is, however, easy to see that for orthogonal bisection the scaling of $N_{spl}$ has to be $\sim \sqrt{p}$. Also, the limiting case where each node is on a different CPU needs to have the same $N_{spl}$ both for bisection and for METIS. In consequence, it is plausible to use a scaling form of $N_{spl} \propto p^\alpha$ with $\alpha \ge 1/2$. This is confirmed by the straight line for large $p$ in the log-log plot of Fig. 25.5. Since the number of split links should be zero for $p = 1$, for the 20024 links network we use a fit of the form $N_{spl} = a\,(p^\alpha - 1)$, i.e. the power law minus a constant. For the 200000 links network, the resulting fit is so bad that we did not add the negative term. This leads to a kink for the corresponding curves in Fig. 25.12.
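A heuristic argument for the $\sqrt{p}$ scaling under orthogonal bisection (spelled out here for completeness; the notation $N_{lk}$ for the total number of links is introduced only for this argument) goes as follows: for a roughly homogeneous planar network, each of the $p$ compact domains contains about $N_{lk}/p$ links, so its boundary consists of on the order of $\sqrt{N_{lk}/p}$ links. Summing over all $p$ domains and dividing by two, since every split link lies on the boundary of two domains, gives
\[
N_{spl} \;\sim\; \frac{1}{2}\, p \,\sqrt{\frac{N_{lk}}{p}}
\;=\; \frac{1}{2}\,\sqrt{N_{lk}\, p}
\;\propto\; \sqrt{p} .
\]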
Such an investigation also allows us to compute the theoretical efficiency based on the graph partitioning. Efficiency is optimal if each CPU gets exactly the same computational load. However, because of the granularity of the entities that we distribute (nodes plus attached half-links), load imbalances are unavoidable, and they become larger with more CPUs. We define the resulting theoretical efficiency due to the graph partitioning as
\[
e_{dmn} := \frac{\mbox{load on optimal partition}}{\mbox{load on largest partition}} ,
\]
where the load on the optimal partition is the total computational load divided by the number of CPUs.
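As a minimal numerical illustration of this definition, the following sketch computes the theoretical efficiency from a list of per-domain loads; the function name and data layout are assumptions for the example.
\begin{verbatim}
# Theoretical efficiency of a partition: load of an ideally balanced
# partition divided by the load of the largest partition.

def domain_decomposition_efficiency(domain_loads):
    """domain_loads: list of total computational load per CPU/domain."""
    ideal = sum(domain_loads) / len(domain_loads)  # optimal partition
    return ideal / max(domain_loads)               # <= 1; 1 = perfect

# Example: four domains with slightly unequal loads.
e_dmn = domain_decomposition_efficiency([10.0, 11.0, 9.5, 10.5])  # ~0.93
\end{verbatim}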