Next: Bibliography Up: Dynamic Traffic Assignment on Previous: Acknowledgments

Equation for performance predictions

In this appendix, we explain the methodology for our performance predictions. We start by assuming that the time for one time step has contributions from computation and from communication. If they do not overlap, as is reasonable to assume for coupled workstations, we have

$\begin{displaymath}T = T_{comp} + T_{comm} \ . \end{displaymath}$

Time for computation is assumed to follow

$\begin{displaymath}T_{comp} = {T_1 \over p} \cdot \left( 1 + f_{overhead} + f_{domains} \right) \ . \end{displaymath}$

Here, T₁ is the time of the same code on one CPU (assuming a problem size that is rescaled to available memory); p is the number of CPUs; f_overhead includes overhead effects (for example, split edges need to be administered by both CPUs); f_domainsincludes the effect of unequal domain sizes discussed in Sec. 5.2.

Time for communication is assumed to follow

$\begin{displaymath}T_{comm} = n_{neighb} \cdot t_{lat} + {N_{splitedges} \over p} \cdot t_{edge} + {C \over C_{sat}} \ . \end{displaymath}$

n_neighb is the number of neighboring ``tiles'' each CPU talks to; for the bisection algorithm this number approaches four. t_lat is the latency (or start-up time) of each message. Next come two terms that describe bandwidth effects. If is necessary to use two terms because bandwidth is limited by two factors:

First, it is limited by the connection of the computer to the network. For example, with our Sparc 5s we could never obtain more than 30-40 Mbit/sec, although our FDDI network had a bandwidth of more than 100 Mbit/sec.
Thus, only when three or more pairs of machines start talking to each other, we start seeing the network bandwidth saturation effects.

One way to model these effects are the above two terms. N_splitedges is the number of split edges after the domain decomposition; it is the number shown in Fig. 7. Remember that $N_{splitedges} \sim \sqrt{p}$ . t_edge is the time that the network connection of the CPU needs for one additional split edge.

C/C_sat models the effect of network saturation. It depends on the network topology:

For example, for a bus topology (e.g. a local area network, or the Sparc Enterprise computers), C is just the amount of overall communication, that is (neglecting message headers) $C \approx N_{splitedges} \cdot s_{splitedges}$ , where s_splitedges is the amount of data for each split edge. C_sat is simply the bandwidth of the communication medium, for example 100 Mbit/sec for a 100-Mbit-Ethernet.
For a two-dimensional grid topology, C/C_sat is negligible. This can be seen as follows: Given a problem size, then for each additional CPU the amount of information C exchanged between two CPUs actually goes down, while the bandwidth between these two CPUs, C_sat remains the same.

In consequence, the combined time for one time step is

$\begin{displaymath}T = {T_1 \over p} \cdot \left( 1 + f_{overhead} + f_{domains}... ... + {N_{splitedges} \over p} \cdot t_{edge} + {C \over C_{sat}} \end{displaymath}$

which (for f_overhead and f_domains small enough) scales as

$\begin{displaymath}T \sim {1 \over p} + 1 + {1 \over \sqrt{p}} + {C \over C_{sat}} \ . \end{displaymath}$

For bus topologies, $C/C_{sat} \sim \sqrt{p}$ eventually becomes the dominating term, explaining why the real time ratio, which goes as $\propto T^{-1}$ , eventually goes down in Fig. 12. For 2-d topologies, C/C_sat is negligible, and it is in practice f_domains, the domain decomposition imbalance, that makes using more than about 1000 CPUs inefficient.

This is the analysis which is behind Fig. 12. Constants such as t_edge or t_lat were obtained from systematic measurements of simplified situations. See [] for more information.

Next: Bibliography Up: Dynamic Traffic Assignment on Previous: Acknowledgments

Kai Nagel
1999-12-12