next up previous
Next: Bibliography Up: Dynamic Traffic Assignment on Previous: Acknowledgments

Equation for performance predictions

In this appendix, we explain the methodology for our performance predictions. We start by assuming that the time for one time step has contributions from computation and from communication. If they do not overlap, as is reasonable to assume for coupled workstations, we have

\begin{displaymath}T = T_{comp} + T_{comm} \ .
\end{displaymath}

Time for computation is assumed to follow

\begin{displaymath}T_{comp} = {T_1 \over p} \cdot \left( 1 + f_{overhead} + f_{domains} \right) \ .
\end{displaymath}

Here, T1 is the time of the same code on one CPU (assuming a problem size that is rescaled to available memory); p is the number of CPUs; foverhead includes overhead effects (for example, split edges need to be administered by both CPUs); fdomainsincludes the effect of unequal domain sizes discussed in Sec. 5.2.

Time for communication is assumed to follow

\begin{displaymath}T_{comm} = n_{neighb} \cdot t_{lat}
+ {N_{splitedges} \over p} \cdot t_{edge}
+ {C \over C_{sat}} \ .
\end{displaymath}

nneighb is the number of neighboring ``tiles'' each CPU talks to; for the bisection algorithm this number approaches four. tlat is the latency (or start-up time) of each message. Next come two terms that describe bandwidth effects. If is necessary to use two terms because bandwidth is limited by two factors: One way to model these effects are the above two terms. Nsplitedges is the number of split edges after the domain decomposition; it is the number shown in Fig. 7. Remember that $N_{splitedges} \sim \sqrt{p}$. tedge is the time that the network connection of the CPU needs for one additional split edge.

C/Csat models the effect of network saturation. It depends on the network topology:

In consequence, the combined time for one time step is

\begin{displaymath}T = {T_1 \over p} \cdot \left( 1 + f_{overhead} + f_{domains}...
...
+ {N_{splitedges} \over p} \cdot t_{edge}
+ {C \over C_{sat}}
\end{displaymath}

which (for foverhead and fdomains small enough) scales as

\begin{displaymath}T \sim {1 \over p} + 1 + {1 \over \sqrt{p}} + {C \over C_{sat}} \ .
\end{displaymath}

For bus topologies, $C/C_{sat} \sim \sqrt{p}$ eventually becomes the dominating term, explaining why the real time ratio, which goes as $\propto T^{-1}$, eventually goes down in Fig. 12. For 2-d topologies, C/Csat is negligible, and it is in practice fdomains, the domain decomposition imbalance, that makes using more than about 1000 CPUs inefficient.

This is the analysis which is behind Fig. 12. Constants such as tedge or tlat were obtained from systematic measurements of simplified situations. See [] for more information.


next up previous
Next: Bibliography Up: Dynamic Traffic Assignment on Previous: Acknowledgments
Kai Nagel
1999-12-12