The problem considered in this paper is known as dynamic traffic assignment (DTA). Given a set of travelers, each with a fixed starting time, the task is to find plausible routes for each of them. In order to define ``plausible'', one often makes the assumption that travelers should move towards a User Equilibrium (UE), where no traveler can arrive earlier at his/her destination by changing routes. However, the UE formulation is a mathematical construct; it is not the real problem that we want to solve.
The traditional approach to this type of problem is static assignment (e.g. (14)). In static assignment, there is no time-of-day; instead, trips between any given origin-destination pair are assumed to occur at a constant rate. The advantage of the static formulation is that a fair amount of mathematics is known about it, including a uniqueness proof for certain situations, which allows one to compare results even if they are obtained with different methods.
Since the static assignment problem is non-linear, the methods to solve it are iterative (e.g. (14)). One of the oldest solution algorithms is the Frank-Wolfe algorithm. An interpretation of it is that the routes of a certain fraction of the population are re-planned optimally based on the current link travel times. Those routes then replace the previous routes of that part of the population. New link travel times are computed from the new set of routes and the volume-based link travel time function. A critical issue is the selection of the replanning fraction; it turns out that the Method of Successive Averages (MSA), in which the replanning fraction drops as $1/n$, where $n$ is the iteration number, guarantees convergence, albeit a slow one, to the correct (unique) solution.
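As an illustration, the following is a minimal sketch of such an MSA iteration on a toy network with one origin-destination pair and two parallel routes. All names and parameters are assumptions made for illustration; in particular, the BPR-type volume-delay function is one common choice of volume-based link travel time function, not necessarily the one used in the cited implementations.
\begin{verbatim}
# Minimal sketch of MSA-style iterative assignment; illustrative only.

def travel_time(volume, free_time=10.0, capacity=1000.0,
                alpha=0.15, beta=4.0):
    """BPR-type volume-based link travel time function (assumed form)."""
    return free_time * (1.0 + alpha * (volume / capacity) ** beta)

demand = 1500.0          # trips between one origin-destination pair
flow = [demand, 0.0]     # start with everybody on route 0

for n in range(1, 101):
    times = [travel_time(f) for f in flow]
    best = times.index(min(times))                 # all-or-nothing reply
    target = [demand if r == best else 0.0 for r in (0, 1)]
    step = 1.0 / n                                 # MSA: fraction drops as 1/n
    flow = [(1.0 - step) * f + step * t for f, t in zip(flow, target)]

print(flow)   # slowly approaches the equilibrium split between the routes
\end{verbatim}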
Static assignment does not, by definition, model time-dependent dynamics correctly. There are many ways to attack this problem, ranging from making relatively small changes to static assignment (e.g. (30,32,31)), via so-called mesoscopic simulations (20,34,33,21), to very detailed micro-simulations (22,35,36,19).
Since much less is known about the mathematics of the dynamic problem (37), implementations often move to ``learning algorithms'' (13,30) instead of the nonlinear optimization methods used for the static assignment. In learning algorithms, the travelers search for better routes based on some kind of historic information; there are many different ways of doing this. The learning algorithms typically no longer guarantee any property of the solution (such as being a User Equilibrium), but in general one can assume that they go towards a fixed point (if everything is deterministic) or towards a steady-state density (for stochastic systems, if they are Markovian) (see, e.g., (38)). Markovian means that the transition probabilities to the next iteration depend only on the current iteration. If the stochastic systems are in addition ergodic, that is, if each possible combination of routes can be reached by the iterative dynamics, then the steady-state density is unique (39). For certain non-traffic systems it has even been shown that best-reply dynamics lead to a User Equilibrium, an observation which connects learning dynamics and optimization techniques.
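To make the learning view concrete, the following is a minimal sketch of a stochastic best-reply dynamic on a toy two-route network; the population size, replanning probability, and travel time function are assumptions for illustration only. Note that the dynamic is Markovian in the sense just described: the next state depends only on the current set of routes.
\begin{verbatim}
import random

# Minimal sketch of a stochastic learning (best-reply) dynamic.
# Population size, replanning probability, and the toy travel time
# function are illustrative assumptions.

N, REPLAN_PROB = 1000, 0.1

def travel_time(volume):
    return 10.0 + volume / 100.0   # toy volume-based link travel time

routes = [0] * N                   # everybody starts on route 0

for iteration in range(100):
    volumes = [routes.count(r) for r in (0, 1)]
    times = [travel_time(v) for v in volumes]   # "historic" information
    best = times.index(min(times))
    # Each traveler independently reconsiders with probability
    # REPLAN_PROB and then switches to the currently best route.
    routes = [best if random.random() < REPLAN_PROB else r
              for r in routes]

print([routes.count(r) for r in (0, 1)])  # fluctuates around the 50/50 split
\end{verbatim}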
Unfortunately, for large scale scenarios these properties are less useful than they sound. Let us concentrate on stochastic systems. The problem is essentially that the relatively small number of iterations that is typically run is far from enough to come close to the mathematical limit. Intuitively, with a plausible 10% replanning fraction, ten iterations are necessary before everybody in the population has had one chance to react to a change in the system. However, every time someone finds a solution (a route) that was not previously used, this counts as such a change, and the ten ``final'' iterations need to start over. A drastic version of this is called ``broken ergodicity'' (40), where the system can remain in subregions of the phase space for very long times, in spite of being ergodic.
In practice, such problems with dynamic traffic assignment rarely occur, and simulations relax reasonably fast (38,41). However, none of the currently known mathematics explains this nice behavior; one may speculate that the dynamic problem inherits some of the ``niceness'' of the static problem.
Existing learning approaches to dynamic traffic assignment typically remember only one route per traveler. Some approaches (e.g. (20)) use a probabilistic route choice which is somewhat similar to our agent-based approach, but in general the thinking behind those methods is different: In those implementations, the assumption is that there is some external module which calculates all options for each traveler and then makes a random weighted draw between these options (discrete choice theory, see e.g. (16)). The external module needs to compute the utilities of each individual option. Since it is not possible that all options have been previously tried out, the external module needs to make assumptions about each option's performance. This introduces a possible source of inconsistency into the system, since those assumptions about system performance need not be identical with actual performance. Such inconsistencies can for example arise because of programming errors or modeling simplifications; our paper shows an example. In this paper, it will be shown how a truly agent-based representation of learning makes all this considerably more robust.
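For illustration, the following is a minimal sketch of such an external discrete-choice module, here a multinomial logit draw; the utility specification (negative estimated travel time) and the scale parameter mu are assumptions. The point is that estimated_travel_time must supply a value for every option, including options never actually tried, which is exactly where the inconsistencies described above can enter.
\begin{verbatim}
import math, random

# Minimal sketch of an external discrete-choice module (multinomial
# logit). Utility specification and scale parameter mu are
# illustrative assumptions.

def logit_choice(options, estimated_travel_time, mu=1.0):
    """Weighted random draw between options, as in discrete choice
    theory. estimated_travel_time must produce a value for *every*
    option, even never-tried ones -- this is where assumptions about
    performance enter."""
    weights = [math.exp(-mu * estimated_travel_time(o)) for o in options]
    return random.choices(options, weights=weights)[0]
\end{verbatim}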
An additional point is that for large scenarios it is not possible to compute all possible options, and a subset, usually selected heuristically, needs to be used. In the discrete choice approach, it would be the discrete choice module that selects that subset. In our approach, the agents themselves determine that subset, as sketched below. Depending on the implementation, the difference may not be large. However, in view of history-dependent learning, for example with mental maps (e.g. (42)), the agent-based representation also seems to be more robust here.
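By contrast, the following minimal sketch shows how an agent could maintain its own small subset of routes and choose only among options it has actually experienced; the memory size, scoring, and selection rule are assumptions for illustration, and history-dependent extensions such as mental maps would replace the simple score kept here.
\begin{verbatim}
import math, random

# Minimal sketch of an agent holding its own choice set; memory size,
# scoring, and selection rule are illustrative assumptions.

class Agent:
    MEMORY = 4                 # maximum number of remembered routes

    def __init__(self):
        self.plans = {}        # route -> experienced score

    def remember(self, route, score):
        """Store an experienced route; forget the worst if memory is full."""
        self.plans[route] = score
        if len(self.plans) > self.MEMORY:
            del self.plans[min(self.plans, key=self.plans.get)]

    def choose(self, mu=1.0):
        """Logit draw over experienced scores only: no guessing about
        the performance of options the agent has never tried."""
        routes = list(self.plans)
        weights = [math.exp(mu * self.plans[r]) for r in routes]
        return random.choices(routes, weights=weights)[0]
\end{verbatim}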