Adaptation and learning in traffic simulation systems

In Sec. 3, we defined the modules of a multi-agent traffic simulation. The important point is that the agents are not just particles moved through the system by some force field: they have tactical and strategic goals - a tactical goal being, for example, the decision to change lanes; a strategic goal being, for example, to eat three times a day. Sec. 3 laid out what the corresponding modules do, but not how they interact. That interaction is the topic of this section.


Day-to-day learning, feedback, and relaxation

The interaction between the modules can lead to logical deadlocks. For example, plans depend on congestion, but congestion depends on plans. A widely accepted method to resolve this is systematic relaxation (e.g. [8]) - that is, make preliminary plans, run the traffic micro-simulation, adjust the plans, run the traffic micro-simulation again, etc., until consistency between the modules is reached. The method is similar to standard relaxation techniques in numerical analysis.

Fig. 4 shows an example of the effect of this. The scenario here is that 50000 travelers, distributed randomly throughout Switzerland, all want to travel to Lugano, which is marked by the circle. The scenario is used as a test case, but it bears some resemblance to vacation traffic in Switzerland.

The left figure shows traffic when every driver selects the route that would be fastest on an empty network. The micro-simulation here uses the so-called queue model [9], a queueing model with an added link storage constraint. That is, links are characterized by a service rate (capacity) and a maximum number of cars on the link. If a link is full, no more vehicles can enter, causing spill-back. Compared to the original version of Ref. [9], our model has improved intersection dynamics [10]. After the initial routing and the initial micro-simulation, iterations are run as follows:

  1. During the micro-simulation, one collects link travel times, averaged over, say, 15 minutes. That is, all vehicles entering a link between, say, 8am and 8:15am contribute to the average for that time period.


  2. For a randomly selected fraction of, say, 10% of the travelers, new routes are generated based on these averaged link travel times. As described in Sec. 3.2, this is done by running a time-dependent Dijkstra algorithm; the time dependency is included by using the time-dependent averaged link travel time each time a link is considered.

  3. The traffic micro-simulation is run again based on the new set of routes.

  4. Another 10% of the travelers obtain new routes.

  5. Etc., until some kind of convergence criterion is fulfilled. (A code sketch of this loop follows below.)
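In code, this loop can be sketched as follows. This is a minimal illustration rather than our actual implementation: the network representation (succ, link objects with length and max_speed) and the helper run_microsim are assumptions made only for the sketch, and the shortest-path search assumes FIFO links. Python is used for this and the following sketches.

\begin{verbatim}
import heapq
import itertools
import random

BIN = 15 * 60           # 15-minute bins for averaged link travel times
REPLAN_FRACTION = 0.10  # fraction of travelers re-routed per iteration

def time_dependent_dijkstra(succ, origin, destination, dep_time, link_time):
    """Fastest route by time-dependent Dijkstra.
    succ[node]         -> iterable of (next_node, link)
    link_time(link, t) -> (averaged) travel time when entering link at time t
    Returns the list of links of the fastest route, or None."""
    best = {origin: dep_time}
    tie = itertools.count()                    # tie-breaker for the heap
    heap = [(dep_time, next(tie), origin, [])]
    while heap:
        t, _, node, path = heapq.heappop(heap)
        if node == destination:
            return path
        if t > best.get(node, float('inf')):
            continue                           # stale heap entry
        for nxt, link in succ.get(node, ()):
            arrival = t + link_time(link, t)   # time dependency enters here
            if arrival < best.get(nxt, float('inf')):
                best[nxt] = arrival
                heapq.heappush(heap, (arrival, next(tie), nxt, path + [link]))
    return None                                # destination not reachable

def relax(travelers, succ, run_microsim, iterations=49):
    """Systematic relaxation: alternate micro-simulation and re-routing.
    run_microsim(travelers) runs the traffic micro-simulation and returns
    link_time(link, t), link travel times averaged over BIN-sized bins."""
    for trav in travelers:                     # initial routes: empty network
        trav.route = time_dependent_dijkstra(
            succ, trav.origin, trav.destination, trav.dep_time,
            lambda link, t: link.length / link.max_speed)
    for it in range(iterations):
        link_time = run_microsim(travelers)
        sample = random.sample(travelers, int(REPLAN_FRACTION * len(travelers)))
        for trav in sample:                    # replan 10% of the travelers
            trav.route = time_dependent_dijkstra(
                succ, trav.origin, trav.destination, trav.dep_time, link_time)
\end{verbatim}

Note that only a fraction of the travelers is re-routed per iteration; replanning everyone at once against yesterday's congestion tends to over-react and oscillate.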

Fig. 4 (right) shows the result after 49 such iterations. Quite visibly, traffic has spread out over many more routes.
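For concreteness, the queue-model link dynamics described above (free-speed traversal, a service rate, and a storage constraint causing spill-back) might be sketched as follows. This is a hypothetical minimal version with names of our own choosing; in particular, it omits the intersection dynamics of [10].

\begin{verbatim}
from collections import deque

class QueueLink:
    """Minimal sketch of a queue-model link."""
    def __init__(self, length, max_speed, capacity, storage):
        self.free_time = length / max_speed  # free-speed traversal time
        self.capacity = capacity             # service rate (vehicles/second)
        self.storage = storage               # max number of cars on the link
        self.queue = deque()                 # entries: (vehicle, earliest_exit)

    def has_space(self):
        # if the link is full, no more vehicles can enter: spill-back
        return len(self.queue) < self.storage

    def enter(self, vehicle, now):
        self.queue.append((vehicle, now + self.free_time))

    def serve(self, now, dt, move_to_next_link):
        """Discharge up to capacity*dt vehicles whose free-speed time has
        elapsed; move_to_next_link(vehicle) -> False if downstream is full.
        (A real implementation would carry the fractional capacity over.)"""
        n = int(self.capacity * dt)
        while n > 0 and self.queue:
            vehicle, earliest_exit = self.queue[0]
            if earliest_exit > now:
                break   # head vehicle has not yet traversed the link
            if not move_to_next_link(vehicle):
                break   # downstream link full: queue spills back
            self.queue.popleft()
            n -= 1
\end{verbatim}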

Such iterated simulations can be treated as very high dimensional, time-discrete dynamical systems. A state is the trajectory of the simulation through one day; an iteration is the update from one day (period) to the next (Fig. 5). As such, one can search for properties like fixed points, steady-state densities, multiple basins of attraction, strange attractors, etc. Typically, one would first analyze the steady-state behavior, and then the transients. Under certain conditions the existence of a unique steady state can be proven [11], although for the computationally feasible number of iterations the possible occurrence of ``broken ergodicity'' [12] needs to be taken into account: broken ergodicity is the property of a system that is mathematically ergodic but remains in parts of its phase space for long periods of time.
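In symbols (the notation $X_n$ and $F$ is ours, introduced only for illustration): if $X_n$ denotes the trajectory of the simulation through day $n$, then the feedback iteration defines a map

\begin{displaymath}
X_{n+1} = F(X_n) ,
\end{displaymath}

and a fixed point $X^* = F(X^*)$ corresponds to a relaxed state in which the plans are consistent with the congestion they generate.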

Fig. 6 shows the relaxation behavior for a scenario in Dallas [13,14]. The plot shows the sum of all travel times as a function of the iteration number. From this plot and from other observations it appears that broken ergodicity is not a problem here: all relaxation methods go to the same state, although with different convergence speeds.

The result is in fact similar to a fixed-strategy Nash equilibrium: for a single run in the relaxed state, it is approximately true that no traveler could improve by changing routes. The players essentially follow a ``best reply'' dynamics (i.e. find the best answer to yesterday's traffic), and for some systems it can even be proven that this converges to a Nash equilibrium [15]. In our case, we have been able to show for a scenario in Dallas that in a relaxed congested system the strategy landscape is much flatter than in an uncongested system [16]. This is a signature of a (population-based) mixed-strategy Nash equilibrium.

The relaxed solution is typically better than the initial one, but also worse than some system-optimized solution; in fact, it is relatively easy to construct scenarios where the relaxed solution falls well short of the system optimum (e.g. [17]). Again, one recognizes that our system finds a ``workable'' solution, but does not optimize in any way.

Figure 4: Result of day-to-day learning in a test example. LEFT: Situation at 9:00am in the initial run. RIGHT: Situation at 9:00am in the 49th iteration. Each pixel on the road is a car (overlapping pixels form the visible traffic streams); the circle denotes where the travelers are going. Clearly, the system has found a better solution after 49 iterations.
\includegraphics[width=0.49\hsize]{0it-gz.eps} \includegraphics[width=0.49\hsize]{gotth-9am-49it-prob-tif.eps}

Figure 5: Schematic representation of the mapping generated by the feedback iterations. Traffic evolution as a function of time-of-day can be represented as a trajectory in a high dimensional phase space. Iterations can be seen as mappings of this trajectory into a new one.
\includegraphics[width=0.8\hsize]{mapping-fig.eps}

Figure 6: Different relaxation paths in day-to-day replanning. The plot shows the sum of all travel times VTT (Vehicle Time Traveled) as a function of the iteration for different relaxation methods. All methods relax to the same value of VTT. From [18].
\includegraphics[width=0.7\hsize]{sumtt-by-iteration-gz.eps}


Individualization of knowledge

Classifier System and Agent Database

Knowledge of agents should be private, i.e. each agent should have a different set of knowledge items. For example, people typically know only a relatively small subset of the street network (their ``mental map''), and they have different knowledge and perception of congestion.

This completely opens the door for the use of Complex Adaptive Systems methods (e.g. [19]). Each agent has a set of strategies from which to choose, together with indicators of past performance for these strategies. The agent normally chooses a well-performing strategy. From time to time, the agent chooses one of the other strategies, to check whether its performance is still bad, or replaces a bad strategy by a new one.
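A minimal sketch of such an agent database, assuming a simple epsilon-greedy rule for re-trying old strategies (the class and parameter names are our own, for illustration only):

\begin{verbatim}
import random

class AgentDatabase:
    """Per-agent memory of strategies (e.g. routes) and their scores.
    Strategies must be hashable, e.g. routes stored as tuples of links."""
    def __init__(self, n_options=5, retry_prob=0.1):
        self.options = {}           # strategy -> score (e.g. -travel time)
        self.n_options = n_options  # options remembered per agent
        self.retry_prob = retry_prob

    def choose(self):
        # normally pick a well-performing strategy; from time to time,
        # re-try another one to check whether it is still bad
        if random.random() < self.retry_prob:
            return random.choice(list(self.options))
        return max(self.options, key=self.options.get)

    def update(self, strategy, score):
        # reported after the day, e.g. score = -experienced_travel_time
        self.options[strategy] = score

    def add(self, strategy):
        """Insert a newly generated strategy (e.g. from the router);
        if memory is full, the worst old strategy is weeded out."""
        if len(self.options) >= self.n_options:
            del self.options[min(self.options, key=self.options.get)]
        self.options[strategy] = float('inf')  # forces a first trial
\end{verbatim}

With such a memory, an implausible route suggested by the router is tried, scores badly, and is eventually weeded out - which is exactly the robustness discussed below.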

This approach divides the problem into two parts (see also [20]): the generation of new strategies (e.g. new routes computed by the router), and the selection between the strategies an agent already knows, based on their past performance.

A major advantage of this approach is that it is more robust against artifacts of the router: if an implausible route is generated, the simulation as a whole will fall back on a more plausible route generated earlier. Fig. 7 shows an example. The scenario is the same as in Fig. 4; the location is slightly north of the final destination of all trips. We see snapshots of two relaxed scenarios. The left plot was generated with a standard relaxation method as described in the previous section, i.e. one where individual travelers have no memory of previous routes and their performance. The right plot, in contrast, was obtained with a relaxation method which uses exactly the same router but adds an agent database, i.e. it retains memory of old options. In the left plot, many vehicles are jammed up on the side roads while the freeway is nearly empty, which is clearly implausible; in the right plot, at the same point in time, the side roads are empty while the freeway is just emptying out - as it should be.

The reason for this behavior is that the router miscalculates the times at which it expects travelers to be at certain locations - specifically, it expects travelers to reach the location shown in the plot much earlier than they actually do. In consequence, the router ``thinks'' that the freeway is heavily congested and suggests the side road as an alternative. Without an agent database, the method forces the travelers to use this route; with an agent database, agents discover that it is faster to use the freeway.

This means that the true challenge is not to generate exactly the correct routes, but to generate a set of routes which is a superset of the correct ones [20]. Bad routes will be weeded out via the performance evaluation method. For more details see [21]. Other implementations of partial aspects are [22,23,24,25].

Individual plans storage

The way we have explained it, each individual needs computational memory to store his/her plan or plans. The memory requirements for this are of the order of $O(N_{people} \times N_{trips} \times N_{links} \times N_{options})$, where $N_{people}$ is the number of people in the simulation, $N_{trips}$ is the number of trips a person takes per day, $N_{links}$ is the average number of links between starting point and destination, and $N_{options}$ is the number of options remembered per agent. For example, for a 24-hour simulation of all traffic in Switzerland, we have $N_{people} \sim 7.5 \cdot 10^6$, $N_{trips} \sim 3$, $N_{links} \sim 50$, and $N_{options} \sim 5$, which results in

\begin{displaymath}
7.5 \cdot 10^6 \hbox{ persons} \times 3 \hbox{ trips per person} \times 50 \hbox{ links per trip} \times 5 \hbox{ options} \times 4 \hbox{ bytes per link} = 22.5 \hbox{ GByte}
\end{displaymath}

of storage if we use 4-byte words for the storage of integer numbers. Let us call this agent-oriented plans storage.

Since this is a large storage requirement, many approaches do not store plans in this way. Instead, they store the shortest path for each origin-destination combination. This becomes affordable since one can organize this information in trees anchored at each possible destination. Each intersection has a ``signpost'' which gives, for each destination, the right direction; a plan is thus given by knowing the destination and following the ``signs'' at each intersection. The memory requirements for this are of the order of $O(N_{nodes} \times N_{destinations} \times N_{options})$, where $N_{nodes}$ is the number of nodes of our network, and $N_{destinations}$ is the number of possible destinations. $N_{options}$ is again the number of options, but note that these are options per destination, so different agents traveling to the same destination cannot have more than $N_{options}$ different options between them.

Traditionally, transportation simulations use of the order of 1000 destination zones and networks with of the order of 10000 nodes, which results in a memory requirement of

\begin{displaymath}
1000 \hbox{ destinations} \times 10000 \hbox{ nodes} \times 5 \hbox{ options per destination} \times 4 \hbox{ bytes per node} = 200 \hbox{ MByte} ,
\end{displaymath}

considerably less than above. Let us call this network-oriented plans storage.
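In code, such a signpost organization might look as follows; a minimal sketch with invented names, storing one next-hop tree per destination (for $N_{options} > 1$, each entry would hold a small list of next hops instead of a single one):

\begin{verbatim}
# next_hop[destination][node] -> the next node on the shortest path
# from node towards destination: one tree anchored at each destination,
# filled by one (backward) Dijkstra run per destination.
next_hop = {}

def route(origin, destination):
    """Reconstruct a plan by following the 'signs' at each intersection;
    only the destination needs to be known, not the full path."""
    signs = next_hop[destination]
    path, node = [origin], origin
    while node != destination:
        node = signs[node]   # read the signpost at this intersection
        path.append(node)
    return path
\end{verbatim}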

The problem with this second approach is that it explodes with more realistic representations. For example, in our simulations we usually replace the traditional destination zones by links, i.e. each of typically 30000 links is a possible destination. In addition, we need this information to be time-dependent. If we assume 15-min time slices, this results in a little less than 100 time slices for a full day, although some of the information can be recycled [26]. The memory requirements for the second method now become

\begin{displaymath}
30000 \hbox{ links} \times 10000 \hbox{ nodes} \times 100 \hbox{ time slices} \times 5 \hbox{ options} \times 4 \hbox{ bytes per entry} \approx 600 \hbox{ GByte} ,
\end{displaymath}

already more than for the agent-oriented approach. In contrast, for agent-oriented plans storage, time resolution has no effect. The situation becomes worse with high resolution networks (orders of magnitude more links and nodes), which leaves the agent-oriented approach nearly unaffected while the network-oriented approach becomes impossible. As a side remark, we note that in both cases it is possible to compress plans by a factor of at least 30 [27].
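The back-of-the-envelope arithmetic of this section is easy to reproduce; the following lines simply re-derive the three numbers above (decimal GByte, i.e. $10^9$ bytes):

\begin{verbatim}
BYTES = 4  # 4-byte integer per link/node entry

def agent_oriented(n_people, n_trips, n_links, n_options):
    return n_people * n_trips * n_links * n_options * BYTES

def network_oriented(n_nodes, n_destinations, n_options, n_time_slices=1):
    return n_nodes * n_destinations * n_options * n_time_slices * BYTES

GB = 1e9
print(agent_oriented(7.5e6, 3, 50, 5) / GB)           # 22.5 GByte
print(network_oriented(10_000, 1_000, 5) / GB)        # 0.2 GByte
print(network_oriented(10_000, 30_000, 5, 100) / GB)  # 600 GByte
\end{verbatim}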

Figure 7: Individualization of plans and interaction with router artifacts. LEFT: All vehicles are re-planned according to the same information; vehicles do not use the freeway (arrows) although the freeway is empty. As explained in the text, this happens because the router makes erroneous predictions about where a vehicle will be at what time. RIGHT: Vehicles treat routing results as additional options, that is, they can revert to other (previously used) options. As a result, the side roads now empty out before the freeway. - The time is 7pm.
\includegraphics[width=0.7\hsize]{individualization-gz.eps}


Within-day re-planning

Day-to-day replanning assumes, in a sense, ``dumb'' particles. Particles follow routes, but the routes are pre-computed, and once the simulation is started they cannot be changed, for example to adapt to unexpected congestion or a traffic accident. In other words, the strategic part of the intelligence of the agents is external to the micro-simulation. In that sense, such micro-simulations can still be seen as an, albeit much more sophisticated, version of the link cost function $c_a(x_a)$ from static assignment, now extended by influences from other links and made time-dependent. And indeed, many dynamic traffic assignment (DTA) systems work exactly that way (e.g. [8]). In terms of game theory, this means that we only allow unconditional strategies, i.e. strategies which cannot branch during the game depending on the circumstances.

Another way to look at this is to say that one assumes that the emergent properties of the interaction have a ``slowly varying dynamics'', meaning that one can, for example, consider congestion as relatively fixed from one day to the next. This may be realistic under some conditions, such as commuter traffic, but clearly not under many others, such as accidents, adaptive traffic management, impulsive behavior, or stochastic dynamics in general. It is therefore necessary that agents be adaptive (intelligent) on short time scales as well, not only with respect to lane changing, but also with respect to routes and activities. It is clear that this can be done in principle, and its importance both for fast relaxation [28,18] and for the realistic modeling of certain aspects of human behavior [29,30] has been pointed out.
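As an illustration of how this can look in code, the following hypothetical sketch grafts within-day replanning onto the relaxation sketch earlier in this section: while en route, an agent compares the expected remaining travel time of its plan with an estimate based on currently observed link travel times, and recomputes the rest of the route if the gap is large. It reuses time_dependent_dijkstra from the earlier sketch; the threshold and attribute names are invented.

\begin{verbatim}
REPLAN_THRESHOLD = 1.5  # replan if current estimate is 50% worse than planned

def maybe_replan(agent, succ, link_time_now, now):
    """Within-day replanning: the strategic intelligence moves inside
    the micro-simulation, so routes can branch on circumstances
    (unexpected congestion, accidents, ...)."""
    remaining = agent.route[agent.position:]   # links still to be traveled
    expected = agent.expected_remaining_time   # from the original plan
    # rough estimate: all remaining links evaluated at current conditions
    current = sum(link_time_now(link, now) for link in remaining)
    if current > REPLAN_THRESHOLD * expected:
        new_tail = time_dependent_dijkstra(
            succ, agent.current_node, agent.destination, now, link_time_now)
        if new_tail is not None:
            agent.route[agent.position:] = new_tail
\end{verbatim}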

Smart agents and non-predictability

A curious aspect of making the agents ``smarter'' is that, beyond a certain point, it may actually degrade system performance. More precisely, while average system performance may be unaffected, system variance, and thus unpredictability, invariably goes up. An example is Fig. 8, which shows average system performance in repeated runs as a function of the fraction $f$ of travelers with within-day replanning capability. While average system performance improves as $f$ increases from zero to 40%, beyond that both the average and the predictability (variance) of system performance degrade. In other words, for high levels of within-day replanning capability, the system fluctuates strongly between uncongested and congested states. From a user perspective, this is often no better than bad average system performance - for example, for a trip to the airport or to the opera, one usually plans according to a worst-case travel time. Also, if the system becomes unpredictable, route guidance systems are no longer able to help with efficient system usage. The system ``fights back'' against efficient utilization by reducing predictability.

Results of this type seem to be generic. For example, Kelly reports a scenario where many travelers attempt to arrive downtown simultaneously for work at 8am [31]. In this case, the mechanism at work is easy to see: if, say, 2000 travelers want to go downtown, and all roads leading there together have a capacity of 2000 vehicles per hour, then the arrivals necessarily will be spread out over one hour. Success or failure in being ahead of the crowd decides whether one is early or late; very small differences in the individual average departure time result in large differences in the individual average arrival time; and because of stochasticity there are strong fluctuations in the arrival time from day to day even if the departure time remains constant. Ref. [32] reports on a scenario where road pricing is used to push traffic closer to the system optimum. In this case, too, the improved system performance is accompanied by increased variability. Both results were obtained with day-to-day replanning.

Figure 8: Predictability as function of within-day rerouting capabilities. The result was obtained in the context of a simulation study of route guidance systems. The x-axis shows the fraction of equipped vehicles; the y-axis shows average travel time of all vehicles in the simulation. For each value of market saturation, five different simulations with different random seeds were run. When market saturation increases from zero to 40%, system performance improves. Beyond that, the average system performance, and, more importantly, also the predictability (variance) of the system performance degrade. From [18].
\includegraphics[width=0.6\hsize]{reroute-unpredict-gz.eps}


Kai Nagel 2002-08-14