Next: Directions for Future Work Up: A Route Advice Agent Previous: The Interaction Component

Experimental Results

 



  
Figure 5: Sample task for the subjects. The starting point is the box at the upper left and the ending point is at the lower right. A is the route with fewest turns, B is the fastest route, C is the route with fewest intersections, and D is the shortest route.
\includegraphics[width=5in]{sample.eps}

In order to test the adaptation algorithm apart from the other functionality of the Adaptive Route Advisor, we simulated a series of interactions on paper with human subject evaluations of planner output. The test consisted of 20 tasks that involved trips between intersections in the Palo Alto area. To compensate for the lack of interactivity, we produced four routes for each task instead of two. Since we had no opportunity to build user models, we used exploratory weight vectors with a unit weight for one attribute and zero for the rest, creating routes optimized for time, distance, number of intersections, and number of turns, respectively. We plotted the four routes, labeled randomly A through D, on a map of Palo Alto. We presented the tasks in a different random order for each subject. Figure 5 shows an example of one of the tasks and its four route choices.
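The exploratory weight vectors described above can be illustrated with a minimal sketch. It assumes that each road segment carries a tuple of the four attributes (time, distance, intersections, turns) and that route cost is the weighted sum of those attributes. The graph format, the function names, and the use of Dijkstra's algorithm are illustrative assumptions, not the planner's actual implementation; treating turns as a per-segment attribute is a further simplification.

```python
import heapq

# Assumed attribute order for this sketch.
ATTRIBUTES = ("time", "distance", "intersections", "turns")

def unit_weight_vectors(n=len(ATTRIBUTES)):
    """One vector per attribute: weight 1 for that attribute, 0 for the rest."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def edge_cost(weights, attrs):
    """Weighted sum of a segment's attributes."""
    return sum(w * a for w, a in zip(weights, attrs))

def cheapest_route(graph, start, goal, weights):
    """Dijkstra search over a hypothetical graph {node: [(neighbor, attrs), ...]}.

    Assumes all edge costs are non-negative. Returns (cost, path) or None.
    """
    frontier = [(0.0, start, [start])]
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, attrs in graph.get(node, []):
            if neighbor not in settled:
                step = edge_cost(weights, attrs)
                heapq.heappush(frontier, (cost + step, neighbor, path + [neighbor]))
    return None
```

Calling `cheapest_route` once per vector returned by `unit_weight_vectors()` yields one candidate route per attribute, which mirrors how the four routes of each task were produced.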

We asked the subjects to evaluate the routes for each task and rank them in preference order, using 1 for best and 4 for worst. Since a ranking of four routes gives six independent binary preferences (A better/worse than B, C, D; B better/worse than C, D; C better/worse than D), each subject provided $6 \cdot 20 = 120$ training instances.
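The expansion of one ranking into six binary preferences can be sketched as follows. The function name and the dictionary representation of a ranking are assumptions made for illustration.

```python
from itertools import combinations

def ranking_to_preferences(ranks):
    """Expand a ranking into pairwise preferences.

    ranks maps a route label to its rank (1 = best, 4 = worst).
    Returns a list of (preferred, other) label pairs, one per unordered pair.
    """
    pairs = []
    for a, b in combinations(sorted(ranks), 2):
        if ranks[a] < ranks[b]:
            pairs.append((a, b))
        else:
            pairs.append((b, a))
    return pairs

# Example: a subject ranks B best, then A, then D, then C.
example = ranking_to_preferences({"A": 2, "B": 1, "C": 4, "D": 3})
```

Four routes give $\binom{4}{2} = 6$ pairs per task, so 20 tasks give the 120 instances per subject stated above.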


  
Figure 6: Exchange rates for three of the attributes with respect to distance. High positive values for an attribute indicate that shorter distance is less important than reducing that attribute, near-zero values indicate that shorter distance is more important, and high negative values indicate that longer distance is preferred.
\includegraphics[width=5in]{model.eps}

We trained the perceptron for 100,000 epochs ($\eta$ = 0.001) for each subject, then sought a way to compare the resulting user models. Since the cost of a route is a relative measure, the relative values of the weights are more informative than the absolute values. We will refer to the ratio of the weights of two attributes as their exchange rate, because it defines how much of one attribute a driver is willing to give up to improve another attribute. For example, if the exchange rate between time and turn weights is 30, the driver is willing to drive up to 30 seconds longer to save one turn, but no more. Figure 6 shows the exchange rates between distance and the other three attributes.
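The training step and the exchange-rate computation can be sketched as below. This is a minimal sketch under stated assumptions, not the system's actual code: it assumes a linear route cost, a standard perceptron update applied to the attribute difference of each preference pair, and an early stop once every pair is satisfied (the early stop is an addition for convenience). The exchange rate of an attribute with respect to a base attribute is taken as the ratio of their weights, which matches the time/turn example above.

```python
def train_preference_perceptron(pairs, n_attrs, eta=0.001, epochs=100_000):
    """Learn weights so that each preferred route costs less than its alternative.

    pairs is a list of (preferred_attrs, other_attrs) tuples of equal length.
    Assumed update rule: when the preferred route is not strictly cheaper,
    move the weights against the attribute difference.
    """
    w = [0.0] * n_attrs
    for _ in range(epochs):
        mistakes = 0
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            if sum(wi * di for wi, di in zip(w, diff)) >= 0:
                w = [wi - eta * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # early stop: an assumption, not stated in the text
            break
    return w

def exchange_rates(w, base):
    """Ratio of every weight to the weight of the base attribute."""
    return [wi / w[base] for wi in w]

# Toy data with attributes (distance, turns); values are made up.
toy_pairs = [
    ((1.5, 1), (1.0, 2)),  # accepts 0.5 extra distance to save one turn
    ((1.0, 2), (3.0, 1)),  # refuses 2.0 extra distance to save one turn
]
weights = train_preference_perceptron(toy_pairs, n_attrs=2)
rates = exchange_rates(weights, base=0)
```

On the toy data, the turns/distance rate in `rates[1]` must fall between 0.5 and 2 for both preferences to hold, which shows how the ratio, rather than the absolute weights, carries the trade-off.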

The results indicate that route preferences differ widely across people. Some subjects, such as 11 and 16, are apparently willing to go to great distances to improve their route on some other attribute. Other subjects, such as 9 and 17, would sacrifice other attributes to reduce the distance attribute. The most surprising result is that many subjects have negative exchange rates. For example, the distance/turns exchange rate for Subject 10 is -1027. This means that, given two routes A and B, if route A has one more turn than route B, it will have a lower cost if it is more than 1027 feet longer than B. Besides being counterintuitive, negative exchange rates make the weights inconvenient to use directly for planning, because some edges could have a negative cost. We believe these negative weights come from the bias in the training data toward routes that are optimal on some attribute. For example, the fact that drivers prefer shorter routes, other factors being equal, is not explicitly represented in the training data. Our future work will include using such background knowledge to eliminate negative exchange rates.
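The arithmetic behind the Subject 10 example can be spelled out under two assumptions made here for illustration: the route cost $c$ is linear in the attributes, and the distance/turns exchange rate $r$ is the turn weight $w_t$ divided by the distance weight $w_d$. The symbols $d$, $t$, $\Delta d$, and $\Delta t$ (distance, turns, and the differences of route A minus route B) are likewise introduced only for this sketch.

```latex
\begin{align*}
c(R) &= w_d\, d(R) + w_t\, t(R) + \dots, \qquad r = \frac{w_t}{w_d} = -1027 \\
c(A) - c(B) &= w_d\,\Delta d + w_t\,\Delta t = w_d\,(\Delta d + r\,\Delta t) \\
            &= w_d\,(\Delta d - 1027) \quad \text{for } \Delta t = 1,\ \text{other attributes equal} \\
c(A) < c(B) &\iff \Delta d > 1027\ \text{ft} \quad \text{when } w_d < 0
\end{align*}
```

Under these assumptions, the reading given in the text corresponds to a negative distance weight, which agrees with the interpretation of negative values in the caption of Figure 6. With a positive distance weight and a negative turn weight the inequality would reverse.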


  
Figure 7: Comparison between the accuracy of the personalized models and that of the aggregated model. The accuracy was computed using a ten-fold cross validation. The error bars mark one standard deviation.
\includegraphics[width=5in]{cross.eps}

To evaluate the advantage of using personalized models versus a single fixed model, we also created an aggregate training set of all $120 \cdot 24 = 2880$ instances. Figure 7 compares the accuracy of the personalized model to the aggregate model. As expected, the accuracy of the aggregate model is poor, hovering around chance (50%). The personalized model is uniformly better than chance and the aggregate model, but still far from perfect. Some possible sources for this model failure are that people are inherently inconsistent or that our model space does not represent some important attributes in drivers' route preferences. For example, people may dislike a certain road or intersection, which affects the rankings for some tasks but not others. Future studies will include additional information about the routes and measure the subjects' consistency on redundant tasks.
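A ten-fold cross validation of this kind can be sketched as follows. The sketch assumes that accuracy is the fraction of held-out preference pairs for which the learned weights give the preferred route the lower cost, and that the standard deviation is taken across folds; neither detail is stated in the text. The function names and the `train_fn` callback are hypothetical.

```python
import random
import statistics

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pair_accuracy(w, pairs):
    """Fraction of (preferred, other) pairs where the preferred route is cheaper."""
    correct = sum(1 for preferred, other in pairs if dot(w, preferred) < dot(w, other))
    return correct / len(pairs)

def k_fold_accuracy(pairs, train_fn, k=10, seed=0):
    """Mean and standard deviation of held-out accuracy over k folds.

    train_fn takes a list of training pairs and returns a weight vector.
    """
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(pair_accuracy(train_fn(training), held_out))
    return statistics.mean(scores), statistics.stdev(scores)
```

Running `k_fold_accuracy` once per subject on that subject's 120 instances, and once on the pooled 2880 instances, would give the two kinds of bars compared in Figure 7.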


Seth Rogers
1999-01-27