The small size of the data set limited our work. Various processing stages eliminated almost half of the raw data, in part because real-world driving samples roads unevenly. The drivers tended to move quickly to major thoroughfares during normal driving, so they visited many small roads only rarely.
We restricted ourselves to the identification of only three traffic controls. A small fleet of instrumented vehicles would sample a larger number of roads and would provide sufficient traversals to eliminate under-sampling. It would also provide the data to explore other types of controls, such as traffic lights for turns.
We eliminated data exhibiting infrequent stops, because it is ambiguous. It is difficult to discriminate between a clear segment that shows an occasional, random stop and a segment with a traffic light that is infrequently red. Confusion can also result when traffic backs up from segments with lights onto clear segments.
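The elimination step can be illustrated with a minimal sketch. It assumes each segment is summarized as a list of per-traversal flags (whether the vehicle stopped), and it treats any stop rate strictly between two thresholds as ambiguous. The function names and the threshold values `low` and `high` are hypothetical placeholders, not the values used in this work.

```python
def stop_rate(traversals):
    """Fraction of traversals of a segment that contain at least one stop.

    `traversals` is a list of booleans, one per traversal (assumed layout).
    """
    if not traversals:
        raise ValueError("segment has no traversals")
    return sum(1 for stopped in traversals if stopped) / len(traversals)


def is_ambiguous(traversals, low=0.0, high=0.2):
    """Flag a segment whose stop rate is nonzero but below `high`.

    Both thresholds are illustrative assumptions: a rate of exactly `low`
    looks like a clear segment, a rate of `high` or more looks like a control.
    """
    rate = stop_rate(traversals)
    return low < rate < high


def drop_ambiguous(segments):
    """Keep only segments whose stop behavior is not ambiguous.

    `segments` maps a segment id to its list of traversal flags.
    """
    return {sid: t for sid, t in segments.items() if not is_ambiguous(t)}
```

A filter of this form trades coverage for label quality: every segment it drops is one fewer road in an already small data set.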
Other solutions, besides simply eliminating the data, are possible. For example, the multi-segment post-processor might exploit characteristics of the distribution of stop positions to disambiguate data exhibiting infrequent stops. The frequency histogram in Figure 4 shows that the distribution of stop positions is qualitatively different on clear segments than on segments with traffic lights. This approach requires many more samples for each segment than we had available for the work reported here.
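One way such a post-processor might summarize the distribution of stop positions is sketched below. It builds a normalized frequency histogram of stop positions along a segment and reports the share of stops in the fullest bin as a crude concentration measure. The working assumption, which is ours for illustration and is not established by the data reported here, is that stops caused by a traffic light cluster in a narrow band of positions while random stops spread out. The function names and the bin count are hypothetical.

```python
def stop_position_histogram(positions, segment_length, n_bins=10):
    """Normalized frequency histogram of stop positions along one segment.

    `positions` are distances from the segment start, in the same units as
    `segment_length`. Returns a list of `n_bins` fractions summing to 1,
    or all zeros if the segment has no recorded stops.
    """
    if segment_length <= 0 or n_bins <= 0:
        raise ValueError("segment_length and n_bins must be positive")
    counts = [0] * n_bins
    for p in positions:
        if not 0 <= p <= segment_length:
            raise ValueError(f"stop position {p} lies outside the segment")
        # A stop exactly at the segment end falls into the last bin.
        idx = min(int(p * n_bins / segment_length), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


def peak_concentration(hist):
    """Share of all stops that fall in the single fullest bin."""
    return max(hist)
```

A high peak concentration would suggest stops at a fixed location, but the estimate is only meaningful when a segment has many recorded stops, which is the sampling requirement noted above.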