To sample the road network, we equip a fleet of cars with absolute position sensors and have them record traces of their trips. The probe cars record positions at regular intervals, and each record in a trace includes the latitude and longitude of the vehicle together with the estimated standard deviations of both. A probe car requires two main components: accurate position sensing, usually built around a GPS receiver, and communication with a centralized aggregator. The cost of GPS devices is decreasing rapidly, to the point where most new cars sold in the next few years will have at least one GPS receiver. Wireless technology is also advancing rapidly, and position reports may be ``piggybacked'' on other content, such as route update requests. In the near future, cars with these capabilities will become commonplace, making it possible to build a database of raw position traces at little cost.
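As a minimal sketch of what one trace might look like in memory, the Python fragment below models a record with the fields named above. The class and field names, the timestamp field, and the choice of degrees for the standard deviations are assumptions made for illustration; the text does not specify a storage format or units.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceRecord:
    """One position fix from a probe car (all names here are hypothetical)."""
    time_s: float     # seconds since trip start; assumed, not given in the text
    lat_deg: float    # latitude in degrees
    lon_deg: float    # longitude in degrees
    sigma_lat: float  # estimated standard deviation of latitude (assumed degrees)
    sigma_lon: float  # estimated standard deviation of longitude (assumed degrees)


# A trace is simply the ordered list of records from one trip.
Trace = List[TraceRecord]


def sampling_intervals(trace: Trace) -> List[float]:
    """Return the time gaps between consecutive records in a trace."""
    return [b.time_s - a.time_s for a, b in zip(trace, trace[1:])]


# Made-up coordinates and uncertainties, for illustration only.
example_trace: Trace = [
    TraceRecord(0.0, 37.40000, -122.10000, 2.0e-5, 2.5e-5),
    TraceRecord(1.0, 37.40010, -122.10005, 2.0e-5, 2.5e-5),
    TraceRecord(2.0, 37.40020, -122.10010, 3.0e-5, 3.5e-5),
]
```

Keeping the per-record uncertainty next to the position lets later processing weight each fix by its reported accuracy, and `sampling_intervals` gives a quick check that a trace really was recorded at regular intervals.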
Figure 1 plots two sample position traces in the San Francisco Bay Area. Along some stretches the two traces coincide; along others only one trace is present. The plot overlays the traces on a rough digital map available from Navigation Technologies, Inc. These maps divide the road network into segments, each of which is the portion of road between two intersections. For example, at a standard highway interchange, the segments are the part of the highway before the exit, the part between the exit and the entrance, and the part after the entrance. Each segment has a unique identifier and associated attributes, including the segments to which each end connects and a rough approximation of its shape.
Figure 1: Two sample position traces in the San Francisco Bay Area, overlaid on a rough digital map.
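The segment description above can be sketched as a small data structure. This is an illustrative model only: the class name, field names, integer identifiers, ramp segments, and coordinates are all hypothetical and do not reflect the actual map format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """A stretch of road between two intersections (hypothetical layout)."""
    segment_id: int
    # Rough shape as an ordered list of (latitude, longitude) points.
    shape: List[Tuple[float, float]]
    # Identifiers of the segments joined at each end of this segment.
    start_connections: List[int] = field(default_factory=list)
    end_connections: List[int] = field(default_factory=list)


def build_interchange() -> Dict[int, Segment]:
    """Build the highway-interchange example with made-up geometry.

    Segments 1-3 are the highway before the exit, between the exit and
    the entrance, and after the entrance. Segments 10 and 11 are the
    exit and entrance ramps (assumed here to complete the connectivity).
    """
    segments = [
        Segment(1, [(37.400, -122.100), (37.405, -122.100)],
                end_connections=[2, 10]),
        Segment(2, [(37.405, -122.100), (37.410, -122.100)],
                start_connections=[1], end_connections=[3]),
        Segment(3, [(37.410, -122.100), (37.415, -122.100)],
                start_connections=[2, 11]),
        Segment(10, [(37.405, -122.100), (37.407, -122.098)],
                start_connections=[1]),
        Segment(11, [(37.408, -122.098), (37.410, -122.100)],
                end_connections=[3]),
    ]
    return {s.segment_id: s for s in segments}


network = build_interchange()
```

Indexing segments by identifier makes it cheap to follow the connectivity from one segment to its neighbors, for example `network[1].end_connections` lists the segments a car can enter after leaving segment 1.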
The problem with such a database is that it provides no direct observation of the quantity of interest: the lane a car occupies at a given time. Nor does it provide the number of lanes on a segment a priori. In addition, the positioning systems are not perfectly accurate. Generating lane models from such data therefore requires background knowledge about the domain to structure the input, and statistical techniques to accommodate the noise.