GPS accuracy can vary, and is usually pretty bad, particularly in “urban canyon” (multipath error) or forested environments. This requires the geographer to snap the locations to a road network (assuming it is known that a subject travelled along a road) in order to do any useful analysis. This, to put it lightly, is a challenge.

In one sense, snapping is a fool’s errand — Strava’s official policy is that they won’t do it, and for good reason: the relative accuracy of the time stamps of the locations in a GPS track is pretty high, though sometimes imprecise, but the positional accuracy is often terrible. It is impossible to know with certainty how fast a subject was going because the exact length of a segment is unknown. Even if one can snap the locations to the road network with 100% accuracy, the distance from one to the next is not solvable. The only way to overcome this would be to record the average speed of the subject during the segment using an external, mechanical sensor. Strava’s policy seems to be that you can not improve accuracy by snapping, so why do it?

I can think of one reason. The slope of a segment is a critical parameter to the power function. The DEM used in this project is a highly accurate 2m, LiDAR-derived raster, with vertical accuracy in the range of 30cm. Roadways tend to have a consistent, or gradually-changing, slope lengthwise, but the surrounding terrain can have radical variations in elevation within a few meters of the pavement. The accuracy of the DEM means that both the gradual slope of the roadway and the radical topography of the surrounding environment are accurately represented. Snapping the track locations to the roadway guarantees that elevation values are taken from the pavement.

Assuming there is a good algorithm for snapping locations to the road network, and accepting that there is no good way of correcting the distances between snapped locations, is the slope from snapped positions still useful? If the power calculations are performed on un-snapped points, does it make sense to use the slopes from snapped points in the power function?

I’ll revisit that some other time. For now, here’s how I did the snapping.

The algorithm, arrived at through some study of the literature, and a lot of trial and error is as follows:

- Locate the nearest road segments to each point, within a given distance, ordered by the distance from the point.
- Calculate the difference between the azimuth of the segment ending with the point, and azimuth of the nearest road section.
- Determine the confidence of of the match between a point and a road segment. A match is confident if:
- there is only one road segment within the search distance,
- the point is within a given distance of the segment, and
- the azimuthal difference is small.

- Snap the confident points to the road network at the nearest point on the road segment.
- Between each pair of confident matches, determine the shortest path along the road network using Dijkstra’s shortest path algorithm.
- Match the un-snapped points to the segments returned by the shortest path search.
- Snap the un-snapped points.

The algorithm was realized as a PostgreSQL function which is dependent on the plpython, PostGIS and pgRouting extensions. Postgresql functions are loaded into the database and executed within the database’s process space. plpython allows these functions to be written in Python. pgRouting is a graph manipulation extension that builds graphs from linework (roads) and provides methods for traversing them.

The road network was created from OpenStreetMap data clipped to the Victoria area. All lines that did not have a *highway* designation, or for which the *highway* designation was “footway,” were deleted. Then the linestrings were decomposed into individual segments, with each row in the table containing one segment. For each segment, the azimuth was computed and modulated to 180 degrees (North and South are irrelevant).

A GPS track was downloaded from Strava for testing. The ride begins at the University of Victoria and ends at my home, a distance of about 20km. The track was parsed by a custom made Java library and saved to the database as a point set, with each of the 2326 points having an x, y, z and time coordinate. GPS tracks with power data will also have a power attribute.

First, the points in the ride are selected in order, by SQL:

// Select the points of the ride select id, azimuth from points where ride_id=$1 order by id

A Python script loops the points. For each, the candidate (nearest, within the search distance) road segments are found.

// Select the road segment match(es) for each point. select distinct on(l.osm_id) l.gid, l.osm_id, st_distance(l.geom, p.geom) as dist, abs(l.azimuth-p.azimuth) as az from osm_line l, points p where p.id=$1 and st_dwithin(l.geom, p.geom, 10.0) order by l.osm_id, st_distance(l.geom, p.geom)

The ID of the nearest segment, and the number of candidate segments are attached to each point object, and the averages of the distance and azimuthal differences are recorded.

The next step is to identify “anchor” points, those for which the confidence of a match with a road segment is high. I identified these, somewhat arbitrarily, by the criteria given in step 3, using 3x the average distance and 2x the average azimuthal difference. These values could certainly be refined, but they suffice to demonstrate the algorithm.

The final step in the algorithm is to iterate over each pair of anchor points that is separated by one or more intermediate points. pgRouting is used to find the shortest path between the anchor points, and the intermediate points are snapped to this path using a similar method to step 1.

// Find the shortest path between two points. select id2 as gid from pgr_dijkstra('select gid as id, source, target, cost_len as cost from osm_line', (select source from osm_line where gid=$1), (select target from osm_line where gid=$2), false, false) // Snap the intermediate points to the shortest path. update points set geom_snap=st_geomfromtext('POINT(' || st_x(t.pt) || ' ' || st_y(t.pt) || ' 0.0)', 32610) from ( select id, gid, st_closestpoint(lgeom, pgeom) as pt from ( select p.id, l.gid, l.geom as lgeom, p.geom as pgeom from osm_line l, points p where p.id=$1 and l.gid in({}) order by st_distance(p.geom, l.geom) limit 1 ) as u ) as t where points.id=t.id

The results of this procedure are pretty convincing, and some tough cases are well-handled. The image below shows a scenario where two roads merge, and the points in the track snap to the wrong road.

These points are eliminated as confident points, and instead snapped to the shortest path, as below.

The algorithm also performs well around tight corners, in complex routes.

Of course, this algorithm is not correct in the mathematical sense, but it should do a good job in assisting the extraction of elevations from the DEM for the calculation of the slopes (to be covered in another post).

Looking good Rob. So now you have spatial that you can use for estimating wind and spatial locations of the riders! What spatial resolution are you planning to use to model outputs?

I think the only place where resolution comes into play is with the DEM, which is 2m. The power calculations will be performed on the nodes of the GPS track, using the slope of the incident segment (which comes from the DEM) and the wind conditions associated with the intersecting school polygon. That is, other than calculating the slope, all the work is done with vectors.

The output (power) won’t be represented as maps, only numbers … unless I make a map with each segment coloured to represent average power along the segment.

On the other hand, temporal resolution might a factor. The times on my GPS tracks are in seconds. I don’t know how much of a difference that makes, or whether other GPS units are more precise.