This post is part 2. Start here.
The Server Side
When a user uploads a ride to the Web application, several processing steps must be performed before the power can be displayed.
1. The GPX representation is converted into a standard in-memory representation that can be stored to and retrieved from a database.
2. The weather data for the time span of the ride, and for the school zones (Voronoi cells) that the ride passes through, must be downloaded or retrieved from the database.
3. The ride points must be snapped to the road network.
4. The elevation of each node in the ride must be read from the DEM and the slopes calculated.
5. The power is calculated.
Steps 1 and 2 (if the weather data has already been retrieved) can be performed synchronously, that is, while you wait, but the other steps are time-consuming, and the user should not be expected to sit and watch a spinner while the application takes an indeterminate amount of time to work. So the rides have to be put into a queue, where a background process can perform these actions when it has the time. The user should be able to access processing status messages, and should get an email when the processing is finished. Other Web services don’t need this processing step because they don’t factor in the weather, and if they did, they’d have it downloaded already. Plus, they have the computing resources to do things like snapping and elevation extraction nearly instantaneously.
At right is an activity diagram which gives an overview of the processing system. The Processor class implements Runnable and ServletContextListener. The listener is configured in the Servlet container’s web.xml to start on context initialization, and stop when the context is destroyed. This application has only one context, so the processor is effectively a singleton.
The REST API mediates between the Client and Server tiers in this diagram, but is not explicitly labelled. All REST calls have access to the ServletRequest, through which they can access the ServletContext, and hence the Processor, which provides a method, enqueueRide, through which the application can enqueue a ride to be processed. The processor’s run method idles until there’s a ride in the queue, at which point it removes the ride and performs the processing steps sequentially as in the diagram.
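The queueing behaviour described here can be sketched with a BlockingQueue from the standard library. This is only an illustration under assumptions, not the application's actual code: the names RideProcessor and RideHandler, the stop method, and the use of a numeric ride id are all hypothetical, and the ServletContextListener wiring (starting the thread on context initialization, stopping it on destruction) is left out so the sketch needs nothing beyond the JDK.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical callback standing in for the weather, snapping, elevation and power steps. */
interface RideHandler {
    void process(long rideId) throws Exception;
}

/** Sketch of a single background worker that drains a queue of ride ids. */
class RideProcessor implements Runnable {
    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    private final RideHandler handler;
    private volatile boolean running = true;

    RideProcessor(RideHandler handler) {
        this.handler = handler;
    }

    /** Called by the REST layer; returns immediately. */
    void enqueueRide(long rideId) {
        queue.add(rideId);
    }

    /** Would be called when the servlet context is destroyed. */
    void stop(Thread worker) {
        running = false;
        worker.interrupt(); // wake the worker if it is blocked in take()
    }

    @Override
    public void run() {
        while (running) {
            try {
                long rideId = queue.take(); // idles here until a ride is queued
                handler.process(rideId);    // steps run sequentially for one ride at a time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // treat interruption as a shutdown request
            } catch (Exception e) {
                // A failed ride must not kill the worker; a real system would record the failure.
                System.err.println("Processing failed: " + e.getMessage());
            }
        }
    }
}
```

Because take() blocks, the worker consumes no CPU while the queue is empty, which matches the "idles until there's a ride" behaviour without a polling loop.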
The processor also notifies the application when a ride has finished processing or if processing has failed. It does this by sending a message to the server, which triggers an email. The client can also retrieve a ride’s status at any time through the REST API, to learn whether it is queued, processing, completed or failed.
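As a rough sketch of the status tracking, assuming an in-memory map keyed by ride id: the names RideStatus and RideStatusTracker and the onFinished callback are hypothetical, and the real application may well keep the status in the database instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** The four states a client can observe. */
enum RideStatus { QUEUED, PROCESSING, COMPLETED, FAILED }

/** Sketch: thread-safe status registry that notifies a listener on terminal states. */
class RideStatusTracker {
    private final Map<Long, RideStatus> statuses = new ConcurrentHashMap<>();
    private final BiConsumer<Long, RideStatus> onFinished; // e.g. the hook that triggers the email

    RideStatusTracker(BiConsumer<Long, RideStatus> onFinished) {
        this.onFinished = onFinished;
    }

    /** Records the new status and fires the callback only for COMPLETED or FAILED. */
    void update(long rideId, RideStatus status) {
        statuses.put(rideId, status);
        if (status == RideStatus.COMPLETED || status == RideStatus.FAILED) {
            onFinished.accept(rideId, status);
        }
    }

    /** Returns null for rides that were never queued. */
    RideStatus get(long rideId) {
        return statuses.get(rideId);
    }
}
```

A REST status endpoint would simply call get, while the background worker calls update as it moves a ride through the stages.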
When a ride is first uploaded, it is saved to the database and is immediately available for display in its unprocessed form. This allows the user to see the ride displayed on the client. Behind the scenes, the ride is fed through the sausage factory…
The first thing the processor does is to ascertain whether the weather data for the school zones intersected by the path of the ride, and the time span covered by the ride, are available in the database. These checks are performed by first retrieving a list of schools via a spatial query:
SELECT DISTINCT ON(s.school_id) s.school_id
FROM schools s
INNER JOIN points p ON ST_Contains(s.poly, p.geom)
WHERE p.ride_id=?
ORDER BY s.school_id
Here, s.poly is the Voronoi cell of the school, and p.geom is a node in the ride track.
Then, another query checks whether the data exist for each school and date. If data cannot be found for the given school and date, they are downloaded from the school weather station network.
Once the wind data are retrieved, the wind speed and direction are set on every point using a spatial query:
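The availability check boils down to a set difference between what the ride needs and what is already stored. The following is a minimal sketch assuming weather is tracked per school id and per calendar day; WeatherCoverage and missing are hypothetical names, and the actual application performs this check in SQL.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class WeatherCoverage {
    /**
     * Returns, per school id, the dates that are required but not yet stored.
     * Schools with nothing missing are omitted from the result.
     */
    static Map<Integer, Set<LocalDate>> missing(
            Map<Integer, Set<LocalDate>> required,
            Map<Integer, Set<LocalDate>> available) {
        Map<Integer, Set<LocalDate>> result = new TreeMap<>();
        for (Map.Entry<Integer, Set<LocalDate>> e : required.entrySet()) {
            // Copy the required dates, then strike out the ones already in the database.
            Set<LocalDate> gaps = new TreeSet<>(e.getValue());
            Set<LocalDate> have = available.getOrDefault(e.getKey(), Collections.<LocalDate>emptySet());
            gaps.removeAll(have);
            if (!gaps.isEmpty()) {
                result.put(e.getKey(), gaps);
            }
        }
        return result;
    }
}
```

Only the entries in the returned map would need to be downloaded from the weather station network.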
UPDATE points p
SET wind_dir=t.wind_dir, wind_speed=t.wind_speed
FROM (
    SELECT s.poly, w.wind_speed, w.wind_dir, date_trunc('minute', w.obs_date) as obs_date
    FROM weather w
    INNER JOIN schools s ON w.school_id=s.id
) AS t
WHERE ST_Contains(t.poly, p.geom)
  AND date_trunc('minute', p.time)=t.obs_date
This query should actually include temperature, pressure and humidity as well, because the power algorithm requires parameters derived from these values. That algorithm will be explained later. The date_trunc function truncates point times to the minute (it discards the seconds rather than rounding to the nearest minute), as the weather data are recorded with minute resolution, while rides are recorded with second resolution.
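To make the matching on minutes concrete: date_trunc('minute', ...) drops the seconds, so a point recorded at 12:34:56 is matched to the 12:34 observation, not the 12:35 one. The Java analogue below is a small sketch; the class and method names are made up for illustration, and the rounding variant is shown only for contrast.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class MinuteKey {
    /** Drops seconds and fractions, mirroring date_trunc('minute', ...). */
    static Instant truncate(Instant t) {
        return t.truncatedTo(ChronoUnit.MINUTES);
    }

    /** Rounds to the nearest minute instead (30 seconds and above round up). */
    static Instant roundToNearest(Instant t) {
        return t.plusSeconds(30).truncatedTo(ChronoUnit.MINUTES);
    }
}
```

With truncation, a point can be matched to an observation up to 59 seconds older than it; with rounding, the mismatch is at most 30 seconds in either direction.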
It does occur that a school will stop reporting weather conditions. Perhaps it will be necessary to calculate local conditions by averaging neighbouring cells. I have not examined any solutions. It is also possible that a record is missing for the requested minute, but not neighbouring minutes. This query could be adjusted to locate the nearest record in time, but at what temporal distance does the nearest record become meaningless? Another question to ponder.
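One way to experiment with the "nearest record in time" idea is sketched below, using a sorted map of observations and a cut-off beyond which a record is treated as unusable. The names NearestObservation, nearest and maxGap are hypothetical, and the sketch deliberately leaves the cut-off as a parameter, since what value is meaningful is exactly the open question.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class NearestObservation {
    /**
     * Returns the value whose timestamp is closest to t, or empty if the map is
     * empty or the closest record is further away than maxGap.
     * Ties go to the earlier record. Values are assumed to be non-null.
     */
    static <V> Optional<V> nearest(TreeMap<Instant, V> obs, Instant t, Duration maxGap) {
        Map.Entry<Instant, V> before = obs.floorEntry(t);   // latest record at or before t
        Map.Entry<Instant, V> after = obs.ceilingEntry(t);  // earliest record at or after t
        Map.Entry<Instant, V> best;
        if (before == null) {
            best = after;
        } else if (after == null) {
            best = before;
        } else {
            Duration gapBefore = Duration.between(before.getKey(), t);
            Duration gapAfter = Duration.between(t, after.getKey());
            best = gapBefore.compareTo(gapAfter) <= 0 ? before : after;
        }
        if (best == null) {
            return Optional.empty();
        }
        Duration gap = Duration.between(best.getKey(), t).abs();
        return gap.compareTo(maxGap) <= 0 ? Optional.of(best.getValue()) : Optional.empty();
    }
}
```

The same lookup could be expressed in SQL, but keeping the cut-off explicit makes it easy to try different values against real rides.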
What remains is a ride whose nodes each carry the wind speed, wind direction and slope. All that is left is to calculate the power.
All of the above is implemented, except for the power calculation and email notification. Of course, “implemented” and “durable under real-world loads” are two entirely different things…