Using FITPACK Smoothing Spline Routines from C++

Many modern libraries provide smoothing and interpolation routines that are actually interfaces to old Fortran libraries written by de Boor and Dierkx. For most of us, the inner workings of these routines are best kept shrouded in mystery, and in any case there’s no sense in reinventing the wheel. They can be used easily in their Python and R implementations or linked from C and C++ code. I’m going to discuss an implementation of the latter.

In particular, I’m interested in something called the smoothing or approximating spline, which can be calculated in one or more dimensions and any degree between 1 and 5. In general, the smoothing spline is an attempt to balance two competing needs: a) to maximize the smoothness of a curve (which, at its limit, would be a straight line or plane); and b) to minimize the sum of squared differences between the approximation and the data. This is accomplished by minimizing the function,

    \[S(x) = p \sum_{i=1}^{n} \left( \frac{(Y_i - \hat{f}(x_i))^2}{\sigma_i^2} \right) + (1 - p) \int{ \left( \hat{f}^{(m)}(x) \right)^2 } dx,\]

where p, the weight, from [0,1], is applied to the first term (the least squares part), and its negation is applied to the second term (the integral part). The denominator of the least squares term is a weight for which the FITPACK documentation suggests the reciprocal of the standard deviation. In that case, the smoothing factor is recommended to be in the vicinity of n. One may also choose a variable weight.

I had acquired some LiDAR data that covered an enormous geographic region, with flight lines approaching 45km long, far distant from the nearest GPS base station. The accumulation of drift had caused up to 2m of vertical, along-track divergence in the flight lines. I wanted to use a 2-dimensional cubic smoothing spline in an attempt to resolve the mismatch between LiDAR flight lines, without resorting to a point-based method (or an expensive license for TerraMatch).

The general idea was to compute a smooth surface approximating the average ground height across all flight lines where they overlap, and then to compute a separate surface for each flight line, adjusting its elevation by the difference between the two surfaces. As a result of this process, the z-coordinates of the point cloud would be adjusted, leaving the x/y-coordinates untouched.

The adjusted flight lines would have no real correspondence to the “true” elevation, but the latter would be impossible to recover anyway, given the loss of accuracy during the acquisition. Since a smooth surface was the primary need, it was acceptable to produce one, then worry about validating it against field measurements later.

To do this, I wrote a program in C++, which links into to Fortran functions provided by Dierkx’ FITPACK library: surfit and surev. surfit accepts lists of x, y and z coordinates, along with a list of weights and a smoothing parameter, and produces list of knots in x and y, along with a list of b-spline coefficients. These are used by surfit to compute the approximate z-coordinate at any x/y location.

The first step is to declare the Fortran functions.

extern "C" {

    void surfit_(int* iopt, int* m, double* x, double* y, double* z, 
        double* w, double* xb, double* xe, double* yb, double* ye, 
        int* kx, int* ky, double* s, int* nxest, int* nyest, 
        int* nmax, double* eps, int* nx, double* tx, int* ny, double* ty, 
        double* c, double* fp, double* wrk1, int* lwrk1, 
        double* wrk2, int* lwrk2, int* iwrk, int* kwrk, int* ier);

    void surev_(int* idim, double* tu, int* nu, double* tv, int* nv,
        double* c, double* u, int* mu, double* v, int* mv, double* f, int* mf,
        double* wrk, int* lwrk, int* iwrk, int* kwrk, int* ier);
}

Note the underscore after each function name. The gcc compiler uses this convention to link to the library produced by gfortran. The arguments are well documented in the original Fortran code, but note that all passed as pointers, whether scalars or arrays. For parameters that accept an array of double, such as the x, y and z parameters, you can create and resize a vector, then pass in the pointer returned by its data() method. These functions are called like any other function.

Compiling the Fortran library is easy on Linux using gfortran and cmake. In your CMakeLists.txt file, add something like the following:

enable_language (Fortran)
file (GLOB FSRC src/fitpack/*.f)
add_library (fitpack SHARED ${FSRC}) 

Line 1 enables the Fortran compiler; line 2 locates all the source files in the source tree and line 3 compiles them into a shared library called libfitpack. To link the Fotran library to the executable, link the target as usual.

add_executable (cloudmatch src/apps/cloudmatch.cpp)
target_link_libraries (cloudmatch ${GDAL_LIBRARY} ${PDAL_LIBRARIES} fitpack)

Here, I have an executable callse cloudmatch with one source file which is linked to GDAL, PDAL and the libfitpack.