What a view. . .

Sunday, March 29, 2015

EngNote - Linear Least Squares

"EngNotes" are my engineering notes.  This is my way of creating a digital version of an engineering notebook.  This particular entry is on Linear Least Squares.



I often find myself using linear least squares curve fitting to approximate a data set.  Below is a quick write-up of linear least squares that includes a derivation AND some sample Matlab code.  After the write-up I've also included more advanced variants of linear least squares curve fitting - weighted, regularized, and constrained linear least squares.  I have often found these variants to be necessary for the problem at hand.  At the very end there are links to some invaluable reference material.


Linear Least Squares Write-Up

The Problem
Curve fitting is the process of formulating an equation that describes how one variable depends on another, given a data set (xi,yi), i = 1,2,3,...,n, where xi is the independent variable and yi is the dependent variable. One quantitative way to assess the quality of an equation's fit to the data set is the sum of the squares of the offsets (or residuals) between the data and the describing equation.

Illustration 1: Example Data Set and Fit Equation

Introduction
The quality of an equation's fit to a data set is important. Linear least squares is a common and straightforward method for finding a function that best fits a data set. Keep in mind that linear least squares assumes the data set being modeled can be represented as a linear combination of basis functions. The basis functions themselves do not need to be linear; the model only needs to be linear in the coefficients that combine them.
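
To make that point concrete, here is a minimal Matlab sketch (my own example, not part of the original write-up) where the basis functions are nonlinear in x but the model is still linear in the coefficients, so linear least squares applies directly:

% Linear in the coefficients, nonlinear in x
% ------------------------------------------

% Independent variable
x = (0:0.1:10).';

% Assumed "true" system: constant + sinusoid + decaying exponential,
% plus measurement noise
yMeas = 2 + 3*sin(x) + 4*exp(-x) + 0.1*randn(size(x));

% The basis functions (1, sin(x), exp(-x)) are nonlinear in x, but the
% fit p = a + b*sin(x) + c*exp(-x) is a linear combination of them
basisFunc = [ones(size(x)), sin(x), exp(-x)];
coefs = (basisFunc'*basisFunc)^-1 * basisFunc' * yMeas;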

Consider trying to fit a polynomial to a data set. The method of linear least squares will attempt to find the coefficients ( a,b,c,... ) of the polynomial $p = a + bx + cx^{2} + ...$ that minimize the sum of the squares of the offsets:

$S=\sum\limits_{i=1}^{n}(a+bx_{i}+cx^{2}_{i}+...-y_{i})^{2}$

$1, x, x^{2}, ...$ compose the basis functions of the fit. Notice that the basis functions are not confined to be linear functions. The coefficients ( a,b,c,... ) form a weighting for the linear combination of the basis functions.

Derivation
When S, the sum of the squares of the offsets (errors), is viewed as a function of the polynomial's coefficients ( a,b,c,... ), S is minimized when its gradient is 0. Setting the gradient to zero results in one gradient equation per coefficient:

$\frac{\partial S}{\partial a}=0, \frac{\partial S}{\partial b}=0, \frac{\partial S}{\partial c}=0, ...$

By combining all of the gradient equations the coefficients ( a,b,c,... ) that minimize S can be solved for. Below is an example of the steps involved in fitting a first-order polynomial (a line) to a data set.

Example
Data points (xi,yi) generated by:

$y_{i} = x_{i} + \text{noise}$

First order polynomial to be fit to the data set:

$a + bx_{i}$

Equation for the sum of the squares of the error:

$S=\sum\limits_{i=1}^{n}(a+bx_{i}-y_{i})^{2}$

Solving for the two gradient equations:

$\frac{\partial S}{\partial a}=-2\sum\limits_{i=1}^{n}(y_{i}-(a+bx_{i}))=0$
$na+b\sum\limits_{i=1}^{n}x_{i} = \sum\limits_{i=1}^{n}y_{i}$
$\frac{\partial S}{\partial b}=-2\sum\limits_{i=1}^{n}(y_{i}-(a+bx_{i}))x_{i}=0$
$a\sum\limits_{i=1}^{n}x_{i}+b\sum\limits_{i=1}^{n}x_{i}^{2} = \sum\limits_{i=1}^{n}x_{i}y_{i}$

Converting the two equations into matrix form gives:

$ \begin{bmatrix} n & \sum\limits_{i=1}^{n}x_{i} \\ \sum\limits_{i=1}^{n}x_{i} & \sum\limits_{i=1}^{n}x_{i}^{2} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum\limits_{i=1}^{n}y_{i} \\ \sum\limits_{i=1}^{n}x_{i}y_{i} \end{bmatrix} $

Rearranging and solving for (a,b) yields:

$ \begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} n & \sum\limits_{i=1}^{n}x_{i} \\ \sum\limits_{i=1}^{n}x_{i} & \sum\limits_{i=1}^{n}x_{i}^{2} \end{bmatrix}^{-1}
\begin{bmatrix} \sum\limits_{i=1}^{n}y_{i} \\ \sum\limits_{i=1}^{n}x_{i}y_{i} \end{bmatrix} $
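
As a sanity check on the derivation, here is a minimal Matlab sketch (my own, with assumed data) that solves these two normal equations directly from the sums:

% Solve the line-fit normal equations from the sums
% -------------------------------------------------

n = 20;
x = (1:n).';
y = x + randn(n,1);              % y = x plus noise, as in the example

% Normal-equation matrix and right-hand side built from the sums above
A   = [ n       sum(x)    ; ...
        sum(x)  sum(x.^2) ];
rhs = [ sum(y) ; sum(x.*y) ];

% ab(1) is the intercept a, ab(2) is the slope b
ab = A^-1 * rhs;                 % A\rhs is preferred numerically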

The general solution of linear least squares is almost always given in the following matrix form.  Reference [1] contains a detailed derivation.

$ a = (X^{T}X)^{-1}X^{T}y $


Examples

Linear Least Squares

$ a = (X^{T}X)^{-1}X^{T}y $

X contains the basis functions in its columns, evaluated at each measurement point along its rows.  The vector 'a' is the coefficient vector being sought, and the vector 'y' is the measurement vector being fit.

% Linear least squares example
% ----------------------------

% Create the independent variable
x = [0:25].';

% Create the dependent variable (a line with slope 1 and intercept 5)
yTrue = x+5;

% Create noisy observations of the dependent variable
% The rand call adds uniformly distributed noise on [0,1]
yMeas = x + rand(size(x)) + 5;

% Create the basis functions (model) that will be used to estimate the true 
% system. In this case, the basis functions are a constant function and a
% linear function (matches true system).
basisFunc = [ones(size(x)), x];

% Calculate the linear least squares fit
coefs = (basisFunc' * basisFunc)^-1 * basisFunc' * yMeas;
llsFit = basisFunc*coefs;

% Make a plot illustrating the fit
figure; plot(x,yMeas,'o','LineWidth',2);
hold all; plot(x,llsFit,'LineWidth',2);
grid on; legend('Measured Data','Predicted Fit');
text(1,25, ['Fit Equation:  y = ' num2str(coefs(2)) 'x + ' num2str(coefs(1))]);
text(1,26.25, 'True Equation: y = x + 5');


Weighted Linear Least Squares

$ a = (X^{T}WX)^{-1}X^{T}Wy $
$ W = diag(w) $

Weighted linear least squares allows one to assign less importance to particular measurement points in the fit.  A common reason to do this is that the noise statistics of some of the measurements differ from those of the others.  An exaggerated case is given below for illustration.

Applying weights is a straightforward extension: form a diagonal matrix W from the weight (importance) vector and insert it into the normal equations as shown in the formula above.

% Weighted linear least squares
% -----------------------------

% Create the independent variable
x = [0:99].';
% Define a function to be measured/estimated (5*x)
% Add some noise to it (randn) to simulate a measurement process
y = 5*x + randn(size(x))*5;
basisFunc = [ ones(size(x)) x ];

% Calculate a plain-jane linear-least squares
coefs = (basisFunc'*basisFunc)^-1 * basisFunc' * y;

% Plot the results
figure;
subplot(3,1,1);
plot(x,y,'o');
hold all; plot(x,basisFunc*coefs,'LineWidth',2);
title('Linear Least Squares Fit With No Gross Measurement Error');
ylim([0 525]);

% Pick a couple of measurements and give them a gross error.  I strategically
% picked two points to make it easier to see the impact.
y(100) = 1;
y(1) = 500;
% Calculate the plain-jane linear least squares with the gross error
coefs = (basisFunc'*basisFunc)^-1 * basisFunc' * y;

% Plot the results to illustrate how the gross error "pulls off" the fit
subplot(3,1,2);
plot(x,y,'o');
hold all; plot(x,basisFunc*coefs,'LineWidth',2);
title('Linear Least Squares Fit With A Gross Measurement Error');
ylim([0 525]);

% Create a weight vector - in this case the weight vector is relatively
% arbitrary to prove a point.  In general, determining the weight vector is
% the secret sauce of the weighted linear least squares.
weight = ones(size(y));
weight(100) = 0;
weight(1) = 0;
weight = diag(weight);

% Calculate the weighted linear least squares solution
coefs = (basisFunc'*weight*basisFunc)^-1 * basisFunc' * weight * y;

% Plot the weighted result to illustrate how the gross error no longer
% "pulls off" the fit
subplot(3,1,3);
plot(x,y,'o');
hold all; plot(x,basisFunc*coefs,'LineWidth',2);
title('Weighted Linear Least Squares Fit With A Gross Measurement Error');
ylim([0 525]);


Regularized Linear Least Squares

$ a = (X^{T}X + \mu I)^{-1}X^{T}y $
where $\mu$ is known as the regularization weight and $I$ is the identity matrix

If your basis functions don't model the process being fit well, if some of them are nearly linearly dependent, or if the number of basis functions is large, linear least squares can over-fit.  An over-fit solution can look good on paper, but it can cause the fit to do "weird" things when evaluated between measured points (interpolation).  Said another way, it increases the fit's sensitivity to input noise.  A common symptom of over-fitting is that the coefficients of the fit begin to grow rapidly.

One way around over-fitting is to use regularized linear least squares.  This technique adds a second objective to the least squares solution: minimizing the squared sum of the coefficients themselves.  The trick to this technique is picking the regularization weight.  Too large a weight and the fit favors shrinking the coefficients too much, resulting in a poor fit.  Too small a weight and you wind up over-fitting.  Over-fitting and the balancing act of regularization weight versus fit quality are illustrated by an L-curve (see references [5] and [6]).
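
The sections above don't include code for this variant, so below is a minimal Matlab sketch of regularized linear least squares.  The data, the oversized polynomial basis, and the sweep of regularization weights are my own assumptions for illustration; plotting the coefficient norm against the residual norm over the sweep traces out the L-curve mentioned above.

% Regularized linear least squares example
% ----------------------------------------

% Noisy measurements of a simple process (y = 2 + 0.5*x)
x = (0:0.25:10).';
y = 2 + 0.5*x + randn(size(x));

% Deliberately over-parameterized polynomial basis (1, x, x^2, ..., x^7)
basisFunc = [ones(size(x)), x, x.^2, x.^3, x.^4, x.^5, x.^6, x.^7];

% Sweep the regularization weight and record the coefficient and residual
% norms for each solution
muList = logspace(-6, 2, 50);
for muNdx = 1:numel(muList)
  mu = muList(muNdx);
  coefs = (basisFunc'*basisFunc + mu*eye(size(basisFunc,2)))^-1 * basisFunc' * y;
  coefNorm(muNdx)  = norm(coefs);
  residNorm(muNdx) = norm(basisFunc*coefs - y);
end

% Plot residual norm versus coefficient norm - the "corner" of this
% L-shaped curve is a common choice for the regularization weight
figure; loglog(residNorm, coefNorm, 'o-', 'LineWidth', 2);
xlabel('Residual Norm'); ylabel('Coefficient Norm');
title('L-Curve From the Regularization Weight Sweep');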

Constrained Linear Least Squares

Often there are parts of the fit process I would like to control.  At one point I wanted the coefficients I was finding to sum to a specific value.  This can be easily accomplished by adding a faux value to all of the basis functions (adding an extra row to the X matrix) and a corresponding faux measurement to the measurement vector (adding an extra element to the y vector).

Keep in mind, this isn't a hard constraint but is part of the fitting process.  As a result, it isn't necessarily met exactly.  You can increase the importance of the constraint relative to the fit by increasing the faux values added.


% Constrained linear least squares
% --------------------------------

% Make up a measurement vector
meas = [-6 ; 0.7 ; 0.0];
% Define a regularization weight (not necessarily needed)
lambda = 0.00005;

basisFunc = [0.5 -3.0 0.0 ; 0.3 -1 0.0 ; 0.8 0.3 0.0 ; 0.71 2.3 0.0].';

for weightNdx = 0:99
  % Add the constraint row; making the faux value larger increases the
  % importance of the constraint relative to the fit
  meas(4) = weightNdx/10;
  basisFunc(4,:) = weightNdx/10;
  
  % Calculate the regularized linear least squares solution
  coefs = (basisFunc' * basisFunc + lambda .* eye(size(basisFunc,2)))^-1 * ...
    basisFunc' * meas;
  
  % Record the sum of the coefficients for this constraint weight
  coefSum(weightNdx+1) = sum(coefs);
  
end

% Plot the sum of the coefficients to show it approaching one as the
% constraint weight increases
figure; plot(linspace(0,9.9,100),coefSum,'LineWidth',2);
title('Sum of the Coefficients of the Fit');
ylabel('Sum of Coefficients');
xlabel('Weight Assigned to Sum');



Good References

[1] The Mathematical Derivation of Least Squares [Local Mirror]
http://isites.harvard.edu/fs/docs/icb.topic515975.files/OLSDerivation.pdf

[2] Least-Squares [Local Mirror]
http://stanford.edu/class/ee103/lectures/least-squares/least-squares_slides.pdf

[3] Regularized Least-Squares and Gauss-Newton Method [Local Mirror]
http://see.stanford.edu/materials/lsoeldsee263/07-ls-reg.pdf

[4] Constrained Linear Least Squares [Local Mirror]
http://people.duke.edu/~hpgavin/cee201/constrained-least-squares.pdf

[5] The L-curve and its use in the numerical treatment of inverse problems [Local Mirror]
https://www.sintef.no/globalassets/project/evitameeting/2005/lcurve.pdf

[6] Choosing the Regularization Parameter [Local Mirror]
http://www2.compute.dtu.dk/~pcha/DIP/chap5.pdf


