"EngNotes" are my engineering notes. This is my way of creating a digital version of an engineering notebook. This particular entry is on
Linear Least Squares.
I often find myself using linear least squares curve fitting to estimate a data set. Below is a quick write-up of linear-least squares that includes a derivation AND some sample Matlab code. After the write-up I've also included more advanced versions of linear least squares curve fitting - weighted, constrained, and regularized linear least squares. Often times I have found these variants to be necessary for the problem at hand. At the very end there are some links to reference material that is invaluable.
Linear Least Squares Write-Up
The Problem
Formulating an equation to describe how one variable depends on another, given a data set $(x_{i}, y_{i})$, $i = 1, 2, 3, \ldots, n$, where $x_{i}$ is the independent variable and $y_{i}$ is the dependent variable, is known as curve fitting. One quantitative way to assess the quality of an equation's fit to the data set is to find the sum of the squares of the offsets (or residuals) between the data and the describing equation.
Illustration 1: Example Data Set and Fit Equation
Introduction
The quality of an equation's fit to a data set is important. Linear least squares is a common and straightforward method for finding a function that best fits a data set. Keep in mind that linear least squares assumes the data being modeled can be represented as a linear combination of basis functions. The basis functions themselves do not need to be linear, but the fit must be a linear combination of them (i.e., linear in the coefficients).
Consider trying to fit a polynomial to a data set. The method of linear least squares will attempt to find the coefficients (a, b, c, ...) of the polynomial $p = a + bx + cx^{2} + \ldots$ that minimize the sum of the squares of the offsets:
$S=\sum\limits_{i=1}^{n}(a+bx_{i}+cx^{2}_{i}+...-y_{i})^{2}$
The functions $1, x, x^{2}, \ldots$ compose the basis functions of the fit. Notice that the basis functions are not confined to be linear functions. The coefficients (a, b, c, ...) form a weighting for the linear combination of the basis functions.
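As a quick illustration of this objective, here is a minimal sketch that evaluates S for a candidate second-order polynomial; the data and coefficients below are made up purely for the example.
% Evaluate the sum of squared offsets S for a candidate polynomial fit
% (made-up data and coefficients, purely for illustration)
x = [0:10].';
y = 2 + 3*x + 0.5*x.^2 + randn(size(x));
% Candidate coefficients (a, b, c) of p = a + b*x + c*x^2
a = 2; b = 3; c = 0.5;
% Polynomial evaluated at each data point
p = a + b*x + c*x.^2;
% Sum of the squares of the offsets between the fit and the data
S = sum((p - y).^2);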
Derivation
When S, the sum of the squares of the offsets (errors), is viewed as a function of the polynomial's coefficients (a, b, c, ...), S is minimized when its gradient is zero. Setting the gradient to zero yields one gradient equation per coefficient:
$\frac{\partial S}{\partial a}=0, \frac{\partial S}{\partial b}=0, \frac{\partial S}{\partial c}=0, ...$
By combining all of the gradient equations, the coefficients (a, b, c, ...) that minimize S can be solved for. Below is an example of the steps involved in fitting a first-order polynomial (a line) to a data set.
Example
Data points $(x_{i}, y_{i})$ generated by:
$y_{i} = x_{i} + noise$
First-order polynomial to be fit to the data set:
$a + bx_{i}$
Equation for the sum of the squares of the error:
$S=\sum\limits_{i=1}^{n}(a+bx_{i}-y_{i})^{2}$
Setting the two partial derivatives to zero and simplifying gives the two gradient equations:
$\frac{\partial S}{\partial a}=-2\sum\limits_{i=1}^{n}(y_{i}-(a+bx_{i}))=0$
$na+b\sum\limits_{i=1}^{n}x_{i} = \sum\limits_{i=1}^{n}y_{i}$
$\frac{\partial S}{\partial b}=-2\sum\limits_{i=1}^{n}(y_{i}-(a+bx_{i}))x_{i}=0$
$a\sum\limits_{i=1}^{n}x_{i}+b\sum\limits_{i=1}^{n}x_{i}^{2} = \sum\limits_{i=1}^{n}x_{i}y_{i}$
Converting the two equations into matrix form gives:
$ \begin{bmatrix} n & \sum\limits_{i=1}^{n}x_{i} \\
\sum\limits_{i=1}^{n}x_{i} & \sum\limits_{i=1}^{n}x_{i}^{2} \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum\limits_{i=1}^{n}y_{i} \\ \sum\limits_{i=1}^{n}x_{i}y_{i} \end{bmatrix} $
Rearranging and solving for (a,b) yields:
$ \begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} n & \sum\limits_{i=1}^{n}x_{i} \\ \sum\limits_{i=1}^{n}x_{i} & \sum\limits_{i=1}^{n}x_{i}^{2} \end{bmatrix}^{-1}
\begin{bmatrix} \sum\limits_{i=1}^{n}y_{i} \\ \sum\limits_{i=1}^{n}x_{i}y_{i} \end{bmatrix} $
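As a sanity check on the worked example, here is a small sketch (with made-up noisy data) that builds the 2x2 system above from the sums and solves it for (a, b); the variable names are my own.
% Fit a line a + b*x to noisy data using the 2x2 system derived above
x = [0:25].';
y = x + rand(size(x));                % y_i = x_i + noise
n = length(x);
% Assemble the 2x2 normal equations from the sums in the derivation
A   = [ n       sum(x)    ;
        sum(x)  sum(x.^2) ];
rhs = [ sum(y) ; sum(x.*y) ];
% Solve for the coefficients (a, b)
ab = A \ rhs;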
The general form of the linear least squares solution is almost always given as the following equation; Reference [1] contains a detailed derivation.
$ a = (X^{T}X)^{-1}X^{T}y $
Examples
Linear Least Squares
$ a = (X^{T}X)^{-1}X^{T}y $
X contains the basis functions evaluated at each measurement: one column per basis function and one row per measurement. The vector 'a' is the coefficient vector being sought, and the vector 'y' is the measurement vector being fit.
% Linear least squares example
% ----------------------------
% Create the independent variable
x = [0:25].';
% Create the true dependent variable (a line of slope 1 with offset 5)
yTrue = x + 5;
% Create observations of the dependent variable
% The rand call is adding noise to the observations
yMeas = x + rand(size(x)) + 5;
% Create the basis functions (model) that will be used to estimate the true
% system. In this case, the basis functions are a constant function and a
% linear function (matches true system).
basisFunc = [ones(size(x)), x];
% Calculate the linear least squares fit
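% (Equivalently, coefs = basisFunc \ yMeas computes the same fit and is
%  numerically more robust than forming the inverse explicitly)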
coefs = (basisFunc' * basisFunc)^-1 * basisFunc' * yMeas;
llsFit = basisFunc*coefs;
% Make a plot illustrating the fit
figure; plot(x,yMeas,'o','LineWidth',2);
hold all; plot(x,llsFit,'LineWidth',2);
grid on; legend('Measured Data','Predicted Fit');
text(1,25, ['Fit Equation: ' num2str(coefs(2)) 'x + ' num2str(coefs(1))])
text(1,26.25, ['True Equation: y = x + 5']);
Weighted Linear Least Squares
$ a = (X^{T}WX)^{-1}X^{T}Wy $
$ W = diag(w) $
Weighted linear least squares allows one to assign less importance to the fit at particular measurement points. A common reason to do this is when some of the measurements being fit have different noise statistics than the others. An exaggerated case is given below for illustration.
Applying weights is a straightforward extension: insert a diagonal matrix built from the weight (importance) vector into the normal equations, as in the equation above.
% Weighted linear least squares
% -----------------------------
% Create the independent variable
x = [0:99].';
% Define a function to be measured/estimated (5*x)
% Add some noise to it (randn) to simulate a measurement process
y = 5*x + randn(size(x))*5;
basisFunc = [ ones(size(x)) x ];
% Calculate a plain-jane linear least squares fit
coefs = (basisFunc'*basisFunc)^-1 * basisFunc' * y;
% Plot the results
figure;
subplot(3,1,1);
plot(x,y,'o');
hold all; plot(x,basisFunc*coefs,'LineWidth',2);
title('Linear Least Squares Fit With No Gross Measurement Error');
ylim([0 525]);
% Pick a couple of measurements and give them gross errors. I strategically
% picked two points to make it easier to see the impact.
y(100) = 1;
y(1) = 500;
% Calculate the plain-jane linear least squares with the gross error
coefs = (basisFunc'*basisFunc)^-1 * basisFunc' * y;
% Plot the results to illustrate how the gross error "pulls off" the fit
subplot(3,1,2);
plot(x,y,'o');
hold all; plot(x,basisFunc*coefs,'LineWidth',2);
title('Linear Least Squares Fit With A Gross Measurement Error');
ylim([0 525]);
% Create a weight vector - in this case the weight vector is relatively
% arbitrary to prove a point. In general, determining the weight vector is
% the secret sauce of weighted linear least squares.
weight = ones(size(y));
weight(100) = 0;
weight(1) = 0;
weight = diag(weight);
% Calculate the weighted linear least squares solution
coefs = (basisFunc'*weight*basisFunc)^-1 * basisFunc' * weight * y;
% Plot the weighted result to illustrate how the gross error no longer
% "pulls off" the fit
subplot(3,1,3);
plot(x,y,'o');
hold all; plot(x,basisFunc*coefs,'LineWidth',2);
title('Weighted Linear Least Squares Fit With A Gross Measurement Error');
ylim([0 525]);
Regularized Linear Least Squares
$ a = (X^{T}X + \mu I)^{-1}X^{T}y $
where $\mu$ is known as the regularization weight and $I$ is the identity matrix
If your basis functions don't model the process being fit, some are nearly dependent, or the number of basis functions is large, linear least squares can over-fit. An over-fit solution can look good on paper, but it can cause the fit to do "weird" things when evaluating the fit between measured points (interpolation). Said another way, it can increase the fit's sensitivity to input noise. A common symptom of over-fitting is that the coefficients of the fit begin to grow rapidly.
One way around over-fitting is to use regularized linear least squares. This technique adds a second objective to the least squares solution - minimizing the sum of the squares of the coefficients themselves. The trick to this technique is in picking the regularization weight. Too large a weight and the fit favors shrinking the coefficients too much, resulting in a poor fit. Too small a weight and you wind up over-fitting. Over-fitting and the balancing act of regularization weight versus fit quality is illustrated by an L-curve (see References [5] and [6]).
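To mirror the other sections, here is a minimal sketch of regularized linear least squares. The nearly dependent basis functions and the regularization weight mu below are made up for illustration and are not tuned (e.g., via an L-curve).
% Regularized linear least squares example
% ----------------------------------------
% Create the independent variable
x = [0:25].';
% Create noisy measurements of a simple underlying line
yMeas = x + 5 + rand(size(x));
% Deliberately use two nearly dependent basis functions (x and 1.001*x)
% along with a constant, so the plain fit is poorly conditioned
basisFunc = [ones(size(x)), x, 1.001*x];
% Plain linear least squares - the coefficients on the two nearly
% dependent columns can grow large and nearly cancel each other
coefsPlain = (basisFunc' * basisFunc)^-1 * basisFunc' * yMeas;
% Regularized linear least squares - mu penalizes large coefficients
mu = 0.1;
coefsReg = (basisFunc' * basisFunc + mu * eye(size(basisFunc,2)))^-1 * ...
    basisFunc' * yMeas;
% Compare the size of the coefficient vectors
disp(['Plain fit coefficient norm:       ' num2str(norm(coefsPlain))]);
disp(['Regularized fit coefficient norm: ' num2str(norm(coefsReg))]);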
Constrained Linear Least Squares
Often there are parts of the fit process I would like to control. At one point in time I wanted the coefficients I was finding to sum to a specific value. This can easily be accomplished by adding a faux value to all of the basis functions (adding an extra row to the X matrix) and a corresponding faux measurement to the measurement vector (adding an extra element to the y vector).
Keep in mind, this isn't a hard constraint but is part of the fitting process. As a result, it isn't necessarily met exactly. You can increase the importance of the constraint relative to the fit by increasing the magnitude of the faux values added.
% Constrained linear least squares
% -----------------------------
% Make up a measurement vector
meas = [-6 ; 0.7 ; 0.0];
% Define a regularization weight (not necessarily needed)
lambda = 0.00005;
basisFunc = [0.5 -3.0 0.0 ; 0.3 -1 0.0 ; 0.8 0.3 0.0 ; 0.71 2.3 0.0].';
for weightNdx = 0:99
% Add the constraint; making the faux value larger increases the
% importance of the constraint relative to the fit
meas(4) = weightNdx/10;
basisFunc(4,:) = weightNdx/10;
% Calculate the regularized linear least squares solution
coefs = (basisFunc' * basisFunc + lambda .* eye(size(basisFunc,2)))^-1 * ...
basisFunc' * meas;
% Record the sum of the coefficients for this constraint weight
coefSize(weightNdx+1) = sum(coefs);
end
% Plot the sum of the coefficients to show it approaching one
figure; plot(linspace(0,9.9,100),coefSize,'LineWidth',2);
title('Sum of the Coefficients of the Fit');
ylabel('Sum of Coefficients');
xlabel('Weight Assigned to Sum');
Good References
[1] Least-Squares (Stanford EE103 lecture slides)
http://stanford.edu/class/ee103/lectures/least-squares/least-squares_slides.pdf
[3] Regularized Least-Squares and Gauss-Newton Method [Local Mirror]
http://see.stanford.edu/materials/lsoeldsee263/07-ls-reg.pdf
[4] Constrained Linear Least Squares [Local Mirror]
http://people.duke.edu/~hpgavin/cee201/constrained-least-squares.pdf
[5] The L-curve and its use in the numerical treatment of inverse problems [Local Mirror]
https://www.sintef.no/globalassets/project/evitameeting/2005/lcurve.pdf
[6] Choosing the Regularization Parameter [Local Mirror]
http://www2.compute.dtu.dk/~pcha/DIP/chap5.pdf