Tag Archives: OLS

Using a Genetic Algorithm to Minimize an OLS Regression in R

A genetic algorithm allows you to optimize parameters by using an algorithm that mimics biological evolution. It will run through several generations of values trying to find the values that minimizes [or maximizes depending on the algorithm] its fitness or evaluation function, which is just any function that returns a value from the parameters the algorithm is optimizing.

There is a lot of literature on how genetic algorithms work, and I would recommend reading those if you want the technical details on how they work. Genetic algorithms are typically demonstrated by the knapsack algorithm problem [Numb3rs Scene Youtube], where you look to optimize the survival points by seeking the right combination of survival items weighing under a specified amount to fit in a knapsack. This R-bloggers site has a good demonstration of that example and code. However, I find it more interesting to use a genetic algorithm on something more familiar to analytics and statistics, and that’s the ordinary least squares regression (OLS).

OLS minimizes the sum of squared error (SSE) to find the best fit line or regression line for the data set. This is derived using matrix calculus, and it’s computational efficient, easy to understand, and ubiquitous.

Since OLS essentially is an algorithm that uses calculus to minimize SSE, we can use a genetic algorithm to accomplish the same task. R’s GA (genetic algorithm) package allows you to use either binary or real numbers as parameters for the fitness function. Traditionally, genetic algorithms use binary parameters [see the knapsack algorithm], but for this problem, real numbers will be much more useful since the regression coefficients will be real numbers.

The GA algorithm will create a vector of real numbers between -100 and 100, then use that vector to evaluation a regression equation in the fitness function. The fitness function returns the SSE. Since the GA algorithm seeks to maximize the fitness function, the function has a negative sign in front of it, so the lowest absolute SSE will at the maximum if it’s negative. The GA has a population of 500 vectors which are evaluated with the fitness function, and the best solutions are generally kept and children vectors are created, the process is repeated 500 times. The results is a SSE that is very close the OLS solution, and parameter estimates that match up as well.

I’ve included two different linear models. The first has only two variables which play significant roles in the OLS regression, and a second model which has every variable with not all being significant. You can run it a few times and see how the GA solutions differ. The first model’s GA estimates will be a lot closer to the OLS’ estimates than the second model’s.

All of this is rather academic for well-behaved linear regression problems, since GA are computationally expensive taking forever relative to your standard OLS procedure.

The full annotated R code follows:


#loads an airquality dataframe

#removes missing data
airquality <- na.omit(airquality) 

#### create a function to evaluate a linear regression
#### takes intercept and the two best variables to compute the predicted y_hat
#### then computes and returns the SSE for each chromosome
#### we will try to minimize the SSE like OLS does

OLS <- function(data, b0, b1, b2){
  attach(data, warn.conflicts=F)
  Y_hat <- b0  + b1*Wind + b2*Temp
  SSE = t(Ozone-Y_hat) %*% (Ozone-Y_hat) #matrix formulation for SSE

#### this sets up a real-value GA using 3 parameters all from -100 to 100
#### the parameters use real numbers (so floating decimals) and passes those to
#### the linear regression equation/function
#### the real-value GA requires a min and max
#### this takes a while to run

ga.OLS <- ga(type='real-valued', min=c(-100,-100, -100), 
             max=c(100, 100, 100), popSize=500, maxiter=500, names=c('intercept', 'Wind', 'Temp'),
             keepBest=T, fitness = function(b) -OLS(airquality, b[1],b[2], b[3]))

#### summary of the ga with solution
ga.model <- summary(ga.OLS)

#### check against the results against the typical OLS procedure
lm.model <- lm(formula= Ozone ~ Wind + Temp, data=airquality)
lm.model$res %*% lm.model$res ### SSE.lm
-ga.model$fitness ### SSE.ga

lm.model$res %*% lm.model$res + ga.model$fitness  ### difference between OLS and GA's SSE

#### FULL MODEL ####

OLS.FULL <- function(data, b0, b1, b2, b3, b4, b5){
  attach(data, warn.conflicts=F)
  Y_hat <- b0 + b1*Solar.R + b2*Wind + b3*Temp + b4*Month + b5*Day  # linear regression equation
  SSE = t(Ozone-Y_hat) %*% (Ozone-Y_hat) #matrix formulation for SSE

#### this sets up a real-value GA using 6 parameters all from -100 to 100
#### the parameters use real numbers (so floating decimals) and passes those to
#### the linear regression equation/function
#### the real-value GA requires a min and max
#### this takes a while to run related to the survival pack
#### this will produce some values that vary a lot from OLS estimates since not all values are significant
#### some estimates should have high standard error
ga.OLS <- ga(type='real-valued', min=c(-100,-100, -100, -100, -100, -100), 
             max=c(100,100, 100, 100, 100, 100), popSize=500, maxiter=500, 
             keepBest=T, fitness = function(b) -OLS.FULL(airquality, b[1],b[2], b[3], b[4], b[5], b[6]))

#### summary of the ga with solution

#### check against the results against the typical OLS procedure
summary(lm(formula= Ozone ~ Wind + Temp, data=airquality))

OLS Derivation

Ordinary Least Squares (OLS) is a great low computing power way to obtain estimates for coefficients in a linear regression model. I wanted to detail the derivation of the solution since it can be confusing for anyone not familiar with matrix calculus.

First, the initial matrix equation is setup below. With X being a matrix of the data’s p covariates plus the regression constant. [This will be represented as a column of ones if you were to look at the data in the X matrix.] Y is the column matrix of the target variable and β is the column matrix of unknown coefficients. e is a column matrix of the residuals.

$latex \mathbf{Y = X} \boldsymbol{\beta} + \boldsymbol{e} &s=1$

Before manipulating the equation it is important to note you are not solving for X or Y, but instead β and will do this by minimizing the sum of squares for the residuals (SSE). So the equation can be rewritten by moving the error term to the left side of the equation.

$latex \boldsymbol{e} = \mathbf{Y – X} \boldsymbol{\beta}&s=1$

The SSE can be written as the product of the transposed residual column vector and its original column vector. [This is actually how you would obtain the sum of squares for any vector.]

$latex \mathrm{SSE} = \boldsymbol{e}’\boldsymbol{e} &s=1$

Since you transpose and multiply one side of the equation, you have to follow suit on the other side. Yielding

$latex \boldsymbol{e’e} = (\mathbf{Y – X} \boldsymbol{\beta})'(\mathbf{Y – X} \boldsymbol{\beta})&s=1$

The transpose operator can be distributed through out the quantity on the right side, so the right side can be multiplied out.

$latex \boldsymbol{e’e} = (\mathbf{Y’ – \boldsymbol{\beta}’X’})(\mathbf{Y – X} \boldsymbol{\beta})&s=1$

Using the rule that A’X = X’A, you can multiple out the right side and simplify it.

$latex \boldsymbol{e’e} = (\mathbf{Y’Y – Y’X\boldsymbol{\beta} – \boldsymbol{\beta}’X’Y} + \boldsymbol{\beta’\mathbf{X’X}\beta})&s=1$
$latex \boldsymbol{e’e} = (\mathbf{Y’Y – \boldsymbol{\beta}’X’Y – \boldsymbol{\beta}’X’Y} + \boldsymbol{\beta’\mathbf{X’X}\beta})&s=1$
$latex \boldsymbol{e’e} = (\mathbf{Y’Y – 2\boldsymbol{\beta}’X’Y} + \boldsymbol{\beta’\mathbf{X’X}\beta})&s=1$

To minimize the SSE, you have to take the partial derivative relative to β. Any terms without a β term in them will go to zero. Using the transpose rule from before you can see how the middle term yields -2X’Y using differentiation rules from Calc1. The last term is a bit tricky, but it derives to +2X’Xβ.

$latex \frac{\delta\boldsymbol{e’e}}{\delta\boldsymbol{\beta}} = \frac{\delta\mathbf{Y’Y}}{\delta\boldsymbol{\beta}} – \frac{2\boldsymbol{\beta}’\mathbf{X’Y}}{\delta\boldsymbol{\beta}} + \frac{\delta\boldsymbol{\beta’\mathbf{X’X}\beta}}{\delta\boldsymbol{\beta}}&s=1$

$latex \frac{\delta\boldsymbol{e’e}}{\delta\boldsymbol{\beta}} = – 2\mathbf{X’Y} + 2\boldsymbol{\mathbf{X’X}\beta}&s=1$

To find the minimum (it will never be a maximum if you have all the requirements for OLS fulfilled), the derivative of the SSE is set to zero.

$latex 0 = – 2\mathbf{X’Y} + 2\mathbf{X’X}\boldsymbol{\beta}&s=1$

$latex 0 = \mathbf{- X’Y} + \mathbf{X’X}\boldsymbol{\beta}&s=1$

Using some basic linear algebra and multiplying both sides by the inverse of (X’X)…

$latex (\mathbf{X’X})^{-1}\mathbf{X’X}\boldsymbol{\beta} = (\mathbf{X’X})^{-1}\mathbf{X’Y}&s=1$

…yields the solution for β

$latex \boldsymbol{\beta} = (\mathbf{X’X})^{-1}\mathbf{X’Y}&s=1$


The Mathematical Derivation of Least Squares. Psychology 8815. Retrieved from: http://isites.harvard.edu/fs/docs/icb.topic515975.files/OLSDerivation.pdf

Chatterjee, S & Hadi, A. (2012). Regression analysis by example. Hoboken, NJ: John Wiley & Sons, Inc.