Regression Coefficients
With simple linear regression, there is one dependent variable and one independent variable. The regression equation is:
ŷ = b_{0} + b_{1}x
In the previous lesson, we developed a least-squares solution for the regression coefficients of simple linear regression:
b_{1} = Σ [ (x_{i} − x̄)(y_{i} − ȳ) ] / Σ [ (x_{i} − x̄)^{2} ]
b_{0} = ȳ − b_{1} * x̄
where ŷ is the predicted value of the dependent variable, b_{0} and b_{1} are regression coefficients,
x_{i} is the value of the independent variable for observation i,
y_{i} is the value of the dependent variable for observation i,
x̄ is the mean x score, and ȳ is the mean y score.
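The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name and variable names are made up for the example.

```python
def simple_regression(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    # b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Example: y = 2 + 3x exactly, so the fit recovers b0 = 2 and b1 = 3.
b0, b1 = simple_regression([1, 2, 3, 4], [5, 8, 11, 14])
```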
In this lesson, we describe a least-squares solution for the regression coefficients of multiple regression.
The Multiple Regression Challenge
With simple linear regression, there are only two regression coefficients, b_{0} and b_{1}, and only two
normal equations. Finding a least-squares solution involves
solving two equations with two unknowns, a task that is easily managed with ordinary algebra.
With multiple regression, things get more complicated. There are k independent variables and k + 1 regression
coefficients. There are k + 1 normal equations. Finding a least-squares solution involves solving k + 1
equations with k + 1 unknowns. This can be done with ordinary algebra, but it is unwieldy.
To handle the complications of multiple regression, we will use matrix algebra.
Matrix Algebra
To follow the discussion on this page, you need to understand a little matrix algebra. Specifically, you should be
familiar with matrix addition, matrix subtraction, and matrix multiplication. And you should know about matrix
transposes and matrix inverses.
If you are unfamiliar with these topics, check out the free matrix algebra tutorial
on this site.
The Regression Equation in Matrix Form
With multiple regression, there is one dependent variable and k independent variables. The regression equation is:
ŷ = b_{0} + b_{1}x_{1} + b_{2}x_{2} + … + b_{k−1}x_{k−1} + b_{k}x_{k}
where ŷ is the predicted value of the dependent variable; b_{0}, b_{1}, . . ., b_{k} are regression coefficients;
and x_{j} is the value of independent variable j, for j = 1 to k.
To express the regression equation in matrix form, we need to define three matrices: Y, b, and X.
X =
| 1   X_{1,1}   X_{1,2}   . . .   X_{1,k} |
| 1   X_{2,1}   X_{2,2}   . . .   X_{2,k} |
| .      .         .      . . .      .    |
| 1   X_{n,1}   X_{n,2}   . . .   X_{n,k} |
Here, the dataset consists of n records. Each record includes scores for 1 dependent variable and k independent variables.
Y is an n x 1
vector that holds predicted values of the dependent variable; and b is a
(k + 1) x 1 vector that holds the estimated regression coefficients. Matrix X has a column of 1's plus one column of
values for each independent variable in the regression equation.
Given these matrices, the multiple regression equation can be expressed concisely as:
Y = Xb
It is sort of cool that this simple expression describes the regression equation for 1, 2, 3, or any number of independent variables.
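As a quick sketch (using NumPy, which this tutorial does not require but which keeps the matrix work concise), the design matrix X is just the predictor columns with a column of 1's prepended, and Y = Xb is a single matrix product. The numbers and coefficients below are made up for illustration.

```python
import numpy as np

# Two predictors for three observations (illustrative numbers).
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.0, 1.0, 4.0])

# X: a column of 1's (for b0) plus one column per independent variable.
X = np.column_stack([np.ones_like(x1), x1, x2])

b = np.array([2.0, 3.0, 0.5])   # b0, b1, b2 (made-up coefficients)
y_hat = X @ b                   # predicted values, one per row of X
```

The same two lines work unchanged for any number of independent variables; only the number of columns in X changes.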
Normal Equations in Matrix Form
Just as the regression equation can be expressed compactly in matrix form, so can the normal equations. The least squares normal equations
can be expressed as:
X'Y = X'Xb or X'Xb = X'Y
Here, matrix X' is the transpose
of matrix X. To solve for the regression coefficients, simply premultiply each side by the inverse of X'X:
(X'X)^{-1}X'Xb = (X'X)^{-1}X'Y
b = (X'X)^{-1}X'Y
where (X'X)^{-1}X'X = I, the identity matrix.
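A minimal sketch of this matrix solution, assuming NumPy (in production code, np.linalg.lstsq is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

def regression_coefficients(X, Y):
    """Solve the normal equations X'Xb = X'Y for b."""
    XtX = X.T @ X
    XtY = X.T @ Y
    # b = (X'X)^{-1} X'Y
    return np.linalg.inv(XtX) @ XtY

# Data generated from y = 1 + 2*x1 - x2 with no noise,
# so the solution recovers those coefficients exactly.
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 2, 3]], dtype=float)
Y = X @ np.array([1.0, 2.0, -1.0])
b = regression_coefficients(X, Y)
```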
In the real world, you will probably never compute regression coefficients by hand. Generally, you will
use software, like SAS, SPSS, Minitab, or Excel. In the problem below, however, we will compute regression coefficients manually,
so you can see what is going on.
Test Your Understanding
Problem 1
Consider the table below. It shows three performance measures for five students.

Student | Test score | IQ  | Study hours
--------|------------|-----|------------
   1    |    100     | 110 |     40
   2    |     90     | 120 |     30
   3    |     80     | 100 |     20
   4    |     70     |  90 |      0
   5    |     60     |  80 |     10
Using least squares regression, develop a regression equation to predict test score, based on (1) IQ and (2) the number of hours
that the student studied.
Solution
For this problem, we have some raw data; and we want to use this raw data to define a least-squares regression equation:
ŷ = b_{0} + b_{1}x_{1} + b_{2}x_{2}
where ŷ is the predicted test score; b_{0}, b_{1}, and b_{2} are regression coefficients;
x_{1} is an IQ score; and x_{2} is the number of hours that the student studied.
On the right side of the equation, the only unknowns are the regression coefficients. To define the regression coefficients, we use the following equation:
b = (X'X)^{-1}X'Y
To solve this equation, we need to complete the following steps:
- Define X.
- Define X'.
- Compute X'X.
- Find the inverse of X'X.
- Define Y.
- Compute X'Y.
- Multiply (X'X)^{-1} by X'Y to find b.
Let's begin with matrix X. Matrix X has a column of 1's plus one column of
values for each of the two independent variables. So, this is matrix X and its transpose X':
X =
| 1   110   40 |
| 1   120   30 |
| 1   100   20 |
| 1    90    0 |
| 1    80   10 |
X' =
|   1     1     1    1    1 |
| 110   120   100   90   80 |
|  40    30    20    0   10 |
Given X' and X, it is a simple matter to compute X'X.
X'X =
|   5      500     100 |
| 500   51,000  10,800 |
| 100   10,800   3,000 |
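The X'X computation can be checked in a few lines (NumPy assumed), using the IQ and study-hour columns from the table above:

```python
import numpy as np

iq    = [110, 120, 100, 90, 80]
hours = [40, 30, 20, 0, 10]

# Design matrix: a column of 1's, then the IQ and study-hour columns.
X = np.column_stack([np.ones(5), iq, hours])
XtX = X.T @ X   # 3 x 3 matrix of sums, sums of squares, and cross-products
```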
Finding the inverse of X'X takes a little more effort. A way to find the inverse is described on this site at
https://stattrek.com/matrix-algebra/how-to-find-inverse.aspx.
Ultimately, we find:
(X'X)^{-1} =
| 101/5   -7/30     1/6   |
| -7/30    1/360   -1/450 |
|  1/6    -1/450    1/360 |
Next, we define Y, the vector of dependent variable scores. For this problem, it is the vector of test scores:
Y' = | 100   90   80   70   60 |
With all of the essential matrices defined, we are ready to compute the least squares regression coefficients.
b = (X'X)^{-1}X'Y
To conclude, here is our least-squares regression equation:
ŷ = 20 + 0.5x_{1} + 0.5x_{2}
where ŷ is the predicted test score;
x_{1} is an IQ score; and x_{2} is the number of hours that the student studied.
The regression coefficients are b_{0} = 20, b_{1} = 0.5, and b_{2} = 0.5.
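Putting the whole solution together (NumPy assumed), a few lines reproduce the coefficients found above from the raw student data:

```python
import numpy as np

iq    = [110, 120, 100, 90, 80]
hours = [40, 30, 20, 0, 10]
score = [100, 90, 80, 70, 60]

# X: column of 1's plus one column per independent variable; Y: test scores.
X = np.column_stack([np.ones(5), iq, hours])
Y = np.array(score, dtype=float)

# b = (X'X)^{-1} X'Y
b = np.linalg.inv(X.T @ X) @ (X.T @ Y)
# b = [20.0, 0.5, 0.5], i.e. b0 = 20, b1 = 0.5, b2 = 0.5
```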