5 min read

Regression Coefficients

Let’s say you’ve estimated a covariance matrix for a set of random variables. You can then determine the linear regression coefficients implied by that covariance matrix when one of the random variables is taken as the dependent variable in a regression equation and the rest as independent variables.

There are two approaches you can take. First, if you want to recover the ordinary least squares (OLS) estimates from the usual sample covariance matrix, you can use Gaussian elimination to show that the two calculations are numerically equivalent. The second approach requires statistical assumptions about the regression equation, but it lets you recover the estimates from any arbitrary covariance matrix.

Gaussian Elimination

Let \(n\) be the number of observations and \(k\) be the number of independent variables. Then let \(x_{i,j}\) be the \(i^{th}\) observation of the \(j^{th}\) variable. Let the data matrix be as follows, where \({\mathbf{x}_{j}}^T = \begin{bmatrix} x_{1,j} & x_{2,j} & \cdots & x_{n,j} \end{bmatrix}\) and \(\mathbf{1}^T = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}\):

\[\mathbf{X} = \begin{bmatrix} \mathbf{1} & \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_k \end{bmatrix}\]

Now let \(y\) be the dependent variable in the linear regression equation \(y = \mathbf{\beta}^T\mathbf{x} + \epsilon\), where \(\mathbf{x}^T = \begin{bmatrix} 1 & x_1 & \cdots & x_k \end{bmatrix}\) consists of a leading 1 (matching the constant column of \(\mathbf{X}\)) followed by a value for each of the \(k\) independent variables, and the coefficient vector is \(\mathbf{\beta}^T = \begin{bmatrix} \alpha & \beta_1 & \cdots & \beta_k \end{bmatrix}\) with \(\alpha\) the constant coefficient. If we use OLS to estimate the coefficients of the regression equation, then the estimate \(\hat{\mathbf{\beta}}\) of \(\mathbf{\beta}\) is the solution to the following system of equations:

\[\begin{equation} \mathbf{X}^T\mathbf{X}\hat{\mathbf{\beta}} = \mathbf{X}^T\mathbf{y} \end{equation}\]

where \(\mathbf{y}\) is the vector of \(n\) observations of the dependent variable. Note that we can multiply both sides of the above equation by \(\frac{1}{n-1}\) without affecting the equality. To solve this system, rather than multiplying both sides by the inverse \((\mathbf{X}^T\mathbf{X})^{-1}\), we instead perform the first step of Gaussian elimination, or equivalently the first step of an LU factorization.
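
As a concrete illustration, here is a minimal NumPy sketch (the simulated data and variable names are my own, not part of the derivation) that builds \(\mathbf{X}\) with a leading column of ones and solves the normal equations directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                                # n observations, k independent variables
X_data = rng.normal(size=(n, k))             # the columns x_1, ..., x_k
beta_true = np.array([0.5, 2.0, -1.0, 3.0])  # [alpha, beta_1, ..., beta_k]

X = np.column_stack([np.ones(n), X_data])    # X = [1, x_1, ..., x_k]
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve X^T X beta_hat = X^T y; np.linalg.solve avoids forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                              # close to beta_true
```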

First, note that if the sample mean of the \(j^{th}\) independent variable is \(\bar{\mathbf{x}}_j = \frac{1}{n}\sum_{i=1}^nx_{i,j}\), then we have that

\[\mathbf{X}^T\mathbf{X} = \begin{bmatrix} n & n\bar{\mathbf{x}}_1 & \cdots & n\bar{\mathbf{x}}_k \\ n\bar{\mathbf{x}}_1 & \mathbf{x}_1^T\mathbf{x}_1 & \cdots & \mathbf{x}_1^T\mathbf{x}_k \\ \vdots & \vdots & \ddots & \vdots \\ n\bar{\mathbf{x}}_k & \mathbf{x}_k^T\mathbf{x}_1 & \cdots & \mathbf{x}_k^T\mathbf{x}_k \end{bmatrix}\]

The first step of Gaussian elimination is to multiply both sides of equation (1) by the following matrix, which sets the first element of \(\mathbf{X}^T\mathbf{X}\) to 1 and eliminates the remaining entries in the first column:

\[\mathbf{L} = \begin{bmatrix} \frac{1}{n} & 0 & \cdots & 0 \\ -\bar{\mathbf{x}}_1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\bar{\mathbf{x}}_k & 0 & \cdots & 1 \end{bmatrix}\]

Performing this first elimination step on the LHS of equation (1), we get:

\[\frac{1}{n-1}\mathbf{LX}^T\mathbf{X\hat{\beta}} = \frac{1}{n-1} \begin{bmatrix} 1 & \bar{\mathbf{x}}_1 & \cdots & \bar{\mathbf{x}}_k \\ 0 & \mathbf{x}_1^T\mathbf{x}_1 - n\bar{\mathbf{x}}_1^2 & \cdots & \mathbf{x}_1^T\mathbf{x}_k - n\bar{\mathbf{x}}_1\bar{\mathbf{x}}_k \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \mathbf{x}_k^T\mathbf{x}_1 - n\bar{\mathbf{x}}_k\bar{\mathbf{x}}_1 & \cdots & \mathbf{x}_k^T\mathbf{x}_k - n\bar{\mathbf{x}}_k^2 \end{bmatrix} \begin{bmatrix} \hat{\alpha} \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}\]

and on the RHS of (1) we get

\[\frac{1}{n-1}\mathbf{LX}^T\mathbf{y} = \frac{1}{n-1} \begin{bmatrix} \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n} \\ x_{1,1} - \bar{\mathbf{x}}_1 & x_{2,1} - \bar{\mathbf{x}}_1 & \cdots & x_{n,1} - \bar{\mathbf{x}}_1 \\ \vdots & \vdots & \ddots & \vdots \\ x_{1,k} - \bar{\mathbf{x}}_k & x_{2,k} - \bar{\mathbf{x}}_k & \cdots & x_{n,k} - \bar{\mathbf{x}}_k \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}\]
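
As a quick numerical check (again a NumPy sketch, not part of the derivation), we can verify that applying \(\mathbf{L}\) yields a first row of \(\begin{bmatrix}1 & \bar{\mathbf{x}}_1 & \cdots & \bar{\mathbf{x}}_k\end{bmatrix}\), zeros below it in the first column, and \(n-1\) times the sample covariance matrix of the \(\mathbf{x}_j\) in the lower-right block:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
xbar = X[:, 1:].mean(axis=0)       # sample means of the independent variables

L = np.eye(k + 1)
L[0, 0] = 1.0 / n                  # scale the first row by 1/n
L[1:, 0] = -xbar                   # subtract xbar_j times the first row

A = L @ (X.T @ X)
print(A[0])                        # [1, xbar_1, ..., xbar_k]
print(A[1:, 0])                    # zeros (up to rounding)
print(np.allclose(A[1:, 1:], (n - 1) * np.cov(X[:, 1:], rowvar=False)))  # True
```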

We then notice that the first row of the system of equations is \[\hat{\alpha} + \sum_{i=1}^k\bar{\mathbf{x}}_i\hat{\beta}_i = \frac{1}{n}\sum_{i=1}^ny_i = \bar{\mathbf{y}}\] and thus we can solve for \(\hat{\alpha}\): \[\begin{equation} \hat{\alpha} = \bar{\mathbf{y}} - \sum_{i=1}^k\bar{\mathbf{x}}_i\hat{\beta}_i \end{equation}\]

The problem is then reduced to solving the remaining \(k\) equations: \[\frac{1}{n-1} \begin{bmatrix} \mathbf{x}_1^T\mathbf{x}_1 - n\bar{\mathbf{x}}_1^2 & \cdots & \mathbf{x}_1^T\mathbf{x}_k - n\bar{\mathbf{x}}_1\bar{\mathbf{x}}_k \\ \vdots & \ddots & \vdots \\ \mathbf{x}_k^T\mathbf{x}_1 - n\bar{\mathbf{x}}_k\bar{\mathbf{x}}_1 & \cdots & \mathbf{x}_k^T\mathbf{x}_k - n\bar{\mathbf{x}}_k^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = \hat{\Sigma}_{\mathbf{x}}\hat{\mathbf{\beta}} = \\ \frac{1}{n-1} \begin{bmatrix} x_{1,1} - \bar{\mathbf{x}}_1 & x_{2,1} - \bar{\mathbf{x}}_1 & \cdots & x_{n,1} - \bar{\mathbf{x}}_1 \\ \vdots & \vdots & \ddots & \vdots \\ x_{1,k} - \bar{\mathbf{x}}_k & x_{2,k} - \bar{\mathbf{x}}_k & \cdots & x_{n,k} - \bar{\mathbf{x}}_k \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \hat{\sigma}_{\mathbf{x}_1, \mathbf{y}} \\ \hat{\sigma}_{\mathbf{x}_2, \mathbf{y}} \\ \vdots \\ \hat{\sigma}_{\mathbf{x}_k, \mathbf{y}} \end{bmatrix}\] where \(\hat{\Sigma}_{\mathbf{x}}\) is the sample covariance matrix of the \(\mathbf{x}_i\) and \(\hat{\sigma}_{\mathbf{x}_i, \mathbf{y}}\) is the sample covariance between \(\mathbf{x}_i\) and \(\mathbf{y}\) (the last equality holds because \(\sum_{i=1}^n(x_{i,j} - \bar{\mathbf{x}}_j) = 0\), so the \(y_i\) do not need to be centered). Therefore, we can solve for the coefficients as \[\begin{equation} \hat{\mathbf{\beta}} = \hat{\Sigma}_{\mathbf{x}}^{-1} \begin{bmatrix} \hat{\sigma}_{\mathbf{x}_1, \mathbf{y}} \\ \hat{\sigma}_{\mathbf{x}_2, \mathbf{y}} \\ \vdots \\ \hat{\sigma}_{\mathbf{x}_k, \mathbf{y}} \end{bmatrix} \end{equation}\]

Thus, if we have already calculated the full sample covariance matrix \[\hat{\Sigma} = \begin{bmatrix} \hat{\sigma}_{\mathbf{y}}^2 & \hat{\sigma}_{\mathbf{x}, \mathbf{y}}^T \\ \hat{\sigma}_{\mathbf{x}, \mathbf{y}} & \hat{\Sigma}_{\mathbf{x}} \end{bmatrix}\] where \(\hat{\sigma}_{\mathbf{x}, \mathbf{y}}\) stacks the covariances \(\hat{\sigma}_{\mathbf{x}_i, \mathbf{y}}\), then we can recover \(\hat{\alpha}\) and \(\hat{\mathbf{\beta}}\) from the two calculations (2) and (3) above.
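
Putting the pieces together, here is a sketch (NumPy, with hypothetical simulated data) of recovering \(\hat{\alpha}\) and \(\hat{\mathbf{\beta}}\) from the full sample covariance matrix via (2) and (3), and confirming they match the direct least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 3
Xd = rng.normal(size=(n, k))
y = 1.5 + Xd @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=n)

# Full sample covariance matrix, ordered as (y, x_1, ..., x_k)
Z = np.column_stack([y, Xd])
S = np.cov(Z, rowvar=False)        # (k+1) x (k+1), normalized by n-1 by default
Sigma_x = S[1:, 1:]                # sample covariance matrix of the x's
sigma_xy = S[1:, 0]                # sample covariances between the x's and y

beta_hat = np.linalg.solve(Sigma_x, sigma_xy)          # equation (3)
alpha_hat = y.mean() - Xd.mean(axis=0) @ beta_hat      # equation (2)

# Compare with OLS on the design matrix [1, x_1, ..., x_k]
X = np.column_stack([np.ones(n), Xd])
coef_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(np.r_[alpha_hat, beta_hat], coef_ols))  # True
```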

Second Approach

If we again start from the assumption that the linear regression equation \(y = \mathbf{\beta}^T\mathbf{x} + \epsilon\) holds, along with \(\mathrm{cov}(\epsilon, x_i) = 0\) for each \(i\), then we have \[\sigma_{y,x_i} = \mathrm{cov}(\mathbf{\beta}^T\mathbf{x}, x_i) = \sum_{j=1}^k\beta_j\sigma_{x_j,x_i}\] (the constant \(\alpha\) contributes nothing to the covariance). Stacking these equations gives \[\begin{bmatrix} \sigma_{y,x_1} \\ \vdots \\ \sigma_{y,x_k} \end{bmatrix} = \Sigma_{\mathbf{x}}\mathbf{\beta}\] where we drop the first entry \(\alpha\) of \(\mathbf{\beta}\). Then, if we have a previously estimated covariance matrix as above, we can substitute the estimated values and solve for \(\mathbf{\beta}\): \[\hat{\mathbf{\beta}} = \hat{\Sigma}_{\mathbf{x}}^{-1} \begin{bmatrix} \hat{\sigma}_{y,x_1}\\ \vdots \\ \hat{\sigma}_{y,x_k} \end{bmatrix}\]
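
For this second approach, a covariance matrix is all we need. The following sketch (NumPy, with a made-up covariance matrix for \((y, x_1, x_2)\)) reads the implied slopes straight off \(\hat{\Sigma}\) without ever touching raw observations:

```python
import numpy as np

# Hypothetical estimated covariance matrix, ordered as (y, x_1, x_2)
Sigma = np.array([
    [4.0, 1.2, 0.8],
    [1.2, 2.0, 0.3],
    [0.8, 0.3, 1.5],
])

Sigma_x = Sigma[1:, 1:]            # covariance matrix of the x's
sigma_yx = Sigma[1:, 0]            # covariances between y and the x's

beta = np.linalg.solve(Sigma_x, sigma_yx)
print(beta)                        # implied regression slopes
```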