Excel Data Analysis with Linear Regression

There are a lot of things that Excel is an amazing tool to use for. One of these things is data analysis, and as a part of it, linear regression. There are many definitions of linear regression. This is a statistical method that is used to model the relationship between one dependent variable (Y) and one or more independent ones (X).

Excel has a great built-in tool for linear regression, which is called “Regression”. In the example below, we will show how to do it.

Preparing Data Analysis

Several steps need to be taken to conduct a linear regression analysis in Excel:

  1. Prepare the Data

First thing first, we need to have our data. This means that we have a dependent variable (Y) and one or more independent variables (X) in our columns. Every row that we have should represent an individual data point.

  1. Enable Data Analysis ToolPak (if not already enabled)

To make sure that you can do the proper Data Analysis, you need to enable Data Analysis ToolPak. To do this, you need to go to File >> Options >> Add-ins. There, in the Add-ins window, we will select Excel Add-ins and click Go:

Once we do, the Add-ins window will appear. There, we will click on the Analysis ToolPak box, and click OK:

  1. Access Data Analysis ToolPak

Once we have the TookPak enabled, we should be able to see the Data Analysis option located in the Data tab on the Excel ribbon:

  1. Perform Regression analysis and interpret the results

For the final part, you do the regression and then see what you got.

Excel Data Analysis with Linear Regression in Example

For our purposes, we will create the table with hours spent on learning and student’s exam scores:

Since we all did steps 1 to 3, we will go to the Data tab, and then click on the Data Analysis. Once we do, we will have a new window on which we will choose Regression as an option:

We will click OK and will be presented with a Regression window. There, we will click on the Input Y_Range and then choose range B2:B7 (exam scores) as our Y Range, and on Input X_Range choose range A2:A7 (study hours) as our X range.

We will also click on the Labels checkbox as we have headers:

When we click OK, Excel will create another sheet with the regression analysis done:

We will have information such as coefficients, standard errors, t-values, p-values, and the R-squared value.

In our example, we wanted to see if there was any correlation between the hours that were put into work, and the scores that students achieved.

The coefficient represents the change in the exam score for every additional hour that students put to work. The p-value is associated with the coefficient and will tell us if the relationship is statistically significant.

R-squared value can give us a good idea about the fit of the model and the data. A value that is close to 1 indicated a good fit, and a value that is close to 0 indicates a poor fit.

In our example, we have a coefficient of 5, which means that, on average, for each additional study hour, the exam score increases by 5 points. Our p-value is smaller than 0.05, meaning that the relationship between the score that students can achieve, and study hours is significant.

This example is fairly simple, and the correlation can be seen by looking at the original table. In real-world scenarios, datasets would be larger, and the models would be more complex. However, the steps that were described above can be applied to these as well.