
PyShiny Linear Regression Comparison App

This repo (python-ml-linear-regression) contains a PyShiny app for exploring different linear regression methods using Python. The app lets you select two linear regression methods, compare their performance on datasets with and without outliers, and visualize the results.
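The methods offered in the app's dropdowns are defined in app.py in this repo. Purely as a rough sketch of the kind of comparison involved, the snippet below fits ordinary least squares and a least-absolute-deviations (median quantile) regression to the same data with statsmodels; the data and the specific pairing of methods are invented for illustration.

    # Sketch only (not the repo's app.py): compare a squared-error fit
    # with a least-absolute-deviations fit on the same noisy data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    x = np.linspace(0, 10, 50)
    y = 2.0 * x + 1.0 + rng.normal(0, 1, x.size)
    y[-3:] += 25  # inject a few outliers

    X = sm.add_constant(x)              # design matrix with an intercept column
    ols = sm.OLS(y, X).fit()            # minimizes the sum of squared residuals
    lad = sm.QuantReg(y, X).fit(q=0.5)  # median regression ~ least absolute deviations

    print("OLS slope/intercept:", ols.params[1], ols.params[0])
    print("LAD slope/intercept:", lad.params[1], lad.params[0])

With the injected outliers, the two slopes typically diverge, which is the effect the app lets you explore interactively.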

Try the App

The app may take a while to load.

Why Do We Square Residuals in Linear Regression?

About the App

This app demonstrates how different linear regression methods perform on two datasets:

  • Dataset 1 – Clean data without outliers.
  • Dataset 2 – Data with outliers to observe the impact on model performance.
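The app ships with its own data; the snippet below is only an illustration of what these two kinds of datasets might look like, built with numpy and pandas (the values and outlier positions are invented for the example).

    # Illustrative stand-ins for the two dataset types (not the app's exact data).
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 60)

    # Dataset 1: linear trend with modest Gaussian noise
    y_clean = 3.0 * x + 5.0 + rng.normal(0, 2, x.size)
    dataset1 = pd.DataFrame({"x": x, "y": y_clean})

    # Dataset 2: same trend, but a few points pulled far off the line
    y_outliers = y_clean.copy()
    y_outliers[[10, 30, 50]] += rng.choice([-1, 1], 3) * 40
    dataset2 = pd.DataFrame({"x": x, "y": y_outliers})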

How to Use

  • Select two different regression methods from the dropdowns to compare performance.
  • Select the dataset: start with Dataset 1 (without outliers), then switch to Dataset 2 (with outliers).
  • Compare the results shown side-by-side.

Edit the Code: Comparing Linear Regression Options

  1. Go to the PyShiny Playground.
  2. Run the Plotly Example.
  3. Update the PyShiny code:
  • Edit 1. Replace app.py.
  • Edit 2. Replace requirements.txt.
  • Edit 3. Add an additional file, utils.py.

Edit 1. Replace app.py

  • Copy From: Click app.py above and select all the code (Ctrl+A on Windows, Cmd+A on Mac).
  • Copy the code to your clipboard (Ctrl+C on Windows, Cmd+C on Mac).
  • Copy To: Click the app.py tab in the Playground and select all the code in the Playground's app.py example (Ctrl+A on Windows, Cmd+A on Mac).
  • Paste the code from your clipboard (app.py as shown in this repo) into app.py in the Playground (Ctrl+V on Windows, Cmd+V on Mac).
  • Verify the code copied correctly.

Edit 2. Replace requirements.txt

  • Copy From: Click on requirements.txt above. Select all the code.
  • Copy the code to your clipboard.
  • Copy To: Click the requirements.txt tab in the Playground and select all the requirements.txt code in the Playground example.
  • Paste the code from your clipboard (requirements.txt as shown in this repo) into requirements.txt in the Playground.
  • Verify the code copied correctly.
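The authoritative list is the requirements.txt file in this repo. As a rough guide to what to expect after pasting, a minimal file for an app like this would likely name the PyShiny package (shiny) plus the libraries acknowledged below, though the actual file may differ:

    shiny
    pandas
    plotly
    numpy
    scipy
    statsmodels
    scikit-learn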

Edit 3. Add utils.py

  • Copy From: Click on utils.py above. Select all the code.
  • Copy the code to your clipboard.
  • Add new: Click in the Playground code window and add a new code file named exactly utils.py (spelling and capitalization must match).
  • Paste the code from your clipboard (utils.py as shown in this repo) into your new utils.py file in the Playground.
  • Verify the code copied correctly.
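The real helper code lives in utils.py in this repo. Purely as a hypothetical illustration of the kind of helper such a module might hold, the sketch below defines an invented fit_and_score function that fits either ordinary least squares or a robust Huber regression with scikit-learn and reports basic error metrics; it is not the repo's actual code.

    # Hypothetical helper in the spirit of a utils.py module (names invented here;
    # see the real utils.py in this repo for the actual code).
    import numpy as np
    from sklearn.linear_model import LinearRegression, HuberRegressor
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    def fit_and_score(x, y, method="ols"):
        """Fit a linear model with the chosen method; return slope, intercept, and errors."""
        X = np.asarray(x).reshape(-1, 1)
        model = LinearRegression() if method == "ols" else HuberRegressor()
        model.fit(X, y)
        pred = model.predict(X)
        return {
            "slope": float(model.coef_[0]),
            "intercept": float(model.intercept_),
            "mae": mean_absolute_error(y, pred),
            "mse": mean_squared_error(y, pred),
        }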

Click the run arrow (⏵) to see the updated app.


Example: Comparing Linear Regression Options

Acknowledgements

Many thanks to the creators, contributors, and maintainers for making these powerful tools available for free:

  • Python
  • PyShiny
  • GitHub
  • pandas
  • plotly
  • numpy
  • scipy
  • statsmodels
  • scikit-learn

Screenshot (Dataset 1)

Screenshot (Dataset 2)

Historical Note

Back in his 1970 book, "Statistical Problems and How To Solve Them", L.H. Longley-Cook noted (as a footnote on page 153):

"The least squares line is not completely satisfactory because it gives too great a weight to extreme values. Some writers have proposed the least distance line, when |D1| + |D2| + |D3| + |D4| + ... is a minimum. |D| is the distance taken as positive in each case. In the past, it has been difficult to calculate this line because of the sign problem, but this can be overcome with modern computers."

That was 50 years ago. Why are we still avoiding "difficult" absolute value calculations by squaring?
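A short sketch makes the point that the "sign problem" is no longer a barrier: a general-purpose optimizer from scipy can minimize the sum of absolute residuals directly (the data here is invented for illustration).

    # Fitting Longley-Cook's "least distance" line directly: minimize the sum of
    # absolute residuals |D1| + |D2| + ... with a derivative-free optimizer.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 40)
    y = 4.0 * x + 2.0 + rng.normal(0, 1, x.size)
    y[:2] += 30  # two extreme values to give the squared-error line trouble

    def sum_abs_residuals(params):
        intercept, slope = params
        return np.sum(np.abs(y - (intercept + slope * x)))

    result = minimize(sum_abs_residuals, x0=[0.0, 1.0], method="Nelder-Mead")
    print("Least-absolute-deviations intercept, slope:", result.x)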

Experiment

Add additional datasets to see how the different models perform.
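As one hedged example of an extra dataset, the snippet below reuses a linear trend but draws heavy-tailed (Student's t) noise, which tends to separate squared-error fits from absolute-error fits; how you wire a new dataset into the app depends on app.py and utils.py in this repo.

    # One possible extra dataset for experimenting (illustrative only):
    # the same linear trend, but with heavy-tailed noise and occasional large errors.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    x = np.linspace(0, 10, 60)
    y = 3.0 * x + 5.0 + 4.0 * rng.standard_t(df=2, size=x.size)
    dataset3 = pd.DataFrame({"x": x, "y": y})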
