This repo (python-ml-linear-regression) contains a PyShiny app for exploring different linear regression methods using Python. The app allows you to select and compare the performance of two linear regression methods on different datasets (with and without outliers), and visualize the results.
The app may take a while to load.
This app demonstrates how different linear regression methods perform on two datasets:
- Dataset 1 – Clean data without outliers.
- Dataset 2 – Data with outliers to observe the impact on model performance.
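To see why the second dataset matters, here is a minimal sketch of how one extreme value can shift an ordinary least squares fit. The data, the `ols_fit` helper, and the variable names are made up for illustration and are not taken from the app or its datasets. The sketch uses only the standard library so it runs anywhere.

```python
# Illustrative data only: these are NOT the datasets shipped with the app.
xs = [float(x) for x in range(10)]
ys_clean = [2.0 * x + 1.0 for x in xs]               # points exactly on y = 2x + 1
ys_outlier = ys_clean[:-1] + [ys_clean[-1] + 50.0]   # one extreme value at x = 9


def ols_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line.

    Hypothetical helper using the closed-form solution for one predictor.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


clean_slope, clean_intercept = ols_fit(xs, ys_clean)
outlier_slope, outlier_intercept = ols_fit(xs, ys_outlier)
print(f"clean:        slope={clean_slope:.3f}, intercept={clean_intercept:.3f}")
print(f"with outlier: slope={outlier_slope:.3f}, intercept={outlier_intercept:.3f}")
```

In this made-up example, the clean fit recovers the line y = 2x + 1, while the single outlier pulls the fitted slope well away from 2.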
How to Use
- Select two different regression methods from the dropdowns to compare performance.
- Select a dataset: start with Dataset 1 (without outliers), then switch to Dataset 2 (with outliers).
- Compare the results shown side-by-side.
- Go to the PyShiny Playground.
- Run the Plotly Example.
- Update the PyShiny code:
- Edit 1. Replace app.py.
- Edit 2. Replace requirements.txt.
- Edit 3. Add an additional file, utils.py.
- Copy From: Click on app.py above. Select all the code (Ctrl+A on Windows or Cmd+A on Mac).
- Copy the code to your clipboard (Ctrl+C on Windows or Cmd+C on Mac).
- Copy To: Click in the Playground app.py tab. Select all the code in the Playground app.py example (Ctrl+A on Windows or Cmd+A on Mac).
- Paste the code from your clipboard (app.py as shown in this repo) into app.py in the Playground (Ctrl+V on Windows or Cmd+V on Mac).
- Verify the code copied correctly.
- Copy From: Click on requirements.txt above. Select all the code.
- Copy the code to your clipboard.
- Copy To: Click in the Playground requirements.txt tab. Select all the requirements.txt code in the playground example.
- Paste the code from your clipboard (requirements.txt as shown in this repo) into requirements.txt in the Playground.
- Verify the code copied correctly.
- Copy From: Click on utils.py above. Select all the code.
- Copy the code to your clipboard.
- Add new: Click in the Playground code window. Add a new code file. Name the file exactly utils.py. Spelling and capitalization should be exact.
- Paste the code from your clipboard (utils.py as shown in this repo) into your new utils.py file in the Playground.
- Verify the code copied correctly.
Click the run arrow (⏵) to see the updated app.
Example: Comparing Linear Regression Options
Many thanks to the creators, contributors, and maintainers for making these powerful tools available for free:
- Python
- PyShiny
- GitHub
- pandas
- plotly
- numpy
- scipy
- statsmodels
- scikit-learn
Back in his 1970 book, "Statistical Problems and How To Solve Them", L.H. Longley-Cook noted (as a footnote on page 153):
"The least squares line is not completely satisfactory because it gives too great a weight to extreme values. Some writers have proposed the least distance line, when |D1| + |D2| + |D3| + |D4| + ... is a minimum. |D| is the distance taken as positive in each case. In the past, it has been difficult to calculate this line because of the sign problem, but this can be overcome with modern computers."
That was 50 years ago. Why are we still avoiding "difficult" absolute value calculations by squaring?
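As a rough illustration of the "least distance line" from the quote, the sketch below finds the line that minimizes |D1| + |D2| + ... by brute force. It relies on a standard property of least absolute deviations fits with one predictor: an optimal line can always be chosen to pass through two of the data points, so checking every pair is enough. The function name `least_distance_line` and the sample data are hypothetical, and this is not necessarily how the app computes its fits. For real work, a library routine (for example, median regression in statsmodels) would likely be a better choice than this O(n³) search.

```python
from itertools import combinations


def least_distance_line(xs, ys):
    """Return (slope, intercept, total_abs_deviation) for a line minimizing
    the sum of |D_i|, found by trying the line through every pair of points.

    Hypothetical helper for illustration; fine for small n only.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = slope * x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        total = sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
        if best is None or total < best[2]:
            best = (slope, intercept, total)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best


# Made-up data: nine points on y = 2x + 1 plus one extreme value at x = 9.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] += 50.0

lad_slope, lad_intercept, lad_total = least_distance_line(xs, ys)
print(f"least distance line: slope={lad_slope:.3f}, intercept={lad_intercept:.3f}")
```

On this made-up data the least distance line stays on y = 2x + 1 and simply accepts one large residual at the outlier, which is the behavior the footnote is pointing at.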
Add additional datasets to see how the different models perform.
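One possible way to build an extra dataset is sketched below. The `make_dataset` function, its parameters, and its defaults are all assumptions for illustration; how a new dataset would be wired into app.py or utils.py depends on code not shown here.

```python
import random


def make_dataset(n=50, slope=2.0, intercept=1.0, noise_sd=1.0,
                 n_outliers=0, outlier_shift=30.0, seed=0):
    """Return (xs, ys) scattered around a straight line.

    Hypothetical generator: n_outliers randomly chosen points are pushed
    upward by outlier_shift. A fixed seed makes the data reproducible.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_sd) for x in xs]
    for i in rng.sample(range(n), n_outliers):
        ys[i] += outlier_shift
    return xs, ys


# Same seed, so both versions share the same underlying points.
clean_xs, clean_ys = make_dataset(seed=1)
dirty_xs, dirty_ys = make_dataset(n_outliers=5, seed=1)
changed = sum(a != b for a, b in zip(clean_ys, dirty_ys))
print(f"{changed} of {len(clean_ys)} points were turned into outliers")
```

Varying `n_outliers` and `outlier_shift` while keeping the seed fixed gives paired datasets that differ only in their outliers, which makes model comparisons easier to read.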