COF Database project

Data and context can be found in this paper: https://pubs.acs.org/doi/epdf/10.1021/acs.chemmater.8b01425?ref=article_openPDF

-- Project Status: [On-hold]

Project Intro/Objective

This project aims to gain insights into, and predict, the methane uptake capacity of covalent organic frameworks (COFs) using a regression model.

Project Description

First, I visualised the data to understand trends and identify outliers. This included:

a report of the minimum and maximum values and of all categorical variables
boxplots of continuous variables
histograms of discrete variables
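A minimal sketch of the summary step, assuming the data has been loaded into a pandas DataFrame. Treating float columns as continuous (boxplots) and integer columns as discrete (histograms) is an assumption made for illustration, and the plotting itself is left out; the helper name `summarise` is hypothetical.

```python
import numpy as np
import pandas as pd


def summarise(df: pd.DataFrame) -> dict:
    """Hypothetical helper: min/max report, categorical counts and a continuous/discrete split."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    return {
        # One row per numeric column, with its minimum and maximum
        "min_max": numeric.agg(["min", "max"]).T,
        # Value counts for every non-numeric (categorical) column
        "categorical": {col: categorical[col].value_counts() for col in categorical.columns},
        # Assumption: float columns are continuous (boxplots), integer columns discrete (histograms)
        "continuous": df.select_dtypes(include=np.floating).columns.tolist(),
        "discrete": df.select_dtypes(include=np.integer).columns.tolist(),
    }
```

Indexing `summarise(df)["min_max"]` gives one row per numeric column, which makes physically implausible extremes easy to spot before plotting.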

Next, sns.relplot() was used to plot each predictor against the response (y = AbsMU_high_P_[molec/unit_cell]), color-coded by bond type.
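The relplot step could look roughly like the following sketch. It assumes a DataFrame containing the response column named above and a `bond_type` column; melting the predictors into long form lets a single `sns.relplot()` call draw one panel per predictor. The helper names and layout options (`col_wrap=3`, unshared x-axes) are illustrative choices, not taken from the project notebook.

```python
import pandas as pd

RESPONSE = "AbsMU_high_P_[molec/unit_cell]"


def to_long_form(df, predictors, response=RESPONSE, hue="bond_type"):
    """Hypothetical helper: stack predictor columns so seaborn can facet by predictor."""
    return df.melt(
        id_vars=[response, hue],
        value_vars=predictors,
        var_name="predictor",
        value_name="value",
    )


def plot_relationships(df, predictors, response=RESPONSE, hue="bond_type"):
    import seaborn as sns  # imported lazily so the reshaping step works without seaborn

    long_df = to_long_form(df, predictors, response, hue)
    # One scatter panel per predictor, shared response on y, points colored by bond type
    return sns.relplot(
        data=long_df,
        x="value",
        y=response,
        hue=hue,
        col="predictor",
        col_wrap=3,  # illustrative layout choice
        facet_kws={"sharex": False},  # predictors have different units and ranges
        kind="scatter",
    )
```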

The data was then split into a feature matrix X and a response y, and a random forest was used to rank feature importance by mean decrease in impurity. This was done to reduce the dimensionality from p = 1116 features. A threshold of 0.001 was used to choose the important features, with supercell volume being the most important.
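A sketch of the feature-selection step with scikit-learn, assuming X is a pandas DataFrame of the p predictors and that features strictly above the threshold are kept. In scikit-learn, `RandomForestRegressor.feature_importances_` is the impurity-based (mean decrease in impurity) importance, normalised to sum to 1; the default forest settings, `random_state=0` and the helper name are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def select_features(X: pd.DataFrame, y, threshold=0.001, random_state=0):
    """Hypothetical helper: rank features by impurity importance, keep those above `threshold`."""
    forest = RandomForestRegressor(random_state=random_state)  # default settings assumed
    forest.fit(X, y)
    # feature_importances_ is the mean decrease in impurity, normalised to sum to 1
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    importances = importances.sort_values(ascending=False)
    selected = importances[importances > threshold].index.tolist()
    return selected, importances
```

The first entry of the returned `importances` Series is the top-ranked feature (supercell volume, according to the analysis above), and `X[selected]` is the reduced matrix passed on to model selection.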

Several algorithms were assessed for model selection. The following algorithms (from sklearn unless noted otherwise) were trialled with default parameters using RepeatedKFold cross-validation (n_splits = 5, n_repeats = 10):

Linear Regression
Decision Tree
ensemble methods:
- Random Forest
- AdaBoost
- Bagging
- GradientBoosting
- XGBoost (from the xgboost library)
SVR
KNeighbors
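The comparison could be sketched as below, assuming X and y are the reduced feature matrix and response. Each model uses default parameters and is scored with RepeatedKFold (n_splits=5, n_repeats=10); note that scikit-learn labels the validation folds as "test" in `cross_validate` output. XGBoost is only added if the separate xgboost package is installed, and the helper names and `random_state=0` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def candidate_models():
    """Hypothetical helper: every candidate with default parameters."""
    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(),
        "Random Forest": RandomForestRegressor(),
        "AdaBoost": AdaBoostRegressor(),
        "Bagging": BaggingRegressor(),
        "GradientBoosting": GradientBoostingRegressor(),
        "SVR": SVR(),
        "KNeighbors": KNeighborsRegressor(),
    }
    try:  # XGBoost comes from the separate xgboost package, not sklearn
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor()
    except ImportError:
        pass
    return models


def compare_models(X, y, n_splits=5, n_repeats=10, random_state=0):
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=random_state)
    scoring = {
        "MAE": "neg_mean_absolute_error",
        "MSE": "neg_mean_squared_error",
        "RMSE": "neg_root_mean_squared_error",
    }
    results = {}
    for name, model in candidate_models().items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring, return_train_score=True)
        # sklearn maximises scores, so error metrics are negated; flip the sign back
        # and average over all n_splits * n_repeats folds ("test" = validation fold)
        results[name] = {
            f"{split}_{metric}": -np.mean(scores[f"{split}_{metric}"])
            for split in ("train", "test")
            for metric in scoring
        }
    return results
```

Comparing `train_RMSE` with `test_RMSE` for each model separates underfitting (both high) from overfitting (large gap).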

Evaluation of each algorithm included:

Metrics (averaged train and validation scores printed)
- mean absolute error (MAE)
- mean squared error (MSE)
- root mean squared error (RMSE)
Plots
- Learning Curves (scoring = RMSE)
- Prediction plots of simulated versus predicted values
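A sketch of the metric and learning-curve calculations, assuming y_true and y_pred are array-likes of simulated and predicted uptake. RMSE is taken as the square root of MSE, and the learning curve uses scikit-learn's `neg_root_mean_squared_error` scorer; the five training-set sizes, the 5-fold default and the helper names are assumptions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import learning_curve


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: MAE, MSE and RMSE for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # derived from MSE so it also works on older sklearn versions
    }


def rmse_learning_curve(model, X, y, cv=5):
    """Hypothetical helper: train/validation RMSE at increasing training-set sizes."""
    sizes, train_scores, val_scores = learning_curve(
        model,
        X,
        y,
        cv=cv,
        scoring="neg_root_mean_squared_error",
        train_sizes=np.linspace(0.1, 1.0, 5),  # assumed: five evenly spaced fractions
    )
    # Scores are negated RMSE per fold; average over folds and flip the sign for plotting
    return sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)
```

Plotting the returned sizes against the two RMSE arrays gives the learning curve; a persistent gap between training and validation RMSE points to overfitting.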

Random Forest had the best performance, so it was selected. Its hyperparameters were then tuned using Optuna and evaluated on a held-out test set.
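A sketch of Optuna tuning with a held-out test set. It assumes an 80/20 split and an objective that minimises mean cross-validated RMSE on the training portion only, so the test set is used once, after tuning. The search space, trial count, split ratio and helper names are hypothetical, not the project's actual settings; `optuna.create_study`, `study.optimize`, `study.best_params` and `trial.suggest_int`/`trial.suggest_float` are part of Optuna's public API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split


def make_objective(X_train, y_train, cv=5):
    """Hypothetical helper: Optuna objective returning mean CV RMSE on the training split."""
    def objective(trial):
        # Illustrative search space, not the one used in the project
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 50, 500),
            "max_depth": trial.suggest_int("max_depth", 2, 32),
            "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
            "max_features": trial.suggest_float("max_features", 0.1, 1.0),
        }
        model = RandomForestRegressor(random_state=0, **params)
        scores = cross_val_score(
            model, X_train, y_train, cv=cv, scoring="neg_root_mean_squared_error"
        )
        return -scores.mean()
    return objective


def tune_and_test(X, y, n_trials=50, test_size=0.2):
    import optuna  # third-party dependency, imported here so the rest loads without it

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    study = optuna.create_study(direction="minimize")
    study.optimize(make_objective(X_train, y_train), n_trials=n_trials)
    best = RandomForestRegressor(random_state=0, **study.best_params).fit(X_train, y_train)
    # The held-out test set is touched exactly once, after tuning has finished
    test_rmse = np.sqrt(mean_squared_error(y_test, best.predict(X_test)))
    return study.best_params, test_rmse
```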

Future Work

Carry out a more in-depth analysis, including:
- multinomial classification of qualitative variables
  - bond_type (K=5)
  - parent network (K=309)
- evaluation of 2D and 3D COFs
- unsupervised learning
  - clustering
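One possible starting point for the classification and clustering ideas is sketched below under stated assumptions. It uses a random forest classifier as a multi-class (multinomial) model for bond_type and k-means on standardised features for clustering; both model choices, `n_clusters=5` and the helper names are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def bond_type_accuracy(X: pd.DataFrame, bond_type: pd.Series, cv=5):
    """Hypothetical helper: mean CV accuracy for predicting one of K bond types."""
    clf = RandomForestClassifier(random_state=0)  # assumed classifier choice
    # For classifiers, an integer cv uses stratified folds, preserving class proportions
    return cross_val_score(clf, X, bond_type, cv=cv, scoring="accuracy").mean()


def cluster_frameworks(X: pd.DataFrame, n_clusters=5, random_state=0):
    """Hypothetical helper: k-means cluster labels on standardised features."""
    pipeline = make_pipeline(
        StandardScaler(),  # k-means is distance-based, so features are put on one scale
        KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state),
    )
    return pipeline.fit_predict(X)
```

For parent network (K=309), many classes may have only a few members, so stratified folds would likely need fewer splits or a grouping of rare classes.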

Objective

Curate a large dataset
Train an ML algorithm to predict the target property
Select the optimal algorithm for the material representation
Validate the algorithm
Develop an assessment protocol informed by construction of the model

mjdoom16/COF_Database_Project
