refactor tutorial notebooks to default include nft minting #579


Merged
11 changes: 0 additions & 11 deletions docs/docs/tutorials/protein-folding-nft-minting.md

This file was deleted.

104 changes: 67 additions & 37 deletions docs/docs/tutorials/protein-folding.md
@@ -6,38 +6,41 @@ sidebar_position: 2

import OpenInColab from '../../src/components/OpenInColab.js';

<OpenInColab link="https://colab.research.google.com/drive/1AfnJ50Ei4_9KXdKgexwdmEiwwDDXsfWJ?usp=sharing"></OpenInColab>
<OpenInColab link="https://colab.research.google.com/drive/1312M2VOx_YpTFgy60ZYChgR9h3a7aorr?usp=sharing"></OpenInColab>

## Protein folding in silico

In this tutorial, we perform protein folding with PLEX.
In this tutorial we perform protein folding with **plex**.

There are multiple reasons we believe PLEX is a new standard for computational biology 🧫:
1. With a simple Python interface, running containerised tools with your data is only a few commands away.
2. The infrastructure of the compute network is fully open source: use the public network or work with us to set up your own node.
3. Every event on the compute network is tracked, so results are no longer lost in an interactive compute session. You can base your decisions and publications on fully reproducible results.
4. We made adding new tools to the network as easy as possible: moving your favorite tool to PLEX is one JSON document away.
There are multiple reasons we believe plex is a new standard for computational biology 🧫:
1. With a simple Python interface, running containerised tools with your data is only a few commands away.
2. The infrastructure of the compute network is fully open source: use the public network or work with us to set up your own node.
3. Every event on the compute network is tracked, so results are no longer lost in an interactive compute session. You can base your decisions and publications on fully reproducible results.
4. We made adding new tools to the network as easy as possible: moving your favorite tool to plex is one JSON document away.

We'll walk through an example of how to use PLEX to predict a protein's 3D structure using [ColabFold](https://www.nature.com/articles/s41592-022-01488-1). We will use the sequence of the Streptavidin protein for this demo.
Below, we walk through how to use plex to predict a protein's 3D structure with [ColabFold](https://www.nature.com/articles/s41592-022-01488-1), using the sequence of the Streptavidin protein as an example.

![img](../../static/img/protein-folding-graphic.png)
The accompanying Colab notebook also walks through minting a ProofOfScience NFT. These tokens are on-chain, verifiable records of a compute job and its input/output data, which makes it possible to trace and reproduce the scientific results.

## Install PLEX
![protein-folding-graphic](../../static/img/protein-folding-graphic.png)

## Install plex


```python
!pip install PlexLabExchange
```

Collecting PlexLabExchange
Downloading PlexLabExchange-0.8.18-py3-none-manylinux2014_x86_64.whl (26.9 MB)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 20.1 MB/s eta 0:00:00
Downloading PlexLabExchange-0.8.20-py3-none-manylinux2014_x86_64.whl (26.9 MB)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 16.6 MB/s eta 0:00:00
Installing collected packages: PlexLabExchange
Successfully installed PlexLabExchange-0.8.18
Successfully installed PlexLabExchange-0.8.20


Then, create a directory where we can save our project files.


```python
import os

@@ -52,69 +55,81 @@ dir_path = f"{cwd}/project"
We'll download a `.fasta` file containing the sequence of the protein we want to fold. Here, we're using the sequence of Streptavidin.




```python
!wget https://rest.uniprot.org/uniprotkb/P22629.fasta -O {dir_path}/P22629.fasta # Streptavidin
```

--2023-08-01 21:39:21-- https://rest.uniprot.org/uniprotkb/P22629.fasta
--2023-08-08 18:49:21-- https://rest.uniprot.org/uniprotkb/P22629.fasta
Resolving rest.uniprot.org (rest.uniprot.org)... 193.62.193.81
Connecting to rest.uniprot.org (rest.uniprot.org)|193.62.193.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 264 [text/plain]
Length: unspecified [text/plain]
Saving to: ‘/content/project/P22629.fasta’

/content/project/P2 100%[===================>] 264 --.-KB/s in 0s

2023-08-01 21:39:21 (144 MB/s) - ‘/content/project/P22629.fasta’ saved [264/264]

/content/project/P2 [ <=> ] 264 --.-KB/s in 0s

2023-08-08 18:49:21 (157 MB/s) - ‘/content/project/P22629.fasta’ saved [264]
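Before submitting a job, it can help to confirm that the download actually produced a FASTA record; UniProt reports `Length: unspecified` here, so the byte count alone says little. The helper below is a minimal, hypothetical sketch using only the standard library. It is not part of plex and assumes a plain-text FASTA file.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file.

    Minimal parser: header lines start with '>', the sequence lines that
    follow are concatenated, blank lines are skipped, and any lines before
    the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Usage in the notebook (dir_path is defined earlier in the tutorial):
# header, seq = read_fasta(f"{dir_path}/P22629.fasta")[0]
# print(header)
# print(len(seq), "residues")
```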



## Fold the protein

With the sequence downloaded, we can now use ColabFold to fold the protein.




```python
from plex import CoreTools, plex_create
from plex import CoreTools, plex_init

initial_io_cid = plex_create(CoreTools.COLABFOLD_MINI.value, dir_path)
fasta_local_filepaths = [f"{dir_path}/P22629.fasta"]

initial_io_cid = plex_init(
CoreTools.COLABFOLD_MINI.value,
sequence=fasta_local_filepaths
)
```

plex init -t QmcRH74qfqDBJFku3mEDGxkAf6CSpaHTpdbe1pMkHnbcZD -i {"sequence": ["/content/project/P22629.fasta"]} --scatteringMethod=dotProduct
Plex version (v0.8.4) up to date.
Temporary directory created: /tmp/9ed8c638-c1b0-43da-bf92-7f054517d45c2889128719
Reading tool config: QmcRH74qfqDBJFku3mEDGxkAf6CSpaHTpdbe1pMkHnbcZD
Creating IO entries from input directory: /content/project
Initialized IO file at: /tmp/9ed8c638-c1b0-43da-bf92-7f054517d45c2889128719/io.json
Initial IO JSON file CID: QmUhysTE4aLZNw2ePRMCxHWko868xmQoXnGP25fKM1aofb
Pinned IO JSON CID: QmZgLQypfjvK9kTsqLXwbNRiFifEU5CC7eduWWPbminybi
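The `plex init` log line above suggests that the keyword arguments passed to `plex_init` (here `sequence=`) become keys of the `-i` JSON, each mapped to a list of file paths. That mapping is inferred from the log, not from plex documentation. The sketch below is a hypothetical pre-flight helper, not part of plex: it checks that every local path exists and returns the equivalent JSON string, which makes it easy to compare against the log before a job is submitted.

```python
import json
import os


def preview_plex_inputs(**inputs):
    """Hypothetical helper: verify that every local input path exists and
    return a JSON string shaped like the `-i` argument in the plex log."""
    missing = [
        path
        for paths in inputs.values()
        for path in paths
        if not os.path.isfile(path)
    ]
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")
    return json.dumps(inputs)


# Usage (fasta_local_filepaths is defined in the cell above):
# print(preview_plex_inputs(sequence=fasta_local_filepaths))
```

For this tutorial the printed string should resemble `{"sequence": ["/content/project/P22629.fasta"]}` from the log above.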


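This code initializes the job: it creates the IO JSON and pins it to IPFS, but does not run it yet. To fold the protein, we pass the initial IO CID to `plex_run`, together with `dir_path`, under which plex creates a working directory for the results.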


```python
from plex import plex_run

completed_io_cid, completed_io_filepath = plex_run(initial_io_cid, dir_path)
```

Plex version (v0.8.4) up to date.
Created working directory: /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb
Initialized IO file at: /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb/io.json
Created working directory: /content/project/9102a179-ac65-4823-9a03-93766ea32671
Initialized IO file at: /content/project/9102a179-ac65-4823-9a03-93766ea32671/io.json
Processing IO Entries
Starting to process IO entry 0
Job running...
Bacalhau job id: 476d232b-e1c6-42d6-b1c0-2f4d237244b1

Bacalhau job id: 271f4b64-cb2d-4be6-86af-ed16186e69e0
Computing default go-libp2p Resource Manager limits based on:
- 'Swarm.ResourceMgr.MaxMemory': "6.8 GB"
- 'Swarm.ResourceMgr.MaxFileDescriptors': 524288

Applying any user-supplied overrides on top.
Run 'ipfs swarm limit all' to see the resulting limits.

Success processing IO entry 0
Finished processing, results written to /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb/io.json
Finished processing, results written to /content/project/9102a179-ac65-4823-9a03-93766ea32671/io.json
Completed IO JSON CID: QmdnjMsUar6nTqGwgjCwN1Fyjaan4i3zyht9SE9L235YRm
2023/08/08 18:51:17 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.


After the job is complete, we can retrieve and view the results. The state of each object is written in a JSON object. Every file has a unique content-address.

## Viewing the results

Once the job completes, we can retrieve and view the results. The state of each input and output is recorded in the IO JSON file, and every file is identified by a unique content address (CID).
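A content address is derived from the bytes themselves: identical content always yields the same identifier, and any change yields a different one. The sketch below illustrates that property with a plain SHA-256 digest from Python's `hashlib`. It is only an analogy: real IPFS CIDs wrap the hash in multihash/CID encoding and depend on how the file is chunked, so this hex digest will not match the CIDs printed in this tutorial.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of `data`.

    Illustration only: an IPFS CID is built differently, so this value
    will not match the CIDs printed by plex.
    """
    return hashlib.sha256(data).hexdigest()


original = b"example input file\n"
edited = b"example input file, edited\n"

# Identical bytes always produce the same address...
assert content_digest(original) == content_digest(b"example input file\n")
# ...while any change to the bytes produces a different one.
assert content_digest(original) != content_digest(edited)
print(content_digest(original))
```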


```python
@@ -181,6 +196,21 @@ with open(completed_io_filepath, 'r') as f:
}
]

The output is a JSON file with information about the folded protein structures. This can be used for further analysis, visualization, and more.

<OpenInColab link="https://colab.research.google.com/drive/1AfnJ50Ei4_9KXdKgexwdmEiwwDDXsfWJ?usp=sharing"></OpenInColab>
The results can also be viewed through an IPFS gateway. The code below prints a link to the completed IO JSON on the ipfs.io gateway.

**Note:** Depending on how long the results take to propagate to the ipfs.io nodes, the data may not be available immediately. The results can also be viewed in IPFS Desktop, or in the Brave browser by opening `ipfs://<completed_io_cid>` with the CID printed below.


```python
print(f"View this result on IPFS: https://ipfs.io/ipfs/{completed_io_cid}")
```

View this result on IPFS: https://ipfs.io/ipfs/QmdnjMsUar6nTqGwgjCwN1Fyjaan4i3zyht9SE9L235YRm
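Because a freshly pinned CID may take a while to become reachable through the ipfs.io gateway, one option in a notebook is to poll the gateway URL with a growing delay. The sketch below uses only the standard library (`urllib.request`). The attempt count and delays are arbitrary assumptions, and `fetch` and `sleep` are injectable so the retry logic can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_url(url, timeout=30):
    """Download the body of `url` as bytes (raises on HTTP/network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def wait_for_gateway(cid, gateway="https://ipfs.io/ipfs/", attempts=5,
                     initial_delay=5.0, fetch=fetch_url, sleep=time.sleep):
    """Poll an IPFS HTTP gateway until `cid` resolves, doubling the delay
    between attempts. Returns the response body, or re-raises the last
    error once all attempts are used up."""
    url = f"{gateway}{cid}"
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except (urllib.error.URLError, TimeoutError) as err:
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.0f}s")
            sleep(delay)
            delay *= 2


# Usage in the notebook:
# io_json_bytes = wait_for_gateway(completed_io_cid)
```

With the defaults, the loop sleeps at most 5 + 10 + 20 + 40 = 75 seconds in total. Polling is only needed to confirm that the result is publicly reachable; `plex_run` already wrote the same IO JSON locally.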


## Visualization and NFT minting

For visualization and NFT minting steps, please visit the Colab notebook below.

<OpenInColab link="https://colab.research.google.com/drive/1312M2VOx_YpTFgy60ZYChgR9h3a7aorr?usp=sharing"></OpenInColab>
120 changes: 93 additions & 27 deletions docs/docs/tutorials/small-molecule-binding.md
@@ -8,72 +8,123 @@ import OpenInColab from '../../src/components/OpenInColab.js';

<OpenInColab link="https://colab.research.google.com/drive/15nZrm5k9fMdAHfzpR1g_8TPIz9qgRoys?usp=sharing"></OpenInColab>

## Small molecule binding in silico
## Small molecule docking with plex

Small molecule binding is a fundamental aspect of drug discovery, facilitating the interaction of potential drugs with target proteins. With PLEX, this intricate process is simplified and made efficient.
In this tutorial we perform small molecule docking with **plex**.

In the following tutorial, we illustrate how PLEX can be used to conduct small molecule binding studies to explore potential drug interactions with proteins. We demonstrate this with [Equibind](https://hannes-stark.com/assets/EquiBind.pdf).
There are multiple reasons we believe plex is a new standard for computational biology 🧫:
1. With a simple Python interface, running containerised tools with your data is only a few commands away.
2. The infrastructure of the compute network is fully open source: use the public network or work with us to set up your own node.
3. Every event on the compute network is tracked, so results are no longer lost in an interactive compute session. You can base your decisions and publications on fully reproducible results.
4. We made adding new tools to the network as easy as possible: moving your favorite tool to plex is one JSON document away.

![small-molecule-binding](../../static/img/small-molecule-binding-graphic.png)
In the following tutorial, we illustrate how plex can be used to conduct small molecule binding studies to explore potential drug interactions with proteins. We demonstrate this with [Equibind](https://hannes-stark.com/assets/EquiBind.pdf).

## Install PLEX
The accompanying Colab notebook also walks through minting a ProofOfScience NFT. These tokens are on-chain, verifiable records of a compute job and its input/output data, which makes it possible to trace and reproduce the scientific results.

We first install the plex pip package.
![docking-graphic](../../static/img/small-molecule-binding-graphic.png)

## Install plex


```python
!pip install PlexLabExchange
```

Collecting PlexLabExchange
Downloading PlexLabExchange-0.8.18-py3-none-manylinux2014_x86_64.whl (26.9 MB)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 19.2 MB/s eta 0:00:00
Downloading PlexLabExchange-0.8.20-py3-none-manylinux2014_x86_64.whl (26.9 MB)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 20.1 MB/s eta 0:00:00
Installing collected packages: PlexLabExchange
Successfully installed PlexLabExchange-0.8.18
Successfully installed PlexLabExchange-0.8.20


## Load small molecule and protein data
Then, create a directory where we can save our project files.

Next, we need to load the data about the small molecule and the protein that we're studying. This data, which is available on IPFS, will be used to initialize an IO JSON. This JSON file will serve as the job instructions for our binding study.

```python
small_molecule_path = ["QmV6qVzdQLNM6SyEDB3rJ5R5BYJsQwQTn1fjmPzvCCkCYz/ZINC000003986735.sdf"]
protein_path = ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"]
import os

cwd = os.getcwd()
!mkdir project

dir_path = f"{cwd}/project"
```
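`!mkdir project` is a shell command that prints an error if the cell is run a second time, because the directory already exists. A pure-Python alternative, sketched below with `os.makedirs(..., exist_ok=True)`, can be re-run safely and does not depend on notebook shell magics.

```python
import os

# Create ./project under the current working directory; exist_ok=True
# makes re-running this cell a no-op instead of an error.
dir_path = os.path.join(os.getcwd(), "project")
os.makedirs(dir_path, exist_ok=True)
print(dir_path)
```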

## Download small molecule and protein data

We'll download the small-molecule `.sdf` file and the protein `.pdb` file that we want to dock with Equibind.


```python
# small molecule
!wget https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/ZINC000003986735.sdf -O {dir_path}/ZINC000003986735.sdf
# protein
!wget https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/7n9g.pdb -O {dir_path}/7n9g.pdb
```

--2023-08-08 18:56:14-- https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/ZINC000003986735.sdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2967 (2.9K) [text/plain]
Saving to: ‘/content/project/ZINC000003986735.sdf’

/content/project/ZI 100%[===================>] 2.90K --.-KB/s in 0s

2023-08-08 18:56:14 (47.2 MB/s) - ‘/content/project/ZINC000003986735.sdf’ saved [2967/2967]

--2023-08-08 18:56:14-- https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/7n9g.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 580284 (567K) [text/plain]
Saving to: ‘/content/project/7n9g.pdb’

/content/project/7n 100%[===================>] 566.68K --.-KB/s in 0.05s

2023-08-08 18:56:14 (12.1 MB/s) - ‘/content/project/7n9g.pdb’ saved [580284/580284]
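`wget` saved both files, but a quick sanity check helps catch a URL that returned something other than the expected format. The sketch below relies on two well-known format conventions: PDB coordinate records start with `ATOM  ` or `HETATM` in columns 1-6, and each record in an SDF file ends with a `$$$$` line. These are hypothetical helpers, not part of plex, and they check file structure only, not chemistry.

```python
from pathlib import Path


def count_pdb_atoms(path):
    """Count ATOM/HETATM coordinate records in a PDB file."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(("ATOM  ", "HETATM")):
                count += 1
    return count


def count_sdf_records(path):
    """Count molecules in an SDF file by counting '$$$$' terminator lines."""
    text = Path(path).read_text()
    return sum(1 for line in text.splitlines() if line.strip() == "$$$$")


# Usage (dir_path is defined earlier in the tutorial):
# print(count_pdb_atoms(f"{dir_path}/7n9g.pdb"), "atom records")
# print(count_sdf_records(f"{dir_path}/ZINC000003986735.sdf"), "molecule(s)")
```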



## Small molecule docking

With the small molecule and protein files downloaded, we can now use Equibind to run a docking simulation.


```python
from plex import CoreTools, plex_init

protein_path = [f"{dir_path}/7n9g.pdb"]
small_molecule_path = [f"{dir_path}/ZINC000003986735.sdf"]

initial_io_cid = plex_init(
CoreTools.EQUIBIND.value,
protein=protein_path,
small_molecule=small_molecule_path
small_molecule=small_molecule_path,
)
```

plex init -t QmZ2HarAgwZGjc3LBx9mWNwAQkPWiHMignqKup1ckp8NhB -i {"protein": ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"], "small_molecule": ["QmV6qVzdQLNM6SyEDB3rJ5R5BYJsQwQTn1fjmPzvCCkCYz/ZINC000003986735.sdf"]} --scatteringMethod=dotProduct
Plex version (v0.8.3) up to date.
plex init -t QmZ2HarAgwZGjc3LBx9mWNwAQkPWiHMignqKup1ckp8NhB -i {"protein": ["/content/project/7n9g.pdb"], "small_molecule": ["/content/project/ZINC000003986735.sdf"]} --scatteringMethod=dotProduct
Plex version (v0.8.4) up to date.
Pinned IO JSON CID: QmShD7ApeDBUqqy98RuuKdyv8AdmBsvyZqqxSLAEvB9EKP


## Dock the small molecule and protein using Equibind
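This code initializes the docking job: it creates the IO JSON and pins it to IPFS, but does not run it yet. To dock the small molecule, we pass the initial IO CID to `plex_run`, together with `dir_path`, under which plex creates a working directory for the results.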

Now that we've prepared our job instructions, we're ready to dock the small molecule and protein using Equibind. With the IO JSON created and pinned to IPFS, we submit the job to the LabDAO Bacalhau cluster for computation.

```python
from plex import plex_run

completed_io_cid, io_local_filepath = plex_run(initial_io_cid)
completed_io_cid, io_local_filepath = plex_run(initial_io_cid, dir_path)
```

Plex version (v0.8.3) up to date.
Created working directory: /jobs/3f9b386d-a74d-463c-8ca6-a882d053c866
Initialized IO file at: /jobs/3f9b386d-a74d-463c-8ca6-a882d053c866/io.json
Plex version (v0.8.4) up to date.
Created working directory: /content/project/2e3a8afd-928d-4fb7-a381-fff63c7d51de
Initialized IO file at: /content/project/2e3a8afd-928d-4fb7-a381-fff63c7d51de/io.json
Processing IO Entries
Starting to process IO entry 0
Job running...
Bacalhau job id: a292c5fc-a717-47d5-a5b4-4d3401670a4f
Bacalhau job id: 892bf30d-7f6d-4cc7-a490-c1fa17d82171

Computing default go-libp2p Resource Manager limits based on:
- 'Swarm.ResourceMgr.MaxMemory': "6.8 GB"
@@ -83,13 +134,12 @@ completed_io_cid, io_local_filepath = plex_run(initial_io_cid)
Run 'ipfs swarm limit all' to see the resulting limits.

Success processing IO entry 0
Finished processing, results written to /jobs/3f9b386d-a74d-463c-8ca6-a882d053c866/io.json
Finished processing, results written to /content/project/2e3a8afd-928d-4fb7-a381-fff63c7d51de/io.json
Completed IO JSON CID: QmVG4mT2kkPSb6wzT5QxYZndB5VbKLU8nH2dErZW2zxae6
2023/08/08 18:56:21 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.


## Viewing the results

The final step is to view our results. We read in the IO JSON file that contains the output from our job and print it. This data includes the best docked small molecule and the protein used, each with their own IPFS CIDs.
Once the job completes, we can retrieve and view the results. The state of each input and output is recorded in the IO JSON file, and every file is identified by a unique content address (CID).


```python
@@ -136,6 +186,22 @@ with open(io_local_filepath, 'r') as f:
}
]


This output provides key information about the interaction between the small molecule and the protein. The "best_docked_small_molecule" entry is the binding pose of the small molecule that Equibind predicts for this protein, which can inform subsequent analysis and experiments.

The results can also be viewed through an IPFS gateway. The code below prints a link to the completed IO JSON on the ipfs.io gateway.

**Note:** Depending on how long the results take to propagate to the ipfs.io nodes, the data may not be available immediately. The results can also be viewed in IPFS Desktop, or in the Brave browser by opening `ipfs://<completed_io_cid>` with the CID printed below.


```python
print(f"View this result on IPFS: https://ipfs.io/ipfs/{completed_io_cid}")
```

View this result on IPFS: https://ipfs.io/ipfs/QmVG4mT2kkPSb6wzT5QxYZndB5VbKLU8nH2dErZW2zxae6
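The full IO JSON is collapsed in this diff, so its exact schema isn't shown here. If you only need particular fields, such as the `best_docked_small_molecule` output mentioned above, a schema-agnostic approach is to search the parsed JSON recursively for a key name. The sketch below does that with the standard `json` module. The toy data is made up for illustration, and the key name in the usage comment is assumed from the text above; it may differ in the actual file.

```python
import json


def find_key(node, key):
    """Recursively yield every value stored under `key` in nested
    dicts/lists, such as the structure returned by json.load()."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Toy data only; this is NOT the real IO JSON schema.
example = json.loads('[{"outputs": {"best_docked_small_molecule": {"filepath": "a.sdf"}}}]')
print(list(find_key(example, "best_docked_small_molecule")))  # → [{'filepath': 'a.sdf'}]

# Usage (io_local_filepath is returned by plex_run above):
# with open(io_local_filepath) as f:
#     io_json = json.load(f)
# for value in find_key(io_json, "best_docked_small_molecule"):
#     print(json.dumps(value, indent=2))
```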

## Visualization and NFT minting

For visualization and NFT minting steps, please visit the Colab notebook below.

<OpenInColab link="https://colab.research.google.com/drive/15nZrm5k9fMdAHfzpR1g_8TPIz9qgRoys?usp=sharing"></OpenInColab>