Commit 4ca604e

refactor tutorial notebooks to default include nft minting (#579)
1 parent 74296bb commit 4ca604e

11 files changed (+1104, -994 lines changed)

docs/docs/tutorials/protein-folding-nft-minting.md

-11
This file was deleted.

docs/docs/tutorials/protein-folding.md

+67-37
@@ -6,38 +6,41 @@ sidebar_position: 2
66

77
import OpenInColab from '../../src/components/OpenInColab.js';
88

9-
<OpenInColab link="https://colab.research.google.com/drive/1AfnJ50Ei4_9KXdKgexwdmEiwwDDXsfWJ?usp=sharing"></OpenInColab>
9+
<OpenInColab link="https://colab.research.google.com/drive/1312M2VOx_YpTFgy60ZYChgR9h3a7aorr?usp=sharing"></OpenInColab>
1010

1111
## Protein folding in silico
1212

13-
In this tutorial, we perform protein folding with PLEX.
13+
In this tutorial we perform protein folding with **plex**.
1414

15-
There are multiple reasons we believe PLEX is a new standard for computational biology 🧫:
16-
1. With a simple python interface, running containerised tools with your data is only a few commands away
17-
2. The infrastructure of the compute network is fully open source - use the public network or work with us to set up your own node
18-
3. Every event on the compute network is tracked - no more results are lost in an interactive compute session. You can base your decisions and publications on fully reproducible results.
19-
4. We made adding new tools to the network as easy as possible - moving your favorite tool to PLEX is one JSON document away.
15+
There are multiple reasons we believe plex is a new standard for computational biology 🧫:
16+
1. with a simple Python interface, running containerised tools with your data is only a few commands away
17+
2. the infrastructure of the compute network is fully open source - use the public network or work with us to set up your own node
18+
3. every event on the compute network is tracked - results are no longer lost in an interactive compute session. You can base your decisions and publications on fully reproducible results.
19+
4. we made adding new tools to the network as easy as possible - moving your favorite tool to plex is one JSON document away.
2020

21-
We'll walk through an example of how to use PLEX to predict a protein's 3D structure using [ColabFold](https://www.nature.com/articles/s41592-022-01488-1). We will use the sequence of the Streptavidin protein for this demo.
21+
We'll walk through an example of how to use plex to predict a protein's 3D structure using [ColabFold](https://www.nature.com/articles/s41592-022-01488-1). We will use the sequence of the Streptavidin protein for this demo.
2222

23-
![img](../../static/img/protein-folding-graphic.png)
23+
We will also walk through the process of minting a ProofOfScience NFT. These tokens represent on-chain, verifiable records of the compute job and its input/output data. This enables reproducible scientific results.
2424

25-
## Install PLEX
25+
![protein-folding-graphic](../../static/img/protein-folding-graphic.png)
26+
27+
## Install plex
2628

2729

2830
```python
2931
!pip install PlexLabExchange
3032
```
3133

3234
Collecting PlexLabExchange
33-
Downloading PlexLabExchange-0.8.18-py3-none-manylinux2014_x86_64.whl (26.9 MB)
34-
[2K [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m26.9/26.9 MB[0m [31m20.1 MB/s[0m eta [36m0:00:00[0m
35+
Downloading PlexLabExchange-0.8.20-py3-none-manylinux2014_x86_64.whl (26.9 MB)
36+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 16.6 MB/s eta 0:00:00
3537
Installing collected packages: PlexLabExchange
36-
Successfully installed PlexLabExchange-0.8.18
38+
Successfully installed PlexLabExchange-0.8.20
3739

3840

3941
Then, create a directory where we can save our project files.
4042

43+
4144
```python
4245
import os
4346

@@ -52,69 +55,81 @@ dir_path = f"{cwd}/project"
5255
We'll download a `.fasta` file containing the sequence of the protein we want to fold. Here, we're using the sequence of Streptavidin.
5356

5457

58+
59+
5560
```python
5661
!wget https://rest.uniprot.org/uniprotkb/P22629.fasta -O {dir_path}/P22629.fasta # Streptavidin
5762
```
5863

59-
--2023-08-01 21:39:21-- https://rest.uniprot.org/uniprotkb/P22629.fasta
64+
--2023-08-08 18:49:21-- https://rest.uniprot.org/uniprotkb/P22629.fasta
6065
Resolving rest.uniprot.org (rest.uniprot.org)... 193.62.193.81
6166
Connecting to rest.uniprot.org (rest.uniprot.org)|193.62.193.81|:443... connected.
6267
HTTP request sent, awaiting response... 200 OK
63-
Length: 264 [text/plain]
68+
Length: unspecified [text/plain]
6469
Saving to: ‘/content/project/P22629.fasta’
65-
66-
/content/project/P2 100%[===================>] 264 --.-KB/s in 0s
67-
68-
2023-08-01 21:39:21 (144 MB/s) - ‘/content/project/P22629.fasta’ saved [264/264]
70+
71+
/content/project/P2 [ <=> ] 264 --.-KB/s in 0s
72+
73+
2023-08-08 18:49:21 (157 MB/s) - ‘/content/project/P22629.fasta’ saved [264]
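Before submitting a job, it can help to confirm that the downloaded file really contains a FASTA record. The following is a minimal sketch that uses only the standard library; `parse_fasta` is a hypothetical helper and not part of plex, and the example record is made up for illustration rather than taken from P22629.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record for illustration only; not the real P22629 sequence.
example = ">demo|X00000|EXAMPLE\nMKTAYIAK\nQRQISFVK\n"
for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

To check the tutorial file, one could read `{dir_path}/P22629.fasta` and pass its contents to `parse_fasta`, expecting exactly one record.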
74+
6975

7076

7177
## Fold the protein
7278

7379
With the sequence downloaded, we can now use ColabFold to fold the protein.
7480

81+
82+
83+
7584
```python
76-
from plex import CoreTools, plex_create
85+
from plex import CoreTools, plex_init
7786

78-
initial_io_cid = plex_create(CoreTools.COLABFOLD_MINI.value, dir_path)
87+
fasta_local_filepaths = [f"{dir_path}/P22629.fasta"]
88+
89+
initial_io_cid = plex_init(
90+
CoreTools.COLABFOLD_MINI.value,
91+
sequence=fasta_local_filepaths
92+
)
7993
```
8094

95+
plex init -t QmcRH74qfqDBJFku3mEDGxkAf6CSpaHTpdbe1pMkHnbcZD -i {"sequence": ["/content/project/P22629.fasta"]} --scatteringMethod=dotProduct
8196
Plex version (v0.8.4) up to date.
82-
Temporary directory created: /tmp/9ed8c638-c1b0-43da-bf92-7f054517d45c2889128719
83-
Reading tool config: QmcRH74qfqDBJFku3mEDGxkAf6CSpaHTpdbe1pMkHnbcZD
84-
Creating IO entries from input directory: /content/project
85-
Initialized IO file at: /tmp/9ed8c638-c1b0-43da-bf92-7f054517d45c2889128719/io.json
86-
Initial IO JSON file CID: QmUhysTE4aLZNw2ePRMCxHWko868xmQoXnGP25fKM1aofb
97+
Pinned IO JSON CID: QmZgLQypfjvK9kTsqLXwbNRiFifEU5CC7eduWWPbminybi
98+
8799

88100
This code initializes the folding job by creating an IO JSON and pinning it to IPFS. We still need to run the job to complete the operation.
89101

102+
90103
```python
91104
from plex import plex_run
92105

93106
completed_io_cid, completed_io_filepath = plex_run(initial_io_cid, dir_path)
94107
```
95108

96109
Plex version (v0.8.4) up to date.
97-
Created working directory: /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb
98-
Initialized IO file at: /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb/io.json
110+
Created working directory: /content/project/9102a179-ac65-4823-9a03-93766ea32671
111+
Initialized IO file at: /content/project/9102a179-ac65-4823-9a03-93766ea32671/io.json
99112
Processing IO Entries
100113
Starting to process IO entry 0
101114
Job running...
102-
Bacalhau job id: 476d232b-e1c6-42d6-b1c0-2f4d237244b1
103-
115+
Bacalhau job id: 271f4b64-cb2d-4be6-86af-ed16186e69e0
116+
104117
Computing default go-libp2p Resource Manager limits based on:
105118
- 'Swarm.ResourceMgr.MaxMemory': "6.8 GB"
106119
- 'Swarm.ResourceMgr.MaxFileDescriptors': 524288
107-
120+
108121
Applying any user-supplied overrides on top.
109122
Run 'ipfs swarm limit all' to see the resulting limits.
110-
123+
111124
Success processing IO entry 0
112-
Finished processing, results written to /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb/io.json
125+
Finished processing, results written to /content/project/9102a179-ac65-4823-9a03-93766ea32671/io.json
113126
Completed IO JSON CID: QmdnjMsUar6nTqGwgjCwN1Fyjaan4i3zyht9SE9L235YRm
127+
2023/08/08 18:51:17 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.
128+
129+
130+
After the job is complete, we can retrieve and view the results. The state of each IO entry is written to a JSON file, and every file has a unique content address.
114131

115-
## Viewing the results
116132

117-
After the job is complete, we can retrieve and view the results. The state of each object is written in a JSON object. Every file has a unique content address.
118133

119134

120135
```python
@@ -181,6 +196,21 @@ with open(completed_io_filepath, 'r') as f:
181196
}
182197
]
183198

184-
The output is a JSON file with information about the folded protein structures. This can be used for further analysis, visualization, and more.
185199

186-
<OpenInColab link="https://colab.research.google.com/drive/1AfnJ50Ei4_9KXdKgexwdmEiwwDDXsfWJ?usp=sharing"></OpenInColab>
200+
The results can also be viewed using an IPFS gateway. Below, the state of the IO JSON is read using the ipfs.io gateway.
201+
202+
**Note:** Results may take some time to propagate to the ipfs.io nodes, so the data may not be available immediately. The results can also be viewed in IPFS Desktop, or in the Brave browser by opening `ipfs://` followed by the value of `completed_io_cid`.
203+
204+
205+
```python
206+
print(f"View this result on IPFS: https://ipfs.io/ipfs/{completed_io_cid}")
207+
```
208+
209+
View this result on IPFS: https://ipfs.io/ipfs/QmdnjMsUar6nTqGwgjCwN1Fyjaan4i3zyht9SE9L235YRm
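Because gateway availability can lag behind the job, a small retry loop is one way to read the IO JSON programmatically. The sketch below is an illustration built on the standard library only; `gateway_url`, `http_get`, and `fetch_io_json` are hypothetical helpers (not plex functions), and the retry count and delay are arbitrary assumptions.

```python
import json
import time
import urllib.request
from typing import Callable


def gateway_url(cid: str, gateway: str = "https://ipfs.io/ipfs") -> str:
    """Build the HTTP gateway URL for a CID."""
    return f"{gateway.rstrip('/')}/{cid}"


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL with the standard library and return the raw body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_io_json(
    cid: str,
    fetch: Callable[[str], bytes] = http_get,
    attempts: int = 5,            # assumption: arbitrary retry budget
    delay_seconds: float = 10.0,  # assumption: arbitrary wait between tries
    sleep: Callable[[float], None] = time.sleep,
):
    """Fetch and decode the IO JSON, retrying while the gateway cannot serve it."""
    url = gateway_url(cid)
    last_error = None
    for attempt in range(attempts):
        try:
            return json.loads(fetch(url))
        except OSError as error:  # covers URLError, HTTPError, and timeouts
            last_error = error
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise RuntimeError(f"{url} not available after {attempts} attempts") from last_error
```

In the notebook this could be called as `fetch_io_json(completed_io_cid)`. The `fetch` and `sleep` parameters exist only so the retry logic can be exercised without network access.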
210+
211+
212+
## Visualization and NFT minting
213+
214+
For visualization and NFT minting steps, please visit the Colab notebook below.
215+
216+
<OpenInColab link="https://colab.research.google.com/drive/1312M2VOx_YpTFgy60ZYChgR9h3a7aorr?usp=sharing"></OpenInColab>

docs/docs/tutorials/small-molecule-binding.md

+93-27
@@ -8,72 +8,123 @@ import OpenInColab from '../../src/components/OpenInColab.js';
88

99
<OpenInColab link="https://colab.research.google.com/drive/15nZrm5k9fMdAHfzpR1g_8TPIz9qgRoys?usp=sharing"></OpenInColab>
1010

11-
## Small molecule binding in silico
11+
## Small molecule docking with plex
1212

13-
Small molecule binding is a fundamental aspect of drug discovery, facilitating the interaction of potential drugs with target proteins. With PLEX, this intricate process is simplified and made efficient.
13+
In this tutorial we perform small molecule docking with **plex**.
1414

15-
In the following tutorial, we illustrate how PLEX can be used to conduct small molecule binding studies to explore potential drug interactions with proteins. We demonstrate this with [Equibind](https://hannes-stark.com/assets/EquiBind.pdf).
15+
There are multiple reasons we believe plex is a new standard for computational biology 🧫:
16+
1. with a simple Python interface, running containerised tools with your data is only a few commands away
17+
2. the infrastructure of the compute network is fully open source - use the public network or work with us to set up your own node
18+
3. every event on the compute network is tracked - results are no longer lost in an interactive compute session. You can base your decisions and publications on fully reproducible results.
19+
4. we made adding new tools to the network as easy as possible - moving your favorite tool to plex is one JSON document away.
1620

17-
![small-molecule-binding](../../static/img/small-molecule-binding-graphic.png)
21+
In the following tutorial, we illustrate how plex can be used to conduct small molecule binding studies to explore potential drug interactions with proteins. We demonstrate this with [Equibind](https://hannes-stark.com/assets/EquiBind.pdf).
1822

19-
## Install PLEX
23+
We will also walk through the process of minting a ProofOfScience NFT. These tokens represent on-chain, verifiable records of the compute job and its input/output data. This enables reproducible scientific results.
2024

21-
We first install the plex pip package.
25+
![docking-graphic](../../static/img/small-molecule-binding-graphic.png)
26+
27+
## Install plex
2228

2329

2430
```python
2531
!pip install PlexLabExchange
2632
```
2733

2834
Collecting PlexLabExchange
29-
Downloading PlexLabExchange-0.8.18-py3-none-manylinux2014_x86_64.whl (26.9 MB)
30-
[2K [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m26.9/26.9 MB[0m [31m19.2 MB/s[0m eta [36m0:00:00[0m
35+
Downloading PlexLabExchange-0.8.20-py3-none-manylinux2014_x86_64.whl (26.9 MB)
36+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 20.1 MB/s eta 0:00:00
3137
Installing collected packages: PlexLabExchange
32-
Successfully installed PlexLabExchange-0.8.18
38+
Successfully installed PlexLabExchange-0.8.20
3339

3440

35-
## Load small molecule and protein data
41+
Then, create a directory where we can save our project files.
3642

37-
Next, we need to load the data about the small molecule and the protein that we're studying. This data, which is available on IPFS, will be used to initialize an IO JSON. This JSON file will serve as the job instructions for our binding study.
3843

3944
```python
40-
small_molecule_path = ["QmV6qVzdQLNM6SyEDB3rJ5R5BYJsQwQTn1fjmPzvCCkCYz/ZINC000003986735.sdf"]
41-
protein_path = ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"]
45+
import os
46+
47+
cwd = os.getcwd()
48+
!mkdir project
49+
50+
dir_path = f"{cwd}/project"
4251
```
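Note that `!mkdir project` reports an error if the cell is run a second time, because the directory already exists. A minimal sketch of a re-runnable alternative is shown here; `ensure_project_dir` is a hypothetical helper name, not something plex provides.

```python
import os
from pathlib import Path


def ensure_project_dir(base_dir: str, name: str = "project") -> str:
    """Create base_dir/name if it is missing and return its absolute path."""
    project_dir = Path(base_dir) / name
    # exist_ok=True makes the call safe to repeat, unlike a bare `mkdir`.
    project_dir.mkdir(parents=True, exist_ok=True)
    return str(project_dir.resolve())


dir_path = ensure_project_dir(os.getcwd())
print(dir_path)
```

The returned `dir_path` can be used in place of the f-string version in the rest of the notebook.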
4352

53+
## Download small molecule and protein data
54+
55+
We'll download the small molecule `.sdf` file and the protein `.pdb` file that we want to dock with Equibind.
56+
57+
58+
```python
59+
# small molecule
60+
!wget https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/ZINC000003986735.sdf -O {dir_path}/ZINC000003986735.sdf
61+
# protein
62+
!wget https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/7n9g.pdb -O {dir_path}/7n9g.pdb
63+
```
64+
65+
--2023-08-08 18:56:14-- https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/ZINC000003986735.sdf
66+
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
67+
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
68+
HTTP request sent, awaiting response... 200 OK
69+
Length: 2967 (2.9K) [text/plain]
70+
Saving to: ‘/content/project/ZINC000003986735.sdf’
71+
72+
/content/project/ZI 100%[===================>] 2.90K --.-KB/s in 0s
73+
74+
2023-08-08 18:56:14 (47.2 MB/s) - ‘/content/project/ZINC000003986735.sdf’ saved [2967/2967]
75+
76+
--2023-08-08 18:56:14-- https://raw.githubusercontent.com/labdao/plex/main/testdata/binding/abl/7n9g.pdb
77+
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
78+
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
79+
HTTP request sent, awaiting response... 200 OK
80+
Length: 580284 (567K) [text/plain]
81+
Saving to: ‘/content/project/7n9g.pdb’
82+
83+
/content/project/7n 100%[===================>] 566.68K --.-KB/s in 0.05s
84+
85+
2023-08-08 18:56:14 (12.1 MB/s) - ‘/content/project/7n9g.pdb’ saved [580284/580284]
86+
87+
88+
89+
## Small molecule docking
90+
91+
With the small molecule and protein files downloaded, we can now use Equibind to run a docking simulation.
92+
4493

4594
```python
4695
from plex import CoreTools, plex_init
4796

97+
protein_path = [f"{dir_path}/7n9g.pdb"]
98+
small_molecule_path = [f"{dir_path}/ZINC000003986735.sdf"]
99+
48100
initial_io_cid = plex_init(
49101
CoreTools.EQUIBIND.value,
50102
protein=protein_path,
51-
small_molecule=small_molecule_path
103+
small_molecule=small_molecule_path,
52104
)
53105
```
54106

55-
plex init -t QmZ2HarAgwZGjc3LBx9mWNwAQkPWiHMignqKup1ckp8NhB -i {"protein": ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"], "small_molecule": ["QmV6qVzdQLNM6SyEDB3rJ5R5BYJsQwQTn1fjmPzvCCkCYz/ZINC000003986735.sdf"]} --scatteringMethod=dotProduct
56-
Plex version (v0.8.3) up to date.
107+
plex init -t QmZ2HarAgwZGjc3LBx9mWNwAQkPWiHMignqKup1ckp8NhB -i {"protein": ["/content/project/7n9g.pdb"], "small_molecule": ["/content/project/ZINC000003986735.sdf"]} --scatteringMethod=dotProduct
108+
Plex version (v0.8.4) up to date.
57109
Pinned IO JSON CID: QmShD7ApeDBUqqy98RuuKdyv8AdmBsvyZqqxSLAEvB9EKP
58110

59111

60-
## Dock the small molecule and protein using Equibind
112+
This code initializes the docking job by creating an IO JSON and pinning it to IPFS. We still need to run the job to complete the operation.
61113

62-
Now that we've prepared our job instructions, we're ready to dock the small molecule and protein using Equibind. With the IO JSON created and pinned to IPFS, we submit the job to the LabDAO Bacalhau cluster for computation.
63114

64115
```python
65116
from plex import plex_run
66117

67-
completed_io_cid, io_local_filepath = plex_run(initial_io_cid)
118+
completed_io_cid, io_local_filepath = plex_run(initial_io_cid, dir_path)
68119
```
69120

70-
Plex version (v0.8.3) up to date.
71-
Created working directory: /jobs/3f9b386d-a74d-463c-8ca6-a882d053c866
72-
Initialized IO file at: /jobs/3f9b386d-a74d-463c-8ca6-a882d053c866/io.json
121+
Plex version (v0.8.4) up to date.
122+
Created working directory: /content/project/2e3a8afd-928d-4fb7-a381-fff63c7d51de
123+
Initialized IO file at: /content/project/2e3a8afd-928d-4fb7-a381-fff63c7d51de/io.json
73124
Processing IO Entries
74125
Starting to process IO entry 0
75126
Job running...
76-
Bacalhau job id: a292c5fc-a717-47d5-a5b4-4d3401670a4f
127+
Bacalhau job id: 892bf30d-7f6d-4cc7-a490-c1fa17d82171
77128

78129
Computing default go-libp2p Resource Manager limits based on:
79130
- 'Swarm.ResourceMgr.MaxMemory': "6.8 GB"
@@ -83,13 +134,12 @@ completed_io_cid, io_local_filepath = plex_run(initial_io_cid)
83134
Run 'ipfs swarm limit all' to see the resulting limits.
84135

85136
Success processing IO entry 0
86-
Finished processing, results written to /jobs/3f9b386d-a74d-463c-8ca6-a882d053c866/io.json
137+
Finished processing, results written to /content/project/2e3a8afd-928d-4fb7-a381-fff63c7d51de/io.json
87138
Completed IO JSON CID: QmVG4mT2kkPSb6wzT5QxYZndB5VbKLU8nH2dErZW2zxae6
139+
2023/08/08 18:56:21 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.
88140

89141

90-
## Viewing the results
91-
92-
The final step is to view our results. We read in the IO JSON file that contains the output from our job and print it. This data includes the best docked small molecule and the protein used, each with their own IPFS CIDs.
142+
After the job is complete, we can retrieve and view the results. The state of each IO entry is written to a JSON file, and every file has a unique content address.
93143

94144

95145
```python
@@ -136,6 +186,22 @@ with open(io_local_filepath, 'r') as f:
136186
}
137187
]
138188

189+
139190
This output provides us with key information about the small molecule-protein interaction. The "best_docked_small_molecule" represents the most likely interaction between the protein and the small molecule, which can inform subsequent analysis and experiments.
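To list every content address mentioned in a result, one option is to walk the parsed JSON and collect strings that look like CIDv0 identifiers (46 base58 characters starting with `Qm`). This is a sketch under assumptions: `find_cids` is a hypothetical helper, and the sample structure and its field names are invented for illustration rather than taken from the plex IO schema.

```python
import re
from typing import Any, Iterator, Tuple

# CIDv0 strings are 46 base58 characters beginning with "Qm".
CID_V0 = re.compile(r"Qm[1-9A-HJ-NP-Za-km-z]{44}")


def find_cids(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, cid) for every string value that is a CIDv0."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_cids(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_cids(value, f"{path}[{index}]")
    elif isinstance(node, str) and CID_V0.fullmatch(node):
        yield path, node


# Hypothetical structure and placeholder CID; the field names are assumptions.
placeholder_cid = "Qm" + "a" * 44
sample = [{"outputs": {"best_docked_small_molecule": {"ipfs": placeholder_cid}}}]
for json_path, cid in find_cids(sample):
    print(json_path, "->", cid)
```

Applied to the tutorial's result, the input would be the object loaded from `io_local_filepath` with `json.load`. Values that embed a CID inside a longer path are deliberately not matched, since `fullmatch` requires the whole string to be a CID.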
140191

192+
The results can also be viewed using an IPFS gateway. Below, the state of the IO JSON is read using the ipfs.io gateway.
193+
194+
**Note:** Results may take some time to propagate to the ipfs.io nodes, so the data may not be available immediately. The results can also be viewed in IPFS Desktop, or in the Brave browser by opening `ipfs://` followed by the value of `completed_io_cid`.
195+
196+
197+
```python
198+
print(f"View this result on IPFS: https://ipfs.io/ipfs/{completed_io_cid}")
199+
```
200+
201+
View this result on IPFS: https://ipfs.io/ipfs/QmVG4mT2kkPSb6wzT5QxYZndB5VbKLU8nH2dErZW2zxae6
202+
203+
## Visualization and NFT minting
204+
205+
For visualization and NFT minting steps, please visit the Colab notebook below.
206+
141207
<OpenInColab link="https://colab.research.google.com/drive/15nZrm5k9fMdAHfzpR1g_8TPIz9qgRoys?usp=sharing"></OpenInColab>
