Didn't find make_genome_memmap.py script #11

Open
JiayiJennie opened this issue Dec 29, 2024 · 2 comments

Comments

@JiayiJennie

JiayiJennie commented Dec 29, 2024

Hi!

This is interesting work!

While reproducing the results, I looked for data/promoter_design/make_genome_memmap.py to generate the .mmap file for the human reference genome used when generating promoter sequences, but I couldn't find it. Thanks!

@raghuramdr

Hi, great work. Seconding the original question: the file data/promoter_design/make_genome_memmap.py is not present. Could you please upload it so that we can reproduce the results? Cheers!

@PavelAvdeyev
Collaborator

Dear all,

Solution 1

I added the make_genome_memmap.py file to the external.py directory. The script requires the selene_utils.py file. Usage is as follows:

python make_genome_memmap.py <path_to_fasta>

The script creates a file containing the one-hot encoding of the reference genome provided in the FASTA file.
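To illustrate what "one-hot encoding written to a memory-mapped file" means, here is a minimal sketch on a toy sequence. It is not the code from make_genome_memmap.py: the function names (`one_hot_encode`, `write_memmap`, `read_window`), the A/C/G/T channel order, the float32 dtype, and the (length, 4) layout are all assumptions made for this example, and the real script may differ on each point.

```python
import numpy as np

# Channel order is an assumption for this sketch; the real script may differ.
BASES = "ACGT"


def one_hot_encode(seq: str) -> np.ndarray:
    """Return a (len(seq), 4) float32 array.

    Characters other than A/C/G/T (e.g. N) become all-zero rows.
    """
    # Lookup table from ASCII code to channel index (-1 = unknown base).
    lookup = np.full(256, -1, dtype=np.int64)
    for i, base in enumerate(BASES):
        lookup[ord(base)] = i
        lookup[ord(base.lower())] = i
    codes = lookup[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    known = codes >= 0
    encoded[np.nonzero(known)[0], codes[known]] = 1.0
    return encoded


def write_memmap(seq: str, path: str) -> tuple:
    """Write the one-hot encoding of seq to a memory-mapped file at path."""
    encoded = one_hot_encode(seq)
    mm = np.memmap(path, dtype=np.float32, mode="w+", shape=encoded.shape)
    mm[:] = encoded
    mm.flush()
    return encoded.shape


def read_window(path: str, shape: tuple, start: int, end: int) -> np.ndarray:
    """Read rows [start, end) without loading the whole file into memory."""
    mm = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
    return np.array(mm[start:end])
```

Reading a window touches only the requested part of the file, which is why a memory-mapped genome speeds up sequence retrieval during training at the cost of disk space.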

For more details, one can visit this repository:

To speed up genome sequence retrieval, the training scripts use a memory-mapped genome file. Run python misc/make_genome_memmap.py before you use the training scripts (47Gb space needed).

Solution 2

An alternative is to avoid the memory-mapped genome altogether. To do so, change the genome dataloader code from:

        self.genome = MemmapGenome(
            input_path=config.ref_file,
            memmapfile=config.ref_file_mmap,
            blacklist_regions='hg38'
        )

to

        self.genome = MemmapGenome(
            input_path=config.ref_file,
            memmapfile=None,
            blacklist_regions='hg38'
        )

See lines 140-143 in file: train_promoter_designer.py
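If you would rather not edit the code each time, the two solutions can be combined by passing the memmap path only when the file actually exists. The helper below is hypothetical (`resolve_memmap_path` is not part of the repository) and relies on the behaviour described in Solution 2, namely that MemmapGenome accepts memmapfile=None.

```python
import os
from typing import Optional


def resolve_memmap_path(mmap_path: Optional[str]) -> Optional[str]:
    """Return mmap_path if it points to an existing file, otherwise None.

    Hypothetical helper: the result is meant to be passed as `memmapfile`,
    so that a missing .mmap file falls back to the non-memmap code path.
    """
    if mmap_path and os.path.isfile(mmap_path):
        return mmap_path
    return None
```

In the dataloader this would be used as memmapfile=resolve_memmap_path(config.ref_file_mmap), leaving the other arguments unchanged.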

I hope this helps.
