
TypeError: cannot pickle '_io.TextIOWrapper' object #112

Open
Ivan-vechetti opened this issue Dec 17, 2020 · 7 comments

Comments

@Ivan-vechetti

Hello, running

run_midas.py genes

This runs fine, but at the end I get:

E::idx_find_and_load Could not retrieve index file for 'midas_output//genes/temp/pangenomes.bam'

And then when I run:

run_midas.py snps

it also runs fine, but at the end I get:
TypeError: cannot pickle '_io.TextIOWrapper' object

Can someone help me with that?

Python 3.8.5

@nick-youngblut

It seems to be caused by:

import sys, io, gzip, bz2

def iopen(inpath, mode='r'):
        """ Open input file for reading regardless of compression [gzip, bzip] or python version """
        ext = inpath.split('.')[-1]
        # Python2
        if sys.version_info[0] == 2:
                if ext == 'gz': return gzip.open(inpath, mode)
                elif ext == 'bz2': return bz2.BZ2File(inpath, mode)
                else: return open(inpath, mode)
        # Python3
        elif sys.version_info[0] == 3:
                # gzip input is wrapped in io.TextIOWrapper, the type named in the traceback
                if ext == 'gz': return io.TextIOWrapper(gzip.open(inpath, mode))
                elif ext == 'bz2': return bz2.BZ2File(inpath, mode)
                else: return open(inpath, mode)

which is called by species_pileup() in pysam_pileup(). I'm guessing that the file handle is not actually closed in the subprocess, which is causing the serialization error.
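For context, here is a minimal sketch (not MIDAS code; the names are made up) of this class of failure: multiprocessing pickles the arguments it sends to worker processes, and an open text file handle cannot be pickled.

import multiprocessing as mp

def worker(args):
    # the worker never even touches args['log']; pickling happens regardless
    return args['species']

if __name__ == '__main__':
    args = {'species': 'Bacteroides_vulgatus', 'log': open('midas.log', 'w')}
    with mp.Pool(2) as pool:
        pool.map(worker, [args])
    # raises: TypeError: cannot pickle '_io.TextIOWrapper' object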

@nick-youngblut

Actually, it seems to be due to passing the file handle in the args['log'] variable to species_pileup() via utility.parallel(). The file handle can't be serialized.

Changing:

def pysam_pileup(args, species, contigs):
        start = time()
        print("\nCounting alleles")
        args['log'].write("\nCounting alleles\n")

        # run pileups per species in parallel
        argument_list = []

to:

def pysam_pileup(args, species, contigs):
        start = time()
        print("\nCounting alleles")
        args['log'].write("\nCounting alleles\n")
        args['log'].close()      # new line

        # run pileups per species in parallel
        argument_list = []

fixes the issue. It appears that the log file isn't actually written to by species_pileup() anyway. I'll submit a PR.

@Ivan-vechetti

Ivan-vechetti commented Dec 19, 2020 via email

@nick-youngblut

Check out the PR edits: #113

@Ivan-vechetti

Ivan-vechetti commented Dec 19, 2020 via email

@nick-youngblut

My editor defaults to spaces, but MIDAS is written entirely with tabs, which caused the indentation error; I've fixed it. I also added a pop for the log variable, since closing the file handle didn't actually fix the serialization error. It should work now; at least, it works for me. There's no CI for PRs, so it's untested across a broader set of environments (e.g., different versions of Ubuntu), but it should work.
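A minimal sketch of that approach (made-up names; see PR #113 for the actual change): write to the log in the parent process, then drop the unpicklable handle from the dict before it is handed to the worker processes.

import multiprocessing as mp

def worker(args):
    return args['species']

if __name__ == '__main__':
    args = {'species': 'Bacteroides_vulgatus', 'log': open('midas.log', 'w')}
    args['log'].write("Counting alleles\n")
    log = args.pop('log')                # drop the unpicklable file handle before dispatch
    with mp.Pool(2) as pool:
        print(pool.map(worker, [args]))  # pickles and runs cleanly now
    log.close()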

Aiswarya-prasad added a commit to Aiswarya-prasad/MIDAS that referenced this issue Oct 8, 2022
…r multithreading as discussed in snayfach#112 and snayfach#79 but the suggested solution of close() does not work. So I just delete it.
@Aiswarya-prasad

Aiswarya-prasad commented Oct 8, 2022

I tried this (though with del instead of pop), and it works for me.
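Roughly the same idea sketched with del (made-up names again); for a key that is known to be present, del and pop have the same effect here:

args = {'species': 'Bacteroides_vulgatus', 'log': open('midas.log', 'w')}
args['log'].write("Counting alleles\n")
del args['log']   # same effect as args.pop('log'): the handle is gone before the dict is pickled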
