AssertionError [result_end_clip >= 0] in resolve_triplets_kmerify.py #99

Closed
lucapandolfini opened this issue Aug 27, 2022 · 15 comments

@lucapandolfini

I ran into this error in the processONT stage:

Traceback (most recent call last):
  File "/usr/local/lib/verkko/scripts/resolve_triplets_kmerify.py", line 915, in <module>
    resolve(node_lens, edge_overlaps, node_seqs, edges, paths_crossing, coverage, min_allowed_coverage, removable_nodes)
  File "/usr/local/lib/verkko/scripts/resolve_triplets_kmerify.py", line 629, in resolve
    (new_nodes, resolved) = resolve_nodes(current_length, current_nodes, paths_crossing, node_seqs, node_lens, edges, maybe_resolvable, min_edge_support, min_coverage, removable_nodes)
  File "/usr/local/lib/verkko/scripts/resolve_triplets_kmerify.py", line 342, in resolve_nodes
    assert result_end_clip >= 0
AssertionError

Could you please give me a hint on how to work around this? The input files are too large to share, but please let me know if I should inspect the logs further and/or provide other intermediate files.

Thanks!

@skoren
Member

skoren commented Aug 28, 2022

This looks like a bug in verkko, and without the data it would be hard to diagnose exactly what is happening. Are you able to share at least the 4-processONT folder? There are instructions on how to send us data here: https://canu.readthedocs.io/en/latest/faq.html#how-can-i-send-data-to-you

@lucapandolfini
Author

Thanks for the answer.
Looking at the files required by the script that raises the error:

 # Positional arguments as read by the script:
 #   sys.argv[1]   input_gfa                = normal-connected.gfa
 #   sys.argv[2]   out_path_file            = fake-ont-paths.txt
 #   sys.argv[3]   node_coverage_file       = fake-ont-nodecovs.csv
 #   sys.argv[4]   resolve_namemapping_file = resolve-mapping.txt
 #   sys.argv[5]   max_resolve_length       = 100000   (int)
 #   sys.argv[6]   min_allowed_coverage     = 5        (float)
 #   sys.argv[7:]  resolve_steps            = 20 10 5  (ints)
 /usr/local/lib/verkko/scripts/resolve_triplets_kmerify.py \
  normal-connected.gfa \
  fake-ont-paths.txt \
  fake-ont-nodecovs.csv \
  resolve-mapping.txt \
  100000 \
  5 \
  20 10 5 \
  < fake-ont-alns.gaf \
  > ont-resolved-graph.gfa \
 2> ont_resolved_graph.gfa.err

Because of the overall size, for the moment I have shared only the following files via FTP:

  • normal-connected.gfa
  • fake-ont-nodecovs.csv
  • fake-ont-alns.gaf
  • ont-resolved-graph.gfa.err

(ont-resolved-graph.gfa is 0 bytes, fake-ont-paths.txt and resolve-mapping.txt were either not produced or removed by snakemake)

Hope this helps and please let me know if you need any additional file or piece of information.
Best,

Luca
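
For anyone collecting files for a similar report, a quick inventory of the step's inputs and outputs shows which ones exist and which are empty. This is only a minimal sketch: it assumes it is pointed at the 4-processONT directory, the file names are copied from the command quoted above, and the helper name `inventory` is hypothetical.

```python
from pathlib import Path

# File names copied from the command quoted in this thread.
STEP_FILES = [
    "normal-connected.gfa",
    "fake-ont-paths.txt",
    "fake-ont-nodecovs.csv",
    "resolve-mapping.txt",
    "fake-ont-alns.gaf",
    "ont-resolved-graph.gfa",
    "ont_resolved_graph.gfa.err",
]

def inventory(directory, names=STEP_FILES):
    """Return {name: size in bytes, or None if the file is missing}."""
    base = Path(directory)
    report = {}
    for name in names:
        path = base / name
        report[name] = path.stat().st_size if path.is_file() else None
    return report

if __name__ == "__main__":
    # Hypothetical usage: run from inside the 4-processONT directory.
    for name, size in inventory(".").items():
        status = "missing" if size is None else f"{size} bytes"
        print(f"{name}: {status}")
```

A size of 0 for ont-resolved-graph.gfa together with a non-empty .err file matches the situation described in this comment.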

@skoren
Member

skoren commented Sep 2, 2022

I just committed a potential fix for this. You should be able to replace the relevant Python script with the new version and see if it continues without error.

@lucapandolfini
Author

Ok now it worked. Thanks!

@JhinAir

JhinAir commented Mar 9, 2023

Hi,
I think I hit a similar but different issue, also in step 4-processONT:

  File "/share/home/zhou3lab/zhangxinpei/bioapp/miniconda3/envs/verkko/lib/verkko/scripts/resolve_triplets_kmerify.py", line 915, in <module>
    resolve(node_lens, edge_overlaps, node_seqs, edges, paths_crossing, coverage, min_allowed_coverage, removable_nodes)
  File "/share/home/zhou3lab/zhangxinpei/bioapp/miniconda3/envs/verkko/lib/verkko/scripts/resolve_triplets_kmerify.py", line 629, in resolve
    (new_nodes, resolved) = resolve_nodes(current_length, current_nodes, paths_crossing, node_seqs, node_lens, edges, maybe_resolvable, min_edge_support, min_coverage, removable_nodes)
    ^^^^^^^^^^^^^^^
  File "/share/home/zhou3lab/zhangxinpei/bioapp/miniconda3/envs/verkko/lib/verkko/scripts/resolve_triplets_kmerify.py", line 288, in resolve_nodes
    assert longest_extension_per_node[">" + node] > 0
AssertionError

I replaced all the files in scripts with the updated versions, as you suggested above, but got the same error. My current version of verkko is 1.2.
The Step 16 command is:
/share/home/zhou3lab/zhangxinpei/bioapp/miniconda3/envs/verkko/lib/verkko/scripts/resolve_triplets_kmerify.py normal-connected.gfa fake-ont-paths.txt fake-ont-nodecovs.csv resolve-mapping.txt 100000 5 20 10 5 < fake-ont-alns.gaf > ont-resolved-graph.gfa 2> ont_resolved_graph.gfa.err

Could you please help check this? I will attach the 4-processONT below.
Thank you

@JhinAir

JhinAir commented Mar 9, 2023

I can't upload the 4-processONT folder via FTP because of a connection error. You can find the files at https://drive.google.com/file/d/198bPbNkcsEGOuca5lgTfZNQaERRj8RNk/view?usp=sharing

I will also re-run with verkko v1.3.1 and post the results here.
Best,

@JhinAir

JhinAir commented Mar 10, 2023

Update: I got the same error at Step 16 of 4-processONT using v1.3.1.

@JhinAir

JhinAir commented Mar 12, 2023

@skoren could you please help check this? Thank you!
Jing

@skoren
Member

skoren commented Mar 12, 2023

I can reproduce the error locally but am not sure what the cause is. It looks like the script may have an off-by-one error.

@maickrau, could you take a look? The data is in Globus under issue99. The node utig1-19176 seems to end up with a 0 bp extension.

@skoren skoren reopened this Mar 12, 2023
@maickrau
Collaborator

This was caused by the resolution script not handling hairpin repeats (perfect palindromes) correctly. It should be fixed in fe83d29.
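
To illustrate the term: in the usual DNA sense, a perfect palindrome is a sequence that equals its own reverse complement, so it reads the same on both strands. The sketch below only demonstrates that property. It is not verkko's code and does not show the fix in fe83d29; the function names are hypothetical.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def is_perfect_palindrome(seq):
    """True if a non-empty seq is identical to its own reverse complement."""
    return len(seq) > 0 and seq.upper() == reverse_complement(seq).upper()

if __name__ == "__main__":
    for s in ("GAATTC", "ACGTACGT", "ACGTT"):
        print(s, is_perfect_palindrome(s))
```

Here GAATTC and ACGTACGT equal their reverse complements, while ACGTT does not (its reverse complement is AACGT).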

@JhinAir

JhinAir commented Mar 15, 2023

Issue99_update.zip
Hi,
Thanks very much for the quick response! I replaced resolve_triplets_kmerify.py and got different outputs. Compared with the previous run, three more files were generated: forbidden_crosslinks.txt, gaps-ont.gaf, and nodecovs-ont.csv, which I have attached here. However, the file 'ont-resolved-graph.gfa' is still empty, and processONT again stopped at Step 16. The file 'ont_resolved_graph.gfa.err' contains only: ': No such file or directory'. Please let me know if you need any additional files.
Best,

@JhinAir

JhinAir commented Mar 21, 2023

@maickrau could you please check this new result? Were you able to produce ont-resolved-graph.gfa successfully?

Best,

@JhinAir

JhinAir commented Apr 4, 2023

Hi @skoren @maickrau, any hints about this issue? Might the relatively short ONT reads (N50 ~ 50 kb) be the reason? Thank you!

@skoren
Member

skoren commented Apr 4, 2023

Yes, I was able to run your dataset to completion with the updated code and the originally shared files. That error suggests that either something is wrong in the script or the replaced file didn't get installed in the verkko library folder.

The step 16 code was:

echo Step 16
 /data/korens/devel/verkko-tip/lib/verkko/scripts/resolve_triplets_kmerify.py \
  normal-connected.gfa \
  fake-ont-paths.txt \
  fake-ont-nodecovs.csv \
  resolve-mapping.txt \
  100000 \
  5 \
  20 10 5 \
  < fake-ont-alns.gaf \
  > ont-resolved-graph.gfa \
 2> ont_resolved_graph.gfa.err

The final GFA is about 1.1 GB in size. Try running the above by hand (with the proper path to your resolve_triplets_kmerify.py script) and see what it reports.

@JhinAir

JhinAir commented Apr 11, 2023

Hi, thanks very much for the help! There was previously a Windows-format issue; it works well now.

Best,
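
An error of the form ': No such file or directory' is commonly seen when a script's shebang line ends in a carriage return, which happens when the file was saved with Windows (CRLF) line endings. The thread does not spell out what the format issue was, so the following is a sketch under that assumption; the helper names are hypothetical.

```python
from pathlib import Path

def has_crlf(path):
    """True if the file contains any Windows-style CRLF line ending."""
    return b"\r\n" in Path(path).read_bytes()

def to_unix_line_endings(path):
    """Rewrite the file in place with LF line endings; return True if changed."""
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        p.write_bytes(fixed)
        return True
    return False
```

Checking `has_crlf` on the replaced resolve_triplets_kmerify.py before re-running the step would confirm or rule out this cause; a tool such as dos2unix performs the same conversion.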

@skoren skoren closed this as completed Apr 11, 2023