Fasta parsing experiment #5
Comments
Thanks for providing readfq2. It helped me narrow down the perf bottleneck quickly. readLine was meant for getting the user input from the command line and not bulk reads from stdin. For that reason, I deprecated
On my Linux machine, this is now twice as fast as the Python 3 version. There is still much room for improvement, but it is now a fairer comparison with regard to reading lines from stdin. That said, the Python script seems to be doing more work, so I'm going to see which functions are missing and also flesh out more of the new File API.
Boom shakalaka! Amazing work. Note: if Cyber can compete favorably on these benchmarks, I think you might unlock a bioinformatics market segment.
# Same for me on macOS!
time python3 readfq.py < GCA_013297495.1_ASM1329749v1_genomic.fna
There are 341540 records and 161512289 bases
real 0m0.898s
user 0m0.794s
sys 0m0.072s
time ./cyber readfq3.cy < GCA_013297495.1_ASM1329749v1_genomic.fna
There are 163709211 bases from 341540 records in this file.
real 0m0.393s
user 0m0.323s
sys 0m0.062s
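For readers who want to reproduce the comparison, here is a minimal Python sketch of the kind of counting both scripts report. It is not the `readfq.py` or `readfq3.cy` script from this thread; the function name `count_fasta` is made up for illustration, and it assumes plain FASTA input where header lines start with `>`. Note that the two runs above report different base totals (161512289 vs 163709211), so the scripts may not be counting in exactly the same way; this sketch counts sequence characters with line endings stripped.

```python
import io
import sys


def count_fasta(stream):
    """Count records and bases in a FASTA byte stream (hypothetical helper)."""
    records = 0
    bases = 0
    for line in stream:
        if line.startswith(b">"):
            # A header line starts a new record.
            records += 1
        else:
            # Sequence line: count everything except the line ending.
            bases += len(line.rstrip(b"\r\n"))
    return records, bases


# Small in-memory example; for a real file, pass sys.stdin.buffer instead.
example = io.BytesIO(b">seq1\nACGT\nAC\n>seq2\nGG\n")
n_records, n_bases = count_fasta(example)
print(f"There are {n_records} records and {n_bases} bases")
```

Reading from `sys.stdin.buffer` rather than `sys.stdin` keeps the input as bytes and skips text decoding, which is usually the cheaper path for bulk reads.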
I just made the same script even faster by using SIMD to find the newline character. Also, you can now provide a read buffer size to streamLines(). It defaults to 4096 bytes, but I've found that 4MB works well for larger files. Between this and SIMD (mostly SIMD), I'm seeing almost another 2x in performance gains. Also worth mentioning the same SIMD technique is now made available for
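The buffering idea can be illustrated with a rough Python sketch. This is not Cyber's `streamLines()` implementation and it does not use explicit SIMD: `iter_lines` is a hypothetical helper, and `bytes.find` merely stands in for the newline search. The 4096-byte default and the 4MB value are taken from the comment above.

```python
import io


def iter_lines(stream, bufsize=4096):
    """Yield lines without the trailing newline, reading bufsize bytes at a time."""
    pending = b""  # partial line carried over from the previous chunk
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        buf = pending + chunk
        start = 0
        while True:
            # Search for the next newline in the current buffer.
            end = buf.find(b"\n", start)
            if end == -1:
                break
            yield buf[start:end]
            start = end + 1
        # Whatever follows the last newline is an incomplete line.
        pending = buf[start:]
    if pending:
        # Final line with no trailing newline.
        yield pending


# A larger buffer means fewer read calls on big inputs.
data = io.BytesIO(b">seq1\nACGT\nAC")
for line in iter_lines(data, bufsize=4 * 1024 * 1024):
    print(line)
```

The line boundaries do not depend on the buffer size, so `bufsize` can be tuned freely for throughput without changing the output.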
Hi @fubark ,
Thanks again for your awesome language. I played around with Cyber a bit today for FASTA parsing to see how it might fare against some other languages (inspiration here). My results are here if you are interested in taking a look. Right now Python is ahead by about two orders of magnitude. I know Cyber is designed for embedded systems, but I thought I might get lucky with some fast I/O as well :).
This is a really promising language that's been fun to use; thank you.
zach cp