KoSCAPEbio is a QIIME 2 plugin for exploring and analyzing the presence of the Klebsiella oxytoca species complex (KoSC) within microbial communities. It provides insight into KoSC distribution, supporting studies of microbial diversity and ecosystem health.
- PresenceCheck: Analyzes and identifies the presence of KoSC members in your samples using custom databases.
- AbundanceProfile: Calculates and visualizes the relative abundance of microbial species, emphasizing KoSC findings within the broader microbial community.
Below is a visual overview of the workflow for database curation:
All required libraries can be installed via pip, either with `pip install biopython pandas numpy scikit-bio` or by running `pip install -r requirements.txt`.
KoSCAPEbio is designed for Linux and macOS environments. Windows users may encounter issues, especially when using WSL. For more details on Windows compatibility, see the Troubleshooting section.
Currently, KoSCAPEbio can only be installed directly from GitHub. Use the following command to install the tool:
pip install git+https://github.com/amar-cosic/q2-KoSCAPEbio.git
- Note: This installation method uses GitHub as the source. A standard `pip` installation from PyPI and a `conda` installation will be available soon. Stay tuned for updates!
KoSCAPEbio is equipped with two curated databases for analyzing Klebsiella species:
- Default Database (`database.qza`): This database is specifically curated for analyzing the Klebsiella oxytoca species complex (KoSC).
- Aerogenes-Specific Database (`database_aero.qza`): This specialized database is designed exclusively for Klebsiella aerogenes.
Users who need a customized database can modify the existing database or create a new one entirely. `q2-KoSCAPEbio` includes a database curation extension that enables users to curate or construct their own databases.
For comprehensive instructions on customizing or constructing your database using the provided Python script, please consult our detailed guide:
Database Curation Guide
conda activate qiime2-<version>
Search the ASVs for the presence of KoSC or any other species by running `presence-check`.
qiime koscapebio presence-check \
--p-rep-seqs <path-to-rep-seqs.qza> \
--p-table <path-to-feature-table.qza> \
--p-perc-identity <percentage-identity> \
--o-clustered-table <path-for-output.qza>
Visualize the results and make them human-readable by generating a heatmap and calculating relative abundance.
qiime koscapebio abundance-profile \
--p-raw-table <path-to-raw-table.qza> \
--p-qiime-table <path-to-qiime-table.qza> \
--p-work-dir <path-to-intermediate-file> \
--o-relative-abundance <path-for-relative-abundance.qza>
`rep_seq_path`: Path for representative sequences in `.qza` format
`table_path`: Path for the feature table in `.qza` format
`perc_identity`: Percentage of identity for clustering (between 0 and 1)
`clustered_table`: Feature table with positive OTU/ASV in samples
`clustered_seq`: Positive sequences that could be found in the samples
`unclustered_seq`: Resulting unmatched sequences
or
`--output-dir`: Output unspecified results to a directory
`temp_dir`: Directory for temporary files. If no directory is provided, these files are deleted
`user_db`: Path to a user-defined database
`strand`: Specifies whether to search only the forward strand or both forward and reverse strands
`raw_table`: Path to the raw table output from the KoSCAPEbio analysis. This table contains the initial analysis results
`qiime_table`: Path to the feature table generated by QIIME 2. Used for comparative analysis with the KoSCAPEbio output
`work_dir`: Directory path where intermediate files will be stored. Useful for detailed analysis or debugging
`relative_abundance`: Feature table indicating the relative abundance of features. Saved as a QIIME 2 artifact (`.qza`)
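As a rough illustration of how the `presence-check` parameters above fit together, the following sketch assembles the command line in Python and rejects a `perc_identity` outside the documented 0 to 1 range before anything is run. The helper name `build_presence_check_cmd` is hypothetical and not part of KoSCAPEbio, the flags are copied from the command shown in this README, and the value `0.97` is only an illustrative choice.

```python
import shlex


def build_presence_check_cmd(rep_seqs, table, perc_identity, out_table):
    """Return the presence-check argument list (hypothetical helper).

    Flags mirror the command shown in this README; perc_identity must
    lie in the documented range [0, 1].
    """
    if not 0 <= perc_identity <= 1:
        raise ValueError(
            f"perc_identity must be between 0 and 1, got {perc_identity}"
        )
    return [
        "qiime", "koscapebio", "presence-check",
        "--p-rep-seqs", str(rep_seqs),
        "--p-table", str(table),
        "--p-perc-identity", str(perc_identity),
        "--o-clustered-table", str(out_table),
    ]


if __name__ == "__main__":
    # 0.97 is an illustrative value, not a recommended setting.
    cmd = build_presence_check_cmd(
        "rep-seqs.qza", "table.qza", 0.97, "clustered_table.qza"
    )
    print(shlex.join(cmd))  # printed for review only; nothing is executed
```

Validating the identity threshold up front catches the common mistake of passing a percentage such as `97` where a fraction is expected.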
Running the `presence-check` and `abundance-profile` features will generate different sets of output files. Below is an explanation of each output and the folder structure if both features are run and the results are directed to the same folder.
When using `presence-check`, the following files are generated (standard outputs from VSEARCH):
- `clustered_table.qza`: A feature table containing ASVs or IDs that were positively matched to KoSC members.
- `clustered_seq.qza`: A file of sequences that were positively matched.
- `unclustered_seq.qza`: A file of sequences that did not match any KoSC members.
When using `abundance-profile`, multiple folders and files are created to store the analysis results:
- `relative_abundance.qza`: The final results of the abundance profile in `.qza` format, ready for further analysis in QIIME 2.
- `work_dir` folder: This directory contains intermediate files and organized results for detailed inspection and debugging. Inside `work_dir`, there are two main folders and several files:
  - Folders:
    - `koscapebio_raw_table`:
      - `feature-table.biom`: KoSCAPEbio results in `.biom` format.
      - `koscapebio_raw_table.tsv`: KoSCAPEbio results in `.tsv` format, with detailed information on each species version.
    - `qiime_table`:
      - `feature-table.biom`: The QIIME feature table in `.biom` format.
      - `qiime_table.tsv`: The same QIIME feature table in `.tsv` format for easier viewing.
  - Files:
    - `final_table.tsv`: The summarized final results in `.tsv` format, where versions belonging to the same species are summed.
    - `rel_table.biom`: The final results in `.biom` format, representing the relative abundance of each feature.
    - `heatmap.png`: A heatmap visualization generated from `final_table.tsv`, providing a graphical representation of species abundance across samples.
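The summing and normalization described above might look roughly like the following. This is a minimal sketch, not KoSCAPEbio's implementation: it assumes that version names end in a `_v<N>` suffix and that relative abundance is each species count divided by the sample's total count in the QIIME feature table. The function names and the example counts are hypothetical.

```python
from collections import defaultdict


def sum_versions(raw_counts, species_of):
    """Collapse per-version counts into per-species counts for each sample."""
    summed = defaultdict(lambda: defaultdict(float))
    for version, per_sample in raw_counts.items():
        species = species_of(version)
        for sample, count in per_sample.items():
            summed[species][sample] += count
    return {species: dict(samples) for species, samples in summed.items()}


def relative_abundance(species_counts, sample_totals):
    """Divide each species count by the total count of its sample."""
    rel = {}
    for species, per_sample in species_counts.items():
        rel[species] = {
            sample: (count / sample_totals[sample]
                     if sample_totals.get(sample) else 0.0)
            for sample, count in per_sample.items()
        }
    return rel


if __name__ == "__main__":
    # Made-up counts: two versions of one species plus a second species.
    raw = {
        "K_oxytoca_v1": {"S1": 10, "S2": 0},
        "K_oxytoca_v2": {"S1": 5, "S2": 5},
        "K_michiganensis_v1": {"S1": 0, "S2": 20},
    }
    totals = {"S1": 100, "S2": 50}  # assumed per-sample totals

    # Assumed naming rule: strip a trailing "_v<N>" to get the species.
    summed = sum_versions(raw, lambda name: name.rsplit("_v", 1)[0])
    for species, per_sample in relative_abundance(summed, totals).items():
        print(species, per_sample)
```

Samples with a zero or missing total are reported as `0.0` rather than raising a division error, which keeps empty samples visible in the result.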
Below is an example of how the heatmap visualization looks:
This heatmap provides a visual representation of the results. Each cell corresponds to a specific value indicating the relative abundance of a feature (species) in a sample. The color intensity in each cell reflects the magnitude of this value, with the scale shown on the right side of the heatmap.
- Rows: Each row represents a different sample.
- Columns: Each column represents a different feature (species).
- Colors: The color gradient from purple to yellow represents the range of values from low to high relative abundance.
This visualization helps in quickly identifying which features are more prevalent in each sample and comparing the abundance of features across different samples.
If both `presence-check` and `abundance-profile` are run with the same output directory, the folder structure will look like this:
output_directory/
├── clustered_table.qza
├── clustered_seq.qza
├── unclustered_seq.qza
├── relative_abundance.qza
└── work_dir
    ├── koscapebio_raw_table
    │   ├── feature-table.biom
    │   └── koscapebio_raw_table.tsv
    ├── qiime_table
    │   ├── feature-table.biom
    │   └── qiime_table.tsv
    ├── final_table.tsv
    ├── rel_table.biom
    └── heatmap.png
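To confirm that a run produced the layout above, a small check along these lines can be useful. It is only a sketch: the relative paths are copied from the tree, while the helper name `missing_outputs` and the directory name passed to it are assumptions for illustration.

```python
from pathlib import Path

# Relative paths taken from the folder structure listed above.
EXPECTED_OUTPUTS = [
    "clustered_table.qza",
    "clustered_seq.qza",
    "unclustered_seq.qza",
    "relative_abundance.qza",
    "work_dir/koscapebio_raw_table/feature-table.biom",
    "work_dir/koscapebio_raw_table/koscapebio_raw_table.tsv",
    "work_dir/qiime_table/feature-table.biom",
    "work_dir/qiime_table/qiime_table.tsv",
    "work_dir/final_table.tsv",
    "work_dir/rel_table.biom",
    "work_dir/heatmap.png",
]


def missing_outputs(output_dir):
    """Return the expected files that are absent from output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    # "output_directory" is a placeholder for your own output folder.
    for rel in missing_outputs("output_directory"):
        print(f"missing: {rel}")
```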
The following examples demonstrate the functionality of KoSCAPEbio's `presence-check` and `abundance-profile` commands. For simplicity, we will use test files included with the tool. These files are part of the automated testing suite but can also serve as sample data for showcasing KoSCAPEbio's capabilities. The files are located in the tool path `q2_koscapebio/tests`.
By following these examples, you’ll see how KoSCAPEbio analyzes microbial community data and generates outputs like clustered feature tables, sequences, and relative abundance heatmaps.
To analyze the presence of KoSC members in the sample, run the `presence-check` command with the provided test files.
qiime koscapebio presence-check \
--p-rep-seqs rep-seqs.qza \
--p-table table.qza \
--p-perc-identity 1 \
--o-clustered-table clustered_table.qza \
--o-clustered-seqs clustered_seqs.qza \
--o-unclustered-seqs unclustered_seqs.qza
To calculate relative abundance and generate visualizations, use the `abundance-profile` command with the test files.
qiime koscapebio abundance-profile \
--i-raw-table clustered_table.qza \
--i-qiime-table table.qza \
--p-work-dir intermediate_files \
--o-relative-abundance rel_abundance.qza
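The two example commands can also be chained from a script, since the `clustered_table.qza` written by the first step is the raw table read by the second. The sketch below copies the commands from the examples above and only prints them by default. The `run_steps` helper and its `dry_run` switch are hypothetical, and actually executing the steps assumes the QIIME 2 environment is active and the test files are in the working directory.

```python
import shlex
import subprocess

# Commands copied from the two examples above, in execution order.
STEPS = [
    ["qiime", "koscapebio", "presence-check",
     "--p-rep-seqs", "rep-seqs.qza",
     "--p-table", "table.qza",
     "--p-perc-identity", "1",
     "--o-clustered-table", "clustered_table.qza",
     "--o-clustered-seqs", "clustered_seqs.qza",
     "--o-unclustered-seqs", "unclustered_seqs.qza"],
    ["qiime", "koscapebio", "abundance-profile",
     "--i-raw-table", "clustered_table.qza",
     "--i-qiime-table", "table.qza",
     "--p-work-dir", "intermediate_files",
     "--o-relative-abundance", "rel_abundance.qza"],
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps(STEPS)  # pass dry_run=False to actually run the commands
```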
KoSCAPEbio includes automated tests to verify functionality, reliability, and stability. To run the tests, make sure you have `pytest` installed:
pip install pytest
Once installed, navigate to the project’s root directory and run the following command:
pytest
This command will execute all tests, including those for presence-check and abundance-profile. Test results and any relevant warnings or errors will be displayed in the console.
Note: Some tests may output warnings related to dependencies, such as deprecation warnings for `pkg_resources`. These warnings do not affect the functionality of the tool and will be addressed in future releases.
- Operating System Compatibility (Windows WSL): The tool may have issues running in Windows Subsystem for Linux (WSL), as some dependencies and file paths may not work seamlessly within WSL. For the best experience, run the tool on native Linux or macOS systems. If you don't have access to a native Linux or macOS environment, consider using a virtual machine instead of WSL. Tools like VirtualBox or VMware can help you set up a Linux VM on your Windows system.
- Permissions on Temporary Directories: If you're working on a server or restricted environment where you don't have write and delete permissions in the default temporary directory, you can use the `temp_dir` parameter to specify a custom directory where you do have those permissions. This parameter allows KoSCAPEbio to store intermediate files in a location that works for your environment: `--temp_dir /path/to/your/temp/directory`
- Database Curation Troubleshooting: For issues related to database curation, including API rate limits and access to NCBI, please refer to the Database Curation Troubleshooting section in the database curation README.
- Scikit-Bio Import Issue: If you encounter errors when importing `scikit-bio`, such as missing functions or modules, ensure that the correct versions of `numpy` and `scipy` are installed. Scikit-Bio 0.5.7 requires `numpy==1.21.6` and `scipy==1.7.3`. You can reinstall them using: `pip install numpy==1.21.6 scipy==1.7.3 --force-reinstall`. If the issue persists, try reinstalling `scikit-bio` with: `pip install --no-build-isolation --no-cache-dir scikit-bio`
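For the temporary-directory permissions item above, one way to test a candidate directory before passing it as `temp_dir` is to create and delete a scratch file in it. This is a minimal sketch using only the Python standard library; the helper name `can_write_and_delete` is hypothetical and not part of KoSCAPEbio.

```python
import os
import tempfile


def can_write_and_delete(directory):
    """Return True if a file can be created and then removed in directory."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        os.remove(path)
    except OSError:
        # Covers missing directories as well as permission errors.
        return False
    return True


if __name__ == "__main__":
    default_tmp = tempfile.gettempdir()
    print(default_tmp, can_write_and_delete(default_tmp))
```

If the check fails for the default temporary directory, run it against a directory you own and use that path for `temp_dir`.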
This tool is BSD licensed, as found in the LICENSE file.
If you use KoSCAPEbio or the Database curation script in your research, please cite it as follows:
KoSCAPEbio: KoSC Analysis and Presence Exploration. (2025). Zenodo. https://doi.org/10.5281/zenodo.14927381
Additionally, please cite the related paper:
[TBD]
For help and support, please contact:
- Name: Amar Cosic
- Email: amar.cosic995@gmail.com