KoSCAPEbio: KoSC Analysis and Presence Exploration

KoSCAPEbio is a QIIME 2 plugin designed for the exploration and analysis of the Klebsiella oxytoca species complex (KoSC) presence within microbial communities. KoSCAPEbio provides insight into KoSC distribution, aiding in microbial diversity and ecosystem health studies.

Features

  • PresenceCheck: Analyzes and identifies the presence of KoSC members in your samples using custom databases.
  • AbundanceProfile: Calculates and visualizes the relative abundance of microbial species, emphasizing KoSC findings within the broader microbial community.

Workflow Overview

Below is a visual overview of the workflow for database curation:

[Figure: Workflow Overview]

Required Libraries:

All required libraries can be installed with pip, either by naming them directly:

pip install biopython pandas numpy scikit-bio

or from the requirements file:

pip install -r requirements.txt

System Requirements

KoSCAPEbio is designed for Linux and macOS environments. Windows users may encounter issues, especially when using WSL. For more details on Windows compatibility, see the Troubleshooting section.

Installation

Currently, KoSCAPEbio can only be installed directly from GitHub. Use the following command to install the tool:

pip install git+https://github.com/amar-cosic/q2-KoSCAPEbio.git
  • Note: This installation method uses GitHub as the source. A standard pip installation from PyPI and a conda installation will be available soon. Stay tuned for updates!

Database Preparation

KoSCAPEbio is equipped with two curated databases for analyzing Klebsiella species:

  • Default Database (database.qza): This database is specifically curated for analyzing the Klebsiella oxytoca species complex (KoSC).
  • Aerogenes-Specific Database (database_aero.qza): This specialized database is designed exclusively for Klebsiella aerogenes.

Users who require a customized database to suit particular research needs have the flexibility to modify the existing database or create a new one entirely. q2-KoSCAPEbio includes a database curation extension that enables users to curate or construct their own databases.

For comprehensive instructions on customizing or constructing your database using the provided Python script, please consult our detailed guide:
Database Curation Guide

Usage

Step 1: To use KoSCAPEbio, activate your QIIME 2 environment.

conda activate qiime2-<version>

Step 2: PresenceCheck

Search the ASVs for the presence of KoSC, or any other species, by running presence-check.

qiime koscapebio presence-check \
--p-rep-seqs <path-to-rep-seqs.qza> \
--p-table <path-to-feature-table.qza> \
--p-perc-identity <percentage-identity> \
--o-clustered-table <path-for-output.qza>
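
When the command above needs to be launched from a script, it can help to assemble it as an argument list first. The following is a minimal sketch, not part of KoSCAPEbio: the helper name `build_presence_check_cmd` is hypothetical, the flags are copied from the command above, and the range check reflects the "between 0 and 1" description of perc_identity. Note that Example 1 in the Showcase section additionally passes --o-clustered-seqs and --o-unclustered-seqs.

```python
import shlex


def build_presence_check_cmd(rep_seqs, table, perc_identity, clustered_table):
    """Assemble the presence-check command as an argument list (hypothetical helper)."""
    # The parameter description gives perc_identity as a value between 0 and 1.
    if not 0 <= perc_identity <= 1:
        raise ValueError("perc_identity must be between 0 and 1")
    return [
        "qiime", "koscapebio", "presence-check",
        "--p-rep-seqs", str(rep_seqs),
        "--p-table", str(table),
        "--p-perc-identity", str(perc_identity),
        "--o-clustered-table", str(clustered_table),
    ]


cmd = build_presence_check_cmd(
    "rep-seqs.qza", "table.qza", 0.97, "clustered_table.qza"
)
# Print a copy-pasteable command line. To execute it, pass `cmd` to
# subprocess.run(cmd, check=True) inside an activated QIIME 2 environment.
print(shlex.join(cmd))
```

Building a list instead of a single string avoids shell-quoting problems when paths contain spaces.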

Step 3: AbundanceProfile

Make the results human-readable by calculating relative abundance and generating a heatmap.

qiime koscapebio abundance-profile \
--p-raw-table <path-to-raw-table.qza> \
--p-qiime-table <path-to-qiime-table.qza> \
--p-work-dir <path-to-intermediate-file> \
--o-relative-abundance <path-for-relative-abundance.qza>

Detailed Parameter Descriptions

Parameters for presence-check:

Required

`rep_seq_path`: Path to the representative sequences in `.qza` format
`table_path`: Path to the feature table in `.qza` format
`perc_identity`: Identity threshold for clustering, given as a fraction between 0 and 1

Outputs (name each one, or use `--output-dir`)

`clustered_table`: Feature table with the positive OTUs/ASVs in the samples
`clustered_seq`: Positive sequences that were found in the samples
`unclustered_seq`: Resulting unmatched sequences
`--output-dir`: Output unspecified results to a directory

Optional

`temp_dir`: Directory for temporary files. If no directory is provided, the temporary files are deleted
`user_db`: Path to a user-defined database
`strand`: Specifies whether to search only the forward strand or both forward and reverse strands
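
To build intuition for the identity threshold among the parameters above, here is a deliberately simplified sketch. It assumes two already aligned sequences of equal length and counts matching positions; the function name `identity_fraction` and the example sequences are invented. VSEARCH computes identity from real alignments with its own definition, so this only illustrates the 0-to-1 scale and is not how the plugin scores sequences.

```python
def identity_fraction(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this simplified sketch only compares equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches / len(seq_a)


query = "ACGTACGTAC"
reference = "ACGTACGTAT"  # differs from the query at the last position only
identity = identity_fraction(query, reference)
print(identity)          # 9 of 10 positions match
print(identity >= 0.97)  # would this pair pass a threshold of 0.97?
```

A threshold of 1, as used in Example 1 of the Showcase section, would accept only sequences that match at every position under this simplified view.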

Parameters for abundance-profile:

Required

`raw_table`: Path to the raw table output from the KoSCAPEbio analysis. This table contains the initial analysis results
`qiime_table`: Path to the feature table generated by QIIME 2. Used for comparative analysis with the KoSCAPEbio output
`work_dir`: Directory path where intermediate files will be stored. Useful for detailed analysis or debugging
`relative_abundance`: Feature table indicating the relative abundance of features. Saved as a QIIME 2 artifact (`.qza`)

Output Explanation

Running the presence-check and abundance-profile features will generate different sets of output files. Below is an explanation of each output and the folder structure if both features are run and the results are directed to the same folder.

PresenceCheck Output

When using presence-check, the following files are generated (standard outputs from VSEARCH):

  • clustered_table.qza: A feature table containing ASVs or IDs that were positively matched to KoSC members.
  • clustered_seq.qza: A file of sequences that were positively matched.
  • unclustered_seq.qza: A file of sequences that did not match any KoSC members.

AbundanceProfile Output

When using abundance-profile, multiple folders and files are created to store the analysis results:

  • relative_abundance.qza: The final results of the abundance profile in .qza format, ready for further analysis in QIIME 2.

  • work_dir folder: This directory contains intermediate files and organized results for detailed inspection and debugging. Inside work_dir, there are two main folders and several files:

    • Folders:

      • koscapebio_raw_table:

        • feature-table.biom: KoSCAPEbio results in .biom format.
        • koscapebio_raw_table.tsv: KoSCAPEbio results in .tsv format, with detailed information on each species version.
      • qiime_table:

        • feature-table.biom: The QIIME feature table in .biom format.
        • qiime_table.tsv: The same QIIME feature table in .tsv format for easier viewing.
    • Files:

      • final_table.tsv: The summarized final results in .tsv format, where versions belonging to the same species are summed.
      • rel_table.biom: The final results in .biom format, representing the relative abundance of each feature.
      • heatmap.png: A heatmap visualization generated from final_table.tsv, providing a graphical representation of species abundance across samples.
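
The step from the raw table to final_table.tsv and rel_table.biom described above (versions of the same species summed, then expressed as relative abundance) can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: the feature IDs, the feature-to-species mapping, the counts, and the choice of per-sample totals as the denominator are all invented for the example.

```python
from collections import defaultdict

# Hypothetical counts: {feature_id: {sample_id: count}}. The IDs and the
# mapping below are invented; real KoSCAPEbio identifiers may look different.
raw_counts = {
    "K_oxytoca_v1": {"S1": 10, "S2": 0},
    "K_oxytoca_v2": {"S1": 5, "S2": 4},
    "K_michiganensis_v1": {"S1": 0, "S2": 6},
}
feature_to_species = {
    "K_oxytoca_v1": "K. oxytoca",
    "K_oxytoca_v2": "K. oxytoca",
    "K_michiganensis_v1": "K. michiganensis",
}
# Assumed denominator: total reads per sample in the full feature table.
sample_totals = {"S1": 100, "S2": 50}


def sum_by_species(counts, mapping):
    """Add up the counts of all versions that map to the same species."""
    summed = defaultdict(lambda: defaultdict(int))
    for feature, per_sample in counts.items():
        for sample, count in per_sample.items():
            summed[mapping[feature]][sample] += count
    return {species: dict(per_sample) for species, per_sample in summed.items()}


def relative_abundance(species_counts, totals):
    """Divide each count by its sample total; samples with no reads give 0.0."""
    return {
        species: {
            sample: (count / totals[sample] if totals[sample] else 0.0)
            for sample, count in per_sample.items()
        }
        for species, per_sample in species_counts.items()
    }


species_counts = sum_by_species(raw_counts, feature_to_species)
rel = relative_abundance(species_counts, sample_totals)
for species, per_sample in rel.items():
    print(species, per_sample)
```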

Heatmap Visualization Example

Below is an example of how the heatmap visualization looks:

[Figure: Heatmap Example]

This heatmap provides a visual representation of the results. Each cell corresponds to a specific value indicating the relative abundance of a feature (species) in a sample. The color intensity in each cell reflects the magnitude of this value, with the scale shown on the right side of the heatmap.

  • Rows: Each row represents a different sample.
  • Columns: Each column represents a different feature (species).
  • Colors: The color gradient from purple to yellow represents the range of values from low to high relative abundance.

This visualization helps in quickly identifying which features are more prevalent in each sample and comparing the abundance of features across different samples.
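
The same question the heatmap answers visually, which feature dominates each sample, can also be asked of a table directly. The sketch below assumes a tab-separated table laid out like the heatmap (one row per sample, one column per feature, sample ID in the first column); the data and the function name `most_abundant_feature` are invented, and the actual layout of final_table.tsv should be checked before adapting it.

```python
import csv
import io

# Invented stand-in data: one row per sample, one column per feature (species),
# values are relative abundances.
example_tsv = (
    "sample\tK. oxytoca\tK. michiganensis\tK. grimontii\n"
    "S1\t0.15\t0.00\t0.02\n"
    "S2\t0.08\t0.12\t0.00\n"
)


def most_abundant_feature(tsv_text):
    """Return {sample_id: (feature_name, value)} for the largest value per row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]
    result = {}
    for row in reader:
        values = {name: float(v) for name, v in row.items() if name != id_column}
        top = max(values, key=values.get)
        result[row[id_column]] = (top, values[top])
    return result


for sample, (feature, value) in most_abundant_feature(example_tsv).items():
    print(f"{sample}: {feature} ({value})")
```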

Output Directory Structure Example

If both presence-check and abundance-profile are run with the same output directory, the folder structure will look like this:

output_directory/
├── clustered_table.qza
├── clustered_seq.qza
├── unclustered_seq.qza
├── relative_abundance.qza
└── work_dir
    ├── koscapebio_raw_table
    │   ├── feature-table.biom
    │   └── koscapebio_raw_table.tsv
    ├── qiime_table
    │   ├── feature-table.biom
    │   └── qiime_table.tsv
    ├── final_table.tsv
    ├── rel_table.biom
    └── heatmap.png
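
After a run, the tree above can be used as a checklist. The following sketch lists which of the expected files are missing from an output directory; the file names are copied from the tree, while the helper `missing_outputs` and the throwaway demonstration directory are illustrative assumptions.

```python
from pathlib import Path
import tempfile

# Relative paths copied from the directory tree above.
EXPECTED_OUTPUTS = [
    "clustered_table.qza",
    "clustered_seq.qza",
    "unclustered_seq.qza",
    "relative_abundance.qza",
    "work_dir/koscapebio_raw_table/feature-table.biom",
    "work_dir/koscapebio_raw_table/koscapebio_raw_table.tsv",
    "work_dir/qiime_table/feature-table.biom",
    "work_dir/qiime_table/qiime_table.tsv",
    "work_dir/final_table.tsv",
    "work_dir/rel_table.biom",
    "work_dir/heatmap.png",
]


def missing_outputs(output_dir):
    """Return the expected relative paths that are not files under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


# Demonstration on a temporary directory that contains only one expected file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "clustered_table.qza").touch()
    missing = missing_outputs(tmp)
    print(f"{len(missing)} of {len(EXPECTED_OUTPUTS)} expected files are missing")
```

If the outputs were written under different names, as in the Showcase examples, adjust the list accordingly.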

Showcase

The following examples demonstrate the functionality of KoSCAPEbio's presence-check and abundance-profile commands. For simplicity, we will use test files included with the tool. These files are part of the automated testing suite but can also serve as sample data for showcasing KoSCAPEbio's capabilities. Files are located in the tool path: q2_koscapebio/tests

By following these examples, you’ll see how KoSCAPEbio analyzes microbial community data and generates outputs like clustered feature tables, sequences, and relative abundance heatmaps.


Example 1: PresenceCheck

To analyze the presence of KoSC members in the sample, run the presence-check command with the provided test files.

qiime koscapebio presence-check \
  --p-rep-seqs rep-seqs.qza \
  --p-table table.qza \
  --p-perc-identity 1 \
  --o-clustered-table clustered_table.qza \
  --o-clustered-seqs clustered_seqs.qza \
  --o-unclustered-seqs unclustered_seqs.qza

Example 2: AbundanceProfile

To calculate relative abundance and generate visualizations, use the abundance-profile command with the test files.

qiime koscapebio abundance-profile \
  --i-raw-table clustered_table.qza \
  --i-qiime-table table.qza \
  --p-work-dir intermediate_files \
  --o-relative-abundance rel_abundance.qza

Automated Testing

KoSCAPEbio includes automated tests to verify functionality, reliability and stability. To run the tests, make sure you have pytest installed:

pip install pytest

Once installed, navigate to the project’s root directory and run the following command:

pytest

This command will execute all tests, including those for presence-check and abundance-profile. Test results and any relevant warnings or errors will be displayed in the console.

Note: Some tests may output warnings related to dependencies, such as deprecation warnings for pkg_resources. These warnings do not affect the functionality of the tool, but they will be addressed in future releases.

Troubleshooting

  • Operating System Compatibility (Windows WSL):
    The tool may have issues running in Windows Subsystem for Linux (WSL), as some dependencies and file paths may not work seamlessly within WSL. For the best experience, run the tool on native Linux or macOS systems. If you don’t have access to a native Linux or macOS environment, consider using a virtual machine instead of WSL. Tools like VirtualBox or VMware can help you set up a Linux VM on your Windows system.

  • Permissions on Temporary Directories:
    If you work on a server or in a restricted environment where you cannot write to and delete from the default temporary directory, use the temp_dir parameter to point KoSCAPEbio at a directory where you have both permissions. Intermediate files are then stored in that location: --temp_dir /path/to/your/temp/directory

  • Database Curation Troubleshooting:
    For issues related to database curation, including API rate limits and access to NCBI, please refer to the Database Curation Troubleshooting section in the database curation README.

  • Scikit-Bio Import Issue:
    If you encounter errors when importing scikit-bio, such as missing functions or modules, ensure that the correct versions of numpy and scipy are installed. Scikit-Bio 0.5.7 requires numpy==1.21.6 and scipy==1.7.3. You can reinstall them using:

    pip install numpy==1.21.6 scipy==1.7.3 --force-reinstall

    If the issue persists, try reinstalling scikit-bio with:

    pip install --no-build-isolation --no-cache-dir scikit-bio
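
For the temporary-directory item above, a quick way to find out whether a directory allows both writing and deleting is to try it. This is a minimal sketch using only the Python standard library; the function name `can_write_and_delete` is hypothetical and it is not part of KoSCAPEbio.

```python
import os
import tempfile


def can_write_and_delete(directory):
    """Return True if a file can be created and then removed in `directory`."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        os.remove(path)
    except OSError:
        return False
    return True


# Check the system default temporary directory; substitute your own path to
# test a candidate value for temp_dir.
default_tmp = tempfile.gettempdir()
print(default_tmp, can_write_and_delete(default_tmp))
```

If the check returns False for the default location, run it against the directory you plan to pass as temp_dir before starting a long analysis.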

License

This tool is BSD licensed, as found in the LICENSE file.

Citation

If you use KoSCAPEbio or the Database curation script in your research, please cite it as follows:

KoSCAPEbio: KoSC Analysis and Presence Exploration. (2025). Zenodo. https://doi.org/10.5281/zenodo.14927381

Additionally please cite the related paper: [TBD]

Contact

For help and support, please contact:
