APOBEC mutagenesis is a common process in normal human small intestine. Wang et al.

Scripts used to reproduce the analyses of the manuscript:

Wang, Y., Robinson, P.S., Coorens, T.H.H. et al. APOBEC mutagenesis is a common process in normal human small intestine. Nat Genet 55, 246–254 (2023). https://doi.org/10.1038/s41588-022-01296-5

Please contact Yichen (yw2@sanger.ac.uk) if you have any questions or enquiries.

Dataset

Raw sequencing data can be accessed from the European Genome-phenome Archive (EGA) with the accession code EGAD00001008764.

Input data for all figures can be found under the data/ directory.

data/somatic_mutations/snp/ contains the final SNPs placed on phylogenetic branches.

data/somatic_mutations/indel/ contains the final INDELs placed on phylogenetic branches.

data/mutation_matrices/ contains the final SBS and ID matrices for the cohort.

data/signatures/ contains the original HDP signatures, their decomposition into PCAWG reference signatures, and the final reference signature exposures for each sample and each phylogenetic branch.

data/phylogenetic_trees/ contains the phylogenetic trees generated by MPBoot using single-base substitutions and indels. The length of the branch represents the number of mutations on the branch.

data/vcf/ contains the final SNP vcf files for finding and plotting kataegis.

data/cancer/ contains the paired cancer data for comparison (mutational burden and mutational signature exposures).

data/motif/ contains enrichment scores for TCN motifs.

Variant calling

The final mutation files can be found in extended data tables and data/somatic_mutations/.

Alternatively, they can be generated from the raw sequencing data via the Sanger pipeline (https://github.com/cancerit) using CaVEMan, Pindel, ASCAT and BRASS. When a matched normal sample is available, run all algorithms with that sample as the matched normal. Otherwise, run them unmatched against the synthetic BAM PDv37is.

Filtering

The filters applied to SNVs and indels to exclude LCM artefacts can be found at https://github.com/MathijsSanders/SangerLCMFiltering, and the beta-binomial filter to exclude germline mutations is at https://github.com/TimCoorens/Unmatched_NormSeq.

Phylogenetic tree reconstruction

Phylogenetic trees were constructed using MPBoot, with supplementary code in Phylogeny/.

Phylogeny/filtering.R contains the beta-binomial filter from the previous step and generates the *for_MPBoot.fa input file for MPBoot. Then run:

mpboot -s $patient/${opt}_for_MPBoot.fa -bb 1000
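
As a rough sketch of how the command above might be applied to a whole cohort, the loop below builds one MPBoot call per patient directory. The patient IDs (patient_A, patient_B) and the value snp for opt are hypothetical placeholders, and the DRY_RUN switch is an addition for illustration, not part of the repository. With DRY_RUN left at its default of 1 the commands are only printed.

```shell
# Sketch only: patient IDs and the "opt" value below are hypothetical.

# Print the MPBoot command for one patient directory and one input prefix.
mpboot_cmd() {
    local patient="$1"
    local opt="$2"
    printf 'mpboot -s %s/%s_for_MPBoot.fa -bb 1000\n' "$patient" "$opt"
}

# Print (DRY_RUN=1, the default) or execute the command for every patient.
run_mpboot_all() {
    local opt="$1"
    shift
    local patient
    for patient in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            mpboot_cmd "$patient" "$opt"
        else
            mpboot -s "${patient}/${opt}_for_MPBoot.fa" -bb 1000
        fi
    done
}

# Dry run: prints one command per patient.
run_mpboot_all snp patient_A patient_B
# mpboot -s patient_A/snp_for_MPBoot.fa -bb 1000
# mpboot -s patient_B/snp_for_MPBoot.fa -bb 1000
```

Setting DRY_RUN=0 would execute mpboot for each patient, assuming it is on the PATH and each *_for_MPBoot.fa file exists.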

Reconstructed phylogenetic trees in .tree and .csv format (with number of mutations on each branch) can be found at data/phylogenetic_trees/. The trees can then be visualised by treeplots.R.
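
Because branch lengths are mutation counts, the total number of mutations placed on a tree can be read straight off the tree file. The snippet below is a minimal sketch that assumes the .tree files are in Newick format with branch lengths written as plain numbers after a colon. The function name and the example tree are hypothetical.

```shell
# Sum every branch length (the number after each ":") in a Newick tree file.
# Assumption: lengths are plain decimals, not scientific notation.
total_branch_length() {
    grep -oE ':[0-9]+(\.[0-9]+)?' "$1" | tr -d ':' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Example with a made-up three-tip tree.
example_tree="$(mktemp)"
printf '((crypt_1:10,crypt_2:5):3,crypt_3:7);\n' > "$example_tree"
total_branch_length "$example_tree"   # prints 25
rm -f "$example_tree"
```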

This part of the analysis generated the tree plots displayed in Fig.2, Fig.3 and Extended Data Fig.3.

Mutational signature extraction

Only branches with more than 50 mutations were kept for signature extraction. The input data can be found at data/mutational_matrices/.
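
The branch-size cut-off can be illustrated with a short filter. This is only a sketch and not the code in Signatures/: it assumes a tab-separated matrix with a header line, the branch name in the first column and one mutation-count column per channel, which may not match the layout of the files in this repository. The function name filter_branches is hypothetical.

```shell
# Keep the header line and every row whose counts sum to more than a threshold.
# Assumed layout: tab-separated, column 1 = branch name, other columns = counts.
filter_branches() {
    local min_mut="$1"
    local matrix="$2"
    awk -F'\t' -v min="$min_mut" '
        NR == 1 { print; next }    # pass the header through unchanged
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i
            if (total > min) print # strictly more than the threshold
        }
    ' "$matrix"
}
```

Calling filter_branches 50 on such a file would print the header followed by the rows with at least 51 mutations in total.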

Workflow and code are in the directory Signatures/. This part of the analysis generated Extended Data Fig.9.

Mutation burden analysis

The input file is at data/Extended_Data_Table3_crypt_summary.csv

Workflow and code are in the directory Mutation_burden/.

This part of the analysis generated the plots displayed in Fig.1, Extended Data Fig.4 and Extended Data Fig.8.

Local hypermutation (kataegis) analysis

The code is in the directory Kataegis/ and the input vcf files are at data/vcf/.

This part of the analysis generated the plots displayed in Fig.4.

Single cell RNA-seq of small and large intestine

The code is in the directory Expression/. Instructions for downloading the input dataset are included in the script.

This analysis generated the statistics in Table 1.

Others

Others/stem_cell/ contains code and input files for simulating stem cell dynamics.

Others/VAF.R generates VAF distribution plots of all samples (Extended Data Fig.1).

Others/APOBEC_motif_enrichment.R post-processes P-MACD results (Extended Data Fig.7c). We ran P-MACD to extract context frequencies.

