Handling / reporting of the peptide-spectrum-matches in case when peptide can originate both from target and decoy sequence #176

Open
MatteoLacki opened this issue Apr 4, 2025 · 1 comment

Comments

@MatteoLacki
Hi!

This is more of a question than a bug report.

Together with @theGreatHerrLebert, we went through the source code to check how peptide-spectrum matches (PSMs) are handled when the peptide could have both a target and a decoy parent protein.

It seems to us that nothing is done in that case and that both are likely reported. Could you confirm whether that is the case?

There seem to be multiple ways of dealing with such cases, which we would like to investigate, but to do so we need to know what we start with. I don't know if the situation requires any action, but one option could be adding a column to the output that contains 0 when a PSM comes uniquely from one source, and some number > 0 identifying the other peptide when it does not. Of course, all of this is easily trackable by simply sorting the outputs.

Best wishes!

Matteo Lacki

@lazear
Owner

lazear commented Apr 4, 2025

Hi Matteo,

In the case you described, only the target match is retained. You should never see a PSM that is assigned to both a target and decoy protein. If you find such a case, please report it.

The relevant pieces of code are below. We create a set of all target peptides and remove any decoy peptides that match a target peptide (line 203).

let targets: DashSet<_, FnvBuildHasher> = DashSet::default();
digests
    .par_iter()
    .filter(|digest| !digest.decoy)
    .for_each(|digest| {
        targets.insert(digest.sequence.clone().into_bytes());
    });
log::trace!("modifying peptides");
let mut target_decoys = digests
    .into_par_iter()
    .map(Peptide::try_from)
    .filter_map(Result::ok)
    .flat_map_iter(|peptide| {
        peptide
            .apply(&mods, &self.static_mods, self.max_variable_mods)
            .into_iter()
            .filter(|peptide| {
                peptide.monoisotopic >= self.peptide_min_mass
                    && peptide.monoisotopic <= self.peptide_max_mass
            })
            .flat_map(|peptide| {
                if self.generate_decoys {
                    vec![peptide.reverse(), peptide].into_iter()
                } else {
                    vec![peptide].into_iter()
                }
            })
            .filter(|peptide| !peptide.decoy || !targets.contains(&(peptide.sequence[..])))
    })
    .collect::<Vec<_>>();
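
For readers who want to try the filter in isolation, here is a minimal standalone sketch of the same idea using only the standard library. It is not Sage's code: the `Pep` struct, the `targets_and_decoys` function, and the plain full-sequence reversal are simplifying assumptions made for illustration, and modifications and mass filtering are left out.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-in for a peptide record (not Sage's type).
#[derive(Debug, Clone, PartialEq)]
struct Pep {
    sequence: String,
    decoy: bool,
}

impl Pep {
    // Assumption: a decoy is the plain reversal of the whole sequence.
    fn reverse(&self) -> Pep {
        Pep {
            sequence: self.sequence.chars().rev().collect(),
            decoy: true,
        }
    }
}

// Emit a decoy next to every target, then drop each decoy whose
// sequence is also present in the set of target sequences.
fn targets_and_decoys(targets: &[&str]) -> Vec<Pep> {
    let target_set: HashSet<&str> = targets.iter().copied().collect();
    targets
        .iter()
        .map(|s| Pep {
            sequence: s.to_string(),
            decoy: false,
        })
        .flat_map(|p| vec![p.reverse(), p])
        .filter(|p| !p.decoy || !target_set.contains(p.sequence.as_str()))
        .collect()
}

fn main() {
    // "KAAK" is a palindrome, so its decoy collides with a target.
    // "ABCD" and "DCBA" are each other's reversal, so both decoys collide.
    let peps = targets_and_decoys(&["KAAK", "PEPTIDEK", "ABCD", "DCBA"]);
    for p in &peps {
        println!("{} decoy={}", p.sequence, p.decoy);
    }
}
```

In this toy input, the only decoy that survives is the reversal of `PEPTIDEK`; every other decoy shares its sequence with a target and is filtered out.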

With recent updates to fasta chunking, we might need to add an additional check to prevent decoy annotations from being retained here (line 227):

target_decoys.dedup_by(|remove, keep| {
    if remove.sequence == keep.sequence
        && remove.modifications == keep.modifications
        && remove.nterm == keep.nterm
        && remove.cterm == keep.cterm
    {
        keep.proteins.extend(remove.proteins.iter().cloned());
        // When merging peptides from different Fastas,
        // decoys in one fasta might be targets in another
        keep.decoy &= remove.decoy;
        true
    } else {
        false
    }
});

This would be a good opportunity to add an extra test-case to Sage to confirm that fasta file chunking doesn't mess up decoy/target annotations.
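
A rough sketch of what such a test could exercise is given below. It is standalone and does not use Sage's types or test harness: `Pep`, `pep`, `merge_chunks`, and the `rev_` protein prefix are assumptions made for illustration, and the comparison is reduced to the sequence alone. Because `Vec::dedup_by` only compares neighbouring elements, the sketch sorts by sequence before merging.

```rust
// Hypothetical, simplified stand-in for a peptide record (not Sage's type).
#[derive(Debug, Clone)]
struct Pep {
    sequence: String,
    decoy: bool,
    proteins: Vec<String>,
}

// Small constructor helper, used only by this sketch.
fn pep(sequence: &str, decoy: bool, protein: &str) -> Pep {
    Pep {
        sequence: sequence.to_string(),
        decoy,
        proteins: vec![protein.to_string()],
    }
}

// Concatenate the peptides of all chunks and merge identical sequences.
fn merge_chunks(chunks: Vec<Vec<Pep>>) -> Vec<Pep> {
    let mut all: Vec<Pep> = chunks.into_iter().flatten().collect();
    // Stable sort so that equal sequences become neighbours.
    all.sort_by(|a, b| a.sequence.cmp(&b.sequence));
    all.dedup_by(|remove, keep| {
        if remove.sequence == keep.sequence {
            keep.proteins.extend(remove.proteins.iter().cloned());
            // The merged entry stays a decoy only if every copy was a decoy.
            keep.decoy &= remove.decoy;
            true
        } else {
            false
        }
    });
    all
}

fn main() {
    // Chunk 1 contains a decoy whose sequence is a target in chunk 2,
    // so a per-chunk target set cannot filter it out.
    let chunk1 = vec![pep("PEPTIDEK", false, "P1"), pep("KEDITPEP", true, "rev_P1")];
    let chunk2 = vec![pep("KEDITPEP", false, "P2")];
    for p in merge_chunks(vec![chunk1, chunk2]) {
        println!("{} decoy={} proteins={:?}", p.sequence, p.decoy, p.proteins);
    }
}
```

In this toy case the merged `KEDITPEP` entry is flagged as a target, but its protein list still carries the decoy protein name from the first chunk. That leftover annotation is the kind of thing an additional check would need to strip.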
