Filter unused fragments from peak_df #89

GeorgWa · 2025-03-19T17:02:58Z

Add remove_unused_peaks method to MSData_Base for efficient peak dataframe cleanup after spectrum filtering. Includes a numba-accelerated mask generator for high performance and supports both in-place and copy operations with full class inheritance preservation.
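For readers skimming the thread, the idea can be sketched roughly as follows. This is not the code in this PR: it is a minimal NumPy-only illustration, and the function names `keep_mask` and `drop_unused_peaks`, as well as the convention of half-open `[start, stop)` peak index ranges per spectrum, are assumptions made for the example.

```python
import numpy as np


def keep_mask(peak_start_idx, peak_stop_idx, n_peaks):
    """Return a boolean mask that is True for every peak still referenced
    by a spectrum (hypothetical helper, assumed [start, stop) ranges)."""
    mask = np.zeros(n_peaks, dtype=np.bool_)
    for start, stop in zip(peak_start_idx, peak_stop_idx):
        mask[start:stop] = True
    return mask


def drop_unused_peaks(peak_values, peak_start_idx, peak_stop_idx):
    """Drop unreferenced peaks and re-index the spectrum ranges
    (hypothetical helper, not the MSData_Base method itself)."""
    mask = keep_mask(peak_start_idx, peak_stop_idx, len(peak_values))
    # new_pos[i] = number of kept peaks with an old index smaller than i,
    # which is exactly the new position of old index i.
    new_pos = np.concatenate(([0], np.cumsum(mask)))
    return peak_values[mask], new_pos[peak_start_idx], new_pos[peak_stop_idx]


# Three spectra originally covered peaks [0, 2), [2, 4) and [4, 6);
# the middle spectrum has been filtered out.
mz = np.array([100.0, 101.0, 102.0, 103.0, 104.0, 105.0])
kept, new_start, new_stop = drop_unused_peaks(
    mz, np.array([0, 4]), np.array([2, 6])
)
# kept -> [100., 101., 104., 105.], new_start -> [0, 2], new_stop -> [2, 4]
```

The cumulative sum gives each surviving peak its new position, so the start and stop indices can be remapped in one lookup without assuming that the spectra are sorted.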

This PR was 1.87USD

jalew188 · 2025-03-20T01:04:25Z

Any redundancies with alphabase's remove unused fragments?

alpharaw/ms_data_base.py

github-actions · 2025-03-20T09:02:17Z

The implementation adds functionality to remove unused peaks from MS data, which is useful for memory efficiency. The code is generally well-structured with good documentation. The main issues are:

- Numba is used without error handling for the case where it is not available (consider adding a fallback implementation).
- The get_peaks_to_keep_mask function could be more efficient by using vectorized operations instead of loops.
- Consider adding parameter validation to the remove_unused_peaks method to check that the required columns exist in the dataframes.
- The test cases are thorough, but they don't test error cases or edge cases such as empty dataframes.
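The first three suggestions could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: the helper names `keep_mask_loop`, `keep_mask_vectorized` and `missing_columns` are hypothetical, and the column names `peak_start_idx` and `peak_stop_idx` are assumed rather than taken from the diff.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback: a no-op decorator so the module still imports without Numba.
    def njit(*args, **kwargs):
        # Support both the bare form @njit and the called form @njit(...).
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def keep_mask_loop(start_idx, stop_idx, n_peaks):
    """Loop-based mask; compiled when Numba is present, plain Python otherwise."""
    mask = np.zeros(n_peaks, dtype=np.bool_)
    for i in range(len(start_idx)):
        for j in range(start_idx[i], stop_idx[i]):
            mask[j] = True
    return mask


def keep_mask_vectorized(start_idx, stop_idx, n_peaks):
    """Loop-free alternative: +1 at each range start, -1 at each range stop,
    then a cumulative sum counts how many ranges cover each peak."""
    delta = np.zeros(n_peaks + 1, dtype=np.int64)
    np.add.at(delta, start_idx, 1)
    np.add.at(delta, stop_idx, -1)
    return np.cumsum(delta[:-1]) > 0


def missing_columns(available, required=("peak_start_idx", "peak_stop_idx")):
    """Return the required column names (assumed names) absent from `available`."""
    return [name for name in required if name not in available]
```

A caller could check `missing_columns(spectrum_df.columns)` before building the mask and raise a `KeyError` if the list is non-empty. Whether the vectorized variant actually beats the compiled loop would need to be measured on real data.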

github-actions · 2025-03-20T09:02:19Z

Number of tokens: input_tokens=11268 output_tokens=2153 max_tokens=4096
review_instructions=''
config={}
thinking: []

tests/unit/test_remove_unused_peaks.py

mlorenz49

Tested this on a test file. The function works and correctly removes peaks from peak_df when spectrum_df is filtered.

GeorgWa · 2025-03-20T17:20:40Z

Any redundancies with alphabase's remove unused fragments?

Yes, potentially. This is something which could be improved in the future.

create remove unused peaks

79857bc

GeorgWa requested review from mschwoer and mo-sameh March 19, 2025 17:03

pre commit

557a7e5

mschwoer reviewed Mar 20, 2025

mschwoer added the code-review label Mar 20, 2025