A Python library for blind analysis of measurement data.
Do you need to blind your data?
Maybe not. But if you expect (or want?!) a measurement to produce a certain result, blinding can remove the temptation to tune your analysis.
Do you need a package to blind your data?
Nope. To randomly offset the values in a column of a pandas.DataFrame():
import numpy as np
import pandas as pd
# parameters
fil = "./my_awesome_measurement.csv"
name = "A"
offset = 0.1
# load
df = pd.read_csv(fil)
# shift data values
df[name] = df[name] + offset * np.random.rand()
blindat extends this simple concept using transformation rules.
This library is an experiment in developing a reasonable workflow for blind analysis. It is not intended to be a universal solution for all forms of data or blind analysis techniques.
I assume the user wants to avoid bias. That trust allows for a simple and reversible approach (stored data is never altered). More critical applications may call for more paranoia.
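The reversible idea can be pictured with a minimal sketch in plain numpy and pandas. This is not blindat's implementation: the function names `hidden_offset`, `blind_offset` and `unblind_offset`, the seed, and the offset range are all assumptions made for illustration. The offset is derived from a seed and applied to a copy, so the original data is never altered and the blinding can be undone by subtracting the same offset.

```python
import numpy as np
import pandas as pd


def hidden_offset(seed, low, high):
    """Draw one reproducible offset from [low, high) using the seed."""
    return np.random.default_rng(seed).uniform(low, high)


def blind_offset(df, column, seed, low=10.0, high=20.0):
    """Return a blinded copy of df; the input frame is left untouched."""
    out = df.copy()
    out[column] = out[column] + hidden_offset(seed, low, high)
    return out


def unblind_offset(df, column, seed, low=10.0, high=20.0):
    """Undo blind_offset by subtracting the same seeded offset."""
    out = df.copy()
    out[column] = out[column] - hidden_offset(seed, low, high)
    return out


# hypothetical data standing in for a loaded measurement file
raw = pd.DataFrame({"A": [0.1, 0.2, 0.3], "B": [1.0, 2.0, 3.0]})
blinded = blind_offset(raw, "A", seed=42)
restored = unblind_offset(blinded, "A", seed=42)
```

Because only an in-memory copy is shifted, simply reloading the stored file is an equally valid way to unblind.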
Requires python>=3.10, numpy and pandas.
Activate your Python analysis environment. Clone the source code and cd into the directory. Install using pip:
pip install .
Blind analysis can be as simple as applying an unknown transform to an appropriate column of data (e.g., laser wavelength or microwave frequency).
In this example, a random offset is selected from the range of 10 to 20 and added to column A of the pandas DataFrame, df.
import blindat as bd
rules = bd.generate_rules("A", offset=(10.0, 20.0), random_seed=42)
df1 = bd.blind(df, rules)
df1.head()
|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 14.264812 | 0.030766 | 0.064909 | 0.930325 |
| 1 | 14.014989 | 0.562393 | 0.227109 | 0.202936 |
| 2 | 14.114655 | 0.579577 | 0.015450 | 0.534170 |
| 3 | 14.417311 | 0.868601 | 0.142738 | 0.573955 |
| 4 | 14.648785 | 0.921365 | 0.019821 | 0.263312 |
The @blindat decorator can be used to wrap methods that load a pandas.DataFrame with the blind() function.
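To illustrate the general pattern of blinding at load time, here is a sketch of a decorator written from scratch. It is not the @blindat decorator and makes no claim about its signature: `blind_on_load`, its parameters, and the stand-in loader `load_measurement` are hypothetical names used only for this example.

```python
import functools

import numpy as np
import pandas as pd


def blind_on_load(column, low, high, seed):
    """Hypothetical decorator factory: offset `column` of the loader's result."""

    def decorator(loader):
        @functools.wraps(loader)
        def wrapper(*args, **kwargs):
            # work on a copy so the loader's own object is not modified
            df = loader(*args, **kwargs).copy()
            # the same seed always yields the same hidden offset
            offset = np.random.default_rng(seed).uniform(low, high)
            df[column] = df[column] + offset
            return df

        return wrapper

    return decorator


@blind_on_load("A", low=10.0, high=20.0, seed=42)
def load_measurement():
    """Stand-in loader; a real one might call pd.read_csv()."""
    return pd.DataFrame({"A": [0.1, 0.2, 0.3], "B": [1.0, 2.0, 3.0]})


df_blind = load_measurement()
```

With this pattern the analysis code only ever sees the blinded frame, since the transform is applied before the data leaves the loader.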
The normalize() function blinds data by offsetting and scaling values to force a mean value of zero and a standard deviation of one.
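The underlying arithmetic is the familiar z-score, sketched below in plain pandas. This is not the library's normalize() function: the helper name `zscore` is hypothetical, and the choice of the sample standard deviation (pandas' default, ddof=1) is an assumption.

```python
import pandas as pd


def zscore(df, column):
    """Return a copy of df with `column` shifted and scaled to mean 0, std 1."""
    out = df.copy()
    values = out[column]
    # pandas uses the sample standard deviation (ddof=1) by default
    out[column] = (values - values.mean()) / values.std()
    return out


# hypothetical data for illustration
raw = pd.DataFrame({"A": [14.2, 14.0, 14.1, 14.4, 14.6], "B": [1, 2, 3, 4, 5]})
scaled = zscore(raw, "A")
```

After this transform the absolute position and spread of the column are hidden, while its shape (relative spacing of the points) is preserved.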
See docs for examples.