Random Draws For SEIR Parameters Per Slot #429

TimothyWillard · 2024-12-17T17:55:48Z

Describe your changes.

This pull request addresses the bug described in GH-406 where new seir parameters were not drawn across slots.

Does this pull request make any user interface changes? If so please describe.

The user interface changes are:

Added a new Parameters.reinitialize_distributions method that recreates the randomly drawn seir parameters so that they do not hold onto the prior random state, and
SEIR.run_parallel_SEIR will now draw random seir parameters across jobs.

These changes have not been documented yet, @alsnhll where do you think is the best place to describe this behavior (I'll make these changes before marking this PR as ready for review)?

What does your pull request address? Tag relevant issues.

This pull request addresses GH-406.

Tag relevant team members.

@alsnhll

Created a test that shows the issue of the SEIR parameters not being randomly drawn per a slot wheras the outcome parameters are. Test currently fails.

Added a method to reinit distribution parameters. When the `Parameters` class is initialized it creates this distribution parameters but also captures the random seed at the time of initialization which makes the draws non-random across processes. Added a method to reinitialize distribution parameters when reseeding. Not a perfect solution, the ambigious random seed settings is rather confusing for development.

Added a unit test to the `Parameters` class unit tests for the `reinitialize_distributions` method that demonstrates how this method affects the seeding behavior across workers.

alsnhll · 2024-12-18T20:33:34Z

I did some testing of how random draws of parameters work for each parameter type, using the config config_sample_2pop_modifiers_test_random.yml which is now in the tutorials folder

In the original code in the main branch, the behavior is:

spar file, seir:parameters:Ro - same value by slot - Not working as intended
hpar file, outcomes:incidDeath:delay - varying by slot and by location. Only taking on integar values.
snpi file, seir_modifier:modifiers:Ro_relax - varying by slot and by location.
hnpi file, outcome_modifiers:modifiers:test_limits - varying by slot and by location.

Thus variation in spar was the only one not working as intended, while variation in hpar, snpi, and hpi seem to be working as intended.

In this branch, the behavior is:

spar file, seir:parameters:Ro - varying by slot
hpar file, outcomes:incidDeath:delay - varying by slot and by location. Only taking on integar values.
snpi file, seir_modifier:modifiers:Ro_relax - varying by slot and by location.
hnpi file, outcome_modifiers:modifiers:test_limits - varying by slot and by location.

So now variation in spar seems to be working as intended, and nothing has changed about how hpar, snpi, and hnpi work.

However note that users are likely surprised by the variation by location in hpar, snpi, and hnpi in this case, as I don't think this behavior is documented anywhere. It's likely not clear to most users that these parameters have different values by location by default when they randomly vary, even though when their value is fixed they have the same value by location. Is this what we want the default behavior to be? I know during inference values are fit by location by default but does that make sense as the default for forward simulations? It's also confusing that spar parameters don't vary by location while these other ones do (I know that's a long standing flepimop issue but it re-confuses me every time I encounter it).

If I instead run a config where for both seir and outcome modifiers I use subpop_groups: "all" - config_sample_2pop_modifiers_test_random_subpop_groups.yml- (which should force all subpopulations to have the same parameter value, at least for inference), then I get

snpi file, seir_modifier:modifiers:Ro_relax - varying by slot but same across locations.
hnpi file, outcome_modifiers:modifiers:test_limits - varying but same across locations.

which is what we expect. I don't know how to make hpar parameters be grouped by location - can we do this?

To summarize, yes this PR seems to fix the main underlying problem, but this issue has led me to notice a behavior that needs to be documented better and discussed - during forward-only simulations, when hpni, snpi, and hpni are set to be drawn randomly from distributions, they very by slot and location, while spar varies only by slot.

Thoughts from @jcblemai, @shauntruelove, @saraloo on the desired behavior here would be helpful.

_test_random has one parameter of each type (seir, outcome, seir_modifiers, outcome_modifiers) that varies by slot _test_random_subpop_groups is same as above but for snpi and hnpi parameters it uses subpop_groups to force the same for all locations

pearsonca · 2024-12-19T01:31:44Z

Nice investigation.

Agree we need to clearly document what matches / doesn't when stochastically simulating. Where in the gitbook do you think it's best to capture that?

Regarding behavior, all the stochastic features should match in the same way in my opinion. Weird that spar seems to deviate, so we should figure out why that is. I kind of think that location shouldn't cause a different sample either, but it's most important that we have consistent behavior.

jcblemai · 2024-12-19T14:18:44Z

I agree Alison on consistency. The current state is not motivated by anything but historical hastly made decisions during COVID (fixed R0 across subpops, but age structure different in each state). We should change that carefully.

Now:

@alsnhll I don't see this config in the repository (perhaps I am in the wrong branch), does the outcome section has a parameter modifying file (another weird thing).
@TimothyWillard An opinion: While this addresses the issue, I believe the underlying problem is the seed issue (see my comment here [Bug]: Sampling parameters from distributions not working as intended #406 (comment): python subprocesses inherit the random process state), and just resetting the distribution leaves other possible bugs:
- One would expect seeds to be different for each slot -- we can talk about handling that deterministically for reproducibility (but right now it is not done deterministically even if it is the same) but at least different simulations should not share seed.
- So in addition to allowing other random generators to still be in sync (e.g if one does stochastic simulations), it is pretty fuzzy (for me though, perhaps you got it) how it would work without being at the mercy of a scipy distribution implementation change (right now created distributions uses their own seed ? what happens here that makes the number different) ?
- Even if we keep this update distribution approach, we need to seed each correctly with a different seed (at least like the emcee, ideally deterministically).

alsnhll · 2024-12-19T14:53:33Z

Configs are here:
https://github.com/HopkinsIDD/flepiMoP/blob/feature/406/random-draws-per-slot-for-SEIR/examples/tutorials/config_sample_2pop_modifiers_test_random.yml
https://github.com/HopkinsIDD/flepiMoP/blob/feature/406/random-draws-per-slot-for-SEIR/examples/tutorials/config_sample_2pop_modifiers_test_random_subpop_groups.yml

No parameter modifying outcome file

saraloo · 2025-01-10T14:54:34Z

Hmm yes thanks for this investigation @alsnhll , agree with @jcblemai it's just historical decision making responsible for these inconsistencies. I'm not entirely sure what we should define here...

I see a few arguments either way

though inconsistent, I think defining spar parameters to be the same across locations makes sense given our typical application of subpops and strata. If these are considered more as biological parameters (eg infectious period) it makes sense to me to keep them as consistent across locations.
Alternatively one could argue it's possible to think of these as not necessarily biological factors but indeed varying across location (eg contact parameters varying across populations).
following the same logic then, it's not immediately clear to me why the hpar parameters vary by location. How we define these are indeed a bit muddied - can be combination of both biological severity and reporting etc.

Could use some discussion wrt how to harmonise these definitions. In any case, needs to be made clear in docs etc.

I don't know how to make hpar parameters be grouped by location - can we do this?

Similar inconsistencies in the npi parameters too as alison points out - ie we can group seir_modifiers but not outcome_modifiers. Again this was just development that was implemented at the time in response to something we needed to model, but can work on including this feature to all types of modifiers.

flepimop/gempyor_pkg/src/gempyor/outcomes.py

TimothyWillard · 2025-01-24T21:00:50Z

Work on this has been paused until after the next release due to higher priority items.

Added a utility to `gempyor.testing` to run a python script in a seperate process for unit testing purposes. Consolidated existing unit tests to use this new functionality.

@jcblemai

Instead of generating a random seed in `onerun_delayframe_outcomes` based on the sim id use an explicit random seed passed by `run_parallel_outcomes`, added internal wrapper `_onerun_delayframe_outcomes_with_random_seed` to facilitate per an @jcblemai suggestion.

@jcblemai

Similarly to outcomes in 48c1bf4, instead of generating a random seed in `onerun_SEIR` based on the sim id use an explicit random seed passed by `run_parallel_SEIR`, added internal wrapper `_onerun_SEIR_with_random_seed` to facilitate per @jcblemai suggestion.

Move helper being used in tests into `gempyor.utils.read_directory`. This helper will filter files in a given `model_output/` directory and combine them into one single pandas DataFrame. Not designed for prod use, but helpful in testing or small runs.

Added `gempyor.testing.setup_example_from_tutorials` testing helper to construct an example config and inputs from a file in the `examples/tutorials/` directory.

Changed `gempyor.testing.setup_example_from_tutorials` to use `$FLEPI_PATH` instead of using `__file__` because the the latter only works when the package is installed with the `--editable` flag from pip.

Refactored the unit tests for `flepimop simulate` related to SEIR parameter draws to use configurable config files and assertions with the purpose of expanding test cases.

Mostly for readability.

Added in test cases working with the 'config_sample_2pop_modifiers_test_random.yml' configuration file as well as making assertions on the outcome/SEIR modifier draws.

Changed out `np.random.randint` for `np.random.choice` to prevent the unlikely possibility of replacement draws.

TimothyWillard · 2025-03-17T16:25:05Z

This PR is now ready for review again. The biggest changes since the last comments are:

Unit tests for snpi and hnpi model outputs, and
Using explicit random seeds from the parallel caller rather than seeding based on the sim id.

The behavior is now:

	Varies By Slot	Varies By Location
`spar`	Yes	No
`hpar`	Yes	Yes
`snpi`	Yes	Yes
`hnpi`	Yes	Yes
`snpi` with `subpop_groups`	Yes	No
`hnpi` with `subpop_groups`	Yes	No

A question that was discussed, but doesn't seem like we came to a strong conclusion on is "Should hpar vary by location by default?". If we want to change that I can work those changes into this PR as well.

jcblemai

This is nice, appreciate the seed handling. Very nice that we get same behavior. Does hpar varies per location even without an helper file ? In any case, I think hpar and spar should be harmonized and I'm agnostic to the choice taken.

TimothyWillard · 2025-03-17T18:19:34Z

Does hpar varies per location even without an helper file?

I'm not sure what you mean by this, could you elaborate? What helper file are you referring to?

jcblemai · 2025-03-17T18:38:03Z

Does hpar varies per location even without an helper file?

I'm not sure what you mean by this, could you elaborate? What helper file are you referring to?

outcome config option param_from_file allow one to give a file that modifies each outcome parameters by multplying it with something (a somewhat old and inelegant interface, you could e.g use a county age-pyramid in there, that should be superseeded by something. I believe re-written modifiers are a good single-point to handle time and subpop variations of thing, as it handle all use cases -- though code is ugly.).

Also, to answer my question above: even without param_from_file, outcomes probabilities are "one draw per subpop" making them inconsistent with how spar are used.

saraloo · 2025-03-18T18:40:35Z

I think harmonising hpar behaviour with spar makes the most sense to me, given how the config is defined etc?

TimothyWillard added 3 commits December 2, 2024 16:58

Test to demonstrate bug GH-406

2f93ec3

Created a test that shows the issue of the SEIR parameters not being randomly drawn per a slot wheras the outcome parameters are. Test currently fails.

Add unit test for reinitialize_distributions

c828f1e

Added a unit test to the `Parameters` class unit tests for the `reinitialize_distributions` method that demonstrates how this method affects the seeding behavior across workers.

TimothyWillard linked an issue Dec 17, 2024 that may be closed by this pull request

[Bug]: Sampling parameters from distributions not working as intended #406

Open

TimothyWillard requested a review from alsnhll December 17, 2024 17:55

TimothyWillard added bug Defects or errors in the code. gempyor Concerns the Python core. high priority High priority. next release Marks a PR as a target to include in the next release. labels Dec 17, 2024

TimothyWillard changed the base branch from main to dev December 17, 2024 20:32

TimothyWillard marked this pull request as ready for review December 18, 2024 16:33

Merge dev into feature/406/random-draws-per-slot-for-SEIR

6081587

TimothyWillard force-pushed the feature/406/random-draws-per-slot-for-SEIR branch 11 times, most recently from 1f77156 to 64b180c Compare January 6, 2025 22:29

jcblemai reviewed Jan 10, 2025

View reviewed changes

flepimop/gempyor_pkg/src/gempyor/outcomes.py Outdated Show resolved Hide resolved

TimothyWillard added medium priority Medium priority. and removed next release Marks a PR as a target to include in the next release. high priority High priority. labels Jan 24, 2025

TimothyWillard added 3 commits January 27, 2025 16:27

Merge dev into feature/406/random-draws-per-slot-for-SEIR

b0c3b5b

Reorganize Parameters class unit test imports

8e9aa62

Add run_test_in_separate_process utility

f59add3

Added a utility to `gempyor.testing` to run a python script in a seperate process for unit testing purposes. Consolidated existing unit tests to use this new functionality.

TimothyWillard self-assigned this Feb 7, 2025

TimothyWillard added the next release Marks a PR as a target to include in the next release. label Mar 7, 2025

TimothyWillard added 11 commits March 13, 2025 09:49

Merge dev into feature/406/random-draws-per-slot-for-SEIR

b104eda

Add setup_example_from_tutorials helper

fd916bd

Added `gempyor.testing.setup_example_from_tutorials` testing helper to construct an example config and inputs from a file in the `examples/tutorials/` directory.

setup_example_from_tutorials uses $FLEPI_PATH

6f00037

Changed `gempyor.testing.setup_example_from_tutorials` to use `$FLEPI_PATH` instead of using `__file__` because the the latter only works when the package is installed with the `--editable` flag from pip.

Refactor random draw test

9bfcf9d

Refactored the unit tests for `flepimop simulate` related to SEIR parameter draws to use configurable config files and assertions with the purpose of expanding test cases.

Reset DataFrame index in read_directory

a805bfb

Mostly for readability.

Expand random draw unit tests

a48a0c3

Added in test cases working with the 'config_sample_2pop_modifiers_test_random.yml' configuration file as well as making assertions on the outcome/SEIR modifier draws.

Add unit tests with 'subpop_groups'

ce89ab7

Swap randint for choice

7dc4b31

Changed out `np.random.randint` for `np.random.choice` to prevent the unlikely possibility of replacement draws.

TimothyWillard requested review from pearsonca, jcblemai, saraloo and alsnhll and removed request for alsnhll March 17, 2025 16:25

jcblemai approved these changes Mar 17, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Random Draws For SEIR Parameters Per Slot #429

Random Draws For SEIR Parameters Per Slot #429

TimothyWillard commented Dec 17, 2024 •

edited

Loading

alsnhll commented Dec 18, 2024 •

edited

Loading

pearsonca commented Dec 19, 2024

jcblemai commented Dec 19, 2024 •

edited

Loading

alsnhll commented Dec 19, 2024

saraloo commented Jan 10, 2025

TimothyWillard commented Jan 24, 2025

TimothyWillard commented Mar 17, 2025

jcblemai left a comment

TimothyWillard commented Mar 17, 2025 •

edited

Loading

jcblemai commented Mar 17, 2025

saraloo commented Mar 18, 2025

Random Draws For SEIR Parameters Per Slot #429

Are you sure you want to change the base?

Random Draws For SEIR Parameters Per Slot #429

Conversation

TimothyWillard commented Dec 17, 2024 • edited Loading

Describe your changes.

Does this pull request make any user interface changes? If so please describe.

What does your pull request address? Tag relevant issues.

Tag relevant team members.

alsnhll commented Dec 18, 2024 • edited Loading

pearsonca commented Dec 19, 2024

jcblemai commented Dec 19, 2024 • edited Loading

alsnhll commented Dec 19, 2024

saraloo commented Jan 10, 2025

TimothyWillard commented Jan 24, 2025

TimothyWillard commented Mar 17, 2025

jcblemai left a comment

Choose a reason for hiding this comment

TimothyWillard commented Mar 17, 2025 • edited Loading

jcblemai commented Mar 17, 2025

saraloo commented Mar 18, 2025

TimothyWillard commented Dec 17, 2024 •

edited

Loading

alsnhll commented Dec 18, 2024 •

edited

Loading

jcblemai commented Dec 19, 2024 •

edited

Loading

TimothyWillard commented Mar 17, 2025 •

edited

Loading