Variables with shared inputs are always resampled from the prior in sample_posterior_predictive #6047
Comments
This sounds serious! If you look at the control flow of … So I would hypothesize that this is an issue with mutable dims.
👉 Does it go away if you pass …?
👉 What if instead of …?
👉 What if in the third example you manually broadcast the parameters already? (…) |
Reporting back on the three suggestions:

A note on (2). This code snippet:

```python
three = aesara.shared(3)
pm.Normal('normal', size=three)
```

raised a TypeError for me on version 4.1.4 (Windows). I instead had to pass the size in as a 1-tuple (`size=(three,)`). |
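For reference, a minimal sketch of the variant that reportedly worked on 4.1.4 (the surrounding model context is added here and was not part of the original comment):

```python
import aesara
import pymc as pm

three = aesara.shared(3)
with pm.Model():
    # passing the symbolic length inside a 1-tuple avoided the TypeError
    x = pm.Normal("normal", size=(three,))
```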
Thanks @jessegrabowski, I'd say this confirms my hypothesis: It's a bug related to broadcasted, symbolic sizes.
This sounds like a smaller bug whereby a shared variable is not automatically wrapped in a … (see pymc/pymc/tests/test_shape_handling.py, lines 439 to 444 at 906fcdc)
|
Perhaps the variable … |
As an additional test, I changed the mean of the prior distribution I put over the estimated parameters. It seems that, when using …
No idea why that would be happening, but it seems to suggest it's not a shape or broadcasting issue. Instead, when a coord is a SharedVariable, the prior is being sampled as if it were the posterior. |
Here are some more thorough outputs to show the behavior:

```python
with pm.Model() as model:
    three = aesara.shared(3)
    mu = pm.Normal('mu', mu=-10, sigma=10, shape=(three, ))
    sigma = pm.HalfNormal('sigma', sigma=11, shape=(three, ))
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)

    trace = pm.sample()
    post = pm.sample_posterior_predictive(trace, var_names=['mu', 'sigma', 'obs'])
```

Compare posterior and posterior_predictive for mu:

```python
# Posterior
print(trace.posterior.mu.mean(dim=['chain', 'draw']).values)
print(trace.posterior.mu.std(dim=['chain', 'draw']).values)
# Output
# [3.01789945 5.26561783 8.11427295]
# [0.03474686 0.20847781 0.29282972]

# Posterior Predictive
print(post.posterior_predictive.mu.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.mu.std(dim=['chain', 'draw']).values)
# Output:
# [-10.33329768 -9.96841615 -9.62960897]
# [ 9.6122322 10.23897464 10.21420589]
```

And for sigma:

```python
# Posterior
print(trace.posterior.sigma.mean(dim=['chain', 'draw']).values)
print(trace.posterior.sigma.std(dim=['chain', 'draw']).values)
# Output
# [1.08169416 6.39748516 9.09000896]
# [0.02457604 0.14622595 0.20021742]

# Posterior Predictive
print(post.posterior_predictive.sigma.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.sigma.std(dim=['chain', 'draw']).values)
# Output
# [9.01756853 8.45109316 8.63402421]
# [6.54332978 6.36108147 6.72434589]
```
|
Yeah, we hadn't thought about SharedVariable coordinates when we designed the volatility propagation rule in sample_posterior_predictive. We will have to change it slightly: if the user supplies an InferenceData object, we can retrieve the constant data group and the coords values from the posterior group, and check whether the tensor values have changed by the time we call sample_posterior_predictive. If they have changed, we mark them as volatile; if they haven't, we don't. That should fix this issue. |
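A rough illustration of the check described above, written as a hypothetical helper (a sketch, not PyMC code): compare the coordinate values recorded at inference time with the model's current ones, and treat only the dims that differ as volatile.

```python
import numpy as np

def coords_changed_since_inference(model, idata):
    """Hypothetical sketch: list the dims whose coordinate values recorded in
    idata.posterior no longer match the model's current coords. Only these
    would need to be marked as volatile in sample_posterior_predictive."""
    changed = []
    for dim, recorded in idata.posterior.coords.items():
        current = model.coords.get(dim)
        if current is None:  # e.g. "chain"/"draw", or dims without stored values
            continue
        if not np.array_equal(np.asarray(recorded.values), np.asarray(current)):
            changed.append(dim)
    return changed
```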
This is not a shape problem per se. The question is what the default should be when any input to a model variable is a shared variable.

```python
with pm.Model() as m:
    mu = pm.MutableData("mu", 0)
    x = pm.Normal("x", mu)
    y = pm.Normal("y", x, observed=[10, 10, 10])

    idata = pm.sample()
    pp = pm.sample_posterior_predictive(idata)
```

In that example … The reason we did these changes was to resample deterministics in GLM models where you have … |
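For context, here is a minimal sketch of the GLM use case being referred to (the data and variable names are made up): `mu` depends on the MutableData `X`, so it gets recomputed with the new predictors, while `beta` has no shared inputs and is taken from the posterior.

```python
import numpy as np
import pymc as pm

# hypothetical toy data
X_train = np.random.normal(size=(100, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + np.random.normal(scale=0.1, size=100)

with pm.Model() as glm:
    X = pm.MutableData("X", X_train)
    beta = pm.Normal("beta", 0, 1, shape=3)
    mu = pm.Deterministic("mu", X @ beta)
    # shape=mu.shape lets the likelihood follow the size of the new predictors
    y = pm.Normal("y", mu, 0.1, observed=y_train, shape=mu.shape)
    idata = pm.sample()

# out-of-sample prediction: swap the predictors, then forward-sample
with glm:
    pm.set_data({"X": np.random.normal(size=(10, 3))})
    pp = pm.sample_posterior_predictive(idata, var_names=["mu", "y"])
```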
I'm having issues getting out-of-sample predictions with coords changing for the out-of-sample data as well. Any progress on this issue? |
For now the solution is to not use MutableData for inputs of unobserved distributions |
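A short sketch of what that advice looks like in practice (the variable names are made up): keep the shapes/dims of unobserved variables fixed at model-build time, and reserve shared variables for observed inputs.

```python
import aesara
import pymc as pm

# prone to the issue: the size of an unobserved variable comes from a shared
# variable, so sample_posterior_predictive treats it as volatile and redraws
# it from the prior
n_groups = aesara.shared(3)
with pm.Model() as m_mutable:
    mu = pm.Normal("mu", 0, 1, shape=(n_groups,))

# current workaround: fix the dimension when the model is built
with pm.Model(coords={"group": ["A", "B", "C"]}) as m_fixed:
    mu = pm.Normal("mu", 0, 1, dims="group")
```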
Does that mean not to use pm.MutableData(....) or not to use pm.Data(...., mutable = True)? Can I use the latter with the add_coords method? |
Check their implementation to see how they call … |
I apologize, as I'm still confused. Am I understanding correctly that if I fit a model with coords, then I can't make out-of-sample predictions with coords that go with that sample? |
Has there been any forward movement on this? I'm still not sure how to make predictions with new data without getting an error when using the new coords. |
Any more advances on the sample posterior predictive problem? I need to add variable coords and I can't find any workaround for this issue. |
Hi, looking at the source, and being a complete newbie when it comes to development and advanced Python code, here are my thoughts:

It depends on whether the nodes from `fg = FunctionGraph(outputs=outputs, clone=False)` are named or not. If they are, it should be easy enough to filter the nodes by the coords' names after the volatility logic, just like is done in the return value of `compile_forward_sampling_function` with `volatile_nodes - set(givens_dict)`. I am only sort of familiar with `set`, but it seems that subtracting a `set(coord_named_shared_variables)` would work as a solution?

It looks like `var_names` gets filled with `observed_rvs + auto_deterministics`. I'm not sure at what point an unobserved_rv becomes an observed_rv, possibly after sampling? If so, it seems `var_names` just needs to be filtered by the coords' names? If the goal is to update the coords via a shared variable, does that actually mean we need to resample, since the data is being changed? My intuition says no, but there would be a loss of data(?).

If they're not named, SharedVariables could be tagged with a name attribute, which I have no idea if it is possible. Just looked, and it seems SharedVariables are an aesara thing. Would it be possible to tag SharedVariables with a name attribute anyway?

I like this solution too, I'm just not advanced enough to implement it. I checked out the inference data from pm.sample, but I'm not seeing constant data. Is this in the actual InferenceData object? I see that's arviz. Thanks all. |
Hi @jcstucki, yes, that’s the function with the volatile logic. The problem that prevents doing what you suggest is that mutable coordinates might or might not have changed between having done inference and trying to make predictions. If the coordinates did change, then they should be volatile. If they did not change, they shouldn’t.
Yes, it's an ArviZ InferenceData object. The code that generates it is in the backends folder, under arviz (can't link because I'm on my phone). Anyway, we intend to get around to fixing this next week. The workaround at the moment is to use … |
It's still unclear what the goal of the first example in the issue is with regards to mutable coords of unobserved variables. If the idea is to sample and then change the coords to e.g. ["A", "B", "C", "D"], hoping the model gets one extra dimension while the entries corresponding to the old ["A", "B", "C"] stay intact, that's NOT going to happen: all dimensions would be resampled from the prior.

If the goal is just to reuse a model for the whole sampling + predictions workflow without changing coords in between, then a solution like what @lucianopaz proposed should be fine.

To extend a model, one could instead add two groups of variables, one with fixed coords for sampling and one with mutable coords for predictions, and stack the two together. It's possible that you could give the prediction variables an initial size of zero (not sure about coords, but it should be feasible) and then "grow" those for posterior predictive? |
Having looked at a couple of cases where this problem occurs, the main issue is the heavy use of index mappings to "expand" estimated parameters to match the data. While it's true that the number of groups will not change (we cannot add "D"), the mapping between unobserved variables (e.g. intercepts) and rows of the observed data does change between inference and prediction time. One can imagine that the "training data" came from groups A, A, B, B, C, C, but we only want to predict for a single observation from group B. Given a usual setup, like:

```python
group_idx, _ = pd.factorize(data.index)

with pm.Model() as model:
    group_map = pm.Data('group_idx', group_idx, mutable=True)
    # non-centered intercept prior
    mu = intercept[group_map]
    # likelihood, etc.
```

The user's expectation is that the index map … As I pointed out in a thread on the discourse, this can be avoided by working with design matrices. But I would hazard a guess that close to 100% of the example notebooks and video tutorials floating around out there use index mappings rather than design matrices. It means that people will commonly run into this issue if they are following along with e.g. the Radon example and try to extend it to prediction using the … |
@jessegrabowski what's the problem with indexing designs? Indexing operations (and their coords) are free to change for prediction in most examples I've seen (i.e. ~… |
This is the exact use case that we had in mind while thinking about volatility. The … The case that we didn't consider was when the dimensionality of the intercept random variable is determined by a shared variable. Just to make this point clear, the design matrix approach would also suffer there, because you would have a design matrix multiplied by a variable-sized vector. As pointed out by @ricardoV94, the hardest thing to automate would be to determine which entries of the … |
To handle the mixed in- and out-of-sample indexes, we would mainly have to rewrite the …

```python
out_of_sample_rv = intercept.clone()  # with the correct resizing and matching inputs
new_intercept = at.zeros(the_size_and_dtype)
new_intercept = at.set_subtensor(new_intercept[in_sample_idx], intercept)
new_intercept = at.set_subtensor(new_intercept[out_of_sample_idx], out_of_sample_rv)
```
|
Well, I went and tested it carefully, and @ricardoV94 and @lucianopaz, you're totally right. I will slink off back to my hole of shame. I now agree that this is a problem of communicating best practices, i.e. making clear distinctions between fixed hierarchical maps and flexible data maps, rather than an actual bug. Since the case of "mapping indices to deterministic variables" was clearly covered, I'm inclined to agree with @ricardoV94 that I can't think of a time when it's reasonable to expect the parameter shapes to change. Perhaps this is just a lack of imagination, but let it be known that my comments were based on a fundamental misunderstanding of the situation. I feel bad for spewing a wall of text at that poor guy on the discourse rather than just telling him he needs to fix a couple of variables. The only other thing I'd say is that maybe there should be a warning raised by … |
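For readers landing here, a sketch of the "fixed hierarchical map + flexible data map" distinction under discussion (the data and names are made up): the `group` dim is fixed when the model is built, so the intercepts keep their posterior draws, and only the mutable index map and the likelihood change for prediction.

```python
import numpy as np
import pandas as pd
import pymc as pm

# hypothetical training data: two observations each from groups A, B, C
data = pd.Series(np.random.normal(size=6), index=list("AABBCC"))
group_idx, groups = pd.factorize(data.index)

with pm.Model(coords={"group": groups}) as model:  # fixed hierarchical map
    group_map = pm.MutableData("group_map", group_idx)  # flexible data map
    intercept = pm.Normal("intercept", 0.0, 1.0, dims="group")
    sigma = pm.HalfNormal("sigma", 1.0)
    mu = intercept[group_map]
    # shape=mu.shape lets the likelihood follow the size of the index map
    y = pm.Normal("y", mu, sigma, observed=data.values, shape=mu.shape)
    idata = pm.sample()

# out-of-sample: predict a single new observation from group "B"
with model:
    pm.set_data({"group_map": np.array([1])})
    pp = pm.sample_posterior_predictive(idata)
```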
No shame at all @jessegrabowski! I think that the whole point of open source development is to have engaged people, like yourself, speak up and iterate as a community. When I started to contribute to pymc3, I was quite clueless and made some horrendous pieces of work, like #2991. Thanks to the team and the community I learnt a lot and kept trying to do stuff and improve. I really don’t want you to lose confidence because of a slight misunderstanding. |
Yeah I'd say so. It's come up enough lately that something should be explicitly stated when you sample the ppc. I think a little communication can go a long way here. |
This is the PR @lucianopaz mentioned about making posterior predictive more transparent: #6142. About the feelings of shame, I heard the only way to get rid of them is to open a PR that closes an open issue. Maybe two :P But honestly, misunderstandings tell us something very useful about how people are trying to use and misuse our tools, so that we can try to make them better/clearer. Posterior predictive is clearly an area we have to document better (see this recent issue for example: pymc-devs/pymc-examples#418) |
Want to do a PR?
|
What is the consensus on warnings/messages? It seems like there was at least some move to reduce their number from v3 to v4 (I'm thinking about how a warning is no longer given when r_hat values are large after sampling, for example). Would a warning on … |
Do you think that the information that is logged after this PR isn't enough? |
That's not entirely correct. You may be resampling intermediate variables, still conditioned on the values of inferred hyperparameters. Anyway, I think that logging the variable names as we do now removes any surprises. I don't think we need to tell users in words why those are being resampled. |
Actually, I think that it's important for you, @jessegrabowski, to try the new logging. Ricardo and I were the ones that discussed the changes in #6142, so our opinion is quite biased. Let us know if you think that those logs are enough or if you feel that they can be improved. |
Hi all, my apologies for the naive question - I don't think I understand why posterior predictive would ever sample from priors. I checked some of the referenced PRs and discussions, but I don't think I follow why. I.e., in what circumstance would that be valid, and how is the user informed/warned about this? E.g., here's my example (which is basically similar to the toy examples discussed above) where I was expecting the posterior predictive to be in line with the posterior rather than the prior. I'd appreciate any clarification/help. Thanks! (PyMC version 5.5.0)

Data: (the data-generation code was not captured here)

Model:

```python
def line_model(x, y, dy):
    with pm.Model() as model:
        data_x = pm.ConstantData(name='data x', value=x)
        data_y = pm.ConstantData(name='data y', value=y)
        data_dy = pm.ConstantData(name='data dy', value=dy)
        m = pm.Uniform('m', -2, 4)
        d = pm.Uniform('d', -10, 10)
        resp_y = pm.Normal('Y', mu=m*data_x + d, sigma=data_dy, observed=data_y)
    return model

applied_model = line_model(x, y, dy)
```

Sampling:

```python
with applied_model:
    ## Prior
    prior = pm.sample_prior_predictive(1000, var_names=['m', 'd', 'Y'])
    applied_model_idata = prior
    ## NUTS
    hmc = pm.sample(tune=500, draws=1000, cores=4, chains=4)
    applied_model_idata.extend(hmc)
    ## Posterior
    post = pm.sample_posterior_predictive(applied_model_idata, var_names=['m', 'd', 'Y'])
    applied_model_idata.extend(post)
```

[figure: distribution for variable …]
|
@bersavosh the question is not naive at all. This is something we are trying to convey more clearly, as it's a source of confusion even among experienced developers.
The subtle point is that this is the only thing you could do when you are asking to sample variables using the forward function. Otherwise your trace already includes your posterior draws, and the only other thing …
This is arguably not useful for root variables, but it makes sense for intermediate variables. For instance, selecting the group-level variables to be resampled is a valid way of asking what the model predicts new group elements to look like. The user is informed about this by the logging message, which says something like …
|
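As an illustration of the point about intermediate variables (hypothetical model and names): with `group_effect` listed in `var_names`, it is redrawn from `Normal(mu_group, sigma_group)` using the posterior draws of the hyperparameters, i.e. a prediction of what the effects of new, unseen groups would look like, rather than copied from the trace.

```python
import numpy as np
import pymc as pm

# hypothetical hierarchical model: group effects drawn from shared hyperpriors
y_obs = np.random.normal(size=(3, 20))

with pm.Model(coords={"group": ["A", "B", "C"]}) as model:
    mu_group = pm.Normal("mu_group", 0.0, 1.0)
    sigma_group = pm.HalfNormal("sigma_group", 1.0)
    group_effect = pm.Normal("group_effect", mu_group, sigma_group, dims="group")
    y = pm.Normal("y", group_effect[:, None], 1.0, observed=y_obs)
    idata = pm.sample()

with model:
    # group_effect is forward-sampled conditioned on the posterior draws of
    # mu_group and sigma_group; y is then drawn from the new effects
    pp = pm.sample_posterior_predictive(idata, var_names=["group_effect", "y"])
```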
Opened an issue to develop more comprehensive documentation: #6812 |
@ricardoV94 Thank you for the explanation, this is very helpful. I'll keep an eye on the issue you just opened. :-) |
Description of your problem
Using the model.add_coord method appears to break pm.sample_posterior_predictive.
Here’s a simple example, estimating the loc and scale of three independent normals:
Please provide a minimal, self-contained, and reproducible example.
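The example code itself was not captured in this text. A hypothetical reconstruction, roughly along the lines of the snippets in the comments above (the data, coord name, and parameter values are made up), might look like:

```python
import numpy as np
import pymc as pm

# made-up data: 100 draws from each of three independent normals
data = np.random.normal(loc=[3.0, 5.0, 8.0], scale=[1.0, 6.0, 9.0], size=(100, 3))

with pm.Model() as model:
    model.add_coord("channel", ["A", "B", "C"], mutable=True)
    mu = pm.Normal("mu", -10, 10, dims="channel")
    sigma = pm.HalfNormal("sigma", 11, dims="channel")
    y = pm.Normal("obs", mu, sigma, observed=data)
    idata = pm.sample()
    # the returned draws for mu and sigma come from the prior, not the posterior
    within_post = pm.sample_posterior_predictive(idata, var_names=["mu", "sigma", "obs"])
```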
"The dims on within_post are the same as the others, but it seems like totally wrong values are getting sampled. It is telling that the mean of each distribution is not the mean of the means, suggesting it’s not a case of correct values being shuffled."
This is a pretty serious problem when attempting to do out-of-sample predictions, since coords need to be set somehow, and they can't (to my knowledge) be re-added to the model context after instantiation.
Versions and main components