
Variables with shared inputs are always resampled from the prior in sample_posterior_predictive #6047


Closed
jcstucki opened this issue Aug 10, 2022 · 36 comments · Fixed by #6147

@jcstucki

jcstucki commented Aug 10, 2022

Description of your problem

Using the model.add_coord method appears to break pm.sample_posterior_predictive.

Here’s a simple example, estimating the loc and scale of three independent normals:


import numpy as np
import pandas as pd
import pymc as pm

# Generate data
data = np.random.normal(loc=np.array([3, 5, 8]), scale=np.array([1.1, 6.3, 9.1]), size=(1000, 3))

# Model 1: No coords
with pm.Model() as no_coords_model:
    mu = pm.Normal('mu', mu=0, sigma=10, size=3)
    sigma = pm.HalfNormal('sigma', sigma=10, size=3)
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    no_coords_trace = pm.sample()
    no_coords_post = pm.sample_posterior_predictive(no_coords_trace)

# Model 2: Context manager
coords = {'name': ['A', 'B', 'C']}
with pm.Model(coords=coords) as context_model:
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    
    context_trace = pm.sample()
    context_post = pm.sample_posterior_predictive(context_trace)

# Model 3: Within model
with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)



traces = [no_coords_trace, context_trace, within_trace]
mus = [trace.posterior.mu.values[..., i].mean() for trace in traces for i in range(3)]
sigmas = [trace.posterior.sigma.values[..., i].mean() for trace in traces for i in range(3)]
post_df = pd.DataFrame(np.c_[mus, sigmas], columns=['mu', 'sigma'], index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'], ['A', 'B', 'C']]))
print(post_df.unstack(1).to_string())

                 mu                         sigma                    
                  A         B         C         A         B         C
context    2.977460  4.982624  7.826642  1.081710  6.287514  9.165928
no_coords  2.976785  4.984743  7.827109  1.081657  6.289910  9.174939
within     2.976568  4.990646  7.825051  1.081552  6.286198  9.167916


pps = [no_coords_post, context_post, within_post]
mean_value = [post.posterior_predictive.obs.values[..., i].mean() for post in pps for i in range(3)]
post_df = pd.DataFrame(mean_value, columns=['mean_ppc'], index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'], ['A', 'B', 'C']]))

           mean_ppc                    
                  A         B         C
context    2.977167  4.985852  7.825006
no_coords  2.976837  4.982244  7.818495
within    -0.045788 -0.594845 -0.270400

"The dims on within_post are the same as the others, but it seems like totally wrong values are getting sampled. It is telling that the mean of each distribution is not the mean of the means, suggesting it’s not a case of correct values being shuffled."

This is pretty breaking when attempting to do out-of-sample predictions, since the coords need to be set somehow, and (to my knowledge) they can't be re-added to the model context after instantiation.

Versions and main components

  • PyMC/PyMC3 Version: 4.1.3
  • Aesara/Theano Version: 2.7.7
  • Python Version: 3.8
  • Operating system: Mac OS
  • How did you install PyMC/PyMC3: pip
@michaelosthege
Member

This sounds serious!

If you look at the control flow of Model.__init__(coords=...) you can see that internally it gets routed into Model.add_coords and thereby Model.add_coord.
But in the example you posted, there's a mutable=True setting which creates the difference.
Via Model(coords=...) the dimension will be immutable by default.

So I would hypothesize that this is an issue with mutable dims.
And since the likelihood is broadcast from (3,) to the observed shape of (1000, 3), this is probably (yet another) broadcasting bug.

👉 Does it go away if you pass mutable=False in the third example?

👉 What if instead of size=3 in the first example you do three = aesara.shared(3) and size=three?

👉 What if in the third example you manually broadcast the parameters already? (ll = pm.Normal('obs', mu=mu[None, :], sigma=sigma[None, :], observed=data))

@jessegrabowski
Member

jessegrabowski commented Aug 15, 2022

Reporting back on the three suggestions:

  1. Passing mutable=False in the third example fixes the bug.
  2. Passing a shared variable to size in the first example causes the bug.
  3. Manually broadcasting does not fix the bug, at least not by expanding on the first dimension.

A note on (2). This code snippet:

three = aesara.shared(3)
pm.Normal('normal', size=three)

raised a TypeError for me on version 4.1.4 (Windows). I instead had to pass the size in as a 1-tuple (size=(three, )). I don't know if this is relevant. It doesn't appear to be in simple tests of pm.Normal(...).eval(), but I still thought it worth mentioning.
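
For reference, here is a minimal sketch of that workaround (assuming PyMC 4.1.x with aesara; pm.Normal.dist is only used to build the variable outside a model context):

import aesara
import pymc as pm

three = aesara.shared(3)

# Passing the shared scalar directly (size=three) raised a TypeError for me;
# wrapping it in a 1-tuple works:
x = pm.Normal.dist(mu=0, sigma=1, size=(three,))
print(x.eval().shape)  # expected: (3,)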

@michaelosthege
Member

Thanks @jessegrabowski, I'd say this confirms my hypothesis: It's a bug related to broadcasted, symbolic sizes.
@pymc-devs/dev-core can someone familiar with aeppl take a look at this? It does sound a little dangerous..

I instead had to pass the size in as a 1-tuple (size=(three, )). I don't know if this is relevant. It doesn't appear to be in simple tests of pm.Normal(...).eval(), but I still thought it worth mentioning.

This sounds like a smaller bug whereby a shared variable is not automatically wrapped in a tuple.
This is the test case where we're testing these "lazy" flavors of passing scalars. The test doesn't cover passing a (symbolic) scalar to size yet.

def test_lazy_flavors(self):
    assert pm.Uniform.dist(2, [4, 5], size=[3, 2]).eval().shape == (3, 2)
    assert pm.Uniform.dist(2, [4, 5], shape=[3, 2]).eval().shape == (3, 2)
    with pm.Model(coords=dict(town=["Greifswald", "Madrid"])):
        assert pm.Normal("n1", mu=[1, 2], dims="town").eval().shape == (2,)
        assert pm.Normal("n2", mu=[1, 2], dims=["town"]).eval().shape == (2,)

@ricardoV94
Member

ricardoV94 commented Aug 15, 2022

Perhaps the variable mu is being resampled by mistake. #5973 could help

@jessegrabowski
Member

As an additional test, I changed the mean of the prior distribution I put over the estimated parameters. It seems that, when using model.add_coord, pm.sample_posterior_predictive actually samples from the prior. Here are my tests:

# Model 3: Within model
with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=3, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)
    print(within_post.posterior_predictive.obs.mean())
    >>> 2.83320012

with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=-10, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)
    print(within_post.posterior_predictive.obs.mean())
    >>> -9.98125202

No idea why that would be happening, but it seems to suggest it's not a shape or broadcasting issue. Instead, when a coord is a SharedVariable, the prior is being sampled as if it were the posterior.

@jessegrabowski
Member

jessegrabowski commented Aug 20, 2022

Here are some more thorough outputs to show that pm.sample_posterior_predictive is sampling from the prior for all variables, and that the trigger is a shared variable on the shape:

with pm.Model() as model:
    three = aesara.shared(3)
    mu = pm.Normal('mu', mu=-10, sigma=10, shape=(three, ))
    sigma = pm.HalfNormal('sigma', sigma=11, shape=(three, ))
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    trace = pm.sample()
    post = pm.sample_posterior_predictive(trace, var_names=['mu', 'sigma', 'obs'])

Compare posterior and posterior_predictive for mu:

# Posterior
print(trace.posterior.mu.mean(dim=['chain', 'draw']).values)
print(trace.posterior.mu.std(dim=['chain', 'draw']).values)

# Output
# [3.01789945 5.26561783 8.11427295]
# [0.03474686 0.20847781 0.29282972]

# Posterior Predictive
print(post.posterior_predictive.mu.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.mu.std(dim=['chain', 'draw']).values)

# Output:
# [-10.33329768  -9.96841615  -9.62960897]
# [ 9.6122322  10.23897464 10.21420589]

And for sigma:

# Posterior
print(trace.posterior.sigma.mean(dim=['chain', 'draw']).values)
print(trace.posterior.sigma.std(dim=['chain', 'draw']).values)
# Output 
# [1.08169416 6.39748516 9.09000896]
# [0.02457604 0.14622595 0.20021742]

# Posterior Predictive
print(post.posterior_predictive.sigma.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.sigma.std(dim=['chain', 'draw']).values)

# Output
# [9.01756853 8.45109316 8.63402421]
# [6.54332978 6.36108147 6.72434589]

@lucianopaz
Member

lucianopaz commented Aug 20, 2022

Yeah, we hadn’t thought about the shared-value coordinates when we designed the volatility propagation rule in sample_posterior_predictive. We will have to change it slightly. If the user supplies an InferenceData object, we can retrieve the constant data group and the coord values from the posterior group, and check whether the tensor values have changed by the time we call sample_posterior_predictive. If they have changed, we mark them as volatile; if they haven’t, we don’t. That should fix this issue.
The problem I see is what we should do when the input isn’t an InferenceData object. Maybe we should assume that coords are not volatile but other tensors are?
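
A rough sketch of the kind of check being proposed (illustrative only; is_volatile and the exact lookup are hypothetical, not actual PyMC code):

import numpy as np

def is_volatile(shared_var, idata):
    """Treat a shared variable as volatile only if its current value differs
    from the value recorded in the InferenceData at inference time."""
    name = shared_var.name
    if hasattr(idata, "constant_data") and name in idata.constant_data:
        recorded = idata.constant_data[name].values
        return not np.array_equal(shared_var.get_value(), recorded)
    # No recorded value to compare against: keep the current defensive default
    return True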

@ricardoV94
Member

ricardoV94 commented Aug 20, 2022

This is not a shape problem per se. The question is what should the default be when any input to a model variable is a shared variable.

with pm.Model() as m:
  mu = pm.MutableData("mu", 0)
  x = pm.Normal("x", mu)
  y = pm.Normal("y, x, observed=[10, 10, 10])

  idata = pm.sample()
  pp = pm.sample_posterior_predictive(idata)

In that example x will be resampled from the prior again, and so will be y by association, which has nothing to do with posterior predictive.

The reason we did these changes was to resample deterministics in GLM models where you have mu=x@beta and x is mutable. We should check if it's possible to restrict the volatile logic to deterministics above the likelihood or variables in var_names, and not to all variables in the model by default.
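
For context, here is a minimal sketch of the GLM situation that motivated the current behaviour (illustrative data and names, not from this issue): x is MutableData, so the deterministic mu = x @ beta should be recomputed for new data, while beta itself still comes from the posterior draws.

import numpy as np
import pymc as pm

x_train = np.random.normal(size=(100, 2))
y_train = x_train @ np.array([1.0, -2.0]) + np.random.normal(scale=0.1, size=100)

with pm.Model() as glm:
    x = pm.MutableData("x", x_train)
    beta = pm.Normal("beta", 0, 1, shape=2)
    mu = pm.Deterministic("mu", x @ beta)
    pm.Normal("y", mu, sigma=0.1, observed=y_train)

    idata = pm.sample()
    # Swapping x makes mu volatile (it gets recomputed), while beta is taken from the trace.
    pm.set_data({"x": np.random.normal(size=(100, 2))})
    pp = pm.sample_posterior_predictive(idata, var_names=["mu"])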

@ricardoV94 ricardoV94 changed the title Using the model.add_coord method appears to break pm.sample_posterior_predictive Variables with shared inputs are always resampled from the prior in sample_posterior_predictive Aug 20, 2022
@jbh1128d1

I'm also having issues getting out-of-sample predictions when the coords change for the out-of-sample data. Any progress on this issue?

@ricardoV94
Member

For now, the solution is to not use MutableData for inputs of unobserved distributions.
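
In other words, something along these lines (a sketch reusing the toy data from the original example): keep the coords that define unobserved parameter dimensions immutable, and use MutableData only for the observed data you intend to swap out later.

import numpy as np
import pymc as pm

data = np.random.normal(loc=[3, 5, 8], scale=[1.1, 6.3, 9.1], size=(1000, 3))

with pm.Model(coords={"name": ["A", "B", "C"]}) as model:  # immutable dims for the parameters
    obs_data = pm.MutableData("obs_data", data)
    mu = pm.Normal("mu", 0, 10, dims="name")
    sigma = pm.HalfNormal("sigma", 10, dims="name")
    pm.Normal("obs", mu, sigma, observed=obs_data)

    idata = pm.sample()
    # mu and sigma do not depend on any shared variables, so they are drawn
    # from the posterior here instead of being resampled from the prior.
    ppc = pm.sample_posterior_predictive(idata)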

@ricardoV94 ricardoV94 pinned this issue Aug 30, 2022
@jbh1128d1

Does that mean not to use pm.MutableData(....) or not to use pm.Data(...., mutable = True)? Can I use the latter with add_coords method?

@michaelosthege
Member

Does that mean not to use pm.MutableData(....) or not to use pm.Data(...., mutable = True)? Can I use the latter with add_coords method?

pm.MutableData(....) is just a shortcut for pm.Data(...., mutable = True).

Check their implementation to see how they call add_coords under the hood..
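
For illustration, these two spellings should be interchangeable (a tiny sketch; the values are arbitrary):

import pymc as pm

with pm.Model() as m:
    a = pm.MutableData("a", [1.0, 2.0, 3.0])
    b = pm.Data("b", [1.0, 2.0, 3.0], mutable=True)  # the same thing, written out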

@jbh1128d1

I apologize as I'm still confused. Am I understanding correctly, if I fit a model with coords, then I can't make out of sample predictions with coords that go with that sample?

@jbh1128d1

Has there been any forward movement on this? I'm still not sure how to make predictions with new data without getting an error using the new coords.

@sp-leonardo-alcivar

Any more advances in the sample posterior problem? I'd need to add variable coords and I don't find any workaround for this issue.

@jcstucki
Author

Hi,
Is this the volatile logic in question? L1698 in sampling.py

Looking at the source and being a complete newbie when it comes to development and advanced python code, here are my thoughts:

It depends on whether the nodes from fg = FunctionGraph(outputs=outputs, clone=False) are named or not. If they are, it should be easy enough to filter the nodes by the coord names after the volatility logic, just like is done in the return value of compile_forward_sampling_function with volatile_nodes - set(givens_dict). I am only somewhat familiar with set, but it seems that subtracting set(coord_named_shared_variables) would work as a solution?

The reason we did these changes was to resample deterministics in GLM models where you have mu=x@beta and x is mutable. We should check if it's possible to restrict the volatile logic to deterministics above the likelihood or variables in var_names, and not to all variables in the model by default.

It looks like var_names gets filled with observed_rvs + auto_deterministics; I'm not sure at what point an unobserved_rv becomes an observed_rv. Possibly after sampling? If so, it seems var_names just needs to be filtered by the coord names?

If the goal is to update the coords via a shared variable, does that actually mean we need to resample, since the data is being changed? My intuition says no, but there would be a loss of data(?).

If they're not named, SharedVariables could be tagged with a name attribute, though I have no idea whether that's possible. I just looked, and it seems SharedVariables are an aesara thing. Would it be possible to tag SharedVariables with a name attribute anyway?

Yeah, we hadn’t thought about the shared-value coordinates when we designed the volatility propagation rule in sample_posterior_predictive. We will have to change it slightly. If the user supplies an InferenceData object, we can retrieve the constant data group and the coord values from the posterior group, and check whether the tensor values have changed by the time we call sample_posterior_predictive. If they have changed, we mark them as volatile; if they haven’t, we don’t. That should fix this issue. The problem I see is what we should do when the input isn’t an InferenceData object. Maybe we should assume that coords are not volatile but other tensors are?

I like this solution too, I'm just not advanced enough to implement it. I checked out the inference data from pm.sample, but I'm not seeing constant data. Is this in the actual InferenceData object? I see that's ArviZ.

Thanks all.

@lucianopaz
Member

Hi, Is this the volatile logic in question? L1698 in sampling.py

Looking at the source and being a complete newbie when it comes to development and advanced python code, here are my thoughts:

It depends on whether the nodes from fg = FunctionGraph(outputs=outputs, clone=False) are named or not. If they are, it should be easy enough to filter the nodes by the coord names after the volatility logic, just like is done in the return value of compile_forward_sampling_function with volatile_nodes - set(givens_dict). I am only somewhat familiar with set, but it seems that subtracting set(coord_named_shared_variables) would work as a solution?

Hi @jcstucki, yes, that’s the function with the volatile logic. The problem that prevents doing what you suggest is that mutable coordinates might or might not have changed between having done inference and trying to make predictions. If the coordinates did change, then they should be volatile. If they did not change, they shouldn’t.
At the moment, the code is naive and defensive. It assumes that any shared variable might have changed, so its descendants should also be volatile. What you are proposing is to do the opposite: assume that they never changed. To get the correct behaviour, we will need to actually detect a change between inference and prediction, and use that to mark the changed shared variables as volatile.

I like this solution too, I'm just not advanced enough to implement it. I checked out the inference data from pm.sample, but I'm not seeing constant data, is this in the actual InferenceData object? I see that's arviz.

Yes, it’s an arviz inference data object. The code that generates it is in the backends folder, under arviz (can’t link because I’m on my phone). Anyway, we intend to get around to fixing this next week.

The workaround at the moment is to use ConstantData objects for inference and retrodiction. Then, when you want to do predictions on new data, you will have to create a new model instance using MutableData, or potentially different RV names, so that the values in the trace are ignored when appropriate. Regrettably, I can’t explain how to do this without seeing your model code.
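
A generic sketch of that workaround on a simplified regression (hypothetical names, not your model): fit with ConstantData, then build a second model that reuses the parameter names but swaps in MutableData for prediction.

import numpy as np
import pymc as pm

x_train = np.random.normal(size=100)
y_train = 2 * x_train + np.random.normal(scale=0.5, size=100)

with pm.Model() as train_model:
    x = pm.ConstantData("x", x_train)
    b = pm.Normal("b", 0, 1)
    pm.Normal("y", b * x, 0.5, observed=y_train)
    idata = pm.sample()

with pm.Model() as pred_model:
    x = pm.MutableData("x", np.linspace(-3, 3, 50))
    b = pm.Normal("b", 0, 1)        # same name, so the posterior draws of b are reused
    y = pm.Normal("y", b * x, 0.5)  # no observed: sampled forward, conditioned on b
    preds = pm.sample_posterior_predictive(idata, var_names=["y"])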

@ricardoV94
Member

ricardoV94 commented Sep 25, 2022

It's still unclear what the goal of the first example in the issue is with regard to mutable coords of unobserved variables. If the idea is to sample, then change the coords to e.g. ["A", "B", "C", "D"], and hope the model gets one extra dimension while the entries corresponding to the old ["A", "B", "C"] stay intact, that's NOT going to happen. All dimensions would be resampled from the prior.

If the goal is to just reuse a model for the whole sampling+predictions without changing coords in between then a solution like what @lucianopaz proposed should be fine.

To extend a model one could instead add two groups of variables, one with fixed coords for sampling and one with mutable ones for predictions, and stack the two together. It's possible that you could give the prediction variables an initial size of zero (not sure about coords but it should be feasible) and then "grow" those for posterior predictive?

@jessegrabowski
Member

Having looked at a couple of cases where this problem occurs, the main culprit is the heavy use of index mappings to "expand" estimated parameters to match the data. While it's true that the number of groups will not change (we cannot add "D"), the mapping between unobserved variables (e.g. intercepts) and rows of the observed data does change between inference and prediction time. One can imagine that the "training data" came from groups A, A, B, B, C, C, but we only want to predict for a single observation from group B. Given a usual setup, like:

group_idx, _ = pd.factorize(data.index)

with pm.Model() as model:
  group_map = pm.Data('group_idx', group_idx, mutable=True)
  intercept = ...  # non-centered intercept prior
  mu = intercept[group_map]
  # likelihood, etc.

The user's expectation is that the index map group_idx will need to change to match new data, but this will in turn render the entire prior hierarchy over intercept volatile.

As I pointed out in a thread on the discourse, this can be avoided by working with design matrices. But I would hazard a guess that close to 100% of the example notebooks and video tutorials floating around out there use index mappings rather than design matrices. It means that people will commonly run into this issue if they are following along with e.g. the Radon example and try to extend it to prediction using the set_data API. We've already seen it come up independently several times.
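
To make the setup concrete, here is a fuller, illustrative version of the pattern above (assumed toy data, hypothetical names); whether the intercepts end up volatile is exactly what is discussed in the replies below.

import numpy as np
import pandas as pd
import pymc as pm

data = pd.DataFrame({"y": np.random.normal(size=6)}, index=list("AABBCC"))
group_idx, groups = pd.factorize(data.index)

with pm.Model(coords={"group": groups}) as model:
    group_map = pm.Data("group_idx", group_idx, mutable=True)
    intercept = pm.Normal("intercept", 0, 1, dims="group")
    mu = pm.Deterministic("mu", intercept[group_map])
    pm.Normal("y", mu, 1, observed=data["y"])

    idata = pm.sample()
    # Predict the group mean for a single new observation from group "B":
    pm.set_data({"group_idx": [1]})
    preds = pm.sample_posterior_predictive(idata, var_names=["mu"])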

@ricardoV94
Member

ricardoV94 commented Sep 25, 2022

@jessegrabowski what's the problem with index-based designs? Indexing operations (and their coords) are free to change for prediction in most examples I've seen (i.e. mu = b[idx] + x*m[idx]); the problem is when you want to change the coordinates of the unobserved variables (b, m).

@lucianopaz
Member

group_idx, _ = pd.factorize(data.index)

with pm.Model() as model:
  group_map = pm.Data('group_idx', group_idx, mutable=True)
  # non-centered intercept prior
  mu = intercept[group_map]
  # likelihood, etc.

The user's expectation is that the index map group_idx will need to change to match new data, but this will in turn render the entire prior hierarchy over intercept volatile.

This is the exact use case that we had in mind while thinking about volatility. The intercept random variable won’t be volatile because it doesn’t depend on the mutable data. Instead the deterministic mu variable will be volatile and its value will be recomputed using the group_map.

The case that we didn’t consider was when the dimensionality of the intercept random variable was determined by a shared variable. Just to make this point clear, the design matrix approach would also suffer there because you would have a design matrix multiplied by a variable sized vector.

As pointed out by @ricardoV94, the hardest thing to automate would be to determine which entries of the intercept should be taken from the prior and which from the posterior. Although difficult, aesara rewrites do enable us to do this, but I feel that it’s safer to begin only with a smarter default volatility for shared variables.

@lucianopaz
Member

To handle the mixed in and out of sample indexes, we would mainly have to rewrite the intercept_rv. The super simplified and pseudocodish way of doing this would look something like this

out_of_sample_rv = intercept.clone()  # with the correct resizing and matching inputs
new_intercept = at.zeros(the_size_and_dtype)
new_intercept = at.set_subtensor(new_intercept[in_sample_idx], intercept)
new_intercept = at.set_subtensor(new_intercept[out_of_sample_idx], out_of_sample_rv)

@jessegrabowski
Member

Well I went and tested it carefully, and @ricardoV94 and @lucianopaz, you're totally right. I will slink off back to my hole of shame. I now agree that this is a problem of communicating best practices, i.e. making clear distinctions between fixed hierarchical maps and flexible data maps, rather than an actual bug. Since the case of "mapping indices to deterministic variables" was clearly covered, I'm inclined to agree with @ricardoV94 that I can't think of a time when it's reasonable to expect the parameter shapes to change. Perhaps this is just a lack of imagination. But let it be known that my comments were based on a fundamental misunderstanding of the situation.

I feel bad for spewing a wall of text at that poor guy on the discourse rather than just telling him he needs to fix a couple variables.

The only other thing I'd say is that maybe there should be a warning raised by pm.sample_posterior_predictive if there are SharedVariables on any distribution sizes that lets the user know this can result in unexpected outcomes.

@lucianopaz
Member

No shame at all @jessegrabowski! I think that the whole point of open source development is to have engaged people, like yourself, speak up and iterate as a community. When I started to contribute to pymc3, I was quite clueless and made some horrendous pieces of work, like #2991. Thanks to the team and the community I learnt a lot and kept trying to do stuff and improve. I really don’t want you to lose confidence because of a slight misunderstanding.
About warnings and logs, @ricardoV94 recently merged a pr to log the names of the variables that are being resampled. Do you think that’s good enough?

@jessegrabowski
Member

Yeah I'd say so. It's come up enough lately that something should be explicitly stated when you sample the ppc. I think a little communication can go a long way here.

@ricardoV94
Member

ricardoV94 commented Sep 26, 2022

This is the PR @lucianopaz mentioned about making posterior predictive more transparent: #6142

About the feelings of shame, I heard the only way to get rid of them is to open a PR that closes an open issue. Maybe two :P

But honestly, misunderstandings tell us something very useful about how people are trying to use and misuse our tools so that we can try to make it better/more clear. Posterior predictive is clearly an area we have to document better (see this recent issue for example pymc-devs/pymc-examples#418)

@twiecki
Member

twiecki commented Oct 11, 2022 via email

@jessegrabowski
Member

What is the consensus on warnings/messages? It seems like there was at least some move to reduce their number from v3 to v4 (I'm thinking about how a warning is no longer given when r_hat are large after sampling, for example).

Would a warning on pm.sample_posterior_predictive that says something like "The shapes of the following random variables have changed, and will be sampled using their prior distributions: ..." be too "spammy" under the prevailing consensus on warnings?

@lucianopaz
Member

Would a warning on pm.sample_posterior_predictive that says something like "The shapes of the following random variables have changed, and will be sampled using their prior distributions: ..." be too "spammy" under the prevailing consensus on warnings?

Do you think that the information that is logged after this pr isn’t enough?

@ricardoV94
Member

ricardoV94 commented Oct 11, 2022

and will be sampled using their prior distributions

That's not entirely correct. You may be resampling intermediate variables, still conditioned on the values of inferred hyperparameters.

Anyway, I think that logging the variables names as we do now removes any surprises. I don't think we need to tell users in words why those are being resampled.

@jessegrabowski
Member

I haven't experienced the new changes after #6142, but it seems like everyone is happy with where things are now. So the direct answer to @twiecki is "No" : )

@lucianopaz
Member

Actually, I think that it's important for you, @jessegrabowski, to try the new logging. Ricardo and I were the ones that discussed the changes in #6142, so our opinion is quite biased. Let us know if you think that those logs are enough or if you feel that they can be improved.

@bersavosh

Hi all, my apologies for the naive question - I don't think I understand why posterior predictive would ever sample from priors. I checked some of the referenced pulls and discussions, but I don't think I follow why. I.e., in what circumstance would that be valid and how is the user informed/warned about this?

E.g., here's my example (which is basically similar to the toy examples discussed above) where I was expecting the posterior predictive to be in line with posterior rather than prior. I'd appreciate any clarification/help.

Thanks!

(PyMC version 5.5.0)

Data

(figure: plot of the observed data)

Model

def line_model(x, y, dy):
    with pm.Model() as model:
        data_x = pm.ConstantData(name='data x',value=x)
        data_y = pm.ConstantData(name='data y',value=y)
        data_dy = pm.ConstantData(name='data dy',value=dy)
        m = pm.Uniform('m',-2,4)
        d = pm.Uniform('d',-10,10)
        resp_y = pm.Normal('Y', mu=m*data_x+d, sigma=data_dy, observed=data_y)
    return model


applied_model = line_model(x,y,dy)

Sampling

with applied_model:
    ## Prior
    prior = pm.sample_prior_predictive(1000,var_names=['m','d','Y'])
    applied_model_idata = prior
    ## NUTS
    hmc = pm.sample(tune=500,draws=1000, cores=4, chains=4)
    applied_model_idata.extend(hmc)
    ## Posterior
    post = pm.sample_posterior_predictive(applied_model_idata,var_names=['m','d','Y'])
    applied_model_idata.extend(post)

distribution for variable $m$

fig = plt.figure(figsize=(6,4))
ax = fig.add_subplot(1,1,1)
ax.hist(applied_model_idata.prior['m'][0],label='prior',histtype='step',lw=2,bins=30);
ax.hist(applied_model_idata.posterior['m'][0],label='posterior',histtype='step',lw=2);
ax.hist(applied_model_idata.posterior_predictive['m'][0],label='posterior predictive',histtype='step',lw=2,bins=30);
ax.set_xlabel('m')
ax.set_ylabel('#samples/bin')
ax.legend()

(figure: overlaid histograms of m from the prior, posterior, and posterior predictive groups)

@ricardoV94
Member

ricardoV94 commented Jul 4, 2023

Hi all, my apologies for the naive question - I don't think I understand why posterior predictive would ever sample from priors. I checked some of the referenced pulls and discussions, but I don't think I follow why.

@bersavosh the question is not naive at all. This is something we are trying to convey more clearly, as it's a source of confusion even among experienced developers.

var_names selects all the variables that should be sampled (or resampled). These variables are always sampled from the forward random function, conditioned on any variables that are in the trace and not themselves being sampled. This corresponds to the prior when you are sampling root variables (without any other RV inputs), and it's equivalent to what you would get when you sample a new variable that is not in the trace (either because you have an expanded model, or because you manually deleted them from the trace).

The subtle point is that this is the only thing you could do when you are asking to sample variables using the forward function. Otherwise your trace already includes your posterior draws, and the only other thing sample_posterior_predictive could do for those variables would be to copy them, which is not a very useful thing?

I.e., in what circumstance would that be valid and how is the user informed/warned about this?

This is arguably not useful for root variables, but it makes sense for intermediate variables. For instance selecting the group level variables to be resampled is a valid way of asking what the model predicts new group elements to look like.

The user is informed about this from the logging message, which says something like Sampling: [var1, var2, ...]. All the variables in this list are being sampled via the forward function, conditioned on any other ancestors that are in the trace. There have been some issues for VSCode users, where this logging message is annoyingly suppressed.


sample_prior_predictive is the limit case of sample_posterior_predictive when you have no variables in the trace or selected all variables in var_names. That's why sometimes posterior_predictive starts to behave like prior_predictive. The underlying machinery is exactly the same.
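
Applied to the example above, that means requesting only the observed variable keeps m and d fixed at their posterior draws (same model, only var_names changes):

with applied_model:
    # Only Y is sampled forward; m and d are taken from the posterior draws.
    post = pm.sample_posterior_predictive(applied_model_idata, var_names=['Y'])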

@ricardoV94
Member

Opened an issue to develop more comprehensive documentation: #6812

@bersavosh

@ricardoV94 Thank you for the explanation, this is very helpful. I'll keep an eye on the issue you just opened. :-)
