
Variables with shared inputs are always resampled from the prior in sample_posterior_predictive #6047


Closed
jcstucki opened this issue Aug 10, 2022 · 36 comments · Fixed by #6147

@jcstucki

jcstucki commented Aug 10, 2022

Description of your problem

Using the model.add_coord method appears to break pm.sample_posterior_predictive.

Here’s a simple example, estimating the loc and scale of three independent normals:


import numpy as np
import pandas as pd
import pymc as pm

# Generate data
data = np.random.normal(loc=np.array([3, 5, 8]), scale=np.array([1.1, 6.3, 9.1]), size=(1000, 3))

# Model 1: No coords
with pm.Model() as no_coords_model:
    mu = pm.Normal('mu', mu=0, sigma=10, size=3)
    sigma = pm.HalfNormal('sigma', sigma=10, size=3)
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    no_coords_trace = pm.sample()
    no_coords_post = pm.sample_posterior_predictive(no_coords_trace)

# Model 2: Context manager
coords = {'name': ['A', 'B', 'C']}
with pm.Model(coords=coords) as context_model:
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    
    context_trace = pm.sample()
    context_post = pm.sample_posterior_predictive(context_trace)

# Model 3: Within model
with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=0, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)



traces = [no_coords_trace, context_trace, within_trace]
mus = [trace.posterior.mu.values[..., i].mean() for trace in traces for i in range(3)]
sigmas = [trace.posterior.sigma.values[..., i].mean() for trace in traces for i in range(3)]
post_df = pd.DataFrame(np.c_[mus, sigmas], columns=['mu', 'sigma'], index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'], ['A', 'B', 'C']]))
print(post_df.unstack(1).to_string())

                 mu                         sigma                    
                  A         B         C         A         B         C
context    2.977460  4.982624  7.826642  1.081710  6.287514  9.165928
no_coords  2.976785  4.984743  7.827109  1.081657  6.289910  9.174939
within     2.976568  4.990646  7.825051  1.081552  6.286198  9.167916


pps = [no_coords_post, context_post, within_post]
mean_value = [post.posterior_predictive.obs.values[..., i].mean() for post in pps for i in range(3)]
post_df = pd.DataFrame(mean_value, columns=['mean_ppc'], index=pd.MultiIndex.from_product([['no_coords', 'context', 'within'], ['A', 'B', 'C']]))

           mean_ppc                    
                  A         B         C
context    2.977167  4.985852  7.825006
no_coords  2.976837  4.982244  7.818495
within    -0.045788 -0.594845 -0.270400

"The dims on within_post are the same as the others, but it seems like totally wrong values are getting sampled. It is telling that the mean of each distribution is not the mean of the means, suggesting it’s not a case of correct values being shuffled."

This is pretty breaking when attempting to do out-of-sample predictions, since the coords need to be set somehow, and (to my knowledge) they can't be re-added to the model context after instantiation.

Versions and main components

  • PyMC/PyMC3 Version: 4.1.3
  • Aesara/Theano Version: 2.7.7
  • Python Version: 3.8
  • Operating system: Mac OS
  • How did you install PyMC/PyMC3: pip
@michaelosthege
Member

This sounds serious!

If you look at the control flow of Model.__init__(coords=...) you can see that internally it gets routed into Model.add_coords and thereby Model.add_coord.
But in the example you posted, there's a mutable=True setting which creates the difference.
Via Model(coords=...) the dimension will be immutable by default.

So I would hypothesize that this is an issue with mutable dims.
And since the likelihood is broadcast from (3,) to the observed shape of (1000, 3), this is probably (yet another) broadcasting bug.

👉 Does it go away if you pass mutable=False in the third example?

👉 What if instead of size=3 in the first example you do three = aesara.shared(3) and size=three?

👉 What if in the third example you manually broadcast the parameters already? (ll = pm.Normal('obs', mu=mu[None, :], sigma=sigma[None, :], observed=data))

@jessegrabowski
Member

jessegrabowski commented Aug 15, 2022

Reporting back on the three suggestions:

  1. Passing mutable=False in the third example fixes the bug.
  2. Passing a shared variable to size in the first example causes the bug.
  3. Manually broadcasting does not fix the bug, at least not by expanding on the first dimension.

A note on (2). This code snippet:

three = aesara.shared(3)
pm.Normal('normal', size=three)

raised a TypeError for me on version 4.1.4 (Windows). I instead had to pass the size in as a 1-tuple (size=(three, )). I don't know if this is relevant. It doesn't appear to be in simple tests of pm.Normal(...).eval(), but I still thought it worth mentioning.
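
For reference, here is a minimal sketch of that workaround (assuming PyMC 4.1.x with aesara; pm.Normal.dist is only used to build the variable outside a model context):

import aesara
import pymc as pm

three = aesara.shared(3)

# Passing the shared scalar directly (size=three) raised a TypeError for me;
# wrapping it in a 1-tuple works:
x = pm.Normal.dist(mu=0, sigma=1, size=(three,))
print(x.eval().shape)  # expected: (3,)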

@michaelosthege
Member

Thanks @jessegrabowski, I'd say this confirms my hypothesis: It's a bug related to broadcasted, symbolic sizes.
@pymc-devs/dev-core can someone familiar with aeppl take a look at this? It does sound a little dangerous..

I instead had to pass the size in as a 1-tuple (size=(three, )). I don't know if this is relevant. It doesn't appear to be in simple tests of pm.Normal(...).eval(), but I still thought it worth mentioning.

This sounds like a smaller bug whereby a shared variable is not automatically wrapped in a tuple.
This is the test case where we're testing these "lazy" flavors of passing scalars. The test doesn't cover passing a (symbolic) scalar to size yet.

def test_lazy_flavors(self):
    assert pm.Uniform.dist(2, [4, 5], size=[3, 2]).eval().shape == (3, 2)
    assert pm.Uniform.dist(2, [4, 5], shape=[3, 2]).eval().shape == (3, 2)
    with pm.Model(coords=dict(town=["Greifswald", "Madrid"])):
        assert pm.Normal("n1", mu=[1, 2], dims="town").eval().shape == (2,)
        assert pm.Normal("n2", mu=[1, 2], dims=["town"]).eval().shape == (2,)

@ricardoV94
Member

ricardoV94 commented Aug 15, 2022

Perhaps the variable mu is being resampled by mistake. #5973 could help

@jessegrabowski
Member

As an additional test, I changed the mean of the prior distribution I put over the estimated parameters. It seems that, when using model.add_coord, pm.sample_posterior_predictive actually samples from the prior. Here are my tests:

# Model 3: Within model
with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=3, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)
    print(within_post.posterior_predictive.obs.mean())
    >>> 2.83320012

with pm.Model() as within_model:
    within_model.add_coord('name', ['A', 'B', 'C'], mutable=True)
    mu = pm.Normal('mu', mu=-10, sigma=10, dims=['name'])
    sigma = pm.HalfNormal('sigma', sigma=10, dims=['name'])
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    within_trace = pm.sample()
    within_post = pm.sample_posterior_predictive(within_trace)
    print(within_post.posterior_predictive.obs.mean())
    >>> -9.98125202

No idea why that would be happening, but it seems to suggest it's not a shape or broadcasting issue. Instead, when a coord is a SharedVariable, the prior is being sampled as if it were the posterior.

@jessegrabowski
Member

jessegrabowski commented Aug 20, 2022

Here are some more thorough outputs to show that pm.sample_posterior_predictive is sampling from the prior for all variables, and that the trigger is a shared variable on the shape:

with pm.Model() as model:
    three = aesara.shared(3)
    mu = pm.Normal('mu', mu=-10, sigma=10, shape=(three, ))
    sigma = pm.HalfNormal('sigma', sigma=11, shape=(three, ))
    
    ll = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    trace = pm.sample()
    post = pm.sample_posterior_predictive(trace, var_names=['mu', 'sigma', 'obs'])

Compare posterior and posterior_predictive for mu:

# Posterior
print(trace.posterior.mu.mean(dim=['chain', 'draw']).values)
print(trace.posterior.mu.std(dim=['chain', 'draw']).values)

# Output
# [3.01789945 5.26561783 8.11427295]
# [0.03474686 0.20847781 0.29282972]

# Posterior Predictive
print(post.posterior_predictive.mu.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.mu.std(dim=['chain', 'draw']).values)

# Output:
# [-10.33329768  -9.96841615  -9.62960897]
# [ 9.6122322  10.23897464 10.21420589]

And for sigma:

# Posterior
print(trace.posterior.sigma.mean(dim=['chain', 'draw']).values)
print(trace.posterior.sigma.std(dim=['chain', 'draw']).values)
# Output 
# [1.08169416 6.39748516 9.09000896]
# [0.02457604 0.14622595 0.20021742]

# Posterior Predictive
print(post.posterior_predictive.sigma.mean(dim=['chain', 'draw']).values)
print(post.posterior_predictive.sigma.std(dim=['chain', 'draw']).values)

# Output
# [9.01756853 8.45109316 8.63402421]
# [6.54332978 6.36108147 6.72434589]

@lucianopaz
Member

lucianopaz commented Aug 20, 2022

Yeah, we hadn’t thought about the shared-value coordinates when we designed the volatility propagation rule in sample_posterior_predictive. We will have to change it slightly. If the user supplies an InferenceData object, we can retrieve the constant data group and the coord values from the posterior group, and check whether the tensor values have changed by the time we call sample_posterior_predictive. If they have changed, we mark them as volatile; if they haven’t, we don’t. That should fix this issue.
The problem I see is what we should do when the input isn’t an InferenceData object. Maybe we should assume that coords are not volatile but other tensors are?
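
A rough sketch of the kind of check being proposed (illustrative only; is_volatile and the exact lookup are hypothetical, not actual PyMC code):

import numpy as np

def is_volatile(shared_var, idata):
    """Treat a shared variable as volatile only if its current value differs
    from the value recorded in the InferenceData at inference time."""
    name = shared_var.name
    if hasattr(idata, "constant_data") and name in idata.constant_data:
        recorded = idata.constant_data[name].values
        return not np.array_equal(shared_var.get_value(), recorded)
    # No recorded value to compare against: keep the current defensive default
    return True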

@ricardoV94
Member

ricardoV94 commented Aug 20, 2022

This is not a shape problem per se. The question is what should the default be when any input to a model variable is a shared variable.

with pm.Model() as m:
  mu = pm.MutableData("mu", 0)
  x = pm.Normal("x", mu)
  y = pm.Normal("y, x, observed=[10, 10, 10])

  idata = pm.sample()
  pp = pm.sample_posterior_predictive(idata)

In that example x will be resampled from the prior again, and so will be y by association, which has nothing to do with posterior predictive.

The reason we did these changes was to resample deterministics in GLM models where you have mu=x@beta and x is mutable. We should check if it's possible to restrict the volatile logic to deterministics above the likelihood or variables in var_names, and not to all variables in the model by default.
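
For context, here is a minimal sketch of the GLM situation that motivated the current behaviour (illustrative data and names, not from this issue): x is MutableData, so the deterministic mu = x @ beta should be recomputed for new data, while beta itself still comes from the posterior draws.

import numpy as np
import pymc as pm

x_train = np.random.normal(size=(100, 2))
y_train = x_train @ np.array([1.0, -2.0]) + np.random.normal(scale=0.1, size=100)

with pm.Model() as glm:
    x = pm.MutableData("x", x_train)
    beta = pm.Normal("beta", 0, 1, shape=2)
    mu = pm.Deterministic("mu", x @ beta)
    pm.Normal("y", mu, sigma=0.1, observed=y_train)

    idata = pm.sample()
    # Swapping x makes mu volatile (it gets recomputed), while beta is taken from the trace.
    pm.set_data({"x": np.random.normal(size=(100, 2))})
    pp = pm.sample_posterior_predictive(idata, var_names=["mu"])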

@ricardoV94 ricardoV94 changed the title Using the model.add_coord method appears to break pm.sample_posterior_predictive Variables with shared inputs are always resampled from the prior in sample_posterior_predictive Aug 20, 2022
@jbh1128d1

I'm also having issues getting out-of-sample predictions when the coords change for the out-of-sample data. Any progress on this issue?

@ricardoV94
Member

For now, the solution is to not use MutableData for inputs of unobserved distributions.
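
In other words, something along these lines (a sketch reusing the toy data from the original example): keep the coords that define unobserved parameter dimensions immutable, and use MutableData only for the observed data you intend to swap out later.

import numpy as np
import pymc as pm

data = np.random.normal(loc=[3, 5, 8], scale=[1.1, 6.3, 9.1], size=(1000, 3))

with pm.Model(coords={"name": ["A", "B", "C"]}) as model:  # immutable dims for the parameters
    obs_data = pm.MutableData("obs_data", data)
    mu = pm.Normal("mu", 0, 10, dims="name")
    sigma = pm.HalfNormal("sigma", 10, dims="name")
    pm.Normal("obs", mu, sigma, observed=obs_data)

    idata = pm.sample()
    # mu and sigma do not depend on any shared variables, so they are drawn
    # from the posterior here instead of being resampled from the prior.
    ppc = pm.sample_posterior_predictive(idata)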

@ricardoV94 ricardoV94 pinned this issue Aug 30, 2022
@jbh1128d1

Does that mean not to use pm.MutableData(....) or not to use pm.Data(...., mutable = True)? Can I use the latter with add_coords method?

@michaelosthege
Member

Does that mean not to use pm.MutableData(....) or not to use pm.Data(...., mutable = True)? Can I use the latter with add_coords method?

pm.MutableData(....) is just a shortcut for pm.Data(...., mutable = True).

Check their implementation to see how they call add_coords under the hood..
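
For illustration, these two spellings should be interchangeable (a tiny sketch; the values are arbitrary):

import pymc as pm

with pm.Model() as m:
    a = pm.MutableData("a", [1.0, 2.0, 3.0])
    b = pm.Data("b", [1.0, 2.0, 3.0], mutable=True)  # the same thing, written out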

@jbh1128d1

I apologize as I'm still confused. Am I understanding correctly, if I fit a model with coords, then I can't make out of sample predictions with coords that go with that sample?

@jbh1128d1

Has there been any forward movement on this? I'm still not sure how to make predictions with new data without getting an error using the new coords.

@sp-leonardo-alcivar

Any more advances in the sample posterior problem? I'd need to add variable coords and I don't find any workaround for this issue.

@jcstucki
Author

Hi,
Is this the volatile logic in question? L1698 in sampling.py

Looking at the source and being a complete newbie when it comes to development and advanced python code, here are my thoughts:

It depends on whether the nodes from fg = FunctionGraph(outputs=outputs, clone=False) are named or not. If they are, it should be easy enough to filter the nodes by the coord names after the volatility logic, just like is done in the return value of compile_forward_sampling_function with volatile_nodes - set(givens_dict). I am only somewhat familiar with set, but it seems that subtracting set(coord_named_shared_variables) would work as a solution?

The reason we did these changes was to resample deterministics in GLM models where you have mu=x@beta and x is mutable. We should check if it's possible to restrict the volatile logic to deterministics above the likelihood or variables in var_names, and not to all variables in the model by default.

It looks like var_names gets filled with observed_rvs + auto_deterministics; I'm not sure at what point an unobserved_rv becomes an observed_rv. Possibly after sampling? If so, it seems var_names just needs to be filtered by the coord names?

If the goal is to update the coords via a shared variable, does that actually mean we need to resample, since the data is being changed? My intuition says no, but there would be a loss of data(?).

If they're not named, SharedVariables could be tagged with a name attribute, though I have no idea whether that's possible. I just looked, and it seems SharedVariables are an aesara thing. Would it be possible to tag SharedVariables with a name attribute anyway?

Yeah, we hadn’t thought about the shared-value coordinates when we designed the volatility propagation rule in sample_posterior_predictive. We will have to change it slightly. If the user supplies an InferenceData object, we can retrieve the constant data group and the coord values from the posterior group, and check whether the tensor values have changed by the time we call sample_posterior_predictive. If they have changed, we mark them as volatile; if they haven’t, we don’t. That should fix this issue. The problem I see is what we should do when the input isn’t an InferenceData object. Maybe we should assume that coords are not volatile but other tensors are?

I like this solution too, I'm just not advanced enough to implement it. I checked out the inference data from pm.sample, but I'm not seeing constant data. Is this in the actual InferenceData object? I see that's ArviZ.

Thanks all.

@lucianopaz
Member

Hi, Is this the volatile logic in question? L1698 in sampling.py

Looking at the source and being a complete newbie when it comes to development and advanced python code, here are my thoughts:

It depends on whether the nodes from fg = FunctionGraph(outputs=outputs, clone=False) are named or not. If they are, it should be easy enough to filter the nodes by the coord names after the volatility logic, just like is done in the return value of compile_forward_sampling_function with volatile_nodes - set(givens_dict). I am only somewhat familiar with set, but it seems that subtracting set(coord_named_shared_variables) would work as a solution?

Hi @jcstucki, yes, that’s the function with the volatile logic. The problem that prevents doing what you suggest is that mutable coordinates might or might not have changed between having done inference and trying to make predictions. If the coordinates did change, then they should be volatile. If they did not change, they shouldn’t.
At the moment, the code is naive and defensive. It assumes that any shared variable might have changed, so its descendants should also be volatile. What you are proposing is to do the opposite: assume that they never changed. To get the correct behaviour, we will need to actually detect a change between inference and prediction, and use that to mark the changed shared variables as volatile.

I like this solution too, I'm just not advanced enough to implement it. I checked out the inference data from pm.sample, but I'm not seeing constant data, is this in the actual InferenceData object? I see that's arviz.

Yes, it’s an arviz inference data object. The code that generates it is in the backends folder, under arviz (can’t link because I’m on my phone). Anyway, we intend to get around to fixing this next week.

The workaround at the moment is to use ConstantData objects for inference and retrodiction. Then, when you want to do predictions on new data, you will have to create a new model instance using MutableData, or potentially different RV names, so that the values in the trace are ignored when appropriate. Regrettably, I can’t explain how to do this without seeing your model code.
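
A generic sketch of that workaround on a simplified regression (hypothetical names, not your model): fit with ConstantData, then build a second model that reuses the parameter names but swaps in MutableData for prediction.

import numpy as np
import pymc as pm

x_train = np.random.normal(size=100)
y_train = 2 * x_train + np.random.normal(scale=0.5, size=100)

with pm.Model() as train_model:
    x = pm.ConstantData("x", x_train)
    b = pm.Normal("b", 0, 1)
    pm.Normal("y", b * x, 0.5, observed=y_train)
    idata = pm.sample()

with pm.Model() as pred_model:
    x = pm.MutableData("x", np.linspace(-3, 3, 50))
    b = pm.Normal("b", 0, 1)        # same name, so the posterior draws of b are reused
    y = pm.Normal("y", b * x, 0.5)  # no observed: sampled forward, conditioned on b
    preds = pm.sample_posterior_predictive(idata, var_names=["y"])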

@ricardoV94
Member

ricardoV94 commented Sep 25, 2022

It's still unclear what the goal of the first example in the issue is with regard to mutable coords of unobserved variables. If the idea is to sample, then change the coords to e.g. ["A", "B", "C", "D"], and hope the model gets one extra dimension while the entries corresponding to the old ["A", "B", "C"] stay intact, that's NOT going to happen. All dimensions would be resampled from the prior.

If the goal is to just reuse a model for the whole sampling+predictions without changing coords in between then a solution like what @lucianopaz proposed should be fine.

To extend a model one could instead add two groups of variables, one with fixed coords for sampling and one with mutable ones for predictions, and stack the two together. It's possible that you could give the prediction variables an initial size of zero (not sure about coords but it should be feasible) and then "grow" those for posterior predictive?

@jessegrabowski
Member

Having looked at a couple of cases where this problem occurs, the main culprit is the heavy use of index mappings to "expand" estimated parameters to match the data. While it's true that the number of groups will not change (we cannot add "D"), the mapping between unobserved variables (e.g. intercepts) and rows of the observed data does change between inference and prediction time. One can imagine that the "training data" came from groups A, A, B, B, C, C, but we only want to predict for a single observation from group B. Given a usual setup, like:

group_idx, _ = pd.factorize(data.index)

with pm.Model() as model:
  group_map = pm.Data('group_idx', group_idx, mutable=True)
  intercept = ...  # non-centered intercept prior
  mu = intercept[group_map]
  # likelihood, etc.

The user's expectation is that the index map group_idx will need to change to match new data, but this will in turn render the entire prior hierarchy over intercept volatile.

As I pointed out in a thread on the discourse, this can be avoided by working with design matrices. But I would hazard a guess that close to 100% of the example notebooks and video tutorials floating around out there use index mappings rather than design matrices. It means that people will commonly run into this issue if they are following along with e.g. the Radon example and try to extend it to prediction using the set_data API. We've already seen it come up independently several times.
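
To make the setup concrete, here is a fuller, illustrative version of the pattern above (assumed toy data, hypothetical names); whether the intercepts end up volatile is exactly what is discussed in the replies below.

import numpy as np
import pandas as pd
import pymc as pm

data = pd.DataFrame({"y": np.random.normal(size=6)}, index=list("AABBCC"))
group_idx, groups = pd.factorize(data.index)

with pm.Model(coords={"group": groups}) as model:
    group_map = pm.Data("group_idx", group_idx, mutable=True)
    intercept = pm.Normal("intercept", 0, 1, dims="group")
    mu = pm.Deterministic("mu", intercept[group_map])
    pm.Normal("y", mu, 1, observed=data["y"])

    idata = pm.sample()
    # Predict the group mean for a single new observation from group "B":
    pm.set_data({"group_idx": [1]})
    preds = pm.sample_posterior_predictive(idata, var_names=["mu"])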

@ricardoV94
Member

ricardoV94 commented Sep 25, 2022

@jessegrabowski what's the problem with index-based designs? Indexing operations (and their coords) are free to change for prediction in most examples I've seen (i.e. mu = b[idx] + x*m[idx]); the problem is when you want to change the coordinates of the unobserved variables (b, m).

@lucianopaz
Member

group_idx, _ = pd.factorize(data.index)

with pm.Model() as model:
  group_map = pm.Data('group_idx', group_idx, mutable=True)
  # non-centered intercept prior
  mu = intercept[group_map]
  # likelihood, etc.

The user's expectation is that the index map group_idx will need to change to match new data, but this will in turn render the entire prior hierarchy over intercept volatile.

This is the exact use case that we had in mind while thinking about volatility. The intercept random variable won’t be volatile because it doesn’t depend on the mutable data. Instead the deterministic mu variable will be volatile and its value will be recomputed using the group_map.

The case that we didn’t consider was when the dimensionality of the intercept random variable was determined by a shared variable. Just to make this point clear, the design matrix approach would also suffer there because you would have a design matrix multiplied by a variable sized vector.

As pointed out by @ricardoV94, the hardest thing to automate would be to determine which entries of the intercept should be taken from the prior and which from the posterior. Although difficult, aesara rewrites do enable us to do this, but I feel that it’s safer to begin only with a smarter default volatility for shared variables.

@lucianopaz
Member

To handle the mixed in and out of sample indexes, we would mainly have to rewrite the intercept_rv. The super simplified and pseudocodish way of doing this would look something like this

out_of_sample_rv = intercept.clone()  # with the correct resizing and matching inputs
new_intercept = at.zeros(the_size_and_dtype)
new_intercept = at.set_subtensor(new_intercept[in_sample_idx], intercept)
new_intercept = at.set_subtensor(new_intercept[out_of_sample_idx], out_of_sample_rv)

@jessegrabowski
Member

Well I went and tested it carefully, and @ricardoV94 and @lucianopaz, you're totally right. I will slink off back to my hole of shame. I now agree that this is a problem of communicating best practices, i.e. making clear distinctions between fixed hierarchical maps and flexible data maps, rather than an actual bug. Since the case of "mapping indices to deterministic variables" was clearly covered, I'm inclined to agree with @ricardoV94 that I can't think of a time when it's reasonable to expect the parameter shapes to change. Perhaps this is just a lack of imagination. But let it be known that my comments were based on a fundamental misunderstanding of the situation.

I feel bad for spewing a wall of text at that poor guy on the discourse rather than just telling him he needs to fix a couple variables.

The only other thing I'd say is that maybe there should be a warning raised by pm.sample_posterior_predictive if there are SharedVariables on any distribution sizes that lets the user know this can result in unexpected outcomes.

@lucianopaz
Member

No shame at all @jessegrabowski! I think that the whole point of open source development is to have engaged people, like yourself, speak up and iterate as a community. When I started to contribute to pymc3, I was quite clueless and made some horrendous pieces of work, like #2991. Thanks to the team and the community I learnt a lot and kept trying to do stuff and improve. I really don’t want you to lose confidence because of a slight misunderstanding.
About warnings and logs, @ricardoV94 recently merged a pr to log the names of the variables that are being resampled. Do you think that’s good enough?

@jessegrabowski
Member

Yeah I'd say so. It's come up enough lately that something should be explicitly stated when you sample the ppc. I think a little communication can go a long way here.

@ricardoV94
Member

ricardoV94 commented Sep 26, 2022

This is the PR @lucianopaz mentioned about making posterior predictive more transparent: #6142

About the feelings of shame, I heard the only way to get rid of them is to open a PR that closes an open issue. Maybe two :P

But honestly, misunderstandings tell us something very useful about how people are trying to use and misuse our tools so that we can try to make it better/more clear. Posterior predictive is clearly an area we have to document better (see this recent issue for example pymc-devs/pymc-examples#418)

@twiecki
Member

twiecki commented Oct 11, 2022 via email

@jessegrabowski
Member

What is the consensus on warnings/messages? It seems like there was at least some move to reduce their number from v3 to v4 (I'm thinking about how a warning is no longer given when r_hat are large after sampling, for example).

Would a warning on pm.sample_posterior_predictive that says something like "The shapes of the following random variables have changed, and will be sampled using their prior distributions: ..." be too "spammy" under the prevailing consensus on warnings?

@lucianopaz
Member

Would a warning on pm.sample_posterior_predictive that says something like "The shapes of the following random variables have changed, and will be sampled using their prior distributions: ..." be too "spammy" under the prevailing consensus on warnings?

Do you think that the information that is logged after this pr isn’t enough?

@ricardoV94
Member

ricardoV94 commented Oct 11, 2022

and will be sampled using their prior distributions

That's not entirely correct. You may be resampling intermediate variables, still conditioned on the values of inferred hyperparameters.

Anyway, I think that logging the variables names as we do now removes any surprises. I don't think we need to tell users in words why those are being resampled.

@jessegrabowski
Member

I haven't experienced the new changes after #6142, but it seems like everyone is happy with where things are now. So the direct answer to @twiecki is "No" : )

@lucianopaz
Member

Actually, I think that it's important for you, @jessegrabowski, to try the new logging. Ricardo and I were the ones that discussed the changes in #6142, so our opinion is quite biased. Let us know if you think that those logs are enough or if you feel that they can be improved.

@bersavosh

Hi all, my apologies for the naive question - I don't think I understand why posterior predictive would ever sample from priors. I checked some of the referenced pulls and discussions, but I don't think I follow why. I.e., in what circumstance would that be valid and how is the user informed/warned about this?

E.g., here's my example (which is basically similar to the toy examples discussed above) where I was expecting the posterior predictive to be in line with posterior rather than prior. I'd appreciate any clarification/help.

Thanks!

(PyMC version 5.5.0)

Data

(figure: plot of the observed data)

Model

def line_model(x, y, dy):
    with pm.Model() as model:
        data_x = pm.ConstantData(name='data x',value=x)
        data_y = pm.ConstantData(name='data y',value=y)
        data_dy = pm.ConstantData(name='data dy',value=dy)
        m = pm.Uniform('m',-2,4)
        d = pm.Uniform('d',-10,10)
        resp_y = pm.Normal('Y', mu=m*data_x+d, sigma=data_dy, observed=data_y)
    return model


applied_model = line_model(x,y,dy)

Sampling

with applied_model:
    ## Prior
    prior = pm.sample_prior_predictive(1000,var_names=['m','d','Y'])
    applied_model_idata = prior
    ## NUTS
    hmc = pm.sample(tune=500,draws=1000, cores=4, chains=4)
    applied_model_idata.extend(hmc)
    ## Posterior
    post = pm.sample_posterior_predictive(applied_model_idata,var_names=['m','d','Y'])
    applied_model_idata.extend(post)

distribution for variable $m$

fig = plt.figure(figsize=(6,4))
ax = fig.add_subplot(1,1,1)
ax.hist(applied_model_idata.prior['m'][0],label='prior',histtype='step',lw=2,bins=30);
ax.hist(applied_model_idata.posterior['m'][0],label='posterior',histtype='step',lw=2);
ax.hist(applied_model_idata.posterior_predictive['m'][0],label='posterior predictive',histtype='step',lw=2,bins=30);
ax.set_xlabel('m')
ax.set_ylabel('#samples/bin')
ax.legend()

(figure: overlaid histograms of m from the prior, posterior, and posterior predictive groups)

@ricardoV94
Member

ricardoV94 commented Jul 4, 2023

Hi all, my apologies for the naive question - I don't think I understand why posterior predictive would ever sample from priors. I checked some of the referenced pulls and discussions, but I don't think I follow why.

@bersavosh the question is not naive at all. This is something we are trying to convey more clearly, as it's a source of confusion even among experienced developers.

var_names selects all the variables that should be sampled (or resampled). These variables are always sampled from the forward random function, conditioned on any variables that are in the trace and not themselves being sampled. This corresponds to the prior when you are sampling root variables (without any other RV inputs), and it's equivalent to what you would get when you sample a new variable that is not in the trace (either because you have an expanded model, or because you manually deleted them from the trace).

The subtle point is that this is the only thing you could do when you are asking to sample variables using the forward function. Otherwise your trace already includes your posterior draws, and the only other thing sample_posterior_predictive could do for those variables would be to copy them, which is not a very useful thing?

I.e., in what circumstance would that be valid and how is the user informed/warned about this?

This is arguably not useful for root variables, but it makes sense for intermediate variables. For instance selecting the group level variables to be resampled is a valid way of asking what the model predicts new group elements to look like.

The user is informed about this from the logging message, which says something like Sampling: [var1, var2, ...]. All the variables in this list are being sampled via the forward function, conditioned on any other ancestors that are in the trace. There have been some issues for VSCode users, where this logging message is annoyingly suppressed.


sample_prior_predictive is the limit case of sample_posterior_predictive when you have no variables in the trace or selected all variables in var_names. That's why sometimes posterior_predictive starts to behave like prior_predictive. The underlying machinery is exactly the same.
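
Applied to the example above, that means requesting only the observed variable keeps m and d fixed at their posterior draws (same model, only var_names changes):

with applied_model:
    # Only Y is sampled forward; m and d are taken from the posterior draws.
    post = pm.sample_posterior_predictive(applied_model_idata, var_names=['Y'])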

@ricardoV94
Member

Opened an issue to develop more comprehensive documentation: #6812

@bersavosh

@ricardoV94 Thank you for the explanation, this is very helpful. I'll keep an eye on the issue you just opened. :-)
