Variables in logp and dlogp functions used in find_MAP are not always aligned #5923


Closed · OleBialas opened this issue Jun 23, 2022 · 14 comments · Fixed by #5928

@OleBialas

In this reproducible example, sampling from the model gives the correct
parameter estimates while pymc.find_MAP() does not.
(I just installed the latest pymc version using conda in a new environment on Ubuntu)

I am modeling weight as a function of height, here is some simulated data:

import numpy as np
from scipy import stats
import pymc
n_individuals = 1000
alpha = 50
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)

When the prior for alpha is centered on the true value of alpha,
find_MAP() correctly estimates the values for alpha, beta and sigma:

with pymc.Model() as model:
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    beta = pymc.Lognormal('beta', mu=0, sigma=1)
    sigma = pymc.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP()
print(mean_q)

However, when the true value of alpha differs from the prior mean,
beta and sigma are estimated to be 0:

import numpy as np
from scipy import stats
import pymc
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pymc.Model() as model:
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    beta = pymc.Lognormal('beta', mu=0, sigma=1)
    sigma = pymc.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP()
print(mean_q)

Sampling from the model still returns the correct values for all three parameters:

with model:
    trace = pymc.sample(1000, tune=1000)
pymc.summary(trace)

So I am wondering whether this is a bug or whether find_MAP() is supposed to fail in the second case.
If it is the latter, I would have expected the function to emit some error or warning.

@ricardoV94 (Member)

Can you add the print results?

@quantheory (Contributor)

I just tried upgrading to v4 yesterday, and a lot of the diagnostics for my project use find_MAP. I noticed that find_MAP was considerably less stable in v4, and when it wasn't outright unstable, it tended to produce output where most or all parameters were pretty close to the start value (and often not very near the optimum found using MCMC). But I didn't have an easily reproducible example, so I'm glad that this issue was just posted. When I try the following:

import numpy as np
from scipy import stats
import pymc as pm
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=50, sigma=10)
    beta = pm.Lognormal('beta', mu=0, sigma=1)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pm.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pm.find_MAP()
print(mean_q)

I get the output:

{'alpha': array(1.0025977e+14), 'beta_log__': array(-2.51355127e+13), 'sigma_interval__': array(-8.65309668e+12), 'beta': array(0.), 'sigma': array(0.)}

Using PyMC3 instead works well, producing this output:

{'alpha': array(29.88057992), 'beta_log__': array(-1.14830438), 'sigma_interval__': array(0.0518484), 'beta': array(0.31717412), 'sigma': array(5.12959197)}

The instability can be avoided in v4 by specifying a closer starting position like so:

    mean_q = pm.find_MAP(start={
        alpha: 25.,
        beta: 1.,
        sigma: 5.,
    })

But this produces

{'alpha': array(26.18118026), 'beta_log__': array(-1.23207473), 'sigma_interval__': array(0.71096296), 'beta': array(0.29168678), 'sigma': array(6.70613904)}

so alpha and sigma are still quite a ways from the actual optimal values. Sampling works fine though:

n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=50, sigma=10)
    beta = pm.Lognormal('beta', mu=0, sigma=1)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pm.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    trace = pm.sample(1000, tune=1000)
pm.summary(trace)

produces:

|       | mean   | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
| ----- | ------ | ----- | ------ | ------- | --------- | ------- | -------- | -------- | ----- |
| alpha | 30.164 | 0.166 | 29.845 | 30.470  | 0.002     | 0.002   | 5835.0   | 3067.0   | 1.0   |
| beta  | 0.292  | 0.014 | 0.266  | 0.318   | 0.000     | 0.000   | 5125.0   | 2800.0   | 1.0   |
| sigma | 5.072  | 0.116 | 4.867  | 5.296   | 0.002     | 0.001   | 4912.0   | 2886.0   | 1.0   |

@quantheory (Contributor)

I also suspect that this is the same issue as #5848.

@aseyboldt (Member)

Hm, if we remove the transformed variables and run the optimizer manually, I get good results, but find_MAP still returns nonsense:

import numpy as np
from scipy import stats
import pymc
import aesara.tensor as at
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pymc.Model() as model:
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    #beta = pymc.Lognormal('beta', mu=0, sigma=1)
    log_beta = pymc.Normal('log_beta', mu=0, sigma=1)
    beta = at.exp(log_beta)
    #sigma = pymc.Uniform('sigma', lower=0, upper=10)
    
    raw = pymc.Normal("raw", sigma=10)
    sigma = (at.tanh(raw) + 1) / 2 * 10
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP()
print(mean_q)

from scipy import optimize

func = model.compile_logp(jacobian=False)
dfunc = model.compile_dlogp(jacobian=False)
start = {'alpha': np.array(50.), 'log_beta': np.array(0.), 'raw': np.array(0.)}

def to_array(point):
    return np.stack(list(point.values()))

def to_dict(x):
    return {name: val for name, val in zip(start, x)}

optimize.minimize(lambda x: -func(to_dict(x)), to_array(start), jac=lambda x: -dfunc(to_dict(x)))

gives

      fun: 3076.9491119974145
 hess_inv: array([[ 2.71151116e-02, -1.65447210e-04,  1.57946843e-06],
       [-1.65447210e-04,  2.10950069e-03,  2.16615266e-05],
       [ 1.57946843e-06,  2.16615266e-05,  5.40777409e-04]])
      jac: array([ 1.05262719e-06, -7.48754670e-06, -5.38448635e-06])
  message: 'Optimization terminated successfully.'
     nfev: 23
      nit: 17
     njev: 23
   status: 0
  success: True
        x: array([30.11441656, -1.1667908 ,  0.03932564])

@quantheory (Contributor)

I played around a bit with rearranging the order in which variables are declared, which does change the output of find_MAP. But then I found that if you just specify vars in the right way when calling find_MAP, you can sometimes get good behavior. Specifically, if you re-run the following several times, it will often give a good answer (although it usually produces garbage), but it consistently fails if you change the order of the variables in the vars argument.

import numpy as np
from scipy import stats
import pymc
import aesara.tensor as at
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pymc.Model() as model:
    #beta = pymc.Lognormal('beta', mu=0, sigma=1)
    #sigma = pymc.Uniform('sigma', lower=0, upper=10)
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    log_beta = pymc.Normal('log_beta', mu=0, sigma=1)
    beta = at.exp(log_beta)
    raw = pymc.Normal("raw", sigma=10)
    sigma = (at.tanh(raw) + 1) / 2 * 10
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP(vars=[raw, log_beta, alpha])
print(mean_q)

I suspect that the problem has to do with these lines in find_MAP, and possibly also the use of DictToArrayBijection here.

Although Python 3 dictionaries are technically ordered now, I don't think that the use of DictToArrayBijection is a good idea, because it's not clear (at least to me) that the way the various dictionaries are constructed will lead them to place the same variables in the same order, or even that a given dictionary will use the same order when the same code is run twice. This may be why the code above behaves stochastically.
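
As a standalone illustration (plain Python, not PyMC code): two dicts with the same keys compare equal yet iterate in different orders, so anything that relies on dict iteration order to align arrays is fragile:

d1 = {"alpha": 0.0, "beta": 1.0}
d2 = {"beta": 1.0, "alpha": 0.0}
print(d1 == d2)            # True: dict equality ignores order
print(list(d1), list(d2))  # ['alpha', 'beta'] ['beta', 'alpha']: iteration orders differ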

In addition, the order of the variables in the output of the gradient function compiled by compile_dlogp is determined by the vars variable. This order will not in general match the order in any of the DictToArrayBijection calls. And as noted above, the ordering of the elements of vars can be overridden by the user, which is also a problem; if the order of this array determines the order of derivatives in the output of the compiled gradient function, then any user-specified value should be sorted before use. The sketch below shows how a misaligned gradient misleads the optimizer.
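
Here is a minimal sketch with a toy objective (plain scipy, not PyMC internals): the same derivative components, delivered in an order that does not match the layout of x, silently mislead the optimizer:

import numpy as np
from scipy import optimize

def neg_logp(x):
    a, b = x
    return (a - 3.0) ** 2 + 10.0 * (b - 1.0) ** 2

def grad(x, order):
    # The same two derivative components, packed in a caller-chosen order.
    a, b = x
    g = {"da": 2.0 * (a - 3.0), "db": 20.0 * (b - 1.0)}
    return np.array([g[name] for name in order])

aligned = optimize.minimize(neg_logp, [0.0, 0.0], jac=lambda x: grad(x, ["da", "db"]))
swapped = optimize.minimize(neg_logp, [0.0, 0.0], jac=lambda x: grad(x, ["db", "da"]))
print(aligned.x)  # close to the true optimum [3., 1.]
print(swapped.x)  # typically stalls or ends far from [3., 1.]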

@quantheory (Contributor)

quantheory commented Jun 24, 2022

OK, I've figured this out. First off, the stochastic behavior was my own fault, not a bug: I was sometimes accidentally resampling height and weight when rerunning the model. When I don't do that, the results are always reproducible.

After looking closely, I take back what I said about the DictToArrayBijection being a problem itself, but I was right that the order of vars is a problem. The bug can be fixed by replacing this line with:

    vars_dict = {var.name: var for var in vars}
    rvs = [model.values_to_rvs[vars_dict[name]]
           for name, _, _ in x0.point_map_info]

This fixes the primary problem, which is that the gradient information being passed to minimize was not in the correct order. This is the regression from v3 to v4.

The second problem is that for some data sets, find_MAP just doesn't work very well unless it's initialized very close to the optimal point, and this has more or less always been true. At least for the data here, performance is far more consistent if the nan_to_high and nan_to_num functions are removed from CostFuncWrapper.__call__. Although this might not have been true in the past, scipy.optimize.minimize with method='L-BFGS-B' seems to handle non-finite values just fine, so these functions actually appear to be actively harmful. In particular, using nan_to_num on the gradient seems ill-advised, since it converts NaN values to 0 and could convince the optimizer that it has found a local extremum in regions where logp is actually behaving pathologically (see the snippet below).
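
To illustrate that last point with a standalone snippet (not the PyMC code itself):

import numpy as np

# A NaN gradient signals "logp is undefined here"; nan_to_num rewrites it as an
# all-zero gradient, which is exactly what a stationary point looks like to the optimizer.
pathological_grad = np.array([np.nan, np.nan])
print(np.nan_to_num(pathological_grad))  # [0. 0.]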

@ricardoV94 (Member)

Nice find @quantheory! However, it seems that the function will now compute the gradient with respect to variables we are not sampling, as I have the impression that x0.point_map_info will contain all model variables, not just the ones in vars.

I have no opinion about nan_to_num and nan_to_high; if they're hurting with the default optimizer, we can consider removing them.

@quantheory (Contributor)

quantheory commented Jun 24, 2022

@ricardoV94

Thanks!

I believe that x0.point_map_info should have the appropriate contents due to these lines just above my suggested modification:

    var_names = {var.name for var in vars}
    x0 = DictToArrayBijection.map(
        {var_name: value for var_name, value in start.items() if var_name in var_names}
    )

Because var_names is set this way, the variables in x0.point_map_info must be a subset of the variables in vars.

In any case, I think it is a requirement of minimize that the gradient returned by jac have one entry per element of x. So the inputs and outputs of the dlogp function passed to the optimizer have to match, meaning that the dimension of x0 must match the dimension of the gradient function's output.

@ricardoV94 (Member)

ricardoV94 commented Jun 24, 2022

I know they will match; my concern was that under the hood we would be computing the gradient with respect to the constant values we are not sampling (which are passed in from the start dict whenever we compute the logp and dlogp). If that's not the case, then you seem to have solved it! Would you like to open a PR?

We could actually compile a logp/dlogp where we set those to constants (roughly as in the sketch below), but that's another matter.
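
A hypothetical sketch of that idea in plain Aesara (toy graph, illustrative names; not PyMC's API): substitute a constant for the non-sampled variable at compile time, so the compiled function never involves it:

import aesara
import aesara.tensor as at

a = at.dscalar("a")      # the variable we keep optimizing
c = at.dscalar("c")      # a variable we want frozen
logp = -((a - c) ** 2)   # toy logp

# `givens` replaces c with a constant before compilation, so the compiled
# graph no longer depends on c at all.
f = aesara.function([a], logp, givens={c: at.constant(2.0)})
print(f(0.0))  # -4.0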

@quantheory (Contributor)

Oh! I understand your concern. My modification should not introduce any unnecessary dimensions to the gradient, regardless of the start value, unless I have wildly misunderstood something here.

I will try to put together a PR tomorrow, but I am on a series of tight deadlines for the next month or so (ending my current project), meaning that I can't guarantee a fast response to feedback. I'd prefer it if someone more experienced could shepherd this through, but if not, I'll do what I can.

@aseyboldt (Member)

Shouldn't we use model.logp_dlogp_function here anyway? We also don't want to compile two different functions for the logp and dlogp; if we do, we recompute lots of things in the dlogp code that we already computed in logp.
Somehow the interface for logp_dlogp_function changed in v4 (I'd say broke, but maybe someone disagrees with this :-) )

This used to work:

func = pymc_model.logp_dlogp_function()
func.size
func.set_extra_values({})
x0 = func.array_to_dict(pymc_model.initial_point())
func(x0)

@ricardoV94 (Member)

@aseyboldt I think we followed a simple refactoring strategy, as find_MAP was not using that function in V3 either:

def find_MAP(

We should definitely avoid computing the two outputs separately, whether via the logp_dlogp_function or a more traditional Aesara function (see the sketch below). That can be done separately from the more urgent bugfix, though.
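
For reference, a minimal sketch in plain Aesara (toy graph, illustrative names) of compiling the value and gradient as a single function, so shared subexpressions are computed once:

import aesara
import aesara.tensor as at

x = at.dvector("x")
logp = -0.5 * at.sum((x - 2.0) ** 2)  # toy logp
dlogp = at.grad(logp, x)

# One compiled function with two outputs; Aesara shares common work between them.
logp_and_grad = aesara.function([x], [logp, dlogp])
value, grad = logp_and_grad([0.0, 0.0])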

Regarding the API change of the logp_dlogp_function, feel free to open a PR to fix whatever unnecessary pain points you think were introduced since V3. I don't think special care was given to that functionality, as it was always a more specialized (and perhaps internal-facing) utility compared to the remaining model (d)logp methods...

@ricardoV94 ricardoV94 added this to the v4.0.2 milestone Jun 24, 2022
@ricardoV94 (Member)

@quantheory no worries about speed. If you open a PR with the option for devs to modify it, we can also help if you find yourself out of time!

@ricardoV94 ricardoV94 added the bug label Jun 24, 2022
@ricardoV94 ricardoV94 changed the title unexpected behavior of find_MAP Order of variables in logp and dlogp is not always aligned inside find_MAP Jun 25, 2022
@ricardoV94 ricardoV94 changed the title Order of variables in logp and dlogp is not always aligned inside find_MAP Variables in logp and dlogp functions used in find_MAP are not always aligned Jun 25, 2022
quantheory added a commit to quantheory/pymc that referenced this issue Jun 25, 2022
@danhphan (Member)

Hi, that's great to know. Thanks @quantheory :)
