Variables in logp and dlogp functions used in find_MAP are not always aligned #5923


Closed · OleBialas opened this issue Jun 23, 2022 · 14 comments · Fixed by #5928

@OleBialas

In this reproducible example, sampling from the model gives the correct
parameter estimates while pymc.find_MAP() does not.
(I just installed the latest pymc version using conda in a new environment on Ubuntu)

I am modeling weight as a function of height, here is some simulated data:

import numpy as np
from scipy import stats
import pymc
n_individuals = 1000
alpha = 50
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)

When the prior for alpha is centered on the true value of alpha,
find_MAP() correctly estimates the values for alpha, beta and sigma:

with pymc.Model() as model:
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    beta = pymc.Lognormal('beta', mu=0, sigma=1)
    sigma = pymc.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP()
print(mean_q)

However, when the true value of alpha differs from the prior mean,
beta and sigma are estimated to be 0:

import numpy as np
from scipy import stats
import pymc
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pymc.Model() as model:
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    beta = pymc.Lognormal('beta', mu=0, sigma=1)
    sigma = pymc.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP()
print(mean_q)

Sampling from the model still returns the correct values for all three parameters:

with model:
    trace = pymc.sample(1000, tune=1000)
pymc.summary(trace)

So I am wondering whether this is a bug or whether find_MAP() is supposed to fail in the second case.
If it is the latter, I would have expected the function to emit some error or warning.

@ricardoV94 (Member)

Can you add the print results?

@quantheory (Contributor)

I just tried upgrading to v4 yesterday, and a lot of the diagnostics for my project use find_MAP. I noticed that find_MAP was considerably less stable in v4, and when it wasn't outright unstable, it tended to produce output where most or all parameters were pretty close to the start value (and often not very near the optimum found using MCMC). But I didn't have an easily reproducible example, so I'm glad that this issue was just posted. When I try the following:

import numpy as np
from scipy import stats
import pymc as pm
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=50, sigma=10)
    beta = pm.Lognormal('beta', mu=0, sigma=1)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pm.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pm.find_MAP()
print(mean_q)

I get the output:

{'alpha': array(1.0025977e+14), 'beta_log__': array(-2.51355127e+13), 'sigma_interval__': array(-8.65309668e+12), 'beta': array(0.), 'sigma': array(0.)}

Using PyMC3 instead works well, producing this output:

{'alpha': array(29.88057992), 'beta_log__': array(-1.14830438), 'sigma_interval__': array(0.0518484), 'beta': array(0.31717412), 'sigma': array(5.12959197)}

The instability can be avoided in v4 by specifying a closer starting position like so:

    mean_q = pm.find_MAP(start={
        alpha: 25.,
        beta: 1.,
        sigma: 5.,
    })

But this produces

{'alpha': array(26.18118026), 'beta_log__': array(-1.23207473), 'sigma_interval__': array(0.71096296), 'beta': array(0.29168678), 'sigma': array(6.70613904)}

so alpha and sigma are still quite a ways from the actual optimal values. Sampling works fine though:

n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=50, sigma=10)
    beta = pm.Lognormal('beta', mu=0, sigma=1)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    mu = alpha + beta * (height-height.mean())
    weight = pm.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    trace = pm.sample(1000, tune=1000)
pm.summary(trace)

produces:

|       | mean   | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
| ----- | ------ | ----- | ------ | ------- | --------- | ------- | -------- | -------- | ----- |
| alpha | 30.164 | 0.166 | 29.845 | 30.470  | 0.002     | 0.002   | 5835.0   | 3067.0   | 1.0   |
| beta  | 0.292  | 0.014 | 0.266  | 0.318   | 0.000     | 0.000   | 5125.0   | 2800.0   | 1.0   |
| sigma | 5.072  | 0.116 | 4.867  | 5.296   | 0.002     | 0.001   | 4912.0   | 2886.0   | 1.0   |

@quantheory (Contributor)

I also suspect that this is the same issue as #5848.

@aseyboldt (Member)

Hm, if we remove the transformed variables and run the optimizer manually, I get good results, but find_MAP still returns nonsense:

import numpy as np
from scipy import stats
import pymc
import aesara.tensor as at
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pymc.Model() as model:
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    #beta = pymc.Lognormal('beta', mu=0, sigma=1)
    log_beta = pymc.Normal('log_beta', mu=0, sigma=1)
    beta = at.exp(log_beta)
    #sigma = pymc.Uniform('sigma', lower=0, upper=10)
    
    raw = pymc.Normal("raw", sigma=10)
    sigma = (at.tanh(raw) + 1) / 2 * 10
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP()
print(mean_q)

from scipy import optimize

func = model.compile_logp(jacobian=False)
dfunc = model.compile_dlogp(jacobian=False)
start = {'alpha': np.array(50.), 'log_beta': np.array(0.), 'raw': np.array(0.)}

def to_array(point):
    return np.stack(list(point.values()))

def to_dict(x):
    return {name: val for name, val in zip(start, x)}

optimize.minimize(lambda x: -func(to_dict(x)), to_array(start), jac=lambda x: -dfunc(to_dict(x)))

gives

      fun: 3076.9491119974145
 hess_inv: array([[ 2.71151116e-02, -1.65447210e-04,  1.57946843e-06],
       [-1.65447210e-04,  2.10950069e-03,  2.16615266e-05],
       [ 1.57946843e-06,  2.16615266e-05,  5.40777409e-04]])
      jac: array([ 1.05262719e-06, -7.48754670e-06, -5.38448635e-06])
  message: 'Optimization terminated successfully.'
     nfev: 23
      nit: 17
     njev: 23
   status: 0
  success: True
        x: array([30.11441656, -1.1667908 ,  0.03932564])

@quantheory (Contributor)

I played around a bit with rearranging the order in which variables are declared, which does change the output of find_MAP. But then I found that if you just specify vars in the right way when calling find_MAP, you can sometimes get good behavior. Specifically, if you re-run the following several times, it will often give a good answer (although it usually produces garbage), but it consistently fails if you change the order of the variables in the vars argument.

import numpy as np
from scipy import stats
import pymc
import aesara.tensor as at
n_individuals = 1000
alpha = 30
beta = 0.3
sigma = 5
height = np.random.uniform(130, 170, n_individuals)
mu = alpha + beta * (height-height.mean())
weight = stats.norm.rvs(loc=mu, scale=sigma, size=n_individuals)
with pymc.Model() as model:
    #beta = pymc.Lognormal('beta', mu=0, sigma=1)
    #sigma = pymc.Uniform('sigma', lower=0, upper=10)
    alpha = pymc.Normal('alpha', mu=50, sigma=10)
    log_beta = pymc.Normal('log_beta', mu=0, sigma=1)
    beta = at.exp(log_beta)
    raw = pymc.Normal("raw", sigma=10)
    sigma = (at.tanh(raw) + 1) / 2 * 10
    mu = alpha + beta * (height-height.mean())
    weight = pymc.Normal('weight', mu=mu, sigma=sigma, observed=weight)
    mean_q = pymc.find_MAP(vars=[raw, log_beta, alpha])
print(mean_q)

I suspect that the problem has to do with these lines in find_MAP, and possibly also the use of DictToArrayBijection here.

Although Python 3 dictionaries are technically ordered now, I don't think that the use of DictToArrayBijection is a good idea, because it's not clear (at least to me) that the way the various dictionaries are constructed will lead them to place the same variables in the same order, or even that a given dictionary will use the same order when the same code is run twice. This may be why the code above behaves stochastically.
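
As a standalone illustration (plain Python, not PyMC code): two dicts with the same keys compare equal yet iterate in different orders, so anything that relies on dict iteration order to align arrays is fragile:

d1 = {"alpha": 0.0, "beta": 1.0}
d2 = {"beta": 1.0, "alpha": 0.0}
print(d1 == d2)            # True: dict equality ignores order
print(list(d1), list(d2))  # ['alpha', 'beta'] ['beta', 'alpha']: iteration orders differ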

In addition, the order of the variables in the output of the gradient function compiled by compile_dlogp is determined by the vars variable. This order will not in general match the order in any of the DictToArrayBijection calls. And as noted above, the ordering of the elements of vars can be overridden by the user, which is also a problem; if the order of this array determines the order of derivatives in the output of the compiled gradient function, then any user-specified value should be sorted before use. The sketch below shows how a misaligned gradient misleads the optimizer.
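
Here is a minimal sketch with a toy objective (plain scipy, not PyMC internals): the same derivative components, delivered in an order that does not match the layout of x, silently mislead the optimizer:

import numpy as np
from scipy import optimize

def neg_logp(x):
    a, b = x
    return (a - 3.0) ** 2 + 10.0 * (b - 1.0) ** 2

def grad(x, order):
    # The same two derivative components, packed in a caller-chosen order.
    a, b = x
    g = {"da": 2.0 * (a - 3.0), "db": 20.0 * (b - 1.0)}
    return np.array([g[name] for name in order])

aligned = optimize.minimize(neg_logp, [0.0, 0.0], jac=lambda x: grad(x, ["da", "db"]))
swapped = optimize.minimize(neg_logp, [0.0, 0.0], jac=lambda x: grad(x, ["db", "da"]))
print(aligned.x)  # close to the true optimum [3., 1.]
print(swapped.x)  # typically stalls or ends far from [3., 1.]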

@quantheory (Contributor)

quantheory commented Jun 24, 2022

OK, I've figured this out. First off, the stochastic behavior was my own fault, not a bug: I was sometimes accidentally resampling height and weight when rerunning the model. When I don't do that, the results are always reproducible.

After looking closely, I take back what I said about the DictToArrayBijection being a problem itself, but I was right that the order of vars is a problem. The bug can be fixed by replacing this line with:

    vars_dict = {var.name: var for var in vars}
    rvs = [model.values_to_rvs[vars_dict[name]]
           for name, _, _ in x0.point_map_info]

This fixes the primary problem, which is that the gradient information being passed to minimize was not in the correct order. This is the regression from v3 to v4.

The second problem is that for some data sets, find_MAP just doesn't work very well unless it's initialized very close to the optimal point, and this has more or less always been true. At least for the data here, performance is far more consistent if the nan_to_high and nan_to_num functions are removed from CostFuncWrapper.__call__. Although this might not have been true in the past, scipy.optimize.minimize with method='L-BFGS-B' seems to handle non-finite values just fine, so these functions actually appear to be actively harmful. In particular, using nan_to_num on the gradient seems ill-advised, since it converts NaN values to 0 and could convince the optimizer that it has found a local extremum in regions where logp is actually behaving pathologically (see the snippet below).
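
To illustrate that last point with a standalone snippet (not the PyMC code itself):

import numpy as np

# A NaN gradient signals "logp is undefined here"; nan_to_num rewrites it as an
# all-zero gradient, which is exactly what a stationary point looks like to the optimizer.
pathological_grad = np.array([np.nan, np.nan])
print(np.nan_to_num(pathological_grad))  # [0. 0.]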

@ricardoV94 (Member)

Nice find @quantheory! However, it seems that the function will now compute the gradient with respect to variables we are not sampling, as I have the impression that x0.point_map_info will contain all model variables, not just the ones in vars.

I have no opinion about nan_to_num and nan_to_high; if they're hurting with the default optimizer, we can consider removing them.

@quantheory (Contributor)

quantheory commented Jun 24, 2022

@ricardoV94

Thanks!

I believe that x0.point_map_info should have the appropriate contents due to these lines just above my suggested modification:

    var_names = {var.name for var in vars}
    x0 = DictToArrayBijection.map(
        {var_name: value for var_name, value in start.items() if var_name in var_names}
    )

Because var_names is set this way, the variables in x0.point_map_info must be a subset of the variables in vars.

In any case, I think it is a requirement of minimize that the gradient returned by jac have one entry per element of x. So the inputs and outputs of the dlogp function passed to the optimizer have to match, meaning that the dimension of x0 must match the dimension of the gradient function's output.

@ricardoV94 (Member)

ricardoV94 commented Jun 24, 2022

I know they will match; my concern was that under the hood we would be computing the gradient with respect to the constant values we are not sampling (which are passed in from the start dict whenever we compute the logp and dlogp). If that's not the case, then you seem to have solved it! Would you like to open a PR?

We could actually compile a logp/dlogp where we set those to constants (roughly as in the sketch below), but that's another matter.
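
A hypothetical sketch of that idea in plain Aesara (toy graph, illustrative names; not PyMC's API): substitute a constant for the non-sampled variable at compile time, so the compiled function never involves it:

import aesara
import aesara.tensor as at

a = at.dscalar("a")      # the variable we keep optimizing
c = at.dscalar("c")      # a variable we want frozen
logp = -((a - c) ** 2)   # toy logp

# `givens` replaces c with a constant before compilation, so the compiled
# graph no longer depends on c at all.
f = aesara.function([a], logp, givens={c: at.constant(2.0)})
print(f(0.0))  # -4.0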

@quantheory (Contributor)

Oh! I understand your concern. My modification should not introduce any unnecessary dimensions to the gradient, regardless of the start value, unless I have wildly misunderstood something here.

I will try to put together a PR tomorrow, but I am on a series of tight deadlines for the next month or so (ending my current project), meaning that I can't guarantee a fast response to feedback. I'd prefer it if someone more experienced could shepherd this through, but if not, I'll do what I can.

@aseyboldt (Member)

Shouldn't we use model.logp_dlogp_function here anyway? We also don't want to compile two different functions for the logp and dlogp; if we do, we recompute lots of things in the dlogp code that we already computed in logp.
Somehow the interface for logp_dlogp_function changed in v4 (I'd say broke, but maybe someone disagrees with this :-) )

This used to work:

func = pymc_model.logp_dlogp_function()
func.size
func.set_extra_values({})
x0 = func.array_to_dict(pymc_model.initial_point())
func(x0)

@ricardoV94 (Member)

@aseyboldt I think we followed a simple refactoring strategy, as find_MAP was not using that function in V3 either:

def find_MAP(

We should definitely avoid computing the two outputs separately, whether via the logp_dlogp_function or a more traditional Aesara function (see the sketch below). That can be done separately from the more urgent bugfix, though.
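
For reference, a minimal sketch in plain Aesara (toy graph, illustrative names) of compiling the value and gradient as a single function, so shared subexpressions are computed once:

import aesara
import aesara.tensor as at

x = at.dvector("x")
logp = -0.5 * at.sum((x - 2.0) ** 2)  # toy logp
dlogp = at.grad(logp, x)

# One compiled function with two outputs; Aesara shares common work between them.
logp_and_grad = aesara.function([x], [logp, dlogp])
value, grad = logp_and_grad([0.0, 0.0])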

Regarding the API change of the logp_dlogp_function, feel free to open a PR to fix whatever unnecessary pain points you think were introduced since V3. I don't think special care was given to that functionality, as it was always a more specialized (and perhaps internal-facing) utility compared to the remaining model (d)logp methods...

@ricardoV94 ricardoV94 added this to the v4.0.2 milestone Jun 24, 2022
@ricardoV94 (Member)

@quantheory no worries about speed. If you open a PR with the option for devs to modify it, we can also help if you find yourself out of time!

@ricardoV94 ricardoV94 added the bug label Jun 24, 2022
@ricardoV94 ricardoV94 changed the title unexpected behavior of find_MAP Order of variables in logp and dlogp is not always aligned inside find_MAP Jun 25, 2022
@ricardoV94 ricardoV94 changed the title Order of variables in logp and dlogp is not always aligned inside find_MAP Variables in logp and dlogp functions used in find_MAP are not always aligned Jun 25, 2022
quantheory added a commit to quantheory/pymc that referenced this issue Jun 25, 2022
@danhphan (Member)

Hi, that's great to know. Thanks @quantheory :)
