Update to Turing 0.38 #599

Open · wants to merge 5 commits into main
711 changes: 352 additions & 359 deletions Manifest.toml

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions Project.toml
@@ -48,9 +48,8 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
TuringBenchmarking = "0db1332d-5c25-4deb-809f-459bc696f94f"
UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
Turing = "0.37"
Turing = "0.38"
6 changes: 3 additions & 3 deletions _quarto.yml
@@ -32,7 +32,7 @@ website:
text: Team
right:
# Current version
- text: "v0.37"
- text: "v0.38"
menu:
- text: Changelog
href: https://turinglang.org/docs/changelog.html
@@ -60,7 +60,7 @@ website:
- usage/custom-distribution/index.qmd
- usage/probability-interface/index.qmd
- usage/modifying-logprob/index.qmd
- usage/generated-quantities/index.qmd
- usage/tracking-extra-quantities/index.qmd
- usage/mode-estimation/index.qmd
- usage/performance-tips/index.qmd
- usage/sampler-visualisation/index.qmd
@@ -190,7 +190,7 @@ using-turing-external-samplers: tutorials/docs-16-using-turing-external-samplers
using-turing-mode-estimation: tutorials/docs-17-mode-estimation
usage-probability-interface: tutorials/usage-probability-interface
usage-custom-distribution: tutorials/usage-custom-distribution
usage-generated-quantities: tutorials/usage-generated-quantities
usage-tracking-extra-quantities: tutorials/tracking-extra-quantities
usage-modifying-logprob: tutorials/usage-modifying-logprob

contributing-guide: developers/contributing
2 changes: 1 addition & 1 deletion tutorials/bayesian-time-series-analysis/index.qmd
@@ -175,7 +175,7 @@ end

function get_decomposition(model, x, cyclic_features, chain, op)
    chain_params = Turing.MCMCChains.get_sections(chain, :parameters)
    return generated_quantities(model(x, cyclic_features, op), chain_params)
    return returned(model(x, cyclic_features, op), chain_params)
end

function plot_fit(x, y, decomp, ymax)
2 changes: 1 addition & 1 deletion tutorials/gaussian-mixture-models/index.qmd
@@ -403,7 +403,7 @@ chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initia
Given a sample from the marginalized posterior, these assignments can be recovered with:

```{julia}
assignments = mean(generated_quantities(gmm_recover(x), chains));
assignments = mean(returned(gmm_recover(x), chains));
```

```{julia}
2 changes: 1 addition & 1 deletion tutorials/gaussian-processes-introduction/index.qmd
@@ -146,7 +146,7 @@ posterior probability of success at any distance we choose:

```{julia}
d_pred = 1:0.2:21
samples = map(generated_quantities(m_post, chn)[1:10:end]) do x
samples = map(returned(m_post, chn)[1:10:end]) do x
    return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
end
p = plot()
4 changes: 2 additions & 2 deletions tutorials/hidden-markov-models/index.qmd
@@ -123,11 +123,11 @@ The priors on our transition matrix are noninformative, using `T[i] ~ Dirichlet(
end;
```

We will use a combination of two samplers ([HMC](https://turinglang.org/dev/docs/library/#Turing.Inference.HMC) and [Particle Gibbs](https://turinglang.org/dev/docs/library/#Turing.Inference.PG)) by passing them to the [Gibbs](https://turinglang.org/dev/docs/library/#Turing.Inference.Gibbs) sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters.
We will use a combination of two samplers (HMC and Particle Gibbs) by passing them to the Gibbs sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters. (For API details of these samplers, please see [Turing.jl's API documentation](https://turinglang.org/Turing.jl/stable/api/Inference/).)

In this case, we use HMC for `m` and `T`, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for `s`, the state sequence. You may wonder why we are not assigning `s` to the HMC sampler, and why we need compositional Gibbs sampling at all.

The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and [NUTS](https://turinglang.org/dev/docs/library/#Turing.Inference.NUTS) won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta using-turing-autodiff>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
The parameter `s` is not a continuous variable. It is a vector of **integers**, and thus Hamiltonian methods like HMC and NUTS won't work correctly. Gibbs allows us to apply the right tools to the best effect. If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{<meta using-turing-autodiff>}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
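
As a rough sketch of what such a compositional sampler construction might look like with Turing 0.38's pair-based `Gibbs` constructor (the step size, number of leapfrog steps, particle count, and the `hmm_model` placeholder are illustrative assumptions, not the tutorial's actual settings):

```julia
# Illustrative sketch only: `hmm_model` stands for the HMM model defined earlier
# in this tutorial, and the tuning constants are placeholders.
sampler = Gibbs(
    (:m, :T) => HMC(0.01, 10),  # Hamiltonian updates for the continuous parameters
    :s => PG(20),               # Particle Gibbs updates for the discrete state sequence
)
chain = sample(hmm_model, sampler, 1000)
```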

Time to run our sampler.

25 changes: 18 additions & 7 deletions usage/automatic-differentiation/index.qmd
@@ -14,8 +14,8 @@ Pkg.instantiate();

## Switching AD Modes

Turing currently supports four automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl), [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl), and [Zygote](https://github.com/FluxML/Zygote.jl) for reverse-mode AD.
`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake`, `Zygote`, or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake`, `import Zygote` or `import ReverseDiff`, alongside `using Turing`.
Turing currently supports three automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl) and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl) for reverse-mode AD.
`ForwardDiff` is automatically imported by Turing. To utilize `Mooncake` or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake` or `import ReverseDiff`, alongside `using Turing`.

As of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of [`AdTypes.jl`](https://github.com/SciML/ADTypes.jl), allowing users to specify the AD backend for individual samplers independently.
Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`.
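
As a minimal sketch of this (the `demo` model and its data are illustrative stand-ins, not part of this page):

```julia
using Turing
import ReverseDiff

# A small illustrative model, standing in for any Turing model.
@model function demo(x)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
end

# Select the AD backend for this sampler via the `adtype` keyword argument.
chain = sample(demo(1.5), NUTS(; adtype=AutoReverseDiff()), 1000)
```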
@@ -33,8 +33,6 @@ For instance, `if`-statements with conditions that can be determined at compile
However, `if`-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect.
Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should check that any functions of the parameters you compute do not contain branches that could execute different code for different parameter values.
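
Reusing the illustrative `demo` model from the sketch above, which has no parameter-dependent branches and is therefore safe to compile, requesting a compiled tape might look like:

```julia
# Compiled-tape ReverseDiff; safe here only because `demo` contains no branches
# that depend on the values of `s` or `m`.
chain = sample(demo(1.5), NUTS(; adtype=AutoReverseDiff(; compile=true)), 1000)
```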

For `Zygote`, pass `adtype=AutoZygote()` to the sampler constructor.

The previously used interface functions, including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache`, have been deprecated and removed.

## Compositional Sampling with Differing AD Modes
@@ -70,9 +68,22 @@ Generally, reverse-mode AD, for instance `ReverseDiff`, is faster when sampling
If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is.
Currently, this defaults to `ForwardDiff`.

The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using [`TuringBenchmarking`](https://github.com/TuringLang/TuringBenchmarking.jl):
The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see [the API documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities)):

```{julia}
using TuringBenchmarking
benchmark_model(gdemo(1.5, 2), adbackends=[AutoForwardDiff(), AutoReverseDiff()])
using DynamicPPL.TestUtils.AD: run_ad, ADResult
using ForwardDiff, ReverseDiff

model = gdemo(1.5, 2)

for adtype in [AutoForwardDiff(), AutoReverseDiff()]
    result = run_ad(model, adtype; benchmark=true)
    @show result.time_vs_primal
end
```

In this specific instance, ForwardDiff is clearly faster (due to the small size of the model).

We also have a table of benchmarks for various models and AD backends in [the ADTests website](https://turinglang.org/ADTests/).
These models aim to capture a variety of different Turing.jl features.
If you have suggestions for things to include, please do let us know by [creating an issue on GitHub](https://github.com/TuringLang/ADTests/issues/new)!
68 changes: 0 additions & 68 deletions usage/generated-quantities/index.qmd

This file was deleted.

152 changes: 152 additions & 0 deletions usage/tracking-extra-quantities/index.qmd
@@ -0,0 +1,152 @@
---
title: Tracking Extra Quantities
engine: julia
aliases:
- ../../tutorials/usage-generated-quantities/index.html
- ../generated-quantities/index.html
---

```{julia}
#| echo: false
#| output: false
using Pkg;
Pkg.instantiate();
```

Often, a model involves quantities whose values we would like to inspect, but which are not random variables explicitly drawn from a distribution.

As a motivating example, the most natural parameterization for a model might not be the most computationally feasible one.
Consider the following (efficiently reparametrized) implementation of Neal's funnel [(Neal, 2003)](https://arxiv.org/abs/physics/0009028):

```{julia}
using Turing
setprogress!(false)

@model function Neal()
    # Raw draws
    y_raw ~ Normal(0, 1)
    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])

    # Transform:
    y = 3 * y_raw
    x = exp.(y ./ 2) .* x_raw
    return nothing
end
```

In this case, the random variables exposed in the chain (`x_raw`, `y_raw`) are not in a helpful form — what we're after are the deterministically transformed variables `x` and `y`.

There are two ways to track these extra quantities in Turing.jl.

## Using `:=` (during inference)

The first way is to use the `:=` operator, which behaves exactly like `=` except that the values of the variables on its left-hand side are automatically added to the chain returned by the sampler.
For example:

```{julia}
@model function Neal_coloneq()
    # Raw draws
    y_raw ~ Normal(0, 1)
    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])

    # Transform:
    y := 3 * y_raw
    x := exp.(y ./ 2) .* x_raw
end

sample(Neal_coloneq(), NUTS(), 1000)
```

## Using `returned` (post-inference)

Alternatively, one can specify the extra quantities as part of the model function's return statement:

```{julia}
@model function Neal_return()
    # Raw draws
    y_raw ~ Normal(0, 1)
    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])

    # Transform and return as a NamedTuple
    y = 3 * y_raw
    x = exp.(y ./ 2) .* x_raw
    return (x=x, y=y)
end

chain = sample(Neal_return(), NUTS(), 1000)
```

The sampled chain does not contain `x` and `y`, but we can extract the values using the `returned` function.
Calling this function outputs an array:

```{julia}
nts = returned(Neal_return(), chain)
```

where each element is a NamedTuple, as specified in the return statement of the model.

```{julia}
nts[1]
```

## Which to use?

There are some pros and cons of using `returned`, as opposed to `:=`.

Firstly, `returned` is more flexible, as it allows you to track any type of object; `:=` only works with variables that can be inserted into an `MCMCChains.Chains` object.
(Notice that `x` is a vector, and in the first case where we used `:=`, reconstructing the vector value of `x` can also be rather annoying as the chain stores each individual element of `x` separately.)
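
For example, a minimal sketch (assuming a chain sampled from the `Neal_coloneq` model above; the helper names are not part of the tutorial) of gathering the elements of `x` back into a matrix:

```julia
# Hypothetical post-processing: collect the columns x[1], ..., x[9] produced by `:=`
# into an iterations-by-elements matrix.
chain_coloneq = sample(Neal_coloneq(), NUTS(), 1000)
x_samples = Array(Turing.MCMCChains.group(chain_coloneq, :x))
```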

However, if used carelessly, `returned` can lead to unnecessary computation.
For example, in `Neal_return()` above, the `x` and `y` variables are also calculated during the inference process (i.e. the call to `sample()`), but are then thrown away.
They are then calculated _again_ when `returned()` is called.

To avoid this, you will essentially have to create two different models, one for inference and one for post-inference.
The simplest way of doing this is to add a parameter to the model argument:

```{julia}
@model function Neal_coloneq_optional(track::Bool)
    # Raw draws
    y_raw ~ Normal(0, 1)
    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])

    if track
        y = 3 * y_raw
        x = exp.(y ./ 2) .* x_raw
        return (x=x, y=y)
    else
        return nothing
    end
end

chain = sample(Neal_coloneq_optional(false), NUTS(), 1000)
```

The above ensures that `x` and `y` are not calculated during inference, but allows us to still use `returned` to extract them:

```{julia}
returned(Neal_coloneq_optional(true), chain)
```

Another equivalent option is to use a submodel:

```{julia}
@model function Neal()
    y_raw ~ Normal(0, 1)
    x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
    return (x_raw=x_raw, y_raw=y_raw)
end

chain = sample(Neal(), NUTS(), 1000)

@model function Neal_with_extras()
    neal ~ to_submodel(Neal(), false)
    y = 3 * neal.y_raw
    x = exp.(y ./ 2) .* neal.x_raw
    return (x=x, y=y)
end

returned(Neal_with_extras(), chain)
```

Note that for the `returned` call to work, the `Neal_with_extras()` model must have the same variable names as stored in `chain`.
This means the submodel `Neal()` must not be prefixed, i.e. `to_submodel()` must be passed a second parameter `false`.