Chapter 7 The sampling algorithms

As the name of the package indicates, the samplers employed in this package are based on piecewise deterministic Markov processes (see e.g. Fearnhead et al. (2018)) with Hamiltonian dynamics between events (see Kleppe (2022a)).

Two such samplers are implemented so far: a fixed mass sampler (Section 7.1) and a Riemann manifold sampler (Section 7.2). The Hamiltonians, which fully specify the between-event dynamics, are given below.

The same set of numerical differential equation solvers (Section 7.3) is used for both samplers. Finally, methods for adapting the scaling of the differential equations (Section 7.4) and the different available event intensities (Section 7.5) are described below.

The sampling algorithm to be used is chosen prior to compilation with pdmphmc::build(). By default, the fixed mass sampler is used.

Whether to use the RM sampler or the fixed mass sampler is inherently problem-specific. As a rule of thumb, if your target distribution involves complicated non-linear dependencies, or the scale of some subset of \(\boldsymbol \theta\) depends strongly on some other subset of \(\boldsymbol \theta\), the RM sampler is often a good choice. However, the differential equations associated with the RM sampler are more expensive to compute than those of the fixed mass sampler. Hence, if it is possible to rewrite your model so that the fixed mass sampler gives good results (see e.g. Kleppe (2019) or Osmundsen, Kleppe, and Liesenfeld (2021)), this approach is often preferable.

7.1 Fixed mass sampler

Suppose the target density is given by \(\pi (\boldsymbol \theta)\) and that the model specification admits the evaluation of \(\bar \pi(\boldsymbol \theta) \propto \pi (\boldsymbol \theta)\). The Hamiltonian \(\mathcal H(\mathbf q,\mathbf p)\) used to define the deterministic dynamics of the fixed mass sampler is \[ \mathcal H(\mathbf q,\mathbf p) = -\log \bar \pi(\mathbf m + \mathbf S \mathbf q) + \frac{1}{2} \mathbf p^T \mathbf p. \] Here \(\mathbf p\) is the (fictitious) momentum variable introduced to construct a dynamical system with Boltzmann-Gibbs distribution \(\pi(\mathbf q, \mathbf p) \propto \exp(-\mathcal H(\mathbf q,\mathbf p))\).

Samples of \(\boldsymbol \theta=\mathbf m + \mathbf S \mathbf q\) are subsequently recorded, and these will be distributed according to \(\pi(\boldsymbol \theta)\). Here, the vector \(\mathbf m\) and the diagonal, positive definite matrix \(\mathbf S\) are adapted during the warmup process (see Section 7.4).

The vector \(\mathbf m\) should reflect the mean of the target distribution, and the diagonal elements of \(\mathbf S\) should reflect the scale properties of \(\boldsymbol \theta\) under the target distribution.

The reasoning behind introducing the re-parameterization \(\mathbf q \leftrightarrow \boldsymbol \theta\) (rather than accounting for different scales by introducing a non-identity mass matrix in the kinetic energy term of \(\mathcal H(\mathbf q,\mathbf p)\)) is the desire to obtain well-scaled Hamilton’s equations \[ \dot{ \mathbf q}(t) = \mathbf p(t) \] \[ \dot{\mathbf p}(t) = \mathbf S \mathbf g(\mathbf m+\mathbf S \mathbf q(t)), \; \mathbf g(\boldsymbol \theta) = \nabla_{\boldsymbol \theta}\log \pi (\boldsymbol \theta) \] suitable for numerical integration. That is, for suitably chosen \(\mathbf m\) and \(\mathbf S\), the right-hand side of both equations should be of order \(1\) (as \(Var(\mathbf p)=\mathbf I\) under the Boltzmann-Gibbs distribution).
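These dynamics can be simulated directly for a toy Gaussian target. The sketch below uses a leapfrog scheme rather than the package's Runge-Kutta solvers, and the concrete numbers (mu, sigma, the initial state) are illustrative choices; note that with ideal adaptation (\(\mathbf m\) equal to the mean, \(\mathbf S\) equal to the scales), the force term \(\mathbf S \mathbf g(\mathbf m + \mathbf S \mathbf q)\) reduces to \(-\mathbf q\), i.e. well-scaled dynamics regardless of the original scales of \(\boldsymbol \theta\):

```python
import numpy as np

# Toy sketch (not the package's solver): integrate dq/dt = p,
# dp/dt = S g(m + S q) for pi(theta) = N(mu, diag(sigma^2)) by leapfrog.
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 3.0])          # very different scales
m, S = mu.copy(), np.diag(sigma)      # ideal adaptation: m = mean, S = scales

def grad_log_pi(theta):               # g(theta) for the Gaussian target
    return -(theta - mu) / sigma**2

def leapfrog(q, p, dt, n_steps):
    for _ in range(n_steps):
        p = p + 0.5 * dt * S @ grad_log_pi(m + S @ q)
        q = q + dt * p
        p = p + 0.5 * dt * S @ grad_log_pi(m + S @ q)
    return q, p

def hamiltonian(q, p):                # -log pi(m + S q) + p'p/2, up to a constant
    theta = m + S @ q
    return 0.5 * np.sum(((theta - mu) / sigma)**2) + 0.5 * p @ p

q0, p0 = np.array([0.3, -0.7]), np.array([1.0, 0.5])
q1, p1 = leapfrog(q0, p0, dt=0.1, n_steps=100)
print(abs(hamiltonian(q1, p1) - hamiltonian(q0, p0)))  # near-conserved energy
```

The near-constant Hamiltonian along the trajectory is the property the numerical solvers of Section 7.3 exploit.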

The sampler is chosen explicitly by selecting process.type = "HMCProcess" in the call to pdmphmc::build(), but, as mentioned above, this sampler is the default option.

7.2 Riemann manifold sampler

The Riemann manifold (RM) sampler is based on the availability of a “metric tensor” \(\mathbf G(\boldsymbol \theta)\). Namely, for each \(\boldsymbol \theta\), \(\mathbf G(\boldsymbol \theta)\) is a symmetric positive definite matrix that may be interpreted as the “local precision matrix” of the target distribution.

The amt-library provides an automatic methodology for computing metric tensors from a given model specification (see Kleppe (2022b) for details), and by default this methodology is used if the RM sampler is selected via process.type = "RMHMCProcess" in the call to pdmphmc::build().

The Hamiltonian for the RM sampler is given by \[ \mathcal H(\mathbf q,\mathbf p) = -\log \bar \pi(\mathbf m + \mathbf S \mathbf q) + \frac{1}{2}\log(|\bar{\mathbf G}(\mathbf q)|) + \frac{1}{2}\mathbf p^T[\bar{\mathbf G}(\mathbf q)]^{-1}\mathbf p \] where \[ \bar{\mathbf G}(\mathbf q) = \mathbf S^T \mathbf G(\mathbf m + \mathbf S \mathbf q) \mathbf S. \]

It is seen that the interpretation of \(\mathbf m\) and \(\mathbf S\) is the same for the RM sampler, i.e. \(\mathbf m\) should reflect the center/mean of the distribution, and \(\mathbf S\) should reflect the scale of each element of \(\boldsymbol \theta\).
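To make the RM Hamiltonian concrete, the sketch below evaluates it for a toy Gaussian target \(\pi(\boldsymbol \theta) = N(\mathbf 0, \boldsymbol \Sigma)\), for which a natural metric tensor is the (here constant) precision matrix \(\mathbf G(\boldsymbol \theta) = \boldsymbol \Sigma^{-1}\). All concrete numbers are illustrative; in the package, \(\mathbf G\) is produced by the amt library rather than written by hand:

```python
import numpy as np

# Evaluate H(q, p) = -log pi_bar(m + S q) + 0.5*log|G_bar(q)|
#                  + 0.5 * p' G_bar(q)^{-1} p,  G_bar(q) = S' G(m + S q) S,
# for a toy 2-d Gaussian target (constants dropped from log pi_bar).
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
m = np.zeros(2)
S = np.diag(np.sqrt(np.diag(Sigma)))        # per-coordinate scales of theta

def G(theta):                               # local precision matrix
    return np.linalg.inv(Sigma)

def rm_hamiltonian(q, p):
    theta = m + S @ q
    G_bar = S.T @ G(theta) @ S              # metric in the q-parameterization
    log_pi_bar = -0.5 * theta @ np.linalg.solve(Sigma, theta)  # up to a constant
    sign, logdet = np.linalg.slogdet(G_bar)
    return -log_pi_bar + 0.5 * logdet + 0.5 * p @ np.linalg.solve(G_bar, p)

q, p = np.array([0.1, -0.2]), np.array([1.0, 0.0])
print(rm_hamiltonian(q, p))
```

For this constant-metric example the RM sampler coincides with a fixed-mass-matrix sampler; the benefit of the RM formulation arises when \(\mathbf G(\boldsymbol \theta)\) genuinely varies with \(\boldsymbol \theta\).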

7.2.1 Metric tensor storage

Two storage schemes for the metric tensor are available:

  • sparse storage using the ldl sparse Cholesky factorization package (T. A. Davis (2005))

  • dense storage using the Cholesky factorization of the Stan math library.

One should use the sparse storage scheme if the metric tensor is very sparse, which is often the case for hierarchical models. The storage scheme is chosen during the pdmphmc::build() process via the argument metric.tensor.type (either "Sparse" or "Dense").

Note that for the sparse storage scheme, the ordering of the parameters may play a role for the performance of the Cholesky factorization (see Timothy A. Davis (2006)). The ordering of the parameters is determined by the ordering of the PARAMETER_* statements in the model specification.
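The effect of ordering on sparse Cholesky factorization can be seen on a toy "arrowhead" matrix: eliminating the densely-connected variable first fills in the whole factor, while eliminating it last produces no fill-in at all. This is a plain numpy illustration, not the ldl routines the package actually uses:

```python
import numpy as np

# Arrowhead matrix: one "hub" variable coupled to all others.
n = 8
A = np.eye(n) * 4.0
A[0, :] = A[:, 0] = 1.0                # hub variable ordered first
A[0, 0] = 4.0

P = np.arange(n)[::-1]                 # reordering: hub variable last
A_perm = A[np.ix_(P, P)]

L_bad = np.linalg.cholesky(A)          # hub first -> completely dense factor
L_good = np.linalg.cholesky(A_perm)    # hub last  -> arrow-shaped factor

nnz = lambda M: int(np.sum(np.abs(M) > 1e-12))
print(nnz(L_bad), nnz(L_good))         # hub-first gives far more nonzeros
```

In hierarchical models, global parameters coupled to many latent variables play the role of the hub variable, so their position among the PARAMETER_* statements can matter.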

7.2.2 Advanced

It is also possible to specify one's own metric tensor. One turns off the default amt-library metric tensor by choosing the option amt=FALSE in the call to pdmphmc::build(). This process will be documented in more detail soon.

7.3 ODE solvers

The ODE solver is a bespoke Runge-Kutta method with a PI controller for adaptive time step sizes. The Runge-Kutta step types (including dense output formulas) implemented so far are:

  • The order 5 (4) pair of Dormand and Prince (1980) (obtained by choosing step.type = "RKDP54" in build()).
  • The order 3 (2) pair of Bogacki and Shampine (1989) (obtained by choosing step.type = "RKBS32" in build()).

The latter is generally preferred only for models with constraints (see Section 5) or when running with very lax integration tolerances.
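The mechanics of an embedded pair with PI step-size control can be sketched as follows. This is a deliberately simplified stand-in for the package's bespoke solver: the Bogacki–Shampine 3(2) tableau is standard, but the controller gains and safety factors below are illustrative textbook-style values, not the package's:

```python
import numpy as np

def bs32_step(f, t, y, h):
    # One Bogacki-Shampine 3(2) step: 3rd-order solution + 2nd-order
    # embedded solution; their difference estimates the local error.
    k1 = f(t, y)
    k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(t + 0.75 * h, y + 0.75 * h * k2)
    y3 = y + h * (2/9 * k1 + 1/3 * k2 + 4/9 * k3)                 # order 3
    k4 = f(t + h, y3)
    y2 = y + h * (7/24 * k1 + 1/4 * k2 + 1/3 * k3 + 1/8 * k4)     # order 2
    return y3, float(np.max(np.abs(y3 - y2)))

def integrate(f, t0, y0, t_end, tol=1e-6, h=0.1):
    t, y, err_prev = t0, np.atleast_1d(y0).astype(float), tol
    while t_end - t > 1e-12:
        h = min(h, t_end - t)
        y_new, err = bs32_step(f, t, y, h)
        err = max(err, 1e-16)
        # PI controller: integral term reacts to the current error,
        # proportional term to the previous one (illustrative gains).
        fac = 0.9 * (tol / err)**(0.7/3) * (err_prev / tol)**(0.4/3)
        if err <= tol:                     # accept the step
            t, y, err_prev = t + h, y_new, err
        h *= min(5.0, max(0.2, fac))       # limit step-size jumps
    return y

# dy/dt = -y on [0, 2] with y(0) = 1; exact solution exp(-t).
y = integrate(lambda t, y: -y, 0.0, 1.0, 2.0)
print(abs(y[0] - np.exp(-2.0)))            # small global error
```

The PI (proportional-integral) update smooths the step-size sequence relative to the elementary controller, which reduces step rejections when the error estimate fluctuates.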

7.4 Scaling adaptation

So far, only square-roots of estimates of the diagonal elements of \(Var(\boldsymbol \theta)\) are available as the diagonal elements of \(\mathbf S\). See Kleppe (2022a) for details.
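In other words, from warmup-period samples of \(\boldsymbol \theta\) one forms \(\mathbf m\) as the estimated mean and the diagonal of \(\mathbf S\) as the estimated standard deviations. The sketch below uses synthetic samples; in the package the estimates come from the warmup trajectory itself:

```python
import numpy as np

# Synthetic "warmup samples" of theta with known mean and scales.
rng = np.random.default_rng(1)
theta_warmup = rng.normal(loc=[1.0, -2.0], scale=[0.5, 3.0], size=(5000, 2))

m = theta_warmup.mean(axis=0)                    # center of the target
S = np.diag(np.sqrt(theta_warmup.var(axis=0)))   # sqrt of estimated Var diagonal
print(np.round(m, 1), np.round(np.diag(S), 1))
```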

7.5 Event intensity

So far, only constant event intensities are available. See Kleppe (2022a) for details on the adaptive choice of this constant event intensity.
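With a constant event intensity \(\lambda\), the event times of the process form a homogeneous Poisson process, so inter-event times are Exponential(\(\lambda\)) with mean \(1/\lambda\). A minimal simulation (the value \(\lambda = 0.5\) is an arbitrary illustration, not a package default):

```python
import numpy as np

# Inter-event times of a constant-intensity Poisson process.
rng = np.random.default_rng(0)
lam = 0.5
gaps = rng.exponential(scale=1.0 / lam, size=100000)
print(round(gaps.mean(), 2))   # close to 1/lambda = 2
```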

References

Bogacki, P., and L. F. Shampine. 1989. “A 3(2) Pair of Runge-Kutta Formulas.” Applied Mathematics Letters 2 (4): 321–25. https://doi.org/10.1016/0893-9659(89)90079-7.
Davis, T. A. 2005. “Algorithm 849: A Concise Sparse Cholesky Factorization Package.” ACM Transactions on Mathematical Software 31: 587–91.
Davis, Timothy A. 2006. Direct Methods for Sparse Linear Systems. Vol. 2. Fundamentals of Algorithms. SIAM.
Dormand, J. R., and P. J. Prince. 1980. “A Family of Embedded Runge-Kutta Formulae.” Journal of Computational and Applied Mathematics 6 (1): 19–26. https://doi.org/10.1016/0771-050X(80)90013-3.
Fearnhead, Paul, Joris Bierkens, Murray Pollock, and Gareth O. Roberts. 2018. “Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo.” Statist. Sci. 33 (3): 386–412. https://doi.org/10.1214/18-STS648.
Kleppe, Tore Selland. 2019. “Dynamically Rescaled Hamiltonian Monte Carlo for Bayesian Hierarchical Models.” Journal of Computational and Graphical Statistics 28 (3): 493–507. https://doi.org/10.1080/10618600.2019.1584901.
———. 2022a. “Connecting the Dots: Numerical Randomized Hamiltonian Monte Carlo with State-Dependent Event Rates.” Journal of Computational and Graphical Statistics.
———. 2022b. “Log-Density Gradient Covariance and Automatic Metric Tensors for Riemann Manifold Monte Carlo Methods.”
Osmundsen, Kjartan Kloster, Tore Selland Kleppe, and Roman Liesenfeld. 2021. “Importance Sampling-Based Transport Map Hamiltonian Monte Carlo for Bayesian Hierarchical Models.” Journal of Computational and Graphical Statistics 30 (4): 906–19.