Nonparametric Models¶
Bayesian nonparametric models, such as DP mixture models, provide a flexible approach in which model structure (e.g. the number of components) can be adapted to data. The capability and effectiveness of such models have been proven in many applications.
This flexibility, however, leads to new questions to probabilistic programming – how to express models whose size/structure can vary. The Julia macro system makes it possible to address this problem in an elegant way due to is lazy evaluation nature.
Dirichlet Process Mixture Model¶
Dirichlet process mixture model (DPMM) is one of the most widely used in Bayesian nonparametrics. The formulation of DPMM is given by
Here,
is the concentration paramater,
is the base measure, and
denotes the component models that generate the observed samples. The model specification for a DPMM is given as below. It has been shown that
is almost surely discrete, and can be expressed in the followin form:
Hence, there exists positive probability that some of the components will be repeatedly sampled, and thus the number of distinct components is usually smaller than the number of samples. In practice, it is useful to construct a pool of components
, and introduce an indicator
for each sample
, such that
. To make this explicit, we can reformulate the model as below
Here, \pi, a sample from the stick breaking process, is an infinite sequence that sums to unity (i.e.
). The values of
are defined as
The model specification for this formulation is given below
@model DPMixtureModel{B, G} begin @constant n::Int # the number of observed samples @hyperparam alpha::Float64 # the concentration parameter pi ~ StickBreak(alpha) for k = 1 : Inf phi[k] ~ B end # samples for i = 1 : n z[i] ~ pi x[i] ~ G(phi[z[i]]) end end
Remarks:
- The parametric setting makes it possible to use arbitrary base distribution
and component
here, with the same generic formulation. - In theory,
is an infinite sequence, and therefore it is not feasible to completely instantiate a sample path of
in computer. This variable may be marginalized out in the inference, and thus directly querying
is not allowed.
has infinitely many components. However, only a finite number of them are needed in inference. The compiler should generate a lazy data structure that only memorizes the subset of components needed during the inference. In particular, phi[k]is constructed and memorized when there is anisuch thatk = z[i]. Some efforts (e.g. the LazySequences.jl) have demonstrated that lazy data structure can be implemented efficiently in Julia.
Hierarchical Dirichlet Processes¶
HDP is an extension of the DP mixture models, which allows groups of data to be modeled by different DPs that share components. The formulation of HDP is given below
Using
as a base measure for the DP associated with each group, all groups share components in
while allowing potentially infinite number of components. This formulation can be re-written (equivalently) using Pitman-Yor process, as follows
Then for each
,
Here is a brief description of this procedure:
- To generate
, we first draw an infinite multinomial distribution
from a Pitman-Yor process with concentration parameter
, and draw each component
from
. Then
. - Then for each group (say the k-th one), we draw
from a stick breaking process and draw each component from
. Note that drawing a component
from
is equivalent to choosing one of the atoms in
, which can be done in two steps: draw
from
and then set
. In other words, the
-th component in the
-th group is identical to the
-th component in
. - Finally, to generate the
-th sample in the
-th group, denoted by
, we first draw
from
and use the corresponding component
to generate the sample.
This formulation can be expressed using the DSL as below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 @model HierarchicalDP{B, G} begin @constant m::Int # the number of groups @constant ns::Vector{Int} # the number of samples in each group @hyperparam alpha::Float64 # the base concentration @hyperparam gamma::Float64 # the group specific concentration # for D0 pi0 ~ StickBreak(alpha) for j = 1 : Inf psi[j] ~ B end # each group for k = 1 : m pi[k] ~ StickBreak(gamma) # Dk for t = 1 : Inf u[k][t] ~ pi0 phi[k][t] = psi[u[k][t]] end # samples for i = 1 : ns[k] z[k][t] ~ pi[k] x[k][i] ~ G(phi[z[k][t]]) end end end
Gaussian Processes¶
Gaussian process (GP) is another important stochastic process that is widely used in Bayesian modeling. Formally, a Gaussian process is defined to be a a function-valued distribution
, where
can be arbitrary domain, such that any finite subset of values in
is normally distributed. A Gaussian process is characterized by a mean function
and a positive definite covariance function
. The covariance function is typically given in a parametric form. The following is one that is widely used
In many application, the GP is considered to be hidden, and observations are a noisy transformation of the samples generated from the GP, as
The following model specification describes this model.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 @model TransformedGaussProcess{B, F} begin @constant n::Int # the number of observed samples # define mean and covariance function @param theta::Float^3 mu(x) = 0. kappa(x, y) = theta[1] * delta(x, y) + theta[2] * exp(- theta[3] * abs2(x - y)) # GP g ~ GaussianProcess(mu, kappa) # samples for i = 1 : n x[i] ~ B y[i] ~ F(g[x[i]]) end end
The GaussianProcess distribution here is a high-order stochastic function, which takes into two function arguments and generates another function. This is readily implementable in Julia, where functions are first-class citizens like in many functional programming languages.








