**We will show how to perform multivariate random sampling using one of the Markov Chain Monte Carlo (MCMC) algorithms, called the Gibbs sampler.**

*Figure: random sampling with a rabbit on the bed plane.*

**To start, what are MCMC algorithms and what are they based on?**

Suppose we are interested in generating a random variable $X$ with a distribution $\pi$ over a space $\mathcal{X}$.

If we are not able to do this directly, we will be satisfied with generating a sequence of random variables $X_1, X_2, \ldots$ which, in some sense, tends to the distribution $\pi$. The MCMC approach explains how to do so:

Build a Markov chain $(X_t)$, for $t = 0, 1, 2, \ldots$, whose stationary distribution is $\pi$.

If the construction is correct, we should expect the random variables $X_t$ to converge in distribution to $\pi$.

In addition, we can expect that for a function $f$, the ergodic average satisfies

$$\frac{1}{n} \sum_{t=1}^{n} f(X_t) \xrightarrow[n \to \infty]{} \mathbb{E}_{\pi} f(X)$$

with probability equal to 1.

In essence, we want the above equality to hold for an arbitrary (integrable) function $f$.

**One of the MCMC algorithms that guarantee the above properties is the so-called Gibbs sampler.**

**Gibbs Sampler – description of the algorithm.**

Assumptions:

- $\pi$ is defined on the product space $\mathcal{X} = \mathcal{X}_1 \times \ldots \times \mathcal{X}_d$.
- We are able to draw from the conditional distributions $\pi(x_i \mid x_{-i})$, where $x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$ denotes the vector of all coordinates except the $i$-th.

Algorithm steps:

- Select the initial values $x^{(0)} = (x_1^{(0)}, \ldots, x_d^{(0)})$.
- For $t = 1, 2, \ldots$ repeat: for $i = 1, \ldots, d$ sample $x_i^{(t)}$ from the distribution $\pi\left(x_i \mid x_1^{(t)}, \ldots, x_{i-1}^{(t)}, x_{i+1}^{(t-1)}, \ldots, x_d^{(t-1)}\right)$.
- Repeat step 2 until the distribution of the vector $x^{(t)}$ stabilizes.
- The subsequent iterations of step 2 will then provide random draws from $\pi$.

How do we understand point 3?

In practice, we are satisfied when the running averages of the coordinates stabilize to some accuracy.
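As a rough illustration of this stopping rule (our own convention, not code from the original post), one can compute the running mean of each coordinate of the chain and watch for the curve to flatten:

```r
# Running (cumulative) mean of one coordinate of a chain: once this
# curve flattens out to the desired accuracy, we treat the coordinate
# as stabilized.
running_mean <- function(x) cumsum(x) / seq_along(x)

running_mean(c(1, 2, 3, 4))  # 1.0 1.5 2.0 2.5
```

Plotting `running_mean` of each coordinate against the iteration number gives a quick visual convergence check.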

**Intuition in two-dimensional case:**

Source: [3]

**Gibbs sampling from a two-dimensional normal distribution.**

We will sample from the distribution of $(X, Y) \sim N(\mu, \Sigma)$, where

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$

Knowing the joint density of $(X, Y)$, it is easy to show that:

$$X \mid Y = y \sim N\!\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2),\; (1 - \rho^2)\sigma_1^2\right),$$

$$Y \mid X = x \sim N\!\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1),\; (1 - \rho^2)\sigma_2^2\right).$$

R implementation:
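The original code is not preserved here, so the following is a sketch consistent with the conditional distributions above; the function and parameter names (`gibbs_binorm`, `mu1`, `rho`, ...) are our own choices:

```r
# Gibbs sampler for a bivariate normal distribution: alternately draw
# X | Y and Y | X from their (univariate normal) conditionals.
gibbs_binorm <- function(n, mu1 = 0, mu2 = 0, sigma1 = 1, sigma2 = 1, rho = 0.5) {
  x <- numeric(n)
  y <- numeric(n)
  # arbitrary starting point
  x[1] <- mu1
  y[1] <- mu2
  for (t in 2:n) {
    # X | Y = y ~ N(mu1 + rho * sigma1/sigma2 * (y - mu2), (1 - rho^2) * sigma1^2)
    x[t] <- rnorm(1,
                  mean = mu1 + rho * sigma1 / sigma2 * (y[t - 1] - mu2),
                  sd   = sqrt(1 - rho^2) * sigma1)
    # Y | X = x ~ N(mu2 + rho * sigma2/sigma1 * (x - mu1), (1 - rho^2) * sigma2^2)
    y[t] <- rnorm(1,
                  mean = mu2 + rho * sigma2 / sigma1 * (x[t] - mu1),
                  sd   = sqrt(1 - rho^2) * sigma2)
  }
  data.frame(x = x, y = y)
}
```

For example, `gibbs_binorm(10, rho = 0.7)` returns the first 10 points, and `plot(gibbs_binorm(10000, rho = 0.7))` shows the familiar elliptical cloud.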

Using the above function, let's see how 10 sampled points look:

And for 10000 iterations:

We leave you a comparison of how the stability of the parameters changes depending on the selected $\rho$ parameter.

**Let’s move on to use the Gibbs sampler to estimate the density parameters.**

We will show how to use the Gibbs sampler and Bayesian statistics to estimate the mean parameters in a mixture of normal distributions.

Assumptions (simplified case):

- An i.i.d. sample $X_1, \ldots, X_n$ comes from a mixture of two normal distributions $(1 - p)\,N(\mu_1, \sigma_1^2) + p\,N(\mu_2, \sigma_2^2)$, where the mixing probability $p$ and the standard deviations $\sigma_i$ are known.
- For $i = 1, 2$, the a priori distributions are $\mu_i \sim N(0, \tau^2)$, and $\mu_1$, $\mu_2$ are independent.
- $\Delta_i \in \{0, 1\}$ is the latent component label of observation $i$: when $\Delta_i = 0$, $X_i$ is drawn from $N(\mu_1, \sigma_1^2)$; when $\Delta_i = 1$, it is drawn from $N(\mu_2, \sigma_2^2)$.

With the above assumptions, it can be shown that the full conditional distributions are:

$$P(\Delta_i = 1 \mid \mu_1, \mu_2, X_i) = \frac{p\,\phi(X_i; \mu_2, \sigma_2)}{(1 - p)\,\phi(X_i; \mu_1, \sigma_1) + p\,\phi(X_i; \mu_2, \sigma_2)},$$

and, given the labels, $\mu_1 \mid \Delta, X \sim N(\hat{\mu}_1, \sigma_1^2 / n_1)$ and $\mu_2 \mid \Delta, X \sim N(\hat{\mu}_2, \sigma_2^2 / n_2)$ in the flat-prior limit ($\tau \to \infty$), where $\hat{\mu}_1$, $\hat{\mu}_2$ are the sample means of the observations currently assigned to each component and $n_1$, $n_2$ are their counts.

The form of the algorithm:

- Choose a starting point for the means $(\mu_1^{(0)}, \mu_2^{(0)})$.
- In the $t$-th iteration do:
  - For $i = 1, \ldots, n$: with the probability
    $$\hat{\gamma}_i = \frac{p\,\phi(X_i; \mu_2^{(t-1)}, \sigma_2)}{(1 - p)\,\phi(X_i; \mu_1^{(t-1)}, \sigma_1) + p\,\phi(X_i; \mu_2^{(t-1)}, \sigma_2)}$$
    draw $\Delta_i = 1$, otherwise set $\Delta_i = 0$.
  - Calculate: $n_1 = \#\{i : \Delta_i = 0\}$, $n_2 = \#\{i : \Delta_i = 1\}$, $\hat{\mu}_1 = \frac{1}{n_1}\sum_{i:\,\Delta_i = 0} X_i$ and $\hat{\mu}_2 = \frac{1}{n_2}\sum_{i:\,\Delta_i = 1} X_i$.
  - Generate: $\mu_1^{(t)} \sim N(\hat{\mu}_1, \sigma_1^2 / n_1)$ and $\mu_2^{(t)} \sim N(\hat{\mu}_2, \sigma_2^2 / n_2)$.
- Perform step 2 until the distribution of the vector $(\mu_1^{(t)}, \mu_2^{(t)})$ stabilizes.

How to do this in R?

To start, let's generate a random sample from a mixture of normals with chosen parameters.
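The original parameter values are not preserved, so the sketch below picks illustrative ones (n = 500, p = 0.4, component means 0 and 3, common σ = 1):

```r
set.seed(42)

# Illustrative parameters -- assumed, not the post's original values.
n     <- 500
p     <- 0.4       # known mixing probability P(Delta_i = 1)
mu    <- c(0, 3)   # true component means (to be estimated later)
sigma <- 1         # known standard deviation, common to both components

delta <- rbinom(n, size = 1, prob = p)            # latent component labels
x     <- rnorm(n, mean = mu[delta + 1], sd = sigma)

hist(x, breaks = 30, main = "Sample from the mixture", xlab = "x")
```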

Take a quick look at a histogram of our data:

The next task is to estimate $\mu_1$ and $\mu_2$ from the above model.
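The estimation can be sketched as below, in the spirit of Algorithm 8.4 of [2]. The data-generating parameters (n = 500, p = 0.4, true means 0 and 3, σ = 1), the flat prior on the means, and the starting point are our own illustrative assumptions, not the post's original code:

```r
set.seed(42)

# Generate an illustrative sample (assumed, not the original parameters).
n <- 500; p <- 0.4; mu_true <- c(0, 3); sigma <- 1
x <- rnorm(n, mean = mu_true[rbinom(n, 1, p) + 1], sd = sigma)

n_iter   <- 2000
mu_chain <- matrix(NA_real_, nrow = n_iter, ncol = 2)
mu_cur   <- c(min(x), max(x))   # crude but label-separating starting point

for (t in seq_len(n_iter)) {
  # Step 1: sample the latent labels Delta_i given the current means.
  d1   <- (1 - p) * dnorm(x, mu_cur[1], sigma)
  d2   <- p       * dnorm(x, mu_cur[2], sigma)
  resp <- d2 / (d1 + d2)          # P(Delta_i = 1 | mu, x_i)
  delta <- rbinom(n, 1, resp)

  n2 <- sum(delta); n1 <- n - n2
  # Step 2: sample the means given the labels (flat prior on the means).
  mu_cur[1] <- rnorm(1, mean = mean(x[delta == 0]), sd = sigma / sqrt(n1))
  mu_cur[2] <- rnorm(1, mean = mean(x[delta == 1]), sd = sigma / sqrt(n2))
  mu_chain[t, ] <- mu_cur
}

colMeans(mu_chain[-(1:500), ])   # posterior mean estimates after burn-in
```

Discarding the first 500 iterations as burn-in, the column means of `mu_chain` should land close to the true component means.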

Let's see how the sampled values relate to the known ones:

The following plot presents the mean of the $\Delta$ vector at each iteration:

Let's check how the means of the parameters $\mu_1$ and $\mu_2$ stabilize over the course of the algorithm:

Note how few iterations were enough to stabilize the parameters.

Finally, let's see the estimated density together with our initial sample:

Those interested in the topic are referred to [1], where you can find a generalization of the normal mixture model that extends the a priori distributions to the other parameters.

It is also worth comparing the above algorithm with its deterministic counterpart, the Expectation-Maximization (EM) algorithm; see [2].

Write your questions and comments below. We'd love to hear what you think.

Resources:

[1] http://gauss.stat.su.se/rr/RR2006_1.pdf

[2] http://web.stanford.edu/~hastie/ElemStatLearn/

[3] http://vcla.stat.ucla.edu/old/MCMC/MCMC_tutorial.htm