Posterior Distribution of μ and σ for Normal Distribution Based on Unknown Prior




The basics

A normal distribution has two parameters: the mean μ, which indicates where the bell curve is centered, and the standard deviation σ, which indicates the spread of the bell curve. From a frequentist point of view, μ and σ are fixed quantities. From a Bayesian point of view, they are random variables, each with its own distribution, mean and standard deviation.

If a researcher has a set of data that appears to follow a normal distribution, we would like to find the distributions of μ and σ.

If the researcher does not have a clear idea as to what distribution μ and σ jointly follow, Jeffreys suggested a diffuse prior distribution to indicate this lack of knowledge. When this prior distribution is combined with the data (through the likelihood), the joint posterior distribution of μ and σ does not follow any readily identifiable distribution.

However, once we solve for just μ from the joint posterior distribution, we find that it follows a t distribution with the mean equal to ȳ, which represents the sample mean of the data, and the variance equal to

Var(μ | y) = [(n − 1)/(n − 3)] (s²/n)

in which s represents the sample standard deviation of the data and n the sample size. Note that the sample size needs to be more than 3 in order to have a standard deviation. The other thing to note is that as the sample size increases, the standard deviation of μ gets closer to zero.

Similarly, once we solve for just σ from the joint posterior distribution, we find that it follows an inverse gamma distribution with the mean equal to

E(σ | y) = √((n − 1)/2) · [Γ((n − 2)/2)/Γ((n − 1)/2)] · s

where Γ(x) is the gamma function of x. To have a mean, the sample size needs to be more than 2. The variance is

Var(σ | y) = (n − 1)s²/(n − 3) − [E(σ | y)]²

To derive the standard deviation, we take the square root of the above quantity. To have a standard deviation, the sample size needs to be more than 3.
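The posterior summaries above are easy to compute directly. Here is a minimal Python sketch (the function name is my own, not from the article):

```python
from math import gamma, sqrt

def posterior_summary(data):
    """Posterior mean/sd of mu and sigma under Jeffreys' prior p(mu, sigma) ~ 1/sigma.

    Requires more than 3 observations so that all four quantities exist.
    """
    n = len(data)
    ybar = sum(data) / n
    s2 = sum((y - ybar) ** 2 for y in data) / (n - 1)   # sample variance
    s = sqrt(s2)
    # mu | y follows a t distribution with n - 1 degrees of freedom
    mu_mean = ybar
    mu_sd = sqrt((n - 1) / (n - 3) * s2 / n)
    # sigma | y follows an inverse-gamma-type posterior
    sigma_mean = sqrt((n - 1) / 2) * gamma((n - 2) / 2) / gamma((n - 1) / 2) * s
    sigma_sd = sqrt((n - 1) * s2 / (n - 3) - sigma_mean ** 2)
    return mu_mean, mu_sd, sigma_mean, sigma_sd
```

For the ten gas-station observations in the next section this returns approximately (60.16, 4.39, 13.41, 3.64).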

Example

Suppose a researcher takes a sample of 10 observations of people buying gas at a service station:

39.62, 48.21, 52.48, 57.06, 57.24, 60.04, 63.64, 68.05, 73.98, 81.24

Analysis indicates the data is normally distributed.

We have ȳ = 60.16, s = 12.2545, s² = 150.17 and n = 10.

The posterior distribution of μ indicates it follows a t distribution with a mean of 60.16 and variance of (9/7)(150.17/10) = 19.3078, or a standard deviation of 4.39, with 9 degrees of freedom.

Employing Chebyshev’s theorem, at least 8/9 of the distribution lies between 60.16 – 3(4.39) = 46.99 and 60.16 + 3(4.39) = 73.33.

The posterior distribution of σ follows an inverse gamma distribution with a mean of

E(σ | y) = √(9/2) · [Γ(4)/Γ(4.5)] · 12.2545 = 13.41

The variance is

Var(σ | y) = (9)(150.17)/7 − (13.41)² ≈ 13.2663

The standard deviation is then the square root of 13.2663 which is 3.64.

Employing Chebyshev’s theorem, at least 8/9 of the distribution lies between 13.41 – 3(3.64) = 2.49 and 13.41 + 3(3.64) = 24.33.

Then, returning to the distribution of X, we can construct a table indicating the range of X for the various values of μ and σ. Each cell shows μ ± 3σ (again covering at least 8/9 of the distribution by Chebyshev's theorem), with negative lower bounds truncated at zero:

μ \ σ | 2.49 | 13.41 | 24.33
46.99 | (39.52, 54.46) | (6.76, 87.22) | (0, 119.98)
60.16 | (52.69, 67.63) | (19.93, 100.39) | (0, 133.15)
73.33 | (65.86, 80.80) | (33.10, 113.56) | (0.34, 146.32)
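The table entries can be reproduced by computing μ ± 3σ for each combination and truncating negative lower bounds at zero. A small Python sketch (the helper name is my own):

```python
def chebyshev_range(mu, sigma, k=3):
    # Range covering at least 1 - 1/k^2 of the distribution (Chebyshev's theorem),
    # truncated at zero since gas purchases cannot be negative.
    return max(0.0, mu - k * sigma), mu + k * sigma

mus = [46.99, 60.16, 73.33]
sigmas = [2.49, 13.41, 24.33]
table = {(m, s): chebyshev_range(m, s) for m in mus for s in sigmas}
```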

Of these ranges, the one with μ = 60.16 and σ = 13.41 seems the most plausible. As more data is added, the range of μ and σ will tighten up. For example, I generated 1000 normal random numbers with a mean of 60.16 and standard deviation of 13.41. The mean of the data is 60.21 and the sample variance is 182.0687. Based on this data, μ follows a t distribution with a mean of 60.21 and standard deviation of 0.43 and σ follows an inverse gamma distribution with a mean of 13.50 and standard deviation of 0.30.
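A quick simulation sketch of this shrinkage, using only the standard library (the exact figures vary with the seed and the random draws, so only the order of magnitude matters here):

```python
import random
from math import sqrt

random.seed(1)   # any seed works; we only care about the order of magnitude
n = 1000
draws = [random.gauss(60.16, 13.41) for _ in range(n)]

ybar = sum(draws) / n
s2 = sum((y - ybar) ** 2 for y in draws) / (n - 1)   # sample variance

# Posterior standard deviation of mu shrinks roughly like s / sqrt(n)
mu_sd = sqrt((n - 1) / (n - 3) * s2 / n)
```

With n = 1000 the posterior standard deviation of μ drops from about 4.4 (for n = 10) to well under 1.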

Technical details

If the researcher is starting from scratch, the joint prior distribution of μ and σ should convey this. The suggestion made by Jeffreys is to have

p(μ, σ) ∝ 1/σ,  −∞ < μ < ∞,  0 < σ < ∞.

Since our data appears to follow a normal distribution, each value y follows this distribution:

f(y | μ, σ) = (1/(σ√(2π))) exp(−(y − μ)²/(2σ²))

Given the random sample y = (y₁, y₂, …, y_n), the likelihood function is:

L(μ, σ | y) = ∏ f(yᵢ | μ, σ) ∝ σ^(−n) exp(−Σ(yᵢ − μ)²/(2σ²)) = σ^(−n) exp(−[(n − 1)s² + n(ȳ − μ)²]/(2σ²))

where ȳ represents the sample mean of the data and s the sample standard deviation.

The expression (n − 1)s² + n(ȳ − μ)² is derived as follows:

Σ(yᵢ − μ)² = Σ[(yᵢ − ȳ) + (ȳ − μ)]² = Σ(yᵢ − ȳ)² + 2(ȳ − μ)Σ(yᵢ − ȳ) + n(ȳ − μ)² = (n − 1)s² + n(ȳ − μ)²

The first term is derived from the fact that s² = Σ(yᵢ − ȳ)²/(n − 1).

There is no middle term from FOIL since Σ(yᵢ − ȳ) = 0.
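This decomposition can be sanity-checked numerically on the example data (a quick check of my own, not part of the original derivation):

```python
data = [39.62, 48.21, 52.48, 57.06, 57.24, 60.04, 63.64, 68.05, 73.98, 81.24]
n = len(data)
ybar = sum(data) / n
s2 = sum((y - ybar) ** 2 for y in data) / (n - 1)   # sample variance

mu = 55.0   # the identity holds for any value of mu
lhs = sum((y - mu) ** 2 for y in data)              # sum of (y_i - mu)^2
rhs = (n - 1) * s2 + n * (ybar - mu) ** 2           # (n-1)s^2 + n(ybar - mu)^2
```

Both sides agree to floating-point precision for any choice of mu.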

Then, multiplying the prior by the likelihood, the joint posterior distribution is

p(μ, σ | y) ∝ σ^(−(n+1)) exp(−[(n − 1)s² + n(ȳ − μ)²]/(2σ²))

Posterior Distribution of μ

To derive the posterior distribution of μ, we integrate p(μ, σ | y) with respect to σ.

We use the substitution z = [(n − 1)s² + n(ȳ − μ)²]/(2σ²), in which ν = n − 1 represents the degrees of freedom. The result is:

p(μ | y) ∝ [νs² + n(μ − ȳ)²]^(−(ν+1)/2)

Thus, μ follows a t distribution.

If we let

t = (μ − ȳ)/(s/√n)

then

p(t | y) = [Γ((ν + 1)/2)/(√(νπ) Γ(ν/2))] (1 + t²/ν)^(−(ν+1)/2)

where Γ(x) is the gamma function of x.

To find the mean of t, E(t | y), we have:

E(t | y) = ∫ t p(t | y) dt = 0

This follows since t p(t | y) is an odd function of t. From that we derive E(t | y) = 0 and consequently E(μ | y) = ȳ, provided ν > 1. Note that if ν = 1, the integral does not provide a finite solution. (In fact, t would follow a Cauchy distribution.)

Since E(t | y) = 0, the variance of t is Var(t | y) = E(t² | y).

Since

t² = ν[(1 + t²/ν) − 1]

then

E(t² | y) = ν[Γ((ν + 1)/2)Γ((ν − 2)/2)/(Γ(ν/2)Γ((ν − 1)/2)) − 1] = ν[(ν − 1)/(ν − 2) − 1] = ν/(ν − 2)

This follows from the previous derivation of the normalizing constant, together with the gamma function recursion Γ(x + 1) = xΓ(x).

Then,

Var(μ | y) = (s²/n) Var(t | y) = [(n − 1)/(n − 3)](s²/n)

This indicates that as the sample size increases, the variance of μ decreases.
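Both the normalization and the variance of t can be verified numerically for the example's ν = 9 (my own check, using a crude Riemann sum):

```python
from math import gamma, pi, sqrt

nu = 9  # degrees of freedom when n = 10
c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))

def t_density(t):
    # Density of the t distribution with nu degrees of freedom
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

# Crude Riemann sums over a wide interval; the tails beyond it are negligible
h = 0.01
grid = [i * h for i in range(-20000, 20001)]          # [-200, 200]
total = sum(t_density(t) for t in grid) * h           # should be close to 1
second_moment = sum(t * t * t_density(t) for t in grid) * h  # close to nu/(nu - 2)
```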

Posterior Distribution of σ

To derive the posterior distribution of σ, we integrate p(μ, σ | y) with respect to μ.

We use the substitution z = √n(μ − ȳ)/σ, which turns the integral over μ into a constant multiple of σ. The result is:

p(σ | y) ∝ σ^(−n) exp(−(n − 1)s²/(2σ²))

This is in the form of an inverse gamma distribution (in σ²). Thus,

p(σ² | y) = [β^α/Γ(α)] (σ²)^(−α−1) exp(−β/σ²)

In this case, α = (n − 1)/2 and β = (n − 1)s²/2.

To find the mean of σ, E(σ | y), we have:

E(σ | y) = ∫₀^∞ σ p(σ | y) dσ = c ∫₀^∞ σ^(−(n−1)) exp(−(n − 1)s²/(2σ²)) dσ

where c is the normalizing constant of p(σ | y). Let u = 1/σ² (written u rather than y, since y already denotes the data). Then σ² = u⁻¹, leading to σ = u^(−1/2) and dσ = −(1/2)u^(−3/2) du. Substituting, we get:

E(σ | y) = (c/2)(νs²/2)^(−(ν−1)/2) Γ((ν − 1)/2)

The last line follows from the equation for the gamma function:

Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx

Then,

E(σ | y) = √(ν/2) · [Γ((ν − 1)/2)/Γ(ν/2)] · s
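The closed-form mean can be checked against direct numerical integration of the posterior density of σ (my own check; s = 1 keeps it simple, since s enters only as a scale factor):

```python
from math import exp, gamma, sqrt

n, s = 10, 1.0   # unit sample standard deviation for simplicity
nu = n - 1

def kernel(sigma):
    # Unnormalized posterior density of sigma: sigma^(-n) exp(-nu s^2 / (2 sigma^2))
    return sigma ** (-n) * exp(-nu * s * s / (2 * sigma * sigma))

h = 0.001
grid = [0.05 + i * h for i in range(20000)]   # sigma from 0.05 to about 20
norm = sum(kernel(x) for x in grid) * h
mean_numeric = sum(x * kernel(x) for x in grid) * h / norm

mean_formula = sqrt(nu / 2) * gamma((nu - 1) / 2) / gamma(nu / 2) * s
```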

To find the variance of σ, we need E(σ² | y):

E(σ² | y) = ∫₀^∞ σ² p(σ | y) dσ ∝ ∫₀^∞ σ^(−(n−2)) exp(−(n − 1)s²/(2σ²)) dσ

Again, let u = 1/σ². Then σ = u^(−1/2) and dσ = −(1/2)u^(−3/2) du. Substituting, we get:

E(σ² | y) = νs²/(ν − 2) = (n − 1)s²/(n − 3)

Then

Var(σ | y) = E(σ² | y) − [E(σ | y)]² = (n − 1)s²/(n − 3) − [E(σ | y)]²

Thus, we need ν > 2 in order to have a variance and subsequently a standard deviation.
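The second moment can likewise be checked numerically with the gas-station example values (my own check, same crude integration as before):

```python
from math import exp

n, s = 10, 12.2545   # the gas-station example
nu = n - 1

def kernel(sigma):
    # Unnormalized posterior density of sigma
    return sigma ** (-n) * exp(-nu * s * s / (2 * sigma * sigma))

h = 0.005
grid = [0.5 + i * h for i in range(30000)]   # sigma from 0.5 to about 150
norm = sum(kernel(x) for x in grid) * h
second_moment = sum(x * x * kernel(x) for x in grid) * h / norm  # E(sigma^2 | y)
```

The numerical value lands on νs²/(ν − 2) = 9(150.17)/7 ≈ 193.08, matching the example.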

Reference:

Zellner, Arnold. An Introduction to Bayesian Inference in Econometrics. New York: John Wiley & Sons, 1971.