# Posterior Distribution of μ and σ for Normal Distribution Based on Unknown Prior


The basics

A normal distribution has two parameters: the mean μ, which indicates where the bell curve is centered, and the standard deviation σ, which indicates the spread of the bell curve. From a frequentist point of view, μ and σ are fixed quantities. From a Bayesian point of view, they are random variables, each with its own distribution, mean, and standard deviation.

If a researcher has a set of data that appears to follow a normal distribution, we would like to find the distributions of μ and σ.

If the researcher does not have a clear idea as to what distribution μ and σ jointly follow, Jeffreys suggested a diffuse prior distribution to indicate this lack of knowledge. When this prior distribution is combined with the data (through the likelihood), the joint posterior distribution of μ and σ does not follow any readily identifiable distribution.

However, once we solve for μ alone from the joint posterior distribution, we find that it follows a t distribution with mean $\bar{y}$, the sample mean of the data, and variance $\frac{\nu}{\nu-2}\cdot\frac{s^2}{n}$, in which $s$ represents the sample standard deviation of the data and $\nu = n-1$ is the degrees of freedom. Note that the sample size needs to be more than 3 in order for this standard deviation to exist. The other thing to note is that as the sample size increases, the standard deviation of μ gets closer to zero.

Similarly, once we solve for σ alone from the joint posterior distribution, we find that it follows an inverse gamma distribution with mean $\sqrt{\nu s^2/2}\;\Gamma\!\left(\tfrac{\nu-1}{2}\right)\big/\Gamma\!\left(\tfrac{\nu}{2}\right)$, where $\Gamma(x)$ is the gamma function of x. To have a mean, the degrees of freedom ν needs to be more than 1, i.e., the sample size needs to be more than 2. The variance is $\frac{\nu s^2}{\nu-2} - \left[\sqrt{\nu s^2/2}\;\Gamma\!\left(\tfrac{\nu-1}{2}\right)\big/\Gamma\!\left(\tfrac{\nu}{2}\right)\right]^2$. To derive the standard deviation, we take the square root of this quantity. To have a standard deviation, ν needs to be more than 2, i.e., the sample size needs to be more than 3.

Example

Suppose a researcher takes a sample of 10 observations of people buying gas at a service station:

 39.62 48.21 52.48 57.06 57.24 60.04 63.64 68.05 73.98 81.24

Analysis indicates the data is normally distributed.

We have $\bar{y}$ = 60.16, $s$ = 12.2545, $s^2$ = 150.17 and $n$ = 10.

The posterior distribution of μ is a t distribution with a mean of 60.16 and variance of (9/7)(150.17/10) = 19.3078, or a standard deviation of 4.39, with 9 degrees of freedom.

Employing Chebyshev’s theorem, at least 8/9 of the distribution lies between 60.16 – 3(4.39) = 46.99 and 60.16 + 3(4.39) = 73.33.
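The μ-posterior numbers above can be reproduced with a short script (a sketch using only the Python standard library; the variable names are my own, and tiny differences from the article's figures are rounding):

```python
import math

# Sample statistics from the example
n, ybar, s2 = 10, 60.16, 150.17
nu = n - 1  # degrees of freedom

# Posterior of mu: t distribution centered at ybar with
# variance (nu / (nu - 2)) * (s^2 / n)
post_var = (nu / (nu - 2)) * (s2 / n)
post_sd = math.sqrt(post_var)

# Chebyshev: at least 8/9 of the distribution lies within 3 sd of the mean
lo, hi = ybar - 3 * post_sd, ybar + 3 * post_sd

print(round(post_var, 4), round(post_sd, 2))  # ~19.31 and 4.39
print(round(lo, 2), round(hi, 2))             # ~46.98 and 73.34
```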

The posterior distribution of σ follows an inverse gamma distribution with a mean of $\sqrt{9(150.17)/2}\;\Gamma(4)/\Gamma(4.5) = 13.41$. The variance is $9(150.17)/7 - (13.4093)^2 \approx 13.2663$. The standard deviation is then the square root of 13.2663, which is 3.64.

Employing Chebyshev’s theorem, at least 8/9 of the distribution lies between 13.41 – 3(3.64) = 2.49 and 13.41 + 3(3.64) = 24.33.
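Likewise, the σ-posterior mean and standard deviation can be checked numerically (again a stdlib-only sketch; `math.gamma` computes the gamma function Γ(x)):

```python
import math

n, s2 = 10, 150.17
nu = n - 1                            # 9 degrees of freedom
alpha, beta = nu / 2, nu * s2 / 2     # inverse gamma parameters

# E(sigma | y) = sqrt(beta) * Gamma(alpha - 1/2) / Gamma(alpha)
mean_sigma = math.sqrt(beta) * math.gamma(alpha - 0.5) / math.gamma(alpha)

# E(sigma^2 | y) = beta / (alpha - 1), so Var = E(sigma^2) - E(sigma)^2
e_sigma2 = beta / (alpha - 1)
var_sigma = e_sigma2 - mean_sigma ** 2
sd_sigma = math.sqrt(var_sigma)

print(round(mean_sigma, 2), round(sd_sigma, 2))  # ~13.41 and 3.64
```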

Then, returning to the distribution of X, we can construct a table indicating the range μ ± 3σ depending on the values of μ and σ (with negative lower bounds truncated at 0):

| μ \ σ | 2.49 | 13.41 | 24.33 |
| --- | --- | --- | --- |
| 46.99 | (39.52, 54.46) | (6.76, 87.22) | (0, 119.98) |
| 60.16 | (52.69, 67.63) | (19.93, 100.39) | (0, 133.15) |
| 73.33 | (65.86, 80.80) | (33.10, 113.56) | (0.34, 146.32) |
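A table of this kind can be generated programmatically (a sketch; like the table, it truncates negative lower bounds at 0):

```python
# Candidate values of mu and sigma from the Chebyshev bounds above
mus = [46.99, 60.16, 73.33]
sigmas = [2.49, 13.41, 24.33]

for mu in mus:
    row = []
    for sigma in sigmas:
        lo = max(0.0, round(mu - 3 * sigma, 2))  # truncate negatives at 0
        hi = round(mu + 3 * sigma, 2)
        row.append((lo, hi))
    print(mu, row)
```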

Of these ranges, the one with μ = 60.16 and σ = 13.41 seems the most plausible. As more data is added, the range of μ and σ will tighten up. For example, I generated 1000 normal random numbers with a mean of 60.16 and standard deviation of 13.41. The mean of the data is 60.21 and the sample variance is 182.0687. Based on this data, μ follows a t distribution with a mean of 60.21 and standard deviation of 0.43 and σ follows an inverse gamma distribution with a mean of 13.50 and standard deviation of 0.30.
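The tightening effect can be illustrated with a quick simulation (a sketch; with a different seed the exact numbers will differ from the article's 60.21 and 182.0687, but the shrinking posterior spread is the point):

```python
import math
import random
import statistics

random.seed(0)  # fixed seed for reproducibility
data = [random.gauss(60.16, 13.41) for _ in range(1000)]

n = len(data)
nu = n - 1
ybar = statistics.mean(data)
s2 = statistics.variance(data)  # sample variance (divides by n - 1)

# Posterior sd of mu shrinks roughly like s / sqrt(n)
post_sd_mu = math.sqrt((nu / (nu - 2)) * (s2 / n))
print(round(ybar, 2), round(post_sd_mu, 2))
```

With n = 1000 the posterior standard deviation of μ drops from about 4.4 to well under one unit.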

Technical details

If the researcher is starting from scratch, the joint prior distribution of μ and σ should convey this. The suggestion made by Jeffreys is to take $p(\mu, \sigma) \propto 1/\sigma$.

Since our data appears to follow a normal distribution, each value y follows this distribution:

$$p(y \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$$

Given the random sample $y_1, \ldots, y_n$, the likelihood function is:

$$L(\mu, \sigma \mid y) \propto \sigma^{-n} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu)^2\right) = \sigma^{-n} \exp\!\left(-\frac{1}{2\sigma^2}\left[\nu s^2 + n(\bar{y} - \mu)^2\right]\right)$$

where $\bar{y}$ represents the sample mean of the data, $\nu = n - 1$, and $s^2$ is the sample variance.

The expression $\nu s^2 + n(\bar{y} - \mu)^2$ is derived as follows:

$$\sum_{i=1}^n (y_i - \mu)^2 = \sum_{i=1}^n \left[(y_i - \bar{y}) + (\bar{y} - \mu)\right]^2 = \sum_{i=1}^n (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2 = \nu s^2 + n(\bar{y} - \mu)^2$$

The first term is derived from the fact that $s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2$.

There is no middle (cross) term from FOIL since $\sum_{i=1}^n (y_i - \bar{y}) = 0$.
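The sum-of-squares identity can be sanity-checked numerically on the gas-station data (a stdlib sketch; μ = 55 is an arbitrary test value of my choosing):

```python
import statistics

y = [39.62, 48.21, 52.48, 57.06, 57.24, 60.04, 63.64, 68.05, 73.98, 81.24]
n = len(y)
ybar = statistics.mean(y)
s2 = statistics.variance(y)  # sample variance (divides by n - 1)
nu = n - 1

mu = 55.0  # arbitrary value of mu
lhs = sum((yi - mu) ** 2 for yi in y)        # sum of (y_i - mu)^2
rhs = nu * s2 + n * (ybar - mu) ** 2         # nu*s^2 + n*(ybar - mu)^2
print(abs(lhs - rhs))  # ~0 up to floating point error
```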

Then

$$p(\mu, \sigma \mid y) \propto p(\mu, \sigma)\, L(\mu, \sigma \mid y) \propto \sigma^{-(n+1)} \exp\!\left(-\frac{1}{2\sigma^2}\left[\nu s^2 + n(\bar{y} - \mu)^2\right]\right)$$

Posterior Distribution of μ

To derive the posterior distribution of μ, we integrate with respect to σ.

We use the substitution $t = \frac{\mu - \bar{y}}{s/\sqrt{n}}$, in which $\nu = n - 1$ represents the degrees of freedom. The result is:

$$p(\mu \mid y) \propto \left[\nu s^2 + n(\bar{y} - \mu)^2\right]^{-n/2} \propto \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

Thus, $t$ follows a t distribution with ν degrees of freedom.
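As a check on the integration over σ, we can marginalize σ out numerically and compare the resulting kernel ratio with the analytic form $[\nu s^2 + n(\bar{y} - \mu)^2]^{-n/2}$ (a rough Riemann-sum sketch of my own, not part of the original derivation):

```python
import math

n, ybar, s2 = 10, 60.16, 150.17
nu = n - 1

def A(mu):
    # The bracketed quantity nu*s^2 + n*(ybar - mu)^2
    return nu * s2 + n * (ybar - mu) ** 2

def marginal_kernel(mu, step=0.005, upper=300.0):
    # Riemann sum of sigma^-(n+1) * exp(-A / (2 sigma^2)) over sigma
    total, sigma = 0.0, step
    while sigma < upper:
        total += sigma ** -(n + 1) * math.exp(-A(mu) / (2 * sigma ** 2)) * step
        sigma += step
    return total

mu1, mu2 = 60.16, 65.0
numeric = marginal_kernel(mu1) / marginal_kernel(mu2)
analytic = (A(mu1) / A(mu2)) ** (-n / 2)
print(numeric, analytic)  # the two ratios agree closely
```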

If we normalize this density, then

$$p(t \mid y) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

where $\Gamma(x)$ is the gamma function of x.

To find the mean of t, E(t | y), we have:

$$E(t \mid y) = \int_{-\infty}^{\infty} t\, p(t \mid y)\, dt = 0$$

This follows since $t\, p(t \mid y)$ is an odd function of t. From that we derive $E(\mu \mid y) = \bar{y}$, provided ν > 1. Note that if ν = 1, the integral does not provide a finite solution. (In fact, t would follow a Cauchy distribution.)

Since E(t) = 0, the variance of t is Var(t) = E(t²). Since

$$E(t^2) = \int_{-\infty}^{\infty} t^2\, p(t \mid y)\, dt$$

then

$$\mathrm{Var}(t) = \frac{\nu}{\nu - 2}$$

This follows from the previous derivation involving the gamma function.

Then,

$$\mathrm{Var}(\mu \mid y) = \frac{s^2}{n} \cdot \frac{\nu}{\nu - 2}$$

This indicates that as the sample size increases, the variance of μ decreases.
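The shrinkage of Var(μ | y) with n can be tabulated directly (a sketch that holds s² fixed at the example's 150.17 purely for illustration):

```python
s2 = 150.17  # hold the sample variance fixed for illustration

# Var(mu | y) = (nu / (nu - 2)) * (s^2 / n) for increasing n
variances = []
for n in [10, 100, 1000]:
    nu = n - 1
    variances.append((nu / (nu - 2)) * (s2 / n))

print([round(v, 4) for v in variances])  # strictly decreasing in n
```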

Posterior Distribution of σ

To derive the posterior distribution of σ, we integrate with respect to μ.

We use the substitution $z = \frac{\sqrt{n}(\mu - \bar{y})}{\sigma}$. The result is:

$$p(\sigma \mid y) \propto \sigma^{-n} \exp\!\left(-\frac{\nu s^2}{2\sigma^2}\right)$$

This is in the form of an inverse gamma distribution. Thus,

$$p(\sigma \mid y) \propto \sigma^{-(2\alpha + 1)} \exp\!\left(-\frac{\beta}{\sigma^2}\right)$$

In this case, $\alpha = \nu/2$ and $\beta = \nu s^2/2$.

To find the mean of σ, E(σ | y), we have:

$$E(\sigma \mid y) = \int_0^\infty \sigma\, p(\sigma \mid y)\, d\sigma \propto \int_0^\infty \sigma^{-(n-1)} \exp\!\left(-\frac{\nu s^2}{2\sigma^2}\right) d\sigma$$

Let $u = 1/\sigma^2$. Then $\sigma^2 = u^{-1}$, leading to $\sigma = u^{-1/2}$ and $d\sigma = -\tfrac{1}{2} u^{-3/2}\, du$. Substituting, we get:

$$E(\sigma \mid y) = \sqrt{\beta}\; \frac{\Gamma\!\left(\alpha - \tfrac{1}{2}\right)}{\Gamma(\alpha)} = \sqrt{\frac{\nu s^2}{2}}\; \frac{\Gamma\!\left(\frac{\nu - 1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}$$

The last line follows from the equation for the gamma function:

$$\Gamma(\alpha) = \int_0^\infty u^{\alpha - 1} e^{-u}\, du$$

To find the variance of σ, we need E(σ² | y). Again, let $u = 1/\sigma^2$, so $\sigma = u^{-1/2}$ and $d\sigma = -\tfrac{1}{2} u^{-3/2}\, du$. Substituting, we get:

$$E(\sigma^2 \mid y) = \frac{\beta}{\alpha - 1} = \frac{\nu s^2}{\nu - 2}$$

Then

$$\mathrm{Var}(\sigma \mid y) = \frac{\nu s^2}{\nu - 2} - \left[\sqrt{\frac{\nu s^2}{2}}\; \frac{\Gamma\!\left(\frac{\nu - 1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\right]^2$$

Thus, we need ν > 2 in order to have a variance and subsequently a standard deviation.
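Both moment formulas can be verified by integrating the σ-posterior kernel numerically (a stdlib sketch using the example's ν = 9 and s² = 150.17; we expect E(σ | y) ≈ 13.41 and E(σ² | y) ≈ 193.08):

```python
import math

n, s2 = 10, 150.17
nu = n - 1
alpha, beta = nu / 2, nu * s2 / 2

def kernel(sigma):
    # sigma^-n * exp(-nu * s^2 / (2 sigma^2)), the posterior kernel of sigma
    return sigma ** -n * math.exp(-nu * s2 / (2 * sigma ** 2))

# Riemann sums for the normalizing constant and first two moments
step, upper = 0.01, 400.0
z = m1 = m2 = 0.0
sigma = step
while sigma < upper:
    k = kernel(sigma) * step
    z += k
    m1 += sigma * k
    m2 += sigma ** 2 * k
    sigma += step

e_sigma, e_sigma2 = m1 / z, m2 / z
analytic_mean = math.sqrt(beta) * math.gamma(alpha - 0.5) / math.gamma(alpha)
analytic_e2 = beta / (alpha - 1)
print(round(e_sigma, 2), round(e_sigma2, 2))
```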

Reference:

Zellner, Arnold. An Introduction to Bayesian Inference in Econometrics. New York: John Wiley & Sons, 1971.