Simple Principle of Maximum Entropy, without Lagrange Multipliers.
6.050J / 2.110J, April 28, 2004.
- The set-up phase:
- Identify the states (assume a finite number of them), Ai
- Identify the energy of each state, Ei
- Identify the expected value of energy E which must be between the lowest
and highest Ei (or possibly equal to one of them)
- Phase of assigning probability distribution p(Ai):
- We want the p(Ai) with the highest uncertainty
S = Σi p(Ai) log2 (1/p(Ai) )
consistent with the right value for energy
E = Σi p(Ai)Ei
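A minimal code sketch may help make the set-up and the two quantities above concrete. It assumes a hypothetical three-state system with made-up energies Ei and a made-up target E; the names (energies, E_target, entropy_bits, expected_energy) are illustrative, not part of the course material.
```python
# Sketch of the set-up phase, assuming a hypothetical three-state system.
import numpy as np

energies = np.array([0.0, 1.0, 2.0])   # E_i for states A_1, A_2, A_3 (assumed values)
E_target = 0.6                          # desired expected value of energy E (assumed)

# E must be between the lowest and highest E_i (or possibly equal to one of them)
assert energies.min() <= E_target <= energies.max()

def entropy_bits(p):
    """Uncertainty S = sum_i p(A_i) log2(1/p(A_i)), with 0 log(1/0) taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(1.0 / p[nz])))

def expected_energy(p):
    """Expected value of energy, sum_i p(A_i) E_i."""
    return float(np.dot(p, energies))
```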
- Consider the candidate probability distributions of the form
pc(Ai) = 2^(-α) 2^(-β Ei)
where α is chosen so as to make the sum of all pc(Ai) = 1, i.e.,
α = log2 [Σi 2^(-β Ei)]
- These candidate distributions are functions of β, a real number
- These candidate distributions have an uncertainty
Sc = Σi pc(Ai) log2 (1/pc(Ai) )
- These candidate distributions have an expected value of energy
Ec = Σi pc(Ai)Ei
which may or may not be the right energy E
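Continuing the sketch above, the candidate family can be evaluated for any trial β: normalize the weights 2^(-β Ei) to get pc(Ai), then compute Sc and Ec. The function name candidate_distribution and the trial value β = 0.5 are illustrative choices only.
```python
# Sketch of the candidate distributions (continues the set-up sketch above).
def candidate_distribution(beta):
    w = 2.0 ** (-beta * energies)       # un-normalized weights 2^(-β E_i)
    alpha = np.log2(np.sum(w))          # α = log2( Σ_i 2^(-β E_i) )
    p_c = 2.0 ** (-alpha) * w           # normalized candidate distribution
    return p_c, alpha

beta = 0.5                              # an arbitrary trial value of β
p_c, alpha = candidate_distribution(beta)
S_c = entropy_bits(p_c)                 # uncertainty of the candidate
E_c = expected_energy(p_c)              # may or may not equal the desired E
print(p_c.sum(), S_c, E_c)              # probabilities sum to 1
```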
- Assertion: If p(Ai) is any other probability distribution with the same
expected value of energy Ec, then its uncertainty
S = Σi p(Ai) log2 (1/p(Ai) ) is no greater than Sc
- Therefore, if β is chosen to make the energy Ec equal to the desired
expected value E, the resulting
candidate distribution is the one which has the maximum entropy
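Before the proof, the assertion can also be checked numerically, again continuing the sketch above: perturb pc along the one direction that (for this hypothetical three-state example) leaves both the total probability and the expected energy unchanged, and compare uncertainties. The np.cross construction is specific to three states and is only meant as an illustration.
```python
# Numerical check of the assertion for the three-state example above.
ones = np.ones_like(energies)
v = np.cross(ones, energies)                 # satisfies Σ v_i = 0 and Σ v_i E_i = 0
p_other = p_c + 0.05 * v / np.abs(v).max()   # another distribution with the same E_c
assert np.all(p_other >= 0) and np.isclose(p_other.sum(), 1.0)
assert np.isclose(expected_energy(p_other), E_c)
print(entropy_bits(p_other) <= S_c)          # True: its uncertainty is no greater than S_c
```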
Proof of the assertion:
- The Gibbs inequality (6.4) is:
Σi p(Ai) log2 (1/p(Ai) ) ≤
Σi p(Ai) log2 (1/pc(Ai) )
- Just substitute the form for pc(Ai) into the Gibbs
inequality, and use the fact that
Σi p(Ai)Ei = Σi pc(Ai)Ei = Ec
(the substitution is spelled out below)
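Written out, the substitution goes as follows (a sketch in the base-2 form used above, where log2(1/pc(Ai)) = α + β Ei by the definition of pc):
```latex
\begin{align*}
S &= \sum_i p(A_i)\,\log_2\!\frac{1}{p(A_i)}
   \;\le\; \sum_i p(A_i)\,\log_2\!\frac{1}{p_c(A_i)}      && \text{(Gibbs inequality)}\\
  &= \sum_i p(A_i)\,(\alpha + \beta E_i)
   \;=\; \alpha + \beta E_c                               && \text{(same expected energy)}\\
  &= \sum_i p_c(A_i)\,(\alpha + \beta E_i)
   \;=\; \sum_i p_c(A_i)\,\log_2\!\frac{1}{p_c(A_i)}
   \;=\; S_c.
\end{align*}
```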
At this point all the general properties of Chapter 11 can be used for the
candidate distribution. In particular, a value of β can be found for
which the energy matches what is needed, as a sketch of Ec(β) vs.
β reveals.
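That last point can be sketched numerically as well (continuing the Python sketch above): Ec(β) decreases monotonically from the highest Ei toward the lowest Ei as β increases, so a simple bisection on β finds the value whose Ec matches the desired E. The function find_beta and the bracketing interval are illustrative assumptions.
```python
# Sketch: choose β so that E_c(β) equals the desired expected energy E_target.
def find_beta(E_target, lo=-50.0, hi=50.0, iters=100):
    # E_c(β) is monotonically decreasing in β, so bisection on a wide bracket works.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p_mid, _ = candidate_distribution(mid)
        if expected_energy(p_mid) > E_target:
            lo = mid                    # energy still too high: increase β
        else:
            hi = mid                    # energy too low: decrease β
    return 0.5 * (lo + hi)

beta_star = find_beta(E_target)
p_star, _ = candidate_distribution(beta_star)
print(expected_energy(p_star))          # ≈ E_target; p_star is the maximum-entropy distribution
```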