Bayesian Theory
- The Likelihood Ratio Test
- The Probability of Error
- The Bayes Risk
- Bayes, MAP and ML Criteria
- Multi-class problems
- Discriminant Functions
Likelihood Ratio Test (LRT)
- Assume we are to classify an object based on the evidence provided by a measurement (or feature vector) x
- Would you agree that a reasonable decision rule would be the following?
  - "Choose the class that is most 'probable' given the observed feature vector x"
  - More formally: evaluate the posterior probability of each class P(ωi|x) and choose the class with the largest P(ωi|x)
- Let us examine the implications of this decision rule for a 2-class problem
  - In this case the decision rule becomes: if P(ω1|x) > P(ω2|x) choose ω1, else choose ω2
  - Or, in a more compact form
    $$P(\omega_1 \mid x) \; \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \; P(\omega_2 \mid x)$$
  - Applying Bayes Rule
    $$\frac{P(x \mid \omega_1)\,P(\omega_1)}{P(x)} \; \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \; \frac{P(x \mid \omega_2)\,P(\omega_2)}{P(x)}$$
  - P(x) does not affect the decision rule, so it can be eliminated*. Rearranging the previous expression
    $$\Lambda(x) = \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} \; \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \; \frac{P(\omega_2)}{P(\omega_1)}$$
  - The term Λ(x) is called the likelihood ratio, and the decision rule is known as the likelihood ratio test

* P(x) can be disregarded in the decision rule since it is constant regardless of the class ωi. However, P(x) will be needed if we want to estimate the posterior P(ωi|x) which, unlike P(x|ωi)P(ωi), is a true probability value and, therefore, gives us an estimate of the "goodness" of our decision.
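To make the rule concrete, here is a minimal Python sketch of the two-class likelihood ratio test described above (not from the original slides; the function name lrt_decide and the use of likelihood callables are illustrative assumptions):

    # Minimal two-class likelihood ratio test (LRT) sketch.
    # lik1 and lik2 are the class-conditional densities P(x|w1) and P(x|w2);
    # prior1 and prior2 are the class priors P(w1) and P(w2).
    def lrt_decide(x, lik1, lik2, prior1, prior2):
        """Return 1 if the LRT chooses class w1, otherwise 2."""
        ratio = lik1(x) / lik2(x)        # likelihood ratio Lambda(x)
        threshold = prior2 / prior1      # decision threshold P(w2)/P(w1)
        return 1 if ratio > threshold else 2

Equivalently, multiplying both sides by the priors compares P(x|ω1)P(ω1) against P(x|ω2)P(ω2), i.e., the unnormalized posterior comparison of the original rule.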
Likelihood Ratio Test: an example
- Given a classification problem with the following class-conditional densities, derive a decision rule based on the Likelihood Ratio Test (assume equal priors)
  $$P(x \mid \omega_1) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-4)^2}{2}} \qquad P(x \mid \omega_2) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-10)^2}{2}}$$
- Solution
  - Substituting the given likelihoods and priors into the LRT expression:
    $$\Lambda(x) = \frac{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-4)^2}{2}}}{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-10)^2}{2}}} \; \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \; 1$$
  - Simplifying the LRT expression:
    $$\Lambda(x) = \frac{e^{-\frac{(x-4)^2}{2}}}{e^{-\frac{(x-10)^2}{2}}} \; \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \; 1$$
  - Changing signs and taking logs:
    $$(x-4)^2 - (x-10)^2 \; \underset{\omega_1}{\overset{\omega_2}{\gtrless}} \; 0$$
  - Which yields:
    $$x \; \underset{\omega_1}{\overset{\omega_2}{\gtrless}} \; 7$$
  [Figure: the two class-conditional densities P(x|ω1) and P(x|ω2), unit-variance Gaussians centered at x = 4 and x = 10; region R1 (say ω1) lies to the left of the boundary x = 7 and region R2 (say ω2) to the right.]
  - This LRT result makes sense from an intuitive point of view since the likelihoods are identical and differ only in their mean value
- How would the LRT decision rule change if, say, the priors were such that P(ω1) = 2P(ω2)?
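As a quick numerical check of this example (a sketch, not part of the original slides; the helper names loglik and choose_class are assumptions), the snippet below applies the log form of the LRT with the unit-variance Gaussians above, for both equal priors and the case P(ω1) = 2P(ω2) raised in the question:

    import math

    def loglik(x, mean):
        # log of a unit-variance Gaussian density; the 1/sqrt(2*pi) factor
        # cancels in the likelihood ratio, so it is omitted here
        return -0.5 * (x - mean) ** 2

    def choose_class(x, p1=0.5, p2=0.5):
        """LRT decision: compare log Lambda(x) against log(P(w2)/P(w1))."""
        log_ratio = loglik(x, 4.0) - loglik(x, 10.0)
        return 1 if log_ratio > math.log(p2 / p1) else 2

    # Equal priors: the boundary sits at x = 7
    print([choose_class(x) for x in (6.9, 7.1)])                   # [1, 2]
    # With P(w1) = 2*P(w2) the boundary moves slightly toward w2's mean
    print([choose_class(x, p1=2/3, p2=1/3) for x in (7.1, 7.2)])   # [1, 2]

With the unequal priors, the threshold on Λ(x) becomes P(ω2)/P(ω1) = 1/2, so the region where ω1 is chosen grows slightly: the boundary moves from 7 to 7 + ln(2)/6 ≈ 7.12.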
The probability of error (1)
- The performance of any decision rule can be measured by its probability of error P[error] which, making use of the Theorem of total probability (Lecture 2), can be broken up into
  $$P[\text{error}] = \sum_{i=1}^{C} P[\text{error} \mid \omega_i]\, P[\omega_i]$$
- The class-conditional probability of error P[error|ωi] can be expressed as
  $$P[\text{error} \mid \omega_i] = P[\text{choose } \omega_j \mid \omega_i] = \int_{R_j} P(x \mid \omega_i)\, dx$$
- So, for our 2-class problem, the probability of error becomes
  $$P[\text{error}] = P[\omega_1] \underbrace{\int_{R_2} P(x \mid \omega_1)\, dx}_{\varepsilon_1} + P[\omega_2] \underbrace{\int_{R_1} P(x \mid \omega_2)\, dx}_{\varepsilon_2}$$
  - where εi is the integral of the likelihood P(x|ωi) over the region Rj (j ≠ i) where we choose ωj. Since we assumed equal priors, P[error] = (ε1 + ε2)/2
- For the decision rule of the previous example, the integrals ε1 and ε2 are depicted below
  [Figure: the densities P(x|ω1) and P(x|ω2) centered at x = 4 and x = 10, with regions R1 (say ω1) and R2 (say ω2); ε1 is the tail of P(x|ω1) falling in R2 and ε2 is the tail of P(x|ω2) falling in R1.]
- Compute the probability of error for the example above
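As a sketch of the requested computation (assuming the equal-prior Gaussian example above with its decision boundary at x = 7; the helper std_normal_cdf is an assumption, not from the slides), ε1 and ε2 are Gaussian tail areas that can be evaluated with the standard normal CDF:

    import math

    def std_normal_cdf(z):
        # Phi(z) computed from the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    boundary = 7.0
    eps1 = 1.0 - std_normal_cdf(boundary - 4.0)   # P(x > 7 | w1): mass of N(4,1) in R2
    eps2 = std_normal_cdf(boundary - 10.0)        # P(x < 7 | w2): mass of N(10,1) in R1
    p_error = 0.5 * (eps1 + eps2)                 # equal priors
    print(eps1, eps2, p_error)                    # each tail is about 0.00135

Both tails lie three standard deviations from their means, so ε1 = ε2 ≈ 0.00135 and P[error] ≈ 0.00135.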
The probability of error (2)
- Now that we can measure the performance of a decision rule, we can ask the following question: how good is the Likelihood Ratio Test decision rule?
  - For...