< previous lecture main page Lecture 9 >
March 8, 2007
8.1  Outline of Topics  
Probability  
Bayes Rule  
Example 

Bayesian Networks  
Bayesian Approach: Pros and Cons  
8.2  Probability  
Probability  In case of uncertainty, probability is the "guess"
that some proposition is true or false. This guess may sometimes be true
and sometimes be false. If the guess is correct half of the time it is random and the probability is 0.5. ("Fiftyfifty" in daily speak.) 

Notation: P(e)  Given event e that we don't know whether happened, we use notation P(e) to refer to "the probability of event e happening" or "the probability of e or having happened". Without any further information, this is called the a priori or prior probability of e.  
∀e ∈ E: 0 <= P(e) <= 1  Probability is represented by a real number between 0 and 1.  
Prior probability: P(e)  For a set of atomic events/variables in a population of events E, a simple recording of their occurrence is their prior probability.  
Posterior probability: P(eD)  After we get the "evidence" (observed data) D, what is the probability of e?  
8.3  Bayes rule  
Product rule  P(A ∧ B) = P(A)P(BA) P(A ∧ B) = P(BA)P(B) 

If we want to know  P(AB)  
We need to compute  P(BA), P(A), P(B)  
Bayes rule  P(AB) = P(BA)P(A) / P(B)  
Intuitively  P(BA): The higher the probability of A,
the higher the probability of A given B P(A): The higher the probability of B given A, the higher the probability of A given B. P(B): The higher the probability of B in the absence of A, the less of a predictor of A is the presence of B. Hence it is the denominator. 

We need three terms  to compute the fourth  
Very often these three conditional terms can be computed before hand  For example, in medical diagnosis, computer vision, speech recognition  
8.4 
Example  
Patient  has a particular form of hayfever or not  
H1  Patient has hayfever  
H2  Patient does not have hayfever  
Laboratory test  Provides two outcomes, + and   
Lab test is not perfect  Outputs correct positive result in 98% of cases. Outputs correct negative result in 97% of cases. In other cases it returns the inverse result. 

The percent of the population with this kind of hayfever  0.8%  
The probabilities for this 


The two hypotheses  Either the patient has hayfever or not  
Case: We run the test on a patient  The result says: +  
The less probable hypothesis, H1  P(+hayfever)P(hayfever) = 0.98*0.008 = 0.0078  
The more probable hypothesis, H2  P(+¬hayfever)P(¬hayfever) = 0.03*0.992 = 0.0298  
Alternative name for the most probable hypothesis  h_{MAP} (maximum a posteriori)  
8.6  Bayesian Networks  
Bayesian Belief Network  Directed acyclic graph, DAG  a directed graph with no cycles, e.g. A>B>C  
Each state in the network has an associated conditional probability table  The table lists the conditional distribution for the variable given its immediate parents  
Example conditional probability table for E→B→Q 


More complex conditional probability table where B has two antecedents, E and W 


B = Broadway show; E = Electricity; W = Weekend  
The blue cell asserts  P(Broadway show = True  Electricty = True, Weekend = True) = 0.8  
8.6  Bayesian Approach: Pros and Cons  
Pros  Can calculate explicit probabilities for hypotheses. Can be used for statisticallybased learning. Can in some cases outperform other learning methods. Prior knowledge can be (incrementally) combined with newer knowledge to better approximate perfect knowledge. Can make probabilistic predictions. 

Cons  Require initial knowledge of many probabilities. Can become computationally intractable. 
