# INTRODUCTION TO A.I.

## Bayes Rule and Bayesian Networks

#### Lecture 8

March 8, 2007

 8.1 Outline of Topics Probability Bayes Rule Example Bayesian Networks Bayesian Approach: Pros and Cons

 8.2 Probability Probability In case of uncertainty, probability is the "guess" that some proposition is true or false. This guess may sometimes be true and sometimes be false. If the guess is correct half of the time it is random and the probability is 0.5. ("Fifty-fifty" in daily speak.) Notation: P(e) Given event e that we don't know whether happened, we use notation P(e) to refer to "the probability of event e happening" or "the probability of e or having happened". Without any further information, this is called the a priori or prior probability of e. ∀e ∈ E: 0 <= P(e) <= 1 Probability is represented by a real number between 0 and 1. Prior probability: P(e) For a set of atomic events/variables in a population of events E, a simple recording of their occurrence is their prior probability. Posterior probability: P(e|D) After we get the "evidence" (observed data) D, what is the probability of e?

 8.3 Bayes rule Product rule P(A ∧ B) = P(A)P(B|A) P(A ∧ B) = P(B|A)P(B) If we want to know P(A|B) We need to compute P(B|A), P(A), P(B) Bayes rule P(A|B) = P(B|A)P(A) / P(B) Intuitively P(B|A): The higher the probability of A, the higher the probability of A given B P(A): The higher the probability of B given A, the higher the probability of A given B. P(B): The higher the probability of B in the absence of A, the less of a predictor of A is the presence of B. Hence it is the denominator. We need three terms to compute the fourth Very often these three conditional terms can be computed before hand For example, in medical diagnosis, computer vision, speech recognition

8.4
Example
Patient has a particular form of hayfever or not
H1 Patient has hayfever
H2 Patient does not have hayfever
Laboratory test Provides two outcomes, + and -
Lab test is not perfect Outputs correct positive result in 98% of cases.
Outputs correct negative result in 97% of cases.
In other cases it returns the inverse result.
The percent of the population with this kind of hayfever 0.8%
The probabilities for this
 P(hayfever) = 0.008 P(¬hayfever) = 0.992 P(+|hayfever) = 0.98 P(-|hayfever) = 0.02 P(+|¬hayfever) = 0.03 P(-|¬hayfever = 0.97
The two hypotheses Either the patient has hayfever or not
Case: We run the test on a patient The result says: +
The less probable hypothesis, H1 P(+|hayfever)P(hayfever) = 0.98*0.008 = 0.0078
The more probable hypothesis, H2 P(+|¬hayfever)P(¬hayfever) = 0.03*0.992 = 0.0298
Alternative name for the most probable hypothesis hMAP (maximum a posteriori)

8.6 Bayesian Networks
Bayesian Belief Network Directed acyclic graph, DAG - a directed graph with no cycles, e.g. A->B->C
Each state in the network has an associated conditional probability table The table lists the conditional distribution for the variable given its immediate parents
Example conditional probability table for E→B→Q
 E ¬E B 0.8 0.1 ¬B 0.2 0.9
More complex conditional probability table where B has two antecedents, E and W
 E,W E,¬W ¬E,W ¬E,¬W B 0.8 0.2 0.2 0.1 ¬B 0.2 0.8 0.8 0.9
B = Broadway show; E = Electricity; W = Weekend
The blue cell asserts P(Broadway show = True | Electricty = True, Weekend = True) = 0.8

 8.6 Bayesian Approach: Pros and Cons Pros Can calculate explicit probabilities for hypotheses. Can be used for statistically-based learning. Can in some cases outperform other learning methods. Prior knowledge can be (incrementally) combined with newer knowledge to better approximate perfect knowledge. Can make probabilistic predictions. Cons Require initial knowledge of many probabilities. Can become computationally intractable. 