Burglary Alarm

Global Semantics Formula

To understand how that formula is derived, we need to break down how probability builds from a single event into a massive sequence of events. Let’s tackle this step-by-step.

1. What is a "Complete Path"?

In the context of Bayesian Networks, a "Complete Path" (often called a "full joint assignment" or a "possible world") is a specific scenario where every single variable in the network has an assigned, known value (True or False).

There are no "hidden" or "unknown" variables in a complete path.

2. Deriving the Long Formula: The Chain Rule of Probability

The long formula you provided does not actually come from the Bayesian Network yet. It is a fundamental law of statistics called the Chain Rule of Probability. It is how you calculate the probability of any sequence of events occurring together.

It starts with the basic definition of conditional probability for two events:

P(X,Y) = P(X|Y) × P(Y)

(The probability of X and Y happening = the probability of X happening given that Y happened, multiplied by the probability that Y happened in the first place).

If we have three events, we just expand the chain:

P(X,Y,Z) = P(X|Y,Z) × P(Y,Z)

Which further expands to:

P(X,Y,Z) = P(X|Y,Z) × P(Y|Z) × P(Z)

To get the formula in the slides, we apply this same expanding chain to all 5 variables, peeling them off one by one from left to right:

  1. Start with the full sequence: P(j,m,a,¬b,¬e)

  2. Peel off 'j': = P(j|m,a,¬b,¬e) × P(m,a,¬b,¬e)

  3. Peel off 'm': = P(j|m,a,¬b,¬e) × P(m|a,¬b,¬e) × P(a,¬b,¬e)

  4. Peel off 'a': = P(j|m,a,¬b,¬e) × P(m|a,¬b,¬e) × P(a|¬b,¬e) × P(¬b,¬e)

  5. Peel off '¬b': = P(j|m,a,¬b,¬e) × P(m|a,¬b,¬e) × P(a|¬b,¬e) × P(¬b|¬e) × P(¬e)


That mathematical expansion is universally true for any probability distribution.
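As a sanity check, the chain rule can be verified numerically on any joint distribution. The sketch below builds an arbitrary 3-variable distribution (the weights are invented purely for illustration) and confirms the identity:

```python
import itertools
import random

# Build an arbitrary joint distribution over three booleans.
random.seed(0)
worlds = list(itertools.product([True, False], repeat=3))
weights = [random.random() for _ in worlds]
total_w = sum(weights)
joint = {w: wt / total_w for w, wt in zip(worlds, weights)}

def marginal(pred):
    """Total probability of all worlds satisfying the predicate."""
    return sum(p for w, p in joint.items() if pred(w))

x, y, z = True, True, False
p_xyz = joint[(x, y, z)]
p_yz = marginal(lambda w: w[1] == y and w[2] == z)
p_z = marginal(lambda w: w[2] == z)

# Chain rule: P(X,Y,Z) = P(X|Y,Z) * P(Y|Z) * P(Z)
chain = (p_xyz / p_yz) * (p_yz / p_z) * p_z
assert abs(chain - p_xyz) < 1e-12
```

The conditional terms are just ratios of joint probabilities, which is why the product telescopes back to the full joint.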

3. The Global Semantics Formula (The Bayesian Magic)

While the long formula above is mathematically correct, it is practically useless for computation. To calculate P(j,m,a,¬b,¬e), you would need a massive data table tracking how often John calls under every possible combination of Mary, Alarm, Burglary, and Earthquake values.

This is where the Global Semantics Formula comes in.

The Global Semantics Formula defines the joint probability of a Bayesian Network by stating that a node only cares about its direct parents. The network's structure allows us to cross out the irrelevant variables in that long, ugly Chain Rule equation.

Mathematically, the Global Semantics states:

P(X1,...,Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))

Let's apply the Global Semantics rule to each term of the long Chain Rule formula, keeping only each variable's direct parents:

  1. P(j|m,a,¬b,¬e) simplifies to P(j|a), because John's only parent is Alarm.

  2. P(m|a,¬b,¬e) simplifies to P(m|a), because Mary's only parent is Alarm.

  3. P(a|¬b,¬e) stays as it is, because Alarm's parents are exactly Burglary and Earthquake.

  4. P(¬b|¬e) simplifies to P(¬b), because Burglary has no parents.

  5. P(¬e) stays as it is, because Earthquake has no parents.

The Final Result:

Thanks to the Global Semantics of the Bayesian Network, that massive, computationally heavy Chain Rule formula is transformed into a highly efficient product of simple conditional probabilities:

P(j,m,a,¬b,¬e)=P(j|a)×P(m|a)×P(a|¬b,¬e)×P(¬b)×P(¬e)

Instead of needing a giant table of 32 probabilities, you can just look up these 5 simple numbers in your network's Conditional Probability Tables (CPTs) and multiply them together.
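The lookup-and-multiply step can be sketched in a few lines. P(j|a) = 0.90, P(m|a) = 1 - 0.30 = 0.70, P(¬b) = 0.999, and P(¬e) = 0.998 all follow from numbers used elsewhere in these notes; P(a|¬b,¬e) = 0.001 is the standard textbook CPT entry and is an assumption here, since the slides' tables are not reproduced:

```python
# Looking up the five CPT entries and multiplying them together.
p_j_given_a = 0.90
p_m_given_a = 0.70        # 1 - P(¬m|a) = 1 - 0.30
p_a_given_nb_ne = 0.001   # assumption: slide's CPT is not reproduced here
p_nb = 0.999
p_ne = 0.998

joint = p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_nb * p_ne
print(round(joint, 6))  # 0.000628
```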


Problem 2

In Problem 2, the lecture addresses a significantly more complex challenge than calculating a simple joint probability. The goal is to calculate the conditional probability P(j,¬m|b) — the probability that John calls and Mary does not call, given that we know a burglary has occurred.

The core difficulty here is that the network contains 5 variables, but our query only mentions 3 (J, M, and B). The remaining two variables—Alarm (A) and Earthquake (E)—are neither given as evidence nor part of the final answer. These are called Hidden Variables.

To get the correct probability, we cannot just ignore them. We must account for every possible state they could be in. This mathematical process is called Marginalization (or "summing out").

1. Setting Up the Equation

Because we know B is true, we need to calculate the probability of our target events (j,¬m) across all possible combinations of the hidden variables A and E, while keeping B as true.

Using the Global Semantics of the network, the general term inside our sum looks like this:

P(j|a)×P(¬m|a)×P(a|b,e)×P(e)

Notice that P(b) does not appear in this product. Conditioning on b means dividing the joint probability P(j,¬m,a,b,e) by P(b), so the P(b) factor from the Global Semantics expansion cancels out.

To marginalize, we sum this product over the four possible "worlds" (combinations of A and E):

P(j,¬m|b) = Σ_{a∈{a,¬a}} Σ_{e∈{e,¬e}} P(j|a) × P(¬m|a) × P(a|b,e) × P(e)

2. The Four "Worlds"

We must calculate the product for these four distinct scenarios and then add the results together:

World 1: The Alarm sounds, and an Earthquake happens (a,e)

P(j|a)×P(¬m|a)×P(a|b,e)×P(e)

=0.90×0.30×0.95×0.002=0.000513

World 2: The Alarm sounds, but NO Earthquake happens (a,¬e)

P(j|a)×P(¬m|a)×P(a|b,¬e)×P(¬e)

=0.90×0.30×0.94×0.998=0.2532924


World 3: The Alarm is silent, but an Earthquake happens (¬a,e)

P(j|¬a)×P(¬m|¬a)×P(¬a|b,e)×P(e)

=0.05×0.99×0.05×0.002=0.00000495


World 4: The Alarm is silent, and NO Earthquake happens (¬a,¬e)

P(j|¬a)×P(¬m|¬a)×P(¬a|b,¬e)×P(¬e)

=0.05×0.99×0.06×0.998=0.00296406

3. The Final Sum

Because this involves summing multiple small decimal products, it is easy to make a transcription error. Storing each of the four intermediate results in the memory variables (A, B, C, D) of an fx-991ES Plus calculator lets you compute each scenario separately and sum them at the end without re-typing (and possibly mis-typing) intermediate decimals.

Adding the four worlds together:

0.000513+0.2532924+0.00000495+0.00296406=0.25677441

So the conditional probability P(j,¬m|b) is approximately 0.257.
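The four-world sum can be checked programmatically; every number below is taken directly from the worlds listed above:

```python
# Summing P(j|a) * P(¬m|a) * P(a|b,e) * P(e) over the four worlds.
p_j = {True: 0.90, False: 0.05}       # P(j|a), P(j|¬a)
p_nm = {True: 0.30, False: 0.99}      # P(¬m|a), P(¬m|¬a)
p_alarm = {True: 0.95, False: 0.94}   # P(a|b,e), P(a|b,¬e), keyed by e
p_e = {True: 0.002, False: 0.998}

total = 0.0
for a in (True, False):
    for e in (True, False):
        p_a = p_alarm[e] if a else 1 - p_alarm[e]   # P(a|b,e) or P(¬a|b,e)
        total += p_j[a] * p_nm[a] * p_a * p_e[e]

print(round(total, 8))  # 0.25677441
```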

4. The Intuition: Why does it equal 0.257?

The lecture emphasizes a powerful shortcut in probabilistic reasoning: High-probability paths dominate the calculation.

Look closely at the four worlds above. World 2 (0.253) contributes almost the entirety of the final answer. Why?

  1. Earthquakes are extremely rare (0.2%). Therefore, Worlds 1 and 3 are so mathematically tiny they barely affect the outcome.

  2. If a burglary happens, the alarm is almost guaranteed to sound (94%). Therefore, World 4 (Burglary happens but the alarm is silent) is also highly unlikely.

Because we know a burglary happened, we can logically assume the alarm probably sounded, and an earthquake probably didn't happen. If we only calculate the highest probability scenario (assuming A is true):

P(j|a)×P(¬m|a)=0.90×0.30=0.27

The exact answer (0.257) is incredibly close to our simplified logical guess (0.27). This demonstrates that in large Bayesian Networks, you can often estimate outcomes by simply following the single most likely path of events and ignoring the extremely rare "edge case" universes.


Problem 3

Problem 3 demonstrates Diagnostic Inference, which is a "bottom-up" approach where we reason backward from an observed effect to a hidden cause.

Specifically, the goal is to calculate the probability that an earthquake occurred (e), given the observed evidence that John has called (j).

To solve this, we cannot use the standard top-down chain rule. Instead, we must use Bayes' Rule:

P(e|j) = P(j|e) × P(e) / P(j)

Here is the detailed, step-by-step breakdown of how the lecture solves this equation.

Step 1: Identify the Prior Probability

The easiest piece of the puzzle is P(e), the baseline prior probability of an earthquake occurring: P(e) = 0.002.

Step 2: Calculate P(j|e) via Marginalization

We need to figure out the probability of John calling given that an earthquake is happening. However, John's calling is not directly connected to the earthquake; it is mediated by the Alarm (A) and influenced by the possibility of a Burglary (B).

Because Alarm and Burglary are hidden variables in this specific query, we must marginalize (sum) over all four of their possible states. The formula for this is:

P(j|e) = Σ_{b∈{b,¬b}} Σ_{a∈{a,¬a}} P(j|a) × P(a|b,e) × P(b)

Using the given constants P(b)=0.001 and P(¬b)=0.999, we calculate the four possible scenarios:

Summing these four scenarios gives the marginalized probability: P(j|e) ≈ 0.29706.
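A sketch of this marginalization follows. P(j|a) = 0.90, P(j|¬a) = 0.05, P(a|b,e) = 0.95, and P(b) = 0.001 all appear in these notes; P(a|¬b,e) = 0.29 is the standard textbook CPT entry and is an assumption here, chosen because it reproduces the stated total of 0.29706:

```python
# Summing P(j|a) * P(a|b,e) * P(b) over the hidden variables B and A.
p_j = {True: 0.90, False: 0.05}       # P(j|a), P(j|¬a)
p_alarm = {True: 0.95, False: 0.29}   # P(a|b,e), P(a|¬b,e), keyed by b
p_b = {True: 0.001, False: 0.999}

p_j_given_e = 0.0
for b in (True, False):
    for a in (True, False):
        p_a = p_alarm[b] if a else 1 - p_alarm[b]
        p_j_given_e += p_j[a] * p_a * p_b[b]

print(round(p_j_given_e, 5))  # 0.29706
```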

Step 3: Calculate the Normalizer P(j)

The denominator in Bayes' rule, P(j), represents the absolute total probability that John calls under any circumstance.

To find this, we must sum the probability of John calling when there is an earthquake, and the probability of him calling when there is not an earthquake:

P(j)=P(j|e)P(e)+P(j|¬e)P(¬e)

Plugging these in:

P(j) = (0.29706 × 0.002) + (0.0521 × 0.998)

P(j) ≈ 0.000594 + 0.05199 = 0.05258

Step 4: Final Computation and Interpretation

Now we have all three pieces for Bayes' Rule:

P(e|j) = (0.29706 × 0.002) / 0.05258 = 0.000594 / 0.05258 ≈ 0.0113
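The whole Bayes' Rule computation, using the marginalized values from the notes:

```python
# Bayes' Rule for P(e|j): prior, likelihood, and normalizer.
p_j_given_e = 0.29706
p_j_given_not_e = 0.0521
p_e = 0.002

p_j = p_j_given_e * p_e + p_j_given_not_e * (1 - p_e)  # normalizer
posterior = p_j_given_e * p_e / p_j
print(round(p_j, 5))        # 0.05259
print(round(posterior, 4))  # 0.0113
```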

The Conceptual Takeaway: This result perfectly illustrates how evidence shifts our beliefs. The baseline probability of an earthquake occurring on any given day is an incredibly low 0.2% (0.002). However, the moment we receive the evidence that John called, our mathematical belief that an earthquake is happening jumps to about 1.1%.

Despite this increase, the network correctly models reality: a burglary is mathematically a much more likely cause for John's call than a rare earthquake.


Local Semantics Vs Markov Blanket

Here is a detailed breakdown of the concepts of Local Semantics and the Markov Blanket, which are crucial for understanding how information flows (and gets blocked) in a Bayesian Network.

1. Local Semantics (The "Top-Down" Shield)

Local Semantics define the most basic rule of independence in a Bayesian Network: A node is conditionally independent of all its non-descendants, given its parents.

2. The Markov Blanket (The "Complete Isolation" Zone)

While Local Semantics shields a node only from its non-descendants (given its parents), the Markov Blanket is a stronger concept. It is the minimal set of surrounding nodes required to make a target node conditionally independent of every other node in the entire network.

To completely isolate a target node, its Markov Blanket must include exactly three things:

  1. Its Parents (to shield it from ancestors).

  2. Its Children (to shield it from descendants).

  3. The Co-parents of its children (often called "spouses").

Why do we need the Co-parents?

This is due to the "explaining away" effect (the collider structure we discussed earlier). If you observe a child node, its parents suddenly become dependent on each other. Therefore, to fully understand the probability of a target node, you must know what its "rival" co-parents are doing, because they might be providing an alternative explanation for the child's behavior.
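The three-part recipe above (parents, children, co-parents) can be written as a small function. The sketch below uses the structure of the burglary network from earlier in these notes:

```python
# Computing a node's Markov blanket (parents + children + co-parents)
# from the parent lists of a DAG.
parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

def markov_blanket(node):
    children = [n for n, ps in parents.items() if node in ps]
    coparents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | set(children) | coparents

print(sorted(markov_blanket("Alarm")))
# ['Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls']
print(sorted(markov_blanket("Burglary")))
# ['Alarm', 'Earthquake']  (Earthquake enters as a co-parent of Alarm)
```

Note how Earthquake lands in Burglary's blanket purely through the co-parent rule: it shares the child Alarm.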

3. The Rain-Sprinkler-Grass Example

To make this concrete, the lecture introduces a new 4-node network: Cloudy (C) is a parent of both Sprinkler (S) and Rain (R), and Sprinkler and Rain are both parents of WetGrass (W).

Let's look at the Sprinkler (S) node using both concepts:

Applying Local Semantics to the Sprinkler: given its parent Cloudy, the Sprinkler is conditionally independent of its non-descendant, Rain.

Applying the Markov Blanket to the Sprinkler: to completely isolate the Sprinkler from the entire network, we must gather its blanket: its parent (Cloudy), its child (WetGrass), and its child's co-parent (Rain).

4. The Final Contrast (Slide 26 Summary)

The lecture concludes this section by contrasting how these two concepts are used in practice.

By defining these boundaries, Bayesian Networks allow AI to make incredibly complex calculations efficiently, only looking at the localized "blanket" of variables that actually matter, rather than the entire universe of data at once.

Calculating Inference Using the Markov Blanket (Slide 25)

Slide 25 provides a concrete mathematical example of how to use a Markov Blanket to determine the state of a specific node. The scenario asks us to figure out the probability that the Sprinkler is ON (S=t), given a specific set of observations about its Markov Blanket: it is cloudy, it is raining, and the grass is wet.

To find the probability of the sprinkler being on given this Markov Blanket—written mathematically as P(S=t|MB(S))—we must calculate a "score" for both possible states of the sprinkler (ON and OFF).

1. Calculating the Score for Sprinkler ON (S=t)

We multiply the probability of the sprinkler turning on given that it is cloudy by the probability of the grass being wet given that the sprinkler is on and it is raining.

2. Calculating the Score for Sprinkler OFF (S=f)

We multiply the probability of the sprinkler staying off given that it is cloudy by the probability of the grass being wet given that the sprinkler is off but it is raining.

Notice the dependence on the spouse: both scores include a term conditioned on Rain, the co-parent. If the state of "raining given cloudy" were unknown, the scores could not be evaluated; we need the whole blanket.

3. Normalization

These raw scores (0.099 and 0.810) do not add up to 1. To convert them into true probabilities, we normalize by dividing each score by the sum of all scores: 0.099 / (0.099 + 0.810) ≈ 0.109.
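The normalization step, in code. The raw scores 0.099 and 0.810 come from the slide; the CPT entries used here to reconstruct them (P(S=t|c) = 0.10, P(w|s,r) = 0.99, P(w|¬s,r) = 0.90) are standard textbook values and are assumptions:

```python
# Normalizing the two raw scores into a true probability.
score_on = 0.10 * 0.99    # P(S=t|c) * P(w | S=t, r) = 0.099  (CPT values assumed)
score_off = 0.90 * 0.90   # P(S=f|c) * P(w | S=f, r) = 0.810  (CPT values assumed)
p_on = score_on / (score_on + score_off)
print(round(p_on, 3))  # 0.109
```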

The Takeaway

Even though the grass is wet, the final probability that the sprinkler is running is only 10.9%. Because we know it is raining, the rain provides a sufficient explanation for the wet grass, making the sprinkler highly unlikely to be the cause.


Local Semantics vs. Markov Blanket (Slide 26)

Slide 26 shifts to a conceptual summary, contrasting the two primary ways information flows through a Bayesian Network using a new hypothetical example involving an employee's Lateness (L), an Accident (A), and another variable (X, such as rain or a specific route).

1. Prediction via Local Semantics

Local Semantics is used for top-down prediction.

2. Inference via the Markov Blanket

The Markov Blanket is used for bottom-up inference, which requires completely isolating a node.

The "Explaining Away" Phenomenon

This slide formally defines the "explaining away" effect we saw with the sprinkler.