# Study Guide: Undirected Graphical Models (Markov Random Fields)
**Date:** 2025.11.27

**Topic:** Potential Functions, Partition Function, and Conditional Independence in MRFs.

---
### **1. Recap: Decomposition in Undirected Graphs**
Unlike Directed Graphical Models (Bayesian Networks), which use conditional probabilities, **Undirected Graphical Models (Markov Random Fields - MRFs)** cannot directly use conditional probabilities because their edges carry no direction/causality. Instead, they decompose the joint distribution based on **Maximal Cliques**.

* **Cliques:** Subsets of nodes where every node is connected to every other node.
* **Maximal Clique:** A clique that cannot be enlarged by adding another node while staying fully connected. Together, the maximal cliques cover the graph (see the sketch after this list).
* **Decomposition Rule:** The joint distribution is the product of functions defined over these maximal cliques.
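
As a small illustration (not from the lecture), the maximal cliques of a graph can be enumerated with `networkx`; the graph below is hypothetical, chosen to match the cliques used in the example of Section 3:

```python
# Enumerate the maximal cliques of a small undirected graph using networkx.
# The graph is hypothetical, chosen to have maximal cliques {x1, x2, x3}
# and {x3, x4}, matching the example used later in this guide.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("x1", "x2"), ("x1", "x3"), ("x2", "x3"),  # triangle
                  ("x3", "x4")])                             # pendant edge

# nx.find_cliques yields exactly the maximal cliques (not all cliques).
for clique in nx.find_cliques(G):
    print(sorted(clique))
# Output (order may vary):
# ['x1', 'x2', 'x3']
# ['x3', 'x4']
```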
---

### **2. Potential Functions ($\psi$)**

* **Definition:** For each maximal clique $C$, we define a **Potential Function** $\psi_C(x_C)$ (also commonly denoted $\phi$).
* It is a **non-negative function** ($\psi_C(x_C) \ge 0$) mapping each joint state of the clique's variables to a real-valued score.
* It represents the "compatibility" of that configuration (the negative log of a potential is often called its "energy").
* **Key Distinction:** A potential function is **NOT a probability**. It does not have to sum to 1; it is just a non-negative score.
* *Example:* $\psi_{12}(x_1, x_2)$ scores the interaction between $x_1$ and $x_2$ (see the sketch after this list).
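
A minimal sketch of a potential table for a clique of two binary variables; the values are made up for illustration:

```python
# Hypothetical potential table for the clique {x1, x2}: psi_12[x1, x2] is the
# non-negative compatibility score of configuration (x1, x2) -- here,
# agreement between x1 and x2 is favored.
import numpy as np

psi_12 = np.array([[3.0, 1.0],   # psi(x1=0, x2=0), psi(x1=0, x2=1)
                   [1.0, 3.0]])  # psi(x1=1, x2=0), psi(x1=1, x2=1)

print(psi_12[0, 1])  # 1.0 -- a score, not a probability
print(psi_12.sum())  # 8.0 -- the entries do not sum to 1
```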
---

### **3. The Partition Function ($Z$)**

Since the product of potential functions does not, in general, sum to 1, it is not yet a probability distribution; we must normalize it.

* **Definition:** The normalization constant is called the **Partition Function** ($Z$):

$$Z = \sum_{x} \prod_{C} \psi_C(x_C)$$

* **Role:** $Z$ ensures that the resulting distribution sums to 1, making it a valid probability distribution:

$$P(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C)$$

* **Calculation:** To find $Z$, we must sum the product of potentials over **all possible joint states** of the random variables. This summation grows exponentially with the number of variables, so it is often computationally expensive.

#### **Example Calculation**

The lecture walks through a simple example with 4 binary variables and two maximal cliques: $\{x_1, x_2, x_3\}$ and $\{x_3, x_4\}$.
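
Under the decomposition rule, the joint distribution for this example factorizes as

$$P(x_1, x_2, x_3, x_4) = \frac{1}{Z}\,\psi_{123}(x_1, x_2, x_3)\,\psi_{34}(x_3, x_4)$$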

* **Step 1:** Define potential tables for $\psi_{123}$ and $\psi_{34}$ (a code version appears after this list).
* **Step 2:** Compute the unnormalized score $\psi_{123}(x_1, x_2, x_3)\,\psi_{34}(x_3, x_4)$ for each of the $2^4 = 16$ joint states.
* **Step 3:** Sum all of these scores to get $Z$. In the example, $Z = 10$.
* **Step 4:** The probability of any specific state, e.g. $P(1,0,0,0)$, is that state's score, $\psi_{123}(1,0,0)\,\psi_{34}(0,0)$, divided by $Z$.
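
A minimal brute-force sketch of these steps. The table values below are hypothetical, chosen so that $Z = 10$ to match the lecture's total; the lecture's actual tables may differ:

```python
# Steps 1-4 by brute force: 4 binary variables, cliques {x1, x2, x3} and
# {x3, x4}. Potential values are illustrative, not the lecture's.
from itertools import product

import numpy as np

# Step 1: potential tables. psi_123[x1, x2, x3] and psi_34[x3, x4] are
# non-negative scores, not probabilities.
psi_123 = np.array([[[1.0, 0.5],
                     [0.5, 0.5]],
                    [[0.5, 0.5],
                     [0.5, 1.0]]])
psi_34 = np.array([[1.5, 0.5],
                   [1.0, 1.0]])

# Steps 2-3: score all 2**4 = 16 joint states and sum them to get Z.
Z = sum(psi_123[x1, x2, x3] * psi_34[x3, x4]
        for x1, x2, x3, x4 in product([0, 1], repeat=4))
print(Z)  # 10.0

# Step 4: probability of one specific state, here P(x1=1, x2=0, x3=0, x4=0).
p = psi_123[1, 0, 0] * psi_34[0, 0] / Z
print(p)  # 0.5 * 1.5 / 10 = 0.075
```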
---

### **4. Parameter Estimation**

* **Discrete Case:** If the variables are discrete (as in the email spam example), the parameters are the entries of the potential tables. We estimate these values from data to maximize the likelihood.
* **Continuous Case:** If the variables are continuous, the potential functions typically take a Gaussian form, and we estimate means and covariances.
* **Reduction:** Just as in Bayesian Networks, using the graph structure reduces the number of parameters (a quick count appears after this list):
  * *Without the graph:* A full joint table for 4 binary variables needs $2^4 = 16$ entries.
  * *With the graph:* We only need tables for the maximal cliques: $2^3 + 2^2 = 12$ entries for the cliques $\{x_1, x_2, x_3\}$ and $\{x_3, x_4\}$ above, a saving that grows rapidly with the number of variables.
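
A one-line sanity check of that count:

```python
# Table-entry count: full joint table vs. per-clique tables, for 4 binary
# variables with maximal cliques {x1, x2, x3} and {x3, x4}.
n_full = 2 ** 4              # 16 entries for the full joint table
n_cliques = 2 ** 3 + 2 ** 2  # 8 + 4 = 12 entries across the two clique tables
print(n_full, n_cliques)     # 16 12
```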
---

### **5. Verifying Conditional Independence**

The lecture demonstrates analytically that the potential-function formulation preserves the conditional independence properties of the graph.

* **Scenario:** A chain graph $x_1 - x_2 - x_3 - x_4$.
* **Question:** Is $x_4$ independent of $x_1$ given $x_3$?
* **Analytical Check:**
  * Calculate $P(x_4=1 \mid x_1=0, x_2=1, x_3=1)$.
  * Calculate $P(x_4=1 \mid x_1=0, x_2=0, x_3=1)$.
* **Result:** As long as $x_3$ is fixed (given), every potential involving $x_1$ and $x_2$ appears in both the numerator and the denominator and cancels out of the ratio, leaving (with $\psi_{34}$ the potential on the clique $\{x_3, x_4\}$):

$$P(x_4 = 1 \mid x_1, x_2, x_3) = \frac{\psi_{34}(x_3, 1)}{\psi_{34}(x_3, 0) + \psi_{34}(x_3, 1)}$$

which depends only on the potential involving $x_3$ and $x_4$.

* **Conclusion:** This confirms that $x_4 \perp \{x_1, x_2\} \mid x_3$ (a numerical check follows below). The formulation correctly encodes the global Markov property.
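
A minimal numerical sketch of this check, with hypothetical random potentials on the edge cliques $\{x_1, x_2\}$, $\{x_2, x_3\}$, $\{x_3, x_4\}$ of the chain; the independence should hold for any positive choice of values:

```python
# Numerical check that x4 is independent of (x1, x2) given x3 in the chain MRF
# x1 - x2 - x3 - x4. Potentials are arbitrary positive values; the
# cancellation derived above makes the result independent of (x1, x2).
import numpy as np

rng = np.random.default_rng(0)
psi_12 = rng.uniform(0.1, 1.0, size=(2, 2))  # hypothetical edge potentials
psi_23 = rng.uniform(0.1, 1.0, size=(2, 2))
psi_34 = rng.uniform(0.1, 1.0, size=(2, 2))

def score(x1, x2, x3, x4):
    """Unnormalized score of a joint state under the chain factorization."""
    return psi_12[x1, x2] * psi_23[x2, x3] * psi_34[x3, x4]

def p_x4_is_1_given(x1, x2, x3):
    """P(x4 = 1 | x1, x2, x3); Z cancels, so only the scores are needed."""
    return score(x1, x2, x3, 1) / (score(x1, x2, x3, 0) + score(x1, x2, x3, 1))

# Same x3 = 1 but different (x1, x2): the conditional is identical, and it
# equals the closed form psi_34(x3, 1) / (psi_34(x3, 0) + psi_34(x3, 1)).
print(p_x4_is_1_given(0, 1, 1))
print(p_x4_is_1_given(0, 0, 1))
print(psi_34[1, 1] / (psi_34[1, 0] + psi_34[1, 1]))
```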
|