

Combination rules

There is one non-trivial point in forming a Bayesian network from a PRM or a BLP: one-to-many or many-to-many relationships, or equivalently, multiple proofs for a single atom. The child node in the Bayesian network then gets several sets of parents, where each set defines a conditional probability distribution for the child, and the number of parents varies from sample to sample. This is solved by combination rules (also called combining rules, aggregate dependencies, or aggregate functions), which combine many probability distributions into one.

Figure 6.1 shows a situation where two rules apply to $ \mathrm{opinion}(john,b)$. Each rule gives a conditional probability distribution for John's opinion about wine $ b$, and the two must be combined using a combination rule. Note that the number of applicable rules varies from sample to sample.

The most typical combination rule is the noisy-or (see Pearl, 1988) for binary variables. The probability of the binary variable $ x$ being false given its binary parents $ \mathbf{y}=(y_1,\dots,y_n)$ is $ P(x=0\mid \mathbf{y})=\prod_{i\mid y_i=1} q_i$, that is, $ x$ is false iff all its active causes $ y_i$ are independently inhibited by noise, with probability $ q_i$ each. For example, each disease $ y_i$ present in a patient has probability $ 1-q_i$ of causing fever $ x$, and the patient gets the fever if any one of the diseases causes it.
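
As an illustration, the following Python sketch (not from the thesis; the function and variable names are purely illustrative) evaluates the noisy-or rule for a given parent configuration and the corresponding inhibition probabilities:

    def noisy_or(y, q):
        """P(x = 1 | y) under the noisy-or rule.
        y: binary parent values; q: inhibition probabilities q_i."""
        p_false = 1.0
        for y_i, q_i in zip(y, q):
            if y_i == 1:           # only active causes can trigger x
                p_false *= q_i     # cause i is inhibited with probability q_i
        return 1.0 - p_false       # x is true unless every active cause is inhibited

    # Two diseases present, each causing fever with probability 0.8 (q_i = 0.2):
    # P(fever) = 1 - 0.2 * 0.2 = 0.96
    print(noisy_or([1, 1, 0], [0.2, 0.2, 0.5]))

Because the product runs only over the active parents, the same function works regardless of how many parents a sample happens to have.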

Noisy-or is asymmetric with respect to the binary variables it deals with: if zeros and ones are interchanged, the rule becomes noisy-and. Publication VI studies two combination rules that are symmetric and apply to discrete and continuous values as well. The first is the naïve Bayes (or maximum entropy) combination rule, which corresponds to a Markov network where the different sets of parents are not connected to each other. The second is the product of experts, where the combined probability density is a function of the product of the probability densities proposed by the different experts (or sets of parents). Other combination rules include the sigmoid (Neal, 1992), noisy maximum and minimum (Díez, 1993), mixture of experts, and aggregate functions such as sum, average, median, mode, and count (Getoor, 2001; Kersting and De Raedt, 2006).
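
As a rough sketch of the discrete case (an assumption-laden illustration, not the exact formulation of Publication VI), both symmetric rules can be written as normalized products of the distributions proposed by the individual rules; in the naïve Bayes variant, the prior of the child is additionally divided out once per extra expert:

    import numpy as np

    def product_of_experts(dists):
        """Combine discrete distributions by a normalized product.
        dists: array of shape (n_experts, n_values); each row sums to 1."""
        combined = np.prod(dists, axis=0)
        return combined / combined.sum()

    def naive_bayes_combination(dists, prior):
        """Treat each parent set as independent evidence given the child,
        dividing the prior out once per additional expert (one common
        formulation; details may differ from Publication VI)."""
        n = len(dists)
        combined = prior ** (1 - n) * np.prod(dists, axis=0)
        return combined / combined.sum()

    # Two rules each propose a distribution over John's opinion {bad, good}:
    rules = np.array([[0.3, 0.7],
                      [0.4, 0.6]])
    print(product_of_experts(rules))                           # ~[0.222, 0.778]
    print(naive_bayes_combination(rules, np.array([0.5, 0.5])))

Both functions accept any number of expert distributions, matching the fact that the number of applicable rules varies from sample to sample.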

