Let us first study a simple example of propositional data mining in the domain of wine tasting. The task is to recommend wines based on the list of other wines that a person likes. Let us assume that we have a large database with information of whether or not some people like a particular wine. This can be represented as a table with wines as columns and people as rows. Each cell contains a 1 if the person likes the wine and 0 if not. Such a table is shown in Figure 5.1.
![]() ![]() |
First, we will find interesting sets of wines. We measure how often
all the wines of the set are liked by the same people. A frequent
itemset is a set of columns for which the number of rows that has only
1s in the corresponding cells is greater than some threshold. Let us say
that a group of wines is a frequent itemset if at least 30% of the
people like all of them. In this case, the frequent itemsets are
, and
.
The hypothesis space for different itemsets (or patterns) is depicted in
Figure 5.1. The different hypotheses have a partial order
for generality and specificity. If an itemset is frequent, all
itemsets that are generalisations of it, are also. If an itemset is
not frequent, all itemsets that are specialisations of it, are
infrequent as well. This property is essential for pruning the hypothesis space
during search. For instance, if we know that is not
frequent, we also know that
is not frequent without
testing. The size of the hypothesis space is exponential with respect
to the number of items, so pruning is essential to achieve a reasonable
computational complexity.
An association rule tells that if a person likes a certain set of
wines, he or she will like some other wine. A frequent itemset can be
transformed into an association rule by choosing one of the wines to
be the one to be predicted based on the others. Now we can test
whether the rule applies to all the people in the database, or is
statistically significant. The found rules are the logic programme that
were inferred from the data inductively. In the example, we can find
the rule
that applies to all cases, that is, everyone
who likes wine
also likes wine
.