Stefano Bragaglia and Fabrizio Riguzzi
Combining logic and probability is a field of research that has received much attention in recent years.
In this vein, many formalisms have been introduced, such as Markov Logic Networks, ProbLog and Logic Programs with Annotated Disjunctions (LPADs), that allow probabilistic information to be represented in logic.
LPADs are particularly interesting because they can express, in a very natural way, cause–effect relationships among events, the possible effects of a cause and the joint contribution of multiple causes to the same effect.
From a syntactic point of view, an LPAD consists of a set of disjunctive clauses in which each atom in the head is annotated with a probability value between 0 and 1. The head atoms represent the mutually exclusive and exhaustive set of effects of the event represented by the body, so the probabilities associated with the head atoms of a clause must sum to 1.
Their semantics is based on the concept of an instance, i.e., a normal logic program obtained by choosing one atom from the head of each grounding of every clause of the LPAD. The probability of an instance is the product of the probability values of the atoms chosen for it. The probability of a query is the sum of the probabilities of the instances where the query is true.
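As a minimal sketch of this semantics, consider the following toy program (the coin-toss clauses and all names are hypothetical, not taken from the paper): each grounding of a clause contributes one head choice per instance, and the query probability is summed over the instances where the query holds.

```python
from itertools import product

# Toy ground LPAD: two groundings of the clause
#   heads(C):0.6 ; tails(C):0.4 :- toss(C).
# with facts toss(c1) and toss(c2).
clauses = [
    [("heads(c1)", 0.6), ("tails(c1)", 0.4)],
    [("heads(c2)", 0.6), ("tails(c2)", 0.4)],
]

def query_probability(clauses, query):
    """Sum the probabilities of all instances in which `query` holds."""
    total = 0.0
    for choice in product(*clauses):  # one instance per combination of choices
        prob = 1.0
        atoms = set()
        for atom, p in choice:
            prob *= p                 # instance probability is the product
            atoms.add(atom)
        if query in atoms:            # query true in this instance
            total += prob
    return total

print(query_probability(clauses, "heads(c1)"))  # 0.6
```

This brute-force enumeration is exponential in the number of ground clauses, which is exactly why the BDD-based and approximate methods discussed below are needed.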
Inference with LPADs can be performed by first collecting all the explanations for the query, i.e., the sets of probabilistic choices that make it true.
Once all the explanations for the query have been found, a Binary Decision Diagram (BDD) is built that allows the probability to be computed with a dynamic programming algorithm that traverses the diagram. In some domains exact inference may be impossible, so we considered several approximate algorithms, both deterministic and stochastic.
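The traversal can be sketched as follows, assuming a hand-built BDD over hypothetical Boolean choice variables x1 and x2 (a toy illustration, not the authors' implementation, which would also share and memoize subdiagrams):

```python
# Hypothetical probabilities of the Boolean variables, one per
# probabilistic choice; True/False leaves end the recursion.
var_prob = {"x1": 0.7, "x2": 0.4}

# A BDD node is (variable, low_child, high_child); leaves are booleans.
# This diagram encodes the formula x1 OR x2.
bdd = ("x1", ("x2", False, True), True)

def prob(node):
    """Dynamic programming over the BDD:
    P(node) = (1 - p) * P(low) + p * P(high)."""
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    var, low, high = node
    p = var_prob[var]
    return (1 - p) * prob(low) + p * prob(high)

print(prob(bdd))  # P(x1 or x2) = 0.7 + 0.3 * 0.4 = 0.82
```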
Among these, the deterministic approximate algorithm, based on branch and bound, builds the set of explanations incrementally, keeping only the k most probable ones. Using a fixed number of proofs to approximate the probability allows better control of the overall complexity, which is fundamental when evaluating large problems. The chosen explanations then provide a lower bound on the probability of the query.
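The lower-bound computation can be sketched as follows, under the simplifying assumption that explanations are sets of independent Boolean choices (the explanations and probabilities below are hypothetical toy data, and the exact disjunction here is computed by brute-force enumeration rather than with BDDs):

```python
import heapq
from itertools import product

# Hypothetical independent Boolean choices with their probabilities.
choice_prob = {"a": 0.8, "b": 0.5, "c": 0.3, "d": 0.1}

# Each explanation is (set of required choices, its probability).
explanations = [
    (frozenset({"a", "b"}), 0.8 * 0.5),
    (frozenset({"a", "c"}), 0.8 * 0.3),
    (frozenset({"d"}), 0.1),
]

def lower_bound(explanations, k):
    """Keep the k most probable explanations and compute the exact
    probability of their disjunction by enumerating truth values of
    the involved choices: a lower bound on the query probability."""
    best = heapq.nlargest(k, explanations, key=lambda e: e[1])
    vars_ = sorted(set().union(*(e[0] for e in best)))
    total = 0.0
    for values in product([True, False], repeat=len(vars_)):
        world = dict(zip(vars_, values))
        w_prob = 1.0
        for v, val in world.items():
            w_prob *= choice_prob[v] if val else 1 - choice_prob[v]
        # The query holds if some kept explanation is fully satisfied.
        if any(all(world[c] for c in e) for e, _ in best):
            total += w_prob
    return total

print(lower_bound(explanations, 2))  # 0.52
```

Adding explanations can only raise the bound: with k=3 the result grows to 0.568, closer to the true query probability.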
We also considered a stochastic algorithm based on a Monte Carlo approach: instances are repeatedly sampled from the LPAD and the probability of the query is estimated as the fraction of sampled instances in which the query is true. Sampling is performed efficiently by considering only the clauses that are relevant to the query: the algorithm uses a meta-interpreter that stochastically chooses head atoms of the resolving clauses.
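The sampling step can be sketched on the same toy coin-toss program used above (for simplicity this sketch samples every clause rather than only those reached during resolution, as the meta-interpreter would):

```python
import random

# Toy ground LPAD: each clause lists its annotated head atoms.
clauses = [
    [("heads(c1)", 0.6), ("tails(c1)", 0.4)],
    [("heads(c2)", 0.6), ("tails(c2)", 0.4)],
]

def monte_carlo(clauses, query, n_samples=100_000, seed=0):
    """Estimate the query probability as the fraction of sampled
    instances in which the query holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        atoms = set()
        for heads in clauses:
            labels, probs = zip(*heads)
            # Sample one head atom according to the annotations.
            atoms.add(rng.choices(labels, weights=probs)[0])
        if query in atoms:
            hits += 1
    return hits / n_samples

print(monte_carlo(clauses, "heads(c1)"))  # fluctuates around 0.6
```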
Each algorithm has been applied to several datasets, both artificial and real.
One of the real-world datasets consists of biological graphs sampled from a network with 5220 nodes and 11530 edges that describes the biological entities involved in Alzheimer’s disease together with their relationships.
Since the problem takes the form of a graph where each edge carries a probability value expressing the strength of the connection, the query consists in computing the probability that a path exists between pairs of nodes representing genes related to Alzheimer’s disease.
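On such probabilistic graphs, the Monte Carlo approach amounts to repeatedly sampling a subgraph (keeping each edge independently with its probability) and checking reachability. The following sketch uses a tiny hypothetical undirected graph, not the Alzheimer’s data:

```python
import random
from collections import deque

# Hypothetical probabilistic graph: edge -> probability of being present.
edges = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g1", "g3"): 0.3}

def path_probability(edges, src, dst, n_samples=50_000, seed=0):
    """Estimate P(a path exists from src to dst) by sampling subgraphs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        kept = {}
        for (u, v), p in edges.items():
            if rng.random() < p:          # keep the edge with probability p
                kept.setdefault(u, []).append(v)
                kept.setdefault(v, []).append(u)  # undirected sketch
        # BFS reachability in the sampled subgraph.
        seen, frontier = {src}, deque([src])
        while frontier:
            node = frontier.popleft()
            if node == dst:
                hits += 1
                break
            for nxt in kept.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return hits / n_samples

print(path_probability(edges, "g1", "g3"))  # close to the exact 0.804
```

For this toy graph the exact value is 1 - (1 - 0.3)(1 - 0.9 * 0.8) = 0.804, so the estimate can be checked directly.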
The results we gathered from experiments on this dataset show that the Monte Carlo algorithm can solve subgraphs almost three times larger than those handled by standard inference (3400 edges instead of 1200). In addition, despite being slower on smaller problems, the Monte Carlo algorithm is less demanding in terms of time and memory on larger ones.
Three artificial sets of graphs of increasing complexity were created to evaluate the effectiveness of the algorithms. Again, the query asks for the probability that a path exists between two given nodes of each graph.
In this case, both approximate algorithms solve almost 50% more problems than standard inference, and in less time. Moreover, although Monte Carlo is nearly twice as fast as k-best, k-best produces almost no absolute error, while Monte Carlo incurs an error of up to 10% in the worst case.