
Mining Probabilistic Declarative Process Models

Elena Bellodi, Fabrizio Riguzzi and Evelina Lamma

ENDIF – Università di Ferrara – Via Saragat, 1 – 44122 Ferrara, Italy.

Organizations usually rely on a number of processes to achieve their mission; these processes describe how resources are exploited. Formal ways of representing business processes have been studied in the area of Business Process Management (BPM). In recent years, many authors have studied the problem of automatically mining a structured description of a business process directly from real data, a problem known as Process Mining. The data consist of execution traces of the process, collected by information systems that log the activities performed by users. More recently, declarative languages have been proposed that express only the constraints that a process execution must satisfy.

Among these, SCIFF adopts first-order logic to represent the constraints. A trace t is a sequence of events, each described by a number of attributes, and a bag L of process traces is called a log. The aim of Process Mining is to infer a process model from a log. A process trace can be represented as a logical interpretation (a set of ground atoms): each event is modeled by an atom whose predicate is the event type and whose arguments store the attributes. A process model in the SCIFF language is a set of Integrity Constraints (ICs). The theory (or model) composed of all the ICs must be such that all the ICs are true for every positive trace and at least one IC is false for every negative trace. The Declarative Process Model Learner (DPML) algorithm finds an IC theory that solves this learning problem.
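The classification criterion above (a trace is positive only if every IC holds in its interpretation) can be illustrated with a minimal Python sketch. It is not SCIFF syntax and not DPML: the ICs are plain Python functions, and the event names `enrol` and `exam`, the convention that the timestamp is the last attribute, and the helpers `event` and `satisfies_all` are all hypothetical, chosen only to show a trace as a set of ground atoms checked against a theory.

```python
from typing import Callable, FrozenSet, List, Tuple

# An atom is (predicate, arguments); a trace is a set of ground atoms.
Atom = Tuple[str, Tuple]
Trace = FrozenSet[Atom]
IC = Callable[[Trace], bool]  # returns True if the IC holds in the trace


def event(etype: str, *attrs) -> Atom:
    """Model one event as an atom whose predicate is the event type
    and whose arguments are its attributes (hypothetical convention:
    the last attribute is the timestamp)."""
    return (etype, attrs)


def satisfies_all(theory: List[IC], trace: Trace) -> bool:
    """Classify a trace as positive iff every IC of the theory is true in it."""
    return all(ic(trace) for ic in theory)


def enrol_then_exam(trace: Trace) -> bool:
    """Hypothetical IC: every 'enrol' event must be followed by a later 'exam' event."""
    enrol_times = [args[-1] for pred, args in trace if pred == "enrol"]
    exam_times = [args[-1] for pred, args in trace if pred == "exam"]
    return all(any(te > t for te in exam_times) for t in enrol_times)


theory = [enrol_then_exam]
pos_trace = frozenset({event("enrol", "s1", 1), event("exam", "s1", "math", 5)})
neg_trace = frozenset({event("enrol", "s2", 1)})

print(satisfies_all(theory, pos_trace))  # True: every IC holds
print(satisfies_all(theory, neg_trace))  # False: enrol_then_exam is violated
```

A single violated IC is enough to reject a trace, which is exactly the sharp behaviour that the weighted model discussed next relaxes.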

We then investigate the possibility of encoding probabilistic information in the IC theory with Markov Logic (ML), a language that extends first-order logic by attaching weights to formulas; we learn the weights of the ICs with the Alchemy system. The resulting set of (weight, formula) pairs is called a Markov Logic Network (MLN). A set of ICs can be seen as a “hard” theory: if a world violates even one formula, it is considered impossible. In ML such a world is less probable, but not impossible, and the weight associated with each formula reflects how strong the constraint is. In the limit of infinite weights, ML reduces to standard first-order logic.
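In an MLN the probability of a world x is P(x) = exp(Σ_i w_i n_i(x)) / Z, where n_i(x) is the number of true groundings of formula i in x and Z normalises over all worlds. The Python sketch below enumerates every world over two hypothetical ground atoms (`enrol`, `exam`) with a single hypothetical IC `enrol -> exam`, to show how a finite weight makes a violating world less probable but not impossible, while a weight of 1e10 (the pseudo-infinite value used in the experiments) drives it to zero. Brute-force enumeration is only an illustration; it is not how Alchemy performs weight learning or inference.

```python
import itertools
import math


def world_probabilities(atoms, formulas):
    """Enumerate all truth assignments to `atoms` and return (world, P(world)) pairs.

    formulas: list of (weight, n) where n(world) is the number of true groundings.
    P(x) = exp(sum_i w_i * n_i(x)) / Z, computed by subtracting the maximum score
    (log-sum-exp trick) so that huge weights such as 1e10 do not overflow math.exp.
    """
    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True], repeat=len(atoms))]
    scores = [sum(w * n(x) for w, n in formulas) for x in worlds]
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]
    z = sum(unnorm)
    return [(x, u / z) for x, u in zip(worlds, unnorm)]


def enrol_implies_exam(world):
    """Hypothetical ground IC enrol -> exam: 1 if satisfied, 0 if violated."""
    return int((not world["enrol"]) or world["exam"])


def p_violation(weight):
    """Total probability of the worlds that violate the IC."""
    probs = world_probabilities(["enrol", "exam"], [(weight, enrol_implies_exam)])
    return sum(p for x, p in probs if not enrol_implies_exam(x))


print(round(p_violation(1.5), 4))  # 0.0692: soft constraint, violation still possible
print(p_violation(1e10))           # 0.0: near-hard constraint, violation impossible
```

With weight 0 the IC has no effect and all four worlds are equally likely; as the weight grows, the violating world's probability 1 / (3e^w + 1) shrinks towards zero, mirroring the reduction of ML to first-order logic.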

Our goal is to demonstrate that the combined use of DPML, for learning an IC theory, and Alchemy, for learning weights for the formulas, produces better results than the sharp classification performed by the SCIFF theory.
The experiments are based on a real dataset of university students' careers, where positive traces correspond to students who graduated and negative ones to students who did not finish their studies. First, we induced ten SCIFF theories using ten-fold cross-validation. Then we translated the learned theories into ML and assigned weights to them with an Alchemy weight-learning algorithm. Ten further MLNs were generated from the learned theories by assigning the very large weight 1e+10 to all the clauses, in order to approximate a purely logical theory. We then computed the probability of each test trace being negative, i.e., the probability of the atom Neg(I), where I is a student's ID in the test set, by running the belief propagation inference algorithm both on the MLNs with learned weights and on the MLNs with pseudo-infinite weights.

Finally, we computed the average area under the ROC curve (AUC), a measure of classification performance that, unlike accuracy, does not depend on the choice of a classification threshold. The sharp MLNs achieved an average AUC of 0.7107528, while the weighted MLNs achieved 0.7227286. We also applied a one-tailed paired t test: the null hypothesis that the two algorithms are equivalent can be rejected with a confidence of 90.58%.
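The evaluation step can be sketched in Python as follows, assuming that each fold yields one P(Neg(I)) score per test student together with the true outcome. The function names and the toy inputs are hypothetical and are not the paper's data; the AUC uses the Mann-Whitney formulation (ties count 1/2), and `paired_t` returns only the t statistic on the per-fold AUC differences. For the one-tailed p-value one would take the upper tail of Student's t with (number of folds - 1) degrees of freedom, e.g. with a statistics library.

```python
import math
import statistics


def auc(p_neg, is_negative):
    """AUC of the P(Neg(I)) scores, where is_negative[i] is True for a student
    who did not finish. Equals the probability that a random negative trace
    scores higher than a random positive one, counting ties as 1/2."""
    neg = [p for p, y in zip(p_neg, is_negative) if y]
    pos = [p for p, y in zip(p_neg, is_negative) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in neg for b in pos)
    return wins / (len(neg) * len(pos))


def paired_t(auc_a, auc_b):
    """t statistic of a paired t test on per-fold differences auc_a[k] - auc_b[k].
    A positive value supports H1: system a has the higher mean AUC.
    Requires at least two folds (statistics.stdev needs two values)."""
    d = [x - y for x, y in zip(auc_a, auc_b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Toy fold: two negative and two positive test traces (made-up scores).
print(auc([0.9, 0.8, 0.3, 0.2], [True, False, True, False]))  # 0.75
```

In the setting described above, `auc` would be applied to each of the ten test folds for both the weighted and the sharp MLNs, and `paired_t` to the two resulting lists of ten AUC values.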

Conclusions. The probabilistic classification performed with Alchemy is more accurate than the purely logical one performed by SCIFF alone.

    author    = {Elena Bellodi and Fabrizio Riguzzi and Evelina Lamma},
    title     = {Mining Probabilistic Declarative Process Models},
    year      = {2009},
    editor    = {Marco Gavanelli and Toni Mancini},
    booktitle = {R.i.C.e.R.c.A. 2009: RCRA Incontri E Confronti},
