A Decomposition Approach for Discovering Discriminative Motifs in a Sequence Database
Abstract
Considerable effort has been invested over the years in ad hoc algorithms for item set and pattern mining. Constraint programming has recently been proposed as a means of tackling item set mining tasks within a general modelling framework. We follow this approach to address the discovery of discriminative n-ary motifs in databases of labeled sequences. We define an n-ary motif as a mapping of n patterns to n class-wide embeddings, and we restrict the interpretation of constraints on a motif to the sequences that embed all of its patterns. We formulate core constraints that minimize redundancy between motifs and introduce a general constraint optimization framework to compute common and exclusive motifs. We cast the discovery of closed and replication-free motifs in this framework and propose a two-stage approach based on constraint programming to solve it. Experimental results on datasets of protein sequences demonstrate the efficiency of the approach.
Domains
Computer Science [cs]
Origin: Files produced by the author(s)