A Decomposition Approach for Discovering Discriminative Motifs in a Sequence Database
Abstract
This paper addresses the discovery of discriminative n-ary motifs in databases of labeled sequences. We consider databases made up of positive and negative sequences and define a motif as a set of patterns embedded in all positive sequences and subject to alignment constraints. We formulate constraints to eliminate redundant motifs and present a general constraint optimization framework to compute motifs that are exclusive to the positive sequences. We cast the discovery of closed and replication-free motifs in this framework and propose a two-stage approach whose last stage reduces to a minimum set covering problem. Experiments on protein sequence datasets demonstrate the efficiency of the approach.
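As a rough illustration of the minimum set covering problem mentioned above, the following sketch applies the classic greedy heuristic to toy data. It is not the paper's method: the abstract does not say how the second stage is solved or how motifs map onto a covering instance. The function `greedy_set_cover`, the names `negatives` and `excluded_by`, and the reading of "each candidate pattern rules out some negative sequences" are all assumptions made for this example, and the greedy heuristic returns a small cover, not necessarily a minimum one.

```python
def greedy_set_cover(universe, subsets):
    """Return names of subsets that together cover `universe`.

    Greedy heuristic: the result is a valid cover but is not
    guaranteed to be of minimum size.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset covering the most still-uncovered elements.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy instance (assumption, not data from the paper):
# each candidate pattern is mapped to the negative sequences it excludes.
negatives = {"n1", "n2", "n3", "n4"}
excluded_by = {
    "p1": {"n1", "n2"},
    "p2": {"n2", "n3", "n4"},
    "p3": {"n4"},
}

cover = greedy_set_cover(negatives, excluded_by)
print(cover)
```

Under this toy reading, the selected patterns jointly exclude every negative sequence. An exact solver (for instance an integer programming or constraint solver) would be needed to guarantee a minimum cover.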
Domains
Computer Science [cs]
Origin: Files produced by the author(s)