Conference paper

Using Text Segmentation to Enhance the Cluster Hypothesis

Abstract:

Passage Retrieval is an alternative approach to Information Retrieval that considers text fragments independently instead of assessing the relevance of whole documents. In this setting, a document is not penalized when its relevant information is surrounded by text that strays from the topic of interest. In this paper, we study how taking such text fragments into account affects document clustering. The use of clustering in Information Retrieval rests mainly on the cluster hypothesis, which states that relevant documents tend to be more similar to one another than to non-relevant documents, so a clustering process is likely to group them together. Previous experiments have shown that clustering the top-ranked documents retrieved for a user’s query improves the effectiveness of Information Retrieval systems. The clustering processes used in these studies treated each document as a whole. However, a document may address more than one topic or concept, and this may also affect the clustering process. Considering the passages of the retrieved documents separately may produce clusters that better represent the topics addressed. We assess several approaches, and the results show that using text fragments in the clustering process can indeed be worthwhile.
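The contrast between document-level and passage-level clustering can be illustrated with a toy sketch. This is not the authors' method. It assumes a naive one-sentence-per-passage splitter in place of a real text segmentation algorithm, raw term-frequency cosine similarity, and a simple single-pass clustering with an arbitrary threshold of 0.4. The three documents and all function names are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def split_passages(text):
    """Hypothetical segmenter: one passage per sentence.

    A real system would use a topic-based text segmentation algorithm.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(texts, threshold):
    """Put each text in the first cluster whose seed (first member) is
    similar enough, else open a new cluster. Returns lists of indices."""
    vectors = [Counter(tokenize(t)) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy "retrieved documents": document 1 covers two topics.
docs = [
    "Jaguars hunt deer in the rainforest.",
    "The rainforest jaguars hunt at night. Stock markets fell sharply today.",
    "Stock markets fell as investors sold shares.",
]
THRESHOLD = 0.4  # arbitrary value chosen for this toy data

# Document-level clustering: each document is one unit.
doc_clusters = single_pass_cluster(docs, THRESHOLD)

# Passage-level clustering: remember which document each passage came from.
passages = [(doc_id, p) for doc_id, d in enumerate(docs) for p in split_passages(d)]
passage_clusters = single_pass_cluster([p for _, p in passages], THRESHOLD)
passage_doc_clusters = [[passages[i][0] for i in c] for c in passage_clusters]

print(doc_clusters)          # [[0, 1], [2]]
print(passage_doc_clusters)  # [[0, 1], [1, 2]]
```

On this toy data, whole-document clustering places the two-topic document in a single cluster, whereas passage-level clustering lets it contribute to both topic clusters.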

https://hal.univ-angers.fr/hal-03350624
Contributor: Okina Univ Angers
Submitted on: Tuesday, 21 September 2021 - 14:34:57
Last modified on: Wednesday, 20 October 2021 - 03:19:09

Citation

Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frédéric Saubion. Using Text Segmentation to Enhance the Cluster Hypothesis. 13th International Conference, AIMSA 2008, 2008, Varna, Bulgaria. pp.69 - 82, ⟨10.1007/978-3-540-85776-1_7⟩. ⟨hal-03350624⟩
