Conference paper

Thematic Segment Retrieval Revisited

Abstract:

Documents, especially long ones, may contain very diverse passages related to different topics. Passage retrieval approaches have shown that, in most cases, there is considerable potential benefit in considering these passages independently when computing the similarity of a document to a user’s query. Experiments have been conducted to identify the kinds of passages best suited to such a process. Contrary to what might have been expected, working with thematic segments, which are likely to represent only one topic each, has led to much lower effectiveness than the use of arbitrary sequences of words. In this paper, we show that this paradoxical observation is mainly due to biases induced by the wide variation in the length of thematic passages. We therefore propose to cope with these biases by using a more powerful text length normalization technique. Experiments show that, once length biases are set aside, thematic passages are better suited than arbitrary sequences of words for retrieving relevant information in response to a user’s query.

Contributor: Okina Univ Angers
Submitted on: Tuesday, September 21, 2021 - 14:34:31
Last modified on: Wednesday, October 20, 2021 - 03:19:09





Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frédéric Saubion. Thematic Segment Retrieval Revisited. 13th International Conference, AIMSA 2008, 2008, Varna, Bulgaria. pp. 157-166, ⟨10.1007/978-3-540-85776-1_14⟩. ⟨hal-03350606⟩


