Thematic Segment Retrieval Revisited - Université d'Angers
Conference paper Year: 2008

Thematic Segment Retrieval Revisited

Abstract

Documents, especially long ones, may contain very diverse passages related to different topics. Passage retrieval approaches have shown that, in most cases, there is a great potential benefit in considering these passages independently when computing the similarity of a document to a user’s query. Experiments have been conducted to identify the kinds of passages best suited to such a process. Contrary to what might have been expected, working with thematic segments, each of which is likely to cover a single topic, has led to much lower effectiveness than using arbitrary sequences of words. In this paper, we show that this paradoxical observation is mainly due to biases induced by the great diversity in length of the thematic passages. We therefore propose to cope with these biases by using a more powerful text length normalization technique. Experiments show that, once length biases are set aside, thematic passages are better suited than arbitrary sequences of words for retrieving relevant information in response to a user’s query.

Dates and versions

hal-03350606, version 1 (21-09-2021)

Identifiers

Cite

Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frédéric Saubion. Thematic Segment Retrieval Revisited. 13th International Conference, AIMSA 2008, 2008, Varna, Bulgaria. pp. 157-166, ⟨10.1007/978-3-540-85776-1_14⟩. ⟨hal-03350606⟩

Collections

UNIV-ANGERS LERIA