Journal article

Using an Evolving Thematic Clustering in a Text Segmentation Process.

Abstract:

The thematic text segmentation task consists of identifying the most important thematic breaks in a document in order to cut it into homogeneous passages. In this paper, we propose an algorithm for linear text segmentation on general corpora. It relies on an initial clustering of the sentences of the text. This preliminary partitioning provides a global view of the relations between sentences in the text, considering similarities within a group rather than between individual sentences. The method, called ClassStruggle, is based on the distribution of the occurrences of the members of each class. During the process, the clusters evolve by taking into account notions of proximity and layout in the text, with the aim of creating groups that contain only sentences related to the same topic development. Finally, boundaries are created between sentences belonging to two different classes. First experimental results are promising: ClassStruggle appears to be very competitive with existing methods.
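
The final step described in the abstract, placing a boundary wherever two adjacent sentences belong to different classes, can be illustrated with a minimal Python sketch. It assumes each sentence already carries an integer class label from some prior clustering. The helper names `smooth_labels` and `boundaries`, the `window` parameter, and the majority-vote rule are hypothetical: the vote is only a simple stand-in for a proximity-based evolution of the clusters, not the ClassStruggle update rule, which the abstract does not specify.

```python
from collections import Counter
from typing import List, Sequence


def smooth_labels(labels: Sequence[int], window: int = 1) -> List[int]:
    """Reassign each sentence to the majority class among its neighbours.

    Hypothetical stand-in for the cluster evolution step: it only shows
    how proximity in the text can reshape classes. It is NOT the
    ClassStruggle update rule.
    """
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - window), min(len(labels), i + window + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        if counts[labels[i]] == best:
            # Keep the current label when it is already (tied for) the majority.
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def boundaries(labels: Sequence[int]) -> List[int]:
    """Return every index i where a break falls between sentence i-1 and sentence i."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

With the illustrative labels `[0, 0, 1, 0, 0, 2, 2, 2]`, `boundaries` applied directly yields breaks at positions 2, 3 and 5. After `smooth_labels` with the default window, the isolated sentence is absorbed into the surrounding class and only the break at position 5 remains.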


https://hal.univ-angers.fr/hal-03255366
Contributor: Okina Université d'Angers
Submitted on: Wednesday, June 9, 2021 - 14:58:12
Last modified on: Thursday, June 10, 2021 - 03:39:57

Identifiers

  • HAL Id: hal-03255366, version 1
  • OKINA: ua4283

Citation

Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frédéric Saubion. Using an Evolving Thematic Clustering in a Text Segmentation Process. Journal of Universal Computer Science, 2008, 14 (2), pp. 178-192. ⟨hal-03255366⟩
