Sequential Exception Technique for Text Anomalies
Universiti Utara Malaysia, Malaysia.
Universiti Utara Malaysia, Malaysia.
Linnaeus University, Faculty of Arts and Humanities, Department of Cultural Sciences. ORCID iD: 0000-0002-0025-118X
2024 (English). In: Intelligent Systems of Computing and Informatics / [ed] Samsul Ariffin Abdul Karim; Anand J. Kulkarni; Chin Kim On; Mohd Hanafi Ahmad Hijazi. Informa UK Limited, 2024, p. 65-79. Chapter in book (Refereed)
Abstract [en]

The repository of world knowledge is receiving a substantial influx of natural-language text, exceeding the contribution of structured databases. Because unstructured text is sparse and high-dimensional, detecting anomalies in it is a non-trivial challenge. Text anomalies are rare or unusual patterns hidden in a text dataset, which makes them difficult to identify. Various machine learning methods based on clustering and classification have been proposed in the literature to tackle this challenge, each with its own advantages and limitations. Deviation-based methods, in particular the sequential exception technique, have performed strongly in identifying anomalies in categorical datasets; however, this technique has not been tested on text data. In this study, we adapted the sequential exception technique to detect text anomalies by modifying its dissimilarity function. We evaluated the adapted technique on two text datasets: the ENRON email messages and the 20 Newsgroups dataset. The experimental results show that the proposed method successfully identifies text anomalies, achieving an F-score of 78.1% on the ENRON dataset and 95% on the 20 Newsgroups dataset.
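The abstract does not specify the modified dissimilarity function, so the following is only a minimal sketch of the general sequential exception scheme applied to text, not the chapter's method. Every helper is hypothetical: `tf_vector` (lowercase whitespace tokens, L2-normalised term frequencies), `dissimilarity` (a variance-style D: mean squared distance of document vectors to their centroid), a cardinality C that simply counts items, and the smoothing factor SF(I_j) = C(I - I_j) * (D(I) - D(I - I_j)) used to score candidate exception sets. Because the scan is order-sensitive, the sketch tries the input order first and then a few random orders.

```python
import math
import random
from collections import Counter

# Sketch only: the featuriser, D and C below are assumptions,
# not the modified dissimilarity function used in the chapter.


def tf_vector(text):
    """Hypothetical featuriser: lowercase whitespace tokens, L2-normalised term frequencies."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def dissimilarity(vectors):
    """Hypothetical D: mean squared Euclidean distance of the vectors to their centroid."""
    n = len(vectors)
    if n < 2:
        return 0.0
    centroid = {}
    for v in vectors:
        for term, w in v.items():
            centroid[term] = centroid.get(term, 0.0) + w / n
    total = 0.0
    for v in vectors:
        for term in set(v) | set(centroid):
            total += (v.get(term, 0.0) - centroid.get(term, 0.0)) ** 2
    return total / n


def smoothing_factor(vectors, candidate):
    """SF(I_j) = C(I - I_j) * (D(I) - D(I - I_j)), with C = number of items."""
    rest = [v for i, v in enumerate(vectors) if i not in candidate]
    return len(rest) * (dissimilarity(vectors) - dissimilarity(rest))


def sequential_exceptions(texts, n_orders=5, seed=0):
    """Return (indices of the best exception set found, its smoothing factor)."""
    vectors = [tf_vector(t) for t in texts]
    if len(vectors) < 2:
        return set(), 0.0
    rng = random.Random(seed)
    best_set, best_sf = set(), 0.0
    for r in range(n_orders):
        order = list(range(len(vectors)))
        if r > 0:
            rng.shuffle(order)  # the first pass keeps the input order
        # D of the growing prefixes I_1, I_2, ... and the increase each new item causes
        jumps, prev = [], 0.0
        for j in range(len(order)):
            d = dissimilarity([vectors[i] for i in order[: j + 1]])
            jumps.append(d - prev)
            prev = d
        j = max(range(len(jumps)), key=jumps.__getitem__)
        base = [vectors[i] for i in order[:j]]
        d_base = dissimilarity(base)
        candidate = {order[j]}
        # Later items that raise D of the base at least as much join the candidate set
        for k in order[j + 1:]:
            if dissimilarity(base + [vectors[k]]) - d_base >= jumps[j]:
                candidate.add(k)
        sf = smoothing_factor(vectors, candidate)
        if sf > best_sf:  # keep the candidate with the largest smoothing factor
            best_set, best_sf = candidate, sf
    return best_set, best_sf


texts = ["the cat sat on the mat"] * 4 + ["quarterly revenue forecast spreadsheet attached"]
exceptions, sf = sequential_exceptions(texts)
print(exceptions)  # {4}
```

With four copies of one sentence and one unrelated sentence, the sketch flags index 4. It recomputes D from scratch for every prefix, which is quadratic; an incremental update of D would be needed to keep each pass linear on a real corpus such as ENRON.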

Place, publisher, year, edition, pages
Informa UK Limited, 2024. p. 65-79
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-138388
DOI: 10.1201/9781003400387-5
Scopus ID: 2-s2.0-85196458366
OAI: oai:DiVA.org:lnu-138388
DiVA, id: diva2:1956710
Available from: 2025-05-07 Created: 2025-05-07 Last updated: 2025-05-07

Open Access in DiVA

No full text in DiVA


Authority records

Mohammed, Ahmed Taiye
