Using dated training sets for classifying recent news articles with Naive Bayes and Support Vector Machines: An experiment comparing the accuracy of classifications using test sets from 2005 and 2017
Linnaeus University, Faculty of Technology, Department of Computer Science.
2017 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Text categorisation is an important means of organising text data and making information easier to find on the World Wide Web. Text data can be categorised using machine learning classifiers, which must be trained on data before they can predict a result for future input. The authors investigated how accurately two classifiers, trained on older news articles, classify recent news articles. To this end, they conducted an experiment with the Naive Bayes and Support Vector Machine classifiers. Models of both classifiers were trained on news articles from 2005 and then tested on news articles from 2005 and from 2017 so that the results could be compared. Both classifiers performed considerably worse on the 2017 articles than on articles from the same year as the training data.
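
The evaluation protocol described in the abstract (train once on older articles, then measure accuracy separately on a same-period test set and a recent test set) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: it uses a small hand-written multinomial Naive Bayes with Laplace smoothing, omits the Support Vector Machine entirely, and runs on invented toy sentences rather than the 2005 and 2017 news articles. All function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer; a real pipeline would do more.
    return text.lower().split()


def train_naive_bayes(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted so ties resolve deterministically
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    correct = sum(predict(model, text) == label for text, label in docs)
    return correct / len(docs)


# Invented toy data (assumption): stands in for the older training articles
# and the two test sets. It does not come from the thesis.
train_old = [
    ("team wins football match", "sport"),
    ("striker scores goal in cup final", "sport"),
    ("new mobile phone released", "tech"),
    ("software update for desktop computer", "tech"),
]
test_same_period = [
    ("football team scores goal", "sport"),
    ("new software for mobile phone", "tech"),
]
test_recent = [
    ("esports team streams final online", "sport"),
    ("smartphone app uses cloud", "tech"),
]

model = train_naive_bayes(train_old)
acc_same = accuracy(model, test_same_period)
acc_recent = accuracy(model, test_recent)
print("same-period accuracy:", acc_same)
print("recent accuracy:", acc_recent)
```

The toy example only shows the shape of the comparison: one model, two test sets, two accuracy figures. Words that never occurred in the training data contribute nothing to the score, which is one plausible way vocabulary drift between periods could hurt a model trained on older text. The numbers it prints say nothing about the thesis's actual results.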

Place, publisher, year, edition, pages
2017. p. 29
Keywords [en]
News Articles, Machine Learning, Naive Bayes, Support vector machine, SVM, Text categorisation
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-64769; OAI: oai:DiVA.org:lnu-64769; DiVA, id: diva2:1105495
Subject / course
Computer Science
Educational program
Computer Science, Bachelor's Programme (Datavetenskap, kandidatprogram), 60 hp; Digital Service Development Programme, 180 hp
Supervisors
Examiners
Available from: 2017-06-05; Created: 2017-06-04; Last updated: 2018-01-13. Bibliographically approved.

Open Access in DiVA

fulltext (680 kB), 218 downloads
File information
File name: FULLTEXT01.pdf; File size: 680 kB; Checksum (SHA-512):
f0f24533cd5c1e2380b6750b12cddba78127eb90ab6456f4be550b043b5a5730f63039fa4d4546a4bdb9eb3c7f25af9d13eb510a6c2098ba00fb658d8c3ed989
Type: fulltext; Mimetype: application/pdf

Authors
Rydberg, Filip; Tornfors, Jonas
Total: 218 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 308 hits