Visualization of Text Duplicates in Documents
Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
2009 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 15 credits / 22.5 HE credits. Student thesis
Abstract [en]

In this thesis, a tool to visualize duplicate parts in a series of given documents is developed.

Text duplicates are now common in all fields. Copying severely harms the rights of the original authors, even though it eases the work of those who copy from them. Effective legal measures have been taken on copyright issues, and an increasingly large number of people pay serious attention to what they write when they refer to other people's works. Although many who admire and respect others' achievements cite them properly, plagiarism still takes place all the time. Therefore, an intuitive way of visualizing duplicate parts is needed so that people can easily grasp the purpose of those duplicates and judge their legality. In computer science, software cloning is a typical phenomenon among different development groups or even within one group. Since a piece of software usually has a hierarchy, that hierarchy is also of interest to group members when they perform clone detection on their own or other software. For example, if a good overview of the hierarchies is provided in a tree representation, one can easily locate the clones of a particular node in other trees. Further interaction techniques can give access to the concrete code through double-clicking on a highlighted node.

To visualize duplicate parts in an intuitive way, a visualization tool is developed for this thesis project. The finished tool should provide the following features. First, the tool can visualize similar or identical parts in a given data set. Second, the hierarchies of the files can be shown with a proper layout. Third, the user can manipulate the data items on the screen to gain better insight into the data set and to support analysis tasks. Fourth, different levels of abstraction are provided so that the user can either get an overview of all the files or check the duplicate parts in specific documents of interest.

Place, publisher, year, edition, pages
2009, p. 77
Series
Reports from MSI, ISSN 1650-2647
Keywords [en]
Duplicates, PREFUSE, Visualization, Treemap, Similarity, Interaction
Identifiers
URN: urn:nbn:se:vxu:diva-5408
ISRN: VXU/MSI/DA/E/--09029/--SE
OAI: oai:DiVA.org:vxu-5408
DiVA, id: diva2:224192
Presentation
D1169, School of Mathematics and Systems Engineering, D building, Växjö University (English)
Projects
Visualization of Text Duplicates in Documents
Available from: 2009-06-17 Created: 2009-06-17 Last updated: 2010-03-10 Bibliographically approved

Open Access in DiVA

fulltext (3277 kB), 351 downloads
File information
File name: FULLTEXT01.pdf
File size: 3277 kB
Checksum (SHA-512): 0346157fbf818e3e9e6e5932d36fb0b8b7b0d778dd2feec0e4d53ac6c9f6b861d541155884b5da91908529d6147a669b2edd358cbaa555697adac8a8a6f31575
Type: fulltext
Mimetype: application/pdf

Authors
Wang, Chao; Pan, Han
