The EnTag project explored the combination and comparison of controlled and folksonomy approaches to semantic interoperability in the context of repositories and digital collections. The aim was to investigate the effect on indexing and retrieval when using only social tagging versus when using social tagging in combination with a knowledge organization system. Two different contexts were explored: tagging by readers (Intute) and tagging by authors (Science and Technology Facilities Council (STFC)). The main development effort went into the Intute context.
For each of these contexts a separate demonstrator was developed: one operating on data extracted from Intute (Intute 2008), and the other operating over STFC’s repository (STFC ePublication Archive 2008), in which tagging was conducted by authors submitting papers to the repository. A user study was conducted for each demonstrator, which allowed a general comparison between a repository and a digital collection context, each with a different knowledge organization system, interface, and user community.
The three main methods used to collect user data were log analysis, questionnaires, and interviews. The evaluation of the Intute demonstrator involved comparing a basic and an advanced system with respect to their implications for indexing and retrieval. The test setting comprised 28 students in political science and 60 documents covering 4 topics of relevance to the students. The Dewey Decimal Classification (DDC) was used. The STFC study involved 10 author-depositors. The ACM Computing Classification Scheme was used.
The results of the Intute study showed the importance of controlled vocabulary suggestions (to produce ideas for tags to use, to ensure consistency and retrieval, to make it easier to find a focus for the tagging, etc.). Furthermore, the value and usefulness of the suggestions proved to be very dependent on their quality. The suggestions must be user-oriented with regard to level of specificity, perspective, and currency. Most tags were added by typing them in directly, as is common in social tagging applications; of the other features, the most frequently used were the DDC suggestions and another tagger’s cloud. That the participants appreciated the suggestions was also evident from their comments. Both simple tagging and enhanced tagging provided additional entry points for retrieval beyond the original indexing. There was some evidence that vocabulary-based suggestions, in particular, provided additional access points beyond the literal text. Most participants claimed that they would be willing to use similar tools in real life.
The results of the STFC study showed a general sentiment among the depositors that choosing terms from a controlled vocabulary was a “good thing” and in fact better than using their own terms. The participants could, overall, see the point of adding terms for information retrieval purposes, and could see the advantages of consistency of retrieval if the terms used were from an authoritative source. Most claimed that they would be willing to use a tool similar to the one provided, albeit with some reservations and suggestions about the interface. The ACM classification was, however, not seen as good enough for the purposes of this group.
In conclusion, we recommend that social tagging be allowed in the JISC context (e.g., repositories), enhanced with suggestions from a controlled vocabulary. More findings are needed, so it is important to further analyze, experiment with, and pilot-test tools derived from both the Intute and STFC demonstrators. The studies showed that further development and improvement are needed in the following major aspects: automated suggestions, controlled suggestions, tag input features such as auto-complete and spell checking, controlled vocabulary presentation, other controlled vocabularies, and the user interface. Detailed recommendations are discussed in Deliverable 5.1: Recommendations briefing paper.
JISC, 2009, p. 13