Hall, Johan
Publications (10 of 26)
Hall, J. & Nivre, J. (2008). A Dependency-Driven Parser for German Dependency and Constituency Representations. In: Proceedings of the ACL-08: HLT Workshop on Parsing German (PaGe-08) (pp. 47-54). Association for Computational Linguistics (ACL), Stroudsburg
2008 (English). In: Proceedings of the ACL-08: HLT Workshop on Parsing German (PaGe-08), Association for Computational Linguistics (ACL), Stroudsburg, 2008, p. 47-54. Conference paper, Published paper (Refereed)
Abstract [en]

We present a dependency-driven parser that parses both dependency structures and constituent structures. Constituency representations are automatically transformed into dependency representations with complex arc labels, which makes it possible to recover the constituent structure with both constituent labels and grammatical functions. We report a labeled attachment score close to 90% for dependency versions of the TIGER and TüBa-D/Z treebanks. Moreover, the parser is able to recover both constituent labels and grammatical functions with an F-Score over 75% for TüBa-D/Z and over 65% for TIGER.

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), Stroudsburg, 2008
Keywords
Data-Driven Parsing, Phrase Structure Parsing, Dependency Parsing
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-3503 (URN); 978-1-932432-15-2 (ISBN)
Available from: 2008-09-04. Created: 2008-09-04. Last updated: 2018-01-13. Bibliographically approved
Hall, J. & Nivre, J. (2008). Parsing Discontinuous Phrase Structure with Grammatical Functions. In: Advances in Natural Language Processing (pp. 169-180). Springer Berlin / Heidelberg
2008 (English). In: Advances in Natural Language Processing, Springer Berlin / Heidelberg, 2008, p. 169-180. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a novel technique for parsing discontinuous phrase structure representations, labeled with both phrase labels and grammatical functions. Phrase structure representations are transformed into dependency representations with complex edge labels, which makes it possible to induce a dependency parser model that recovers the phrase structure with both phrase labels and grammatical functions. We perform an evaluation on the German TIGER treebank and the Swedish Talbanken05 treebank and report competitive results for both data sets.

Place, publisher, year, edition, pages
Springer Berlin / Heidelberg, 2008
Keywords
Transition-Based Parsing, Phrase Structure
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-3505 (URN); doi:10.1007/978-3-540-85287-2 (DOI); 978-3-540-85286-5 (ISBN)
Available from: 2008-09-04. Created: 2008-09-04. Last updated: 2018-01-13. Bibliographically approved
Hall, J. (2008). Transition-Based Natural Language Parsing with Dependency and Constituency Representations. (Doctoral dissertation). Växjö: Växjö University Press
2008 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [sv] (translated to English)

This doctoral thesis investigates different aspects of automatic syntactic analysis of natural language text. A parser, or syntactic analyzer, as we define it in this thesis, has the task of creating a syntactic analysis for every sentence in a natural language text. Our method is data-driven, which means that it relies on machine learning from annotated natural language data sets, so-called corpora. Our method is also dependency-based, which means that parsing is a process that builds a dependency graph for each sentence, consisting of binary relations between words. In addition, the thesis introduces a new method for encoding phrase structures, another form of syntactic representation, as dependency graphs that can be decoded without losing information in the phrase structure. This method makes it possible to use a dependency-based parser to syntactically analyze phrase structures. The thesis is based on five papers, of which three explore different aspects of machine learning for data-driven dependency parsing and two investigate the method for dependency-based phrase structure parsing.

The first paper presents our first large-scale empirical study of parsing natural language (in this case Swedish) with dependency representations. A transition-based deterministic parsing algorithm creates a dependency graph for each sentence by deriving a sequence of transitions, and memory-based learning (MBL) is used to predict the transition sequence. The second paper further investigates how machine learning can be used to guide a transition-based dependency parser. The empirical study compares two machine learning methods with five feature models for three languages (Chinese, English and Swedish), and the study shows that support vector machines (SVM) with lexicalized feature models are better suited than MBL for guiding a transition-based dependency parser. The third paper summarizes our experience of optimizing MaltParser, our implementation of transition-based dependency parsing, for a large number of languages. MaltParser has been used to analyze over twenty different languages and was among the top systems in the CoNLL evaluations of 2006 and 2007.

The fourth paper is our first investigation of dependency-based phrase structure parsing, with competitive results for parsing German. The fifth and final paper introduces an improved algorithm for transforming phrase structures into dependency graphs and back, which makes it possible to parse continuous and discontinuous phrase structures extended with grammatical functions.

Abstract [en]

Hall, Johan, 2008. Transition-Based Natural Language Parsing with Dependency and Constituency Representations, Acta Wexionensia No 152/2008. ISSN: 1404-4307, ISBN: 978-91-7636-625-7. Written in English.

This thesis investigates different aspects of transition-based syntactic parsing of natural language text, where we view syntactic parsing as the process of mapping sentences in unrestricted text to their syntactic representations. Our parsing approach is data-driven, which means that it relies on machine learning from annotated linguistic corpora. Our parsing approach is also dependency-based, which means that the parsing process builds a dependency graph for each sentence consisting of lexical nodes linked by binary relations called dependencies. However, the output of the parsing process is not restricted to dependency-based representations, and the thesis presents a new method for encoding phrase structure representations as dependency representations that enable an inverse transformation without loss of information. The thesis is based on five papers, where three papers explore different ways of using machine learning to guide a transition-based dependency parser and two papers investigate the method for dependency-based phrase structure parsing.

The first paper presents our first large-scale empirical study of parsing a natural language (in this case Swedish) with labeled dependency representations using a transition-based deterministic parsing algorithm, where the dependency graph for each sentence is constructed by a sequence of transitions and memory-based learning (MBL) is used to predict the transition sequence. The second paper further investigates how machine learning can be used for guiding a transition-based dependency parser. The empirical study compares two machine learning methods with five feature models for three languages (Chinese, English and Swedish), and the study shows that support vector machines (SVM) with lexicalized feature models are better suited than MBL for guiding a transition-based dependency parser. The third paper summarizes our experience of optimizing and tuning MaltParser, our implementation of transition-based parsing, for a wide range of languages. MaltParser has been applied to over twenty languages and was one of the top-performing systems in the CoNLL shared tasks of 2006 and 2007.

The fourth paper is our first investigation of dependency-based phrase structure parsing with competitive results for parsing German. The fifth paper presents an improved encoding method for transforming phrase structure representations into dependency graphs and back. With this method it is possible to parse continuous and discontinuous phrase structure extended with grammatical functions.
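
As an illustration of the idea that a dependency graph is "constructed by a sequence of transitions", the following is a minimal sketch of an arc-standard-style transition system. It is not taken from the thesis and is not MaltParser code: the function name `parse`, the transition names, the labels and the example sentence are all assumptions made for illustration, and the transition sequence is hard-coded where the thesis uses a trained classifier (MBL or SVM) to predict each step.

```python
def parse(n_tokens, transitions):
    """Apply a transition sequence and return a set of (head, label, dependent) arcs.

    Tokens are numbered 1..n_tokens; 0 is an artificial root node.
    """
    stack, buffer, arcs = [0], list(range(1, n_tokens + 1)), set()
    for action, label in transitions:
        if action == "SHIFT":
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-topmost stack item becomes a dependent of the top item.
            dependent = stack.pop(-2)
            arcs.add((stack[-1], label, dependent))
        elif action == "RIGHT-ARC":
            # The top stack item becomes a dependent of the item below it.
            dependent = stack.pop()
            arcs.add((stack[-1], label, dependent))
        else:
            raise ValueError(f"unknown transition: {action}")
    return arcs


# Hypothetical sentence "She reads books" (tokens 1, 2, 3).
# In a data-driven parser each step would be predicted by a classifier.
transitions = [
    ("SHIFT", None),
    ("SHIFT", None),
    ("LEFT-ARC", "SUBJ"),   # reads -> She
    ("SHIFT", None),
    ("RIGHT-ARC", "OBJ"),   # reads -> books
    ("RIGHT-ARC", "ROOT"),  # root -> reads
]
arcs = parse(3, transitions)
```

The parser is deterministic in the sense that each configuration receives exactly one transition, so the number of steps grows linearly with sentence length.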

Place, publisher, year, edition, pages
Växjö: Växjö University Press, 2008. p. 77
Series
Acta Wexionensia, ISSN 1404-4307 ; 152
Keywords
Natural Language Parsing, Syntactic Parsing, Dependency Structure, Phrase Structure, Machine Learning
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-2367 (URN); 978-91-7636-625-7 (ISBN)
Public defence
2008-12-18, Myrdal, K, Växjö universitet, Växjö, 13:15 (English)
Available from: 2008-10-24. Created: 2008-10-24. Last updated: 2018-01-13. Bibliographically approved
Hall, J., Nivre, J. & Nilsson, J. (2007). A Hybrid Constituency-Dependency Parser for Swedish. In: Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA) (pp. 284–287).
2007 (English). In: Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA), 2007, p. 284–287. Conference paper, Published paper (Refereed)
Abstract [en]

We present a data-driven parser that derives both constituent structures and dependency structures, alone or in combination, in one and the same process. When trained and tested on data from the Swedish treebank Talbanken05, the parser achieves a labeled dependency accuracy of 82% and a labeled bracketing F-score of 75%.

Keywords
dependency parsing, parsing, dependency structures, constituent structures
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-4746 (URN)
Available from: 2007-10-24. Created: 2007-10-24. Last updated: 2018-01-13. Bibliographically approved
Nilsson, J., Nivre, J. & Hall, J. (2007). Generalizing Tree Transformations for Inductive Dependency Parsing. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (pp. 968–975). Association for Computational Linguistics
2007 (English). In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, 2007, p. 968–975. Conference paper, Published paper (Refereed)
Abstract [en]

Previous studies in data-driven dependency parsing have shown that tree transformations can improve parsing accuracy for specific parsers and data sets. We investigate to what extent this can be generalized across languages/treebanks and parsers, focusing on pseudo-projective parsing, as a way of capturing non-projective dependencies, and transformations used to facilitate parsing of coordinate structures and verb groups. The results indicate that the beneficial effect of pseudo-projective parsing is independent of parsing strategy but sensitive to language- or treebank-specific properties. By contrast, the construction-specific transformations appear to be more sensitive to parsing strategy but have a constant positive effect over several languages.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2007
Keywords
Tree Transformations, Inductive Dependency Parsing, data-driven, treebank, pseudo-projective parsing
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-4747 (URN)
Available from: 2007-10-24. Created: 2007-10-24. Last updated: 2018-01-13. Bibliographically approved
Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., . . . Marsi, E. (2007). MaltParser: A Language-Independent System for Data-Driven Dependency Parsing. Natural Language Engineering, 13(2), 95-135
2007 (English). In: Natural Language Engineering, Vol. 13, no 2, p. 95-135. Article in journal (Refereed), Published
Abstract [en]

Parsing unrestricted text is useful for many language technology applications but requires parsing methods that are both robust and efficient. MaltParser is a language-independent system for data-driven dependency parsing that can be used to induce a parser for a new language from a treebank sample in a simple yet flexible manner. Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.

Place, publisher, year, edition, pages
Cambridge University Press, 2007
Keywords
dependency parsing, treebank, machine learning, data-driven, parsing
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-4745 (URN); doi:10.1017/S1351324906004505 (DOI)
Available from: 2007-10-24. Created: 2007-10-24. Last updated: 2018-01-13. Bibliographically approved
Hall, J., Nilsson, J., Nivre, J., Eryigit, G., Megyesi, B., Nilsson, M. & Saers, M. (2007). Single Malt or Blended? A Study in Multilingual Parser Optimization. In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007 (pp. 933–939). Association for Computational Linguistics
2007 (English). In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, Association for Computational Linguistics, 2007, p. 933–939. Conference paper, Published paper (Refereed)
Abstract [en]

We describe a two-stage optimization of the MaltParser system for the ten languages in the multilingual track of the CoNLL 2007 shared task on dependency parsing. The first stage consists in tuning a single-parser system for each language by optimizing parameters of the parsing algorithm, the feature model, and the learning algorithm. The second stage consists in building an ensemble system that combines six different parsing strategies, extrapolating from the optimal parameter settings for each language. When evaluated on the official test sets, the ensemble system significantly outperforms the single-parser system and achieves the highest average labeled attachment score.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2007
Keywords
dependency parsing, data-driven, CoNLL
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-4748 (URN)
Available from: 2007-10-24. Created: 2007-10-24. Last updated: 2018-01-13. Bibliographically approved
Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S. & Yuret, D. (2007). The CoNLL 2007 Shared Task on Dependency Parsing. In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007 (pp. 915–932). Association for Computational Linguistics
2007 (English). In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, Association for Computational Linguistics, 2007, p. 915–932. Conference paper, Published paper (Other academic)
Abstract [en]

The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2007
Keywords
CoNLL, dependency parsing, data-driven, Natural Language Learning
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-4749 (URN)
Available from: 2007-10-24. Created: 2007-10-24. Last updated: 2018-01-13. Bibliographically approved
Hall, J. & Nilsson, J. (2006). CoNLL-X SharedTask: Multi-lingual Dependency Parsing. Växjö: Matematiska och systemtekniska institutionen
2006 (English). Report (Other academic)
Abstract [en]

The goal of this report is to summarize our experiments and present the final result of our participation in the CoNLL-X Shared Task 2006. The topic of this year's shared task was multi-lingual dependency parsing.

The organizers have prepared 13 existing dependency treebanks so that they all comply with the same markup format. The training and test data for the languages differ in size, granularity and quality, but the organizers have tried to even out differences in the markup format. No additional information may be used besides the provided training data, forcing the parser to be fully automatic and data-driven. Ideally, the same parser should be trainable for all languages, possibly by adjusting parameters.

The main goal is to assign labeled dependency structures for all languages on held-out test data, approximately 5,000 tokens for each language. The main metric for comparing the participants' parsers is therefore the labeled attachment score.
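
For readers unfamiliar with the metric, the labeled attachment score is the proportion of tokens that receive both the correct head and the correct dependency label. The sketch below is an illustration only, not the shared task's evaluation script: the function name, the data layout and the example values are assumptions, and it does not model scoring details such as the treatment of punctuation tokens.

```python
def labeled_attachment_score(gold, predicted):
    """Return the fraction of tokens whose predicted (head, label) pair matches gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical three-token sentence; each token is (head index, label), 0 = root.
gold = [(2, "SUBJ"), (0, "ROOT"), (2, "OBJ")]
predicted = [(2, "SUBJ"), (0, "ROOT"), (2, "ADV")]  # right head, wrong label on token 3
las = labeled_attachment_score(gold, predicted)
```

In this example two of the three tokens match on both head and label, so the score is two thirds; an unlabeled attachment score would count the third token as correct as well.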

Place, publisher, year, edition, pages
Växjö: Matematiska och systemtekniska institutionen, 2006. p. 22
Series
Reports from MSI, ISSN 1650-2647 ; 06060
Keywords
Dependency Parsing, Support Vector Machines, Machine Learning, Memory Based Learning
National Category
Natural Language Processing
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-697 (URN)
Available from: 2006-06-07. Created: 2006-06-07. Last updated: 2025-02-07. Bibliographically approved
Hall, J., Nivre, J. & Nilsson, J. (2006). Discriminative Classifiers for Deterministic Dependency Parsing. In: Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics and 21st International Conference on Computational Linguistics (COLING-ACL 2006), July 17-21, 2006, Sydney, Australia (pp. 316-323). Association for Computational Linguistics, Stroudsburg
2006 (English). In: Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics and 21st International Conference on Computational Linguistics (COLING-ACL 2006), July 17-21, 2006, Sydney, Australia, Association for Computational Linguistics, Stroudsburg, 2006, p. 316-323. Conference paper, Published paper (Refereed)
Abstract [en]

Deterministic parsing guided by treebank-induced classifiers has emerged as a simple and efficient alternative to more complex models for data-driven parsing. We present a systematic comparison of memory-based learning (MBL) and support vector machines (SVM) for inducing classifiers for deterministic dependency parsing, using data from Chinese, English and Swedish, together with a variety of different feature models. The comparison shows that SVM gives higher accuracy for richly articulated feature models across all languages, albeit with considerably longer training times. The results also confirm that classifier-based deterministic parsing can achieve parsing accuracy very close to the best results reported for more complex parsing models.

Place, publisher, year, edition, pages
Association for Computational Linguistics, Stroudsburg, 2006
Keywords
Dependency Parsing, Support Vector Machines, Data-Driven Parsing
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:vxu:diva-4667 (URN); 1-932432-65-5 (ISBN)
Available from: 2007-04-15. Created: 2007-04-15. Last updated: 2018-01-13. Bibliographically approved