IceNLP



IceNLP is open-source software for analyzing and processing Icelandic text. It is implemented in Java and consists of the following components: a tokeniser, an unknown word guesser, a part-of-speech tagger, a lemmatiser, a parser and a named-entity recogniser.

The software was originally developed as part of Hrafn Loftsson's Ph.D. studies during 2004-2007. Since then, students at Reykjavík University and the University of Iceland have helped develop individual components.



Using IceNLP

• Part of IceNLP's functionality can be tried here.
• IceNLP is open-source software, distributed under the LGPL license, and can be downloaded here.

About IceNLP
IceNLP can be used for various tasks, such as splitting text into individual tokens, tagging each token with its morphosyntactic tag, finding the lemma of a given word, and producing a shallow phrase structure with labels that indicate syntactic functions.

Individual components of IceNLP can be run independently, or the relevant Java classes can be integrated directly into software that is under development.
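To illustrate the idea of components that work both on their own and chained together, here is a minimal Java sketch. It does not use IceNLP's actual classes or API: the class name PipelineSketch, the methods tokenize and tag, the toy lexicon and the tag labels are all hypothetical, invented only for this example. A real tagger would use a full morphosyntactic tagset and disambiguation rules rather than a lookup table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT the IceNLP API.
public class PipelineSketch {
    // A token is a run of letters, a run of digits, or one other non-space character.
    private static final Pattern TOKEN =
        Pattern.compile("\\p{L}+|\\p{N}+|[^\\s\\p{L}\\p{N}]");

    // Toy lexicon with made-up tag labels (assumption for the example).
    private static final Map<String, String> TOY_LEXICON = Map.of(
        "hann", "PRONOUN",
        "las", "VERB",
        ".", "PUNCT");

    /** Stage 1: split raw text into tokens. Can be called on its own. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Stage 2: look up a tag per token; words not in the lexicon get "UNKNOWN". */
    public static List<String> tag(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        for (String token : tokens) {
            String key = token.toLowerCase(Locale.ROOT);
            tags.add(TOY_LEXICON.getOrDefault(key, "UNKNOWN"));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Chain the two stages: the output of the tokeniser feeds the tagger.
        List<String> tokens = tokenize("Hann las.");
        List<String> tags = tag(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i) + "\t" + tags.get(i));
        }
    }
}
```

Because each stage is an ordinary method with plain inputs and outputs, a caller can use the tokeniser alone or pass its result on to the tagger, which is the kind of integration the paragraph above describes.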



Contact
Hrafn Loftsson
Associate Professor
Reykjavík University, School of Computer Science
Menntavegi 1, 105 Reykjavík
Work phone: +354-5996227
E-mail: hrafn@ru.is
Web Page: http://www.ru.is/~hrafn


References
Hrafn Loftsson. 2008. Tagging Icelandic text: A linguistic rule-based approach. Appeared in a revised form, subsequent to editorial input by Cambridge University Press, in Nordic Journal of Linguistics, 31(1), 47-72. © 2008 Cambridge University Press.

Hrafn Loftsson, Sigrún Helgadóttir and Eiríkur Rögnvaldsson. 2011. Using a morphological database to increase the accuracy in PoS tagging. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2011). Hissar, Bulgaria.

Hrafn Loftsson, Ida Kramarczyk, Sigrún Helgadóttir and Eiríkur Rögnvaldsson. 2009. Improving the PoS tagging accuracy of Icelandic text. In Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA-2009). Odense, Denmark.

Anton K. Ingason, Sigrún Helgadóttir, Hrafn Loftsson and Eiríkur Rögnvaldsson. 2008. A Mixed Method Lemmatization Algorithm Using Hierarchy of Linguistic Identities (HOLI). In B. Nordström and A. Ranta (eds.), Advances in Natural Language Processing, 6th International Conference on NLP, GoTAL 2008, Proceedings. Gothenburg, Sweden.

Hrafn Loftsson and Eiríkur Rögnvaldsson. 2008. Linguistic richness and technical aspects of an incremental finite-state parser. In Proceedings of "Partial Parsing 2008", workshop at the 6th International Conference on Language Resources and Evaluation, LREC 2008. Marrakech, Morocco.

Hrafn Loftsson and Eiríkur Rögnvaldsson. 2007. IceNLP: A Natural Language Processing Toolkit for Icelandic. In Proceedings of InterSpeech 2007, Special session: "Speech and language technology for less-resourced languages". Antwerp, Belgium.

Hrafn Loftsson and Eiríkur Rögnvaldsson. 2007. IceParser: An Incremental Finite-State Parser for Icelandic. In J. Nivre, H-J. Kaalep, K. Muischnek and M. Koit (eds.), Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA-2007). Tartu, Estonia.

Hrafn Loftsson. 2007. Tagging Icelandic Text using a Linguistic and a Statistical Tagger. In Proceedings of Human Language Technologies 2007: The Conference of the North American Chapter of the ACL. Rochester, NY, USA.

Hrafn Loftsson. 2006. Tagging a morphologically complex language using heuristics. In T. Salakoski, F. Ginter, S. Pyysalo and T. Pahikkala (eds.), Advances in Natural Language Processing, 5th International Conference on NLP, FinTAL 2006, Proceedings. Turku, Finland.