Using Wiktionary to build an Italian part-of-speech tagger | CLiPS
Tom De Smedt (Computational Linguistics Research Group, University of Antwerp)
Fabio Marfia (Dipartimento di Elettronica, Politecnico di Milano)

Pattern contains part-of-speech taggers for a number of languages, including English, Spanish, German, French and Dutch. Part-of-speech tagging is useful in many data mining tasks. A part-of-speech tagger takes a string of text and identifies the sentences and words in it, along with each word's type. The word type, or part of speech, can vary according to the word's role in the sentence. For example, in English, "can" can be a verb (the first "Can" in "Can I have a can of soda?") or a noun (the second "can" in the same sentence). The output takes the following form:

Can/MD I/PRP have/VB a/DT can/NN of/IN soda/NN ?/.

The tag MD indicates a modal verb, PRP a personal pronoun, VB a verb, DT a determiner, NN a noun and IN a preposition. These tags are part of the Penn Treebank II tagset. Pattern uses Brill's algorithm to construct its part-of-speech taggers.

Read full article from Using Wiktionary to build an Italian part-of-speech tagger | CLiPS
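To illustrate how a Brill-style (transformation-based) tagger resolves the "can" ambiguity, here is a minimal sketch in plain Python. It is not Pattern's code or API: the names `LEXICON`, `RULES` and `tag()` are hypothetical, and the tiny lexicon and single rule are hand-written for this one sentence, whereas a real Brill tagger learns them from a tagged corpus. The idea is the same, though: first give every word its most frequent tag (unknown words default to NN), then apply contextual rules that change a tag when its neighbours match a pattern.

```python
import re

# Hypothetical toy lexicon: each known word maps to its most frequent tag.
# A real Brill tagger derives this from a tagged training corpus.
LEXICON = {
    "can": "MD", "i": "PRP", "have": "VB", "a": "DT",
    "of": "IN", "soda": "NN", "?": ".", ".": ".",
}

# Hypothetical contextual rules: (from_tag, to_tag, template, context_tag).
# PREVTAG: change from_tag to to_tag if the previous token has context_tag.
# NEXTTAG: the same, but checks the next token instead.
RULES = [
    ("MD", "NN", "PREVTAG", "DT"),  # "a can" -> "can" is a noun here
]


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def initial_tags(tokens):
    """Step 1: assign each token its lexicon tag; unknown words become NN."""
    return [LEXICON.get(t.lower(), "NN") for t in tokens]


def apply_rules(tags, rules):
    """Step 2: apply each contextual rule, in order, across the sentence."""
    tags = list(tags)
    for from_tag, to_tag, template, context in rules:
        before = list(tags)  # match conditions against tags prior to this rule
        for i, t in enumerate(before):
            if t != from_tag:
                continue
            if template == "PREVTAG" and i > 0 and before[i - 1] == context:
                tags[i] = to_tag
            elif template == "NEXTTAG" and i + 1 < len(before) and before[i + 1] == context:
                tags[i] = to_tag
    return tags


def tag(text):
    """Return a list of (word, tag) tuples for the given text."""
    tokens = tokenize(text)
    return list(zip(tokens, apply_rules(initial_tags(tokens), RULES)))


if __name__ == "__main__":
    tagged = tag("Can I have a can of soda?")
    print(" ".join(f"{w}/{t}" for w, t in tagged))
    # → Can/MD I/PRP have/VB a/DT can/NN of/IN soda/NN ?/.
```

Both occurrences of "can" start out as MD from the lexicon; the PREVTAG rule then retags only the second one as NN because it follows a determiner. Matching each rule against a snapshot of the tags is one simple design choice; Brill's original work also discusses letting changes made earlier in the same pass feed into later matches.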