Salmon Run: An UIMA Noun Phrase POS Annotator using OpenNLP
I stumbled on these two posts in Davelog: "Getting started with OpenNLP 1.5.0 - Sentence Detection and Tokenizing" and "Part of Speech (POS) Tagging with OpenNLP 1.5.0".
So I decided to replace my SentenceAnnotator (which annotated the text with sentence annotation markers) with a NounPhraseAnnotator. Like its predecessor, it first splits the input text into sentences using the SentenceDetector. It then tokenizes each sentence into words using the Tokenizer and finds a POS tag for each token using the POSTagger. Using the tokens and their associated tags, it uses the Chunker to break the sentence up into phrase chunks. It checks the type of each chunk and annotates only noun phrases (NP). The SentenceDetector, Tokenizer, POSTagger and Chunker are all OpenNLP components, each backed by its own maximum-entropy model. Pre-built versions of these models are available for download.
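The last step, keeping only the noun-phrase chunks, can be sketched in plain Java. This is a minimal illustration and not the annotator's actual code: it assumes the Chunker has already produced one BIO-style tag per token (such as B-NP to begin a noun phrase and I-NP to continue it), and the class and method names (NounPhraseSketch, Phrase, nounPhrases) are hypothetical. It leaves out the OpenNLP models and the UIMA annotation calls entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP or UIMA.
class NounPhraseSketch {

    /** A noun phrase as a half-open token range [start, end) plus its text. */
    static final class Phrase {
        final int start;
        final int end;
        final String text;

        Phrase(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /**
     * Groups tokens into noun phrases, assuming chunkTags holds one
     * BIO-style tag per token ("B-NP", "I-NP", "B-VP", "O", ...).
     * An "I-NP" that does not follow an open noun phrase is ignored.
     */
    static List<Phrase> nounPhrases(String[] tokens, String[] chunkTags) {
        if (tokens.length != chunkTags.length) {
            throw new IllegalArgumentException("need exactly one chunk tag per token");
        }
        List<Phrase> phrases = new ArrayList<>();
        int start = -1; // index of the first token of the open NP, or -1 if none
        // Run one step past the end so a sentence-final NP is also closed.
        for (int i = 0; i <= tokens.length; i++) {
            String tag = i < tokens.length ? chunkTags[i] : "O";
            boolean continuesNp = start >= 0 && tag.equals("I-NP");
            if (start >= 0 && !continuesNp) {
                String text = String.join(" ", Arrays.copyOfRange(tokens, start, i));
                phrases.add(new Phrase(start, i, text));
                start = -1;
            }
            if (tag.equals("B-NP")) {
                start = i;
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "lazy", "dog", "."};
        String[] tags = {"B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "I-NP", "O"};
        for (Phrase p : nounPhrases(tokens, tags)) {
            System.out.println("[" + p.start + ", " + p.end + ") " + p.text);
        }
    }
}
```

In a real annotator the token ranges would be mapped back to character offsets in the document text before creating the annotations, since the sketch tracks token indexes only.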