Salmon Run: An UIMA Noun Phrase POS Annotator using OpenNLP



Salmon Run: An UIMA Noun Phrase POS Annotator using OpenNLP
I stumbled on these two posts in Davelog: Getting starting with OpenNLP 1.5.0 - Sentence Detection and Tokenizing and Part of Speech (POS) Tagging with OpenNLP 1.5.0.

 So I decided to replace my SentenceAnnotator (which annotated the text with sentence annotation markers) with a NounPhraseAnnotator. This one also first splits the input text into sentences using the SentenceDetector, then for each sentence it tokenizes it into words using the Tokenizer, then find POS tags for each token using the POSTagger. Now using the tokens and the associated tags, it uses the Chunker to break up the sentence into phrase chunks. For each chunk, it checks its type and only noun-phrases (NP) are annotated. The SentenceDetector, Tokenizer, POSTagger and Chunker are all OpenNLP components, each backed by their own maximum entropy based models. Pre-built versions of these models are available for download from here.


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts