The Stanford NLP (Natural Language Processing) Group



There are several ways to call the tokenizer. The simple example below shows two of them: calling DocumentPreprocessor to read a file sentence by sentence, and using PTBTokenizer directly to read it token by token.

import java.io.FileReader;
import java.io.IOException;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;

public class TokenizerDemo {

  public static void main(String[] args) throws IOException {
    // Each command-line argument is treated as the path of a file to tokenize.
    for (String arg : args) {
      // option #1: By sentence.
      DocumentPreprocessor dp = new DocumentPreprocessor(arg);
      for (List<HasWord> sentence : dp) {
        System.out.println(sentence);
      }
      // option #2: By token. The empty string means no tokenizer options are set.
      PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader(arg),
              new CoreLabelTokenFactory(), "");
      while (ptbt.hasNext()) {
        CoreLabel label = ptbt.next();
        System.out.println(label);
      }
    }
  }
}

Read full article from The Stanford NLP (Natural Language Processing) Group

