Converting regular expressions into nondeterministic finite automata (NFAs): An implementation in Java




The library embeds the syntax of regular expressions in Java itself, so patterns are written as ordinary Java expressions. A regular expression pattern r can be:

  • A literal string: "string"
  • A literal character: 'char'
  • A sequence of patterns: s(r1,...,rn)
  • A choice of patterns: or(r1,...,rn)
  • A zero-or-more repetition of a pattern: rep(r)
So a regular expression like

  (foo|bar|baz)*xoo

would be

  s(rep(or("foo","bar","baz")),"xoo")

in the embedded syntax.
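
To make the embedded syntax concrete, here is a minimal sketch of how combinators named s, or and rep could build an NFA directly. It is not the library's actual code: the class MiniRegex, the State and NFA types, the epsilon-edge wiring and the matches method are all assumptions made for illustration.

```java
import java.util.*;

// Hypothetical mini-DSL: s/or/rep mirror the article's names; everything else is assumed.
final class MiniRegex {
    /** One NFA state; each state built here has at most one edge per character. */
    static final class State {
        final Map<Character, State> edges = new HashMap<>();
        final List<State> epsilons = new ArrayList<>();
    }

    /** A fragment with a single entry state and a single exit state. */
    static final class NFA {
        final State entry = new State();
        final State exit = new State();
    }

    /** Accepts an NFA, a String, or a Character, matching the pattern forms listed above. */
    private static NFA toNfa(Object p) {
        if (p instanceof NFA) return (NFA) p;
        if (p instanceof String || p instanceof Character) return literal(p.toString());
        throw new IllegalArgumentException("unsupported pattern: " + p);
    }

    private static NFA literal(String text) {
        NFA n = new NFA();
        State cur = n.entry;
        for (char c : text.toCharArray()) {
            State next = new State();
            cur.edges.put(c, next);
            cur = next;
        }
        cur.epsilons.add(n.exit);
        return n;
    }

    /** Sequence: chain each part's exit to the next part's entry. */
    static NFA s(Object... parts) {
        NFA n = new NFA();
        State cur = n.entry;
        for (Object p : parts) {
            NFA sub = toNfa(p);
            cur.epsilons.add(sub.entry);
            cur = sub.exit;
        }
        cur.epsilons.add(n.exit);
        return n;
    }

    /** Choice: branch from the entry into every alternative. */
    static NFA or(Object... alts) {
        NFA n = new NFA();
        for (Object p : alts) {
            NFA sub = toNfa(p);
            n.entry.epsilons.add(sub.entry);
            sub.exit.epsilons.add(n.exit);
        }
        return n;
    }

    /** Zero-or-more: allow skipping the body or looping back into it. */
    static NFA rep(Object p) {
        NFA n = new NFA();
        NFA sub = toNfa(p);
        n.entry.epsilons.add(sub.entry);
        n.entry.epsilons.add(n.exit);
        sub.exit.epsilons.add(sub.entry);
        sub.exit.epsilons.add(n.exit);
        return n;
    }

    /** Adds every state reachable through epsilon edges alone. */
    private static Set<State> closure(Set<State> states) {
        Set<State> seen = new HashSet<>(states);
        Deque<State> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            for (State t : work.pop().epsilons) if (seen.add(t)) work.push(t);
        }
        return seen;
    }

    /** Simulates the NFA by tracking the set of states reachable after each character. */
    static boolean matches(NFA nfa, String input) {
        Set<State> current = closure(Collections.singleton(nfa.entry));
        for (char c : input.toCharArray()) {
            Set<State> next = new HashSet<>();
            for (State st : current) {
                State t = st.edges.get(c);
                if (t != null) next.add(t);
            }
            current = closure(next);
        }
        return current.contains(nfa.exit);
    }
}
```

With this sketch, MiniRegex.s(MiniRegex.rep(MiniRegex.or("foo","bar","baz")),"xoo") builds an automaton that accepts "xoo" and "foobarxoo" but rejects "foo". A static import of the three methods would recover the shorter s(...), or(...), rep(...) spelling used above.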

The library works by converting each regular expression into an NFA on the fly. For whatever reason, I didn't implement it in a functional fashion, so you can't re-use a sub-expression in more than one regular expression. That is, the following breaks:

  NFA foo = s("foo") ;
  NFA pattern = s(foo,foo) ;

but the following works:

  NFA pattern = s(s("foo"),s("foo")) ; // or s("foo","foo")
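
One plausible way such a restriction can arise is sketched below. It is a guess at the mechanism, not the library's code: the SharingDemo class, its Frag type and the lit, seq and matches methods are hypothetical. The sketch assumes that sequencing mutates the fragments it is given, so passing the same fragment twice wires its exit back to its own entry.

```java
import java.util.*;

// Hypothetical stand-in for a mutable NFA builder; not the article's library.
final class SharingDemo {
    static final class State {
        final Map<Character, State> edges = new HashMap<>();
        final List<State> epsilons = new ArrayList<>();
    }

    static final class Frag {
        final State entry = new State();
        final State exit = new State();
    }

    /** Builds a fresh chain of states for a literal string. */
    static Frag lit(String text) {
        Frag f = new Frag();
        State cur = f.entry;
        for (char c : text.toCharArray()) {
            State next = new State();
            cur.edges.put(c, next);
            cur = next;
        }
        cur.epsilons.add(f.exit);
        return f;
    }

    /** Sequences fragments by mutating them: each exit gains an edge to the next entry. */
    static Frag seq(Frag... parts) {
        Frag f = new Frag();
        State cur = f.entry;
        for (Frag p : parts) {
            cur.epsilons.add(p.entry);
            cur = p.exit;
        }
        cur.epsilons.add(f.exit);
        return f;
    }

    /** Adds every state reachable through epsilon edges alone. */
    static Set<State> closure(Set<State> states) {
        Set<State> seen = new HashSet<>(states);
        Deque<State> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            for (State t : work.pop().epsilons) if (seen.add(t)) work.push(t);
        }
        return seen;
    }

    /** Simulates the automaton over the whole input. */
    static boolean matches(Frag f, String input) {
        Set<State> current = closure(Collections.singleton(f.entry));
        for (char c : input.toCharArray()) {
            Set<State> next = new HashSet<>();
            for (State st : current) {
                State t = st.edges.get(c);
                if (t != null) next.add(t);
            }
            current = closure(next);
        }
        return current.contains(f.exit);
    }
}
```

In this sketch, seq(shared, shared) with Frag shared = lit("foo") accepts "foo" and "foofoofoo" as well as "foofoo", because the shared exit now loops back to the shared entry. Building each occurrence separately, as in seq(lit("foo"), lit("foo")), accepts only "foofoo". Under the same assumption, a small factory method that returns a fresh fragment on every call is a simple way to name a sub-pattern without sharing its states.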


