Java fuzzy String matching with names - Stack Overflow



public static final double MAX_SCORE_FOR_NO_FIRST_LETTER_MATCH = 0.3; public double scoreName(String searchName, String candidateName) { if (searchName.equals(candidateName)) return 1.0 int editDistance = StringUtils.getLevenshteinDistance( searchName, candidateName); // Normalize for length: double score = (candidateName.length() - editDistance) / candidateName.length(); // Artificially reduce the score if the first letters don't match if (searchName.charAt(0) != candidateName.charAt(0)) { score = Math.min(score, MAX_SCORE_FOR_NO_FIRST_LETTER_MATCH); } // Try Soundex or other matching here. Remember that you don't want // to go above 1.0, so you may want to create a second score and // return the higher. return Math.max(0.0, Math.min(score, 1.0)); }

Read full article from Java fuzzy String matching with names - Stack Overflow


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts