百度笔试题目剖析――拼写纠错 - summerbell - ITeye技术网站



百度笔试题目剖析――拼写纠错 - summerbell - ITeye技术网站

在用户输入英文单词时,经常发生错误,我们需要对其进行纠错。假设已经有一个包含了正确英文单词的词典,请你设计一个拼写纠错的程序。

(1)请描述你解决这个问题的思路;

(2)请给出主要的处理流程,算法,以及算法的复杂度;

(3)请描述可能的改进(改进的方向如效果,性能等等,这是一个开放问题)

 

网上流传解答:

 (1)思路:

字典以字母键树组织,在用户输入同时匹配

 

(2)流程:

每输入一个字母:

沿字典树向下一层,

a)若可以顺利下行,则继续至结束,给出结果;

b)若该处不能匹配,纠错处理,给出拼写建议,继续至a)

 

算法:

1.在字典中查找单词

字典采用27叉树组织,每个节点对应一个字母,查找就是一个字母一个字母匹配.算法时间就是单词的长度k

 

2.纠错算法

情况:当输入的最后一个字母不能匹配时就提示出错,简化出错处理,动态提示可能处理方法:

(a)当前字母前缺少了一个字母:搜索树上两层到当前的匹配作为建议;

(b)当前字母拼写错误:当前字母的键盘相邻作为提示;(只是简单的描述,可以有更多的)根据分析字典特征和用户单词已输入部分选择(a)(b)处理


Read full article from 百度笔试题目剖析――拼写纠错 - summerbell - ITeye技术网站


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts