第二层思考



第二层思考

这里有一篇文章详细的描述了一次正则回溯导致 CPU 100% 的发现和解决过程,原文比较长,我之前也在 OpenResty 的开发中遇到过两次类似的问题。

这里简单归纳下,你就可以不用花费时间去了解背景了:

  1. 大部分开发语言的正则引擎是用基于回溯的 NFA 来实现(而不是基于 Thompson's NFA);

  2. 如果回溯次数过多,就会导致灾难性回溯,CPU 100%;

  3. 需要用 gdb 分析 dump,或者 systemtap 分析线上环境来定位;

  4. 这种问题很难在代码上线前发现,需要逐个 review 正则表达式;


站在开发的角度,修复完有问题的正则表达式,就告一段落了。最多再加上一个保险机制,限制下回溯的次数,比如在 OpenResty 中这样设置:

lua_regex_match_limit 100000;

这样即使出现灾难性回溯,也会被限制住,不会跑满 CPU。

嗯,看上去已经很完美了吗?让我们来跳出开发的层面,用不同的维度来看待这个问题。


Read full article from 第二层思考


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts