|
|
(One intermediate revision by one other user not shown) |
Line 1: |
Line 1: |
| The '''regular expression denial of service''' ('''ReDoS''')<ref name="ReDoS in OWASP">
| |
| {{
| |
| cite web
| |
| |author=[[OWASP]]
| |
| |date=2010-02-10
| |
| |url=http://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
| |
| |title=Regex Denial of Service
| |
| |accessdate=2010-04-16
| |
| }}
| |
| </ref>
| |
| is a [[denial-of-service attack]] that exploits the fact that most [[regular expression]] implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a regular expression to enter these extreme situations and then hang for a very long time.<ref name="RiverStar">
| |
| {{
| |
| cite web
| |
| |author=[[RiverStar Software]]
| |
| |date=2010-01-18
| |
| |url=https://www.riverstarsoftware.com/kb/article/security-bulletin-caution-using-regular-expressions-17.html
| |
| |title=Security Bulletin: Caution Using Regular Expressions
| |
| |accessdate=2010-04-16
| |
| }}
| |
| </ref><ref name="ModSecurity">
| |
| {{cite book
| |
| | last = Ristic
| |
| | first = Ivan
| |
| | authorlink = http://blog.ivanristic.com/
| |
| | title = ModSecurity Handbook
| |
| | publisher = [https://www.feistyduck.com/ Feisty Duck Ltd]
| |
| | date = 2010-03-15
| |
| | location = London, UK
| |
| | page = 173
| |
| | url = https://www.feistyduck.com/books/modsecurity-handbook/index.html
| |
| | isbn = 978-1-907117-02-2
| |
| }}</ref>
| |
|
| |
|
| ==Description==
| |
| Regular expression matching can be done by building a [[finite-state automaton]]. Regular expressions can be easily converted to [[nondeterministic finite-state automaton|nondeterministic automata]] (NFAs), in which for each pair of state and input symbol there may be several possible next states. After building the automaton, several possibilities exist:
| |
|
| |
|
| * the engine may convert it to a [[Deterministic finite automaton|deterministic finite-state automaton (DFA)]] and run the input through the result;
| | Female drivers will surely increase up the number of [http://browse.deviantart.com/?q=insurance insurance] that you aren't relinquishing the long run In case you loved this article and you want to receive details about [http://Nutshellurl.com/whichstatesmandateseniordiscountsforauto affordable Auto Insurance] i implore you to visit our internet site. . |
| * the engine may try one by one all the possible paths until a match is found or all the paths are tried and fail ("backtracking").<ref name="Crossby&Wallach">
| |
| {{
| |
| cite web
| |
| |author=Crosby and Wallach, Usenix Security
| |
| |year=2003
| |
| |url=http://www.cs.rice.edu/~scrosby/hash/slides/USENIX-RegexpWIP.2.ppt
| |
| |title=Regular Expression Denial Of Service
| |
| |accessdate=2010-01-13
| |
| }}
| |
| </ref><ref name="Sullivan">
| |
| {{
| |
| cite web
| |
| |author=[http://msdn.microsoft.com/en-us/magazine/ee532098.aspx?sdmr=BryanSullivan&sdmi=authors Bryan Sullivan], [[MSDN Magazine]]
| |
| |date=2010-05-03
| |
| |url=http://msdn.microsoft.com/en-au/magazine/ff646973.aspx
| |
| |title=Regular Expression Denial of Service Attacks and Defenses
| |
| |accessdate=2010-05-06
| |
| }}
| |
| </ref>
| |
| * the engine may consider all possible paths through the nondeterministic automaton in parallel;
| |
| * the engine may convert the nondeterministic automaton to a DFA [[lazy evaluation|lazily]] (''i.e.'', on the fly, during the match).
| |
| | |
| Of the above algorithms, the first two are problematic. The first is problematic because a deterministic automaton could have up to <math>2^m</math> states where <math>m</math> is the number of states in the nondeterministic automaton; thus, the conversion from NFA to DFA may take [[EXPTIME|exponential time]]. The second is problematic because a nondeterministic automaton could have an exponential number of paths of length <math>n</math>, so that walking through an input of length <math>n</math> will also take exponential time.<ref name="KirrageRathnayakeThielecke">
| |
| {{
| |
| cite conference
| |
| |last1 = Kirrage | first1 = J.
| |
| |last2 = Rathnayake | first2 = A.
| |
| |last3 = Thielecke | first3 = H.
| |
| |title = Static Analysis for Regular Expression Denial-of-Service Attacks
| |
| |booktitle = Network and System Security
| |
| |place = Madrid, Spain
| |
| |pages = 135-148
| |
| |publisher = Springer
| |
| |year = 2013
| |
| |url = http://link.springer.com/chapter/10.1007/978-3-642-38631-2_11
| |
| |doi = 10.1007/978-3-642-38631-2_11
| |
| }}
| |
| </ref>
| |
| The last two algorithms, however, do not exhibit pathological behavior.
| |
| | |
| Note that for non-pathological regular expressions the problematic algorithms are usually fast, and in practice one can expect them to "[[compiler|compile]]" a regular expression in O(m) time and match it in O(n) time; instead, simulation of an NFA and lazy computation of the DFA have [[Time complexity|O]](m<sup>2</sup>n) worst-case complexity.<ref>Lazy computation of the DFA can usually reach the speed of deterministic automatons while keeping worst case behavior similar to simulation of an NFA. However, it is considerably more complex to implement and can use more memory.</ref> Regular expression denial of service occurs when these expectations are applied to regular expressions provided by the user, and malicious regular expressions provided by the user trigger the worst-case complexity of the regular expression matcher.
| |
| | |
| While regex algorithms can be written in an efficient way, most regular expression engines in existence extend the regular expression languages with additional constructs that cannot always be solved efficiently. Such [[Regular expression#Patterns for non-regular languages|extended patterns]] essentially force the implementation of regular expression in most [[programming language]]s to use backtracking.
| |
| | |
| ==Examples==
| |
| ===Evil regexes===
| |
| Evil regexes, that get stuck on crafted input, can be different depending on the regular expression matcher that is under attack. For backtracking matchers, they occur whenever these factors occur:<ref name="Podcast">
| |
| {{
| |
| cite web
| |
| |author=Jim Manico and Adar Weidman
| |
| |date=2009-12-07
| |
| |url=http://www.owasp.org/index.php/Podcast_56
| |
| |title=OWASP Podcast 56 (ReDoS)
| |
| |accessdate=2010-04-02
| |
| }}
| |
| </ref>
| |
| * the regular expression applies repetition ("+", "*") to a complex subexpression;
| |
| * for the repeated subexpression, there exists a match which is also a suffix of another valid match.
| |
| | |
| The second condition is best explained with an example: in the regular expression <tt>(a[ab]*)+</tt>, both "a" and "aa" can match the repeated subexpression <tt>a[ab]*</tt>. Therefore, after matching "a", the nondeterministic automaton may try a new match of <tt>a[ab]*</tt> or a new match of <tt>a</tt>. If the input has many consecutive "a"s, each of them will double the number of possible paths through the automaton. Examples of "evil regexes" include the following:
| |
| * <tt>(a+)+</tt>
| |
| * <tt>([a-zA-Z]+)*</tt>
| |
| * <tt>(a|aa)+</tt>
| |
| * <tt>(a|a?)+</tt>
| |
| * <tt>(.*a){x}</tt> for x > 10
| |
| | |
| All the above are susceptible to the input <tt>aaaaaaaaaaaaaaaaaaaaaaaa!</tt> (The minimum input length might change slightly, when using faster or slower machines).
| |
| | |
| Other patterns may not cause an exponential behavior, but for long enough inputs (a few hundreds of characters, usually) they could still cause long elaboration times. An example of such a pattern is "a*b?a*x", the input being an arbitrarily long sequence of "a"s. Such a pattern may also cause backtracking matchers to hang.
| |
| | |
| ===Vulnerable regexes in online repositories===
| |
| Evil regexes have been found in online regular expression repositories. Note that it is enough to find an evil ''sub''expression in order to attack the full regex:
| |
| | |
| # [http://regexlib.com/REDetails.aspx?regexp_id=1757 RegExLib, id=1757 (email validation)] - see {{red|red}} part, which is an Evil Regex<br><code>^([a-zA-Z0-9]){{red|<nowiki>(([\-.]|[_]+)?([a-zA-Z0-9]+))*</nowiki>}}(@){1}[a-z0-9]+[.]{1}(([a-z]{2,3})|([a-z]{2,3}[.]{1}[a-z]{2,3}))$</code>
| |
| # [http://www.owasp.org/index.php/OWASP_Validation_Regex_Repository OWASP Validation Regex Repository], Java Classname - see {{red|red}} part, which is an Evil Regex<br><code>^{{red|(([a-z])+.)+}}[A-Z]([a-z])+$</code>
| |
| | |
| These two examples are also susceptible to the input <tt>aaaaaaaaaaaaaaaaaaaaaaaa!</tt>.
| |
| | |
| ===Attacks===
| |
| If a Regex itself is affected by a user input, the attacker can inject an Evil Regex, and make the system vulnerable. Therefore, in most cases, regular expression denial of service can be avoided by removing the possibility for the user to execute arbitrary patterns on the server. In this case, web applications and databases are the main vulnerable applications. Alternatively, a malicious page could hang the user's web browser or cause it to use arbitrary amounts of memory.
| |
| | |
| However, some of the examples in the above paragraphs are considerably less "artificial" than the others; thus, they demonstrate how a vulnerable regexes may be used as a result of programming mistakes. In this case [[e-mail scanner]]s and [[intrusion detection system]]s could also be vulnerable. Fortunately, in most cases the problematic regular expressions can be rewritten as "non-evil" patterns. For example, <tt>(.*a){x}</tt> can be rewritten to <tt>([^a]*a){x,}</tt>.
| |
| | |
| In the case of a web application, the programmer may use the same regular expression to validate input on both the client and the server side of the system. An attacker could inspect the client code, looking for evil regular expressions, and send crafted input directly to the web server in order to hang it.
| |
| | |
| ==References==
| |
| {{reflist}}
| |
| | |
| ==External links==
| |
| * Examples of ReDoS in open source applications:
| |
| ** [http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-3277 ReDoS in DataVault]
| |
| ** [http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-3275 ReDoS in EntLib]
| |
| ** [http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-3276 ReDoS in NASD CORE.NET Terelik]
| |
| * Some benchmarks for ReDoS
| |
| ** Achim Hoffman (2010). "[http://github.com/EnDe/ReDoS/ ReDoS - benchmark for regular expression DoS in JavaScript]". Retrieved 2010-04-19.
| |
| ** Richard M. Smith (2010). "[http://www.computerbytesman.com/redos Regular expression denial of service (ReDoS) attack test results]". Retrieved 2010-04-19.
| |
| * Paper on implementing regular expressions not vulnerable to certain classes of ReDoS
| |
| ** Russ Cox (2007). "[http://swtch.com/~rsc/regexp/regexp1.html Regular Expression Matching Can Be Simple And Fast]". Retrieved 2011-04-20.
| |
| * A tool for detecting ReDoS vulnerabilities
| |
| ** H. Thielecke, A. Rathnayake (2013). "[http://www.cs.bham.ac.uk/~hxt/research/rxxr.shtml Regular expression denial of service (ReDoS) static analysis]". Retrieved 2013-05-30.
| |
| | |
| [[Category:Denial-of-service attacks]]
| |
| [[Category:Pattern matching]]
| |
| [[Category:Regular expressions]]
| |