JS: add query: js/incomplete-url-regexp · Pull Request #631 · github/codeql

ghost · 2018-12-06T13:37:41Z

Adds a query for identifying unescaped . characters in regular expressions used in URL sanitization. The unescaped . is interesting because it looks correct at first glance: /https:\/\/subdomain.example.com\/.*/ matches https://subdomainXexample.com/!
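To make the problem concrete, here is a small JavaScript check (an illustration only, not part of the query) that runs the pattern above against two inputs, next to a variant with the dots escaped. The names `unescapedPattern` and `escapedPattern` are illustrative.

```javascript
// The pattern from the description: each unescaped '.' matches any single
// character, not just a literal dot.
const unescapedPattern = /https:\/\/subdomain.example.com\/.*/;

// The same pattern with the dots escaped, so they only match a literal '.'.
const escapedPattern = /https:\/\/subdomain\.example\.com\/.*/;

const inputs = [
  "https://subdomain.example.com/", // intended host
  "https://subdomainXexample.com/", // different host: 'X' is matched by '.'
];

for (const url of inputs) {
  // Prints the URL, then whether each pattern accepts it.
  console.log(url, unescapedPattern.test(url), escapedPattern.test(url));
}
// https://subdomain.example.com/ true true
// https://subdomainXexample.com/ true false
```

Escaping only fixes the dots: both patterns are still unanchored, so a string such as `https://evil.com/?https://subdomain.example.com/` is accepted by either of them.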

The query has many similarities with #623, but I think it is different enough to be a separate query.

It is rather messy to identify regular expressions for URLs using regular expressions, but I think it is the best solution until we manage to parse strings as regular expressions using QL. This causes a few misparsings that result in silly messages or false positives:

/^(https?:)?\/\/((service|www).)?example.com(?=$|\/)/; // NOT OK
// ^ This regular expression has an unescaped '.', which means that ')?example.com' might not match the intended host of a matched URL.

/example.dev|example.com/; // OK, but still flagged
// ^ This regular expression has an unescaped '.', which means that 'dev\|example.com' might not match the intended host of a matched URL.

I think they are rare enough to ignore for now.
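Despite the garbled message, the first example is a genuine true positive. A quick JavaScript check (illustrative only, not part of the query) shows that both unescaped dots in that pattern accept arbitrary characters, while a variant with escaped dots does not. The variable names are hypothetical.

```javascript
// The flagged pattern from the first example above.
const flagged = /^(https?:)?\/\/((service|www).)?example.com(?=$|\/)/;

// The same pattern with both dots escaped.
const escaped = /^(https?:)?\/\/((service|www)\.)?example\.com(?=$|\/)/;

const candidates = [
  "https://www.example.com/", // intended host
  "https://wwwXexample.com/", // '.' after 'www' matches 'X'
  "//example-com",            // '.' in 'example.com' matches '-'
];

for (const c of candidates) {
  // Prints the candidate, then whether each pattern accepts it.
  console.log(c, flagged.test(c), escaped.test(c));
}
// https://www.example.com/ true true
// https://wwwXexample.com/ true false
// //example-com true false
```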

The evaluation shows a lot of interesting results, but performance in isolation does not look good, probably because of the taint-tracking steps. I will do a larger performance run over the weekend.

asger-semmle · 2018-12-06T15:17:57Z

I like this query, but why should we consider a repeated ., such as .*, as a reason to flag the regexp? I would say it's the other way around: .* indicates the . was intentionally not escaped. Some of the first alerts from this look like FPs to me.
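The refinement suggested here (later adopted in commit d4e4bc6, "sharpen js/incomplete-url-regexp by not matching .* or .+") can be pictured as a purely textual scan over a pattern's source. The JavaScript sketch below is hypothetical: `findSuspiciousDots` is an illustrative name, it works on `RegExp.prototype.source` text rather than a parsed regexp AST, and it is not how the QL query is implemented.

```javascript
// Hypothetical helper: returns the indices of '.' characters in a
// regular-expression source string that are unescaped, outside character
// classes, and not immediately followed by '*' or '+'.
function findSuspiciousDots(source) {
  const result = [];
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      i++; // skip the escaped character, e.g. '\.' or '\/'
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false; // a '.' inside [...] is literal
      continue;
    }
    if (ch === "[") {
      inClass = true;
      continue;
    }
    if (ch === ".") {
      const next = source[i + 1];
      // '.*' and '.+' are treated as intentional wildcards and not reported.
      if (next !== "*" && next !== "+") result.push(i);
    }
  }
  return result;
}

// The trailing '.*' is skipped; the two dots in the hostname are reported.
console.log(findSuspiciousDots(/https:\/\/subdomain.example.com\/.*/.source));
// → [ 19, 27 ]
```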

ghost · 2018-12-06T15:31:04Z

Hmm, yes, perhaps we should leave it out for now.

Motivation for inclusion: the use of .* allows the target domain to be part of the path or query string if an entire URL is matched: 'http://evil.com/?example.com'.match(/http:\/\/.*.example.com/).
The issue is not solved by escaping, but by using a restricted character class.
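The example above, and the suggested fix with a restricted character class, can be checked directly in JavaScript. The stricter pattern below is a hypothetical sketch: it assumes the whole URL is being matched, forbids '/', '?' and '#' in the optional subdomain part, escapes the dots, and requires the host to end at '/', '?', '#', ':' or the end of the string. It is not a complete URL validator; parsing with the WHATWG `URL` API and comparing `hostname` is often more robust.

```javascript
// The example from the comment above: '.*' can consume '/', '?' and '#',
// so "example.com" may appear in the path or query string instead of the host.
const loose = /http:\/\/.*.example.com/;
const attack = "http://evil.com/?example.com";
console.log(loose.test(attack)); // true

// Hypothetical stricter pattern: anchored, subdomain part restricted to
// characters other than '/', '?' and '#', dots escaped, host boundary enforced.
const strict = /^https?:\/\/([^\/?#]*\.)?example\.com(?=[\/?#:]|$)/;
console.log(strict.test(attack)); // false
console.log(strict.test("https://sub.example.com/path")); // true
console.log(strict.test("https://example.com.evil.com/")); // false
```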

asger-semmle · 2018-12-06T16:12:34Z

+ */
+
+import javascript
+import semmle.javascript.security.dataflow.RegExpInjection


This may be part of the performance issue. By bringing this taint-tracking configuration into scope, it will be evaluated alongside IncompleteUrlRegExpTracking::Configuration in a sort of "get 1, pay for 2" deal. Or at least I think it will (@xiemaisi?)

I think it would be better to extract the relevant bits of RegExpInjection::Sink into a shared library, and then also create a .qll file for this library, like we have for the other taint-tracking queries.

asger-semmle · 2018-12-06T16:29:10Z

> Motivation for inclusion: the use of .* allows the target domain to be part of the path or query string if an entire URL is matched:

That's a good point - I had completely overlooked this possibility. But based on the current results, a better alert message and some precision improvements would be needed, I think.

ghost · 2018-12-06T21:48:19Z

All comments addressed.

I have introduced RegularExpressions.qll for the bits shared with RegExpInjection.ql. I suspect that we may need to move some of the predicates from IdentityReplacement.ql, IncompleteSanitization.ql and DoubleEscaping.ql there later.
Maybe @xiemaisi has an opinion on merging this into Regexp.qll, which seems to be dedicated to the extensional regexp AST.

ghost · 2018-12-09T21:29:35Z

All comments addressed.
Performance is not an issue: https://git.semmle.com/esben/dist-compare-reports/tree/js/bad-url-regexing_1544244537373

"Incomplete URL regular expression" -> "Incomplete regular expression for hostnames".

ghost · 2018-12-10T21:24:46Z

RegularExpressions.qll has been merged with Regexp.qll.
I have also squashed the fixups and rebased.

xiemaisi

Mostly LGTM, just a few minor suggestions.

xiemaisi · 2018-12-13T08:23:09Z

LGTM, ping @Semmle/doc for doc review.

mchammer01

@esben-semmle - documentation review completed. This is looking good, I've made some minor comments (some of which you may decide to ignore). Hope this helps.

ghost · 2018-12-14T09:25:39Z

Thank you @mc-semmle, all comments addressed.

mchammer01 · 2018-12-14T11:43:56Z

Thank you for the updates to the documentation, @esben-semmle.
Good to go from my point of view.

ghost added the JS label Dec 6, 2018

ghost self-requested a review as a code owner December 6, 2018 13:37

asger-semmle suggested changes Dec 6, 2018

xiemaisi suggested changes Dec 7, 2018

Esben Sparre Andreasen added 7 commits December 10, 2018 22:20

JS: add query js/incomplete-url-regexp

52ca696

JS: change notes for js/incomplete-url-regexp

c65c7e7

JS: sharpen js/incomplete-url-regexp by not matching .* or .+

d4e4bc6

JS: address non-semantic review comments

994fe1b

JS: introduce near-empty RegularExpressions.qll

7c6e28d

JS: rename query

ab519d4

"Incomplete URL regular expression" -> "Incomplete regular expression for hostnames".

JS: update change notes for renamed query

09e7124

xiemaisi suggested changes Dec 11, 2018

JS: address review comments

1bc73ab

xiemaisi previously approved these changes Dec 13, 2018

xiemaisi requested a review from mchammer01 December 13, 2018 08:23

mchammer01 requested changes Dec 13, 2018

JS: address doc review comments

bb3e3a5

ghost dismissed xiemaisi’s stale review via bb3e3a5 December 14, 2018 09:24

xiemaisi previously approved these changes Dec 14, 2018

xiemaisi reviewed Dec 14, 2018

Comment thread on javascript/ql/src/Security/CWE-020/IncompleteHostnameRegExp.qhelp (outdated)

mchammer01 previously approved these changes Dec 14, 2018

JS: fix <p></p> issue

487b8c5

ghost dismissed stale reviews from mchammer01 and xiemaisi via 487b8c5 December 14, 2018 12:04

xiemaisi approved these changes Dec 14, 2018

asger-semmle approved these changes Dec 17, 2018

asger-semmle merged commit 7adf1d9 into github:master Dec 17, 2018

3 participants