JS: add query: js/incomplete-url-regexp#631
Conversation
|
I like this query, but why should we consider a repeated |
|
Hmm, yes, perhaps we should leave it out for now. Motivation for inclusion: the use of |
| */ | ||
|
|
||
| import javascript | ||
| import semmle.javascript.security.dataflow.RegExpInjection |
There was a problem hiding this comment.
This may be part of the performance issue. By bringing this taint tracking-configuration into scope, it will be evaluated alongside IncompleteUrlRegExpTracking::Configuration in sort of a "get 1 pay for 2" deal. Or at least I think it will (@xiemaisi?)
I think it would be better to extract the relevant bits of RegExpInjection::Sink into a shared library, and then also create a .qll file for this library, like we have for the other taint-tracking queries.
That's a good point - I had completely overlooked this possibility. But based on the current results, a better alert message and some precision improvements would be needed, I think. |
|
All comments addressed. I have introduced |
|
All comments addressed. |
"Incomplete URL regular expression" -> "Incomplete regular expression for hostnames".
|
RegularExpressions.qll has been merged with |
xiemaisi
left a comment
There was a problem hiding this comment.
Mostly LGTM, just a few minor suggestions.
|
LGTM, ping @Semmle/doc for doc review. |
mchammer01
left a comment
There was a problem hiding this comment.
@esben-semmle - documentation review completed. This is looking good, I've made some minor comments (some of which you may decide to ignore). Hope this helps.
|
Thank you @mc-semmle, all comments addressed. |
|
Thank you for the updates to the documentation @esben-semmle |
Adds a query for identifying unescaped
.characters in regular expressions used in URL sanitization. The unescaped.is interesting because it looks correct on the surface:/https:\/\/subdomain.example.com\/.*/matcheshttps://subdomainXexample.com!The query is has many similarities with #623. But I think it is different enough to be a separate query.
It is rather messy to identify regular expressions for URLs using regular expressions, but I think it is the best solution until we manage to parse strings as regular expressions using QL. This causes a few misparsings that result in a silly messages or false positives:
I think they are rare enough to ignore for now.
The evaluation shows a lot of interesting results, but the performance does not look that good in isolation because of the taint tracking steps (I suspect). I will do a larger performance run over the weekend.