C#: Update all security queries to path-problems#367
Conversation
hvitved
left a comment
There was a problem hiding this comment.
Many tests still need to have their expected output updated. I generally prefer the style used by e.g. https://lgtm.com/query/rule:1506493930005/lang:cpp/ (see comment), but let me know what you think.
| where c.hasFlow(source, sink) | ||
| select sink, "$@ flows to here and is used in a path.", source, "User-provided value" | ||
| select sink, source.getPathNode(c), sink.getPathNode(c), | ||
| "$@ flows to here and is used in a path.", source, "User-provided value" |
There was a problem hiding this comment.
I think I would prefer
from TaintTrackingConfiguration c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink, source, sink, "$@ flows to here and is used in a path.", source, "User-provided value"
There was a problem hiding this comment.
Or actually I think this might be slightly better:
from TaintTrackingConfiguration c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "$@ flows to here and is used in a path.", source.getNode(), "User-provided value"
There was a problem hiding this comment.
Why? Isn't toString() and getLocation() on DataFlow::Node and DataFlow::PathNode the same?
There was a problem hiding this comment.
toString() is different when you have field flow, as the PathNode will include the accesspath.
There was a problem hiding this comment.
It seems like there's a choice between
from Configuration c, DataFlow::Node source, DataFlow::Node sink
where c.hasFlow(source, sink)
select sink, source.getPathNode(c), sink.getPathNode(c),
"$@ could flow here.", source, "User input"
vs
from Configuration c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink.getNode(), source, sink,
"$@ could flow here.", source.getNode(), "User input"
The second is actually longer. Either way, this mapping to and fro seems somewhat unsatisfactory. I have a slight preference for the first version actually, because
- The query is actually dealing with dataflow nodes. Paths are an implementation detail.
- You often use a specific type of source and sink, not
DataFlow::Node. Often the source and sink have additional predicates relating to information about the source and the sink.PathNodesdon't capture any of this. - You could forget to call
getNode().
Anyway, this sounds like something to be discussed offline in the CPH office.
There was a problem hiding this comment.
I think those are some convincing arguments, @calumgrant!
There was a problem hiding this comment.
You're missing an important argument for using PathNode. There are potentially many PathNodes corresponding to a single Node (e.g. if there are multiple configurations in scope with overlapping sources or sinks, which can easily happen), and in that case using getPathNode to get back to the PathNodes becomes semantically shaky, as you'll risk multiplying your results.
There was a problem hiding this comment.
Yes but getPathNode(c) ensures that only nodes of configuration c are used in the path. Therefore I think the risk of multiplication -- which only occurs in the edges relation -- is the same in either case.
There was a problem hiding this comment.
Still, you may get a pair of path nodes for which there isn't actually flow, as guaranteed by c.hasFlowPath(). So, unless the engine filters out results in the select predicate, for which the edges predicate does not produce a path from the select predicate's second to third column, I think we need to use the other approach.
f3df87d to
42fdddb
Compare
1ef20e5 to
793eaee
Compare
| from TaintTrackingConfiguration c, DataFlow::PathNode source, DataFlow::PathNode sink | ||
| where c.hasFlowPath(source, sink) | ||
| select sink.getNode(), source, sink, | ||
| "Private data returned by $@ is written to an external location.", source.getNode(), source.toString() |
There was a problem hiding this comment.
Should be source.getNode().toString().
| where | ||
| source = sourcePath.getNode() and | ||
| sink = sinkPath.getNode() and | ||
| c.hasFlow(source, sink) and |
There was a problem hiding this comment.
use c.hasFlowPath(sourcePath, sinkPath) instead.
There was a problem hiding this comment.
No, in this case it gives different results due to hackery around sources and sinks. The configuration's hasFlow has been changed, and hasFlow isn't final so I guess it's allowed.
There was a problem hiding this comment.
It's probably best to avoid such hacks, as it's not very clear what's going on. It would be much clearer to just define the negationbody from the overridden hasFlow as a separate member predicate, which could then be explicitly anti-joined here.
|
Drive-by comment: I was reminded today that our analysis toolchain assumes that each node selected by a path query appears in the Are you guarding against that case somehow? (I think the easiest workaround is to define a |
I don't know the exact structure of |
|
OK, glad to hear that you're already handling this for Java. Does the workaround of creating paths of length 1 have any additional benefits over explicitly specifying nodes (e.g., are there other parts of the toolchain that assume paths have at least length one)? |
The main reason for modelling sinks explicitly was AFAIR to throw away the call context on the final node in a path in order to avoid potential issues with duplicated results (if a sink was reachable from multiple call contexts). I'm not sure about what the tool chain assumptions re. path length are, if any. |
793eaee to
cf4b04a
Compare
|
@calumgrant : Happy to merge after you have added the |
Add a queries.xml file (for CWE coverage docs)
Add 1.6.20 support
Converts all of the security queries to type
path-problem. Update the expected qltest output.In most cases, the change has simply involved changing
DataFlow::NodetoDataFlow::PathNode, but there are a few cases where this has not been done. For example, when there would be semantic changes, or if the sources and sinks have additional member predicates. In this case, the predicateNode::getPathNodeis used instead.Do not merge this yet - we still need to validate the performance.