Skip to content

C#: Update all security queries to path-problems#367

Merged
hvitved merged 6 commits into
github:masterfrom
calumgrant:cs/path-problems
Nov 22, 2018
Merged

C#: Update all security queries to path-problems#367
hvitved merged 6 commits into
github:masterfrom
calumgrant:cs/path-problems

Conversation

@calumgrant

@calumgrant calumgrant commented Oct 25, 2018

Copy link
Copy Markdown
Contributor

Converts all of the security queries to type path-problem. Update the expected qltest output.

In most cases, the change has simply involved changing DataFlow::Node to DataFlow::PathNode, but there are a few cases where this has not been done. For example, when there would be semantic changes, or if the sources and sinks have additional member predicates. In this case, the predicate Node::getPathNode is used instead.

Do not merge this yet - we still need to validate the performance.

@calumgrant calumgrant added the WIP This is a work-in-progress, do not merge yet! label Oct 25, 2018
@calumgrant calumgrant requested a review from a team as a code owner October 25, 2018 14:38
@calumgrant calumgrant closed this Oct 25, 2018
@calumgrant calumgrant reopened this Oct 25, 2018

@hvitved hvitved left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many tests still need to have their expected output updated. I generally prefer the style used by e.g. https://lgtm.com/query/rule:1506493930005/lang:cpp/ (see comment), but let me know what you think.

where c.hasFlow(source, sink)
select sink, "$@ flows to here and is used in a path.", source, "User-provided value"
select sink, source.getPathNode(c), sink.getPathNode(c),
"$@ flows to here and is used in a path.", source, "User-provided value"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer

from TaintTrackingConfiguration c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink, source, sink, "$@ flows to here and is used in a path.", source, "User-provided value"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

@aschackmull aschackmull Oct 26, 2018

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or actually I think this might be slightly better:

from TaintTrackingConfiguration c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "$@ flows to here and is used in a path.", source.getNode(), "User-provided value"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Isn't toString() and getLocation() on DataFlow::Node and DataFlow::PathNode the same?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toString() is different when you have field flow, as the PathNode will include the accesspath.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's a choice between

from Configuration c, DataFlow::Node source, DataFlow::Node sink
where c.hasFlow(source, sink)
select sink, source.getPathNode(c), sink.getPathNode(c),
  "$@ could flow here.", source, "User input"

vs

from Configuration c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink.getNode(), source, sink,
  "$@ could flow here.", source.getNode(), "User input"

The second is actually longer. Either way, this mapping to and fro seems somewhat unsatisfactory. I have a slight preference for the first version actually, because

  • The query is actually dealing with dataflow nodes. Paths are an implementation detail.
  • You often use a specific type of source and sink, not DataFlow::Node. Often the source and sink have additional predicates relating to information about the source and the sink. PathNodes don't capture any of this.
  • You could forget to call getNode().

Anyway, this sounds like something to be discussed offline in the CPH office.

@hvitved hvitved Oct 26, 2018

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those are some convincing arguments, @calumgrant!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing an important argument for using PathNode. There are potentially many PathNodes corresponding to a single Node (e.g. if there are multiple configurations in scope with overlapping sources or sinks, which can easily happen), and in that case using getPathNode to get back to the PathNodes becomes semantically shaky, as you'll risk multiplying your results.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but getPathNode(c) ensures that only nodes of configuration c are used in the path. Therefore I think the risk of multiplication -- which only occurs in the edges relation -- is the same in either case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, you may get a pair of path nodes for which there isn't actually flow, as guaranteed by c.hasFlowPath(). So, unless the engine filters out results in the select predicate, for which the edges predicate does not produce a path from the select predicate's second to third column, I think we need to use the other approach.

@calumgrant calumgrant force-pushed the cs/path-problems branch 2 times, most recently from f3df87d to 42fdddb Compare October 26, 2018 11:52
@calumgrant calumgrant added the C# label Oct 26, 2018
@calumgrant calumgrant removed the WIP This is a work-in-progress, do not merge yet! label Nov 6, 2018
Comment thread csharp/ql/src/Security Features/CWE-327/DontInstallRootCert.ql Outdated
from TaintTrackingConfiguration c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select sink.getNode(), source, sink,
"Private data returned by $@ is written to an external location.", source.getNode(), source.toString()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be source.getNode().toString().

where
source = sourcePath.getNode() and
sink = sinkPath.getNode() and
c.hasFlow(source, sink) and

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use c.hasFlowPath(sourcePath, sinkPath) instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in this case it gives different results due to hackery around sources and sinks. The configuration's hasFlow has been changed, and hasFlow isn't final so I guess it's allowed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably best to avoid such hacks, as it's not very clear what's going on. It would be much clearer to just define the negationbody from the overridden hasFlow as a separate member predicate, which could then be explicitly anti-joined here.

Comment thread csharp/ql/src/Security Features/CWE-807/ConditionalBypass.ql Outdated
Comment thread csharp/ql/src/Security Features/CWE-838/InappropriateEncoding.ql Outdated
Comment thread csharp/ql/src/Security Features/CWE-838/InappropriateEncoding.ql Outdated
Comment thread csharp/ql/src/Security Features/InsecureRandomness.ql Outdated
@xiemaisi

Copy link
Copy Markdown

Drive-by comment: I was reminded today that our analysis toolchain assumes that each node selected by a path query appears in the edges relation unless an explicit nodes relation is provided. So in particular if you have an empty edges relation, any query results (which might well exist, since the path from source to sink could be empty) are discarded.

Are you guarding against that case somehow? (I think the easiest workaround is to define a nodes query relation in PathGraph.)

@aschackmull

Copy link
Copy Markdown
Contributor

Drive-by comment: I was reminded today that our analysis toolchain assumes that each node selected by a path query appears in the edges relation unless an explicit nodes relation is provided. So in particular if you have an empty edges relation, any query results (which might well exist, since the path from source to sink could be empty) are discarded.

Are you guarding against that case somehow? (I think the easiest workaround is to define a nodes query relation in PathGraph.)

I don't know the exact structure of PathNodes in C#, but for java the PathNodes corresponding to sinks are special, so if a source is also a sink such that the Node path is the empty path then the corresponding PathNode path in the edges relation will have length 1.

@xiemaisi

Copy link
Copy Markdown

OK, glad to hear that you're already handling this for Java. Does the workaround of creating paths of length 1 have any additional benefits over explicitly specifying nodes (e.g., are there other parts of the toolchain that assume paths have at least length one)?

@aschackmull

Copy link
Copy Markdown
Contributor

OK, glad to hear that you're already handling this for Java. Does the workaround of creating paths of length 1 have any additional benefits over explicitly specifying nodes (e.g., are there other parts of the toolchain that assume paths have at least length one)?

The main reason for modelling sinks explicitly was AFAIR to throw away the call context on the final node in a path in order to avoid potential issues with duplicated results (if a sink was reachable from multiple call contexts). I'm not sure about what the tool chain assumptions re. path length are, if any.

@hvitved

hvitved commented Nov 21, 2018

Copy link
Copy Markdown
Contributor

@calumgrant : Happy to merge after you have added the nodes predicate suggested by @xiemaisi and fixed the tests.

hvitved
hvitved previously approved these changes Nov 21, 2018
@hvitved hvitved merged commit 201f64e into github:master Nov 22, 2018
aibaars added a commit that referenced this pull request Oct 19, 2021
Add a queries.xml file (for CWE coverage docs)
smowton pushed a commit to smowton/codeql that referenced this pull request Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants