Skip to content

C++: identify back-edges in the control flow graph.#639

Closed
kevinbackhouse wants to merge 1 commit into
github:masterfrom
kevinbackhouse:BackEdges
Closed

C++: identify back-edges in the control flow graph.#639
kevinbackhouse wants to merge 1 commit into
github:masterfrom
kevinbackhouse:BackEdges

Conversation

@kevinbackhouse

Copy link
Copy Markdown
Contributor

Draft library for identifying back-edges in the control flow graph. This is an improved version of the draft which I previously posted on our internal repo (#23551). This version works pretty well on the examples that I have looked at. I haven't checked all the corner cases though.

To handle all the corner cases correctly, it might be easier to build this on top of the new IR.

@rdmarsh2: I think you need something like this for #633.

@kevinbackhouse kevinbackhouse added the WIP This is a work-in-progress, do not merge yet! label Dec 7, 2018
@kevinbackhouse kevinbackhouse requested a review from a team as a code owner December 7, 2018 12:29

@jbj jbj left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some beautiful code. I wonder if this applies to the IR at all, given that there's no tree form of the IR. It's always a graph.

Some of the comments in https://git.semmle.com/Semmle/code/pull/23551 continue to apply here.

I don't understand how continue and goto are handled in this code.

exprparents(child,i,parent) or
stmtparents(child,i,parent) or
stmt_decl_bind(parent,i,child) or
(initialisers(child,parent,_,_) and i = 0)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #26937 we no longer call dbscheme relations directly from outside the AST classes. I've written almost this predicate recently as part of my QL CFG work. My version is just missing the i. Also note the comment about ConditionDeclExpr, which doesn't apply here. It would need to be added.

private class Node extends ControlFlowNodeBase {
  /**
   * Gets the nearest control-flow node that's a parent of this one, never
   * crossing function boundaries.
   */
  final Node getParentNode() {
    result = this.(Expr).getParent()
    or
    result = this.(Stmt).getParent()
    or
    // An Initializer under a ConditionDeclExpr is not part of the CFG.
    result.(DeclStmt).getADeclaration().(LocalVariable) = this.(Initializer).getDeclaration()
  }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of building this on top of your QL CFG work. The purpose of this predicate is really to define the order of evaluation within an expression tree. It makes sense that you have written almost the same logic for the CFG work. So it would be much better to reuse your logic here.

// happen that the same element is listed multiple times with
// different indices. For example, in the expression `a?:b` `a` has
// index 0 and 1.
i = min (int j | parents(element, j, child) | j)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to @ian-semmle : this is another workaround for https://jira.semmle.com/browse/CPP-297

| (child = loop.getInitialization() and i = 0) or
(child = loop.getCondition() and i = 1) or
(child = loop.getStmt() and i = 2) or
(child = loop.getUpdate() and i = 3))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably also be a case for RangeBasedForStmt.

element instanceof VlaDimensionStmt or
element instanceof AsmStmt or
element instanceof DeclStmt or
element instanceof Initializer)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any statements deliberately missing from this list? If all statements should be here, except a few special ones, then I suggest flipping the logic around to explicitly list the special ones.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this list by trial-and-error. You might be right that it should apply to all statements.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, then I think this list is almost exactly identical to the predicate I have in the QL CFG code that describes all the CFG nodes that should be visited before their children.

@jbj

jbj commented Dec 7, 2018

Copy link
Copy Markdown
Contributor

I think this PR is valuable, but for #633 it's IR back edges we want. Can we construct a good approximation of those while building the IR CFG? On TranslatedStmt there can be a predicate to enumerate the back edges from it just like there are predicates to enumerate other edges. This will be something like edges going around a loop, edges from continue, and edges from goto that are not obviously moving forward.

@rdmarsh2

rdmarsh2 commented Dec 7, 2018

Copy link
Copy Markdown
Contributor

I agree that what #633 needs is IR back edges. I'm not sure that we should construct them while building the IR CFG; @dave-bartolomeo and I have discussed inserting inference operators/pi nodes that might require manipulating the CFG while building later stages of the IR.

@jbj

jbj commented Dec 10, 2018

Copy link
Copy Markdown
Contributor

If we need the IR back-edge identification now, I don't see why a hypothetical feature should block the simplest solution. If we want to add the hypothetical feature in the future, we'd need to pay for the loss of simplicity at that point.

@rdmarsh2

Copy link
Copy Markdown
Contributor

#648 might also invalidate back-edge identification, but I haven't thought hard about it yet.

@geoffw0

geoffw0 commented Dec 12, 2018

Copy link
Copy Markdown
Contributor

I'd like to see some tests for this, covering things like [nested] loops, continue, break and goto.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

WIP This is a work-in-progress, do not merge yet!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants