[Bug] SubgraphRunner unfail mechanism fails when block hasn't advanced past error block #6205

@bocaigo

Bug report

Description:
When a subgraph hits a non-deterministic error (e.g., DatabaseUnavailable), the
unfail mechanism is attempted only once. If that first attempt happens before
the subgraph has processed past the error block, it returns UnfailOutcome::Noop,
but should_try_unfail_non_deterministic is still set to false, so the unfail is
never retried.

This leaves subgraphs permanently in the Failed state even though they
continue indexing successfully.

Reproduction:

  1. Subgraph encounters DatabaseUnavailable at block N
  2. Database recovers, subgraph restarts from checkpoint at block N-3
  3. First unfail attempt happens at block N-3 (< N), returns Noop
  4. Flag is set to false, never retried
  5. Subgraph continues indexing to N+1000, but health remains "failed"
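
The steps above can be modelled with a small, self-contained sketch. This is not the graph-node code: only UnfailOutcome and should_try_unfail_non_deterministic are names taken from this report. Deployment, Health, run_current_behavior, and the "head >= error_block" condition are assumptions made for illustration.

```rust
// Simplified, hypothetical model of the behaviour described in this report.
// Only `UnfailOutcome` and `should_try_unfail_non_deterministic` are names
// from the report; everything else is invented for illustration.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnfailOutcome {
    Noop,
    Unfailed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    Healthy,
    Failed,
}

struct Deployment {
    health: Health,
    /// Block N at which the non-deterministic error was recorded.
    error_block: i64,
}

impl Deployment {
    /// Assumption: the error can only be cleared once the deployment head
    /// has reached the error block; before that there is nothing to unfail.
    fn unfail_non_deterministic_error(&mut self, head: i64) -> UnfailOutcome {
        if self.health == Health::Failed && head >= self.error_block {
            self.health = Health::Healthy;
            UnfailOutcome::Unfailed
        } else {
            UnfailOutcome::Noop
        }
    }
}

/// Indexes blocks `start..=end` and returns the final health together with
/// the number of unfail attempts that were made.
fn run_current_behavior(error_block: i64, start: i64, end: i64) -> (Health, u32) {
    let mut deployment = Deployment { health: Health::Failed, error_block };
    let mut should_try_unfail_non_deterministic = true;
    let mut attempts = 0;

    for head in start..=end {
        if should_try_unfail_non_deterministic {
            // The flag is cleared regardless of the outcome, so a Noop on
            // the first block means no further attempt is ever made.
            should_try_unfail_non_deterministic = false;
            attempts += 1;
            let _ = deployment.unfail_non_deterministic_error(head);
        }
    }

    (deployment.health, attempts)
}
```

With N = 100, run_current_behavior(100, 97, 1100) makes a single attempt at block 97 and returns (Health::Failed, 1), which matches step 5.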

Evidence:
Log showing the issue:
INFO Subgraph error is still ahead of deployment head, nothing to unfail,
error_block_range: (Included(392332788), Unbounded),
block_number: 392332785

Location:
core/src/subgraph/runner.rs:996

Suggested Fix:
Set should_try_unfail_non_deterministic = false only when the outcome is
UnfailOutcome::Unfailed. Keep it true on UnfailOutcome::Noop so that the unfail
is retried on the next block.
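
A minimal sketch of that flag handling, assuming the unfail call returns one of the two outcomes named above. keep_trying_after, try_unfail, and first_unfailed_block are hypothetical helpers, and the "head >= error_block" condition is an assumption. The actual change would go at the location listed above and is not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnfailOutcome {
    Noop,
    Unfailed,
}

/// Proposed rule: stop retrying only after the error was actually cleared.
fn keep_trying_after(outcome: UnfailOutcome) -> bool {
    match outcome {
        UnfailOutcome::Unfailed => false,
        UnfailOutcome::Noop => true,
    }
}

/// Hypothetical stand-in for the store call. Assumption: nothing can be
/// unfailed while the deployment head is still behind the error block.
fn try_unfail(head: i64, error_block: i64) -> UnfailOutcome {
    if head >= error_block {
        UnfailOutcome::Unfailed
    } else {
        UnfailOutcome::Noop
    }
}

/// Indexes blocks `start..=end` and returns the block at which the
/// deployment was unfailed, if that happened at all.
fn first_unfailed_block(error_block: i64, start: i64, end: i64) -> Option<i64> {
    let mut should_try_unfail_non_deterministic = true;
    let mut unfailed_at = None;

    for head in start..=end {
        if should_try_unfail_non_deterministic {
            let outcome = try_unfail(head, error_block);
            // Noop keeps the flag set, so the next block tries again.
            should_try_unfail_non_deterministic = keep_trying_after(outcome);
            if outcome == UnfailOutcome::Unfailed {
                unfailed_at = Some(head);
            }
        }
    }

    unfailed_at
}
```

With N = 100 and a restart at block 97, first_unfailed_block(100, 97, 1100) retries at 97, 98, and 99, succeeds at 100, and then stops trying. The cost is one extra unfail attempt per block until the head reaches the error block.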

Relevant log output

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

  • Tick this box if this bug is caused by a regression found in the latest release.
  • Tick this box if this bug is specific to the hosted service.
  • I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

None

    Labels

    bug: Something isn't working
