Subscribe history snapshot races live fan-out (lost/duplicated events) #44

@nficano

Category: bug
Severity: major
Location: src/Arcp.Runtime/SessionState.Jobs.cs:80-120

What

History is snapshotted (line 83) before the subscription is registered (line 88). Any event the owner emits in that window is neither in the snapshot nor fanned out to the new subscriber, so it is lost. Conversely, events at the buffer tail can be both replayed and fanned out. The replay filter compares only against the client-supplied from_event_seq, not against what live fan-out has already delivered, so it cannot close the gap/duplicate window that the code comment itself acknowledges.

Evidence

var history = sub.History
    ? job.SnapshotEventHistory()
    : Array.Empty<Envelope>();
...
_server.Subscriptions.Subscribe(job.JobId, SessionId);
...
foreach (var historic in history)
{
    if (fromSeq is { } f && historic.EventSeq is { } seq && seq <= f) continue;

Proposed fix

Register the subscription first, under a lock that also captures the current high-water mark. Replay history strictly up to that mark, then release live events with seq beyond it. The boundary is then exact, with no lost or duplicated events.
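
A minimal sketch of that ordering, for discussion only. `JobEventLog`, `Subscriber`, `Emit`, `OnLive` and `CompleteReplay` are hypothetical stand-ins, not the actual Arcp.Runtime types, and the sketch assumes every emit and every subscribe for a job can go through one shared lock. Live events that arrive while replay is still running are buffered and flushed after it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one job's event buffer (not the real Arcp.Runtime code).
public sealed class JobEventLog
{
    private readonly object _gate = new object();
    private readonly List<(long Seq, string Payload)> _history = new List<(long Seq, string Payload)>();
    private readonly List<Subscriber> _subscribers = new List<Subscriber>();
    private long _lastSeq;

    // Appends an event and fans it out under the same lock Subscribe uses.
    public long Emit(string payload)
    {
        lock (_gate)
        {
            long seq = ++_lastSeq;
            _history.Add((seq, payload));
            foreach (var s in _subscribers) s.OnLive(seq, payload);
            return seq;
        }
    }

    // Registers first and captures the high-water mark in the same critical
    // section, so every event is either <= mark (replayed) or > mark (live).
    public Subscriber Subscribe(long? fromSeq, Action<long, string> deliver)
    {
        long after = fromSeq ?? 0;
        Subscriber sub;
        List<(long Seq, string Payload)> replay;
        lock (_gate)
        {
            long mark = _lastSeq;
            sub = new Subscriber(mark, deliver);
            _subscribers.Add(sub);
            replay = _history.FindAll(e => e.Seq > after && e.Seq <= mark);
        }
        sub.CompleteReplay(replay); // replay runs outside the log lock
        return sub;
    }
}

public sealed class Subscriber
{
    private readonly object _gate = new object();
    private readonly long _mark;
    private readonly Action<long, string> _deliver;
    private readonly List<(long Seq, string Payload)> _pending = new List<(long Seq, string Payload)>();
    private bool _replayDone;

    internal Subscriber(long mark, Action<long, string> deliver)
    {
        _mark = mark;
        _deliver = deliver;
    }

    // Called by Emit. Buffers live events until replay has finished.
    internal void OnLive(long seq, string payload)
    {
        lock (_gate)
        {
            if (seq <= _mark) return; // defensive: already covered by replay
            if (!_replayDone) { _pending.Add((seq, payload)); return; }
            _deliver(seq, payload);
        }
    }

    // Delivers history up to the mark, then any live events buffered meanwhile.
    internal void CompleteReplay(List<(long Seq, string Payload)> replay)
    {
        lock (_gate)
        {
            foreach (var e in replay) _deliver(e.Seq, e.Payload);
            foreach (var e in _pending) _deliver(e.Seq, e.Payload);
            _pending.Clear();
            _replayDone = true;
        }
    }
}
```

In this sketch the `deliver` callback runs while the subscriber lock is held, so it must not call back into `Emit` or `Subscribe`. A real implementation would more likely hand events to a queue or channel instead of invoking a callback inline.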

Acceptance criteria

  • A subscriber that attaches to a job receiving concurrent events sees each event exactly once, in order, with no gap at the subscribe boundary.
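
One way a regression test might check this criterion is to record the sequence numbers a subscriber receives and verify them afterwards. The helper below is a hypothetical sketch; `DeliveryCheck` is not part of the codebase. It assumes that a job's EventSeq values increase by exactly one per event, which the issue does not state. If sequence numbers can legitimately skip, only the duplicate and ordering check applies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical test helper (not part of Arcp.Runtime).
public static class DeliveryCheck
{
    // Returns true when `received` is exactly expectedFirst, expectedFirst + 1, ...
    // Otherwise returns false and describes the first violation in `problem`.
    public static bool IsExactlyOnceInOrder(
        IReadOnlyList<long> received, long expectedFirst, out string problem)
    {
        long expected = expectedFirst;
        foreach (long seq in received)
        {
            if (seq < expected)
            {
                problem = $"duplicate or out-of-order event {seq} (expected {expected})";
                return false;
            }
            if (seq > expected)
            {
                problem = $"gap: expected {expected}, got {seq}";
                return false;
            }
            expected++;
        }
        problem = "";
        return true;
    }
}
```

A test would attach a subscriber while another thread emits events, collect the received sequence numbers, and assert that `IsExactlyOnceInOrder` returns true for them.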
