Idempotent job.submit replay re-executes the job a second time (§7.2) #71

@nficano

Category: correctness
Severity: blocker
Location: src/Arcp.Runtime/SessionState.Jobs.cs:28-45, src/Arcp.Runtime/JobManager.cs:104-118
Spec: ARCP v1.1 §7.2

What

On a duplicate job.submit with the same idempotency_key and identical parameters, JobManager.SubmitAsync returns the existing Job, which is correct per §7.2. However, HandleJobSubmitAsync cannot distinguish a replay from a fresh submission because the return tuple carries no "is replay" flag, so it unconditionally resolves the agent and launches RunAsync again on a job that is already running or terminal. The agent body executes a second time, re-emitting all of its events plus a second terminal job.result/job.error. For a terminal job, the replay also resets the status back to Running (job.MarkRunning()), re-revokes credentials, and schedules a second terminal cleanup. §7.2 requires the runtime to return the same job.accepted for a replay, not to run the job twice.

Evidence

// JobManager.SubmitAsync — returns the existing job on idempotent hit:
if (_jobs.TryGetValue(existingRecord.JobId, out var existing))
{
    return (existing, BuildAccepted(existing));
}

// HandleJobSubmitAsync — always runs, even for a replayed existing job:
var submission = await _server.JobManager.SubmitAsync(...).ConfigureAwait(false);
var job = submission.Job;
...
await SendAsync(new Envelope { Type = MessageTypeNames.JobAccepted, ... }).ConfigureAwait(false);
var resolved = _server.AgentRegistry.Resolve(job.Agent).Agent;
_ = Task.Run(() => _server.JobManager.RunAsync(job, resolved, emit, _cts.Token), _cts.Token);

RunAsync then calls job.MarkRunning() on a job that may already be terminal (JobManager.cs:189). IdempotencyTests.Identical_retry_returns_existing_job_id only asserts that the returned JobId matches; it never asserts that the agent ran exactly once, so the double execution is not caught by the existing tests.

Proposed fix

  1. Have SubmitAsync signal a replay, e.g. return (Job Job, JobAcceptedPayload Accepted, bool IsReplay) (set IsReplay = true on the idempotent-hit early return at JobManager.cs:116).
  2. In HandleJobSubmitAsync, send job.accepted for a replay but skip Resolve/RunAsync when IsReplay is true.
  3. Add an integration test: submit the same key twice to an agent that increments a shared counter; assert the counter is 1 and exactly one terminal job.result is observed.
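The three steps above could look roughly like the following. This is a minimal, self-contained sketch and not the actual Arcp.Runtime code: JobManagerSketch, SessionSketch, Demo, the counters, and WhenIdleAsync are hypothetical stand-ins, the agent body is reduced to a counter increment, and the parameter-equality check that §7.2 replay detection also involves is omitted. It assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the runtime's types; only the shape matters here.
public sealed record Job(string JobId, string Agent);
public sealed record JobAcceptedPayload(string JobId);

public sealed class JobManagerSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<string, Job> _byKey = new();
    private int _nextId;

    // Step 1: the return tuple carries IsReplay so the caller can tell
    // an idempotent hit from a fresh submission.
    public Task<(Job Job, JobAcceptedPayload Accepted, bool IsReplay)> SubmitAsync(
        string idempotencyKey, string agent)
    {
        lock (_gate)
        {
            if (_byKey.TryGetValue(idempotencyKey, out var existing))
            {
                // Idempotent hit: same job, same accepted payload, flagged as replay.
                return Task.FromResult((existing, new JobAcceptedPayload(existing.JobId), true));
            }

            var job = new Job($"job-{++_nextId}", agent);
            _byKey[idempotencyKey] = job;
            return Task.FromResult((job, new JobAcceptedPayload(job.JobId), false));
        }
    }
}

public sealed class SessionSketch
{
    private readonly JobManagerSketch _jobs = new();
    private readonly List<Task> _runs = new();

    public int AcceptedSent; // number of job.accepted envelopes sent
    public int AgentRuns;    // number of times the agent body executed

    // Step 2: always answer with job.accepted, but only launch the run
    // for a fresh submission.
    public async Task HandleJobSubmitAsync(string idempotencyKey, string agent)
    {
        var submission = await _jobs.SubmitAsync(idempotencyKey, agent).ConfigureAwait(false);
        Interlocked.Increment(ref AcceptedSent);

        if (submission.IsReplay)
        {
            return; // replay: no Resolve, no RunAsync
        }

        // Stand-in for Task.Run(() => JobManager.RunAsync(...)).
        _runs.Add(Task.Run(() => Interlocked.Increment(ref AgentRuns)));
    }

    // Test helper: waits for every launched run to finish.
    public Task WhenIdleAsync() => Task.WhenAll(_runs);
}

public static class Demo
{
    // Step 3: submit the same key twice and report what happened.
    public static async Task<(int AgentRuns, int AcceptedSent)> RunAsync()
    {
        var session = new SessionSketch();
        await session.HandleJobSubmitAsync("key-1", "echo");
        await session.HandleJobSubmitAsync("key-1", "echo"); // replay
        await session.WhenIdleAsync();
        return (session.AgentRuns, session.AcceptedSent);
    }
}
```

In this sketch Demo.RunAsync() yields AgentRuns = 1 and AcceptedSent = 2: both submissions are acknowledged, but the agent body runs only for the first one. A real integration test would additionally count terminal job.result envelopes on the wire.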

Acceptance criteria

  • A duplicate idempotent submit returns job.accepted without invoking the agent a second time.
  • Exactly one terminal event is emitted across both submissions.
  • A terminal job's status is never reset to Running by a replay.
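The last criterion could also be enforced at the state level, as a defensive complement to the IsReplay flag rather than a replacement for it. The sketch below is an assumption about how that might look, not the existing Job implementation: JobStateSketch, TryMarkRunning, TryComplete, and the JobStatus member names are all hypothetical. It uses a compare-and-swap so that only a Pending job can ever become Running.

```csharp
using System.Threading;

// Hypothetical status set; the real runtime may define different members.
public enum JobStatus { Pending = 0, Running = 1, Succeeded = 2, Failed = 3 }

public sealed class JobStateSketch
{
    private int _status = (int)JobStatus.Pending;

    public JobStatus Status => (JobStatus)Volatile.Read(ref _status);

    // Atomically moves Pending -> Running. Returns false and leaves the
    // status untouched if the job is already running or terminal.
    public bool TryMarkRunning() =>
        Interlocked.CompareExchange(
            ref _status, (int)JobStatus.Running, (int)JobStatus.Pending)
        == (int)JobStatus.Pending;

    // Atomically moves Running -> Succeeded/Failed. Returns false if the
    // job was not running, so a second terminal transition is rejected.
    public bool TryComplete(bool success)
    {
        var target = (int)(success ? JobStatus.Succeeded : JobStatus.Failed);
        return Interlocked.CompareExchange(
                   ref _status, target, (int)JobStatus.Running)
               == (int)JobStatus.Running;
    }
}
```

With a guard of this kind, a stray second call to TryMarkRunning() on a completed job returns false instead of silently resetting the status, which gives the caller a clear point at which to bail out before re-emitting events.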

Labels: audit/bug (Audit: bug / inefficiency), sev/blocker (Severity: blocker)
