Server-rejected job.submit / session.list_jobs never reaches the awaiting caller — the client hangs #73

@nficano

Category: correctness
Severity: major
Location: src/Arcp.Client/ArcpClient.Dispatch.cs:131-143, src/Arcp.Client/ArcpClient.Jobs.cs:37,79-90
Spec: ARCP v1.1 §12

What

When the runtime rejects a submission (e.g. DUPLICATE_KEY, AGENT_NOT_AVAILABLE, LEASE_SUBSET_VIOLATION, INVALID_REQUEST) it emits a session.error. On the client, PropagateSessionError only fails handles already present in _handles, but a not-yet-accepted submission lives in _pendingSubmits, and a pending list_jobs lives in _listJobsRequests. Neither collection is touched. Consequently, SubmitAsync (awaiting handle.Accepted) and ListJobsAsync (awaiting its TCS) never complete and hang until the caller's CancellationToken fires.

IdempotencyTests.Mismatched_input_with_same_key_raises_session_error encodes this bug as expected behavior (it asserts that the submit task does not finish), which masks it.

Evidence

private void PropagateSessionError(SessionErrorPayload err)
{
    foreach (var h in _handles.Values)   // pending submits + list_jobs requests are NOT here
    {
        h.OnError(new JobErrorPayload { Code = err.Code, Message = err.Message, ... });
    }
}

// SubmitAsync waits forever if the submit is rejected before acceptance:
await handle.Accepted.WaitAsync(cancellationToken).ConfigureAwait(false);
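
To make the hang concrete, here is a minimal standalone sketch of the waiting pattern. It is not the client's code: HangDemo and AwaitNeverCompletedAsync are hypothetical names, and the TaskCompletionSource stands in for whatever backs handle.Accepted. It assumes .NET 6 or later for Task.WaitAsync. Because nothing ever completes the source, the await can only end through cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class HangDemo
{
    // Awaits a completion source that nobody completes, mirroring a submit whose
    // rejection is never routed to the pending request (hypothetical stand-in).
    public static async Task<string> AwaitNeverCompletedAsync(TimeSpan timeout)
    {
        var accepted = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // The only way out is the caller's token, here driven by a timeout.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await accepted.Task.WaitAsync(cts.Token).ConfigureAwait(false);
            return "completed";
        }
        catch (OperationCanceledException)
        {
            // The server's error code is lost; the caller only sees a cancellation.
            return "cancelled";
        }
    }
}
```

In this sketch the caller learns nothing about why the request failed, which is the same symptom described above for a rejected submit.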

Proposed fix

  1. In PropagateSessionError, also fault every pending submit and list-jobs request: drain _pendingSubmits calling OnError/setting an exception on each, and TrySetException on every _listJobsRequests TCS. (Because session.error is not correlated to a specific request id, failing all outstanding requests is the safest contract; alternatively, have the server echo the offending envelope id and correlate.)
  2. Fix Mismatched_input_with_same_key_raises_session_error to assert SubmitAsync throws DuplicateKeyException rather than hangs.
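
A rough sketch of step 1, under stated assumptions: the real types behind _pendingSubmits and _listJobsRequests are not shown in this issue, so both are modeled here as dictionaries from a request id to a TaskCompletionSource. PendingRequestTable, RegisterSubmit, and RegisterListJobs are hypothetical names, and this ArcpException is a stand-in that may not match the client's actual exception type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in; the real ArcpException may have a different shape.
public sealed class ArcpException : Exception
{
    public string Code { get; }
    public ArcpException(string code, string message) : base(message) => Code = code;
}

public sealed class PendingRequestTable
{
    // Assumed shapes: request id -> completion source for the awaiting caller.
    private readonly ConcurrentDictionary<string, TaskCompletionSource<bool>> _pendingSubmits = new();
    private readonly ConcurrentDictionary<string, TaskCompletionSource<string[]>> _listJobsRequests = new();

    public Task<bool> RegisterSubmit(string requestId)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pendingSubmits[requestId] = tcs;
        return tcs.Task;
    }

    public Task<string[]> RegisterListJobs(string requestId)
    {
        var tcs = new TaskCompletionSource<string[]>(TaskCreationOptions.RunContinuationsAsynchronously);
        _listJobsRequests[requestId] = tcs;
        return tcs.Task;
    }

    // session.error is not correlated to a request id, so fault everything outstanding.
    public void PropagateSessionError(string code, string message)
    {
        var error = new ArcpException(code, message);

        // Keys returns a snapshot, so removing entries while iterating is safe.
        foreach (var id in _pendingSubmits.Keys)
        {
            if (_pendingSubmits.TryRemove(id, out var submit))
                submit.TrySetException(error);
        }

        foreach (var id in _listJobsRequests.Keys)
        {
            if (_listJobsRequests.TryRemove(id, out var listJobs))
                listJobs.TrySetException(error);
        }
    }
}
```

Removing each entry before faulting it keeps a late server response from completing the same request twice, and TrySetException tolerates a request that was already cancelled by the caller.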

Acceptance criteria

  • A submit rejected by the server causes SubmitAsync to throw an ArcpException carrying the server error code.
  • ListJobsAsync throws (not hangs) when the server returns session.error.

Labels: audit/bug (Audit: bug / inefficiency), sev/major (Severity: major)