Telemetry gaps: 85% errors unclassified, missing session metadata, no doom loop protection #586

@anandgupta42

Problem

Deep analysis of altimate-code-os AppInsights telemetry (10-day window) revealed:

  1. 85%+ of core_failure events are classified as "unknown". The top unclassified patterns are file_not_found (334), edit_mismatch (152), and not_configured errors, none of which the current patterns cover.
  2. session_start is missing OS/arch/node metadata, so users can't be segmented by environment.
  3. No protection against tool call loops: one user called todowrite 2,080 times in a single session, and the doom loop detector only catches 3 identical calls.
  4. Token tracking appears broken: tokens are in customMeasurements but analysts query customDimensions, and Anthropic's tokens_input excludes cached tokens (showing 2-3 input tokens for Opus).
  5. "moat" terminology in test files is inappropriate naming that suggests data mining.
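
To make item 1 concrete, here is a minimal sketch of a pattern table that could cover the unclassified errors. The category names come from this issue. The regexes, the `classifyError` function, and the message wording they match are all hypothetical and are not the project's actual classifier.

```typescript
// Hypothetical pattern table. Category names follow the issue text;
// the regexes are illustrative guesses at typical error messages.
type ErrorClass =
  | "file_not_found"
  | "edit_mismatch"
  | "driver_not_installed"
  | "not_configured"
  | "out_of_memory"
  | "unknown";

const ERROR_PATTERNS: Array<{ cls: ErrorClass; re: RegExp }> = [
  { cls: "file_not_found", re: /ENOENT|no such file|file not found/i },
  { cls: "edit_mismatch", re: /old_?string not found|edit mismatch/i },
  { cls: "driver_not_installed", re: /driver .*not installed/i },
  { cls: "not_configured", re: /not configured/i },
  { cls: "out_of_memory", re: /out of memory|heap limit/i },
];

// First matching pattern wins; anything unmatched stays "unknown".
function classifyError(message: string): ErrorClass {
  for (const { cls, re } of ERROR_PATTERNS) {
    if (re.test(message)) return cls;
  }
  return "unknown";
}
```

Ordering matters in a table like this: more specific patterns (such as the driver check) should sit above broader ones so they are not shadowed.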

Expected

  • Error classification covers the most common patterns (file not found, edit mismatch, driver not installed, OOM)
  • Session metadata enables environment segmentation
  • Doom loop protection catches varied-input loops (not just identical calls)
  • Token tracking includes total input tokens (with cache) for Anthropic
  • No "moat" references in codebase
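
One possible shape for varied-input loop protection is a sliding-window counter keyed on tool name only, so a tool called repeatedly with different arguments still trips the guard. This is a sketch under assumptions: the `ToolLoopGuard` class and its default thresholds are hypothetical and would need tuning against real sessions.

```typescript
// Hypothetical guard: counts calls per tool name and ignores arguments,
// so loops with varied input are caught as well as identical calls.
class ToolLoopGuard {
  private recent: string[] = [];
  private readonly windowSize: number; // assumed: number of recent calls inspected
  private readonly maxPerTool: number; // assumed: allowed calls to one tool per window

  constructor(windowSize: number = 50, maxPerTool: number = 30) {
    this.windowSize = windowSize;
    this.maxPerTool = maxPerTool;
  }

  // Records a call; returns true when this tool exceeds its share of the window.
  record(toolName: string): boolean {
    this.recent.push(toolName);
    if (this.recent.length > this.windowSize) this.recent.shift();
    const count = this.recent.filter((name) => name === toolName).length;
    return count > this.maxPerTool;
  }
}
```

A session that alternates between several tools stays under the limit, while a run like the 2,080 todowrite calls described above would be flagged as soon as one tool fills more than `maxPerTool` slots of the window.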
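
For the token item, a minimal sketch of computing total input tokens from an Anthropic usage object. The Messages API reports cache writes and cache reads in separate fields from `input_tokens`, so the full prompt size is the sum of all three. The `totalInputTokens` helper name is hypothetical, and how the result is stored in telemetry is left open.

```typescript
// Subset of the `usage` object returned by the Anthropic Messages API.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Total prompt size = uncached input + cache writes + cache reads.
function totalInputTokens(usage: AnthropicUsage): number {
  return (
    usage.input_tokens +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0)
  );
}
```

With a mostly cached prompt, `input_tokens` alone can be a single-digit number, which would be consistent with the 2-3 input tokens observed for Opus.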
