Skip to content

ADFA-3911 | Fix widget tag parsing and centralize OCR correction rules#1310

Merged
jatezzz merged 3 commits into
stagefrom
fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental
May 15, 2026
Merged

ADFA-3911 | Fix widget tag parsing and centralize OCR correction rules#1310
jatezzz merged 3 commits into
stagefrom
fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental

Conversation

@jatezzz
Copy link
Copy Markdown
Collaborator

@jatezzz jatezzz commented May 14, 2026

Description

Fixed an issue where Button widget values were being mixed up with Switch widget values due to improper parsing of separator characters. Additionally, centralized and expanded OCR sanitization rules to better handle common misreads (such as "bsld" or "text stjle" to "text_style") to improve overall attribute parsing accuracy.

Details

  • WidgetTagParser.kt: Added the dot (.) character to the separator regex group in tagExtractRegex to correctly parse and isolate widget tags, preventing Button/Switch collision.
  • OcrSanitizationRules.kt: Created a new dedicated file containing a list of OcrCorrectionRule to centralize regex patterns and replacements for colors, dimensions, margins, and text attributes.
  • FuzzyAttributeParser.kt: Refactored tokenizeAnnotation to fold over the new preTokenizationRules list instead of using hardcoded chained string replacements.
  • MarginAnnotationParser.kt: Updated margin parsing logic to safely concatenate overlapping right margin annotations using a pipe (|) delimiter instead of overwriting existing map keys.
Screen.Recording.2026-05-14.at.11.56.39.AM.mov

Ticket

ADFA-3911

Observation

Moving the OCR sanitization rules into their own file (OcrSanitizationRules.kt) significantly cleans up the FuzzyAttributeParser and makes it much easier to scale and maintain new OCR correction patterns as more edge cases are discovered in the future.

@jatezzz jatezzz requested review from a team, Daniel-ADFA and avestaadfa May 14, 2026 20:22
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Review Change Stack

📝 Walkthrough

Release Notes

Features & Improvements

  • Widget Tag Parsing Fix: Updated WidgetTagParser regex to include dot (.) character in separator patterns, fixing incorrect parsing that caused Button widget values to be mixed with Switch widget values
  • Centralized OCR Correction Rules: Introduced new OcrSanitizerRules.kt with dedicated sanitizer classes for common OCR misreads:
    • ColorSanitizer: Normalizes background color variants
    • TextAttributeSanitizer: Fixes text style attribute misreads (e.g., "text stjle" → "text_style")
    • DimensionSanitizer: Corrects width/height/match_parent OCR errors
    • MarginPaddingSanitizer: Normalizes margin and padding direction phrases
    • StructureSanitizer: Fixes horizontal gravity center layout patterns
  • Streamlined OCR Sanitization: Refactored FuzzyAttributeParser to delegate OCR cleanup to centralized OcrSanitizer infrastructure instead of hardcoded replacements
  • Improved Margin Annotation Handling: MarginAnnotationParser now processes all margin data in a single global pass and concatenates overlapping right margin annotations using pipe (|) delimiter to preserve all margin data

Technical Changes

  • Added OcrSanitizer interface and DictionaryRegexSanitizer abstract base class for composable OCR rules
  • Implemented CompositeOcrSanitizer to apply multiple sanitizers sequentially
  • Added OcrSanitizerFactory to create and manage default sanitizer instances
  • Comprehensive test coverage in OcrSanitizerRulesTest validating all new sanitizer behaviors

⚠️ Risks & Breaking Changes

  • Data Format Change: Overlapping margin annotations now concatenate with " | " delimiter instead of overwriting values. Downstream consumers must handle multi-value margin strings or update parsing logic
  • Regex Pattern Modification: Widget tag parsing regex changes may affect edge cases with dot-containing widget identifiers—recommend testing against existing widget datasets
  • OCR Correction Ordering: Sequential application of sanitizers matters; changes to sanitizer ordering or rules in OcrSanitizerFactory could alter output unpredictably
  • Increased Complexity: New sanitizer infrastructure adds multiple abstraction layers; ensure maintainability is prioritized as rules grow

Walkthrough

Adds a composable OCR sanitizer framework and rules, integrates the default sanitizer into FuzzyAttributeParser, tightens WidgetTagParser tag extraction/validation, and refactors MarginAnnotationParser to resolve and merge margin annotations in a single global pass.

Changes

OCR parsing infrastructure and sanitization framework

Layer / File(s) Summary
OCR sanitizer framework setup
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt, CompositeOcrSanitizer.kt, OcrSanitizerFactory.kt, OcrSanitizerRules.kt
OcrSanitizer interface and DictionaryRegexSanitizer base class establish regex-replacement sanitizers. CompositeOcrSanitizer composes multiple sanitizers. OcrSanitizerFactory builds a default composite from ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer, MarginPaddingSanitizer, and StructureSanitizer.
FuzzyAttributeParser sanitizer integration
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt
FuzzyAttributeParser obtains a default sanitizer from OcrSanitizerFactory and applies sanitizer.sanitize(annotation) before tokenization, replacing inline regex cleanup.
WidgetTagParser extraction and validation
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
tagExtractRegex now captures prefix, separator, and token; parseTagParts returns nullable parsed parts and validates prefixes against VALID_PREFIXES. normalizeTagText and extractTag use parsed parts with uppercase fallback.
MarginAnnotationParser unified margin handling
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
parse now delegates to parseMarginsGlobally which pools explicit blocks from both margins, resolves implicit blocks against the shared canvasTags pool, and merges annotations; duplicate widget values are joined with "
Sanitizer rules tests
cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt
Unit tests validate each sanitizer class and a test confirms the default composite sanitizer applies multiple rules in order on combined input.

Sequence Diagram(s)

sequenceDiagram
  participant FA as FuzzyAttributeParser
  participant OSF as OcrSanitizerFactory
  participant COS as CompositeOcrSanitizer
  participant CS as ColorSanitizer
  participant TS as TextAttributeSanitizer
  participant DS as DimensionSanitizer
  participant MPS as MarginPaddingSanitizer
  participant SS as StructureSanitizer
  FA->>OSF: createDefaultSanitizer()
  OSF->>COS: new CompositeOcrSanitizer([CS, TS, DS, MPS, SS])
  COS-->>FA: sanitizer instance
  FA->>COS: sanitize(annotation)
  COS->>CS: sanitize(annotation)
  CS-->>COS: color-normalized
  COS->>TS: sanitize(previous)
  TS-->>COS: text-style-normalized
  COS->>DS: sanitize(previous)
  DS-->>COS: dimension-normalized
  COS->>MPS: sanitize(previous)
  MPS-->>COS: margin-padding-normalized
  COS->>SS: sanitize(previous)
  SS-->>COS: structure-normalized
  COS-->>FA: fully sanitized annotation
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • avestaadfa
  • Daniel-ADFA

"I'm a rabbit with a tidy mind,
I hop through OCR crumbs I find.
I stitch punctuation, color, size,
and join the margins, neat and wise.
Hooray — clean tags and saner lines!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 19.05% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: widget tag parsing fix and centralized OCR correction rules.
Description check ✅ Passed The description provides detailed explanation of all major changes across four files, including the specific issues fixed and the rationale for centralization.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt (1)

5-15: ⚡ Quick win

Consider documenting the sanitizer ordering rationale.

The order in which sanitizers are applied may affect the final output if rules overlap. Consider adding a comment explaining why this specific sequence (Color → TextAttribute → Dimension → MarginPadding → Structure) was chosen.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt`
around lines 5 - 15, Add a brief comment above createDefaultSanitizer explaining
the rationale for the sanitizer application order: why ColorSanitizer runs
before TextAttributeSanitizer, then DimensionSanitizer, MarginPaddingSanitizer,
and finally StructureSanitizer; reference CompositeOcrSanitizer and the list of
sanitizers (ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer,
MarginPaddingSanitizer, StructureSanitizer) and describe any dependencies or
side-effects that justify the sequence (e.g., color normalization first so later
attribute merging/structure decisions operate on normalized values).
cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt (2)

8-10: ⚡ Quick win

Avoid duplicating the valid-prefix list between tagRegex and VALID_PREFIXES.

tagRegex (line 8) and VALID_PREFIXES (line 10) both encode the same prefix set (B|P|D|T|C|R|SW|S). A future addition (e.g., a new widget prefix) is easy to apply to one and forget the other, silently desynchronizing extraction validation from matches(tagRegex) checks. Consider deriving tagRegex from VALID_PREFIXES so they stay in lockstep.

♻️ Proposed derivation
-    private val tagRegex = Regex("^(?i)(B|P|D|T|C|R|SW|S)-[A-Z0-9_]+$")
-    private val tagExtractRegex = Regex("^(?i)([A-Z0-9\\s]+)([\\s\\-_.]+)([A-Z0-9_\\-]+)")
-    private val VALID_PREFIXES = setOf("B", "P", "D", "T", "C", "R", "SW", "S")
+    private val VALID_PREFIXES = setOf("B", "P", "D", "T", "C", "R", "SW", "S")
+    private val tagRegex = Regex(
+        "^(?i)(${VALID_PREFIXES.joinToString("|")})-[A-Z0-9_]+$"
+    )
+    private val tagExtractRegex = Regex("^(?i)([A-Z0-9\\s]+)([\\s\\-_.]+)([A-Z0-9_\\-]+)")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`
around lines 8 - 10, The prefix list is duplicated between VALID_PREFIXES and
the hardcoded tagRegex; update the implementation so tagRegex is constructed
from VALID_PREFIXES instead of repeating the literals: build a regex alternation
by joining VALID_PREFIXES (sort by length desc to ensure multi-char prefixes
like "SW" match before "S"), escape any regex-special chars if needed, prepend
the case-insensitive flag (?i) and then compile that into tagRegex; keep
tagExtractRegex unchanged but ensure any logic that relies on the prefix set
uses VALID_PREFIXES (or the dynamically-built regex) so the two stay in sync.

75-79: 💤 Low value

isValidTagMatch is now effectively dead-code-protected.

With the updated tagExtractRegex, group 2 is [\s\-_.]+ (one or more), so separator.isEmpty() can never be true when the regex matches. The guard therefore always returns true, and the three call sites (isTag, normalizeTagText, extractTag) perform a no-op check. Either tighten the validation to do something meaningful (e.g., reject single-character ambiguous separators that aren't - _ .), or drop the helper to reduce noise.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`
around lines 75 - 79, The current isValidTagMatch guard is useless because group
2 is always non-empty; update isValidTagMatch to reject ambiguous
single-character separators (e.g., a single whitespace or other punctuation)
while allowing "-", "_" and ".". Concretely, in isValidTagMatch(match:
MatchResult) inspect val separator = match.groupValues[2] and val rawToken =
match.groupValues[3] and return false when separator.length == 1 && separator
!in listOf("-", "_", ".") && rawToken.firstOrNull()?.isLetter() == true;
otherwise return true. This keeps the three call sites (isTag, normalizeTagText,
extractTag) meaningful by filtering out tokens that start with a letter preceded
by an ambiguous one-char separator.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt`:
- Around line 27-33: The merge is allowing the same canvas tag to be assigned by
both parseMarginGroup(distribution.leftMargin, canvasTags) and
parseMarginGroup(distribution.rightMargin, canvasTags) because splitCanvasTags
was removed and each resolveImplicitBlocks maintains independent
unresolvedTagsByPrefix; fix by reintroducing shared consumption or re-splitting:
either (A) restore splitCanvasTags and pass side-specific tag lists into
parseMarginGroup so left/right only see their half, or (B) keep the unified
canvasTags but add a shared consumedTags set that both parseMarginGroup calls
consult/mark and have resolveImplicitBlocks check against (and against
existingAnnotations across both margins) before popping a tag; update
annotationMap merging logic only if you intentionally want multi-side
annotations.

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt`:
- Around line 17-23: Add unit tests that exercise the OCR sanitizer
implementations (e.g., DimensionSanitizer, ColorSanitizer and any other classes
extending DictionaryRegexSanitizer) to assert that their rawRules produce the
expected normalized strings for common OCR misreads; specifically, create tests
that instantiate each sanitizer (or call the public normalization method on
DictionaryRegexSanitizer if available), feed representative malformed inputs
like "ilayout wioidth", "layout hei5ht.", "mwa tch-parent" and assert outputs
equal "layout_width:", "layout_height:", "match_parent" (and analogous expected
results for ColorSanitizer), and include negative/edge cases to ensure rules
don't overreplace. Ensure tests reference the class names DimensionSanitizer and
ColorSanitizer (or other sanitizer class names) so failures point directly to
the right implementation.

---

Nitpick comments:
In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt`:
- Around line 5-15: Add a brief comment above createDefaultSanitizer explaining
the rationale for the sanitizer application order: why ColorSanitizer runs
before TextAttributeSanitizer, then DimensionSanitizer, MarginPaddingSanitizer,
and finally StructureSanitizer; reference CompositeOcrSanitizer and the list of
sanitizers (ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer,
MarginPaddingSanitizer, StructureSanitizer) and describe any dependencies or
side-effects that justify the sequence (e.g., color normalization first so later
attribute merging/structure decisions operate on normalized values).

In
`@cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt`:
- Around line 8-10: The prefix list is duplicated between VALID_PREFIXES and the
hardcoded tagRegex; update the implementation so tagRegex is constructed from
VALID_PREFIXES instead of repeating the literals: build a regex alternation by
joining VALID_PREFIXES (sort by length desc to ensure multi-char prefixes like
"SW" match before "S"), escape any regex-special chars if needed, prepend the
case-insensitive flag (?i) and then compile that into tagRegex; keep
tagExtractRegex unchanged but ensure any logic that relies on the prefix set
uses VALID_PREFIXES (or the dynamically-built regex) so the two stay in sync.
- Around line 75-79: The current isValidTagMatch guard is useless because group
2 is always non-empty; update isValidTagMatch to reject ambiguous
single-character separators (e.g., a single whitespace or other punctuation)
while allowing "-", "_" and ".". Concretely, in isValidTagMatch(match:
MatchResult) inspect val separator = match.groupValues[2] and val rawToken =
match.groupValues[3] and return false when separator.length == 1 && separator
!in listOf("-", "_", ".") && rawToken.firstOrNull()?.isLetter() == true;
otherwise return true. This keeps the three call sites (isTag, normalizeTagText,
extractTag) meaningful by filtering out tokens that start with a letter preceded
by an ambiguous one-char separator.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c1d0639a-5974-4ad6-96df-f9c177f786ae

📥 Commits

Reviewing files that changed from the base of the PR and between 7736433 and 832593e.

📒 Files selected for processing (7)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/CompositeOcrSanitizer.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt

@jatezzz jatezzz force-pushed the fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental branch from 62cb964 to 7fc9832 Compare May 15, 2026 14:10
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt (1)

8-67: ⚡ Quick win

Consider adding edge case tests.

The current tests verify the main sanitization paths well. Consider adding tests for:

  • Empty/blank strings (verify graceful handling)
  • Already-correct input (verify sanitizers don't break valid data)
  • Idempotency (verify sanitizer.sanitize(sanitizer.sanitize(x)) == sanitizer.sanitize(x))

These would increase confidence that sanitizers handle unexpected input safely and won't corrupt valid data.

📋 Example edge case tests
`@Test`
fun `sanitizers handle empty input gracefully`() {
    val sanitizers = listOf(
        ColorSanitizer(),
        TextAttributeSanitizer(),
        DimensionSanitizer()
    )
    
    sanitizers.forEach { sanitizer ->
        assertEquals("", sanitizer.sanitize(""))
    }
}

`@Test`
fun `sanitizers preserve already-correct input`() {
    val sanitizer = OcrSanitizerFactory.createDefaultSanitizer()
    val validInput = "background red | text_style: bold | layout_width: match_parent"
    
    assertEquals(validInput, sanitizer.sanitize(validInput))
}

`@Test`
fun `sanitizers are idempotent`() {
    val sanitizer = OcrSanitizerFactory.createDefaultSanitizer()
    val input = "backgroundired | text stjle"
    
    val firstPass = sanitizer.sanitize(input)
    val secondPass = sanitizer.sanitize(firstPass)
    
    assertEquals(firstPass, secondPass)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt`
around lines 8 - 67, Add edge-case unit tests to the existing
OcrSanitizerRulesTest: verify empty/blank input handling by calling sanitize("")
on ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer (and others) and
expecting "" (or the defined safe output); add a test that passes an
already-correct string to
OcrSanitizerFactory.createDefaultSanitizer().sanitize(...) and asserts it
remains unchanged; add an idempotency test that runs sanitizer.sanitize(input)
twice and asserts the first and second results are equal using the default
sanitizer; use the existing classes/methods (ColorSanitizer,
TextAttributeSanitizer, DimensionSanitizer, MarginPaddingSanitizer,
StructureSanitizer, OcrSanitizerFactory.createDefaultSanitizer(),
sanitizer.sanitize) so tests live alongside the current cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt`:
- Line 16: Fix inconsistent indentation in OcrSanitizerRulesTest.kt by replacing
tab characters with 4 space characters for the annotated test lines;
specifically update the `@Test` annotation and surrounding lines where tabs are
used (lines containing the `@Test` marker and the subsequent test method
declarations in OcrSanitizerRulesTest) so they use spaces only—ensure
occurrences around the `@Test` annotation and the related test method
signatures/body (the blocks referenced at lines 16, 24-25, 32-33, 41, and 59 in
the diff) are converted to 4-space indentation to match the rest of the file.

---

Nitpick comments:
In
`@cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt`:
- Around line 8-67: Add edge-case unit tests to the existing
OcrSanitizerRulesTest: verify empty/blank input handling by calling sanitize("")
on ColorSanitizer, TextAttributeSanitizer, DimensionSanitizer (and others) and
expecting "" (or the defined safe output); add a test that passes an
already-correct string to
OcrSanitizerFactory.createDefaultSanitizer().sanitize(...) and asserts it
remains unchanged; add an idempotency test that runs sanitizer.sanitize(input)
twice and asserts the first and second results are equal using the default
sanitizer; use the existing classes/methods (ColorSanitizer,
TextAttributeSanitizer, DimensionSanitizer, MarginPaddingSanitizer,
StructureSanitizer, OcrSanitizerFactory.createDefaultSanitizer(),
sanitizer.sanitize) so tests live alongside the current cases.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e50bdc0b-c4a0-4292-a4c9-2f81aeb70a98

📥 Commits

Reviewing files that changed from the base of the PR and between 832593e and 7ee68a2.

📒 Files selected for processing (8)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/CompositeOcrSanitizer.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt
  • cv-image-to-xml/src/test/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRulesTest.kt
🚧 Files skipped from review as they are similar to previous changes (7)
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerFactory.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/CompositeOcrSanitizer.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizer.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/MarginAnnotationParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/WidgetTagParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/FuzzyAttributeParser.kt
  • cv-image-to-xml/src/main/java/org/appdevforall/codeonthego/computervision/domain/parser/sanitizer/OcrSanitizerRules.kt

@jatezzz jatezzz merged commit 6b65621 into stage May 15, 2026
2 checks passed
@jatezzz jatezzz deleted the fix/ADFA-3911-widget-tag-parsing-and-ocr-rules-experimental branch May 15, 2026 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants