Add grammar checks to developer guide CI by shai-almog · Pull Request #5002 · codenameone/CodenameOne

shai-almog · 2026-05-22T03:33:49Z

Summary

Catches the class of bug from #5000 (a paragraph rendered as "many ways to animate..." with no leading subject), which the existing Vale config could not detect. Vale is a style/regex linter; its asciidoc tokenizer fragments paragraphs at inline markup (#kbd#, code spans, links), so anchored regexes match on every text run between inline elements rather than only at real paragraph starts. Confirmed locally: running the broken sentence through Vale 3.14 with this repo's .vale.ini produces zero alerts.

Two new checks run inside the existing developer-guide-docs workflow.

Hard gate (build-failing)

scripts/developer-guide/check_paragraph_capitalization.rb walks every paragraph block via asciidoctor's parser and flags ones whose first prose word starts lowercase.
Skips paragraphs that begin with <code>/<kbd>/<a>/<img> elements, code-like identifiers (com.foo.Bar, iosScrollMotionBool), and single-word transitional connectors between code blocks (becomes, to, and).
docs/developer-guide/paragraph-capitalization-baseline.json locks in 107 pre-existing findings so this PR doesn't snowball into a 200-paragraph rewrite. Only new violations fail CI. Maintainers can fix baseline entries over time and regenerate with --update-baseline.
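
The hard-gate rules above can be illustrated with a minimal Python sketch. This is not the PR's Ruby script, which walks asciidoctor's parsed blocks; it assumes each paragraph is already available as a rendered HTML fragment, and the names `offending_word`, `CONNECTORS`, and `CODE_LIKE` are hypothetical.

```python
import re
from typing import Optional

# Assumed skip list; the PR names "becomes", "to", and "and" as examples.
CONNECTORS = {"becomes", "to", "and"}

# Paragraphs whose rendered HTML starts with one of these elements are skipped.
LEADING_ELEMENT = re.compile(r"^\s*<(code|kbd|a|img)\b", re.IGNORECASE)

# Code-like first tokens: dotted names (com.foo.Bar) or camelCase (iosScrollMotionBool).
CODE_LIKE = re.compile(r"^(?:\w+(?:\.\w+)+|[a-z]+[A-Z]\w*)$")

# Crude tag stripper, good enough for a sketch.
TAG = re.compile(r"<[^>]+>")


def offending_word(paragraph_html: str) -> Optional[str]:
    """Return the first prose word if it starts lowercase, else None."""
    if LEADING_ELEMENT.match(paragraph_html):
        return None  # paragraph opens with code/kbd/link/image
    words = TAG.sub("", paragraph_html).split()
    if not words:
        return None
    first = words[0].strip(".,:;!?()\"'")
    if not first or not first[0].isalpha() or not first[0].islower():
        return None  # capitalized, numeric, or punctuation-only start
    if CODE_LIKE.match(first):
        return None  # identifier used as the sentence subject
    if len(words) == 1 and first in CONNECTORS:
        return None  # single-word connector between code blocks
    return first
```

A caller would run this over every paragraph, collect the non-`None` results with their file and line, and exit non-zero if any were found.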

Soft gate (advisory)

scripts/developer-guide/run_languagetool.py strips the rendered HTML to plain text and runs LanguageTool via language-tool-python.
The JSON report is uploaded as the developer-guide-languagetool artifact. The check never fails the build: LanguageTool's false-positive rate on technical prose is too high to enforce, but its findings are a useful spot-check signal in PR review.
On the current guide it flagged 514 UPPERCASE_SENTENCE_START matches, confirming it catches the same class of bug as the hard gate.
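
The HTML-stripping step of the soft gate can be sketched with the Python standard library alone. This is an assumed approach, not the contents of run_languagetool.py: the class and function names are hypothetical, and dropping `<pre>`, `<script>`, and `<style>` content is a choice made here so that code listings are not grammar-checked.

```python
from html.parser import HTMLParser

# Assumed list of block-level tags that end a paragraph in the plain-text output.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "pre", "tr"}
# Assumed list of tags whose contents are not prose and are dropped entirely.
SKIP_TAGS = {"script", "style", "pre"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")  # paragraph boundary

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return prose paragraphs separated by blank lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse whitespace inside each paragraph and drop empty ones.
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n")]
    return "\n\n".join(p for p in paragraphs if p)
```

Keeping blank lines between paragraphs gives later steps a natural boundary to split on.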

Test plan

Re-introduce the bug from PR #5000 (Update Animations.asciidoc) locally → script exits 1 and names the file, line, and offending word.
Restore the fix → script exits 0 with "0 new finding(s)".
python3 -c "import yaml; yaml.safe_load(...)" confirms the workflow YAML parses.
CI passes on this PR (will be visible once the workflow runs).

🤖 Generated with Claude Code

Catches the class of bug from #5000 (paragraph rendered as "many ways to animate..." with no leading subject) which the existing Vale config could not detect. Vale is a style/regex linter and its asciidoc tokenizer fragments paragraphs at inline markup, so anchored regexes match on every text run between inline elements rather than only at real paragraph starts.

Two new checks run in the existing developer-guide-docs workflow:

- Hard gate (build-failing): a Ruby script using the asciidoctor parser walks every paragraph block and flags those whose first prose word starts lowercase. Skips paragraphs that begin with code/kbd/link/image elements, code-like identifiers (com.foo.Bar, iosScrollMotionBool), and single-word transitional connectors between code blocks. A baseline file locks in 107 pre-existing findings; only new violations fail CI.
- Soft gate (advisory): a Python wrapper around LanguageTool (via language-tool-python) strips the rendered HTML to plain text, runs the grammar pass, and uploads the JSON report as a CI artifact. Never fails the build: LanguageTool's false-positive rate on technical prose is too high to enforce, but its findings are useful for review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-22T03:41:22Z

Developer Guide build artifacts are available for download from this workflow run:

Developer Guide quality checks:

AsciiDoc linter: No issues found (report)
Vale: No alerts found (report)
Paragraph capitalization: No paragraph capitalization issues (report)
LanguageTool (advisory): 7608 advisory match(es) — top: CONSECUTIVE_SPACES (2376), MORFOLOGIK_RULE_EN_US (2297), COMMA_PARENTHESIS_WHITESPACE (835) (report)
Image references: No unused images detected (report)

github-actions · 2026-05-22T03:52:18Z

Cloudflare Preview

URL: https://pr-5002-website-preview.codenameone.pages.dev
Branch: pr-5002-website-preview

Three follow-up fixes after the first CI run on this branch:

- Vale's write-good.ThereIs rule was blocking "There are many ways to animate...", the documented fix to PR #5000 that motivated this whole effort. The rule's premise (don't lead with existential "there are") is wrong for technical reference, where it's the natural way to introduce a count or set. Disabled in .vale.ini with justification, matching the precedent for other write-good rules already turned off.
- The LanguageTool step crashed with "Connection reset by peer" because the local LT server was fed the full 3 MB of stripped guide text in a single request. The script now splits on paragraph boundaries into ~40 KB chunks and aggregates results, and always writes a report file even when LT fails to start, so the downstream summarizer/quality-gate reads a valid file instead of falling back to 0.
- The PR comment didn't surface the new paragraph-capitalization or LanguageTool checks. Added summarize_reports.py subcommands for both, wired matching steps and env vars, and extended the github-script block to render their summaries with artifact links. The paragraph-capitalization report now also serializes total/new/baseline counts so the summary can say "1 new (107 baseline ignored)" instead of just "exit code 1".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
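
The paragraph-boundary chunking described in this commit could look roughly like the Python sketch below. The function name, the blank-line paragraph separator, and the exact 40,000-byte default are assumptions; the commit only says "~40 KB chunks".

```python
from typing import List


def chunk_paragraphs(text: str, max_bytes: int = 40_000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_bytes.

    A single paragraph larger than max_bytes is kept whole as its own chunk
    rather than being split mid-sentence.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for para in text.split("\n\n"):
        para_size = len(para.encode("utf-8")) + 2  # +2 for the "\n\n" separator
        if current and size + para_size > max_bytes:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += para_size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk would then be sent to LanguageTool as a separate request. Match offsets come back relative to the chunk, so an aggregator that wants document-wide positions would need to add each chunk's starting offset.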

User feedback was twofold: (1) the baseline mechanism was hiding 107 real prose bugs rather than enforcing the rule, so fix them and remove the baseline; (2) the LanguageTool advisory was reporting 0 matches in CI, which meant something was broken; locally LT reports 7614 matches.

Changes:

- Capitalized the first prose word of every flagged paragraph across 23 developer guide files. The shape of the fixes varied: most were a single-letter capitalization ("if you" → "If you", "that's" → "That's"), a few needed "There are" prepended to a missing-subject sentence ("two ways..." → "There are two ways..."), and a few opaque "that's: ..." pseudo-list intros became "In other words: ..." so the text reads as a sentence.
- Removed the baseline mechanism from check_paragraph_capitalization.rb entirely. No --baseline flag, no --update-baseline flag, no baseline JSON file. The check is now a strict gate: any paragraph that starts with a lowercase prose word fails CI.
- Tightened the script's "skip if leading element is code/kbd/link/img" heuristic to also accept formatting wrappers (strong/em/b/i/mark/u/sub/sup) around the identifier. css.asciidoc:443's `**`repeating-linear-gradient` / `repeating-radial-gradient`**` glossary entry renders as `<strong><code>...</code> / <code>...</code></strong>` which the old regex missed.
- Fixed the LanguageTool advisory check. The CI step was crashing with AttributeError: 'Match' object has no attribute 'rule_id'. The pinned language-tool-python==2.9.4 uses camelCase accessors (ruleId, errorLength) while newer releases use snake_case (rule_id, error_length). Added a small _attr() helper that tries both names, and serialization is now wrapped in a try/except inside a try/finally so the JSON report is written even when LT raises. The original code failed silently because the JSON dump only ran on the happy path and `LT_COUNT="$(python3 ... || echo 0)"` papered over the missing file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three rewrites in the previous commit tripped existing Vale rules:

- "It is defined in..." → write as a contraction ("It's defined in..."), matching Microsoft.Contractions.
- "Let's fix the example above..." → "The example above can be extended..."; Microsoft.We bans first-person plural.
- "So far you've relied on..." → "Up to this point you've relied on..."; the write-good.So rule bans sentence-initial "So ".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings in PR #5002's scripts/developer-guide/check_paragraph_capitalization.rb so GitHub's auto-CodeQL Ruby scan has something to process. Without this file the dynamic 'Analyze (ruby)' job errors with 'CodeQL could not process any code written in Ruby' even though my PR adds no Ruby content.

shai-almog and others added 3 commits May 22, 2026 07:06

shai-almog merged commit 8ad4cc0 into master May 22, 2026
8 checks passed

shai-almog merged 4 commits into master from docs-grammar-checks
