Skip to content

line-log: range-scope stat, check, and -G under -L#2152

Open
mmontalbo wants to merge 7 commits into
gitgitgadget:masterfrom
mmontalbo:mm/line-log-stat-formats-followup
Open

line-log: range-scope stat, check, and -G under -L#2152
mmontalbo wants to merge 7 commits into
gitgitgadget:masterfrom
mmontalbo:mm/line-log-stat-formats-followup

Conversation

@mmontalbo

@mmontalbo mmontalbo commented Jun 15, 2026

Copy link
Copy Markdown

This series extends git log -L so that more of its diff output and
commit selection honor the tracked line ranges: the diff stat formats
and --check are supported with range-scoped results, and the -G pickaxe
is scoped to the tracked range.

It builds on top of the mm/line-log-cleanup topic [1], which integrated
-L with the standard log output pipeline and taught it the non-patch
formats --raw, --name-only, --name-status, and --summary.

With these patches the following also honor the tracked range:

  • --stat, --numstat, --shortstat: counts cover only the lines inside
    the tracked range, not the whole file.

  • --check: whitespace errors are reported only for added lines inside
    the tracked range, with the correct file line numbers.

  • -G: a commit is selected only when the pattern appears on an
    added/removed line inside the tracked range, rather than anywhere in
    the file.

The --dirstat format is deliberately rejected. Its default mode
reports each directory's share of the total churn as a percentage,
computed from whole-file byte damage (via diffcore_count_changes(),
outside the line-based pipeline that -L scopes), so bare --dirstat
cannot honor the tracked range. The --dirstat=lines mode could: it
aggregates the same per-file line counts as --numstat, which -L
already scopes. But supporting only that sub-mode while bare
--dirstat still errors is a confusing split, so the whole format is
left to a follow-up; --numstat already gives the exact range-scoped
per-file counts.

-S is left matching the whole file. Unlike -G, it counts needle
occurrences per blob rather than grepping the diff, so scoping it to a
range needs a different approach; that is left to a follow-up. Patch 7,
which scopes -G, also updates the -L documentation to note the -S/-G
distinction, so the whole-file behavior of -S is not mistaken for the
range-scoped behavior around it.

Patches 1-3 are independent of the new formats: they fix two bugs in
the existing -L patch output (a leaked deletion and an off-by-one hunk
header), bring its hunk headers in line with git diff's format, and
clarify the line-range filter mm/line-log-cleanup added, whose names
obscured its model (cryptic lno_ cursors conflating the pre/post-image
and 0/1-based axes, a flat hunk-state struct, and a one-letter state
pointer (s)). The two bugs may be a hint that the model could use
clarification, so patch 1 renames and groups the filter state and
patch 2 documents the model, before the fixes that read against it.
Patches 4-7 then build the new formats on top:

  • Patch 1: rename and group the filter for clarity. Spell the
    cryptic names out to the file's own forms: the line-number cursors
    to lno_in_preimage/lno_in_postimage (as in struct emit_callback)
    and the range index to idx_in_postimage, while the hunk geometry
    stays old/new (the xdiff_emit_hunk_fn convention) and moves into a
    sub-struct. Name the filter pointer (filter) and rename the struct
    to line_range_filter and the flush helper to flush_range_hunk. No
    behavior change.

  • Patch 2: simplify the filter by classifying removals as they arrive,
    dropping the pending_rm buffer and a latent flush_range_hunk() bug
    that leaked deletions just past the range. Make the buffered lines
    the hunk's single source of truth: flush_range_hunk() derives the
    counts from them rather than tracking them per line, dropping three
    more fields. Document the model with a block comment and worked
    example, and add begin_range_hunk() as the counterpart to
    flush_range_hunk(). (This simplification was submitted by itself
    previously [2] but did not advance, so it is re-included here.)

  • Patch 3: stop hand-rolling the synthetic hunk header and emit it
    through xdiff's own formatter via a new xdiff_emit_hunk_header()
    helper. The hand-rolled code put a count-0 side's begin one too
    high (the convention is the line before the change); routing
    through xdl_emit_hunk_hdr() fixes that by construction and, as a
    side effect, makes -L headers match git diff exactly, including its
    omission of a count of 1. Regenerate the two affected fixtures.

  • Patch 4: extract a line_range_filter_diff() helper that folds the
    filter's two preconditions into one place: inflate ctxlen to the
    largest range span so every change within a range lands in a single
    xdiff hunk, and clear XDL_EMIT_NO_HUNK_HDR so the hunk headers the
    filter reads are always emitted (its position tracking relies on
    both). It then runs an initialized filter through xdiff, flushes
    the final range hunk, and releases it; use it in builtin_diff().
    The stat, check, and -G patches that reuse it inherit both.

  • Patch 5: reuse the filter in builtin_diffstat() for the stat
    formats, extend the -L output-format allowlist, and reject --dirstat.

  • Patch 6: reuse the filter in builtin_checkdiff() and extend the
    allowlist for --check. The separate blank-at-eof pass scans the
    whole file, so scope its report to the tracked ranges too.

  • Patch 7: scope -G to the tracked range. Expose the filter as
    diff_emit_line_ranges() and grep only the tracked range's lines,
    threading the filepair's line_ranges through the pickaxe callback.
    -S is left whole-file, and the -L documentation is updated to note
    that -G is range-scoped while -S still matches the whole file.

[1] https://lore.kernel.org/git/pull.2094.v3.git.1780001267.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.2099.git.1777230630020.gitgitgadget@gmail.com/

Cc: "D. Ben Knoble" ben.knoble@gmail.com

@mmontalbo mmontalbo force-pushed the mm/line-log-stat-formats-followup branch 8 times, most recently from 7977925 to 2ea00b6 Compare June 16, 2026 21:41
@mmontalbo mmontalbo changed the title line-log: support stat and check diff formats with -L line-log: range-scope stat, check, and -G under -L Jun 16, 2026
@mmontalbo mmontalbo force-pushed the mm/line-log-stat-formats-followup branch 11 times, most recently from b587ea9 to a1b3ff0 Compare June 18, 2026 00:46
The line-range filter that mm/line-log-cleanup added uses names that
obscure its model.  The cursors lno_post/lno_pre and the index lno_0
share an lno_ prefix but conflate the pre/post-image axis with the
0-based/1-based axis, the hunk state is a flat set of rhunk_* fields,
and the filter-state pointer is just s.

The filter bridges two layers of diff.c, and its fields already used
each layer's vocabulary, but in cryptic abbreviations.  Spell them out
to the form the rest of the file uses, so that the patches that follow
can simplify and fix it with those clearer names in place:

  - lno_post/lno_pre -> lno_in_postimage/lno_in_preimage, the
    line-number cursors, matching the counters in struct emit_callback
  - lno_0 -> idx_in_postimage, the 0-based range index
  - the hunk-header geometry stays old/new (old_begin, new_begin, and
    counts) to match the xdiff_emit_hunk_fn callback and the
    "@@ -<old> +<new> @@" header it feeds, but moves from flat rhunk_*
    fields into a "hunk" sub-struct, so accesses read
    filter->hunk.old_begin
  - flush_rhunk -> flush_range_hunk
  - the filter-state pointer in each callback: s -> filter

Also rename the struct line_range_callback to line_range_filter: it is
a filter over xdiff output, not merely a callback.

No behavior change.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
@mmontalbo mmontalbo force-pushed the mm/line-log-stat-formats-followup branch 5 times, most recently from d7b39a5 to 2bc09ca Compare June 18, 2026 06:55
The filter buffered '-' lines in a pending_rm strbuf, deferring their
classification until a '+' or ' ' line revealed the post-image
position.  That buffering is unnecessary: a removal occupies no
post-image line, so it does not advance lno_in_postimage, and xdiff
emits removals before additions within a change.  A '-' therefore
arrives while lno_in_postimage already holds the index the following
'+'/' ' will occupy, and can be classified against the ranges as it
arrives.

The buffering also hid a bug: flush_range_hunk() drained pending_rm into
the range hunk whenever the hunk was active, even after lno_in_postimage
had advanced past the tracked range, so a deletion just after the
tracked function leaked into the patch.  Classifying each line as it
arrives removes the pending_rm buffer, the discard_pending_rm() helper,
three struct fields, and makes that bug impossible by construction.

With every line classified on arrival, the buffered lines are the
hunk's single source of truth, so the old/new counts need not be kept
alongside them: flush_range_hunk() derives the counts (and whether the
hunk holds any change) from the buffer when it builds the header.  Drop
the per-line counting and the old_count, new_count, and has_changes
fields; there is no longer a second tally that could fall out of sync
with the buffer.

Add begin_range_hunk() to open the accumulator at the first in-range
line, seeding both begins from the live image cursors, as the
counterpart to flush_range_hunk().  With the counting gone too,
line_range_line_fn() now only appends an in-range line.

Document the coordinate model: a block comment on struct
line_range_filter states it (the pre/post-image cursors, the 0-based
idx_in_postimage, removals classified by the following line) with a
worked example.

Add tests for the leaked trailing deletion this fixes, the symmetric
leading-deletion case, and the filter's range boundaries (a change at
the first and last line of a range, and a pure in-range deletion).

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
The line-range filter builds its own "@@ -<old> +<new> @@" header for
each range hunk.  For a side with no lines (count 0, such as the old
side of a pure insertion), the begin should be the number of the line
before the change, per the convention git diff and xdl_emit_hunk_hdr()
follow.  The hand-rolled code's begin was one too high; in t4211 this
produced

	@@ -25,0 +18,9 @@

an old begin of 25 in a 24-line file, where git diff would give 24.

Stop hand-rolling the header.  flush_range_hunk() now formats it through
xdiff's own emitter: a new xdiff_emit_hunk_header() helper wraps
xdl_emit_hunk_hdr(), the function that produces every other diff's hunk
headers.  The count-0 begin is then correct by construction, and as a
side effect -L headers match git diff exactly, including its omission of
a count of 1 ("@@ -22 +22 @@" rather than "@@ -22,1 +22,1 @@").

xdiff's hunk callback already hands line_range_hunk_fn() a count-0 begin
decremented, so undo that when seeding the cursors and let the formatter
re-apply the convention once, at emit time.

The off-by-one predates this series, and the two regenerated fixtures
reach it from different origins: no-assertion-error has carried it since
its test was added in ab60c69 (line-log: fix assertion error,
2025-08-18), while vanishes-early acquired it when 86e986f (line-log:
route -L output through the standard diff pipeline) reshaped its tracked
line into a pure insertion.  vanishes-early also drops its count-1
counts.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
builtin_diff() open-codes the line-range filter setup and teardown
around its xdi_diff_outf() call: zero the struct, point it at the
output callback, inflate ctxlen to the largest range span so each range
yields a single xdiff hunk, run the diff, flush the trailing range
hunk, and release the buffer.  The upcoming -L stat and check formats
need the same sequence.

Extract line_range_filter_init() for the setup and a
line_range_filter_diff() helper that prepares the xdiff config the
filter needs, runs an initialized filter through xdi_diff_outf(),
flushes the final range hunk, and releases it, returning the latched
error.  The helper inflates ctxlen to the largest range span so each
range yields a single xdiff hunk, and clears XDL_EMIT_NO_HUNK_HDR so
the hunk headers the filter seeds its position from are always emitted.
Folding both into the helper keeps these invariants, which the filter's
position tracking relies on, in a single place for every consumer.
builtin_diff() now does init + line_range_filter_diff(); the next two
patches reuse them in builtin_diffstat() and builtin_checkdiff()
instead of repeating the boilerplate.

No behavior change: builtin_diff() leaves XDL_EMIT_NO_HUNK_HDR unset,
so clearing it is a no-op until the suppressing consumers arrive.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Reuse the line_range_filter in builtin_diffstat() to produce
range-scoped statistics.  When a filepair carries line_ranges, the
filter wraps diffstat_consume() as its output callback, forwarding only
in-range lines for counting.  flush_range_hunk() replays buffered
content through diffstat_consume(), which ignores synthetic @@ headers
since it only counts '+' and '-' lines.

Expand the output format allowlist in setup_revisions() to accept
--stat, --numstat, and --shortstat with -L.

Leave --dirstat out of the allowlist so it is rejected like any other
unsupported format.  Its default mode counts each file's whole-file
byte damage via diffcore_count_changes(), outside the line-based
pipeline that the -L filter scopes, so bare --dirstat cannot honor the
tracked range.  The --dirstat=lines mode could: it aggregates the same
per-file line counts as --numstat, which -L already scopes.  But
accepting only that sub-mode while bare --dirstat keeps erroring is a
confusing split, so the whole format is deferred to a follow-up;
--numstat already reports the exact range-scoped per-file counts.

Also drop "yet" from the generic -L rejection message ("does not
yet support the requested diff format").  Some rejected formats do
not fit a line range at all, so "yet" wrongly implied they are all
just awaiting support.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
builtin_checkdiff() runs its own xdiff pass to detect whitespace
errors in newly added lines.  When -L is active, the check should
be scoped to the tracked line ranges rather than the whole file.

Reuse the line_range_filter to wrap checkdiff_consume(), the same
pattern already used for patch output and diffstat.  The filter
forwards only in-range lines for whitespace checking.

checkdiff reports the file line number of each error, which it
normally learns from the hunk header via checkdiff_consume_hunk().
The filter synthesizes its own hunk headers, so give it an optional
hunk callback and route checkdiff_consume_hunk() through it; this
sets the post-image position before the in-range lines are replayed.
Without it the reported line numbers would count from the start of
the range hunk rather than the start of the file.

The trailing blank-at-eof check is a second pass that scans the whole
file via check_blank_at_eof(), so gate its report on the tracked
ranges as well; otherwise a blank line added at end of file is
reported even when it lies outside the range.

Add DIFF_FORMAT_CHECKDIFF to the -L output format allowlist in
setup_revisions() so that -L --check is accepted, and list --check
among the supported formats in the documentation.  Add tests covering
that whitespace errors are reported, scoped to the tracked range, and
labeled with the correct file line number, including when two errors
in one range are separated by a gap that would otherwise split into
multiple xdiff hunks.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
git log -L scopes its diff output to the tracked range, but pickaxe
(-S, -G) still runs in diffcore over the whole-file change, so -L -G
selects a commit whenever the pattern appears in any added or removed
line of the file, even outside the tracked range.

Teach -G to honor the range.  diff_grep() already runs an xdiff pass
and greps the +/- lines; route that pass through the line-range filter
so only the tracked range's lines are grepped.  Expose the filter as
diff_emit_line_ranges(), a line-range-scoped xdi_diff_outf(), thread
the filepair's line_ranges through the pickaxe callback, and pass it
from pickaxe_match().  Skip scoping under textconv, whose output is not
in the original file's line coordinates.

-G needs only a hit/no-hit answer, so the line-number concerns the
filter handles for patch and check output do not apply here.

-S is left matching the whole file: it counts needle occurrences per
blob rather than grepping the diff, so scoping it needs a different
approach, left to a follow-up.  has_changes() takes the range parameter
but ignores it for now.

Document the resulting -L pickaxe scoping: -G is range-scoped, while -S
still matches the whole file.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
@mmontalbo mmontalbo force-pushed the mm/line-log-stat-formats-followup branch from 2bc09ca to f69ccfb Compare June 18, 2026 16:29
@mmontalbo

Copy link
Copy Markdown
Author

/preview

@gitgitgadget

gitgitgadget Bot commented Jun 18, 2026

Copy link
Copy Markdown

Preview email sent as pull.2152.git.1781800932.gitgitgadget@gmail.com

@mmontalbo

Copy link
Copy Markdown
Author

/submit

@gitgitgadget

gitgitgadget Bot commented Jun 18, 2026

Copy link
Copy Markdown

Submitted as pull.2152.git.1781806593.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2152/mmontalbo/mm/line-log-stat-formats-followup-v1

To fetch this version to local tag pr-2152/mmontalbo/mm/line-log-stat-formats-followup-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2152/mmontalbo/mm/line-log-stat-formats-followup-v1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant