Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Documentation/config/diff.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,11 @@ endif::git-diff[]
Set this option to `true` to make the diff driver cache the text

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +struct diff_subprocess {
> +	struct subprocess_entry subprocess;
> +	unsigned int supported_capabilities;
> +};
> +
> +static int subprocess_map_initialized;
> +static struct hashmap subprocess_map;

Can we avoid introducing new global variables like these?  Would
"struct userdiff_driver" or "struct diff_options" be a good place to
hang this hashmap, perhaps?

> +static int send_file_content(int fd, const char *buf, long size)
> +{
> +	int ret;
> +
> +	if (size > 0)
> +		ret = write_packetized_from_buf_no_flush(buf, size, fd);
> +	else
> +		ret = 0;

Shouldn't "size == -24" be flagged as an invalid input?

> +	if (ret)
> +		return ret;
> +	return packet_flush_gently(fd);
> +}

> +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
> +{
> +...
> +}

This gives a silent error diagnosis, which is good for a lower level
helper.

> +int diff_process_get_hunks(struct userdiff_driver *drv,
> +			   const char *path,
> +			   const char *old_buf, long old_size,
> +			   const char *new_buf, long new_size,
> +			   struct xdl_hunk **hunks_out,
> +			   size_t *nr_hunks_out)
> +{
> +	struct diff_subprocess *backend;
> +	struct child_process *process;
> +	int fd_in, fd_out;
> +	struct strbuf status = STRBUF_INIT;
> +	struct xdl_hunk *hunks = NULL;
> +	struct xdl_hunk hunk;
> +	size_t nr_hunks = 0, alloc_hunks = 0;
> +	int len;
> +	char *line;
> +
> +	if (!drv || !drv->process)
> +		return -1;

A driver that does not define process is not an error; it is
perfectly normal in the current world order where nobody has such an
external process and even fi this patch lands, external processes
are optional.  So here "return -1" does not mean an error, and
silent return is perfectly fine.

> +	backend = find_or_start_process(drv->process);
> +	if (!backend)
> +		return -1;

This is probably an error; the user specified drv->process, we
either tried to find or start the process and failed.  Isn't it an
event that deserves to be reported in an error message?

> +	if (!(backend->supported_capabilities & CAP_HUNKS))
> +		return -1;

Backend started, but the "hunks" feature is not supported.  Perhaps
in a year or two, this external process protocol may have become so
popular that it gained more capabilities, possibly making get_hunks
obsolete.  We may be looking at such an external process that uses
other capabilities but not this one.  This is not an error, so
silent return is perfectly fine.

> +	process = subprocess_get_child_process(&backend->subprocess);
> +	fd_in = process->in;
> +	fd_out = process->out;
> +
> +	/* Send request */
> +	if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
> +	    packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
> +	    packet_flush_gently(fd_in))
> +		goto error;
> +
> +	/* Send old file content */
> +	if (send_file_content(fd_in, old_buf, old_size))
> +		goto error;
> +
> +	/* Send new file content */
> +	if (send_file_content(fd_in, new_buf, new_size))
> +		goto error;
> +
> +	/* Read hunks until flush packet */
> +	while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
> +	       line) {
> +		if (parse_hunk_line(line, &hunk) < 0)
> +			goto error;
> +		ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
> +		hunks[nr_hunks++] = hunk;
> +	}
> +	if (len < 0)
> +		goto error;
> +
> +	/* Read status */
> +	if (subprocess_read_status(fd_out, &status))
> +		goto error;
> +
> +	if (strcmp(status.buf, "success")) {
> +		if (!strcmp(status.buf, "abort"))
> +			backend->supported_capabilities &= ~CAP_HUNKS;
> +		goto error;
> +	}
> +
> +	*hunks_out = hunks;
> +	*nr_hunks_out = nr_hunks;
> +	strbuf_release(&status);
> +	return 0;
> +
> +error:

All exceptions that lead here look like events that should be
reported to the end-user.

> +	free(hunks);
> +	strbuf_release(&status);
> +	return -1;
> +}

> +/*
> + * Query a diff process for hunks describing the changes
> + * between old_buf and new_buf.
> + *
> + * The backend is a long-running subprocess configured via
> + * diff.<driver>.process.  It receives file content via
> + * pkt-line and returns hunks with 1-based line numbers.
> + *
> + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
> + * array (caller must free) and returns 0.
> + *
> + * On failure, returns -1.  The caller should fall back to the
> + * builtin diff algorithm.
> + */

I do not agree with this.  If it is a failure, the user should fix
the external process (or disable).  It shouldn't be hidden behind a
fallback.  As I left comments, in this round of implementation,
there are conditions that returns -1 for soemthing that is not an
error (i.e., not configured, or process not supporting the
particular capability) *and* in those cases the caller should fall
back as if nothing happened.  But some error cases, the caller
should't hide them.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Mon, May 25, 2026 at 6:56 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +struct diff_subprocess {
> > +     struct subprocess_entry subprocess;
> > +     unsigned int supported_capabilities;
> > +};
> > +
> > +static int subprocess_map_initialized;
> > +static struct hashmap subprocess_map;
>
> Can we avoid introducing new global variables like these?  Would
> "struct userdiff_driver" or "struct diff_options" be a good place to
> hang this hashmap, perhaps?
>

Will clean this up.

> > +static int send_file_content(int fd, const char *buf, long size)
> > +{
> > +     int ret;
> > +
> > +     if (size > 0)
> > +             ret = write_packetized_from_buf_no_flush(buf, size, fd);
> > +     else
> > +             ret = 0;
>
> Shouldn't "size == -24" be flagged as an invalid input?
>

Will fix and do a broader audit of input validation and bounds checking.

> > +     if (ret)
> > +             return ret;
> > +     return packet_flush_gently(fd);
> > +}
>
> > +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
> > +{
> > +...
> > +}
>
> This gives a silent error diagnosis, which is good for a lower level
> helper.
>
> > +int diff_process_get_hunks(struct userdiff_driver *drv,
> > +                        const char *path,
> > +                        const char *old_buf, long old_size,
> > +                        const char *new_buf, long new_size,
> > +                        struct xdl_hunk **hunks_out,
> > +                        size_t *nr_hunks_out)
> > +{
> > +     struct diff_subprocess *backend;
> > +     struct child_process *process;
> > +     int fd_in, fd_out;
> > +     struct strbuf status = STRBUF_INIT;
> > +     struct xdl_hunk *hunks = NULL;
> > +     struct xdl_hunk hunk;
> > +     size_t nr_hunks = 0, alloc_hunks = 0;
> > +     int len;
> > +     char *line;
> > +
> > +     if (!drv || !drv->process)
> > +             return -1;
>
> A driver that does not define process is not an error; it is
> perfectly normal in the current world order where nobody has such an
> external process and even fi this patch lands, external processes
> are optional.  So here "return -1" does not mean an error, and
> silent return is perfectly fine.
>
> > +     backend = find_or_start_process(drv->process);
> > +     if (!backend)
> > +             return -1;
>
> This is probably an error; the user specified drv->process, we
> either tried to find or start the process and failed.  Isn't it an
> event that deserves to be reported in an error message?
>
> > +     if (!(backend->supported_capabilities & CAP_HUNKS))
> > +             return -1;
>
> Backend started, but the "hunks" feature is not supported.  Perhaps
> in a year or two, this external process protocol may have become so
> popular that it gained more capabilities, possibly making get_hunks
> obsolete.  We may be looking at such an external process that uses
> other capabilities but not this one.  This is not an error, so
> silent return is perfectly fine.
>
> > +     process = subprocess_get_child_process(&backend->subprocess);
> > +     fd_in = process->in;
> > +     fd_out = process->out;
> > +
> > +     /* Send request */
> > +     if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
> > +         packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
> > +         packet_flush_gently(fd_in))
> > +             goto error;
> > +
> > +     /* Send old file content */
> > +     if (send_file_content(fd_in, old_buf, old_size))
> > +             goto error;
> > +
> > +     /* Send new file content */
> > +     if (send_file_content(fd_in, new_buf, new_size))
> > +             goto error;
> > +
> > +     /* Read hunks until flush packet */
> > +     while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
> > +            line) {
> > +             if (parse_hunk_line(line, &hunk) < 0)
> > +                     goto error;
> > +             ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
> > +             hunks[nr_hunks++] = hunk;
> > +     }
> > +     if (len < 0)
> > +             goto error;
> > +
> > +     /* Read status */
> > +     if (subprocess_read_status(fd_out, &status))
> > +             goto error;
> > +
> > +     if (strcmp(status.buf, "success")) {
> > +             if (!strcmp(status.buf, "abort"))
> > +                     backend->supported_capabilities &= ~CAP_HUNKS;
> > +             goto error;
> > +     }
> > +
> > +     *hunks_out = hunks;
> > +     *nr_hunks_out = nr_hunks;
> > +     strbuf_release(&status);
> > +     return 0;
> > +
> > +error:
>
> All exceptions that lead here look like events that should be
> reported to the end-user.
>

Agreed on all points. I will restructure things so errors are flagged when
appropriate (i.e., user specified a process but one was not found / couldn't
start and exceptions) and non-errors are treated as they should be.

> > +     free(hunks);
> > +     strbuf_release(&status);
> > +     return -1;
> > +}
>
> > +/*
> > + * Query a diff process for hunks describing the changes
> > + * between old_buf and new_buf.
> > + *
> > + * The backend is a long-running subprocess configured via
> > + * diff.<driver>.process.  It receives file content via
> > + * pkt-line and returns hunks with 1-based line numbers.
> > + *
> > + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
> > + * array (caller must free) and returns 0.
> > + *
> > + * On failure, returns -1.  The caller should fall back to the
> > + * builtin diff algorithm.
> > + */
>
> I do not agree with this.  If it is a failure, the user should fix
> the external process (or disable).  It shouldn't be hidden behind a
> fallback.  As I left comments, in this round of implementation,
> there are conditions that returns -1 for soemthing that is not an
> error (i.e., not configured, or process not supporting the
> particular capability) *and* in those cases the caller should fall
> back as if nothing happened.  But some error cases, the caller
> should't hide them.

Will address in a follow-up.

Thank you for the feedback!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Zero hunks with status=success means the tool considers the
> files equivalent.  Git skips diff output for that file.

Is "zero hunk" a common word or some random string you invented?  If
the latter, which is I am assuming it to be, you should define what
it means at/before the first use.  Here in the proposed log message,
and ...

>
> Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
> ---
>  Documentation/config/diff.adoc   |   8 +
>  Documentation/gitattributes.adoc |  40 ++++
>  Makefile                         |   1 +
>  diff-process.c                   | 206 +++++++++++++++++++
>  diff-process.h                   |  28 +++
>  diff.c                           |  23 +++
>  t/.gitattributes                 |   1 +
>  t/t4080-diff-process.sh          | 338 +++++++++++++++++++++++++++++++
>  8 files changed, 645 insertions(+)
>  create mode 100644 diff-process.c
>  create mode 100644 diff-process.h
>  create mode 100755 t/t4080-diff-process.sh
>
> diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
> index 1135a62a0a..4ab5f60df6 100644
> --- a/Documentation/config/diff.adoc
> +++ b/Documentation/config/diff.adoc
> @@ -218,6 +218,14 @@ endif::git-diff[]
>  	Set this option to `true` to make the diff driver cache the text
>  	conversion outputs.  See linkgit:gitattributes[5] for details.
>  
> +`diff.<driver>.process`::
> +	The command to run as a long-running diff process.
> +	The tool communicates via the pkt-line protocol and returns
> +	hunks that are fed into Git's diff and blame pipelines.
> +	If the tool returns zero hunks, the file is treated as
> +	unchanged for both diff output and blame attribution.
> +	See linkgit:gitattributes[5] for details.

... also here.

I do not know if you mean "the tool returns no hunks" (there is no
"hunk <old_start> <old_count> <new_start> <new_count>" line passed
from the tool over the protocol) or "the tool returns zero-hunk"
(there is a special "zero-hunk" message to signal this particular
condition sent over the protocol), and this description does not
quite help disambiguating between the two.

If the former, then avoid "zero hunks" as it sounds like a noun with
special meaning.  Yes, we can say "tool returns one hunk", "tool
returns 31 hunks", etc., so "tool returns zero hunks" may logically
be correct, but "when the tool returns no hunks with status=success"
is much less confusing, I think.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Mon, May 25, 2026 at 7:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Zero hunks with status=success means the tool considers the
> > files equivalent.  Git skips diff output for that file.
>
> Is "zero hunk" a common word or some random string you invented?  If
> the latter, which is I am assuming it to be, you should define what
> it means at/before the first use.  Here in the proposed log message,
> and ...
>
> >
> > Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
> > ---
> >  Documentation/config/diff.adoc   |   8 +
> >  Documentation/gitattributes.adoc |  40 ++++
> >  Makefile                         |   1 +
> >  diff-process.c                   | 206 +++++++++++++++++++
> >  diff-process.h                   |  28 +++
> >  diff.c                           |  23 +++
> >  t/.gitattributes                 |   1 +
> >  t/t4080-diff-process.sh          | 338 +++++++++++++++++++++++++++++++
> >  8 files changed, 645 insertions(+)
> >  create mode 100644 diff-process.c
> >  create mode 100644 diff-process.h
> >  create mode 100755 t/t4080-diff-process.sh
> >
> > diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
> > index 1135a62a0a..4ab5f60df6 100644
> > --- a/Documentation/config/diff.adoc
> > +++ b/Documentation/config/diff.adoc
> > @@ -218,6 +218,14 @@ endif::git-diff[]
> >       Set this option to `true` to make the diff driver cache the text
> >       conversion outputs.  See linkgit:gitattributes[5] for details.
> >
> > +`diff.<driver>.process`::
> > +     The command to run as a long-running diff process.
> > +     The tool communicates via the pkt-line protocol and returns
> > +     hunks that are fed into Git's diff and blame pipelines.
> > +     If the tool returns zero hunks, the file is treated as
> > +     unchanged for both diff output and blame attribution.
> > +     See linkgit:gitattributes[5] for details.
>
> ... also here.
>
> I do not know if you mean "the tool returns no hunks" (there is no
> "hunk <old_start> <old_count> <new_start> <new_count>" line passed
> from the tool over the protocol) or "the tool returns zero-hunk"
> (there is a special "zero-hunk" message to signal this particular
> condition sent over the protocol), and this description does not
> quite help disambiguating between the two.
>
> If the former, then avoid "zero hunks" as it sounds like a noun with
> special meaning.  Yes, we can say "tool returns one hunk", "tool
> returns 31 hunks", etc., so "tool returns zero hunks" may logically
> be correct, but "when the tool returns no hunks with status=success"
> is much less confusing, I think.

Yes, "zero hunks" was my own invention and I see why it's confusing. Will
update the messaging to use "no hunks" instead and do a broader sweep of
the documentation to clarify the protocol and expected tool behavior.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Johannes Schindelin wrote on the Git mailing list (how to reply to this email):

Hi Michael,

I stumbled about this patch when it broke CI in Git for Windows, where we
do _not_ use `NO_PYTHON`, even though Python is unavailable in the
build/test CI jobs. The existing tests handle this situation gracefully,
this here patch does not:

On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:

> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> new file mode 100755
> index 0000000000..f159cd86d8
> --- /dev/null
> +++ b/t/t4080-diff-process.sh
> @@ -0,0 +1,538 @@
> +#!/bin/sh
> +
> +test_description='diff process via long-running process'
> +
> +. ./test-lib.sh
> +
> +if test_have_prereq PYTHON
> +then
> +	PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)

When neither `python3` nor `python` are available (which is the case in
the minimal Git for Windows SDK used in Git's CI runs), this fails under
`set -e`. Before even running the first test case. Resulting in an
unexpected TAP format error.

Now, we could "fix" this by imitating what `lib-p4` does (see
https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
which demonstrates that it is indeed a work-around on Windows):

-- snip --
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index fdf6da1c341e67..bd22c247ff3856 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -4,9 +4,10 @@ test_description='diff process via long-running process'
 
 . ./test-lib.sh
 
-if test_have_prereq PYTHON
+if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
 then
-	PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
+	skip_all='python interpreter not available'
+	test_done
 fi
 
 #
-- snap --

Of course, this uncovers _another_ problem with the Python script: It uses
Python3-only `f"..."` format strings, which cannot be handled by the
Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
So this requires _another follow-up (see also
https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):

-- snip --
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index bd22c247ff3856..ba14682a9086e4 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -39,7 +39,8 @@ setup_backend () {
 
 	def write_pkt(line):
 	    data = (line + "\n").encode()
-	    sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
+	    hdr = "{:04x}".format(len(data) + 4).encode()
+	    sys.stdout.buffer.write(hdr + data)
 	    sys.stdout.buffer.flush()
 
 	def write_flush():
@@ -98,7 +99,8 @@ setup_backend () {
 	    new = read_content()
 	    old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
 	    new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
-	    log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
+	    log("command={} pathname={} old={} new={}".format(
+	        cmd, pathname, old_first, new_first))
 
 	    if mode == "error":
 	        write_flush()
@@ -130,7 +132,7 @@ setup_backend () {
 	        else:
 	            ol = old.count(b"\n")
 	            nl = new.count(b"\n")
-	            write_pkt(f"hunk 1 {ol} 1 {nl}")
+	            write_pkt("hunk 1 {} 1 {}".format(ol, nl))
 	        write_flush()
 	        write_pkt("status=success")
 	        write_flush()
-- snap --

And this is still not enough to make it work with Python2, see
https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:

-- snip --
[...]
+ git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
  Traceback (most recent call last):
    File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
      assert read_pkt() == "git-diff-client"
    File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
      hdr = sys.stdin.buffer.read(4)
  AttributeError: 'file' object has no attribute 'buffer'
-- snap --

I have experienced similar patterns in my career, where a single decision
required multiple follow-up fixes _just_ to avoid having to revert that
decision. This kind of doubling down has never ended well.

Therefore I would like to take a step back, and ask: Is it _really_ a good
idea to use Python here? Are we certain that we want to _require_ Python
to run this test and skip it if Python isn't available (as is the case in
the Windows-related parts of Git's very own CI) even if Python has nothing
at all to do with the feature that is being tested?

I don't want to be doomed to repeat history, and we can very well learn
e.g. from prior art in this very project, where the tests for the
clean/smudge filters (which _also_ want to speak pkt-line over stdio)
needlessly incurred Perl as a requirement to run the tests. It was
Matheus's heroic work in 52917a998ef3a (t0021: implementation the
rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
that unnecessary prerequisite.

Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
protocol via simple shell scripts.

So the conscious project direction has been: fold pkt-line test backends
into `test-tool` and drop the scripting-language prereq. Reintroducing the
same shape in Python would walk this back.

Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
2025-04-29) has been another clear sign that the Git project is actively
_removing_ scripting-language dependencies from the build and test
infrastructure, not adding new ones.

The clear prior art in Git's own tests for what t4080 wants to do, as of
today, is `t/helper/test-rot13-filter.c`, which could be imitated here
instead of (re-)introducing a dependency on a scripting language other
than Unix shell in Git's test suite.

The `PYTHON` prereq exists in exactly five files today, all `git p4`
related (where Python is an intrinsic prerequisite given that `git-p4.py`
_is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
`t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
and `t/t9836-git-p4-metadata-encoding-python3.sh`.

After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
patch would be the first one, after almost 20 years, to re-introduce
Python as a dependency outside `git p4`.

And it would also be the first ever to embed a Python script as a heredoc:

> +fi
> +
> +#
> +# A single parametric diff process.
> +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> +#
> +# Modes:
> +#   whole-file  - report all lines as changed (default)
> +#   fixed-hunk  - always report hunk 5 2 5 2
> +#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
> +#   bad-sync    - report hunk with mismatched unchanged totals
> +#   overlap     - report two overlapping hunks
> +#   no-hunks   - return no hunks (files considered equivalent)
> +#   error       - return status=error for every request
> +#   abort       - return status=abort for every request
> +#   crash       - read one request then exit without responding
> +#
> +setup_backend () {
> +	cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> +	import sys, os
> +
> +	def read_pkt():
> +	    hdr = sys.stdin.buffer.read(4)
> +	    if len(hdr) < 4: return None
> +	    length = int(hdr, 16)
> +	    if length == 0: return ""
> +	    data = sys.stdin.buffer.read(length - 4)
> +	    return data.decode().rstrip("\n")
> +
> +	def write_pkt(line):
> +	    data = (line + "\n").encode()
> +	    sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> +	    sys.stdout.buffer.flush()
> +
> +	def write_flush():
> +	    sys.stdout.buffer.write(b"0000")
> +	    sys.stdout.buffer.flush()
> +
> +	def read_content():
> +	    chunks = []
> +	    while True:
> +	        hdr = sys.stdin.buffer.read(4)
> +	        if len(hdr) < 4: break
> +	        length = int(hdr, 16)
> +	        if length == 0: break
> +	        chunks.append(sys.stdin.buffer.read(length - 4))
> +	    return b"".join(chunks)
> +
> +	mode = "whole-file"
> +	logfile = None
> +	for arg in sys.argv[1:]:
> +	    if arg.startswith("--mode="):
> +	        mode = arg[7:]
> +	    elif arg.startswith("--log="):
> +	        logfile = open(arg[6:], "a")
> +
> +	def log(msg):
> +	    if logfile:
> +	        logfile.write(msg + "\n")
> +	        logfile.flush()
> +
> +	# Handshake
> +	assert read_pkt() == "git-diff-client"
> +	assert read_pkt() == "version=1"
> +	read_pkt()
> +	write_pkt("git-diff-server")
> +	write_pkt("version=1")
> +	write_flush()
> +	while True:
> +	    p = read_pkt()
> +	    if p == "": break
> +	write_pkt("capability=hunks")
> +	write_flush()
> +
> +	log("ready")
> +
> +	while True:
> +	    cmd = None
> +	    pathname = None
> +	    while True:
> +	        p = read_pkt()
> +	        if p is None: sys.exit(0)
> +	        if p == "": break
> +	        if p.startswith("command="): cmd = p.split("=",1)[1]
> +	        if p.startswith("pathname="): pathname = p.split("=",1)[1]
> +	    if cmd is None: sys.exit(0)
> +	    old = read_content()
> +	    new = read_content()
> +	    old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> +	    new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> +	    log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> +
> +	    if mode == "error":
> +	        write_flush()
> +	        write_pkt("status=error")
> +	        write_flush()
> +	        continue
> +
> +	    if mode == "abort":
> +	        write_flush()
> +	        write_pkt("status=abort")
> +	        write_flush()
> +	        continue
> +
> +	    if mode == "crash":
> +	        sys.exit(1)
> +
> +	    if cmd == "hunks":
> +	        if mode == "fixed-hunk":
> +	            write_pkt("hunk 5 2 5 2")
> +	        elif mode == "bad-hunk":
> +	            write_pkt("hunk 999 1 999 1")
> +	        elif mode == "bad-sync":
> +	            write_pkt("hunk 1 2 1 1")
> +	        elif mode == "overlap":
> +	            write_pkt("hunk 1 5 1 5")
> +	            write_pkt("hunk 3 2 3 2")
> +	        elif mode == "no-hunks":
> +	            pass
> +	        else:
> +	            ol = old.count(b"\n")
> +	            nl = new.count(b"\n")
> +	            write_pkt(f"hunk 1 {ol} 1 {nl}")
> +	        write_flush()
> +	        write_pkt("status=success")
> +	        write_flush()
> +	    else:
> +	        write_flush()
> +	        write_pkt("status=error")
> +	        write_flush()
> +	PYEOF

The existing pattern is to provide larger scripts as fixtures in
associated `t/tNNNN/` directories, not as heredoc, see e.g.
`t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
heredoc strings makes it virtually impossible to use static code analysis
or syntax highlighting to fend off banal errors.

Given the complexity of what t4080 tries to test (error, abort, crash,
bad-sync, no-hunks, multiple files in one session, capability
negotiation), it would unfortunately be infeasible to use `test-tool
pkt-line` from a shell script implementing that `diff.*.process` protocol.

So I've spiked a demo how the `test-tool diff-process-backend` could look
like (letting Opus do the menial typing, so that I can enjoy at least part
of a sunny Sunday outside), which also passes the CI build and test:
https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6

That commit is of course not intended to be used as-is; Feel free to pick
code parts of it and integrate them into your topic branch. Or write your
own test-tool helper from scratch if that's more your jam.

Ciao,
Johannes

> +	write_script diff-process-backend <<-SHEOF
> +	exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> +	SHEOF
> +}
> +
> +BACKEND="./diff-process-backend"
> +
> +test_expect_success PYTHON 'setup' '
> +	setup_backend &&
> +	echo "*.c diff=cdiff" >.gitattributes &&
> +	git add .gitattributes &&
> +
> +	# boundary.c: 10 lines, changes at 5-6 and 9-10.
> +	# Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> +	cat >boundary.c <<-\EOF &&
> +	line1
> +	line2
> +	line3
> +	line4
> +	OLD5
> +	OLD6
> +	line7
> +	line8
> +	OLD9
> +	OLD10
> +	EOF
> +	git add boundary.c &&
> +
> +	# worddiff.c: single-line function, value changes 1 -> 999.
> +	# Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> +	cat >worddiff.c <<-\EOF &&
> +	int value(void) { return 1; }
> +	EOF
> +	git add worddiff.c &&
> +
> +	# newfile.c: single-line function, value changes 42 -> 99.
> +	# Used by: new file, --exit-code, multiple drivers.
> +	cat >newfile.c <<-\EOF &&
> +	int new_func(void) { return 42; }
> +	EOF
> +	git add newfile.c &&
> +
> +	# logtest.c: single-line function for log/format-patch tests.
> +	# Needs two commits so log -1 has a diff.
> +	cat >logtest.c <<-\EOF &&
> +	int logfunc(void) { return 1; }
> +	EOF
> +	git add logtest.c &&
> +
> +	# two.c/one.c: two-file pair for error/abort/startup-failure tests.
> +	cat >one.c <<-\EOF &&
> +	int first(void) { return 1; }
> +	EOF
> +	cat >two.c <<-\EOF &&
> +	int second(void) { return 2; }
> +	EOF
> +	git add one.c two.c &&
> +
> +	git commit -m "initial" &&
> +
> +	# Second commit for logtest.c (so log -1 has something to show).
> +	cat >logtest.c <<-\EOF &&
> +	int logfunc(void) { return 2; }
> +	EOF
> +	git add logtest.c &&
> +	git commit -m "change logtest.c" &&
> +
> +	# Working tree modifications (not committed).
> +	cat >boundary.c <<-\EOF &&
> +	line1
> +	line2
> +	line3
> +	line4
> +	NEW5
> +	NEW6
> +	line7
> +	line8
> +	NEW9
> +	NEW10
> +	EOF
> +
> +	cat >worddiff.c <<-\EOF &&
> +	int value(void) { return 999; }
> +	EOF
> +
> +	cat >newfile.c <<-\EOF &&
> +	int new_func(void) { return 99; }
> +	EOF
> +
> +	cat >one.c <<-\EOF &&
> +	int first(void) { return 10; }
> +	EOF
> +
> +	cat >two.c <<-\EOF
> +	int second(void) { return 20; }
> +	EOF
> +'
> +
> +#
> +# Core behavior: the tool controls which lines are marked as changed.
> +#
> +
> +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> +	# The file has changes at lines 5-6 and 9-10, but fixed-hunk
> +	# only reports lines 5-6 as changed.  Lines 9-10 should not
> +	# appear as changed in the output.
> +	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> +		diff boundary.c >actual &&
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^-OLD6" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^+NEW6" actual &&
> +	test_grep ! "^-OLD9" actual &&
> +	test_grep ! "^-OLD10" actual &&
> +	test_grep ! "^+NEW9" actual &&
> +	test_grep ! "^+NEW10" actual
> +'
> +
> +test_expect_success PYTHON 'diff process works with new file' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- newfile.c >actual 2>stderr &&
> +	test_grep "return 99" actual &&
> +	test_grep "pathname=newfile.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> +	cat >added.c <<-\EOF &&
> +	int added(void) { return 1; }
> +	EOF
> +	git add added.c &&
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --cached -- added.c >actual 2>stderr &&
> +	test_grep "added" actual &&
> +	test_grep "pathname=added.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process skipped for binary files' '
> +	printf "\\0binary" >binary.c &&
> +	git add binary.c &&
> +	git commit -m "add binary" &&
> +	printf "\\0changed" >binary.c &&
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- binary.c >actual &&
> +	test_grep "Binary files" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> +	echo "not tracked by cdiff" >unmatched.txt &&
> +	git add unmatched.txt &&
> +	git commit -m "add unmatched.txt" &&
> +
> +	echo "modified" >unmatched.txt &&
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- unmatched.txt >actual &&
> +	test_grep "modified" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'multiple drivers use separate processes' '
> +	echo "*.h diff=hdiff" >>.gitattributes &&
> +	git add .gitattributes &&
> +
> +	cat >multi.h <<-\EOF &&
> +	int header(void) { return 1; }
> +	EOF
> +	git add multi.h &&
> +	git commit -m "add multi.h" &&
> +
> +	cat >multi.h <<-\EOF &&
> +	int header(void) { return 2; }
> +	EOF
> +
> +	rm -f backend-c.log backend-h.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> +	    -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> +		diff -- newfile.c multi.h >actual 2>stderr &&
> +	test_grep "pathname=newfile.c" backend-c.log &&
> +	test_grep "pathname=multi.h" backend-h.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works alongside textconv' '
> +	write_script uppercase-filter <<-\EOF &&
> +	tr "a-z" "A-Z" <"$1"
> +	EOF
> +
> +	cat >textconv.c <<-\EOF &&
> +	hello world
> +	EOF
> +	git add textconv.c &&
> +	git commit -m "add textconv.c" &&
> +
> +	cat >textconv.c <<-\EOF &&
> +	goodbye world
> +	EOF
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.textconv="./uppercase-filter" \
> +	    -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- textconv.c >actual 2>stderr &&
> +	# The diff process receives textconv-transformed (uppercase) content.
> +	test_grep "pathname=textconv.c" backend.log &&
> +	test_grep "old=HELLO WORLD" backend.log &&
> +	test_grep "new=GOODBYE WORLD" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +#
> +# Downstream features: word diff, log, equivalent files, exit code.
> +#
> +
> +test_expect_success PYTHON 'diff process with --word-diff' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --word-diff worddiff.c >actual 2>stderr &&
> +	test_grep "\[-1;-\]" actual &&
> +	test_grep "{+999;+}" actual &&
> +	test_grep "pathname=worddiff.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works with git log -p' '
> +	# With no-hunks mode, the tool says the files are equivalent,
> +	# so log -p should show the commit but no diff content.
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> +		log -1 -p -- logtest.c >actual 2>stderr &&
> +	test_grep "change logtest.c" actual &&
> +	test_grep ! "return 2" actual &&
> +	test_grep "command=hunks pathname=logtest.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> +	cat >nohunks.c <<-\EOF &&
> +	int zero(void) { return 0; }
> +	EOF
> +	git add nohunks.c &&
> +	git commit -m "add nohunks.c" &&
> +
> +	cat >nohunks.c <<-\EOF &&
> +	int zero(void) { return 999; }
> +	EOF
> +
> +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> +		diff nohunks.c >actual &&
> +	test_must_be_empty actual
> +'
> +
> +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> +		diff --exit-code nohunks.c
> +'
> +
> +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> +	test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> +		diff --exit-code newfile.c
> +'
> +
> +#
> +# Bypass mechanisms: flags and commands that skip the diff process.
> +#
> +
> +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --diff-algorithm=patience worddiff.c >actual &&
> +	test_grep "return 999" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process not used by --stat' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --stat worddiff.c >actual &&
> +	test_grep "worddiff.c" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +#
> +# Error handling and fallback.
> +#
> +
> +test_expect_success PYTHON 'diff process fallback on tool error status' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> +		diff boundary.c >actual 2>stderr &&
> +	# Fallback produces the full builtin diff (both change regions).
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^-OLD9" actual &&
> +	test_grep "^+NEW9" actual &&
> +	# Tool was contacted (it replied with error, not crash).
> +	test_grep "command=hunks pathname=boundary.c" backend.log &&
> +	test_grep "diff process.*failed" stderr
> +'
> +
> +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> +		diff -- one.c two.c >actual 2>stderr &&
> +	# Unlike abort, error keeps the tool available: both files
> +	# are sent to the tool (and both fall back).
> +	test_grep "pathname=one.c" backend.log &&
> +	test_grep "pathname=two.c" backend.log &&
> +	test_grep "return 10" actual &&
> +	test_grep "return 20" actual
> +'
> +
> +test_expect_success PYTHON 'diff process abort disables for session' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> +		diff -- one.c two.c >actual &&
> +	# Both files should still produce diff output via fallback.
> +	test_grep "return 10" actual &&
> +	test_grep "return 20" actual &&
> +	# The tool aborts on the first file and git clears its
> +	# capability.  The second file never contacts the tool.
> +	test_grep "pathname=one.c" backend.log &&
> +	test_grep ! "pathname=two.c" backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on tool crash' '
> +	git -c diff.cdiff.process="$BACKEND --mode=crash" \
> +		diff boundary.c >actual 2>stderr &&
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^-OLD9" actual &&
> +	test_grep "^+NEW9" actual &&
> +	# Crash is a communication failure, so a warning is emitted.
> +	test_grep "diff process.*failed" stderr
> +'
> +
> +test_expect_success PYTHON 'diff process startup failure only warns once' '
> +	git -c diff.cdiff.process="/nonexistent/tool" \
> +		diff -- one.c two.c >actual 2>stderr &&
> +	# Both files produce diff output via fallback.
> +	test_grep "return 10" actual &&
> +	test_grep "return 20" actual &&
> +	# Sentinel prevents repeated warnings: only one, not one per file.
> +	test_grep "diff process.*failed" stderr >warnings &&
> +	test_line_count = 1 warnings
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> +	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> +		diff boundary.c >actual 2>stderr &&
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^-OLD9" actual &&
> +	test_grep "^+NEW9" actual &&
> +	# Invalid hunks are caught by xdiff validation, not the
> +	# protocol layer, so no warning is emitted.
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> +	cat >synctest.c <<-\EOF &&
> +	line1
> +	line2
> +	line3
> +	EOF
> +	git add synctest.c &&
> +	git commit -m "add synctest.c" &&
> +
> +	cat >synctest.c <<-\EOF &&
> +	line1
> +	changed
> +	line3
> +	EOF
> +
> +	# bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> +	# line as changed, leaving 1 unchanged old vs 2 unchanged new.
> +	# The synchronization invariant fails and git falls back.
> +	git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> +		diff synctest.c >actual 2>stderr &&
> +	test_grep "changed" actual
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> +	# boundary.c has 10 lines, so both hunks are in bounds
> +	# but they overlap at lines 3-5, triggering the ordering check.
> +	git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> +		diff boundary.c >actual 2>stderr &&
> +	test_grep "NEW5" actual
> +'
> +
> +test_done
> diff --git a/userdiff.h b/userdiff.h
> index 51c26e0d41..a98eabe377 100644
> --- a/userdiff.h
> +++ b/userdiff.h
> @@ -3,6 +3,7 @@
>  
>  #include "notes-cache.h"
>  
> +struct diff_subprocess;
>  struct index_state;
>  struct repository;
>  
> @@ -33,6 +34,8 @@ struct userdiff_driver {
>  	int textconv_want_cache;
>  	const char *process;
>  	char *process_owned;
> +	struct diff_subprocess *diff_subprocess;
> +	unsigned diff_process_failed : 1;
>  };
>  enum userdiff_driver_type {
>  	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> -- 
> gitgitgadget
> 
> 
> 

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Michael,
>
> I stumbled about this patch when it broke CI in Git for Windows, where we
> do _not_ use `NO_PYTHON`, even though Python is unavailable in the
> build/test CI jobs. The existing tests handle this situation gracefully,
> this here patch does not:
>
> On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:
>
> > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> > new file mode 100755
> > index 0000000000..f159cd86d8
> > --- /dev/null
> > +++ b/t/t4080-diff-process.sh
> > @@ -0,0 +1,538 @@
> > +#!/bin/sh
> > +
> > +test_description='diff process via long-running process'
> > +
> > +. ./test-lib.sh
> > +
> > +if test_have_prereq PYTHON
> > +then
> > +     PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
>
> When neither `python3` nor `python` are available (which is the case in
> the minimal Git for Windows SDK used in Git's CI runs), this fails under
> `set -e`. Before even running the first test case. Resulting in an
> unexpected TAP format error.
>
> Now, we could "fix" this by imitating what `lib-p4` does (see
> https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
> which demonstrates that it is indeed a work-around on Windows):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index fdf6da1c341e67..bd22c247ff3856 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -4,9 +4,10 @@ test_description='diff process via long-running process'
>
>  . ./test-lib.sh
>
> -if test_have_prereq PYTHON
> +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
>  then
> -       PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
> +       skip_all='python interpreter not available'
> +       test_done
>  fi
>
>  #
> -- snap --
>
> Of course, this uncovers _another_ problem with the Python script: It uses
> Python3-only `f"..."` format strings, which cannot be handled by the
> Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
> So this requires _another follow-up (see also
> https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index bd22c247ff3856..ba14682a9086e4 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -39,7 +39,8 @@ setup_backend () {
>
>         def write_pkt(line):
>             data = (line + "\n").encode()
> -           sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> +           hdr = "{:04x}".format(len(data) + 4).encode()
> +           sys.stdout.buffer.write(hdr + data)
>             sys.stdout.buffer.flush()
>
>         def write_flush():
> @@ -98,7 +99,8 @@ setup_backend () {
>             new = read_content()
>             old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
>             new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> -           log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> +           log("command={} pathname={} old={} new={}".format(
> +               cmd, pathname, old_first, new_first))
>
>             if mode == "error":
>                 write_flush()
> @@ -130,7 +132,7 @@ setup_backend () {
>                 else:
>                     ol = old.count(b"\n")
>                     nl = new.count(b"\n")
> -                   write_pkt(f"hunk 1 {ol} 1 {nl}")
> +                   write_pkt("hunk 1 {} 1 {}".format(ol, nl))
>                 write_flush()
>                 write_pkt("status=success")
>                 write_flush()
> -- snap --
>
> And this is still not enough to make it work with Python2, see
> https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:
>
> -- snip --
> [...]
> + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
>   Traceback (most recent call last):
>     File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
>       assert read_pkt() == "git-diff-client"
>     File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
>       hdr = sys.stdin.buffer.read(4)
>   AttributeError: 'file' object has no attribute 'buffer'
> -- snap --
>
> I have experienced similar patterns in my career, where a single decision
> required multiple follow-up fixes _just_ to avoid having to revert that
> decision. This kind of doubling down has never ended well.
>
> Therefore I would like to take a step back, and ask: Is it _really_ a good
> idea to use Python here? Are we certain that we want to _require_ Python
> to run this test and skip it if Python isn't available (as is the case in
> the Windows-related parts of Git's very own CI) even if Python has nothing
> at all to do with the feature that is being tested?
>
> I don't want to be doomed to repeat history, and we can very well learn
> e.g. from prior art in this very project, where the tests for the
> clean/smudge filters (which _also_ want to speak pkt-line over stdio)
> needlessly incurred Perl as a requirement to run the tests. It was
> Matheus's heroic work in 52917a998ef3a (t0021: implementation the
> rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
> new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
> that unnecessary prerequisite.
>
> Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
> protocol via simple shell scripts.
>
> So the conscious project direction has been: fold pkt-line test backends
> into `test-tool` and drop the scripting-language prereq. Reintroducing the
> same shape in Python would walk this back.
>
> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
> 2025-04-29) has been another clear sign that the Git project is actively
> _removing_ scripting-language dependencies from the build and test
> infrastructure, not adding new ones.
>
> The clear prior art in Git's own tests for what t4080 wants to do, as of
> today, is `t/helper/test-rot13-filter.c`, which could be imitated here
> instead of (re-)introducing a dependency on a scripting language other
> than Unix shell in Git's test suite.
>
> The `PYTHON` prereq exists in exactly five files today, all `git p4`
> related (where Python is an intrinsic prerequisite given that `git-p4.py`
> _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
> `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
> and `t/t9836-git-p4-metadata-encoding-python3.sh`.
>
> After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
> patch would be the first one, after almost 20 years, to re-introduce
> Python as a dependency outside `git p4`.
>
> And it would also be the first ever to embed a Python script as a heredoc:
>
> > +fi
> > +
> > +#
> > +# A single parametric diff process.
> > +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> > +#
> > +# Modes:
> > +#   whole-file  - report all lines as changed (default)
> > +#   fixed-hunk  - always report hunk 5 2 5 2
> > +#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
> > +#   bad-sync    - report hunk with mismatched unchanged totals
> > +#   overlap     - report two overlapping hunks
> > +#   no-hunks   - return no hunks (files considered equivalent)
> > +#   error       - return status=error for every request
> > +#   abort       - return status=abort for every request
> > +#   crash       - read one request then exit without responding
> > +#
> > +setup_backend () {
> > +     cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> > +     import sys, os
> > +
> > +     def read_pkt():
> > +         hdr = sys.stdin.buffer.read(4)
> > +         if len(hdr) < 4: return None
> > +         length = int(hdr, 16)
> > +         if length == 0: return ""
> > +         data = sys.stdin.buffer.read(length - 4)
> > +         return data.decode().rstrip("\n")
> > +
> > +     def write_pkt(line):
> > +         data = (line + "\n").encode()
> > +         sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> > +         sys.stdout.buffer.flush()
> > +
> > +     def write_flush():
> > +         sys.stdout.buffer.write(b"0000")
> > +         sys.stdout.buffer.flush()
> > +
> > +     def read_content():
> > +         chunks = []
> > +         while True:
> > +             hdr = sys.stdin.buffer.read(4)
> > +             if len(hdr) < 4: break
> > +             length = int(hdr, 16)
> > +             if length == 0: break
> > +             chunks.append(sys.stdin.buffer.read(length - 4))
> > +         return b"".join(chunks)
> > +
> > +     mode = "whole-file"
> > +     logfile = None
> > +     for arg in sys.argv[1:]:
> > +         if arg.startswith("--mode="):
> > +             mode = arg[7:]
> > +         elif arg.startswith("--log="):
> > +             logfile = open(arg[6:], "a")
> > +
> > +     def log(msg):
> > +         if logfile:
> > +             logfile.write(msg + "\n")
> > +             logfile.flush()
> > +
> > +     # Handshake
> > +     assert read_pkt() == "git-diff-client"
> > +     assert read_pkt() == "version=1"
> > +     read_pkt()
> > +     write_pkt("git-diff-server")
> > +     write_pkt("version=1")
> > +     write_flush()
> > +     while True:
> > +         p = read_pkt()
> > +         if p == "": break
> > +     write_pkt("capability=hunks")
> > +     write_flush()
> > +
> > +     log("ready")
> > +
> > +     while True:
> > +         cmd = None
> > +         pathname = None
> > +         while True:
> > +             p = read_pkt()
> > +             if p is None: sys.exit(0)
> > +             if p == "": break
> > +             if p.startswith("command="): cmd = p.split("=",1)[1]
> > +             if p.startswith("pathname="): pathname = p.split("=",1)[1]
> > +         if cmd is None: sys.exit(0)
> > +         old = read_content()
> > +         new = read_content()
> > +         old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> > +         new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> > +         log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> > +
> > +         if mode == "error":
> > +             write_flush()
> > +             write_pkt("status=error")
> > +             write_flush()
> > +             continue
> > +
> > +         if mode == "abort":
> > +             write_flush()
> > +             write_pkt("status=abort")
> > +             write_flush()
> > +             continue
> > +
> > +         if mode == "crash":
> > +             sys.exit(1)
> > +
> > +         if cmd == "hunks":
> > +             if mode == "fixed-hunk":
> > +                 write_pkt("hunk 5 2 5 2")
> > +             elif mode == "bad-hunk":
> > +                 write_pkt("hunk 999 1 999 1")
> > +             elif mode == "bad-sync":
> > +                 write_pkt("hunk 1 2 1 1")
> > +             elif mode == "overlap":
> > +                 write_pkt("hunk 1 5 1 5")
> > +                 write_pkt("hunk 3 2 3 2")
> > +             elif mode == "no-hunks":
> > +                 pass
> > +             else:
> > +                 ol = old.count(b"\n")
> > +                 nl = new.count(b"\n")
> > +                 write_pkt(f"hunk 1 {ol} 1 {nl}")
> > +             write_flush()
> > +             write_pkt("status=success")
> > +             write_flush()
> > +         else:
> > +             write_flush()
> > +             write_pkt("status=error")
> > +             write_flush()
> > +     PYEOF
>
> The existing pattern is to provide larger scripts as fixtures in
> associated `t/tNNNN/` directories, not as heredoc, see e.g.
> `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
> heredoc strings makes it virtually impossible to use static code analysis
> or syntax highlighting to fend off banal errors.
>
> Given the complexity of what t4080 tries to test (error, abort, crash,
> bad-sync, no-hunks, multiple files in one session, capability
> negotiation), it would unfortunately be infeasible to use `test-tool
> pkt-line` from a shell script implementing that `diff.*.process` protocol.
>
> So I've spiked a demo how the `test-tool diff-process-backend` could look
> like (letting Opus do the menial typing, so that I can enjoy at least part
> of a sunny Sunday outside), which also passes the CI build and test:
> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
>
> That commit is of course not intended to be used as-is; Feel free to pick
> code parts of it and integrate them into your topic branch. Or write your
> own test-tool helper from scratch if that's more your jam.
>

Johannes, thank you for the great feedback. The historical context is
really helpful and
the concerns you raise make a lot of sense. I will take a look at your
spike and also work
on removing Python from the test.

> Ciao,
> Johannes
>
> > +     write_script diff-process-backend <<-SHEOF
> > +     exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> > +     SHEOF
> > +}
> > +
> > +BACKEND="./diff-process-backend"
> > +
> > +test_expect_success PYTHON 'setup' '
> > +     setup_backend &&
> > +     echo "*.c diff=cdiff" >.gitattributes &&
> > +     git add .gitattributes &&
> > +
> > +     # boundary.c: 10 lines, changes at 5-6 and 9-10.
> > +     # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> > +     cat >boundary.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     line4
> > +     OLD5
> > +     OLD6
> > +     line7
> > +     line8
> > +     OLD9
> > +     OLD10
> > +     EOF
> > +     git add boundary.c &&
> > +
> > +     # worddiff.c: single-line function, value changes 1 -> 999.
> > +     # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> > +     cat >worddiff.c <<-\EOF &&
> > +     int value(void) { return 1; }
> > +     EOF
> > +     git add worddiff.c &&
> > +
> > +     # newfile.c: single-line function, value changes 42 -> 99.
> > +     # Used by: new file, --exit-code, multiple drivers.
> > +     cat >newfile.c <<-\EOF &&
> > +     int new_func(void) { return 42; }
> > +     EOF
> > +     git add newfile.c &&
> > +
> > +     # logtest.c: single-line function for log/format-patch tests.
> > +     # Needs two commits so log -1 has a diff.
> > +     cat >logtest.c <<-\EOF &&
> > +     int logfunc(void) { return 1; }
> > +     EOF
> > +     git add logtest.c &&
> > +
> > +     # two.c/one.c: two-file pair for error/abort/startup-failure tests.
> > +     cat >one.c <<-\EOF &&
> > +     int first(void) { return 1; }
> > +     EOF
> > +     cat >two.c <<-\EOF &&
> > +     int second(void) { return 2; }
> > +     EOF
> > +     git add one.c two.c &&
> > +
> > +     git commit -m "initial" &&
> > +
> > +     # Second commit for logtest.c (so log -1 has something to show).
> > +     cat >logtest.c <<-\EOF &&
> > +     int logfunc(void) { return 2; }
> > +     EOF
> > +     git add logtest.c &&
> > +     git commit -m "change logtest.c" &&
> > +
> > +     # Working tree modifications (not committed).
> > +     cat >boundary.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     line4
> > +     NEW5
> > +     NEW6
> > +     line7
> > +     line8
> > +     NEW9
> > +     NEW10
> > +     EOF
> > +
> > +     cat >worddiff.c <<-\EOF &&
> > +     int value(void) { return 999; }
> > +     EOF
> > +
> > +     cat >newfile.c <<-\EOF &&
> > +     int new_func(void) { return 99; }
> > +     EOF
> > +
> > +     cat >one.c <<-\EOF &&
> > +     int first(void) { return 10; }
> > +     EOF
> > +
> > +     cat >two.c <<-\EOF
> > +     int second(void) { return 20; }
> > +     EOF
> > +'
> > +
> > +#
> > +# Core behavior: the tool controls which lines are marked as changed.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> > +     # The file has changes at lines 5-6 and 9-10, but fixed-hunk
> > +     # only reports lines 5-6 as changed.  Lines 9-10 should not
> > +     # appear as changed in the output.
> > +     git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> > +             diff boundary.c >actual &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^-OLD6" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^+NEW6" actual &&
> > +     test_grep ! "^-OLD9" actual &&
> > +     test_grep ! "^-OLD10" actual &&
> > +     test_grep ! "^+NEW9" actual &&
> > +     test_grep ! "^+NEW10" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with new file' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- newfile.c >actual 2>stderr &&
> > +     test_grep "return 99" actual &&
> > +     test_grep "pathname=newfile.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> > +     cat >added.c <<-\EOF &&
> > +     int added(void) { return 1; }
> > +     EOF
> > +     git add added.c &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --cached -- added.c >actual 2>stderr &&
> > +     test_grep "added" actual &&
> > +     test_grep "pathname=added.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process skipped for binary files' '
> > +     printf "\\0binary" >binary.c &&
> > +     git add binary.c &&
> > +     git commit -m "add binary" &&
> > +     printf "\\0changed" >binary.c &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- binary.c >actual &&
> > +     test_grep "Binary files" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> > +     echo "not tracked by cdiff" >unmatched.txt &&
> > +     git add unmatched.txt &&
> > +     git commit -m "add unmatched.txt" &&
> > +
> > +     echo "modified" >unmatched.txt &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- unmatched.txt >actual &&
> > +     test_grep "modified" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'multiple drivers use separate processes' '
> > +     echo "*.h diff=hdiff" >>.gitattributes &&
> > +     git add .gitattributes &&
> > +
> > +     cat >multi.h <<-\EOF &&
> > +     int header(void) { return 1; }
> > +     EOF
> > +     git add multi.h &&
> > +     git commit -m "add multi.h" &&
> > +
> > +     cat >multi.h <<-\EOF &&
> > +     int header(void) { return 2; }
> > +     EOF
> > +
> > +     rm -f backend-c.log backend-h.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> > +         -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> > +             diff -- newfile.c multi.h >actual 2>stderr &&
> > +     test_grep "pathname=newfile.c" backend-c.log &&
> > +     test_grep "pathname=multi.h" backend-h.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works alongside textconv' '
> > +     write_script uppercase-filter <<-\EOF &&
> > +     tr "a-z" "A-Z" <"$1"
> > +     EOF
> > +
> > +     cat >textconv.c <<-\EOF &&
> > +     hello world
> > +     EOF
> > +     git add textconv.c &&
> > +     git commit -m "add textconv.c" &&
> > +
> > +     cat >textconv.c <<-\EOF &&
> > +     goodbye world
> > +     EOF
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.textconv="./uppercase-filter" \
> > +         -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- textconv.c >actual 2>stderr &&
> > +     # The diff process receives textconv-transformed (uppercase) content.
> > +     test_grep "pathname=textconv.c" backend.log &&
> > +     test_grep "old=HELLO WORLD" backend.log &&
> > +     test_grep "new=GOODBYE WORLD" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +#
> > +# Downstream features: word diff, log, equivalent files, exit code.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process with --word-diff' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --word-diff worddiff.c >actual 2>stderr &&
> > +     test_grep "\[-1;-\]" actual &&
> > +     test_grep "{+999;+}" actual &&
> > +     test_grep "pathname=worddiff.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with git log -p' '
> > +     # With no-hunks mode, the tool says the files are equivalent,
> > +     # so log -p should show the commit but no diff content.
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> > +             log -1 -p -- logtest.c >actual 2>stderr &&
> > +     test_grep "change logtest.c" actual &&
> > +     test_grep ! "return 2" actual &&
> > +     test_grep "command=hunks pathname=logtest.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> > +     cat >nohunks.c <<-\EOF &&
> > +     int zero(void) { return 0; }
> > +     EOF
> > +     git add nohunks.c &&
> > +     git commit -m "add nohunks.c" &&
> > +
> > +     cat >nohunks.c <<-\EOF &&
> > +     int zero(void) { return 999; }
> > +     EOF
> > +
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > +             diff nohunks.c >actual &&
> > +     test_must_be_empty actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > +             diff --exit-code nohunks.c
> > +'
> > +
> > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> > +     test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> > +             diff --exit-code newfile.c
> > +'
> > +
> > +#
> > +# Bypass mechanisms: flags and commands that skip the diff process.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --diff-algorithm=patience worddiff.c >actual &&
> > +     test_grep "return 999" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not used by --stat' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --stat worddiff.c >actual &&
> > +     test_grep "worddiff.c" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +#
> > +# Error handling and fallback.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool error status' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     # Fallback produces the full builtin diff (both change regions).
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Tool was contacted (it replied with error, not crash).
> > +     test_grep "command=hunks pathname=boundary.c" backend.log &&
> > +     test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > +             diff -- one.c two.c >actual 2>stderr &&
> > +     # Unlike abort, error keeps the tool available: both files
> > +     # are sent to the tool (and both fall back).
> > +     test_grep "pathname=one.c" backend.log &&
> > +     test_grep "pathname=two.c" backend.log &&
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process abort disables for session' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> > +             diff -- one.c two.c >actual &&
> > +     # Both files should still produce diff output via fallback.
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual &&
> > +     # The tool aborts on the first file and git clears its
> > +     # capability.  The second file never contacts the tool.
> > +     test_grep "pathname=one.c" backend.log &&
> > +     test_grep ! "pathname=two.c" backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool crash' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=crash" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Crash is a communication failure, so a warning is emitted.
> > +     test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process startup failure only warns once' '
> > +     git -c diff.cdiff.process="/nonexistent/tool" \
> > +             diff -- one.c two.c >actual 2>stderr &&
> > +     # Both files produce diff output via fallback.
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual &&
> > +     # Sentinel prevents repeated warnings: only one, not one per file.
> > +     test_grep "diff process.*failed" stderr >warnings &&
> > +     test_line_count = 1 warnings
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Invalid hunks are caught by xdiff validation, not the
> > +     # protocol layer, so no warning is emitted.
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> > +     cat >synctest.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     EOF
> > +     git add synctest.c &&
> > +     git commit -m "add synctest.c" &&
> > +
> > +     cat >synctest.c <<-\EOF &&
> > +     line1
> > +     changed
> > +     line3
> > +     EOF
> > +
> > +     # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> > +     # line as changed, leaving 1 unchanged old vs 2 unchanged new.
> > +     # The synchronization invariant fails and git falls back.
> > +     git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> > +             diff synctest.c >actual 2>stderr &&
> > +     test_grep "changed" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> > +     # boundary.c has 10 lines, so both hunks are in bounds
> > +     # but they overlap at lines 3-5, triggering the ordering check.
> > +     git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "NEW5" actual
> > +'
> > +
> > +test_done
> > diff --git a/userdiff.h b/userdiff.h
> > index 51c26e0d41..a98eabe377 100644
> > --- a/userdiff.h
> > +++ b/userdiff.h
> > @@ -3,6 +3,7 @@
> >
> >  #include "notes-cache.h"
> >
> > +struct diff_subprocess;
> >  struct index_state;
> >  struct repository;
> >
> > @@ -33,6 +34,8 @@ struct userdiff_driver {
> >       int textconv_want_cache;
> >       const char *process;
> >       char *process_owned;
> > +     struct diff_subprocess *diff_subprocess;
> > +     unsigned diff_process_failed : 1;
> >  };
> >  enum userdiff_driver_type {
> >       USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> > --
> > gitgitgadget
> >
> >
> >

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Michael Montalbo wrote on the Git mailing list (how to reply to this email):

On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Michael,
>
> I stumbled about this patch when it broke CI in Git for Windows, where we
> do _not_ use `NO_PYTHON`, even though Python is unavailable in the
> build/test CI jobs. The existing tests handle this situation gracefully,
> this here patch does not:
>
> On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:
>
> > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> > new file mode 100755
> > index 0000000000..f159cd86d8
> > --- /dev/null
> > +++ b/t/t4080-diff-process.sh
> > @@ -0,0 +1,538 @@
> > +#!/bin/sh
> > +
> > +test_description='diff process via long-running process'
> > +
> > +. ./test-lib.sh
> > +
> > +if test_have_prereq PYTHON
> > +then
> > +     PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
>
> When neither `python3` nor `python` are available (which is the case in
> the minimal Git for Windows SDK used in Git's CI runs), this fails under
> `set -e`. Before even running the first test case. Resulting in an
> unexpected TAP format error.
>
> Now, we could "fix" this by imitating what `lib-p4` does (see
> https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
> which demonstrates that it is indeed a work-around on Windows):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index fdf6da1c341e67..bd22c247ff3856 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -4,9 +4,10 @@ test_description='diff process via long-running process'
>
>  . ./test-lib.sh
>
> -if test_have_prereq PYTHON
> +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
>  then
> -       PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
> +       skip_all='python interpreter not available'
> +       test_done
>  fi
>
>  #
> -- snap --
>
> Of course, this uncovers _another_ problem with the Python script: It uses
> Python3-only `f"..."` format strings, which cannot be handled by the
> Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
> So this requires _another follow-up (see also
> https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index bd22c247ff3856..ba14682a9086e4 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -39,7 +39,8 @@ setup_backend () {
>
>         def write_pkt(line):
>             data = (line + "\n").encode()
> -           sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> +           hdr = "{:04x}".format(len(data) + 4).encode()
> +           sys.stdout.buffer.write(hdr + data)
>             sys.stdout.buffer.flush()
>
>         def write_flush():
> @@ -98,7 +99,8 @@ setup_backend () {
>             new = read_content()
>             old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
>             new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> -           log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> +           log("command={} pathname={} old={} new={}".format(
> +               cmd, pathname, old_first, new_first))
>
>             if mode == "error":
>                 write_flush()
> @@ -130,7 +132,7 @@ setup_backend () {
>                 else:
>                     ol = old.count(b"\n")
>                     nl = new.count(b"\n")
> -                   write_pkt(f"hunk 1 {ol} 1 {nl}")
> +                   write_pkt("hunk 1 {} 1 {}".format(ol, nl))
>                 write_flush()
>                 write_pkt("status=success")
>                 write_flush()
> -- snap --
>
> And this is still not enough to make it work with Python2, see
> https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:
>
> -- snip --
> [...]
> + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
>   Traceback (most recent call last):
>     File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
>       assert read_pkt() == "git-diff-client"
>     File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
>       hdr = sys.stdin.buffer.read(4)
>   AttributeError: 'file' object has no attribute 'buffer'
> -- snap --
>
> I have experienced similar patterns in my career, where a single decision
> required multiple follow-up fixes _just_ to avoid having to revert that
> decision. This kind of doubling down has never ended well.
>
> Therefore I would like to take a step back, and ask: Is it _really_ a good
> idea to use Python here? Are we certain that we want to _require_ Python
> to run this test and skip it if Python isn't available (as is the case in
> the Windows-related parts of Git's very own CI) even if Python has nothing
> at all to do with the feature that is being tested?
>
> I don't want to be doomed to repeat history, and we can very well learn
> e.g. from prior art in this very project, where the tests for the
> clean/smudge filters (which _also_ want to speak pkt-line over stdio)
> needlessly incurred Perl as a requirement to run the tests. It was
> Matheus's heroic work in 52917a998ef3a (t0021: implementation the
> rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
> new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
> that unnecessary prerequisite.
>
> Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
> protocol via simple shell scripts.
>
> So the conscious project direction has been: fold pkt-line test backends
> into `test-tool` and drop the scripting-language prereq. Reintroducing the
> same shape in Python would walk this back.
>
> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
> 2025-04-29) has been another clear sign that the Git project is actively
> _removing_ scripting-language dependencies from the build and test
> infrastructure, not adding new ones.
>

Now I wonder if the extension / addition of more Perl test infra with my other
series:

https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/

also goes against the project direction w.r.t. removing scripting languages.
Maybe I should also re-evaluate the usage of Perl there. I am leveraging
existing shell parsing logic written in Perl, but maybe the preference for
Perl-based lint rules is a mistake and should be avoided.

> The clear prior art in Git's own tests for what t4080 wants to do, as of
> today, is `t/helper/test-rot13-filter.c`, which could be imitated here
> instead of (re-)introducing a dependency on a scripting language other
> than Unix shell in Git's test suite.
>
> The `PYTHON` prereq exists in exactly five files today, all `git p4`
> related (where Python is an intrinsic prerequisite given that `git-p4.py`
> _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
> `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
> and `t/t9836-git-p4-metadata-encoding-python3.sh`.
>
> After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
> patch would be the first one, after almost 20 years, to re-introduce
> Python as a dependency outside `git p4`.
>
> And it would also be the first ever to embed a Python script as a heredoc:
>
> > +fi
> > +
> > +#
> > +# A single parametric diff process.
> > +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> > +#
> > +# Modes:
> > +#   whole-file  - report all lines as changed (default)
> > +#   fixed-hunk  - always report hunk 5 2 5 2
> > +#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
> > +#   bad-sync    - report hunk with mismatched unchanged totals
> > +#   overlap     - report two overlapping hunks
> > +#   no-hunks   - return no hunks (files considered equivalent)
> > +#   error       - return status=error for every request
> > +#   abort       - return status=abort for every request
> > +#   crash       - read one request then exit without responding
> > +#
> > +setup_backend () {
> > +     cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> > +     import sys, os
> > +
> > +     def read_pkt():
> > +         hdr = sys.stdin.buffer.read(4)
> > +         if len(hdr) < 4: return None
> > +         length = int(hdr, 16)
> > +         if length == 0: return ""
> > +         data = sys.stdin.buffer.read(length - 4)
> > +         return data.decode().rstrip("\n")
> > +
> > +     def write_pkt(line):
> > +         data = (line + "\n").encode()
> > +         sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> > +         sys.stdout.buffer.flush()
> > +
> > +     def write_flush():
> > +         sys.stdout.buffer.write(b"0000")
> > +         sys.stdout.buffer.flush()
> > +
> > +     def read_content():
> > +         chunks = []
> > +         while True:
> > +             hdr = sys.stdin.buffer.read(4)
> > +             if len(hdr) < 4: break
> > +             length = int(hdr, 16)
> > +             if length == 0: break
> > +             chunks.append(sys.stdin.buffer.read(length - 4))
> > +         return b"".join(chunks)
> > +
> > +     mode = "whole-file"
> > +     logfile = None
> > +     for arg in sys.argv[1:]:
> > +         if arg.startswith("--mode="):
> > +             mode = arg[7:]
> > +         elif arg.startswith("--log="):
> > +             logfile = open(arg[6:], "a")
> > +
> > +     def log(msg):
> > +         if logfile:
> > +             logfile.write(msg + "\n")
> > +             logfile.flush()
> > +
> > +     # Handshake
> > +     assert read_pkt() == "git-diff-client"
> > +     assert read_pkt() == "version=1"
> > +     read_pkt()
> > +     write_pkt("git-diff-server")
> > +     write_pkt("version=1")
> > +     write_flush()
> > +     while True:
> > +         p = read_pkt()
> > +         if p == "": break
> > +     write_pkt("capability=hunks")
> > +     write_flush()
> > +
> > +     log("ready")
> > +
> > +     while True:
> > +         cmd = None
> > +         pathname = None
> > +         while True:
> > +             p = read_pkt()
> > +             if p is None: sys.exit(0)
> > +             if p == "": break
> > +             if p.startswith("command="): cmd = p.split("=",1)[1]
> > +             if p.startswith("pathname="): pathname = p.split("=",1)[1]
> > +         if cmd is None: sys.exit(0)
> > +         old = read_content()
> > +         new = read_content()
> > +         old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> > +         new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> > +         log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> > +
> > +         if mode == "error":
> > +             write_flush()
> > +             write_pkt("status=error")
> > +             write_flush()
> > +             continue
> > +
> > +         if mode == "abort":
> > +             write_flush()
> > +             write_pkt("status=abort")
> > +             write_flush()
> > +             continue
> > +
> > +         if mode == "crash":
> > +             sys.exit(1)
> > +
> > +         if cmd == "hunks":
> > +             if mode == "fixed-hunk":
> > +                 write_pkt("hunk 5 2 5 2")
> > +             elif mode == "bad-hunk":
> > +                 write_pkt("hunk 999 1 999 1")
> > +             elif mode == "bad-sync":
> > +                 write_pkt("hunk 1 2 1 1")
> > +             elif mode == "overlap":
> > +                 write_pkt("hunk 1 5 1 5")
> > +                 write_pkt("hunk 3 2 3 2")
> > +             elif mode == "no-hunks":
> > +                 pass
> > +             else:
> > +                 ol = old.count(b"\n")
> > +                 nl = new.count(b"\n")
> > +                 write_pkt(f"hunk 1 {ol} 1 {nl}")
> > +             write_flush()
> > +             write_pkt("status=success")
> > +             write_flush()
> > +         else:
> > +             write_flush()
> > +             write_pkt("status=error")
> > +             write_flush()
> > +     PYEOF
>
> The existing pattern is to provide larger scripts as fixtures in
> associated `t/tNNNN/` directories, not as heredoc, see e.g.
> `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
> heredoc strings makes it virtually impossible to use static code analysis
> or syntax highlighting to fend off banal errors.
>
> Given the complexity of what t4080 tries to test (error, abort, crash,
> bad-sync, no-hunks, multiple files in one session, capability
> negotiation), it would unfortunately be infeasible to use `test-tool
> pkt-line` from a shell script implementing that `diff.*.process` protocol.
>
> So I've spiked a demo how the `test-tool diff-process-backend` could look
> like (letting Opus do the menial typing, so that I can enjoy at least part
> of a sunny Sunday outside), which also passes the CI build and test:
> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
>
> That commit is of course not intended to be used as-is; Feel free to pick
> code parts of it and integrate them into your topic branch. Or write your
> own test-tool helper from scratch if that's more your jam.
>
> Ciao,
> Johannes
>
> > +     write_script diff-process-backend <<-SHEOF
> > +     exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> > +     SHEOF
> > +}
> > +
> > +BACKEND="./diff-process-backend"
> > +
> > +test_expect_success PYTHON 'setup' '
> > +     setup_backend &&
> > +     echo "*.c diff=cdiff" >.gitattributes &&
> > +     git add .gitattributes &&
> > +
> > +     # boundary.c: 10 lines, changes at 5-6 and 9-10.
> > +     # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> > +     cat >boundary.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     line4
> > +     OLD5
> > +     OLD6
> > +     line7
> > +     line8
> > +     OLD9
> > +     OLD10
> > +     EOF
> > +     git add boundary.c &&
> > +
> > +     # worddiff.c: single-line function, value changes 1 -> 999.
> > +     # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> > +     cat >worddiff.c <<-\EOF &&
> > +     int value(void) { return 1; }
> > +     EOF
> > +     git add worddiff.c &&
> > +
> > +     # newfile.c: single-line function, value changes 42 -> 99.
> > +     # Used by: new file, --exit-code, multiple drivers.
> > +     cat >newfile.c <<-\EOF &&
> > +     int new_func(void) { return 42; }
> > +     EOF
> > +     git add newfile.c &&
> > +
> > +     # logtest.c: single-line function for log/format-patch tests.
> > +     # Needs two commits so log -1 has a diff.
> > +     cat >logtest.c <<-\EOF &&
> > +     int logfunc(void) { return 1; }
> > +     EOF
> > +     git add logtest.c &&
> > +
> > +     # two.c/one.c: two-file pair for error/abort/startup-failure tests.
> > +     cat >one.c <<-\EOF &&
> > +     int first(void) { return 1; }
> > +     EOF
> > +     cat >two.c <<-\EOF &&
> > +     int second(void) { return 2; }
> > +     EOF
> > +     git add one.c two.c &&
> > +
> > +     git commit -m "initial" &&
> > +
> > +     # Second commit for logtest.c (so log -1 has something to show).
> > +     cat >logtest.c <<-\EOF &&
> > +     int logfunc(void) { return 2; }
> > +     EOF
> > +     git add logtest.c &&
> > +     git commit -m "change logtest.c" &&
> > +
> > +     # Working tree modifications (not committed).
> > +     cat >boundary.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     line4
> > +     NEW5
> > +     NEW6
> > +     line7
> > +     line8
> > +     NEW9
> > +     NEW10
> > +     EOF
> > +
> > +     cat >worddiff.c <<-\EOF &&
> > +     int value(void) { return 999; }
> > +     EOF
> > +
> > +     cat >newfile.c <<-\EOF &&
> > +     int new_func(void) { return 99; }
> > +     EOF
> > +
> > +     cat >one.c <<-\EOF &&
> > +     int first(void) { return 10; }
> > +     EOF
> > +
> > +     cat >two.c <<-\EOF
> > +     int second(void) { return 20; }
> > +     EOF
> > +'
> > +
> > +#
> > +# Core behavior: the tool controls which lines are marked as changed.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> > +     # The file has changes at lines 5-6 and 9-10, but fixed-hunk
> > +     # only reports lines 5-6 as changed.  Lines 9-10 should not
> > +     # appear as changed in the output.
> > +     git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> > +             diff boundary.c >actual &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^-OLD6" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^+NEW6" actual &&
> > +     test_grep ! "^-OLD9" actual &&
> > +     test_grep ! "^-OLD10" actual &&
> > +     test_grep ! "^+NEW9" actual &&
> > +     test_grep ! "^+NEW10" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with new file' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- newfile.c >actual 2>stderr &&
> > +     test_grep "return 99" actual &&
> > +     test_grep "pathname=newfile.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> > +     cat >added.c <<-\EOF &&
> > +     int added(void) { return 1; }
> > +     EOF
> > +     git add added.c &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --cached -- added.c >actual 2>stderr &&
> > +     test_grep "added" actual &&
> > +     test_grep "pathname=added.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process skipped for binary files' '
> > +     printf "\\0binary" >binary.c &&
> > +     git add binary.c &&
> > +     git commit -m "add binary" &&
> > +     printf "\\0changed" >binary.c &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- binary.c >actual &&
> > +     test_grep "Binary files" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> > +     echo "not tracked by cdiff" >unmatched.txt &&
> > +     git add unmatched.txt &&
> > +     git commit -m "add unmatched.txt" &&
> > +
> > +     echo "modified" >unmatched.txt &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- unmatched.txt >actual &&
> > +     test_grep "modified" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'multiple drivers use separate processes' '
> > +     echo "*.h diff=hdiff" >>.gitattributes &&
> > +     git add .gitattributes &&
> > +
> > +     cat >multi.h <<-\EOF &&
> > +     int header(void) { return 1; }
> > +     EOF
> > +     git add multi.h &&
> > +     git commit -m "add multi.h" &&
> > +
> > +     cat >multi.h <<-\EOF &&
> > +     int header(void) { return 2; }
> > +     EOF
> > +
> > +     rm -f backend-c.log backend-h.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> > +         -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> > +             diff -- newfile.c multi.h >actual 2>stderr &&
> > +     test_grep "pathname=newfile.c" backend-c.log &&
> > +     test_grep "pathname=multi.h" backend-h.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works alongside textconv' '
> > +     write_script uppercase-filter <<-\EOF &&
> > +     tr "a-z" "A-Z" <"$1"
> > +     EOF
> > +
> > +     cat >textconv.c <<-\EOF &&
> > +     hello world
> > +     EOF
> > +     git add textconv.c &&
> > +     git commit -m "add textconv.c" &&
> > +
> > +     cat >textconv.c <<-\EOF &&
> > +     goodbye world
> > +     EOF
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.textconv="./uppercase-filter" \
> > +         -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- textconv.c >actual 2>stderr &&
> > +     # The diff process receives textconv-transformed (uppercase) content.
> > +     test_grep "pathname=textconv.c" backend.log &&
> > +     test_grep "old=HELLO WORLD" backend.log &&
> > +     test_grep "new=GOODBYE WORLD" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +#
> > +# Downstream features: word diff, log, equivalent files, exit code.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process with --word-diff' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --word-diff worddiff.c >actual 2>stderr &&
> > +     test_grep "\[-1;-\]" actual &&
> > +     test_grep "{+999;+}" actual &&
> > +     test_grep "pathname=worddiff.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with git log -p' '
> > +     # With no-hunks mode, the tool says the files are equivalent,
> > +     # so log -p should show the commit but no diff content.
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> > +             log -1 -p -- logtest.c >actual 2>stderr &&
> > +     test_grep "change logtest.c" actual &&
> > +     test_grep ! "return 2" actual &&
> > +     test_grep "command=hunks pathname=logtest.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> > +     cat >nohunks.c <<-\EOF &&
> > +     int zero(void) { return 0; }
> > +     EOF
> > +     git add nohunks.c &&
> > +     git commit -m "add nohunks.c" &&
> > +
> > +     cat >nohunks.c <<-\EOF &&
> > +     int zero(void) { return 999; }
> > +     EOF
> > +
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > +             diff nohunks.c >actual &&
> > +     test_must_be_empty actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > +             diff --exit-code nohunks.c
> > +'
> > +
> > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> > +     test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> > +             diff --exit-code newfile.c
> > +'
> > +
> > +#
> > +# Bypass mechanisms: flags and commands that skip the diff process.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --diff-algorithm=patience worddiff.c >actual &&
> > +     test_grep "return 999" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not used by --stat' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --stat worddiff.c >actual &&
> > +     test_grep "worddiff.c" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +#
> > +# Error handling and fallback.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool error status' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     # Fallback produces the full builtin diff (both change regions).
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Tool was contacted (it replied with error, not crash).
> > +     test_grep "command=hunks pathname=boundary.c" backend.log &&
> > +     test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > +             diff -- one.c two.c >actual 2>stderr &&
> > +     # Unlike abort, error keeps the tool available: both files
> > +     # are sent to the tool (and both fall back).
> > +     test_grep "pathname=one.c" backend.log &&
> > +     test_grep "pathname=two.c" backend.log &&
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process abort disables for session' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> > +             diff -- one.c two.c >actual &&
> > +     # Both files should still produce diff output via fallback.
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual &&
> > +     # The tool aborts on the first file and git clears its
> > +     # capability.  The second file never contacts the tool.
> > +     test_grep "pathname=one.c" backend.log &&
> > +     test_grep ! "pathname=two.c" backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool crash' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=crash" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Crash is a communication failure, so a warning is emitted.
> > +     test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process startup failure only warns once' '
> > +     git -c diff.cdiff.process="/nonexistent/tool" \
> > +             diff -- one.c two.c >actual 2>stderr &&
> > +     # Both files produce diff output via fallback.
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual &&
> > +     # Sentinel prevents repeated warnings: only one, not one per file.
> > +     test_grep "diff process.*failed" stderr >warnings &&
> > +     test_line_count = 1 warnings
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Invalid hunks are caught by xdiff validation, not the
> > +     # protocol layer, so no warning is emitted.
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> > +     cat >synctest.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     EOF
> > +     git add synctest.c &&
> > +     git commit -m "add synctest.c" &&
> > +
> > +     cat >synctest.c <<-\EOF &&
> > +     line1
> > +     changed
> > +     line3
> > +     EOF
> > +
> > +     # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> > +     # line as changed, leaving 1 unchanged old vs 2 unchanged new.
> > +     # The synchronization invariant fails and git falls back.
> > +     git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> > +             diff synctest.c >actual 2>stderr &&
> > +     test_grep "changed" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> > +     # boundary.c has 10 lines, so both hunks are in bounds
> > +     # but they overlap at lines 3-5, triggering the ordering check.
> > +     git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "NEW5" actual
> > +'
> > +
> > +test_done
> > diff --git a/userdiff.h b/userdiff.h
> > index 51c26e0d41..a98eabe377 100644
> > --- a/userdiff.h
> > +++ b/userdiff.h
> > @@ -3,6 +3,7 @@
> >
> >  #include "notes-cache.h"
> >
> > +struct diff_subprocess;
> >  struct index_state;
> >  struct repository;
> >
> > @@ -33,6 +34,8 @@ struct userdiff_driver {
> >       int textconv_want_cache;
> >       const char *process;
> >       char *process_owned;
> > +     struct diff_subprocess *diff_subprocess;
> > +     unsigned diff_process_failed : 1;
> >  };
> >  enum userdiff_driver_type {
> >       USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> > --
> > gitgitgadget
> >
> >
> >

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> So the conscious project direction has been: fold pkt-line test backends
> into `test-tool` and drop the scripting-language prereq. Reintroducing the
> same shape in Python would walk this back.
> ...
> The `PYTHON` prereq exists in exactly five files today, all `git p4`
> related (where Python is an intrinsic prerequisite given that `git-p4.py`
> _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
> `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
> and `t/t9836-git-p4-metadata-encoding-python3.sh`.
> ...
> That commit is of course not intended to be used as-is; Feel free to pick
> code parts of it and integrate them into your topic branch. Or write your
> own test-tool helper from scratch if that's more your jam.

Showing better direction to new folks with such a clear thinking is
very much appreciated.  Even though it is natural and perfectly OK
for tests that interacts with parts of Git that are written in these
languages (e.g., we are OK for gitweb tests to require Perl), we
should consciously keep ourselves clean and not adding unnecessary
dependencies.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Michael Montalbo <mmontalbo@gmail.com> writes:

> On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>>
>> Hi Michael,
>>
>> I stumbled about this patch when it broke CI in Git for Windows, where we
>> do _not_ use `NO_PYTHON`, even though Python is unavailable in the
>> build/test CI jobs. The existing tests handle this situation gracefully,
>> this here patch does not:
>> ...
>> Given the complexity of what t4080 tries to test (error, abort, crash,
>> bad-sync, no-hunks, multiple files in one session, capability
>> negotiation), it would unfortunately be infeasible to use `test-tool
>> pkt-line` from a shell script implementing that `diff.*.process` protocol.
>>
>> So I've spiked a demo how the `test-tool diff-process-backend` could look
>> like (letting Opus do the menial typing, so that I can enjoy at least part
>> of a sunny Sunday outside), which also passes the CI build and test:
>> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
>>
>> That commit is of course not intended to be used as-is; Feel free to pick
>> code parts of it and integrate them into your topic branch. Or write your
>> own test-tool helper from scratch if that's more your jam.
>>
>
> Johannes, thank you for the great feedback. The historical context is
> really helpful and
> the concerns you raise make a lot of sense. I will take a look at your
> spike and also work
> on removing Python from the test.

Another request.

Please do not force readers to scroll through a ~800 line message
just to read only 5 lines of response from you.  Keep relevant parts
of the message you are responding to in your message to help readers
understand the context in which your response was made, but trim
everything else that is not relevant from your quote.

Thanks.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

Michael Montalbo <mmontalbo@gmail.com> writes:

>> So the conscious project direction has been: fold pkt-line test backends
>> into `test-tool` and drop the scripting-language prereq. Reintroducing the
>> same shape in Python would walk this back.
>>
>> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
>> 2025-04-29) has been another clear sign that the Git project is actively
>> _removing_ scripting-language dependencies from the build and test
>> infrastructure, not adding new ones.
>
> Now I wonder if the extension / addition of more Perl test infra with my other
> series:
>
> https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/
>
> also goes against the project direction w.r.t. removing scripting languages.
> Maybe I should also re-evaluate the usage of Perl there. I am leveraging
> existing shell parsing logic written in Perl, but maybe the preference for
> Perl-based lint rules is a mistake and should be avoided.

That sounds prudent (even though it is outside the scope of _this_
topic, of course).

Thanks.

conversion outputs. See linkgit:gitattributes[5] for details.

`diff.<driver>.process`::
The command to run as a long-running diff process that
provides hunks to Git's diff pipeline.
See linkgit:gitattributes[5] for details.

`diff.indentHeuristic`::
Set this option to `false` to disable the default heuristics
that shift diff hunk boundaries to make patches easier to read.
Expand Down
3 changes: 3 additions & 0 deletions Documentation/diff-algorithm-option.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -18,3 +18,6 @@
For instance, if you configured the `diff.algorithm` variable to a
non-default value and want to use the default one, then you
have to use `--diff-algorithm=default` option.
+
If you explicitly choose a diff algorithm, it also bypasses
`diff.<driver>.process` (see linkgit:gitattributes[5]).
4 changes: 3 additions & 1 deletion Documentation/diff-options.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -833,7 +833,9 @@ endif::git-format-patch[]
to use this option with linkgit:git-log[1] and friends.

`--no-ext-diff`::
Disallow external diff drivers.
Disallow external diff helpers, including
`diff.<driver>.command` and `diff.<driver>.process`
(see linkgit:gitattributes[5]).

`--textconv`::
`--no-textconv`::
Expand Down
253 changes: 253 additions & 0 deletions Documentation/gitattributes.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -784,6 +784,16 @@ with the above configuration, i.e. `j-c-diff`, with 7
parameters, just like `GIT_EXTERNAL_DIFF` program is called.
See linkgit:git[1] for details.

An external diff driver replaces the patch Git would otherwise
produce for the path: Git runs the command and shows its output in
place of its own. Output features that post-process Git's diff do
not apply to it; word diff, function context (`-W`), `--color-moved`,
and coloring all act on Git's builtin diff, not the driver's output.
The driver is consulted only when Git generates a textual patch. The
summary formats (`--stat`, `--numstat`, `--shortstat`, and
`--dirstat`), `git blame`, and `git log -L` do not run it and
continue to use Git's builtin diff.

If the program is able to ignore certain changes (similar to
`git diff --ignore-space-change`), then also set the option
`trustExitCode` to true. It is then expected to return exit code 1 if
Expand Down Expand Up @@ -821,6 +831,249 @@ NOTE: If `diff.<name>.command` is defined for path with the
(see above), and adding `diff.<name>.algorithm` has no effect, as the
algorithm is not passed to the external diff driver.

Using an external diff process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If `diff.<name>.process` is defined, Git sends the old and new file
content to an external tool and receives back a list of changed
regions (pairs of line ranges in the old and new file). Git uses
these instead of its builtin diff algorithm, but still controls
all output formatting, so features like word diff, function context,
color, and blame work normally. This is achieved by using the
long-running process protocol (described in
Documentation/technical/long-running-process-protocol.adoc).
Unlike `diff.<name>.command`, which replaces Git's output entirely,
the diff process feeds results back into the standard pipeline. If
both are configured for a path, `diff.<name>.command` takes precedence
for the output it produces: since it replaces the diff, the process is
not consulted there.

First, in `.gitattributes`, assign the `diff` attribute for paths.

------------------------
*.c diff=cdiff
------------------------

Then, define a "diff.<name>.process" configuration to specify
the diff process command.

----------------------------------------------------------------
[diff "cdiff"]
process = /path/to/diff-process-tool
----------------------------------------------------------------

When Git encounters the first file that needs to be diffed, it starts
the process and performs the handshake. In the handshake, the welcome
message sent by Git is "git-diff-client", only version 1 is supported,
and the supported capability is "hunks" (the changed regions
described below).

For each file, Git sends a list of "key=value" pairs terminated with
a flush packet, followed by the old and new file content as packetized
data, each terminated with a flush packet. The pathname is relative
to the repository root. When `diff.<name>.textconv` is also set,
the tool receives the textconv-transformed content rather than the
raw blob, matching what the consuming feature itself diffs: patch
output is textconv'd, the summary formats (noted below) are not, and
`git blame` applies textconv only under `--textconv`. Git does not
send binary files to the diff process.

-----------------------
packet: git> command=hunks
packet: git> pathname=path/file.c
packet: git> 0000
packet: git> OLD_CONTENT
packet: git> 0000
packet: git> NEW_CONTENT
packet: git> 0000
-----------------------

The tool is expected to respond with zero or more hunk lines,
a flush packet, and a status packet terminated with a flush packet.
Each hunk line has the form:

`hunk <old_start> <old_count> <new_start> <new_count>`

where `<old_start>` and `<old_count>` identify a range of lines in
the old file, and `<new_start>` and `<new_count>` identify the
replacement range in the new file. Start values are 1-based and
counts are non-negative. Ranges must not extend beyond the end of
the file. For example, `hunk 3 2 3 4` means that 2 lines starting
at line 3 in the old file were replaced by 4 lines starting at
line 3 in the new file. An `<old_count>` of 0 means no lines were
removed (pure insertion); a `<new_count>` of 0 means no lines were
added (pure deletion). A start value of 0 is accepted when
the corresponding count is 0 (e.g., `hunk 0 0 1 5` for a newly
added file), matching what `git diff` itself emits for empty
file sides. Git ignores any extra whitespace-separated tokens after
`<new_count>`, so a future protocol version can append fields to a
hunk line (for example a "moved" marker) without older clients
rejecting it.

Lines are delimited by newlines. A file `"foo\nbar\n"` and a
file `"foo\nbar"` both have 2 lines.

Hunks must be listed in order and must not overlap. Any line not
covered by a hunk is treated as unchanged and is paired, in order,
with the unchanged lines on the other side. Each run of unchanged
lines between two hunks (and the run before the first hunk and
after the last) must therefore be the same length on both sides,
not merely equal in total. For the hunks `1 3 1 5` and `10 2 12 2`
below, lines 4-9 of the old file and lines 6-11 of the new file are
both the six unchanged lines between the two hunks. A response that
balances only the total unchanged count but misaligns one of these
runs is rejected, and Git falls back to the builtin diff.

Git does not check that the lines a hunk leaves unchanged are
byte-for-byte identical between the two sides; it pairs them by
position and shows the new side as context. A tool may therefore
report lines that differ textually (a pure reformatting, say) as
unchanged, and the diff reflects that judgment. This is
the point of a semantic backend, but it means a misbehaving tool can
produce a diff whose context does not match the old blob; as with
`git diff -w`, such a patch may not apply against the old content.

-----------------------
packet: git< hunk 1 3 1 5
packet: git< hunk 10 2 12 2
packet: git< 0000
packet: git< status=success
packet: git< 0000
-----------------------

If the tool responds with hunks and "success", Git marks those lines
as changed and feeds them into the standard diff pipeline. Git may
still slide or regroup those changes against matching context for
display, exactly as it compacts its own diffs, so the tool controls
which lines are reported as changed, not the precise hunk boundaries.
Patch output features (word diff, function context, color) work
normally, as do summary formats like `--stat`. Not every feature
consults the process, though; see "Which features consult the diff
process" below for the full picture and the reasoning behind it.

If no hunk lines precede the flush, followed by "success", Git
treats the files as having no changes: `git diff` produces no output,
`git diff --exit-code` and `--quiet` report success even though the
stored blobs differ, and `git blame` skips the commit, attributing
lines to earlier commits.
The one exception is a change that only adds or removes the file's
trailing newline: it cannot be expressed as line hunks, so when the
line content otherwise matches Git keeps the builtin diff for that
file (preserving the `\ No newline at end of file` marker) instead of
treating the two sides as equal.

-----------------------
packet: git< 0000
packet: git< status=success
packet: git< 0000
-----------------------

If the tool returns invalid hunks (out of bounds, overlapping, or
mismatched unchanged line counts), Git warns and falls back to the
builtin diff algorithm.

In case the tool cannot or does not want to process the content,
it is expected to respond with an "error" status. Git warns and
falls back to the builtin diff algorithm for this file. The tool
remains available for subsequent files.

-----------------------
packet: git< 0000
packet: git< status=error
packet: git< 0000
-----------------------

In case the tool cannot or does not want to process the content as
well as any future content for the lifetime of the Git process, it
is expected to respond with an "abort" status. Git silently falls
back to the builtin diff algorithm for this file and does not send
further requests to the tool.

-----------------------
packet: git< 0000
packet: git< status=abort
packet: git< 0000
-----------------------

If the tool dies during the communication or does not adhere to the
protocol then Git will stop the process and fall back to the builtin
diff algorithm. Git warns once and does not restart the process for
subsequent files.

Tools should ignore unknown keys in the per-file request to remain
forward-compatible. Future versions of Git may send additional
`command=` values; tools that receive an unrecognized command should
respond with `status=error` rather than terminating.

Which features consult the diff process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The diff process answers a single question: given two blobs, which
line ranges differ? Whether a particular feature consults it follows
from whether that is the question the feature is really asking.

Features that ask "which lines changed" use the tool's hunks in place
of the builtin algorithm:

- `git diff` patch output, together with everything layered on it:
word diff, function context (`-W`), `--color-moved`, and the `@@`
hunk headers. These operate on the lines the patch step already
emitted, so they reflect the tool's hunks without any further
negotiation.
- `git blame`: a commit whose change the tool reports as equivalent is
skipped, and its lines are attributed to an earlier commit.
- `git log -L`: both the line-range display and the underlying range
tracking consult the tool, so a commit it reports as equivalent is
dropped from the log (its tracked range maps across unchanged)
rather than selected and then shown with an empty diff.
- `--stat`, `--numstat`, and `--shortstat`: the inserted and deleted
counts come from the tool's hunks, so a file the tool calls
equivalent contributes no stat line, matching the empty patch that
`git diff` produces for it. These summary formats do not apply
textconv (just as the builtin summary path does not), so the tool
is consulted on the raw blob content even when a `textconv` is also
configured for patch output; this mirrors how builtin `--stat`
already counts raw lines rather than the textconv'd view. The
line-counting `--dirstat=lines` uses these same counts; the default
`--dirstat`, which weighs byte changes, is computed on its own and
does not consult the tool.

Features that ask a different question do not consult the process, by
design:

- The pickaxe `-G<regex>` searches the textual diff for a pattern; it
asks "does this string appear in the diff," not "did these lines
change." (`-S` runs at an earlier stage and is likewise unaffected.)
- `git patch-id` must produce a stable hash for `git rebase` and
cherry-pick detection; deriving it from a configured tool would make
equal patches hash differently from machine to machine.
- The merge machinery (`git merge-tree`, `rerere`) computes merge
content and conflict signatures rather than display output, so the
tool's hunks must not alter its results.
- `git range-diff` diffs patch text, not source blobs, so source-file
hunks do not apply to it.
- `--check` reports whitespace errors in added lines using the builtin
diff's notion of which lines are added, not the tool's. It can
therefore flag (and exit non-zero on) a line the tool treats as
unchanged and that `git diff` shows as context. Whitespace breakage
is a property of the literal bytes, so `--check` keeps the builtin
partition deliberately; a future change could wire it to the tool if
matching `git diff` exactly became desirable.
- `--raw`, `--name-only`, and `--name-status` compare object ids at
the tree level and never run a line-level diff at all.

Combined diffs (`--cc` and merge diffs) ask "which lines changed" but
still use the builtin algorithm, and may consult the process in a
later change; their protocol would have to be extended from a single
old/new pair to one comparison per merge parent.

`--no-ext-diff` and `--diff-algorithm` bypass the process entirely,
for every feature listed above. The whitespace-ignoring options
(`-w`, `--ignore-space-change`, `--ignore-blank-lines`, and the like),
`-I<regex>`, and `--anchored` also bypass it for the affected files:
the tool is never told about these options, so it could not honor
them, and Git falls back to the builtin diff, which does.

Defining a custom hunk-header
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down
2 changes: 2 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -811,6 +811,7 @@ TEST_BUILTINS_OBJS += test-csprng.o
TEST_BUILTINS_OBJS += test-date.o
TEST_BUILTINS_OBJS += test-delete-gpgsig.o
TEST_BUILTINS_OBJS += test-delta.o
TEST_BUILTINS_OBJS += test-diff-process-backend.o
TEST_BUILTINS_OBJS += test-dir-iterator.o
TEST_BUILTINS_OBJS += test-drop-caches.o
TEST_BUILTINS_OBJS += test-dump-cache-tree.o
Expand Down Expand Up @@ -1140,6 +1141,7 @@ LIB_OBJS += diff-delta.o
LIB_OBJS += diff-merges.o
LIB_OBJS += diff-lib.o
LIB_OBJS += diff-no-index.o
LIB_OBJS += diff-process.o
LIB_OBJS += diff.o
LIB_OBJS += diffcore-break.o
LIB_OBJS += diffcore-delta.o
Expand Down
46 changes: 37 additions & 9 deletions blame.c
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
#include "tag.h"
#include "trace2.h"
#include "blame.h"
#include "diff-process.h"
#include "xdiff-interface.h"
#include "alloc.h"
#include "commit-slab.h"
#include "bloom.h"
Expand Down Expand Up @@ -314,17 +316,25 @@ static struct commit *fake_working_tree_commit(struct repository *r,



static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
static int diff_hunks_xpp(mmfile_t *file_a, mmfile_t *file_b,
xdl_emit_hunk_consume_func_t hunk_func,
void *cb_data, xpparam_t *xpp)
{
xpparam_t xpp = {0};
xdemitconf_t xecfg = {0};
xdemitcb_t ecb = {NULL};

xpp.flags = xdl_opts;
xecfg.hunk_func = hunk_func;
ecb.priv = cb_data;
return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
return xdi_diff(file_a, file_b, xpp, &xecfg, &ecb);
}

static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
{
xpparam_t xpp = {0};

xpp.flags = xdl_opts;
return diff_hunks_xpp(file_a, file_b, hunk_func, cb_data, &xpp);
}

static const char *get_next_line(const char *start, const char *end)
Expand Down Expand Up @@ -1946,6 +1956,7 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
struct blame_origin *parent, int ignore_diffs)
{
mmfile_t file_p, file_o;
xpparam_t xpp = {0};
struct blame_chunk_cb_data d;
struct blame_entry *newdest = NULL;

Expand All @@ -1964,10 +1975,27 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
&sb->num_read_blob, ignore_diffs);
sb->num_get_patch++;

if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts))
die("unable to generate diff (%s -> %s)",
oid_to_hex(&parent->commit->object.oid),
oid_to_hex(&target->commit->object.oid));
xpp.flags = sb->xdl_opts;
/*
* Whitespace-ignoring options are not communicated to the diff
* process, so it could not honor them; consult it only when none
* are set, and otherwise use the builtin diff (which does honor
* them via xpp.flags), matching the bypass in
* diff_process_fill_hunks() for "git diff". If the process
* considers the files equivalent, skip the diff so blame looks
* past this commit.
*/
if ((xpp.flags & (XDF_WHITESPACE_FLAGS | XDF_IGNORE_BLANK_LINES)) ||
diff_process_fill_hunks(&sb->revs->diffopt, target->path,
&file_p, &file_o, &xpp)
!= DIFF_PROCESS_EQUIVALENT) {
if (diff_hunks_xpp(&file_p, &file_o, blame_chunk_cb,
&d, &xpp))
die("unable to generate diff (%s -> %s)",
oid_to_hex(&parent->commit->object.oid),
oid_to_hex(&target->commit->object.oid));
}
free(xpp.external_hunks);
/* The rest are the same as the parent */
blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
parent, target, 0);
Expand Down
Loading
Loading