CI: Check html links for doc#4178
Conversation
|
LGTM. htmltest in a container works well and keeps doc builds free of new dev dependencies. One note for later: we already have an in-tree link checker (docs/src/checkref) built on w3c-linkchecker, but it has been a silent no-op for a while since recent w3c-linkchecker refuses file:// URIs, so it validates nothing. This CI check is a real improvement over that. If we ever want something more actively maintained than htmltest, lychee (Rust, async) is excellent. I'd keep it CI-only via Docker though: there is no Debian apt package for it, so it is not a good fit for the local make checkref target where we rely on apt-installable deps. |
|
Hmm, it doesn't really make sense to have two ways of checking links. However, after some investigation:
Should be easy to integrate it CI, let's see... |
|
Please no, its soooooo slow 😿 |
:-D Ok, but thanks for the tip with lychee, it has even a github action. Let's see what the CI has to say to my change. |
|
Will have to see what @BsAtHome thinks, I'd rather have it quick, but this is an out of tree dep, probably will get contested... |
|
It seems to be an integral part of CI, so there should be less of a problem. The real issue would be how you can do this locally. It is one thing to have CI check, but when working, then you want this to be functional from your dev machine. Lychee is also a photo management system. Confusion is preprogrammed, it seems. FWIW, Fedora doesn't even have the w3c checker. I had to create a local empty script to get through the doc build. Adding more dependencies is, well, not nice. Especially not nice when they are not in any repo. And very "unnice" when requiring internet connections. So, lets just say, I'm not convinced. |
|
Ah now I see the issue, checklink is indeed active in the CI but only for english. Of course, all other languages have broken links. So this could also be corrected by checking all other languages as well instead of running a recursive link checker in CI afterwards. At the moment, checklink is run with each html file as an argument. Doing it recursively with |
I disabled external link checks in CI, this takes ages. But you should run it from time to time locally, there are hundreds of broken external links... Of course this also needs a way to run locally, it's not nice to have to use CI to check your issues.
For w3c-linkchecker: Just For debian, there is a package, but it is broken and throws a million
Me neither, still WIP. I was not aware that there is already a check in which is broken. The question is: Downside: Both new ones have no debian package. You would need to use docker, download a binary or build from source. Not so nice. However, the two I tried have quite a nice output. htmltest: |
|
Just including the w3c link checker is not enough. You then need to add all the Perl dependencies, at least in the dev-build dependencies. Why do lychee and htmltest not report the same problems? That is a bit fishy. Doing docker runs is also a problem IMO. They fall into the same category as curl runs. At least lychee has a CI provided target, which is a better shield than a download-and-run scenario. |
They all report different kinds of errors. It has to be investigated which tool reports the most useful errors and some can also be turned on/off.
Docker this way shroud be fine as i understand it. If the tool is compromised, it still can not break out of docker and compromise the CI. The github actions are from a marketplace: https://github.com/marketplace/actions/lychee-broken-link-checker I don't know if / how well there are checked, as bad as curl from a github link to a random repo i assume. By using the git hash instead of the version, you can avoid compromised versions coming in, but if the actual version is already bad, bad luck. I would have to switch to the hash before merging. Only the ones starting with actions/ are directly from github and are save as long as github itself is not compromised. So and the variant with w3c in CI: So the options so fare are:
|
Chasing false positive and missing true problems will just take a lot of time. There needs to be real resilience in the checker for it to be worth the effort.
So far, I'm not liking any way or option of getting code from "other" places. The "marketplace" is another work for "jungle law rules". Not a system I trust or would want to have bindings to. As for the time it takes to check, yes, the old w3c checker is slow. Others are faster, but I have some strong reservations to allow integration of any. Maybe we shouldn't integrate it at all and leave it to a side project? OTOH, making sure internal links are fine is an issue. Would it be possible to do internal consistency checking already at the adoc level? |
|
Following up on my own comment, with a correction. I built a source-level checker for the adoc-level idea and compared it against the lychee run in this PR's CI. The comparison is worth reporting honestly. What does work at the adoc level: the Where I was wrong: I first reported 9 broken manpage links in What the source check cannot do: lychee found 29 real issues that a source-level English check misses entirely. They are translated pages (de, es, fr, nb, ru, uk, zh_CN) linking to So I am walking back my earlier framing. A source-level Updated scripts, with the false-positive fix: https://gist.github.com/grandixximo/2840d3ceca5d344e89f8b374c0e91189 |
But aren't those links generated in the processing? And, why are the |
Same with cppckeck and shellcheck. It needs fixing the real issues first and then identifying false positives and eliminate them by modifying the config. There are ways to do that but it takes time.
Agreed, a good reason to stick to the w3d checker which is available for debian based systems. For others, you would have to install it possibly manually if you want to do it locally. This should not be to complicated due to it is a single perl script, you just need the dependency's which probably are also available in other distros.
Yes, sounds like a bug, you might miss it with an adoc check, so a final check on html level would make sense anyway. Based on the discussion so far, I would go the following way for the HTML check: Suggestion: The way it is checked now seams to be a bit over complicated. Every file is checked separately. By using I would do it similar than cppckeck.sh: scripts/htmlcheck.sh that is run in a separate CI step and works also locally assuming w3c is installed. If you are OK with this, I can pivot this PR to that direction. Alternative: Line 703 in 3917a72 This looks like a bug, I see the target for other languages but it is not run for an unknown reason: Line 714 in 3917a72 Allow to build doc's when w3c is not available and only show a warning at the end instead of not allowing to build: Line 1219 in 3917a72 |
|
@BsAtHome @hdiethelm a few points, with data from a tree-wide check I ran (the source adoc plus the po4a-translated adoc trees, cross-checked against the rendered HTML). On detectability: the dead So I agree with @hdiethelm: keep an HTML check as the backstop for the post-render and build-logic cases an adoc check structurally misses, and a separate The finding that matters for the design: after #4192 the English docs are internally clean, but the translations hold 81 real broken internal references, and almost all are dead For completeness, a dependency-free source check over the same adoc trees catches ~80 of the 81 at ~99% precision (validated against the rendered HTML). I see it as the fast local complement that points at the source line, not a replacement for the HTML backstop. |
|
@hdiethelm on the "not run for an unknown reason" part: the per-language rules are fine, the trigger is the gap. The One-line fix: make |
|
So, next commit should do what my suggestion is. It is way less complex than checking every file. I removed also checklinks.py. This was an other link check it seams but it it was removed from the Submakefile some time ago. I did not find the commit removing it but it was still in here: fb9c354 |
f4862dc to
b3ae161
Compare
Thanks. With my last commit, this is now just gone, no need to fix it. Let's see for the feedback. It is probably more reliable this way due to it checks all links. I could also integrate external link checks. These are not checked due to I could add an argument for htmlcheck.sh to enable it. |
|
I don't think the linuxcnc server will run any of the checks, the docs are built on GitHub CI and the server just serves them, I think the CI should be a good place to do external link checking since it is already our connectivity bottleneck... |
|
I added -e, so everybody can check external links. This external link check: for example blocks quite a while. |
|
Yeah, the forum is down a lot, at least for that we shouldn't check too many times if at all, might have to be a manual exception 😔 |
Sometimes, with are just two of many external links taking forever. That's why I think external link checking is something you should do from time to time manually but not in CI. Still running after 20min, let's see how long it takes... Hmm, all translations are based on English right? So to not check every external link in all languages, we could just run it with external on en/index.html, right? Probably this still takes some time but not as bad as for all languages. |
|
Automatic external link checks should never be done in CI. There are no guarantees it will finish within a reasonable time and there are no guarantees that it will report real problems. It would also fail (a lot) on transient errors and that is inherently bad in CI. |
|
Agreed. Runtime is long and it might give transient errors if a page is down for some time. Both is bad. I was wrong. Searching trough the huge content of the verbose output, I just found one: afterwards: So in recursive mode, each link is checked only once. However, it is reported for each file where it is wrong, this confused me a bit. Total runtime of three tests is averaged: ~20min |
|
So finally, I have a concept that looks good:
Are you OK with it? If yes, I squash it into a single commit and call it ready. |
|
I went already forward, rebased to master and squashed. Ready to merge from my side. Of course only if you have no reservations. The old commits are still available in my repo. Tested: Mostly no change in docs/build before / after this PR. There are of course all the dates changed. And there is a new issue. It looks like there is a race condition somewhere. I can not explain otherwise how the man page of I think this issue is not due changes in this PR. Master build 1 vs PR build diff -r build_master/man/de/man3/hm2_uart_read.3 build_pr/man/de/man3/hm2_uart_read.3
2c2
< .\" Title: hm2_uart_read
---
> .\" Title: hm2_pktuart_read
38c38,39
< int hm2_uart_read(char *name, unsigned char *data);
---
> int hm2_pktuart_read(char* name, unsigned char data[], rtapi_u8* num_frames,
> rtapi_u16* max_frame_length, rtapi_u16 frame_sizes[]);
44c45
< \fBhm2_uart_read\fP liest Daten vom UART "name". "name" ist eine eindeutige Zeichenkette, die jedem UART während des hostmot2\-Setups zugewiesen wird. Die Namen der verfügbaren Kanäle werden während des Ladevorgangs des Treibers auf die Standardausgabe ausgegeben und haben die Form: hm2\fI_<Kartenname>\fP.\fI<Kartenindex>\fP.uart.\fI<index>\fP Zum Beispiel \f(CRhm2_5i23.0.uart.0\fP.
---
> \fBhm2_pktuart_read\fP liest Daten vom PktUART "name".
46c47
< Diese Funktion liest eine variable Anzahl von Bytes aus dem angegebenen Kanal. Sie sollte innerhalb einer Echtzeit\-HAL\-Komponente verwendet werden, die mit dem Haupt\-hostmot2\-Treiber unter Verwendung der Funktion \fBhm2_uart_set_read_function\fP im Setup\-Code registriert ist.
---
> VERALTET, außer im Setup\-Code.
48,56c49
< Beachten Sie, dass der UART\-Empfangs\-FIFO nur 16 Byte tief ist (der Sende\-FIFO ist 64 Byte tief) und dass die "Daten" mindestens so groß sein müssen, da sonst ein undefiniertes Chaos entsteht.
< .SH "RETURN VALUE"
< .sp
< Gibt im Erfolgsfall die Anzahl der gelesenen Bytes zurück, im Fehlerfall \-1.
< .SH "SIEHE AUCH"
< .sp
< hm2_uart_setup(3), hm2_uart_send(3)
< .sp
< Siehe src/hal/drivers mesa_uart.comp für ein Anwendungsbeispiel.
\ Kein Zeilenumbruch am Dateiende.
---
> Bitte beachten Sie das kombinierte Dokument hm2_pktuart.3
\ Kein Zeilenumbruch am Dateiende.Master build 1 vs build 2 diff -r build_master/man/de/man3/hm2_bspi_set_write_function.3 build/man/de/man3/hm2_bspi_set_write_function.3
2c2
< .\" Title: hm2_bspi_set_write_function
---
> .\" Title: hm2_bspi_set_read_function
38c38
< int hm2_bspi_set_write_function(char *name, void *func, void *subdata)
---
> int hm2_bspi_set_read_function(char *name, void *func, void *subdata)
44c44
< \fBhm2_bspi_set_write_function\fP registriert eine Funktion in einem externen Treiber, die jedes Mal aufgerufen wird, wenn der Hostmot2\-Haupttreiber die generischen "prepare_tram_write"\-Funktionen aufruft. Die Namen der verfügbaren Kanäle werden mit rtapi_print_msg während des Ladevorgangs des Treibers ausgegeben und haben die Form:
---
> \fBhm2_bspi_set_read_function\fP registriert eine Funktion in einem externen Treiber, die jedes Mal aufgerufen wird, wenn der Hostmot2\-Haupttreiber die generische Funktion "process_tram_read" aufruft. Die Namen der verfügbaren Kanäle werden mit rtapi_print_msg während des Ladevorgangs des Treibers ausgegeben und haben die Form:
46c46,52
< \fBhm2_\fP\fI<board name>\fP\fB.\fP\fI<board index>\fP\fB.bspi\fP.\fI<index>\fP
---
> .if n .RS 4
> .nf
> .fam C
> hm2_<board name>.<board index>.bspi.<index>
> .fam
> .fi
> .if n .RE
50,52c56
< "func" sollte ein Zeiger auf eine Funktion im Subtreiber sein, die aufgerufen werden soll, um die Pins vor der regulären TRAM\-Schreibphase in BSPI\-Schreibregistern zu verarbeiten. Die Funktion muss ein einziges Argument haben, einen Zeiger auf eine individuelle Instanz des internen Treibers. Wenn die Funktion in comp definiert ist, darf sie \fBnicht\fP das Komfortmakro FUNCTION() verwenden, und das Argument für die Funktion in der Definition muss \fBimmer\fP (struct state *inst) sein.
< .sp
< "subdata" ist ein Zeiger auf die internen Daten der Treiberinstanz. Im Falle eines in comp geschriebenen Treibers wird dies im Funktionsaufruf immer "inst" sein.
---
> "func" sollte ein Zeiger auf eine Funktion im Subtreiber sein, die aufgerufen werden soll, um die Ergebnisse der BSPI\-TRAM\-Lesephase zu verarbeiten. Die Funktion muss ein einziges Argument haben, einen Zeiger auf eine einzelne Instanz des internen Treibers. Wenn sie in comp definiert ist, darf die Funktion das Komfortmakro FUNCTION() \fBnicht\fP verwenden, und das Argument für die Funktion in der Definition muss \fBimmer\fP (struct state *inst) sein.
54c58
< Wenn Sie comp verwenden, sollte der Aufruf dieser Funktion irgendwo im EXTRA_SETUP\-Code erfolgen.
---
> "subdata" ist ein Zeiger auf die internen Daten der Treiberinstanz. Im Falle eines in comp geschriebenen Treibers wird dies immer "inst" im Funktionsaufruf sein, und der Aufruf sollte irgendwo im EXTRA_SETUP\-Code erfolgen.
63c67
< hm2_allocate_bspi_tram(3), hm2_bspi_set_read_function(3), hm2_bspi_setup_chan(3), hm2_bspi_write_chan(3), hm2_tram_add_bspi_frame(3), src/hal/drivers mesa_7i65.comp in der LinuxCNC Quellcode distribution.
\ Kein Zeilenumbruch am Dateiende.
---
> hm2_allocate_bspi_tram(3), hm2_bspi_setup_chan(3), hm2_bspi_set_write_function(3), hm2_bspi_write_chan(3), hm2_tram_add_bspi_frame(3), src/hal/drivers mesa_7i65.comp in der LinuxCNC Quellcode Distribution.
\ Kein Zeilenumbruch am Dateiende.
diff -r build_master/man/de/man3/rtapi_stdint.3 build/man/de/man3/rtapi_stdint.3
2c2
< .\" Title: rtapi_string
---
> .\" Title: rtapi_stdint
37,40c37,51
< #include <rtapi_string.h>
< char** rtapi_argv_split(rtapi_gfp_t g, const char* argstr, int* argc);
< void rtapi_argv_free(char** argv);
< char* rtapi_kstrdup(const char* s, rtapi_gfp_t t);
---
> #include <rtapi_stdint.h>
>
> typedef ... rtapi_s8;
> typedef ... rtapi_s16;
> typedef ... rtapi_s32;
> typedef ... rtapi_s64;
> typedef ... rtapi_intptr_t;
> typedef ... rtapi_u8;
> typedef ... rtapi_u16;
> typedef ... rtapi_u32;
> typedef ... rtapi_u64;
> typedef ... rtapi_uintptr_t;
> #define RTAPI_INT__xx___MIN ...
> #define RTAPI_INT__xx___MAX ...
> #define RTAPI_UINT__xx___MAX ...
46c57
< Im Kernelbereich wird jeder rtapi_xxx oder RTAPI\fI_XXX\fP Bezeichner der zugrunde liegenden Kernelfunktionalität zugeordnet, sofern verfügbar.
---
> Im Kernelbereich wird jeder rtapi\fI_xxx\fP oder RTAPI\fI_XXX\fP Bezeichner der zugrunde liegenden Kernelfunktionalität zugeordnet, sofern verfügbar.
51c62
< Aufruf nur aus dem Init/Cleanup\-Code, nicht aus Echtzeit\-Tasks. Diese Funktion ist in Nicht\-Echtzeit\-Code nicht verfügbar.
---
> Keinen. |
|
@hdiethelm you are right that this is not from your PR; it reproduces on master. Root cause: asciidoctor's manpage backend names its output file from the page's So Fix is up as #4193, verified on a full translated build. It is independent of #4175, which addresses the same defect on the |
|
@grandixximo Thanks for investigating and correcting. But this translation is also wrong. The function name should stay across all languages. |
|
Agreed the translation has to be fixed, but if a tool is renamed, or a translator messes up a tool name, the build should warn and not break IMO |
|
@grandixximo |
320f74d to
fade80c
Compare
|
Went shortly back to draft to add a CI toy. There is now a -w flag that generates a CI annotation (CI only) but returns 0. |
scripts/htmlcheck.sh does the link checking. In CI, this is now a separate step. Remove link check from makefile and all delete now unused scripts. docs/src/checklinks.py was not used since some time, removed.


Check HTML links in docs in CI and remove halve-broken link checking from makefile.
There are a few broken links, so continue-on-error is set to true for now.
The concept is the following:
scripts/htmlcheck.shchecksdocs/build/html/index.htmlrecursively with w3c-linkcheckerscripts/htmlcheck.sh -eAfter a few discussions, we settled to use w3c-linkchecker. Variants tested: