Optimize CI for wolfProvider#400
Open
aidangarske wants to merge 38 commits into
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request on May 25, 2026
…ew fix)
Was: every workflow pulled ghcr.io/wolfssl/wolfprovider-test-deps:bookworm, which doesn't exist until upstream master runs the publish workflow. Bootstrap chicken-and-egg.
Now: publish-test-deps-image.yml fires on any branch push (and on PRs) and pushes to ghcr.io/<repo-owner>/wolfprovider-test-deps:bookworm. Consumer workflows pull from the PR head's owner when running on a PR, and from the running repo's owner otherwise. Result: a fork PR publishes to the fork's ghcr namespace and pulls from it; master pushes publish to the org's ghcr namespace and pull from it.
Also fixes Copilot review feedback from wolfSSL#400 (review):
- Phase B log filename renames broke the hardcoded log paths in check-workflow-result.sh (curl-test.log, openvpn-test.log, sssd-test.log, net-snmp-test.log, nginx-test.log, openssh-test.log, tcpdump-test.log, liboauth2-test.log, stunnel-test.log) plus in-step greps in cjose, libcryptsetup, libfido2, libhashkit2, libtss2, opensc, python3-ntp, qt5network5, tnftp, tpm2-tools. Reverted log names back to <app>-test.log; the second mode overwrites the first.
- libtss2.yml: fix `if $(grep -q ...)` (invalid shell -- command substitution of grep used as the if condition expanded to an empty command). Use `if grep -q ...; then`.
- opensc.yml: fix `TEST_RESULT=$(((grep ...) && echo 0 || echo 1))` (arithmetic expansion `(( ))` can't contain shell commands). Hoist to a check_opensc_log() function called from both modes.
- stunnel.yml: `grep -c "failed: 0"` outputs 1 on success, but check-workflow-result.sh expects TEST_RESULT==0 for pass. Use `if grep -q ...; then TEST_RESULT=0; else TEST_RESULT=1; fi`. Also mirror tests/logs/results.log to stunnel-test.log so the force-fail check finds the expected file.
- hostap.yml: drop continue-on-error from the normal-mode test step. With it, the step's exit code was swallowed and normal-mode test failures didn't fail the job.
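A minimal sketch of the corrected result-check shape, assuming bash: grep's own exit status drives `if`, and TEST_RESULT follows the 0 = pass convention that check-workflow-result.sh expects. The helper names and the opensc pass pattern (`OPENSC_PASS_RE`, defaulting to `PASS`) are hypothetical placeholders, not the strings the workflows actually match.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the opensc pattern are hypothetical.
# TEST_RESULT uses check-workflow-result.sh's convention: 0 = pass.

# Generic form: test grep's exit status directly instead of wrapping it
# in $(...), $((...)) or counting matches with grep -c.
set_result_from_log() {
  local log="$1" pass_pattern="$2"
  if grep -qs -- "$pass_pattern" "$log"; then
    TEST_RESULT=0
  else
    TEST_RESULT=1   # pattern absent or log missing
  fi
}

# opensc: one function shared by both test modes.
check_opensc_log() {
  set_result_from_log "$1" "${OPENSC_PASS_RE:-PASS}"
}

# stunnel: "failed: 0" in the results log means success; mirror the log
# to stunnel-test.log so the force-fail check finds the expected file.
check_stunnel() {
  local results="$1" out_dir="$2"
  set_result_from_log "$results" "failed: 0"
  cp "$results" "$out_dir/stunnel-test.log"
}
```

A missing log yields TEST_RESULT=1 because grep returns 2, so an infra failure that never produced a log cannot read as a pass.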
One-time setup: after this lands, the owner of each fork that opens a PR has to make their ghcr.io/<owner>/wolfprovider-test-deps package public (GitHub UI: Packages -> Package settings -> Change visibility). GitHub's Actions runners can only pull public packages from another namespace.
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request on May 25, 2026
…vate)
Earlier commits tried to make fork CI work by:
- having publish-test-deps-image.yml push to a per-owner ghcr namespace
(ghcr.io/<owner>/wolfprovider-test-deps)
- having consumer workflows pull from the PR head's owner
- auto-PATCHing the test-deps package to visibility=public
- dropping the `github.repository == 'wolfSSL/wolfProvider'` guard on
the wolfprov-debs ORAS pull in build-wolfprovider.yml
That path only works if the packages can be public, which they can't
(some of the .debs contain commercially-licensed bits). Revert to the
canonical-only behavior:
publish-test-deps-image.yml
- fires only on push to master/main (was '**')
- guards the publish on github.repository == 'wolfSSL/wolfProvider'
- drops the per-owner namespace; always pushes to
ghcr.io/wolfssl/wolfprovider-test-deps
- removes the Mark-package-public step
build-wolfprovider.yml
- restores the github.repository == 'wolfSSL/wolfProvider' guard on
the Login, Download .debs, and Download WIC steps
39 consumer workflows
- container.image reverted from the per-owner expression back to the
literal ghcr.io/wolfssl/wolfprovider-test-deps:bookworm
Practical effect: PR CI and nightly only run on the canonical repo
(or once PR wolfSSL#400 merges, on wolfSSL/wolfProvider's runners). Fork
pushes will skip the wolfprov-deb pull and any container-using job
will fail loudly at the image pull, which is the right signal: those
runs need to happen on the canonical repo.
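The guard above lives in the workflow's `if:` expressions. As a sketch of the same decision in step-script form (an illustrative alternative, not what the workflow does), assuming the standard GitHub Actions environment variables GITHUB_REPOSITORY, GITHUB_EVENT_NAME and GITHUB_REF; the function name `should_publish` is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only: step-script equivalent of the canonical-only publish rule.
# Returns 0 (publish) only for a push to master/main on wolfSSL/wolfProvider.
should_publish() {
  local repo="$1" event="$2" ref="$3"
  [ "$repo" = "wolfSSL/wolfProvider" ] || return 1   # forks never publish
  [ "$event" = "push" ] || return 1
  case "$ref" in
    refs/heads/master | refs/heads/main) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage in a step: `should_publish "$GITHUB_REPOSITORY" "$GITHUB_EVENT_NAME" "$GITHUB_REF" || exit 0`. The workflow-level `if:` is still preferable, since it marks the job as skipped instead of reporting a green run that did nothing.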
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request on May 25, 2026
…idation)
Add a pull_request trigger to nightly-osp.yml so PR wolfSSL#400's reviewers can see the dispatcher actually fan out all 41 reusable workflows and the notify job reach Slack. Marked temporary in the file header -- revert this trigger before merging if you don't want the full nightly job set firing on every PR. (For everyday CI, schedule + workflow_dispatch is the intended shape.)
Note: PR runs from forks will still hit the private-package issue for the wolfprov-debs pull (the wolfSSL/wolfProvider repo guard short-circuits the ORAS step on non-canonical repos). The plumbing itself (dispatch, discover-versions, notify, Slack) runs regardless, and that is what this PR trigger lets you verify end to end.
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request on May 25, 2026
Adds aidangarske/wolfProvider to the publish workflow's repository allowlist so PR wolfSSL#400's working branch can bootstrap a test-deps image in the fork's ghcr namespace. The pushed image lands at ghcr.io/aidangarske/wolfprovider-test-deps:bookworm. Also adds 'ci-draft-pause' to the branches list (alongside master/main) so a push to that branch triggers the workflow without needing a separate workflow_dispatch.
Consumer workflows continue to pull from ghcr.io/wolfssl/..., so this fork-side push is purely for the fork owner to verify the build/push pipeline end to end before the PR merges. After merge, the canonical wolfSSL/wolfProvider master push will publish the authoritative image and consumers will find it.
Note: the 'ci-draft-pause' branch entry is TEMPORARY for PR wolfSSL#400. Drop it (and remove aidangarske from the allowlist if desired) once the PR merges.
dgarske pushed a commit that referenced this pull request on May 26, 2026
) Bootstrap PR: introduces the test-deps container image that PR #400's nightly OSP workflows consume. This is a minimal subset of PR #400 intended to merge first, so the publish workflow fires once on master and the test-deps image lands at ghcr.io/wolfssl/wolfprovider-test-deps:bookworm before the rest of PR #400 merges. Without this, PR #400's OSP container jobs all fail with "manifest unknown" because the image they pull doesn't exist anywhere yet.
Two files only:
- docker/wolfprovider-test-deps/Dockerfile: a single Debian-bookworm image with every apt dep that the OSP integration tests used to install at job time. One apt-get update at build time, zero at job time, which eliminates Debian mirror flakes.
- .github/workflows/publish-test-deps-image.yml: builds the Dockerfile and pushes to ghcr.io/wolfssl/wolfprovider-test-deps:bookworm on push to master/main (path-filtered to docker/wolfprovider-test-deps/**) or on workflow_dispatch. Guarded with github.repository == 'wolfSSL/wolfProvider' so forks don't try to push to wolfSSL's namespace.
The OSP workflows themselves, the discover-versions resolver, the ASan/UBSan workflow, and all the matrix/force-fail consolidation land via PR #400 once this is in place.
dgarske added a commit that referenced this pull request on May 26, 2026
ci: bootstrap test-deps Docker image (prep for PR #400)
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request on May 26, 2026
PR wolfSSL#402 published ghcr.io/wolfssl/wolfprovider-test-deps:bookworm. This empty commit bumps the head SHA so PR wolfSSL#400's checks rerun against the now-existing image.
5ce6df6 to 91f2549
82d537b to e5226fb
Adds an optional 'wolfssl_refs_json' input (JSON array string) to every PR-time and nightly OSP workflow. When set, it overrides the matrix's wolfssl_ref dimension; when empty (the default), behavior is unchanged (the matrix uses the discover_versions output).
Rewrites nightly-osp.yml as two sequential waves:
- Wave 1: every OSP project pinned to v5.9.1-stable
- wave1-done: fan-in job with 'if: always()'
- Wave 2: every OSP project pinned to v5.8.4-stable, needs wave1-done
Wave 2 only starts after every Wave 1 job has finished. 'if: always()' on Wave 2 jobs means a single Wave 1 flake doesn't skip the 5.8.4 coverage. The Slack notify job lists both Wave 1 and Wave 2 job results.
static-analysis and multi-compiler stay outside the waves (static-analysis isn't wolfSSL-version sensitive; multi-compiler iterates its own wolfSSL matrix and now includes representative v5.8.4-stable rows for gcc-12 and clang-14).
Wave 1 no longer hardcodes wolfssl_refs_json. Each child workflow's default matrix already picks up latest discovered wolfssl stable from _discover-versions.yml, so Wave 1 auto-tracks 5.9.1 today and 5.9.2 (or whatever) tomorrow with no edits here. Wave 2 stays pinned to v5.8.4-stable because that's the explicit back-compat line.
Brings execution of the v5.8.4 backwards-compat plan into PR wolfSSL#400:
- scripts/resolve-osp-patch.sh for wolfSSL-version-aware patch lookup
- 26 OSP workflows routed through the helper
- wolfssl_refs_json input on all 42 nightly OSP workflows
- nightly-osp.yml split into Wave 1 (dynamic latest stable) and Wave 2 (v5.8.4-stable pinned) with wave1-done fan-in
- nightly-multi-compiler.yml gains representative v5.8.4-stable rows
Depends on wolfssl/osp PR wolfSSL#340 plus a follow-up commit that adds -wolfssl-5.8.4- snapshot patches for libssh2, krb5, and stunnel.
OSP PR wolfSSL#340 review removed the duplicate stunnel-WPFF-5.67-wolfprov-fips.patch (it was identical to the non-FIPS patch). stunnel.yml no longer passes --fips so the resolver picks the single stunnel-WPFF-5.67-wolfprov.patch for both FIPS and non-FIPS rows.
…xists
Every workflow (stunnel included) passes --fips uniformly and the resolver decides. If a project ships no FIPS-specific patch, it uses the common non-FIPS one, and a FIPS patch added to OSP later is picked up automatically with no workflow change.
OSP patch names are now one convention (<project>-<projver>-wolfprov
[-fips].patch), so the resolver drops the -FIPS- infix handling and the
opensc -wolfprovider special case. FIPS resolution is bidirectional:
--fips prefers -wolfprov-fips.patch then falls back to -wolfprov.patch,
and non-FIPS prefers -wolfprov.patch then -wolfprov-fips.patch, so a
project that ships only one variant works for both modes.
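The bidirectional lookup can be sketched as follows, assuming bash and the `<project>-<projver>-wolfprov[-fips].patch` naming above. This is not scripts/resolve-osp-patch.sh: the function name `pick_osp_patch` and its argument order are hypothetical, and the wolfSSL-version snapshot lookup is left out.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical stand-in for the resolver's FIPS fallback.
# Prints the chosen patch path, or returns 1 if neither variant exists.
pick_osp_patch() {
  local dir="$1" project="$2" projver="$3" mode="${4:-}"
  local base="$dir/${project}-${projver}-wolfprov"
  # Non-FIPS prefers the plain patch, then the FIPS one...
  local first="$base.patch" second="$base-fips.patch"
  # ...and --fips reverses the preference order.
  if [ "$mode" = "--fips" ]; then
    first="$base-fips.patch"
    second="$base.patch"
  fi
  if [ -f "$first" ]; then
    printf '%s\n' "$first"
  elif [ -f "$second" ]; then
    printf '%s\n' "$second"
  else
    return 1
  fi
}
```

Because the caller passes either `--fips` or nothing, the same invocation works for every matrix row, and a project that ships a single patch resolves to it in both modes.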
Every OSP workflow that has a fips_ref matrix dimension now passes the
same '${{ matrix.fips_ref == 'FIPS' && '--fips' || '' }}' to the
resolver - no more mix of hardcoded --fips (grpc, python3-ntp, tcpdump)
and silently-omitted --fips (liboauth2 had a FIPS patch it never used,
plus curl, libnice, opensc, openvpn, openldap, libssh2, libcryptsetup,
qt5network5, socat, x11vnc, openssh). libtss2 and sssd are unchanged -
they have no FIPS dimension.
Previously, the .deb pull always used the rolling :fips/:nonfips tag regardless of the wolfssl_ref matrix value, so every job tested the same deb. Now a vX.Y.Z-stable ref pulls debs:fips-<ref>/nonfips-<ref> (the version-pinned tag debian-export publishes), so nightly Wave 1 (latest) and Wave 2 (v5.8.4-stable) actually exercise different wolfSSL versions. Non-stable refs (master) fall back to the rolling tag.
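A sketch of the tag selection rule, assuming bash; the function name `deb_tag_for` is hypothetical, and any fips_ref value other than `FIPS` is treated as non-FIPS.

```bash
#!/usr/bin/env bash
# Sketch only: map (wolfssl_ref, fips_ref) to a debs tag.
# vX.Y.Z-stable -> version-pinned tag; anything else -> rolling tag.
deb_tag_for() {
  local wolfssl_ref="$1" fips_ref="$2"
  local flavor="nonfips"
  if [ "$fips_ref" = "FIPS" ]; then
    flavor="fips"
  fi
  if [[ "$wolfssl_ref" =~ ^v[0-9]+\.[0-9]+\.[0-9]+-stable$ ]]; then
    printf '%s-%s\n' "$flavor" "$wolfssl_ref"
  else
    printf '%s\n' "$flavor"   # e.g. master
  fi
}
```

With this rule a Wave 2 FIPS row resolves to `fips-v5.8.4-stable`, while a master row still gets the rolling `fips` tag.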
…P patches
Temporary test scaffolding so the nightly OSP CI run can exercise the renamed/snapshot patches from osp PR wolfSSL#340 before they merge to osp master. REVERT before merging PR wolfSSL#400: the OSP checkout must go back to wolfssl/osp (master) once osp wolfSSL#340 lands. All 26 OSP workflow checkouts repointed from wolfssl/osp to aidangarske/osp ref 5.9.1-wolfprov-patches.
Run the full Wave 1 (5.9.1) + Wave 2 (5.8.4) OSP suite in CI on push to the test branch instead of waiting for the 6 AM schedule. Only meaningful on canonical wolfSSL/wolfProvider where the private deb pull works. REVERT before merge - keep only schedule + workflow_dispatch.
GitHub Actions expressions require single quotes for string literals; the matrix override used double quotes (!= ""), which fails workflow validation: nightly-osp failed at startup with 0 jobs. Replace != "" with != '' across all 40 OSP workflow matrix lines.
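One way to catch this class of bug before pushing is a grep-based lint. This is a heuristic sketch, assuming bash and a grep that supports `-H`, `-n` and `-E` (GNU and BSD both do); the function name `lint_expr_quotes` is hypothetical, and it does not parse YAML or expressions, it only flags lines where a `${{` is followed by an `==`/`!=` comparison against `""`.

```bash
#!/usr/bin/env bash
# Sketch only: flag double-quoted empty-string comparisons inside ${{ }}.
# Prints offending file:line:text and returns 1 if any are found.
lint_expr_quotes() {
  local status=0 f
  for f in "$@"; do
    if grep -HnE '\$\{\{.*[!=]= *""' "$f"; then
      status=1
    fi
  done
  return "$status"
}
```

For example, `lint_expr_quotes .github/workflows/*.yml` would have flagged all 40 matrix lines before the startup failure.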
Retry-outcome classification: rerun every non-passing job once; cleared = flake, failed-twice = real. Claude validates the survivors and writes the short root-cause notes; the script renders one clean Slack health report. TEMP push trigger reports against finished run 26594154670 for testing.
…ing suite
TEMP paths-ignore so report-file-only pushes don't kick off the nightly suite.
Drop header emoji and the 'AI:' note prefix. Exclude infra-setup flakes from the AI input so the headline can't conflate them with real failures. Color by pass-rate (mostly-green run with a few real failures is a warning, not red). Tighten the prompt for specific per-job notes grounded only in that job's log.
Move triage script to .github/scripts/ (one workflow yml). Reproducing test failures are real with a P0-P3 severity, never demoted to flake (flake = infra only). Reconcile the counts, add per-suite tiers, failing-job log links, and AI symptom/cause/next lines.
Named severity (Critical/High/Medium/Low) with a meter tally. Restore the colored breakdown line, reconciled by jobs. Add a pass-rate sparkline vs prior nightlies. Recovered-on-retry jobs count as passed, never as flakes.
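Taken together, the retry rules in these commits can be sketched as a small classifier, assuming bash. `success`/`failure` are GitHub job conclusions; the function name `classify_job` and the `INFRA_RE` patterns are hypothetical stand-ins for the triage script's expected-regex list, which is not reproduced here.

```bash
#!/usr/bin/env bash
# Sketch only: classify one job after its single automatic rerun.
#   passed first time, or recovered on retry -> passed (never a flake)
#   failed twice, log matches infra pattern  -> flake (infra only)
#   failed twice otherwise                   -> real (gets a severity)
INFRA_RE='manifest unknown|Could not resolve host|rate limit'  # hypothetical

classify_job() {
  local first="$1" retry="$2" log="$3"
  if [ "$first" = "success" ] || [ "$retry" = "success" ]; then
    echo "passed"
  elif [ -f "$log" ] && grep -qE "$INFRA_RE" "$log"; then
    echo "flake"
  else
    echo "real"
  fi
}
```

Keeping the infra match as the only path to "flake" is what stops a reproducing test failure from being demoted.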
Its matrix reads needs.discover_versions.outputs.* but the job never declared the dependency, so the matrix expanded empty: no build jobs, test_xmlsec skipped, and the workflow failed on both wolfSSL versions. The fix matches every other OSP workflow.
…l it
As a reusable workflow, its concurrency group keyed off the caller (github.workflow = 'Nightly OSP Suite'), so a new nightly run cancelled the prior run's static analysis even though nightly-osp uses cancel-in-progress: false. Add github.run_id to the group.
Remove all temporary testing scaffolding so everything points at the canonical repos and runs on the nightly schedule:
- OSP checkout back to wolfssl/osp (drop the fork-branch override)
- nightly-osp: schedule + workflow_dispatch only (drop the ci-draft-pause push trigger and paths-ignore)
- nightly-osp: drop the old flat Slack notify; the new osp-report (on: workflow_run) handles notification + auto-retry
- osp-report: workflow_run + workflow_dispatch only (drop the push trigger and the hardcoded fixture run id)
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request on May 29, 2026
…or testing
pull_request_target so the ghcr push has canonical scope; OWNER/MEMBER gate + PR-head checkout, mirroring publish-test-deps-image. Lets PR wolfSSL#400 populate the dep-cache so the --build-ci consumers can be validated end to end.
c10acca to 0576058
Empty commit to re-fire PR wolfSSL#400 CI so we can observe:
- PRB launches immediately on this push, in parallel with the smoke test on GitHub Actions (preflight no longer polls smoke).
- On the Jenkins agents that pick up the matrix, the test-wp-cs step runs via prb-cached-test.sh: the first run builds and caches; later runs pull from $HOME/.cache/wolfprov-prb-deps/.
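The build-once, reuse-later behavior described for test-wp-cs can be sketched as below, assuming bash. Only the cache root path comes from the commit message; the function name `cached_deps`, the cache key, and the build command are hypothetical, and this is not the inlined Jenkinsfile logic.

```bash
#!/usr/bin/env bash
# Sketch only: restore deps from a per-key cache, or build and populate it.
CACHE_ROOT="${CACHE_ROOT:-$HOME/.cache/wolfprov-prb-deps}"

# Usage: cached_deps <key> <dest_dir> <build_cmd...>
# The build command must leave its artifacts in dest_dir.
cached_deps() {
  local key="$1" dest="$2"
  shift 2
  local entry="$CACHE_ROOT/$key"
  mkdir -p "$dest"
  if [ -d "$entry" ]; then
    cp -a "$entry/." "$dest/"   # cache hit: skip the build entirely
    return 0
  fi
  "$@" || return 1              # cache miss: build first
  mkdir -p "$entry"
  cp -a "$dest/." "$entry/"     # then store for later runs on this agent
}
```

This simple form is not safe against two jobs on the same agent populating one key at the same time; staging into a temporary directory and renaming it into place would close that gap.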
Previous PRB run failed because the Jenkinsfile referenced
${WORKSPACE}/jenkins-scripts/stable/PRB/prb-cached-test.sh, but the
setup stage's cleanWs() wipes the testing-repo checkout before
stashing only wolfProvider/, so the helper wasn't on the parallel
agents. testing PR #958 b89d0b49 inlines the cache logic directly
into the test-wp-cs sh block.
- _discover-versions.yml: extends wolfssl_latest_ref_array, so Simple, Cmdline, Sanitizers and SEED-SRC pick it up automatically.
- smoke-test.yml: third smoke build (5.8.4 + openssl-latest).
- multi-compiler.yml: gcc-12 + v5.8.4-stable + master openssl, alongside the existing v5.8.0-stable entry.
aidangarske (Member, Author) commented:
Jenkins retest this please
HIGH-1 sssd.yml: nightly-osp now calls sssd-591 and sssd-584 with wolfssl_refs_json, but sssd.yml had a hardcoded [master, v5.8.0-stable] matrix and silently ignored the input. Wire matrix.wolfssl_ref to the input when set, falling back to the SSSD-compatible defaults. openssl_ref stays pinned to openssl-3.5.0 on purpose (SSSD 2.9.1 only tracks that line).
MEDIUM-3 check-workflow-result.sh curl branch: a non-zero exit alone was treated as the expected force-fail outcome, so a build/network/infra failure that never ran curl would silently pass. Require curl-test.log to exist and contain a TESTFAIL line as evidence that curl actually ran. Still no exact-test-list pinning (those drift across curl versions).
MEDIUM-8 sanitizers.yml: the workflow inherited org-default GITHUB_TOKEN permissions while executing PR-controlled build code. Pin to contents: read + packages: read.
MEDIUM-9 _discover-versions.yml: the ORAS tarball was downloaded and extracted without verification. Add the same SHA-256 check build-wolfprovider.yml already uses for the same release.
Deferred:
- MEDIUM-2 artifact-name race in build-wolfprovider.yml: real, but needs a restructure (top-level shared build vs per-caller artifact names); separate PR.
- MEDIUM-4/5/6/7: test-coverage suggestions for resolver, wait-for-smoke, curl branch, osp-triage; tracked as follow-up.
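The MEDIUM-3 and MEDIUM-9 checks can be sketched as two small functions, assuming bash and GNU coreutils `sha256sum` (for `-c --status`). Both function names are hypothetical; the real logic lives in check-workflow-result.sh and the workflow steps.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring MEDIUM-3 and MEDIUM-9.

# MEDIUM-3: a non-zero exit only counts as the expected force-fail if
# curl actually ran, i.e. curl-test.log exists and has a TESTFAIL line.
curl_force_fail_ok() {
  local exit_code="$1" log="${2:-curl-test.log}"
  [ "$exit_code" -ne 0 ] || return 1   # force-fail mode expects failure
  [ -f "$log" ] || return 1            # no log: curl never ran
  grep -q "TESTFAIL" "$log"
}

# MEDIUM-9: succeed only if the file's SHA-256 matches the pinned value.
verify_sha256() {
  local file="$1" expected="$2"
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c --status -
}
```

Usage: `verify_sha256 oras.tar.gz "$ORAS_SHA256" && tar -xzf oras.tar.gz`, where the tarball name and `ORAS_SHA256` are placeholders for whatever the workflow pins.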
Description
Related PRs need to go in first, in this order, then this one.
Slack notification system using Claude and simple expected-regex matching to retry flaky jobs.
