feat(huggingFace): refactor operator into per-task codegen + text-generation by PG1204 · Pull Request #5278 · apache/texera

PG1204 · 2026-05-28T18:58:54Z

What changes were proposed in this PR?

Refactors the monolithic 1,278-line HuggingFaceInferenceOpDesc from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation):

codegen/TaskCodegen.scala introduces the trait + CodegenContext that model per-task variation.
codegen/PythonCodegenBase.scala emits the shared provider-fallback / process_table / _parse_response infrastructure with two holes for the per-task payload and parse snippets.
codegen/TextGenCodegen.scala supplies text-generation's chat-completions payload and the body["choices"][0 ["message"]["content"] parse branch.
HuggingFaceInferenceOpDesc.scala becomes a thin (~180-line) dispatcher holding the @JsonProperty fields and the registeredCodegens map.

User-input string fields are typed EncodableString and emitted via the pyb"..." macro so values reach Python as self.decode_python_template('<base64>') rather than raw literals. Class constants are assigned in open(self) so self is in scope for the decode call. The generated process_table runs a defensive _HF_MODEL_ID_PATTERN check at runtime before any HF URL is composed.

The TaskCodegen trait also exposes a tasks: Set[String] default so a single codegen can register under multiple task strings, this becomes relevant in PR 3 (image family).

Any related issues, documentation, or discussions?

Tracked in #5277 & #5041(umbrella issue for the HuggingFace operator end-to-end implementation).

Closes #5277

Stacked on #5124 (PR 1 - REST resource).

This is PR 2 of a multi-PR series landing the HuggingFace operator end-to-end. The full plan and umbrella issue live separately; this PR's scope is exactly the dispatcher pattern + text-generation codegen.

How was this PR tested?

sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile" clean.
sbt scalafmtCheck clean.
sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec" - 10/10 pass (operator info, validation, codegen wiring, MODEL_ID runtime check, leak-prevention, clamping, schema).
sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec" - 117/117 descriptors py_compile cleanly, no raw-text leaks. The new operator is included in this scan.
Generated Python verified via python3 -m py_compile on a sample output.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7

…d media proxy Introduces a new Jersey REST resource exposing endpoints used by the upcoming HuggingFace operator UI: - GET /api/huggingface/models — browse / search models per task - GET /api/huggingface/tasks — list HF pipeline tags with hosted inference - POST /api/huggingface/upload-audio — upload audio for HF audio tasks - GET /api/huggingface/audio-preview — stream uploaded audio (path-validated) - GET /api/huggingface/media-proxy — proxy remote media URLs to bypass CORS This is the first PR in a stacked series landing the HF operator end-to-end. No operator code yet; this resource is independently useful and lets the frontend integrate with HF before the operator class lands.

Addresses xuang7's review on PR apache#5124 — both endpoints previously buffered the full payload into a heap-resident byte[] with no upper bound, leaving the JVM open to OOM on a hostile or buggy upstream response (/media-proxy) or out-of-band write into the audio temp dir (/audio-preview). - /media-proxy: switch from Unirest.asBytes() to asObject(Function<RawResponse, T>), streaming the upstream body in 8 KiB chunks with a running byte counter. Aborts with 413 if the declared Content-Length exceeds the cap (pre-check) or if the body crosses the cap mid-read (defends against missing/lying Content-Length). New MAX_MEDIA_PROXY_BYTES = 50 MiB, sized for HF inference media (text-to-image ~5 MiB, text-to-video ~30 MiB) with headroom. - /audio-preview: add Files.size() defense-in-depth check before readAllBytes. /upload-audio already enforces MAX_AUDIO_BYTES on ingest; this catches the case where a bug or out-of-band write puts an oversized file in the temp dir. Adds a spec covering the audio-preview cap using a sparse-file fixture so the test stays fast (87/87 spec passes). The media-proxy cap path is exercised via the existing input-validation suite plus the new streamMediaWithCap helper - a follow-up can add a fake-RawResponse unit test if reviewers want explicit coverage of the chunked-read cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov-commenter · 2026-05-28T19:01:36Z

Codecov Report

❌ Patch coverage is 97.36842% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.00%. Comparing base (891d2ad) to head (7597f84).

Files with missing lines	Patch %	Lines
...rator/huggingFace/HuggingFaceInferenceOpDesc.scala	97.56%	0 Missing and 1 partial ⚠️
...ber/operator/huggingFace/codegen/TaskCodegen.scala	88.88%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #5278      +/-   ##
============================================
+ Coverage     52.95%   53.00%   +0.04%     
- Complexity     2627     2651      +24     
============================================
  Files          1090     1094       +4     
  Lines         42210    42284      +74     
  Branches       4534     4541       +7     
============================================
+ Hits          22353    22413      +60     
- Misses        18546    18558      +12     
- Partials       1311     1313       +2

Flag	Coverage Δ		*Carryforward flag
access-control-service	`70.91% <ø> (ø)`
agent-service	`34.36% <ø> (ø)`		Carriedforward from 94170ae
amber	`53.32% <97.36%> (+0.20%)`	⬆️
computing-unit-managing-service	`1.65% <ø> (ø)`
config-service	`56.71% <ø> (ø)`
file-service	`57.06% <ø> (ø)`
frontend	`47.86% <ø> (-0.07%)`	⬇️	Carriedforward from 94170ae
pyamber	`90.73% <ø> (+0.95%)`	⬆️	Carriedforward from 94170ae
python	`90.73% <ø> (ø)`		Carriedforward from 94170ae
workflow-compiling-service	`58.69% <ø> (ø)`

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

PG1204 · 2026-05-28T20:05:43Z

/request-review @Ma77Ball

@RolesAllowed

Per review on apache#5124 (xuang7, Ma77Ball): mark the resource with @RolesAllowed(Array("REGULAR", "ADMIN")) to document that all five endpoints require an authenticated user. The annotation isn't enforced yet — that's coming with the auth-enforcement PR @Yicong-Huang and @Ma77Ball are working on — but adding it now means no follow-up change is needed when enforcement lands, and it matches the convention used by UserConfigResource / AdminSettingsResource. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@JsonProperty

…eration Splits the monolithic 1,278-line HuggingFaceInferenceOpDesc from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation) end-to-end. - TaskCodegen trait + CodegenContext model the per-task variation - PythonCodegenBase emits the shared provider-fallback / process_table / _parse_response infrastructure with two holes for the per-task payload and parse snippets - TextGenCodegen supplies text-generation's chat-completions payload and the body["choices"][0]["message"]["content"] parse branch - HuggingFaceInferenceOpDesc becomes a thin dispatcher (~180 lines) holding @JsonProperty fields and the registeredCodegens map User-input string fields are typed as EncodableString and emitted via the pyb"..." macro so values reach Python as self.decode_python_template('<base64>') rather than raw literals; class constants are assigned in open(self) so self is in scope for the decode call. Generated process_table runs a defensive _HF_MODEL_ID_PATTERN check at runtime before any HF URL is composed. PR 2 of a stacked 9-PR series. PR 1 (apache#5124) ships the supporting REST resource; PRs 3-5 will add image, audio + media-gen, and QA/ranking task families by registering new *Codegen objects in the dispatcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@JsonProperty

…degen specs Addresses Codecov's 66.85% patch coverage warning by exercising the defensive null-handling branches in HuggingFaceInferenceOpDesc.scala and the TextGenCodegen contract that previously had no spec hits. - null-tolerance: feed null into every @JsonProperty (token, model, prompt col, system prompt, result col, task, maxNewTokens, temperature) and assert generatePythonCode still emits a parseable ProcessTableOperator with sane defaults (TASK falls back to text-generation, MAX_NEW_TOKENS clamps to 256, TEMPERATURE to 0.7). Covers the `if (x == null) ... else x` branches that previously had no test that took the null side. - TextGenCodegen.task: trivial canonical-value check. - TextGenCodegen ctx-independence: pass an "irrelevant"-filled ctx and assert payloadPython / parsePython still reference self.MODEL_ID and body["choices"]…. Catches a future refactor that accidentally splices ctx fields into the static snippets. 13/13 in HuggingFaceInferenceOpDescSpec, 2/2 in PythonCodeRawInvalidTextSpec (117/117 descriptors still py_compile cleanly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ma77Ball

Please look over the suggestions below.

…NAI_COMPATIBLE_PROVIDERS to class constants

Yicong-Huang · 2026-06-07T17:27:13Z

Hi @PG1204 what is the status of this PR? not sure if it is ready for review, given the note about stacked PR, is the current diff accurate?

PG1204 · 2026-06-07T20:39:31Z

Hi @PG1204 what is the status of this PR? not sure if it is ready for review, given the note about stacked PR, is the current diff accurate?

@Yicong-Huang The comments given by @Ma77Ball have been resolved, awaiting further review.
Yep the current diff is accurate.

PG1204 · 2026-06-07T23:07:44Z

Hi @PG1204 what is the status of this PR? not sure if it is ready for review, given the note about stacked PR, is the current diff accurate?

@Yicong-Huang This is PR-2 in the stacked PRs for the HuggingFace operator. PR-1 was merged a while back.

Ma77Ball

Overall LGTM! I think the below can be implemented or left as is.

…duplication

github-actions · 2026-06-12T18:25:39Z

⚠️ Benchmark changes need a look

🟢 4 better · 🔴 3 worse · ⚪ 8 noise (<±5%) · 0 without baseline

Compared against main 891d2ad benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

	config	throughput	MB/s	latency	max Δ latest / 7d
🟢	bs=10 sw=10 sl=64	449	0.274	21,682/28,635/28,635 us	🟢 -14.0% / 🟢 -18.1%
🔴	bs=100 sw=10 sl=64	942	0.575	104,749/159,763/159,763 us	🔴 +16.0% / 🔴 +14.3%
🟢	bs=1000 sw=10 sl=64	1,118	0.683	893,104/943,018/943,018 us	🟢 -6.4% / 🟢 -8.2%

Baseline details

Latest main 891d2ad from same runner

config	metric	PR	latest main	7d avg	Δ latest	Δ 7d
bs=10 sw=10 sl=64	throughput	449 tuples/sec	465 tuples/sec	410.82 tuples/sec	-3.4%	+9.3%
bs=10 sw=10 sl=64	MB/s	0.274 MB/s	0.284 MB/s	0.251 MB/s	-3.5%	+9.3%
bs=10 sw=10 sl=64	p50	21,682 us	21,250 us	23,785 us	+2.0%	-8.8%
bs=10 sw=10 sl=64	p95	28,635 us	33,299 us	34,980 us	-14.0%	-18.1%
bs=10 sw=10 sl=64	p99	28,635 us	33,299 us	34,980 us	-14.0%	-18.1%
bs=100 sw=10 sl=64	throughput	942 tuples/sec	982 tuples/sec	891.94 tuples/sec	-4.1%	+5.6%
bs=100 sw=10 sl=64	MB/s	0.575 MB/s	0.599 MB/s	0.544 MB/s	-4.0%	+5.6%
bs=100 sw=10 sl=64	p50	104,749 us	97,654 us	112,277 us	+7.3%	-6.7%
bs=100 sw=10 sl=64	p95	159,763 us	137,693 us	139,802 us	+16.0%	+14.3%
bs=100 sw=10 sl=64	p99	159,763 us	137,693 us	139,802 us	+16.0%	+14.3%
bs=1000 sw=10 sl=64	throughput	1,118 tuples/sec	1,112 tuples/sec	1,041 tuples/sec	+0.5%	+7.4%
bs=1000 sw=10 sl=64	MB/s	0.683 MB/s	0.679 MB/s	0.635 MB/s	+0.6%	+7.5%
bs=1000 sw=10 sl=64	p50	893,104 us	894,598 us	972,714 us	-0.2%	-8.2%
bs=1000 sw=10 sl=64	p95	943,018 us	1,007,417 us	1,023,057 us	-6.4%	-7.8%
bs=1000 sw=10 sl=64	p99	943,018 us	1,007,417 us	1,023,057 us	-6.4%	-7.8%

Raw CSV

config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,445.39,200,128000,449,0.274,21682.12,28634.95,28634.95
1,100,10,64,20,2123.63,2000,1280000,942,0.575,104749.07,159762.78,159762.78
2,1000,10,64,20,17884.78,20000,12800000,1118,0.683,893103.94,943017.96,943017.96

xuang7

LGTM! I think this may be better categorized as a feature rather than a refactor.

Yicong-Huang · 2026-06-16T06:20:25Z

Hi @PG1204 before merging this Pr, I think it's better to remove the stack note at the beginning of the PR description, as it contains an obsolete information as the PRs that it based are merged

PG1204 · 2026-06-16T06:25:40Z

Hi @PG1204 before merging this Pr, I think it's better to remove the stack note at the beginning of the PR description, as it contains an obsolete information as the PRs that it based are merged

Hi @Yicong-Huang, the stack note has been removed. Thanks for the suggestion.

…#5320) ### What changes were proposed in this PR? Adds the image task family — 9 HF pipeline tasks — as the second `TaskCodegen` plugged into the dispatcher established by apache#5278: image-only: image-classification, object-detection, image-segmentation, image-to-text image + prompt: visual-question-answering, document-question-answering, zero-shot-image-classification, image-text-to-text, image-to-image - `codegen/ImageTaskCodegen.scala` supplies the per-task payload + parse Python branches for all 9 tasks. - `TaskCodegen` trait gains a `tasks: Set[String]` default method (defaults to `Set(task)`) so a single codegen can register under multiple task strings; `ImageTaskCodegen` is the first multi-task codegen to use it. - `CodegenContext` extended with `imageInput` + `inputImageColumn` (`EncodableString`). - `HuggingFaceInferenceOpDesc.scala` gains 2 new `@JsonProperty` fields and registers `ImageTaskCodegen` via the new `tasks` flat-map. `PythonCodegenBase.scala` grows to host the shared image infrastructure: - Task-family tuples (`image_only_tasks`, `image_prompt_tasks`, `image_tasks`) + `image_headers` in `process_table`. - Per-row image-bytes resolution from upload or column with `_read_image_input` / `_read_binary_value` / `_compress_image_bytes`. - `_post_with_fallback` extended with `raw_binary_headers` + `use_raw_binary_body`; adds image-text-to-text chat-completions and model-author vision branches. - `_call_provider` gains zai-org, Replicate predictions + polling, Fal-ai, Wavespeed submit+poll branches, and image embedding for OpenAI-compatible / unknown-provider fallbacks. - Image content-type response handling returns `data:image/...;base64,...` URLs. - Image helpers added: `_read_image_input`, `_compress_image_bytes`, `_image_input_as_base64`, `_read_binary_value`, `_looks_like_html`, `_html_to_image_bytes`, `_extract_json_arg`, `_url_to_data_url`. Frontend integration (HF lines only — no agent / dataset noise): `HuggingFaceImageUploadComponent` declared in `app.module.ts`, `huggingface-image-upload` formly type registered, image upload component .ts/.html/.scss + `HuggingFace.png` + `sample-image.png` assets. User-input strings continue to flow through `pyb"..."` + `EncodableString` so they reach Python as `self.decode_python_template('<base64>')` rather than raw literals. `PythonCodeRawInvalidTextSpec` still passes (117/117 descriptors `py_compile` cleanly). ### Any related issues, documentation, or discussions? - Tracking issue: apache#5319 - Closes: apache#5319 - Stacked on: apache#5278 (operator + text-generation — issue apache#5277) - Parent issue: apache#5041 - Closed sibling issue: apache#5134 (REST resource — landed via apache#5124) ### How was this PR tested? - `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean. - `sbt scalafmtCheck` clean. - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec"` — 18/18 pass (PR 2's 13 spec tests + 5 new image-task tests: image-only routing, VQA / document-QA payload, image-text-to-text chat-completions, image-to-image data-URL parse, all-9-tasks dispatcher coverage). - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 117/117 descriptors `py_compile` cleanly with the new operator code paths, no marker leaks. - Generated Python verified via `python3 -m py_compile` on sample image-task outputs. ### Was this PR authored or co-authored using generative AI tooling? Yes, co-authored with Claude Opus 4.7. --------- Signed-off-by: Prateek Ganigi <91584519+PG1204@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

…eration (apache#5278) > ⚠️ This PR is stacked on apache#5124. Until that lands, the diff below includes apache#5124's `HuggingFaceModelResource.scala` and the 1-line registration in `TexeraWebApplication.scala`. The new code in this PR is everything under `common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/` and the new test under `common/workflow-operator/src/test/.../huggingFace/HuggingFaceInferenceOpDescSpec.scala`. Once apache#5124 merges, this diff will auto-clean to ~839 lines. ### What changes were proposed in this PR? Refactors the monolithic 1,278-line `HuggingFaceInferenceOpDesc` from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation): - `codegen/TaskCodegen.scala` introduces the trait + `CodegenContext` that model per-task variation. - `codegen/PythonCodegenBase.scala` emits the shared provider-fallback / `process_table` / `_parse_response` infrastructure with two holes for the per-task payload and parse snippets. - `codegen/TextGenCodegen.scala` supplies text-generation's chat-completions payload and the `body["choices"][0 ["message"]["content"]` parse branch. - `HuggingFaceInferenceOpDesc.scala` becomes a thin (~180-line) dispatcher holding the `@JsonProperty` fields and the `registeredCodegens` map. User-input string fields are typed `EncodableString` and emitted via the `pyb"..."` macro so values reach Python as `self.decode_python_template('<base64>')` rather than raw literals. Class constants are assigned in `open(self)` so `self` is in scope for the decode call. The generated `process_table` runs a defensive `_HF_MODEL_ID_PATTERN` check at runtime before any HF URL is composed. The `TaskCodegen` trait also exposes a `tasks: Set[String]` default so a single codegen can register under multiple task strings, this becomes relevant in PR 3 (image family). ### Any related issues, documentation, or discussions? Tracked in apache#5277 & apache#5041(umbrella issue for the HuggingFace operator end-to-end implementation). Closes apache#5277 Stacked on apache#5124 (PR 1 - REST resource). This is PR 2 of a multi-PR series landing the HuggingFace operator end-to-end. The full plan and umbrella issue live separately; this PR's scope is exactly the dispatcher pattern + text-generation codegen. ### How was this PR tested? - `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean. - `sbt scalafmtCheck` clean. - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec"` - 10/10 pass (operator info, validation, codegen wiring, MODEL_ID runtime check, leak-prevention, clamping, schema). - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` - 117/117 descriptors `py_compile` cleanly, no raw-text leaks. The new operator is included in this scan. - Generated Python verified via `python3 -m py_compile` on a sample output. ### Was this PR authored or co-authored using generative AI tooling? Co-authored with Claude Opus 4.7 --------- Co-authored-by: Elliot Lin <36275109+ELin2025@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Xuan Gu <162244362+xuang7@users.noreply.github.com>

…#5320) ### What changes were proposed in this PR? Adds the image task family — 9 HF pipeline tasks — as the second `TaskCodegen` plugged into the dispatcher established by apache#5278: image-only: image-classification, object-detection, image-segmentation, image-to-text image + prompt: visual-question-answering, document-question-answering, zero-shot-image-classification, image-text-to-text, image-to-image - `codegen/ImageTaskCodegen.scala` supplies the per-task payload + parse Python branches for all 9 tasks. - `TaskCodegen` trait gains a `tasks: Set[String]` default method (defaults to `Set(task)`) so a single codegen can register under multiple task strings; `ImageTaskCodegen` is the first multi-task codegen to use it. - `CodegenContext` extended with `imageInput` + `inputImageColumn` (`EncodableString`). - `HuggingFaceInferenceOpDesc.scala` gains 2 new `@JsonProperty` fields and registers `ImageTaskCodegen` via the new `tasks` flat-map. `PythonCodegenBase.scala` grows to host the shared image infrastructure: - Task-family tuples (`image_only_tasks`, `image_prompt_tasks`, `image_tasks`) + `image_headers` in `process_table`. - Per-row image-bytes resolution from upload or column with `_read_image_input` / `_read_binary_value` / `_compress_image_bytes`. - `_post_with_fallback` extended with `raw_binary_headers` + `use_raw_binary_body`; adds image-text-to-text chat-completions and model-author vision branches. - `_call_provider` gains zai-org, Replicate predictions + polling, Fal-ai, Wavespeed submit+poll branches, and image embedding for OpenAI-compatible / unknown-provider fallbacks. - Image content-type response handling returns `data:image/...;base64,...` URLs. - Image helpers added: `_read_image_input`, `_compress_image_bytes`, `_image_input_as_base64`, `_read_binary_value`, `_looks_like_html`, `_html_to_image_bytes`, `_extract_json_arg`, `_url_to_data_url`. Frontend integration (HF lines only — no agent / dataset noise): `HuggingFaceImageUploadComponent` declared in `app.module.ts`, `huggingface-image-upload` formly type registered, image upload component .ts/.html/.scss + `HuggingFace.png` + `sample-image.png` assets. User-input strings continue to flow through `pyb"..."` + `EncodableString` so they reach Python as `self.decode_python_template('<base64>')` rather than raw literals. `PythonCodeRawInvalidTextSpec` still passes (117/117 descriptors `py_compile` cleanly). ### Any related issues, documentation, or discussions? - Tracking issue: apache#5319 - Closes: apache#5319 - Stacked on: apache#5278 (operator + text-generation — issue apache#5277) - Parent issue: apache#5041 - Closed sibling issue: apache#5134 (REST resource — landed via apache#5124) ### How was this PR tested? - `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean. - `sbt scalafmtCheck` clean. - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec"` — 18/18 pass (PR 2's 13 spec tests + 5 new image-task tests: image-only routing, VQA / document-QA payload, image-text-to-text chat-completions, image-to-image data-URL parse, all-9-tasks dispatcher coverage). - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 117/117 descriptors `py_compile` cleanly with the new operator code paths, no marker leaks. - Generated Python verified via `python3 -m py_compile` on sample image-task outputs. ### Was this PR authored or co-authored using generative AI tooling? Yes, co-authored with Claude Opus 4.7. --------- Signed-off-by: Prateek Ganigi <91584519+PG1204@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

PG1204 and others added 7 commits May 17, 2026 13:02

fix: address review feedback on HuggingFaceModelResource

935ccc1

Merge branch 'apache:main' into hf/01-backend-skeleton

089c3c4

Merge branch 'apache:main' into hf/01-backend-skeleton

2aa865c

Merge branch 'apache:main' into hf/01-backend-skeleton

0c30beb

chore: retrigger CI

6857e34

github-actions Bot assigned PG1204 May 28, 2026

github-actions Bot added engine common labels May 28, 2026

PG1204 and others added 7 commits May 28, 2026 13:06

Merge branch 'apache:main' into hf/01-backend-skeleton

6f0f5fb

Merge branch 'main' into hf/01-backend-skeleton

fec6dfb

Merge branch 'apache:main' into hf/01-backend-skeleton

5e95bcd

fix: scala lint fixes

8350eb9

PG1204 force-pushed the hf/02-operator-textgen branch from 61e6c41 to 8350eb9 Compare May 29, 2026 20:38

Merge branch 'apache:main' into hf/02-operator-textgen

2efa337

github-actions Bot removed the engine label Jun 2, 2026

This was referenced Jun 3, 2026

Add image task family (ImageTaskCodegen) to HuggingFace operator #5319

Closed

feat(huggingFace): add image task family via ImageTaskCodegen #5320

Merged

Merge branch 'apache:main' into hf/02-operator-textgen

c44d7d0

Ma77Ball suggested changes Jun 4, 2026

View reviewed changes

Comment thread .../src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala Outdated

Comment thread .../src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/PythonCodegenBase.scala Outdated

PG1204 and others added 3 commits June 5, 2026 12:39

refactor(huggingFace): cap HTTP error detail + lift CHAT_ROUTES / OPE…

28fcab0

…NAI_COMPATIBLE_PROVIDERS to class constants

Merge branch 'apache:main' into hf/02-operator-textgen

eb9d6d1

Merge branch 'apache:main' into hf/02-operator-textgen

89e0819

Merge branch 'apache:main' into hf/02-operator-textgen

0f38215

Ma77Ball approved these changes Jun 10, 2026

View reviewed changes

Comment thread ...src/main/scala/org/apache/texera/amber/operator/huggingFace/HuggingFaceInferenceOpDesc.scala Outdated

PG1204 and others added 5 commits June 9, 2026 17:52

Merge branch 'apache:main' into hf/02-operator-textgen

48890a6

refactor(huggingFace): extract resolvedResultColumn helper to remove …

0304702

…duplication

Merge branch 'apache:main' into hf/02-operator-textgen

78ff986

Merge branch 'apache:main' into hf/02-operator-textgen

c9c566b

Merge branch 'apache:main' into hf/02-operator-textgen

336709a

Merge branch 'apache:main' into hf/02-operator-textgen

fc4b2c8

Yicong-Huang changed the title ~~feat(huggingFace): refactor operator into per-task codegen + text-generation~~ refactor(huggingFace): refactor operator into per-task codegen + text-generation Jun 12, 2026

PG1204 added 2 commits June 12, 2026 20:30

Merge branch 'apache:main' into hf/02-operator-textgen

821f872

Merge branch 'apache:main' into hf/02-operator-textgen

7597f84

xuang7 self-requested a review June 15, 2026 22:13

xuang7 approved these changes Jun 15, 2026

View reviewed changes

PG1204 changed the title ~~refactor(huggingFace): refactor operator into per-task codegen + text-generation~~ feat(huggingFace): refactor operator into per-task codegen + text-generation Jun 15, 2026

xuang7 added this pull request to the merge queue Jun 15, 2026

Merged via the queue into apache:main with commit 2b9add9 Jun 15, 2026
23 checks passed

Uh oh!

Conversation

PG1204 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this PR?

Any related issues, documentation, or discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

codecov-commenter commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

PG1204 commented May 28, 2026

Uh oh!

Ma77Ball left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Yicong-Huang commented Jun 7, 2026

Uh oh!

PG1204 commented Jun 7, 2026

Uh oh!

PG1204 commented Jun 7, 2026

Uh oh!

Ma77Ball left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Benchmark changes need a look

Uh oh!

xuang7 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Yicong-Huang commented Jun 16, 2026

Uh oh!

PG1204 commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

PG1204 commented May 28, 2026 •

edited

Loading

codecov-commenter commented May 28, 2026 •

edited

Loading

github-actions Bot commented Jun 12, 2026 •

edited

Loading