refactor: route shim-registered expressions through CometExpressionSerde#4139
Merged
Conversation
Register CometStringDecode (shared under spark-3.x/) in the versionSpecificStringExpressions map for Spark 3.4 and 3.5, and remove the now-redundant case branch from versionSpecificExprToProtoInternal.
…serde # Conflicts: # spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala # spark/src/main/spark-4.1/org/apache/comet/shims/CometExprShim.scala # spark/src/main/spark-4.2/org/apache/comet/shims/CometExprShim.scala
…iteral block YAML literal blocks (`|`) do not strip `#` comments, so the previous inline comment lines were passed to the runner as bogus suite class names. Move the reference into a proper YAML comment outside the block so dev/ci/check-suites.py still finds the substring without polluting the suite list.
comphead
reviewed
Apr 29, 2026
| protected def binaryOutputStyle: BinaryOutputStyle = BinaryOutputStyle.HEX_DISCRETE | ||
| def binaryOutputStyle: BinaryOutputStyle = BinaryOutputStyle.HEX_DISCRETE | ||
|
|
||
| def versionSpecificStringExpressions: Map[Class[_ <: Expression], CometExpressionSerde[_]] = |
Contributor
There was a problem hiding this comment.
Suggested change
| def versionSpecificStringExpressions: Map[Class[_ <: Expression], CometExpressionSerde[_]] = | |
| def sparkVersionSpecificStringExpressions: Map[Class[_ <: Expression], CometExpressionSerde[_]] = |
?
Otherwise it is not clear what version referring to, if its spark, comet, etc?
comphead
reviewed
Apr 29, 2026
| import org.apache.spark.sql.CometTestBase | ||
| import org.apache.spark.sql.execution.{ProjectExec, SparkPlan} | ||
|
|
||
| class CometWidthBucketFrameworkSuite extends CometTestBase { |
Contributor
There was a problem hiding this comment.
I'm not sure what Framework means for width_bucket test
comphead
reviewed
Apr 29, 2026
|
|
||
| import org.apache.comet.CometSparkSessionExtensions.isSpark40Plus | ||
|
|
||
| class CometStringDecodeFrameworkSuite extends CometTestBase { |
Contributor
There was a problem hiding this comment.
the word Framework looks redundant
Rename versionSpecific* shim methods to sparkVersionSpecific* so the "Spark version" intent is explicit, and drop the redundant "Framework" suffix from the new test suites.
Member
Author
|
Thanks @comphead. I addressed feedback. Could you take another look? |
…serde # Conflicts: # spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
Closes apache#510 (or contributes to it). The compatibility guide is now generated separately for each supported Spark profile (3.4, 3.5, 4.0, 4.1). Source category templates moved under _category_template/, and docs/build.sh runs GenerateDocs once per profile, populating per-version subdirectories under compatibility/expressions/. Hand -written pages and configs.md remain shared. GenerateDocs accepts an optional second argument naming the per-version subdirectory; without it, the legacy flat layout is preserved (used by older released doc trees that don't carry the new structure). Combined with the shim-to-framework migration in this PR, the per-version output now reflects real differences (e.g. StringDecode appears in the Spark 3.4 unsupported list but not on Spark 4.x where it's handled via a StaticInvoke rewrite).
# Conflicts: # .github/workflows/pr_build_linux.yml # .github/workflows/pr_build_macos.yml # docs/source/user-guide/latest/compatibility/expressions/index.md # spark/src/main/scala/org/apache/comet/GenerateDocs.scala # spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala # spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala # spark/src/main/spark-4.1/org/apache/comet/shims/CometExprShim.scala # spark/src/main/spark-4.2/org/apache/comet/shims/CometExprShim.scala
Member
Author
|
Merged. Thanks @mbutrovich! |
marvelshan
pushed a commit
to marvelshan/datafusion-comet
that referenced
this pull request
Jul 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #4077.
Rationale for this change
StringDecode,ToPrettyString, and (4.x)MapSortwere registered throughCometExprShim.sparkVersionSpecificExprToProtoInternaland bypassed theCometExpressionSerdeframework. They had no per-expressionenabledconfig, no support-level gate, and no entry in the auto-generated compatibility guide.A second consequence: because the shim's match was version-private, the docs generator only saw the version-agnostic serde maps, so there was no way to publish per-Spark-version compatibility pages.
What changes are included in this PR?
Shim-side category maps
CometExprShim:sparkVersionSpecificStringExpressions,sparkVersionSpecificMathExpressions,sparkVersionSpecificMiscExpressions,sparkVersionSpecificMapExpressions. Each per-Spark-version trait fills these with the classes only available (or only handled) on that Spark version.QueryPlanSerdemerges these into the global category maps, so the auto-generated compatibility docs and the per-expressionenabledconfig now reach version-specific expressions.binaryOutputStylepublic on the trait so serde objects can read it viaQueryPlanSerde.binaryOutputStyle.Migrate shim-resident expressions to
CometExpressionSerdeStringDecode(3.4 / 3.5) →CometStringDecodeToPrettyString(3.5 / 4.x) →CometToPrettyStringMapSort(4.x) →CometMapSortWidthBucketwas originally migrated by this PR, but during the mergemain's #4538 had already routed it throughCometCodegenDispatch[WidthBucket]; this PR keepsmain's serde and only contributes thesparkVersionSpecificMathExpressionsmap entry andCometWidthBucketSuite.Share the 4.x trait body
New
Spark4xCometExprShim(underspark-4.x/) holds the parts identical across 4.0 / 4.1 / 4.2:KnownNotContainsNull,Invoke-on-StructsToJsonEvaluator/ParseUrlEvaluator, theStaticInvoke-on-StringDecoderewrite, andlengthOfJsonArray/DayName/MonthName/ structured-text dispatch (the latter via the existingCometExprShim4xit now extends). The per-minor-versionCometExprShimtraits override onlybinaryOutputStyleand addmake_time/to_time/try_to_time(4.1+).Per-Spark-version compatibility pages (
GenerateDocs)GenerateDocs.mainaccepts an optional second arg: a per-Spark-version subdirectory name (e.g.spark-3.5). When supplied, pages are written tocompatibility/expressions/<version>/; when omitted, the legacy flatcompatibility/expressions/layout is preserved for released-version doc trees.sparkVersionSpecific*Expressionsmaps observable in the docs: each Spark version's docs reflect its own version-specific expressions.How are these changes tested?
CometStringDecodeSuite,CometToPrettyStringSuite,CometMapSortSuite,CometWidthBucketSuite) togglespark.comet.expression.<X>.enabledand assert fallback to Spark when disabled. These fail before the refactor and pass after.CometStringExpressionSuite,CometMapExpressionSuite, etc. continue to pass — no behavior change for accepted expressions.