Skip to content

minor: Improve error message for invalid column expression in SELECT statement#22486

Merged
2010YOUY01 merged 7 commits into
apache:mainfrom
2010YOUY01:col-did-you-mean
May 31, 2026
Merged

minor: Improve error message for invalid column expression in SELECT statement#22486
2010YOUY01 merged 7 commits into
apache:mainfrom
2010YOUY01:col-did-you-mean

Conversation

@2010YOUY01

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

User friendly 'did you mean' error message is missing for some cases, this PR fixes it.

Demo in datafusion-cli

-- Before
> create external table hits
stored as parquet
location '/Users/yongting/Code/datafusion/benchmarks/data/hits_partitioned';
0 row(s) fetched.
Elapsed 0.056 seconds.

> select url+1 from hits;
Schema error: No field named url. Valid fields are hits."WatchID", hits."JavaEnable", hits."Title", hits."GoodEvent", hits."EventTime", hits."EventDate", hits."CounterID", hits."ClientIP", hits."RegionID", hits."UserID", hits."CounterClass",
...
-- PR
> select url from hits;
Schema error: No field named url. Did you mean 'hits."URL"'?
Column names are case sensitive. You can use double quotes to refer to the hits."URL" column or set the datafusion.sql_parser.enable_ident_normalization configuration.
Valid fields are hits."WatchID", hits."JavaEnable", hits."Title", hits."GoodEvent", hits."EventTime", hits."EventDate", hits."CounterID", hits."ClientIP", hits."RegionID", hits."UserID", hits."CounterClass", hits."OS", hits."UserAgent", hits."URL", hits."Referer", hits."IsRefresh", hits."RefererCategoryID", hits."RefererRegionID", hits."URLCategoryID", hits."URLRegionID",
...

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions Bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate labels May 24, 2026
// lookup with unqualified name "t1.c0"
let err = schema.index_of_column(&col).unwrap_err();
let expected = "Schema error: No field named \"t1.c0\". \
Column names are case sensitive. \

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this case sensitive hint only shows up, if the input expr has some case-insensitive match.

@nuno-faria nuno-faria left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @2010YOUY01, makes sense to me. I leave some suggestions below.

Comment on lines +287 to +289
"\nColumn names are case sensitive. You can use double quotes to refer to the {} column \
or set the datafusion.sql_parser.enable_ident_normalization configuration.",
case_sensitive_match.quoted_flat_name()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not new, but should it say "... or disable the datafusion.sql_parser.enable_ident_normalization configuration."? I think "set" makes it sound like we need to enable the config.

Comment on lines +205 to +220
let field_flat_name = field.flat_name();
let field_name_lower = field.name().to_lowercase();
let field_flat_name_lower = field_flat_name.to_lowercase();

valid_fields.iter().find(|valid_field| {
let valid_field_flat_name = valid_field.flat_name();
let valid_field_name_lower = valid_field.name().to_lowercase();
let valid_field_flat_name_lower = valid_field_flat_name.to_lowercase();

let name_differs_only_by_case = field_name_lower == valid_field_name_lower
&& field.name() != valid_field.name();
let flat_name_differs_only_by_case = field_flat_name_lower
== valid_field_flat_name_lower
&& field_flat_name != valid_field_flat_name;

name_differs_only_by_case || flat_name_differs_only_by_case

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: For consistency, the "_flat_name" variables could be removed, or we could add variables for the .name() values.

Comment on lines +242 to +246
let distance = levenshtein(target, valid_name);
let max_len = target.chars().count().max(valid_name.chars().count());
if max_len == 0 || distance * 2 > max_len {
continue;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a small comment here to explain the reasoning behind the formula?

@2010YOUY01

Copy link
Copy Markdown
Contributor Author

Thank you for the review @nuno-faria ! All of them make sense, have updated in 934ffa9

@2010YOUY01 2010YOUY01 added this pull request to the merge queue May 31, 2026
Merged via the queue into apache:main with commit fa724c1 May 31, 2026
38 checks passed
@2010YOUY01 2010YOUY01 deleted the col-did-you-mean branch May 31, 2026 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants