Skip to content

Add #[recursive]#1522

Merged
alamb merged 14 commits into
apache:mainfrom
blaginin:add-recursive
Dec 19, 2024
Merged

Add #[recursive]#1522
alamb merged 14 commits into
apache:mainfrom
blaginin:add-recursive

Conversation

@blaginin

@blaginin blaginin commented Nov 14, 2024

Copy link
Copy Markdown
Member

Closes #984, related to apache/datafusion#9375 (comment)

Todo:

  • Test the performance overhead on large queries. If it's too high, move it to a separate feature to allow opting out

@blaginin

blaginin commented Nov 14, 2024

Copy link
Copy Markdown
Member Author

I was wondering if we should remove RecursionCounter with this PR. In my opinion, we shouldn't, because the ability to limit max recursion might be useful for some users still

@blaginin

blaginin commented Nov 15, 2024

Copy link
Copy Markdown
Member Author

It seems this PR doesn't have any significant performance impact.

critcmp main recursive                        
group                                                    main                                   recursive
-----                                                    ----                                   ---------
sqlparser-rs parsing benchmark/format_large_statement    1.00   309.1±12.72µs        ? ?/sec    1.01    312.4±7.45µs        ? ?/sec
sqlparser-rs parsing benchmark/parse_large_statement     1.02      6.3±0.35ms        ? ?/sec    1.00      6.2±0.40ms        ? ?/sec
sqlparser-rs parsing benchmark/parse_sql_large_query     1.00      6.1±0.23ms        ? ?/sec  
sqlparser-rs parsing benchmark/sqlparser::select         1.02  1893.8±82.19ns        ? ?/sec    1.00  1855.1±25.08ns        ? ?/sec
sqlparser-rs parsing benchmark/sqlparser::with_select    1.03     11.7±1.20µs        ? ?/sec    1.00     11.4±0.28µs        ? ?/sec

@blaginin blaginin marked this pull request as ready for review November 15, 2024 19:22
@blaginin

Copy link
Copy Markdown
Member Author

FYI, I marked this ready for review. @peter-toth @Eason0729, if you guys want to take a look 🌻

@iffyio

iffyio commented Nov 16, 2024

Copy link
Copy Markdown
Contributor

@blaginin thanks for looking to fix this!

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale
From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

@Eason0729 Eason0729 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @blaginin. LGTM except for some number in unit test.

I left some comments, and any number above 40000 seems to overflow stack without #[recursive].

group.bench_function("sqlparser::with_select", |b| {
b.iter(|| Parser::parse_sql(&dialect, with_query));
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large_statement, making separated test would make differentiating potential(future) regression easier.
It's a suggestion(fine to leave it as it is).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood your comment, sorry. I added tests for parsing large statements in tests/sqlparser_common.rs. Do you think we should test something else?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thinks it would be better to add as separated test.

Comment thread tests/sqlparser_common.rs
Comment thread src/ast/visitor.rs
@peter-toth

peter-toth commented Nov 16, 2024

Copy link
Copy Markdown

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

@iffyio, we had done some research on stacker / recursive in apache/datafusion#13310 to verify concerns before we added it to datafusion: apache/datafusion#13310 (review) / apache/datafusion#13177 (comment)

@Eason0729

Copy link
Copy Markdown
Contributor

Currently the preference is to avoid a third-party dependency for this issue, ideally fixing up the parser behavior instead to properly handle deeply nested input. See comment here for a bit more context on rationale From a quick look at recursive it seems to be a procmacro on top of the stacker library so that the same considerations should apply I imagine.

@iffyio, we had done some research on stacker / recursive in apache/datafusion#13310 to verify concerns before we added it to datafusion: apache/datafusion#13310 (review) / apache/datafusion#13177 (comment)

Sorry for missing that context when reviewing. So we would like to avoid using recursive for stablility, maybe try...

  1. vender the code from recursive
  2. use stacker directly, like fix: add stacker and maybe_grow on recursion guard #1468

@blaginin

Copy link
Copy Markdown
Member Author

Thanks for the review!! 🙂

Sorry for missing that context when reviewing. So we would like to avoid using recursive for stablility, maybe try...

We ended up using #[recursive] in Datafusion, as Peter highlighted. Feels like it’s good to stay consistent across projects?

@iffyio

iffyio commented Nov 17, 2024

Copy link
Copy Markdown
Contributor

Ah I see, thanks for the context @peter-toth!

cc @alamb for overall thoughts on adding this dependency to sqlparser?

@alamb

alamb commented Nov 24, 2024

Copy link
Copy Markdown
Contributor

Ah I see, thanks for the context @peter-toth!

cc @alamb for overall thoughts on adding this dependency to sqlparser?

While adding new dependencies in general ls 🤮 I don't think there is any viable alternative in this case

We have tried to avoid doing something like stacker for several years with sqlparser and datafusion, but I feel like we have only been able to band aid over the problem, not fix the problem once and for all.

I am hopeful that if we adopt this particular crate we won't have to worry about it again 🤞

@alamb

alamb commented Nov 24, 2024

Copy link
Copy Markdown
Contributor

I think this PR needs a bit more documentation and we shoudl figure out how to rationalize with the existing recursion_limit argument.

https://docs.rs/sqlparser/latest/sqlparser/parser/struct.Parser.html#method.with_recursion_limit

@blaginin

Copy link
Copy Markdown
Member Author

@alamb, I feel like recursion_limit should stay because:

  • recursive doesn't solve the issue when std isn’t enabled.
  • Removing it would be a breaking change.
  • It can be a useful feature even with recursive protection, for example, if lib users have additional constraints on the parsed query.

I added notes in the methods I touched; should be better now ✋

@Eason0729 Eason0729 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Eason0729

Copy link
Copy Markdown
Contributor

It seems like we reached the decision to add recursive instead of using underlying dependency(stacker).

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good -- any other thoughts @iffyio

Thank you @blaginin and @Eason0729 for pushing this along

@blaginin

blaginin commented Dec 2, 2024

Copy link
Copy Markdown
Member Author

Just for transparency, there's apache/datafusion#13513 raised in Datafusion but I believe it shouldn't be the reason not to merge this one (happy to be challenged)

@alamb

alamb commented Dec 2, 2024

Copy link
Copy Markdown
Contributor

Just for transparency, there's apache/datafusion#13513 raised in Datafusion but I believe it shouldn't be the reason not to merge this one (happy to be challenged)

Maybe we could make it an optional dependency 🤔

@blaginin

blaginin commented Dec 2, 2024

Copy link
Copy Markdown
Member Author

nice idea actually! will do

@iffyio iffyio left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread README.md Outdated
blaginin and others added 3 commits December 3, 2024 21:45
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
# Conflicts:
#	tests/sqlparser_common.rs
@blaginin

blaginin commented Dec 6, 2024

Copy link
Copy Markdown
Member Author

resolved conflicts, should be good to merge now 🤗

@alamb

alamb commented Dec 11, 2024

Copy link
Copy Markdown
Contributor

Given the potential for unintended side effects with this change, I think we should merge it in asap after we have released

@Eason0729

Copy link
Copy Markdown
Contributor

Should we merge this? or something is missing in this PR.
I have no write access, so cc @alamb .

# Conflicts:
#	tests/sqlparser_common.rs
@blaginin

Copy link
Copy Markdown
Member Author

hey, based on the previous comment I think we want to make a release first 🙂

@Eason0729

Copy link
Copy Markdown
Contributor

hey, based on the previous comment I think we want to make a release first 🙂

Thanks.
I am sorry for missing that! 👀

@alamb

alamb commented Dec 19, 2024

Copy link
Copy Markdown
Contributor

Sorry -- the reason I haven't previously merged this is exactly to meger it after release to give it enough "bake time" . I am glad we did wait, actually, as using this macro has caused trouble downstream in datafusion

I think we are good to go

Thank you again @blaginin and @Eason0729 for your contributions and patience

@lovasoa

lovasoa commented Feb 18, 2025

Copy link
Copy Markdown
Contributor

Hi @alamb , @iffyio ! Just wanted to report that I just received a crash report that seems to come from here: sqlpage/SQLPage#814

@alamb

alamb commented Feb 18, 2025

Copy link
Copy Markdown
Contributor

Hi @alamb , @iffyio ! Just wanted to report that I just received a crash report that seems to come from here: sqlpage/SQLPage#814

Thaks @lovasoa

Looks like you have also reported a bug to stacker:

I am glad there is a way to disable the stacker dependency in sqlparser

Is there anything else you think we should do here?

Thanks again

@lovasoa

lovasoa commented Feb 18, 2025

Copy link
Copy Markdown
Contributor

Yes, I discovered error handling is completely absent from stacker, and it just crashes the entire program with a cryptic error message when the underlying pthread library returns any error. I submitted a PR, but in the meantime, I wouldn't recommend including it by default in sqlparser, especially if the goal was to avoid crashes in the first place.

@lovasoa

lovasoa commented Feb 18, 2025

Copy link
Copy Markdown
Contributor

SQLPage does not have a huge install base, and it took just a few days before the first crash report.

@alamb

alamb commented Feb 19, 2025

Copy link
Copy Markdown
Contributor

SQLPage does not have a huge install base, and it took just a few days before the first crash report.

DataFusion does have a pretty large user base and we haven't gotten crash reports yet that I know of. I was somewhat worried about using stacker in datafusion too until @peter-toth pointed out that it was used by rustc itself which allayed my concerns.

it does seem like the usecase

glibc targets where procfs isn't mounted at /proc is

Is somewhat uncommon.

I don't really have a strong opinion one way or the other.

@lovasoa

lovasoa commented Feb 19, 2025

Copy link
Copy Markdown
Contributor

Yes, I initially thought that the problem was specific to their very restricted environment, but looking at the code in stacker, they crash the entire program on ANY error returned by any of the pthread functions used. And pthread functions can return an error code in a number of cases.

@alamb

alamb commented Feb 19, 2025

Copy link
Copy Markdown
Contributor

Interesting -- I haven't looked at the code or what functions are used (and thus under what circumstances such errors happen or how likely they are to occur)

@peter-toth

peter-toth commented Feb 19, 2025

Copy link
Copy Markdown

I didn't notice the missing error handling either. Your rust-lang/stacker#116 seems like a nice improvement. But if it doesn't get accepted for some reason, then probably we could handle errors in DF. Maybe adjust the recursive macro to run some tests before using stacker...

@alamb

alamb commented Feb 23, 2025

Copy link
Copy Markdown
Contributor

I didn't notice the missing error handling either. Your rust-lang/stacker#116 seems like a nice improvement. But if it doesn't get accepted for some reason, then probably we could handle errors in DF. Maybe adjust the recursive macro to run some tests before using stacker...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stack overflow in Statement::to_string for deeply nested expresions

6 participants