MDEV-21248: Prevent optimizing out buf argument in check_stack_overrun.#1424
Conversation
|
Hi @marxin, unfortunately what you suggest is a scalability bottleneck. Have you managed to find out why this parameter was needed and why it helps in your case? |
Well, the argument is passed to the function You can also take at a similar issue I saw recently with GCC 10: |
|
Please correct me if I'm wrong, if a stack variable is optimised away, it means less stack is used and |
Yes, I suspect that in the problematic test-case |
|
Feels like |
|
#pragma GCC push_options If you would not want to optimize this function, then you can tell that directly, rather than introducing magic volatiles. |
Note that the compiler is free to remove such a stack variable if it's unused. One example: static ha_rows get_quick_record_count(THD *thd, SQL_SELECT *select,
TABLE *table,
const key_map *keys,ha_rows limit)
{
int error;
DBUG_ENTER("get_quick_record_count");
uchar buff[STACK_BUFF_ALLOC];
if (unlikely(check_stack_overrun(thd, STACK_MIN_SIZE, buff)))
DBUG_RETURN(0); // Fatal error flag is set
if (select)
{
select->head=table;
table->reginfo.impossible_range=0;
if (likely((error=
select->test_quick_select(thd, *(key_map *)keys,
(table_map) 0,
limit, 0, FALSE,
TRUE, /* remove_where_parts*/
FALSE)) ==
1))
DBUG_RETURN(select->quick->records);
if (unlikely(error == -1))
{
table->reginfo.impossible_range=1;
DBUG_RETURN(0);
}
DBUG_PRINT("warning",("Couldn't use record count on const keypart"));
}
DBUG_RETURN(HA_POS_ERROR); /* This shouldn't happend */
}Here
Yes, this will likely work here. |
|
You can play with the following example: char *global;
static int foo(char *ptr)
{
// this does not work
__asm__ __volatile__ ("":::"memory");
static volatile char *x;
x = ptr;
// this works:
// global = ptr;
return 0;
}
int main()
{
char v[100];
return foo(v);
}So one can either assign the address to a global variable, or use |
|
How a stack variable is unused in your example if it is used directly after you declare it. uchar buff[STACK_BUFF_ALLOC]; How a compiler would remove that? You mean the compiler , for which the obscure attribute(unused) was written? |
|
@vvaintroub in this particular case it may be considered unused indeed. I just wonder why did we need that variable at all? |
The attribute only drives warning emission, it will not prevent optimizations to remove the variables: |
It's just a placeholder that helps you to detect that you are close stack overflow. Otherwise, you'll make a call of another function (or call a |
|
do you actually know what LTO has done to this function? Do you have a dissassemly example of that? |
Sure, I can provide an assembly. Do you which function should catch the stack overflow in the mentioned test-case? |
|
before we jump into that, can you share what happens if you deoptimize the function in a documented way like #1424 (comment) Thank you very much. |
Sorry for a long wait for the reply. Yes, adding
|
When using LTO, one can see optimization of stack variables that are passed to check_stack_overrun as argument buf. That prevents proper stack overrun detection.
eeca2b7 to
c1584b7
Compare
|
May I please ping this? |
|
That's unfortunate. Could you share your build parameters either here or in jira task? |
Sure. I basically add |
|
@marxin, attached log shows successful build with gcc-9. It didn't fail on my side either. How do I properly install gcc10 with tumbleweed? |
The TW does not have gcc10 as system compiler. You can however use open build service and build the following package: It uses GCC 10. |
|
We do believe that |
|
You are right, the |
|
@svoj it is much better to deoptimize GCC by introducing compiler specific pragma , than to make everyone wonder why the heck the unused variable is doing there. There are too many ways to check for buffer overrun already. my_thread_var->stack_ends_here is used in mysys/lf_alloc-pin.c maybe it helps here too. @marxin, If LTO is producing a buggy code, why do you use it? Or rather, why don't you bug gcc maintainers, but us? |
Strong words here ;) Well, to be honest, an optimizing compiler is free to remove any unused variable. In this particular case, the We use it because it brings a lot of optimization opportunities (mostly cross-module inlining) and you end up with faster and smaller binaries. We believe it's what users want. If you believe it's a GCC issue, feel free to open an issue with a reduced test-case and we can discuss that. |
|
The question is why do you believe it is our issue? I mean, where, exactly, is the bug in the code? Users want something that works, and only then cross-module inlining and the goodies link time optimization brings. I think -O2 would suffice most of the time. Add -g and -fno-omit-frame-pointer for debuggability in production, and for the good stack traces, and that would be fine, with me. It is nothing bad to do some LTO/PGO research, but in the end, what's the point in whooping 5% improvement on a sysbench benchmark, if generated code can't pass the test suite? |
Well, you basically rely on the following:
Well, users want ideally both.
We're on the cutting edge. People want to have correct programs that run maximally fast. LTO is a technique that exposes much more defects in a user code base and we help people to explain the violations. And yes, sometimes LTO causes a miscompilation and we're eager to fix all these issues. |
|
Ok, the "buff" and gcc unused are both obscure. Can you please expand then on #1424 (comment) . The buff parameter is redundant, and there should not be any buffs of size 352 (STACK_BUFF_ALLOC) to check the stack size. Instead , the size can be added to STACK_MIN_SIZE, right, this is what @svoj did in 66cb908 . You said it did not help, and why it did not? |
|
With some luck (and proper tuning of the Let's imagine So that explains my current patch in this pull request: it adds an optimization barrier to |
|
FWIW I managed to deploy opensuse with gcc10. My patches didn't work indeed. Even worse they break this fix. OTOH this fix alone works. I do believe that MariaDB is guilty for this bug: it is relying on non-standard behaviour, specifically stack layout (or is it defined anywhere?) If I fail to find out what's going on during the next week, I will vote to apply this PR. My current guess is: either stack got optimised and became smaller or |
|
It came to be that the buffer is needed to reserve stack space for recursive We thought about alternatives like adding new Another alternatives were: or But I guess very smart compiler may optimise these as well. In other words we couldn't come up with anything better and your patch is approved. |
When using LTO, one can see optimization of stack variables that are passed to check_stack_overrun as argument buf. That prevents proper stack overrun detection. Closes #1424
|
@marxin thanks for contribution! |
When using LTO, one can see optimization of stack variables that
are passed to check_stack_overrun as argument buf. That prevents
proper stack overrun detection.