Обсуждение: broken JIT support on Fedora 40

Поиск
Список
Период
Сортировка

broken JIT support on Fedora 40

От
Pavel Stehule
Дата:
Hi

after today update, the build with --with-llvm produces broken code, and make check fails with crash

    Upgrade    hwdata-0.380-1.fc40.noarch                           @updates-testing
    Upgraded   hwdata-0.379-1.fc40.noarch                           @@System
    Upgrade    llvm-18.1.0~rc4-1.fc40.x86_64                        @updates-testing
    Upgraded   llvm-17.0.6-6.fc40.x86_64                            @@System
    Upgrade    llvm-devel-18.1.0~rc4-1.fc40.x86_64                  @updates-testing
    Upgraded   llvm-devel-17.0.6-6.fc40.x86_64                      @@System
    Upgrade    llvm-googletest-18.1.0~rc4-1.fc40.x86_64             @updates-testing
    Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64                 @@System
    Upgrade    llvm-libs-18.1.0~rc4-1.fc40.i686                     @updates-testing
    Upgraded   llvm-libs-17.0.6-6.fc40.i686                         @@System
    Upgrade    llvm-libs-18.1.0~rc4-1.fc40.x86_64                   @updates-testing
    Upgraded   llvm-libs-17.0.6-6.fc40.x86_64                       @@System
    Upgrade    llvm-static-18.1.0~rc4-1.fc40.x86_64                 @updates-testing
    Upgraded   llvm-static-17.0.6-6.fc40.x86_64                     @@System
    Upgrade    llvm-test-18.1.0~rc4-1.fc40.x86_64                   @updates-testing
    Instalovat llvm17-libs-17.0.6-7.fc40.i686                       @updates-testing
    Instalovat llvm17-libs-17.0.6-7.fc40.x86_64                     @updates-testing

Regards

Pavel

Re: broken JIT support on Fedora 40

От
Robert Haas
Дата:
On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> after today update, the build with --with-llvm produces broken code, and make check fails with crash
>
>     Upgrade    hwdata-0.380-1.fc40.noarch                           @updates-testing
>     Upgraded   hwdata-0.379-1.fc40.noarch                           @@System
>     Upgrade    llvm-18.1.0~rc4-1.fc40.x86_64                        @updates-testing
>     Upgraded   llvm-17.0.6-6.fc40.x86_64                            @@System
>     Upgrade    llvm-devel-18.1.0~rc4-1.fc40.x86_64                  @updates-testing
>     Upgraded   llvm-devel-17.0.6-6.fc40.x86_64                      @@System
>     Upgrade    llvm-googletest-18.1.0~rc4-1.fc40.x86_64             @updates-testing
>     Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64                 @@System
>     Upgrade    llvm-libs-18.1.0~rc4-1.fc40.i686                     @updates-testing
>     Upgraded   llvm-libs-17.0.6-6.fc40.i686                         @@System
>     Upgrade    llvm-libs-18.1.0~rc4-1.fc40.x86_64                   @updates-testing
>     Upgraded   llvm-libs-17.0.6-6.fc40.x86_64                       @@System
>     Upgrade    llvm-static-18.1.0~rc4-1.fc40.x86_64                 @updates-testing
>     Upgraded   llvm-static-17.0.6-6.fc40.x86_64                     @@System
>     Upgrade    llvm-test-18.1.0~rc4-1.fc40.x86_64                   @updates-testing
>     Instalovat llvm17-libs-17.0.6-7.fc40.i686                       @updates-testing
>     Instalovat llvm17-libs-17.0.6-7.fc40.x86_64                     @updates-testing

I don't know anything about LLVM, but somehow I'm guessing that people
who do will need more detail than this in order to be able to fix the
problem. I see that support for LLVM 18 was added in commit
d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
were building from an older commit?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: broken JIT support on Fedora 40

От
Pavel Stehule
Дата:
Hi

čt 14. 3. 2024 v 19:20 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> after today update, the build with --with-llvm produces broken code, and make check fails with crash
>
>     Upgrade    hwdata-0.380-1.fc40.noarch                           @updates-testing
>     Upgraded   hwdata-0.379-1.fc40.noarch                           @@System
>     Upgrade    llvm-18.1.0~rc4-1.fc40.x86_64                        @updates-testing
>     Upgraded   llvm-17.0.6-6.fc40.x86_64                            @@System
>     Upgrade    llvm-devel-18.1.0~rc4-1.fc40.x86_64                  @updates-testing
>     Upgraded   llvm-devel-17.0.6-6.fc40.x86_64                      @@System
>     Upgrade    llvm-googletest-18.1.0~rc4-1.fc40.x86_64             @updates-testing
>     Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64                 @@System
>     Upgrade    llvm-libs-18.1.0~rc4-1.fc40.i686                     @updates-testing
>     Upgraded   llvm-libs-17.0.6-6.fc40.i686                         @@System
>     Upgrade    llvm-libs-18.1.0~rc4-1.fc40.x86_64                   @updates-testing
>     Upgraded   llvm-libs-17.0.6-6.fc40.x86_64                       @@System
>     Upgrade    llvm-static-18.1.0~rc4-1.fc40.x86_64                 @updates-testing
>     Upgraded   llvm-static-17.0.6-6.fc40.x86_64                     @@System
>     Upgrade    llvm-test-18.1.0~rc4-1.fc40.x86_64                   @updates-testing
>     Instalovat llvm17-libs-17.0.6-7.fc40.i686                       @updates-testing
>     Instalovat llvm17-libs-17.0.6-7.fc40.x86_64                     @updates-testing

I don't know anything about LLVM, but somehow I'm guessing that people
who do will need more detail than this in order to be able to fix the
problem. I see that support for LLVM 18 was added in commit
d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
were building from an older commit?

I repeated build and check today, fresh master, today fedora update with same result

build is ok, but regress tests fails with crash (it works without -with-llvm)

Regards

Pavel




 

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: broken JIT support on Fedora 40

От
Daniel Gustafsson
Дата:
> On 14 Mar 2024, at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> build is ok, but regress tests fails with crash (it works without -with-llvm)

Can you post some details around this crash?  It doesn't seem to be a
combination we have covered in the buildfarm.

--
Daniel Gustafsson




Re: broken JIT support on Fedora 40

От
Thomas Munro
Дата:
On Fri, Mar 15, 2024 at 12:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 14 Mar 2024, at 20:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> > build is ok, but regress tests fails with crash (it works without -with-llvm)
>
> Can you post some details around this crash?  It doesn't seem to be a
> combination we have covered in the buildfarm.

Yeah, 18.1 (note they switched to 1-based minor numbers, there was no
18.0) just came out a week or so ago.  Despite testing their 18 branch
just before their "RC1" tag, as recently as

commit d282e88e50521a457fa1b36e55f43bac02a3167f
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Thu Jan 25 10:37:35 2024 +1300

    Track LLVM 18 changes.

at which point everything worked, it seems that something changed
before they released.  I haven't looked into why yet but it's crashing
on my FreeBSD box too.



Re: broken JIT support on Fedora 40

От
Thomas Munro
Дата:
For me it seems that the LLVMRunPasses() call, new in

commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Wed Oct 18 22:15:54 2023 +1300

    jit: Changes for LLVM 17.

is reaching code that segfaults inside libLLVM, specifically in
llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
llvm::AAResults*, bool, llvm::Function*).  First obvious question
would be: is that NULL argument still acceptable?  Perhaps it wants
our LLVMTargetMachineRef there:

       err = LLVMRunPasses(module, passes, NULL, options);

But then when we see what is does with that argument, it arrives at a
place that apparently accepts nullptr.


https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56

https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124

Hrmph.  Might need an assertion build to learn more.  I'll try to look
again next week or so.



Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote:
> For me it seems that the LLVMRunPasses() call, new in
>
> commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
> Author: Thomas Munro <tmunro@postgresql.org>
> Date:   Wed Oct 18 22:15:54 2023 +1300
>
>     jit: Changes for LLVM 17.
>
> is reaching code that segfaults inside libLLVM, specifically in
> llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
> llvm::AAResults*, bool, llvm::Function*).  First obvious question
> would be: is that NULL argument still acceptable?  Perhaps it wants
> our LLVMTargetMachineRef there:
>
>        err = LLVMRunPasses(module, passes, NULL, options);
>
> But then when we see what is does with that argument, it arrives at a
> place that apparently accepts nullptr.
>
>
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
>
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124
>
> Hrmph.  Might need an assertion build to learn more.  I'll try to look
> again next week or so.

Looks like I can reproduce this as well, libLLVM crashes when reaching
AddReturnAttributes inside InlineFunction, when trying to access
operands of the return instruction. I think, for whatever reason, the
latest LLVM doesn't like (i.e. do not expect this when performing
inlining pass) return instructions that do not have a return value, and
this is what happens in the outblock of deform function we generate
(slot_compile_deform).

For verification, I've modified the deform.outblock to call LLVMBuildRet
instead of LLVMBuildRetVoid and this seems to help -- inline and deform
stages are still performed as before, but nothing crashes. But of course
it doesn't sound right that inlining pass cannot process such code.
Unfortunately I don't see any obvious change in the recent LLVM history
that would justify this outcome, might be a genuine bug, will
investigate further.



Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Sun, Mar 17, 2024 at 09:02:08PM +0100, Dmitry Dolgov wrote:
> > On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote:
> > For me it seems that the LLVMRunPasses() call, new in
> >
> > commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
> > Author: Thomas Munro <tmunro@postgresql.org>
> > Date:   Wed Oct 18 22:15:54 2023 +1300
> >
> >     jit: Changes for LLVM 17.
> >
> > is reaching code that segfaults inside libLLVM, specifically in
> > llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
> > llvm::AAResults*, bool, llvm::Function*).  First obvious question
> > would be: is that NULL argument still acceptable?  Perhaps it wants
> > our LLVMTargetMachineRef there:
> >
> >        err = LLVMRunPasses(module, passes, NULL, options);
> >
> > But then when we see what is does with that argument, it arrives at a
> > place that apparently accepts nullptr.
> >
> >
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
> >
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124
> >
> > Hrmph.  Might need an assertion build to learn more.  I'll try to look
> > again next week or so.
>
> Looks like I can reproduce this as well, libLLVM crashes when reaching
> AddReturnAttributes inside InlineFunction, when trying to access
> operands of the return instruction. I think, for whatever reason, the
> latest LLVM doesn't like (i.e. do not expect this when performing
> inlining pass) return instructions that do not have a return value, and
> this is what happens in the outblock of deform function we generate
> (slot_compile_deform).
>
> For verification, I've modified the deform.outblock to call LLVMBuildRet
> instead of LLVMBuildRetVoid and this seems to help -- inline and deform
> stages are still performed as before, but nothing crashes. But of course
> it doesn't sound right that inlining pass cannot process such code.
> Unfortunately I don't see any obvious change in the recent LLVM history
> that would justify this outcome, might be a genuine bug, will
> investigate further.

I think I found the change that got it all started [1], the commit has a
set of tags like 18.1.0-rc1 and is relatively recent. The message
doesn't say anything related to the crash that we see, so I assume it's
indeed a bug. I've opened an issue to confirm this understanding [2]
(wow, issues were indeed moved to github since the last time I was
touching LLVM), let's see what would be the response.

[1]: https://github.com/llvm/llvm-project/commit/2da4960f20f7e5d88a68ce25636a895284dc66d8
[2]: https://github.com/llvm/llvm-project/issues/86162



Re: broken JIT support on Fedora 40

От
Thomas Munro
Дата:
On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > For verification, I've modified the deform.outblock to call LLVMBuildRet
> > instead of LLVMBuildRetVoid and this seems to help -- inline and deform
> > stages are still performed as before, but nothing crashes. But of course
> > it doesn't sound right that inlining pass cannot process such code.

Thanks for investigating and filing the issue.  It doesn't seem to be
moving yet.  Do you want to share the LLVMBuildRet() workaround?
Maybe we need to consider shipping something like that in the
meantime?



Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Sat, Mar 30, 2024 at 04:38:11PM +1300, Thomas Munro wrote:
> On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > > For verification, I've modified the deform.outblock to call LLVMBuildRet
> > > instead of LLVMBuildRetVoid and this seems to help -- inline and deform
> > > stages are still performed as before, but nothing crashes. But of course
> > > it doesn't sound right that inlining pass cannot process such code.
>
> Thanks for investigating and filing the issue.  It doesn't seem to be
> moving yet.  Do you want to share the LLVMBuildRet() workaround?
> Maybe we need to consider shipping something like that in the
> meantime?

Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday
I've opened a one-line PR fix that should address the issue, maybe this
would help. In the meantime I've attached what did work for me as a
workaround -- it essentially just makes the deform function to return
some value. It's ugly, but since call site will ignore that, and it's
only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable.
Give me a moment, I'm going to test this change more (waiting on
rebuilding LLVM, it takes quire a while on my machine :( ), then can
confirm that it works as expected on the latest version.

Вложения

Re: broken JIT support on Fedora 40

От
Thomas Munro
Дата:
On Sun, Mar 31, 2024 at 5:59 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday
> I've opened a one-line PR fix that should address the issue, maybe this
> would help. In the meantime I've attached what did work for me as a
> workaround -- it essentially just makes the deform function to return
> some value. It's ugly, but since call site will ignore that, and it's
> only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable.
> Give me a moment, I'm going to test this change more (waiting on
> rebuilding LLVM, it takes quire a while on my machine :( ), then can
> confirm that it works as expected on the latest version.

Great news.  I see your PR:

https://github.com/llvm/llvm-project/pull/87093

Checking their release schedule, they have:

Mar  5th: 18.1.0 was released
Mar  8th: 18.1.1 was released
Mar 19th: 18.1.2 was released
Apr 16th: 18.1.3
Apr 30th: 18.1.4
May 14th: 18.1.5
May 28th: 18.1.6 (if necessary)

Our next release is May 9th.  So assuming your PR goes in in the next
couple of weeks and makes it into their 18.1.3 or even .4, there is no
point in pushing a work-around on our side.



Re: broken JIT support on Fedora 40

От
Thomas Munro
Дата:
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see...  It is a bit strange that
we copy attributes from AttributeTemplate(), a function that returns
Datum, to our void deform function.  It works (I mean doesn't crash)
if you just comment this line out:

    llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function?  So
perhaps we just need to teach that thing not to try to copy the return
value's attributes, which also seems to work here:

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index ec0fdd49324..92b4993a98a 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from,
LLVMValueRef v_to)
        /* copy function attributes */
        llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex);

-       /* and the return value attributes */
-       llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+       if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) !=
LLVMVoidTypeKind)
+       {
+               /* and the return value attributes */
+               llvm_copy_attributes_at_index(v_from, v_to,
LLVMAttributeReturnIndex);
+       }



Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > https://github.com/llvm/llvm-project/pull/87093
>
> Oh, with those clues, I think I might see...  It is a bit strange that
> we copy attributes from AttributeTemplate(), a function that returns
> Datum, to our void deform function.  It works (I mean doesn't crash)
> if you just comment this line out:
>
>     llvm_copy_attributes(AttributeTemplate, v_deform_fn);
>
> ... but I guess that disables inlining of the deform function?  So
> perhaps we just need to teach that thing not to try to copy the return
> value's attributes, which also seems to work here:

Yep, I think this is it. I've spent some hours trying to understand why
suddenly deform function has noundef ret attribute, when it shouldn't --
this explains it and the proposed change fixes the crash. One thing that
is still not clear to me though is why this copied attribute doesn't
show up in the bitcode dumped right before running inline pass (I've
added this to troubleshoot the issue).



Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:
> > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > https://github.com/llvm/llvm-project/pull/87093
> >
> > Oh, with those clues, I think I might see...  It is a bit strange that
> > we copy attributes from AttributeTemplate(), a function that returns
> > Datum, to our void deform function.  It works (I mean doesn't crash)
> > if you just comment this line out:
> >
> >     llvm_copy_attributes(AttributeTemplate, v_deform_fn);
> >
> > ... but I guess that disables inlining of the deform function?  So
> > perhaps we just need to teach that thing not to try to copy the return
> > value's attributes, which also seems to work here:
>
> Yep, I think this is it. I've spent some hours trying to understand why
> suddenly deform function has noundef ret attribute, when it shouldn't --
> this explains it and the proposed change fixes the crash. One thing that
> is still not clear to me though is why this copied attribute doesn't
> show up in the bitcode dumped right before running inline pass (I've
> added this to troubleshoot the issue).

One thing to consider in this context is indeed adding "verify" pass as
suggested in the PR, at least for the debugging configuration. Without the fix
it immediately returns:

    Running analysis: VerifierAnalysis on deform_0_1
    Attribute 'noundef' applied to incompatible type!

    llvm error: Broken function found, compilation aborted!



Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Fri, Apr 05, 2024 at 03:50:50PM +0200, Dmitry Dolgov wrote:
> > On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:
> > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > > https://github.com/llvm/llvm-project/pull/87093
> > >
> > > Oh, with those clues, I think I might see...  It is a bit strange that
> > > we copy attributes from AttributeTemplate(), a function that returns
> > > Datum, to our void deform function.  It works (I mean doesn't crash)
> > > if you just comment this line out:
> > >
> > >     llvm_copy_attributes(AttributeTemplate, v_deform_fn);
> > >
> > > ... but I guess that disables inlining of the deform function?  So
> > > perhaps we just need to teach that thing not to try to copy the return
> > > value's attributes, which also seems to work here:
> >
> > Yep, I think this is it. I've spent some hours trying to understand why
> > suddenly deform function has noundef ret attribute, when it shouldn't --
> > this explains it and the proposed change fixes the crash. One thing that
> > is still not clear to me though is why this copied attribute doesn't
> > show up in the bitcode dumped right before running inline pass (I've
> > added this to troubleshoot the issue).
>
> One thing to consider in this context is indeed adding "verify" pass as
> suggested in the PR, at least for the debugging configuration. Without the fix
> it immediately returns:
>
>     Running analysis: VerifierAnalysis on deform_0_1
>     Attribute 'noundef' applied to incompatible type!
>
>     llvm error: Broken function found, compilation aborted!

Here is what I have in mind. Interestingly enough, it also shows few
more errors besides "noundef":

    Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
    ptr @llvm.lifetime.end.p0i8

It refers to the function from create_LifetimeEnd, not sure how
important is this.

Вложения

Re: broken JIT support on Fedora 40

От
Thomas Munro
Дата:
On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > > Yep, I think this is it. I've spent some hours trying to understand why
> > > suddenly deform function has noundef ret attribute, when it shouldn't --
> > > this explains it and the proposed change fixes the crash. One thing that
> > > is still not clear to me though is why this copied attribute doesn't
> > > show up in the bitcode dumped right before running inline pass (I've
> > > added this to troubleshoot the issue).
> >
> > One thing to consider in this context is indeed adding "verify" pass as
> > suggested in the PR, at least for the debugging configuration. Without the fix
> > it immediately returns:
> >
> >       Running analysis: VerifierAnalysis on deform_0_1
> >       Attribute 'noundef' applied to incompatible type!
> >
> >       llvm error: Broken function found, compilation aborted!
>
> Here is what I have in mind. Interestingly enough, it also shows few
> more errors besides "noundef":
>
>     Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
>     ptr @llvm.lifetime.end.p0i8
>
> It refers to the function from create_LifetimeEnd, not sure how
> important is this.

Would it be too slow to run the verify pass always, in assertion
builds?  Here's a patch for the original issue, and a patch to try
that idea + a fix for that other complaint it spits out.  The latter
would only run for LLVM 17+, but that seems OK.

Вложения

Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Tue, Apr 09, 2024 at 07:07:58PM +1200, Thomas Munro wrote:
> On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > > > Yep, I think this is it. I've spent some hours trying to understand why
> > > > suddenly deform function has noundef ret attribute, when it shouldn't --
> > > > this explains it and the proposed change fixes the crash. One thing that
> > > > is still not clear to me though is why this copied attribute doesn't
> > > > show up in the bitcode dumped right before running inline pass (I've
> > > > added this to troubleshoot the issue).
> > >
> > > One thing to consider in this context is indeed adding "verify" pass as
> > > suggested in the PR, at least for the debugging configuration. Without the fix
> > > it immediately returns:
> > >
> > >       Running analysis: VerifierAnalysis on deform_0_1
> > >       Attribute 'noundef' applied to incompatible type!
> > >
> > >       llvm error: Broken function found, compilation aborted!
> >
> > Here is what I have in mind. Interestingly enough, it also shows few
> > more errors besides "noundef":
> >
> >     Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0
> >     ptr @llvm.lifetime.end.p0i8
> >
> > It refers to the function from create_LifetimeEnd, not sure how
> > important is this.
>
> Would it be too slow to run the verify pass always, in assertion
> builds?  Here's a patch for the original issue, and a patch to try
> that idea + a fix for that other complaint it spits out.  The latter
> would only run for LLVM 17+, but that seems OK.

Sounds like a good idea. About the overhead, I've done a quick test on the
reproducer at hands, doing explain analyze in a tight loop and fetching
"optimization" timinigs. It gives quite visible difference 96ms p99 with verify
vs 46ms p99 without verify (and a rather low stddev, ~1.5ms). But I can
imagine it's acceptable for a build with assertions.

Btw, I've found there is a C-api for this exposed, which produces the same
warnings for me. Maybe it would be even better this way:

    /**
     * Toggle adding the VerifierPass for the PassBuilder, ensuring all functions
     * inside the module is valid.
     */
    void LLVMPassBuilderOptionsSetVerifyEach(LLVMPassBuilderOptionsRef Options,
                                             LLVMBool VerifyEach);


    +       /* In assertion builds, run the LLVM verify pass. */
    +#ifdef USE_ASSERT_CHECKING
    +       LLVMPassBuilderOptionsSetVerifyEach(options, true);
    +#endif




Re: broken JIT support on Fedora 40

От
Thomas Munro
Дата:
On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>         +       /* In assertion builds, run the LLVM verify pass. */
>         +#ifdef USE_ASSERT_CHECKING
>         +       LLVMPassBuilderOptionsSetVerifyEach(options, true);
>         +#endif

Thanks, that seems nicer.  I think the question is whether it will
slow down build farm/CI/local meson test runs to a degree that exceeds
its value.  Another option would be to have some other opt-in macro,
like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
JIT-related stuff to turn on.

Supposing we go with USE_ASSERT_CHECKING, I have another question:

-       const char *nm = "llvm.lifetime.end.p0i8";
+       const char *nm = "llvm.lifetime.end.p0";

Was that a mistake, or did the mangling rules change in some version?
I don't currently feel inclined to go and test this on the ancient
versions we claim to support in back-branches.  Perhaps we should just
do this in master, and then it'd be limited to worrying about LLVM
versions 10-18 (see 820b5af7), which have the distinct advantage of
being available in package repositories for testing.  Or I suppose we
could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10.  Or we
could do it unconditionally, and wait for ancient-LLVM build farm
animals to break if they're going to.

I pushed the illegal attribute fix though.  Thanks for the detective work!

(It crossed my mind that perhaps deform functions should have their
own template function, but if someone figures out that that's a good
idea, I think we'll *still* need that change just pushed.)



Re: broken JIT support on Fedora 40

От
Pavel Stehule
Дата:


st 10. 4. 2024 v 2:44 odesílatel Thomas Munro <thomas.munro@gmail.com> napsal:
On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>         +       /* In assertion builds, run the LLVM verify pass. */
>         +#ifdef USE_ASSERT_CHECKING
>         +       LLVMPassBuilderOptionsSetVerifyEach(options, true);
>         +#endif

Thanks, that seems nicer.  I think the question is whether it will
slow down build farm/CI/local meson test runs to a degree that exceeds
its value.  Another option would be to have some other opt-in macro,
like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
JIT-related stuff to turn on.

Supposing we go with USE_ASSERT_CHECKING, I have another question:

-       const char *nm = "llvm.lifetime.end.p0i8";
+       const char *nm = "llvm.lifetime.end.p0";

Was that a mistake, or did the mangling rules change in some version?
I don't currently feel inclined to go and test this on the ancient
versions we claim to support in back-branches.  Perhaps we should just
do this in master, and then it'd be limited to worrying about LLVM
versions 10-18 (see 820b5af7), which have the distinct advantage of
being available in package repositories for testing.  Or I suppose we
could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10.  Or we
could do it unconditionally, and wait for ancient-LLVM build farm
animals to break if they're going to.

I pushed the illegal attribute fix though.  Thanks for the detective work!

(It crossed my mind that perhaps deform functions should have their
own template function, but if someone figures out that that's a good
idea, I think we'll *still* need that change just pushed.)

all tests passed on fc 40 without problems

Thank you

Pavel

Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Wed, Apr 10, 2024 at 12:43:23PM +1200, Thomas Munro wrote:
> On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >         +       /* In assertion builds, run the LLVM verify pass. */
> >         +#ifdef USE_ASSERT_CHECKING
> >         +       LLVMPassBuilderOptionsSetVerifyEach(options, true);
> >         +#endif
>
> Thanks, that seems nicer.  I think the question is whether it will
> slow down build farm/CI/local meson test runs to a degree that exceeds
> its value.  Another option would be to have some other opt-in macro,
> like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
> JIT-related stuff to turn on.

Oh, I see. I'm afraid I don't have enough knowledge of the CI pipeline,
but at least locally for me installcheck became only few percent slower
with the verify pass.

> Supposing we go with USE_ASSERT_CHECKING, I have another question:
>
> -       const char *nm = "llvm.lifetime.end.p0i8";
> +       const char *nm = "llvm.lifetime.end.p0";
>
> Was that a mistake, or did the mangling rules change in some version?

I'm not sure, but inclined to think it's the same as with noundef -- a
mistake, which was revealed in some recent version of LLVM. From what I
understand the suffix i8 indicates an overloaded argument of that type,
which is probably not needed. At the same time I can't get this error
from the verify pass with LLVM-12 or LLVM-15 (I have those at hand by
accident). To make it even more confusing I've found a few similar
examples in other projects, where this was really triggered by an issue
in LLVM [1].

[1]: https://github.com/rust-lang/rust/issues/102738



Re: broken JIT support on Fedora 40

От
Andres Freund
Дата:
Hi,

On 2024-04-10 22:15:27 +0200, Dmitry Dolgov wrote:
> > On Wed, Apr 10, 2024 at 12:43:23PM +1200, Thomas Munro wrote:
> > On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > >         +       /* In assertion builds, run the LLVM verify pass. */
> > >         +#ifdef USE_ASSERT_CHECKING
> > >         +       LLVMPassBuilderOptionsSetVerifyEach(options, true);
> > >         +#endif
> >
> > Thanks, that seems nicer.  I think the question is whether it will
> > slow down build farm/CI/local meson test runs to a degree that exceeds
> > its value.  Another option would be to have some other opt-in macro,
> > like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
> > JIT-related stuff to turn on.
> 
> Oh, I see. I'm afraid I don't have enough knowledge of the CI pipeline,
> but at least locally for me installcheck became only few percent slower
> with the verify pass.

I think it's worthwhile to add. It makes some problems so much easier to
find. And if you're building with debug enabled llvm, performance is so
atrocious anyway, that this isn't going to change the big picture...


> > Supposing we go with USE_ASSERT_CHECKING, I have another question:
> >
> > -       const char *nm = "llvm.lifetime.end.p0i8";
> > +       const char *nm = "llvm.lifetime.end.p0";
> >
> > Was that a mistake, or did the mangling rules change in some version?
> 
> I'm not sure, but inclined to think it's the same as with noundef -- a
> mistake, which was revealed in some recent version of LLVM. From what I
> understand the suffix i8 indicates an overloaded argument of that type,
> which is probably not needed. At the same time I can't get this error
> from the verify pass with LLVM-12 or LLVM-15 (I have those at hand by
> accident). To make it even more confusing I've found a few similar
> examples in other projects, where this was really triggered by an issue
> in LLVM [1].

I'm afraid that it actually has changed over time, I'm fairly sure that it was
required at some point.

Greetings,

Andres Freund



Re: broken JIT support on Fedora 40

От
Robert Haas
Дата:
On Tue, Apr 9, 2024 at 8:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I pushed the illegal attribute fix though.  Thanks for the detective work!

This was commit 53c8d6c9f157f2bc8211b8de02417e55fefddbc7 and as I
understand it that fixed the issue originally reported on this thread.

Therefore, I have marked https://commitfest.postgresql.org/48/4917/ as
Committed.

If that's not correct, please feel free to fix. If there are other
issues that need to be patched separately, please consider opening a
new CF entry for those issues once a patch is available.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: broken JIT support on Fedora 40

От
Dmitry Dolgov
Дата:
> On Tue, May 14, 2024 at 11:09:39AM -0400, Robert Haas wrote:
> On Tue, Apr 9, 2024 at 8:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I pushed the illegal attribute fix though.  Thanks for the detective work!
>
> This was commit 53c8d6c9f157f2bc8211b8de02417e55fefddbc7 and as I
> understand it that fixed the issue originally reported on this thread.
>
> Therefore, I have marked https://commitfest.postgresql.org/48/4917/ as
> Committed.
>
> If that's not correct, please feel free to fix. If there are other
> issues that need to be patched separately, please consider opening a
> new CF entry for those issues once a patch is available.

Thanks, that's correct. I think the only thing left is to add a verifier
pass, which everybody seems to be agreed is nice to have. The plan is to
add it only to master without back-patching. I assume Thomas did not
have time for that yet, so I've added the latest suggestions into his
patch, and going to open a CF item to not forget about it.

Вложения