Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19508

Issue with: #define SI_KERNEL in include/my_pthread.h

Details

    Description

      Compilation error:

      mariadb-10.4.3/sql/mysqld.cc:3272:36: error: 'SI_KERNEL' was not declared in this scope
             if (!abort_loop && origin != SI_KERNEL)
      

      Attachments

        Issue Links

          Activity

            anel Anel Husakovic created issue -
            anel Anel Husakovic made changes -
            Field Original Value New Value
            anel Anel Husakovic made changes -
            Description Compilation error:
            mariadb-10.4.3/sql/mysqld.cc:3272:36: error: 'SI_KERNEL' was not declared in this scope
                   if (!abort_loop && origin != SI_KERNEL)

            Compilation error:
            {noformat}
            mariadb-10.4.3/sql/mysqld.cc:3272:36: error: 'SI_KERNEL' was not declared in this scope
                   if (!abort_loop && origin != SI_KERNEL)
            {noformat}
            anel Anel Husakovic made changes -
            EGuesnet Etienne Guesnet added a comment - - edited

            I confirm this bug on AIX 7.2 Power 8, with MariaDB 10.4.6.

            SI_KERNEL is defined in include/my_pthread.h, line 866,

            #ifdef HAVE_SIGWAITINFO
            [...]
            #else
            #define SI_KERNEL 128
            

            I find in CMakeCache.txt that we have HAVE_SIGWAITINFO to 1.

            This bug can probably be found in all plateform with HAVE_SIGWAITINFO.

            MariaDB may define SI_KERNEL in all case, or ensure the code using it (sql/mysqld.cc, line 3292) is not compiled if HAVE_SIGWAITINFO is 1.

            EGuesnet Etienne Guesnet added a comment - - edited I confirm this bug on AIX 7.2 Power 8, with MariaDB 10.4.6. SI_KERNEL is defined in include/my_pthread.h, line 866, #ifdef HAVE_SIGWAITINFO [...] #else #define SI_KERNEL 128 I find in CMakeCache.txt that we have HAVE_SIGWAITINFO to 1. This bug can probably be found in all plateform with HAVE_SIGWAITINFO. MariaDB may define SI_KERNEL in all case, or ensure the code using it (sql/mysqld.cc, line 3292) is not compiled if HAVE_SIGWAITINFO is 1.

            Hi EGuesnet,
            can you please create patch and raise an Pull request, so your patch could be visible to other developers ?
            Thanks.

            anel Anel Husakovic added a comment - Hi EGuesnet , can you please create patch and raise an Pull request, so your patch could be visible to other developers ? Thanks.

            Done [here](https://github.com/MariaDB/server/pull/1357). Sorry for the delay. I hope it is correct, I do not use Github frequently.

            EGuesnet Etienne Guesnet added a comment - Done [here] ( https://github.com/MariaDB/server/pull/1357 ). Sorry for the delay. I hope it is correct, I do not use Github frequently.
            svoj Sergey Vojtovich made changes -
            Assignee Sergei Golubchik [ serg ]

            I wonder why original fix didn't do something like setsid() instead?

            svoj Sergey Vojtovich added a comment - I wonder why original fix didn't do something like setsid() instead?
            EGuesnet Etienne Guesnet made changes -
            serg Sergei Golubchik made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Status In Progress [ 3 ] In Review [ 10002 ]

            EGuesnet, I see that AIX has sigwaitinfo() and siginfo_t. What are possible values of siginfo_t.si_code ?

            serg Sergei Golubchik added a comment - EGuesnet , I see that AIX has sigwaitinfo() and siginfo_t . What are possible values of siginfo_t.si_code ?

            It is an integer from 0 to 74, or -3. What exactly do you want?

            EGuesnet Etienne Guesnet added a comment - It is an integer from 0 to 74, or -3. What exactly do you want?

            In Linux it's something like

            enum
            {
              SI_ASYNCNL = -60,		/* Sent by asynch name lookup completion.  */
              SI_DETHREAD = -7,		/* Sent by execve killing subsidiary threads.  */
              SI_TKILL,			/* Sent by tkill.  */
              SI_SIGIO,			/* Sent by queued SIGIO. */
              SI_ASYNCIO,			/* Sent by AIO completion.  */
              SI_MESGQ,			/* Sent by real time mesq state change.  */
              SI_TIMER,			/* Sent by timer expiration.  */
              SI_QUEUE,			/* Sent by sigqueue.  */
              SI_USER,			/* Sent by kill, sigsend.  */
              SI_KERNEL = 0x80		/* Send by kernel.  */
            };
            

            and the code distinguishes SIGHUP sent by the user (kill -HUP) from the SIGHUP sent by the kernel (killing the conrolling terminal). Does the same logic work on AIX? May be with a different si_code value?

            serg Sergei Golubchik added a comment - In Linux it's something like enum { SI_ASYNCNL = -60, /* Sent by asynch name lookup completion. */ SI_DETHREAD = -7, /* Sent by execve killing subsidiary threads. */ SI_TKILL, /* Sent by tkill. */ SI_SIGIO, /* Sent by queued SIGIO. */ SI_ASYNCIO, /* Sent by AIO completion. */ SI_MESGQ, /* Sent by real time mesq state change. */ SI_TIMER, /* Sent by timer expiration. */ SI_QUEUE, /* Sent by sigqueue. */ SI_USER, /* Sent by kill, sigsend. */ SI_KERNEL = 0x80 /* Send by kernel. */ }; and the code distinguishes SIGHUP sent by the user (kill -HUP) from the SIGHUP sent by the kernel (killing the conrolling terminal). Does the same logic work on AIX? May be with a different si_code value?

            The code are

            /*
             * the following are the code macros that are stored in the si_code
             * element of the siginfo_t struct, on a signal-specific basis.
             */
             
            #define SI_USER         0       /* signal sent by another process with kill */
            #define SI_UNDEFINED    8       /* siginfo_t contains partial information   */
            #define SI_EMPTY        9       /* siginfo_t contains no useful information */
            #define SI_QUEUE        71
            #define SI_TIMER        72
            #define SI_ASYNCIO      73
            #define SI_MESGQ        74
            

            As I know, there are no differences user and kernel SIGHUP.

            EGuesnet Etienne Guesnet added a comment - The code are /* * the following are the code macros that are stored in the si_code * element of the siginfo_t struct, on a signal-specific basis. */   #define SI_USER 0 /* signal sent by another process with kill */ #define SI_UNDEFINED 8 /* siginfo_t contains partial information */ #define SI_EMPTY 9 /* siginfo_t contains no useful information */ #define SI_QUEUE 71 #define SI_TIMER 72 #define SI_ASYNCIO 73 #define SI_MESGQ 74 As I know, there are no differences user and kernel SIGHUP.
            serg Sergei Golubchik added a comment - - edited

            Okay, then, indeed, perhaps we'd better use setsid() instead.

            anel, to clarify the above, I mean:

            • revert my commit with SI_KERNEL
            • push into buildbot to verify that installation on Debian started failing again
            • implement setsid() fix
            • verify that installation on Debian no longer fails
            serg Sergei Golubchik added a comment - - edited Okay, then, indeed, perhaps we'd better use setsid() instead. anel , to clarify the above, I mean: revert my commit with SI_KERNEL push into buildbot to verify that installation on Debian started failing again implement setsid() fix verify that installation on Debian no longer fails
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Anel Husakovic [ anel ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            Note it may happen so that calling setsid() has to be conditional. It may have to go inline with setting up SIGHUP signal handler. E.g. "debug_gdb" and "bootstrap" modes may have to stay attached to controlling terminal, probably there're more modes to consider.

            svoj Sergey Vojtovich added a comment - Note it may happen so that calling setsid() has to be conditional. It may have to go inline with setting up SIGHUP signal handler. E.g. "debug_gdb" and "bootstrap" modes may have to stay attached to controlling terminal, probably there're more modes to consider.
            anel Anel Husakovic made changes -
            Assignee Anel Husakovic [ anel ] Robert Bindar [ robertbindar ]
            danblack Daniel Black added a comment -

            The solution was here all along:

            `<= SI_USER` is equivalent to `!= SI_KERNEL`. Works for Linux, AIX, Solaris and FreeBSD.

            For completeness - FreeBSD-12.1

            #if __POSIX_VISIBLE || __XSI_VISIBLE
            #define SI_NOINFO       0               /* No signal info besides si_signo. */
            #define SI_USER         0x10001         /* Signal sent by kill(). */
            #define SI_QUEUE        0x10002         /* Signal sent by the sigqueue(). */
            #define SI_TIMER        0x10003         /* Signal generated by expiration of */
                                                    /* a timer set by timer_settime(). */
            #define SI_ASYNCIO      0x10004         /* Signal generated by completion of */
                                                    /* an asynchronous I/O request.*/
            #define SI_MESGQ        0x10005         /* Signal generated by arrival of a */
                                                    /* message on an empty message queue. */
            #define SI_KERNEL       0x10006
            #define SI_LWP          0x10007         /* Signal sent by thr_kill */
            #endif
            
            

            PR ready to go: https://github.com/MariaDB/server/pull/1392

            EGuesnet has tested per https://github.com/MariaDB/server/pull/1357#issuecomment-610341999

            danblack Daniel Black added a comment - The solution was here all along: `<= SI_USER` is equivalent to `!= SI_KERNEL`. Works for Linux, AIX, Solaris and FreeBSD. For completeness - FreeBSD-12.1 #if __POSIX_VISIBLE || __XSI_VISIBLE #define SI_NOINFO 0 /* No signal info besides si_signo. */ #define SI_USER 0x10001 /* Signal sent by kill(). */ #define SI_QUEUE 0x10002 /* Signal sent by the sigqueue(). */ #define SI_TIMER 0x10003 /* Signal generated by expiration of */ /* a timer set by timer_settime(). */ #define SI_ASYNCIO 0x10004 /* Signal generated by completion of */ /* an asynchronous I/O request.*/ #define SI_MESGQ 0x10005 /* Signal generated by arrival of a */ /* message on an empty message queue. */ #define SI_KERNEL 0x10006 #define SI_LWP 0x10007 /* Signal sent by thr_kill */ #endif PR ready to go: https://github.com/MariaDB/server/pull/1392 EGuesnet has tested per https://github.com/MariaDB/server/pull/1357#issuecomment-610341999
            danblack Daniel Black added a comment -

            Or if absolute existing compatibility is required (as some FreeBSD make the substitution less obvious):

            #ifdef SI_KERNEL
                  if (!abort_loop && origin != SI_KERNEL)
            #else
                  if (!abort_loop && origin <= SI_USER)
            #endif
            

            danblack Daniel Black added a comment - Or if absolute existing compatibility is required (as some FreeBSD make the substitution less obvious): #ifdef SI_KERNEL if (!abort_loop && origin != SI_KERNEL) #else if (!abort_loop && origin <= SI_USER) #endif
            svoj Sergey Vojtovich added a comment - Competing fix: https://github.com/MariaDB/server/commit/6b032039cc2c50167558e805b52a13decd9fb52b
            danblack Daniel Black added a comment -

            svoj,

            Nice reference https://www.freedesktop.org/software/systemd/man/daemon.html!

            Note: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html "Do not call setsid to create a new session.", maybe not applicable - I don't know how MariaDB works on OSX.

            danblack Daniel Black added a comment - svoj , Nice reference https://www.freedesktop.org/software/systemd/man/daemon.html ! Note: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html "Do not call setsid to create a new session.", maybe not applicable - I don't know how MariaDB works on OSX.

            One doesn't need to call setsid() under systemd either, because it is already called. But it must be harmless: it should be no-op in this case.

            svoj Sergey Vojtovich added a comment - One doesn't need to call setsid() under systemd either, because it is already called. But it must be harmless: it should be no-op in this case.
            danblack Daniel Black added a comment -

            Great. Can it be merged?

            danblack Daniel Black added a comment - Great. Can it be merged?
            robertbindar Robert Bindar added a comment -

            I reviewed svoj's patch a while ago and IIRC the only concern I had at that time was whether or not `setsid()` would affect the distribution of signals, and potentially break parts of the server or multi-process setups such as Galera or Debian scripts or even external unknown-to-us scripts that rely on signals happening in a particular way.

            Svoj argued that windows and embedded are not affected and that systemd/launchd should work.

            The question is how do we test the patch to make sure at least scripts we can control don't break?
            Elena suggested that testing this is "as whitebox as it gets" and that, in terms of external scripts we don't control, we should also think if we want to guarantee any specific behavior upon sending any specific signals.

            Can danblack or serg help with any suggestions on this?

            robertbindar Robert Bindar added a comment - I reviewed svoj's patch a while ago and IIRC the only concern I had at that time was whether or not `setsid()` would affect the distribution of signals, and potentially break parts of the server or multi-process setups such as Galera or Debian scripts or even external unknown-to-us scripts that rely on signals happening in a particular way. Svoj argued that windows and embedded are not affected and that systemd/launchd should work. The question is how do we test the patch to make sure at least scripts we can control don't break? Elena suggested that testing this is "as whitebox as it gets" and that, in terms of external scripts we don't control, we should also think if we want to guarantee any specific behavior upon sending any specific signals. Can danblack or serg help with any suggestions on this?
            danblack Daniel Black added a comment -

            serg's test plan from comment 2019-08-29

            Test 2 from 1st paragraph of original commit 07e9b1389857, run mysqld in terminal in foreground. gdb attach from different terminal. Kill terminal. Trace generated SIGHUP and if it results in a reload. Add setsid patch. Observe behaviour.

            danblack Daniel Black added a comment - serg 's test plan from comment 2019-08-29 Test 2 from 1st paragraph of original commit 07e9b1389857, run mysqld in terminal in foreground. gdb attach from different terminal. Kill terminal. Trace generated SIGHUP and if it results in a reload. Add setsid patch. Observe behaviour.
            robertbindar Robert Bindar added a comment -

            Hi danblack, sorry for the delay. Sure, but manual test will only conclude that SIGHUP is properly handled and that Sergei's bug doesn't reproduce anymore. But I don't see how it addresses the concerns Svoj, Elena and Robert pointed out, setsid might have side effects in some weird corner cases. It is also true that I don't have a solution on how to test this myself properly unless some more experienced dev helps me out. So if serg is ok with taking the risk to manually test this like he suggested, push the patch and address (if any) bugs that are discovered at some later point, feel free to push.

            robertbindar Robert Bindar added a comment - Hi danblack , sorry for the delay. Sure, but manual test will only conclude that SIGHUP is properly handled and that Sergei's bug doesn't reproduce anymore. But I don't see how it addresses the concerns Svoj, Elena and Robert pointed out, setsid might have side effects in some weird corner cases. It is also true that I don't have a solution on how to test this myself properly unless some more experienced dev helps me out. So if serg is ok with taking the risk to manually test this like he suggested, push the patch and address (if any) bugs that are discovered at some later point, feel free to push.
            danblack Daniel Black added a comment -

            By Elena's quote above, suggests that it's up to us to define if there is expected signal behaviour. By your concerns you seem to think we should conform to something regarding signal handling. If these two are compatible correct statements its up to you to examine what behaviour existing scripts, just a minimum of galera and debian packaging expect. Given these are bash scripts the signal handling is usually fairly explicit. Come up with an expected signal behaviour.

            "debug_gdb" mode, I assume this is a `mtr --gdb`, have you tested this? --bootstrap AFAIK doesn't depend on any signalling or session information, just that stdin contains the bootstrap information.

            danblack Daniel Black added a comment - By Elena's quote above, suggests that it's up to us to define if there is expected signal behaviour. By your concerns you seem to think we should conform to something regarding signal handling. If these two are compatible correct statements its up to you to examine what behaviour existing scripts, just a minimum of galera and debian packaging expect. Given these are bash scripts the signal handling is usually fairly explicit. Come up with an expected signal behaviour. "debug_gdb" mode, I assume this is a `mtr --gdb`, have you tested this? --bootstrap AFAIK doesn't depend on any signalling or session information, just that stdin contains the bootstrap information.
            danblack Daniel Black made changes -
            Priority Major [ 3 ] Blocker [ 1 ]
            danblack Daniel Black added a comment - A work around commit was finally added in 10.4 - https://github.com/MariaDB/server/commit/f69c1c9dcb815d7597ec2035470a81ac3b6c9380
            danblack Daniel Black made changes -
            Priority Blocker [ 1 ] Major [ 3 ]
            cvicentiu Vicențiu Ciorbaru made changes -
            Fix Version/s 10.6.0 [ 24431 ]
            Fix Version/s 10.5.10 [ 25204 ]
            Fix Version/s 10.4.19 [ 25205 ]
            Fix Version/s 10.4 [ 22408 ]
            Assignee Robert Bindar [ robertbindar ] Daniel Black [ danblack ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 96799 ] MariaDB v4 [ 156214 ]

            People

              danblack Daniel Black
              anel Anel Husakovic
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.