  1. MariaDB Server
  2. MDEV-29568

libelf (specifically libdw) based stack resolver

Details

    Description

      Part of elfutils (which also provides libelf), though packaged as libdw (at least in Debian). This is the newer alternative to bfd for stack resolution.

      From the source, https://sourceware.org/elfutils/: "The libraries and backends are dual GPLv2+/LGPLv3+. The utilities are GPLv3+". As long as we use only the libraries, the result stays distributable.

      init

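        static char *debuginfo_path; /* NULL: use libdwfl's default debuginfo search path */
        static const Dwfl_Callbacks proc_callbacks =
          {
            .find_debuginfo = dwfl_standard_find_debuginfo,
            .debuginfo_path = &debuginfo_path,
            .find_elf = dwfl_linux_proc_find_elf,
          };
        Dwfl *dwfl = dwfl_begin (&proc_callbacks);
        if (dwfl == NULL)
          error (2, 0, "dwfl_begin: %s", dwfl_errmsg (-1));

        int result = dwfl_linux_proc_report (dwfl, pid); // reads /proc/PID/maps and reports the mapped modules
        result = dwfl_report_end (dwfl, NULL, NULL); // finish the reporting session before any lookups
        result = dwfl_linux_proc_attach (dwfl, pid, true); // true: assume the process is already ptrace-stopped. Other option? There were thread-based attaches too.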
      

      Resolution, following the example of addr2line:

        Dwfl_Module *mod = dwfl_addrmodule (dwfl, addr);
        Dwfl_Line *line = dwfl_module_getsrc (mod, addr);
        const char *src = dwfl_lineinfo (line, &addr, &lineno, &linecol, NULL, NULL);
      

      With libelf, this integrates with debuginfod (if we run such a service), or we can just rely on the distro's debuginfo.

          Activity

            serg Sergei Golubchik added a comment - why?
            danblack Daniel Black added a comment -

            addr2line InnoDB stack traces in bug reports (MDEV-29536, MDEV-28312, MDEV-28440 and others) are still massively broken across multiple distros. MDEV-16194 hints that the offset isn't needed, though that still needs testing; fixing that would solve the bigger part of the problem.

            bfd isn't used because HAVE_BFD_H is under NOT_FOR_DISTRIBUTION, so it's not really useful (except for bb, which tends to generate good traces).

            Libdw, being GPL2+, is actually distributable and has some hope of providing decent resolution. Being integrated with debuginfod, it could pull down debuginfo symbols from a distro's server (or ours, if we provided a service) in the process of creating the stack trace.


            serg Sergei Golubchik added a comment -

            Do you think that the broken stack traces are caused by addr2line working (or being used) incorrectly, and that a different resolver would fix them?

            Do you have any proof of that? For example, does mariadbd linked with libbfd (yes, not distributable) resolve stacks better? Can you show any case where mariadb+libbfd resolves stacks while mariadb+addr2line fails to?

            marko Marko Mäkelä added a comment -

            man 2 open on Linux mentions the following:

            O_DIRECT I/Os should never be run concurrently with the fork(2) system call, if the memory buffer is a private mapping (i.e., any mapping created with the mmap(2) MAP_PRIVATE flag; this includes memory allocated on the heap and statically allocated buffers). Any such I/Os, whether submitted via an asynchronous I/O interface or from another thread in the process, should be completed before fork(2) is called. Failure to do so can result in data corruption and undefined behavior in parent and child processes.

            Starting with MDEV-24854 we enable O_DIRECT I/O by default. The built-in stack trace reporter, which invokes fork(2), is also enabled by default. Thus, we seem to enable potential data corruption by default.
            serg Sergei Golubchik added a comment - - edited

            One reason could be that users would not need to install binutils to get resolved stack traces.

            Also, there are (unproven) rumors that the fork-exec of addr2line causes a notable slowdown after a crash.
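            For context, a minimal sketch of what a fork-exec based resolver involves is shown here. It is not MariaDB's actual code, and the helper name run_capture is hypothetical. It forks, runs an external command in the child, and captures the command's standard output through a pipe. This is the pattern that both the slowdown rumor and the fork(2) concern refer to.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its arguments and capture its
   standard output into out (always NUL-terminated, truncated if needed).
   Returns the child's exit status, or -1 on failure. */
int run_capture(char *const argv[], char *out, size_t outlen)
{
  int fds[2];
  assert(out != NULL && outlen > 0);
  out[0] = '\0';
  if (pipe(fds) != 0)
    return -1;

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    /* Child: route stdout into the pipe, then become the external tool. */
    dup2(fds[1], STDOUT_FILENO);
    close(fds[0]);
    close(fds[1]);
    execvp(argv[0], argv);
    _exit(127); /* reached only if execvp failed */
  }

  /* Parent: read until end of file or until the buffer is full. */
  close(fds[1]);
  size_t used = 0;
  ssize_t n;
  while (used + 1 < outlen
         && (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
    used += (size_t) n;
  out[used] = '\0';
  close(fds[0]);

  int status;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

            With binutils installed, a caller could pass an argument vector such as addr2line -C -f -e followed by the path of the server binary and a hexadecimal address (both of which would be supplied by the caller). An in-process resolver such as libdw would avoid the fork entirely.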


            marko Marko Mäkelä added a comment - I hope that implementing this would finally fix MDEV-21010.

            People

              danblack Daniel Black
              danblack Daniel Black