[MDEV-29568] libelf (specifically libdw) based stack resolver Created: 2022-09-19  Updated: 2024-02-07

Status: Open
Project: MariaDB Server
Component/s: Server
Fix Version/s: None

Type: Task Priority: Major
Reporter: Daniel Black Assignee: Daniel Black
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-30861 addr2line can't find debug info files... Open
relates to MDBF-261 Provide debuginfod service Open
relates to MDEV-6479 stack traces in 10.1 Closed
relates to MDEV-20738 my_addr_resolve passes invalid offset... Open

 Description   

Part of elfutils (as is libelf), packaged as libdw (at least in Debian). This is the newer alternative to bfd for stack resolution.

From the source, https://sourceware.org/elfutils/: "The libraries and backends are dual GPLv2+/LGPLv3+. The utilities are GPLv3+", so staying with the libraries means we avoid a non-distributable option.

init

  static char *debuginfo_path;
  static const Dwfl_Callbacks proc_callbacks =
    {
      .find_debuginfo = dwfl_standard_find_debuginfo,
      .debuginfo_path = &debuginfo_path,
      .find_elf = dwfl_linux_proc_find_elf,
    };
  Dwfl *dwfl = dwfl_begin (&proc_callbacks);
  if (dwfl == NULL)
    error (2, 0, "dwfl_begin: %s", dwfl_errmsg (-1));

  int result = dwfl_linux_proc_report (dwfl, pid); // reads /proc/PID/maps
  result = dwfl_linux_proc_attach (dwfl, pid, true); // other option? there were thread-based attaches

Resolution (see addr2line for an example):

  Dwfl_Module *mod = dwfl_addrmodule (dwfl, addr);
  Dwfl_Line *line = dwfl_module_getsrc (mod, addr);
  const char *src = dwfl_lineinfo (line, &addr, &lineno, &linecol, NULL, NULL);

Using libelf, this integrates with debuginfod (once we run a service), or we can just rely on the distro's.
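
As a hedged illustration of the debuginfod side: the debuginfod client reads its server list from the DEBUGINFOD_URLS environment variable, and libdw consults it only when built with debuginfod support. The helper name resolver_enable_debuginfod and the URL in the usage note are hypothetical. The sketch assumes it is called early at startup, before other threads exist, since setenv is not thread-safe.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper: set a default debuginfod server list
   (space-separated URLs) unless DEBUGINFOD_URLS is already set,
   so an administrator's or distro's setting takes precedence.
   Returns 0 on success, -1 on error (as setenv does). */
static int resolver_enable_debuginfod(const char *urls)
{
  return setenv("DEBUGINFOD_URLS", urls, 0 /* do not overwrite */);
}
```

For example, resolver_enable_debuginfod("https://debuginfod.example.org/") before dwfl_begin would leave a distro-provided DEBUGINFOD_URLS untouched and only supply the placeholder server when none is configured.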



 Comments   
Comment by Sergei Golubchik [ 2022-09-22 ]

why?

Comment by Daniel Black [ 2022-09-23 ]

addr2line InnoDB stack traces in bug reports (MDEV-29536, MDEV-28312, MDEV-28440 and others) are still massively broken across multiple distros. MDEV-16194 hints that the offset isn't needed, though that needs testing; fixing that would solve the bigger part of the problem.

bfd isn't in use because HAVE_BFD_H is under NOT_FOR_DISTRIBUTION, so it's not really useful (except for bb, which tends to generate good traces).

Libdw, being GPLv2+, is actually distributable and has some hope of providing decent resolution. Being integrated with debuginfod, it could pull down the distro's (or our own, if we provided a service) debuginfo symbols in the process of creating the stack trace.

Comment by Sergei Golubchik [ 2022-09-26 ]

do you think that broken stack traces are caused by addr2line working (or being used) incorrectly? And that a different resolver would fix it?

Do you have any proof of that? For example, does mariadbd linked with libbfd (yes, not distributable) resolve stacks better? Can you show any case where mariadb+libbfd resolves stacks while mariadb+addr2line fails to?

Comment by Marko Mäkelä [ 2024-02-07 ]

man 2 open on Linux mentions the following:

O_DIRECT I/Os should never be run concurrently with the fork(2) system call, if the memory buffer is a private mapping (i.e., any mapping created with the mmap(2) MAP_PRIVATE flag; this includes memory allocated on the heap and statically allocated buffers). Any such I/Os, whether submitted via an asynchronous I/O interface or from another thread in the process, should be completed before fork(2) is called. Failure to do so can result in data corruption and undefined behavior in parent and child processes.

Starting with MDEV-24854 we enable O_DIRECT I/O by default. Also the built-in stack trace reporter, which invokes fork(2), is enabled by default. Thus, we seem to enable potential data corruption by default.

Generated at Thu Feb 08 10:09:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.