MariaDB Server / MDEV-163

Improvements to PMP


Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      PMP, aka poor-man's-profiler, is a very good tool for understanding bottlenecks
      in the server. It is especially good at detecting mutex contention, as it
      profiles all threads. Most other profiling tools (oprofile, perf) only
      profile running threads, so they are mostly useful for detecting bottlenecks in
      CPU usage, which is getting less and less relevant as the number of cores grows
      while software scalability struggles to keep up.

      However, standard PMP, based on attaching gdb and obtaining stack traces that
      way, suffers from a performance problem. Gdb is not optimised for this scenario
      and holds the target process suspended for longer than PMP actually needs,
      causing server stalls (which can last several seconds or more) and limiting its
      use in a production environment.

      This task is about implementing a PMP tool that obtains stack traces in a
      more efficient way, making it less intrusive on the server being profiled and
      thereby allowing it to be used in more cases. It is based on a prototype by
      knielsen: https://github.com/knielsen/knielsen-pmp. The goal is to be able to
      obtain stack traces at a rate of around 1 millisecond per thread in the target
      process.

      The idea is to write a stand-alone C++ program that uses ptrace() to attach to
      the threads of the running server, then uses libunwind to obtain stack traces
      (a sketch of this sampling step follows the list below). Some effort will be
      spent to minimise the time that the target process is held suspended under
      ptrace():

      • During ptrace() we will only obtain the raw stack traces (lists of
        instruction pointers). Symbol resolution can take place after releasing
        the target process.
      • Libunwind allows us to provide our own accessor methods for reading data
        from the target process. We will use this to read more efficiently. Rather
        than using ptrace(), which costs one system call for every word read, we
        will use pread() from /proc/pid/mem in 4k pages; this allows reading
        multiple words in a single system call. Additionally, we will cache reads,
        so that reading the same words in multiple stack traces requires only one
        physical read (see the accessor sketch after this list). Read-only maps in
        the target process (i.e. executable or library images) can even be cached
        between different profiling measurements.
      • If necessary (stage 2), we can look into making libunwind itself even
        faster. Initial studies indicate that it does a number of repeated mmap()s
        of /proc/pid/maps, which could be greatly improved by caching the data.
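
      As an illustration, a minimal sketch of the per-thread sampling step using
      libunwind's remote-unwinding API (unw_init_remote / _UPT_create) together
      with ptrace(). Only raw instruction pointers are collected while the thread
      is stopped, and the thread is detached again immediately afterwards. Thread
      enumeration via /proc/pid/task and error handling are left out, and the
      function name is a placeholder, not taken from the knielsen-pmp prototype:

        // Build e.g. with: g++ -std=c++17 sample.cc -lunwind-ptrace -lunwind-generic
        #include <libunwind.h>
        #include <libunwind-ptrace.h>
        #include <sys/ptrace.h>
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <vector>

        // Grab the raw stack trace (instruction pointers only) of one thread.
        // The thread is held stopped only for the duration of the unwind; symbol
        // resolution happens later, after PTRACE_DETACH.
        static std::vector<unw_word_t> sample_thread(pid_t tid, unw_addr_space_t as)
        {
            std::vector<unw_word_t> ips;

            if (ptrace(PTRACE_ATTACH, tid, nullptr, nullptr) != 0)
                return ips;
            // __WALL is needed to wait for a stopped thread that is not our child.
            waitpid(tid, nullptr, __WALL);

            void *upt = _UPT_create(tid);            // per-thread libunwind state
            unw_cursor_t cursor;
            if (upt && unw_init_remote(&cursor, as, upt) == 0) {
                do {
                    unw_word_t ip;
                    if (unw_get_reg(&cursor, UNW_REG_IP, &ip) != 0)
                        break;
                    ips.push_back(ip);               // raw IPs only, no symbol lookup
                } while (unw_step(&cursor) > 0);
            }
            if (upt)
                _UPT_destroy(upt);

            ptrace(PTRACE_DETACH, tid, nullptr, nullptr); // release the thread ASAP
            return ips;                              // resolve symbols from here on
        }

      The address space would be created once, e.g. with
      unw_create_addr_space(&_UPT_accessors, 0), or with custom accessors along the
      lines of the next sketch.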
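
      And a rough sketch of the cached read path, assuming we override only
      libunwind's access_mem callback. The names and structure are illustrative; in
      a real implementation the remaining _UPT_accessors callbacks would need thin
      wrappers that forward the original _UPT handle, and partial pages at the end
      of a mapping would need handling:

        #include <libunwind.h>
        #include <libunwind-ptrace.h>
        #include <unistd.h>
        #include <cstring>
        #include <map>
        #include <vector>

        static const size_t PAGE_SZ = 4096;

        // Illustrative per-target state: a descriptor for /proc/<pid>/mem (opened
        // once with open()) plus a cache of 4k pages already read from the target.
        struct TargetMem {
            int mem_fd;
            std::map<unw_word_t, std::vector<char>> pages;
        };

        // Replacement for the default ptrace-based access_mem accessor: instead of
        // one system call per word read, fetch a whole 4k page with pread() and
        // serve later words from the cached copy.
        static int cached_access_mem(unw_addr_space_t as, unw_word_t addr,
                                     unw_word_t *val, int write, void *arg)
        {
            (void)as;
            if (write)
                return -UNW_EINVAL;                  // the profiler never writes
            TargetMem *tm = static_cast<TargetMem *>(arg);
            unw_word_t page = addr & ~(unw_word_t)(PAGE_SZ - 1);

            auto it = tm->pages.find(page);
            if (it == tm->pages.end()) {
                std::vector<char> buf(PAGE_SZ);
                if (pread(tm->mem_fd, buf.data(), PAGE_SZ, (off_t)page)
                        != (ssize_t)PAGE_SZ)
                    return -UNW_EINVAL;
                it = tm->pages.emplace(page, std::move(buf)).first;
            }
            std::memcpy(val, it->second.data() + (addr - page), sizeof(*val));
            return 0;
        }

      The accessor table would then be installed by copying _UPT_accessors,
      replacing access_mem with the function above, and passing the result to
      unw_create_addr_space(). Pages backing read-only mappings could additionally
      be kept across profiling runs, as described above.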

      A second goal is to provide an easy user interface, in the form of a single
      executable program. E.g. it could be statically linked to allow usage
      directly on a production server without requiring installation of gdb,
      Perl or other dependencies:

      • The default operation could be a bit like top: a continuously updated
        display listing the obtained stack traces along with their percentage of the
        total (an aggregation sketch follows this list). This allows one to simply
        start the tool and immediately see the top contenders.
      • With an option (and perhaps automatically when stdout is not a tty), we can
        instead output the raw stack trace data for further analysis by external
        scripts (just like traditional PMP using gdb).
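
      For illustration, a rough sketch of the aggregation behind such a display (or
      behind the raw output mode): identical, already symbol-resolved stack traces
      are counted and printed most-frequent-first with their share of all samples,
      much like the sort | uniq -c step of traditional gdb-based PMP. The names and
      data layout are assumptions, not the prototype's:

        #include <algorithm>
        #include <cstddef>
        #include <cstdio>
        #include <map>
        #include <string>
        #include <vector>

        // Count identical stack traces and print them most-frequent-first, with
        // the percentage of all collected samples that each trace accounts for.
        static void print_summary(const std::vector<std::vector<std::string>> &samples)
        {
            std::map<std::vector<std::string>, std::size_t> counts;
            for (const auto &trace : samples)
                ++counts[trace];

            std::vector<std::pair<std::size_t, const std::vector<std::string> *>> order;
            for (const auto &e : counts)
                order.emplace_back(e.second, &e.first);
            std::sort(order.begin(), order.end(),
                      [](const auto &a, const auto &b) { return a.first > b.first; });

            for (const auto &[count, trace] : order) {
                std::printf("%5.1f%%  (%zu samples)\n",
                            100.0 * count / samples.size(), count);
                for (const auto &frame : *trace)
                    std::printf("    %s\n", frame.c_str());
            }
        }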

      [See also email discussion from early December 2011]

          People

            Assignee: Kristian Nielsen (knielsen)
            Reporter: Rasmus Johansson (ratzpo)

