Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33661

LeakSanitizer: detected memory leaks on spider/odbc/pg suite

Details

    Description

      28049e0d5b26528175051c8deed96b380518b9a6

      > ./mysql-test/mtr spider/odbc/pg.basic_sql
      ....
      MariaDB Version 10.4.26-17-MariaDB-debug
       - SSL connections supported
       - binaries are debug compiled
       - binaries built with wsrep patch
      Collecting tests...
      Installing system database...
       
      ==============================================================================
       
      TEST                                      RESULT   TIME (ms) or COMMENT
      --------------------------------------------------------------------------
       
      worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
      spider/odbc/pg.basic_sql                 [ pass ]   3185
      ***Warnings generated in error logs during shutdown after running tests: spider/odbc/pg.basic_sql
       
      ==148932==ERROR: LeakSanitizer: detected memory leaks
      SUMMARY: AddressSanitizer: 7646314 byte(s) leaked in 3204 allocation(s).
       
      --------------------------------------------------------------------------
      The servers were restarted 0 times
      Spent 3.185 of 8 seconds executing testcases
       
      Warnings in log: All 1 tests were successful.
       
      Errors/warnings were found in logfiles during server shutdown after running the
      following sequence(s) of tests:
          spider/odbc/pg.basic_sql
      mysql-test-run: *** ERROR: There where errors/warnings in server logs after running test cases.
      

      Any other MTR test case in the spider/odbc/pg suite results in memory leaks.

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment - - edited

            I was able to eliminate more leaks, but to get rid of all of them is rather tricky without a proper stack. See the commit

            upstream/10.5-enterprise-ment-1591-unleak-some 45f4885b3e4a5d72bb7fb7e822eca60e908fe7ac
            MENT-1591 [demo] reduce some leaking.
             
            Take spider/odbc/mariadb.basic_sql for example:
             
            Before this change:
             
            SUMMARY: AddressSanitizer: 9496362 byte(s) leaked in 4532 allocation(s).
             
            After adding the first hunk (calling SQLFreeHandle()):
             
            SUMMARY: AddressSanitizer: 687962 byte(s) leaked in 3678 allocation(s).
             
            After removing the dbug_off code:
             
            SUMMARY: AddressSanitizer: 604162 byte(s) leaked in 708 allocation(s).
             
            Unfortunately without a better stack reported from leaksanitizer it is
            hard to figure out where the remaining leaks come from.
            

            I found a slack thread about similar leaks in CONNECT, and applying the UNIQUE part of the patch of MDEV-29579 (thanks to Johnston for pointing me to the commit), the leak all disappeared, even without applying the changes in the above commit. So it seems that libodbc does its own cleanup of allocated SQLHandle objects somehow.

            upstream/10.5-enterprise-ment-1591 1cd637149dfdde385006d8b5801859cd3aa68d78
            MENT-1591 Keep spider in memory until exit in ASAN builds
             
            Same as MDEV-29579. For some reason, libodbc does not clean up
            properly if unloaded too early with the dlclose() of spider. So we add
            UNIQUE symbols to spider so the spider does not reload in dlclose().
            

            However, with this change we get failures in the three ha_part tests in asan builds, see for example

            https://buildbot.mariadb.org/#/builders/588/builds/3355

            And the failure do not occur when running these tests separately / sequentially, but they do reproduce when run locally in parallel with other spider tests.

            I also added some trivial leak in spider, to see whether it gets masked by the above change, and the answer is no, the leak is still reported:

            upstream/10.5-enterprise-ment-1591-trivial-leak 337c253c5a30acfb138edfc45059192fd7a2c3b6
            MENT-1591 [demo] Leaks still detected after adding the UNIQUE symbols
             
            With the change that adds the UNIQUE symbols (see the parent commit
            fd3ccb216baba01adfd3fe36fc10713c94027452), leaks are still detected.
            This suggests libodbc does its own cleanup of all allocated SQLHandles
            etc.
            

            So this fix looks good so far, except the CI failures mentioned above.

            ycp Yuchen Pei added a comment - - edited I was able to eliminate more leaks, but to get rid of all of them is rather tricky without a proper stack. See the commit upstream/10.5-enterprise-ment-1591-unleak-some 45f4885b3e4a5d72bb7fb7e822eca60e908fe7ac MENT-1591 [demo] reduce some leaking.   Take spider/odbc/mariadb.basic_sql for example:   Before this change:   SUMMARY: AddressSanitizer: 9496362 byte(s) leaked in 4532 allocation(s).   After adding the first hunk (calling SQLFreeHandle()):   SUMMARY: AddressSanitizer: 687962 byte(s) leaked in 3678 allocation(s).   After removing the dbug_off code:   SUMMARY: AddressSanitizer: 604162 byte(s) leaked in 708 allocation(s).   Unfortunately without a better stack reported from leaksanitizer it is hard to figure out where the remaining leaks come from. I found a slack thread about similar leaks in CONNECT, and applying the UNIQUE part of the patch of MDEV-29579 (thanks to Johnston for pointing me to the commit), the leak all disappeared, even without applying the changes in the above commit. So it seems that libodbc does its own cleanup of allocated SQLHandle objects somehow. upstream/10.5-enterprise-ment-1591 1cd637149dfdde385006d8b5801859cd3aa68d78 MENT-1591 Keep spider in memory until exit in ASAN builds   Same as MDEV-29579. For some reason, libodbc does not clean up properly if unloaded too early with the dlclose() of spider. So we add UNIQUE symbols to spider so the spider does not reload in dlclose(). However, with this change we get failures in the three ha_part tests in asan builds, see for example https://buildbot.mariadb.org/#/builders/588/builds/3355 And the failure do not occur when running these tests separately / sequentially, but they do reproduce when run locally in parallel with other spider tests. I also added some trivial leak in spider, to see whether it gets masked by the above change, and the answer is no, the leak is still reported: upstream/10.5-enterprise-ment-1591-trivial-leak 337c253c5a30acfb138edfc45059192fd7a2c3b6 MENT-1591 [demo] Leaks still detected after adding the UNIQUE symbols   With the change that adds the UNIQUE symbols (see the parent commit fd3ccb216baba01adfd3fe36fc10713c94027452), leaks are still detected. This suggests libodbc does its own cleanup of all allocated SQLHandles etc. So this fix looks good so far, except the CI failures mentioned above.
            ycp Yuchen Pei added a comment -

            I have a patch ready for review. But I'd like to add some code documentation to the spider monitoring system, since it is poorly documented and I learned some aspects of it when debugging the regression in the ha_part tests.

            upstream/10.5-enterprise-ment-1591 94208f90d4efde9329dc6f06dda994ed12d82571
            MENT-1591 Keep spider in memory until exit in ASAN builds
             
            Same as MDEV-29579. For some reason, libodbc does not clean up
            properly if unloaded too early with the dlclose() of spider. So we add
            UNIQUE symbols to spider so the spider does not reload in dlclose().
             
            This change, however, uncovers some hidden problems in the spider
            codebase, for which we move the initialisation of some spider global
            variables into the initialisation of spider itself.
             
            Spider has some global variables. Their initialisation should be done
            in the initialisation of spider itself, otherwise, if spider were
            re-initialised without these symbol being unloaded, the values could
            be inconsistent and causing issues.
             
            One such issue is caused by the variables
            spider_mon_table_cache_version and spider_mon_table_cache_version_req.
            They are used for resetting the spider monitoring table cache and have
            initial values of 0 and 1 respectively. We have that always
            spider_mon_table_cache_version_req >= spider_mon_table_cache_version,
            and when the relation is strict, the cache is reset,
            spider_mon_table_cache_version is brought to be equal to
            spider_mon_table_cache_version_req, and the cache is searched for
            matching table_name, db_name and link_idx. If the relation is equal,
            no reset would happen and the cache would be searched directly.
             
            When spider is re-inited without resetting the values of
            spider_mon_table_cache_version and spider_mon_table_cache_version_req
            that were set to be equal in the previous cache reset action, the
            cache was emptied in the previous spider deinit, which would result in
            HA_ERR_KEY_NOT_FOUND unexpectedly.
             
            An alternative way to fix this issue would be to call the spider udf
            spider_flush_mon_cache_table(), which increments
            spider_mon_table_cache_version_req thus making sure the inequality is
            strict. However, there's no reason for spider to initialise these
            global variables on dlopen(), rather than on spider init, which is
            cleaner and "purer".
             
            To reproduce this issue, simply revert the changes involving the two
            variables and then run:
             
            mtr --no-reorder spider.ha{,_part}
             
            Finally, we also fix spider/bugfix.mdev_28856 because of MDEV-29718.
            

            ycp Yuchen Pei added a comment - I have a patch ready for review. But I'd like to add some code documentation to the spider monitoring system, since it is poorly documented and I learned some aspects of it when debugging the regression in the ha_part tests. upstream/10.5-enterprise-ment-1591 94208f90d4efde9329dc6f06dda994ed12d82571 MENT-1591 Keep spider in memory until exit in ASAN builds   Same as MDEV-29579. For some reason, libodbc does not clean up properly if unloaded too early with the dlclose() of spider. So we add UNIQUE symbols to spider so the spider does not reload in dlclose().   This change, however, uncovers some hidden problems in the spider codebase, for which we move the initialisation of some spider global variables into the initialisation of spider itself.   Spider has some global variables. Their initialisation should be done in the initialisation of spider itself, otherwise, if spider were re-initialised without these symbol being unloaded, the values could be inconsistent and causing issues.   One such issue is caused by the variables spider_mon_table_cache_version and spider_mon_table_cache_version_req. They are used for resetting the spider monitoring table cache and have initial values of 0 and 1 respectively. We have that always spider_mon_table_cache_version_req >= spider_mon_table_cache_version, and when the relation is strict, the cache is reset, spider_mon_table_cache_version is brought to be equal to spider_mon_table_cache_version_req, and the cache is searched for matching table_name, db_name and link_idx. If the relation is equal, no reset would happen and the cache would be searched directly.   When spider is re-inited without resetting the values of spider_mon_table_cache_version and spider_mon_table_cache_version_req that were set to be equal in the previous cache reset action, the cache was emptied in the previous spider deinit, which would result in HA_ERR_KEY_NOT_FOUND unexpectedly.   An alternative way to fix this issue would be to call the spider udf spider_flush_mon_cache_table(), which increments spider_mon_table_cache_version_req thus making sure the inequality is strict. However, there's no reason for spider to initialise these global variables on dlopen(), rather than on spider init, which is cleaner and "purer".   To reproduce this issue, simply revert the changes involving the two variables and then run:   mtr --no-reorder spider.ha{,_part}   Finally, we also fix spider/bugfix.mdev_28856 because of MDEV-29718.
            ycp Yuchen Pei added a comment -

            Hi holyfoot, ptal thanks

            3247c401730 upstream/bb-10.5-ycp MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds
            b61f9d934e3 MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718.
            159ba37b6f2 MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends.
            

            This is based on 10.5. The top commit is the actual fix. It is not important enough to push to 10.4 which is locked.

            The bug was moved from MENT-1591, and only appears in ES with ASAN. Will cherry pick to 10.4 ES.

            Here are the 10.4-es version of the commits

            1bffd0524fb upstream/10.4-enterprise-ment-1591 MENT-1591 Keep spider in memory until exit in ASAN builds
            35810fc5e86 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718.
            180c03775d5 MENT-1591 Documenting spider_mon_table_cache and friends.
            

            ycp Yuchen Pei added a comment - Hi holyfoot , ptal thanks 3247c401730 upstream/bb-10.5-ycp MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds b61f9d934e3 MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718. 159ba37b6f2 MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends. This is based on 10.5. The top commit is the actual fix. It is not important enough to push to 10.4 which is locked. The bug was moved from MENT-1591 , and only appears in ES with ASAN. Will cherry pick to 10.4 ES. Here are the 10.4-es version of the commits 1bffd0524fb upstream/10.4-enterprise-ment-1591 MENT-1591 Keep spider in memory until exit in ASAN builds 35810fc5e86 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718. 180c03775d5 MENT-1591 Documenting spider_mon_table_cache and friends.

            ok to push.
            Didn't know the attribute ((used)) works like that in the library.

            holyfoot Alexey Botchkov added a comment - ok to push. Didn't know the attribute ((used)) works like that in the library.
            ycp Yuchen Pei added a comment - - edited

            Jenkins is back - pushed the following to 10.4/5/6-ES:

            # 10.4
            de20e6fa0ee upstream/10.4-enterprise-mdev-33661 upstream/10.4-enterprise MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds
            6a7b50f39ab MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718.
            760286d5e61 MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends.
            # 10.5
            6cc345d9b21 upstream/10.5-enterprise-mdev-33661 upstream/10.5-enterprise MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds
            e13f394796d MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718.
            f0ee2f0543e MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends.
            # 10.6
            9ce1fd5f1f7 upstream/10.6-enterprise-mdev-33661 upstream/10.6-enterprise MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds
            a5d2fc2c1a9 MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718.
            27ef9229a23 MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends.
            

            Also pushed the top commit 662bb176b412993a085fe329af559ddc3dc83ec3 to CS 10.4, now that the ES pushes are ok in jenkins:

            ycp Yuchen Pei added a comment - - edited Jenkins is back - pushed the following to 10.4/5/6-ES: # 10.4 de20e6fa0ee upstream/10.4-enterprise-mdev-33661 upstream/10.4-enterprise MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds 6a7b50f39ab MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718. 760286d5e61 MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends. # 10.5 6cc345d9b21 upstream/10.5-enterprise-mdev-33661 upstream/10.5-enterprise MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds e13f394796d MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718. f0ee2f0543e MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends. # 10.6 9ce1fd5f1f7 upstream/10.6-enterprise-mdev-33661 upstream/10.6-enterprise MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds a5d2fc2c1a9 MDEV-33661 MENT-1591 Fix spider/bugfix.mdev_28856 because of MDEV-29718. 27ef9229a23 MDEV-33661 MENT-1591 Documenting spider_mon_table_cache and friends. Also pushed the top commit 662bb176b412993a085fe329af559ddc3dc83ec3 to CS 10.4, now that the ES pushes are ok in jenkins:

            People

              ycp Yuchen Pei
              nayuta-yanagisawa Nayuta Yanagisawa (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.