[MDEV-6615] Incorrect barrier logic in mutex_exit path causes data corruption on AArch64 - Jira

Steve Capper created issue - 2014-08-20 15:53

Steve Capper made changes - 2014-08-20 15:57

Field	Original Value	New Value
Description	Hi, We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at storage/xtradb/fil/fil0fil.c:5288: ============================================== fil_node_complete_io( /=================/ fil_node_t* node, /!< in: file node / fil_system_t* system, /!< in: tablespace memory cache / ulint type) /!< in: OS_FILE_WRITE or OS_FILE_READ; marks the node as modified if type == OS_FILE_WRITE / { ut_ad(node); ut_ad(system); ut_ad(mutex_own(&(system->mutex))); ut_a(node->n_pending > 0); <-- failure point node->n_pending--; ============================================== An attached debugger gave the following backtrace: ============================================== (gdb) bt full #0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 _sys_result = 0 pd = 0x7fa2fff1a0 pid = <optimised out> selftid = 5661 #1 0x0000007fb1d4818c in __GI_abort () at abort.c:89 save_stage = 2 act = {__sigaction_handler = {sa_handler = 0x7f00000000, sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976, 404, 1, 404, 0, 366924161824, 548195526816, 366921716804, 366927454208, 3, 0, 548434850424, 366927869208, 366936283408, 548195527344, 548434850424}}, sa_flags = 5288, sa_restorer = 0xa2fff1a0} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>, system=<optimised out>, type=<optimised out>) at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288 No locals. #3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3) at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705 ret = <optimised out> fil_node = 0x7fb14a0e78 message = 0x7fa54d4350 type = 10 space_id = 0 #4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>) at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486 segment = 3 #5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0) at pthread_create.c:314 pd = 0x7fa2fff1a0 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120, 548981639624, 548449456128, 0, 548449452032, 548195529312, 548195527344, 548443965168, 8388608, 548449472512, 548195527056, 13770210553321828185, 0, 13770210553602140361, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = { 0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = { prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}} not_first_call = 0 pagesize_m1 = <optimised out> sp = <optimised out> freesize = <optimised out> __PRETTY_FUNCTION__ = "start_thread" #6 0x0000007fb1dd9c40 in clone () at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96 No locals. ============================================== Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it. Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at: http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106 A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model. We have the following in program order: mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed protected work mutex_exit -> load-acquire lock, set it to 0 regardless. However, the following sequence of events can be observed by another core: mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed some of the protected work mutex_exit -> load-acquire lock, set it to 0 regardless. some more of the protected work (not protected). The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting. I've attached an emergency patch that re-introduces __sync_lock_release to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the __atomic_* functions? Ideally: To acquire the lock: __atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE) To release the lock: __atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE) (which also worked on my test system). I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36. Cheers, -- Steve Capper	Hi, We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at storage/xtradb/fil/fil0fil.c:5288: ============================================== {noformat} fil_node_complete_io( /=================/ fil_node_t* node, /!< in: file node / fil_system_t* system, /!< in: tablespace memory cache / ulint type) /!< in: OS_FILE_WRITE or OS_FILE_READ; marks the node as modified if type == OS_FILE_WRITE / { ut_ad(node); ut_ad(system); ut_ad(mutex_own(&(system->mutex))); ut_a(node->n_pending > 0); <-- failure point node->n_pending--; {noformat} ============================================== An attached debugger gave the following backtrace: ============================================== {noformat} (gdb) bt full #0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 _sys_result = 0 pd = 0x7fa2fff1a0 pid = <optimised out> selftid = 5661 #1 0x0000007fb1d4818c in __GI_abort () at abort.c:89 save_stage = 2 act = {__sigaction_handler = {sa_handler = 0x7f00000000, sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976, 404, 1, 404, 0, 366924161824, 548195526816, 366921716804, 366927454208, 3, 0, 548434850424, 366927869208, 366936283408, 548195527344, 548434850424}}, sa_flags = 5288, sa_restorer = 0xa2fff1a0} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>, system=<optimised out>, type=<optimised out>) at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288 No locals. #3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3) at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705 ret = <optimised out> fil_node = 0x7fb14a0e78 message = 0x7fa54d4350 type = 10 space_id = 0 #4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>) at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486 segment = 3 #5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0) at pthread_create.c:314 pd = 0x7fa2fff1a0 unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120, 548981639624, 548449456128, 0, 548449452032, 548195529312, 548195527344, 548443965168, 8388608, 548449472512, 548195527056, 13770210553321828185, 0, 13770210553602140361, 0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = { 0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = { prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}} not_first_call = 0 pagesize_m1 = <optimised out> sp = <optimised out> freesize = <optimised out> __PRETTY_FUNCTION__ = "start_thread" #6 0x0000007fb1dd9c40 in clone () at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96 No locals. {noformat} ============================================== Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it. Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at: http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106 A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model. We have the following in program order: mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed protected work mutex_exit -> load-acquire lock, set it to 0 regardless. However, the following sequence of events can be observed by another core: mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed some of the protected work mutex_exit -> load-acquire lock, set it to 0 regardless. some more of the protected work (not protected). The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting. I've attached an emergency patch that re-introduces __sync_lock_release to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the __atomic_* functions? Ideally: To acquire the lock: __atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE) To release the lock: __atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE) (which also worked on my test system). I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36. Cheers, -- Steve Capper

Steve Capper made changes - 2014-08-20 15:58

Description

Hi,
We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at storage/xtradb/fil/fil0fil.c:5288:

==============================================

{noformat}
fil_node_complete_io(
/*=================*/
        fil_node_t* node, /*!< in: file node */
        fil_system_t* system, /*!< in: tablespace memory cache */
        ulint type) /*!< in: OS_FILE_WRITE or OS_FILE_READ; marks
                                the node as modified if
                                type == OS_FILE_WRITE */
{
        ut_ad(node);
        ut_ad(system);
        ut_ad(mutex_own(&(system->mutex)));

        ut_a(node->n_pending > 0); <-- failure point

        node->n_pending--;
{noformat}
==============================================

An attached debugger gave the following backtrace:

==============================================
{noformat}
(gdb) bt full
#0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        _sys_result = 0
        pd = 0x7fa2fff1a0
        pid = <optimised out>
        selftid = 5661
#1 0x0000007fb1d4818c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f00000000,
            sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976,
              404, 1, 404, 0, 366924161824, 548195526816, 366921716804,
              366927454208, 3, 0, 548434850424, 366927869208, 366936283408,
              548195527344, 548434850424}}, sa_flags = 5288,
          sa_restorer = 0xa2fff1a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>,
    system=<optimised out>, type=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288
No locals.
#3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705
        ret = <optimised out>
        fil_node = 0x7fb14a0e78
        message = 0x7fa54d4350
        type = 10
        space_id = 0
#4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486
        segment = 3
#5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0)
    at pthread_create.c:314
        pd = 0x7fa2fff1a0
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120,
                548981639624, 548449456128, 0, 548449452032, 548195529312,
                548195527344, 548443965168, 8388608, 548449472512,
                548195527056, 13770210553321828185, 0, 13770210553602140361,
                0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}}
        not_first_call = 0
        pagesize_m1 = <optimised out>
        sp = <optimised out>
        freesize = <optimised out>
        __PRETTY_FUNCTION__ = "start_thread"
#6 0x0000007fb1dd9c40 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96
No locals.
{noformat}
==============================================

Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it.

Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at:
http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106

A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model.

We have the following in program order:

mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.

However, the following sequence of events can be observed by another core:
mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
some of the protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.
some more of the protected work (not protected).

The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting.

I've attached an emergency patch that re-introduces __sync_lock_release to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the __atomic_* functions? Ideally:

To acquire the lock:
__atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE)

To release the lock:
__atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE)

(which also worked on my test system).

I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36.

Cheers,
--
Steve Capper

Hi,
We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at {{storage/xtradb/fil/fil0fil.c:5288}}:

{noformat}
fil_node_complete_io(
/*=================*/
        fil_node_t* node, /*!< in: file node */
        fil_system_t* system, /*!< in: tablespace memory cache */
        ulint type) /*!< in: OS_FILE_WRITE or OS_FILE_READ; marks
                                the node as modified if
                                type == OS_FILE_WRITE */
{
        ut_ad(node);
        ut_ad(system);
        ut_ad(mutex_own(&(system->mutex)));

        ut_a(node->n_pending > 0); <-- failure point

        node->n_pending--;
{noformat}

An attached debugger gave the following backtrace:

{noformat}
(gdb) bt full
#0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        _sys_result = 0
        pd = 0x7fa2fff1a0
        pid = <optimised out>
        selftid = 5661
#1 0x0000007fb1d4818c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f00000000,
            sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976,
              404, 1, 404, 0, 366924161824, 548195526816, 366921716804,
              366927454208, 3, 0, 548434850424, 366927869208, 366936283408,
              548195527344, 548434850424}}, sa_flags = 5288,
          sa_restorer = 0xa2fff1a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>,
    system=<optimised out>, type=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288
No locals.
#3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705
        ret = <optimised out>
        fil_node = 0x7fb14a0e78
        message = 0x7fa54d4350
        type = 10
        space_id = 0
#4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486
        segment = 3
#5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0)
    at pthread_create.c:314
        pd = 0x7fa2fff1a0
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120,
                548981639624, 548449456128, 0, 548449452032, 548195529312,
                548195527344, 548443965168, 8388608, 548449472512,
                548195527056, 13770210553321828185, 0, 13770210553602140361,
                0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}}
        not_first_call = 0
        pagesize_m1 = <optimised out>
        sp = <optimised out>
        freesize = <optimised out>
        __PRETTY_FUNCTION__ = "start_thread"
#6 0x0000007fb1dd9c40 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96
No locals.
{noformat}

Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it.

Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at:
http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106

A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model.

We have the following in program order:

mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.

However, the following sequence of events can be observed by another core:
mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
some of the protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.
some more of the protected work (not protected).

The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting.

I've attached an emergency patch that re-introduces {{__sync_lock_release}} to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the {{__atomic_*}} functions? Ideally:

To acquire the lock:
{{__atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE)}}

To release the lock:
{{__atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE)}}

(which also worked on my test system).

I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36.

Cheers,
--
Steve Capper

Steve Capper made changes - 2014-08-20 16:03

Description

Hi,
We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at {{storage/xtradb/fil/fil0fil.c:5288}}:

{noformat}
fil_node_complete_io(
/*=================*/
        fil_node_t* node, /*!< in: file node */
        fil_system_t* system, /*!< in: tablespace memory cache */
        ulint type) /*!< in: OS_FILE_WRITE or OS_FILE_READ; marks
                                the node as modified if
                                type == OS_FILE_WRITE */
{
        ut_ad(node);
        ut_ad(system);
        ut_ad(mutex_own(&(system->mutex)));

        ut_a(node->n_pending > 0); <-- failure point

        node->n_pending--;
{noformat}

An attached debugger gave the following backtrace:

{noformat}
(gdb) bt full
#0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        _sys_result = 0
        pd = 0x7fa2fff1a0
        pid = <optimised out>
        selftid = 5661
#1 0x0000007fb1d4818c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f00000000,
            sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976,
              404, 1, 404, 0, 366924161824, 548195526816, 366921716804,
              366927454208, 3, 0, 548434850424, 366927869208, 366936283408,
              548195527344, 548434850424}}, sa_flags = 5288,
          sa_restorer = 0xa2fff1a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>,
    system=<optimised out>, type=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288
No locals.
#3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705
        ret = <optimised out>
        fil_node = 0x7fb14a0e78
        message = 0x7fa54d4350
        type = 10
        space_id = 0
#4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486
        segment = 3
#5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0)
    at pthread_create.c:314
        pd = 0x7fa2fff1a0
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120,
                548981639624, 548449456128, 0, 548449452032, 548195529312,
                548195527344, 548443965168, 8388608, 548449472512,
                548195527056, 13770210553321828185, 0, 13770210553602140361,
                0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}}
        not_first_call = 0
        pagesize_m1 = <optimised out>
        sp = <optimised out>
        freesize = <optimised out>
        __PRETTY_FUNCTION__ = "start_thread"
#6 0x0000007fb1dd9c40 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96
No locals.
{noformat}

Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it.

Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at:
http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106

A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model.

We have the following in program order:

mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.

However, the following sequence of events can be observed by another core:
mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
some of the protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.
some more of the protected work (not protected).

The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting.

I've attached an emergency patch that re-introduces {{__sync_lock_release}} to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the {{__atomic_*}} functions? Ideally:

To acquire the lock:
{{__atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE)}}

To release the lock:
{{__atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE)}}

(which also worked on my test system).

I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36.

Cheers,
--
Steve Capper

Hi,
We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at {{storage/xtradb/fil/fil0fil.c:5288}}:

{noformat}
fil_node_complete_io(
/*=================*/
        fil_node_t* node, /*!< in: file node */
        fil_system_t* system, /*!< in: tablespace memory cache */
        ulint type) /*!< in: OS_FILE_WRITE or OS_FILE_READ; marks
                                the node as modified if
                                type == OS_FILE_WRITE */
{
        ut_ad(node);
        ut_ad(system);
        ut_ad(mutex_own(&(system->mutex)));

        ut_a(node->n_pending > 0); <-- failure point

        node->n_pending--;
{noformat}

An attached debugger gave the following backtrace:

{noformat}
(gdb) bt full
#0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        _sys_result = 0
        pd = 0x7fa2fff1a0
        pid = <optimised out>
        selftid = 5661
#1 0x0000007fb1d4818c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f00000000,
            sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976,
              404, 1, 404, 0, 366924161824, 548195526816, 366921716804,
              366927454208, 3, 0, 548434850424, 366927869208, 366936283408,
              548195527344, 548434850424}}, sa_flags = 5288,
          sa_restorer = 0xa2fff1a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>,
    system=<optimised out>, type=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288
No locals.
#3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705
        ret = <optimised out>
        fil_node = 0x7fb14a0e78
        message = 0x7fa54d4350
        type = 10
        space_id = 0
#4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486
        segment = 3
#5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0)
    at pthread_create.c:314
        pd = 0x7fa2fff1a0
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120,
                548981639624, 548449456128, 0, 548449452032, 548195529312,
                548195527344, 548443965168, 8388608, 548449472512,
                548195527056, 13770210553321828185, 0, 13770210553602140361,
                0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}}
        not_first_call = 0
        pagesize_m1 = <optimised out>
        sp = <optimised out>
        freesize = <optimised out>
        __PRETTY_FUNCTION__ = "start_thread"
#6 0x0000007fb1dd9c40 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96
No locals.
{noformat}

Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it.

Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at:
http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106

A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model.

We have the following in program order:

mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.

However, the following sequence of events can be observed by another core:
mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
some of the protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.
some more of the protected work (not protected).

The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting.

I've attached an emergency patch that re-introduces {{__ sync_lock_release}} to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the {{__ atomic_*}} functions? Ideally:

To acquire the lock:
{noformat}
__atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE)
{noformat}

To release the lock:
{noformat}
__atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE)
{noformat}

(which also worked on my test system).

I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36.

Cheers,
--
Steve Capper

Steve Capper made changes - 2014-08-20 16:04

Description

Hi,
We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at {{storage/xtradb/fil/fil0fil.c:5288}}:

{noformat}
fil_node_complete_io(
/*=================*/
        fil_node_t* node, /*!< in: file node */
        fil_system_t* system, /*!< in: tablespace memory cache */
        ulint type) /*!< in: OS_FILE_WRITE or OS_FILE_READ; marks
                                the node as modified if
                                type == OS_FILE_WRITE */
{
        ut_ad(node);
        ut_ad(system);
        ut_ad(mutex_own(&(system->mutex)));

        ut_a(node->n_pending > 0); <-- failure point

        node->n_pending--;
{noformat}

An attached debugger gave the following backtrace:

{noformat}
(gdb) bt full
#0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        _sys_result = 0
        pd = 0x7fa2fff1a0
        pid = <optimised out>
        selftid = 5661
#1 0x0000007fb1d4818c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f00000000,
            sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976,
              404, 1, 404, 0, 366924161824, 548195526816, 366921716804,
              366927454208, 3, 0, 548434850424, 366927869208, 366936283408,
              548195527344, 548434850424}}, sa_flags = 5288,
          sa_restorer = 0xa2fff1a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>,
    system=<optimised out>, type=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288
No locals.
#3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705
        ret = <optimised out>
        fil_node = 0x7fb14a0e78
        message = 0x7fa54d4350
        type = 10
        space_id = 0
#4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486
        segment = 3
#5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0)
    at pthread_create.c:314
        pd = 0x7fa2fff1a0
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120,
                548981639624, 548449456128, 0, 548449452032, 548195529312,
                548195527344, 548443965168, 8388608, 548449472512,
                548195527056, 13770210553321828185, 0, 13770210553602140361,
                0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}}
        not_first_call = 0
        pagesize_m1 = <optimised out>
        sp = <optimised out>
        freesize = <optimised out>
        __PRETTY_FUNCTION__ = "start_thread"
#6 0x0000007fb1dd9c40 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96
No locals.
{noformat}

Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it.

Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at:
http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106

A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model.

We have the following in program order:

mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.

However, the following sequence of events can be observed by another core:
mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
some of the protected work
mutex_exit -> load-acquire lock, set it to 0 regardless.
some more of the protected work (not protected).

The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting.

I've attached an emergency patch that re-introduces {{__ sync_lock_release}} to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the {{__ atomic_*}} functions? Ideally:

To acquire the lock:
{noformat}
__atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE)
{noformat}

To release the lock:
{noformat}
__atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE)
{noformat}

(which also worked on my test system).

I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36.

Cheers,
--
Steve Capper

Hi,
We've run into an issue with MariaDB when running Sysbench "oltp.lua" test with 8 threads. The server daemon crashed mostly with an assertion failure at {{storage/xtradb/fil/fil0fil.c:5288}}:

{noformat}
fil_node_complete_io(
/*=================*/
        fil_node_t* node, /*!< in: file node */
        fil_system_t* system, /*!< in: tablespace memory cache */
        ulint type) /*!< in: OS_FILE_WRITE or OS_FILE_READ; marks
                                the node as modified if
                                type == OS_FILE_WRITE */
{
        ut_ad(node);
        ut_ad(system);
        ut_ad(mutex_own(&(system->mutex)));

        ut_a(node->n_pending > 0); <-- failure point

        node->n_pending--;
{noformat}

An attached debugger gave the following backtrace:

{noformat}
(gdb) bt full
#0 0x0000007fb1d44d18 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        _sys_result = 0
        pd = 0x7fa2fff1a0
        pid = <optimised out>
        selftid = 5661
#1 0x0000007fb1d4818c in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f00000000,
            sa_sigaction = 0x7f00000000}, sa_mask = {__val = {548445445976,
              404, 1, 404, 0, 366924161824, 548195526816, 366921716804,
              366927454208, 3, 0, 548434850424, 366927869208, 366936283408,
              548195527344, 548434850424}}, sa_flags = 5288,
          sa_restorer = 0xa2fff1a0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x000000556e3d1448 in fil_node_complete_io (node=<optimised out>,
    system=<optimised out>, type=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5288
No locals.
#3 0x000000556e3db800 in fil_aio_wait (segment=segment@entry=3)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/fil/fil0fil.c:5705
        ret = <optimised out>
        fil_node = 0x7fb14a0e78
        message = 0x7fa54d4350
        type = 10
        space_id = 0
#4 0x000000556e3592a4 in io_handler_thread (arg=<optimised out>)
    at /build/buildd/mariadb-5.5-5.5.36/storage/xtradb/srv/srv0start.c:486
        segment = 3
#5 0x0000007fb220ae2c in start_thread (arg=0x7fa2fff1a0)
    at pthread_create.c:314
        pd = 0x7fa2fff1a0
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {548195529120,
                548981639624, 548449456128, 0, 548449452032, 548195529312,
                548195527344, 548443965168, 8388608, 548449472512,
                548195527056, 13770210553321828185, 0, 13770210553602140361,
                0, 0, 0, 0, 0, 0, 0, 0}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x7fb220ad7c <start_thread>, 0x7fa2fff1a0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = -1306481284}}}
        not_first_call = 0
        pagesize_m1 = <optimised out>
        sp = <optimised out>
        freesize = <optimised out>
        __PRETTY_FUNCTION__ = "start_thread"
#6 0x0000007fb1dd9c40 in clone ()
    at ../ports/sysdeps/unix/sysv/linux/aarch64/nptl/../clone.S:96
No locals.
{noformat}

Once the daemon crashed we've sometimes been unable to start it again without wiping out the database and re-installing it.

Having done some digging it is apparent that there is a problem in the mutex_exit code path; in particular at:
http://bazaar.launchpad.net/~maria-captains/maria/5.5/view/head:/storage/xtradb/include/sync0sync.ic#L106

A load-acquire is used to exit the mutex rather than a store-release. This leads to unpredictable results for architectures with a weak memory model.

We have the following in program order:

# mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
# protected work
# mutex_exit -> load-acquire lock, set it to 0 regardless.

However, the following sequence of events can be observed by another core:
# mutex_enter -> load-acquire lock, loop until it is 0, then set to 1 relaxed
# some of the protected work
# mutex_exit -> load-acquire lock, set it to 0 regardless.
# some more of the protected work (not protected).

The above can (and has for our test system) lead to severe data corruption; that prevents the daemon from even re-starting.

I've attached an emergency patch that re-introduces {{__ sync_lock_release}} to release the mutex. This fixes the crash and data corruption issues for me, but I understand from comments in the code that there were issues with this function in the past? Could the gcc intrinsics be moved over to the {{__ atomic_*}} functions? Ideally:

To acquire the lock:
{noformat}
__atomic_exchange_n(ptr, (byte) new_val, __ATOMIC_ACQUIRE)
{noformat}

To release the lock:
{noformat}
__atomic_store_n(ptr, (byte) new_val, __ATOMIC_RELEASE)
{noformat}

(which also worked on my test system).

I believe this issue may affect other versions of MariaDB, but I've only tested 5.5.36.

Cheers,
--
Steve Capper

Sergey Vojtovich added a comment - 2014-08-20 17:37

This is a duplicate of ~~MDEV-6450~~, which was fixed in 10.0. We'll consider this task as a back port to 5.5 request.

Sergey Vojtovich added a comment - 2014-08-20 17:37 This is a duplicate of MDEV-6450 , which was fixed in 10.0. We'll consider this task as a back port to 5.5 request.

Sergey Vojtovich made changes - 2014-08-20 17:38

Link

This issue duplicates ~~MDEV-6450~~ [ ~~MDEV-6450~~ ]

Elena Stepanova made changes - 2014-08-21 20:53

Fix Version/s

5.5 [ 15800 ]

Sergei Golubchik made changes - 2014-08-22 18:35

Affects Version/s		5.5.36 [ 14600 ]
Affects Version/s	5.5.36-galera [ 14801 ]

Sergei Golubchik made changes - 2014-08-22 18:35

Assignee

Sergey Vojtovich [ svoj ]

Sergey Vojtovich made changes - 2014-08-29 17:39

Link

This issue is part of ~~MDEV-6478~~ [ ~~MDEV-6478~~ ]

Sergey Vojtovich added a comment - 2014-08-29 17:40

Thanks for this bug report. Fixed in 5.5.40:

revno: 4274

revision-id: svoj@mariadb.org-20140829121411-1vcbqxr98r9tbgtp

parent: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v

committer: Sergey Vojtovich <svoj@mariadb.org>

branch nick: 5.5

timestamp: Fri 2014-08-29 16:14:11 +0400

message:

  Backport from 10.0:

  MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64

  This problem affects only debug builds on PPC64.

  There are at least two race conditions around

  rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:

  - rw_lock_debug_waiters was loaded/stored without setting

    appropriate locks/memory barriers.

  - there is a gap between calls to os_event_reset() and

    os_event_wait() and in such case we're supposed to pass

    return value of the former to the latter.

  Fixed by replacing self-cooked spinlocks with system mutexes.

  These days system mutexes offer much better performance. OTOH

  performance is not that critical for debug builds.

------------------------------------------------------------

revno: 4273

revision-id: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v

parent: sergii@pisem.net-20140825145819-i643q99zm0a6hjg2

committer: Sergey Vojtovich <svoj@mariadb.org>

branch nick: 5.5

timestamp: Fri 2014-08-29 16:02:46 +0400

message:

  Backport from 10.0:

  MDEV-6450 - MariaDB crash on Power8 when built with advance tool

              chain

  InnoDB mutex_exit() function calls __sync_test_and_set() to release

  the lock. According to manual this function is supposed to create

  "acquire" memory barrier whereas in fact we need "release" memory

  barrier at mutex_exit().

  The problem isn't repeatable with gcc because it creates

  "acquire-release" memory barrier for __sync_test_and_set().

  ATC creates just "acquire" barrier.

  Fixed by creating proper barrier at mutex_exit() by using

  __sync_lock_release() instead of __sync_test_and_set().

Sergey Vojtovich added a comment - 2014-08-29 17:40 Thanks for this bug report. Fixed in 5.5.40: revno: 4274 revision-id: svoj@mariadb.org-20140829121411-1vcbqxr98r9tbgtp parent: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Fri 2014-08-29 16:14:11 +0400 message: Backport from 10.0: MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64 This problem affects only debug builds on PPC64. There are at least two race conditions around rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit: - rw_lock_debug_waiters was loaded/stored without setting appropriate locks/memory barriers. - there is a gap between calls to os_event_reset() and os_event_wait() and in such case we're supposed to pass return value of the former to the latter. Fixed by replacing self-cooked spinlocks with system mutexes. These days system mutexes offer much better performance. OTOH performance is not that critical for debug builds. ------------------------------------------------------------ revno: 4273 revision-id: svoj@mariadb.org-20140829120246-a3piah2b0u8fu68v parent: sergii@pisem.net-20140825145819-i643q99zm0a6hjg2 committer: Sergey Vojtovich <svoj@mariadb.org> branch nick: 5.5 timestamp: Fri 2014-08-29 16:02:46 +0400 message: Backport from 10.0: MDEV-6450 - MariaDB crash on Power8 when built with advance tool chain InnoDB mutex_exit() function calls __sync_test_and_set() to release the lock. According to manual this function is supposed to create "acquire" memory barrier whereas in fact we need "release" memory barrier at mutex_exit(). The problem isn't repeatable with gcc because it creates "acquire-release" memory barrier for __sync_test_and_set(). ATC creates just "acquire" barrier. Fixed by creating proper barrier at mutex_exit() by using __sync_lock_release() instead of __sync_test_and_set().

Sergey Vojtovich made changes - 2014-08-29 17:40

Fix Version/s		5.5.40 [ 17100 ]
Fix Version/s	5.5 [ 15800 ]
Resolution		Fixed [ 1 ]
Status	Open [ 1 ]	Closed [ 6 ]

dann frazier added a comment - 2015-03-04 17:07

Note that this has reappeared for AArch64 in 5.5.41:
https://mariadb.atlassian.net/browse/MDEV-7658

dann frazier added a comment - 2015-03-04 17:07 Note that this has reappeared for AArch64 in 5.5.41: https://mariadb.atlassian.net/browse/MDEV-7658

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:51

Workflow

MariaDB v2 [ 52922 ]

MariaDB v3 [ 64618 ]

Sergey Vojtovich made changes - 2019-03-31 17:30

Link

This issue is duplicated by ~~MDEV-6436~~ [ ~~MDEV-6436~~ ]

Sergei Golubchik made changes - 2021-12-06 21:40

Workflow

MariaDB v3 [ 64618 ]

MariaDB v4 [ 148144 ]

MariaDB Server

Incorrect barrier logic in mutex_exit path causes data corruption on AArch64

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration