MariaDB Server / MDEV-24014

Assertion failure in btr0pcur.cc during SELECT operation

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.3.21, 10.4.14
    • Fix Version/s: None
    • Component/s: None
    • Environment: Fedora 32 Server Edition, kernel version 5.6.19-300.fc32.x86_64

    Description

      When I run the following query on a Nextcloud database, I receive a "MySQL server has gone away" error. Running the query manually causes a brief loss of connection to the database until systemd restarts the crashed service.

      Query:

      SELECT COUNT(*) AS `num_entries` FROM `oc_filecache`
      

      Error in mariadb.log:

      2020-10-22 23:25:58 0x7f9f8e451700  InnoDB: Assertion failure in file /builddir/build/BUILD/mariadb-10.4.14/storage/innobase/btr/btr0pcur.cc line 494
      InnoDB: Failing assertion: btr_page_get_prev(next_page) == btr_pcur_get_block(cursor)->page.id.page_no()
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      201022 23:25:58 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.4.14-MariaDB
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=2
      max_threads=153
      thread_count=8
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467757 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f9f398079c8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f9f8e44f588 thread_stack 0x49000
      /usr/libexec/mysqld(my_print_stacktrace+0x41)[0x564cb44971d1]
      /usr/libexec/mysqld(handle_fatal_signal+0x4e5)[0x564cb3fc5e95]
      sigaction.c:0(__restore_rt)[0x7f9f8f010a90]
      :0(__GI_raise)[0x7f9f8e6f49e5]
      :0(__GI_abort)[0x7f9f8e6dd895]
      /usr/libexec/mysqld(+0x5affde)[0x564cb3ca2fde]
      /usr/libexec/mysqld(+0x5b467f)[0x564cb3ca767f]
      /usr/libexec/mysqld(+0xb656cd)[0x564cb42586cd]
      /usr/libexec/mysqld(+0xaa9e50)[0x564cb419ce50]
      /usr/libexec/mysqld(_ZN7handler13ha_index_nextEPh+0x59)[0x564cb3fcbf59]
      /usr/libexec/mysqld(+0x704190)[0x564cb3df7190]
      /usr/libexec/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x1d3)[0x564cb3dea773]
      /usr/libexec/mysqld(_ZN4JOIN10exec_innerEv+0xe39)[0x564cb3e14579]
      /usr/libexec/mysqld(_ZN4JOIN4execEv+0x37)[0x564cb3e14867]
      /usr/libexec/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x136)[0x564cb3e12826]
      /usr/libexec/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x14b)[0x564cb3e1328b]
      /usr/libexec/mysqld(+0x6b1af1)[0x564cb3da4af1]
      /usr/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x4cd9)[0x564cb3db1039]
      /usr/libexec/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1df)[0x564cb3db36df]
      /usr/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1014)[0x564cb3db5824]
      /usr/libexec/mysqld(_Z10do_commandP3THD+0x110)[0x564cb3db7560]
      /usr/libexec/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1f2)[0x564cb3ea3a22]
      /usr/libexec/mysqld(handle_one_connection+0x3f)[0x564cb3ea3b7f]
      pthread_create.c:0(start_thread)[0x7f9f8f005432]
      :0(__GI___clone)[0x7f9f8e7b9913]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f9f39855660): SELECT COUNT(*) AS `num_entries` FROM `oc_filecache`
      Connection ID (thread ID): 8
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units
      Max cpu time              unlimited            unlimited            seconds
      Max file size             unlimited            unlimited            bytes
      Max data size             unlimited            unlimited            bytes
      Max stack size            8388608              unlimited            bytes
      Max core file size        unlimited            unlimited            bytes
      Max resident set          unlimited            unlimited            bytes
      Max processes             190227               190227               processes
      Max open files            32190                32190                files
      Max locked memory         65536                65536                bytes
      Max address space         unlimited            unlimited            bytes
      Max file locks            unlimited            unlimited            locks
      Max pending signals       190227               190227               signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Max realtime timeout      unlimited            unlimited            us
      Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
       
      2020-10-22 23:26:04 0 [Note] RocksDB: 2 column families found
      2020-10-22 23:26:04 0 [Note] RocksDB: Column Families at start:
      2020-10-22 23:26:04 0 [Note]   cf=default
      2020-10-22 23:26:04 0 [Note]     write_buffer_size=67108864
      2020-10-22 23:26:04 0 [Note]     target_file_size_base=67108864
      2020-10-22 23:26:04 0 [Note]   cf=__system__
      2020-10-22 23:26:04 0 [Note]     write_buffer_size=67108864
      2020-10-22 23:26:04 0 [Note]     target_file_size_base=67108864
      2020-10-22 23:26:04 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
      2020-10-22 23:26:04 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
      2020-10-22 23:26:04 0 [Note] MyRocks storage engine plugin has been successfully initialized.
      Thu Oct 22 23:26:04 2020 PerconaFT recovery starting in env /var/lib/mysql/
      Thu Oct 22 23:26:04 2020 PerconaFT recovery scanning backward from 4362730
      Thu Oct 22 23:26:04 2020 PerconaFT recovery bw_end_checkpoint at 4362730 timestamp 1603423531957429 xid 4362726 (bw_newer)
      Thu Oct 22 23:26:04 2020 PerconaFT recovery bw_begin_checkpoint at 4362726 timestamp 1603423531957407 (bw_between)
      Thu Oct 22 23:26:04 2020 PerconaFT recovery turning around at begin checkpoint 4362726 time 22
      Thu Oct 22 23:26:04 2020 PerconaFT recovery starts scanning forward to 4362730 from 4362726 left 4 (fw_between)
      Thu Oct 22 23:26:04 2020 PerconaFT recovery closing 2 dictionaries
      Thu Oct 22 23:26:04 2020 PerconaFT recovery making a checkpoint
      Thu Oct 22 23:26:04 2020 PerconaFT recovery done
      2020-10-22 23:26:04 0 [Note] InnoDB: Using Linux native AIO
      2020-10-22 23:26:04 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2020-10-22 23:26:04 0 [Note] InnoDB: Uses event mutexes
      2020-10-22 23:26:04 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2020-10-22 23:26:04 0 [Note] InnoDB: Number of pools: 1
      2020-10-22 23:26:04 0 [Note] InnoDB: Using SSE2 crc32 instructions
      2020-10-22 23:26:04 0 [Note] mysqld: O_TMPFILE is not supported on /var/tmp (disabling future attempts)
      2020-10-22 23:26:04 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
      2020-10-22 23:26:04 0 [Note] InnoDB: Completed initialization of buffer pool
      2020-10-22 23:26:04 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      2020-10-22 23:26:04 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=10419653722
      2020-10-22 23:26:05 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
      2020-10-22 23:26:05 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2020-10-22 23:26:05 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2020-10-22 23:26:05 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2020-10-22 23:26:05 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2020-10-22 23:26:05 0 [Note] InnoDB: Waiting for purge to start
      2020-10-22 23:26:05 0 [Note] InnoDB: 10.4.14 started; log sequence number 10419653731; transaction id 33373168
      2020-10-22 23:26:05 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
      2020-10-22 23:26:05 0 [Note] Plugin 'FEEDBACK' is disabled.
      2020-10-22 23:26:05 0 [Note] Recovering after a crash using tc.log
      2020-10-22 23:26:05 0 [Note] Starting crash recovery...
      2020-10-22 23:26:05 0 [Note] Crash recovery finished.
      2020-10-22 23:26:05 0 [Note] InnoDB: Buffer pool(s) load completed at 201022 23:26:05
      2020-10-22 23:26:05 0 [Note] Server socket created on IP: '::'.
      2020-10-22 23:26:05 0 [Note] Reading of all Master_info entries succeeded
      2020-10-22 23:26:05 0 [Note] Added new Master_info '' to hash table
      2020-10-22 23:26:05 0 [Note] /usr/libexec/mysqld: ready for connections.
      Version: '10.4.14-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
      

          Activity

            marko Marko Mäkelä added a comment -

            The assertion failure says that the sibling page links for the B-tree are corrupted.

            I found a similar ticket, MDEV-14170, where someone had apparently done something questionable with the redo log, causing B-tree corruption.

            You should try to find out the source of the corruption. Could it be MDEV-19916?
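
            (For illustration: the failing assertion checks the doubly linked list that chains together the pages of each level of a B-tree. Every InnoDB page stores its siblings' page numbers in the FIL header, FIL_PAGE_PREV at byte offset 8 and FIL_PAGE_NEXT at byte offset 12, both 32-bit big-endian, and a page's successor must point back at it. The following is a minimal standalone sketch, not InnoDB code, that walks a raw .ibd file copied from a shut-down server and reports broken back-links. It assumes the default 16 KiB page size and an unencrypted, uncompressed file-per-table tablespace.)

            // sibling_check.cc: verify that every index page's FIL_PAGE_NEXT
            // pointer is matched by the target page's FIL_PAGE_PREV pointer.
            // Standalone sketch, NOT InnoDB code; assumes 16 KiB pages and an
            // unencrypted, uncompressed .ibd file.
            #include <cstdint>
            #include <cstdio>
            #include <vector>

            static const unsigned PAGE_SIZE = 16384;
            static const uint32_t FIL_NULL = 0xFFFFFFFF; // "no sibling" marker

            // The FIL header stores page numbers as 32-bit big-endian values.
            static uint32_t read_be32(const unsigned char* p) {
              return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
                     (uint32_t(p[2]) << 8) | uint32_t(p[3]);
            }

            int main(int argc, char** argv) {
              if (argc != 2) {
                std::fprintf(stderr, "usage: %s file.ibd\n", argv[0]);
                return 1;
              }
              std::FILE* f = std::fopen(argv[1], "rb");
              if (!f) {
                std::perror("fopen");
                return 1;
              }
              unsigned char page[PAGE_SIZE];

              // Pass 1: record every page's PREV pointer (byte offset 8).
              std::vector<uint32_t> prev;
              while (std::fread(page, 1, PAGE_SIZE, f) == PAGE_SIZE)
                prev.push_back(read_be32(page + 8));

              // Pass 2: for each B-tree page with a NEXT pointer (byte offset
              // 12), check that the successor's PREV pointer points back here.
              // This is the invariant behind the failing assertion.
              std::rewind(f);
              for (uint32_t no = 0;
                   std::fread(page, 1, PAGE_SIZE, f) == PAGE_SIZE; no++) {
                uint16_t type = uint16_t(page[24]) << 8 | page[25];
                if (type != 0x45BF) // FIL_PAGE_INDEX: only B-tree pages
                  continue;
                uint32_t next = read_be32(page + 12);
                if (next != FIL_NULL && next < prev.size() && prev[next] != no)
                  std::printf("page %u -> next %u, but page %u points back to %u\n",
                              no, next, next, prev[next]);
              }
              std::fclose(f);
              return 0;
            }

            (A page flagged by such a check is exactly what would trip the btr_page_get_prev(next_page) == btr_pcur_get_block(cursor)->page.id.page_no() assertion once a scan crosses from that page to its successor.)
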
            Mik Mik added a comment -

            Unfortunately this issue started roughly 7-8 months ago, but because it was only a single specific SELECT statement that was crashing, I had only filed a bug with Nextcloud. I assumed they would have more intricate knowledge of the various queries Nextcloud issues that could cause corruption such as this. I no longer have log files from around when the problem would have occurred.

            I would assume that the issue arose during a Nextcloud version upgrade, so I'll need to do some digging regarding the queries run during an upgrade operation around that time. The only complication is that the corruption could have been introduced even earlier than that, since the issue is only triggered when viewing a specific administrative panel.

            Is it possible to repair this issue in the database manually? I've fixed smaller problems such as duplicate primary keys by hand in the past, but those seem minor compared to this. Additionally, I am not nearly familiar enough with the internals to investigate this completely on my own.

            marian.cerny Marián Černý added a comment - edited

            Same problem here (or at least a very similar one).

            2021-11-15 10:14:27 0x81ddf3400  InnoDB: Assertion failure in file /tmp/portbuild/usr/ports/databases/mariadb105-server/work/mariadb-10.5.12/storage/innobase/btr/btr0pcur.cc line 520
            InnoDB: Failing assertion: page_is_comp(next_page) == page_is_comp(page)
            [...]
            211115 10:14:27 [ERROR] mysqld got signal 6 ;
            [...]
            Server version: 10.5.12-MariaDB
            [...]
            Query (0x8218bf830): SELECT * FROM `emailsTemplates` ORDER BY `country`, `name`
            

            SELECT COUNT(*) FROM emailsTemplates

            also crashes the server with the same error.

            I didn't check whether these refer to the same source line (but I guess they do):

            mariadb-10.4.14/storage/innobase/btr/btr0pcur.cc line 494
            mariadb-10.5.12/storage/innobase/btr/btr0pcur.cc line 520

            Upgrading to a newer version didn't help:

            mariadb-10.5.13/storage/innobase/btr/btr0pcur.cc line 520

            There had been no power outages (nor crashes) on the server previously.
            The table became corrupted some days ago. (There probably wasn't any attempt to read it for a few days.) A filesystem snapshot from 6 days ago produced the same problem; the previous snapshot from 14 days ago worked fine.

            The affected table is small, with only 22 rows, though it does contain a TEXT column. Its size is 0.17 MB.

            There was no bit rot on the SSD. The underlying filesystem is ZFS with two disks in a mirror (ZFS checksums all blocks).

            The server does not have ECC RAM. However, I have seen a similar issue twice on different setups with master-master replication, where one server was crashing on an update from the other. Those servers had ECC RAM.

            So I suspect the data was corrupted by an internal error in MariaDB, some time before the crash.

            marian.cerny Marián Černý added a comment - edited

            I checked the older logs from the similar issue I mentioned ("with master-master replication where one server was crashing on an update from the other") and found out that it was a different issue.

            The other issue was:

            2021-08-20 10:14:56 0x820970900  InnoDB: Assertion failure in file /tmp/portbuild/usr/ports/databases/mariadb105-server/work/mariadb-10.5.12/storage/innobase/rem/rem0rec.cc line 877
            

            Then there were a lot of signal 10 crashes without an assertion failure (i.e. without a reference to a source code line) while I was trying to recover.

            And then later I also got a crash in btr0pcur.cc, but on a different line:

            2021-08-20 11:10:57 0x81e96da00  InnoDB: Assertion failure in file /tmp/portbuild/usr/ports/databases/mariadb105-server/work/mariadb-10.5.12/storage/innobase/btr/btr0pcur.cc line 520
            InnoDB: Failing assertion: page_is_comp(next_page) == page_is_comp(page)
            Query (0x821b1ad20): CHECKSUM TABLE `prod`.`hiding_rule`
            

            So the issue from the original bug report happened to me in just one instance (but the SELECT always crashed until the data from the broken table was replaced from a backup).


            marko Marko Mäkelä added a comment -

            marian.cerny, the assertions that fail for you indicate that a flag which is clear in the original InnoDB format (the name ROW_FORMAT=REDUNDANT was introduced for it in MySQL 5.0.3) differs between two pages of the same index tree. I remember examining a case like that years ago. One of the pages was filled with NUL bytes.

            I think that in this case, the corrupted page may be a leaf page of the clustered index. CHECKSUM TABLE should not access the internal pages of the B-tree, except for the initial path from the root page to the first leaf page. In other words, some data could be lost even if the corrupted page was manually fixed in the data file.

            To my understanding, when the server is started from a file system snapshot copy of a running server’s data directory, crash recovery will typically be invoked. We do cover crash recovery in our internal testing, and DML operations are expected to be crash-safe. (DDL is only crash-safe starting with 10.6.)

            It is theoretically possible that the data was corrupted due to a bug in the crash recovery logic. It might also be a bug in the file system snapshot. I don’t think that we perform any internal stress testing using file system snapshots.

            Without having something that repeats the problem (corrupts an initially sound database) there is not much we can do.

            One last note: If you use innodb_force_recovery=6 or remove ib_logfile0 to ‘fix’ recovery issues, or you see any messages like log sequence number … is in the future, then you can expect everything to be thoroughly corrupted.

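
            (As a rough illustration of what page_is_comp() compares: the flag is bit 0x8000 of the PAGE_N_HEAP field, at byte offset 42 of each page, i.e. the 38-byte FIL header plus 4 bytes into the index page header. The following standalone sketch, not InnoDB code and under the same assumptions as the sketch above, prints the flag for every B-tree page of a raw .ibd file.)

            // comp_flag_check.cc: print the row-format flag of every index page
            // in a raw .ibd file. InnoDB's page_is_comp() tests bit 0x8000 of
            // the PAGE_N_HEAP field at byte offset 42. Standalone sketch, NOT
            // InnoDB code; assumes 16 KiB pages, no compression or encryption.
            #include <cstdint>
            #include <cstdio>

            static const unsigned PAGE_SIZE = 16384;

            int main(int argc, char** argv) {
              if (argc != 2) {
                std::fprintf(stderr, "usage: %s file.ibd\n", argv[0]);
                return 1;
              }
              std::FILE* f = std::fopen(argv[1], "rb");
              if (!f) {
                std::perror("fopen");
                return 1;
              }
              unsigned char page[PAGE_SIZE];
              for (uint32_t no = 0;
                   std::fread(page, 1, PAGE_SIZE, f) == PAGE_SIZE; no++) {
                uint16_t type = uint16_t(page[24]) << 8 | page[25];
                if (type != 0x45BF) // FIL_PAGE_INDEX: other pages have no rows
                  continue;
                uint16_t n_heap = uint16_t(page[42]) << 8 | page[43];
                std::printf("page %u: %s\n", no,
                            (n_heap & 0x8000) ? "COMPACT" : "REDUNDANT");
              }
              std::fclose(f);
              return 0;
            }

            (All index pages of one table should report the same value. Note that a page overwritten with NUL bytes also loses its FIL_PAGE_INDEX type marker, so such a page would silently disappear from this report; a gap can be as telling as a mismatched flag.)
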

            marian.cerny Marián Černý added a comment -

            @Marko, thanks for looking into my comments.

            To sum it up as I understand it now: the error Assertion failure btr0pcur.cc line 520 (page_is_comp(next_page) == page_is_comp(page)) from my second comment might be a problem that resulted from crash recovery. There was no recovery from a filesystem snapshot in this case.
            However, there was a crash (the primary problem): Assertion failure in rem0rec.cc line 877.
            There had never been a crash or a crash recovery on this server before (nor a recovery from a filesystem snapshot).

            In my first comment I reported an error similar to the initial bug report in the Description by @Mik: Assertion failure in btr0pcur.cc line 520 (page_is_comp(next_page) == page_is_comp(page)). This also happened on a server where there had been no crash or crash recovery. Only after the problem occurred did I try to get the data back from a "backup", i.e. a filesystem snapshot. Although starting from a snapshot did trigger crash recovery while I was retrieving the older data, I doubt the assertion was caused by that recovery: the assertion was present in the few hourly snapshots I tested and in all 6 daily snapshots. The snapshot that worked was the monthly snapshot (from 14 days ago).

            I understand that this assertion is a sign of a problem that occurred earlier, and that from this state it is not easy (or even possible) to find out what happened. I do not have a way to reproduce either of the two problems.

            I guess each of the two problems could have any of these three causes:

            • bit rot (data going bad on disk)
            • memory issue
            • internal error in MariaDB

            I doubt it is bit rot, because the ZFS checksums were OK.
            In one instance ECC RAM was used, so at least the problem from my second comment (rem0rec.cc:877) should not have been caused by a memory issue.
            For the other issue (btr0pcur.cc:520), the server does not have ECC RAM. I checked the RAM with memtester and did not find any problems, but the test runs on a live system and cannot test all of the RAM (so a memory issue remains possible).


            People

              Assignee: Unassigned
              Reporter: Mik
              Votes: 1
              Watchers: 4
