[MDEV-21982] despite the use of new Streaming Replication, cluster crash on large data operations Created: 2020-03-19  Updated: 2023-10-19

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.12
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Levieux stéphane Assignee: Ramesh Sivaraman
Resolution: Unresolved Votes: 0
Labels: crash, need_verification
Environment:

Galera cluster, 3 nodes on debian 9, provider version 26.4.3(r4535)


Attachments: File my.cnf     PNG File screenshot-1.png    

 Description   

Hello,
For several months we have had a problem with large data operations (a delete followed by a load of up to 250 000 rows) that randomly crashes the entire cluster: no more queries work, the cluster is effectively "locked", and we have to bootstrap a node to recover.
This occurs during the day, while other transactions from normal website traffic are still arriving; the same large data operations run at night without any problem.

We have been running these large operations for several years without problems. We don't know whether the issue could be related to wsrep_load_data_splitting being off by default since several MariaDB versions.
So, for one month we have been using the new "streaming replication" feature of Galera 4:
SET SESSION wsrep_trx_fragment_unit='rows';
SET SESSION wsrep_trx_fragment_size=10000;

We enable it in the program that performs the large data operation.

Streaming replication does seem to be used correctly, because the wsrep_streaming_log table grows every day (in size, not in rows, since rows are deleted automatically).
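For reference, a minimal sketch of the pattern we use around the load (the table and file names below are only placeholders, not our real ones):

-- Enable streaming replication for this session only, fragmenting every 10 000 rows
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;

START TRANSACTION;
DELETE FROM stats_example;                          -- placeholder table name
LOAD DATA LOCAL INFILE '/tmp/stats_example.csv'     -- placeholder file name
  REPLACE INTO TABLE stats_example
  FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n';
COMMIT;

-- From another session, while the load is still running, the replicated fragments are visible in:
SELECT COUNT(*) FROM mysql.wsrep_streaming_log;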

Despite the use of streaming replication nothing has changed: today a table of 245 000 records was loaded successfully, and 3 minutes later the entire cluster crashed.

How can we solve this problem, which occurs randomly (sometimes 3 times in a week, sometimes once a month)?

Thank you



 Comments   
Comment by Levieux stéphane [ 2020-03-31 ]

After several months, it seems we have found the solution.
What put us on the track is that a LOAD DATA LOCAL INFILE of about 100 000 records crashed the cluster on some tables, while other, bigger transactions worked fine on other tables.
The affected tables (there was more than one) were in COMPACT row format, while the other tables are in DYNAMIC format. I read in the documentation that DYNAMIC has been the default format for a few versions now.
These tables had been in COMPACT format for ten years without any problem for LOAD DATA INFILE transactions, so something has surely changed a few months ago.
One week ago we migrated all our tables to DYNAMIC format (see the sketch below); there has been no crash of the Galera cluster since, and flow control looks healthier while these transactions are running.
(As a reminder: we had tried the new streaming replication without success; it works, but it does not solve the problem.)
To be continued...
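For anyone wanting to run the same check, a rough sketch (the schema and table names are placeholders):

-- List InnoDB tables still using the COMPACT row format
SELECT table_name, row_format
  FROM information_schema.tables
 WHERE table_schema = 'mydb' AND engine = 'InnoDB' AND row_format = 'Compact';

-- Rebuild one of them with the DYNAMIC format (the current default)
ALTER TABLE mydb.stats_example ROW_FORMAT=DYNAMIC;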

Comment by Levieux stéphane [ 2020-04-14 ]

Despite our hopes, the cluster crashed again last week, always on the same data transfer.

Our log:
08/04/2020 09:27:56,227 INFO [main] : - statsgrossistes_e : 81703 records inserted into stats_grossistes_e
08/04/2020 09:27:56,905 INFO [main] : 0 records deleted from table stats_grossistes_d
08/04/2020 09:28:12,037 INFO [main] : - statsgrossistes_d : 148752 records inserted into stats_grossistes_d
08/04/2020 09:28:12,048 INFO [main] : Transfer finished

80 000 records for the header and 150 000 records for the detail; 1 or 2 minutes after the transfer the cluster freezes, and the last commit on the 3 nodes no longer moves, even after waiting several minutes.

There is absolutely nothing in the MariaDB logs (on the 3 nodes).
I attached the my.cnf.
We have no more ideas to test for this problem. I thought the new streaming replication in Galera 4 would have solved it (this data transfer has existed for several years without problems, but has been crashing the cluster for several months).

Comment by Jan Lindström (Inactive) [ 2020-04-14 ]

stepan.patryshev Can you test with 3 nodes and one table where you insert a lot of rows using streaming replication? See the attached my.cnf for the other parameters. One possible way to test this is to use sysbench and simply use enough rows in the prepare phase.

Comment by Levieux stéphane [ 2020-04-22 ]

We had a new crash of the Galera cluster on the same transfer on 13/04.
Today I added a trace of the record count in wsrep_streaming_log just before we commit the transaction:

" Streaming replication : 141 records for 81918 records transfered
Streaming replication : 136 records for 81451 records transfered "

Our streaming replication parameters are:
wsrep_trx_fragment_unit = rows
wsrep_trx_fragment_size = 100
so approximately 140 records * 100 = 14 000 rows, and not 80 000.

the "LOAD DATA LOCAL INFILE we use is with "REPLACE INTO TABLE " parameter
I had noticed this behaviour previously , it seems the replace don't replace anything if all values are the same ?
So we can conclude there is 14 000 records effectivly inserted/updated ? The replace parameters and tests values should surely have a cost for the process ?
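For context, the trace is essentially a count of rows in the streaming log table just before COMMIT; a minimal sketch of that check (the interpretation in the comments is my assumption that one row corresponds to one replicated fragment):

-- Run just before COMMIT, from a session that can see mysql.wsrep_streaming_log;
-- each row is one replicated fragment of the still-open streaming transaction.
SELECT COUNT(*) AS fragments FROM mysql.wsrep_streaming_log;

-- With wsrep_trx_fragment_size = 100 rows per fragment, 141 fragments correspond to
-- roughly 141 * 100 = 14 100 replicated row changes, far fewer than the 81 918 rows
-- read from the input file.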

Comment by Levieux stéphane [ 2020-07-27 ]

Since April, still the same problem... we have tried so many things to solve it during these months.
It always occurs on the same LOAD DATA LOCAL INFILE; the server crashes suddenly and we have to recreate the cluster (galera_new_cluster).
What is strange is that we have a bigger data import that never crashes the cluster (the Java import program behind it is the same).

However, I notice that one of the two tables involved in the crash is bigger (3M records); it is the only difference.
This morning it crashed with "only" 86954 records imported.
We have only the primary key index on this table.
Do you have any suggestions?
I would like to test transferring the same data in parallel into a duplicate empty table, to see whether it is the existing data (3M rows) that causes the problem.
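Something like the following is what I have in mind (the table and file names are placeholders for the real ones):

-- Create an empty structural copy of the production table
CREATE TABLE stats_example_copy LIKE stats_example;

-- Run the same import against the empty copy
LOAD DATA LOCAL INFILE '/tmp/stats_example.csv'
  REPLACE INTO TABLE stats_example_copy
  FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n';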

( we are on 10.4.13 now)

Comment by Jan Lindström (Inactive) [ 2020-07-27 ]

Hi, if you can share the infile and table definition you use for the LOAD DATA, along with the cluster config files, it could help us find the reason.

Comment by Levieux stéphane [ 2020-07-27 ]

Of course

Here is the table definition:

Here is my config file:

# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# escpecially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
#
# * Basic Settings
#
sql_mode = 'ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'

user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages = en_US
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
# bind-address = 127.0.0.1
#
# * Fine Tuning
#
thread_concurrency = 8
thread_cache_size = 128

performance_schema=on
skip-name-resolve
max_connections = 450
connect_timeout = 5
wait_timeout = 180
max_allowed_packet = 256M
thread_cache_size = 128
sort_buffer_size = 4M
bulk_insert_buffer_size = 16M
tmp_table_size = 32M
max_heap_table_size = 32M
key_buffer_size = 64K
#
# * MyISAM
#
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched. On error, make copy and try a repair.
myisam_recover_options = BACKUP
# open-files-limit = 2000
table_open_cache = 4096
concurrent_insert = 2
#
# * Query Cache Configuration
#
# Cache only tiny result sets, so we can fit more in the query cache.
query_cache_limit = 128K
query_cache_size = 0
query_cache_type = 0
# for more write intensive setups, set to DEMAND or OFF
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
# general_log_file = /var/log/mysql/mysql.log
# general_log = 1
#
# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.
#
# we do want to know about network errors and such
log_warnings = 2
#
# Enable the slow query log to see queries with especially long duration
#slow_query_log[={0|1}]
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 2
# log_slow_rate_limit = 1000
log_slow_verbosity = query_plan
log_slow_admin_statements=ON
log_queries_not_using_indexes=OFF
log_throttle_queries_not_using_indexes=1
default_storage_engine = InnoDB
# you can't just change log file size, requires special procedure
# innodb_log_file_size = 50M
innodb_buffer_pool_size = 10G
innodb_file_per_table = 1
#innodb_io_capacity = 400
innodb_flush_method = O_DIRECT
# InnoDB
default-storage-engine = 'InnoDB'
innodb-stats-on-metadata = 0
innodb-stats-sample-pages = 32

table-definition-cache = 2000
table-open-cache = 2048

# transaction-isolation = READ-COMMITTED
# To be set to 0 if not multi transactional storage engine
innodb-buffer-pool-instances = 8
innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
#
# - Set the following to maximum 60/70% of physical RAM.
# innodb_max_dirty_pages_pct should be compute with redo log size in mind: It's recommended to set the dirty pages smaller than redo log space. The more you have dirty page, the less io will be produced on disk.
innodb-max-dirty-pages-pct = 50
innodb-file-per-table = 1
# Save and restore buffer pool to be transparent for user
# innodb_flush_method = O_DIRECT
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb-log-buffer-size = 128M
# innodb-log-file-size = 1024M
# innodb_purge_threads = 1
# - Raid1: 200. Raid10: 200 x write arrays. SSD: 5000. FusionIO: 20000.
# innodb_io_capacity = 200
# - SSD & FusionIO can flush pages in random disk order, keep default for spinning disks
# innodb_flush_neighbors = 0
# - Increase these numbers to increase global throughput on SAN (16,32,64)
innodb-read-io-threads = 8
innodb-write-io-threads = 8
# - Set this to 2-4 times # of CPUs, maximum recommended is 64.
# - iostat report io service time: io_capacity / service_time is a good starting point for innodb_write_io_threads
# - innodb_read_io_threads starting point monitoring status # read ahead per second
innodb-thread-concurrency = 8
#
innodb-open-files = 2048
#

#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca = /etc/mysql/cacert.pem
# ssl-cert = /etc/mysql/server-cert.pem
# ssl-key = /etc/mysql/server-key.pem

#
# * Galera-related settings
#

log_error=/var/log/mysql/mariadb-error.log

[galera]
wsrep_gtid_mode = on
wsrep_gtid_domain_id = 0
log_slave_updates = on
log-bin = /var/log/mysql/binlog
log-bin-index = /var/log/mysql/binlog.index
gtid_domain_id = 1
binlog_format=ROW
expire-logs-days = 3
max-binlog-size = 1024M
sync-binlog = 0
binlog-stmt-cache-size = 128K
binlog-cache-size = 256K
#

default-storage-engine=innodb
innodb_autoinc_lock_mode=2
wsrep_on=ON
wsrep_cluster_name='mariadb_cluster'
wsrep_node_name=node4
wsrep_node_address=10.253.253.93
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_provider_options = "gcache.size = 5G; gcache.name = /var/lib/mysql/galera.cache;"

# wsrep_sst_method = rsync
wsrep_sst_method = mariabackup
wsrep_sst_auth = root:extra27frz
wsrep_slave_threads=32
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_log_conflicts = 1
wsrep_retry_autocommit = 1
wsrep_cluster_address=gcomm://10.253.253.93,10.253.253.94,10.253.253.95
# wsrep_cluster_address = gcomm://
wsrep_forced_binlog_format=ROW
enforce_storage_engine = "InnoDB"
#
server-id = 01

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]
#no-auto-rehash # faster start of mysql but no tab completion

[isamchk]
key_buffer = 16M

#
# * IMPORTANT: Additional settings that can override those from this file!
# The files must end with '.cnf', otherwise they'll be ignored.
#
!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/
!include /etc/mysql/secrets-backup.cnf

[xtrabackup]
databases-exclude=lost+found
ssl=0
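For reference, these are the standard Galera status variables that can show, on each node, whether flow control or a stalled node is involved when the cluster freezes (nothing here is specific to our setup):

-- Per-node health during a freeze: node state, cluster membership, and flow-control pressure
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';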

Comment by Levieux stéphane [ 2020-07-28 ]

This morning I did an interesting test:
I ran the transfer on the duplicate table (it had 70 000 records from yesterday): no problem.
30 minutes later, I ran the same transfer on the production table (3M records): the cluster froze and we needed to bootstrap.

So it seems (still to be confirmed) that the problem is not the volume of data imported, but importing into a table that already holds a large amount of existing data.

PS: we have another import into a 1M-record table without any problem.

Comment by Levieux stéphane [ 2020-08-31 ]

News: just after my last post I partitioned the table, and since then there have been no more crashes of our MariaDB cluster. So the problem was more about the volume of existing data than the volume of data transferred (surely a combination of both).
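For illustration only, a minimal sketch of what partitioning such a table could look like (the table name, partition key and partition count are placeholders, not the real definition):

-- Hypothetical example: rebuild a large stats table as 16 hash partitions.
-- Note: the partitioning column must be part of every unique key, including the primary key.
ALTER TABLE stats_example
  PARTITION BY HASH(id_example)
  PARTITIONS 16;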
