[MXS-4374] Not Balanced Connection Between Master & Slave From monitor Created: 2022-10-31  Updated: 2022-12-02  Resolved: 2022-12-02

Status: Closed
Project: MariaDB MaxScale
Component/s: Monitor, readwritesplit
Affects Version/s: 6.4.1
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: febriyant Assignee: markus makela
Resolution: Not a Bug Votes: 0
Labels: connection, maxscale, readwritesplit
Environment:

CentOS 7


Attachments: PNG File connection from gui maxscale.png     PNG File image-2022-10-31-13-48-53-347.png     Text File maxscale.log    

 Description   

Hi I am already using maxscale as a load balancer from version 2.4 so currently, I set up new env using maxscale version 6.4.1 and MariaDB 10.5.17

but I see the monitor connection between master and slave does not balance

on version 2.4 connection between master and slave is balance
example if master has a connection 100 slave also has 100 connection

this is my maxscale configuration

[maxscale]
threads=auto
max_auth_errors_until_block=0
admin_host=10.50.71.33
admin_port=8989
admin_enabled=1
admin_secure_gui=false
 
## v38_43301 ##
[dbsf7_v38_71_35]
type=server
address=10.50.71.35
port=33301
protocol=MariadBBackend
 
 
[dbsf7_v38_71_36]
type=server
address=10.50.71.36
port=33301
protocol=MariadBBackend
 
[MariaDB-Monitor-v38-43301]
type=monitor
module=mariadbmon
servers=dbsf7_v38_71_36,dbsf7_v38_71_35
user=maxscale
password=F33D9A6E1376BD25313EB4EF0733ED43
monitor_interval=1000
auto_failover=false
#detect_stale_master=true deprecate variables
failover_timeout=10
auto_rejoin=true
failcount=5
master_failure_timeout=2
verify_master_failure=true
switchover_timeout=90
replication_user=maxscale
replication_password=F33D9A6E1376BD25313EB4EF0733ED43
#detect_stale_slave=true deprecate variables
master_conditions=none
 
[Read-Write-Service-v38-43301]
type=service
router=readwritesplit
servers=dbsf7_v38_71_36,dbsf7_v38_71_35
user=maxscale
password=F33D9A6E1376BD25313EB4EF0733ED43
master_failure_mode=fail_on_write
strict_multi_stmt=false
use_sql_variables_in=master
master_reconnection=true
delayed_retry=true
delayed_retry_timeout=20s
transaction_replay=true
max_sescmd_history=60
prune_sescmd_history=true
connection_timeout=36000s
filters=Hint
 
[Read-Write-Listener]
type=listener
service=Read-Write-Service-v38-43301
protocol=MariadbClient
port=43301
authenticator=MariaDBAuth
authenticator_options=lower_case_table_names=true
## END ##
 
[Hint]
type=filter
module=hintfilter
 

and this is my database configuration for replication
Thisi Master Database

[server]
log_warnings                                     = 1
server_id                                        = 7135         #custom
gtid-domain-id                                   = 1
gtid_ignore_duplicates                           = ON
gtid_strict_mode                                 = 0
rpl_semi_sync_master_enabled                     = ON
rpl_semi_sync_slave_enabled                      = ON
rpl_semi_sync_master_wait_point                  = AFTER_SYNC
slave-skip-errors                                = 1062,1032
 
#add new , replication config
#sync_binlog                                      = 1
#sync_master_info                                 = 1
#sync_relay_log                                   = 1
#sync_relay_log_info                              = 1
log_slave_updates                                = 1
 
# this is only for the mysqld standalone daemon
[mysqld]
thread_handling                                  = pool-of-threads
log-bin                                          = /var/lib/mysqlbinlogs/mysql-bin       #binlog
#skip-log-bin
datadir                                          = /var/lib/mysql       #data directory
lower_case_table_names                           = 1                    #in-case sensitive
sql-mode                                         = "PIPES_AS_CONCAT"    #closer to ANSI
skip-host-cache                                                         #avoid dns lookup
skip-name-resolve                                                       #avoid dns lookup
log-slave-updates                                = 1    #binlog update each other
query_cache_size                                 = 0    #query cache is not supported with wsrep
query_cache_type                                 = 0    #query cache is not supported with wsrep
 
# LIMIT #
net_buffer_length                                = 16384
max_allowed_packet                               = 1G
expire_logs_days                                 = 3
max_connections                                  = 10000
max_connect_errors                               = 1000
wait_timeout                                     = 40
interactive_timeout                              = 40
max_statement_time                               = 60
open-files-limit                                 = 393210 #65535
 
# INNODB #
default_storage_engine                           = InnoDB
innodb_data_home_dir                             = /var/lib/mysql
innodb_log_group_home_dir                        = /var/lib/mysql
innodb_file_per_table                            = 1
innodb_log_file_size                             = 2G   #Normal value 2G
innodb_autoinc_lock_mode                         = 2    #avoid bulk insert using auto inc
innodb_flush_log_at_trx_commit                   = 0
innodb_doublewrite                               = 1
binlog_format                                    = ROW
log_bin_trust_function_creators                  = 1
 
# LOGGING #
log_error                                        = /var/lib/mysql/mysql_error.log        #error log
slow_query_log                                   = 1
slow_query_log_file                              = /var/lib/mysql/mysql_slow.log
 
# CUSTOM #
port                                             = 33301
bind-address                                     = 0.0.0.0
#skip-log-bin                                    #temporary disable if needed
 
# Monitoring #
#innodb_monitor_enable=all
performance_schema                               = ON
 
 
## TUNING ##
innodb_buffer_pool_size                         = 4G
innodb_buffer_pool_instances                    = 4
innodb_buffer_pool_chunk_size                   = 128M
thread_cache_size                               = 256
join_buffer_size                                = 32M #64M
key_buffer_size                                 = 128M
max_heap_table_size                             = 512M
tmp_table_size                                  = 512M
table_open_cache                                = 2000
table_definition_cache                          = 400
innodb_flush_method                             = O_DIRECT             #avoid double buffering
## -- ##
 
 
# * Galera-related settings
#
[galera]
# Mandatory settings
#wsrep_on=OFF
#wsrep_provider_options="gcache.size=10G"
#wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
#wsrep_cluster_address=gcomm://
#wsrep_cluster_address=
#wsrep_cluster_name=NBC-DataCentre_Reg
#wsrep_node_address=172.17.200.168
#wsrep_node_name=clusterreg168
#wsrep_sst_method=rsync
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#port=133xx
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0
 
# this is only for embedded server
[embedded]
 
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
 
# This group is only read by MariaDB-10.4 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.5]

This Is Slave Database

[server]
log_warnings                                     = 1
server_id                                        = 7136         #custom
gtid-domain-id                                   = 1
gtid_ignore_duplicates                           = ON
gtid_strict_mode                                 = 0
rpl_semi_sync_master_enabled                     = ON
rpl_semi_sync_slave_enabled                      = ON
rpl_semi_sync_master_wait_point                  = AFTER_SYNC
slave-skip-errors                                = 1062,1032
 
#add new , replication config
#sync_binlog                                      = 1
#sync_master_info                                 = 1
#sync_relay_log                                   = 1
#sync_relay_log_info                              = 1
log_slave_updates                                = 1
 
# this is only for the mysqld standalone daemon
[mysqld]
thread_handling                                  = pool-of-threads
log-bin                                          = /var/lib/mysqlbinlogs/mysql-bin       #binlog
#skip-log-bin
datadir                                          = /var/lib/mysql       #data directory
lower_case_table_names                           = 1                    #in-case sensitive
sql-mode                                         = "PIPES_AS_CONCAT"    #closer to ANSI
skip-host-cache                                                         #avoid dns lookup
skip-name-resolve                                                       #avoid dns lookup
log-slave-updates                                = 1    #binlog update each other
query_cache_size                                 = 0    #query cache is not supported with wsrep
query_cache_type                                 = 0    #query cache is not supported with wsrep
 
# LIMIT #
net_buffer_length                                = 16384
max_allowed_packet                               = 1G
expire_logs_days                                 = 3
max_connections                                  = 10000
max_connect_errors                               = 1000
wait_timeout                                     = 40
interactive_timeout                              = 40
max_statement_time                               = 60
open-files-limit                                 = 393210 #65535
 
# INNODB #
default_storage_engine                           = InnoDB
innodb_data_home_dir                             = /var/lib/mysql
innodb_log_group_home_dir                        = /var/lib/mysql
innodb_file_per_table                            = 1
innodb_log_file_size                             = 2G   #Normal value 2G
innodb_autoinc_lock_mode                         = 2    #avoid bulk insert using auto inc
innodb_flush_log_at_trx_commit                   = 0
innodb_doublewrite                               = 1
binlog_format                                    = ROW
log_bin_trust_function_creators                  = 1
 
# LOGGING #
log_error                                        = /var/lib/mysql/mysql_error.log        #error log
slow_query_log                                   = 1
slow_query_log_file                              = /var/lib/mysql/mysql_slow.log
 
# CUSTOM #
port                                             = 33301
bind-address                                     = 0.0.0.0
#skip-log-bin                                    #temporary disable if needed
 
# Monitoring #
#innodb_monitor_enable=all
performance_schema                               = ON
 
 
## TUNING ##
innodb_buffer_pool_size                         = 4G
innodb_buffer_pool_instances                    = 4
innodb_buffer_pool_chunk_size                   = 128M
thread_cache_size                               = 256
join_buffer_size                                = 32M #64M
key_buffer_size                                 = 128M
max_heap_table_size                             = 512M
tmp_table_size                                  = 512M
table_open_cache                                = 2000
table_definition_cache                          = 400
innodb_flush_method                             = O_DIRECT             #avoid double buffering
## -- ##
 
 
# * Galera-related settings
#
[galera]
# Mandatory settings
#wsrep_on=OFF
#wsrep_provider_options="gcache.size=10G"
#wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
#wsrep_cluster_address=gcomm://
#wsrep_cluster_address=
#wsrep_cluster_name=NBC-DataCentre_Reg
#wsrep_node_address=172.17.200.168
#wsrep_node_name=clusterreg168
#wsrep_sst_method=rsync
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#port=133xx
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0
 
# this is only for embedded server
[embedded]
 
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
 
# This group is only read by MariaDB-10.4 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.5]

– EDITED
add maxscale.log



 Comments   
Comment by markus makela [ 2022-10-31 ]
  • Can you confirm that this is a fresh installation and not an upgrade from 2.4?
  • Does the MaxScale log in /var/log/maxscale/maxscale.log contain any errors or warnings?
  • Have you run any MaxCtrl commands like maxctrl alter service or maxctrl alter server?
Comment by febriyant [ 2022-11-01 ]

hi @markus

  • Can you confirm that this is a fresh installation and not an upgrade from 2.4?
  • answer : yes I setup new env with fresh installation
  • Does the MaxScale log in /var/log/maxscale/maxscale.log contain any errors or warnings?
    this is top 50 line from log

2022-11-01 08:10:05   warning: Timing out 'sfnbc_anj_admin'@'::ffff:172.17.71.40', idle for 36001 seconds
2022-11-01 08:10:37   warning: Timing out 'sfnbc_anj_admin'@'::ffff:172.17.71.40', idle for 36001 seconds
2022-11-01 08:10:46   warning: Timing out 'sfnbc_anj_admin'@'::ffff:172.17.71.40', idle for 36001 seconds
2022-11-01 08:34:48   notice : Read 69 user@host entries from 'dbsf7_v38_71_36' for service 'Read-Write-Service-v38-43301'.
2022-11-01 09:00:01   warning: Timing out 'sf7_admin'@'::ffff:172.17.71.40', idle for 36001 seconds
2022-11-01 10:12:24   error  : (187488) [readwritesplit] Could not find valid server for target type TARGET_SLAVE (COM_QUERY: SELECT * FROM TCLLUserLogBook WHERE user_id = 1 AND sessid = 'carolus_73707D5EC6AB454B95F0460A754FB7F4' AND logend is null ORDER BY logstart DESC), closing connection.
 
name: [dbsf7_v38_71_36] status: [Master, Running] state: [FATAL_FAILURE] last opened at: [Tue Nov  1 10:12:04 2022] last closed at: [Tue Nov  1 10:12:04 2022] last close reason: [Master connection failed: #HY000: Response from server 'dbsf7_v38_71_36' differs from the expected response to COM_QUERY. Closing connection due to inconsistent session state. Error: Duplicate entry 'EMPPSY202211000002-33145' for key 'PRIMARY' (dbsf7_v38_71_36)]
name: [dbsf7_v38_71_35] status: [Slave, Running] state: [FATAL_FAILURE] last opened at: [Tue Nov  1 09:14:13 2022] last closed at: [Tue Nov  1 09:14:13 2022] last close reason: [Slave connection failed: #HY000: Response from server 'dbsf7_v38_71_35' differs from the expected response to COM_QUERY. Closing connection due to inconsistent session state. Error: Duplicate entry 'EMPPSY202211000002-33145' for key 'PRIMARY' (dbsf7_v38_71_35)]
2022-11-01 10:17:49   error  : (189921) [readwritesplit] Could not find valid server for target type TARGET_SLAVE (COM_QUERY: SELECT * FROM TCLLUserLogBook WHERE user_id = 1 AND sessid = 'carolus_73707D5EC6AB454B95F0460A754FB7F4' AND logend is null ORDER BY logstart DESC), closing connection.
 
name: [dbsf7_v38_71_36] status: [Master, Running] state: [FATAL_FAILURE] last opened at: [Tue Nov  1 10:17:29 2022] last closed at: [Tue Nov  1 10:17:29 2022] last close reason: [Master connection failed: #HY000: Response from server 'dbsf7_v38_71_36' differs from the expected response to COM_QUERY. Closing connection due to inconsistent session state. Error: Duplicate entry 'EMPCTFCT2022110000002' for key 'PRIMARY' (dbsf7_v38_71_36)]
name: [dbsf7_v38_71_35] status: [Slave, Running] state: [FATAL_FAILURE] last opened at: [Tue Nov  1 10:16:31 2022] last closed at: [Tue Nov  1 10:16:31 2022] last close reason: [Slave connection failed: #HY000: Response from server 'dbsf7_v38_71_35' differs from the expected response to COM_QUERY. Closing connection due to inconsistent session state. Error: Duplicate entry 'EMPCTFCT2022110000002' for key 'PRIMARY' (dbsf7_v38_71_35)]
2022-11-01 10:59:10   warning: Timing out 'sfnbc_hanwaindonesia_admin'@'::ffff:172.17.71.40', idle for 36001 seconds
2022-11-01 10:59:10   warning: Timing out 'sfnbc_hanwaindonesia_admin'@'::ffff:172.17.71.40', idle for 36001 seconds
2022-11-01 11:00:01   warning: Timing out 'sf7_admin'@'::ffff:172.17.71.40', idle for 36001 seconds

  • Have you run any MaxCtrl commands like maxctrl alter service or maxctrl alter server?
    answer no I never run this command
    1. EDITED
      I Put refresh log after restart maxscale service

MariaDB MaxScale  /var/log/maxscale/maxscale.log  Tue Nov  1 11:32:00 2022
----------------------------------------------------------------------------
2022-11-01 11:32:00   notice : The systemd watchdog is Enabled. Internal timeout = 30s
2022-11-01 11:32:00   notice : Worker message queue size: 1MiB
2022-11-01 11:32:00   notice : Using up to 1.14GiB of memory for query classifier cache
2022-11-01 11:32:00   notice : syslog logging is enabled.
2022-11-01 11:32:00   notice : maxlog logging is enabled.
2022-11-01 11:32:00   notice : Host: 'i54-maxs-p-71-33' OS: Linux@3.10.0-693.el7.x86_64, #1 SMP Tue Aug 22 21:09:27 UTC 2017, x86_64 with 8 processor cores.
2022-11-01 11:32:00   notice : Total usable main memory: 7.62GiB.
2022-11-01 11:32:00   notice : MariaDB MaxScale 6.4.1 started (Commit: 8ac5dcd8c87e10d79993a7db11987f4bb4c845e7)
2022-11-01 11:32:00   notice : MaxScale is running in process 70288
2022-11-01 11:32:00   notice : Configuration file: /etc/maxscale.cnf
2022-11-01 11:32:00   notice : Log directory: /var/log/maxscale
2022-11-01 11:32:00   notice : Data directory: /var/lib/maxscale
2022-11-01 11:32:00   notice : Module directory: /usr/lib64/maxscale
2022-11-01 11:32:00   notice : Service cache: /var/cache/maxscale
2022-11-01 11:32:00   notice : Working directory: /var/log/maxscale
2022-11-01 11:32:00   notice : Module 'qc_sqlite' loaded from '/usr/lib64/maxscale/libqc_sqlite.so'.
2022-11-01 11:32:00   notice : Query classification results are cached and reused. Memory used per thread: 146.23MiB
2022-11-01 11:32:00   notice : Password encryption key file '/var/lib/maxscale/.secrets' not found, using configured passwords as plaintext.
2022-11-01 11:32:00   notice : MaxScale started with 8 worker threads.
2022-11-01 11:32:00   notice : Loading /etc/maxscale.cnf.
2022-11-01 11:32:00   notice : Module 'hintfilter' loaded from '/usr/lib64/maxscale/libhintfilter.so'.
2022-11-01 11:32:00   notice : Module 'readwritesplit' loaded from '/usr/lib64/maxscale/libreadwritesplit.so'.
2022-11-01 11:32:00   notice : Module 'mariadbmon' loaded from '/usr/lib64/maxscale/libmariadbmon.so'.
2022-11-01 11:32:00   notice : Started REST API on [172.17.71.33]:8989
2022-11-01 11:32:00   notice : 'dbsf7_v38_71_36' sent version string '10.5.17-MariaDB-log'. Detected type: 'MariaDB', version: 10.5.17.
2022-11-01 11:32:00   notice : 'dbsf7_v38_71_35' sent version string '10.5.17-MariaDB-log'. Detected type: 'MariaDB', version: 10.5.17.
2022-11-01 11:32:00   notice : Server 'dbsf7_v38_71_35' charset: latin1
2022-11-01 11:32:00   notice : Server 'dbsf7_v38_71_36' charset: latin1
2022-11-01 11:32:00   notice : [mariadbmon] Selecting new master server.
2022-11-01 11:32:00   notice : [mariadbmon] Setting 'dbsf7_v38_71_36' as master.
2022-11-01 11:32:00   notice : Server changed state: dbsf7_v38_71_36[172.17.71.36:33301]: master_up. [Down] -> [Master, Running]
2022-11-01 11:32:00   notice : Server changed state: dbsf7_v38_71_35[172.17.71.35:33301]: slave_up. [Down] -> [Slave, Running]
2022-11-01 11:32:00   notice : Starting a total of 1 services...
2022-11-01 11:32:00   notice : (Read-Write-Listener); Listening for connections at [::]:43301
2022-11-01 11:32:00   notice : Service 'Read-Write-Service-v38-43301' started (1/1)
2022-11-01 11:32:01   notice : Read 69 user@host entries from 'dbsf7_v38_71_36' for service 'Read-Write-Service-v38-43301'.
2022-11-01 11:32:07   warning: Saving runtime modifications to 'maxscale' in '/var/lib/maxscale/maxscale.cnf.d/maxscale.cnf'. The modified values will override the values found in the static configuration files.

Comment by markus makela [ 2022-11-01 ]

Are the connections still not balanced after the restart? If they are not, you can try enabling info level logging by adding log_info=true under the [maxscale] section. The info logging is verbose so be ready to disable it if it starts to affect the performance adversely.

Comment by febriyant [ 2022-11-01 ]

hi Markus
yes after restart still not balanced

ok I will apply the config ``` log_info=true ``` I will provide the log soon

Comment by febriyant [ 2022-11-02 ]

maxscale.log

hi markus
that is to 10k line from /var/log/maxscale/maxscale.log

I hope you got some clue

currently I already disabled again for config log_info=true

Comment by markus makela [ 2022-11-02 ]

What's the value of wait_timeout on the backend database servers? Looks like there's a lot of connections that are idle and they keep opening new connections due to timeouts that happen on the backends. The value of connection_timeout should be lower than the value of wait_timeout in the database.

I think the behavior is explained by some changes that were added to newer versions of MaxScale. The routing is slightly smarter when it comes to opening connections for commands that modify the session state: if there are no connections, it prefers to pick a server labeled as Master to avoid the situation where the result would be different if it were executed on a server labeled as Slave. The log shows that this happens as the clients send a COM_PING command that wakes the connection up and forces a new connection to be executed. As it prefers the current Master server, it naturally ends up having more connections.

Comment by febriyant [ 2022-11-02 ]

hi markus
as you can see on above this on database

wait_timeout                                     = 40
interactive_timeout                              = 40

this on maxscale

connection_timeout=36000s

Comment by markus makela [ 2022-11-02 ]

Ah, my bad. I didn't realize the full config was there and saw only the first page of it. This does confirm my theory that the reason why the connections pile up on one server is due to the COM_PING triggering a reconnection. So far this seems like expected behavior and it's not a bug.

I'd recommend lowering connection_timeout to a value below wait_timeout. You should also configure your client-side connection pools to close connections before either of the two timeouts are reached. Having MaxScale constantly reopen connections is quite wasteful.

The ideal configuration is to have the timeouts be as such:

client connection pool timeout < connection_timeout in MaxScale < wait_timeout in MariaDB

For example, you should set the maximum lifetime of a client-side connection pool to 30 seconds, the connection_timeout in MaxScale to 60 seconds and wait_timeout in MariaDB to 90 seconds. This way the closing of the connection is always initiated in the "right" order and only error conditions in the client application cause other timeouts to be triggered.

Comment by febriyant [ 2022-11-03 ]

Hi markus thanks for update

Let me test first on dev env.
give me time until next week, please don't closed this ticket

I will give the results ASAP

Thank you

Comment by markus makela [ 2022-11-03 ]

OK, I'll keep it in Needs Feedback state until we hear back from you.

Comment by febriyant [ 2022-11-14 ]

hi markus I have already changed the configuration maxscale and database
currently on maxscale connection_timeout=1800s

and on database is wait_timeout = 3600 and interactive_timeout = 3600
the results is connection balanced

but I still need more time for monitoring, because we have some big process end of this months

I hope you can still open this ticket

Thank you

Comment by markus makela [ 2022-11-14 ]

Sure, we'll keep it open.

Comment by febriyant [ 2022-12-02 ]

Hi Markus thanks for the support
so far the results are the same as I expected.

so you can close this ticket

Thank You
Febriyant

Generated at Thu Feb 08 04:28:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.