[MCOL-1148] Multi server separated out in different data centers and geographic locations Created: 2018-01-08 Updated: 2021-01-17 Resolved: 2021-01-17 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | N/A |
| Affects Version/s: | 1.1.3 |
| Fix Version/s: | 5.5.1 |
| Type: | New Feature | Priority: | Minor |
| Reporter: | Sasha V | Assignee: | Todd Stoffel (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Multi server separated out in two data centers and geographic locations |
||
| Description |
|
According to the InfiniDB Concepts Guide, User and Performance Modules can be separated out across different data centers and geographic locations, so I am hopeful that MariaDB ColumnStore can provide the same capability. Since linuxjedi mentioned that several bugs in the network packet handling and compression were fixed in the develop-1.1 branch (which will make up the 1.1.3 release), I deployed MariaDB ColumnStore from the 1.1.3 repository (build of Jan 2) on three combined UM/PM nodes, with PM3 in a different geo-location separated from the PM1/PM2 nodes by 25 ms of RTT. The system is loaded with 3M rows of data spread across all three PMs. After a system start, a test data aggregation query works as expected and completes in 15-20 s. However, after a few days of system idling with no records in the logs, the same data aggregation query fails, leaving records in /var/log/mariadb/columnstore/crit.log.
The same query executes fine after a system shutdown/start. How can I increase the log verbosity to debug broken-pipe and lost-connection issues? |
| Comments |
| Comment by Andrew Hutchings (Inactive) [ 2018-01-08 ] | ||||
|
For clarification, my comment on the mailing list was about mcsapi only. This does not appear to be an mcsapi issue, so my comment does not apply. | ||||
| Comment by Sasha V [ 2018-01-10 ] | ||||
|
After several system shutdown/start cycles, the (unreleased) 1.1.3 system failed to start.
Thanks to the clarification from linuxjedi, I downgraded the system from the (unreleased) 1.1.3 to the 1.1.2 GA release. It appears that the observed problem with the distributed system was caused by the data centers' firewalls, which dropped idle connections. After implementing a kind of keep-alive ping (periodic execution of the same data aggregation query), MariaDB ColumnStore 1.1.2 no longer reported broken-pipe or lost-connection issues. | ||||
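The keep-alive ping described above could be as simple as a crontab entry (the query, schedule, and invocation below are placeholders, not the reporter's actual setup):

```shell
# Placeholder keep-alive: run a trivial query every 10 minutes so the
# inter-datacenter connections never sit idle long enough for the
# firewall to drop them. Keep the interval under the firewall's
# idle-connection timeout.
*/10 * * * * mcsmysql -e 'SELECT 1' >/dev/null 2>&1
```

An alternative in the same spirit would be tightening the OS-level TCP keepalive timers, but a periodic query also keeps ColumnStore's own inter-module connections warm.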
| Comment by Sasha V [ 2018-01-10 ] | ||||
|
The distributed multi-server system exhibits the following behavior when, on the combined UM/PM1 node, I use mcsmysql to execute the same query (and then quit):
It appears that upon each mcsmysql launch on the UM/PM1 node there is a round-robin assignment of the UM/ExeMgr node responsible for the query. (If I remain in an mcsmysql prompt that has been assigned to the remote UM/PM3 node, each query execution remains slow.) This behavior is in agreement with the InfiniDB documentation on "Automatic round-robin distribution/scale-out of queries (based on connection id) across all UMs." Is it possible to bind query execution to the UM/ExeMgr on a particular UM/PM node? | ||||
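The round-robin rule quoted above can be sketched as a connection-id modulo over the number of UMs. This mapping is an assumption inferred from the behavior described here (and from the workaround in the next comment), not taken from ColumnStore source:

```shell
# Illustrative sketch only: map a session's connection id to one of
# three UM/ExeMgr nodes the way a "round-robin based on connection id"
# scheme would (assumed CONNECTION_ID() % num_UMs mapping).
num_ums=3
for conn_id in 100 101 102 103; do
  um=$(( conn_id % num_ums + 1 ))
  echo "connection $conn_id -> UM$um"
done
```

With three UMs, every third new connection lands on the remote node, which matches the observed "fast, fast, slow" pattern.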
| Comment by Sasha V [ 2018-01-11 ] | ||||
|
The round-robin distribution of queries (based on connection id) across all UMs appears to be hardcoded. As a workaround, I use a stored procedure that KILLs its own connection if CONNECTION_ID() % 3 equals 2, since this corresponds to queries using the ExeMgr in the remote data center. | ||||
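A self-KILLing procedure of this kind might look like the sketch below. The name is illustrative and this is not the reporter's actual code; a prepared statement is used because KILL does not accept an arbitrary expression inside a stored procedure:

```sql
-- Illustrative sketch, not the reporter's actual procedure.
-- Kill our own session when its connection id maps to the ExeMgr in the
-- remote data center, forcing the client to reconnect and draw a new
-- round-robin assignment.
DELIMITER //
CREATE PROCEDURE avoid_remote_exemgr()
BEGIN
  IF CONNECTION_ID() % 3 = 2 THEN
    SET @s = CONCAT('KILL ', CONNECTION_ID());
    PREPARE stmt FROM @s;
    EXECUTE stmt;  -- terminates this very session
  END IF;
END //
DELIMITER ;
```

A client would CALL this at session start and simply reconnect whenever the session is killed, retrying until the connection id lands on a local ExeMgr.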
| Comment by David Thompson (Inactive) [ 2018-01-12 ] | ||||
|
Hi Sasha, there is an enhancement to productize this. For now:

1. Stop the system using the stopsystem command.

The effect will be that the mysqld process will then always use the same ExeMgr process on the same host. | ||||
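Per the follow-up comment, the change amounts to pointing each node's ExeMgr address at the local host. The fragment below is a sketch of what that edit might look like in Columnstore.xml; the file path (typically /usr/local/mariadb/columnstore/etc/Columnstore.xml in 1.1.x installs), element names, and port are assumptions to verify against your own installation:

```xml
<!-- On each node, point the ExeMgr entry at the local host so the local
     mysqld always uses the ExeMgr on the same machine (sketch only). -->
<ExeMgr1>
    <IPAddr>127.0.0.1</IPAddr>
    <Port>8601</Port>
</ExeMgr1>
```

After editing the file on every node, restart the system so the new addresses take effect.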
| Comment by Sasha V [ 2018-01-12 ] | ||||
|
Hi dthompson, thanks a lot for the guidance. I changed the ExeMgr IP addresses to 127.0.0.1 on all three nodes. Following this change, the MariaDB ColumnStore system (distributed across two data centers) executes queries stably. |