[MCOL-1128] exemgr becomes non responsive Created: 2017-12-21 Updated: 2020-08-25 Resolved: 2018-01-22 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.1.2 |
| Fix Version/s: | 1.1.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | David Thompson (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Sprint: | 2017-25, 2018-01, 2018-02 | ||||||||
| Description |
|
1 UM + 1 PM setup. Data is loaded by periodic multi-value INSERT DML and is also queried. This worked fine with InfiniDB 4.0, but after the upgrade the customer finds that the system works for a few hours and then inserts and/or queries stop working. |
| Comments |
| Comment by David Hall (Inactive) [ 2017-12-22 ] |
|
The issue that Daniel reproduced can easily be reproduced on a single-server 1.1 stack. This suggests the bug was introduced in 1.1. I will attempt to identify the exact patch that did it. In the meantime, the customer may want to try 1.0.12 and see if that helps. |
| Comment by Andrew Hutchings (Inactive) [ 2018-01-10 ] |
|
A git bisect shows with some certainty the regression happened during the transition from MariaDB 10.1 -> 10.2 in ColumnStore 1.1. It is difficult to pinpoint exactly what triggered it as finding matching server and engine code revisions is very difficult (and not all of them compile on my test machine). Recommended next step is to look at what the server is asking of the engine in both 1.0 and 1.1 for these queries to spot any differences. |
| Comment by David Hall (Inactive) [ 2018-01-11 ] |
|
In 02/17, we added thread pooling to ExeMgr to increase performance. The connections to the connector, DMLProc, or DDLProc are thread-pooled and limited to 50 threads. Any additional connections are delayed until a thread becomes available. Removing the limit of 50 and allowing the thread pool to grow to an unspecified number of threads clears the issue up. DMLProc still calls ExeMgr even though there's no explicit query; it still needs to do system catalog queries. Why waiting for an ExeMgr thread causes DMLProc to wait forever is not yet clear. However, this is the first progress we've made on this issue, so I thought a status update was needed. I should be able to clear this all up in one more day. |
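To illustrate how a capped pool can stall in general, here is a toy model in Python. It is not ExeMgr code, and the ticket does not establish that this is the actual mechanism. It assumes each request holds one pool slot and then needs a second slot from the same pool for a nested lookup (for example, a catalog query). The names `run_requests`, `thread_cap`, and `wait_s` are hypothetical.

```python
import threading


def run_requests(num_requests, thread_cap, wait_s=0.2):
    """Return how many requests completed their nested lookup.

    thread_cap=None models an uncapped pool. This is a toy model only.
    """
    if thread_cap is not None and num_requests > thread_cap:
        raise ValueError("this demo assumes num_requests <= thread_cap")

    slots = threading.Semaphore(thread_cap) if thread_cap is not None else None
    barrier = threading.Barrier(num_requests)
    completed = []
    lock = threading.Lock()

    def acquire(timeout=None):
        # An uncapped pool always has a slot available.
        return True if slots is None else slots.acquire(timeout=timeout)

    def release():
        if slots is not None:
            slots.release()

    def handle(request_id):
        acquire()  # the outer request occupies one pool slot
        try:
            barrier.wait()  # every outer request now holds its slot
            # The nested lookup needs a second slot from the same pool.
            if acquire(timeout=wait_s):
                release()
                with lock:
                    completed.append(request_id)
            barrier.wait()  # nobody frees an outer slot until all have tried
        finally:
            release()

    threads = [threading.Thread(target=handle, args=(i,))
               for i in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)
```

With the cap equal to the number of concurrent requests, every slot is held by an outer request and no nested lookup can get one. With no cap, or a cap with spare slots, all requests finish. The timeout exists only so the demo terminates; without it the capped case would wait forever.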
| Comment by Andrew Hutchings (Inactive) [ 2018-01-12 ] |
|
Since they are inserts, every one of them will flush the system catalog cache (for the extent metadata update), and since they are on the UM they will get the system catalog via ExeMgr rather than directly from PrimProc (there are two access methods in the code). I believe ExeMgr waits forever because it is waiting on PrimProc ( |
| Comment by David Hall (Inactive) [ 2018-01-12 ] |
|
A workaround is to shut the system down. Be sure it is completely down. Then add the following to Columnstore.xml: <ServerThreads>200</ServerThreads> <Columnstore Version="V1.0.0"> Then restart the system. The defaults here are 50/100. You could make them bigger than 200/400. |
| Comment by David Hall (Inactive) [ 2018-01-12 ] |
|
After some experiments, it becomes clear that the accept loop in ExeMgr must have unfettered access to new threads, so that is what I did. |
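As a rough illustration of an accept loop that never waits for a free worker, here is a minimal Python sketch. It is not the ExeMgr change itself: the function names (`serve`, `echo_upper`, `demo`) and the echo handler are hypothetical, and it simply starts a new thread for every accepted connection on a loopback socket.

```python
import socket
import threading


def serve(listener, handler, stop_event):
    """Accept loop: every new connection gets its own thread immediately."""
    listener.settimeout(0.1)  # wake up periodically to check stop_event
    while not stop_event.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        # No pool and no cap: the loop never blocks waiting for a worker.
        threading.Thread(target=handler, args=(conn,), daemon=True).start()


def echo_upper(conn):
    """Hypothetical handler: reply with the request in upper case."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def demo(num_clients=8):
    listener = socket.create_server(("127.0.0.1", 0))  # ephemeral port
    port = listener.getsockname()[1]
    stop = threading.Event()
    server = threading.Thread(target=serve, args=(listener, echo_upper, stop))
    server.start()
    replies = []
    try:
        # Open every connection first so all handlers are alive at once.
        clients = [socket.create_connection(("127.0.0.1", port))
                   for _ in range(num_clients)]
        for i, client in enumerate(clients):
            client.sendall(f"req{i}".encode())
        for client in clients:
            replies.append(client.recv(1024).decode())
            client.close()
    finally:
        stop.set()
        server.join()
        listener.close()
    return replies
```

Calling `demo(8)` opens eight connections at once and each is answered by its own thread. The trade-off of this design is that the thread count is bounded only by the number of open connections.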
| Comment by Daniel Lee (Inactive) [ 2018-01-22 ] |
|
Build verified: 1.1.3-1 GitHub source
/root/columnstore/mariadb-columnstore-server
Merge pull request #84 from mariadb-corporation/
/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
Merge pull request #375 from mariadb-corporation/dev-1.1-build-fix
Fix missing compiler flag from 1.0 -> 1.1 merge
No longer reproducing the issue originally reported (see test case in comment #1). |