MariaDB ColumnStore / MCOL-4845

ExeMgr becomes temporarily unavailable for a few seconds (rarely), causing some application queries to fail

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.6.2
    • Fix Version/s: Icebox
    • Component/s: ExeMgr
    • Labels:
      None
    • Environment:

      Description

      Hello!

      We have a MariaDB 10.5.12 server with ColumnStore 5.6.2 running on a Google Compute instance (64 GB RAM, 64 vCPUs, 2 TB SSD). It serves queries for an API that is accessed about a dozen times per day, with each access issuing roughly another dozen queries.

      The queries mix InnoDB tables and ColumnStore tables.

      For some time now, the following error has occurred (rarely) while our backend executes a query. This is one example:

      "message": "(conn=123745, no: 1815, SQLState: HY000) Internal error: Lost connection to ExeMgr. Please contact your administrator
      sql:
       
      SELECT
          c.c4_id,
          c.c4_name,
          ROUND(SUM(quantity)) as quantity,
          ROUND(SUM(total_value),2) as total_value
       
      FROM transaction_daily AS sales
          JOIN store s USING (store_id)
          JOIN product p USING (product_id, ean)
          JOIN latest_product_category pc1 USING (product_id, ean) JOIN client_category_tree c ON pc1.category_id = c.c6_id
          JOIN latest_product_category pc2 USING (product_id, ean) JOIN client_origin_tree o ON pc2.category_id = o.o4_id
       
      WHERE date BETWEEN '2021-07-01' AND '2021-07-31' AND (s.store_id=1 OR s.store_id=2 OR s.store_id=3 OR s.store_id=4 OR s.store_id=5 OR s.store_id=6 OR s.store_id=7 OR s.store_id=8 OR s.store_id=9 OR s.store_id=10 OR s.store_id=11 OR s.store_id=12 OR s.store_id=13 OR s.store_id=14 OR s.store_id=15 OR s.store_id=16 OR s.store_id=17 OR s.store_id=18 OR s.store_id=19 OR s.store_id=20 OR s.store_id=21 OR s.store_id=22 OR s.store_id=23 OR s.store_id=24 OR s.store_id=25 OR s.store_id=26 OR s.store_id=27 OR s.store_id=28 OR s.store_id=29 OR s.store_id=30 OR s.store_id=31 OR s.store_id=33 OR s.store_id=34 OR s.store_id=35 OR s.store_id=36 OR s.store_id=37 OR s.store_id=38 OR s.store_id=39 OR s.store_id=40 OR s.store_id=41 OR s.store_id=42 OR s.store_id=43 OR s.store_id=44 OR s.store_id=45 OR s.store_id=46 OR s.store_id=47 OR s.store_id=48 OR s.store_id=49 OR s.store_id=50 OR s.store_id=51 OR s.store_id=52 OR s.store_id=53 OR s.store_id=54 OR s.store_id=55 OR s.store_id=56 OR s.store_id=57 OR s.store_id=58 OR s.store_id=59 OR s.store_id=60 OR s.store_id=61 OR s.store_id=62 OR s.store_id=63 OR s.store_id=64 OR s.store_id=65 OR s.store_id=66 OR s.store_id=67 OR s.store_id=68 OR s.store_id=69 OR s.store_id=70 OR s.store_id=71 OR s.store_id=72 OR s.store_id=73 OR s.store_id=74 OR s.store_id=77 OR s.store_id=82 OR s.store_id=83) 
       
      GROUP BY c.c4_id, c.c4_name
      ORDER BY total_value DESC, quantity DESC, c.c4_name
       
      parameters:[]",
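
As a possible application-side mitigation while the root cause is unknown, the failing call could be retried when the error code is 1815 (the code shown in the message above). The following is only a sketch: `run_with_retry`, `ExeMgrConnectionLost`, the `errno` attribute, and the delay values are hypothetical names and assumptions, not part of any MariaDB driver, and a real client library may expose the error code differently.

```python
import time


class ExeMgrConnectionLost(Exception):
    """Hypothetical stand-in for a driver error carrying MariaDB error code 1815."""

    errno = 1815


def run_with_retry(run_query, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call run_query(), retrying only when the error code is 1815.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once the retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return run_query()
        except Exception as exc:
            # Only error 1815 is treated as transient; anything else propagates.
            if getattr(exc, "errno", None) != 1815 or attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

A retry only hides the symptom from the dashboards; it does not explain why ExeMgr dropped the connection.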
      

      Related entries in
      /var/log/mariadb/columnstore/crit.log

      Aug 26 14:55:36 mariadb-ubuntu-2004-2-vm ExeMgr[656717]: 36.860088 |2147606742|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. Resource temporarily unavailable
      Aug 26 14:55:36 mariadb-ubuntu-2004-2-vm ExeMgr[656717]: 36.860148 |2147607379|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. Resource temporarily unavailable
      Aug 26 14:55:36 mariadb-ubuntu-2004-2-vm ExeMgr[656717]: 36.860186 |2147607393|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. Resource temporarily unavailable
      Aug 26 14:55:36 mariadb-ubuntu-2004-2-vm ExeMgr[656717]: 36.860242 |2147606750|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. Resource temporarily unavailable
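
To see how often these bursts occur, the log lines can be grouped by timestamp and PID. The sketch below assumes every line follows the layout of the four lines above; the regular expression and the function name `count_errors` are illustrative, and the meaning of the pipe-delimited numeric fields is not interpreted here.

```python
import re
from collections import Counter

# Matches lines shaped like the crit.log excerpt above (an assumption
# based only on those four sample lines).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]:\s+\S+\s+\|\d+\|\d+\|\d+\|\s+"
    r"\w\s+\d+\s+(?P<code>CAL\d+):\s+(?P<msg>.*)$"
)


def count_errors(lines):
    """Count log entries per (timestamp, pid, error code)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["ts"], match["pid"], match["code"])] += 1
    return counts
```

For the excerpt above, all four entries fall into a single key with the same second and the same ExeMgr PID.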
      

      The database feeds an API that issues a handful of OLAP queries on each request. The machine had been running for 20 days without a reboot, and I noticed that the most recent ExeMgr PIDs were around the 4 million mark. After the machine was rebooted, the problem seems to have gone away for now.
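
"Resource temporarily unavailable" is the standard Linux message for errno EAGAIN, which can be returned, among other cases, when a new thread or process cannot be created because a limit has been reached. Whether that is the cause here is not established. As one diagnostic that could be captured the next time the error appears, the sketch below reads the thread count and the resource limits of a running process from `/proc`. The function names are hypothetical, and the parsing assumes the usual Linux layout of `/proc/<pid>/status` and `/proc/<pid>/limits`.

```python
import re
from pathlib import Path

# A limits row is: name, two or more spaces, soft limit, hard limit, units.
LIMIT_RE = re.compile(r"^(?P<name>.+?)\s{2,}(?P<soft>\S+)\s+(?P<hard>\S+)")


def parse_limits(text):
    """Map limit name -> (soft, hard) from the text of /proc/<pid>/limits."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        match = LIMIT_RE.match(line)
        if match:
            limits[match["name"]] = (match["soft"], match["hard"])
    return limits


def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def snapshot(pid):
    """Read both files for a live process (Linux only)."""
    proc = Path("/proc") / str(pid)
    return {
        "threads": parse_thread_count((proc / "status").read_text()),
        "limits": parse_limits((proc / "limits").read_text()),
    }
```

Calling `snapshot(pid)` with the current ExeMgr PID and comparing `threads` against the "Max processes" entry would show whether the process is close to that limit.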

      This is a rare event but causes issues on some of our dashboards.

      Any idea what could be causing this? I can provide any additional information needed. I've attached our ColumnStore.xml configuration.

            People

            Assignee:
            Unassigned
            Reporter:
            eugenio.pacceli Eugênio Pacceli Reis da Fonseca
            Votes:
            3
            Watchers:
            2
