MariaDB Server / MDEV-16249

CHECKSUM TABLE for a spider table is not parallel and saves all data in memory in the spider head by default

Details

    Description

      When 'CHECKSUM TABLE t' is run on a partitioned Spider table, the spider head fetches all rows from the data nodes sequentially and stores the result in its own memory. On very large tables the mysqld process is killed due to OOM, without a trace in the error log.

      One suggested workaround is to set spider_quick_mode = 3 before running such a statement, but we would prefer that the command be sent to each data node, executed there in parallel, and the results then aggregated (xor?) on the spider head.
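The preferred behaviour described above can be sketched as follows. This is a minimal illustration, not Spider code: `Partition`, `row_hash`, `node_checksum` and `parallel_checksum` are hypothetical names, the per-row hash is a toy FNV-1a rather than the checksum MariaDB computes, and XOR is used as the combiner only because the report floats it. Any commutative, associative combiner would allow the same per-node split.

```cpp
#include <cstdint>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the rows held by one data node.
using Partition = std::vector<std::string>;

// Toy per-row hash (32-bit FNV-1a). NOT the checksum MariaDB uses.
std::uint32_t row_hash(const std::string &row) {
  std::uint32_t h = 2166136261u;
  for (unsigned char c : row) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// What each data node would compute locally: XOR of its row hashes.
std::uint32_t node_checksum(const Partition &rows) {
  std::uint32_t acc = 0;
  for (const auto &r : rows) acc ^= row_hash(r);
  return acc;
}

// What the head would do: start one task per node, then fold the
// per-node results. The head only ever holds one 32-bit value per node.
std::uint32_t parallel_checksum(const std::vector<Partition> &nodes) {
  std::vector<std::future<std::uint32_t>> pending;
  for (const auto &p : nodes)
    pending.push_back(
        std::async(std::launch::async, node_checksum, std::cref(p)));
  std::uint32_t total = 0;
  for (auto &f : pending) total ^= f.get();
  return total;
}
```

Because XOR is order-independent, the combined value equals what a single scan over all rows would produce, which is what makes the per-node split safe in this sketch.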

      This appears to be a specific case of a more general issue: a large result may cause an out-of-memory condition on the spider head. This should never happen, so we would prefer that Spider have an upper limit on how much result data it can cache on the spider head, or some other way to prevent a valid query from crashing the server by running out of memory.

          Activity

            serg Sergei Golubchik added a comment:

            I'd suggest a different approach. Make the existing handler::checksum() method calculate the checksum in the engine, the slow way, by moving the current implementation from sql_table.cc into handler::checksum().

            And to get the fast checksum value from the engine, use handler::info(), not handler::checksum().

            Kentoku Kentoku Shiba (Inactive) added a comment:

            serg

            > And to get the fast checksum value from the engine, use handler::info(), not handler::checksum().

            Currently, handler::info() does not return a checksum value. Should I add a checksum value to the ha_statistics class?
            Also, I think this requires moving the logic from handler::checksum() to handler::info() in all storage engines. Would it be possible to avoid that?

            serg Sergei Golubchik added a comment:

            Yes, that's what I mean: moving the checksum to handler::info(). ha_statistics looks like a good place.

            Only MyISAM and Aria support live checksum; as far as I can see, no other engines are affected.
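The split proposed in this exchange (a slow, row-scanning checksum() and a fast value reported through info() into a statistics struct) might look roughly like the sketch below. All types here are simplified, hypothetical stand-ins (`SketchStats`, `SketchHandler`, `LiveChecksumHandler`, `table_checksum`), not the real MariaDB `handler` or `ha_statistics` definitions, and the row hash is a toy.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a statistics struct; NOT the real ha_statistics.
struct SketchStats {
  std::uint32_t checksum = 0;   // meaningful only if checksum_valid is true
  bool checksum_valid = false;  // set by info() when a live checksum exists
};

// Simplified stand-in for a storage engine handler.
class SketchHandler {
 public:
  explicit SketchHandler(std::vector<std::string> rows)
      : rows_(std::move(rows)) {}
  virtual ~SketchHandler() = default;

  SketchStats stats;

  // Fast path: cheap metadata lookup. The default engine has no live checksum.
  virtual void info() { stats.checksum_valid = false; }

  // Slow path: scan every row and compute a (toy) checksum.
  virtual std::uint32_t checksum() const {
    std::uint32_t sum = 0;
    for (const auto &row : rows_)
      for (unsigned char c : row) sum = sum * 31u + c;
    return sum;
  }

 protected:
  std::vector<std::string> rows_;
};

// An engine that keeps a live checksum and reports it through info().
class LiveChecksumHandler : public SketchHandler {
 public:
  explicit LiveChecksumHandler(std::vector<std::string> rows)
      : SketchHandler(std::move(rows)), live_(SketchHandler::checksum()) {}

  void info() override {
    stats.checksum = live_;
    stats.checksum_valid = true;
  }

 private:
  std::uint32_t live_;  // maintained incrementally in a real engine
};

// Caller: ask for the fast value first, fall back to the slow scan.
std::uint32_t table_checksum(SketchHandler &h) {
  h.info();
  return h.stats.checksum_valid ? h.stats.checksum : h.checksum();
}
```

With this shape, engines without a live checksum need no changes beyond inheriting the default scan, which matches the observation that only a couple of engines are affected.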

            serg Sergei Golubchik added a comment:

            The checksum API refactoring is pushed into 10.3, commit ffb83ba6502.
            Kentoku Kentoku Shiba (Inactive) added a comment (edited):

            serg
            I added a checksum_null parameter to ha_statistics, because Spider has to set a null value if it gets null from the data nodes. Would you please review this again?
            a252067
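The need for a null flag next to the checksum value can be illustrated with a small sketch. It assumes, beyond what the comment states, that a NULL from any single data node makes the combined result NULL, and it reuses XOR as a placeholder combiner. `SketchStats` and `combine` are hypothetical names, not the patch's actual code.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-in; the real ha_statistics has many more members.
struct SketchStats {
  std::uint32_t checksum = 0;
  bool checksum_null = false;
};

// Fold per-node results into one value. std::nullopt models a data node
// that answered NULL. Assumption: one NULL makes the whole result NULL.
SketchStats combine(
    const std::vector<std::optional<std::uint32_t>> &per_node) {
  SketchStats s;
  for (const auto &c : per_node) {
    if (!c) {
      s.checksum = 0;
      s.checksum_null = true;
      return s;
    }
    s.checksum ^= *c;  // placeholder combiner
  }
  return s;
}
```

A plain integer field cannot distinguish "checksum is 0" from "no checksum available", which is why a separate flag is useful here.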

            People

              Kentoku Kentoku Shiba (Inactive)
              mattiasjonsson Mattias Jonsson