  1. MariaDB ColumnStore
  2. MCOL-4533

Research columnstore handling disk I/O errors

Details

    • Task
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • Icebox
    • None
    • None

    Description

      It appears that in many cases, when an NFS I/O subsystem returns errors, ColumnStore logs a message but takes no further action and proceeds as if nothing had happened. This policy naturally leads to serious problems such as database corruption and possibly incorrect query results. Some examples of what appears in some customers' logs:

      • controllernode[1000861]: 49.535374 |0|0|0| C 29 CAL0000: VSS::load(): No such file or directory
      • writeengine[1074762]: 18.480080 |0|92|0| I 19 CAL0080: Compression Handling: Compressed data does not fit, caused a chunk shifting
      • IDBFile[1211506]: 31.500512 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/DMLLog_182_1, exception: unable to open Buffered file

      This task is ONLY (repeat - ONLY) about studying the code and producing a written document describing what the code does when something abnormal is detected. There should be NO development actions. Opinions on what to change are welcome but not required or expected at this stage.

      Among other things, the document should assess whether the text of each error message is faithful to the event or error code received (there is a fear that the messages are insufficiently fine-grained and end up masking the underlying problem instead of illuminating it). But primarily:

      • What do we actually do in each case? (It looks like we just proceed and pray hard that nothing bad will happen, but this needs validation.)
      • As we go on, what are the possible consequences? Can we later get a corrupted extent map and blow up the database? Can we corrupt S3 metadata? Can we start returning wrong results for SELECT statements in the case of "compressed data does not fit"?
      • Other?
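
      To make the "faithful to the error code" question concrete, here is a minimal sketch of what a faithful report might look like. It is an illustration for the written study only, not a proposed change. The names `IoResult` and `openForRead` are hypothetical and do not come from the ColumnStore code base. The only assumption is a POSIX system where `open` sets `errno` on failure.

```cpp
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical result type for illustration; not part of ColumnStore.
struct IoResult
{
    bool ok;              // true when the call succeeded
    int err;              // errno captured at the point of failure, 0 on success
    std::string message;  // names the call, the path, and the errno text
};

// Hypothetical helper: opens a file read-only. On failure it keeps the
// original errno in the result instead of collapsing it into a generic
// "failed to open file" text, so the caller can decide whether to stop.
IoResult openForRead(const std::string& path, int& fdOut)
{
    fdOut = ::open(path.c_str(), O_RDONLY);
    if (fdOut < 0)
    {
        const int saved = errno;  // save before another call can overwrite it
        return {false, saved,
                "open(" + path + ") failed: errno=" + std::to_string(saved) +
                    " (" + std::strerror(saved) + ")"};
    }
    return {true, 0, ""};
}
```

      With a result like this, the study can ask two separate questions at each call site: does the logged text still carry the original error code, and does the caller check `ok` or simply continue?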

            People

              sergey.zefirov Sergey Zefirov
              gdorman Gregory Dorman (Inactive)