[MDEV-4991] GTID binlog indexing Created: 2013-09-03  Updated: 2024-02-08

Status: In Testing
Project: MariaDB Server
Component/s: Replication
Fix Version/s: 11.4.1

Type: New Feature Priority: Critical
Reporter: Kristian Nielsen Assignee: Roel Van de Paar
Resolution: Unresolved Votes: 5
Labels: Preview_11.4, gtid

Issue Links:
Blocks
blocks MDEV-25764 SELECT binlog_gtid_pos takes a very l... Confirmed
PartOf
includes MDEV-25392 IO thread reporting yes despite faili... Open

 Description   

Current GTID code needs to scan one binlog file from the beginning when a
slave connects, to find the place to start replicating. If slave reconnects
are frequent, this can be a performance regression, as in earlier versions
slaves could immediately seek to the supplied file offset.

To fix this problem, the binlog files should be indexed, making it possible
to quickly locate any GTID in a binlog file. As an added bonus, this would
make it possible to detect when old-style replication tries to connect at an
incorrect file offset (eg. in the middle of an event), avoiding sending
potentially corrupted events.

The index could be an extra file master-bin.000001.idx written in parallel
with the binlog file. There is no need to flush or sync this file at every
binlog write, as it can easily be recovered after a crash, or the code can
fall back to scanning the corresponding binlog file.

The index would be page-based, allowing a connecting slave to do binary search
to find the desired location in the binlog to start replication. The file
would contain an ordered sequence of GTID binlog states with their
corresponding start offset into the associated binlog file.

The connecting slave would then binary-search for its start position in the
index, and this way be able to jump directly to the right start position in
the binlog file, without needing to scan that binlog from the start. This
would greatly improve performance when many slaves connect simultaneously.

A B-tree-like structure is more efficient for disk-based searching than a
plain binary search over a flat list of records. Since we write the index
records out in order, we can actually build a B-tree append-only into the
index file. At the end, the root node will be the last page in the file.
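The lookup described above can be sketched in a few lines. This is an illustrative model, not the server's actual code or on-disk format: it assumes a single replication domain (so a GTID position reduces to one sequence number) and an in-memory list of (seq_no, offset) records ordered by seq_no.

```python
import bisect

# Hypothetical index records: (gtid_seq_no, byte offset in the binlog file),
# ordered by seq_no. The values are made up for illustration.
index_records = [(0, 4), (100, 150_000), (200, 310_000), (300, 470_000)]

def lookup_start_offset(slave_seq_no):
    """Return the offset of the closest indexed position at or before the
    slave's requested GTID; the slave scans forward from there."""
    keys = [seq for seq, _ in index_records]
    i = bisect.bisect_right(keys, slave_seq_no) - 1
    if i < 0:
        raise ValueError("requested GTID precedes this binlog file")
    return index_records[i][1]
```

For example, a slave requesting seq_no 150 would be directed to offset 150000 (the record for seq_no 100) and scan forward from there.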

There is no need to include every position in the index. We can write, say,
one in every 10 transactions into the index; a connecting slave will then look
up the closest matching position in the index and at most need to skip over 10
transactions in the binlog. In general, we can keep track of the amount of
binlog written and index written, and write only a fraction of transactions
into the index to ensure that the ratio of index size to binlog size does not
exceed some appropriate number (eg. 2%).
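The ratio-based throttling described above could look like the following sketch. The function name, parameters, and the exact admission policy are assumptions for illustration, not the server's implementation.

```python
def should_write_index_record(binlog_bytes, index_bytes, record_size,
                              max_ratio=0.02):
    """Admit a new index record only while the index would stay within
    max_ratio (here 2%) of the binlog written so far."""
    return index_bytes + record_size <= max_ratio * binlog_bytes
```

So with 1 MB of binlog and 10 KB of index written, a 100-byte record is admitted (10.1 KB is under the 20 KB budget), while with only 100 KB of binlog and 2 KB of index, it is skipped.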

To further reduce the index size, it could be "compressed" by omitting from
each entry the (domain_id, server_id) combinations that do not change.
Typically, there can be many distinct such values in a binlog file, but only
a few of them are likely to change within one given file.
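The delta-compression idea can be sketched by modeling a GTID state as a mapping from (domain_id, server_id) to seq_no. These helper names and the dict representation are illustrative assumptions, not the real record format.

```python
def delta_compress(prev_state, new_state):
    """Keep only the (domain_id, server_id) entries whose seq_no changed
    since the previous index record."""
    return {ids: seq for ids, seq in new_state.items()
            if prev_state.get(ids) != seq}

def delta_apply(state, delta):
    """Reconstruct a full GTID state from a base state plus a delta."""
    full = dict(state)
    full.update(delta)
    return full
```

A reader reconstructs the full state at any record by applying the stored deltas on top of the last full state, which is why the first key in each tree level is kept uncompressed.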

A work-in-progress high-level design description:

  This implements an on-disk index for each binlog file to speed up access to
  the binlog at a specific offset or GTID position. This is primarily used when
  a slave connects to the master, but also by a user calling BINLOG_GTID_POS().
 
  A connecting slave first scans the binlog files to find the last one with an
  initial GTID_LIST event that lies before the starting GTID position. Then a
  sequential scan of the binlog file is done until the requested GTID position
  is found.
 
  The binlog index conceptually extends this using index records corresponding
  to different offsets within one binlog file. Each record functions as if it
  was the initial GTID_LIST event of a new binlog file, allowing the
  sequential scan to start from the corresponding position. By having
  sufficiently many index records, the scan will be fast.
 
  The code has a performance-critical "sync" path which is called while holding
  LOCK_log whenever a new GTID is added to a binlog file. And a less critical
  "async" path which runs in the binlog background thread and does most of the
  processing. The "sync" and "async" paths each run single threaded, but can
  execute in parallel with each other.
 
  The index file is written incrementally together with the binlog file.
  However, no fsync() of the index file is needed while writing. A
  partially written index left by a crashing server will be re-written during
  binlog recovery. A reader is allowed to use the index as it is being written
  (for the "hot" binlog file); such access is protected by a mutex.
 
  In case of a lost or corrupt index, fallback to full sequential scan is done
  (so performance will be affected, but not correctness).
 
  The index file is structured like a B+-tree. The index is append-only, so
  also resembles a log-structured merge-tree, but with no merging of levels
  needed as it covers a single fixed-size binlog file. This makes the building
  of the tree relatively simple.
 
  Keys in the tree consist of a GTID state (corresponding to a GTID_LIST
  event) and the associated binlog file offset. All keys (except the first key
  in each level of the tree) are delta-compressed to save space, holding only
  the (domain_id, server_id) pairs that differ from the previous record.
 
  The file is page-based. The first page contains the leftmost leaf node, and
  the root node is at the end of the file. An incompletely written index file
  can be detected by the last page in the file not being a root node page.
  Nodes in the B+-tree usually fit in one page, but a node can be split across
  multiple pages if GTID states are very large.
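The completeness check described above (last page must be a root node page) can be sketched as follows. The page layout, with a root flag in the page's first byte, is a hypothetical simplification of the real format, which is still to be documented.

```python
PAGE_SIZE = 4096
FLAG_ROOT = 0x01   # hypothetical flag bit marking a root node page

def index_is_complete(index_bytes):
    """Treat the index as complete only if it ends on a page boundary and
    the last page is flagged as a root node page."""
    if len(index_bytes) == 0 or len(index_bytes) % PAGE_SIZE != 0:
        return False
    # index_bytes[-PAGE_SIZE] is the first byte of the last page.
    return bool(index_bytes[-PAGE_SIZE] & FLAG_ROOT)
```

A file truncated mid-page, or one whose last page is an ordinary leaf or interior node, is detected as partial and triggers the re-write during binlog recovery.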
 
  ToDo: Document the page / index file format.
 
  Here is an example index file in schematic form:
 
       S0 D1 D2    D3 D4 D5    D6 D7 D8    D9 D10 D11
    A(S0 D1 D2) B(D3 D4 D5) C(D6 D7 D8) E(D9 D10) F(D11)
        D(A <S3> B <D4+D5+D6> C)   G(E <D10+D11> F)
                        H(D <S9> G)
 
  S0 is the full initial GTID state at the start of the file.
  D1-D11 are the differential GTID states in the binlog file; eg. they could
      be the individual GTIDs in the binlog file if a record is written for
      each GTID.
  S3 is the full GTID state corresponding to D3, ie. S3=S0+D1+D2+D3.
  A(), B(), ..., H() are the nodes in the binlog index. H is the root.
  A(S0 D1 D2) is a leaf node containing records S0, D1, and D2.
  G(E <D10+D11> F) is an interior node with key <D10+D11> and child pointers to
      E and F.
 
  To find eg. S4, we start from the root H. S4<S9, so we follow the left child
  pointer to D. S4>S3 and S4<S6, so we follow the child pointer to leaf node B.
 
  Here are the operations that occur while writing the example index file:
 
    S0  A(A) R(A,S0)
    D1       R(A,D1)
    D2       R(A,D2)
    D3  W(A) I(D) P(D,A) A(B) R(B,D3) R(D,S3)
    D4       R(A,D4)
    D5       R(A,D5)
    D6  W(B) P(D,B) A(C) R(C,D6) R(D,D4+D5+D6)
    D7       R(C,D7)
    D8       R(C,D8)
    D9  W(C) P(D,C) A(E) R(E,D9) W(D) I(H) P(H,D) R(H,S9)
    D10      R(E,D10)
    D11 W(E) I(G) P(G,E) A(F) R(F,D11) R(G,D10+D11)
    <EOF> W(F) P(G,F) W(G) P(H,G) W(H)
 
    A(x)   -> allocate leaf node x.
    R(x,k) -> insert an index record containing key k in node x.
    W(x)   -> write node x to the index file.
    I(y)   -> allocate interior node y.
    P(y,x) -> insert a child pointer to node x in node y.
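The append-only write ordering in the trace above can be mimicked in a simplified form. This sketch keeps a single interior level (the root), a fixed fanout of 3, and generic node names; the real index grows as many interior levels as needed, so this illustrates only the ordering property that each node is appended once it is complete, with the root written last.

```python
FANOUT = 3  # illustrative; the real fanout depends on the page size

def build_append_only(records):
    """Simulate the append-only build: each leaf is flushed to the 'file'
    as soon as it is full, its child pointer is added to the root, and
    the root itself is written last."""
    file_order = []        # (node_name, contents) in file append order
    root_children = []
    leaf, n_leaves = [], 0

    def flush_leaf():
        nonlocal leaf, n_leaves
        n_leaves += 1
        name = f"L{n_leaves}"
        file_order.append((name, leaf))
        root_children.append(name)
        leaf = []

    for rec in records:
        if len(leaf) == FANOUT:
            flush_leaf()
        leaf.append(rec)
    if leaf:
        flush_leaf()
    file_order.append(("root", root_children))
    return file_order

# Records S0, D1..D8 produce three leaves and then the root, mirroring
# the A/B/C/D ordering of the example (with generic names).
order = build_append_only(["S0"] + [f"D{i}" for i in range(1, 9)])
```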



 Comments   
Comment by Kristian Nielsen [ 2015-01-26 ]

This might also be a nice project for GSoC. It would be very valuable to have, and while not trivial, it is reasonably well defined and reasonably limited to a smaller part of the server code.

I have some more detailed notes on a possible design, I will add them later.

Comment by Kristian Nielsen [ 2015-02-09 ]

Jonas Oreland said he had already implemented this:

https://lists.launchpad.net/maria-developers/msg08143.html

Comment by Jonas Oreland [ 2015-04-24 ]

fyi...now trying to produce a usable patch...

Comment by Andrei Elkin [ 2023-05-17 ]

bnestere, to the design, we can also make use of the slave's knowledge of the gtid's binlog coordinates, which it could contribute to the master
(if the latter needs that); that would be incorporated with, or initiate, an in-memory gtid binlog index.

Comment by Aleksey Midenkov [ 2023-06-27 ]

Please review bb-10.11-midenok-MDEV-4991 (all last commits authored by me)

Comment by Kristian Nielsen [ 2023-08-02 ]

From a quick look at mostly the commit comments, it sounds like the existing work doesn't implement this task, but instead a different task of keeping an in-memory index of parts of the binlog.

I think there are a couple of problems with this approach (if I understood correctly).

The most serious problem is that this does not solve the slow worst-case performance of slave connect. In the worst case, the slave connect needs to fully scan a whole binlog file on the master, even worse if that file is encrypted. IIUC, an in-memory index can improve the average time for this (in case of many slave connects during master server uptime), but it will not help the worst-case latency.

Another problem is that this incurs a memory usage in the master server that's unused for most of the time, only used when a slave connects, which is rare in many setups.

So if we want to implement the in-memory index first, it's very important to design the user-visible part of the feature (eg. configuration options, status variables, etc.) so that it can be seamlessly upgraded to when the real solution with disk-based binlog indexes is implemented.

Comment by Aleksey Midenkov [ 2023-08-03 ]

So if we want to implement the in-memory index first, it's very important to design the user-visible part of the feature (eg. configuration options, status variables, etc.) so that it can be seamlessly upgraded to when the real solution with disk-based binlog indexes is implemented.

Please share your insights, if any, on that.

Comment by Kristian Nielsen [ 2023-08-04 ]

I'll need to look closer at the design then, I'll try to do that at some point. Is there any design description or discussions around this I could refer to? Or is it just the code in branch bb-11.2-midenok-MDEV-4991 ?

Comment by Andrei Elkin [ 2023-08-04 ]

knielsen, I'd go for implementing the in-memory index first; in all but some worst cases, access to the slave start gtid is reduced drastically.
Having just two index records per file halves (well, discounting the index read and seek to its position) the access time.

The disk-based can be considered separately, and unlike rob.schwyzer@mariadb.com's proposal I'd consider how to turn it on automatically after a memory limit is reached.

Comment by Aleksey Midenkov [ 2023-08-07 ]

knielsen As far as the user interface is concerned, you may not need to go far into design. The commit message explanation should be enough.

Comment by Kristian Nielsen [ 2023-08-17 ]

Ok, looks like the two configuration variables can just be made no-op and the instrumentation show zeros, once the proper implementation gets done.

Comment by Kristian Nielsen [ 2023-08-30 ]

FTR, I have now started to work on this.

Comment by Andrei Elkin [ 2023-08-31 ]

knielsen, so you're adding the index persistency layer to bb-11.2-midenok-MDEV-4991?

Comment by Kristian Nielsen [ 2023-08-31 ]

Elkin, no, persistent index is the proper implementation of this, as described in the task description. It will replace any in-memory index that's in the code when the implementation is done.

Comment by Andrei Elkin [ 2023-08-31 ]

knielsen, you say 'replace', but could your ultimate goal materialize by enriching Aleksey's design with a layer to write memory index records to a file?

Comment by Kristian Nielsen [ 2023-08-31 ]

I don't believe an in-memory index serves any useful purpose.
Do you think there's a reason to have an in-memory index, and if so, why? I haven't seen any description on how an in-memory index is preferable to a disk-based.

Comment by VAROQUI Stephane [ 2023-08-31 ]

Why not use an extra Aria system table partitioned by range of binlog name (fixed record size), add a partition on binlog file create, and drop the partition when the binlog is purged?

Comment by Andrei Elkin [ 2023-09-01 ]

knielsen,
To start off with the in-memory index, I always thought, requires from its design a possibility to add a disk base, and rather cheaply. I did not have time yet to look into Aleksey's work, but took for granted that its memory structs could be written down to disk at binlog rotation. The current binlog's gtid index can conceptually be thought of as an extension to the binlog file name record of the master-bin.index file.
It could even be implemented like that. For instance, appending, say, 200 offsets (numbers) to the index file record, which would become

master-bin.000001 offset_1 offset_2 ... offset_200

(and still "minimally" impact the current business logic of that index maintenance)
would decrease the average cost (when the requested one is in the middle) of seeking to a requested gtid position to just 1% in the worst case.

I don't insist on this particular approach. After all, it's ad hoc at this very moment. My point is obviously to try to exploit Aleksey's work efforts to the maximum.

Comment by Kristian Nielsen [ 2023-09-01 ]

Elkin I think the index should be written incrementally, in parallel with the binlog file. But I'll share my thoughts as soon as I have something concrete. For now I'm still working on ideas.

I'm not sure how having the offsets in the .index would help? You need the complete binlog state associated with each index entry to be able to correctly locate a GTID position, the same info that's stored in GTID_LIST_EVENT. And this can potentially be fairly large on setups where many domain_ids or server_ids were configured in the past.

But agree, I also just want this fixed properly. Looking now, it is 10 years since I created this bug. Maybe it was a mistake that I pushed GTID without an implementation of this included.

Comment by Andrei Elkin [ 2023-09-01 ]

> I'm not sure how having the offsets in the .index would help? You need the complete binlog state associated with each index to be able to locate correctly a GTID position

I had in mind a simple one domain case. And offsets are offsets in the binlog files for each say 1000th gtid logged into it.
I am not sure about the multiple domains. While there are few, the master.info record extension may still do well enough, but it does not look like it would scale up.

To
> the index should be written incrementally
and then again, perhaps as a batch (of my offset delta size).

Comment by Kristian Nielsen [ 2023-09-01 ]

Ok, I took a quick look at the bb-11.2-midenok-MDEV-4991 branch.

So this only addresses the lookup from offset to GTID position. It doesn't try to speed up slave connect, where it needs to lookup from GTID position to offset. That's why it never needs the complete binlog state.

It also still processes every GTID in the binlog file up to offset (IIUC), just doing so from an in-memory GTID list rather than reading full events from a file.

Comment by Aleksey Midenkov [ 2023-09-01 ]

Excuse me, but the binlog_gtid_pos() call is what is used in SHOW MASTER STATUS upon slave connection. Isn't it? https://mariadbcorp.atlassian.net/browse/SAMU-142 is the original task for the above branch to speed up SHOW MASTER STATUS. I guess rob.schwyzer@mariadb.com had this requirement based on production cases.

Comment by Kristian Nielsen [ 2023-09-02 ]

midenok, it depends on whether the connection slave is using GTID position to connect or not.

If using GTID position, the slave needs to look up the file/offset of where to start in the binlog.

If not using GTID position, the slave calls binlog_gtid_pos(), but this is just to record a GTID position that can be used for a later connect after CHANGE MASTER TO master_use_gtid=slave_pos.

Both of these currently require scanning a binlog file from the start, which can be slow.

I'm not sure about how SHOW MASTER STATUS would be involved?

Hope this helps,

- Kristian.

Comment by Aleksey Midenkov [ 2023-09-04 ]

knielsen You're right, SHOW MASTER STATUS is not involved here. In any case, the use case of binlog_gtid_pos() suffers from significant slowdowns, as SAMU-142 testifies. I'm not sure how important the opposite speedup is, though. Btw, the patches for review are in bb-10.11-midenok-MDEV-4991 (there are more patches)

Comment by Kristian Nielsen [ 2023-09-04 ]

The speedup for a slave connecting with Master_use_gtid=slave_pos is the most critical one, as for GTID position, the search for the place to start is required.

The speedup for binlog_gtid_pos() is simpler to solve. This is called by a slave connecting with Master_use_gtid=no, and it is only done so that the slave can compute a correct @@gtid_slave_pos in case the user later wants to switch to GTID automatically. An option could be implemented for the user to disable that to avoid the slowdown (if they don't plan to switch to GTID at the moment), which would be even simpler than your patch and could probably even be backported to stable releases.

Both cases of slowdown are of similar magnitude, and both are very important, of course.

Comment by Kristian Nielsen [ 2023-09-07 ]

Branch knielsen_mdev4991 is still very much work in progress.
But now there is a bit to see.
It writes out gtid indexes and can search the index quickly to implement BINLOG_GTID_POS().

Comment by Kristian Nielsen [ 2023-09-09 ]

Now the code in branch knielsen_mdev4991 is more mature. It speeds up both GTID and non-GTID slave connect, as well as BINLOG_GTID_POS().
Binlog and replication tests are passing.
Still missing binlog purge, crash recovery, and async code path.

Comment by Kristian Nielsen [ 2023-09-11 ]

Added a link to the mailing list thread discussing the design

Comment by Kristian Nielsen [ 2023-11-10 ]

The GTID index feature is now complete and ready for testing.

The code, based on 11.3, is pushed to knielsen_mdev4991_11.3:

https://github.com/MariaDB/server/commits/knielsen_mdev4991_11.3

(Taking this back, I feel the original MDEV was kind of hi-jacked here for something entirely different.)

Comment by Kristian Nielsen [ 2023-11-10 ]

Here is some documentation for the testing and eventually for the KB:

Binlog GTID indexes
 
This feature is a performance improvement for when a slave connects to the
master.
 
For each binlog file master-bin.000001, an index file master-bin.000001.idx
will be written (unless disabled with binlog_gtid_index=0). This index
allows fast lookup of a GTID position to obtain the corresponding binlog
coordinates (used for slave connect with Master_use_gtid=slave_pos). And
conversely, fast lookup of binlog file offset to corresponding GTID position
(used for BINLOG_GTID_POS() called by slave connect with Master_use_gtid=no).
 
Before this feature, the above required sequential scanning of the binlog
file, which has a default size of 1GB. This will be particularly impactful
when the binlog is not cached in memory and IO is slow; when binlog file is
encrypted (CPU overhead for decryption); and when many slaves connect
simultaneously.
 
Use of gtid indexes is automatic, enabled by default.
After upgrade to 11.4, new binlog files will have an index created. Old
pre-11.4 binlog files will not be indexed. The code will fall back
gracefully to the old method of sequential scan if an index file is not
available or corrupt (so slave connect will not fail in this case).
 
The feature is not expected to need any tuning for most users. Still, some
tuning parameters are available:
 
binlog_gtid_index_page_size (default 4096)
  Page size to use for the binlog GTID index. This is the size of the nodes
  in the B+-tree used internally in the index. A very small page size (64 is
  the minimum) will be less efficient, but can be used to stress the
  B-tree code during testing.
 
binlog_gtid_index_span_min (default 65536)
  Control sparseness of the binlog GTID index. If set to N, at most one
  index record will be added for every N bytes of binlog file written,
  to reduce the size of the index.
 
With binlog_gtid_index_span_min, we can reduce the number of records in the
index (and thus its size) when the binlog has lots of very small transactions. A
lookup of an omitted GTID will start from the closest prior GTID in the binlog file
and scan forward from there. Skipping a moderate amount of event data at the
start of slave connect will normally have negligible performance impact.
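The binlog_gtid_index_span_min behavior can be sketched as a simple stateful filter over binlog write offsets. This is a hypothetical helper for illustration, not the server's actual code; only the "at most one record per N bytes" policy is taken from the description above.

```python
def make_span_filter(span_min=65536):  # binlog_gtid_index_span_min default
    """Return a predicate that admits at most one index record per
    span_min bytes of binlog file written."""
    last_offset = None

    def should_index(binlog_offset):
        nonlocal last_offset
        if last_offset is None or binlog_offset - last_offset >= span_min:
            last_offset = binlog_offset
            return True
        return False

    return should_index
```

With span_min=100, GTIDs written at offsets 0, 50, 120, 199, and 220 would get index records only at offsets 0, 120, and 220; a lookup of an omitted GTID starts from the preceding record and scans forward.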
 
Two status variables are available to monitor the use of the GTID indexes:
 
  Binlog_gtid_index_hit
  Binlog_gtid_index_miss
 
The "hit" status increments for each successful lookup in a GTID index.
The "miss" increments when a lookup is not possible. This indicates that the
index file is missing (eg. after upgrade), or corrupt. Thus in normal
operation the "miss" counter is expected to be small/zero. A "Note"-level
message is logged in the error log when an index is corrupt and fallback to
sequential scan is needed.
 
The GTID indexes are written asynchronously to minimally impact the
performance of binlog write and transaction commit. In case of a server
crash, the index may be only partially written. During binlog crash
recovery, the GTID index files are recovered (re-created) together with the
scanning of the binlog files.

Comment by Jukka Pihl [ 2023-11-13 ]

If I understand correctly, the main purpose of this feature is to make BINLOG_GTID_POS() faster (arggh...
make slave connections faster!).

I think this could enable some other new functionality too:

  • Function to convert GTID => FILE:POS (reverse BINLOG_GTID_POS()) => BINLOG_FILE_POS(gtid)
  • Function to get previous GTID/FILE:POS. This is also difficult because MariaDB/MySQL binlog events don't contain references to previous binlog entries (pity). Currently backtracking binlog is very hard, because you have to basically scan whole binlog again. "BINLOG_RELATIVE_GTID(GITD, COUNT=-1)" or "BINLOG_RELATIVE_POS(file,pos,count=-1)" something...
  • The previous functionality could also be nice with the mysqlbinlog/mariadb-binlog tool.
    You could have relative start positions:
    Something like "--start-position=1-394-3000:-1" where you give a position and a relative position in transactions. In local mode this could use the index files directly, and in remote mode it would use the functions described previously.
Comment by Kristian Nielsen [ 2023-11-13 ]

bluebike, Yes, the functionality for GTID=>FILE:POS is already in the code, used when a slave connects with Master_use_gtid=slave_pos. It's just not exposed through a user function currently.

Interesting idea about backtracking. Indeed this should be possible to implement using the indexes, by following the previous key in the GTID index.

(The main purpose of GTID indexes is to speed up when a slave connects. This can otherwise use up a lot of CPU, disk I/O, and/or wallclock time.)

Comment by Kristian Nielsen [ 2023-11-15 ]

Roel, at the Unconference it was discussed that the test team might want to test the gtid indexes before it's pushed to 11.4. Elena suggested to assign tentatively to you and "request testing".

The code is available in the branch knielsen_mdev4991_11.3 .
Just let me know in case of any questions. Especially, if you want to test, please be sure to let me know of any problems that I need to fix in time for me to do it before the 11.4 release.

- Kristian.

Comment by Roel Van de Paar [ 2023-11-15 ]

knielsen Yes, thank you. And from the discussion I understand you will also co-test, which will be helpful as there are a fair number of things to test in the pipeline for the coming weeks and months. Thank you

Comment by Ramesh Sivaraman [ 2023-12-13 ]

Roel Functional testing looks good. Did not see any related/specific issues. Also tested in combination with binlog encryption.

Comment by Kristian Nielsen [ 2024-01-09 ]

I have updated the patch:

  • Included a few fixes from Serg following the 11.4 preview release
  • Updated with some review comments from Monty
  • Rebase on latest 11.4

The main visible change is that the options --binlog-gtid-index-sparse and --binlog-gtid-index-span-max have been removed to simplify the user configuration, as these options were mostly redundant.

The current code is pushed to the branch: bb-11.4-knielsen-mdev4991

Comment by Kristian Nielsen [ 2024-01-27 ]

Binlog GTID indexes have now been merged to 11.4.
Text for documentation is available in a previous comment (I can add it to KB myself if needed).

- Kristian.
Comment by Roel Van de Paar [ 2024-01-29 ]

Not sure why this was merged; no OK to push was provided.

As there are still errors, observed during testing, to be evaluated, as well as a backlog of issues and tasks, this will not make it into 11.4.

Comment by Roel Van de Paar [ 2024-01-29 ]

knielsen Please revert https://github.com/MariaDB/server/commit/d039346a7acac7c72f264377a8cd6b0273c548df

Comment by Kristian Nielsen [ 2024-01-29 ]

Roel, the patch will not be reverted, unless I learn of a technical reason to do so.

I have not been informed of any errors observed during test, what are you referring to?

In fact, I have hardly seen any discussion of the 11.4 release anywhere (IRC, Zulip, mailing list). Does such discussion take place on internal communication channels not accessible to all developers? Or is there just no coordination whatsoever?

This task has been completed and testing has been done and merge has been agreed with the reviewer. All is as it should be.

Roel, it's great if you want to contribute testing of new features, that's much appreciated. But you can't come months after a patch is complete, with no communication at all except "Functional testing looks good", and suddenly say "no OK to push was provided". That's simply not workable.

Comment by Roel Van de Paar [ 2024-01-31 ]

> I have not been informed of any errors observed during test, what are you referring to?
Various server errors and crashes observed during testing. Work on these is ongoing; debugging and analysis take time and the current workload in replication testing is very high. If anything related to the feature is found, it will be posted here.

> In fact, I have not seen hardly any discussions anywhere, IRC, Zulip, mailing list, of the 11.4 release.
As far as this feature is concerned, it is because there is nothing to report: testing work continues.

> testing has been done
Functional testing has been completed and is fine, stress testing was and is in progress and is not completed.

> All is as it should be.
It is not. The current regular procedure for all features is: Implementation > Review > Reviewer Signoff > Testing > Tester Signoff. Management can override these steps if needed.

> it's great if you want to contribute testing of new features
This is not how it works; each feature is tested by a tester and when the overall workload is too high, features are postponed to the next release.

Comment by Kristian Nielsen [ 2024-01-31 ]

"The current regular procedure is..." "Management can override ..." - there are MariaDB corporation procedures. But this is not MariaDB corporation work. Maybe that's the misunderstanding here.

Since this is not MariaDB corporation work, you are not under any obligation to test it. But of course if you want to contribute testing, that's much appreciated. That is why I wrote in November:

"if you want to test, please be sure to let me know of any problems that I need to fix in time for me to do it before the 11.4 release."

Already back then, you informed me that it was uncertain if you would have the time. Therefore I planned to do the required testing otherwise. And this is what happened, the required testing has been completed outside of MariaDB corporation, that is what is meant by "testing has been done".

Comment by Roel Van de Paar [ 2024-02-01 ]

Thank you. I understand where the confusion comes from now. However, please note that wherever a patch or feature comes from, testing and signoff by MariaDB testers is standard procedure, akin to review signoff.

Comment by Roel Van de Paar [ 2024-02-05 ]

knielsen Hi!

An issue that I regularly see in the runs is this assertion upon shutdown:

11.3.0 d6efded921e4168875ef6d4e1a0ba439a03b5c30

2023-12-30 23:51:47 0 [Note] /test/MDEV-4991_MD301223-mariadb-11.3.0-linux-x86_64-dbg/bin/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
2023-12-30 23:51:47 6 [Note] Error reading relay log event: slave SQL thread was killed
2023-12-30 23:51:47 6 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.000002' at position 367; GTID position '0-1-202', master: 127.0.0.1:45519
2023-12-30 23:51:47 5 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
2023-12-30 23:51:47 5 [Note] Slave I/O thread exiting, read up to log 'binlog.000002', position 367; GTID position 0-1-202, master 127.0.0.1:45519
mariadbd: /test/knielsen_mdev4991_11.3_dbg/sql/mysqld.cc:3767: void my_malloc_size_cb_func(long long int, my_bool): Assertion `!is_thread_specific || (mysqld_server_initialized && thd)' failed.

The only bug I could locate with a similar assert is an old closed Galera bug (MDEV-25389), and possibly MDEV-19515 is related.

The issue has proven not to be reducible to a testcase, and always happens at shutdown.

Comment by Kristian Nielsen [ 2024-02-05 ]

Roel , do you have a stack trace showing where this happens? This is an assertion about memory usage accounting, the stack trace should show which memory allocation is involved.

Can you get a core file and show (with gdb) the values of the variables is_thread_specific, mysqld_server_initialized, and thd? Apparently this is a memory allocation that should be accounted to a specific thread, but either the server is not yet initialized, or it's called from a place where the thread is unknown.

Comment by Roel Van de Paar [ 2024-02-06 ]

knielsen Thank you for the input. All good; I have found that the issue is present in a non-patched 11.3 as well. I will continue to debug it on the side and log it later as a new ticket (including the information mentioned), but it is confirmed not related to MDEV-4991. (NTS: data/000000)

Generated at Thu Feb 08 07:00:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.