MariaDB Server: MDEV-35834

Server crash in FVector::distance_to upon concurrent SELECT

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 11.7(EOL)
    • Fix Version/s: 11.7.2
    • Component/s: Vector search
    • Labels: None

    Description

      The failure was observed on a development branch which already has a patch for MDEV-35745. I assume it affects the main branch too: the test case fails there as well. However, at the time of writing the patch for MDEV-35745 is not yet in main, so I can't tell for sure whether the failure there is the same one.

      Unlike MDEV-35745, the scenario here doesn't do anything "wrong" with the data. It loads a table (10K rows, 96 dimensions, a dataset from one of the public benchmarks), performs an UPDATE, and then runs 2 identical concurrent SELECTs using the vector key. It looks like any UPDATE would do. I've made it a no-op on 1 row, but it can be different.

      The test case fails non-deterministically and unreliably. On one of my machines it fails within 10-20 attempts. On another one, with the same build, it takes hundreds.
      With rr it is even worse, although I did get it to fail a couple of times, so it's possible. I'll provide an rr profile. It's not a good profile: it comes from a run with many repetitions and no restart in between, so it contains a lot of unnecessary noise. So far, though, I couldn't produce a better one.

      The test case comes in two variants, which are essentially the same.

      The attached tdump.test is self-contained: it is a data dump plus the logic described above. However, loading the data takes some time on each repetition.
      The other variant uses pre-created .frm and .MYI/.MYD files instead of the dump. Extract table.tar.gz into /tmp and run the test case below (or change the path in the test case).

      If you need to reproduce the failure locally and it doesn't happen easily, use the second variant. It is faster, so it allows more repetitions in the same time interval.

      Test case using the pre-created table

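      # Copy the pre-created table files from /tmp into the test schema
      --let $datadir= `select @@datadir`
      --exec cp /tmp/deep_image_96_10K_MyISAM_cosine* $datadir/test/
      flush tables;
      # Any UPDATE seems to do; here it is a no-op on a single row
      update deep_image_96_10K_MyISAM_cosine set veccol = veccol order by pk limit 1;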
       
      # Send the vector search asynchronously from a second connection
      --connect (con1,localhost,root,,)
      --send
      select pk from deep_image_96_10K_MyISAM_cosine order by VEC_DISTANCE_COSINE(0x661B3F46BC0114FEE4E37A229920B6EFCCA186F807D641897CA5B236847F313B524B7C3B6E89C752CB021153F3448F5C1727FFFEA11CA777B30A757AD660AE221D270033E09EE1D972BCFFA93DF3AFDC4249BF082BFE6578E91A9FD007FD823CA78EFC5F414D436408034501564D27604CF6D736747B8B9DDFA6014B0805BE5A1181409E82881D136EF4E28D4F340618254E47779C84F8C6A5DDDAC9E9C9319B6A2FEBE1926F2ED5A371D71D07B53DD3DE221DDB9429BD125AA0FB6A99181CE08359DE4782D1A6ABB88A24C0AED2CCB167A23C436C69D981DE0F732F28F34E587B1E28C141EF6694AC3ED887268BA2B3CFBDB1CF13445DF4220843F78779D7D4437FD95EE0889EA1607EC3617CCFDFC9F7638F7E99BB289A569E6AD3C58BB873EA7BD20E5ECD0DC10359265E92BE74E21EA5B320FFBE4B4468CFF8D2D338F126610333D39C9DE30576DFD06F9829502FF48330F6355BC5024B8BCFF5C18191FCC736EBAAB909015CC8E2D2847D30948FCAFC14E04A95A7E30DE3FC2B8D5679D6, veccol) limit 100;
      # Run the identical query concurrently on the default connection
      --connection default
      select pk from deep_image_96_10K_MyISAM_cosine order by VEC_DISTANCE_COSINE(0x661B3F46BC0114FEE4E37A229920B6EFCCA186F807D641897CA5B236847F313B524B7C3B6E89C752CB021153F3448F5C1727FFFEA11CA777B30A757AD660AE221D270033E09EE1D972BCFFA93DF3AFDC4249BF082BFE6578E91A9FD007FD823CA78EFC5F414D436408034501564D27604CF6D736747B8B9DDFA6014B0805BE5A1181409E82881D136EF4E28D4F340618254E47779C84F8C6A5DDDAC9E9C9319B6A2FEBE1926F2ED5A371D71D07B53DD3DE221DDB9429BD125AA0FB6A99181CE08359DE4782D1A6ABB88A24C0AED2CCB167A23C436C69D981DE0F732F28F34E587B1E28C141EF6694AC3ED887268BA2B3CFBDB1CF13445DF4220843F78779D7D4437FD95EE0889EA1607EC3617CCFDFC9F7638F7E99BB289A569E6AD3C58BB873EA7BD20E5ECD0DC10359265E92BE74E21EA5B320FFBE4B4468CFF8D2D338F126610333D39C9DE30576DFD06F9829502FF48330F6355BC5024B8BCFF5C18191FCC736EBAAB909015CC8E2D2847D30948FCAFC14E04A95A7E30DE3FC2B8D5679D6, veccol) limit 100;
       
      --connection con1
      --reap
      drop table deep_image_96_10K_MyISAM_cosine;
      

      bb-11.8-MDEV-35450-vec_distance 57e669f1f0d5ed5d349d73e439b2d3549583310a

      #3  <signal handler called>
      #4  0x0000557b671700c0 in FVector::distance_to (this=0x0, other=0x7fbc6001d6f8, vec_len=96) at /data/bld/main-debug/sql/vector_mhnsw.cc:212
      #5  0x0000557b6716a559 in FVectorNode::distance_to (this=0x7fbc643a86f0, other=0x7fbc6001d6f8) at /data/bld/main-debug/sql/vector_mhnsw.cc:719
      #6  0x0000557b671718bc in VisitedSet::create (this=0x7fbc7c0ddb90, node=0x7fbc643a86f0) at /data/bld/main-debug/sql/vector_mhnsw.cc:857
      #7  0x0000557b6716c1b7 in search_layer (ctx=0x7fbc60235eb8, graph=0x7fbc60c19ef8, target=0x7fbc6001d6f8, threshold=-1, result_size=1, layer=165, inout=0x7fbc7c0ddda0, construction=false) at /data/bld/main-debug/sql/vector_mhnsw.cc:1057
      #8  0x0000557b6716dafc in mhnsw_read_first (table=0x7fbc60117a68, keyinfo=0x7fbc60c19da8, dist=0x7fbc6001a3d0, limit=1) at /data/bld/main-debug/sql/vector_mhnsw.cc:1295
      #9  0x0000557b66aaa0e1 in TABLE::hlindex_read_first (this=0x7fbc60117a68, nr=1, item=0x7fbc6001a3d0, limit=1) at /data/bld/main-debug/sql/sql_base.cc:9927
      #10 0x0000557b66c095e5 in join_read_first (tab=0x7fbc6001bde0) at /data/bld/main-debug/sql/sql_select.cc:25220
      #11 0x0000557b66c066bf in sub_select (join=0x7fbc6001a620, join_tab=0x7fbc6001bde0, end_of_records=false) at /data/bld/main-debug/sql/sql_select.cc:24106
      #12 0x0000557b66c057ea in do_select (join=0x7fbc6001a620, procedure=0x0) at /data/bld/main-debug/sql/sql_select.cc:23620
      #13 0x0000557b66bd032f in JOIN::exec_inner (this=0x7fbc6001a620) at /data/bld/main-debug/sql/sql_select.cc:5040
      #14 0x0000557b66bcf30f in JOIN::exec (this=0x7fbc6001a620) at /data/bld/main-debug/sql/sql_select.cc:4823
      #15 0x0000557b66bd0dca in mysql_select (thd=0x7fbc60000dc8, tables=0x7fbc60018cb8, fields=..., conds=0x0, og_num=1, order=0x7fbc6001a498, group=0x0, having=0x0, proc_param=0x0, select_options=2164525824, result=0x7fbc6001a5f8, unit=0x7fbc600052f0, select_lex=0x7fbc60018648) at /data/bld/main-debug/sql/sql_select.cc:5356
      #16 0x0000557b66bbed22 in handle_select (thd=0x7fbc60000dc8, lex=0x7fbc60005210, result=0x7fbc6001a5f8, setup_tables_done_option=0) at /data/bld/main-debug/sql/sql_select.cc:633
      #17 0x0000557b66b61023 in execute_sqlcom_select (thd=0x7fbc60000dc8, all_tables=0x7fbc60018cb8) at /data/bld/main-debug/sql/sql_parse.cc:6191
      #18 0x0000557b66b58be9 in mysql_execute_command (thd=0x7fbc60000dc8, is_called_from_prepared_stmt=false) at /data/bld/main-debug/sql/sql_parse.cc:3980
      #19 0x0000557b66b66080 in mysql_parse (thd=0x7fbc60000dc8, rawbuf=0x7fbc60017f20 "SELECT pk FROM deep_image_96_10K_MyISAM_cosine ORDER BY VEC_DISTANCE_COSINE(0x661B3F46BC0114FEE4E37A229920B6EFCCA186F807D641897CA5B236847F313B524B7C3B6E89C752CB021153F3448F5C1727FFFEA11CA777B30A757AD6"..., length=867, parser_state=0x7fbc7c0df2e0) at /data/bld/main-debug/sql/sql_parse.cc:7915
      #20 0x0000557b66b522af in dispatch_command (command=COM_QUERY, thd=0x7fbc60000dc8, packet=0x7fbc60b96249 "", packet_length=867, blocking=true) at /data/bld/main-debug/sql/sql_parse.cc:1903
      #21 0x0000557b66b50c08 in do_command (thd=0x7fbc60000dc8, blocking=true) at /data/bld/main-debug/sql/sql_parse.cc:1416
      #22 0x0000557b66d5d413 in do_handle_one_connection (connect=0x557b6af577f8, put_in_cache=true) at /data/bld/main-debug/sql/sql_connect.cc:1415
      #23 0x0000557b66d5d194 in handle_one_connection (arg=0x557b6af19688) at /data/bld/main-debug/sql/sql_connect.cc:1327
      #24 0x0000557b672e0bf2 in pfs_spawn_thread (arg=0x557b6af3fa28) at /data/bld/main-debug/storage/perfschema/pfs.cc:2198
      #25 0x00007fbc86ea8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #26 0x00007fbc86f2861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      

      Attachments

        1. table.tar.gz (6.03 MB)
        2. tdump.test (7.43 MB)


            People

              Sergei Golubchik (serg)
              Elena Stepanova (elenst)
              Votes: 0
              Watchers: 2
