Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17105

Simple Subscription of data change

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None

    Description

      TL DR:

      mariadb client>subscribe stream "connection stream name" using table1 where field1=abc or field2>=1234
      mariadb result>Created ID = 10

      mariadb client> insert into table1 values(abc,1235)

      mariadb 'subscribe'> hello ID=10

      LONG:
      from MDEV-5263

      the main idea is something that kdb+ is doint today

      kdb+ allow stream of data (something like binlog stream with CDC) , this allow applications like high frequency trading to receive updates with lower latency (i'm sure that ultra low latency don't use database and this kind of service, everything is in the same machine, but a "mid frequency trading" use), or maybe business analytics in real time (an example: https://kx.com/media/2016/07/Gerry-Buggy-blog-Streaming-Analytics.pdf)

      check that only sending the information of "hey guy this changed" is the main feature here, this turn off all network and applications workload when using pool like structures. i saw systems with more than 1000+ connections doing pools with >4QPS per connection, i know that's a bad design but since we can solve this at database side without rewriting legacy code, that's interesting. allowing a stream of just "this changed" is a reduction of +4000QPS (lower AWS credit use)

      the problem of "this changed" is : what changed? the table? one field of one row? one row?

      there's a trigger like thing that tells you what changed, this is what pool like applications do at client side, and this should be port to server side, something like "please tell me when database xyz, table abc, change where WHERE CLAUSULE happens" considering that WHERE CLAUSULE should be a easy where clausule, like "WHERE field >= asdf OR field <=ASDFA" and not "WHERE UDF(file)=5", and we have a problem here, this don't need to be executed at client connections, in other words, this should be executed when data change (binlog is the best place to tell you that data changed), a galera cluster, a slave replication, handler socket plugin, or anything else can change data, that's the main advantage of doing this at mariadb, and yes i know that this is a problem to CONNECT engine (xml, txt and others files can change outside mariadb server without a knowledge, CONNECT engine should report and pool the files to see what changed, but that's a version 2 problem)

      —

      implementation problems:

      1) mysql protocol: we should include a new kind of protocol message (subscribe reports)

      2) one server side thread (per table?) should read binlog like stream, check all streams ID rellated to that table being read at binlog, wake up client threads and tell them to report client - maybe something like ( https://dev.mysql.com/doc/refman/5.7/en/mysql-session-track-get-first.html )

      3) create a information_schema table to select all subscribes being used and statistcs (reports/subscribe)

      Attachments

        Activity

          People

            Unassigned Unassigned
            rspadim roberto spadim
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.