Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      This would be somewhat comparable to Oracle Database Continuous Query Notification, SQL Server WAITFOR/RECEIVE, and PostgreSQL LISTEN/NOTIFY, but better.

      Unlike https://jira.mariadb.org/browse/MDEV-16590, this feature would provide a simpler and more developer-friendly SQL-based API for being notified quickly when new records become available.

      The experience may be like the following:

      select * from StreamView_Sales_Inserted where SaleAmount > 10000;
      

      Where an external process could use a client driver like this

      public async Task ProcessEventData(SqlCommand com, int cutIn)
      {
          // Rows up to and including cutIn have already been processed.
          // (A date-based equivalent would be:
          //  $"select * from Sales where SaleAmount > 10000 and SaleDate >= '{cutInDate:yyyy-MM-dd HH:mm:ss.fff}'")
          com.CommandText =
              "select * from StreamView_Sales_Inserted where ID > @cutIn and SaleAmount > 10000";
          com.Parameters.Add(new SqlParameter("@cutIn", cutIn));
          com.CommandTimeout = 0; // Never time out: the query blocks until rows arrive

          while (true)
          {
              try
              {
                  com.Parameters["@cutIn"].Value = cutIn;
                  using (var reader = await com.ExecuteReaderAsync())
                  {
                      while (await reader.ReadAsync()) // Blocks if there are no rows
                      {
                          var sale = MaterializeSale(reader);
                          cutIn = sale.ID;
                          await SendBigSaleEmail(sale);
                      }
                  }
              }
              catch (NetworkException nex)
              {
                  // Some sort of disconnect; reconnect and resume from cutIn
              }
              catch (TimeoutException tex)
              {
                  // Shouldn't happen with CommandTimeout = 0
              }
              catch (Exception ex)
              {
                  // Unexpected; back off before retrying
                  await Task.Delay(10000);
              }
          }
      }
      

      I am not familiar with the source code for MySQL.

      Perhaps the implementation could effectively be a long-poll or cursor. I'm not sure whether the internal mechanisms would require a complete overhaul, though. At the database-driver level, the client library is simply waiting for another network packet with more data for that result set, so no significant resources should be used up, except potentially the result-set context on the server side.
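      Absent such support, the closest approximation today is timed polling. A minimal illustrative sketch (Python; `fetch_new` is a hypothetical function standing in for re-running the filtered SELECT) shows the wasted round-trips that a server-held long-poll result set would avoid:

```python
import time

def poll_for_rows(fetch_new, cut_in, interval=1.0):
    """Emulate a blocking stream by polling: call fetch_new(cut_in) until
    it returns rows, sleeping between attempts. A server-side long-poll
    would instead hold the result set open and push packets as rows arrive.
    """
    while True:
        rows = fetch_new(cut_in)   # e.g. SELECT ... WHERE ID > cut_in
        if rows:
            return rows
        time.sleep(interval)       # wasted round-trips the proposal avoids
```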

      System-defined Views would be available per table, or may be defined using new keywords like `WITH Trigger(Sales, Insert)`.

      Such Views may be joined to other tables, but must be the first FROM source of the View, and the only Trigger-View in the query.

      Learning from SQL Server, perhaps these should be Queue-Views, but simpler: triggers enqueue to whichever Views are active at the time, without requiring a broker.

      Attachments

        Issue Links

          Activity

            danblack Daniel Black added a comment -

            Maybe you could use master_gtid_wait (https://mariadb.com/kb/en/master_gtid_wait/) as the ReadAsync mechanism and a specific domain-id in the gtid to indicate the queue you want. (disclaimer: untested)

            merarischroeder Todd Hubers added a comment -

            Another approach:

            select * from StreamView_Sales_Inserted where ID > cutIn and SaleAmount > 10000 wait 1; -- 'wait' is a new keyword that requires at least one record before the statement completes (LIMIT could also be used).

            This statement is looped, and the cutIn variable is increased each time from the last ID that was returned.

            In C#, it would block the longest at ExecuteReaderAsync.
            Internally, it could acquire a read-lock only once a row becomes available.
            Internally, it should register a callback on an internal insert trigger of the target table. When a record is inserted into the table, all registered callbacks are signalled.

            But ideally, the 'wait' keyword would not be necessary. Ideally, for this special VIEW the sequence would be: register the callback first; query the table according to the filter clauses; return the rows and relinquish the read-lock; wait, noting max(ID); when signalled, take a read-lock again, apply the updated max(ID) filter, and return the new records; then repeat the cycle.

            Max(ID) isn't the only way though. If there is an internal mechanism, perhaps the [NEW] table of the insert trigger is used instead.
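            The 'wait 1' semantics described above (block until at least one matching row past the cut-in exists, then return) could be modelled like this. This is an illustrative Python sketch, not MariaDB code; `WaitQueue` and its methods are hypothetical names:

```python
import threading

class WaitQueue:
    """Blocks readers until at least `minimum` rows past `cut_in` exist."""
    def __init__(self):
        self._rows = []                     # (row_id, payload), append-only
        self._cond = threading.Condition()

    def insert(self, row_id, payload):
        with self._cond:
            self._rows.append((row_id, payload))
            self._cond.notify_all()         # the "internal insert trigger"

    def select_wait(self, cut_in, minimum=1):
        """Return rows with row_id > cut_in, blocking until >= minimum exist."""
        with self._cond:
            while True:
                matched = [r for r in self._rows if r[0] > cut_in]
                if len(matched) >= minimum:
                    return matched
                self._cond.wait()
```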

            danblack Daniel Black added a comment -

            Or the client becomes a replication client filtered to the table.

            I'm still of the general opinion I stated in the linked MDEV two years ago: message queue applications are mature, and there's no need for an SQL server to undergo large internal changes to fulfil a function that dedicated message queue products already serve with significant maturity, a better application interface, and more options and monitoring tools.

            merarischroeder Todd Hubers added a comment - - edited

            Thanks Daniel. I'm not sure how to @mention in Jira.

            Disclaimer: I haven't read deeply about GTIDs yet

            I don't foresee a replication client approach being the right solution for what I am looking for. If there are 1000 endpoints, I assume 1000 read-replica streams would be necessary, each filtering to a subset of target-table events. That would not scale.

            Mechanics Progress

            Instead, there might be the following implementation:

            • Ideally the underlying mechanisms needed are made available and reusable to any user generally: TableSignal("sales", "Insert"), async TableWait("sales", "Insert"), TableSet("sales", "LastInsertSequenceID", @value), TableGet("sales", "LastInsertSequenceID"), TableNextSequence("sales", "SequenceID"). Of course, these could also become standardised SQL statements too with accompanying security enforcement (per command, per table, per Table+Variable).
            • Compared to Postgres, TableWait/TableSignal is similar to Listen/Notify. However, in this case they are partitioned per table, which both fits the model and avoids a central global list. I then add a few more mechanisms.
            • Perhaps the "Sales" table can have a ChangeSequenceID field (either user-defined or by system convention). This is an Int64 that is sequential per table, with its top two bits (MSBs) set to differentiate Insert/Update/Delete/Reserved.
            • Upon Insert, in a trigger, the record's ChangeSequenceID is set to TableNextSequence("sales", "SequenceID") with the Insert MSB state set. TableSet("sales", "LastInsertSequenceID", @TransactionMaxChangeSequenceID) is called at the end of the transaction, then TableSignal("sales", "Insert") is called.
            • Update is similar to Insert mechanics.
            • For Delete, the record will need a soft-delete field that may also carry several states (flag-delete, delete-handled, (actual row-delete)). A separate Table.LastDeleteSequenceID would be kept so that the soft-delete is not signalled as an Update.
            • TableSet/TableGet would link to the same per-table struct that may have a field UserFields that is an array of Key/Value tuples. The value field would have a fixed memory address. Read/Write to a value field would use a non-locking approach with CPU-atomic-set instructions, and delay-write to disk.
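            To make the proposed primitives concrete, here is a minimal, purely illustrative sketch (Python; none of these names exist in MariaDB, and a real implementation would live in the server) of per-table wait/signal state and the 2-bit change-type encoding:

```python
import threading

# Top two bits of a 64-bit sequence ID encode the change type.
INSERT, UPDATE, DELETE, RESERVED = 0, 1, 2, 3

def with_change_type(seq_id, change_type):
    """Set the two MSBs of a 64-bit sequence ID to the change type."""
    return (change_type << 62) | (seq_id & ((1 << 62) - 1))

def change_type_of(tagged_id):
    """Recover the change type from a tagged sequence ID."""
    return tagged_id >> 62

class TableState:
    """Per-table signal state: a condition variable plus user key/value fields."""
    def __init__(self):
        self.cond = threading.Condition()
        self.fields = {}       # e.g. "LastInsertSequenceID" -> int
        self.next_seq = 0

    def table_next_sequence(self):
        with self.cond:
            self.next_seq += 1
            return self.next_seq

    def table_set(self, key, value):
        with self.cond:
            self.fields[key] = value

    def table_get(self, key):
        with self.cond:
            return self.fields.get(key, 0)

    def table_signal(self):
        with self.cond:
            self.cond.notify_all()   # wake every waiting consumer

    def table_wait(self, timeout=None):
        with self.cond:
            self.cond.wait(timeout)
```

(The sketch uses a lock for the user fields rather than the lock-free CPU-atomic approach suggested above, purely for brevity.)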

            Here is a user implementation in a pseudocoded stored procedure for streaming an insert view:

            --WaitView_Sales_Insert
             
            declare CutInSequenceID bigint unsigned default 0;
            declare LastSequenceID bigint unsigned default 0;
            declare InsertLimit bigint unsigned default InsertMax();
             
            while (1=1)
            	set LastSequenceID = TableGet("sales", "LastInsertSequenceID");
            	
            	select * from View_Sales
            	where (CutInSequenceID = 0) or (ChangeSequenceID >= CutInSequenceID and ChangeSequenceID < InsertLimit);
            	
            	set CutInSequenceID = LastSequenceID;
            	TableWait("sales", "Insert");
            	
            end while;
            

            In the stored procedure version above, the client will receive multiple result-sets, and may receive duplicate records in some overlap circumstances. This can be avoided by using a temp table and getting the Max(ChangeSequenceID) before selecting those records onwards to the caller.
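            The duplicate-suppression idea can also be sketched client-side (illustrative Python; the batch layout is an assumption): track the maximum ChangeSequenceID already forwarded and drop replayed rows from overlapping result sets.

```python
def dedupe_batches(batches):
    """Yield each record at most once from possibly-overlapping batches.

    Each batch is a list of (change_sequence_id, payload) tuples, ordered
    by change_sequence_id within the batch.
    """
    last_seen = 0  # max ChangeSequenceID forwarded so far
    for batch in batches:
        for seq_id, payload in batch:
            if seq_id > last_seen:     # skip rows replayed by an overlap
                last_seen = seq_id
                yield seq_id, payload
```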

            If TableWait is implemented in the database client protocol, then the server-side would only need to remember which connection references to signal and wouldn't need to keep any stored procedure state (activity registration), and so this could be implemented remotely (distributed):

            //StreamSalesInserts(Action<Sale> OnSale)
             
            long CutInSequenceID = 0;
            long InsertLimit = MyLimits.InsertSequence;
             
            while (true)
            {
            	//..building command with CutInSequenceID and InsertLimit, plus other important parameters
            	using (var reader = com.ExecuteReader())
            	{
            		while (reader.Read())
            		{
            			var sale = MaterializeSale(reader);
            			OnSale(sale); //Callback pattern
            			CutInSequenceID = sale.SequenceID.WithInsertMSB(); // WithInsertMSB: tag the ID, in case this is the first run, which isn't constrained to any SequenceID
            		}
            	}
            	
            	await connection.TableWait("sales", "Insert");
            }
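
            The server-side bookkeeping this implies is small: a map from (table, event) to the set of connections to signal. An illustrative Python sketch (hypothetical names; connections are represented by plain ids):

```python
from collections import defaultdict

class SignalRegistry:
    """Maps (table, event) -> connection ids to notify. No per-connection
    stored-procedure state is retained, only the registration itself."""
    def __init__(self):
        self._waiters = defaultdict(set)

    def register(self, table, event, conn_id):
        """A TableWait call from conn_id records its interest here."""
        self._waiters[(table, event)].add(conn_id)

    def signal(self, table, event):
        """Return (and clear) the connections to wake for this event."""
        return self._waiters.pop((table, event), set())
```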
            

            A system implementation of a WaitView would benefit from async coding, conventional system-level use of the SequenceID, and the ability to present it as a "VIEW" rather than a Stored Procedure, because a WaitView could then be joined to other tables. (The client driver might participate when "SET WaitView=REMOTE", retaining the context to continue the WaitView execution: it waits for the network packet, then sends the necessary command to continue the WaitView.)

            Merit

            RE: "message queue applications are mature": I strongly believe that all databases need a trigger event system that propagates through the established database connection. Even better if there are several means. One could be Publish/Notify; the other could resemble what I am proposing on this task.

            I proceed with the following understanding:

            • an RDBMS is quite a general catch-all that does a lot (evidence: other RDBMS vendors leveraging the db client connection for notifications)
            • the success of Message Queue applications indicates a failure of the RDBMS (evidence: the popularity of Message Queue applications)
            • inbuilt messaging features in other RDBMSes are used in place of external Message Queue systems (evidence: I expect to find only qualitative data)
            • "Triggers" have not been enough (evidence: middleware programmers trying to avoid them)

            I am happy to debate these, and I am happy to be proven wrong, but you may already agree with all/some of the assertions. However, I prefer to keep these as good approximations that can be qualified in time.

            I think a proposal document is necessary:

            • capturing use-cases and cataloging real-world problems/need
            • analyzing other RDBMS implementations
            • design directions
            • implementation options
            • development effort estimations

            Then with all of the pertinent information in one place, I trust a discussion may proceed of "when" rather than "whether" something like this would be built for MySQL.


            People

              Assignee: Unassigned
              Reporter: merarischroeder Todd Hubers
              Votes: 1
              Watchers: 3

