Uploaded image for project: 'MariaDB Connector/Python'
  1. MariaDB Connector/Python
  2. CONPY-308

Prepared Statements Not Releasing On Cursor Close

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 1.1.10, 1.1.12
    • 1.1, 1.2
    • DBAPI 2.0
    • None
    • Ubuntu 20.04.6 LTS
    • 3.10

    Description

      The following script raises the error `mariadb.OperationalError: Can't create more than max_prepared_stmt_count statements (current value: 16382)`, which arises when the number of stored prepared statements in the backend database exceed a defined limit.

      I could reproduce and would expect this error if I kept a list cursor objects, one per call and explicitly did not close the cursors.

      However since I am using a with block to manage the cursor object, it should be closing (and indeed the error persists even when I explicity add `cursor.close()` to the for loop).

      (The error occurs after 16382 calls of cursor.executemany)

      mport mariadb
      import json
       
      test_connection_params = {
          'user': MARIADB_USERNAME,
          'password': MARIADB_PASSWORD,
          'host': MARIADB_HOST_ADDRESS,
          'port': 3306,
          'database': MARIADB_DATABASE_NAME
      }
       
      def main():
          conn = mariadb.connect(**test_connection_params)
       
          with conn.cursor() as cursor:
              cursor.execute("""
                  CREATE TABLE IF NOT EXISTS my_table (
                      id INT AUTO_INCREMENT PRIMARY KEY,
                      name VARCHAR(128),
                      description VARCHAR(2000),
                      embedding TEXT NOT NULL
                  );
              """)
              conn.commit()
       
          data = [(f"Product {i}", f"Description for product {i}", json.dumps([i*0.01, i*0.02, i*0.03, i*0.04])) for i in range(1_000_000)]
          query = """
          INSERT INTO my_table (name, description, embedding)
          VALUES (?, ?, ?)
          """
       
          chunk_size = 10
       
          for i in range(0, len(data), chunk_size):
              with conn.cursor() as cursor:
                  cursor.executemany(query, data[i:i + chunk_size])
                  conn.commit()
       
          conn.close()
       
          cursor.close()
          conn.close()
      

      This error does not occur when using the mypysql connector with similar logic.

      The error also does not occur when I swap the TEXT field for the new MariaDB 11.7.1-rc vector feature:

      def main():
          conn = mariadb.connect(**test_connection_params)
       
          with conn.cursor() as cursor:
              query = """
              CREATE TABLE IF NOT EXISTS products (
                  id INT AUTO_INCREMENT PRIMARY KEY,
                  name VARCHAR(128),
                  description VARCHAR(2000),
                  embedding VECTOR(4) NOT NULL
              );
              """
              cursor.execute(query)
              conn.commit()
       
          conn = mariadb.connect(**test_connection_params)
       
          data = [(f"Product {i}", f"Description for product {i}", json.dumps([i*0.01, i*0.02, i*0.03, i*0.04])) for i in range(1_000_000)]
          query = "INSERT INTO products (name, description, embedding) VALUES (?, ?, VEC_FromText(?))"
       
          chunk_size = 10
          for i in range(0, len(data), chunk_size):
              with conn.cursor() as cursor:
                  cursor.executemany(query, data[i:i + chunk_size])
                  conn.commit()
       
          conn.close()
      

      However oddly, the error returns using the same logic that was just working, when I wrap the MariaDB part of the code into a Python Object:

      class DatabaseManager:
          def __init__(self, config):
              self.connection = mariadb.connect(**config)
       
          def create_table_if_not_exists(self):
              query = """
              CREATE TABLE IF NOT EXISTS products (
                  id INT AUTO_INCREMENT PRIMARY KEY,
                  name VARCHAR(128),
                  description VARCHAR(2000),
                  embedding VECTOR(4) NOT NULL
              );
              """
              with self.connection.cursor() as cursor:
                  cursor.execute(query)
                  self.connection.commit()
       
          def insert_rows(self, chunk):
              query = """
              INSERT INTO products (name, description, embedding)
              VALUES (?, ?, VEC_FromText(?))
              """
              with self.connection.cursor() as cursor:
                  cursor.executemany(query, chunk)
       
          def close_connection(self):
              if self.connection:
                  self.connection.close()
       
      def main():
          db_manager = DatabaseManager(test_connection_params)
          db_manager.create_table_if_not_exists()
       
          data = [(f"Product {i}", f"Description for product {i}", json.dumps([i * 0.01, i * 0.02, i * 0.03, i * 0.04])) for i
                  in range(1_000_000)]
       
          chunk_size = 10
          for i in range(0, len(data), chunk_size):
              db_manager.insert_rows(data[i:i + chunk_size])
       
          db_manager.close_connection()
      

      Attachments

        Activity

          greenfield_csr William Dixon added a comment -

          Just realizing its difficult to read the code in the description.

          It's easier to view in this Stack Overflow Question

          greenfield_csr William Dixon added a comment - Just realizing its difficult to read the code in the description. It's easier to view in this Stack Overflow Question
          georg Georg Richter added a comment - - edited

          MariaDB's (but also PyMySQL's) cursors are not deleted when exiting the context. Changing that, would break a lot of applications.

          If you need that feature you can easily extend the cursor class with a context manager:

          import mariadb
           
          class myCursor(mariadb.Cursor):
              def __enter__(self):
                  return self
           
              def __exit__(self, exc_type, exc_value, traceback):
                  self.close()  # Ensure cursor closes when leaving the `with` block
           
           
          conn= mariadb.connect()
          with myCursor(conn) as cursor:
              ....
          

          In general it's a bad idea to create a cursor in a loop, especially when executing the same statement with different parameters again and again. It allocates more memory, server and client side reparsing and increases network traffic.

          georg Georg Richter added a comment - - edited MariaDB's (but also PyMySQL's) cursors are not deleted when exiting the context. Changing that, would break a lot of applications. If you need that feature you can easily extend the cursor class with a context manager: import mariadb   class myCursor(mariadb.Cursor): def __enter__( self ): return self   def __exit__( self , exc_type, exc_value, traceback): self .close() # Ensure cursor closes when leaving the `with` block     conn = mariadb.connect() with myCursor(conn) as cursor: .... In general it's a bad idea to create a cursor in a loop, especially when executing the same statement with different parameters again and again. It allocates more memory, server and client side reparsing and increases network traffic.
          greenfield_csr William Dixon added a comment - - edited

          @Georg Richter

          Any idea why the following code:

          class DatabaseManager:
              def __init__(self, config):
                  self.connection = mariadb.connect(**config)
           
              def insert_rows(self, chunk):
                  query = """
                  INSERT INTO my_table (name, description, embedding)
                  VALUES (?, ?, ?)
                  """
                  formatted_chunk = [(name, description, json.dumps(embedding)) for name, description, embedding in chunk]
           
                  cursor = self.connection.cursor()
                  cursor.executemany(query, formatted_chunk)
                  self.connection.commit()
                  cursor.close()
           
           
          def main():
              db_manager = DatabaseManager(test_connection_params)
           
              data = [(f"Product {i}", f"Description for product {i}", [i * 0.01, i * 0.02, i * 0.03, i * 0.04]) for i in
                      range(1_000_000)]
           
              chunk_size = 10
              for i in tqdm(range(0, len(data), chunk_size), desc="Inserting rows"):
                  db_manager.insert_rows(data[i:i + chunk_size])
           
              db_manager.close_connection()
          

          Doesn't release the prepared statements?

          Is it better practice to reuse a single cursor object?

          What if I have an application that runs indefinitely (live data ingest), is there any problem keeping a single cursor alive indefinitely?

          greenfield_csr William Dixon added a comment - - edited @Georg Richter Any idea why the following code: class DatabaseManager: def __init__( self , config): self .connection = mariadb.connect( * * config)   def insert_rows( self , chunk): query = """ INSERT INTO my_table (name, description, embedding) VALUES (?, ?, ?) """ formatted_chunk = [(name, description, json.dumps(embedding)) for name, description, embedding in chunk]   cursor = self .connection.cursor() cursor.executemany(query, formatted_chunk) self .connection.commit() cursor.close()     def main(): db_manager = DatabaseManager(test_connection_params)   data = [(f "Product {i}" , f "Description for product {i}" , [i * 0.01 , i * 0.02 , i * 0.03 , i * 0.04 ]) for i in range ( 1_000_000 )]   chunk_size = 10 for i in tqdm( range ( 0 , len (data), chunk_size), desc = "Inserting rows" ): db_manager.insert_rows(data[i:i + chunk_size])   db_manager.close_connection() Doesn't release the prepared statements? Is it better practice to reuse a single cursor object? What if I have an application that runs indefinitely (live data ingest), is there any problem keeping a single cursor alive indefinitely?
          georg Georg Richter added a comment - - edited

          Reopening the issue:

          PyMySQL cursor has a context manager and closes the cursor. If the cursor has a buffered resultset, the resultset is still accessable.

          We should do the same in MariaDB - however there will be some problems:

          • if a reconnect occurred, statement handles become invalid - even if the result set is buffered, there is currently no way to get the buffered result via API without accessing MYSQL_STMT structure.
          • if the connection was closed MariaDB Connector/C also invalidates the statement handles
          georg Georg Richter added a comment - - edited Reopening the issue: PyMySQL cursor has a context manager and closes the cursor. If the cursor has a buffered resultset, the resultset is still accessable. We should do the same in MariaDB - however there will be some problems: if a reconnect occurred, statement handles become invalid - even if the result set is buffered, there is currently no way to get the buffered result via API without accessing MYSQL_STMT structure. if the connection was closed MariaDB Connector/C also invalidates the statement handles
          greenfield_csr William Dixon added a comment -

          @Georg Richter

          In the meantime, for continuous applications that use an Class approach to manage database resources.

          Do you recommend I keep an instance variable:

          class DatabaseManager:
              def __init__(self, config):
                  self.connection = mariadb.connect(**config)
                  self.cursor = self.connection.cursor()
           
              @contextmanager
              def maria_db_connection(self, begin=False):
                  try:
                      if begin:
                          self.connection.begin()
                      yield self.connection, self.cursor
                      self.connection.commit()
                  except mariadb.Error as e:
                      self.connection.rollback()
                      raise e
                  
              def insert_rows(self, chunk):
                  query = """
                  INSERT INTO my_table (name, description, embedding)
                  VALUES (?, ?, ?)
                  """
                  formatted_chunk = [(name, description, json.dumps(embedding)) for name, description, embedding in chunk]
           
                  with self.maria_db_connection as (conn, cursor):
                      cursor.executemany(query, formatted_chunk)
          

          And reuse the `self.cursor` variable for every future method call?

          Will the cursor ever get stale and need manual refreshing or checking in my `maria_db_connection` method?

          greenfield_csr William Dixon added a comment - @Georg Richter In the meantime, for continuous applications that use an Class approach to manage database resources. Do you recommend I keep an instance variable: class DatabaseManager: def __init__( self , config): self .connection = mariadb.connect( * * config) self .cursor = self .connection.cursor()   @contextmanager def maria_db_connection( self , begin = False ): try : if begin: self .connection.begin() yield self .connection, self .cursor self .connection.commit() except mariadb.Error as e: self .connection.rollback() raise e def insert_rows( self , chunk): query = """ INSERT INTO my_table (name, description, embedding) VALUES (?, ?, ?) """ formatted_chunk = [(name, description, json.dumps(embedding)) for name, description, embedding in chunk] with self .maria_db_connection as (conn, cursor): cursor.executemany(query, formatted_chunk) And reuse the `self.cursor` variable for every future method call? Will the cursor ever get stale and need manual refreshing or checking in my `maria_db_connection` method?

          People

            georg Georg Richter
            greenfield_csr William Dixon
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.