Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-5522

PM join exceeded the join match count -invalid result

    XMLWordPrintable

Details

    • 2023-7, 2023-8

    Description

      Reproduction

      SET columnstore_s3_region='us-west-2';
      SET columnstore_s3_key='xxxxxxx';
      SET columnstore_s3_secret='xxxxxxxxf';
       
      DATABASE="test"
      mariadb -e "create database if not exists $DATABASE; use $DATABASE;"
      mariadb $DATABASE -e "CREATE TABLE IF NOT EXISTS lu_proc (  PROC_CD  varchar(25) NOT NULL,  PROC_DESC  varchar(200) NOT NULL,  PROC_GRP_CD  varchar(2) NOT NULL,  SERV_CTG_CD  varchar(2) DEFAULT NULL) ENGINE=Columnstore DEFAULT CHARSET=latin1;";
      mariadb $DATABASE -e "CREATE TABLE IF NOT EXISTS mdcdm_bill_line (  SERV_YR  smallint(6) NOT NULL,  PROC_CD_DER  varchar(25) DEFAULT NULL,  PROC_GRP_CD_DER  varchar(2) DEFAULT NULL,  SERV_CTG_CD  varchar(2) DEFAULT NULL) ENGINE=Columnstore DEFAULT CHARSET=latin1;";
      CALL columnstore_info.load_from_s3('xxxxxxxx','lu_proc.csv','test', 'lu_proc', ',', '"', '^');
      CALL columnstore_info.load_from_s3('xxxxxxxx','mdcdm_bill_line-quarter.csv','test', 'mdcdm_bill_line', ',', '"', '^');
       
      # 1.4mill and 256 mill
      mariadb $DATABASE -q -e "select count(*) from lu_proc; select count(*) from mdcdm_bill_line;" 
       
      # correct result
      mariadb $DATABASE -q -e "select SERV_YR, count(*) from (SELECT A.SERV_YR, A.SERV_CTG_CD FROM mdcdm_bill_line A LEFT JOIN lu_proc B ON A.PROC_CD_DER = B.PROC_CD AND A.PROC_GRP_CD_DER = B.PROC_GRP_CD) j WHERE SERV_CTG_CD = '11' group by SERV_YR order by SERV_YR"
       
      # wrong result
      mariadb $DATABASE -q -e "select SERV_YR, count(*) from (SELECT A.SERV_YR, A.SERV_CTG_CD FROM mdcdm_bill_line A LEFT JOIN lu_proc B ON A.PROC_CD_DER = B.PROC_CD AND A.PROC_GRP_CD_DER = B.PROC_GRP_CD) j WHERE SERV_CTG_CD = '11' AND SERV_YR >= 2011 group by SERV_YR order by SERV_YR"
      

      See private notes for credentials and bucket for test data.
      Also test with the full 800 mill dataset

      Right Result from above

       
      +---------+-----------------------+
      | SERV_YR | count(*)              |
      +---------+-----------------------+
      |       0 |                   905 |
      |     123 |                     1 |
      |     932 |                     2 |
      |    1901 |                     7 |
      |    1907 |                     1 |
      |    1920 |                     1 |
      |    1969 |                     2 |
      |    1981 |                     2 |
      |    1987 |                     6 |
      |    1988 |                     3 |
      |    1989 |                     2 |
      |    1990 |                     9 |
      |    1992 |                     2 |
      |    1993 |                     4 |
      |    1994 |                     3 |
      |    1995 |                    10 |
      |    1996 |                    14 |
      |    1997 |                    20 |
      |    1998 |                    25 |
      |    1999 |                    33 |
      |    2000 |                    16 |
      |    2001 |                    59 |
      |    2002 |                   100 |
      |    2003 |                   118 |
      |    2004 |                   488 |
      |    2005 |                  2168 |
      |    2006 |                  3684 |
      |    2007 |                  6258 |
      |    2008 |                 18524 |
      |    2009 |                 62637 |
      |    2010 |               1253499 |
      |    2011 |               1684772 |
      |    2012 |               1595403 |
      |    2013 |               1542136 |
      |    2014 |               1416396 |
      |    2015 |               1181466 |
      |    2016 |                958816 |
      |    2017 |                911336 |
      |    2018 |                930263 |
      |    2019 |                932707 |
      |    2020 |                761918 |
      |    2021 |                749684 |
      |    2022 |                530476 |
      |    2023 |                   362 |
      +---------+-----------------------+
      

      Wrong Result

      +---------+-----------------------+
      | SERV_YR | count(*)              |
      +---------+-----------------------+
      |    2011 |               8759604 |
      |    2012 |               8478452 |
      |    2013 |               8242161 |
      |    2014 |               7697444 |
      |    2015 |               6473961 |
      |    2016 |               5255764 |
      |    2017 |               4948912 |
      |    2018 |               5063151 |
      |    2019 |               5055187 |
      |    2020 |               4003684 |
      |    2021 |               3929401 |
      |    2022 |               2770433 |
      |    2023 |                  1237 |
      +---------+-----------------------+
      

      Attachments

        Issue Links

          Activity

            People

              denis0x0D Denis Khalikov
              allen.herrera Allen Herrera
              Roman Roman
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.