This is a general issue I have been already thinking about. What CONNECT currently does,
when a column is nullable, is to regard character columns as NULL when represented by an empty string
and to regard numeric columns as NULL when their value is 0. The case of DATE column is unclear.
Currently they are treated as numeric (dates are internally represented like in C by the number of
second since 01-01-1970) Unfortunately, even invalid dates are automatically set to 01-01-1970,
their value is not 0 because added or substracted a time shift corresponding to their GMT setting.
This why they are not set to NULL in this case.
I am not yet ready to implement a general solution. Should the NULL_CHAR option be a column or
table option, the NULL representation be a single character or a string or a numeric value?
These questions are still open.
Meanwhile, to take care of your problem, I have updated the DATE data type for invalid dates
to be regarded as NULL whatever GMT setting is applied. Here is the CREATE TABLE I used:
CREATE TABLE `salute` (
|
`CODICEIDENTIFICATIVOSITO` int(6) NOT NULL,
|
`DENOMINAZIONESITOLOGISTICO` char(83) NOT NULL,
|
`INDIRIZZO` char(97) NOT NULL,
|
`PARTITAIVA` bigint(11) NOT NULL FIELD_FORMAT='Z',
|
`CAP` int(5) NOT NULL,
|
`CODICECOMUNEISTAT` char(6) NOT NULL,
|
`DESCRIZIONECOMUNE` char(33) NOT NULL,
|
`CODICEPROVINCIAISTAT` char(3) NOT NULL,
|
`SIGLAPROVINCIA` char(2) NOT NULL,
|
`DESCRIZIONEPROVINCIA` char(25) NOT NULL,
|
`CODICEREGIONE` char(3) NOT NULL,
|
`DESCRIZIONEREGIONE` char(21) NOT NULL,
|
`DATAINIZIOVALIDITA` date NOT NULL DATE_FORMAT='DD/MM/YYYY',
|
`DATAFINEVALIDITA` date DATE_FORMAT='DD/MM/YYYY',
|
`LATITUDINE` double(17,14) NOT NULL FIELD_FORMAT='D,',
|
`LONGITUDINE` double(17,14) NOT NULL FIELD_FORMAT='D,',
|
`LOCALIZE` int(1) NOT NULL
|
) ENGINE=CONNECT DEFAULT CHARSET=latin1 `TABLE_TYPE`='csv' `FILE_NAME`='E:/Data/salute.csv' `SEP_CHAR`=';' `HEADER`=1 `QUOTED`=0;
|
Note the FIELD_FORMAT='D,' to take care of the comma used as decimal separator and
the QUOTED=0 option needed because two lines in the file have their INDIRIZZO column quoted.
The query:
SELECT DENOMINAZIONESITOLOGISTICO, DATAINIZIOVALIDITA, DATAFINEVALIDITA, LATITUDINE FROM salute LIMIT 10;
|
now returns:
DENOMINAZIONESITOLOGISTICO |
DATAINIZIOVALIDITA |
DATAFINEVALIDITA |
LATITUDINE |
Parafarmacia S.F.A. Fasani |
2006-10-01 |
NULL |
41.55404142482020 |
eurosan |
2006-10-01 |
NULL |
38.18235051630770 |
DOCPHARMA DI EPIS DOTT. LEONARDO |
2006-10-01 |
2014-01-31 |
45.73083169375610 |
PARAFARMACIA NATURESTE |
2006-10-01 |
NULL |
45.21971325978440 |
parafarmacia sanitaria di tamagni dott.ssa roberta |
2006-10-01 |
NULL |
46.18183947061230 |
FARMA+ dott.Giuseppe moscariello |
2006-10-01 |
NULL |
40.84112547106560 |
pharmasan |
2006-10-01 |
2007-02-28 |
40.83665880000000 |
Sole e Natura di Aricò Alessandra |
2006-10-01 |
2009-03-31 |
37.51717130000000 |
ARTEMISIA |
2006-11-01 |
2009-03-31 |
36.95185760000000 |
Erboristeria D.ssa Valeria Tantardini |
2006-11-01 |
2007-06-30 |
46.11608700000000 |
This is a general issue I have been already thinking about. What CONNECT currently does,
when a column is nullable, is to regard character columns as NULL when represented by an empty string
and to regard numeric columns as NULL when their value is 0. The case of DATE column is unclear.
Currently they are treated as numeric (dates are internally represented like in C by the number of
second since 01-01-1970) Unfortunately, even invalid dates are automatically set to 01-01-1970,
their value is not 0 because added or substracted a time shift corresponding to their GMT setting.
This why they are not set to NULL in this case.
I am not yet ready to implement a general solution. Should the NULL_CHAR option be a column or
table option, the NULL representation be a single character or a string or a numeric value?
These questions are still open.
Meanwhile, to take care of your problem, I have updated the DATE data type for invalid dates
to be regarded as NULL whatever GMT setting is applied. Here is the CREATE TABLE I used:
CREATE TABLE `salute` (
`CODICEIDENTIFICATIVOSITO` int(6) NOT NULL,
`DENOMINAZIONESITOLOGISTICO` char(83) NOT NULL,
`INDIRIZZO` char(97) NOT NULL,
`PARTITAIVA` bigint(11) NOT NULL FIELD_FORMAT='Z',
`CAP` int(5) NOT NULL,
`CODICECOMUNEISTAT` char(6) NOT NULL,
`DESCRIZIONECOMUNE` char(33) NOT NULL,
`CODICEPROVINCIAISTAT` char(3) NOT NULL,
`SIGLAPROVINCIA` char(2) NOT NULL,
`DESCRIZIONEPROVINCIA` char(25) NOT NULL,
`CODICEREGIONE` char(3) NOT NULL,
`DESCRIZIONEREGIONE` char(21) NOT NULL,
`DATAINIZIOVALIDITA` date NOT NULL DATE_FORMAT='DD/MM/YYYY',
`DATAFINEVALIDITA` date DATE_FORMAT='DD/MM/YYYY',
`LATITUDINE` double(17,14) NOT NULL FIELD_FORMAT='D,',
`LONGITUDINE` double(17,14) NOT NULL FIELD_FORMAT='D,',
`LOCALIZE` int(1) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=latin1 `TABLE_TYPE`='csv' `FILE_NAME`='E:/Data/salute.csv' `SEP_CHAR`=';' `HEADER`=1 `QUOTED`=0;
Note the FIELD_FORMAT='D,' to take care of the comma used as decimal separator and
the QUOTED=0 option needed because two lines in the file have their INDIRIZZO column quoted.
The query:
SELECT DENOMINAZIONESITOLOGISTICO, DATAINIZIOVALIDITA, DATAFINEVALIDITA, LATITUDINE FROM salute LIMIT 10;
now returns: