This is a follow-up on this post. For large datasets, individual inserts are not as efficient as bulk operations. In MySQL, bulk inserts can be done with the LOAD DATA LOCAL INFILE command. This statement takes a local (i.e. client-side) text file, copies it to the server, and then does a bulk insert of the whole thing. This approach is built into the tool gdx2mysql: if a symbol has more than N records (N=500 by default), we write a text file and call LOAD DATA LOCAL INFILE; if a symbol has fewer records, we just use a standard prepared insert statement. A verbose log will show what happens.
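The decision between the two paths can be sketched as follows. This is a hypothetical Python sketch of the idea, not the actual gdx2mysql source; the function name is made up, and the tab-delimited temp-file format is an assumption (tab-delimited rows happen to be what LOAD DATA expects by default).

```python
import csv
import os
import tempfile

BULK_THRESHOLD = 500  # gdx2mysql's default cutoff N


def insert_strategy(table, columns, records, threshold=BULK_THRESHOLD):
    """Return ("insert", sql) for small symbols or ("bulk", sql) for large ones.

    Hypothetical sketch of the gdx2mysql strategy: small symbols get a
    parameterized INSERT (executed once per record), large symbols get a
    temp file plus a LOAD DATA LOCAL INFILE statement.
    """
    if len(records) <= threshold:
        placeholders = ",".join("?" * len(columns))
        return ("insert", f"insert into {table} values ({placeholders})")
    # Large symbol: dump all rows to a tab-delimited temp file, the default
    # input format for LOAD DATA. (On Windows the backslashes in the path
    # would need doubling inside the SQL string, as in the log below.)
    fd, path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(records)
    return ("bulk", f"load data local infile '{path}' into table {table}")
```

For a 100-record set this returns a plain prepared insert; for a million-record parameter it writes the temp file and returns the LOAD DATA statement, which is then executed once.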
set i /i1*i100/;
alias (i,j,k);
parameter a(i,j,k);
a(i,j,k) = uniform(0,100);
execute_unload "test",i,a;
execute "gdx2mysql -i test.gdx -s tmp -u test -p test -v";
In the above test model we generate a set with 100 elements and a parameter with 100³ = 1,000,000 elements. The verbose log looks like:
--- Job Untitled_1.gms Start 04/22/16 04:01:39 24.6.1 r55820 WEX-WEI x86 64bit/MS Windows
GAMS 24.6.1  Copyright (C) 1987-2016 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen G150803/0001CV-GEN
          Amsterdam Optimization Modeling Group DC10455
--- Starting compilation
--- Untitled_1.gms(6) 3 Mb
--- Starting execution: elapsed 0:00:00.007
--- Untitled_1.gms(5) 36 Mb
--- GDX File C:\Users\Erwin\Documents\Embarcadero\Studio\Projects\gdx2mysql\Win32\Debug\test.gdx
--- Untitled_1.gms(6) 36 Mb
GDX2MySQL v 0.1
Copyright (c) 2015-2016 Amsterdam Optimization Modeling Group LLC
GDX Library 24.6.1 r55820 Released Jan 18, 2016 VS8 x86 32bit/MS Windows
GDX:Input file: test.gdx
GDX:Symbols: 2
GDX:Uels: 100
GDX:Loading Uels
SQL:Selected driver: MySQL ODBC 5.3 ANSI Driver
SQL:Connection string: Driver={MySQL ODBC 5.3 ANSI Driver};Server=localhost;User=xxx;Password=xxx
set autocommit=0
select @@version_comment
SQL:RDBMS: MySQL Community Server (GPL)
select @@version
SQL:RDBMS version: 5.6.26-log
select count(*) from information_schema.schemata where schema_name = 'tmp'
-----------------------
   i (100 records)
drop table if exists `tmp`.`i`
create table `tmp`.`i`(`i` varchar(4))
insert into `tmp`.`i` values (?)
sqlexecute(100 times)
commit
   Time : 0.6
   a (1000000 records)
drop table if exists `tmp`.`a`
create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double)
temp file: [C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp]
writing C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp
load data local infile 'C:\\Users\\Erwin\\AppData\\Local\\Temp\\tmpA046.tmp' into table `tmp`.`a`
rows affected: 1000000
commit
   Time : 39.6
deleting [C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp]
*** Status: Normal completion
--- Job Untitled_1.gms Stop 04/22/16 04:02:20 elapsed 0:00:41.353
The smaller set i is imported using normal inserts, while the larger parameter a is imported through an intermediate text file. This is much more efficient than using the standard inserts. There is a gdx2mysql option to force larger symbols to use standard inserts, so we can compare timings:
   a (1000000 records)
drop table if exists `tmp`.`a`
create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double)
insert into `tmp`.`a` values (?,?,?,?)
sqlexecute(1000000 times)
commit 100 times
   Time : 257.5
So the bulk load is about 6.5 times as fast (39.6 vs. 257.5 seconds). We can expect even larger differences in other cases.
A final way to make imports faster is to use ISAM (or rather MyISAM) tables. MyISAM is an older storage engine (MySQL nowadays uses the InnoDB storage engine by default). However, MyISAM is still faster for our simple (but large) tables, as can be seen when running with the -isam flag:
   i (100 records)
drop table if exists `tmp`.`i`
create table `tmp`.`i`(`i` varchar(4)) engine=myisam
insert into `tmp`.`i` values (?)
sqlexecute(100 times)
commit
   Time : 0.3
   a (1000000 records)
drop table if exists `tmp`.`a`
create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double) engine=myisam
temp file: [C:\Users\Erwin\AppData\Local\Temp\tmpBF31.tmp]
writing C:\Users\Erwin\AppData\Local\Temp\tmpBF31.tmp
load data local infile 'C:\\Users\\Erwin\\AppData\\Local\\Temp\\tmpBF31.tmp' into table `tmp`.`a`
rows affected: 1000000
commit
   Time : 9.9
This again makes a substantial difference in getting data into MySQL.
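Pulling the three timings for the 1,000,000-record parameter a together, a small Python snippet (numbers taken verbatim from the logs above) shows how the methods stack up:

```python
# Seconds to load parameter a (1,000,000 records), from the logs above.
timings = {
    "prepared inserts, InnoDB": 257.5,
    "load data local infile, InnoDB": 39.6,
    "load data local infile, MyISAM": 9.9,
}

baseline = timings["prepared inserts, InnoDB"]
for method, seconds in timings.items():
    # Speedup relative to row-by-row prepared inserts on InnoDB.
    print(f"{method:32s} {seconds:6.1f}s  ({baseline / seconds:4.1f}x)")
```

The combination of LOAD DATA LOCAL INFILE and MyISAM ends up roughly 26 times as fast as row-by-row inserts into an InnoDB table.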