Populating SQLite databases

GAMS has three easy ways to populate a SQLite database:

1. Using the tool gdx2sqlite. This tool populates a SQLite database with data from a GDX file, so we first have to export the GAMS data to a GDX file. As there is quite some file I/O going on here (writing the GDX file, reading the GDX file, writing the database), I would expect this to be slower than the next method.
2. The new GAMS-connect facility. This does not use intermediate files and copies records directly from in-memory data. This should be the fastest.
3. Old-fashioned CSV files. We first export the data to a GDX file and then use gdxdump to convert it to a CSV file. The sqlite3 command-line tool can then import the CSV file and populate the database. There is a lot of file I/O here, so this should be slow.

When trying these methods on a multidimensional parameter with 6,250,000 records, we actually see the following timings:

----     86 PARAMETER timings  elapsed time (seconds)

gdx2sqlite   6.329,    GAMSConnect 39.480,    CSV         17.053

Contrary to my predictions, GAMS-connect is much slower than gdx2sqlite and even slower than using CSV files. It is about 6 times slower than gdx2sqlite and more than 2 times slower than CSV files, which is quite a large margin considering that all three methods use the same database technology. GAMS-connect uses the SQLAlchemy database tools, which are advertised as providing "high-performing database access" [1]. Obviously, we don't see this in these numbers.
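As a quick sanity check on these factors, the arithmetic can be redone in a few lines of Python. This is not part of the original experiment; the numbers are simply copied from the timings display above, and the record count follows from 50 elements in each of the 4 index positions.

```python
# Timings copied from the display above (seconds).
timings = {"gdx2sqlite": 6.329, "GAMSConnect": 39.480, "CSV": 17.053}

# 50 set elements in each of 4 index positions.
records = 50 ** 4

for method, seconds in timings.items():
    # How many times slower GAMS-connect is than this method.
    factor = timings["GAMSConnect"] / seconds
    # Overall throughput, including any GDX/CSV file handling.
    rate = records / seconds
    print(f"{method:12s} {seconds:8.3f} s   factor {factor:4.1f}   {rate:12,.0f} records/s")
```

Note that these rates cover the whole pipeline, so the gdx2sqlite figure here includes writing the GDX file and is not the same measurement as the insert rate reported by the tool itself.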

The gdx2sqlite log shows some great statistics on the inserts:

GDX2SQLITE v 0.7
GDX Library      41.4.0 caab8bc0 Dec 14, 2022          WEI x86 64bit/MS Window
Current directory:C:\Users\erwin\Downloads

   InputFile:pdata.gdx
   OutputFile:p1.sqlite
   Symbols:1
   Uels:50
   Loading Uels
   Processing Symbols
     p(6250000 records)
   Inserts:6250000
   Elapsed:5.96 seconds
   Inserts/second:1048130.14
   Done

That is about a million inserts per second. Most other databases are way slower. This may also mean that the timing differences are less pronounced with other databases: if the inserts themselves are much slower, they dominate the total time, and the overhead of the tool doing the inserts matters less.
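To get a feel for such insert rates, here is a minimal sketch that fills a table with the same shape of data using Python's standard sqlite3 module, with all inserts inside a single transaction. It is not how gdx2sqlite is implemented. The table layout, the function name bulk_insert and the default size (20 elements per index, so 160,000 records) are assumptions made for illustration; calling it with n=50 gives the 6,250,000 records used in this post.

```python
import itertools
import random
import sqlite3
import time


def bulk_insert(db_path=":memory:", n=20):
    """Insert n**4 random records into a new table p using one transaction."""
    labels = [f"index{k}" for k in range(1, n + 1)]
    # Generator of (i1, i2, i3, i4, value) tuples; nothing is materialized up front.
    rows = ((i1, i2, i3, i4, random.random())
            for i1, i2, i3, i4 in itertools.product(labels, repeat=4))
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE p (i1 TEXT, i2 TEXT, i3 TEXT, i4 TEXT, value REAL)")
    start = time.perf_counter()
    with con:  # commits once, after all rows have been inserted
        con.executemany("INSERT INTO p VALUES (?, ?, ?, ?, ?)", rows)
    elapsed = time.perf_counter() - start
    count = con.execute("SELECT COUNT(*) FROM p").fetchone()[0]
    con.close()
    return count, elapsed


count, elapsed = bulk_insert()
print(f"Inserts: {count}, elapsed: {elapsed:.2f} s, inserts/second: {count / elapsed:.0f}")
```

The Python interpreter adds overhead per row, so the rate reported here will typically be lower than what a compiled tool achieves.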

Some notes

Another smaller issue is that GAMS-connect generates slightly strange column names:

C:\Users\erwin\Downloads>sqlite3 p2.sqlite
SQLite version 3.8.5 2014-06-04 14:06:34
Enter ".help" for usage hints.
sqlite> .schema p
CREATE TABLE p (
        i1_0 TEXT,
        i2_1 TEXT,
        i3_2 TEXT,
        i4_3 TEXT,
        value FLOAT
);
sqlite>

As the names for the GAMS indices are already unique, there is no need to mangle them.
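If the mangled names get in the way, one workaround is to put a view with clean column names on top of the table. The sketch below does this with Python's standard sqlite3 module. The function name create_clean_view and the view name p_clean are hypothetical, and the rule "strip a trailing underscore plus digits" is only inferred from the schema shown above.

```python
import re
import sqlite3


def create_clean_view(con, table="p", view="p_clean"):
    """Create a view on `table` whose column names have a trailing _<digits> removed."""
    cols = [row[1] for row in con.execute(f'PRAGMA table_info("{table}")')]
    clean = [re.sub(r"_\d+$", "", c) for c in cols]
    if len(set(clean)) != len(clean):
        raise ValueError("stripping suffixes would create duplicate column names")
    select_list = ", ".join(f'"{c}" AS "{n}"' for c, n in zip(cols, clean))
    con.execute(f'CREATE VIEW "{view}" AS SELECT {select_list} FROM "{table}"')
    return clean


# Demo on an in-memory table with the schema reported by .schema above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE p (i1_0 TEXT, i2_1 TEXT, i3_2 TEXT, i4_3 TEXT, value FLOAT)")
con.execute("INSERT INTO p VALUES ('index1', 'index1', 'index1', 'index1', 0.5)")
names = create_clean_view(con)
row = con.execute("SELECT i1, i4, value FROM p_clean").fetchone()
print(names, row)
```

A view leaves the table written by GAMS-connect untouched, so it does not matter which SQLite version created the file.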

Finally, I am not sure how to tell GAMS-connect to use a database in the current working directory. I think we always need to specify a full path using URL notation (with a lot of / characters).
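For reference, SQLAlchemy documents sqlite:///file.db as a path relative to the working directory of the Python process, and sqlite:////absolute/path/file.db as an absolute path. Whether a relative URL ends up in the GAMS working directory when used from GAMS-connect is untested here. The sketch below only shows how a full URL for a file in the current directory can be built in Python; the helper name sqlite_url is hypothetical.

```python
from pathlib import Path


def sqlite_url(filename, directory=None):
    """Build an absolute SQLAlchemy-style SQLite URL for `filename`.

    If `directory` is None, the current working directory is used.
    """
    base = Path(directory) if directory is not None else Path.cwd()
    path = (base / filename).resolve()
    # as_posix() gives forward slashes on Windows too, e.g. C:/Users/...
    return "sqlite:///" + path.as_posix()


url = sqlite_url("p2.sqlite")
print(url)
```

On Linux or macOS the absolute path starts with a slash, which is why the resulting URL has four slashes in a row.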

The CSV method always uses TEXT as column type:

C:\Users\erwin\Downloads>sqlite3 p3.sqlite
SQLite version 3.8.5 2014-06-04 14:06:34
Enter ".help" for usage hints.
sqlite> .schema p
CREATE TABLE p(
"i1" TEXT,
"i2" TEXT,
"i3" TEXT,
"i4" TEXT,
"Val" TEXT
);
sqlite>

In SQLite itself that is rarely a problem: declared column types are more like hints (type affinities) than strict constraints, and SQLite can store a different type for each value. Putting it differently: types are associated with values instead of columns [2]. One caveat is that a column declared as TEXT stores the imported numbers as text, so they are converted when used in arithmetic. This may also cause problems with other software. If we want to force the column type, we can do something like:

$onecho > import.sql
CREATE TABLE p(i1 TEXT, i2 TEXT, i3 TEXT, i4 TEXT, val FLOAT);
.import --csv --skip 1 p.csv p
$offecho

Note: this requires a newer version of sqlite3.exe than the one that is shipped with GAMS (the --csv and --skip options of .import were added in SQLite 3.32.0).
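The effect of the declared column type can be checked with SQLite's typeof() function. The sketch below imports a one-row CSV twice using Python's standard csv and sqlite3 modules, once with val declared as TEXT and once as FLOAT. The sample row is made up, the header is assumed to match the column names shown above, and the helper import_csv is hypothetical. It is a stand-in for the .import command, not a replacement for it.

```python
import csv
import io
import sqlite3


def import_csv(con, csv_file, val_type):
    """Load a CSV with a header row into a fresh table p.

    val_type is the declared type of the val column, e.g. "TEXT" or "FLOAT".
    Returns the storage type SQLite reports for the first imported value.
    """
    con.execute("DROP TABLE IF EXISTS p")
    con.execute(f"CREATE TABLE p (i1 TEXT, i2 TEXT, i3 TEXT, i4 TEXT, val {val_type})")
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row, like --skip 1
    with con:  # single transaction for all inserts
        con.executemany("INSERT INTO p VALUES (?, ?, ?, ?, ?)", reader)
    return con.execute("SELECT typeof(val) FROM p LIMIT 1").fetchone()[0]


# Made-up sample data in the layout assumed for p.csv.
sample = '"i1","i2","i3","i4","Val"\n"index1","index1","index1","index1",0.171747132\n'

con = sqlite3.connect(":memory:")
as_text = import_csv(con, io.StringIO(sample), "TEXT")
as_float = import_csv(con, io.StringIO(sample), "FLOAT")
print(as_text, as_float)
```

With a TEXT column the value is kept as text, while a FLOAT declaration gives the column REAL affinity, so the same CSV field is converted to a floating-point number on insert.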

Conclusion

SQLite can be very fast. Achieving a million inserts per second is phenomenal performance. Only gdx2sqlite seems to achieve this. I would have guessed that the new GAMS-connect tool would be a little bit faster than gdx2sqlite, but it turns out to be several times slower.

GAMS-connect is quite new, so it is understandable that things are not fully optimized yet. Future releases may be much faster.

References

[1] SQLAlchemy, The Python SQL Toolkit and Object Relational Mapper, https://www.sqlalchemy.org/
[2] Data types in SQLite, https://www.sqlite.org/datatype3.html

Appendix: GAMS code

$onText
Populate a SQLite database with a large parameter
Number of records: 6.25m
$offText

*----------------------------------------------------
* clean up before we start
*----------------------------------------------------
$call rm -f p1.sqlite
$call rm -f p2.sqlite
$call rm -f p3.sqlite
$call rm -f p.csv

*----------------------------------------------------
* random data
*----------------------------------------------------
Set i /index1*index50/;
alias(i,i1,i2,i3,i4);
Parameter p(i1,i2,i3,i4);
p(i1,i2,i3,i4) = uniform(0,1);

*----------------------------------------------------
* timing
*----------------------------------------------------
Parameter timings(*) 'elapsed time (seconds)';
Scalar t 'current time';

*----------------------------------------------------
* method1: gdx2sqlite via GDX file
*----------------------------------------------------
t = timeElapsed;
execute_unload "pdata",p;
execute "gdx2sqlite -i pdata.gdx -o p1.sqlite";
timings('gdx2sqlite') = timeElapsed - t;

*----------------------------------------------------
* method2: new GAMS Connect facility
*----------------------------------------------------
t = timeElapsed;
embeddedCode Connect:
- GAMSReader:
    symbols:
      - name: p
- PandasSQLWriter:
    connection: "sqlite:////users//erwin//downloads//p2.sqlite"
    symbols:
      - name: p
        tableName: p
endEmbeddedCode
timings('GAMSConnect') = timeElapsed - t;

*----------------------------------------------------
* method3: CSV files
*----------------------------------------------------
$onecho > import.sql
.mode csv
.import p.csv p
$offecho

t = timeElapsed;
execute_unload "pdata",p;
execute "gdxdump pdata.gdx symb=p format=csv output=p.csv";
execute "sqlite3 p3.sqlite < import.sql";
timings('CSV') = timeElapsed - t;

display timings;
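After running the GAMS code, it can be useful to check that all three databases received the same number of records. The following Python sketch does that with the standard sqlite3 module. It assumes that each method wrote a table named p, which matches the schemas shown earlier for p2.sqlite and p3.sqlite but is an assumption for p1.sqlite. The helper count_rows is hypothetical and returns None when a file or table is missing.

```python
import sqlite3
from pathlib import Path


def count_rows(db_path, table="p"):
    """Return the number of rows in `table`, or None if the file or table is missing."""
    if not Path(db_path).is_file():
        return None
    con = sqlite3.connect(db_path)
    try:
        exists = con.execute(
            "SELECT 1 FROM sqlite_master WHERE type IN ('table', 'view') AND name = ?",
            (table,),
        ).fetchone()
        if exists is None:
            return None
        return con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        con.close()


# File names as used in the GAMS code above; run this in the same directory.
for db in ("p1.sqlite", "p2.sqlite", "p3.sqlite"):
    print(db, count_rows(db))
```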
