Quantcast
Channel: Yet Another Math Programming Consultant
Viewing all articles
Browse latest Browse all 804

Populating SQLite databases

$
0
0

 GAMS has three easy ways to populate a SQLite database:

  1. Using the tool gdx2sqlite. This tool populates a SQLite database with data from a GDX file. This means we first have to export GAMS data to a GDX file. As there is quite some file I/O going on here (writing GDX file, reading GDX file, writing database), I would expect this to be slower than the next method.
  2. The new GAMS-connect facility. This does not use intermediate files, and directly copies records from in-memory data. This should be the fastest.
  3. Old fashioned CSV files. We first export data as a GDX file, and then use gdxdump to convert the data to a CSV file. Then sqlite can import the CSV file, and populate the database. There is much file I/O here, so this should be slow.

When trying these methods on a multidimensional parameter with 6,250,000 records, we actually see the following timings:

----     86 PARAMETER timings  elapsed time (seconds)

gdx2sqlite 6.329, GAMSConnect 39.480, CSV 17.053

Contrary to my predictions, GAMS-connect is much slower than gdx2sqlite and even slower than using CSV files. A factor of 6 compared to gdx2sqlite, and a factor of 2 compared to CSV files is quite a large margin considering we are using the same database technology. GAMS-connect is using the SQLAlchemy database tools, which are advertised as providing "high-performing database access" [1]. Obviously, we don't see this in these numbers. 

The gdx2sqlite log shows some great statistics on the inserts:

GDX2SQLITE v 0.7
GDX Library 41.4.0 caab8bc0 Dec 14, 2022 WEI x86 64bit/MS Window
Current directory:C:\Users\erwin\Downloads

InputFile:pdata.gdx
OutputFile:p1.sqlite
Symbols:1
Uels:50
Loading Uels
Processing Symbols
p(6250000 records)
Inserts:6250000
Elapsed:5.96 seconds
Inserts/second:1048130.14
Done

About a million inserts per second. Most other databases are way slower. That may also mean that these timing issues are likely not to be very pronounced when using other databases: the inserts may be much slower. 


Some notes


Another smaller issue is that GAMS-connect generates slightly strange column names:


C:\Users\erwin\Downloads>sqlite3 p2.sqlite
SQLite version 3.8.52014-06-0414:06:34
Enter ".help" for usage hints.
sqlite> .schema p
CREATE TABLE p (
i1_0 TEXT,
i2_1 TEXT,
i3_2 TEXT,
i4_3 TEXT,
value FLOAT
);
sqlite>


As the names for the GAMS indices are already unique, there is no need to mangle them. 

Finally, I am not sure how to tell GAMS-connect to use a database in the current working directory. I think we always need to specify a full path using URL notation (with a lot of / characters).

The CSV method always uses TEXT as column type:

C:\Users\erwin\Downloads>sqlite3 p3.sqlite
SQLite version 3.8.52014-06-0414:06:34
Enter ".help" for usage hints.
sqlite> .schema p
CREATE TABLE p(
"i1" TEXT,
"i2" TEXT,
"i3" TEXT,
"i4" TEXT,
"Val" TEXT
);
sqlite>


In SQLite that is not a problem: column types are merely hints. Double-precision numbers are stored as floats. SQLite can have a different type for each value. Putting it differently: types are associated with values instead of columns [2].  However, it may cause problems with other software. If we want to force the column type, we can do something like:


$onecho > import.sql

CREATE TABLE p(i1 TEXT, i2 TEXT, i3 TEXT, i4 TEXT, val FLOAT);

.import --csv --skip 1 p.csv p

$offecho


Note: this requires a newer version of sqlite3.exe than the one that is shipped with GAMS. 

 

Conclusion

 

SQLite can be very fast. Achieving a million inserts per second is a phenomenal performance. Only gdx2sqlite seems to achieve this. I would have guessed that the new tool GAMS-connect would be a little bit faster.

GAMS-connect is quite new, so it is understandable that things are not fully optimized yet. Future releases may be much faster.  


References


  1. SQLAlchemy, The Python SQL Toolkit and Object Relational Mapper,  https://www.sqlalchemy.org/.
  2. Data types in SQLite, https://www.sqlite.org/datatype3.html 


Appendix: GAMS code

$onText

 

  Populate a SQLite database with a large parameter

  Number of records: 6.25m 

 

$offText

 

 

*----------------------------------------------------

clean up before we start

*----------------------------------------------------

$call rm -f p1.sqlite

$call rm -f p2.sqlite

$call rm -f p3.sqlite

$call rm -f p.csv

 

*----------------------------------------------------

random data

*----------------------------------------------------

Set  i /index1*index50/;

alias(i,i1,i2,i3,i4);

Parameter p(i1,i2,i3,i4);

p(i1,i2,i3,i4) = uniform(0,1);

 

*----------------------------------------------------

timing

*----------------------------------------------------

 

Parameter timings(*) 'elapsed time (seconds)';

Scalar 'current time';

 

*----------------------------------------------------

* method1: gdx2sqlite via GDX file

*----------------------------------------------------

 

t = timeElapsed;

 

execute_unload "pdata",p;

execute "gdx2sqlite -i pdata.gdx -o p1.sqlite";

 

timings('gdx2sqlite') = timeElapsed - t;

 

 

*----------------------------------------------------

* method2: new GAMS Connect facility

*----------------------------------------------------

 

t = timeElapsed;

 

embeddedCode Connect:

 

GAMSReader:

    symbols:

      - name: p

PandasSQLWriter:

    connection: "sqlite:////users//erwin//downloads//p2.sqlite"

    symbols:

      - name: p

        tableName: p

     

endEmbeddedCode

 

timings('GAMSConnect') = timeElapsed - t;

 

 

*----------------------------------------------------

* method3: CSV files

*----------------------------------------------------

 

$onecho > import.sql

.mode csv

.import p.csv p

$offecho

 

 

t = timeElapsed;

 

execute_unload "pdata",p;

execute "gdxdump pdata.gdx symb=p format=csv output=p.csv";

execute "sqlite3 p3.sqlite < import.sql";

 

timings('CSV') = timeElapsed - t;

 

display timings;

 


Viewing all articles
Browse latest Browse all 804

Trending Articles