In "Pivot a table: GAMS, SQLite, R and Python" we described a pivoting operation on a small example. The comments mentioned that for larger tables or data frames, tidyr in particular is very fast. Here we try to confirm this using an artificial example with 1 million records (100,000 values of i times 10 values of j).
Below we time each step (see here for more information).
GAMS
Step | Code | Time (seconds) |
Populate a 2D parameter with 1 million entries | set i /i1*i100000/ j /j1*j10/ ; parameter p(i,j); p(i,j) = uniform(0,1); | 0.4 |
Write a GDX file | execute_unload 'x.gdx'; | 0.3 |
Store in SQLite database | execute 'gdx2sqlite -i x.gdx -o x.db -fast'; | 5 |
R
Step | Code | Time (seconds) |
Read database table | df1 <- dbGetQuery(db,"select * from p") | 2 |
Pivot using SQL | df2 <- dbGetQuery(db,"select tj1.i as i, tj1.value as j1, tj2.value as j2, | 50 |
Pivot using SQL after creating indices on columns i and j | id | 15 |
Pivot using reshape2/dcast | df2 <- dcast(df1,i~j,value.var="value") | 2 |
Pivot using tidyr/spread | df2 <- spread(df1,j,value) | 2 |
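The SQL pivot in the table joins the long table to itself once per value of j (the aliases tj1, tj2, and so on). The query shown there is truncated, so the following is not the original query, only a minimal Python sketch of the same self-join idea using the standard-library sqlite3 module. The table layout p(i, j, value) follows the queries above. The tiny data set, the index names, and the helper `pivot_with_self_joins` are assumptions made for illustration.

```python
import sqlite3


def pivot_with_self_joins(con, js):
    """Pivot the long table p(i, j, value) into one row per i, one column per j.

    Hypothetical helper for illustration. Values in `js` are pasted into the
    SQL text, so they must be trusted identifiers such as 'j1', 'j2'.
    """
    # Alias t0 supplies the first column; each further j value joins p once more.
    cols = ", ".join(f"t{k}.value AS {j}" for k, j in enumerate(js))
    joins = " ".join(
        f"JOIN p AS t{k} ON t{k}.i = t0.i AND t{k}.j = '{j}'"
        for k, j in enumerate(js)
        if k > 0
    )
    sql = (
        f"SELECT t0.i AS i, {cols} FROM p AS t0 {joins} "
        f"WHERE t0.j = '{js[0]}' ORDER BY t0.i"
    )
    return con.execute(sql).fetchall()


# Tiny stand-in for the 1 million row table (made-up values).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE p (i TEXT, j TEXT, value REAL)")
con.executemany(
    "INSERT INTO p VALUES (?, ?, ?)",
    [("i1", "j1", 0.1), ("i1", "j2", 0.2), ("i1", "j3", 0.3),
     ("i2", "j1", 0.4), ("i2", "j2", 0.5), ("i2", "j3", 0.6)],
)

# Indices on columns i and j, as in the faster SQL row (index names are made up).
con.execute("CREATE INDEX idx_p_i ON p (i)")
con.execute("CREATE INDEX idx_p_j ON p (j)")

wide = pivot_with_self_joins(con, ["j1", "j2", "j3"])
for row in wide:
    print(row)
```

This prints one tuple per value of i, with the j values spread across the columns. Because the joins are inner joins, any i that lacks one of the j values would be dropped from the result. With 10 values of j the query needs 10 aliases of p, which is where most of the typing goes.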
Python
Step | Code | Time (seconds) |
Pivot using pandas/pivot | df2 = df1.pivot(index='i', columns='j', values='value') | 2 |
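To reproduce a timing like this without GAMS or SQLite, the long table can also be generated directly in memory. The following is a rough sketch under that assumption: the helper `make_long_table` and the use of NumPy's random generator are not from the original experiment, which read the data from the SQLite database. The table size here is kept small; the post's experiment corresponds to 100000 by 10.

```python
import time

import numpy as np
import pandas as pd


def make_long_table(n_i, n_j, seed=0):
    """Build a long table with columns i, j, value (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Labels mimic the GAMS sets i1*i<n_i> and j1*j<n_j>.
    i = np.repeat([f"i{k}" for k in range(1, n_i + 1)], n_j)
    j = np.tile([f"j{k}" for k in range(1, n_j + 1)], n_i)
    value = rng.uniform(0, 1, n_i * n_j)
    return pd.DataFrame({"i": i, "j": j, "value": value})


# Small size for a quick run; use make_long_table(100000, 10) for 1 million rows.
df1 = make_long_table(1000, 10)

start = time.perf_counter()
df2 = df1.pivot(index="i", columns="j", values="value")
elapsed = time.perf_counter() - start

print(df2.shape)
print(f"pivot took {elapsed:.3f} s")
```

The result has one row per i and one column per j. Timings will differ from the table above depending on hardware and pandas version.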
Conclusion
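The methods that work on in-memory data structures (reshape2/dcast, tidyr/spread and pandas/pivot, about 2 seconds each) are much faster than the SQL pivot (50 seconds, or 15 seconds after creating indices on i and j), and they need much less typing.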