Dataframe Pivoting: some timings

In "Pivot a table: GAMS, SQLite, R and Python" we described a pivoting operation on a small example. In the comments it was mentioned that for larger tables or dataframes, tidyr in particular is very fast. Here we try to confirm this using an artificial example with 1 million rows.

Below we time each step (see here for more information).

GAMS Step	Code	Time (seconds)
Populate a 2D parameter with 1 million entries	set i /i1*i100000/ j /j1*j10/; parameter p(i,j); p(i,j) = uniform(0,1);	0.4
Write a GDX file	execute_unload 'x.gdx';	0.3
Store in SQLite database	execute 'gdx2sqlite -i x.gdx -o x.db -fast';	5
R Step	Code	Time (seconds)
Read database table	df1<-dbGetQuery(db,"select * from p")	2
Pivot using SQL	df2<- dbGetQuery(db,"select tj1.i as i, tj1.value as j1, tj2.value as j2, tj3.value as j3, tj4.value as j4, tj5.value as j5, tj6.value as j6, tj7.value as j7, tj8.value as j8, tj9.value as j9, tj10.value as j10 from (select i,value from p where j='j1') as tj1, (select i,value from p where j='j2') as tj2, (select i,value from p where j='j3') as tj3, (select i,value from p where j='j4') as tj4, (select i,value from p where j='j5') as tj5, (select i,value from p where j='j6') as tj6, (select i,value from p where j='j7') as tj7, (select i,value from p where j='j8') as tj8, (select i,value from p where j='j9') as tj9, (select i,value from p where j='j10') as tj10 where tj1.i=tj2.i and tj1.i=tj3.i and tj1.i=tj4.i and tj1.i=tj5.i and tj1.i=tj6.i and tj1.i=tj7.i and tj1.i=tj8.i and tj1.i=tj9.i and tj1.i=tj10.i")	50
Pivot using SQL after creating indices on columns i and j	(same query as above)	15
Pivot using reshape2/dcast	df2<-dcast(df1,i~j,value.var="value")	2
Pivot using tidyr/spread	df2<-spread(df1,j,value)	2
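
The effect of the indices on the SQL pivot can be tried out with Python's standard sqlite3 module. The following is only a minimal sketch on a much smaller table: the helper names (build_db, add_indices, pivot_sql) and the index names are hypothetical, the table layout p(i, j, value) is taken from the query above, and the exact index statements used for the timings are not shown in this post, so the ones below are an assumption.

```python
import sqlite3
import time


def build_db(n_i=500, n_j=10):
    """Create an in-memory long-format table p(i, j, value); a small stand-in for x.db."""
    con = sqlite3.connect(":memory:")
    con.execute("create table p(i text, j text, value real)")
    rows = [(f"i{a}", f"j{b}", a + b / 100.0)
            for a in range(1, n_i + 1) for b in range(1, n_j + 1)]
    con.executemany("insert into p values (?, ?, ?)", rows)
    return con


def add_indices(con):
    """Assumed index definitions: one index on column i and one on column j."""
    con.execute("create index idx_p_i on p(i)")
    con.execute("create index idx_p_j on p(j)")


def pivot_sql(con, n_j=10):
    """Pivot with one subquery per j value, joined on i (same shape as the query above)."""
    cols = ", ".join(f"tj{b}.value as j{b}" for b in range(1, n_j + 1))
    subs = ", ".join(f"(select i, value from p where j='j{b}') as tj{b}"
                     for b in range(1, n_j + 1))
    conds = " and ".join(f"tj1.i = tj{b}.i" for b in range(2, n_j + 1))
    sql = f"select tj1.i as i, {cols} from {subs}"
    if conds:  # no join conditions are needed when there is only one j value
        sql += f" where {conds}"
    return con.execute(sql).fetchall()


if __name__ == "__main__":
    con = build_db()
    t0 = time.perf_counter()
    pivot_sql(con)
    t1 = time.perf_counter()
    add_indices(con)
    t2 = time.perf_counter()
    pivot_sql(con)
    t3 = time.perf_counter()
    print(f"without indices: {t1 - t0:.3f}s, with indices: {t3 - t2:.3f}s")
```

On a table this small the difference may be negligible; the sketch is meant to show the mechanics, not to reproduce the timings in the table.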
Python Step	Code	Time (seconds)
pandas/pivot	df2=df1.pivot(index='i',columns='j',values='value')	2
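
For readers who want to try the pandas step without the GAMS and SQLite stages, here is a minimal sketch that builds a comparable long-format dataframe directly in Python and times the pivot. The helper names make_long_frame and timed_pivot are hypothetical, and generating the data with NumPy instead of reading it from x.db is an assumption made to keep the example self-contained; timings will differ by machine.

```python
import time

import numpy as np
import pandas as pd


def make_long_frame(n_i=100_000, n_j=10, seed=0):
    """Build a long-format frame with columns i, j, value (n_i * n_j rows)."""
    rng = np.random.default_rng(seed)
    i_labels = np.repeat([f"i{a}" for a in range(1, n_i + 1)], n_j)
    j_labels = np.tile([f"j{b}" for b in range(1, n_j + 1)], n_i)
    values = rng.uniform(0, 1, n_i * n_j)
    return pd.DataFrame({"i": i_labels, "j": j_labels, "value": values})


def timed_pivot(df1):
    """Run the same pivot call as in the table and return (result, elapsed seconds)."""
    start = time.perf_counter()
    df2 = df1.pivot(index="i", columns="j", values="value")
    return df2, time.perf_counter() - start


if __name__ == "__main__":
    df1 = make_long_frame()
    df2, seconds = timed_pivot(df1)
    print(df2.shape, f"{seconds:.2f}s")
```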

Conclusion

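In this experiment the methods that work on in-memory data structures (dcast, spread and pandas pivot, about 2 seconds each) are much faster than the SQL pivot (50 seconds, or 15 seconds with indices), and they need much less typing.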