Quantcast
Channel: Yet Another Math Programming Consultant
Viewing all articles
Browse latest Browse all 805

Dataframe Pivoting: some timings

$
0
0

In Pivot a table: GAMS, SQLite, R and Python we described a pivoting operation on a small example. In the comments it was mentioned that for larger tables or dataframes, especially tydir is very fast. Here we try to confirm this using an artificial example:

image

Below we time each step (see here for more information).

GAMS StepCodeTime (seconds)
GAMS populate a 2D parameter with 1 million entries set i /i1*i100000/
    j
/j1*j10/
;
parameter
p(i,j);
p(i,j) = uniform(0,1);
0.4
Write a GDX fileexecute_unload'x.gdx';0.3
Store in SQLite database

execute'gdx2sqlite -i x.gdx -o x.db -fast';

5
R StepCodeTime (seconds)
Read database table
df1<-dbGetQuery(db,"select * from p")
2
Pivot using SQL
df2<- dbGetQuery(db,"select tj1.i as i, tj1.value as j1, tj2.value as j2, 
+ tj3.value as j3, tj4.value as j4,
tj5.value as j5,
+
tj6.value as j6, tj7.value as j7, tj8.value as j8,
+
tj9.value as j9, tj10.value as j10
+
from (select i,value from p where j='j1') as tj1,
+
(select i,value from p where j='j2') as tj2,
+
(select i,value from p where j='j3') as tj3,
+
(select i,value from p where j='j4') as tj4,
+
(select i,value from p where j='j5') as tj5,
+
(select i,value from p where j='j6') as tj6,
+
(select i,value from p where j='j7') as tj7,
+
(select i,value from p where j='j8') as tj8,
+
(select i,value from p where j='j9') as tj9,
+
(select i,value from p where j='j10') as tj10
+
where tj1.i=tj2.i and tj1.i=tj3.i and tj1.i=tj4.i and tj1.i=tj5.i
+
and tj1.i=tj6.i and tj1.i=tj7.i and tj1.i=tj8.i
+
and tj1.i=tj9.i and tj1.i=tj10.i")
50
Pivot using SQL after creating indices on columns i and j
id
15
Pivot using reshape2/dcast
df2<-dcast(df1,i~j,value.var="value")
2
Pivot using tydir/spread
df2<-spread(df1,j,value)
2
Python StepCodeTime (seconds)
pandas/pivotdf2=df1.pivot(index='i',columns='j',values='value')2
Conclusion

The methods that work on in-memory datastructures are faster than SQL (and there is much less typing needed).


Viewing all articles
Browse latest Browse all 805