I was playing a bit with generating a large network (graph) with \(n=5000\) nodes and say \[\frac{n^2}{100}= 250,000\] arcs. The standard way to generate this in GAMS is:
Basically, this approach generates \(n^2\) random numbers \( r_{i,j} \sim U(0,1)\) (uniform distribution) and picks the arcs related to a value \(r_{i,j}\lt 0.01\). This is a randomized process, so we don't see exactly 250,000 arcs, but rather:
This operation is not exactly cheap. It takes 11.2 seconds on my laptop.
There are a few other approaches we can try.
This requires some explanation.
Let's try some Python. We can build up sets using Python within a GAMS model:
* * nodes * set i 'nodes'/node1*node5000/; alias(i,j); * * random arcs (1%) * set A(i,j) 'arcs'; A(i,j) = uniform(0,1)<0.01; scalar numArcs 'number of arcs'; numArcs = card(A); display numArcs; |
Basically, this approach generates \(n^2\) random numbers \( r_{i,j} \sim U(0,1)\) (uniform distribution) and picks the arcs related to a value \(r_{i,j}\lt 0.01\). This is a randomized process, so we don't see exactly 250,000 arcs, but rather:
---- 16 PARAMETER numArcs = 249170.000 number of arcs
This operation is not exactly cheap. It takes 11.2 seconds on my laptop.
There are a few other approaches we can try.
Random locations
Instead of generating \(n^2\) random numbers, we can generate \(k=250,000\) pairs of \(i,j\) values (integers between 1 and \(n\)). A straigthforward implementation would be:
Unfortunately, this is extremely slow. I stopped it after 2,000 seconds: it was still not finished. GAMS is horribly slow when doing loops like this.
scalars k,n,nn,ni,nj; n = card(i); nn = n*n/100; for (k = 1 to nn, ni = uniformint(1,n); nj = uniformint(1,n); A(i,j)$(ord(i)=ni andord(j)=nj) = yes; ); |
Unfortunately, this is extremely slow. I stopped it after 2,000 seconds: it was still not finished. GAMS is horribly slow when doing loops like this.
Crazy indexing
A variant of the above approach is to try to trick GAMS into a vectorized version of this. Well, this is not so easy. Here is a version that uses leads/lags when indexing:
sets A(*,*) 'arcs' k /node1*node5000,k5001*k250000/ ij /i,j/ ; parameter r(k,ij) 'random offset from k'; r(k,ij) = uniformint(1,card(i)) - ord(k); A(k+r(k,'i'),k+r(k,'j')) = yes; |
This requires some explanation.
- The set A is no longer domain-checked. We use funky indexing here, so we need loosen things up.
- The set k has 250,000 elements. The first 5000 are identical to \(i,j\).
- The parameter r contains \(2\times 250,000\) random integers between 1 and \(n\). We store them as offsets from the index \(k\). This sounds crazy but the next statement explains why this is done.
- The assignment to A is using leads. Think of it as a variant of A(k,k) = yes. This would populate a diagonal. Similarly, A(k+1,k) = yes shifts the diagonal by one. Using A(k+r(k,'i'),k+r(k,'j')) we index exactly the correct location.
- We may generate duplicates, so the number will be slightly below 250k arcs.
This beast actually works, but the performance is not great. The fragment takes about 13 seconds, so no gain compared to our original approach.
Python loop
Let's try some Python. We can build up sets using Python within a GAMS model:
set A(i,j) 'arcs'; $onEmbeddedCode Python: from random import seed,randint seed(999) n = len(list(gams.get("i"))) nn = n*n//100 a = {} for k in range(nn): ni = randint(1,n) nj = randint(1,n) elem = ("node{}".format(ni),"node{}".format(nj)) a[elem] = 1 gams.set("A",list(a)) $offEmbeddedCode A |
This is just a k-loop. We use here a Python dictionary to store elements as this handles possible duplicates correctly. This approach also takes about 11 seconds, so no gain compared to our first approach.
R to the rescue
So let's see if we can use R to speed things up:
$set n 5000 $set inc data.inc * * R script * $onecho > script.R library(data.table) n <- %n% nn <- n^2/100 df <- data.frame( ni=sample(n,nn,replace=TRUE), nj=sample(n,nn,replace=TRUE)) df <- df[order(df$ni,df$nj),] v <- unique(paste0("node",df$ni,".node",df$nj)) fwrite(list(v),"%inc%",col.names=F,quote=F) $offecho $call '"c:\program files\R\R-3.5.0\bin\Rscript.exe" script.R' * * nodes and arcs * set i 'nodes'/node1*node%n%/; alias(i,j); set A(i,j) 'arcs'/ $offlisting $include %inc% $onlisting /; |
We write from inside GAMS an R script that gets executed by rscript.exe. The steps are:
- sample the arcs,
- sort in the order that GAMS likes,
- create a string representation of the tuple, e.g. "node1.node2",
- remove duplicates (this means we end up with a number of arcs that is smaller than 250k),
- write to an include file
The include file is then processed by GAMS. To speed things up we remove echoing to the listing file.
This method is the winner: it takes just 3.5 seconds. Sometimes using a plain old text file as a communication channel is not so bad.