Channel: Yet Another Math Programming Consultant

Random Sparse Arcs in GAMS

I was playing a bit with generating a large network (graph) with \(n=5000\) nodes and, say, \(\frac{n^2}{100} = 250{,}000\) arcs. The standard way to generate this in GAMS is:

*
* nodes
*
set i 'nodes'/node1*node5000/;
alias(i,j);

*
* random arcs (1%)
*
set A(i,j) 'arcs';
A(i,j) = uniform(0,1) < 0.01;

scalar numArcs 'number of arcs';
numArcs = card(A);
display numArcs;


Basically, this approach generates \(n^2\) random numbers \(r_{i,j} \sim U(0,1)\) (uniform distribution) and keeps the arcs \((i,j)\) for which \(r_{i,j}\lt 0.01\). This is a randomized process, so we don't see exactly 250,000 arcs, but rather:


----     16 PARAMETER numArcs              =   249170.000  number of arcs
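
As a quick sanity check on that count (a sketch, assuming the \(n^2\) draws behave like independent events with probability \(p=0.01\)), the number of arcs \(|A|\) is binomially distributed:

\[
E\,|A| = n^2 p = 250{,}000, \qquad \operatorname{sd}|A| = \sqrt{n^2 p (1-p)} = \sqrt{247{,}500} \approx 497.5
\]

The observed 249,170 is 830 below the mean, or roughly 1.7 standard deviations, so it is an unremarkable outcome.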


This operation is not exactly cheap. It takes 11.2 seconds on my laptop.

There are a few other approaches we can try.

Random locations


Instead of generating \(n^2\) random numbers, we can generate \(k=250,000\) pairs of \(i,j\) values (integers between 1 and \(n\)). A straightforward implementation would be:

scalars k,n,nn,ni,nj;
n = card(i);
nn = n*n/100;
* draw nn random (ni,nj) positions and switch on the matching arc
for (k = 1 to nn,
   ni = uniformint(1,n);
   nj = uniformint(1,n);
   A(i,j)$(ord(i)=ni and ord(j)=nj) = yes;
);

Unfortunately, this is extremely slow. I stopped it after 2,000 seconds: it was still not finished. GAMS is horribly slow when doing loops like this.
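
A back-of-the-envelope estimate hints at why. It rests on an assumption I have not verified against the GAMS internals: that each pass through the loop evaluates the dollar condition for every pair \((i,j)\). Under that assumption, the total number of condition evaluations is

\[
nn \times n^2 = 250{,}000 \times 25{,}000{,}000 = 6.25 \times 10^{12}
\]

just to switch on at most 250,000 elements.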

Crazy indexing


A variant of the above approach is to try to trick GAMS into a vectorized version of this. Well, this is not so easy. Here is a version that uses leads/lags when indexing:

sets
   A(*,*) 'arcs'
   k /node1*node5000,k5001*k250000/
   ij /i,j/
;
parameter r(k,ij) 'random offset from k';
r(k,ij) = uniformint(1,card(i)) - ord(k);
A(k+r(k,'i'),k+r(k,'j')) = yes;


This requires some explanation. 
  • The set A is no longer domain-checked. We use funky indexing here, so we need to loosen things up.
  • The set k has 250,000 elements. The first 5000 are identical to \(i,j\).
  • The parameter r contains \(2\times 250,000\) random integers between 1 and \(n\). We store them as offsets from the index \(k\). This sounds crazy but the next statement explains why this is done.
  • The assignment to A is using leads. Think of it as a variant of A(k,k) = yes.  This would populate a diagonal. Similarly, A(k+1,k) = yes shifts the diagonal by one. Using A(k+r(k,'i'),k+r(k,'j')) we index exactly the correct location.
  • We may generate duplicates, so the number will be slightly below 250k arcs.
This beast actually works, but the performance is not great. The fragment takes about 13 seconds, so no gain compared to our original approach.
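
The offset arithmetic may be easier to see outside GAMS. Below is a minimal Python sketch of the same idea, not a translation of the GAMS fragment: the function name `arcs_via_offsets` and the small sizes are made up for illustration. Each random target position is stored as an offset from \(k\), and adding the offset back to \(k\) recovers the target, which is what the lead operator does in the assignment to A.

```python
import random

def arcs_via_offsets(n, m, seed=999):
    """Illustrate the offset trick: store target - k, then index with k + offset."""
    rng = random.Random(seed)
    arcs = set()
    for k in range(1, m + 1):              # k plays the role of ord(k)
        r_i = rng.randint(1, n) - k        # offset from k, like r(k,'i')
        r_j = rng.randint(1, n) - k        # offset from k, like r(k,'j')
        arcs.add((k + r_i, k + r_j))       # k + offset lands on the random target
    return arcs

# Small hypothetical instance: 50 nodes, 50*50/100 = 25 draws.
arcs = arcs_via_offsets(n=50, m=25)
```

Because every target lies between 1 and \(n\), the shifted index always falls inside the first \(n\) elements of k, which carry the node labels.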

Python loop


Let's try some Python. We can build up sets using Python within a GAMS model:

set A(i,j) 'arcs';

$onEmbeddedCode Python:
from random import seed,randint
seed(999)
n = len(list(gams.get("i")))
nn = n*n//100
a = {}
for k in range(nn):
   ni = randint(1,n)
   nj = randint(1,n)
   elem = ("node{}".format(ni),"node{}".format(nj))
   a[elem] = 1
gams.set("A",list(a))
$offEmbeddedCode A



This is just a k-loop. We use a Python dictionary to store the elements: a repeated key simply overwrites the existing entry, so possible duplicates are handled correctly. This approach also takes about 11 seconds, so no gain compared to our first approach.
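
How many arcs survive the removal of duplicates? Here is a small standalone sketch (plain Python, no GAMS needed; the helper names `sample_arcs` and `expected_distinct` are mine). It uses a set instead of a dictionary, which drops duplicates in the same way, and evaluates the standard formula \(N\left(1-(1-1/N)^m\right)\) for the expected number of distinct cells hit when drawing \(m\) times with replacement from \(N\) equally likely cells.

```python
import random

def sample_arcs(n, m, seed=999):
    """Draw m random (i, j) label pairs; the set silently drops duplicates."""
    rng = random.Random(seed)
    return {("node{}".format(rng.randint(1, n)),
             "node{}".format(rng.randint(1, n))) for _ in range(m)}

def expected_distinct(num_cells, m):
    """Expected number of distinct cells after m uniform draws with replacement."""
    return num_cells * (1.0 - (1.0 - 1.0 / num_cells) ** m)

# Small hypothetical instance to keep the run time negligible.
small = sample_arcs(n=50, m=25)

# For the sizes used in this post: N = 5000^2 cells, m = 250,000 draws.
full_size_estimate = expected_distinct(5000 * 5000, 250000)  # about 248,754
```

So with these sizes we should expect to lose on the order of 1,250 arcs to duplicates.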

R to the rescue


So let's see if we can use R to speed things up:

$set n    5000
$set inc  data.inc

*
* R script
*
$onecho > script.R
library(data.table)
n <- %n%
nn <- n^2/100
df <- data.frame(
        ni=sample(n,nn,replace=TRUE),
        nj=sample(n,nn,replace=TRUE))
df <- df[order(df$ni,df$nj),]
v <- unique(paste0("node",df$ni,".node",df$nj))
fwrite(list(v),"%inc%",col.names=F,quote=F)
$offecho
$call '"c:\program files\R\R-3.5.0\bin\Rscript.exe" script.R'

*
* nodes and arcs
*
set i 'nodes'/node1*node%n%/;
alias(i,j);

set A(i,j) 'arcs'/
$offlisting
$include %inc%
$onlisting
/;


From inside GAMS, we write an R script that is then executed by Rscript.exe. The steps are:
  • sample the arcs,
  • sort in the order that GAMS likes,
  • create a string representation of the tuple, e.g. "node1.node2",
  • remove duplicates (this means we end up with a number of arcs that is smaller than 250k),
  • write to an include file.
The include file is then processed by GAMS. To speed things up, we suppress echoing of the include file to the listing file ($offlisting).
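
For readers without R, the same steps can be sketched in Python. This is only an analogue of the R script, not what was timed here: the function `write_arc_include` and the file name `data_sketch.inc` are hypothetical, and I have not measured how it performs at full size.

```python
import os
import random
import tempfile

def write_arc_include(path, n, m, seed=999):
    """Sample m arcs, drop duplicates, sort, and write 'nodeI.nodeJ' lines."""
    rng = random.Random(seed)
    # sample the arcs; the set removes duplicates
    pairs = {(rng.randint(1, n), rng.randint(1, n)) for _ in range(m)}
    # sort numerically by (ni, nj), mirroring order(df$ni, df$nj) in the R script
    ordered = sorted(pairs)
    with open(path, "w") as f:
        for ni, nj in ordered:
            f.write("node{}.node{}\n".format(ni, nj))
    return len(ordered)

# Small hypothetical instance written to the system temp directory.
path = os.path.join(tempfile.gettempdir(), "data_sketch.inc")
count = write_arc_include(path, n=50, m=25)
```

The resulting file has one tuple per line, so it could be pulled in with `$include` in the same way as the file written by R.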

This method is the winner: it takes just 3.5 seconds. Sometimes using a plain old text file as a communication channel is not so bad.
