Quantcast
Channel: Yet Another Math Programming Consultant
Viewing all articles
Browse latest Browse all 809

matrix operations via GAMS Embedded Python: multi-regional Input-Output tables

$
0
0

Wassily Leontief in front of an IO table, credit:NYU


Inverting a dense matrix

GAMS allows running pieces of Python code as part of a GAMS model. Unfortunately, the interface is rather low-level and inefficient.  Usually "low-level" is associated with high-performance, but in the Python world, this is not the case. Here is an example.

Suppose we want to use the Python interface to invert a matrix \(A\).

The main GAMS file can look like:

$ontext

  
Invert a square, dense matrix.


$offtext


set i /i1*i1000/;
alias(i,j,k);

parameters
  a(i,j)
'square matrix A'
  b(j,i)
'B=inv(A)'
;
a(i,j) = 1/(
ord(i)+ord(j)-1);
a(i,i) = 1;
* This is a Hilbert matrix with diagonal replaced by ones.

$batincludeinvert.inc A B i

display$(card(i)<50) B;

* check solution for smaller instances
parameters
   ident(i,j)
'identity matrix'
   maxerr    
'maximum error'
;

if (card(i)<=200,
   ident(i,i)=1;
   maxerr =
smax((i,j),abs(sum(k,A(i,k)*B(j,k))-ident(i,j)));
  
display maxerr;
);



Note: the check may look a bit strange. However, \(B\) is symmetric, so I swapped the indices. It can save a bit on the processing time.
 
The inversion code itself is packaged in an include file:


* invert.inc
* Arguments:
*    A: input matrix
*    B: output matrix (inverse or all zero)
*    i: sets defining A and B
* if singular return a parameter with all zeros


$offlisting

embeddedCode Python:

gams.printLog("\n### invert.inc")
import numpy as np
import numpy.linalg as la
import time

t0 = time.time()

sa =
"%1"  # name of A matrix
sb =
"%2"  # name of B=inv(A) matrix
si =
"%3"  # name of driving set

#
# load set i
#
I = list(gams.get(si,keyType=KeyType.INT))
n = len(I)
assert n>0, f">>>> set {si} has no elements"
assert isinstance(I[0],int),  f">>>> set {si} is not a 1 dim set"
imap = dict(zip(I,range(n))) 
# maps gams indices -> Python indices


#
# allocate and populate matrix A
#
A = np.zeros((n,n))
try:
  
for (i,j,aij) in gams.get(sa,keyType=KeyType.INT,keyFormat=KeyFormat.FLAT):
       A[imap[i],imap[j]] = aij
except:
  
assert False,f">>>> Error in set elements. Check the set {si}."

t1 = time.time()-t0
gams.printLog(f
".. Copied data from {si},{sa} ({t1:.1f}s)")


#
# invert
#
try:
  B = la.inv(A)
except la.LinAlgError:
  gams.printLog(
".. Singular matrix. Returning zeros.")
  B = np.zeros((n,n))

t2 = time.time()-t0-t1
gams.printLog(f
".. Inverted {sa} ({t2:.1f}s)")

#
# return B
#
GamsB = [((I[k[
0]],I[k[1]]),v) for k,v in np.ndenumerate(B) if abs(v)>1e-9]
if len(GamsB)==0:
   GamsB = [((I[
0],I[0]),0.0)]
gams.set(sb,GamsB)

t3 = time.time()-t0-t1-t2
gams.printLog(f
".. Created GAMS sparse matrix {sb} ({t3:.1f}s)")

t = t1+t2+t3
gams.printLog(f
".. Total elapsed {t:.1f}s")

endEmbeddedCode
%2



We can pass command line arguments to this include file so it functions like a batch file (hence the name batinclude). Not really a function, but similar functionality. The batinclude file contains embedded Python code that does the matrix inversion and also spends time on converting GAMS data to a Python numpy matrix and back. 

The performance of this piece of code is rather dismal. We see:
 

### invert.inc
--- .. Copied data from i,A (19.9s)
--- .. Inverted A (0.7s)
--- .. Created GAMS sparse matrix B (25.6s)
--- .. Total elapsed 46.2s


This means we have an overhead of \[\frac{46.2-0.7}{0.7}\times 100\% = 6,500\%\] We pay dearly for the privilege to use np.inv(A). The reason is that the interface is not using the right abstraction level. We get data that is in a form that needs much processing.

The Leontief inverse


In the field of input-output modeling, economists are interested in the Leontief inverse (sometimes called total requirements matrix), or: \[(I-A)^{-1}\] I have a national IO table lying around for Japan, 1995 [2]. This table has the following 93 sectors or industries:


001'Crop cultivation'
002'Livestock and sericulture'
003'Agricultural services'
004'Forestry'
005'Fisheries'
006'Metallic ores'
007'Non-metallic ores'
008'Coal'
009'Crude petroleum and natural gas'
010'Foods'
011'Drinks'
012'Feeds and organic fertilizer, n.e.c.'
013'Tabacco'
014'Textile products'
015'Wearing apparel and other textile products'
016'Timber and wooden products'
017'Furniture and fixtures'
018'Pulp, paper, paperboard and processed paper'
019'Paper products'
020'Publishing and printing'
021'Chemical fertilizer'
022'Inorganic basic chemical products'
023'Petrochemical basic products and intermediate chemical products'
024'Synthetic resins'
025'Synthetic fibers'
026'Medicaments'
027'Final chemical products, n.e.c.'
028'Petroleum refinery products'
029'Coal products'
030'Plastic products'
031'Rubber products'
032'Leather, fur skins and miscellaneous leather products'
033'Glass and glass products'
034'Cement and cement products'
035'Pottery, china and earthenware'
036'Other ceramic, stone and clay products'
037'Pig iron and crude steel'
038'Steel products'
039'Steel castings and forgings and other steel products'
040'Non-ferrous metals'
041'Non-ferrous metal products'
042'Metal products for construction and architecture'
043'Other metal products'
044'General industrial machinery'
045'Special industrial machinery'
046'Other general machines'
047'Machinery for office and service industry'
048'Household electric appliance'
049'Electronic equipment and communication equipment'
050'Heavy electrical equipment'
051'Other electrical machinery'
052'Motor vehicles'
053'Ships and repair of ships'
054'Other transportation equipment and repair of transportation equipment'
055'Precision instruments'
056'Miscellaneous manufacturing products'
057'Construction'
058'Repair of constructions'
059'Civil engineering'
060'Electric power for enterprise use'
061'Gas and heat supply'
062'Water supply'
063'Waste disposal services'
064'Commerce'
065'Finance and insurance'
066'Real estate agencies and rental services'
067'House rent'
068'Railway transport'
069'Road transport(except transport by private cars)'
070'Transport by private cars'
071'Water transport'
072'Air transport'
073'Freight forwarding'
074'Storage facility services'
075'Services relating to transport'
076'Communication'
077'Broadcasting'
078'Public administration'
079'Education'
080'Research'
081'Medical service and health'
082'Social security'
083'Other public services'
084'Advertising, survey and information services'
085'Goods rental and leasing services'
086'Repair of motor vehicles and machine'
087'Other business services'
088'Amusement and recreational services'
089'Eating and drinking places'
090'Hotel and other lodging places'
091'Other personal services'
092'Office supplies'
093'Activities not elsewhere classified'

That is a bit smaller than our previous random matrix.

An IO table has roughly the following structure:


The main matrix elements of this transaction matrix can be defined as \[x_{i,j} = \text{output of industry $i$  sold to industry $j$}\] I.e. the rows are the sellers and the columns are the buyers [3].

The data needs a bit of preprocessing to calculate \(A\) and then \(I-A\) before we can invoke our matrix inversion code. The matrix \(A\) is called the matrix of input coefficients or technological coefficients, and can be formed by: \[a_{i,j} := \frac{x_{i,j}}{X_j}\]

The code to calculate the Leontief inverse \((I-A)^{-1}\) can look like:


$ontext

   
Calculate the Leontief inverse.

$offtext

set i 'sectors/industries'/001*093/;
alias(i,j);

singletonset totalProduction /115/;

table IOTable(*,*) 'complete IO table of 1995 Japan [millions yens]'
$include io93pro.inc
;

display IOTable;

parameter
   A(i,j)    
'input coefficients'
   Ident(i,j)
'Identity matrix'
   IA(i,j)   
'I-A'
   L(i,j)    
'Leontief inverse'
;
A(i,j) = IOTable(i,j)/IOTable(totalProduction,j);

Ident(i,i) = 1;
IA(i,j) = Ident(i,j)-A(i,j);

$batincludeinvert.inc IA L i

display L;



This solves very fast as the matrix is small.

Leontief Inverse for Regional IO Tables


Regional Input-Output tables provide us with some challenges:
  • They tend to be big: the number of rows and columns of \(A\) is the number of regions times the number of sectors.
  • The indices for the rows and columns are 2-dimensional: regions and sectors. We either need to map this to a one-dimensional set or we need the matrix-inversion code to accept multi-dimensional sets for the rows and columns. 
Here we try to develop some code that can take a regional \(A\) matrix with multi-dimensional row and column indices (without the need to map this to a 1d set) and produces a Leontief inverse.

As an example, I use WIOT data [4]. This data set has 56 industries and 44 regions.

REGIONS
=======

AUS 'Australia'
AUT 'Austria'
BEL 'Belgium'
BGR 'Bulgaria'
BRA 'Brazil'
CAN 'Canada'
CHE 'Switserland'
CHN 'China'
CYP 'Cyprus'
CZE 'Czech Republic'
DEU 'Germany'
DNK 'Denmark'
ESP 'Spain'
EST 'Estonia'
FIN 'Finland'
FRA 'France'
GBR 'UK'
GRC 'Greece'
HRV 'Croatia'
HUN 'Hungary'
IDN 'Indonesia'
IND 'India'
IRL 'Ireland'
ITA 'Italy'
JPN 'Japan'
KOR 'South Korea'
LTU 'Lithuania'
LUX 'Luxembourg'
LVA 'Latvia'
MEX 'Mexico'
MLT 'Malta'
NLD 'The Netherlands'
NOR 'Norway'
POL 'Poland'
PRT 'Portugal'
ROU 'Romania'
RUS 'Russia'
SVK 'Slovak Republic'
SVN 'Slovenia'
SWE 'Sweden'
TUR 'Turkey'
TWN 'Taiwan'
USA 'USA'
ROW 'Rest of the world'


SECTORS
=======

A01 'Crop and animal production, hunting and related service activities'
A02 'Forestry and logging'
A03 'Fishing and aquaculture'
B 'Mining and quarrying'
C10-C12 'Manufacture of food products, beverages and tobacco products'
C13-C15 'Manufacture of textiles, wearing apparel and leather products'
C16 'Manufacture of wood and of products of wood and cork, except furniture; manufacture of articles of straw and plaiting materials'
C17 'Manufacture of paper and paper products'
C18 'Printing and reproduction of recorded media'
C19 'Manufacture of coke and refined petroleum products'
C20 'Manufacture of chemicals and chemical products'
C21 'Manufacture of basic pharmaceutical products and pharmaceutical preparations'
C22 'Manufacture of rubber and plastic products'
C23 'Manufacture of other non-metallic mineral products'
C24 'Manufacture of basic metals'
C25 'Manufacture of fabricated metal products, except machinery and equipment'
C26 'Manufacture of computer, electronic and optical products'
C27 'Manufacture of electrical equipment'
C28 'Manufacture of machinery and equipment n.e.c.'
C29 'Manufacture of motor vehicles, trailers and semi-trailers'
C30 'Manufacture of other transport equipment'
C31_C32 'Manufacture of furniture; other manufacturing'
C33 'Repair and installation of machinery and equipment'
D35 'Electricity, gas, steam and air conditioning supply'
E36 'Water collection, treatment and supply'
E37-E39 'Sewerage; waste collection, treatment and disposal activities; materials recovery; remediation activities and other waste management services'
F 'Construction'
G45 'Wholesale and retail trade and repair of motor vehicles and motorcycles'
G46 'Wholesale trade, except of motor vehicles and motorcycles'
G47 'Retail trade, except of motor vehicles and motorcycles'
H49 'Land transport and transport via pipelines'
H50 'Water transport'
H51 'Air transport'
H52 'Warehousing and support activities for transportation'
H53 'Postal and courier activities'
I 'Accommodation and food service activities'
J58 'Publishing activities'
J59_J60 'Motion picture, video and television programme production, sound recording and music publishing activities; programming and broadcasting activities'
J61 'Telecommunications'
J62_J63 'Computer programming, consultancy and related activities; information service activities'
K64 'Financial service activities, except insurance and pension funding'
K65 'Insurance, reinsurance and pension funding, except compulsory social security'
K66 'Activities auxiliary to financial services and insurance activities'
L68 'Real estate activities'
M69_M70 'Legal and accounting activities; activities of head offices; management consultancy activities'
M71 'Architectural and engineering activities; technical testing and analysis'
M72 'Scientific research and development'
M73 'Advertising and market research'
M74_M75 'Other professional, scientific and technical activities; veterinary activities'
N 'Administrative and support service activities'
O84 'Public administration and defence; compulsory social security'
P85 'Education'
Q 'Human health and social work activities'
R_S 'Other service activities'
T 'Activities of households as employers; undifferentiated goods- and services-producing activities of households for own use'
U 'Activities of extraterritorial organizations and bodies'


The resulting IO matrix is \(2464\times 2464\). Of course, the indexing will be a bit more complicated. The \(A\) matrix will look like \(a_{i,r,i',r'}\), i.e. a four-dimensional structure. To help with this, I created a batinclude file that given a matrix \(A\) and a one-, two- or more-dimensional index set, calculates the Leontief inverse matrix \((I-A)^{-1}\). 

Regional IO Table with two-dimensional rows and columns



First: here is the code to read in the IO table, to call the Leontief inverse code, and to perform some checks. The checks are useful in themselves, but also to demonstrate how to write matrix operations over what are essentially tensors.


$ontext

   
Leontief inverse of a large regional IO table

  
Data from http://www.wiod.org/database/wiots16

$offtext

*----------------------------------------------------------------
* Convert spreadsheet and import GDX file
*----------------------------------------------------------------

$set data     WIOT2014_Nov16_ROW
$set xls      %data%.xlsb
$set gdx      %data%.gdx

* remove WIOT2014_Nov16_ROW.gdx to force rereading the spreadsheet

$if not exist %gdx% $call 'gdxxrw %xls% par=wiot rng=A3 rdim=2 cdim=2 ignorerows=4,6 ignorecolumns=B,D'


parameter WIOT(*,*,*,*) 'WIOT Intercountry Input-Output Table';
$gdxin %gdx%
$loaddc WIOT


*----------------------------------------------------------------
* Sets
*----------------------------------------------------------------

sets
   i
'industries'/
     
A01,A02,A03,B,C10-C12,C13-C15,C16,C17,C18,C19,C20,C21
     
C22,C23,C24,C25,C26,C27,C28,C29,C30,C31_C32,C33,D35,E36
     
E37-E39,F,G45,G46,G47,H49,H50,H51,H52,H53,I,J58,J59_J60
     
J61,J62_J63,K64,K65,K66,L68,M69_M70,M71,M72,M73,M74_M75
     
N,O84,P85,Q,R_S,T,U
   
/
   r
'regions (43 countries + ROW)'/
     
AUS,AUT,BEL,BGR,BRA,CAN,CHE,CHN,CYP,CZE,DEU,DNK,ESP
     
EST,FIN,FRA,GBR,GRC,HRV,HUN,IDN,IND,IRL,ITA,JPN,KOR
     
LTU,LUX,LVA,MEX,MLT,NLD,NOR,POL,PRT,ROU,RUS,SVK,SVN
     
SWE,TUR,TWN,USA,ROW
   
/
  ir(i,r)
;
ir(i,r) =
yes;
alias(i,ii,i2,i3),(r,rr,r2,r3),(ir,ir2,ir3);


*----------------------------------------------------------------
* do some counting
*----------------------------------------------------------------

parameter counts(*) 'industries,regions,combinations';
counts(
'i:sectors') = card(i);
counts(
'r:regions') = card(r);
counts(
'ir:combinations') = card(ir);
option counts:0;
display counts;

*----------------------------------------------------------------
* extract parameters
*----------------------------------------------------------------

set
  fd
'final demand'/
         
CONS_h  'Final consumption expenditure by households'
         
CONS_np'Final consumption expenditure by non-profit organisations serving households (NPISH)'
         
CONS_g  'Final consumption expenditure by government'
         
GFCF    'Gross fixed capital formation'
         
INVEN   'Changes in inventories and valuables'
      
/
;

parameter
   finalDemand(i,r)
'final demand f'
   grossOutput(i,r)
'total'
   A(i,r,i,r)      
'technological coefficients'
   Ident(i,r,i,r)  
'Indentity matrix I'
;
finalDemand(ir) =
sum((fd,r),WIOT(ir,fd,r));
grossOutput(ir) = WIOT(ir,
'GO','TOT');
A(ir,ir2)$grossOutput(ir2) = WIOT(ir,ir2)/grossOutput(ir2);
Ident(i,r,i,r) = 1;

*----------------------------------------------------------------
* sanity check
*----------------------------------------------------------------

* x = Z*1 + f

parameter Xerr(i,r) 'error in identity x = Z*1 + f';
Xerr(ir) = grossOutput(ir) -
sum(ir2,WIOT(ir,ir2)) - finalDemand(ir);
abort$sum(ir$(abs(Xerr(ir))>1e-5),1) "Check Xerr";

*----------------------------------------------------------------
* form Leontief inverse
*----------------------------------------------------------------

parameter L(i,r,i,r) 'Leontief inverse';

$batincludeleontief.inc A L ir

*----------------------------------------------------------------
* some checks
*----------------------------------------------------------------

* check  (I-A) * inv(I-A) = I
* do a small subset of countries (a complete check would take a long time).
set rsub(r) /AUS,AUT/;
parameter InvErr(i,r,i,r) 'error in identity (I-A)*L=I';
InvErr(ir(i,r),ir2(i2,r2))$(rsub(r)
and rsub(r2)) =
   
sum(ir3, (Ident(ir,ir3)-A(ir,ir3))*L(ir3,ir2)) - Ident(ir,ir2);
abort$sum((ir,ir2)$(abs(InvErr(ir,ir2))>1e-5),1) "Check InvErr";

* check  x = inv(I-A)*f = L*f
parameter xInvErr(i,r) 'error in x = inv(I-A)*f';
xInvErr(ir) =
sum(ir2, L(ir,ir2)*finalDemand(ir2)) - GrossOutput(ir);
abort$sum(ir$(abs(XInvErr(ir))>1e-5),1) "Check XInvErr";



Here the two-dimensional set ir(i,r) is used to drive the inversion code. It is also used in writing the matrix expressions (to make the matrices look like 2d structures with rows and columns, e.g. A(ir,ir2)).
 
The batinclude file with the Python code for the Leontief inverse is:


$ontext

 
leontief.inc

 
Calculate the leontief inverse.

 
Arguments:
  
%1 : input matrix: square matrix with technological coefficients
  
%2 : output matrix: Leontief matrix
  
%3 : sets defining A and L
       
can have 1 or multiple dimensions

$offtext

$offlisting

embeddedCode Python:

gams.printLog("\n### leontief.inc")
import numpy as np
import numpy.linalg as la
import scipy.sparse as sp
import time

TRACE = True

# arguments
sA =
"%1"
sL =
"%2"
sI =
"%3"

t0 = time.time()

#
#  load set I
#
if TRACE: gams.printLog(f".. load set {sI}")
gmsI = list(gams.get(sI,keyType=KeyType.INT))
if TRACE: gams.printLog(f".... {gmsI[0:5]} (max 5 shown)")

#
#  get size and dimension
#
n = len(gmsI)
if TRACE: gams.printLog(f".. size of matrices=({n}x{n})")
assert n>0
if isinstance(gmsI[0],int):
   dim =
1
else:
   dim = len(gmsI[
0])
if TRACE: gams.printLog(f".. dim={dim} (dimension of row and column indices)")

#
#  load matrix
#
gams.printLog(f
".. load parameter {sA}")
gmsA = list(gams.get(sA,keyType=KeyType.INT,keyFormat=KeyFormat.FLAT))
if TRACE: gams.printLog(f".... {gmsA[0:3]} (max 3 shown)")

t1 = time.time()-t0
gams.printLog(f
".. loaded gams data {sI},{sA} ({t1:.1f}s)")


#
#  convert gmsA into something we can actually use
#

# set up dict for indices that maps GAMS numbers to Python indices
imap = dict(zip(gmsI,range(len(gmsI))))

# set up matrix.
gmsA2 = list(zip(*gmsA))
# unzip
try:
 
if dim==1:
    irow = [imap[k]
for k in gmsA2[0]]
    jcol = [imap[k]
for k in gmsA2[1]]
    aij  =  list(a2[
2])
 
else:
    irow = [imap[kk]
for kk in zip(*[gmsA2[k] for k in range(dim)])]
    jcol = [imap[kk]
for kk in zip(*[gmsA2[k] for k in range(dim,2*dim)])]
    aij  = list(gmsA2[
2*dim])
except:
   gams.printLog(f
">>>> Some indices are not in {sI}. Check the set {sI}.")
  
raise ValueError()

# create a dense numpy matrix I-A
npIA = np.eye(n) - sp.coo_matrix((aij,(irow,jcol)),shape=(n,n)).toarray()

t2 = time.time()-t0-t1
gams.printLog(f
".. Created matrix I-{sA} ({t2:.1f}s)")

#
# invert I-A
#
try:
  invIA = la.inv(npIA)
except la.LinAlgError:
  gams.printLog(
".. Singular matrix. Returning empty matrix.")
  invIA = np.zeros_like(npIA)

t3 = time.time()-t0-t1-t2
gams.printLog(f
".. Inverted matrix I-{sA} ({t3:.1f}s)")

#
# return inv(I-A)
#
if dim==1:
   listL = [((gmsI[i1],gmsI[j1]),invIA[i1,j1])
for i1 in range(n) for j1 in range(n)]
else:
   listL = [(tuple(gmsI[i1]+gmsI[j1]),invIA[i1,j1])
for i1 in range(n) for j1 in range(n)]

gams.set(sL,listL)

t4 = time.time()-t0-t1-t2-t3
gams.printLog(f
".. Created matrix {sL} ({t4:.1f}s)")


t = time.time()-t0
gams.printLog(f
".. elapsed {t:.1f}s")

endEmbeddedCode
%2



Again, the Python code is very slow so we spend a lot of time preparing the matrix for inversion. In this case, the inversion takes a couple of seconds, but the overhead is several minutes! 


Conclusions


  • The embedded Python API is not well suited for algorithms that require a substantial amount of data to be exchanged with GAMS. It uses a low-level approach, something which is not compatible with Python. Python is just not designed to do large-scale low-level stuff. This is a design problem in the API: the wrong abstraction level was chosen. Data should be exchanged in higher-level data structures such as numpy arrays or Pandas dataframes using high-performance kernels. This will lead to performance gains of at least a factor of 10.
  • As a result, an implementation using the Python embedded API suffers from large overhead and requires patience from the user.   

Notes


  • I have had some good experiences with Cython and Numba to speed up Python code. But I don't think those tools are applicable here.
  • GAMS 36.0 comes with a Python-based matrix inversion. It is similar to my version and can be invoked as:
     
       $libInclude linalg -e invert i A B

    Unfortunately, it seems to be about 50% slower than my version:

    46.125   0.068GB        33 EmbeddedCode   (my version)
    73.469   0.068GB       107 EmbeddedCode   (GAMS linalg version)

    (The first column is the time in seconds). I have not analyzed this carefully but I think I do better because I don't have if statements in a tight loop. Note that the actual matrix inversion takes less than a second.
  • When you want to use this for practical input-output models, you probably want to save the inverse matrix in a GDX file or a restart file so it is readily available.
  • We could use an LP solver to invert a matrix [5]. That is more of a curiosity. 
  • There are good arguments to never invert a matrix [6]. However, for reasons of convenience and familiarity, it is easier to use an explicit inverse here. This is just how users work and how they are educated.

References

 
  1. Input-output model, https://en.wikipedia.org/wiki/Input%E2%80%93output_model
  2. Input-Output Tables for Japan, https://www.soumu.go.jp/english/dgpp_ss/data/io/index.htm
  3. Ronald E. Miller and Peter D. Blair, Input-Output Analysis, Foundations and Extensions, Cambridge University Press, 2nd edition, 2009.
  4. World Input-Output Database, http://www.wiod.org/database/wiots16
  5. Inverting a matrix by LP, http://yetanothermathprogrammingconsultant.blogspot.com/2021/04/inverting-matrix-by-lp.html
  6. John D. Cook, Don't invert that matrixhttps://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/

Viewing all articles
Browse latest Browse all 809

Trending Articles