The following aggregation code in GAMS aggregates PRODUCTION and AREA HARVESTED from a relatively large data set from FAO (2.2 million records). Afterwards we recalculate YIELDS, as aggregating yields can not be done in the same way as PRODUCTION and AREA (this also famously occurs when aggregating prices: first do volume and value and then recalculate prices).
|
See also: http://yetanothermathprogrammingconsultant.blogspot.com/2013/05/large-scale-aggregation-example.html.
Of course we can also do this in SQL:
--- --- Aggregation --- useaggregation;
--- --- if target table exists, drop it --- IFEXISTS(SELECT*FROMsys.objects WHEREobject_id=OBJECT_ID('[dbo].[AggregatedData]')ANDtype='U') DROPTABLEAggregatedData;
--- --- Step 1: --- Aggregate Area and Production ---
SELECTC.CropGroup,R.Region,A.Element,A.[Year],Sum(A.Value)AS[Value] INTOAggregatedData FROMProduction_Crops_E_All_DataASA, cropmapASC, regionmapASR WHERE A.ElementIn('Area Harvested','Production Quantity')And A.[Year]>='1990'AndA.[Year]<='2000'And A.Country=R.CountryAnd A.Item=C.Crop GROUPBYC.CropGroup,R.Region,A.Element,A.[Year];
--- --- Step 2: --- Recalculate Yield --- INSERTINTOAggregatedData(CropGroup,Region,Element,[Year],[Value]) SELECTA.CropGroup,A.Region,'Yield',A.[Year], 1.0e5*A.[Value]/B.[Value] FROMAggregatedDataASA, AggregatedDataASB WHERE A.CropGroup=B.CropGroupAND A.Region=B.RegionAND A.[Year]=B.[Year]AND A.Element='Production Quantity'AND B.Element='Area Harvested'AND B.Value> 0;
|
Interesting to see how well SQL server is doing compared to GAMS. Here are timings in seconds:
| Step1 | Step2 | |
| GAMS | 1.4 | 0.08 |
| MS Access | 5.4 | 0.04 |
| SQL Server | 1.4 | 0.06 |