R can read data very quickly and conveniently from .Rdata files, e.g.:
> load("indus89.rdata")
> length(ls())
[1] 275
This data set contains a lot of data: 275 objects are loaded.
If we just want to inspect a few of these objects, it may be more convenient to use a lazy load DB format. Such a database consists of two files: an .RDB file with data and an .RDX file with an index indicating where each object is located in the .RDB data file. The .RDB data file is slightly larger than the corresponding .Rdata file:
This is because each object is stored and compressed individually.
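The lazy-load DB itself can be created with `tools:::makeLazyLoadDB`. Note that this function is internal and undocumented, so the sketch below is an assumption based on its current signature and may break between R versions:

```r
# Sketch: build indus89.rdb / indus89.rdx from an existing .Rdata file.
# tools:::makeLazyLoadDB(from, filebase) is internal; use with care.
e <- new.env()
load("indus89.rdata", envir = e)       # load all objects into environment e
tools:::makeLazyLoadDB(e, "indus89")   # writes indus89.rdb + indus89.rdx
```

Each object in the source environment is serialized and compressed separately, which is exactly what makes selective loading possible later.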
Loading is super fast as we don’t really load the data:
> lazyLoad("indus89")
NULL
> length(ls())
[1] 275
This only loads the index of the data, as the RStudio environment pane shows.
Now, as soon as we do anything with the data, it will be loaded. This is the “lazy” loading concept. E.g. let’s just print the head of alpha:
> head(alpha)
cq z1 value
1 basmati nwfp 6
2 basmati pmw 6
3 basmati pcw 21
4 basmati psw 21
5 basmati prw 21
6 basmati scwn 6
Now, suddenly, the RStudio environment shows that alpha has been loaded.
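Under the hood, lazyLoad binds each name to a promise that fetches the object from the .RDB file on first access. The user-level analogue of this mechanism is `delayedAssign`, sketched here with a made-up variable:

```r
# Each lazily loaded name behaves like a delayed assignment: the expression
# is evaluated only when the variable is first used.
delayedAssign("x", { cat("fetching x now\n"); 1:5 })
# Nothing has been evaluated yet; touching x triggers the "load":
sum(x)   # prints "fetching x now", then returns 15
```

This is why the first operation on a lazily loaded object pays the deserialization cost, while subsequent uses are as fast as with a normal variable.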
We can also lazy load a subset of the objects. E.g. if we want to load all symbols starting with the letter ‘a’ we can do:
> lazyLoad("indus89",filter=function(s){grepl("^a",s)})
NULL
> length(ls())
[1] 8
All symbols containing ‘water’ can be loaded as follows:
> lazyLoad("indus89",filter=function(s){grepl("water",s)})
NULL
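`lazyLoad` also accepts an `envir` argument, so a filtered subset can be attached to its own environment instead of filling the global one. A small sketch (the environment name is of course arbitrary):

```r
# Lazily attach only the 'water' symbols to a dedicated environment:
water_env <- new.env()
lazyLoad("indus89",
         envir  = water_env,
         filter = function(s) grepl("water", s))
ls(water_env)                       # the matching symbol names
get(ls(water_env)[1], water_env)    # forces loading of the first object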
This format seems an interesting alternative for storing larger data sets, as it allows more selective, lazy loading.
Update
As indicated in the comments below, the lazyLoad function is documented as being for internal use only, so using it requires a certain bravery.