Channel: Yet Another Math Programming Consultant

R: lazy load DB files


R can read data quickly and conveniently from .Rdata files. For example:

> load("indus89.rdata")
> length(ls())
[1] 275

This data set is large: 275 objects are loaded:

image

If we just want to inspect a few of these objects, it may be more convenient to use a lazy load DB format. Such a database consists of two files: an .RDB file with data and an .RDX file with an index indicating where each object is located in the .RDB data file. The .RDB data file is slightly larger than the corresponding .Rdata file:

image

This is because each object is stored and compressed individually.
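To make the two-file layout concrete, here is a minimal sketch that builds a tiny lazy load DB and compares file sizes with an ordinary save(). It assumes the unexported helper tools:::makeLazyLoadDB, which, like lazyLoad, is internal to R and may change between versions. The object names and the "demo" file base are hypothetical and are not the indus89 data.

```r
# Sketch only: tools:::makeLazyLoadDB is an internal, unexported function.
# Object names and values below are made up for illustration.
src <- new.env()
src$alpha <- data.frame(cq = c("basmati", "basmati"),
                        z1 = c("nwfp", "pmw"),
                        value = c(6, 6))
src$water <- seq(0, 1, length.out = 1000)
src$area  <- 1:10

base <- file.path(tempdir(), "demo")

# Writes demo.rdb (data) and demo.rdx (index); each object is
# serialized and compressed on its own.
tools:::makeLazyLoadDB(src, base)

# The same objects in a single .rdata file, for comparison.
save(list = ls(src), envir = src, file = paste0(base, ".rdata"))

files <- paste0(base, c(".rdata", ".rdb", ".rdx"))
sizes <- file.size(files)
names(sizes) <- basename(files)
print(sizes)
```

With only a handful of small objects the size difference says little; the comparison becomes more meaningful with hundreds of objects.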

Loading is super fast as we don’t really load the data:

> lazyLoad("indus89")
NULL
> length(ls())
[1] 275

This will only load the index of the data. The RStudio environment shows:

image

Now, as soon as we do anything with an object, R loads its data. This is the “lazy” loading concept. E.g. let's just print alpha:

> head(alpha)
       cq   z1 value
1 basmati nwfp     6
2 basmati  pmw     6
3 basmati  pcw    21
4 basmati  psw    21
5 basmati  prw    21
6 basmati scwn     6

Now suddenly the RStudio environment shows:

image
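The mechanism behind this behavior is R's promise: a name is bound to an expression that is evaluated only on first use. The sketch below illustrates the idea with the documented function delayedAssign. It is not what lazyLoad does internally line by line, and the alpha value here is a made-up stand-in.

```r
# Illustration of lazy evaluation with a promise (not the lazyLoad internals).
state <- new.env()
assign("evaluated", FALSE, envir = state)

# Bind 'alpha' to an expression; nothing is evaluated yet.
delayedAssign("alpha", {
  assign("evaluated", TRUE, envir = state)   # side effect marks the first use
  data.frame(cq = "basmati", z1 = "nwfp", value = 6)
})

name_exists <- "alpha" %in% ls()             # TRUE: the name is already visible
before <- get("evaluated", envir = state)    # FALSE: the value is not built yet

invisible(head(alpha))                       # first use forces the promise

after <- get("evaluated", envir = state)     # TRUE: the value now exists
print(c(name_exists = name_exists, before = before, after = after))
```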

We can also lazy load a subset of the objects. E.g. if we want to load all symbols starting with the letter ‘a’ we can do:

> lazyLoad("indus89",filter=function(s){grepl("^a",s)})
NULL
> length(ls())
[1] 8

image

All symbols containing ‘water’ can be loaded as follows:

> lazyLoad("indus89",filter=function(s){grepl("water",s)})
NULL

image
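The filter function receives the vector of symbol names and returns a logical vector saying which ones to keep. The sketch below also uses lazyLoad's envir argument to put the selected symbols in their own environment rather than the workspace. The small database is built with the internal tools:::makeLazyLoadDB, and all object names are hypothetical.

```r
# Sketch only: builds a throwaway DB with made-up objects, then loads a subset.
src <- new.env()
src$alpha       <- 1:3
src$area        <- 4:6
src$water_use   <- 7:9
src$canal_water <- 10:12

base <- file.path(tempdir(), "filterdemo")
tools:::makeLazyLoadDB(src, base)    # internal, unexported helper

# Load only symbols containing "water" into a separate environment.
# fixed = TRUE treats the pattern as a literal string, not a regular expression.
db <- new.env()
lazyLoad(base, envir = db,
         filter = function(s) grepl("water", s, fixed = TRUE))

loaded <- sort(ls(db))
print(loaded)                        # "canal_water" "water_use"
```

Keeping the loaded symbols in a dedicated environment avoids cluttering the global workspace and makes it easy to discard them again.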

This format seems an interesting alternative for storing larger data sets, as it allows more selective and lazy loading.

Update

As indicated in the comments below, the lazyLoad function is described as being for internal use, so using it may require a certain bravery.
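For those who would rather avoid internal functions, a similar effect can be approximated with documented functions only. The sketch below stores one .rds file per object and binds each name to a promise that reads its file on first use. This is an assumption-laden alternative, not the lazy load DB format itself: the helper make_lazy, the directory name and the objects are all hypothetical.

```r
# Sketch of an alternative using only saveRDS, readRDS and delayedAssign.
store <- file.path(tempdir(), "rds_store")
dir.create(store, showWarnings = FALSE)

# Made-up objects, one file each.
objs <- list(alpha = 1:3, water = c(2.5, 3.5))
for (nm in names(objs)) {
  saveRDS(objs[[nm]], file.path(store, paste0(nm, ".rds")))
}

# Hypothetical helper: bind 'name' in 'env' to a promise that reads 'path'.
make_lazy <- function(name, path, env) {
  force(path)  # fix the path now; only the file read is delayed
  delayedAssign(name, readRDS(path),
                eval.env = environment(), assign.env = env)
}

lazy <- new.env()
for (f in list.files(store, pattern = "\\.rds$", full.names = TRUE)) {
  make_lazy(sub("\\.rds$", "", basename(f)), f, lazy)
}

print(ls(lazy))    # names are visible without reading any file
print(lazy$water)  # the first access reads water.rds
```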

