format(object.size(x=i),units="Mb") # (approximate) size of the object (in Mb)
memory.size(max=FALSE) # memory amount currently in use by R (Mb)
memory.size(max=TRUE) # maximum memory amount used during R session (Mb)
getOption("ffpagesize") # see page size (in b)
options(ffpagesize=2^16) # set page size (in b)
- factor
- ordered
- custom compounds
- Date
- POSIXct
To create a one billion long ff vector which contains sequence of numbers 1:1e9, one can try:
f <- ff(initdata=1:1e9,vmode="integer") # try creating ff vector
However the above statement execution will probably fail due to memory constraints, because the initdata sequence 1,2,...1e9 is standard R operation on integers and it may not fit into memory at once. One has to create an empty one billion long ff vector of integers first and then populate it in a batch approach. The size of the batch can be set either with parameter BATCHSIZE (maximum number of elements to be processed at once) or with parameter BATCHBYTES (maximum number of bytes to be processed at once):
f <- ff(length=1e9,vmode="integer",caching="mmeachflush") # define an empty one billion long ff vector of integers, the „each flush“ caching secures that memory pages are flushed at each step
ffvecapply(f[i1:i2]<-i1:i2, X=f, ,BATCHSIZE = 1e5,VERBOSE=T) # the i1 and i2 are standard variables supplied by the ffvecapply function and mark the start and end of the records processed in each batch
class(f) # "ff_vector" "ff"
class(f[1:10]) # but if using [] result is standard R object (i.e. vector of integers)
To define a ff matrix(array) and populate it by columns or by rows:
m <- ff(length=3e8,vmode="quad",caching="mmeachflush",levels=c("A","B","C")) # define a long ff vector of quads (up to 4 values) and define levels
class(m) # "ff_vector" "ff"
dim(m)<-c(1e8,3) # define matrix 1e8x3
class(m) # "ff_matrix" "ff_array" ff"
class(m[1:10,]) # but if using [] result is standard R object (i.e. matrix of factors)
ffrowapply(m[i1:i2,,bydim=c(2,1)]<-c("A","B","C"), X=m, ,BATCHSIZE = 3e5,VERBOSE=T) # the i1 and i2 are standard variables supplied by the ffrowapply function and mark the first and last row of the batch, use bydim to set row-based ordering
To see the attributes of a ff class object:
physical(m) # shows the physical attributes of the R object
virtual(m) # shows the virtual attributes of the R object
To coerce the ff object to a ram object (if enough memory):
ramclass(m)
as.ram(m)
And to coerce ram object to an ff object:
as.ff(i)
To iterate through the ff object elements to perform some operation, the range index approach can be exploited, as the chunk.bit() function returns range indexes for each chunk. Either the for loop can be used or lapply functionality as well:
chunk.bit(f,BATCHBYTES = 1024^2) # range indexes by 1MB chunks
for (i in chunk.bit(f,BATCHBYTES = 10*1024^2)) f[i] <- rnorm(n=sum(as.bit(i))) # generate random numbers, by chunks
l=lapply(chunk(f,BATCHBYTES = 1024^2),function(i) max(f[i])) # get maximum for each chunk
print(crbind(l)) # bind results of all chunks by rows together
It is possible (and even wished) that multiple ff objects use same flat file.
To find out which flat file is attached by which R objects:
b1 <- as.bit(sample(c(FALSE,TRUE), 1000, TRUE))
class(b1) # bit
f1 <- as.ff(b1)
class(f1) # "ff_vector" "ff"
b2 <- as.bit(f1)
f2 <- as.ff(b2)
filename(f1)
filename(f2) # yet another object attached to the same flat file
f1[1]=T # set first object of first ff to TRUE
f2[1]=F # set first object of second ff to FALSE
f1[1]==f2[1] # since both use same file, the first object is equal (FALSE)
If there is need to physically copy whole ff object, use instead:
f2=clone(f1)
f1[1]=T
f2[1]=F
f1[1]==f2[1] # returns FALSE since they use different files
Similar to data.frame is the ffdf class, which is constructed as container of ff vectors:
m <- matrix(1:12, 3, 4, dimnames=list(c("r1","r2","r3"), c("m1","m2","m3","m4")))
v <- 1:3
ffm <- as.ff(m)
ffv <- as.ff(v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm))
filename(ffm)
filename(ffv)
filename(ffd)
ffd
physical(ffd)
The storage of vectors can be controlled. Either split them into more or join them into fewer.
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_join=list(newff=c(1,2)))
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_split=1)
To sort the dataframe:
d2 <- ffdfsort(d) # sort directly
i <- ffdforder(d2) # sort through order index
d[i,]
To save ff objects:
ffsave(df, add=TRUE, file="d:/tmp/y", rootpath="d:/tmp")
No comments:
Post a Comment