qs


Quick serialization of R objects

qs provides an interface for quickly saving and reading objects to and from disk. The goal of this package is to provide a lightning-fast and complete replacement for the saveRDS and readRDS functions in R.

Inspired by the fst package, qs uses a similar block-compression design built on either the lz4 or zstd compression library. It differs in that it takes a more general approach to attributes and object references.

saveRDS and readRDS are the standard for serializing R data, but these functions are not optimized for speed. fst, on the other hand, is extremely fast but only works on data.frames and certain column types.

qs is both extremely fast and general: like saveRDS, it can serialize any R object, and it is just as fast as, and sometimes faster than, fst.

Usage

library(qs)
df1 <- data.frame(x = rnorm(5e6), y = sample(5e6), z = sample(letters, 5e6, replace = TRUE))
qsave(df1, "myfile.qs")
df2 <- qread("myfile.qs")
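A quick way to confirm that a round trip preserves an object is to compare the original with the restored copy. The sketch below is illustrative only: the object `obj` and the temporary path are made up for this example, and it assumes qs is installed.

```r
library(qs)

# Small illustrative object with names, a nested list, and a data.frame
obj <- list(
  values = c(a = 1.5, b = 2.5),
  nested = list(flag = TRUE, label = "example"),
  table  = data.frame(id = 1:3, name = c("x", "y", "z"), stringsAsFactors = FALSE)
)

path <- tempfile(fileext = ".qs")  # temporary file, removed below
qsave(obj, path)
restored <- qread(path)

# identical() compares values, names, and attributes
round_trip_ok <- identical(obj, restored)
print(round_trip_ok)

unlink(path)
```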

Installation

# CRAN version
install.packages("qs")

# CRAN version, compiled from source (recommended)
remotes::install_cran("qs", type="source", configure.args="--with-simd=AVX2")

# For R versions 3.4 and earlier
remotes::install_github("traversc/[email protected]")

Features

The table below compares the features of different serialization approaches in R.

qs fst saveRDS
Not Slow
Numeric Vectors
Integer Vectors
Logical Vectors
Character Vectors
Character Encoding (vector-wide only)
Complex Vectors
Data.Frames
On disk row access
Random column access
Attributes Some
Lists / Nested Lists
Multi-threaded

qs also includes a number of advanced features, described in the sections below. For certain types of data, these features can improve performance further, by orders of magnitude.

Summary Benchmarks

The following benchmarks compare qs, fst and base R's saveRDS/readRDS for serializing and de-serializing a medium-sized data.frame with 5 million rows (approximately 115 MB in memory):

data.frame(a = rnorm(5e6),
           b = rpois(5e6, 100),
           c = sample(starnames$IAU, 5e6, replace = TRUE),
           d = sample(state.name, 5e6, replace = TRUE),
           stringsAsFactors = FALSE)

qs is highly parameterized and can be tuned to extract as much speed and compression as possible, if desired. For simplicity, qs comes with four presets that trade off speed against compression ratio: “fast”, “balanced”, “high” and “archive”.
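As a rough illustration of the presets, the sketch below saves the same object once per preset through the `preset` argument of qsave and records the resulting file sizes. The data frame here is a small stand-in invented for this example, not the benchmark data, so the sizes it prints say nothing about the benchmark results.

```r
library(qs)

set.seed(1)
# Small stand-in data set (much smaller than the benchmark data.frame)
df <- data.frame(a = rnorm(1e5),
                 b = rpois(1e5, 100),
                 d = sample(state.name, 1e5, replace = TRUE),
                 stringsAsFactors = FALSE)

presets <- c("fast", "balanced", "high", "archive")

# Save once per preset and record the file size in bytes
sizes <- vapply(presets, function(p) {
  path <- tempfile(fileext = ".qs")
  on.exit(unlink(path))  # clean up the temporary file
  qsave(df, path, preset = p)
  file.size(path)
}, numeric(1))

print(sizes)
```

Timing each call with system.time() alongside the size gives a simple view of the speed versus compression trade-off on your own data.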

The plots below summarize the performance of saveRDS, qs and fst with various parameters:

Serializing

De-serializing

(Benchmarks are based on qs ver. 0.21.2, fst ver. 0.9.0 and R 3.6.1.)

Benchmarking write and read speed is tricky: results depend heavily on factors such as the operating system, the hardware, the distribution of the data, and even the state of the R instance. Read speed is additionally affected by various hardware and software memory caches.

Generally speaking, qs and fst are considerably faster than saveRDS, whether single-threaded or multi-threaded compression is used. qs also achieves a superior compression ratio through various optimizations (e.g. see the “Byte Shuffle” section below).
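Because timings vary from run to run, one simple approach is to repeat each measurement and take the median. The sketch below does this for single-threaded and multi-threaded saves using the `nthreads` argument of qsave and qread. The helper `time_save`, the thread count of 2, the repeat count, and the data are all arbitrary choices for this illustration, and this is not the procedure behind the plots above.

```r
library(qs)

set.seed(1)
x <- data.frame(a = rnorm(1e6), b = sample(1e6))
path <- tempfile(fileext = ".qs")

# Median elapsed seconds over several repeated saves (hypothetical helper)
time_save <- function(threads, reps = 5) {
  times <- vapply(seq_len(reps), function(i) {
    system.time(qsave(x, path, nthreads = threads))[["elapsed"]]
  }, numeric(1))
  median(times)
}

t_single <- time_save(1)
t_multi  <- time_save(2)  # 2 threads chosen only for illustration

# Reading can also use multiple threads
y <- qread(path, nthreads = 2)
unlink(path)

print(c(single = t_single, multi = t_multi))
```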

ALTREP character vectors

The ALTREP system (new as of R 3.5.0) allows package developers to represent R objects using their own custom memory layout. This allows a potentially large speedup in processing certain types of data.

In qs, ALTREP character vectors are implemented via the stringfish package and can be used by setting use_alt_rep=TRUE in the qread function. The benchmark below shows the time it takes to qread several million random strings (nchar = 80) with and without ALTREP.
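A minimal sketch of how such a comparison could be set up is shown here. It uses far fewer strings than the benchmark so that it runs quickly, the helper `rand_string` is made up for this example, and it assumes R 3.5.0 or later with qs installed. Timings at this scale are noisy and should not be read as benchmark results.

```r
library(qs)

set.seed(1)
# Random 80-character strings (hypothetical helper, lowercase letters only)
rand_string <- function() paste(sample(letters, 80, replace = TRUE), collapse = "")
strings <- replicate(5e4, rand_string())

path <- tempfile(fileext = ".qs")
qsave(strings, path)

# Read the same file without and with ALTREP character vectors
t_plain  <- system.time(s1 <- qread(path))[["elapsed"]]
t_altrep <- system.time(s2 <- qread(path, use_alt_rep = TRUE))[["elapsed"]]

# Both reads should yield the same strings
same_content <- identical(s1, s2)
unlink(path)

print(c(plain = t_plain, altrep = t_altrep))
```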