xlsxpackage provides tools neccessary to interact with Excel 2007 files from R. The user can read and write xlsx files, and can control the appearance of the spreadsheet by setting data formats, fonts, colors, borders. Set the print area, the zoom control, create split and freeze panels, adding headers and footers. The package uses a java library from the Apache POI project.
xlsx makes possible to interact with Excel 2007 files from R
r Cite(bib,'R-base'). While a power R user usually does not need to use Excel or even avoids it altogether, there are cases when being able to generate Excel output or read Excel files into R is a must. One such case is in an office environment when you need to collaborate with co-workers who use Excel as their primary tool. Another case is to use Excel for basic reporting of results. For moderate data sets, the formatting capabilities of Excel can prove useful. A flexible way to manipulate Excel 2007 xlsx files from R would then be a nice addition.
xlsx package focuses on Excel 2007 because for Excel 97 there are already several solutions,
readxl``r Cite(bib,'readxl'), etc. While some of these packages work with Excel 2007 files too, the contribution of the
xlsx package is different.
The xlsx Excel 2007 file format is essentially a zipped set of xml files. It is possible to interact directly with these files from R as shown by the package
RExcelXML. All the functionality of the
xlsx package can be replicated with
RExcelXML package or by extending it. Working directly with the zipped xml files, and using the xml schema to extract the useful information into R gives you ultimate control.
The approach taken in the
xlsx package is to use a proven, existing API between Java and Excel 2007 and use the
rJava  package to link Java and R. The advantage of this approach is that the code on the R side is very compact, easy to maintain, and easy to extend even for people with little Java experience. All the heavy lifting of parsing XML schemas is being done in Java. We also benefit from a mature software project with many developers, test suites, and users that report issues on the Java side. In principle, this should make the maintainance of the
xlsx package easy. The Java API used by the
xlsx is one project of the Apache Software Foundation, called Apache POI and can be found at http://poi.apache.org/.
The Apache POI Project’s mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft’s OLE 2 Compound Document format (OLE2). These include Excel, Word, and PowerPoint documents. While the focus of the
xlsx package has been only on Excel files, extensions for Word and PowerPoint documents are available in the ReporteRs package.
That said, sometimes using Java from R can be a bit tricky to configure and work with. While using Apache POI allows us to benefit from a robust Java community, others might prefer to use a C++ API and
Rcpp to interact with the raw XML in a way more natural to R. For more reading on this approach, see the openxlsx package or officer (for Word / PowerPoint).
read.xlsx for reading the sheet of an xlsx file into a data.frame. See
write.xlsx writing a data.frame to an xlsx file.
Workbook for creating workbooks. See
Worksheet for code to manipulate worksheets. See
Cell for manipulating cells.
OtherEffects for various spreadsheet effects, for example, merge cells, auto size columns, create freeze panels, create split panels, set print area, set the zoom, etc.
PrintSetup for customizing the settings for printing.
CellStyle for how to format a particular cell.
Font to set a font.
By adding a lightweight R layer on top of the Apache project Java interface to Excel 2007 documents, we achieve a multi-platform solution for interacting with Excel 2007 file formats from R.