CRAN Task View: Official Statistics & Survey Statistics

Maintainer: Matthias Templ, Alexander Kowarik, Tobias Schoch
Contact: matthias.templ at gmail.com
Version: 2022-02-10
URL: https://CRAN.R-project.org/view=OfficialStatistics
Source: https://github.com/cran-task-views/OfficialStatistics/
Contributions: Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.
Citation: Matthias Templ, Alexander Kowarik, Tobias Schoch (2022). CRAN Task View: Official Statistics & Survey Statistics. Version 2022-02-10. URL https://CRAN.R-project.org/view=OfficialStatistics.
Installation: The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("OfficialStatistics", coreOnly = TRUE) installs all the core packages or ctv::update.views("OfficialStatistics") installs all packages that are not yet installed and up-to-date. See the CRAN Task View Initiative for more details.

This CRAN Task View contains a list of packages with methods typically used in official statistics and survey statistics. Many packages provide functions for more than one of the topics listed below. Therefore, this list is not a strict categorization and packages may be listed more than once.

The task view is split into several parts:

First Part: Production of Official Statistics

1 Preparations/ Management/ Planning (questionnaire design, etc.)

2 Sampling

3 Data Collection (incl. record linkage)

3.1 Data Integration (Statistical Matching and Record Linkage)

3.2 Web Scraping

Web scraping is now used more frequently in the production of official statistics. For example, in price statistics, product prices that were formerly collected by hand over the web or through in-person visits to stores are now collected by scraping specific websites. Tools for this process step are not listed here, but a detailed overview can be found in the CRAN Task View WebTechnologies.

4 Data Processing

4.1 Weighting and Calibration

4.2 Editing (including outlier detection)

4.3 Imputation

A general overview of imputation methods can be found in the CRAN Task View on Missing Data, MissingData. However, most of the methods presented there do not take into account the specificities of surveys with complex designs; that is, they are not specifically designed for official statistics and surveys. For example, the criteria for applying a method often depend on the scale of the data, which in official statistics are usually a mixture of continuous, semi-continuous, binary, categorical, and count variables. In addition, measurement error can greatly affect non-robust imputation methods.

Packages commonly used within statistical agencies are VIM and simputation, which provide fast k-nearest neighbor (kNN) algorithms for general distances and (robust) EM-based multiple imputation algorithms.
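As an illustration of k-nearest neighbor imputation, the following is a minimal sketch using the kNN() function from VIM on the sleep data set that ships with the package. The choice of k = 5 and of this example data set are assumptions made for illustration only. The sketch ignores sampling weights and the survey design, so it should not be read as a recommended workflow for complex survey data.

```r
# Minimal sketch: k-nearest neighbor imputation with VIM.
# Assumes the VIM package is installed; `sleep` is its bundled example data.
library(VIM)

data(sleep, package = "VIM")

# Count missing values per variable before imputation.
missing_before <- colSums(is.na(sleep))

# Impute all variables that contain missing values using k = 5 neighbors.
# imp_var = FALSE suppresses the additional logical "_imp" indicator columns,
# so the result has the same columns as the input.
imputed <- kNN(sleep, k = 5, imp_var = FALSE)

# Count missing values per variable after imputation.
missing_after <- colSums(is.na(imputed))

print(rbind(before = missing_before, after = missing_after))
```

Leaving imp_var at its default instead keeps indicator columns that flag which values were imputed, which can be useful when imputed values must be tracked through later processing steps.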

4.4 Seasonal Adjustment

Seasonal adjustment is an important step in producing official statistics, and only a very limited set of methodologies is in frequent use, e.g. X13-ARIMA-SEATS developed by the US Census Bureau. R packages for this can be found in the seasonal adjustment section of the CRAN Task View TimeSeries.

5 Analysis of Survey Data

5.1 Estimation and Variance Estimation

5.2 Visualization

6 Statistical Disclosure Control

Data from statistical agencies and other institutions are mostly confidential in their raw form, and data providers have to ensure confidentiality by modifying the original data so that no statistical units can be re-identified while keeping the information loss to a minimum.

Unit-level data (microdata)

Aggregated information (tabular data)

Remote access

Second Part: Access to Official Statistics

Access to data from international organizations and multiple organizations

Access to data from national organizations

Third Part: Related Methods

Small Area Estimation

Microsimulation

Indices, Indicators, Tables and Visualization of Indicators

Miscellaneous

CRAN packages

Core: errorlocate, sae, sampling, SamplingStrata, sdcMicro, sdcTable, simPop, survey, validate, validatetools, VIM.
Regular: acs, anesrake, BalancedSampling, BayesSAE, BIFIEsurvey, blaise, CalibrateSSB, cancensus, CANSIM2R, cbsodataR, cdlTools, censusapi, censusGeography, collapse, convey, deducorrect, deductive, DHS.rates, diffpriv, DSI, easySdcTable, editrules, EdSurvey, emdi, eurostat, extremevalues, FAOSTAT, fastLink, FFD, Frames2, fuzzyjoin, gustave, hbsae, icarus, idbr, inca, inegiR, ineq, insee, iotables, ipumsr, JoSAE, laeken, lavaan, lavaan.survey, longCatEDA, MatchIt, MatchThem, MBHdesign, memisc, micEconIndex, MicSim, mind, mipfp, nlme, nomisr, OECD, panelaggregation, pps, PracTools, prevR, pxweb, PxWebApiData, quantification, questionr, R2BEAT, rdbnomics, rdhs, readabs, readsdmx, reclin, RecordLinkage, regions, Rilostat, rjstat, robsurvey, rpms, RRTCS, rsae, rsdmx, rspa, rtrim, rworldmap, saeSim, SAEval, samplingbook, samplingVarEst, SDaA, sdcHierarchies, sdcSpatial, simputation, SimSurvey, SmallCountRounding, sms, sorvi, spsurvey, srvyr, statcanR, statcodelists, StatMatch, stratification, stringdist, surveydata, surveyoutliers, surveyplanning, svrep, synthpop, tidyBdE, tidycensus, tidyqwi, tmap, treemap, univOutl, vardpoor, weights, XBRL.
Archived: RRreg, surveybootstrap, surveysd.

Other resources