dfms provides a user friendly and computationally efficient approach to estimate linear Gaussian Dynamic Factor Models in R. The package is not geared at any specific application, and can be used for dimensionality reduction, forecasting and nowcasting systems of time series. The use of the package is facilitated by a comprehensive set of methods to explore/plot models and extract results.
introduction.R
This vignette walks through the main features of the package. The data provided in the package in xts format is taken from Banbura and Modugno (2014)1, henceforth BM14, and covers the Euro Area from January 1980 through September 2009.
# Using the monthly series from BM14
dim(BM14_M)
#> [1] 357 92
range(index(BM14_M))
#> [1] "1980-01-31" "2009-09-30"
head(colnames(BM14_M))
#> [1] "ip_total" "ip_tot_cstr" "ip_tot_cstr_en" "ip_constr"
#> [5] "ip_im_goods" "ip_capital"
plot(scale(BM14_M), lwd = 1)
introduction.R
The data frame BM14_Models
provides information about
the series2, and the various models estimated by
BM14.
head(BM14_Models, 3)
#> series
#> 1 ip_total
#> 2 ip_tot_cstr
#> 3 ip_tot_cstr_en
#> label
#> 1 IP-Total Industry - Working Day and Seasonally Adjusted
#> 2 IP-Total Industry (Excluding Construction) - Working Day and Seasonally Adjusted
#> 3 IP-Total Industry Excluding Construction and MIG Energy - Working Day and Seasonally Adjusted
#> code freq log_trans small medium large
#> 1 sts.m.i5.Y.PROD.NS0010.4.000 M TRUE FALSE FALSE TRUE
#> 2 sts.m.i5.Y.PROD.NS0020.4.000 M TRUE TRUE TRUE TRUE
#> 3 sts.m.i5.Y.PROD.NS0021.4.000 M TRUE FALSE FALSE TRUE
# Using only monthly data
BM14_Models_M <- subset(BM14_Models, freq == "M")
introduction.R
Prior to estimation, all data is differenced by BM14, and some series
are log, differenced, as indicated by the log_trans
column
in BM14_Models
. In general, dfms uses a stationary
Kalman Filter with time-invariant system matrices, and therefore expects
data to be stationary. Data is also scaled and centered3 in the main
DFM()
function, thus this does not need to be done by the
user.
library(magrittr)
# log-transforming and first-differencing the data
BM14_M[, BM14_Models_M$log_trans] %<>% log()
BM14_M_diff = diff(BM14_M)
plot(scale(BM14_M_diff), lwd = 1)
introduction.R
Before estimating a model, the ICr()
function can be
applied to determine the number of factors. It computes 3 information
criteria proposed in Bai and NG (2002)4, whereby the second
criteria generally suggests the most parsimonious model.
ic = ICr(BM14_M_diff)
#> Missing values detected: imputing data with tsnarmimp() with default settings
print(ic)
#> Optimal Number of Factors (r) from Bai and Ng (2002) Criteria
#>
#> IC1 IC2 IC3
#> 7 7 13
plot(ic)
introduction.R
Another option is to use a Screeplot to gauge the number of factors
by looking for a kink in the plot. A mathematical procedure for finding
the kink was suggested by Onatski (2010)5, but this is currently
not implemented in ICr()
.
introduction.R
Based on both the information criteria and the Screeplot, I gauge
that a model with 4 factors should be estimated, as factors, 5, 6 and 7
do not add much to the explanatory power of the model. Next to the
number of factors, the lag order of the factor-VAR of the transition
equation should be estimated (the default is 1 lag). This can be done
using the VARselect()
function from the vars
package, with PCA factor estimates reported by ICr()
.
# Using vars::VARselect() with 4 principal components to estimate the VAR lag order
vars::VARselect(ic$F_pca[, 1:4])
#> $selection
#> AIC(n) HQ(n) SC(n) FPE(n)
#> 6 3 3 6
#>
#> $criteria
#> 1 2 3 4 5 6
#> AIC(n) 5.810223 5.617282 5.427760 5.389413 5.407765 5.381829
#> HQ(n) 5.898758 5.776646 5.657953 5.690434 5.779614 5.824507
#> SC(n) 6.032560 6.017490 6.005838 6.145361 6.341582 6.493517
#> FPE(n) 333.696100 275.153456 227.671078 219.144228 223.265640 217.639133
#> 7 8 9 10
#> AIC(n) 5.409877 5.394900 5.421375 5.460761
#> HQ(n) 5.923383 5.979235 6.076538 6.186753
#> SC(n) 6.699434 6.862328 7.066673 7.283929
#> FPE(n) 223.956824 220.793226 226.933863 236.331677
introduction.R
The selection thus suggests we should estimate a factor model with
r = 4
factors and p = 3
lags6. Before estimating the
model I note that dfms does not deal with seasonality in
series, thus it is recommended to also seasonally adjust data,
e.g. using the seasonal package before estimation. BM14 only
use seasonally adjusted series, thus this is not necessary with the
example data provided.
Estimation can then simply be done using the DFM()
function with parameters r
and p
7.
# Estimating the model with 4 factors and 3 lags using BM14's EM algorithm
model1 = DFM(BM14_M_diff, r = 4, p = 3)
#> Converged after 26 iterations.
print(model1)
#> Dynamic Factor Model: n = 92, T = 356, r = 4, p = 3, %NA = 25.8366
#>
#> Factor Transition Matrix [A]
#> L1.f1 L1.f2 L1.f3 L1.f4 L2.f1 L2.f2 L2.f3 L2.f4 L3.f1
#> f1 0.4720 -0.1297 0.8460 0.2098 -0.0733 -0.1436 -0.0595 0.1565 0.2356
#> f2 -0.1612 0.1699 0.2389 0.1598 0.0641 -0.1341 -0.0542 0.1287 0.1336
#> f3 0.3965 0.3264 0.0213 -0.3033 -0.1542 -0.0467 -0.1484 -0.0150 -0.1172
#> f4 0.1096 0.1601 -0.1578 0.2485 -0.0365 -0.0563 -0.0230 -0.1117 -0.0719
#> L3.f2 L3.f3 L3.f4
#> f1 -0.0803 -0.0386 0.0408
#> f2 0.1347 -0.0024 -0.0342
#> f3 -0.0087 0.1767 0.0249
#> f4 0.0307 0.0662 -0.0035
plot(model1)
introduction.R
The model can be investigated using summary()
, which
returns an object of class ‘dfm_summary’ containing the system matrices
and summary statistics of the factors and the residuals in the
measurement equation, as well as the R-Squared of the factor model for
individual series. The print method automatically adjusts the amount of
information printed to the data size. For large databases with more than
40 series, no series-level statistics are printed.
dfm_summary <- summary(model1)
print(dfm_summary) # Large model with > 40 series: defaults to compact = 2
#> Dynamic Factor Model: n = 92, T = 356, r = 4, p = 3, %NA = 25.8366
#>
#> Call: DFM(X = BM14_M_diff, r = 4, p = 3)
#>
#> Summary Statistics of Factors [F]
#> N Mean Median SD Min Max
#> f1 356 -0.0448 0.3455 4.4505 -21.9265 11.0306
#> f2 356 -0.0319 -0.0892 2.68 -9.9549 7.4988
#> f3 356 -0.1032 -0.0593 3.2891 -12.0969 16.2455
#> f4 356 -0.0118 0.089 2.161 -8.2883 10.7219
#>
#> Factor Transition Matrix [A]
#> L1.f1 L1.f2 L1.f3 L1.f4 L2.f1 L2.f2 L2.f3 L2.f4 L3.f1
#> f1 0.4720 -0.1297 0.84605 0.2098 -0.07334 -0.14356 -0.05950 0.15645 0.2356
#> f2 -0.1612 0.1699 0.23889 0.1598 0.06406 -0.13413 -0.05415 0.12869 0.1336
#> f3 0.3965 0.3264 0.02128 -0.3033 -0.15424 -0.04669 -0.14839 -0.01495 -0.1172
#> f4 0.1096 0.1601 -0.15776 0.2485 -0.03655 -0.05626 -0.02304 -0.11169 -0.0719
#> L3.f2 L3.f3 L3.f4
#> f1 -0.080320 -0.038592 0.040812
#> f2 0.134692 -0.002391 -0.034215
#> f3 -0.008694 0.176663 0.024876
#> f4 0.030716 0.066201 -0.003465
#>
#> Factor Covariance Matrix [cov(F)]
#> f1 f2 f3 f4
#> f1 19.8067 2.0846* -3.4700* -2.1094*
#> f2 2.0846* 7.1822 -2.8725* -1.0631*
#> f3 -3.4700* -2.8725* 10.8182 1.9286*
#> f4 -2.1094* -1.0631* 1.9286* 4.6701
#>
#> Factor Transition Error Covariance Matrix [Q]
#> u1 u2 u3 u4
#> u1 9.0178 0.3303 -3.0764 -1.0182
#> u2 0.3303 5.4425 -1.3095 -0.5051
#> u3 -3.0764 -1.3095 7.0230 0.8639
#> u4 -1.0182 -0.5051 0.8639 3.8005
#>
#> Summary of Residual AR(1) Serial Correlations
#> N Mean Median SD Min Max
#> 92 -0.0409 -0.0782 0.2959 -0.5073 0.6858
#>
#> Summary of Individual R-Squared's
#> N Mean Median SD Min Max
#> 92 0.3712 0.299 0.2888 0.0067 0.9978
# Can request more detailed printouts
# print(dfm_summary, compact = 1)
# print(dfm_summary, compact = 0)
introduction.R
Apart from the model summary, the dfm methods
residuals()
and fitted()
return observation
residuals and fitted values from the model. The default format is a
plain matrix, but the functions also have an argument to return data in
the original (input) format.