library("devtools")
install_github("fasrc/CausalGPS", ref="master")
library("CausalGPS")
Y A vector of observed outcome
w A vector of observed continuous exposure
c A data.frame or matrix of observed covariates
ci_appr The causal inference approach.
Possible values are:
- “matching”: Matching by GPS
- “weighting”: Weighting by GPS
pred_model A prediction model (use “sl” for SuperLearner)
gps_model Model type which is used for
estimating GPS value, including parametric (default) and non-parametric.
use_cov_transform If TRUE, the function
uses transformers to meet the covariate balance.
transformers A list of transformers. Each
transformer should be a unary function. You can pass the name of a customized
function in quotes.
- pow2: to the power of 2
- pow3: to the power of 3
bin_seq Sequence of w (treatment) to
generate pseudo population. If NULL is passed the default value will be
used, which is
trim_quantiles A numerical vector of length two.
Represents the trim quantile levels. Both numbers should be in the range
[0,1] and in increasing order (default: c(0.01,0.99)).
optimized_compile If TRUE, uses counts to
keep track of the number of replicates in the pseudo population.
params A list of parameters that are
used internally. Unrelated parameters will be ignored.
nthread An integer value that represents
the number of threads to be used by internal packages.
... Additional arguments passed to
set.seed(422)
n <- 10000
mydata <- generate_syn_data(sample_size=n)
year <- sample(x=c("2001","2002","2003","2004","2005"),size = n, replace = TRUE)
region <- sample(x=c("North", "South", "East", "West"),size = n, replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)

pseudo_pop <- generate_pseudo_pop(mydata$Y,
                                  mydata$treat,
                                  mydata[c("cf1","cf2","cf3","cf4","cf5","cf6","year","region")],
                                  ci_appr = "matching",
                                  pred_model = "sl",
                                  gps_model = "non-parametric",
                                  use_cov_transform = TRUE,
                                  transformers = list("pow2", "pow3", "abs", "scale"),
                                  trim_quantiles = c(0.01,0.99),
                                  optimized_compile = TRUE,
                                  sl_lib = c("m_xgboost"),
                                  covar_bl_method = "absolute",
                                  covar_bl_trs = 0.1,
                                  covar_bl_trs_type = "mean",
                                  max_attempt = 4,
                                  matching_fun = "matching_l1",
                                  delta_n = 1,
                                  scale = 0.5,
                                  nthread = 1)

plot(pseudo_pop)
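The transformers and trim_quantiles arguments in the call above can be illustrated with a small base R sketch. It does not call CausalGPS. The pow2 and pow3 definitions, the stand-in vectors w and cf1, and the choice to trim on the exposure values are all assumptions made for illustration; the package's internal behavior may differ.

```r
# Hypothetical unary transformers. The names mirror the "pow2" and "pow3"
# entries in the argument list, but these definitions are assumptions.
pow2 <- function(x) x^2
pow3 <- function(x) x^3

set.seed(1)
w   <- rnorm(1000, mean = 10, sd = 3)  # stand-in continuous exposure
cf1 <- rnorm(1000)                     # stand-in covariate

# Apply transformers that were passed by name, as in
# transformers = list("pow2", "pow3").
transformers <- list("pow2", "pow3")
transformed <- lapply(transformers, function(name) {
  get(name, mode = "function")(cf1)
})
names(transformed) <- unlist(transformers)

# Keep only observations whose exposure lies inside the requested
# quantile range (assumed here to be applied to w).
trim_quantiles <- c(0.01, 0.99)
bounds <- quantile(w, probs = trim_quantiles)
keep <- w >= bounds[1] & w <= bounds[2]
w_trimmed <- w[keep]
```

Passing transformers by name keeps the argument list serializable, and a customized transformer only needs to be a function of one vector that is visible where the name is looked up.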
matching_l1 is the Manhattan distance
matching approach. For the prediction model we use the SuperLearner
package. Users need to pass "sl" as
pred_model to use the SuperLearner package.
SuperLearner supports different machine learning methods and packages.
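To make the Manhattan distance idea behind matching_l1 concrete, here is a minimal base R sketch. The helper match_l1 is hypothetical and is not the package's implementation: it assumes raw (unstandardized) values, a caliper of delta_n / 2 around the requested exposure level, and a scale weight that balances the GPS and exposure terms. It also shows how counts, rather than duplicated rows, can record repeated matches.

```r
# Hypothetical helper, NOT the package's matching_l1 implementation.
match_l1 <- function(target_w, target_gps, obs_w, obs_gps,
                     delta_n = 1, scale = 0.5) {
  # Caliper (assumed): only units whose exposure is within delta_n / 2.
  candidates <- which(abs(obs_w - target_w) <= delta_n / 2)
  if (length(candidates) == 0) return(NA_integer_)
  # Weighted Manhattan (L1) distance over the (GPS, exposure) pair.
  dist <- scale * abs(obs_gps[candidates] - target_gps) +
    (1 - scale) * abs(obs_w[candidates] - target_w)
  candidates[which.min(dist)]
}

# Four observed units with made-up exposure and GPS values.
obs_w   <- c(1.0, 1.2, 1.4, 3.0)
obs_gps <- c(0.10, 0.30, 0.32, 0.31)

# Three requested (exposure, GPS) pairs; matching is with replacement.
matched <- c(
  match_l1(1.30, 0.30, obs_w, obs_gps),
  match_l1(1.25, 0.29, obs_w, obs_gps),
  match_l1(1.00, 0.12, obs_w, obs_gps)
)

# Store how often each observed unit was selected instead of
# replicating its row in the pseudo population.
counter <- tabulate(matched, nbins = length(obs_w))
```

In this toy data the second unit is selected twice, so its count is 2 while unmatched units keep a count of 0.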
params is a list of hyperparameters that
users can pass to the third-party libraries in the SuperLearner package.
All hyperparameters go into the params list. Prefixes are used to
distinguish parameters for different libraries. The following table
shows the external package names, their equivalent names that should be
used in sl_lib, the prefixes that should
be used for their hyperparameters in the
params list, and the available
hyperparameters.
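| Package name | sl_lib name | Prefix | Available hyperparameters |
|---|---|---|---|
| XGBoost | m_xgboost | xgb_ | nrounds, eta, max_depth, min_child_weight |
| | | | num.trees, write.forest, replace, verbose, family |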
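The prefix convention can be sketched in base R as follows. The helper select_by_prefix is hypothetical and only illustrates how a single params list could be split per library; the xgb_ prefix is taken from the xgb_max_depth entry used in the estimate_gps call, and the entry named unrelated is made up to show that non-matching entries are dropped.

```r
# One flat list holding hyperparameters for several libraries.
params <- list(
  xgb_max_depth = c(3, 4, 5),
  xgb_eta       = c(0.1, 0.2),
  unrelated     = 1              # made-up entry with no known prefix
)

# Hypothetical helper: keep the entries that carry `prefix`,
# then strip the prefix so the names match the library's own arguments.
select_by_prefix <- function(params, prefix) {
  hits <- params[startsWith(names(params), prefix)]
  names(hits) <- substring(names(hits), nchar(prefix) + 1)
  hits
}

xgb_params <- select_by_prefix(params, "xgb_")
```

With this layout, a misspelled name does not raise an error; it is simply not picked up, so it is worth checking hyperparameter names against the table.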
nthread is the number of available
threads (cores). XGBoost needs OpenMP installed on the system to
parallelize the processing.
data_with_gps <- estimate_gps(Y,
                              w,
                              c,
                              pred_model = "sl",
                              internal_use = FALSE,
                              params = list(xgb_max_depth = c(3,4,5),
                                            xgb_nrounds = c(10,20,30,40)),
                              nthread = 1,
                              sl_lib = c("m_xgboost")
                              )
If internal_use is set to TRUE, the
program returns additional vectors to be used by the selected causal
inference approach to generate a pseudo population. See
?estimate_gps for more details.
estimate_npmetric_erf <- function(matched_Y,
                                  matched_w,
                                  matched_counter = NULL,
                                  bw_seq = seq(0.2,2,0.2),
                                  w_vals,
                                  nthread)
syn_data <- generate_syn_data(sample_size=1000,
                              outcome_sd = 10,
                              gps_spec = 1,
                              cova_spec = 1)
The CausalGPS package logs internal activities to the
CausalGPS.log file. The file is located in the source file
location, and new entries are appended to it. Users can change the logging file name
(and path) and the logging threshold. The logging mechanism has different
thresholds (see the logger package).
The two most important thresholds are INFO and DEBUG levels. The former,
which is the default level, logs more general information about the
process. The latter, if activated, logs more detailed information that
can be used for debugging purposes.
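The difference between the INFO and DEBUG thresholds can be seen with the logger package on its own. This is a sketch that assumes logger is installed and uses a temporary file as a hypothetical log location; it configures logger directly and does not show CausalGPS's own logging configuration.

```r
if (requireNamespace("logger", quietly = TRUE)) {
  log_path <- tempfile(fileext = ".log")  # hypothetical log file location
  logger::log_appender(logger::appender_file(log_path))

  # At the INFO threshold, DEBUG messages are filtered out.
  logger::log_threshold(logger::INFO)
  logger::log_info("general progress message")
  logger::log_debug("detailed diagnostic message")

  # Lowering the threshold to DEBUG lets the detailed messages through.
  logger::log_threshold(logger::DEBUG)
  logger::log_debug("detailed diagnostic message")

  log_lines <- readLines(log_path)
}
```

Because the file appender adds to the existing file, repeated runs against a fixed path accumulate entries, which matches the append behavior described above.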