Each of the three main `etl` methods must take an `etl_foo` object as its first argument, and should (invisibly) return an `etl_foo` object. These methods are pipeable and predictable, but not pure, since they by design have side effects (i.e., downloading files, etc.). Your major task in writing the `foo` package will be to write these functions. How you write them is entirely up to you, and the particular implementation will of course depend on what the purpose of `foo` is.
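To make this concrete, here is a sketch of what an `etl_extract` method for a hypothetical `foo` package might look like. The URL, file name, and use of `utils::download.file()` are illustrative assumptions, not part of any real package:

```r
#' @export
etl_extract.etl_foo <- function(obj, ...) {
  # Hypothetical source URL -- replace with the real location of foo's data
  src <- "https://www.example.com/data/foo.csv"
  # Raw files belong in the raw_dir attribute that etl() sets up
  dest <- file.path(attr(obj, "raw_dir"), basename(src))
  if (!file.exists(dest)) {
    utils::download.file(src, destfile = dest)  # the side effect
  }
  # Return the etl_foo object invisibly so the method stays pipeable
  invisible(obj)
}
```

Note the two invariants: the method has a side effect (the download), but it still takes and invisibly returns the `etl_foo` object.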
All three of the main `etl` methods should take the same set of arguments. Most commonly these define the span of time for the files that you want to extract, transform, or load. For example, in the `airlines` package, these functions take optional arguments specifying the years and months of interest.

We illustrate with `cities`, which unfortunately takes only the default arguments. (`cities` uses `etl_load.default()`, so there is no `etl_load.etl_cities` method.)
```r
etl_extract.etl_cities %>% args()
## Error in eval(lhs, parent, parent): object 'etl_extract.etl_cities' not found
etl_transform.etl_cities %>% args()
## Error in eval(lhs, parent, parent): object 'etl_transform.etl_cities' not found
etl_load.etl_cities %>% args()
## Error in eval(lhs, parent, parent): object 'etl_load.etl_cities' not found
```
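Because each method takes the same arguments and invisibly returns the `etl_foo` object, a user can chain them. Assuming a hypothetical `foo` package whose methods accept a `year` argument, a pipeline might look like:

```r
library(etl)

# "foo" and the year argument are hypothetical; substitute your package's
# actual name and arguments
foo <- etl("foo", dir = "~/dumps/foo")
foo %>%
  etl_extract(year = 2015) %>%
  etl_transform(year = 2015) %>%
  etl_load(year = 2015)
```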
There are four additional functions in the `etl` package:

- `etl_init()` - initialize the database
- `etl_cleanup()` - delete unnecessary files
- `etl_update()` - run `etl_extract()`, `etl_transform()`, and `etl_load()` in succession with the same arguments
- `etl_create()` - run `etl_init()`, `etl_update()`, and `etl_cleanup()` in succession
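Conceptually, the two convenience functions are just compositions of the other methods. This is a sketch of their behavior, not the actual package source:

```r
# Roughly what etl_update() does: extract, transform, and load
# with the same arguments
etl_update_sketch <- function(obj, ...) {
  obj %>%
    etl_extract(...) %>%
    etl_transform(...) %>%
    etl_load(...)
}

# Roughly what etl_create() does: initialize, update, and clean up
etl_create_sketch <- function(obj, ...) {
  obj %>%
    etl_init() %>%
    etl_update(...) %>%
    etl_cleanup()
}
```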
These functions can generally be used without modification, and thus are not commonly extended by `foo`.

The `etl_init()` function will initialize the SQL database. If you want to contribute your own hard-coded SQL initialization script, it must be placed in `inst/sql/`. The `etl_init()` function will look there and find files whose file extensions match the database type. For example, scripts written for MySQL should have the `.mysql` file extension, while scripts written for PostgreSQL should have the `.postgresql` file extension. If no such file exists, all of the tables and views in the database will be deleted, and new table schemas will be created on-the-fly by `dbWriteTable()`.
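The on-the-fly behavior relies on `DBI`'s ability to infer a table schema from a data frame. A minimal illustration, independent of `etl` (the table name and data are made up):

```r
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# dbWriteTable() creates the table and infers column types
# from the data frame -- no hand-written CREATE TABLE needed
dbWriteTable(con, "cities", data.frame(name = "Boston", pop = 667137L))

dbListTables(con)
dbDisconnect(con)
```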
## `etl_foo` object attributes

Every `etl_foo` object has a directory where it can store files and a `DBIConnection` where it can write to a database. By default, these come from `tempdir()` and `RSQLite::SQLite()`, but the user can alternatively specify other locations.
```r
## No database was specified so I created one for you at:
## List of 2
##  $ con  :Formal class 'SQLiteConnection' [package "RSQLite"] with 7 slots
##   .. ..@ ptr                :<externalptr>
##   .. ..@ dbname             : chr "/tmp/RtmpCrveYY/file52e2fe728f9.sqlite3"
##   .. ..@ loadable.extensions: logi TRUE
##   .. ..@ flags              : int 70
##   .. ..@ vfs                : chr ""
##   .. ..@ ref                :<environment: 0x55bf4048f040>
##   .. ..@ bigint             : chr "integer64"
##  $ disco:<environment: 0x55bf43962578>
##  - attr(*, "class")= chr [1:6] "etl_cities" "etl" "src_SQLiteConnection" "src_dbi" ...
##  - attr(*, "pkg")= chr "etl"
##  - attr(*, "dir")= chr "/tmp/RtmpCrveYY"
##  - attr(*, "raw_dir")= chr "/tmp/RtmpCrveYY/raw"
##  - attr(*, "load_dir")= chr "/tmp/RtmpCrveYY/load"
```
Note that an `etl_foo` object is also a `src_dbi` object and a `src_sql` object. Please see the `dbplyr` vignette for more information about these database connections.
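For example, a user could direct the files and the database to persistent locations instead of the temporary defaults. This sketch assumes a version of `etl` that accepts a `DBIConnection` as the `db` argument; the paths are placeholders:

```r
library(etl)
library(DBI)
library(RSQLite)

# A persistent SQLite database instead of the temporary default
db <- dbConnect(RSQLite::SQLite(), path.expand("~/data/cities.sqlite3"))

# Store raw and processed files under a directory of our choosing
cities <- etl("cities", db = db, dir = "~/data/cities")
```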