litteR is a modular tool for analyzing litter data (e.g., beach litter). The current version (0.7.0) contains the following modules: descriptive statistics, trend analysis, baseline analysis, and power analysis.
One can optionally switch modules on or off. These modules run independently from each other.
This user guide consists of two parts. The first part describes the user interface; the second part gives more details on the modules.
For applications with litteR see Schulz et al. (2019).
The litteR package should be loaded in R before you can use it. This can be done by running library(litteR) in the R console or the RStudio console.
The easiest way to start working with litteR is to create an empty project directory. This directory can be filled with example and reference files by running:
The argument of function create_litter_project (i.e., the quoted part in parentheses) is an existing work directory on your computer. This can be any valid directory name with sufficient user privileges. Note for MS-Windows users: R expects forward slashes (or doubled backslashes) in directory names.
It is also possible to run create_litter_project() without an argument. In that case, a simple graphical user interface pops up for interactive directory selection.
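A minimal sketch of these steps, assuming the litteR package is installed. The directory name is a hypothetical placeholder (a temporary directory is used so that the example runs anywhere), and the call to create_litter_project is skipped when litteR is not available.

```r
# Hypothetical project directory; replace it with an existing directory of your own.
project_dir <- file.path(tempdir(), "litter-project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# Convert backslashes to forward slashes (only relevant on MS-Windows).
project_dir <- normalizePath(project_dir, winslash = "/")

# Fill the directory with example and reference files (requires litteR).
if (requireNamespace("litteR", quietly = TRUE)) {
  library(litteR)
  create_litter_project(project_dir)
}
```

Passing the directory explicitly keeps the setup scriptable; calling create_litter_project() without an argument opens the interactive dialogue instead.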
litteR can be started by typing litter() in the RStudio console (see the figure below). After starting litter(), a simple graphical user interface pops up for file selection. An example of a file selection dialogue is given below.
The current version of litteR supports three data formats: the wide format, the long format, and the OSPAR format.
These formats will be briefly described below.
The wide format is litteR’s native and recommended format. It is comparable to the OSPAR format given below, but less restrictive. The following columns are required: “region_name”, “country_code”, “country_name”, “location_name”, and “date”. The columns are separated by commas (CSV file).
The image below gives an example of the wide format.
# A tibble: 10 x 8
   region_name               country_code country_name location_code
   <chr>                     <chr>        <chr>        <chr>
 1 North East Atlantic Ocean NL           Netherlands  NL001
 2 North East Atlantic Ocean NL           Netherlands  NL001
 3 North East Atlantic Ocean NL           Netherlands  NL001
 4 North East Atlantic Ocean NL           Netherlands  NL001
 5 North East Atlantic Ocean NL           Netherlands  NL001
 6 North East Atlantic Ocean NL           Netherlands  NL001
 7 North East Atlantic Ocean NL           Netherlands  NL001
 8 North East Atlantic Ocean NL           Netherlands  NL001
 9 North East Atlantic Ocean NL           Netherlands  NL001
10 North East Atlantic Ocean NL           Netherlands  NL001
   location_name date       `Plastic: Yokes ` `Plastic: Bags `
   <chr>         <date>                 <dbl>            <dbl>
 1 Bergen        2012-01-27                 0                3
 2 Bergen        2012-04-20                 0                8
 3 Bergen        2012-07-22                 0                1
 4 Bergen        2012-10-19                 0                2
 5 Bergen        2013-02-19                 0               24
 6 Bergen        2013-04-11                 0                0
 7 Bergen        2013-07-20                 0               10
 8 Bergen        2013-10-16                 0                7
 9 Bergen        2014-01-08                 0                9
10 Bergen        2014-04-23                 0               10
It’s less restrictive than the OSPAR-format in the sense that litter types are not restricted to the format
litter group : litter type [litter code]
The only requirement is that a [litter code] should be available. In fact, all litter specifications given below are valid:
The long format is convenient for data analysis. The following columns are required: “region_name”, “country_code”, “country_name”, “location_name”, “date”, “type_name”, and “abundance”. The columns are separated by commas (CSV file).
The image below gives an example of the long format. It supports the same litter coding as the wide format.
# A tibble: 10 x 8
   region_name               country_code country_name location_code
   <chr>                     <chr>        <chr>        <chr>
 1 North East Atlantic Ocean NL           Netherlands  NL001
 2 North East Atlantic Ocean NL           Netherlands  NL001
 3 North East Atlantic Ocean NL           Netherlands  NL001
 4 North East Atlantic Ocean NL           Netherlands  NL001
 5 North East Atlantic Ocean NL           Netherlands  NL001
 6 North East Atlantic Ocean NL           Netherlands  NL001
 7 North East Atlantic Ocean NL           Netherlands  NL001
 8 North East Atlantic Ocean NL           Netherlands  NL001
 9 North East Atlantic Ocean NL           Netherlands  NL001
10 North East Atlantic Ocean NL           Netherlands  NL001
   location_name date       type_name       abundance
   <chr>         <date>     <chr>               <dbl>
 1 Bergen        2012-01-27 Plastic: Yokes          0
 2 Bergen        2012-04-20 Plastic: Yokes          0
 3 Bergen        2012-07-22 Plastic: Yokes          0
 4 Bergen        2012-10-19 Plastic: Yokes          0
 5 Bergen        2013-02-19 Plastic: Yokes          0
 6 Bergen        2013-04-11 Plastic: Yokes          0
 7 Bergen        2013-07-20 Plastic: Yokes          0
 8 Bergen        2013-10-16 Plastic: Yokes          0
 9 Bergen        2014-01-08 Plastic: Yokes          0
10 Bergen        2014-04-23 Plastic: Yokes          0
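The relation between the wide and the long format can be illustrated with a small base R sketch. The data frame below is made up for illustration, and the litter codes in square brackets are assumed placeholders; litteR itself reads both formats directly, so this conversion is not required to use the package.

```r
# Illustrative wide-format data: one row per survey, one column per litter type.
# The codes in square brackets are assumed for this example.
wide <- data.frame(
  region_name   = "North East Atlantic Ocean",
  country_code  = "NL",
  country_name  = "Netherlands",
  location_name = "Bergen",
  date          = as.Date(c("2012-01-27", "2012-04-20")),
  `Plastic: Yokes [1]` = c(0, 0),
  `Plastic: Bags [2]`  = c(3, 8),
  check.names = FALSE
)

# Columns that identify a survey; all remaining columns are litter types.
id_cols   <- c("region_name", "country_code", "country_name",
               "location_name", "date")
type_cols <- setdiff(names(wide), id_cols)

# Stack the litter type columns: one row per survey and litter type.
long <- do.call(rbind, lapply(type_cols, function(type) {
  cbind(wide[id_cols], type_name = type, abundance = wide[[type]])
}))

print(long)
```

Each survey row in the wide table becomes one row per litter type in the long table, with the former column name stored in type_name and the count in abundance.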
The OSPAR format is a wide format, meaning that all litter types are stored in columns and each row represents a survey. OSPAR beach litter data can be downloaded from the OSPAR website.
The image below gives an example of the first 10 columns and records of litter data in the OSPAR-format.
# A tibble: 10 x 10
   RefNo `Beach name` Country     Region                `Survey date` Period
   <chr> <chr>        <chr>       <chr>                 <chr>          <dbl>
 1 NL001 Bergen       Netherlands 3. Southern North Sea 27/01/2012         1
 2 NL001 Bergen       Netherlands 3. Southern North Sea 20/04/2012         2
 3 NL001 Bergen       Netherlands 3. Southern North Sea 22/07/2012         3
 4 NL001 Bergen       Netherlands 3. Southern North Sea 19/10/2012         4
 5 NL001 Bergen       Netherlands 3. Southern North Sea 19/02/2013        -1
 6 NL001 Bergen       Netherlands 3. Southern North Sea 11/04/2013         2
 7 NL001 Bergen       Netherlands 3. Southern North Sea 20/07/2013         3
 8 NL001 Bergen       Netherlands 3. Southern North Sea 16/10/2013         4
 9 NL001 Bergen       Netherlands 3. Southern North Sea 08/01/2014         1
10 NL001 Bergen       Netherlands 3. Southern North Sea 23/04/2014         2
   `Plastic: Yokes ` `Plastic: Bags ` `Plastic: Small_bags `
               <dbl>            <dbl>                  <dbl>
 1                 0                3                      9
 2                 0                8                     12
 3                 0                1                      5
 4                 0                2                      4
 5                 0               24                     23
 6                 0                0                      9
 7                 0               10                      4
 8                 0                7                      5
 9                 0                9                     20
10                 0               10                     29
   `Plastic: Bag_ends `
                  <dbl>
 1                    0
 2                    0
 3                    0
 4                    0
 5                   13
 6                    1
 7                    0
 8                    1
 9                    0
10                    0
The columns are separated by commas (CSV file). Five columns are compulsory, i.e., “refno”, “beach name”, “country”, “region”, and “survey date”. Note that the OSPAR date format currently does not comply with the ISO 8601 standard date format. Instead, OSPAR uses dd/mm/YYYY (see the image above). However, for convenience and consistency, litteR also allows for dates in the ISO 8601 format (YYYY-mm-dd). The other columns contain litter types. The names of these columns have the following format
litter group: litter type [litter code]
for instance, ‘Plastic: Bags ’.
Optionally, other columns may be added as metadata. However, these columns will be ignored by litteR.
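The two accepted date notations can be told apart by their pattern. The sketch below is a hypothetical helper (parse_survey_date is not a litteR function) that shows one way to read both dd/mm/YYYY and ISO 8601 dates in base R; it is not litteR's own parser.

```r
# Hypothetical helper: parse dates given either as dd/mm/YYYY (OSPAR)
# or as YYYY-mm-dd (ISO 8601). Anything else becomes NA.
parse_survey_date <- function(x) {
  x <- trimws(x)
  out <- rep(as.Date(NA), length(x))

  is_ospar <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", x)
  out[is_ospar] <- as.Date(x[is_ospar], format = "%d/%m/%Y")

  is_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  out[is_iso] <- as.Date(x[is_iso], format = "%Y-%m-%d")

  out
}

dates <- parse_survey_date(c("27/01/2012", "2012-04-20", "2012/04/20"))
print(dates)
```

Returning NA for unrecognised notations makes problematic records easy to find with is.na() before any analysis is run.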
All input files are validated by litteR. The following validation rules apply:
The settings file contains all settings needed to run litteR. The settings file is in the YAML-format. This is a human-readable data language that is commonly used for settings files. An example of the contents of a settings file is given in the figure below.
### BASIC SETTINGS ###
# Name of analyst
analyst_name: "RWS"
# Which modules to run (false or true)
module_stats: true
module_trend: true
module_baseline: false
module_power: false
# Period to analyse (YYYY-mm-dd)
min_date: 2012-01-01
max_date: 2017-12-31
# Percentage of total abundance to analyse (0 < percentage_total_abundance <= 100)
percentage_total_abundance: 80
# name of group file (see package vignette for more details)
file_groups: ospar-groups.csv
# Litter type(s) and/or groups to analyse
# (e.g., OSPAR codes in square brackets and [TA] for total abundance)
litter_types_groups: [[TA], ]
# Image quality: high or low
image_quality: high
### ADVANCED SETTINGS ###
# Power-analysis: number of Monte Carlo simulations
# Note that larger values lead to longer run times.
# The default number of simulations is 100 to speed up computation.
# However, 1000 simulations generally give more accurate results
number_of_simulations: 100
# Power-analysis: significance level
alpha: 0.05
# Power-analysis: resolution of effect size (range: 5% ... 50%)
resolution_effect_size: 10
# Power-analysis: minimum number of surveys to sample from
min_surveys: 16
# Show source code? (true or false)
show_source_code: false
The YAML-file contains the following entries:
|analyst_name||name of the person who performs the litter analysis||text|
|module_stats||Activate the descriptive statistics module?||true or false|
|module_trend||Activate the trend analysis module?||true or false|
|module_baseline||Activate the baseline analysis module?||true or false|
|module_power||Activate the power analysis module?||true or false|
|min_date||first date to analyse||YYYY-mm-dd (ISO 8601)|
|max_date||last date to analyse||YYYY-mm-dd (ISO 8601)|
|percentage_total_abundance||percentage of total abundance to analyse||percentage, default value: 80%|
|file_groups||name of litter group file||text|
|litter_types_groups||litter type(s) and group(s) to analyse||litter/group code(s) in square brackets, e.g., [, [TA], [SUP]]|
|image_quality||quality of the images||high or low|
|number_of_simulations||number of Monte Carlo simulations for power analysis||integer greater than 0. Default value: 100|
|alpha||significance level used for power-analysis||numeric in 0..1. Default value: 0.05|
|resolution_effect_size||resolution of the effect size (power analysis)||range: 5% .. 50%|
|min_surveys||minimum number of surveys to sample from in power analysis||integer greater than 0. Default value: 16|
|show_source_code||Show all R source code?||true or false|
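The value ranges in this table can be checked before a run. The following base R sketch uses a hypothetical helper, check_settings, applied to a plain named list; it only mirrors the ranges listed above and is not the validation that litteR performs itself.

```r
# Settings as a named list, mirroring part of the example settings file.
settings <- list(
  min_date = as.Date("2012-01-01"),
  max_date = as.Date("2017-12-31"),
  percentage_total_abundance = 80,
  number_of_simulations = 100,
  alpha = 0.05,
  resolution_effect_size = 10,
  min_surveys = 16
)

# Hypothetical helper: return a character vector of problems (empty if none).
check_settings <- function(s) {
  problems <- character(0)
  add <- function(msg) problems <<- c(problems, msg)
  is_count <- function(v) is.numeric(v) && length(v) == 1 && v >= 1 && v == round(v)

  if (s$min_date > s$max_date) add("min_date is later than max_date")
  if (s$percentage_total_abundance <= 0 || s$percentage_total_abundance > 100)
    add("percentage_total_abundance should be in (0, 100]")
  if (!is_count(s$number_of_simulations))
    add("number_of_simulations should be an integer greater than 0")
  if (s$alpha < 0 || s$alpha > 1) add("alpha should be in 0..1")
  if (s$resolution_effect_size < 5 || s$resolution_effect_size > 50)
    add("resolution_effect_size should be in 5..50")
  if (!is_count(s$min_surveys))
    add("min_surveys should be an integer greater than 0")

  problems
}

problems <- check_settings(settings)
print(problems)
```

Collecting all problems in one vector, rather than stopping at the first, lets the analyst fix the settings file in a single pass.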
The work directory should also contain a litter groups file. An example file, named ‘ospar-groups.csv’, is automatically generated when using the create_litter_project function, described earlier in this tutorial. A groups file assigns each litter type (type_name, in rows) to one or more litter groups (columns) by placing an x in a cell. The first 11 rows of ospar-groups.csv are given in the table below.
Both individual type codes and litter groups (column names) can be specified as litter_types_groups in the settings file (*.yaml). For instance:
litter_types_groups: [[TA], , [SUP], [FISH]]
The user may use ‘ospar-groups.csv’ as a template for their own group file. litteR will use the group file that has been specified in the settings file (*.yaml), e.g., file_groups: ospar-groups.csv. Note that the litteR user can create and use tailor-made groups files that match the input data used.
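To illustrate how a groups file maps litter types to groups, the sketch below builds a small made-up table in the same layout (type_name in rows, groups in columns, an x marking membership). The type names, codes, and the helper types_in_group are assumptions for this example; a real groups file would be read with read.csv instead.

```r
# Made-up groups table in the layout of a groups file.
groups <- data.frame(
  type_name = c("Plastic: Bags [A]", "Plastic: Nets [B]", "Paper: Cups [C]"),
  SUP  = c("x", "",  "x"),
  FISH = c("",  "x", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the litter types that belong to a group.
# %in% is used so that empty (NA) cells are treated as "not a member".
types_in_group <- function(groups, group) {
  groups$type_name[groups[[group]] %in% "x"]
}

sup_types <- types_in_group(groups, "SUP")
print(sup_types)
```

Because a type may carry an x in several columns, the same litter type can contribute to more than one group.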
litteR produces an HTML report that is best viewed with a modern web browser such as Mozilla Firefox, Google Chrome, or Safari. These browsers are freely available from the internet.
The filename of each report starts with ‘litter-report’, followed by
In the remainder of this section, each section of the HTML-report is briefly described.
This section gives a summary of the settings/parameters in the settings file.
In this section (potential) problems in the input files are reported.
For each selected litter type and period, this section gives several descriptive statistics. These statistics provide useful information about the data in a concise way. The following statistics are given:
These statistics are estimated for the top x% of litter types, i.e., the types with the greatest abundances that together make up x% of the total abundance at each location. For example, in OSPAR data analysis the top 80% is used by default.
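The top x% selection can be sketched as a cumulative sum over the types sorted by abundance. The helper top_types and the totals below are made up for illustration; in particular, including the type that makes the cumulative share reach x% is an assumption, and litteR's own rule may differ in such edge cases.

```r
# Hypothetical helper: names of the most abundant types that together
# make up at least `percentage` percent of the total abundance.
top_types <- function(totals, percentage = 80) {
  sorted  <- sort(totals, decreasing = TRUE)
  cum_pct <- 100 * cumsum(sorted) / sum(sorted)
  n_keep  <- which(cum_pct >= percentage)[1]
  names(sorted)[seq_len(n_keep)]
}

# Made-up total abundances per litter type at one location.
totals <- c(yokes = 5, bags = 50, nets = 15, caps = 30)

selected <- top_types(totals, percentage = 80)
print(selected)
```

With these totals, the two most abundant types already account for 80% of all items, so the remaining types are left out of the analysis.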
This section gives trend analysis results. The figures show time-series of litter items for each location, together with a monotonic trend line based on the Theil-Sen slope estimator. The Theil-Sen slope estimator is usually more robust than slopes estimated by ordinary least squares regression. In addition, a loess-smoother is given to reveal potential non-linearities in the trend.
Finally, a table is provided showing the magnitude of the Theil-Sen slope estimator and its corresponding p-value.
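The Theil-Sen slope is the median of the slopes between all pairs of observations. The base R sketch below illustrates the estimator on made-up data containing one outlier; the function theil_sen and the choice of intercept (the median of the residual offsets) are assumptions for this example and not necessarily what litteR uses internally.

```r
# Illustrative Theil-Sen estimator: median of all pairwise slopes.
theil_sen <- function(x, y) {
  idx    <- combn(length(x), 2)            # all pairs of observations
  slopes <- (y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]])
  slope  <- median(slopes[is.finite(slopes)])
  intercept <- median(y - slope * x)       # one common choice of intercept
  c(intercept = intercept, slope = slope)
}

# Made-up series: a linear trend (slope 2) with one large outlier at the end.
x <- 1:5
y <- c(3, 5, 7, 9, 100)

fit_ts  <- theil_sen(x, y)
fit_ols <- coef(lm(y ~ x))

print(fit_ts)
print(fit_ols)
```

The single outlier pulls the least squares slope far away from the underlying trend, whereas the median of the pairwise slopes is unaffected, which illustrates the robustness mentioned above.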
The aim of baseline analysis is to identify the minimum number of surveys needed to obtain stable baseline estimates.
This section provides figures showing the moving average as function of window size, i.e. the number of consecutive years, for each location.
The following procedure was followed to produce these plots:
In addition, a table is presented that gives, for each location and number of years (# years), the mean, the standard deviation (sd), the coefficient of variation (CV), the median, the median absolute deviation (MAD), and the ratio of MAD to median of the baseline statistics (mean and median) plotted above.
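As a rough sketch of the idea, the code below computes moving averages over windows of consecutive years and summarises their spread with the statistics named above. The annual values and both helper functions are made up, and the unscaled MAD (constant = 1) is an assumption; the exact procedure used by litteR may differ.

```r
# Moving averages over all windows of w consecutive values.
window_means <- function(x, w) rowMeans(embed(x, w))

# Hypothetical helper: spread of the moving averages for window size w.
summarise_windows <- function(x, w) {
  m <- window_means(x, w)
  c(n_years = w,
    mean    = mean(m),
    sd      = sd(m),
    cv      = sd(m) / mean(m),
    median  = median(m),
    mad     = mad(m, constant = 1),
    rmad    = mad(m, constant = 1) / median(m))
}

# Made-up annual abundances at one location.
annual <- c(10, 20, 30, 40)

stats_2 <- summarise_windows(annual, w = 2)
print(round(stats_2, 3))
```

Repeating this for increasing window sizes shows how quickly the CV and the MAD-to-median ratio shrink, which is the kind of information needed to judge when a baseline estimate becomes stable.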
In this section, the power of the Wilcoxon signed rank test is estimated. The null hypothesis of this test is
H0: distribution of litter data is symmetric about the baseline value
and the alternative hypothesis is
H1: distribution of litter data is less than the baseline value
Hence, this is a test for a step trend. The power of a hypothesis test is the probability that the test correctly rejects the null hypothesis (H0) when a specific alternative hypothesis (H1) is true.
The power is useful to check if the number of surveys is sufficient. If the power is too low, sampling effort should be increased to be able to correctly detect trends. On the other hand, if the power is too high, sampling effort can be reduced. In both cases, power analysis may lead to more efficient allocation of financial resources.
In litteR, power analysis is carried out by means of Monte Carlo simulation for different values of the reduction (effect size), sample size and statistical significance. The procedure is as follows:
For each location, the time series of the selected litter types are extracted. For each of these time series:
The reduction factor f scales the monitoring data. The following expression holds:
\(\text{mean(simulated data)} \approx f \times \text{mean(monitoring data)} = f \times \text{(baseline value)}\)
Note that f = 1 means no reduction (mean of the simulated data is approximately equal to the baseline value), and f = 0 means absence of litter (for instance, a pristine clean beach).
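The general idea of a Monte Carlo power estimate can be sketched in base R. Everything in this example is an assumption made for illustration: the monitoring data are made up, surveys are simulated by resampling the monitoring data with replacement and scaling by f, and the function estimate_power is hypothetical. It is not litteR's exact procedure.

```r
set.seed(1)

# Hypothetical helper: fraction of simulations in which the one-sided
# Wilcoxon signed rank test rejects H0 at significance level alpha.
estimate_power <- function(monitoring, f, n_surveys, n_sim = 100, alpha = 0.05) {
  baseline <- mean(monitoring)
  rejections <- replicate(n_sim, {
    # Simulated surveys: resampled monitoring data scaled by reduction factor f.
    simulated <- f * sample(monitoring, n_surveys, replace = TRUE)
    p <- suppressWarnings(
      wilcox.test(simulated, mu = baseline, alternative = "less")$p.value
    )
    !is.na(p) && p < alpha
  })
  mean(rejections)
}

# Made-up monitoring data (litter counts per survey).
monitoring <- c(12, 30, 18, 25, 9, 40, 22, 15, 28, 19, 33, 11, 21, 27, 16, 24)

power_strong <- estimate_power(monitoring, f = 0.2, n_surveys = 16)
power_none   <- estimate_power(monitoring, f = 1.0, n_surveys = 16)

print(c(f_0.2 = power_strong, f_1.0 = power_none))
```

A strong reduction (small f) should be detected in nearly every simulation, while for f = 1 the rejection rate should stay low. Increasing n_sim gives more stable estimates at the cost of longer run times.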
In addition to a report, a CSV-file with summary statistics will be produced for each location. This file is accompanied by a file with metadata. The metadata are given below:
|region_name||administrative unit, e.g., OSPAR or Southern North Sea||1|
|country_code||two-letter upper case country code according to ISO 3166-1 alpha-2||1|
|location_name||name of the survey location||1|
|type_name||name of the litter type||1|
|type_code||code of the litter type||1|
|from||first date of the survey||date|
|to||final date of the survey||date|
|cv||coefficient of variation of the abundance||1|
|rmad||ratio of MAD to median||1|
|n||number of surveys used to estimate these statistics||1|
|intercept||intercept of the Theil-Sen trend line, i.e., intercept + slope * (year - 1970)||count|
|slope||slope of the Theil-Sen trend line (annual increase in abundance)||1/a|
|p_value_slope||p-value of the Theil-Sen slope||1|
|p01||1st percentile of the abundance||count|
|p05||5th percentile of the abundance||count|
|p10||10th percentile of the abundance||count|
|p25||25th percentile of the abundance (first quartile)||count|
|p50||50th percentile of the abundance (second quartile or median)||count|
|p75||75th percentile of the abundance (third quartile)||count|
|p90||90th percentile of the abundance||count|
|p95||95th percentile of the abundance||count|
|p99||99th percentile of the abundance||count|
After typing litter() in the RStudio console, a file dialogue should appear. If that is not the case, the file dialogue is probably covered by RStudio (see the task manager or use ALT-TAB on MS-Windows to navigate to the hidden file dialogue).
If R reports the error invalid multibyte string, there is a character in your input file that is not part of the English alphabet. Substituting this character by a valid character in the range A-Z or a-z usually solves this problem.
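To locate such characters, the lines of the input file can be scanned for bytes outside the ASCII range. The helper has_non_ascii below is a hypothetical base R sketch, shown on two made-up lines; for a real file, the lines would come from readLines().

```r
# Hypothetical helper: TRUE for every line that contains a non-ASCII byte.
has_non_ascii <- function(lines) {
  vapply(lines,
         function(l) any(as.integer(charToRaw(l)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# Made-up input lines; the second one contains a c with a cedilla.
lines <- c("NL,Netherlands,Bergen", "CW,Cura\u00e7ao,Willemstad")

suspect <- which(has_non_ascii(lines))
print(suspect)
```

The reported line numbers point to the records that need to be edited in the input file.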
Schulz, Marcus, Dennis J.J. Walvoort, Jon Barry, David M. Fleet, Willem M.G.M. van Loon, 2019. Baseline and power analyses for the assessment of beach litter reductions in the European OSPAR region. Environmental Pollution 248:555-564. https://doi.org/10.1016/j.envpol.2019.02.030