Get started

Koen Derks

last modified: 29-06-2021

Welcome to the ‘Get started’ page of the jfa package. In this vignette you are able to find detailed examples of how you can incorporate the functions provided by the package into your statistical sampling workflow.

Example data

To concretely illustrate jfa’s functionality, we consider the BuildIt data set that is included in the package (for more info, see ?BuildIt). This data set contains a population of 3500 invoices paid to a fictional construction company. Each invoice has an identification number (ID), a recorded value (bookValue), and a corresponding audit (true) value (auditValue).

Note: The information in the auditValue column is added for illustrative purposes, as it is unknown to the auditor before having audited any items from a sample.

First, we load in the data set and display the first 10 invoices in the population.

library(jfa)

data('BuildIt')
head(BuildIt, n = 10)
##       ID bookValue auditValue
## 1  82884    242.61     242.61
## 2  25064    642.99     642.99
## 3  81235    628.53     628.53
## 4  71769    431.87     431.87
## 5  55080    620.88     620.88
## 6  93224    501.76     501.76
## 7  24331    466.01     466.01
## 8  81460    295.20     295.20
## 9  14608    216.48     216.48
## 10 79064    243.43     243.43

For a walkthrough of jfa’s workflow functionality using the BuildIt data set, see The audit sampling workflow. For a Bayesian version of the the walkthrough, see The Bayesian audit sampling workflow.

(Optional) Using auditPrior(): The basics

The auditPrior() function allows you to create a prior distribution for the misstatement in the population that can be used to incorporate existing information into the sampling workflow. Prior distributions can be constructed based on various types of audit information. See the vignette Prior distributions for a detailed explanation of the types of audit information that jfa is able to incorporate into a prior distribution. Bayesian planning and evaluation using the prior distribution can be performed by specifying the returned object from the auditPrior() function as input for the prior argument in the planning() and evaluation() functions.

Using planning(): The basics

Planning a sample using the planning() function requires knowledge of the objective of the sampling procedure, the assumed distribution of the data (binomial, poisson, or hypergeometric), and the expected errors in the sample. However, it is advised to set the value for the expected errors in the sample conservatively to minimize the probability that the observed errors in the sample exceed the expected errors, which would imply that insufficient work has been done in the end.

Because we have access to the recorded values of the invoices in the population, we are going to consider each monetary unit in the population as a possible unit of inference. The audit standards assume that such data are distributed according to the Poisson distribution and so we will assume this distribution in this example as well. For illustrative purposes, we do not expect to find any errors in the sample.

Testing against a performance materiality

First, we take a look at how you can use the planning() function to construct a sample with the objective of testing the misstatement in the population against a performance materiality (i.e., the maximum tolerable misstatement in the population). In this example, we will set the performance materiality at 5% of the total value of the population.

Sampling objective: Calculate a minimal sample size such that, when zero misstatements are found in the sample, there is a 95% ‘chance’ that the misstatement in the population is lower than 5% of the population value.

Constructing a sample with this objective in mind can be done using the code below (specifically by specifying the materiality argument). A summary of the output can be obtained from the summary() function. As you can see, the minimal sample size to achieve 95% assurance with respect to the performance materiality of 5% is 60 monetary units.

stage1 <- planning(materiality = 0.05, expectedError = 0, likelihood = 'poisson', confidence = 0.95, N = 3500)
summary(stage1)
## # ------------------------------------------------------------ 
## #              jfa Planning Summary (Classical)
## # ------------------------------------------------------------ 
## # Input:
## # 
## # Confidence:                    95% 
## # Materiality:                   5% 
## # Minimum precision:             Not specified 
## # Likelihood:                    poisson  
## # Expected sample errors:        0 
## # ------------------------------------------------------------
## # Output:
## #
## # Sample size:                   60  
## # ------------------------------------------------------------
## # Statistics:
## #
## # Expected most likely error:    0% 
## # Expected upper bound:          4.993% 
## # Expected precision:            4.993%  
## # ------------------------------------------------------------

Obtaining a minimum precision

Next, we take a look at how you can use the planning() function to construct a sample with the objective of obtaining a minimum precision. The precision is an indication of the accuracy of your estimate of the misstatement, as it is defined as the difference between the most likely misstatement and the upper confidence bound on the misstatement. For this example, we will set the minimum precision to 2% of the population value.

Sampling objective: Calculate a minimal sample size such that, when zero misstatements are found in the sample, there is a 95% ‘chance’ that the misstatement in the population is at most 2% above the most likely misstatement.

Planning a sample with this objective can be done using the code below (specifically by specifying the minPrecision argument). As you can see, the minimal sample size for to achieve a precision of at least 2% is 150 monetary units.

stage1 <- planning(minPrecision = 0.02, expectedError = 0, likelihood = 'poisson', confidence = 0.95, N = 3500)
summary(stage1)
## # ------------------------------------------------------------ 
## #              jfa Planning Summary (Classical)
## # ------------------------------------------------------------ 
## # Input:
## # 
## # Confidence:                    95% 
## # Materiality:                   Not specified 
## # Minimum precision:             2% 
## # Likelihood:                    poisson  
## # Expected sample errors:        0 
## # ------------------------------------------------------------
## # Output:
## #
## # Sample size:                   150  
## # ------------------------------------------------------------
## # Statistics:
## #
## # Expected most likely error:    0% 
## # Expected upper bound:          1.997% 
## # Expected precision:            1.997%  
## # ------------------------------------------------------------

Using selection(): The basics

Selecting a sample using the selection() function requires knowledge of the sampling units (i.e., units of inference) in the population. Items can be selected from the population using record sampling (also known as attribute sampling) using units = 'records', or using monetary unit sampling (MUS) using units = 'mus'. Selection also requires knowledge of the sampling algorithm. Sampling units can be selected with a random sampling scheme using algorithm = 'random', with a cell sampling scheme using algorithm = 'cell', or with a fixed interval sampling (also known as systematic sampling) scheme using algorithm = 'interval'.

Record sampling

First, we take a look at how you can use the selection() function to perform random sampling from the items in the population. As an example, the code below samples 60 invoices from the BuildIt data set using a random record sampling scheme.

stage2 <- selection(population = BuildIt, sampleSize = 60, units = 'records', algorithm = 'random')
summary(stage2)
## # ------------------------------------------------------------
## #                  jfa Selection Summary
## # ------------------------------------------------------------
## # Input:
## #      
## # Population size:               3500 
## # Requested sample size:         60 
## # Sampling units:                Records 
## # Algorithm:                     Random sampling   
## # ------------------------------------------------------------ 
## # Output:
## #
## # Obtained sampling units:       60 
## # Obtained sample items:         60 
## # ------------------------------------------------------------
## # Statistics:
## #
## # Proportion n/N:                0.017  
## # ------------------------------------------------------------

Monetary unit sampling (MUS)

Next, we take a look at how you can use the selection() function to perform fixed interval sampling using the monetary units in the population as sampling units. As an example, the code below samples 150 monetary units from the BuildIt data set using a fixed interval monetary unit sampling scheme.

stage2 <- selection(population = BuildIt, sampleSize = 150, units = 'mus', algorithm = 'interval', bookValues = 'bookValue')
summary(stage2)
## # ------------------------------------------------------------
## #                  jfa Selection Summary
## # ------------------------------------------------------------
## # Input:
## #      
## # Population size:               3500 
## # Requested sample size:         150 
## # Sampling units:                Monetary units 
## # Algorithm:                     Fixed interval sampling 
## # Interval:                      9354.805 
## # Starting point:                1 
## # ------------------------------------------------------------ 
## # Output:
## #
## # Obtained sampling units:       150 
## # Obtained sample items:         150 
## # ------------------------------------------------------------
## # Statistics:
## #
## # Proportion n/N:                0.043 
## # Percentage of value:           5.294% 
## # ------------------------------------------------------------

Extracting the sample

The selected sample is stored in the object that is returned by the selection() function. It can be accesses or extracted by indexing it via $sample. Let’s take a look at the first ten rows of the previously selected sample of 60 invoices.

stage2 <- selection(population = BuildIt, sampleSize = 60, units = 'records', algorithm = 'random')

sample <- stage2$sample
head(sample, n = 10)
##    rowNumber count    ID bookValue auditValue
## 1       1017     1 50755    618.24     618.24
## 2        679     1 20237    669.75     669.75
## 3       2177     1  9517    454.02     454.02
## 4        930     1 85674    257.82     257.82
## 5       1533     1 31051    308.53     308.53
## 6        471     1 84375    824.66     824.66
## 7       2347     1 75616    623.70     623.70
## 8        270     1 82033    352.75     352.75
## 9       1211     1 12877     52.89      52.89
## 10      3379     1 85322    330.24     330.24

Using evaluation(): The basics

After saving the sample and annotating the invoices in the sample with their audit values, you can evaluate the misstatement in the sample with the evaluation() function. The function can also be used with summary statistics from a data sample. For a more elaborate explanation of how to use this function, see the package vignettes Testing misstatement and Estimating misstatement.

Summary statistics from the sample

First, let’s take a look at how you can use the evaluation() function to evaluate the misstatement in the population using summary statistics from a sample. Suppose that in the previously selected sample of 60 invoices you have found that 1 invoice is missing. Using nSumstats = 60 and kSumstats = 1 you can provide these outcomes of the sample to the evaluation() function. Don’t forget to specify your sampling objectives using the materiality or minPrecision arguments. In this example, a performance materiality of 5% applies.

Sampling objective: Evaluate, on the basis of summary statistics of a sample, whether the misstatement in the population exceeds the allocated performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.

stage4 <- evaluation(materiality = 0.05, method = 'binomial', confidence = 0.95, N = 3500, nSumstats = 60, kSumstats = 1)
summary(stage4)
## # ------------------------------------------------------------ 
## #             jfa Evaluation Summary (Classical)
## # ------------------------------------------------------------ 
## # Input:
## #
## # Confidence:                    95% 
## # Materiality:                   5% 
## # Minimum precision:             Not specified 
## # Sample size:                   60 
## # Sample errors:                 1 
## # Sum of taints:                 1 
## # Method:                        binomial   
## # ------------------------------------------------------------
## # Output:
## #  
## # Most likely error:             1.667%
## # Upper bound:                   7.664%
## # Precision:                     5.997%  
## # Conclusion:                    Do not approve population 
## # ------------------------------------------------------------

As you can see, the upper bound on the misstatement is higher than 5% and thus the sample does not provide sufficient evidence to conclude that the misstatement is lower than 5%.

Annotated sample

Next, we take a look at how you can use the evaluation() function to evaluate the misstatement using an annotated sample. Suppose that you have audited the 60 invoices in the sample and have found 1 misstatement.

sample$auditValue    <- sample$bookValue
sample$auditValue[1] <- sample$auditValue[1] - 100

You can evaluate the misstatement in the annotated sample using the sample, bookValues, auditValues, and counts arguments. For example, the code below evaluates the misstatement in the population with respect to the performance materiality of 5% using the commonly used Stringer bound. You can find more information about which evaluation methods are implemented on the home page.

Sampling objective: Evaluate, on the basis of an annotated sample, whether the misstatement in the population exceeds the allocated performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.

stage4 <- evaluation(materiality = 0.05, method = 'stringer', confidence = 0.95,
                     sample = sample, bookValues = 'bookValue', auditValues = 'auditValue',
                     counts = sample$count, N = 3500)
summary(stage4)
## # ------------------------------------------------------------ 
## #             jfa Evaluation Summary (Classical)
## # ------------------------------------------------------------ 
## # Input:
## #
## # Confidence:                    95% 
## # Materiality:                   5% 
## # Minimum precision:             Not specified 
## # Sample size:                   60 
## # Sample errors:                 1 
## # Sum of taints:                 0.162 
## # Method:                        stringer   
## # ------------------------------------------------------------
## # Output:
## #  
## # Most likely error:             0.27%
## # Upper bound:                   5.322%
## # Precision:                     5.053%  
## # Conclusion:                    Do not approve population 
## # ------------------------------------------------------------

Using report(): The basics

After obtaining the result from the evaluation() function you can generate a report containing the data, the statistical results and their interpretation, and the conclusion of the sampling procedure with respect to the input for materiality and minPrecision. The report can be automatically generated by providing the object returned by the evaluation() function to the report() function.

stage4 <- evaluation(materiality = 0.05, method = 'stringer', confidence = 0.95,
                     sample = sample, bookValues = 'bookValue', auditValues = 'auditValue',
                     counts = sample$count, N = 3500)

report(stage4, file = 'report.html', format = 'html_document') # Generates .html report