Statistical Inference on Measures of Lineup Fairness

Colin G. Tredoux

Tamsyn M. Naylor


This package contains functions to compute various properties of laboratory or police lineups.
Since the early 1970s eyewitness testimony researchers have recognised the importance of estimating properties such as lineup bias (is the lineup biased against the suspect, leading to a rate of choosing higher than one would expect by chance?), and lineup size (how many reasonable choices are in fact available to the witness? A lineup is supposed to consist of a suspect and a number of additional members, or foils, whom a poor quality witness might mistake for the perpetrator). Lineup measures are descriptive, in the first instance, but since the earliest articles in the literature researchers have recognised the importance of reasoning inferentially about them.

The measures described below were originally proposed by Doob & Kirshenbaum (1973), Wells, Leippe & Ostrom (1979), Malpass (1981), Tredoux (1998, 1999), and Malpass, Tredoux & McQuiston-Surrett (2007). Most of the measures assume that a sample of mock witnesses has been given a description of the perpetrator, and a lineup, and asked to choose someone from the lineup (either as the best match to the description, or the best match to the lineup member the police likely have in mind; see Malpass et al., 2007).

Much of the justification for what follows can be found in Tredoux (1998), and Malpass et al. (2007).

Data Format

The majority of functions in r4lineups require a numeric vector representing lineup data, whilst other functions require you to pass a list of lineup data. It is advisable to read the documentation to determine the format of data needed by each specific function. Most r4lineups functions will not work with a dataframe, even if the dataframe contains only one column, and thus appears to be in vector like format. If your lineup data is in dataframe format already, you can easily convert it to a vector or a list.

For instance, if we have 10 mock witnesses, and each views a lineup with 6 members, then we could record their choices as a numeric vector, where the elements represent the position of the lineup member chosen by each witness:

lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)

However, it is also possible to represent this set of choices as a one-dimensional table:

lineup_tbl <- table(lineup_vec)

If we had our lineup choices recorded in a data frame, which is what we might end up with if we import the data from a spreadsheet program, then we could easily extract the variable representing choices in the dataframe into a vector.

For instance, if the lineup choices shown above for 10 mock witnesses were part of a larger dataset, and we had recorded the age and participant number of each participant, then the dataframe might appear as shown below (here we create the dataframe with simulated data).

participant_num <- paste("Partic_", 1:10, sep = "")
age <- round(runif(10, 18, 80), 2)
line_df <- data.frame(participant_num, age, lineup_vec)

Show the dataframe:


Extract the lineup choices, per participant, to a vector:

a_lineup_vec <- line_df$lineup_vec

Or make a table directly from the variable of choices:

a_lineup_table <- table(line_df$lineup_vec)

One important issue is that the functions in this package have no way of knowing whether some members of a lineup received no choices. In other words, if there were in fact 7 lineup members in our running example, but member 7 was not chosen by any witness, then this would not be evident from the lineup vector, since it records the choice made by each witness, but not the number of lineup members. To deal with this problem, the functions in the package generally require nominal lineup size (the number of physical members of the lineup) to be passed as an argument.
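If you tabulate choices by hand, one way to keep unchosen members visible is to treat the choices as a factor with one level per physical lineup member. The sketch below uses base R only (no r4lineups functions) and assumes, for illustration, a nominal size of 7 for the running example.

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)

# Assumed nominal size: 7 physical members, the last one never chosen
k <- 7

# A plain table() only lists members chosen at least once
observed_only <- table(lineup_vec)

# Declaring all k positions as factor levels keeps the zero count
all_members <- table(factor(lineup_vec, levels = seq_len(k)))
print(all_members)
```

The second table has one cell per lineup member, with a count of 0 for member 7.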

Let’s look at a slightly larger data set:

nortje2012 data
lineup_1 lineup_2 lineup_3
2 7 1
5 1 3
2 2 3

r4lineups Sample Datasets

This package contains several sample datasets, listed below:

  1. nortje2012: A dataframe of lineup choices for 3 independent lineups, from a study conducted by Nortje & Tredoux (2012)
  2. line73 : Lineup data from Doob & Kirshenbaum (1973)
  3. mockdata: Lineup data from a study conducted by Nortje & Tredoux (2012)
  4. mickwick : Confidence and accuracy data for lineup choices. Sample data to be passed to ROC function make_roc()
  5. Foil images (used to demonstrate face_sim) can be found at https://github.com/tmnaylor/malpass_foils

All sample datasets can be loaded by calling data() with the name of the dataset, e.g. data(nortje2012). Further details can be found in the documentation for each dataset.

Lineup Proportion

Proportion refers to the proportion of witnesses selecting a particular member of a lineup. This package provides calculations of proportion for individual lineup members (lineup_prop_vec()), and for all members of a lineup with a subsidiary function (allprop()).

The calculation of the proportion of mock witnesses choosing the suspect is essentially a measure of lineup bias (see Malpass, 1981; Tredoux, 1998), especially when one tests whether it exceeds the proportion expected by chance guessing alone (1/nominal_size).
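As a rough illustration of testing the suspect proportion against chance, the sketch below uses the exact binomial test from base R. This is a standard test chosen for illustration, not necessarily the procedure used inside r4lineups; the suspect position (3) and nominal size (6) are assumptions.

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
target_pos <- 3  # assumed suspect position
k <- 6           # assumed nominal size

n_suspect <- sum(lineup_vec == target_pos)
n_witnesses <- length(lineup_vec)
suspect_prop <- n_suspect / n_witnesses

# One-sided exact test of H0: proportion = 1/k against "greater than 1/k"
bias_test <- binom.test(n_suspect, n_witnesses, p = 1 / k,
                        alternative = "greater")
print(suspect_prop)
print(bias_test$p.value)
```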

Calculating proportion for a single target/lineup member:

This function requires three arguments: a vector of lineup data, the position of the target in the lineup (target_pos), and nominal size (k). The example use of the function here returns the proportion of mock witnesses choosing the target/lineup member occupying position 3 in the lineup, which is 0.14 (i.e., 14% of the witnesses).

If you are not familiar with how the boot() or boot.ci() functions work, you could consult the documentation for the boot package. However, by using the functions as shown here you will be able to bootstrap without needing to understand them in detail. Briefly, the argument ‘conf’ allows you to specify the confidence level (1 - alpha), and ‘type’ specifies the type of confidence interval to be returned. If type = “bca”, the function will return bias-corrected and accelerated (BCa) confidence intervals.
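To make the roles of ‘conf’ and ‘type’ concrete, here is a minimal sketch that bootstraps a suspect proportion with the boot package directly. The statistic function prop_stat() is a hypothetical helper written for this example and is not part of r4lineups; percentile intervals are used here simply because they are stable for a very small sample.

```r
library(boot)

# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)

# Statistic in the form boot() expects: data first, resample indices second.
# target_pos is an extra argument that boot() forwards through '...'.
prop_stat <- function(data, indices, target_pos) {
  mean(data[indices] == target_pos)
}

set.seed(2024)
prop_boot <- boot(lineup_vec, statistic = prop_stat, R = 1000, target_pos = 3)

# conf is the confidence level; type selects the interval method
prop_ci <- boot.ci(prop_boot, conf = 0.95, type = "perc")
print(prop_ci)
```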

Calculating proportion for each lineup member

Should you want to calculate a proportion for each member of a given lineup, the allprop() function can be called. This might be the case when there is no pre-specified suspect in the lineup, as sometimes happens in laboratory experiments. Instead of manually specifying the target position, a vector indexing all target positions is automatically generated from the argument ‘k’, within the function itself.

The function then returns the proportion of mock witnesses selecting each lineup member.

  Note: k = nominal size, and must always be explicitly declared (for more info, see the Data Format section, above).
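The same quantity can be sketched in base R, which may help when checking results by hand. This is an illustration only and does not call allprop(); a nominal size of 6 is assumed.

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
k <- 6  # assumed nominal size

# Count choices for every position 1..k, including unchosen members
counts <- table(factor(lineup_vec, levels = seq_len(k)))

# Proportion of mock witnesses choosing each member
member_props <- as.numeric(counts) / length(lineup_vec)
names(member_props) <- seq_len(k)
print(member_props)
```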

Functional Size

We defined nominal size above as the number of physical members of a lineup, but it is clear from experience that not all lineup members are good foils, and some might not end up being chosen at all. They are not good tests of witness memory, in short. Wells et al. (1979) proposed the notion of ‘functional size’ as a measure of the number of plausible lineup members, and defined it as the inverse of the proportion of mock witnesses choosing the suspect.

Functional size = \(\frac{N}{D}\), where N is the number of mock witnesses and D is the number of mock witnesses who choose the suspect.

You can compute functional size by inverting the result of the lineup_prop_vec() function, or by calling func_size(), but we suggest you use the function func_size_report(), which provides more detail, including bootstrapped confidence intervals.
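For a quick check by hand, functional size is simply the inverse of the suspect proportion. The base R sketch below assumes the suspect stands in position 3 of the running example; it does not call any r4lineups function.

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
target_pos <- 3  # assumed suspect position

# Proportion of mock witnesses choosing the suspect
suspect_prop <- mean(lineup_vec == target_pos)

# Functional size is the inverse of that proportion
functional_size <- 1 / suspect_prop
print(functional_size)
```

Here 3 of 10 witnesses choose the suspect, so the lineup behaves as if it had about 3.3 plausible members.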

Effective Size

The measure of functional size proposed by Wells et al. (1979) does not take the choosing rates of foils other than the suspect into account, and Malpass (1981) thus proposed a measure of the number of plausible lineup foils that takes all the data from a lineup table into account. He called this measure ‘effective size’.

Malpass’ Effective size A = \(k_{a}-\sum_{i=1}^{k_{a}}\frac{|o_{i}-e_{a}|}{2e_{a}}\)

where \(o_{i}\) is the (observed) number of mock witnesses who choose lineup member i; \(e_{a}\) is the adjusted nominal chance expectation, \(N(1/k_{a})\); and \(k_{a}\) is the adjusted nominal number of alternatives in the lineup (the original number minus the number of null foils).
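The formula above can be traced step by step in base R. The sketch assumes a nominal size of 8 for the running example, so that two members are null foils (never chosen); it is a hand calculation, not a call to the package.

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
k <- 8  # assumed nominal size; members 7 and 8 are never chosen

counts <- as.numeric(table(factor(lineup_vec, levels = seq_len(k))))
n_witnesses <- sum(counts)

# Adjusted quantities: drop null foils (members with zero choices)
chosen <- counts[counts > 0]
k_a <- length(chosen)        # adjusted number of alternatives
e_a <- n_witnesses / k_a     # adjusted chance expectation

esize_a <- k_a - sum(abs(chosen - e_a) / (2 * e_a))
print(esize_a)
```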

Malpass’ measure of effective size poses problems for statistical inference based on distribution theory, though, as argued by Tredoux (1998). He instead proposed an alternate measure of effective size, known as E’, derived from earlier work by Agresti and Agresti (1978).

This package allows you to calculate effective size in several different ways.

  1. Malpass’s original (1981) & adjusted effective size (see: Tredoux, 1998)

    Malpass’ original formula for effective size proposed a denominator computed by entering the number of foils that attracted non-zero choice frequencies. Tredoux (1998) argued against this practice, pointing out that zero frequencies could arise from random guessing with appreciable frequency. He suggested a minor amendment of Malpass’ original formula, as follows.

    Malpass’ Effective size B = \(k-\sum_{i=1}^{k}\frac{|o_{i}-e|}{2e}\)

    Both Malpass’ measures of effective size can be computed by calling the esize_m() function and passing a table of lineup data. If ‘both = FALSE’ is passed as an argument, only Malpass’s adjusted formula for effective size is used. If both = TRUE, both Malpass’s original and adjusted calculations of effective size are returned. You must also specify nominal size.

esize_m(table(lineup_vec), k = 8, both = TRUE)
Effective size (Malpass, 1981) =  5.962406 
Effective size (Malpass, 1981, 
             adj Tredoux, 1998) =  5.962406 
[1] 5.962406
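For comparison, the amended formula (Effective size B) can be worked through by hand in base R. The sketch uses the 10-witness running example with an assumed nominal size of 8, so its result differs from the output shown above; it does not call esize_m().

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
k <- 8  # assumed nominal size; members 7 and 8 are never chosen

# Keep every member, including those with zero choices
counts <- as.numeric(table(factor(lineup_vec, levels = seq_len(k))))
e <- sum(counts) / k  # chance expectation using the full nominal size

esize_b <- k - sum(abs(counts - e) / (2 * e))
print(esize_b)
```

Because the two unchosen members stay in the calculation, they count as deviations from chance rather than being silently dropped.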

  2. Tredoux’s measure of Effective Size, E’

Tredoux (1998) proposed a measure of effective size that is amenable to statistical inference using distribution theory.

E’ = \(\frac{1}{1-I}\), where I = \(1-\sum_{i=1}^{k}\left(\frac{o_{i}}{N}\right)^{2}\)

and where k, N, and \(o_{i}\) are all as defined earlier.
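The definition of E’ can be checked by hand with a few lines of base R. This sketch does not call esize_T(); it only follows the formula above for the 10-witness running example.

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)

# Proportion of witnesses choosing each member; unchosen members
# would contribute 0 to the sum, so they can be left out here
props <- as.numeric(table(lineup_vec)) / length(lineup_vec)

# I = 1 - sum of squared proportions
index_i <- 1 - sum(props^2)

# E' = 1 / (1 - I)
e_prime <- 1 / (1 - index_i)
print(e_prime)
```

With counts of 1, 1, 3, 1, 2 and 2, the squared proportions sum to 0.2, giving E’ = 5.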

To compute Tredoux’s measure of effective size, E’, use the function esize_T(). We can compute confidence intervals around the measure of E’ using normal distribution theory, and using bootstrap intervals, with the function esize_T_boot() from r4lineups, as well as boot() & boot.ci() from the boot package. Usage is shown below.

#Compute effective size from a single vector of lineup choices
[1] 5.693273

#Compute bootstrapped effective size
esize_boot <- boot::boot(lineup_table, esize_T_boot, R = 1000)

#View boot object


boot::boot(data = lineup_table, statistic = esize_T_boot, R = 1000)

Bootstrap Statistics :
    original    bias    std. error
t1* 5.693273 0.2036739   0.7642065

#Get confidence intervals
esize_boot.ci <- boot::boot.ci(esize_boot)
Warning in boot::boot.ci(esize_boot): bootstrap variances needed for
studentized intervals
Based on 1000 bootstrap replicates

boot::boot.ci(boot.out = esize_boot)

Intervals : 
Level      Normal              Basic         
95%   ( 3.992,  6.987 )   ( 3.971,  6.972 )  

Level     Percentile            BCa          
95%   ( 4.415,  7.416 )   ( 4.131,  7.024 )  
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable

  3. Effective Size Per Foils

    Malpass (1981) proposed an additional method for computing effective size, based on the number of foils who are chosen at a rate nearly equal to chance expectation. We propose that ‘nearly equal’ could mean that chance expectation falls within a confidence interval constructed around the proportion of mock witnesses choosing the lineup member.

    To calculate effective size by counting the number of foils who fall within the CI for chance guessing, we pass a vector of lineup data, a vector of target positions and nominal size. This function returns effective size from a set of bootstrapped data. Default alpha is 0.05, but this can be adjusted as desired.

eff_size_per_foils(lineup_vec, target_pos, k = 8, conf = 0.95)
[1] 4

Applying this function to our data, we see that effective size for this lineup is 4.
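The counting idea can be sketched in base R as follows. Note the swap: this illustration uses exact binomial (Clopper-Pearson) intervals from binom.test() rather than the bootstrapped intervals used by the package, counts all lineup members, and assumes a nominal size of 6 for the 10-witness running example, so it is not expected to reproduce the output above.

```r
# Choices of the 10 mock witnesses in the running example
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
k <- 6        # assumed nominal size
conf <- 0.95  # confidence level
chance <- 1 / k

counts <- as.numeric(table(factor(lineup_vec, levels = seq_len(k))))
n_witnesses <- sum(counts)

# For each member, check whether chance expectation lies inside the
# confidence interval around that member's choice proportion
within_chance <- vapply(counts, function(x) {
  ci <- binom.test(x, n_witnesses, conf.level = conf)$conf.int
  ci[1] <= chance && chance <= ci[2]
}, logical(1))

esize_per_foils <- sum(within_chance)
print(esize_per_foils)
```

With only 10 witnesses the intervals are wide, so every member falls within chance expectation; larger samples discriminate better.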

Comparing two effective sizes

To test if the effective sizes of two independent lineups are significantly different from one another, we can take the difference between the effective sizes and calculate confidence intervals. This function, effsize_compare(), requires a dataframe containing lineup data for two independent lineups. Here, we create a new dataframe containing two independent lineups from nortje2012.

The effective size for lineup_1 is therefore not significantly different from the effective size for lineup_2.
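The logic of such a comparison can be sketched with a simple percentile bootstrap of the difference in E’. This is an illustration under stated assumptions, not the method implemented in effsize_compare(): the two choice vectors are hypothetical, and e_prime() is a helper written for this example.

```r
# Helper (hypothetical): Tredoux's E' from a vector of choices
e_prime <- function(choices) {
  props <- as.numeric(table(choices)) / length(choices)
  1 / sum(props^2)
}

# Hypothetical choices for two independent lineups
lineup_a <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
lineup_b <- c(1, 1, 1, 2, 1, 3, 1, 1, 4, 1)

observed_diff <- e_prime(lineup_a) - e_prime(lineup_b)

# Resample each lineup independently and recompute the difference
set.seed(42)
boot_diffs <- replicate(2000, {
  e_prime(sample(lineup_a, replace = TRUE)) -
    e_prime(sample(lineup_b, replace = TRUE))
})

# 95% percentile interval for the difference
diff_ci <- quantile(boot_diffs, c(0.025, 0.975))
print(observed_diff)
print(diff_ci)
```

If the interval excludes zero, the two effective sizes would be judged significantly different at the chosen level.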

Diagnosticity Ratio

Wells and Lindsay (1980) proposed the use of a ‘diagnosticity ratio’ (essentially a risk ratio), computed as the ratio of the rate at which witnesses choose the suspect in a target-present lineup to the corresponding rate in a target-absent lineup. Such a ratio could tell us the trade-off between hits and false positives, and be used to evaluate modifications to lineup procedures, for instance.

We can calculate diagnosticity ratio for a set of identification parades using either diag_ratio_W() for Wells & Lindsay’s (1980) diagnosticity ratio formula, or diag_ratio_T() for Wells’ adjusted diagnosticity ratio (see: Tredoux, 1998).
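As a hand calculation of the basic (unadjusted) ratio, the base R sketch below uses hypothetical choice vectors and assumes the suspect stands in position 3 of both lineups. It does not call diag_ratio_W() or diag_ratio_T(), and it does not attempt the adjusted version.

```r
# Hypothetical choices; the suspect stands in position 3 in both lineups
tp_choices <- c(3, 3, 3, 1, 2, 3, 3, 4, 3, 5)  # target-present lineup
ta_choices <- c(1, 2, 3, 4, 5, 6, 3, 2, 1, 4)  # target-absent lineup
suspect_pos <- 3

# Rate of suspect choices in each lineup
tp_rate <- mean(tp_choices == suspect_pos)
ta_rate <- mean(ta_choices == suspect_pos)

# Diagnosticity ratio: suspect choices are this many times more likely
# when the target is present than when the target is absent
diag_ratio <- tp_rate / ta_rate
print(diag_ratio)
```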

Differences between Independent Diagnosticity Ratios

We can compare diagnosticity ratios to other diagnosticity ratios, as detailed in Tredoux (1998). In particular, he showed a method for testing the equivalence of k diagnosticity ratios.

The functions included in this package are based on an approach to calculating homogeneity for k independent diagnosticity ratios, as detailed in Tredoux (1998).

Calculate homogeneity for k independent diagnosticity ratios, using normal theory estimates

The homogeneity function requires a dataframe containing the specific parameters used in the estimation of diagnosticity ratio homogeneity. For each lineup pair (TP and TA), we need to calculate the natural log of the diagnosticity ratio (referred to here as lnd), its variance (var_lnd), and a weight for each ratio that is equal to the inverse of its variance (wi). This is used to calculate a pooled estimator (i.e., the mean of the set of diagnosticity ratios). From this, we compute a chi-square deviate with k-1 degrees of freedom, and calculate its significance.

All these calculations are built into the homog_diag function, and so do not need to be calculated separately. You will need to pass a dataframe of lnd, var_lnd and wi to homog_diag, and all subsequent calculations are performed on this dataframe.
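The chain of calculations described above can be sketched in base R. The counts are hypothetical, and the variance used is the usual large-sample variance of the log of a ratio of two independent proportions; treat that choice as an assumption, since homog_diag() may compute it differently.

```r
# Hypothetical counts for three TP/TA lineup pairs
hits_tp  <- c(30, 25, 28)  # suspect choices, target-present lineups
n_tp     <- c(50, 50, 50)  # witnesses per target-present lineup
false_ta <- c(10, 12, 8)   # suspect choices, target-absent lineups
n_ta     <- c(50, 50, 50)  # witnesses per target-absent lineup

# Natural log of each diagnosticity ratio
lnd <- log((hits_tp / n_tp) / (false_ta / n_ta))

# Assumed large-sample variance of each log ratio
var_lnd <- 1 / hits_tp - 1 / n_tp + 1 / false_ta - 1 / n_ta

# Weights are inverse variances; pooled estimate is the weighted mean
wi <- 1 / var_lnd
pooled_lnd <- sum(wi * lnd) / sum(wi)

# Chi-square deviate with k - 1 degrees of freedom
q_stat <- sum(wi * (lnd - pooled_lnd)^2)
df <- length(lnd) - 1
p_value <- pchisq(q_stat, df = df, lower.tail = FALSE)

print(c(pooled_ratio = exp(pooled_lnd), chi_square = q_stat, p = p_value))
```

A small p-value would suggest that the k diagnosticity ratios are not homogeneous.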

To get the dataframe required by homog_diag(), we take the following steps:

  1. For each lineup pair, load data for TP and TA lineups into separate vectors. Then, make a list containing all vectors for TP lineups, and another containing the TA lineups. The positions of each lineup pair should correspond in each list.

    • For example, if the first element of the TP lineup list is TP lineup data for lineup pair 1, then the first element of the TA lineup list should be the TA lineup data for lineup pair 1.
  2. Next, the function requires a list of target positions for each lineup pair. For each set of TP/TA data in the TP/TA lists, there should be a corresponding target position list.

    • Each element of the target position list indexes the positions of the members of that lineup. Therefore, its length should equal k for that lineup pair.
  3. To ensure the data have been coded accurately, we then specify nominal size for each lineup pair. The order in which nominal size for each lineup pair is listed must also correspond with the positions of each respective lineup in the lineup lists (i.e., if lineup 1 has k = 6, then the first element of vector ‘k’ = 6). Following the example above, we create our nominal size vector k by calling k <- c(6, 5, 4)

  4. Finally, we pass the above arguments/data to homog_diag(), which assesses the homogeneity of k independent diagnosticity ratios.
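The data preparation in the steps above might look like the following base R sketch. The choices are simulated and the object names (tp_list, ta_list, pos_list) are hypothetical; the final call to homog_diag() is left out because its exact argument names should be taken from the function's documentation.

```r
set.seed(1)

# Nominal size for each of three lineup pairs
k <- c(6, 5, 4)
n_witnesses <- 20  # hypothetical number of mock witnesses per lineup

# One vector of simulated choices per lineup, in matching order
tp_list <- lapply(k, function(size) {
  sample(seq_len(size), n_witnesses, replace = TRUE)
})
ta_list <- lapply(k, function(size) {
  sample(seq_len(size), n_witnesses, replace = TRUE)
})

# Target positions: one vector per lineup pair, indexing all k members
pos_list <- lapply(k, seq_len)

# Sanity checks: lists line up, and each position vector has length k
stopifnot(length(tp_list) == length(ta_list))
stopifnot(all(lengths(pos_list) == k))
```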

Calculate homogeneity for k independent diagnosticity ratios, with bootstrapped confidence intervals

This function returns bootstrapped estimates of the homogeneity of k diagnosticity ratios. It takes the same arguments as homog_diag() (which provides normal theory estimates), except that you do not need to specify a list of target positions, as this is generated from the data.

Thus, we follow steps 1 and 3, outlined above, before calling homog_diag_boot():

Note: R = number of bootstrap replications

ROC Curve (Confidence ~ Accuracy)

The make_roc() function allows you to compute and plot an ROC curve for data from an eyewitness experiment, where accuracy is recorded for target present and target absent lineups. We follow the ideas suggested by Mickes, Wixted, and others (see references). This function is experimental, and we don’t offer much control over the output.

  1. This function requires a dataframe with two columns, which must be named confidence and accuracy (where accuracy = binary accuracy):
Confacc data
confidence accuracy
1 1
1 1
1 1
1 1
2 1
2 1
2 1
3 1
#Call roc function
ROC Curve for Confidence ~ Accuracy
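To show the general idea behind a confidence-based ROC curve, here is a base R sketch. It is not the algorithm used by make_roc(): the data are hypothetical, and the sketch assumes that accurate responses play the role of hits and inaccurate responses the role of false alarms.

```r
# Hypothetical data in the two-column format described above
conf_acc <- data.frame(
  confidence = c(1, 1, 2, 2, 3, 3, 3, 1, 1, 2, 2, 3),
  accuracy   = c(1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1)
)

# Thresholds from most to least conservative
thresholds <- sort(unique(conf_acc$confidence), decreasing = TRUE)

correct   <- conf_acc$confidence[conf_acc$accuracy == 1]
incorrect <- conf_acc$confidence[conf_acc$accuracy == 0]

# Cumulative rates of responses at or above each confidence threshold
hit_rate <- vapply(thresholds, function(t) mean(correct >= t), numeric(1))
fa_rate  <- vapply(thresholds, function(t) mean(incorrect >= t), numeric(1))

# Each threshold contributes one point to the curve
plot(c(0, fa_rate), c(0, hit_rate), type = "b",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False alarm rate", ylab = "Hit rate")
```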

Similarity of Faces in a Lineup

Following some ideas suggested by Tredoux (2002), this function computes the degree to which each face in a set of faces loads onto a common factor computed from numeric representations of the faces.

The images used in this example can be found at https://github.com/tmnaylor/malpass_foils, and can be loaded by calling image_read from the magick package:

foil1 <- magick::image_read('https://raw.githubusercontent.com/tmnaylor/malpass_foils/master/malp1.jpg')
  format width height colorspace matte filesize density
1   JPEG   138    154       sRGB FALSE    33506   72x72

Without passing any arguments, call face_sim():

Set of faces is printed to viewer pane

Returns factor loading for each face
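The notion of faces loading onto a common factor can be illustrated with simulated data. The sketch below computes first principal component loadings in base R from random "pixel" vectors that share a common pattern; it is a stand-in for the idea only, and makes no claim about how face_sim() processes images internally.

```r
set.seed(7)
n_pixels <- 500
n_faces <- 6

# Simulated faces: a shared pattern plus face-specific noise,
# one column of pixel intensities per face
shared <- rnorm(n_pixels)
faces <- sapply(seq_len(n_faces), function(i) {
  shared + rnorm(n_pixels, sd = 0.8)
})
colnames(faces) <- paste0("face_", seq_len(n_faces))

# First principal component of the correlation matrix between faces
eig <- eigen(cor(faces))
loadings <- eig$vectors[, 1] * sqrt(eig$values[1])

# The sign of an eigenvector is arbitrary; orient loadings positively
if (sum(loadings) < 0) loadings <- -loadings
names(loadings) <- colnames(faces)
print(round(loadings, 2))
```

A face with a noticeably lower loading than the others would be the one that resembles the rest of the set least.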