Introduction to PHEindicatormethods

Georgina Anderson

2020-04-12

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_smr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
phe_isr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  59 143 0.4125874 0.33521563 0.4945327        95% proportion of 1
#> 2 Area2 2015  38 177 0.2146893 0.16059916 0.2809006        95% proportion of 1
#> 3 Area3 2015  67 192 0.3489583 0.28509984 0.4187422        95% proportion of 1
#> 4 Area4 2015  35 114 0.3070175 0.22979102 0.3968260        95% proportion of 1
#> 5 Area1 2016  31 113 0.2743363 0.20051412 0.3629970        95% proportion of 1
#> 6 Area2 2016  19 174 0.1091954 0.07102821 0.1642457        95% proportion of 1
#> 7 Area3 2016  28 124 0.2258065 0.16110096 0.3069902        95% proportion of 1
#> 8 Area4 2016  55 126 0.4365079 0.35305937 0.5237134        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  59 143 0.4125874 0.29476498 0.5413538      99.8% proportion of 1
#> 2 Area2 2015  38 177 0.2146893 0.13525246 0.3233364      99.8% proportion of 1
#> 3 Area3 2015  67 192 0.3489583 0.25211746 0.4601121      99.8% proportion of 1
#> 4 Area4 2015  35 114 0.3070175 0.19283219 0.4510353      99.8% proportion of 1
#> 5 Area1 2016  31 113 0.2743363 0.16613541 0.4177063      99.8% proportion of 1
#> 6 Area2 2016  19 174 0.1091954 0.05554019 0.2035154      99.8% proportion of 1
#> 7 Area3 2016  28 124 0.2258065 0.13190131 0.3589243      99.8% proportion of 1
#> 8 Area4 2016  55 126 0.4365079 0.30926677 0.5726952      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  59 143 41.25874 33.521563 49.45327        95% percentage Wilson
#> 2 Area2 2015  38 177 21.46893 16.059916 28.09006        95% percentage Wilson
#> 3 Area3 2015  67 192 34.89583 28.509984 41.87422        95% percentage Wilson
#> 4 Area4 2015  35 114 30.70175 22.979102 39.68260        95% percentage Wilson
#> 5 Area1 2016  31 113 27.43363 20.051412 36.29970        95% percentage Wilson
#> 6 Area2 2016  19 174 10.91954  7.102821 16.42457        95% percentage Wilson
#> 7 Area3 2016  28 124 22.58065 16.110096 30.69902        95% percentage Wilson
#> 8 Area4 2016  55 126 43.65079 35.305937 52.37134        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  59 143 41.25874 29.476498 54.13538      99.8% percentage Wilson
#> 2 Area2 2015  38 177 21.46893 13.525246 32.33364      99.8% percentage Wilson
#> 3 Area3 2015  67 192 34.89583 25.211746 46.01121      99.8% percentage Wilson
#> 4 Area4 2015  35 114 30.70175 19.283219 45.10353      99.8% percentage Wilson
#> 5 Area1 2016  31 113 27.43363 16.613541 41.77063      99.8% percentage Wilson
#> 6 Area2 2016  19 174 10.91954  5.554019 20.35154      99.8% percentage Wilson
#> 7 Area3 2016  28 124 22.58065 13.190131 35.89243      99.8% percentage Wilson
#> 8 Area4 2016  55 126 43.65079 30.926677 57.26952      99.8% percentage Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  59 143 41.25874 29.476498 54.13538
#> 2 Area2 2015  38 177 21.46893 13.525246 32.33364
#> 3 Area3 2015  67 192 34.89583 25.211746 46.01121
#> 4 Area4 2015  35 114 30.70175 19.283219 45.10353
#> 5 Area1 2016  31 113 27.43363 16.613541 41.77063
#> 6 Area2 2016  19 174 10.91954  5.554019 20.35154
#> 7 Area3 2016  28 124 22.58065 13.190131 35.89243
#> 8 Area4 2016  55 126 43.65079 30.926677 57.26952

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value   lowercl  uppercl confidence       statistic
#> 1 Area1 2015  59 143 41258.74 31406.256 53221.70        95% rate per 100000
#> 2 Area2 2015  38 177 21468.93 15190.773 29468.61        95% rate per 100000
#> 3 Area3 2015  67 192 34895.83 27042.544 44317.15        95% rate per 100000
#> 4 Area4 2015  35 114 30701.75 21381.764 42700.01        95% rate per 100000
#> 5 Area1 2016  31 113 27433.63 18636.408 38941.18        95% rate per 100000
#> 6 Area2 2016  19 174 10919.54  6571.253 17053.07        95% rate per 100000
#> 7 Area3 2016  28 124 22580.65 15001.360 32636.54        95% rate per 100000
#> 8 Area4 2016  55 126 43650.79 32881.608 56818.59        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl confidence    statistic method
#> 1 Area1 2015  59 143 41.25874 26.611873 60.72303      99.8% rate per 100  Byars
#> 2 Area2 2015  38 177 21.46893 12.274562 34.57934      99.8% rate per 100  Byars
#> 3 Area3 2015  67 192 34.89583 23.177487 50.19647      99.8% rate per 100  Byars
#> 4 Area4 2015  35 114 30.70175 17.096388 50.39502      99.8% rate per 100  Byars
#> 5 Area1 2016  31 113 27.43363 14.655538 46.36546      99.8% rate per 100  Byars
#> 6 Area2 2016  19 174 10.91954  4.752588 21.11794      99.8% rate per 100  Byars
#> 7 Area3 2016  28 124 22.58065 11.620588 39.15808      99.8% rate per 100  Byars
#> 8 Area4 2016  55 126 43.65079 27.675745 65.09821      99.8% rate per 100  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  59 143 41.25874 26.611873 60.72303
#> 2 Area2 2015  38 177 21.46893 12.274562 34.57934
#> 3 Area3 2015  67 192 34.89583 23.177487 50.19647
#> 4 Area4 2015  35 114 30.70175 17.096388 50.39502
#> 5 Area1 2016  31 113 27.43363 14.655538 46.36546
#> 6 Area2 2016  19 174 10.91954  4.752588 21.11794
#> 7 Area3 2016  28 124 22.58065 11.620588 39.15808
#> 8 Area4 2016  55 126 43.65079 27.675745 65.09821

These functions can also return aggregate data if the input dataframes are grouped:



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1671    278507  541.    513.    569. 95%       
#>  2 Area1  2006 Male         1696    297231  514.    487.    542. 95%       
#>  3 Area1  2007 Fema~        1880    307114  607.    577.    638. 95%       
#>  4 Area1  2007 Male         1726    290219  585.    556.    615. 95%       
#>  5 Area1  2008 Fema~        1780    305196  601.    571.    632. 95%       
#>  6 Area1  2008 Male         2091    309328  709.    678.    742. 95%       
#>  7 Area1  2009 Fema~        2036    286227  816.    780.    854. 95%       
#>  8 Area1  2009 Male         1981    293111  680.    649.    712. 95%       
#>  9 Area1  2010 Fema~        2287    289740  878.    839.    917. 95%       
#> 10 Area1  2010 Male         1448    292873  523.    495.    551. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups:   area, year [20]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <fct> <int> <fct>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1671    278507  541.    513.    569.
#>  2 Area1  2006 Male          1696    297231  514.    487.    542.
#>  3 Area1  2007 Female        1880    307114  607.    577.    638.
#>  4 Area1  2007 Male          1726    290219  585.    556.    615.
#>  5 Area1  2008 Female        1780    305196  601.    571.    632.
#>  6 Area1  2008 Male          2091    309328  709.    678.    742.
#>  7 Area1  2009 Female        2036    286227  816.    780.    854.
#>  8 Area1  2009 Male          1981    293111  680.    649.    712.
#>  9 Area1  2010 Female        2287    289740  878.    839.    917.
#> 10 Area1  2010 Male          1448    292873  523.    495.    551.
#> # ... with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1671    278507  541.    513.    569. 95%       
#>  2 Area1  2006 Male         1696    297231  514.    487.    542. 95%       
#>  3 Area1  2007 Fema~        1880    307114  607.    577.    638. 95%       
#>  4 Area1  2007 Male         1726    290219  585.    556.    615. 95%       
#>  5 Area1  2008 Fema~        1780    305196  601.    571.    632. 95%       
#>  6 Area1  2008 Male         2091    309328  709.    678.    742. 95%       
#>  7 Area1  2009 Fema~        2036    286227  816.    780.    854. 95%       
#>  8 Area1  2009 Male         1981    293111  680.    649.    712. 95%       
#>  9 Area1  2010 Fema~        2287    289740  878.    839.    917. 95%       
#> 10 Area1  2010 Male         1448    292873  523.    495.    551. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1671    278507  541.    513.    569. 95%       
#>  2 Area1  2006 Male         1696    297231  514.    487.    542. 95%       
#>  3 Area1  2007 Fema~        1880    307114  607.    577.    638. 95%       
#>  4 Area1  2007 Male         1726    290219  585.    556.    615. 95%       
#>  5 Area1  2008 Fema~        1780    305196  601.    571.    632. 95%       
#>  6 Area1  2008 Male         2091    309328  709.    678.    742. 95%       
#>  7 Area1  2009 Fema~        2036    286227  816.    780.    854. 95%       
#>  8 Area1  2009 Male         1981    293111  680.    649.    712. 95%       
#>  9 Area1  2010 Fema~        2287    289740  878.    839.    917. 95%       
#> 10 Area1  2010 Male         1448    292873  523.    495.    551. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1206    217417  530.    500.    561. 95%       
#>  2 Area1  2006 Male         1162    224515  511.    482.    542. 95%       
#>  3 Area1  2007 Fema~        1313    233772  591.    559.    624. 95%       
#>  4 Area1  2007 Male         1217    229816  544.    513.    576. 95%       
#>  5 Area1  2008 Fema~        1433    248103  613.    581.    647. 95%       
#>  6 Area1  2008 Male         1798    248580  728.    694.    763. 95%       
#>  7 Area1  2009 Fema~        1809    233792  831.    791.    871. 95%       
#>  8 Area1  2009 Male         1472    238309  639.    606.    673. 95%       
#>  9 Area1  2010 Fema~        1869    227840  895.    853.    938. 95%       
#> 10 Area1  2010 Male         1220    239605  525.    496.    556. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(area, year) %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 x 10
#> # Groups:   area [4]
#>    area   year total_count total_pop value lowercl uppercl confidence statistic
#>    <fct> <int>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#>  1 Area1  2006        3367    575738  544.    524.    565. 95%        dsr per ~
#>  2 Area1  2007        3606    597333  581.    560.    602. 95%        dsr per ~
#>  3 Area1  2008        3871    614524  638.    617.    659. 95%        dsr per ~
#>  4 Area1  2009        4017    579338  726.    703.    750. 95%        dsr per ~
#>  5 Area1  2010        3735    582613  682.    659.    705. 95%        dsr per ~
#>  6 Area2  2006        3879    566640  692.    669.    716. 95%        dsr per ~
#>  7 Area2  2007        3793    565429  651.    628.    673. 95%        dsr per ~
#>  8 Area2  2008        3327    564452  596.    575.    618. 95%        dsr per ~
#>  9 Area2  2009        3972    548964  726.    702.    750. 95%        dsr per ~
#> 10 Area2  2010        3642    586793  606.    585.    627. 95%        dsr per ~
#> 11 Area3  2006        3370    550080  623.    601.    645. 95%        dsr per ~
#> 12 Area3  2007        3710    561789  648.    626.    671. 95%        dsr per ~
#> 13 Area3  2008        3816    567972  649.    627.    672. 95%        dsr per ~
#> 14 Area3  2009        4265    571561  753.    729.    777. 95%        dsr per ~
#> 15 Area3  2010        3997    582175  677.    655.    700. 95%        dsr per ~
#> 16 Area4  2006        3743    593992  641.    619.    663. 95%        dsr per ~
#> 17 Area4  2007        4148    573292  739.    715.    763. 95%        dsr per ~
#> 18 Area4  2008        4133    558561  755.    731.    779. 95%        dsr per ~
#> 19 Area4  2009        3681    575834  646.    624.    669. 95%        dsr per ~
#> 20 Area4  2010        3865    562536  665.    643.    688. 95%        dsr per ~
#> # ... with 1 more variable: method <chr>

Execute phe_smr and phe_isr

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the phe_smr and phe_isr functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

Here are some example code chunks to demonstrate the phe_smr function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1671    1748. 0.956   0.911   1.00  95%       
#>  2 Area1  2006 Male      1696    1870. 0.907   0.864   0.951 95%       
#>  3 Area1  2007 Fema~     1880    1960. 0.959   0.916   1.00  95%       
#>  4 Area1  2007 Male      1726    1815. 0.951   0.907   0.997 95%       
#>  5 Area1  2008 Fema~     1780    1901. 0.936   0.893   0.981 95%       
#>  6 Area1  2008 Male      2091    1907. 1.10    1.05    1.14  95%       
#>  7 Area1  2009 Fema~     2036    1790. 1.14    1.09    1.19  95%       
#>  8 Area1  2009 Male      1981    1836. 1.08    1.03    1.13  95%       
#>  9 Area1  2010 Fema~     2287    1803. 1.27    1.22    1.32  95%       
#> 10 Area1  2010 Male      1448    1810. 0.800   0.759   0.842 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1671    1748. 0.956   0.911   1.00  95%       
#>  2 Area1  2006 Male      1696    1870. 0.907   0.864   0.951 95%       
#>  3 Area1  2007 Fema~     1880    1960. 0.959   0.916   1.00  95%       
#>  4 Area1  2007 Male      1726    1815. 0.951   0.907   0.997 95%       
#>  5 Area1  2008 Fema~     1780    1901. 0.936   0.893   0.981 95%       
#>  6 Area1  2008 Male      2091    1907. 1.10    1.05    1.14  95%       
#>  7 Area1  2009 Fema~     2036    1790. 1.14    1.09    1.19  95%       
#>  8 Area1  2009 Male      1981    1836. 1.08    1.03    1.13  95%       
#>  9 Area1  2010 Fema~     2287    1803. 1.27    1.22    1.32  95%       
#> 10 Area1  2010 Male      1448    1810. 0.800   0.759   0.842 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    14359   14359   1      0.984    1.02
#> 2  2007    15257   14495.  1.05   1.04     1.07
#> 3  2008    15147   14457.  1.05   1.03     1.06
#> 4  2009    15935   14250.  1.12   1.10     1.14
#> 5  2010    15239   14422.  1.06   1.04     1.07

The phe_isr function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1671    1748.     628.  600.    572.    630. 95%       
#>  2 Area1  2006 Male      1696    1870.     628.  570.    543.    597. 95%       
#>  3 Area1  2007 Fema~     1880    1960.     628.  602.    575.    630. 95%       
#>  4 Area1  2007 Male      1726    1815.     628.  597.    569.    626. 95%       
#>  5 Area1  2008 Fema~     1780    1901.     628.  588.    561.    616. 95%       
#>  6 Area1  2008 Male      2091    1907.     628.  689.    659.    719. 95%       
#>  7 Area1  2009 Fema~     2036    1790.     628.  714.    684.    746. 95%       
#>  8 Area1  2009 Male      1981    1836.     628.  677.    648.    708. 95%       
#>  9 Area1  2010 Fema~     2287    1803.     628.  797.    764.    830. 95%       
#> 10 Area1  2010 Male      1448    1810.     628.  502.    477.    529. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     1671    1748.     628.  600.    572.    630. 95%       
#>  2 Area1  2006 Male      1696    1870.     628.  570.    543.    597. 95%       
#>  3 Area1  2007 Fema~     1880    1960.     628.  602.    575.    630. 95%       
#>  4 Area1  2007 Male      1726    1815.     628.  597.    569.    626. 95%       
#>  5 Area1  2008 Fema~     1780    1901.     628.  588.    561.    616. 95%       
#>  6 Area1  2008 Male      2091    1907.     628.  689.    659.    719. 95%       
#>  7 Area1  2009 Fema~     2036    1790.     628.  714.    684.    746. 95%       
#>  8 Area1  2009 Male      1981    1836.     628.  677.    648.    708. 95%       
#>  9 Area1  2010 Fema~     2287    1803.     628.  797.    764.    830. 95%       
#> 10 Area1  2010 Male      1448    1810.     628.  502.    477.    529. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    14359   14359      628.  628.    618.    638.
#> 2  2007    15257   14495.     628.  661.    651.    672.
#> 3  2008    15147   14457.     628.  658.    648.    669.
#> 4  2009    15935   14250.     628.  702.    691.    713.
#> 5  2010    15239   14422.     628.  664.    653.    674.