In this vignette, we will use the `lathyrus`

dataset to
illustrate the estimation of **integral projection models
(IPMs)**. To reduce output size, we have prevented some
statements from running if they produce long stretches of output.
Examples include most `summary()`

calls. In these cases, we
include hashtagged versions of these calls, and our text assumes that
the user runs these statements without hashtags. We have also written
this vignette using the default Ehrlén format as the output format for
all historical MPMs. All examples work with deVries format as well. This
vignette is only a sample analysis. Detailed information and
instructions on using `lefko3`

are available through a free
online e-book called *lefko3: a gentle introduction*, available
on the projects
page of the Shefferson lab website. This website also includes links
to long-format vignettes and other demonstrations of this package.

*Lathyrus vernus* (family Fabaceae) is a long-lived forest herb,
native to Europe and large parts of northern Asia. Individuals grow
larger slowly and usually flower only after 10-15 years of vegetative
growth. Flowering individuals have an average conditional lifespan of
44.3 years (Ehrlén and Lehtila 2002).
*L. vernus* lacks organs for vegetative spread and individuals
are well delimited (Ehrlén 2002). One or
several erect shoots of up to 40 cm height emerge from a subterranean
rhizome in March and April. Flowering occurs about four weeks after
shoot emergence. Shoot growth is determinate, and the number of flowers
is determined in the previous year (Ehrlén and
Van Groenendael 2001). Individuals sometimes do not produce
aboveground structures every year, instead remaining dormant in one or
more seasons. *L. vernus* is self-compatible but requires visits
from bumble-bees to produce seeds. Individuals produce few, large seeds
and establishment from seeds is relatively frequent (Ehrlén and Eriksson 1996). The pre-dispersal
seed predator *Bruchus atomarius* often consumes a large fraction
of developing seeds, and roe deer (*Capreolus capreolus*)
sometimes consume the shoots (Ehrlén and
Munzbergova 2009).

Data for this study were collected from six permanent plots in a
population of *L. vernus* located in a deciduous forest in the
Tullgarn area in southeastern Sweden (58.9496 N, 17.6097 E), during
1988–1991 (Ehrlén 1995). The six plots
were similar with regard to soil type, elevation, slope, and canopy
cover. Within each plot, all individuals were marked with numbered tags
that remained over the study period, and their locations were mapped.
New individuals were included in the study in each year. Individuals
were recorded at least three times every growing season. At the time of
shoot emergence, we recorded whether individuals were alive and produced
above-ground shoots, and if shoots had been grazed. During flowering, we
recorded flower number and the height and diameter of all shoots. At
fruit maturation, we counted the number of intact and damaged seeds. To
derive a measure of above-ground size for each individual, we calculated
the volume of each shoot as \(\pi ×
(\frac{1}{2} diameter)^2 × height\), and summed the volumes of
all shoots. This measure is strongly correlated with the dry mass of
aboveground tissues (\(R^2 = 0.924\),
\(P < 0.001\), \(n = 50\), log-transformed values; Ehrlén
1995). Size of individuals that had been grazed was estimated based on
measures of shoot diameter in grazed shoots, and the relationship
between shoot diameter and shoot height in non-grazed individuals. Only
individuals with an aboveground volume of more than 230 mm^{3}
flowered and produced fruits during this study. Individuals that lacked
aboveground structures in one season but reappeared in the following
year were considered dormant. Individuals that lacked aboveground
structures in two subsequent seasons were considered dead from the year
in which they first lacked aboveground structures. Probabilities of
seeds surviving to the next year, and of being present as seedlings or
seeds in the soil seed bank, were derived from separate yearly sowing
experiments in separate plots adjacent to each subplot (Ehrlén and Eriksson 1996).

An IPM is a kind of function-based MPM in which transitions are modeled
on a continuous state variable rather than discrete stages (Ellner and Rees 2006). A size metric is often
used as this continuous state variable. In practice, all size-classified
matrix projection models including IPMs require discretized size
classes. So, although size is modeled on the Gaussian distribution, the
actual stages developed break size up into many fine-scale, equally
sized classes or bins. Although the number of these bins varies from
analysis to analysis, package `lefko3`

uses a default of 100
(this can be changed as an option). Because vital rates are modeled
rather than directly calculated from the data, the number of individuals
moving through any particular size class on any particular occasion does
not need to be considered in determining stage boundaries (as they would
be in raw MPM estimation). IPMs are often *complex*, meaning that
they include some life history stages that fall outside of the size
classifications developed for the matrices, such as dormant seeds or
juveniles.

The dataset that we have provided is organized in horizontal format,
meaning that each row holds all of the data for a single, unique
individual, and columns correspond to individual condition in particular
monitoring occasions (which we refer to as *years* here, since
there was one main census in each year). The original spreadsheet file
used to keep the dataset has a repeating pattern to these columns, with
each year having a similarly arranged group of variables. Let’s load the
dataset.

```
data(lathyrus)
#summary(lathyrus)
```

This dataset includes information on 1,119 individuals, so there are
1,119 rows with data. There are 38 columns. The first two columns give
identifying information about each individual (`SUBPLOT`

refers to the patch, and `GENET`

refers to individual
identity), with each individual’s data entirely restricted to one row.
This is followed by four sets of nine columns, each named
`VolumeXX`

, `lnVolXX`

, `FCODEXX`

,
`FlowXX`

, `IntactseedXX`

, `Dead19XX`

,
`DormantXX`

, `Missing19XX`

, and
`SeedlingXX`

, where `XX`

corresponds to the year
of observation and with years organized consecutively. Thus, columns
3-11 refer to year 1988, columns 12-20 refer to year 1989, etc. This
strictly repeated pattern allows us to manipulate the original dataset
quickly and efficiently. We should know the number of years used, which
is 4 years here, and we should also have arranged the columns in the
same order for each year, with years in consecutive order with no extra
columns between them. This order is not required, provided that we are
willing to input all variable names in the correct order when
transforming the dataset later.

To begin, we will create a **stageframe** for this dataset.
A stageframe is a data frame that describes all stages in the life
history of the organism, in a way usable by the functions in this
package and using stage names and classifications that completely match
those used in the dataset. It must include complete descriptions of all
stages that occur in the dataset, with each stage defined uniquely.
Since this object can be used for automated classification of
individuals, all sizes, reproductive states, and other characteristics
defining each stage in the dataset need to be accounted for explicitly.
This can be difficult if a few data points exist outside the range of
sizes specified in the stageframe, so great care must be taken to
include all relevant size values and values of other descriptor
variables occurring within the dataset. The final description of each
stage occurring in the dataset must not completely overlap with any
other stage, although partial overlap is allowed. We will base our
stageframe on the life history model provided in Ehrlén (2000), but use a different size
classification based on leaf volume to allow IPM construction and make
all mature stages other than vegetative dormancy reproductive, as shown
in Figure 6.1 below.

Figure 6.1. Life history model of *Lathyrus vernus*. Not all
adult classes are shown. Survival transitions are indicated with solid
arrows, while fecundity transitions are indicated with dashed
arrows.

In the stageframe code below, we show that we want an IPM by choosing
two stages that serve as the size limits for IPM size classification.
These two size classes should have exactly the same characteristics in
the stageframe **other than size**. By choosing these two
size limits, we can skip adding and describing the many size classes
that will fall between these limits - function `sf_create()`

will create all of these for us. We mark these limits in the vector that
we load into the `stagenames`

option using the string
`ipm`

. Package `lefko3`

will then create and name
all IPM size classes according to its own conventions. The default
number of size classes is 100 bins, which can be changed using the
`ipmbins`

option.

```
<- c(0, 100, 0, 1, 7100)
sizevector <- c("Sd", "Sdl", "Dorm", "ipm", "ipm")
stagevector <- c(0, 0, 0, 1, 1)
repvector <- c(0, 1, 0, 1, 1)
obsvector <- c(0, 0, 1, 1, 1)
matvector <- c(1, 1, 0, 0, 0)
immvector <- c(1, 0, 0, 0, 0)
propvector <- c(0, 1, 1, 1, 1)
indataset <- c(0, 100, 0.5, 1, 1)
binvec <- c("Dormant seed", "Seedling", "Dormant", "ipm adult stage", "ipm adult stage")
comments <- sf_create(sizes = sizevector, stagenames = stagevector,
lathframeipm repstatus = repvector, obsstatus = obsvector, propstatus = propvector,
immstatus = immvector, matstatus = matvector, comments = comments,
indataset = indataset, binhalfwidth = binvec, ipmbins = 100, roundsize = 3)
#lathframeipm
```

This stageframe has 103 stages. The IPM portion technically starts with
the fourth stage and keeps going through the 103rd stage. Stage names
within this range are concatenations of the size centroid (designated
with `sz`

), and, given a limited string length, the
reproductive status, maturity status, and observation status. The first
three stages, which fall outside of the IPM classification, are left
unaltered.

To work with this dataset, we first need to format the data into
*vertical format*, in which each row corresponds to the state of
a single individual in two (if ahistorical) or three (if historical)
consecutive time intervals. Because this is an IPM, we will need to
estimate linear models of vital rates. This will require us to avoid NAs
in size and fecundity (fortunately, in this dataset, NA is equivalent to
0), so we will set `NAas0 = TRUE`

to inform R to treat NAs as
zeroes. We will also set `NRasRep = TRUE`

because we will
assume that all adult stages other than dormancy are reproductive, and
there are mature individuals in the dataset that do not reproduce but
need to be included in reproductive stages (setting this option to
`TRUE`

makes sure that the reproductive status of
non-reproductive individuals in potentially reproductive stages is set
to 1, although the actual fecundity is not altered). Finally, we will
ignore patches marked in the dataset and estimate matrices only for the
full population, in order to preserve statistical power for vital rate
modeling in historical IPM analysis.

In the input to `verticalize3()`

below, we utilize a
repeating pattern of variable names arranged in the same order for each
monitoring occasion. This arrangement allows us to enter only the first
variable in each set, as long as `noyears`

and
`blocksize`

are set properly and no gaps or shuffles appear
in the dataset. The data management functions that we have created for
`lefko3`

do not require such repeating patterns, but they do
make the required input in the function much shorter and more succinct.
To see a more detailed summary of the standardized dataset, please
remove `full = FALSE`

from the `summary_hfv()`

call.

Note that prior to standardizing the dataset, we will create a new variable to code individual identity, since different plants in different subpopulations use the same identifiers.

```
$indiv_id <- paste(lathyrus$SUBPLOT, lathyrus$GENET)
lathyrus
<- verticalize3(lathyrus, noyears = 4, firstyear = 1988,
lathvertipm individcol = "indiv_id", blocksize = 9, juvcol = "Seedling1988",
sizeacol = "Volume88", repstracol = "FCODE88", fecacol = "Intactseed88",
deadacol = "Dead1988", nonobsacol = "Dormant1988", stageassign = lathframeipm,
stagesize = "sizea", censorcol = "Missing1988", censorkeep = NA,
censorRepeat = TRUE, censor = TRUE, NAas0 = TRUE, NRasRep = TRUE)
summary_hfv(lathvertipm, full = FALSE)
#>
#> This hfv dataset contains 2527 rows, 42 variables, 1 population,
#> 1 patch, 1053 individuals, and 3 time steps.
```

Before we move on to the next steps in analysis, let’s take a closer look at fecundity. In this dataset, fecundity is mostly a count of intact seeds, and only differs in six cases where the seed output was estimated based on other models. To see this, try the following code.

```
writeLines(paste0("Length of fecundity variable in t+1: ", length(lathvertipm$feca3),
", and # non-integer entries: ", length(which(lathvertipm$feca3 !=
round(lathvertipm$feca3)))))
#> Length of fecundity variable in t+1: 2527, and # non-integer entries: 0
writeLines(paste0("Length of fecundity variable in t: ", length(lathvertipm$feca2),
", and # non-integer entries: ", length(which(lathvertipm$feca2 !=
round(lathvertipm$feca2)))))
#> Length of fecundity variable in t: 2527, and # non-integer entries: 6
writeLines(paste0("Length of fecundity variable in t-1: ", length(lathvertipm$feca1),
", and # non-integer entries: ", length(which(lathvertipm$feca1 !=
round(lathvertipm$feca1)))))
#> Length of fecundity variable in t-1: 2527, and # non-integer entries: 6
```

We see that we have quite a bit of fecundity data, and that it is overwhelmingly but not exclusively integer. So, we can either treat fecundity as a continuous variable, or round the values and treat fecundity as a count variable. We will choose the latter approach in this analysis.

```
$feca3 <- round(lathvertipm$feca3)
lathvertipm$feca2 <- round(lathvertipm$feca2)
lathvertipm$feca1 <- round(lathvertipm$feca1) lathvertipm
```

Although we wish to treat fecundity as a count, it is still not clear what underlying distribution we should use. This package currently allows eight choices: Gaussian, Gamma, Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, zero-truncated Poisson, and zero-truncated negative binomial. To assess which to use, we should first assess whether the mean and variance of the count are equal using a dispersion test. The Poisson distribution assumes that the mean and variance are equal, and so we can test this assumption using a chi-squared test. If it is not significantly different, then we may use some variant of the Poisson distribution. If the data are significantly over- or under-dispersed, then we should use the negative binomial distribution. If fecundity of 0 is possible in reproductive stages, as in cases where reproductive status is defined by flowering rather than by offspring production, then we should also test whether the number of zeroes is significantly greater than expected under these distributions, and use a zero-inflated distribution if so (if fecundity does not equal 0 in any reproductive individuals at all, then we should use a zero-truncated distribution).

Let’s start off by looking at a plot of the distribution of fecundity.

```
hist(subset(lathvertipm, repstatus2 == 1)$feca2, main = "Fecundity",
xlab = "Intact seeds produced in occasion t")
```

We see that the distribution conforms to a classic count variable with a
very low mean value. The first bar suggests that there may be too many
zeroes, as well. Let’s now formally test our assumptions. We will use
the `hfv_qc()`

function, which will allow us to assess the
quality of the data from the viewpoint of the linear models to be built
later. We will use most of the input from the `modelsearch()`

call that we will conduct later.

```
hfv_qc(data = lathvertipm, vitalrates = c("surv", "obs", "size", "fec"),
juvestimate = "Sdl", indiv = "individ", year = "year2", age = "obsage")
#> Survival:
#>
#> Data subset has 43 variables and 2246 transitions.
#>
#> Variable alive3 has 0 missing values.
#> Variable alive3 is a binomial variable.
#>
#>
#> Observation status:
#>
#> Data subset has 43 variables and 2121 transitions.
#>
#> Variable obsstatus3 has 0 missing values.
#> Variable obsstatus3 is a binomial variable.
#>
#>
#> Primary size:
#>
#> Data subset has 43 variables and 1916 transitions.
#>
#> Variable sizea3 has 0 missing values.
#> Variable sizea3 appears to be a floating point variable.
#> 1256 elements are not integers.
#> The minimum value of sizea3 is 3.4 and the maximum is 6646.
#> The mean value of sizea3 is 512.8 and the variance is 507200.
#> The value of the Shapiro-Wilk test of normality is 0.7134 with P = 3.014e-49.
#> Variable sizea3 differs significantly from a Gaussian distribution.
#>
#> Variable sizea3 is fully positive, lacking even 0s.
#>
#>
#> Reproductive status:
#>
#> Data subset has 43 variables and 1916 transitions.
#>
#> Variable repstatus3 has 0 missing values.
#> Variable repstatus3 is a binomial variable.
#>
#>
#> Fecundity:
#>
#> Data subset has 43 variables and 2246 transitions.
#>
#> Variable feca2 has 0 missing values.
#> Variable feca2 appears to be an integer variable.
#>
#> Variable feca2 is fully non-negative.
#>
#> Overdispersion test:
#> Mean feca2 is 1.282
#> The variance in feca2 is 23.21
#> The probability of this dispersion level by chance assuming that
#> the true mean feca2 = variance in feca2,
#> and an alternative hypothesis of overdispersion, is 0
#> Variable feca2 is significantly overdispersed.
#>
#> Zero-inflation and truncation tests:
#> Mean lambda in feca2 is 0.2774
#> The actual number of 0s in feca2 is 1980
#> The expected number of 0s in feca2 under the null hypothesis is 623
#> The probability of this deviation in 0s from expectation by chance is 0
#> Variable feca2 is significantly zero-inflated.
#>
#>
#> Juvenile survival:
#>
#> Data subset has 43 variables and 281 transitions.
#>
#> Variable alive3 has 0 missing values.
#> Variable alive3 is a binomial variable.
#>
#>
#> Juvenile observation status:
#>
#> Data subset has 43 variables and 210 transitions.
#>
#> Variable obsstatus3 has 0 missing values.
#> Variable obsstatus3 is a binomial variable.
#>
#>
#> Juvenile primary size:
#>
#> Data subset has 43 variables and 193 transitions.
#>
#> Variable sizea3 has 0 missing values.
#> Variable sizea3 appears to be a floating point variable.
#> 127 elements are not integers.
#> The minimum value of sizea3 is 2.1 and the maximum is 61.
#> The mean value of sizea3 is 11.23 and the variance is 50.81.
#> The value of the Shapiro-Wilk test of normality is 0.5997 with P = 5.72e-21.
#> Variable sizea3 differs significantly from a Gaussian distribution.
#>
#> Variable sizea3 is fully positive, lacking even 0s.
#>
#>
#> Juvenile reproductive status:
#>
#> Data subset has 43 variables and 193 transitions.
#>
#> Variable repstatus3 has 0 missing values.
#> Variable repstatus3 is a binomial variable.
#>
#>
#> Juvenile maturity status:
#>
#> Data subset has 43 variables and 210 transitions.
#>
#> Variable matstatus3 has 0 missing values.
#> Variable matstatus3 is a binomial variable.
```

All of the probability variables look like binomials, so we have no problem there. Size and juvenile size appear to be continuous variables that differ significantly from the assumptions of the Gaussian distribution. However, we will still use a Gaussian distribution in this case, for instructional purposes. Fecundity is a count variable with significant overdispersion and significantly more zeros than expected, so we should use a zero-inflated negative binomial distribution.

Now we will create **supplement tables**, which provide
extra data for matrix estimation that is not included in the main
demographic dataset. Specifically, we will provide the seed dormancy
probability and germination rate, which are given as transitions from
the dormant seed stage to another year of seed dormancy or to the
germinated seedling stage, respectively. We assume that the germination
rate is the same regardless of whether seed was produced in the previous
year or has been in the seedbank for longer. We will incorporate these
terms both as fixed constants for specific transitions within the
resulting matrices, and as multipliers for fecundity, since ultimately
fecundity will be estimated as the production of seed multiplied by the
seed germination rate or the seed dormancy rate. Because some
individuals stay in the seedling stage for only 1 year, and the seed
stage itself cannot be observed and so does not exist in the dataset, we
will also set a proxy set of transitions so that R assumes that the
transitions from seed in occasion *t*-1 to seedling in occasion
*t* to all mature stages in occasion *t*+1 are equal to
the equivalent transitions from seedling in both occasions *t*-1
and *t*. We will start with the ahistorical case, and then move
on to the historical case, where we also need to input the corresponding
stages in occasion *t*-1 and transition types from occasion
*t*-1 to *t* for each transition.

```
<- supplemental(stage3 = c("Sd", "Sdl", "Sd", "Sdl"),
lathsupp2 stage2 = c("Sd", "Sd", "rep", "rep"),
givenrate = c(0.345, 0.054, NA, NA),
multiplier = c(NA, NA, 0.345, 0.054),
type = c(1, 1, 3, 3), stageframe = lathframeipm, historical = FALSE)
<- supplemental(stage3 = c("Sd", "Sd", "Sdl", "Sdl", "npr", "Sd", "Sdl"),
lathsupp3 stage2 = c("Sd", "Sd", "Sd", "Sd", "Sdl", "rep", "rep"),
stage1 = c("Sd", "rep", "Sd", "rep", "Sd", "mat", "mat"),
eststage3 = c(NA, NA, NA, NA, "npr", NA, NA),
eststage2 = c(NA, NA, NA, NA, "Sdl", NA, NA),
eststage1 = c(NA, NA, NA, NA, "Sdl", NA, NA),
givenrate = c(0.345, 0.345, 0.054, 0.054, NA, NA, NA),
multiplier = c(NA, NA, NA, NA, NA, 0.345, 0.054),
type = c(1, 1, 1, 1, 1, 3, 3), type_t12 = c(1, 2, 1, 2, 1, 1, 1),
stageframe = lathframeipm, historical = TRUE)
#lathsupp2
#lathsupp3
```

These supplement tables provide the best means of adding external data
to our MPMs because they allow both specific transitions to be isolated,
and because they allow the use of shorthand to identify large groups of
transitions (e.g. using `mat`

, `immat`

,
`rep`

, `nrep`

, `prop`

,
`npr`

, `obs`

, `nobs`

,
`groupX`

, or `all`

to signify all mature stages,
immature stages, reproductive stages, non-reproductive stages, propagule
stages, non-propagule stages, observable stages, unobservable stages,
stages within group `X`

, or simply all stages, respectively).
They also allow more powerful changes to MPMs, such as block reset of
groups of transitions to zero or other constants.

Integral projection models (IPMs) require functions of vital rates to
populate them. Here, we will develop these functions as linear models
using `modelsearch()`

. This function automates several
crucial and complex tasks in MPM analysis. Specifically, it automates 1)
the building of global models for each vital rate requested, 2) the
exhaustive construction of all reduced models, and 3) the selection of
the best-fit models. Let’s look at some of the options that we will
utilize for this purpose (please note that this list includes only some
of the options actually offered by the function - further options are
shown in the documentation for `modelsearch()`

, and further
theoretical details are shown in the
`Basic theory and concepts`

vignette):

**historical**: Setting this to `TRUE`

or
`FALSE`

indicates to include state in occasion *t*-1
in the global models.

**approach**: Designates whether to use generalized
linear models (`glm`

), in which all factors are treated as
fixed, or mixed effects models (`mixed`

), in which factors
are treated as either fixed or random (most notably, occasion, patch,
and individual identity can be treated as random).

**suite**: Designates which factors to include in the
global model. Possible values include `size`

, which includes
size only; `rep`

, which includes reproductive status only;
`main`

, which includes both size and reproductive status as
main effects only; `full`

, which includes both size and
reproductive status and all two-way interactions; and `cons`

,
which includes only an intercept. These factors are in addition to
individual identity, patch, and occasion.

**vitalrates**: Designates which vital rates to model.
The default is to model survival, size, and fecundity, but users can
also model observation status and reproduction status. Fewer vital rates
may also be chosen.

**juvestimate**: Designates the name of the juvenile
stage transitioning to maturity, in cases where the dataset includes
data on juveniles.

**juvsize**: Denotes whether size should be used as an
independent term in models involving transition from the juvenile
stage.

**bestfit**: Denotes whether the best-fit model should
be chosen as the model with the lowest AICc (`AICc`

) or as
the most parsimonious model (`AICc&k`

), where the latter
is the model that has the fewest estimated parameters and is within 2
AICc units of the model with the lowest AICc.

**sizedist**: Designates the distribution used for size.
The options include `gaussian`

, `gamma`

,
`poisson`

, and `negbin`

, the last of which refers
to the negative binomial distribution (quadratic parameterization).
Versions also exist for up to two further size variables in addition to
primary size.

**ipm_method**: Designates whether to estimate size
transition probabilities using the cumulative density function (CDF)
method (the default), or the midpoint method. The CDF method is exact
while the midpoint method generally overestimates (see Doak et al. (2021) for further details).

**fecdist**: Designates the distribution used for
fecundity. The options include `gaussian`

,
`gamma`

, `poisson`

, and `negbin`

, the
last of which refers to the negative binomial distribution (quadratic
parameterization).

**fectime**: Designates whether fecundity is estimated
within occasion *t* (the default) or occasion *t*+1. Plant
ecologists are likely to choose the former, since fecundity is typically
estimated via a proxy such as flowers, fruit, or seed produced. Wildlife
ecologists might choose the latter, since fecundity may be best
estimated as a count of actual juveniles within nests, burrows, or other
family structures.

**size.zero**: Designates whether the size distribution
should be zero-inflated. Only applies to the Poisson and negative
binomial distributions. Versions for up to two other size variables
exist.

**size.trunc**: Designates whether to use a
zero-truncated distribution for size. Only applies to the Poisson and
negative binomial distributions, and cannot be applied if
`size.zero = TRUE`

. Versions for up to two other size
variables exist.

**fec.zero**: Designates whether the fecundity
distribution should be zero-inflated. Only applies to the Poisson and
negative binomial distributions.

**fec.trunc**: Designates whether to use a
zero-truncated distribution for fecundity. Only applies to the Poisson
and negative binomial distributions, and cannot be applied if
`fec.zero = TRUE`

.

**jsize.zero**: Designates whether the juvenile size
distribution should be zero-inflated. Only applies to the Poisson and
negative binomial distributions. Versions for up to two other size
variables exist.

**jsize.trunc**: Designates whether to use a
zero-truncated distribution for juvenile size. Only applies to the
Poisson and negative binomial distributions, and cannot be applied if
`jsize.zero = TRUE`

. Versions for up to two other size
variables exist.

**indiv**: Designates the variable corresponding to
individual identity in the vertical dataset.

**patch**: Designates the variable corresponding to
patch identity in the vertical dataset.

**year**: Designates the variable corresponding to time
in occasion *t* in the vertical dataset.

**year.as.random**: Designates whether to treat year as
a random factor, and is only used when
`approach = "mixed"`

.

**patch.as.random**: Designates whether to treat patch
identity as a random factor, and is only used when
`approach = "mixed"`

.

**quiet**: Denotes whether to silence guidepost
statements and most warning messages. Only warnings incurred in model
testing will be visible, including warnings from the testing of global
and reduced models produced during model dredging.

This function both develops the vital rate models necessary for
function-based MPM estimation, and allows us to test whether individual
history affects demography. Setting `historical = TRUE`

fits
size and/or reproductive status in occasions *t* and *t*-1
into global models, while setting `historical = FALSE`

limits
testing to the impacts of state in occasion *t* only. The model
building and selection protocols can then be used to see if history has
a significant impact on a vital rate, by assessing whether a historical
term is retained in the best-fit model.

First, we will create the historical models to assess whether history is
a significant influence on vital rates. Because all stages are defined
as potentially reproductive, we will not test for the impacts of
reproductive status. Please remove the hashtag from the
`summary`

call to see details of the best-fit models.

```
<- modelsearch(lathvertipm, historical = TRUE, approach = "mixed",
lathmodels3ipm suite = "size", vitalrates = c("surv", "obs", "size", "fec"),
juvestimate = "Sdl", bestfit = "AICc&k", sizedist = "gaussian",
fecdist = "negbin", fec.zero = TRUE, indiv = "individ", year = "year2",
year.as.random = TRUE, juvsize = TRUE, quiet = TRUE)
#summary(lathmodels3ipm)
```

We see here, as before, that size in occasion *t*-1 exerts an
influence on some vital rates, including survival to occasion
*t*+1 and size in occasion *t*+1. So, the historical IPM
is the correct choice here. Accuracy is also quite high for adult
survival and juvenile and adult observation, but quite poor for primary
size and fecundity (R^{2} is 0.546 and 0.443, respectively).

We will also create an ahistorical IPM for comparison. For that purpose, we will create the ahistorical linear model set.

```
<- modelsearch(lathvertipm, historical = FALSE,
lathmodels2ipm approach = "mixed", suite = "size",
vitalrates = c("surv", "obs", "size", "fec"), juvestimate = "Sdl",
bestfit = "AICc&k", sizedist = "gaussian", fecdist = "negbin",
fec.zero = TRUE, indiv = "individ", year = "year2", year.as.random = TRUE,
juvsize = TRUE, quiet = TRUE)
#summary(lathmodels2ipm)
```

We will now create the historical suite of matrices covering the years of study. These matrices will be extremely large - large enough that some computers might have difficulty with them. If you encounter an error message telling you that you have run out of memory, then please try this on a more powerful computer :) .

```
<- flefko3(stageframe = lathframeipm, modelsuite = lathmodels3ipm,
lathmat3ipm supplement = lathsupp3, data = lathvertipm, reduce = FALSE)
#summary(lathmat3ipm)
```

These are giant matrices. With 10,609 rows and columns, there are a total of 112,550,881 elements per matrix. But they are also amazingly sparse - with 915,103 elements estimated, only 0.8% of elements per matrix are non-zero. The survival probability sums all look good, so we appear to have no problems with overly large given and proxy survival transitions provided through our supplemental tables. Let’s now build the ahistorical IPMs.

```
<- flefko2(stageframe = lathframeipm, modelsuite = lathmodels2ipm,
lathmat2ipm supplement = lathsupp2, data = lathvertipm, reduce = FALSE)
#summary(lathmat2ipm)
```

The ahistorical IPMs are certainly smaller than the historical IPMs, but are nonetheless quiet large. Although huge, these matrices are not sparse - an average of 9,182.3 elements out of 10,609 per matrix are estimated (86.6%; note that the summary() function only counts elements as estimated if they equal a value other than 0, leading the exact number of estimated elements to vary among matrices when elements are estimated at near 0).

Let’s do a further comparison - let’s view a matrix plot of each kind of MPM, ahistorical and then historical. First, the ahistorical plot.

`image3(lathmat2ipm, used = 1)`

```
#> [[1]]
#> NULL
```

Now the historical plot.

`image3(lathmat3ipm, used = 1)`

```
#> [[1]]
#> NULL
```

The plots above show that the ahistorical matrix is large but dense, mostly full of non-zero entries (the colored elements correspond to non-zero elements). In contrast, the historical matrix is huge and sparse, mostly full of zeroes with a general pattern to the distribution of non-zero elements.

Now let’s estimate the mean IPM matrices. We will estimate one mean matrix each, because we did not separate patches in the data reorganization and vital rate modeling. As a check, let’s also look over the summaries.

```
<- lmean(lathmat2ipm)
lath2ipmmean <- lmean(lathmat3ipm)
lath3ipmmean #summary(lath2ipmmean)
#summary(lath3ipmmean)
```

All looks fine! Let’s also take a look at a portion of one of the
conditional historical matrices, particularly the matrix conditional on
vegetative dormancy in occasion *t*-1. This matrix can be
compared to the ahistorical mean to assess the impacts of history on the
matrix elements themselves. Please remember to remove the hashtag.

```
<- cond_hmpm(lath3ipmmean)
l3mcond #l3mcond$Mcond[[1]]$Dorm
```

Now let’s estimate and plot the deterministic population growth rate, \(\lambda\), for each year.

```
<- lambda3(lathmat2ipm)
ipm2lambda <- lambda3(lathmat3ipm)
ipm3lambda
plot(lambda ~ year2, data = ipm2lambda, xlab = "Year", ylab = "Lambda",
ylim = c(0.65, 1.00), type = "l", lwd = 2, bty = "n")
lines(lambda ~ year2, data = ipm3lambda, lwd = 2, lty = 2, col = "red")
legend("bottomleft", c("ahistorical", "historical"), lty = c(1, 2),
col = c("black", "red"), lwd = 2, bty = "n")
```

Ahistorical estimates of \(\lambda\)
are lower than historical estimates, and the historical \(\lambda\) values are more in line with
estimates from the other *Lathyrus* vignettes.

Let’s now look at \(\lambda\) for the mean matrices and compare with the stochastic growth rate, \(a = \text{log} \lambda _{S}\). We will set the number of simulations low in the historical case in order to keep the amount of memory used and computation time low, because the size of the historical matrices will use up plenty of both (the default is 10,000). We will also set the seed for the random number generator to make our results reproducible.

```
lambda3(lath2ipmmean)
#> pop patch lambda
#> 1 1 1 0.8522166
lambda3(lath3ipmmean)
#> pop patch lambda
#> 1 1 1 0.8867451
set.seed(42)
slambda3(lathmat2ipm)
#> pop patch a var sd se
#> 1 1 1 -0.1667325 0.006176625 0.07859151 0.0007859151
set.seed(42)
slambda3(lathmat3ipm, times = 1000)
#> pop patch a var sd se
#> 1 1 1 -0.1261919 0.003141692 0.05605079 0.001772482
```

The historical growth rate is larger than the ahistorical in both deterministic and stochastic analyses, although all four numbers suggest a declining population. Now let’s compare the stable stage distribution from both the ahistorical and historical mean MPMs.

```
<- stablestage3(lath2ipmmean)
ipm2ss <- stablestage3(lath3ipmmean)
ipm3ss <- stablestage3(lathmat2ipm, stochastic = TRUE, seed = 42)
ipm2ss_s <- stablestage3(lathmat3ipm, stochastic = TRUE, times = 1000, seed = 42)
ipm3ss_s
<- cbind.data.frame(ipm2ss$ss_prop, ipm3ss$ahist$ss_prop,
ss_put_together $ss_prop, ipm3ss_s$ahist$ss_prop)
ipm2ss_snames(ss_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(ss_put_together) <- ipm2ss$stage_id
<- par("lty")
lty.o par(lty = 0)
barplot(t(ss_put_together), beside=T, ylab = "Proportion", xlab = "Stage",
col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")
```

`par(lty = lty.o)`

Both ahistorical and historical approaches show the stable stage distribution dominated by the second stage, which is the seedling stage. The next highest is the dormant seed stage, followed by the adult dormant stage. Stochastic analysis shows similar patterns, with ahistorical analyses suggesting a greater overall share of the population in the seedling stage.

Let’s now try a sensitivity analysis. We will conduct both deterministic and stochastic analyses of ahistorical and historical MPMs. Because of the size of the hMPMs, we will limit run time of the stochastic sensitivity analysis to 200 time steps (this may take several hours even with reduced time steps). We will also limit our analysis to biologically possible transitions by ignoring the sensitivities of zero elements.

```
<- sensitivity3(lath2ipmmean)
lath2ipmsens <- sensitivity3(lath3ipmmean)
lath3ipmsens
set.seed(42)
<- sensitivity3(lathmat2ipm, stochastic = TRUE)
lath2ipmsens_s set.seed(42)
<- sensitivity3(lathmat3ipm, stochastic = TRUE, steps = 200)
lath3ipmsens_s #> Error: vector memory exhausted (limit reached?)
writeLines("Highest ahistorical deterministic sensitivity is associated with
element ")
#> Highest ahistorical deterministic sensitivity is associated with
#> element
which(lath2ipmsens$ah_sensmats[[1]] == max(lath2ipmsens$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 273
writeLines("Highest historically-corrected deterministic sensitivity is
associated with element ")
#> Highest historically-corrected deterministic sensitivity is
#> associated with element
which(lath3ipmsens$ah_sensmats[[1]] == max(lath3ipmsens$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 273
writeLines("Highest ahistorical stochastic sensitivity is associated with
element ")
#> Highest ahistorical stochastic sensitivity is associated with
#> element
which(lath2ipmsens_s$ah_sensmats[[1]] == max(lath2ipmsens_s$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 273
writeLines("Highest historically-corrected stochastic sensitivity is associated
with element ")
#> Highest historically-corrected stochastic sensitivity is associated
#> with element
which(lath3ipmsens_s$ah_sensmats[[1]] == max(lath3ipmsens_s$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> Error in which(lath3ipmsens_s$ah_sensmats[[1]] == max(lath3ipmsens_s$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > : object 'lath3ipmsens_s' not found
writeLines("Highest historical deterministic sensitivity is associated with
element ")
#> Highest historical deterministic sensitivity is associated with
#> element
which(lath3ipmsens$h_sensmats[[1]] == max(lath3ipmsens$h_sensmats[[1]][which(lath3ipmmean$A[[1]] > 0)]))
#> Error: vector memory exhausted (limit reached?)
writeLines("Highest historical stochastic sensitivity is associated with
element ")
#> Highest historical stochastic sensitivity is associated with
#> element
which(lath3ipmsens_s$h_sensmats[[1]] == max(lath3ipmsens_s$h_sensmats[[1]][which(lath3ipmmean$A[[1]] > 0)]))
#> Error in which(lath3ipmsens_s$h_sensmats[[1]] == max(lath3ipmsens_s$h_sensmats[[1]][which(lath3ipmmean$A[[1]] > : object 'lath3ipmsens_s' not found
```

Our sensitivity analyses generally suggest that population growth rate
is most sensitive to element 273, which is the third column (273 / 103 =
2.650) and the 67th row and so is associated with the transition from
vegetative dormancy to one of the mid-to-large-sized adult stages.
Deterministic analysis of the hMPM concurs with this (see the
historically-corrected output). Historical deterministic and stochastic
sensitivity analyses both suggest that population growth rate is most
sensitive to changes in element 2, which is the transition from dormant
seed in times *t*-1 and *t* to seedling in time
*t*+1. Historical stochastic sensitivity also shows an importance
to element 105, but this is a biologically impossible transition
requiring the individual to be in the dormant seed and seedling stage
concurrently in time *t*. So, sensitivity analysis strongly
suggests that vegetative dormancy and mid-sized adults have strong
impacts on population growth rate.

We will now move on to elasticity analysis. We will reduce the number of occasion steps simulated in the stochastic historical analysis to reduce the amount of computer processing time (this might take several hours even with the number of steps reduced). We will also remove the sensitivity analyses from memory.

```
rm("lath2ipmsens", "lath3ipmsens", "lath2ipmsens_s", "lath3ipmsens_s")
#> Warning in rm("lath2ipmsens", "lath3ipmsens", "lath2ipmsens_s", "lath3ipmsens_s"): object 'lath3ipmsens_s' not found
<- elasticity3(lath2ipmmean)
lath2ipmelas <- elasticity3(lath3ipmmean)
lath3ipmelas
set.seed(42)
<- elasticity3(lathmat2ipm, stochastic = TRUE)
lath2ipmelas_s set.seed(42)
<- elasticity3(lathmat3ipm, stochastic = TRUE, steps = 200)
lath3ipmelas_s #> Error: vector memory exhausted (limit reached?)
writeLines("\nThe highest ahistorical deterministic elasticity is associated
with element: ")
#>
#> The highest ahistorical deterministic elasticity is associated
#> with element:
which(lath2ipmelas$ah_elasmats[[1]] == max(lath2ipmelas$ah_elasmats[[1]]))
#> [1] 214
writeLines("\nThe highest historically-corrected deterministic elasticity is
associated with element: ")
#>
#> The highest historically-corrected deterministic elasticity is
#> associated with element:
which(lath3ipmelas$ah_elasmats[[1]] == max(lath3ipmelas$ah_elasmats[[1]]))
#> [1] 2
writeLines("\nThe highest ahistorical stochastic elasticity is associated with
element: ")
#>
#> The highest ahistorical stochastic elasticity is associated with
#> element:
which(lath2ipmelas_s$ah_elasmats[[1]] == max(lath2ipmelas_s$ah_elasmats[[1]]))
#> [1] 214
writeLines("\nThe highest historically-corrected stochastic elasticity is
associated with element: ")
#>
#> The highest historically-corrected stochastic elasticity is
#> associated with element:
which(lath3ipmelas_s$ah_elasmats[[1]] == max(lath3ipmelas_s$ah_elasmats[[1]]))
#> Error in which(lath3ipmelas_s$ah_elasmats[[1]] == max(lath3ipmelas_s$ah_elasmats[[1]])): object 'lath3ipmelas_s' not found
writeLines("\nThe highest historical deterministic elasticity is associated
with element: ")
#>
#> The highest historical deterministic elasticity is associated
#> with element:
which(lath3ipmelas$h_elasmats[[1]] == max(lath3ipmelas$h_elasmats[[1]]))
#> Error: vector memory exhausted (limit reached?)
writeLines("\nThe highest historical stochastic elasticity is associated with
element: ")
#>
#> The highest historical stochastic elasticity is associated with
#> element:
which(lath3ipmelas_s$h_elasmats[[1]] == max(lath3ipmelas_s$h_elasmats[[1]]))
#> Error in which(lath3ipmelas_s$h_elasmats[[1]] == max(lath3ipmelas_s$h_elasmats[[1]])): object 'lath3ipmelas_s' not found
```

Ahistorical analyses agree that \(\lambda\) is most elastic in response to
growth from vegetative dormancy to a smaller sized adult (element 214 is
the element in the third column, 214 / 103 = 2.078, and the eighth row,
214 - (103 * 2) = 8). The historical analyses both agree that population
growth rate is most elastic to changes in element 10,717, which is in
the 2nd column and the 108th row and corresponds to the transition from
dormant seed in time *t*-1 and seedling in time *t* to one
of the smallest adult stages in time *t*+1.
Historically-corrected deterministic and stochastic results suggest that
elements 214 and 5 are most crucial, with the latter corresponding to
the transition from dormant seed to extremely small adult (second size
class). Now let’s plot the elasticity of \(\lambda\) to transitions from both
perspectives.

```
<- cbind.data.frame(colSums(lath2ipmelas$ah_elasmats[[1]]),
elas_put_together colSums(lath3ipmelas$ah_elasmats[[1]]), colSums(lath2ipmelas_s$ah_elasmats[[1]]),
colSums(lath3ipmelas_s$ah_elasmats[[1]]))
#> Error in is.data.frame(x): object 'lath3ipmelas_s' not found
names(elas_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_put_together) <- lath2ipmelas$ah_stages$stage_id
#> Error in `.rowNamesDF<-`(x, value = value): invalid 'row.names' length
<- par("lty")
lty.o par(lty = 0)
barplot(t(elas_put_together), beside=T, ylab = "Elasticity", xlab = "Stage",
col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")
```

`par(lty = lty.o)`

The plot of these distributions shows the strong importance of vegetative dormancy and small adults. The tallest bars in ahistorical analyses correspond to dormant adults. Historical analyses reduce the importance of dormant adults somewhat, and show a greater importance to slightly larger adults than in the ahistorical case.

Now let’s take a look at the summed elasticities of different kinds of transitions, beginning with a comparison of ahistorical to historically-corrected transitions.

```
<- summary(lath2ipmelas)
lath2elas_sums <- summary(lath3ipmelas)
lath3elas_sums <- summary(lath2ipmelas_s)
lath2elas_sums_s <- summary(lath3ipmelas_s)
lath3elas_sums_s #> Error in summary(lath3ipmelas_s): object 'lath3ipmelas_s' not found
<- cbind.data.frame(lath2elas_sums$ahist[,2],
elas_sums_together $ahist[,2], lath2elas_sums_s$ahist[,2], lath3elas_sums_s$ahist[,2])
lath3elas_sums#> Error in data.frame(..., check.names = FALSE): object 'lath3elas_sums_s' not found
names(elas_sums_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_sums_together) <- lath2elas_sums$ahist$category
barplot(t(elas_sums_together), beside=T, ylab = "Elasticity", xlab = "Transition",
col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")
```

We see similar transition elasticities between ahistorical and historical analyses, with growth the most influential on \(\lambda\) and shrinkage next, while fecundity is the least influential depending on whether the analysis is ahistorical or historical.

Package `lefko3`

also includes functions to conduct
deterministic and stochastic life table response experiments, and two
general projection functions that can be used to project IPMs for
analyses such as quasi-extinction analysis as well as for density
dependent analysis. Users wishing to conduct these analyses should see
our free e-manual called ** lefko3: a gentle
introduction** and other vignettes, including long-format
and video vignettes, on
the projects page
of the Shefferson lab website. Our function-based

We are grateful to two anonymous reviewers whose scrutiny improved the quality of this vignette. The project resulting in this package and this tutorial was funded by Grant-In-Aid 19H03298 from the Japan Society for the Promotion of Science.

Doak, Daniel F., Ellen Waddle, Ryan E. Langendorf, Allison M. Louthan,
Nathalie Isabelle Chardon, Reilly R. Dibner, Douglas A. Keinath, et al.
2021. “A Critical Comparison of Integral Projection and Matrix
Projection Models for Demographic Analysis.” *Ecological
Monographs* 91 (2): e01447. https://doi.org/10.1002/ecm.1447.

Ehrlén, Johan. 1995. “Demography of the Perennial Herb
*Lathyrus Vernus*. I.
Herbivory and Individual Performance.” *Journal
of Ecology* 83 (2): 287–95. https://doi.org/10.2307/2261567.

———. 2000. “The Dynamics of Plant Populations: Does the History of
Individuals Matter?” *Ecology* 81 (6): 1675–84. https://doi.org/10.1890/0012-9658(2000)081[1675:TDOPPD]2.0.CO;2.

———. 2002. “Assessing the Lifetime Consequences of Plant-Animal
Interactions for the Perennial Herb Lathyrus Vernus
(Fabaceae).” *Perspectives in Plant Ecology,
Evolution and Systematics* 5 (3): 145–63. https://doi.org/10.1078/1433-8319-00031.

Ehrlén, Johan, and Ove Eriksson. 1996. “Seedling Recruitment in
the Perennial Herb Lathyrus Vernus.” *Flora*
191 (4): 377–83. https://doi.org/10.1016/S0367-2530(17)30744-2.

Ehrlén, Johan, and Kari Lehtila. 2002. “How Perennal Are Perennial
Plants?” *Oikos* 98: 308–22. https://doi.org/10.1034/j.1600-0706.2002.980212.x.

Ehrlén, Johan, and Zuzana Munzbergova. 2009. “Timing of Flowering:
Opposed Selection on Different Fitness Components and Trait
Covariation.” *The American Naturalist* 173 (6): 819–30.
https://doi.org/10.1086/598492.

Ehrlén, Johan, and Jan Van Groenendael. 2001. “Storage and the
Delayed Costs of Reproduction in the Understorey Perennial
*Lathyrus Vernus*.” *Journal of
Ecology* 89 (2): 237–46. https://doi.org/10.1046/j.1365-2745.2001.00546.x.

Ellner, Stephen P., and Mark Rees. 2006. “Integral Projection
Models for Species with Complex Demography.” *American
Naturalist* 167 (3): 410–28. https://doi.org/10.1086/499438.