Introduction

This tutorial demonstrates the major functions used within the Shiny application provided by the nprcgenekeepr package and provides sufficient insight into those functions that they may be used independently.

This tutorial is primarily directed toward someone with experience using R who wants to better understand how the Shiny application works or to perform some actions not directly supported by the Shiny application.

Please provide any comments, questions, or bug reports through the GitHub issue tracker at .

Installation and Help

You can install nprcgenekeepr from GitHub with the following code.

install.packages(nprcgenekeepr)
## Use the following code to get the development version
# install.packages("devtools")
# devtools::install_github("rmsharp/nprcgenekeepr")

All missing dependencies should be automatically installed.

You can get help from the R console with

?nprcgenekeepr

The help provided by this (nprcgenekeepr.R) needs to be more complete and include links to the tutorials.

Reading in a Pedigree

A pedigrees can be imported using either Excel worksheets or text files that contain all of the pedigree information or using either Excel worksheets or text files that contain a list of focal animals with the remainder of the pedigree information is pulled in through the LabKey API.

This tutorial will use a pedigree file that can be created using the makeExamplePedigreeFile function as shown below. The function makeExamplePedigreeFile both saves a file and returns the full path name to the saved file, which we are saving into the variable pedigreeFile. Note: the user will select where to store the file.

library(nprcgenekeepr)
pedigreeFile <- makeExamplePedigreeFile()

This writes ExamplePedigree.csv to a place you select within your file system.

You use the file name provided by the makeExamplePedigreeFile function to tell read.table what file to read.

breederPedCsv <- read.table(pedigreeFile, sep = ",", header = TRUE,
                            stringsAsFactors = FALSE)

Note the number of rows read. Each row represents an individual within the pedigree.

nrow(breederPedCsv)
## [1] 3694

The next step is to put the information read from the file into a pedigree object. This is done with the qcStudbook function, which examines the file contents and tests for common pedigree errors.

You can see the errors that can be detected by qcStudbook by returning the empty error list with getEmptyErrorLst(). We are not showing the output of the function call now because later in this tutorial we will explore errors in more depth.

qcStudbook can take four arguments sb, minParentAge (in years), reportChanges, and reportErrors. However, all but sb have default values and only the sb argument is required.

It is prudent to ensure that parents are at least of breeding age, which is species specific. I have used a minParentAge of 2 years.1

breederPed <- qcStudbook(breederPedCsv, minParentAge = 2)

If qcStudbook reports an error, change the call by adding the reportErrors argument set to TRUE and examine the returned object. More on this is presented in the Pedigree Errors section.

Identifying Focal Animals

You may want to focus your work on a focal group of animals. This can be done by reading in a list of animal IDs that make up the focal group and use that list to update the pedigree. Alternatively you can created a list of animal IDs based on criteria you have selected.

For example, to select living animals at the facility with at least one parent, the following can be used.

focalAnimals <- breederPed$id[!(is.na(breederPed$sire) & 
                                  is.na(breederPed$dam)) &
                                is.na(breederPed$exit)]
print(stri_c("There are ", length(focalAnimals), 
             " animals in the vector _focalAnimals_."))

[1] “There are 327 animals in the vector focalAnimals.”

As can be seen, these animals have at least one parent and have not left the facility.

breederPed[breederPed$id %in% 
             focalAnimals, c("id", "sire", "dam", "exit")][1:10, ]

We indicate that these are the animals of interest by using the setPopulation function. This function simply sets a column named population2 to the logical value of TRUE if the row represents an animal in the list and FALSE otherwise.

The first line of code below sets the population column and the second counts the number of rows where the value was set to TRUE.

breederPed <- setPopulation(ped = breederPed, ids = focalAnimals)
nrow(breederPed[breederPed$population, ])
## [1] 327

The IDs used to populate the population flag can be used to trim the pedigree so that it contains only those individuals who are in the ID list or are ancestors of those individuals.

trimmedPed <- trimPedigree(focalAnimals, breederPed)
nrow(breederPed); nrow(trimmedPed)
## [1] 3694
## [1] 704

The trimPedigree function has the ability to remove those ancestors that do not contribute genetic information. Uninformative founders are those individuals who are parents of only one individual and who have no parental information. (Currently genotypic information is ignored by trimPedigree).

trimmedPedInformative <- trimPedigree(focalAnimals, breederPed,
                                      removeUninformative = TRUE)
nrow(trimmedPedInformative)
## [1] 509

We can find all of the animals that are in the trimmed pedigree but are not focal animals.

nonfocalInTrimmedPed <- trimmedPed$id[!trimmedPed$id %in% focalAnimals]
length(nonfocalInTrimmedPed)
## [1] 377

We can see which of these 377 are and are not parents. We will first make sure we have all of the parents by getting our list of parents from the entire pedigree. We then demonstrate that they are all in the trimmed pedigree.

allFocalParents <- c(breederPed$sire[breederPed$id %in% focalAnimals], 
                       breederPed$dam[breederPed$id %in% focalAnimals])
trimmedFocalParents <- c(trimmedPed$sire[trimmedPed$id %in% focalAnimals], 
                       trimmedPed$dam[trimmedPed$id %in% focalAnimals])
all.equal(allFocalParents, trimmedFocalParents) # Are the IDs the same?
## [1] TRUE

However, not all of the animals in the trimmed pedigree are either the focal animals or their parents. They are more distant ancestors as we will show.

notFocalNotParent <- 
  trimmedPed$id[!trimmedPed$id %in% c(focalAnimals, allFocalParents)]
length(notFocalNotParent)
## [1] 187

Since the trimming process is supposed to retain the focal animals and their ancestors, we will leave it as an exercise for you to demonstrate that at least some of the remaining animals are grandparents of the focal animals. Hint: there are 490 grandparents in both the trimmed and the complete pedigree.

As you can see from the number of rows in the full pedigree (3694) versus the trimmed pedigree (704), trimmed pedigrees can be much smaller. Of the additional 377 animals, 182 provide genetic information while the others (195) are genetically uninformative.

As is shown below only 4 (0ZX29Q, 1QBKW9, 5PWJ0G, and Y3CJ5A) living animals are still in the colony but are not in the trimmed pedigree.3

unknownBirth <- breederPed$id[is.na(breederPed$birth)]
knownExit <- breederPed$id[ !is.na(breederPed$exit)]
unknownBirthKnownExit <- 
  breederPed$id[is.na(breederPed$birth) | !is.na(breederPed$exit)]
knownPed <- breederPed[!breederPed$id %in% unknownBirthKnownExit, ]
otherIds <- knownPed$id[!knownPed$id %in% trimmedPed$id[is.na(trimmedPed$exit)]]
print(stri_c("The living animals in the pedigree that are not in the trimmed ",
             "pedigree are ", get_and_or_list(otherIds), "."))

[1] “The living animals in the pedigree that are not in the trimmed pedigree are 0ZX29Q, 1QBKW9, 5PWJ0G, and Y3CJ5A.”

Age Sex Pyramid Plot

You can examine the population structure using an age-sex pyramid plot with a single function. We will limit our view to just the focal animals and their living relatives. This is appropriate for colony management because in addition to the genetic diversity we seek, we have to remain cognizant of the age and sex distributions within the colonies we manage.

getPyramidPlot(ped = trimmedPed[is.na(trimmedPed$exit), ])

## 45 45
## [1] 5.1 4.1 4.1 2.1

Genetic Value Analysis

Your genetic value analysis must be carefully performed. The next three commands set up the entire pedigree for analysis. The first of these three commands set all of the pedigree members to be part of the population of interest by setting the population column to TRUE for all individuals.

ped <- setPopulation(breederPed, NULL)

Note that a new pedigree object (ped) is being created.

probands <- ped$id[ped$population]
ped <- trimPedigree(probands, ped, removeUninformative = FALSE,
                    addBackParents = FALSE)

The arguments to reportGV are all optional except for ped, but you may often want to non-default values.

geneticValue <- reportGV(ped, guIter = 50,
                         guThresh = 3,
                         byID = TRUE,
                         updateProgress = NULL)
summary(geneticValue)
## The genetic value report 
## Individuals in Pedigree: 3694 
## Male Founders: 141
## Female Founders: 122
## Total Founders: 263 
## Founder Equivalents: 241.84 
## Founder Genome Equivalents: 163.62 
## Live Offspring: 4052 
## High Value Individuals: 2957 
## Low Value Individuals: 737

What happens if we limit our analysis to the trimmed pedigree? Remember that the trimmed pedigree still contains all of the genetic information that the full pedigree has for the focal animals.

trimmedGeneticValue <- reportGV(trimmedPed, guIter = 50,
                         guThresh = 3,
                         byID = TRUE,
                         updateProgress = NULL)
summary(trimmedGeneticValue)
## The genetic value report 
## Individuals in Pedigree: 327 
## Male Founders: 3
## Female Founders: 17
## Total Founders: 20 
## Founder Equivalents: 109.67 
## Founder Genome Equivalents: 47.44 
## Live Offspring: 321 
## High Value Individuals: 223 
## Low Value Individuals: 104

It is clear that limiting your analysis to the animals of interest reduces the effort required to examine the animals of interest.

Detailed look at the Genetic Value Report object

The names of the object within the genetic value report object (trimmedGeneticValue) can be listed as shown in the next line of code.

names(trimmedGeneticValue)
##  [1] "report"          "kinship"         "gu"              "fe"             
##  [5] "fg"              "maleFounders"    "femaleFounders"  "nMaleFounders"  
##  [9] "nFemaleFounders" "total"

The report object (an R dataframe) can in-turn be examined.

names(trimmedGeneticValue$report) ## column names
##  [1] "id"              "sex"             "age"             "birth"          
##  [5] "exit"            "population"      "origin"          "indivMeanKin"   
##  [9] "zScores"         "gu"              "totalOffspring"  "livingOffspring"
## [13] "value"           "rank"
nrow(trimmedGeneticValue$report) ## Number of rows
## [1] 327

The report is more conveniently used as a separate object. The next section of code rounds some of the numerical values and converts all columns to characters for display as a table where only the first 10 lines are included.

rpt <- trimmedGeneticValue[["report"]]
rpt$indivMeanKin <- round(rpt$indivMeanKin, 5)
rpt$zScores <- round(rpt$zScores, 2)
rpt$gu <- round(rpt$gu, 5)
rpt <- toCharacter(rpt)
names(rpt) <- headerDisplayNames(names(rpt))
knitr::kable(rpt[1:10, ]) # needs more work for display purposes.
Ego ID Sex Age (in years) Birth Date Exit Date Breeding Colony Member Origin Individual Mean Kinship Z-score (Mean Kinship) Genome Uniqueness (%) Total Offspring Living Offspring Value Designation Rank
KZM9RB M 30.1 1989-05-03 NA TRUE 0.00329 -1.90 90 0 0 High Value 1
CLSVU6 F 23.9 1995-08-02 NA TRUE 0.00287 -1.97 87 1 1 High Value 2
1SPLS8 F 7.9 2011-07-26 NA TRUE 0.00373 -1.83 83 0 0 High Value 3
WK89I9 F 21.1 1998-05-26 NA TRUE 0.00582 -1.49 78 0 0 High Value 4
8YP6PA M 5.0 2014-07-04 NA TRUE 0.00485 -1.65 76 0 0 High Value 5
01QRQ4 F 18.2 2001-04-04 NA TRUE 0.00373 -1.83 75 0 0 High Value 6
IZDV8K M 7.7 2011-09-29 NA TRUE 0.00480 -1.66 74 0 0 High Value 7
3MMZD4 M 12.2 2007-03-24 NA TRUE 0.00536 -1.57 73 0 0 High Value 8
G2GYST M 19.1 2000-05-16 NA TRUE 0.00488 -1.64 71 2 2 High Value 9
CHK1ZX F 7.8 2011-09-06 NA TRUE 0.00481 -1.66 70 0 0 High Value 10

We start the next lines of code by getting a fresh copy of the genetic value report since we changed all of the numeric values to characters in the last section to print the table. These lines demonstrate one way of extracting the component objects from the genetic value object.

rpt <- trimmedGeneticValue[["report"]]
kmat <- trimmedGeneticValue[["kinship"]]
f <- trimmedGeneticValue[["total"]]
mf <- trimmedGeneticValue[["maleFounders"]]
ff <- trimmedGeneticValue[["femaleFounders"]]
nmf <- trimmedGeneticValue[["nMaleFounders"]]
nff <- trimmedGeneticValue[["nFemaleFounders"]]
fe <- trimmedGeneticValue[["fe"]]
fg <- trimmedGeneticValue[["fg"]]

It is informative to examine the distribution of genetic uniqueness, mean kinship, and z-scores (normalized mean kinship values).

Creation of the boxplot for the genetic uniqueness values is shown below.

gu <- rpt[, "gu"]
guBox <- ggplot(data.frame(gu = gu), aes(x = "", y = gu)) +
  geom_boxplot(
    color = "darkblue",
    fill = "lightblue",
    notch = TRUE,
    outlier.color = "red",
    outlier.shape = 1
  ) +
  theme_classic() + geom_jitter(width = 0.2) + coord_flip() +
  ylab("Score")  + ggtitle("Genetic Uniqueness")
print(guBox)