Skip to contents

This report was created with the various griffin example files included with the package. The markdown file can be found at [https://github.com/JiscaH/sequoiaExtra] .


This pedigree was reconstructed using the R package sequoia.

The R package sequoia reconstructs the pedigree in two steps: first genotyped parents are assigned to genotyped offspring (‘parentage assignment’), followed by clustering of siblings sharing a non-genotyped parent and assignment of grandparents (‘full pedigree reconstruction’).


Comments

This is a overview of the pedigree reconstruction for Griffins born 2001-2010.


Input summary

Parameter settings

kable(as.data.frame(t(SeqOUT$Specs)), booktabs=TRUE) %>%
  kable_styling(full_width = FALSE)
Specs
NumberIndivGenotyped 142
NumberSnps 400
GenotypingErrorRate 1e-04
MaxMismatchDUP 3
MaxMismatchOH 2
MaxMismatchME 2
Tfilter -2
Tassign 0.5
nAgeClasses 7
MaxSibshipSize 100
Module ped
DummyPrefixFemale F
DummyPrefixMale M
Complexity full
Herm no
UseAge yes
CalcLLR TRUE
ErrFlavour version2.0
SequoiaVersion 2.3.18
TimeStart 2022-12-17 12:33:09
TimeEnd 2022-12-17 12:33:33

Genotyping error: probability that actual genotype ‘act’ (rows) is observed as genotype ‘obs’ (columns).

kable(SeqOUT$ErrM,
      caption = 'Presumed genotyping error rate', booktabs=TRUE) %>%
  kable_styling(full_width = FALSE)
Presumed genotyping error rate
obs-0|act obs-1|act obs-2|act
act-0 0.99990 0.0001 0.00000
act-1 0.00005 0.9999 0.00005
act-2 0.00000 0.0001 0.99990
# Note: this error matrix can be fully customised, see ?ErrToM

Genetic data

if (!all(GenoM %in% c('NA', NA))) {
  SnpStats(GenoM)
} else {
  cat('No genotype matrix provided for this report. \n')
}

# Note1: SNPs with missingness >=90% (scored for <10% of individuals) are 
#       automatically excluded.
# Note2: Higher MAF is better. Monomorphic SNPs are automatically excluded.
# Note3: Very strong departure from HWE may affect pedigree reconstruction. 

Life history data

In sequoia, sex is coded as 1=female, 2=male, 3=unknown, 4=hermaphrodite.

table(Sex = SeqOUT$LifeHist$Sex)
## Sex
##   1   2 
##  95 105

Unknown birth years may hinder pedigree reconstruction. Among others, they are used to determine which individual is the parent, and which the offspring in genetically identified parent-offspring pairs.

LH <- SeqOUT$LifeHist

table('Birth Year' = factor(LH$BirthYear <0, levels = c(FALSE, TRUE), 
                            labels = c('known', 'missing')),
      'min/max Birth Year' = factor(LH$BY.min <0 & LH$BY.max <0, 
                                    levels = c(FALSE, TRUE), 
                                    labels = c('known', 'missing')))
##           min/max Birth Year
## Birth Year known missing
##    known       0     200
##    missing     0       0
hist(LH$BirthYear, 
     breaks = c(min(LH$BirthYear, na.rm=TRUE) : (max(LH$BirthYear, na.rm=TRUE)+1)) -.5,
     main = 'Distribution of birth years', xlab='')

Age distribution prior

The ‘age prior’ specifies the minimum and maximum age of parents, and the age difference distribution between siblings.

By default, for parentage assignment a flat prior is used with maximum parental age equal to the largest age difference between genotyped individuals. The maximum age for dams and sires can also be specified, as can discrete versus overlapping generations. The distribution can also be fully customised.

This age difference distribution is updated after parentage assignment and before full pedigree reconstruction (see further).

if (!all(GenoM %in% c('NA', NA))) {
  LH_a <- LH[LH$ID %in% rownames(GenoM),]
} else {
  LH_a <- LH
}
AP_IN <- do.call(MakeAgePrior, c(list(Pedigree = NULL,
                                      LifeHistData = LH_a),
                                 SeqOUT$args.AP))
## Ageprior: Flat 0/1, overlapping generations, MaxAgeParent = 10,10

# Note: Details about and help on the age prior can be found in 
#       vignette("Sequoia - Age 'Prior'")
SeqOUT$args.AP
## $Flatten
## NULL
## 
## $Smooth
## [1] TRUE

Output summary

Pedigree summary

Sibling clusters sharing a non-genotyped parent are assigned a ‘dummy’ parent. Via grandparent assignment to sibling clusters, parents are assigned to these dummy individuals. These grandparents may be both genotyped or dummy individuals.

Identifying the real non-genotyped individual corresponding to each dummy individual may not always be possible, but the sequoia website offers some suggestions when candidates are known.

sumry <- SummarySeq(SeqOUT, Plot=FALSE)
PlotSeqSum(sumry, SeqOUT$Pedigree, Panels='all', ask=FALSE)

The distributions of the number of opposing homozygous loci and other Mendelian errors in parent-offspring pairs and parent-parent-offspring trios give a rough impression of the genotyping error rate. These would always be zero in absence of any genotyping errors (and in absence of pedigree errors).

The distributions of the log10 likelihood ratios give a rough impression of the power of the genetic data to distinguish between different types of relationships and resolve the pedigree. Note that this is the likelihood ratio between the assigned parent being the parent versus it being another type of close relative, such as a full sibling of the focal individual or the true parent. It is not relative to other candidate parents of the focal individual.

kable(sumry$PedSummary, booktabs=TRUE) %>%
  kable_styling(full_width = FALSE)
All SNPd
records 167 142
maternities 122 65
paternities 116 79
full sibs 5 5
maternal half sib 129 89
paternal half sibs 103 77
maternal grandmothers 89 77
maternal grandfathers 84 75
paternal grandmothers 69 58
paternal grandfathers 66 55
maximum pedigree depth 8 7
founders 36 39

The size, depth and interconnectedness of the pedigree affect the power with which the pedigree can be used in subsequent analyses, such as heritability estimates.

Age distribution

After assignment of genotyped parents to genotyped offspring, the reconstructed pedigree is combined with the provided birth year information to estimate the distribution of age differences among mother-offspring, father-offspring, and sibling pairs. This age distribution is then used during further pedigree reconstruction.

PlotAgePrior(SeqOUT$AgePriors)

Estimated birth years

When a birth year is unknown, it is estimated from combining the above distribution of parental ages with any known birth years of its assigned parents and offspring.

LH_new <- SeqOUT$LifeHistSib
LH_new[LH_new$BirthYear < 0, ]
## [1] id        Sex       BirthYear BY.min    BY.max    Sexx      BY.est   
## [8] BY.lo     BY.hi    
## <0 rows> (or 0-length row.names)

Inferred sex

When the sex of an individual is missing from the input, it may be inferred during pedigree reconstruction when this individual forms a complementary parent pair with an individual of known sex (in species without hermaphrodites).

LH_new[LH_new$Sex == 3 & LH_new$Sexx != 3, ]
## [1] id        Sex       BirthYear BY.min    BY.max    Sexx      BY.est   
## [8] BY.lo     BY.hi    
## <0 rows> (or 0-length row.names)

Non-assigned likely relatives

The sequoia algorithm is rather conservative when making assignments; it sequentially ‘grows’ the pedigree, and tries to avoid a snowball effect of assignment errors. So, when no parent or siblings are assigned, this does not necessarily mean these are not present in the dataset.

The R package includes a separate function to identify pairs of likely relatives that have not been assigned, which may be due to a variety of reasons.

if (!all(Maybe %in% c('NA', NA))) {
  MaybePO <- GetRelM(Pairs=Maybe$MaybePar)
  PlotRelPairs(MaybePO)
  if ('MaybeRel' %in% names(Maybe)) {
    MaybeM <- GetRelM(Pairs=Maybe$MaybeRel)
    PlotRelPairs(MaybeM)
  }
} else if (!all(GenoM %in% c('NA', NA))) {
  Maybe <- GetMaybeRel(GenoM, SeqList = SeqOUT, Module = 'ped', quiet=TRUE)
  MaybeM <- GetRelM(Pairs=Maybe$MaybeRel)
  PlotRelPairs(MaybeM)
} else {
  cat("No 'maybe relatives' provided for this report. \n")
}

Confidence probabilities

The assignment accuracy is estimated from simulations. These simulations make several simplifying assumptions, and these numbers are therefore a lower bound for the confidence probability.

The probability is not estimated for each individual separately. Instead, they are grouped across a few categories, based on whether the parent is dam or sire, genotyped or a dummy, and whether there is a co-parent or not. It is also separated for genotyped versus dummy focal individuals.

if (!all(Conf %in% c('NA', NA))) {
  kable(Conf$ConfProb,
        caption = 'parent-pair confidence, per category (Genotyped/Dummy/None)')
} else {
  cat('No confidence probabilities provided for this report. \n')
}
parent-pair confidence, per category (Genotyped/Dummy/None)
id.cat dam.cat sire.cat dam.conf sire.conf pair.conf N
14 G G G 1.0000000 1.0000000 1.0000000 568
13 G G D 1.0000000 1.0000000 1.0000000 277
15 G G X 1.0000000 NA NA 227
11 G D G 0.9841270 1.0000000 0.9841270 252
10 G D D 1.0000000 1.0000000 1.0000000 88
12 G D X 1.0000000 NA NA 49
17 G X G NA 1.0000000 NA 197
16 G X D NA 1.0000000 NA 38
18 G X X NA NA NA 804
5 D G G 0.9914530 0.9914530 0.9914530 117
4 D G D 1.0000000 1.0000000 1.0000000 9
6 D G X 1.0000000 NA NA 34
2 D D G 0.8888889 1.0000000 0.8888889 9
1 D D D NaN NaN NaN 0
3 D D X 0.7777778 NA NA 9
8 D X G NA 0.9705882 NA 34
7 D X D NA 1.0000000 NA 3
9 D X X NA NA NA 100

Further details

Likelihood curve

The total likelihood is the probability of observing the genetic data, given the inferred pedigree and the presumed genotyping error rate. The initial likelihood is the probability if all genotyped individuals were unrelated and sampled from a large population in Hardy-Weinberg Equilibrium. The likelihood increases during pedigree reconstruction, and pedigree reconstruction is terminated when the total likelihood asymptotes.

nIt <- c(par = length(SeqOUT$TotLikPar),
         ped = length(SeqOUT$TotLikSib))
with(SeqOUT, plot(1:nIt[1], TotLikPar, type="b", lwd=2, col="forestgreen",
                  xlim=c(1, sum(nIt)-1), xlab="Iteration (ped)", xaxt='n', cex.lab=1.2,
                  ylim=c(min(TotLikPar), max(TotLikSib)), ylab="Total log-likelihood"))
with(SeqOUT, lines((nIt[1]-1) + 1:nIt[2], TotLikSib, type="b", lwd=2))
axis(1, at=1, labels = 'HWE', cex.axis=1.2, col.axis='darkgrey')
axis(1, at=(nIt[1]+1)/2, labels = 'par', lwd.ticks=0, col.axis='forestgreen', 
     cex.axis=1.2)
axis(1, at=(nIt[1]):(nIt[1]+nIt[2]), labels=0:nIt[2], cex.axis=1.2)