Assignability of Reference Pedigree
getAssignCat.Rd
Identify which individuals are SNP genotyped, and which can potentially be substituted by a dummy individual ('Dummifiable').
Arguments
- Pedigree
dataframe with columns id-dam-sire. Reference pedigree.
- SNPd
character vector with ids of genotyped individuals.
- minSibSize
minimum requirements to be considered 'dummifiable':
  - '1sib': sibship of size 1, i.e. the non-genotyped individual has at least 1 genotyped offspring. If there is no sibship-grandparent this isn't really a sibship, but can be useful in some situations. Used by CalcOHLLR.
  - '1sib1GP': sibship of size 1 with at least 1 genotyped grandparent. The minimum to be potentially assignable by sequoia.
  - '2sib': at least 2 siblings, with or without grandparents. Used by PedCompare.
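The three thresholds can be illustrated on a toy pedigree in base R. This is only a minimal sketch of the definitions above, not the package's implementation. The helper `count_geno_offspring`, the toy pedigree, and the reading of 'grandparent' as a genotyped parent of the non-genotyped individual are all assumptions made for illustration.

```r
# Toy pedigree (hypothetical): columns id, dam, sire
ped <- data.frame(
  id   = c("GM", "M1", "M2", "o1", "o2", "o3"),
  dam  = c(NA,   "GM", NA,   "M1", "M2", "M2"),
  sire = NA_character_,
  stringsAsFactors = FALSE
)
snpd <- c("GM", "o1", "o2", "o3")   # genotyped ids; M1 and M2 are not genotyped

# Hypothetical helper: number of genotyped offspring per non-genotyped parent
count_geno_offspring <- function(ped, snpd, col) {
  par <- ped[[col]][ped$id %in% snpd]          # parents of genotyped individuals
  par <- par[!is.na(par) & !(par %in% snpd)]   # keep non-genotyped parents only
  table(par)
}

n_off <- count_geno_offspring(ped, snpd, "dam")

# Does the non-genotyped parent itself have a genotyped parent
# (assumed here to be what 'sibship-grandparent' refers to)?
rows   <- match(names(n_off), ped$id)
has_gp <- ped$dam[rows] %in% snpd | ped$sire[rows] %in% snpd

res <- data.frame(
  parent    = names(n_off),
  n_geno    = as.integer(n_off),
  `1sib`    = as.integer(n_off) >= 1,
  `1sib1GP` = as.integer(n_off) >= 1 & has_gp,
  `2sib`    = as.integer(n_off) >= 2,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(res)
```

In this toy case M1 (one genotyped offspring, genotyped dam) meets '1sib' and '1sib1GP' but not '2sib', while M2 (two genotyped offspring, no known parents) meets '1sib' and '2sib' but not '1sib1GP'.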
Value
The Pedigree dataframe with 3 additional columns, id.cat, dam.cat and sire.cat, with coding similar to that used by PedCompare:
- G
Genotyped
- D
Dummy or 'dummifiable'
- X
Not genotyped and not dummifiable, or no parent in pedigree
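As a rough illustration of how these codes might be used downstream, the sketch below works on a hand-made stand-in for the returned dataframe. The values are hypothetical and only the column names follow the description above. It flags, for each genotyped individual, whether its pedigree dam and sire are in principle recoverable (category G or D).

```r
# Hand-made stand-in for getAssignCat() output (hypothetical values)
PedA_toy <- data.frame(
  id       = c("a1", "a2", "a3", "a4"),
  dam      = c(NA,   "d1", "d2", "d3"),
  sire     = c(NA,   "s1", "s2", NA),
  id.cat   = c("G",  "G",  "X",  "G"),
  dam.cat  = c("X",  "G",  "D",  "X"),
  sire.cat = c("X",  "G",  "X",  "X"),
  stringsAsFactors = FALSE
)

# Only genotyped individuals can have parents assigned to them
geno <- PedA_toy[PedA_toy$id.cat == "G", ]

# A parent counts as potentially assignable if genotyped (G) or dummifiable (D)
geno$dam.ok  <- geno$dam.cat  %in% c("G", "D")
geno$sire.ok <- geno$sire.cat %in% c("G", "D")

# Proportion of genotyped individuals with a potentially assignable dam / sire
prop_ok <- colMeans(geno[, c("dam.ok", "sire.ok")])
print(prop_ok)
```

Proportions like these could serve as a denominator when judging how much of a reference pedigree a reconstruction could recover at best.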
Details
It is assumed that all individuals in SNPd have been genotyped for a sufficient number of SNPs. To identify samples with a too-low call rate, use CheckGeno. To calculate the call rate for all samples, see the examples below.
Some parents indicated here as assignable may never be assigned by sequoia. Examples are parent-offspring pairs where it cannot be determined which is the older of the two, or grandparents that are indistinguishable from full avuncular relatives (i.e. the genetics are inconclusive because the candidate has no parent assigned, and the ageprior is inconclusive as well).
Examples
PedA <- getAssignCat(Ped_HSg5, rownames(SimGeno_example))
tail(PedA)
#>          id    dam   sire id.cat dam.cat sire.cat
#> 995  b05187 a04045 b04098      X       X        X
#> 996  a05188 a04045 b04098      X       X        X
#> 997  a05189 a04006 b04177      X       X        X
#> 998  b05190 a04006 b04177      X       X        X
#> 999  b05191 a04006 b04177      X       X        X
#> 1000 b05192 a04006 b04177      X       X        X
table(PedA$dam.cat, PedA$sire.cat, useNA="ifany")
#>
#>       D   G   X
#>   D   4  52   0
#>   G   8 232  24
#>   X   0  64 616
# calculate call rate
if (FALSE) {
  # proportion of non-missing genotypes per sample (missing coded as -9)
  CallRates <- apply(MyGenotypes, MARGIN=1,
                     FUN = function(x) sum(x!=-9)) / ncol(MyGenotypes)
  hist(CallRates, breaks=50, col="grey")
  GoodSamples <- rownames(MyGenotypes)[ CallRates > 0.8]
  # threshold depends on total number of SNPs, genotyping errors, proportion
  # of candidate parents that are SNPd (sibship clustering is more prone to
  # false positives).
  # GoodSamples is already a character vector of ids, so pass it directly
  PedA <- getAssignCat(MyOldPedigree, GoodSamples)
}
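The call-rate step can also be tried on simulated data. The sketch below is self-contained base R. The genotype matrix `Geno_toy` and its sample names are hypothetical, and the 0.8 threshold and the -9 missing-value code are taken from the example above. It uses `rowMeans()`, which gives the same per-sample proportion as the `apply()` call.

```r
set.seed(1)

# Hypothetical genotype matrix: 5 samples x 20 SNPs, coded 0/1/2
Geno_toy <- matrix(sample(0:2, 100, replace = TRUE), nrow = 5,
                   dimnames = list(paste0("ind", 1:5), NULL))

# Introduce missing genotypes, coded as -9
Geno_toy["ind3", 1:10] <- -9   # half of the SNPs missing
Geno_toy["ind5", 1:2]  <- -9   # 10% of the SNPs missing

# Call rate = proportion of non-missing genotypes per sample
CallRates_toy <- rowMeans(Geno_toy != -9)

# Keep samples above the (context-dependent) threshold
GoodSamples_toy <- names(CallRates_toy)[CallRates_toy > 0.8]
print(CallRates_toy)
print(GoodSamples_toy)
```

The resulting character vector of ids has the form expected for the SNPd argument.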