Check Data for Duplicates.
DuplicateCheck.Rd
Check the genotype and life history data for duplicate IDs (not permitted) and duplicated genotypes (not advised), and count how many individuals in the genotype data are not included in the life history data (permitted). The order of IDs in the genotype and life history data is not required to be identical.
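The ID checks described above can be sketched in base R. This is an illustration on toy data, not the package's own implementation. The object `LifeHist` and its column name `ID` are assumptions made for the example.

```r
# Toy genotype matrix: rows = individuals, columns = SNPs (0/1/2 allele copies).
# Individual "A" appears twice to simulate a duplicated ID.
GenoM <- matrix(c(0, 1, 2,
                  0, 1, 2,
                  2, 2, 0,
                  1, 0, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "A"), NULL))

# Toy life history data; the column name 'ID' is an assumption for this sketch.
LifeHist <- data.frame(ID = c("A", "B", "B"), stringsAsFactors = FALSE)

geno_ids <- rownames(GenoM)

# Row numbers of all genotype rows whose ID occurs more than once.
dup_geno_rows <- which(geno_ids %in% geno_ids[duplicated(geno_ids)])

# IDs that occur more than once in the life history data.
dup_lh_ids <- unique(LifeHist$ID[duplicated(LifeHist$ID)])

# Number of genotyped individuals without a life history record (permitted).
n_no_lh <- sum(!unique(geno_ids) %in% LifeHist$ID)
```

Matching is done on ID values with `%in%`, so the row order of the two data sets does not matter.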
Arguments
- GenoM
matrix with genotype data, size nInd x nSnp.
- FortPARAM.dup
list with Fortran-ready parameter values, as generated by MkFortParams.
- quiet
suppress messages.
Value
A list with one or more of the following elements:
- DupGenoID
Dataframe, row numbers of duplicated IDs in genotype data. Remove or relabel these to avoid downstream confusion.
- DupGenotype
Dataframe, duplicated genotypes (with or without identical IDs). Pairs are reported if they differ at no more than the specified maximum number of mismatches, so this dataframe may include pairs of closely related individuals. Mismatch = number of SNPs at which genotypes differ, LLR = log-likelihood ratio between 'self' and most likely non-self.
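To make the Mismatch column concrete, here is a minimal base R sketch that counts differing SNPs for every pair of individuals and keeps the pairs at or below a threshold. It is not the package's method and it does not compute the LLR. The threshold name `max_mismatch` and the output column names other than `Mismatch` are hypothetical.

```r
# Toy genotype matrix: rows = individuals, columns = SNPs (0/1/2 allele copies).
GenoM <- matrix(c(0, 1, 2, 1,
                  0, 1, 2, 1,
                  0, 1, 2, 0,
                  2, 0, 0, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "D"), NULL))

max_mismatch <- 1  # hypothetical threshold for calling a pair 'duplicated'

# All unordered pairs of row indices, one pair per row.
pairs <- t(combn(nrow(GenoM), 2))

# Number of SNPs at which the two genotypes differ; missing values are skipped.
mismatch <- apply(pairs, 1, function(p) {
  sum(GenoM[p[1], ] != GenoM[p[2], ], na.rm = TRUE)
})

# Keep only pairs within the allowed number of mismatches.
dup_genotype <- data.frame(
  row1 = pairs[, 1],
  row2 = pairs[, 2],
  ID1 = rownames(GenoM)[pairs[, 1]],
  ID2 = rownames(GenoM)[pairs[, 2]],
  Mismatch = mismatch
)[mismatch <= max_mismatch, ]
```

Comparing all pairs scales quadratically with the number of individuals, so this approach is only practical for small data sets.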
See also
CheckLH, which performs the check for duplicated IDs in the life history data, as well as for IDs (in genotype data) for which no life history data is provided.