
Check that the provided genotype matrix is in the correct format, and check for low call rate samples and SNPs.


Usage

CheckGeno(
  GenoM,
  quiet = FALSE,
  Plot = FALSE,
  Return = "GenoM",
  Strict = TRUE,
  DumPrefix = c("F0", "M0")
)

Arguments

GenoM
the genotype matrix.

quiet
suppress messages.

Plot
display the plots of SnpStats.

Return
either 'GenoM' to return the cleaned-up genotype matrix, or 'excl' to return a list with excluded SNPs and individuals (see Value).

Strict
Exclude any individuals genotyped for <5% of SNPs, and any SNPs genotyped for <5% of individuals (TRUE); this was the unavoidable default up to version 2.4.1. Otherwise the only exclusions are (very nearly) monomorphic SNPs, SNPs scored for fewer than 2 individuals, and individuals scored for fewer than 2 SNPs.

DumPrefix
length 2 vector of dummy ID prefixes, used to check that these do not occur among genotyped individuals.
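As a rough illustration of the kind of format checks described here, the sketch below tests a toy genotype matrix in base R. It assumes one row per individual, one column per SNP, genotypes coded 0/1/2 with -9 for missing (the coding used in the examples), and treats DumPrefix as prefixes that real IDs must not start with. The helper name `check_format_sketch` is hypothetical, and this is not CheckGeno's actual implementation.

```r
# Hypothetical helper: a sketch of simple format checks, not sequoia's own code.
check_format_sketch <- function(GenoM, DumPrefix = c("F0", "M0")) {
  stopifnot(is.matrix(GenoM), !is.null(rownames(GenoM)))
  # Assumed coding: 0/1/2 copies of an allele, -9 for missing
  bad_values <- !all(GenoM %in% c(0, 1, 2, -9))
  # IDs starting with a dummy prefix could be confused with dummy individuals
  ids <- rownames(GenoM)
  clash <- ids[startsWith(ids, DumPrefix[1]) | startsWith(ids, DumPrefix[2])]
  list(bad_values = bad_values, prefix_clash = clash)
}

# Toy data: 3 individuals x 2 SNPs, one ID clashing with the "F0" prefix
toy <- matrix(c(0, 1, 2, -9, 1, 1), nrow = 3,
              dimnames = list(c("A001", "F0002", "B003"), c("snp1", "snp2")))
res <- check_format_sketch(toy)
res$prefix_clash
```

In this toy case the only flagged ID is the one beginning with "F0"; renaming it, or choosing a different DumPrefix, would avoid the clash.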


Value

If Return='excl', a list with the following elements, each present only if any are found:

ExcludedSnps
SNPs scored for <10% of individuals; automatically excluded when running sequoia.

ExcludedSnps-mono
monomorphic (fixed) SNPs; automatically excluded when running sequoia. This includes nearly-fixed SNPs with MAF \(= 1/(2N)\). Column numbers are *after* removal of ExcludedSnps, if any.

ExcludedIndiv
Individuals scored for <5% of SNPs; these cannot be reliably included during pedigree reconstruction. Individual call rate is calculated after removal of ExcludedSnps.

Snps-LowCallRate
SNPs scored for 10% to 50% of individuals; strongly recommended to be filtered out.

Indiv-LowCallRate
individuals scored for <50% of SNPs; strongly recommended to be filtered out.

When Return='excl' the list is returned invisibly, i.e. the check is run and warnings or errors are always displayed, but the result is not printed and may contain nothing.
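To make the call rate categories concrete, here is a minimal base R sketch that sorts SNPs and individuals into "excluded", "low" and "ok" classes. The thresholds (10% and 50% for SNPs, 5% and 50% for individuals), the -9 missing code and the helper name `classify_call_rates` are assumptions for illustration; the monomorphic-SNP check is omitted, and this is not how CheckGeno is implemented internally.

```r
# Sketch only: thresholds and helper name are assumptions, not sequoia internals.
classify_call_rates <- function(GenoM, missing = -9) {
  # SNP call rate: proportion of individuals scored at each SNP
  snp_cr <- colMeans(GenoM != missing)
  snp_class <- cut(snp_cr, breaks = c(-Inf, 0.10, 0.50, Inf), right = FALSE,
                   labels = c("excluded", "low", "ok"))
  # Individual call rate is calculated after dropping excluded SNPs
  kept <- GenoM[, snp_class != "excluded", drop = FALSE]
  ind_cr <- rowMeans(kept != missing)
  ind_class <- cut(ind_cr, breaks = c(-Inf, 0.05, 0.50, Inf), right = FALSE,
                   labels = c("excluded", "low", "ok"))
  list(snp = snp_class, indiv = ind_class)
}

# Toy data: 20 individuals x 4 SNPs, everything missing unless set below
toy <- matrix(-9, nrow = 20, ncol = 4,
              dimnames = list(paste0("id", 1:20), paste0("snp", 1:4)))
toy[1, 1]    <- 1   # snp1 scored for 5% of individuals
toy[1:6, 2]  <- 1   # snp2 scored for 30%
toy[, 3]     <- 1   # snp3 scored for 100%
toy[1:10, 4] <- 1   # snp4 scored for 50%
out <- classify_call_rates(toy)
table(out$snp)
table(out$indiv)
```

Here snp1 falls in the "excluded" class and snp2 in the "low" class, and the ten individuals scored only at snp3 end up with a call rate of 1/3 once snp1 is removed.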


Appropriate call rate thresholds for SNPs and individuals depend on the total number of SNPs, distribution of call rates, genotyping errors, and the proportion of candidate parents that are SNPd (sibship clustering is more prone to false positives). Note that filtering first on SNP call rate tends to keep more individuals in.
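The effect of filtering order can be seen on a small toy matrix. The sketch below assumes missing genotypes are coded -9 and borrows the 0.6 and 0.8 cut-offs from the examples; the helper names `filter_snps` and `filter_indiv` are hypothetical.

```r
# Hypothetical helpers: keep columns / rows whose call rate exceeds min_cr
filter_snps  <- function(G, min_cr) G[, colMeans(G != -9) > min_cr, drop = FALSE]
filter_indiv <- function(G, min_cr) G[rowMeans(G != -9) > min_cr, , drop = FALSE]

# Toy data: 4 individuals x 4 SNPs, fully scored unless set to -9 below
toy <- matrix(1, nrow = 4, ncol = 4,
              dimnames = list(paste0("id", 1:4), paste0("snp", 1:4)))
toy[2:4, 4] <- -9   # snp4 is scored for one individual only
toy[4, 1]   <- -9   # id4 also misses snp1

snps_first  <- filter_indiv(filter_snps(toy, 0.6), 0.8)
indiv_first <- filter_snps(filter_indiv(toy, 0.8), 0.6)
nrow(snps_first)    # 3
nrow(indiv_first)   # 1
```

Removing the poorly scored snp4 first raises the call rate of id2 and id3 from 0.75 to 1, so they pass the individual filter; filtering individuals first discards them before that SNP is dropped.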

See also

SnpStats to calculate SNP call rates; CalcOHLLR to count the number of SNPs scored in both focal individual and parent.

Examples

GenoM <- SimGeno(Ped_HSg5, nSnp=400, CallRate = runif(400, 0.2, 0.8))
# the quick way:
GenoM.checked <- CheckGeno(GenoM, Return="GenoM")
#> Warning:  There are 178 SNPs scored for <50% of individuals 
#> There are  1000  individuals and  400  SNPs.

# the user supervised way:
Excl <- CheckGeno(GenoM, Return = "excl")
#> Warning:  There are 178 SNPs scored for <50% of individuals 
#> There are  1000  individuals and  400  SNPs.
GenoM.orig <- GenoM   # make a 'backup' copy
if ("ExcludedSnps" %in% names(Excl))
  GenoM <- GenoM[, -Excl[["ExcludedSnps"]]]
if ("ExcludedSnps-mono" %in% names(Excl))
  GenoM <- GenoM[, -Excl[["ExcludedSnps-mono"]]]
if ("ExcludedIndiv" %in% names(Excl))
  GenoM <- GenoM[!rownames(GenoM) %in% Excl[["ExcludedIndiv"]], ]

# warning about SNPs scored for <50% of individuals?
# note: this is not necessarily a problem, and sometimes unavoidable.
SnpCallRate <- apply(GenoM, MARGIN=2,
                     FUN = function(x) sum(x!=-9)) / nrow(GenoM)
hist(SnpCallRate, breaks=50, col="grey")

GenoM <- GenoM[, SnpCallRate > 0.6]

# to filter out low call rate individuals: (also not necessarily a problem)
IndivCallRate <- apply(GenoM, MARGIN=1,
                       FUN = function(x) sum(x!=-9)) / ncol(GenoM)
hist(IndivCallRate, breaks=50, col="grey")

GoodSamples <- rownames(GenoM)[ IndivCallRate > 0.8]