Introduction to sequoia
What you need
- Genotype data for a couple hundred SNPs (<1000); ideally with
  - high minor allele frequency,
  - low missingness,
  - low genotyping error rate, and
  - low linkage disequilibrium (LD),
  but real-world data usually works well too. The data format is 1 row per individual, 1 column per SNP, coded as 0/1/2 copies of the reference allele. Conversion from a few other formats is supported.
- Sex and birth year information for as many individuals as possible (‘birth year’: discrete time unit of birth/hatching/germination/…). If the exact year is unknown, minimum and/or maximum possible years can be provided.
- R: can be downloaded from CRAN.
- (optional) RStudio: for a more user-friendly R experience.
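As a rough illustration of the genotype format described above, the following sketch builds a toy matrix in base R and computes per-SNP minor allele frequency and missingness. The object names (`geno`, `maf`, `missingness`) and the use of `NA` for untyped SNPs are assumptions made for this example only; the missing-value code that sequoia() expects is documented in its help file.

```r
# Toy genotype matrix: 1 row per individual, 1 column per SNP,
# entries are 0/1/2 copies of the reference allele.
# NA marks an untyped SNP (an assumption for this sketch).
set.seed(1)
n_ind <- 5
n_snp <- 8
geno <- matrix(sample(0:2, n_ind * n_snp, replace = TRUE),
               nrow = n_ind, ncol = n_snp,
               dimnames = list(paste0("ind", 1:n_ind),
                               paste0("snp", 1:n_snp)))
geno[2, 3] <- NA  # one missing call

# Per-SNP summaries
ref_freq    <- colMeans(geno, na.rm = TRUE) / 2  # reference allele frequency
maf         <- pmin(ref_freq, 1 - ref_freq)      # minor allele frequency
missingness <- colMeans(is.na(geno))             # proportion of individuals not typed

data.frame(maf = round(maf, 2), missingness = missingness)
```

SNPs with a very low `maf` or a high `missingness` in such a summary carry little information for distinguishing relatives.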
Note that if individuals are on average typed at only 30% of, say, 200 SNPs, a random pair shares only \(0.3 \times 0.3 \times 200 = 18\) SNPs, which is far too few for reliable pedigree reconstruction. As ballpark figures, at least 50-100 SNPs are required for parentage assignment, and 200-400 for full pedigree reconstruction. Simple pedigrees with discrete generations require fewer SNPs than complex pedigrees with inbreeding.
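The arithmetic above can be sketched in R for a range of call rates, assuming that missingness is independent between individuals and across SNPs. The function name `expected_shared_snps` is hypothetical and not part of the package.

```r
# Expected number of SNPs typed in BOTH members of a random pair,
# assuming each individual is typed at a random fraction `call_rate`
# of the SNPs, independently of the other individual.
expected_shared_snps <- function(call_rate, n_snps) {
  call_rate^2 * n_snps
}

# The example from the text: 30% call rate, 200 SNPs
expected_shared_snps(0.3, 200)

# The same calculation for a few other call rates
call_rates <- c(0.3, 0.5, 0.8, 0.95)
shared <- expected_shared_snps(call_rates, n_snps = 200)
data.frame(call_rate = call_rates, expected_shared = shared)
```

Comparing the `expected_shared` column with the ballpark figures above gives a quick indication of whether a data set is likely to be informative enough.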
Download & installation
Current version
To install the most recent thoroughly checked version of sequoia, simply open R and run the command
install.packages("sequoia")
or install from GitHub, as described under ‘Other versions’ below.
Other versions
The source files from previous versions are archived on CRAN. To install these you need a Fortran compiler, which is not installed by default on all computers. You can find binary (already compiled) files for Windows and macOS of most versions in my GitHub archive. You can install these by downloading them to your hard disk and then running in R:
install.packages("C:/path/to/file/sequoia_2.5.6.zip", repos=NULL)
On github you can also find a browsable directory of the latest development version. It can be installed with
devtools::install_github("JiscaH/sequoia")
# or
remotes::install_github("JiscaH/sequoia")
This requires a Fortran compiler (on Windows, one is provided by Rtools, which is installed separately from the devtools package).
Reconstruct
The function to perform pedigree reconstruction is also called sequoia():
# load the package
library(sequoia)
# run pedigree reconstruction on example data included in the pkg
SeqOUT <- sequoia(GenoM = SimGeno_example,
                  LifeHistData = LH_HSg5,
                  Err = 0.005,        # genotyping error rate
                  Module = "ped",
                  quiet = "verbose",
                  Plot = TRUE)
# the result is a list with the pedigree, run parameters,
# and various other elements.
# graphical summary of results
SummarySeq(SeqOUT)
For an overview of the output of various functions included in the package, see the flow chart (function names in rectangles) and this mock report.
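To get a first impression of the returned list, it can be inspected with standard base R tools. The sketch below reruns the example with messages and plots switched off, then lists the elements; it assumes the sequoia package is installed and that the reconstructed pedigree is stored in an element named `Pedigree`.

```r
library(sequoia)

# Same example data as above, without console messages or plots
SeqOUT <- sequoia(GenoM = SimGeno_example,
                  LifeHistData = LH_HSg5,
                  Err = 0.005,
                  Module = "ped",
                  quiet = TRUE,
                  Plot = FALSE)

names(SeqOUT)               # which elements does the list contain?
str(SeqOUT, max.level = 1)  # compact overview of each element
head(SeqOUT$Pedigree)       # first rows of the pedigree (assumed element name)
```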
Access help files
In R, help for each function is available via ?functionname, e.g. ?sequoia. An overview of all help files and other documentation in a package is available via help(package="sequoia").
The ‘See Also’ section near the end of the sequoia()
help file has a list with the main functions in the package, for those
who tend to forget function names (like me).