vcf2diem {diemr} | R Documentation |
Reads vcf files and writes genotypes of the most frequent alleles based on chromosome positions to diem format.
vcf2diem(SNP, filename, chunk = 1L, requireHomozygous = TRUE, ...)
SNP |
character vector with a path to the '.vcf' or '.vcf.gz' file, or a vcfR object. |
filename |
character vector with a path where to save the converted genotypes. |
chunk |
numeric indicating by how many markers the result should be split into
separate files. |
requireHomozygous |
logical whether to require the marker to have at least one homozygous individual for each allele. |
... |
additional arguments. |
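The requireHomozygous condition can be illustrated with a minimal sketch in base R. The helper `has_homozygote_for_each_allele` is hypothetical and not part of diemr; it assumes diploid genotype calls written as in the vcf GT field (for example "0/0" or "0|1") and a marker with two alleles coded "0" and "1". How vcf2diem implements this check internally is not shown here.

```r
# Hypothetical helper (not a diemr function): does a marker have at least one
# homozygous individual for each of the two alleles?
has_homozygote_for_each_allele <- function(gt, alleles = c("0", "1")) {
  # Split each call on the unphased ("/") or phased ("|") separator
  parts <- strsplit(gt, "[/|]")
  # A call is homozygous when it has two identical allele codes
  is_hom <- vapply(parts, function(p) length(p) == 2 && p[1] == p[2], logical(1))
  # Alleles that occur in homozygous state; missing calls "./." yield "."
  hom_alleles <- vapply(parts[is_hom], function(p) p[1], character(1))
  all(alleles %in% hom_alleles)
}

# Toy genotypes for two markers (made-up data)
marker_a <- c("0/0", "0/1", "1/1", "./.")
marker_b <- c("0/0", "0/1", "0/1", "./.")
has_homozygote_for_each_allele(marker_a)
has_homozygote_for_each_allele(marker_b)
```

Under these assumptions, marker_a would pass the check and marker_b would not, because no individual in marker_b is homozygous for allele "1".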
Importing vcf files larger than 1GB, and those containing multiallelic
genotypes, is not recommended. Instead, provide the path to the
vcf file in SNP. vcf2diem then
reads the file line by line, which might be a solution for
very large and complex genomic datasets.
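The idea of reading a vcf file line by line, rather than loading it whole, can be sketched in base R as follows. This is an illustration of the general technique only, not the code used inside vcf2diem; the toy file contents and the variable names are assumptions made for the example.

```r
# Write a tiny vcf-like file to a temporary location (made-up data)
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
  "chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0\t1/1",
  "chr1\t250\t.\tC\tT\t.\t.\t.\tGT\t0/1\t1/1"
), vcf_path)

# Read one line at a time so that only a single record is held in memory
con <- file(vcf_path, open = "r")
n_markers <- 0L
positions <- character(0)
while (length(line <- readLines(con, n = 1L)) > 0) {
  # Skip meta-information and header lines
  if (startsWith(line, "#")) next
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  n_markers <- n_markers + 1L
  # CHROM and POS are the first two columns of a vcf record
  positions <- c(positions, paste(fields[1], fields[2], sep = ":"))
}
close(con)

n_markers
positions
```

Processing one record per iteration keeps memory use roughly constant regardless of file size, at the cost of slower reading than a single bulk import.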
The number of files \code{vcf2diem} creates depends on the \code{chunk} argument and the class of the \code{SNP} object.
* When \code{chunk = 1}, one output file will be created.
* Values of \code{chunk < 100} are interpreted as the number of files into which to split the data in \code{SNP}. For a \code{SNP} object of class \code{vcfR}, the number of markers per file is calculated from the dimensions of \code{SNP}. When the class of \code{SNP} is \code{character}, the number of markers per file is approximated from a model, and a message is shown. If this number of markers per file is inappropriate for the expected output, provide the intended number of markers per file in \code{chunk}, using a value of 100 or greater. \code{vcf2diem} will scan the whole input in the \code{SNP} file, creating additional output files until the last line in \code{SNP} is reached.
* Values of \code{chunk >= 100} mean that each output file in diem format will contain \code{chunk} number of lines with the data in \code{SNP}.
When the vcf file contains markers that are non-informative for genome polarisation, those are removed and listed in a file *omittedLoci.txt* in the working directory. The omitted loci are identified by their information in the CHROM and POS columns. The CHROM and POS information for loci included in the converted files is in *includedLoci.txt*.
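The interpretation of \code{chunk} described above can be sketched as a small base R function. The name `plan_chunks` is hypothetical and not part of diemr, the number of markers is assumed to be known in advance (as it would be for a vcfR object), and the rounding with `ceiling` is an assumption; the exact arithmetic used by vcf2diem may differ.

```r
# Hypothetical helper (not a diemr function): translate `chunk` into an
# approximate number of output files and markers per file
plan_chunks <- function(chunk, n_markers) {
  stopifnot(chunk >= 1, n_markers >= 1)
  if (chunk < 100) {
    # chunk is read as the number of files to split the data into
    markers_per_file <- ceiling(n_markers / chunk)
  } else {
    # chunk is read as the number of markers (lines) per file
    markers_per_file <- chunk
  }
  # Files are written until all markers are used up
  n_files <- ceiling(n_markers / markers_per_file)
  list(n_files = n_files, markers_per_file = markers_per_file)
}

plan_chunks(chunk = 1, n_markers = 1000)    # a single file with all markers
plan_chunks(chunk = 3, n_markers = 1000)    # split into about three files
plan_chunks(chunk = 250, n_markers = 1000)  # 250 markers per file
```

The sketch ignores markers that are omitted as non-informative, so real output files may hold fewer markers than this plan suggests.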
No value returned, called for side effects.
Natalia Martinkova
Filip Jagos 521160@mail.muni.cz
Jachym Postulka 506194@mail.muni.cz
## Not run:
# vcf2diem will write files to a working directory or a specified folder
# make sure the working directory or the folder are at a location with write permission
myofile <- system.file("extdata", "myotis.vcf", package = "diemr")
myovcf <- vcfR::read.vcfR(myofile)
vcf2diem(SNP = myofile, filename = "test1")
vcf2diem(SNP = myofile, filename = "test2", chunk = 3)
vcf2diem(SNP = myovcf, filename = "test3")
vcf2diem(SNP = myovcf, filename = "test4", chunk = 3)
## End(Not run)