read.all {seq2R}    R Documentation

Read FASTA and GBK formatted files

Description

Read nucleic acid sequences from a file in FASTA or GBK format.

Usage

read.all(file = system.file(""), seqtype = "DNA")

Arguments

file

The name of the file from which the sequences in FASTA or GBK format are to be read.

seqtype

The type of sequence. Currently only "DNA" is supported; future updates are expected to allow other types of sequences.

Details

FASTA is a widely used format in molecular biology. A sequence in FASTA format starts with a single-line description, distinguished by a leading greater-than symbol (‘>’), followed by the sequence data on the subsequent lines.
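To illustrate the layout described above, the following is a minimal base R sketch that splits FASTA text into records. The helper parse_fasta_lines() and the toy input are hypothetical and are not part of seq2R; read.all may process files differently.

```r
# Hypothetical helper (not part of seq2R): parse FASTA text held as a
# character vector with one element per line.
parse_fasta_lines <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]              # drop blank lines
  is_header <- startsWith(lines, ">")        # description lines start with '>'
  if (length(lines) == 0 || !is_header[1]) {
    stop("FASTA input must start with a '>' description line")
  }
  record_id <- cumsum(is_header)             # record index for every line
  headers <- sub("^>", "", lines[is_header])
  records <- lapply(seq_along(headers), function(i) {
    body <- lines[record_id == i & !is_header]
    # join the sequence lines, then split into single characters
    strsplit(paste(body, collapse = ""), "")[[1]]
  })
  names(records) <- headers
  records
}

# Toy input invented for illustration only
example <- c(">seq1 toy record", "ACGT", "TTGA", ">seq2", "GGCC")
parsed <- parse_fasta_lines(example)
```

Each element of parsed is a vector of single characters named after its description line, which mirrors the single-character representation mentioned in the Value section.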

By convention, 'GenBank' format files have the extension GBK. These files contain clearly labeled fields holding different types of information. The header of the file describes the sequence, including its type, shape, length and source. The features of the genome sequence follow the header and include protein translations. The DNA sequence is the last element of the file, which ends with (and must include) a double slash (//). Complete genomes in this format are available at https://ftp.ncbi.nlm.nih.gov/genbank/.
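As a rough illustration of this structure, the base R sketch below pulls the locus name and the sequence out of GenBank-style text. The helper parse_genbank_lines() and the toy record are hypothetical and are not part of seq2R; the sketch assumes a single record whose sequence lies between the ORIGIN line and the terminating //.

```r
# Hypothetical helper (not part of seq2R): extract the locus name and the
# sequence from GenBank-style text, one element per line.
parse_genbank_lines <- function(lines) {
  origin <- grep("^ORIGIN", lines)
  terminator <- grep("^//", lines)
  if (length(origin) == 0 || length(terminator) == 0) {
    stop("expected an ORIGIN field and a terminating '//' line")
  }
  origin <- origin[1]
  terminator <- terminator[terminator > origin][1]
  if (is.na(terminator)) {
    stop("no '//' line found after ORIGIN")
  }
  body <- if (terminator - origin > 1) {
    lines[(origin + 1):(terminator - 1)]
  } else {
    character(0)
  }
  # drop position numbers and spaces, keeping only the sequence letters
  seq_string <- gsub("[^A-Za-z]", "", paste(body, collapse = ""))
  locus_line <- grep("^LOCUS", lines, value = TRUE)
  locus <- if (length(locus_line) > 0) {
    strsplit(trimws(sub("^LOCUS", "", locus_line[1])), "[[:space:]]+")[[1]][1]
  } else {
    NA_character_
  }
  list(locus = locus, sequence = strsplit(seq_string, "")[[1]])
}

# Toy record invented for illustration only
example <- c(
  "LOCUS       TOY0001                 12 bp    DNA     linear   SYN 01-JAN-2000",
  "ACCESSION   TOY0001",
  "ORIGIN",
  "        1 acgtacgtac gt",
  "//"
)
record <- parse_genbank_lines(example)
```

The sketch stops at the first // after ORIGIN, so a file missing that terminator is rejected instead of being read to the end.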

Value

Sequence

The returned list has a component Sequence containing the DNA sequence taken from the field “ORIGIN” in GenBank. The sequence is a vector of single characters.

Locus or accession

The returned list has a component Locus/Accession containing the locus name or accession number taken from the field “LOCUS” or “ACCESSION” in 'GenBank'. The sequence length is also returned.

Author(s)

Nora M. Villanueva and Marta Sestelo.

Examples

library(seq2R)
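data(mtDNAhum)  # example dataset from the package
## Not run: 
# "file.fasta" and "file.gbk" are placeholder paths; replace them with real files
fasta_data <- read.all("file.fasta")
gbk_data <- read.all("file.gbk")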

## End(Not run)

[Package seq2R version 2.0.1 Index]