BioInterchange makes use of several external ontologies, i.e., ontologies that were developed outside the scope of the BioInterchange project. We are always happy to hear about ontologies that we should consider incorporating into BioInterchange, and we welcome contributions of ontologies that would allow additional file formats to be converted into RDF.
External ontologies used in BioInterchange:
One of these, the Genomic Feature and Variation Ontology, is used for converting GFF3, GTF, GVF, and VCF files into RDF, as described below.
The Genomic Feature and Variation Ontology (GFVO) enables the RDFization of genomic feature and variation data. It particularly addresses the representation of data in GFF3, GVF, GTF, and VCF formats.
The ontology is available under the CC0 1.0 Universal license and is therefore de facto in the public domain.
GFVO can be loaded into Protégé or TopBraid Composer via: http://www.biointerchange.org/gfvo
GFVO's development version can be found on GitHub: https://github.com/BioInterchange/Ontologies
Note: GFF3, GTF, GVF, and VCF badges indicate the origin of a particular class or property definition. The application of the ontology is not restricted to those file formats, though.
GFVO is being developed as the joint work of Joachim Baran (corresponding author, joachim.baran@gmail.com), Erick Antezana, Begum Durgahee, Karen Eilbeck, Robert Hoehndorf, and Michel Dumontier.
Feedback on the ontology has been provided by Raoul Bonnal, Takatomo Fujisawa, Toshiaki Katayama, Chris Mungall, and Francesco Strozzi.
An overview of GFVO is available on its BioPortal web page.
GFVO makes use of the Sequence Ontology (SO) for annotating genomic features and variants, and of the Feature Annotation Location Description Ontology (FALDO) for describing genomic locations, such as those of genomic features and breakpoints, including fuzzy/probabilistic coordinate ranges. We also integrate the Variation Ontology (VariO) for describing variants, their effects, and their consequences.
Species are represented as linked data via URIs that point to NCBI's species taxonomy. We encourage the use of identifiers.org URIs for other entities; such URIs may be generated automatically in a future release of BioInterchange.
GFVO follows the design patterns of the Semanticscience Integrated Ontology (SIO), and equivalence mappings are provided for GFVO's classes, object properties, and datatype properties where applicable.
Summary statistics about the number of classes and properties of GFVO are provided in the following table. The breakdown of the classes based on their modeling origin does not sum up to the total number of classes, because some modeled classes appear in multiple specifications (e.g., "DNA Sequence" appears in all four specifications).
| | Total Number | Number of Equivalences to SIO | Number of Equivalences to SO |
|---|---|---|---|
| Classes | 102 | 40 | 13 |
| …modeled from GFF3 | 23 | 11 | 3 |
| …modeled from GTF | 13 | 11 | 2 |
| …modeled from GVF | 62 | 23 | 12 |
| …modeled from VCF | 42 | 15 | 7 |
| Class Metadata | | | |
| …Wikipedia references | 53 | n/a | n/a |
| …pairwise disjoint axioms | 6 | n/a | n/a |
| …disjoint collection axioms | 13 | n/a | n/a |
| …with property restrictions | 32 | n/a | n/a |
| Datatype properties | 1 | 1 | 0 |
| Object properties | 32 | 31 | 0 |
GFVO Class | GFF3 Data Structure | GTF Data Structure | GVF Data Structure | VCF Data Structure
---|---|---|---|---
Alias | "Alias" attribute | n/a | "Alias" attribute | n/a | |
AlleleCount | n/a | n/a | n/a | "AC" additional information; "EC" additional information (requires link to alternate allele) | |
AlleleFrequency | n/a | n/a | "Variant_freq" attribute | "AF" additional information | |
AminoAcid | n/a | n/a | "Variant_aa" attribute; "Reference_aa" attribute; (see GFVO classes "Sequence Variant" and "Reference Sequence" for annotating the amino acid sequence) | n/a | |
AncestralSequence | n/a | n/a | n/a | "AA" additional information | |
ArrayComparativeGenomicHybridization | n/a | n/a | "data-source" structured pragma | n/a | |
Attribute | contents of the "attributes" column that are not mapped otherwise | "attribute" column | contents of the "attributes" column that are not mapped otherwise | contents of the "INFO" column that are not mapped otherwise | |
AverageCoverage | n/a | n/a | "technology-platform-average-coverage" pragma | n/a | |
Base Quality | n/a | n/a | n/a | "BQ" additional information | |
BiologicalEntity | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
BiopolymerSequencing | n/a | n/a | see GFVO subclasses "DNASequencing" and "RNASequencing" for applications | n/a | |
Breakpoint | n/a | n/a | "Breakpoint_detail" attribute and "Breakpoint_range" attribute (see also FALDO in Table 4) | n/a | |
Catalog | n/a | n/a | n/a | see GFVO subclass "Haplotype" | |
Cell | n/a | n/a | see GFVO subclasses "GermlineCell" and "SomaticCell" and "PrenatalCell" | n/a | |
ChemicalEntity | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
Chromosome | n/a | n/a | "sequencing-scope" pragma (for use with targeted sequencing) | n/a | |
CircularHelix | "Is_circular" attribute (see also GFVO class "WatsonCrickHelix") | n/a | "Is_circular" attribute (see also GFVO class "WatsonCrickHelix") | n/a | |
CodingFrameOffset | "phase" column | "frame" column | n/a ("phase" column unused) | n/a | |
CodonSequence | n/a | n/a | "Variant_codon" attribute; "Reference_codon" attribute; (see GFVO classes "Sequence Variant" and "Reference Sequence" for annotating the codon sequence) | n/a | |
Collection | n/a | n/a | see subclasses "Contig" and "Catalog" and "Scaffold" for applications | see subclasses "Contig" and "Catalog" and "Scaffold" for applications | |
Comment | comment lines (beginning with a single "#"; linkage via GFVO properties "isAfter"/"isBefore") | comment lines (beginning with a single "#"; linkage via GFVO properties "isAfter"/"isBefore") | "Comment" attribute; comment lines (beginning with a single "#"; linkage via GFVO properties "isAfter"/"isBefore") | n/a | |
ConditionalGenotypeQuality | n/a | n/a | n/a | "GQ" additional information | |
Contig | n/a | n/a | "sequencing-scope" pragma (for use with targeted sequencing) | "contig" information field (see also GFVO class "Landmark") | |
Coverage | n/a | n/a | see GFVO class "AverageCoverage" | "DP" additional information | |
DNAMicroarray | n/a | n/a | "data-source" structured pragma | n/a | |
DNASequence | "FASTA" annotation | n/a | "sequencing-scope" pragma (for use with targeted sequencing) | n/a | |
DNASequencing | n/a | n/a | "data-source" structured pragma | n/a | |
Exome | n/a | n/a | "sequencing-scope" pragma | n/a | |
ExperimentalMethod | "source" column | "source" column | "source" column; "capture-method" pragma (for use with exome and targeted sequencing) | "CHROM" column; "VALIDATED" additional information (GFVO class instance does not need to be further annotated) | |
ExternalReference | "Dbxref" attribute; "genome-build" pragma | "source" column | "Dbxref" attribute; "genome-build" pragma; "source-method" structured pragma; "attribute-method" structured pragma; "phenotype-description" structured pragma (see also GFVO class "Phenotype"); "phased-genotypes" structured pragma | "assembly" information field (encouraged to include the linked FASTA file contents; compare to GVF FASTA annotations); "pedigreeDB" information field; "DB" additional information (link to dbSNP via GFVO property "hasEvidence"); "H2" additional information and "H3" additional information (link to HapMap via GFVO property "hasEvidence"); "1000G" additional information (link to 1000 Genomes via GFVO property "hasEvidence") | |
Feature | rows with 9-tab delimiters (see "Description of the Format") | every line (see "Fields") | rows with 9-tab delimiters (see "Column Descriptions"); "Variant_effect" attribute (see also GFVO’s "is_affected_by" property) | data lines (see section 1.4 of the specification) | |
Female | n/a | n/a | "sex" pragma | n/a | |
File | n/a | n/a | "reference_fasta" pragma and "feature-gff3" pragma (it is encouraged to load and encode for the referenced file contents) | n/a | |
ForwardReferenceSequenceFrameshift | "Target" attribute | n/a | "Target" attribute; utilized by "sequence-alignment" pragma | utilized by "CIGAR" additional information | |
FragmentReadPlatform | n/a | n/a | "technology-platform-read-type" pragma | n/a | |
FunctionalSpecification | see GFVO subclass "Genotype" for applications | see GFVO subclass "Genotype" for applications | see GFVO subclass "Genotype" for applications | see GFVO subclass "Genotype" for applications | |
GameticPhase | n/a | n/a | "Phase" attribute | "GT" additional information (see also GFVO class "Genotype"); "PS" additional information (see also GFVO class "Haplotype") | |
Genome | "genome-build" pragma | n/a | "genome-build" pragma; "sequencing-scope" pragma | "Genomes" key/value property in information field ("SAMPLE" information field); "PEDIGREE" information field (relationships via GFVO properties "isAfter"/"isBefore"; labels encoded via GFVO property "hasAttribute" and class "Label") | |
GenomeAnalysis | n/a | n/a | n/a | "FILTER" information field (applicable if GFVO classes "Genotyping" and "VariantCalling" are not applicable; see also GFVO property "isRefutedBy") | |
GenomicAscertainingMethod | n/a | n/a | parent class to classes used with the "data-source" structured pragma | n/a | |
Genotype | n/a | n/a | "Genotype" attribute | "GT" additional information (see also GFVO class "GameticPhase") | |
Genotyping | n/a | n/a | n/a | "FILTER" information field (see also GFVO property "isRefutedBy") | |
GermlineCell | n/a | n/a | "genomic-source" pragma | n/a | |
Haplotype | n/a | n/a | n/a | "HQ" additional information (see also GFVO class "PhredScore") | |
HelixStructure | see GFVO subclasses "CircularHelix" and "WatsonCrickHelix" for applications | n/a | see GFVO subclasses "CircularHelix" and "WatsonCrickHelix" for applications | n/a | |
Hemizygous | n/a | n/a | "Zygosity" attribute | not explicitly specified; can be applied to genotypes (see GFVO class "Genotype") | |
Heritage | n/a | n/a | see GFVO subclasses "MaternalHeritage" and "PaternalHeritage" | see GFVO subclasses "MaternalHeritage" and "PaternalHeritage" | |
Hermaphrodite | n/a | n/a | unspecified; potential future use with "sex" pragma | n/a | |
Heterozygous | n/a | n/a | "Zygosity" attribute | not explicitly specified; can be applied to genotypes (see GFVO class "Genotype") | |
Homozygous | n/a | n/a | "Zygosity" attribute | not explicitly specified; can be applied to genotypes (see GFVO class "Genotype") | |
Identifier | "seqid" column; "ID" attribute | "seqname" column | "seqid" column; "ID" attribute; "individual-id" pragma; "technology-platform-machine-id" pragma | "CHROM" column; "ID" key/value property | |
InformationContentEntity | n/a | n/a | n/a | "FORMAT" information field | |
Label | n/a | n/a | n/a | "PEDIGREE" information field (see also GFVO class "Genome") | |
Landmark | "seqid" column; "sequence-region" pragma; "FASTA" annotation | "seqname" column; "DNA" "##"-line type and "RNA" "##"-line type and "Protein" "##"-line type | "seqid" column; "sequence-region" pragma; "FASTA" annotation | "CHROM" column; "contig" information field (see also GFVO class "Contig") | |
Likelihood | n/a | n/a | n/a | "GL" additional information; "GP" additional information (use with GFVO property "isSupportedBy" and GFVO class "ExperimentalMethod" or its descendants) | |
LikelihoodOfHeterogeneousPloidy | n/a | n/a | n/a | "GLE" additional information | |
Locus | "start" column; "end" column; "strand" column | "start" column; "end" column; "strand" column | "start" column; "end" column; "strand" column; "Start_range" attribute; "End_range" attribute | "POS" column | |
Male | n/a | n/a | "sex" pragma | n/a | |
MappingQuality | n/a | n/a | n/a | "MQ" information field | |
Match | "Target" attribute | n/a | "Target" attribute; utilized by "sequence-alignment" pragma | utilized by "CIGAR" additional information | |
MaterialEntity | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
MaternalHeritage | n/a | n/a | unspecified; potential use in phased genotypes | unspecified; potential use in phased genotypes | |
Name | "genome-build" pragma | "feature" column | "genome-build" pragma; "population" pragma; "technology-platform-name" pragma | n/a | |
Note | n/a | n/a | "sample-description" pragma; "Comment" key/value pairs in structured attributes | "Description" key/value property in information fields; "SB" information field (chosen due to ambiguous definition of strand bias) | |
NumberOfReads | n/a | n/a | "Variant_reads" attribute | "MQ0" attribute (with GFVO property "hasQuality" and GFVO class "MappingQuality" instance with value 0) | |
Object | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
PairedEndReadPlatform | n/a | n/a | "technology-platform-read-type" pragma | n/a | |
PaternalHeritage | n/a | n/a | unspecified; potential use in phased genotypes | unspecified; potential use in phased genotypes | |
PeptideSequence | "FASTA" annotation | n/a | "FASTA" annotation | n/a | |
Phenotype | n/a | n/a | "phenotype-description" structured pragma (see also GFVO class "ExternalReference") | n/a | |
PhredScore | n/a | n/a | "score" column | "QUAL" column; "PL" additional information (linked to GFVO class "Genotype"); "HQ" additional information (see also GFVO class "Haplotype"); "PQ" additional information (see also GFVO class "GameticPhase") | |
PrenatalCell | n/a | n/a | "genomic-source" pragma (deprecated class; meaning questioned in the specification) | n/a | |
Process | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
ProteinSequence | n/a | n/a | "sequencing-scope" pragma (for use with targeted sequencing) | n/a | |
Proteome | n/a | n/a | "sequencing-scope" pragma (anticipated future use) | n/a | |
Quality | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
Quantity | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
RNASequencing | n/a | n/a | "data-source" structured pragma | n/a | |
ReferenceSequence | n/a | n/a | "Reference_seq" attribute; "Sequence_context" attribute (see also GFVO properties "isBefore"/"isAfter") | "REF" column | |
ReferenceSequenceGap | "Target" attribute | n/a | "Target" attribute; utilized by "sequence-alignment" pragma | utilized by "CIGAR" additional information | |
ReverseReferenceSequenceFrameshift | "Target" attribute | n/a | "Target" attribute; utilized by "sequence-alignment" pragma | utilized by "CIGAR" additional information | |
Sample | n/a | n/a | "sample-description" pragma | "SAMPLE" information field | |
SampleCount | n/a | n/a | n/a | "NS" additional information | |
SampleMixture | n/a | n/a | n/a | "Mixture" key/value property in information fields ("SAMPLE" information field) | |
Scaffold | n/a | n/a | "sequencing-scope" pragma (for use with targeted sequencing) | n/a | |
Score | "score" column | "score" column | "score" column; "score-method" structured pragma | uses subclasses of "Score" (see GFVO classes "Phred Score" and "Conditional Genotype Quality") | |
Sequence | see subclasses for applications | n/a | "sequencing-scope" pragma (for use with targeted sequencing); see subclasses for further applications | n/a | |
SequenceAlignment | "Target" attribute | n/a | "Target" attribute; "sequence-alignment" pragma | "CIGAR" additional information | |
SequenceAlignmentOperation | see subclasses for applications | see subclasses for applications | see subclasses for applications | see subclasses for applications | |
SequenceVariant | n/a | n/a | "Variant_seq" attribute | "ALT" column; "ALT" information field (with SO annotation) | |
SequencedIndividual | n/a | n/a | "Individual" attribute; "multi-individual" pragma | not within specification; but could be applied by data providers (see GFVO class "Haplotype") | |
SequencingTechnologyPlatform | n/a | n/a | "technology-platform-class" pragma; composite for aggregating "technology-platform-name" pragma and "technology-platform-version" pragma and "technology-platform-machine-id" pragma and "technology-platform-read-length" pragma and "technology-platform-read-type" pragma and "technology-platform-read-pair-span" pragma and "technology-platform-average-coverage" pragma; "technology-platform" structured pragma | n/a | |
Sex | n/a | n/a | "sex" pragma | n/a | |
SomaticCell | n/a | n/a | "genomic-source" pragma | "SOMATIC" additional information (use with GFVO property "refersTo") | |
Span | for use with "SequenceAlignmentOperation" | n/a | for use with "SequenceAlignmentOperation"; "technology-platform-read-length" pragma and "technology-platform-read-pair-span" pragma | n/a | |
TargetSequenceGap | "Target" attribute | n/a | "Target" attribute; utilized by "sequence-alignment" pragma | utilized by "CIGAR" additional information | |
TotalNumberOfAlleles | n/a | n/a | n/a | "AN" additional information | |
TotalNumberOfReads | n/a | n/a | "Total_reads" attribute | "DP" additional information | |
VariantCalling | n/a | n/a | "variant-calling" pragma | "FILTER" information field (see also GFVO property "isRefutedBy") | |
Version | "gff-version" pragma | "gff-version" "##"-line type | "gvf-version" pragma; "file-version" pragma; "technology-platform-version" pragma | "fileformat" meta-information line | |
WatsonCrickHelix | "Is_circular" attribute (see also GFVO class "CircularHelix") | n/a | "Is_circular" attribute (see also GFVO class "CircularHelix") | n/a | |
Zygosity | n/a | n/a | see specific GFVO classes "Heterozygous" and "Homozygous" and "Hemizygous" | see specific GFVO classes "Heterozygous" and "Homozygous" and "Hemizygous" |
Examples are provided in Turtle syntax. All examples are archived and can be downloaded as the file "gfvo_examples.ttl" from the BioInterchange Ontologies GitHub repository.
Encoding loci and basic genomic feature annotations. The example covers lines 0-3 of the example gene description given in the GFF3 specification.
    @prefix : <http://www.biointerchange.org/gfvo#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix faldo: <http://biohackathon.org/resource/faldo#> .
    @prefix so: <http://purl.bioontology.org/ontology/SO/SO> .

    #################################################################
    #
    # Example 1
    #
    # Encoding loci and basic genomic feature annotations. The
    # example covers lines 0-3 of the example gene description
    # given in the GFF3 specification.
    # (http://www.sequenceontology.org/resources/gff3.html)
    #
    # Please note that this seemingly verbose data representation
    # is slightly misleading. The reuse of non-literals as well as
    # binary RDF encodings such as RDF/HDT permits a more concise
    # representation of the same data.
    #
    # This example also avoids blank nodes, so that this file can
    # be conveniently viewed using the Protege ontology editor.
    # Using blank nodes, the line
    #
    #     :hasIdentifier :ExampleSet1Feature1Identifier ;
    #
    # can be substituted with
    #
    #     :hasIdentifier [
    #         a :Identifier ;
    #         :hasValue "gene00001" ;
    #     ]
    #
    #################################################################

    # This example represents the contents of a GFF3 file, version 3.
    :Example1Version a :Version ;
        :hasValue "gff-version 3" .

    # Two genomic features are described in the file, where an extra comment
    # was added to clarify the provenance of this example representation.
    :ExampleSet1 a :File ;
        :hasAttribute :Example1Version ;
        :hasMember :ExampleSet1Feature1 , :ExampleSet1Feature2 ;
        rdfs:comment "A simple example of a hierarchical genomic feature dependency. This example is an excerpt of another example given in the GFF3 specification (http://sequenceontology.org/resources/gff3.html)."@en .

    # Features in this example are placed on a single landmark, which has
    # an identifier associated with it (it cannot be anonymous in GFF3, even
    # though GFVO permits landmarks without an explicit identifier), and it
    # has a range of sequence positions it covers (expressed by a separate
    # pragma statement in GFF3).
    :ExampleSet1Landmark a :Landmark ;
        :hasIdentifier :ExampleSet1LandmarkIdentifier ;
        :hasAttribute :ExampleSet1LandmarkRegion .

    :ExampleSet1LandmarkIdentifier a :Identifier ;
        :hasValue "ctg123" .

    :ExampleSet1LandmarkRegion a faldo:Region ;
        faldo:begin :ExampleSet1LandmarkStartPosition ;
        faldo:end :ExampleSet1LandmarkEndPosition .

    :ExampleSet1LandmarkStartPosition a faldo:ExactPosition ;
        faldo:position "1" .

    :ExampleSet1LandmarkEndPosition a faldo:ExactPosition ;
        faldo:position "1497228" .

    # GFF3, GTF and GVF make use of named attributes. There are predefined (reserved)
    # names, but users can also freely contribute their own attributes. In this example,
    # only one labeled attribute is used, whose name we define upfront.
    #
    # Note: It might be a bit confusing that the label "Name" is applied to an attribute
    # of type "Name". This is just an artifact of the chosen example and it is of
    # course possible to label a "Name" attribute arbitrarily.
    :ExampleSet1AttributeName a :Label ;
        :hasValue "Name" .

    # First feature of the GFF3 file (line 2): a gene.
    :ExampleSet1Feature1 a :Feature , so:0000704 ;
        :isLocatedOn :ExampleSet1Landmark ;
        :hasPart :ExampleSet1Feature1Locus ;
        :hasIdentifier :ExampleSet1Feature1Identifier ;
        :hasAttribute :ExampleSet1Feature1Name .

    :ExampleSet1Feature1Locus a :Locus ;
        :hasAttribute :ExampleSet1Feature1Region .

    :ExampleSet1Feature1Region a faldo:Region ;
        faldo:begin :ExampleSet1FeatureStartPosition ;
        faldo:end :ExampleSet1Feature1EndPosition .

    # The start coordinate of both features is the same in this example. So this
    # entity is used by both :ExampleSet1Feature1 and :ExampleSet1Feature2.
    :ExampleSet1FeatureStartPosition a faldo:ExactPosition ;
        faldo:position "1000" .

    :ExampleSet1Feature1EndPosition a faldo:ExactPosition ;
        faldo:position "9000" .

    :ExampleSet1Feature1Identifier a :Identifier ;
        :hasValue "gene00001" .

    :ExampleSet1Feature1Name a :Name ;
        :hasAttribute :ExampleSet1AttributeName ;
        :hasValue "EDEN" .

    # Second feature of the GFF3 file (line 3): a transcription factor
    # binding site.
    # It references the first feature via a "Parent" attribute.
    :ExampleSet1Feature2 a :Feature , so:0000235 ;
        :isLocatedOn :ExampleSet1Landmark ;
        :hasPart :ExampleSet1Feature2Locus ;
        :hasIdentifier :ExampleSet1Feature2Identifier ;
        :hasSource :ExampleSet1Feature1 .

    :ExampleSet1Feature2Locus a :Locus ;
        :hasAttribute :ExampleSet1Feature2Region .

    :ExampleSet1Feature2Region a faldo:Region ;
        faldo:begin :ExampleSet1FeatureStartPosition ;
        faldo:end :ExampleSet1Feature2EndPosition .

    :ExampleSet1Feature2EndPosition a faldo:ExactPosition ;
        faldo:position "1012" .

    :ExampleSet1Feature2Identifier a :Identifier ;
        :hasValue "tfbs00001" .
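To make the GFF3-to-GFVO mapping more concrete (see the "Identifier", "Landmark", "Locus", and "Name" rows of the mapping table), the following Python sketch turns a single GFF3 data line into triples that follow the pattern of Example 1. It is not BioInterchange's converter: the function names `gff3_line_to_turtle` and `literal`, the IRI naming scheme, and the two-entry `SO_TERMS` lookup are hypothetical, and the source, score, strand, and phase columns are ignored for brevity. The output omits `@prefix` lines and assumes the same `:`, `faldo:`, and `so:` prefixes as Example 1.

```python
from urllib.parse import unquote

# Hypothetical excerpt of a GFF3 "type" -> Sequence Ontology accession lookup.
SO_TERMS = {"gene": "0000704", "TF_binding_site": "0000235"}


def literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def gff3_line_to_turtle(line: str, prefix: str) -> str:
    """Return GFVO triples (without @prefix lines) for one GFF3 data line."""
    seqid, _source, ftype, start, end, _score, _strand, _phase, attrs = (
        line.rstrip("\n").split("\t")
    )
    # Column 9 holds tag=value pairs separated by ";"; values may be URL-escaped.
    attributes = {
        tag: unquote(value)
        for tag, value in (pair.split("=", 1) for pair in attrs.split(";") if pair)
    }
    types = [":Feature"]
    if ftype in SO_TERMS:
        types.append("so:" + SO_TERMS[ftype])
    links = [f":isLocatedOn :{prefix}Landmark", f":hasPart :{prefix}Locus"]
    triples = [
        f":{prefix}Landmark a :Landmark ; :hasIdentifier :{prefix}LandmarkId .",
        f":{prefix}LandmarkId a :Identifier ; :hasValue {literal(seqid)} .",
        f":{prefix}Locus a :Locus ; :hasAttribute :{prefix}Region .",
        f":{prefix}Region a faldo:Region ; "
        f"faldo:begin :{prefix}Start ; faldo:end :{prefix}End .",
        f":{prefix}Start a faldo:ExactPosition ; faldo:position {literal(start)} .",
        f":{prefix}End a faldo:ExactPosition ; faldo:position {literal(end)} .",
    ]
    if "ID" in attributes:
        links.append(f":hasIdentifier :{prefix}Id")
        triples.append(
            f":{prefix}Id a :Identifier ; :hasValue {literal(attributes['ID'])} ."
        )
    if "Name" in attributes:
        # As in Example 1, the Name instance carries a Label whose value is "Name".
        links.append(f":hasAttribute :{prefix}Name")
        triples.append(
            f":{prefix}Name a :Name ; :hasAttribute :{prefix}NameLabel ; "
            f":hasValue {literal(attributes['Name'])} ."
        )
        triples.append(f':{prefix}NameLabel a :Label ; :hasValue "Name" .')
    feature = f":{prefix}Feature a {' , '.join(types)} ; " + " ; ".join(links) + " ."
    return "\n".join([feature] + triples)


line = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
print(gff3_line_to_turtle(line, "Gene1"))
```

Unlike Example 1, each call creates its own landmark node; a fuller converter would share one landmark per `seqid`, as Example 1 does for "ctg123".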
Encoding of a sequence alignment. This is part of the GFF3 specification, denoting the alignment between the reference sequence "chr3" and the target sequence "EST23".
    @prefix : <http://www.biointerchange.org/gfvo#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix faldo: <http://biohackathon.org/resource/faldo#> .
    @prefix so: <http://purl.bioontology.org/ontology/SO/SO> .

    #################################################################
    #
    # Example 2
    #
    # Encoding of a sequence alignment. This is part of the GFF3
    # specification, denoting the alignment between the reference
    # sequence "chr3" and the target sequence "EST23".
    # (http://www.sequenceontology.org/resources/gff3.html)
    #
    #################################################################

    :ExampleSet2 a :Collection ;
        :hasMember :ExampleSet2Feature1 , :ExampleSet2Feature2 ;
        rdfs:comment "An example of a sequence alignment. This example is an excerpt of another example given in the GFF3 specification (http://sequenceontology.org/resources/gff3.html)."@en .

    :ExampleSet2Landmark a :Landmark ;
        :hasIdentifier :ExampleSet2LandmarkIdentifier .

    :ExampleSet2LandmarkIdentifier a :Identifier ;
        :hasValue "chr3" .

    # Reference sequence feature:
    :ExampleSet2Feature1 a :Feature ;
        :isLocatedOn :ExampleSet2Landmark ;
        :hasIdentifier :ExampleSet2Feature1Identifier .

    :ExampleSet2Feature1Identifier a :Identifier ;
        :hasValue "Match1" .

    # Target sequence feature:
    :ExampleSet2Feature2 a :Feature , so:0000343 ;
        :isLocatedOn :ExampleSet2Landmark ;
        :hasIdentifier :ExampleSet2Feature2Identifier .

    :ExampleSet2Feature2Identifier a :Identifier ;
        :hasValue "EST23" .

    # Description of the sequence alignment between :ExampleSet2Feature1
    # and :ExampleSet2Feature2 (features "Match1" and "EST23").
    :ExampleSet2SequenceAlignment a :SequenceAlignment ;
        :hasAttribute :ExampleSet2AlignmentLocus ;
        :hasSource :ExampleSet2Feature1 ;
        :hasInput :ExampleSet2Feature2 ;
        :hasFirstPart :ExampleSet2AlignmentOperation1 ;
        :hasOrderedPart :ExampleSet2AlignmentOperation2 ,
            :ExampleSet2AlignmentOperation3 ,
            :ExampleSet2AlignmentOperation4 ;
        :hasLastPart :ExampleSet2AlignmentOperation5 .

    # The locus describes the region over which the alignment operation
    # is being carried out.
    :ExampleSet2AlignmentLocus a :Locus ;
        :hasAttribute :ExampleSet2AlignmentRegion .

    :ExampleSet2AlignmentRegion a faldo:Region ;
        faldo:begin :ExampleSet2AlignmentStartPosition ;
        faldo:end :ExampleSet2AlignmentEndPosition .

    :ExampleSet2AlignmentStartPosition a faldo:ExactPosition ;
        faldo:position "1" .

    :ExampleSet2AlignmentEndPosition a faldo:ExactPosition ;
        faldo:position "21" .

    # Description of an actual alignment operation.
    :ExampleSet2AlignmentOperation1 rdf:type :Match ;
        :hasAttribute :ExampleSet2AlignmentOperation1Span ;
        :isBefore :ExampleSet2AlignmentOperation2 .

    :ExampleSet2AlignmentOperation1Span a :Span ;
        :hasValue "8" .

    :ExampleSet2AlignmentOperation2 rdf:type :TargetSequenceGap ;
        :hasAttribute :ExampleSet2AlignmentOperation2Span ;
        :isAfter :ExampleSet2AlignmentOperation1 ;
        :isBefore :ExampleSet2AlignmentOperation3 .

    :ExampleSet2AlignmentOperation2Span a :Span ;
        :hasValue "3" .

    :ExampleSet2AlignmentOperation3 rdf:type :Match ;
        :hasAttribute :ExampleSet2AlignmentOperation3Span ;
        :isAfter :ExampleSet2AlignmentOperation2 ;
        :isBefore :ExampleSet2AlignmentOperation4 .

    :ExampleSet2AlignmentOperation3Span a :Span ;
        :hasValue "6" .

    :ExampleSet2AlignmentOperation4 rdf:type :ReferenceSequenceGap ;
        :hasAttribute :ExampleSet2AlignmentOperation4Span ;
        :isAfter :ExampleSet2AlignmentOperation3 ;
        :isBefore :ExampleSet2AlignmentOperation5 .

    :ExampleSet2AlignmentOperation4Span a :Span ;
        :hasValue "1" .

    :ExampleSet2AlignmentOperation5 rdf:type :Match ;
        :hasAttribute :ExampleSet2AlignmentOperation5Span ;
        :isAfter :ExampleSet2AlignmentOperation4 .

    :ExampleSet2AlignmentOperation5Span a :Span ;
        :hasValue "6" .
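The operation spans in Example 2 (8, 3, 6, 1, 6) correspond to the GFF3 `Gap` attribute value "M8 D3 M6 I1 M6". The Python sketch below shows one way such operations could be derived: it splits a `Gap` string into (GFVO class, span) pairs and chains them with `isBefore`/`isAfter`. The M, D, and I codes follow Example 2; mapping F and R to the two frameshift classes is an assumption based on the class names. The function names and the `Operation<n>` IRI scheme are hypothetical, and the `hasFirstPart`/`hasOrderedPart`/`hasLastPart` links from the `SequenceAlignment` node are left out.

```python
from typing import List, Tuple

# GFF3 Gap operation codes -> GFVO SequenceAlignmentOperation subclasses.
# M, D and I follow Example 2; F and R are assumed from the class names.
GAP_CODES = {
    "M": "Match",
    "I": "ReferenceSequenceGap",
    "D": "TargetSequenceGap",
    "F": "ForwardReferenceSequenceFrameshift",
    "R": "ReverseReferenceSequenceFrameshift",
}


def parse_gap(gap: str) -> List[Tuple[str, int]]:
    """Split a Gap value such as "M8 D3 M6" into (GFVO class, span) pairs."""
    operations = []
    for token in gap.split():
        code, length = token[0], token[1:]
        if code not in GAP_CODES or not length.isdigit():
            raise ValueError(f"unsupported Gap operation: {token!r}")
        operations.append((GAP_CODES[code], int(length)))
    return operations


def operations_to_turtle(prefix: str, operations: List[Tuple[str, int]]) -> str:
    """Emit the operations linked by isBefore/isAfter, mirroring Example 2."""
    lines = []
    for i, (cls, span) in enumerate(operations, start=1):
        node = f":{prefix}Operation{i}"
        props = [f"a :{cls}", f":hasAttribute {node}Span"]
        if i > 1:
            props.append(f":isAfter :{prefix}Operation{i - 1}")
        if i < len(operations):
            props.append(f":isBefore :{prefix}Operation{i + 1}")
        lines.append(f"{node} " + " ; ".join(props) + " .")
        lines.append(f'{node}Span a :Span ; :hasValue "{span}" .')
    return "\n".join(lines)


ops = parse_gap("M8 D3 M6 I1 M6")
print(ops)
print(operations_to_turtle("Aln", ops))
```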
Encoding of a phased/unphased genotype and its sequence variants. This is part of the GVF specification, denoting two variants for two separate sequenced individuals: "Variant_seq=A,T;Genotype=0:1,1:1". See "Genotype" in the GVF specification.
    @prefix : <http://www.biointerchange.org/gfvo#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    #################################################################
    #
    # Example 3
    #
    # Encoding of genotypes/sequence variants. This is part of the GVF
    # specification, denoting two variants for two separate
    # sequenced individuals: "Variant_seq=A,T;Genotype=0:1,1:1".
    # (See "Genotype" in the GVF specification at
    # http://www.sequenceontology.org/resources/gvf.html)
    #
    #################################################################

    :ExampleSet3 a :Collection ;
        :hasMember :ExampleSet3Feature1 ;
        rdfs:comment "Encoding of genotypes/sequence variants. This example is part of the GVF specification, denoting two variants for two separate sequenced individuals (http://www.sequenceontology.org/resources/gvf.html)."@en .

    # The individuals that were sequenced are represented by named, but otherwise
    # unspecified, instances. This provides sufficient information to distinguish
    # between the individuals that were sequenced.
    :ExampleSet3SequencedIndividual1 rdf:type :SequencedIndividual .
    :ExampleSet3SequencedIndividual2 rdf:type :SequencedIndividual .

    # Genomic feature that captures the two described genotypes.
    :ExampleSet3Feature1 a :Feature ;
        :hasAttribute :ExampleSet3Genotype1 , :ExampleSet3Genotype2 .

    # Genotype for the first individual.
    #
    # The genotype's phased variants are linked via "hasFirstPart" and
    # "hasLastPart" here, which are descendants of "hasOrderedPart".
    # The object properties "isBefore"/"isAfter" connect the phased variants.
    #
    # Note: for an unphased genotype, it is sufficient to use "hasPart" only.
    :ExampleSet3Genotype1 a :Genotype ;
        :hasParticipant :ExampleSet3SequencedIndividual1 ;
        :hasQuality :Heterozygous ;
        :hasAttribute :ExampleSet3Phase ;
        :hasFirstPart :ExampleSet3ReferenceA_AT ;
        :hasLastPart :ExampleSet3VariantT_AT .

    # Ordering of the variants permits the expression of phased genotypes.
    # It is recommended to use "isBefore", but the use of "isAfter" is optional.
    :ExampleSet3ReferenceA_AT :isBefore :ExampleSet3VariantT_AT .
    :ExampleSet3VariantT_AT :isAfter :ExampleSet3ReferenceA_AT .

    # Genotype of the second individual.
    :ExampleSet3Genotype2 a :Genotype ;
        :hasParticipant :ExampleSet3SequencedIndividual2 ;
        :hasQuality :Homozygous ;
        :hasAttribute :ExampleSet3Phase ;
        :hasFirstPart :ExampleSet3VariantT_TT ;
        :hasLastPart :ExampleSet3VariantT_TT .

    # As above: variant ordering to phase the genotype.
    :ExampleSet3VariantT_TT :isBefore :ExampleSet3VariantT_TT ;
        :isAfter :ExampleSet3VariantT_TT .

    # Having an instance to denote the presence of phase information
    # might appear redundant across data sets. On the other hand, it
    # permits data providers to enrich their own or other providers'
    # data set with additional information (annotations) about used
    # protocols/equipment/tools.
    :ExampleSet3Phase a :GameticPhase .

    # For VCF data, the type can also be "ReferenceSequence", which
    # denotes that the sequence is located on a reference genome. This
    # would be indicated -- in a VCF file -- by a "0" in the GT field
    # (e.g., "0|1" (phased) or "0/1" (unphased) to indicate a genotype
    # whose first allele is on the reference genome and the second allele
    # is an alternative allele).
    :ExampleSet3ReferenceA_AT a :ReferenceSequence ;
        :hasValue "A" .

    :ExampleSet3VariantT_AT a :SequenceVariant ;
        :hasValue "T" .

    :ExampleSet3VariantT_TT a :SequenceVariant ;
        :hasValue "T" .
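The closing comment of Example 3 describes how VCF's `GT` field relates to these classes: allele "0" is the reference allele, and "|" versus "/" separates phased from unphased genotypes. The Python sketch below applies that reading to a single `GT` value and returns GFVO class names rather than triples. The dataclass `ParsedGenotype` and the function `parse_vcf_gt` are hypothetical; treating any "|" as phased, skipping missing (".") calls, and leaving haploid calls without a zygosity class are simplifying assumptions, not rules stated by GFVO.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ParsedGenotype:
    """GFVO-oriented view of one VCF GT value (illustrative only)."""

    phased: bool  # True if "|" is used, i.e. GameticPhase information is present
    alleles: List[Tuple[str, str]]  # (GFVO class name, allele) in GT order
    zygosity: Optional[str]  # "Homozygous", "Heterozygous" or None


def parse_vcf_gt(gt: str, ref: str, alts: List[str]) -> ParsedGenotype:
    """Map a VCF GT value such as "0|1" onto GFVO class names (sketch)."""
    phased = "|" in gt
    indices = gt.replace("|", "/").split("/")
    alleles = []
    for index in indices:
        if index == ".":
            continue  # missing call: nothing to represent
        if index == "0":
            # Allele 0 is the REF allele, i.e. a GFVO ReferenceSequence.
            alleles.append(("ReferenceSequence", ref))
        else:
            # Allele n > 0 is the n-th entry of the ALT column.
            alleles.append(("SequenceVariant", alts[int(index) - 1]))
    called = [i for i in indices if i != "."]
    if not called or len(called) != len(indices) or len(called) == 1:
        zygosity = None  # missing or haploid calls are left unclassified here
    elif len(set(called)) == 1:
        zygosity = "Homozygous"
    else:
        zygosity = "Heterozygous"
    return ParsedGenotype(phased, alleles, zygosity)


print(parse_vcf_gt("0|1", "A", ["T"]))
```

For "0|1" the ordered allele list mirrors the `hasFirstPart`/`hasLastPart` ordering of the first genotype in Example 3.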
Encoding of a Phred score, which is part of the GVF specification, but more widely utilized in the VCF specification.
    @prefix : <http://www.biointerchange.org/gfvo#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    #################################################################
    #
    # Example 4
    #
    # Encoding of a Phred score. GVF leaves it open how the
    # "score" (column 6) should be interpreted, but recommends
    # the use of Phred scores. If a data provider knows that
    # a Phred score was utilized, then this can be represented
    # via GFVO.
    # (See "Column Descriptions" in the GVF specification at
    # http://www.sequenceontology.org/resources/gvf.html)
    #
    #################################################################

    :ExampleSet4 a :Collection ;
        :hasMember :ExampleSet4Feature1 ;
        rdfs:comment "Encoding of a Phred score. This example is motivated by the GVF specification, which encourages -- but does not enforce -- the use of Phred scores (http://www.sequenceontology.org/resources/gvf.html)."@en .

    # Interpreting "score" (column 6) of a GVF file as Phred score.
    :ExampleSet4Feature1 a :Feature ;
        :hasAttribute :ExampleSet4Feature1Score .

    # Use of "PhredScore" instead of "Score" to denote the specific
    # type of score being used.
    :ExampleSet4Feature1Score a :PhredScore ;
        :hasValue 38 .
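Example 4 comes down to a single decision: whether a GVF `score` value is known to be Phred-scaled. A minimal Python sketch of that choice, producing triples in the shape of Example 4, might look as follows. The function names `score_literal` and `score_to_turtle` are hypothetical; treating "." as "no score" follows the GFF3/GVF convention for empty columns, and the integer/double literal formatting is an assumption made for illustration.

```python
import math


def score_literal(score: str) -> str:
    """Render a numeric score as a Turtle integer or double literal."""
    value = float(score)
    if not math.isfinite(value):
        raise ValueError(f"score is not a finite number: {score!r}")
    # Whole numbers become integers (as in Example 4); others use repr().
    return str(int(value)) if value.is_integer() else repr(value)


def score_to_turtle(feature: str, score: str, is_phred: bool) -> str:
    """Encode a GVF "score" column value in the style of Example 4.

    A "." (empty column) yields no triples. If the data provider knows the
    score is Phred-scaled, the more specific class PhredScore is used;
    otherwise the general class Score is used.
    """
    if score == ".":
        return ""
    score_class = "PhredScore" if is_phred else "Score"
    node = f"{feature}Score"
    return (
        f"{feature} :hasAttribute {node} .\n"
        f"{node} a :{score_class} ; :hasValue {score_literal(score)} ."
    )


print(score_to_turtle(":ExampleSet4Feature1", "38", is_phred=True))
```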
Ontology namespace "gfvo": http://www.biointerchange.org/gfvo#
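In the Turtle examples, the empty prefix `:` is bound to this namespace, so a name such as `:hasValue` abbreviates `http://www.biointerchange.org/gfvo#hasValue`. The small Python helper below expands such prefixed names using the namespaces that appear in the examples; the `PREFIXES` table and the functions `expand` and `prefix_header` are illustrative only and not part of any BioInterchange API.

```python
# Prefix bindings used in the GFVO Turtle examples; the empty prefix ""
# (written ":" in Turtle) is bound to the GFVO namespace.
PREFIXES = {
    "": "http://www.biointerchange.org/gfvo#",
    "gfvo": "http://www.biointerchange.org/gfvo#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "faldo": "http://biohackathon.org/resource/faldo#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as "gfvo:hasValue" into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


def prefix_header() -> str:
    """Turtle @prefix declarations for all bindings in PREFIXES."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())


print(expand(":Feature"))
```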