module BioVcf

This module parses the VCF header. A header consists of lines containing fields. Most fields are of 'key=value' type and appear only once. These can be retrieved with the find_field method.

INFO, FORMAT and contig fields are special as they appear multiple times and contain multiple key-value pairs (each line is identified by an ID field). To retrieve the INFO and FORMAT fields, call the 'info' and 'format' functions respectively, which return a hash keyed on the contained ID.
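The distinction between single 'key=value' lines and repeated, ID-keyed lines can be illustrated with a minimal sketch. It does not use the bio-vcf API: simple_field and structured_fields are hypothetical helpers, the header lines are made-up sample data, and the naive comma split ignores quoted values.

```ruby
# Sample header lines (illustrative data only).
HEADER_LINES = [
  '##fileformat=VCFv4.1',
  '##source=ExampleCaller',
  '##INFO=<ID=DP,Number=1,Type=Integer>',
  '##INFO=<ID=AF,Number=A,Type=Float>',
  '##FORMAT=<ID=GT,Number=1,Type=String>'
].freeze

# Hypothetical helper: return the value of a simple '##key=value' line,
# or nil when no such line exists.
def simple_field(lines, name)
  lines.each do |line|
    key, value = line.sub(/\A##/, '').split('=', 2)
    return value if key == name && value && !value.start_with?('<')
  end
  nil
end

# Hypothetical helper: collect every '##TAG=<...>' line for one tag
# into a hash keyed on the ID of each line.
# Assumption: values contain no commas or quotes (naive split).
def structured_fields(lines, tag)
  prefix = '##' + tag + '=<'
  lines.each_with_object({}) do |line, by_id|
    next unless line.start_with?(prefix) && line.end_with?('>')
    body = line[prefix.length...-1]
    pairs = body.split(',').map { |kv| kv.split('=', 2) }.to_h
    by_id[pairs['ID']] = pairs
  end
end

puts simple_field(HEADER_LINES, 'fileformat')
puts structured_fields(HEADER_LINES, 'INFO').keys.inspect
```

The naive split breaks as soon as a Description contains a comma inside quotes, which is the reason a real parser is needed for these lines.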

For the INFO and FORMAT fields a Ragel parser is used, mostly to deal with embedded quoted fields.

Ragel lexer for VCF-header

This is a compact parser/lexer for the VCF header format. Bio-vcf uses the parser to generate meta information that can be output to (for example) JSON format. The advantage of using Ragel as a state engine is that it allows for easy parsing of key-value pairs with syntax checking and handles, for example, escaped quotes in quoted string values. This Ragel parser/lexer generates valid Ruby; it should be fairly trivial to generate Python/C/Java instead. Note that this edition validates ID and Number fields only. Other fields are dumped 'AS IS'.
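To show what parsing key-value pairs with escaped quotes involves, here is a rough sketch that uses Ruby's standard StringScanner as a stand-in for the Ragel state machine. It is not the generated parser: parse_header_body and valid_number? are hypothetical names, and the Number check (an integer, or one of A, R, G or '.') is an assumption based on the VCF specification rather than on this module's code.

```ruby
require 'strscan'

# Hypothetical helper: parse the text between '<' and '>' of a structured
# header line into a hash. Quoted values may contain commas and
# backslash-escaped quotes.
def parse_header_body(body)
  scanner = StringScanner.new(body)
  fields = {}
  until scanner.eos?
    key = scanner.scan(/\w+/) or raise ArgumentError, "expected key at #{scanner.pos}"
    scanner.scan(/=/) or raise ArgumentError, "expected '=' after #{key}"
    value =
      if scanner.scan(/"/)
        # Consume escaped characters and anything that is not a quote.
        raw = scanner.scan(/(?:\\.|[^"\\])*/)
        scanner.scan(/"/) or raise ArgumentError, "unterminated quote in #{key}"
        raw.gsub(/\\(.)/, '\1') # drop the escaping backslashes
      else
        scanner.scan(/[^,]*/)
      end
    fields[key] = value
    break if scanner.eos?
    scanner.scan(/,/) or raise ArgumentError, "expected ',' at #{scanner.pos}"
  end
  fields
end

# Assumed rule for Number: an integer, or one of A, R, G, '.'.
def valid_number?(value)
  value.match?(/\A(?:\d+|[ARG.])\z/)
end

fields = parse_header_body(%q(ID=DP,Number=1,Type=Integer,Description="Total \"raw\" depth, all samples"))
puts fields['Description']
puts valid_number?(fields['Number'])
```

A generated state machine does the same work in a single pass over the input, which is the design choice the text above describes.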

Note the .rb version is generated from ./ragel/gen_vcfheaderline_parser.rl

by Pjotr Prins © 2014/2015

Constants

MAXINT