class DataKitten::Distribution
A specific available form of a dataset, such as a CSV file, an API, or an RSS feed.
Based on {www.w3.org/TR/vocab-dcat/#class-distribution dcat:Distribution}, but with useful aliases for other vocabularies.
Attributes
@!attribute access_url
@return [String] a URL to access the distribution.
@!attribute byte_size
@return [Integer] size of file in bytes
@!attribute description
@return [String] a textual description
@!attribute download_url
@return [String] a URL to the file of the distribution.
@!attribute extension
@return [String] the file extension of the distribution
@!attribute format
@return [DistributionFormat] the file format of the distribution.
@!attribute issued
@return [Date] date created
@!attribute media_type
@return [String] the IANA media type (MIME type) of the distribution
@!attribute modified
@return [Date] date modified
@!attribute title
@return [String] a short title, unique within the dataset
@!attribute path
@return [String] the path of the distribution within the source, if appropriate
@!attribute schema
@return [Hash] a hash representing the schema of the data within the distribution. Will change to a more structured object later.
Public Class Methods
Create a new Distribution. The documented option is a Datapackage resource hash (:datapackage_resource); the source below also reads :dcat_resource and :ckan_resource hashes.
@param dataset [Dataset] the {Dataset} that this is a part of.
@param options [Hash] A set of options with which to initialise the distribution.
@option options [String] :datapackage_resource the resource section of a Datapackage representation to load information from.
# File lib/data_kitten/distribution.rb, line 66
def initialize(dataset, options)
  # Store dataset
  @dataset = dataset
  # Parse datapackage
  if r = options[:datapackage_resource]
    # Load basics
    @description = r['description']
    # Work out format
    @format = begin
      @extension = r['format']
      if @extension.nil?
        @extension = r['path'].is_a?(String) ? r['path'].split('.').last.upcase : nil
      end
      @extension ? DistributionFormat.new(self) : nil
    end
    # Get CSV dialect
    @dialect = r['dialect']
    # Extract schema
    @schema = r['schema']
    # Get path
    @path = r['path']
    @download_url = r['url']
    # Set title
    @title = @path || @uri
  elsif r = options[:dcat_resource]
    @title = r[:title]
    @description = r[:title]
    @access_url = r[:accessURL]
  elsif r = options[:ckan_resource]
    @title = r[:title]
    @description = r[:title]
    @issued = r[:issued]
    @modified = r[:modified]
    @access_url = r[:accessURL]
    @download_url = r[:downloadURL]
    @byte_size = r[:byteSize]
    @media_type = r[:mediaType]
    @extension = r[:format]
    # Load HTTP Response for further use
    @format = r[:format] ? DistributionFormat.new(self) : nil
  end
  # Set default CSV dialect
  @dialect ||= { "delimiter" => "," }
  @download = Fetcher.wrap(@download_url)
end
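The format detection in the Datapackage branch can be tried in isolation. The following is a minimal sketch, not part of the library: `derive_extension` is a hypothetical helper name, and the resource hashes are made-up examples. It only mirrors the rule shown in the source above: use the resource's 'format' if present, otherwise take the upper-cased text after the last dot of 'path'.

```ruby
# Hypothetical helper (not part of DataKitten) mirroring the extension
# logic used for :datapackage_resource hashes in the constructor.
def derive_extension(resource)
  extension = resource['format']
  if extension.nil?
    path = resource['path']
    # Fall back to the text after the last dot in the path, upper-cased.
    extension = path.is_a?(String) ? path.split('.').last.upcase : nil
  end
  extension
end

# Made-up resource hashes for illustration.
explicit  = derive_extension({ 'format' => 'csv', 'path' => 'data/prices.txt' })
from_path = derive_extension({ 'path' => 'data/prices.csv' })
missing   = derive_extension({})

puts explicit.inspect   # "csv"  (explicit format is used as given)
puts from_path.inspect  # "CSV"  (derived from the path, upper-cased)
puts missing.inspect    # nil    (no format and no path)
```

Note that an explicit 'format' value is kept as given, while a value derived from the path is upper-cased.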
Public Instance Methods
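A CSV object representing the loaded data. Returns nil if no file can be loaded, if the format is not CSV, or if loading raises an error.
@return [CSV::Table, nil] the parsed rows, with the first row treated as headers (parsing uses :headers => true), or nil.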
# File lib/data_kitten/distribution.rb, line 147
def data
  @data ||= begin
    if @path
      datafile = @dataset.send(:load_file, @path)
    elsif @download.ok?
      datafile = @download.body
    end
    if datafile
      case format.extension
      when :csv
        CSV.parse(
          datafile,
          :headers => true,
          :col_sep => @dialect["delimiter"]
        )
      else
        nil
      end
    else
      nil
    end
  rescue
    nil
  end
end
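To illustrate the parsing step on its own, here is a small sketch that uses only Ruby's standard csv library. The `dialect` hash and the `raw` string are made-up inputs, chosen to show a non-default delimiter being passed as :col_sep in the same way the method above does.

```ruby
require 'csv'

# Made-up dialect and file contents for illustration.
dialect = { "delimiter" => ";" }
raw     = "id;name\n1;Felix\n2;Tom\n"

# Same options as the data method: first row as headers,
# column separator taken from the dialect hash.
table = CSV.parse(raw, headers: true, col_sep: dialect["delimiter"])

# With headers enabled, CSV.parse returns a CSV::Table.
rows = table.map(&:to_h)

puts table.class          # CSV::Table
puts table.headers.inspect
puts rows.first.inspect
```

Because the result is a CSV::Table rather than a plain array of arrays, it responds to `headers`, which is what the `headers` method relies on when no schema is present.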
Whether the file that the distribution represents actually exists
@return [Boolean] whether the HTTP response returns a success code or not
# File lib/data_kitten/distribution.rb, line 140
def exists?
  @download.exists?
end
An array of column headers for the distribution. Loaded from the schema, or from the file directly if no schema is present.
@return [Array<String>] an array of column headers, as strings.
# File lib/data_kitten/distribution.rb, line 127
def headers
  @headers ||= begin
    if @schema
      @schema['fields'].map{|x| x['id']}
    else
      data.headers
    end
  end
end
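The schema-first fallback can be sketched outside the class as follows. `column_headers` is a hypothetical function name, and the schema and CSV text are made-up examples; the sketch only assumes a schema hash shaped like the one the method above reads, with a 'fields' array whose entries carry an 'id'.

```ruby
require 'csv'

# Hypothetical standalone version of the fallback: prefer the field ids
# from the schema, otherwise read the header row of the parsed CSV.
def column_headers(schema, table)
  if schema
    schema['fields'].map { |field| field['id'] }
  else
    table.headers
  end
end

# Made-up inputs for illustration.
schema = { 'fields' => [{ 'id' => 'id' }, { 'id' => 'name' }] }
table  = CSV.parse("code,label\nA,Alpha\n", headers: true)

from_schema = column_headers(schema, table)
from_file   = column_headers(nil, table)

puts from_schema.inspect  # ["id", "name"]
puts from_file.inspect    # ["code", "label"]
```

When a schema is present, the file's own header row is never consulted, so the two sources can disagree, as they do in this example.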