class DataKitten::Distribution

A specific available form of a dataset, such as a CSV file, an API, or an RSS feed.

Based on {www.w3.org/TR/vocab-dcat/#class-distribution dcat:Distribution}, but with useful aliases for other vocabularies.

Attributes

access_url[RW]

@!attribute access_url

@return [String] a URL to access the distribution.
byte_size[RW]

@!attribute byte_size

@return [Integer] size of file in bytes
description[RW]

@!attribute description

@return [String] a textual description
download_url[RW]

@!attribute download_url

@return [String] a URL to the file of the distribution.
extension[RW]

@!attribute extension

@return [String] the file extension of the distribution
format[RW]

@!attribute format

@return [DistributionFormat] the file format of the distribution.
issued[RW]

@!attribute issued

@return [Date] date created
media_type[RW]

@!attribute media_type

@return [String] the IANA media type (MIME type) of the distribution
modified[RW]

@!attribute modified

@return [Date] date modified
name[RW]

@!attribute title

@return [String] a short title, unique within the dataset
path[RW]

@!attribute path

@return [String] the path of the distribution within the source, if appropriate
schema[RW]

@!attribute schema

@return [Hash] a hash representing the schema of the data within the distribution. Will change to a more structured object later.
title[RW]

@!attribute title

@return [String] a short title, unique within the dataset
uri[RW]

@!attribute download_url

@return [String] a URL to the file of the distribution.

Public Class Methods

new(dataset, options)

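Create a new Distribution. Loads from a Datapackage, DCAT, or CKAN resource hash, depending on which option is given.

@param dataset [Dataset] the {Dataset} that this is a part of.
@param options [Hash] A set of options with which to initialise the distribution.
@option options [Hash] :datapackage_resource the resource section of a Datapackage representation to load information from.
@option options [Hash] :dcat_resource a DCAT resource to load information from, read by symbol keys such as :title and :accessURL.
@option options [Hash] :ckan_resource a CKAN resource to load information from, read by symbol keys such as :title and :downloadURL.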
# File lib/data_kitten/distribution.rb, line 66
def initialize(dataset, options)
  # Store dataset
  @dataset = dataset
  # Parse datapackage
  if r = options[:datapackage_resource]
    # Load basics
    @description = r['description']
    # Work out format
    @format = begin
      @extension = r['format']
      if @extension.nil?
        @extension = r['path'].is_a?(String) ? r['path'].split('.').last.upcase : nil
      end
      @extension ? DistributionFormat.new(self) : nil
    end
    # Get CSV dialect
    @dialect = r['dialect']
    # Extract schema
    @schema = r['schema']
    # Get path
    @path = r['path']
    @download_url = r['url']
    # Set title
    @title = @path || @uri
  elsif r = options[:dcat_resource]
    @title       = r[:title]
    @description = r[:title]
    @access_url  = r[:accessURL]
  elsif r = options[:ckan_resource]
    @title        = r[:title]
    @description  = r[:title]
    @issued       = r[:issued]
    @modified     = r[:modified]
    @access_url   = r[:accessURL]
    @download_url = r[:downloadURL]
    @byte_size    = r[:byteSize]
    @media_type   = r[:mediaType]
    @extension    = r[:format]
    # Load HTTP Response for further use
    @format = r[:format] ? DistributionFormat.new(self) : nil
  end
  # Set default CSV dialect
  @dialect ||= {
    "delimiter" => ","
  }

  @download = Fetcher.wrap(@download_url)
end
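
The format detection in the listing above can be tried in isolation. The following is a minimal sketch, not part of DataKitten: infer_extension is a hypothetical helper name and the resource hashes are made-up sample data. It only reproduces the rule that an explicit 'format' entry wins and that the text after the last dot of 'path' is used otherwise.

```ruby
# Hypothetical helper (not part of DataKitten) mirroring the extension
# logic used by initialize for Datapackage resources.
def infer_extension(resource)
  # An explicit 'format' entry is used as given.
  extension = resource['format']
  if extension.nil?
    path = resource['path']
    # Otherwise take the text after the last dot of the path, upcased.
    extension = path.is_a?(String) ? path.split('.').last.upcase : nil
  end
  extension
end

# Made-up sample resources for illustration.
with_format = { 'format' => 'csv', 'path' => 'data/prices.csv' }
path_only   = { 'path' => 'data/prices.csv' }
neither     = { 'description' => 'no path or format' }

p infer_extension(with_format) # prints "csv" (explicit format, unchanged)
p infer_extension(path_only)   # prints "CSV" (inferred from the path)
p infer_extension(neither)     # prints nil
```

Note that a path without a dot, such as 'README', yields the whole path upcased, because split('.') then returns a single element.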

Public Instance Methods

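data()

A CSV object representing the loaded data. The file is read from the path within the dataset if there is one, and otherwise from the download URL.

@return [CSV::Table, nil] the parsed rows, or nil if the file cannot be loaded, is not CSV, or raises an error while parsing.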

# File lib/data_kitten/distribution.rb, line 147
def data
  @data ||= begin
    if @path
      datafile = @dataset.send(:load_file, @path)
    elsif @download.ok?
      datafile = @download.body
    end
    if datafile
      case format.extension
      when :csv
        CSV.parse(
          datafile,
          :headers => true,
          :col_sep => @dialect["delimiter"]
        )
      else
        nil
      end
    else
      nil
    end
  rescue
    nil
  end
end
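
The parsing step can be sketched on its own with Ruby's standard csv library. The sample text and the semicolon dialect below are assumptions chosen for illustration; only the CSV.parse call with the headers and col_sep options comes from the listing above. With headers enabled, CSV.parse returns a CSV::Table, which is why headers can be read from the result.

```ruby
require 'csv'

# Made-up sample data and dialect, for illustration only.
datafile = "id;price\n1;9.50\n2;12.00\n"
dialect  = { "delimiter" => ";" }

# Same call shape as in the data method: the first row becomes the
# headers and the dialect's delimiter is used as the column separator.
table = CSV.parse(datafile, headers: true, col_sep: dialect["delimiter"])

p table.headers # prints ["id", "price"]

# Each row can be indexed by header name; values stay strings.
table.each do |row|
  puts "#{row['id']} costs #{row['price']}"
end
```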
exists?()

Whether the file that the distribution represents actually exists.

@return [Boolean] whether the HTTP response returns a success code or not

# File lib/data_kitten/distribution.rb, line 140
def exists?
  @download.exists?
end
headers()

An array of column headers for the distribution. Loaded from the schema, or from the file directly if no schema is present.

@return [Array<String>] an array of column headers, as strings.

# File lib/data_kitten/distribution.rb, line 127
def headers
  @headers ||= begin
    if @schema
      @schema['fields'].map{|x| x['id']}
    else
      data.headers
    end
  end
end
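
The fallback between schema and file can be illustrated without a Distribution instance. This is a sketch under stated assumptions: column_headers is a hypothetical helper, and the schema and CSV text are invented sample data. The 'fields' and 'id' keys are the ones the listing above reads.

```ruby
require 'csv'

# Hypothetical helper (not part of DataKitten) following the same
# fallback as the headers method: schema first, then the CSV header row.
def column_headers(schema, csv_text)
  if schema
    schema['fields'].map { |field| field['id'] }
  else
    CSV.parse(csv_text, headers: true).headers
  end
end

# Made-up sample data for illustration.
schema   = { 'fields' => [{ 'id' => 'country' }, { 'id' => 'population' }] }
csv_text = "name,count\nFrance,67\n"

p column_headers(schema, csv_text) # prints ["country", "population"]
p column_headers(nil, csv_text)    # prints ["name", "count"]
```

In the listing above, data can return nil, so when there is no schema and the file cannot be loaded, the call to data.headers would raise NoMethodError.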