class DataKitten::Dataset

Represents a single dataset from some origin (see {www.w3.org/TR/vocab-dcat/#class-dataset dcat:Dataset} for relevant vocabulary).

Designed to be created with a URI to the dataset, and then to work out metadata from there.

Currently supports datasets that are hosted in Git (optionally on GitHub) and that use the Datapackage metadata format.

@example Load a Dataset from a git repository

dataset = Dataset.new('git://github.com/theodi/dataset-metadata-survey.git')
dataset.supported?         # => true
dataset.origin             # => :git
dataset.host               # => :github
dataset.publishing_format  # => :datapackage

Attributes

access_url[RW]

@!attribute access_url

@return [String] the URL that gives access to the dataset

identifier[RW]

A unique identifier of the dataset.

@return [String] the identifier of the dataset

metadata[RW]
source[W]

Public Class Methods

new(url_or_options, base_url=nil)

Create a new Dataset object

The class will attempt to auto-load metadata from this URL.

@overload new(url)

@param [String] url A URL that can be used to access the Dataset

@overload new(options)

@param [Hash] options the details of the Dataset.
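@option options [String] :access_url A URL that can be used to access the Dataset.
@option options [String] :base_url A base URL for the Dataset, used when the base_url argument is not given.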
# File lib/data_kitten/dataset.rb, line 43
def initialize(url_or_options, base_url=nil)
  url = case url_or_options
  when Hash
    base_url ||= url_or_options[:base_url]
    url_or_options[:access_url]
  else
    url_or_options
  end
  @access_url = DataKitten::Fetcher.wrap(url)
  @base_uri = URI(base_url) if base_url

  detect_origin
  detect_host
  detect_publishing_format
end
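
The way the constructor resolves its two call forms can be sketched without loading the library. This is a minimal illustration only: normalize_dataset_args is a hypothetical helper that is not part of DataKitten, the example.org URLs are placeholders, and the real constructor also wraps the URL in DataKitten::Fetcher and runs the detection steps, which are left out here.

```ruby
require "uri"

# Hypothetical helper mirroring the case statement in the constructor above.
# It is not part of DataKitten.
def normalize_dataset_args(url_or_options, base_url = nil)
  url = case url_or_options
        when Hash
          # An explicit base_url argument takes priority over the :base_url option.
          base_url ||= url_or_options[:base_url]
          url_or_options[:access_url]
        else
          url_or_options
        end
  { access_url: url, base_uri: base_url && URI(base_url) }
end

# String form: only the access URL is known.
from_string = normalize_dataset_args("http://example.org/data/datapackage.json")

# Hash form: the access URL and base URL both come from the options.
from_hash = normalize_dataset_args(
  { access_url: "http://example.org/data/datapackage.json",
    base_url: "http://example.org/" }
)
```

In the string form base_uri stays nil, so the base_uri method falls back to the root of the access URL.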

Public Instance Methods

base_uri()
# File lib/data_kitten/dataset.rb, line 63
def base_uri
  @base_uri || uri.merge("/")
end
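
The fallback in base_uri relies on URI#merge from Ruby's standard library. The sketch below shows the effect on a placeholder URL. base_uri_for is a hypothetical stand-alone function written for this illustration, not a DataKitten method.

```ruby
require "uri"

# Hypothetical stand-alone version of the base_uri logic shown above.
def base_uri_for(access_url, base_url = nil)
  # Merging "/" resolves to the root path of the same scheme and host.
  base_url ? URI(base_url) : URI(access_url).merge("/")
end

default_base  = base_uri_for("http://example.org/data/2013/datapackage.json")
explicit_base = base_uri_for("http://example.org/data/2013/datapackage.json",
                             "http://example.org/data/")

puts default_base   # http://example.org/
puts explicit_base  # http://example.org/data/
```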
change_history()

A history of changes to the Dataset

@return [Array] An array of changes. Exact format depends on the origin and publishing format.

# File lib/data_kitten/dataset.rb, line 279
def change_history
  []
end
contributor_agreement_url()

The URL of the contributor license agreement

@return [String] A URL for the agreement that contributors accept.

# File lib/data_kitten/dataset.rb, line 256
def contributor_agreement_url
  nil
end
contributors()

A list of contributors

@return [Array<Agent>] An array of contributors to the dataset, each as an Agent object.

# File lib/data_kitten/dataset.rb, line 228
def contributors
  []
end
crowdsourced?()

Has the data been crowdsourced?

@return [Boolean] Whether the data has been crowdsourced or not.

# File lib/data_kitten/dataset.rb, line 249
def crowdsourced?
  false
end
data_title()

The human-readable title of the dataset.

@return [String] the title of the dataset.

# File lib/data_kitten/dataset.rb, line 112
def data_title
  nil
end
description()

A brief description of the dataset

@return [String] the description of the dataset.

# File lib/data_kitten/dataset.rb, line 119
def description
  nil
end
distributions()

A list of distributions. Has aliases for popular alternative vocabularies.

@return [Array<Distribution>] An array of Distribution objects.

# File lib/data_kitten/dataset.rb, line 263
def distributions
  []
end
Also aliased as: files, resources
documentation_url()

Human-readable documentation for the dataset.

@return [String] the URL of the documentation.

# File lib/data_kitten/dataset.rb, line 133
def documentation_url
  nil
end
files()
Alias for: distributions
host()

Where the dataset is hosted.

@return [Symbol] The host. For instance, data loaded from GitHub repositories will return +:github+. This can be used to control extra host-specific behaviour if required. If no host type is identified, will return +nil+.

# File lib/data_kitten/dataset.rb, line 99
def host
  nil
end
issued()

Date the dataset was released

@return [Date] the release date of the dataset

# File lib/data_kitten/dataset.rb, line 148
def issued
  nil
end
Also aliased as: release_date
keywords()

Keywords for the dataset

@return [Array<String>] an array of keywords

# File lib/data_kitten/dataset.rb, line 126
def keywords
  []
end
landing_page()

A web page that can be used to gain access to the dataset, its distributions and/or additional information.

@return [String] The URL to the dataset

# File lib/data_kitten/dataset.rb, line 163
def landing_page
  nil
end
language()

The language of the dataset.

@return [String] the language of the dataset

# File lib/data_kitten/dataset.rb, line 235
def language
  nil
end
licenses()

A list of licenses

@return [Array<License>] An array of licenses, each as a License object.

# File lib/data_kitten/dataset.rb, line 214
def licenses
  []
end
maintainers()

A list of maintainers

@return [Array<Agent>] An array of maintainers, each as an Agent object.

# File lib/data_kitten/dataset.rb, line 200
def maintainers
  []
end
modified()

Date the dataset was last modified

@return [Date] the dataset's last modified date

# File lib/data_kitten/dataset.rb, line 156
def modified
  nil
end
origin()

The origin type of the dataset.

@return [Symbol] The origin type. For instance, datasets loaded from git repositories will return +:git+. If no origin type is identified, will return +nil+.

# File lib/data_kitten/dataset.rb, line 90
def origin
  nil
end
publishers()

A list of publishers

@return [Array<Agent>] An array of publishers, each as an Agent object.

# File lib/data_kitten/dataset.rb, line 207
def publishers
  []
end
publishing_format()

The publishing format for the dataset.

@return [Symbol] The format. For instance, datasets that publish metadata in Datapackage format will return +:datapackage+. If no format is identified, will return +nil+.

# File lib/data_kitten/dataset.rb, line 193
def publishing_format
  nil
end
release_date()
Alias for: issued
release_type()

What type of dataset is this? Options are: :web_service for API-accessible data, or :one_off for downloadable data dumps.

@return [Symbol] the release type. Note that the base implementation shown here returns +false+ rather than a Symbol.

# File lib/data_kitten/dataset.rb, line 141
def release_type
  false
end
resources()
Alias for: distributions
rights()

The rights statement for the data

@return [Object<Rights>] How the content and data can be used, as well as copyright notice and attribution URL

# File lib/data_kitten/dataset.rb, line 221
def rights
  nil
end
source()
# File lib/data_kitten/dataset.rb, line 71
def source
  @source ||= @access_url.as_json if @access_url.ok?
end
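
The one-line body of source combines memoization with a guard. A minimal sketch of how that expression behaves follows. FakeFetcher and SourceHolder are hypothetical stand-ins written for this illustration; only the ok? and as_json method names come from the listing above, and the real fetcher may behave differently.

```ruby
# Hypothetical stand-in for the object stored in @access_url.
class FakeFetcher
  attr_reader :calls

  def initialize(ok, payload)
    @ok = ok
    @payload = payload
    @calls = 0
  end

  def ok?
    @ok
  end

  def as_json
    @calls += 1 # count how often the payload is actually parsed
    @payload
  end
end

# Hypothetical holder reproducing only the source method.
class SourceHolder
  def initialize(fetcher)
    @access_url = fetcher
  end

  # Ruby parses this as (@source ||= @access_url.as_json) if @access_url.ok?
  def source
    @source ||= @access_url.as_json if @access_url.ok?
  end
end

good_fetcher = FakeFetcher.new(true, { "name" => "survey" })
holder = SourceHolder.new(good_fetcher)
holder.source
holder.source # second call reuses @source, so as_json is not called again

failed_holder = SourceHolder.new(FakeFetcher.new(false, nil))
```

With a fetcher whose ok? is false, the modifier if short-circuits and source returns nil without calling as_json.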
sources()

Where the data is sourced from

@return [Array<Source>] the sources of the data, each as a Source object.

# File lib/data_kitten/dataset.rb, line 177
def sources
  []
end
spatial()

Spatial coverage of the dataset

@return [GeoJSON Geometry] A GeoJSON geometry object of the spatial coverage

# File lib/data_kitten/dataset.rb, line 286
def spatial
  nil
end
supported?()

Can metadata be loaded for this Dataset?

@return [Boolean] true if metadata can be loaded, false if it's an unknown origin type, or has an unknown metadata format.

# File lib/data_kitten/dataset.rb, line 81
def supported?
  !(origin.nil? || publishing_format.nil?)
end
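
Because the base class returns nil for both origin and publishing_format, supported? only becomes true once something overrides both methods. The sketch below shows one way that could happen, assuming the overrides are mixed in with extend. SketchDataset, SketchGitOrigin and SketchDatapackageFormat are hypothetical names for this illustration and say nothing about how DataKitten itself supplies these values.

```ruby
# Hypothetical base class reproducing only the three methods involved.
class SketchDataset
  def origin
    nil
  end

  def publishing_format
    nil
  end

  def supported?
    !(origin.nil? || publishing_format.nil?)
  end
end

# Hypothetical modules that override one default each.
module SketchGitOrigin
  def origin
    :git
  end
end

module SketchDatapackageFormat
  def publishing_format
    :datapackage
  end
end

plain   = SketchDataset.new
partial = SketchDataset.new.extend(SketchGitOrigin)
full    = SketchDataset.new.extend(SketchGitOrigin, SketchDatapackageFormat)

puts plain.supported?    # false
puts partial.supported?  # false
puts full.supported?     # true
```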
temporal()

The temporal coverage of the dataset

@return [Object<Temporal>] the start and end dates of the dataset's temporal coverage

# File lib/data_kitten/dataset.rb, line 170
def temporal
  nil
end
theme()

The main category the dataset belongs to.

@return [String]

# File lib/data_kitten/dataset.rb, line 242
def theme
  nil
end
time_sensitive?()

Is the information time-sensitive?

@return [Boolean] whether the information will go out of date.

# File lib/data_kitten/dataset.rb, line 184
def time_sensitive?
  false
end
update_frequency()

How frequently the data is updated.

@return [String] The frequency of update expressed as a dct:Frequency.

# File lib/data_kitten/dataset.rb, line 272
def update_frequency
  nil
end
uri()
# File lib/data_kitten/dataset.rb, line 59
def uri
  URI(@access_url.to_s)
end
url()
# File lib/data_kitten/dataset.rb, line 67
def url
  @access_url.to_s
end