A gem providing a Git-backed datastore. This acts as a version-controlled hierarchical data store.

Requires Grit.

Usage:

require 'git-ds'

# connect to data model
db = GitDS::Database.connect('/path/to/repo.db')
model = GitDS::Model.new(db)

# store item in database
model.add_item('path/to/item', 'data in item')

# test for existence of item
model.include? 'path/to/item'

# list items in database
puts model.list_children
puts model.list_children('path/to')

# update item in database
model.add_item('path/to/item', 'revised data in item')

# retrieve item from database
data = model.get_item('path/to/item')

# delete item from database
model.delete_item('path/to/item')

# close database connection
db.close

See Examples.

Data Model

The recommended way to use GitDS is with a data model.

A subclass of GitDS::Model is used to define the data model. Note that the name of a Model subclass determines the top-level directory in the Git repository that will contain the data for the Model.

The structure for a GitDS::Model will have a subdirectory for each class which contains instances of that class. The following repository structure shows the model ‘my_model’ which contains two data types (‘my_class’ and ‘another class’), which have 3 instances and 2 instances respectively:

my_model/my_class/1/...
my_model/my_class/2/...
my_model/my_class/3/...
my_model/another class/1/...
my_model/another class/2/...

In a GitDS::Model, leaf nodes (files) contain actual data, while classes, instances, and members appear as directories. A model ‘Stuff’ with an instance of the class MyClass with ident ‘foo’ and the members (x=100, comment=‘factor of 10’) would appear in the repository as follows:

Stuff/MyClass/foo/x             : file containing '100'
Stuff/MyClass/foo/comment       : file containing 'factor of 10'
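
The mapping from members to repository paths can be sketched in plain Ruby. The helper member_paths below is hypothetical and is not part of GitDS; it only illustrates the layout described above.

```ruby
# Hypothetical helper (NOT part of GitDS): compute the repository paths
# that would hold the members of one model item instance.
def member_paths(model, klass, ident, members)
  members.each_with_object({}) do |(name, value), paths|
    # each member is a leaf file whose content is the value as a String
    paths[File.join(model, klass, ident, name.to_s)] = value.to_s
  end
end

paths = member_paths('Stuff', 'MyClass', 'foo',
                     { :x => 100, :comment => 'factor of 10' })
paths.each { |path, data| puts "#{path} : file containing '#{data}'" }
```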

See the examples KeyValueModel, TestSuiteModel, and UserGroupModel for demonstrations of using data models.

Model Items

Items stored in the model are subclasses of GitDS::ModelItem or GitDS::FsModelItem. Note that all GitDS::ModelItem objects must invoke the GitDS::ModelItemClass#name method in their class definition:

class DbThing < GitDS::ModelItem
  # name of class (and of subdirectory where class instances appear)
  name 'thing'
  # ...
end

To wrap an existing class hierarchy in ModelItems (e.g. in an ORM), use the ModelItem modules instead of subclassing:

# DB-only object
class DbThing < Thing
  extend GitDS::ModelItemClass
  include GitDS::ModelItemObject

  name 'thing'
end

# FS and DB object
class DbThing < Thing
  extend GitDS::FsModelItemClass
  include GitDS::FsModelItemObject

  name 'thing'
end

ModelItem vs FsModelItem

When using DB-only GitDS::ModelItems, the working directory will ALWAYS be missing files. This means that commits and similar operations should only be performed through GitDS; otherwise, database entries WILL BE DELETED when Git command-line tools are run. To avoid this problem, use only GitDS::FsModelItem classes.

Root Items

All items in a model are children of the model’s root item.

The root GitDS::ModelItem for a GitDS::Model can be accessed through GitDS::Model#root.

Properties

GitDS::ModelItem objects can define properties (PropertyDefinition objects) using the GitDS::ModelItemClass#property method:

class DbThing < ModelItem
  name 'thing'

  property :foo

  # 'bar' property defaults to false
  property :bar, false

  # 'baz' property is validated to ensure it is an integer
  property(:baz, 0) { |val| val.to_s.to_i == val }
end
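
To see which values the 'baz' validation block accepts, the same block can be run on its own as a lambda. This is plain Ruby with no GitDS involved; the name int_check is only illustrative.

```ruby
# The validation block from the 'baz' property, as a standalone lambda.
int_check = lambda { |val| val.to_s.to_i == val }

samples = [42, '42', 4.0, 4.5, nil]
results = samples.map { |val| int_check.call(val) }

samples.zip(results).each do |val, ok|
  puts "#{val.inspect} => #{ok}"
end
# results is [true, false, true, false, false]
```

Note that a Float with no fractional part (4.0) passes, because 4 == 4.0 is true in Ruby. A stricter check would be val.is_a?(Integer).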

Properties can be accessed using GitDS::ModelItemObject#property and GitDS::ModelItemObject#set_property:

class DbThing < ModelItem
  name 'thing'

  property :foo

  def foo
    property(:foo)
  end

  def foo=(val)
    set_property(:foo, val)
  end
end

Properties are stored as Strings; any object supporting to_s can be stored in a property. When reading a property, the String value is returned unless one of the special accessor methods is used:

class DbThing < ModelItem
  name 'thing'

  property :foo

  # Access foo as String
  def foo
    property(:foo)
  end

  # Access foo as an Integer
  def foo_to_i
    integer_property(:foo)
  end

  # Access foo as a Float
  def foo_to_f
    float_property(:foo)
  end

  # Access foo as a Bool
  def foo_to_b
    bool_property(:foo)
  end

  # Access foo as a Time
  def foo_to_ts
    ts_property(:foo)
  end

  # Access foo as an Array
  def foo_to_a
    array_property(:foo)
  end

end
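
The String round trip can be sketched with a small stand-in class. StringProps is hypothetical and does not use GitDS; the conversions shown (to_i, to_f) are assumptions about how the typed accessors behave, and GitDS may parse values differently.

```ruby
# Stand-in for a property store (NOT GitDS): values are saved with to_s
# and are only converted back when a typed accessor is used.
class StringProps
  def initialize
    @data = {}
  end

  def set_property(name, val)
    @data[name] = val.to_s
  end

  def property(name)
    @data[name]
  end

  # Assumed conversion; the real accessor may differ.
  def integer_property(name)
    property(name).to_i
  end

  # Assumed conversion; the real accessor may differ.
  def float_property(name)
    property(name).to_f
  end
end

props = StringProps.new
props.set_property(:foo, 100)
puts props.property(:foo).inspect          # a String, not an Integer
puts props.integer_property(:foo).inspect  # converted back to an Integer
```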

ModelItem Initialization

GitDS::ModelItem objects are created in two stages.

First, they are created in the repository using GitDS::ModelItemClass#create. This takes a parent object and a Hash of arguments, and invokes GitDS::ModelItemClass#fill to generate the ModelItem subtree in the repository. The default implementation of GitDS::ModelItemClass#fill creates leaf nodes for all properties supplied in the args Hash, but it can be overridden to create additional children. Second, the object is instantiated through its constructor, which takes the model and the path of the item; the constructor can also be overridden to initialize other class members:

def self.fill(model, item_path, args)
  super
  # fill the :created property instead of using an args field
  properties[:created].set(model, item_path, Time.now.to_s)
end

def initialize(model, path)
  super
  # initialize other class members
  @local_stuff = []
end

Item Lists

A GitDS::ModelItem may have one or more instances of another GitDS::ModelItem as its children. For example, a CompanyModelItem will have any number of EmployeeModelItem children. In this case, the child GitDS::ModelItems are defined in a GitDS::ModelItem class subtree:

company/ACME Inc/employee/First Guy
company/ACME Inc/employee/Second Guy
company/ACME Inc/employee/Third Guy
company/Fools-R-Us/employee/A Fool
company/Fools-R-Us/employee/Mo Foolz
company/Fools-R-Us/employee/Max Fool

In the above repository, the instances of the CompanyModelItem class (‘ACME Inc’, ‘Fools-R-Us’) have an EmployeeModelItem class directory in which their EmployeeModelItem children ([‘First Guy’, ‘Second Guy’, ‘Third Guy’] or [‘A Fool’, ‘Mo Foolz’, ‘Max Fool’]) are stored. These subdirectories of GitDS::ModelItem class instances are examples of a GitDS::ModelItemList.
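
How the children of a directory fall out of this layout can be sketched in plain Ruby. The helper children_of is hypothetical (GitDS itself reads Git trees rather than a flat list of paths); it only shows how the class directory separates an item from its list of children.

```ruby
# The repository layout from above, as a flat list of paths.
PATHS = [
  'company/ACME Inc/employee/First Guy',
  'company/ACME Inc/employee/Second Guy',
  'company/ACME Inc/employee/Third Guy',
  'company/Fools-R-Us/employee/A Fool',
  'company/Fools-R-Us/employee/Mo Foolz',
  'company/Fools-R-Us/employee/Max Fool'
]

# Hypothetical helper (NOT part of GitDS): names directly below 'dir'.
def children_of(paths, dir)
  prefix = dir + '/'
  paths.select { |path| path.start_with?(prefix) }
       .map { |path| path[prefix.length..-1].split('/').first }
       .uniq
end

companies = children_of(PATHS, 'company')
acme_staff = children_of(PATHS, 'company/ACME Inc/employee')
puts companies.inspect
puts acme_staff.inspect
```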

A GitDS::ModelItemList is instantiated in the constructor of a GitDS::ModelItem:

def initialize(model, path)
  super
  @emp = GitDS::ModelItemList.new(EmployeeModelItem, model, path)
end

The items in the list can then be wrapped with accessors:

def employees
  ensure_valid
  @emp.keys
end

def employee(ident)
  ensure_valid
  @emp[ident]
end

def add_employee(e)
  ensure_valid
  @emp.add(self, { :ident => e.ident, :name => e.name })
end

def del_employee(ident)
  ensure_valid
  @emp.delete(ident)
end

This hides the GitDS::ModelItemList behind an interface so that the ModelItems behave as normal object children:

# use existing company at c_path
c = CompanyModelItem.new(model, c_path)

# use existing employee at e_path
e = EmployeeModelItem.new(model, e_path)

c.add_employee(e)
c.employees.each { |ident| puts ident.inspect }
puts c.employee(e.ident).inspect

c.del_employee(e.ident)

Note: GitDS::ModelItemList uses the name of the class as the name of the subdirectory in which items are stored in the repo. To change this behavior (for example, if a GitDS::ModelItem has two different lists of the same class of GitDS::ModelItem objects), subclass the GitDS::ModelItem in the list and give it a different name.

Proxy Items

A ModelItem may have a member which refers to another ModelItem which it does not necessarily ‘own’. For example, EmployeeModelItem might have the member ‘boss’ which refers to another EmployeeModelItem.

In such cases, the member is a Proxy for another GitDS::ModelItem. In the repo, a Proxy is a BLOB which contains the path to a GitDS::ModelItem instance.

The GitDS::ModelItemClass#link_property method is used to define a property that is a Proxy for another GitDS::ModelItem. The method takes a property identifier (String or Symbol) and the GitDS::ModelItem class being linked to:

link_property(:name, GitDS::ModelItem)

Note that the Property is a proxy for a class. Internally, this is implemented as an instance of GitDS::ModelItemClassProxy, which associates a named property (i.e. a path to a BLOB in the repo that contains the link data) with a GitDS::ModelItem class. This class is used to instantiate the GitDS::ModelItem from the path stored in the property.
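
The idea of a proxy as a BLOB that holds a path can be sketched with a Hash standing in for the repository. Everything here is hypothetical (the Employee struct, the path layout, the 'boss' member) and none of it is GitDS code; it only shows a link being resolved by reading a path and instantiating the linked class from it.

```ruby
# A Hash standing in for the repository: keys are paths, values are BLOBs.
repo = {
  'employee/alice/name' => 'Alice',
  'employee/bob/name'   => 'Bob',
  # the proxy: a BLOB whose content is the path of another item
  'employee/bob/boss'   => 'employee/alice'
}

# Hypothetical item class (NOT GitDS): wraps a path in the repository.
Employee = Struct.new(:repo, :path) do
  def name
    repo[File.join(path, 'name')]
  end

  # Read the stored path, then instantiate the linked class from it.
  def boss
    target = repo[File.join(path, 'boss')]
    target && Employee.new(repo, target)
  end
end

bob = Employee.new(repo, 'employee/bob')
puts bob.boss.name
```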

Proxy Item Lists

A ModelItem may have a list of member ModelItems that it does not actually own. For example, a MeetingModelItem may have the member ‘attendees’ which is a list of EmployeeModelItem objects.

Such a list is a ProxyItemList.

def initialize(model, path)
  super
  @attn = GitDS::ProxyItemList.new(EmployeeModelItem, model, path)
end

def attendees
  ensure_valid
  @attn.keys
end

def attendee(ident)
  ensure_valid
  @attn[ident]
end

def add_attendee(obj)
  ensure_valid
  @attn.add(self, obj)
end

def del_attendee(ident)
  ensure_valid
  @attn.delete(ident)
end

Note: The ProxyItemList is based on ModelItemList, and uses the name of the proxied class as the subdirectory in which the links are stored in the repo.

Reducing Commits

By default, GitDS writes a commit every time that a GitDS::ModelItem is created, modified, or deleted. This can lead to a huge number of commits, which inflate the database and have an impact on performance.

To cut down on the number of commits, wrap all work in a GitDS::ExecCmd or a GitDS::Transaction:

model.exec {
  ...
}
model.transaction {
  ...
}

All work performed in a model is implicitly wrapped in a GitDS::ExecCmd. These commands can be nested, with a commit only occurring when the outermost command completes. See GitDS::ExecCmd and GitDS::Transaction.

In order to perform all work in a branch which gets automatically merged, use GitDS::Model#branched_transaction:

model.branched_transaction('version1.9') {
  ...
}

See GitDS::Database#branch_and_merge for more details. The TestSuiteModel example provides an example of using commands, transactions, and branches.

Direct Model Access

In addition to the GitDS::ModelItem classes, the contents of a GitDS::Model can be accessed directly:

# does model include the file 'class/id/property'?
model.include?('class/id/property')

# list the contents of the model root
model.list_children

# list the contents of the 'class/id' directory
model.list_children('class/id')

# Set the contents of the BLOB 'class/id/property' to value
model.add_item('class/id/property', 'value')

# As above, but also create an entry on the filesystem for the BLOB.
model.add_fs_item('class/id/property', 'value')

# Get the contents of the BLOB 'class/id/property'
model.get_item('class/id/property')

# Remove 'class/id/property' from the repository.
model.delete_item('class/id/property')

Finally, the GitDS::Database instance for the model can be accessed through GitDS::Model#db.

Model-level Classes

Database Access

The GitDS::Database class is a subclass of GitDS::Repo; all of the methods of GitDS::Repo are made available.

In the GitDS API, GitDS::Database is considered to be a database connection. A GitDS::Database instance has a single Staging Index that is used by all of its callers. For this reason, it is not recommended that a single GitDS::Database instance be used across multiple threads.

Actor

The author associated with commits to the Git repo. See Grit::Actor.

# Set the Database actor to Grit::Actor.new(name, email)
db.set_author(name, email)

actor = db.actor
db.actor = Grit::Actor.new(name, email)

Connecting to a Database

To open a GitDS::Database, use the connect() class method. The ‘path’ argument is a path to the root of the Git repository, and ‘autocreate’ will cause a Git repository to be created if it is set (and if the repository does not already exist). Note that ‘autocreate’ is true by default.

db = GitDS::Database.connect('my_stuff.db')

To connect to a GitDS::Database as a specific user (instead of using the default values in .git/config), use the connect_as() class method:

db = GitDS::Database.connect_as('test.db', 'hank', 'hk@users.net')

Closing the database will set the ‘stale’ flag, and cause most subsequent database operations to fail.

db.close

Executing DB Operations

A series of database operations can be enclosed in a block passed to GitDS::Database#exec. This creates a GitDS::ExecCmd object, which performs a commit after the block has been executed. GitDS::ExecCmd is therefore a useful way to group a block of work into a single commit. Note that the GitDS::Database connection and its Stage Index are accessible inside the block via the database and index methods.

db.exec {
  database.list('files').each do |name|
    ...
  end
}

db.exec {
  # override the default commit author and message
  author 'Guy', 'guy@people.org'
  message 'Added one file'

  ...
  index.add('files/1', '111111')
}

Note that the block is executed via instance_eval, so every method of the GitDS::ExecCmd object is available to the block. The use of instance_eval can have unexpected side effects if GitDS::Database#exec is called from within a method of a class instance: the instance methods and members of the calling object are no longer accessible inside the block, so any values the block needs must be assigned to local variables in the body of the method calling exec.

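class Stuff
  attr_accessor :path

  def wrong_way(val)
    # 'path' is looked up on the GitDS::ExecCmd object, not on Stuff
    db.exec { database.add(path, val) }
  end

  def right_way(val)
    # copy the attribute to a local variable, which the block can see
    path = self.path
    db.exec { database.add(path, val) }
  end
end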
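
The instance_eval behaviour can be reproduced without GitDS. In this sketch FakeCmd is a hypothetical stand-in for GitDS::ExecCmd that does nothing except run its block via instance_eval; the rest is standard Ruby scoping.

```ruby
# Hypothetical stand-in for GitDS::ExecCmd: runs its block via instance_eval.
class FakeCmd
  def initialize(&block)
    @block = block
  end

  def perform
    instance_eval(&@block)
  end
end

class Stuff
  attr_accessor :path

  def wrong_way
    # inside the block, self is the FakeCmd, which has no 'path' method
    FakeCmd.new { path }.perform
  end

  def right_way
    # local variables are captured by the block, whatever self is
    path = self.path
    FakeCmd.new { path }.perform
  end
end

stuff = Stuff.new
stuff.path = 'things/mine'

right = stuff.right_way

wrong = begin
  stuff.wrong_way
rescue NameError => e
  e.class
end

puts right.inspect
puts wrong.inspect
```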

Database commands can be nested. When nested, a commit is only performed when the outermost command has been executed.

db.exec {
  ...
  db.exec {
    ...
    # no commit happens here
  }
  ...
  # commit happens here
}

Note that GitDS::ExecCmd uses GitDS::Database#staging to determine nesting. When GitDS::Database#exec is called, a Stage Index is created in the database if none exists. As long as a Stage Index exists, a GitDS::ExecCmd object will assume it is nested, and therefore will not perform an index.build or a commit after the code block has executed.
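
The nesting rule can be sketched with a counter in place of real commits. MiniDb is a hypothetical stand-in, not GitDS code: it uses yield rather than instance_eval, and an empty Hash in place of a Stage Index, to show only that the existence of a staging object on entry suppresses the commit.

```ruby
# Hypothetical stand-in for a database connection (NOT GitDS).
class MiniDb
  attr_reader :commits

  def initialize
    @staging = nil
    @commits = 0
  end

  def staging?
    !@staging.nil?
  end

  def exec
    # if a staging object already exists, this call is nested
    nested = staging?
    @staging ||= {}
    result = yield
    unless nested
      @commits += 1   # only the outermost command commits
      @staging = nil
    end
    result
  end
end

db = MiniDb.new
db.exec do
  db.exec { :inner_one }   # no commit happens here
  db.exec { :inner_two }   # nor here
end
after_nested = db.commits

db.exec { :standalone }
after_second = db.commits
puts [after_nested, after_second].inspect
```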

Transactions

A GitDS::Transaction is a GitDS::ExecCmd object that ensures the block completes execution before a commit is performed. If the block executes without raising an exception, a commit is performed; otherwise, all changes are discarded.

db.transaction {
  index.add('files/1', '111111')

  # override default commit author and message
  author 'A Developer', 'dev@example.com'
  message '[ADEV] Fixed bug in wossname'
}

A GitDS::Transaction can be aborted with the rollback method, which raises a GitDS::TransactionRollback exception. This will cause the transaction, and all enclosing transactions, to be aborted.

# rollback the transaction
db.transaction {
  ...
  rollback if not obj.some_complex_operation(data)
  ...
}
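
The commit-or-discard behaviour can be sketched with an in-memory store. MiniStore and Rollback are hypothetical names, not the GitDS classes; the sketch stages changes in a copy and replaces the live data only if the block finishes without raising. Nesting is not modelled.

```ruby
# Hypothetical rollback exception (stands in for a rollback request).
class Rollback < StandardError; end

# Hypothetical in-memory store (NOT GitDS).
class MiniStore
  attr_reader :data

  def initialize
    @data = {}
  end

  # Returns true if the changes were kept, false if they were discarded.
  def transaction
    staged = @data.dup
    begin
      yield staged
      @data = staged   # the "commit": block finished without raising
      true
    rescue StandardError
      false            # staged copy is dropped, live data is untouched
    end
  end
end

store = MiniStore.new
kept = store.transaction { |s| s['files/1'] = '111111' }

discarded = store.transaction do |s|
  s['files/2'] = '222222'
  raise Rollback
end

puts store.data.inspect
```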

By default, all exceptions are caught by the GitDS::Transaction. This can make debugging difficult, as application errors will not be detected by the calling code. In order to prevent a GitDS::Transaction from discarding exceptions, invoke the GitDS::Transaction#propagate method in the body of the code block:

# re-raise all non-rollback exceptions
db.transaction {
  propagate
  ...
}

As with GitDS::Database#exec, invocations of GitDS::Database#transaction can be nested, with commits only being performed in the outermost transaction.

db.transaction {
  ...
  db.transaction { 
    ... 
    # no commit is performed
  }
  ...
  # commit is performed
}

Both GitDS::Transaction and GitDS::ExecCmd use the Stage Index to detect nesting; therefore, invoking GitDS::Database#transaction from within GitDS::Database#exec and invoking GitDS::Database#exec from within GitDS::Database#transaction are considered “nesting”.

Managing data

Objects in a GitDS::Database can be modified directly using GitDS::Database#add and GitDS::Database#delete. These use GitDS::Database#exec in order to suppress a commit if a Stage Index already exists.

# Set the contents of the BLOB 'things/mine' to 'abcdef'
db.add('things/mine',  'abcdef')

# Remove the BLOB 'things/mine' from the database
db.delete('things/mine')

# Get the Grit::Tree object for 'stuff/' in 'master'
t = db.tree('master', ['stuff/'])

Branch-and-merge

GitDS::Database supports branching of code blocks via GitDS::Database#branch_and_merge, which takes a branch tag and an author as arguments. If a tag is not specified, one will be generated from GitDS::Repo#last_branch_tag.

db.branch_and_merge('0.1.0-pre-alpha') {
  ...
}

db.branch_and_merge('0.1.1', Grit::Actor.new('A Coder')) {
  ...
}

This will create a new branch with the given tag using GitDS::Database#create_branch, perform the code block using GitDS::Database#transaction, then switch to the default branch (‘master’) and merge the created branch with GitDS::Database#merge_branch. Note that the Stage Index is saved before the branch is created and restored after it is merged.

Tagging the latest commit

The latest commit for the GitDS::Database can be tagged using GitDS::Database#mark:

# Set tag for latest commit
db.mark('v.0.0.9-alpha')

This will tag the latest commit as ‘v_0.0.9-alpha’.

Database-level Classes

Repository Access

The lowest level of access provided by GitDS is the Repository-level. Any lower than this and you’re using Grit objects or Git utilities.

Accessing the Index

A GitDS::Index for the repository can be created using GitDS::Repo#index_new:

idx = db.index_new
...
idx.commit('stuff done')

Note that this is a Grit::Index with some helper methods added. See below for details on using a proper Staging Index.

Staging

The GitDS::Repo object provides a Git-style Staging Index in order to emulate the Git command-line utilities. This index is cached and used by all methods that query or modify the repository. See GitDS::StageIndex.

# get Staging Index, creating one if necessary
idx = db.staging

# set Staging Index to existing GitDS::StageIndex object
idx = GitDS::StageIndex.new(db)
db.staging = idx

# return true if a Staging Index exists
db.staging?

# delete the staging index
db.unstage
# alternative:
db.staging = nil

# perform work using the staging index
db.stage { |idx|
  ...
}

# perform work using the staging index and commit when done
db.stage_and_commit('work done') { |idx|
  ...
}

The Staging Index is used by GitDS::ExecCmd and GitDS::Transaction to determine if they are nested. If a Staging Index exists when entering a command or a transaction, no commit is performed when the command or transaction exits.

Managing data

The contents of the repository can be managed directly using low-level methods:

# does repo include the file 'stuff/thing'?
db.include? 'stuff/thing'

# Set the contents of the BLOB 'stuff/thing' to '1234'
db.add('stuff/thing', '1234')

# Get the contents of the BLOB 'stuff/thing'
str = db.object_data('stuff/thing')

# Remove 'stuff/thing' from the repository.
db.delete('stuff/thing')

# Get the Grit::Tree object for 'stuff/' in 'master'
t = db.tree('master', ['stuff/'])

# Return the raw (git cat-file) contents of 'stuff/', recursing subtrees
str = db.raw_tree('stuff', true)

# Return a Hash with the contents of 'stuff'. Each key is a filename,
# each value is a Grit::Tree or a Grit::Blob.
h = db.list('stuff')

# Return a Hash of the subtrees (Grit::Tree values) in 'stuff'
db.list_trees('stuff')

# Return a Hash of the files (Grit::Blob values) in 'stuff'
db.list_blobs('stuff')

Where applicable, these wrap the underlying Grit::Repo methods.

Branch and Merge

A Git branch can be created by specifying a tag name and the SHA of the commit preceding the branch. By default, the latest commit in ‘master’ is used. If a tag is not specified, one will be generated from GitDS::Repo#last_branch_tag. Note that the final (clean) tag name is returned.

cmt = db.commits.first
name = db.create_branch('1.0.rc-4', cmt.id)

To switch to a branch, invoke GitDS::Repo#set_branch with the tag name:

# 'master'
puts db.current_branch
db.set_branch(name)
# '1.0.rc-4'
puts db.current_branch

A branch is merged to the default branch (‘master’) using merge_branch:

db.merge_branch(name, actor)

Tags

Any object can be tagged by invoking GitDS::Repo#tag_object on its SHA:

db.tag_object('Current State', db.commits.first.id)

Git access

The path of the top-level directory in the Git repository for the GitDS::Repo can be obtained through GitDS::Repo#top_level.

Commands can be run in the underlying Git repository:

# Execute block in top-level directory of Git repo
db.exec_in_git_dir(&block)

# Another way to get db.top_level
dir = db.exec_in_git_dir { `git rev-parse --show-toplevel` }

# Create array of paths in repo
files = db.exec_in_git_dir { `git ls-files` }.split("\n")

# Execute 'command' in top-level directory of Git repo as user
db.exec_git_cmd(command, Grit::Actor.new(name, email))

# Commit all changed files as user 'A Developer'
db.exec_git_cmd("git commit -a -m 'Done.'",
                Grit::Actor.new('A Developer', 'a@developer.net'))

# Another way to create array of paths in repo
files = db.exec_git_cmd('git ls-files').split("\n")

Repository-level Classes

Support for Git features

Rationale

The module is intended to manage the mundane data access for a git object database by providing standard database CRUD operations.

The notions of a Database and a Data Model were introduced to hide the complexity of using the Git object database as a backend.

More sophisticated manipulation of the repository must be performed using the Git toolchain.

Note: This is not a relational or an ACID-compliant database, and was never intended to be.

Why Git?