Metadata-Version: 2.1
Name: jobarchitect
Version: 0.7.0
Summary: Tools for batching jobs and dealing with file paths
Home-page: https://github.com/JIC-CSB/jobarchitect
Author: Tjelvar Olsson
Author-email: tjelvar.olsson@jic.ac.uk
License: MIT
Download-URL: https://github.com/JIC-CSB/jobarchitect/tarball/0.7.0
Platform: UNKNOWN
Requires-Dist: dtoolcore
Requires-Dist: jinja2

Architect jobs for running analyses
===================================

.. image:: https://badge.fury.io/py/jobarchitect.svg
   :target: http://badge.fury.io/py/jobarchitect
   :alt: PyPi package

.. image:: https://readthedocs.org/projects/jobarchitect/badge/?version=latest
   :target: http://jobarchitect.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

- Documentation: http://jobarchitect.readthedocs.io
- GitHub: https://github.com/JIC-CSB/jobarchitect
- PyPI: https://pypi.python.org/pypi/jobarchitect
- Free software: MIT License


Overview
--------

This tool automates the generation of scripts to run analyses on data
sets. To use it, you will need a data set that has been created (or annotated)
with `dtool <https://github.com/JIC-CSB/dtool>`_.
It aims to help by:

1. Removing the need to know where specific data items are stored in a data set
2. Providing a means to split an analysis into several chunks (file based
   parallelization)
3. Providing a framework for seamlessly running an analysis inside a container


Design
------

This project has two main components. The first is a command line tool named
``sketchjob`` intended to be used by the end user. It is used to generate
scripts defining jobs to be run. The second (``_analyse_by_ids``) is a command
line tool that is used by the scripts generated by ``sketchjob``. The end user
is not meant to make use of this second script directly.


Installation
------------

To install the ``jobarchitect`` package::

    $ cd jobarchitect
    $ python setup.py install


Use
---

The ``jobarchitect`` tool only works with "smart" tools.
A "smart" tool is a tool that understands `dtoolcore <https://github.com/JIC-CSB/dtoolcore>`_
datasets, has no positional command line arguments and supports the
named arguments ``--dataset-path``, ``--identifier``, and ``--output-directory``.
The tool should only process the dataset item specified by the identifier
and write all output to the specified output directory.
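
A minimal sketch of such a "smart" tool's command line interface is shown
below. The tool name and the "analysis" step are hypothetical; a real tool
would use dtoolcore to look up the dataset item by its identifier, whereas
this sketch only demonstrates the required named arguments:

```python
# Hypothetical skeleton of a "smart" tool (e.g. my_smart_tool.py).
import argparse
import os


def build_parser():
    """Return a parser with the named arguments a "smart" tool must support."""
    parser = argparse.ArgumentParser(description="Example smart tool")
    parser.add_argument("--dataset-path", required=True)
    parser.add_argument("--identifier", required=True)
    parser.add_argument("--output-directory", required=True)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real tool would use dtoolcore here to locate the dataset item by
    # its identifier and analyse it; this sketch just writes a marker file
    # to the specified output directory.
    os.makedirs(args.output_directory, exist_ok=True)
    out_path = os.path.join(args.output_directory, args.identifier + ".txt")
    with open(out_path, "w") as fh:
        fh.write("processed " + args.identifier + "\n")
    return out_path
```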

A dtool dataset can be created using `dtool <https://github.com/JIC-CSB/dtool>`_.
Below is a sample session::

    $ dtool new dataset
    project_name [project_name]:
    dataset_name [dataset_name]: example_dataset
    ...

    $ echo "My example data" > example_dataset/data/my_file.txt
    $ dtool manifest update example_dataset/

Create an output directory::

    $ mkdir output

Then you can generate analysis run scripts with::

    $ sketchjob my_smart_tool.py example_dataset output/
    #!/bin/bash

    _analyse_by_ids \
      --tool_path=my_smart_tool.py \
      --input_dataset_path=example_dataset/ \
      --output_root=output/ \
      290d3f1a902c452ce1c184ed793b1d6b83b59164

Try the script with::

    $ sketchjob my_smart_tool.py example_dataset output/ > run.sh
    $ bash run.sh
    $ cat output/first_image.png
    290d3f1a902c452ce1c184ed793b1d6b83b59164  /private/var/folders/hn/crprzwh12kj95plc9jjtxmq82nl2v3/T/tmp_pTfc6/stg02d730c7-17a2-4d06-a017-e59e14cb8885/first_image.png

Use with split
--------------

The Unix command ``split`` provides a convenient way to divide the single
large output produced by ``sketchjob`` (which concatenates many command
invocations) into individual files. For example::

    $ split -n 60 many_slurm_scripts.slurm all_slurm_scripts/submit_segment
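
Note that with GNU coreutils, plain ``-n N`` splits into N chunks by bytes
and can therefore cut a line in two; ``-l`` (or ``-n l/N``) keeps whole
lines together. A hypothetical demonstration, with made-up file names,
splitting 60 one-line job commands into files of 10 whole lines each::

```shell
# Generate 60 one-line job commands (stand-ins for sketchjob output),
# then split them line-wise into chunks of 10.
mkdir -p all_scripts
for i in $(seq 1 60); do echo "echo job $i"; done > many_jobs.sh
split -l 10 many_jobs.sh all_scripts/submit_
ls all_scripts    # submit_aa submit_ab ... submit_af
```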

Working with Docker
-------------------

Building a Docker image
^^^^^^^^^^^^^^^^^^^^^^^

For the tests to pass, you will need to build an example Docker image, which
you do with the provided script::

    $ bash build_docker_image.sh

Running code with the Docker backend
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By inspecting the script and the associated Dockerfile, you can get an idea of
how to build Docker images that can be used with the jobarchitect Docker
backend, e.g.::

    $ sketchjob scripts/my_smart_tool.py ~/junk/cotyledon_images ~/junk/output --backend=docker --image-name=jicscicomp/jobarchitect
    #!/bin/bash

    IMAGE_NAME=jicscicomp/jobarchitect
    docker run  \
      --rm  \
      -v /Users/olssont/junk/cotyledon_images:/input_dataset:ro  \
      -v /Users/olssont/junk/output:/output  \
      -v /Users/olssont/sandbox/scripts:/scripts:ro \
      $IMAGE_NAME  \
      _analyse_by_ids  \
        --tool_path=/scripts/my_smart_tool.py \
        --input_dataset_path=/input_dataset  \
        --output_root=/output  \
        290d3f1a902c452ce1c184ed793b1d6b83b59164 09648d19e11f0b20e5473594fc278afbede3c9a4


