Metadata-Version: 2.1
Name: estream
Version: 0.0.3
Summary: An E-Stream implementation in Python
Home-page: https://github.com/mickeycj/estream
Author: Chanon Jenakom
Author-email: chanonjenakom@gmail.com
License: UNKNOWN
Keywords: machine learning,data mining,stream clustering,evolution-based
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent

.. image:: https://img.shields.io/pypi/v/estream.svg
    :target: https://pypi.python.org/pypi/estream
    :alt: PyPI Version
.. image:: https://img.shields.io/pypi/l/estream.svg
    :target: https://github.com/mickeycj/estream/blob/master/LICENSE
    :alt: License
.. image:: https://travis-ci.org/mickeycj/estream.svg
    :target: https://travis-ci.org/mickeycj/estream
    :alt: Travis CI Build Status

====================================
An E-Stream implementation in Python
====================================

E-Stream is an evolution-based technique for stream clustering which supports
five behaviors:

1. Appearance
2. Disappearance
3. Self-evolution
4. Merge
5. Split

These behaviors are achieved by representing each cluster as a *Fading Cluster
Structure with Histogram (FCH)*, utilizing a histogram for each feature of the
data.
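As a rough illustration of the idea, the sketch below keeps a fading weight, per-feature linear and squared sums, and one histogram per feature. It is not the internal structure used by ``estream``. The class name ``FadingHistogramCluster``, the fixed-range equal-width binning, and the radius definition (mean per-feature standard deviation) are all assumptions made for this example.

.. code-block:: python

    import math


    class FadingHistogramCluster:
        """Hypothetical, simplified fading cluster with one histogram per feature.

        Not estream's internal structure: the bin layout (a fixed range
        [low, high] split into n_bins equal bins) is an assumption made to keep
        the sketch short.
        """

        def __init__(self, n_features, n_bins=10, low=0.0, high=1.0):
            self.n_bins = n_bins
            self.low = low
            self.width = (high - low) / n_bins
            self.weight = 0.0
            self.linear_sum = [0.0] * n_features
            self.squared_sum = [0.0] * n_features
            self.histograms = [[0.0] * n_bins for _ in range(n_features)]

        def _bin(self, value):
            # Map a value to a bin index, clamping out-of-range values to the edges.
            index = int((value - self.low) / self.width)
            return min(max(index, 0), self.n_bins - 1)

        def add(self, point):
            # Absorb one data point with unit weight.
            self.weight += 1.0
            for i, value in enumerate(point):
                self.linear_sum[i] += value
                self.squared_sum[i] += value * value
                self.histograms[i][self._bin(value)] += 1.0

        def fade(self, factor):
            # Scale every statistic by the same factor so older points count less.
            # Center and radius are ratios, so fading alone does not move them.
            self.weight *= factor
            self.linear_sum = [s * factor for s in self.linear_sum]
            self.squared_sum = [s * factor for s in self.squared_sum]
            self.histograms = [[c * factor for c in h] for h in self.histograms]

        def center(self):
            return [s / self.weight for s in self.linear_sum]

        def radius(self):
            # Mean per-feature standard deviation (one possible radius definition).
            stds = []
            for ls, ss in zip(self.linear_sum, self.squared_sum):
                mean = ls / self.weight
                stds.append(math.sqrt(max(ss / self.weight - mean * mean, 0.0)))
            return sum(stds) / len(stds)

For example, adding the points (0.1, 0.9) and (0.3, 0.7) to a two-feature cluster with four bins over [0, 1] places them in bins 0 and 1 of the first histogram and in bins 3 and 2 of the second. Calling ``fade(0.5)`` then halves the weight and every bin count while leaving the center at (0.2, 0.8).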

The underlying concepts are described in detail in the original paper (available `here <https://www.researchgate.net/publication/221571035_E-Stream_Evolution-Based_Technique_for_Stream_Clustering>`_):

Udommanetanakit, K., Rakthanmanon, T., and Waiyamai, K., *E-Stream: Evolution-Based
Technique for Stream Clustering*, Advanced Data Mining and Applications: Third
International Conference, 2007.

-------------------
How to use E-Stream
-------------------

The ``estream`` package aims to be substitutable with ``sklearn`` classes, so it
can be used interchangeably with other estimators that share a similar API.

.. code-block:: python

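    from estream import EStream
    from sklearn.datasets import make_blobs

    # Create an E-Stream clusterer with the default parameters
    estream = EStream()

    # Generate a small synthetic dataset (the true labels are not needed)
    data, _ = make_blobs()

    # Cluster the data
    estream.fit(data)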

E-Stream exposes a number of parameters that can be set. The main ones are:

- ``max_clusters``: The maximum number of clusters allowed before existing
  clusters have to be merged. The default is set to *10*.
- ``stream_speed`` and ``decay_rate``: Together, these determine the fading
  factor of the clusters. In this implementation, the fading factor is a
  constant computed from the default values of *10* and *0.1*, respectively.
- ``remove_threshold``: The lower bound on a cluster's weight. A cluster whose
  weight falls below it is removed. The default is set to *0.1*.
- ``merge_threshold``: This determines whether two close clusters can be merged
  together. The default is set to *1.25*.
- ``radius_threshold``: This determines how close a new data point must be to an
  existing cluster in order to be merged into it. The default is set to *3.0*.
- ``active_threshold``: The minimum weight a cluster must reach before it is
  considered active. The default is set to *5.0*.
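To make the interplay of the two weight thresholds concrete, the sketch below tracks how a cluster's weight fades when no new points arrive. It assumes an exponential fading function f(t) = 2^(-λt), a common choice for fading-cluster stream clustering, with λ set to the default ``decay_rate`` and t counted in abstract time units. How ``stream_speed`` maps incoming points to time units is deliberately left out. The helper names are hypothetical and do not come from ``estream``.

.. code-block:: python

    # Defaults quoted in this README; the fading model itself is an assumption.
    ACTIVE_THRESHOLD = 5.0
    REMOVE_THRESHOLD = 0.1
    DECAY_RATE = 0.1


    def faded_weight(weight, elapsed, decay_rate=DECAY_RATE):
        """Weight after `elapsed` time units with no new points, assuming
        exponential fading f(t) = 2 ** (-decay_rate * t)."""
        return weight * 2 ** (-decay_rate * elapsed)


    def cluster_status(weight):
        """Classify a cluster by its weight using the two thresholds."""
        if weight < REMOVE_THRESHOLD:
            return "removed"
        if weight >= ACTIVE_THRESHOLD:
            return "active"
        return "inactive"


    def time_until_removed(weight, decay_rate=DECAY_RATE):
        """Whole time units until a cluster that receives no new points
        drops below REMOVE_THRESHOLD (requires decay_rate > 0)."""
        if decay_rate <= 0:
            raise ValueError("decay_rate must be positive")
        elapsed = 0
        while faded_weight(weight, elapsed, decay_rate) >= REMOVE_THRESHOLD:
            elapsed += 1
        return elapsed


    print(cluster_status(5.0))                    # active
    print(cluster_status(faded_weight(5.0, 10)))  # inactive (weight 2.5)
    print(time_until_removed(5.0))                # 57

Under these assumptions, a cluster sitting exactly at the default active threshold would be removed after 57 time units without new points. A larger ``decay_rate`` shortens that window.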

An example of setting these parameters:

.. code-block:: python

    from estream import EStream
    from sklearn.datasets import make_blobs

    # Override some of the defaults listed above
    estream = EStream(max_clusters=5,
                      merge_threshold=0.5,
                      radius_threshold=1.5,
                      active_threshold=3.0)
    data, _ = make_blobs()

    estream.fit(data)
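Lowering ``radius_threshold`` from *3.0* to *1.5*, as in the example above, should make clusters more selective about absorbing new points. The rule below is a hypothetical illustration of that effect, not code taken from ``estream``. It assumes that the threshold scales a cluster's radius and that distance is Euclidean.

.. code-block:: python

    import math


    def absorbs(point, center, radius, radius_threshold):
        """Hypothetical absorption rule: accept the point if it lies within
        radius_threshold * radius of the cluster center (Euclidean distance)."""
        return math.dist(point, center) <= radius_threshold * radius


    # A point two radii away from a cluster centered at the origin with radius 1.0
    point, center, radius = (2.0, 0.0), (0.0, 0.0), 1.0
    print(absorbs(point, center, radius, radius_threshold=3.0))  # True (default)
    print(absorbs(point, center, radius, radius_threshold=1.5))  # False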

------------
Installation
------------

Currently, the package can be installed either from ``PyPI``:

.. code-block:: bash

    pip install estream

or manually from source:

.. code-block:: bash

    wget https://github.com/mickeycj/estream/archive/master.zip
    unzip master.zip
    rm master.zip
    cd estream-master
    python setup.py install

--------------
Help & Support
--------------

Currently, there is no dedicated documentation available, so please send any
questions or issues to my `email <mailto:chanonjenakom@gmail.com>`_.

--------
Citation
--------

If you use this software in your work, please cite the original paper, published
in Advanced Data Mining and Applications: Third International Conference:

.. code-block:: bibtex

    @inproceedings{udommanetanakit2007estream,
        author = {Udommanetanakit, Komkrit and Rakthanmanon, Thanawin and Waiyamai, Kitsana},
        title = {E-Stream: Evolution-Based Technique for Stream Clustering},
        booktitle = {Advanced Data Mining and Applications: Third International Conference},
        year = {2007},
        month = {08},
        pages = {605--615},
        volume = {4632},
        doi = {10.1007/978-3-540-73871}
    }

This implementation is based on a MOA implementation of E-Stream (and other
related algorithms) by `David Ratier <https://github.com/ratierd>`_. The original
source code can be found in this `repository <https://github.com/ratierd/MOA>`_.

-------
License
-------

The ``estream`` package is licensed under the GNU General Public License v3 (GPLv3).

------------
Contributing
------------

Contributions are always welcome! Everything from code to notebooks, examples,
and documentation will be valuable to the growth of this project. To contribute,
please `fork this project <https://github.com/mickeycj/estream/issues#fork-destination-box>`_,
make your changes, and submit a pull request. I will do my best to address any
issues and merge your code into the main branch.

:Author: Chanon Jenakom
:Version: 0.0.3
:Dedicated: To DAKDL, Kasetsart University


