cluster-chronological-split

Inputs: a chronologically ordered dataset, the set of input fields used for clustering, the number of clusters K, and a split proportion
Outputs: a dictionary containing the training and test datasets produced by the split

Trains a K-means cluster on the input dataset, using the given cluster fields and K
Runs a batch centroid over the original dataset to label each row with one of the K clusters, so the data is separated into K groups of similar rows
Creates a new dataset for each cluster
Splits each cluster dataset linearly at the given proportion, keeping the latest rows for test
Merges the per-cluster train and test portions into the final training and test datasets
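
The split and merge steps above can be sketched in plain Python. This is only an illustration of the idea, not the WhizzML code in this package: the function name, the `labels` argument (standing in for the cluster labels that the batch centroid would add) and the dictionary keys are hypothetical. It also assumes that the split proportion is the share of each cluster that goes to training, and that the merged outputs are put back in chronological order.

```python
from collections import defaultdict


def cluster_chronological_split(rows, labels, train_fraction):
    """Split chronologically ordered rows per cluster, latest rows to test.

    rows and labels are parallel lists; labels[i] is the (hypothetical)
    cluster label of rows[i]. train_fraction is assumed to be the share
    of each cluster kept for training.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")

    # Group row positions by cluster; positions stay in chronological order.
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)

    train_idx, test_idx = [], []
    for indices in groups.values():
        # Linear split: earliest rows of the cluster train, latest rows test.
        cut = int(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])

    # Merge the per-cluster portions, restoring the original ordering.
    train_idx.sort()
    test_idx.sort()
    return {
        "training": [rows[i] for i in train_idx],
        "test": [rows[i] for i in test_idx],
    }


# Ten rows in time order: the first four in cluster "a", the rest in "b".
result = cluster_chronological_split(list(range(10)), ["a"] * 4 + ["b"] * 6, 0.5)
print(result)  # {'training': [0, 1, 4, 5, 6], 'test': [2, 3, 7, 8, 9]}
```

Unlike a single linear split of the whole dataset, every cluster contributes its own latest rows to the test set, so both outputs contain rows from every group.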

Requirements: the input dataset must be ordered chronologically (BigML's ordering capabilities can be applied manually beforehand if needed)
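
A quick local check of the ordering requirement could look like the sketch below. It is not part of this package; the helper name is hypothetical, and it assumes the timestamps of the source data are available as a Python list of comparable values (for example `datetime` objects).

```python
from datetime import datetime


def is_chronologically_ordered(timestamps):
    """Return True when no timestamp is earlier than the one before it."""
    return all(earlier <= later
               for earlier, later in zip(timestamps, timestamps[1:]))


# Hypothetical sample values, parsed from ISO 8601 date strings.
dates = [datetime.fromisoformat(text)
         for text in ("2021-01-01", "2021-01-02", "2021-01-02", "2021-02-15")]
print(is_chronologically_ordered(dates))  # True
```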

Installation: run the following command from the package's local directory (bigmler must already be installed):

    bigmler whizzml --package-dir . --output-dir ~/tmp --org-project project/yourProjectId12345