- Inputs: A chronologically ordered dataset, a set of input fields, the number of clusters K and a split proportion
- Outputs: A dictionary containing the training and test datasets produced by the split
- Trains a K-means cluster on the input dataset (the cluster fields and K are taken from the inputs)
- Runs a batch centroid over the original dataset to label every row with one of the K clusters. This way the data is separated into K groups of similar rows
- Creates a new dataset for each cluster
- Splits each cluster dataset linearly (using the split proportion from the input parameters), keeping the latest rows for test
- Merges all the train and test portions into the final training and test datasets
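
The per-cluster split and merge steps above can be sketched in plain Python. This is only an illustration of the idea, not the WhizzML code in this package: it assumes the cluster labels have already been assigned (one per row, as the batch centroid would do), that the proportion is the share of each cluster kept for training, and the names `cluster_time_split`, `train_proportion`, `"train"` and `"test"` are hypothetical. Re-sorting the merged rows by their original position is a choice of this sketch.

```python
from collections import defaultdict


def cluster_time_split(rows, labels, train_proportion):
    """Split chronologically ordered rows per cluster, latest rows to test.

    rows: list of records, already in chronological order.
    labels: one cluster label per row (assumed to come from a clustering step).
    train_proportion: assumed to be the fraction of each cluster used for training.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not 0.0 <= train_proportion <= 1.0:
        raise ValueError("train_proportion must be between 0 and 1")

    # Group row positions by cluster; positions stay in chronological order.
    groups = defaultdict(list)
    for position, label in enumerate(labels):
        groups[label].append(position)

    train_positions, test_positions = [], []
    for members in groups.values():
        # Linear split: the earliest rows of the cluster train, the latest test.
        cut = int(len(members) * train_proportion)
        train_positions.extend(members[:cut])
        test_positions.extend(members[cut:])

    # Merge the per-cluster portions, restoring the original row order.
    train_positions.sort()
    test_positions.sort()
    return {
        "train": [rows[p] for p in train_positions],
        "test": [rows[p] for p in test_positions],
    }


# Usage: eight rows in time order, two clusters, half of each cluster for training.
example = cluster_time_split(
    rows=[0, 1, 2, 3, 4, 5, 6, 7],
    labels=["a", "b", "a", "a", "b", "a", "b", "b"],
    train_proportion=0.5,
)
```

Because the cut is made inside each cluster, every group of similar rows contributes its own most recent rows to the test set, instead of the test set being just the tail of the whole dataset.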
Requirements: The input dataset must be ordered chronologically (BigML's ordering capabilities can be applied manually beforehand if needed)
Installation: run the following command from the package's local directory (bigmler must already be installed):
bigmler whizzml --package-dir . --output-dir ~/tmp --org-project project/yourProjectId12345