This code implements the online algorithm for selecting a representative subset from streaming data, as described in this paper:
Paige, B., Sejdinovic, D., & Wood, F. (2016). Super-sampling with a Reservoir. In Proceedings of the 32nd Annual Conference on Uncertainty in Artificial Intelligence, UAI 32: 567–576.
For usage, see the example notebooks.
The code is not heavily optimized at this point. In particular, the overhead of explicit Python loops over data structures means the online algorithm can be slower than a batch algorithm on moderately sized data.
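
To illustrate the kind of loop overhead mentioned above, here is a minimal sketch that computes the same values twice: once with explicit Python loops and once with a single vectorized NumPy expression. It is not taken from this repository. The Gaussian kernel is only a stand-in workload, and the function names `rbf_row_loop` and `rbf_row_vectorized` are hypothetical.

```python
import numpy as np


def rbf_row_loop(x, points, bandwidth=1.0):
    """Gaussian kernel values k(x, p) for each p, using explicit Python loops."""
    out = []
    for p in points:
        sq_dist = 0.0
        for a, b in zip(x, p):
            sq_dist += (a - b) ** 2
        out.append(np.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return np.array(out)


def rbf_row_vectorized(x, points, bandwidth=1.0):
    """The same kernel values, computed with one vectorized NumPy expression."""
    sq_dist = np.sum((points - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


# Hypothetical data: 500 points in 3 dimensions and one new query point.
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3))
x = rng.normal(size=3)

loop_vals = rbf_row_loop(x, points)
vec_vals = rbf_row_vectorized(x, points)
```

Both functions return the same array. Timing them (for example with `timeit`) should show the looped version to be considerably slower, since each iteration pays Python interpreter overhead. An online method that touches data one item at a time tends to incur that cost at every update, whereas a batch method can often push the work into a few vectorized calls.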