Weird behaviour using LabelShuffleSplit (python, scikit-learn)
Following the scikit-learn documentation for LabelShuffleSplit, I wish to randomise my train/validation batches to ensure I'm training on all possible data (e.g. for an ensemble).
According to the docs, I should see the following (and indeed, notice that the train/validation sets are evenly split via test_size=0.5):
>>> from sklearn.cross_validation import LabelShuffleSplit
>>> labels = [1, 1, 2, 2, 3, 3, 4, 4]
>>> slo = LabelShuffleSplit(labels, n_iter=4, test_size=0.5, random_state=0)
>>> for train, test in slo:
...     print("%s %s" % (train, test))
...
[0 1 2 3] [4 5 6 7]
[2 3 6 7] [0 1 4 5]
[2 3 4 5] [0 1 6 7]
[4 5 6 7] [0 1 2 3]
But when I tried using labels = [0, 0, 0, 0, 0, 0, 0, 0], it returned:
...
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
(i.e. not evenly split - all of the data has been put into the validation set.) I understand that in this case it doesn't matter which indices are put into the train/validation sets, but I was still hoping for a 50%:50% split.
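For context, the behaviour above is consistent with label-wise splitting: LabelShuffleSplit shuffles the *unique* labels and assigns whole label groups to the test side, so with a single repeated label there is only one group to assign. The sketch below is a simplified, hypothetical re-implementation of that idea in plain Python (not the actual scikit-learn code; the ceil rounding is an assumption chosen to reproduce the output shown above):

```python
import math
import random

def label_shuffle_split(labels, n_iter=4, test_size=0.5, seed=0):
    """Simplified sketch of label-wise shuffle splitting: whole label
    groups, not individual samples, are assigned to the test side."""
    rng = random.Random(seed)
    unique = sorted(set(labels))
    # Number of *labels* (not samples) placed in the test set.
    # ceil means that with one unique label, ceil(1 * 0.5) = 1, so the
    # single group -- i.e. every sample -- lands in test and train is empty.
    n_test = math.ceil(len(unique) * test_size)
    for _ in range(n_iter):
        shuffled = unique[:]
        rng.shuffle(shuffled)
        test_labels = set(shuffled[:n_test])
        test = [i for i, lab in enumerate(labels) if lab in test_labels]
        train = [i for i, lab in enumerate(labels) if lab not in test_labels]
        yield train, test

# Four distinct labels: half the labels (and their samples) go to test.
for train, test in label_shuffle_split([1, 1, 2, 2, 3, 3, 4, 4]):
    print(train, test)

# A single repeated label: the one group falls entirely on the test side.
for train, test in label_shuffle_split([0] * 8):
    print(train, test)
```

Under this model, a 50%:50% split over *samples* with identical labels would require a sample-wise splitter (e.g. plain ShuffleSplit) rather than a label-wise one.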