A generic dataset class for loading model training and evaluation datasets.
mediapipe_model_maker.model_util.dataset.Dataset(
tf_dataset: tf.data.Dataset, size: Optional[int] = None
)
For each ML task, such as image classification or text classification, a subclass can be derived from this class to provide task-specific data loading utilities.
Attributes | |
---|---|
size | Returns the size of the dataset. Same functionality as calling len. See the __len__ method definition for more information. |
Methods
gen_tf_dataset
gen_tf_dataset(
batch_size: int = 1,
is_training: bool = False,
shuffle: bool = False,
preprocess: Optional[Callable[..., Any]] = None,
drop_remainder: bool = False
) -> tf.data.Dataset
Generates a batched tf.data.Dataset for training/evaluation.
Args | |
---|---|
batch_size | An integer; the returned dataset is batched by this size. |
is_training | A boolean; when True, the returned dataset is optionally shuffled and repeated as an endless dataset. |
shuffle | A boolean; when True, the returned dataset is shuffled to add randomness during model training. |
preprocess | A function taking three arguments, in order: feature, label, and the boolean is_training. |
drop_remainder | A boolean; whether to drop the final batch when it has fewer than batch_size elements. |
Returns | |
---|---|
A TF dataset ready to be consumed by a Keras model. |
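The preprocess and drop_remainder arguments can be illustrated with a minimal plain-Python sketch. It uses lists in place of tf.data.Dataset and is not the library's implementation. The names gen_batches and double_feature are hypothetical, and shuffling and repeating are left out.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Example = Tuple[Any, Any]  # (feature, label)


def gen_batches(
    examples: Sequence[Example],
    batch_size: int = 1,
    is_training: bool = False,
    preprocess: Optional[Callable[..., Any]] = None,
    drop_remainder: bool = False,
) -> List[List[Example]]:
    """Plain-Python stand-in for the documented preprocessing and batching."""
    items = list(examples)
    if preprocess is not None:
        # preprocess receives feature, label and is_training, in that order.
        items = [preprocess(feature, label, is_training) for feature, label in items]
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # Discard a final batch that holds fewer than batch_size examples.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


def double_feature(feature, label, is_training):
    # Hypothetical preprocess function: doubles the feature, keeps the label.
    return feature * 2, label


data = [(i, i % 2) for i in range(5)]
all_batches = gen_batches(data, batch_size=2, preprocess=double_feature)
full_batches = gen_batches(
    data, batch_size=2, preprocess=double_feature, drop_remainder=True
)
print(all_batches)   # [[(0, 0), (2, 1)], [(4, 0), (6, 1)], [(8, 0)]]
print(full_batches)  # [[(0, 0), (2, 1)], [(4, 0), (6, 1)]]
```

With five examples and a batch size of 2, the last batch holds a single example, which is the one that drop_remainder=True discards.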
split
split(
fraction: float
) -> Tuple[_DatasetT, _DatasetT]
Splits dataset into two sub-datasets with the given fraction.
Primarily used for splitting the data set into training and testing sets.
Args | |
---|---|
fraction | A float that defines the fraction of the original data placed in the first returned sub-dataset. |
Returns | |
---|---|
The two sub-datasets produced by the split. |
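As a rough illustration of how fraction determines the two sizes, the sketch below splits a plain list. It assumes the first size is rounded down to a whole number of elements and that fraction lies strictly between 0 and 1. Neither detail is stated on this page, and the function is a hypothetical stand-in rather than the library's code.

```python
from typing import List, Sequence, Tuple


def split(items: Sequence[int], fraction: float) -> Tuple[List[int], List[int]]:
    """Splits items so the first part holds roughly `fraction` of them."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    # Assumption: the first size is rounded down to a whole number of elements.
    first_size = int(len(items) * fraction)
    return list(items[:first_size]), list(items[first_size:])


train, test = split(list(range(10)), fraction=0.8)
print(len(train), len(test))  # 8 2
```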
__len__
__len__() -> int
Returns the number of elements in the dataset.
If size is not set, this method falls back to the __len__ method of the tf.data.Dataset in self._dataset. Calling len on a tf.data.Dataset instance may raise a TypeError because the dataset may be lazy-loaded with an unknown size or have infinite size.
In most cases, however, when an instance of this class is created by a helper function such as 'from_folder', the size of the dataset is computed during preprocessing and the _size instance variable is already set.
Raises | |
---|---|
TypeError if self._size is not set and the cardinality of self._dataset is INFINITE_CARDINALITY or UNKNOWN_CARDINALITY. |
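The fallback can be sketched in plain Python, with a generator standing in for a lazily loaded dataset of unknown size. The class SizedDataset is a hypothetical stand-in, not the library's class. It only mirrors the documented order of checks: use the stored size when present, otherwise ask the wrapped object for its length.

```python
from typing import Iterable, Optional


class SizedDataset:
    """Hypothetical stand-in that mirrors the documented __len__ fallback."""

    def __init__(self, dataset: Iterable[int], size: Optional[int] = None):
        self._dataset = dataset
        self._size = size

    def __len__(self) -> int:
        if self._size is not None:
            return self._size
        # Falls back to the wrapped object; raises TypeError if it has no length.
        return len(self._dataset)


known = SizedDataset(iter(range(5)), size=5)
print(len(known))  # 5

lazy = SizedDataset(x for x in range(5))  # a generator has no len()
try:
    len(lazy)
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```

Passing size explicitly at construction time avoids the error for data sources that cannot report their own length.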