# mediapipe_model_maker.image_classifier.Dataset

[View source on GitHub](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/vision/image_classifier/dataset.py#L28-L91)

Dataset library for image classifier.

Inherits From: [`ClassificationDataset`](../../mediapipe_model_maker/face_stylizer/dataset/classification_dataset/ClassificationDataset), [`Dataset`](../../mediapipe_model_maker/model_util/dataset/Dataset)

    mediapipe_model_maker.image_classifier.Dataset(
        dataset: tf.data.Dataset,
        label_names: List[str],
        size: Optional[int] = None
    )

| Args ||
|---|---|
| `tf_dataset` | A tf.data.Dataset object that contains a potentially large set of elements, where each element is an `(input_data, target)` pair. `input_data` is the raw input, such as an image or a piece of text, and `target` is the ground truth for that input, e.g. the classification label of the image. |
| `size` | The size of the dataset. tf.data.Dataset doesn't provide a way to get the length directly, since it is lazily loaded and may be infinite. |
| Attributes ||
|---|---|
| `label_names` | |
| `num_classes` | |
| `size` | Returns the size of the dataset. Same functionality as calling `__len__`; see the `__len__` method definition for more information. |

If `size` is not set, this property falls back to calling `__len__` on the tf.data.Dataset in `self._dataset`. Calling `__len__` on a tf.data.Dataset instance may raise a TypeError, because the dataset may be lazily loaded with an unknown size or may be infinite.

In most cases, however, when an instance of this class is created by a helper function such as `from_folder`, the size of the dataset is computed during preprocessing and the `_size` instance variable is already set.

| Raises ||
|---|---|
| `TypeError` | If `self._size` is not set and the cardinality of `self._dataset` is INFINITE_CARDINALITY or UNKNOWN_CARDINALITY. |
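The cardinality behavior described above can be illustrated with plain tf.data (a hedged sketch, assuming only TensorFlow is installed):

```python
import tensorflow as tf

# A dataset with known, finite cardinality: len() works.
finite = tf.data.Dataset.from_tensor_slices([1, 2, 3])
print(len(finite))  # 3

# Repeating forever gives INFINITE_CARDINALITY, so len() raises TypeError.
infinite = finite.repeat()
print(tf.data.experimental.cardinality(infinite))
try:
    len(infinite)
except TypeError as err:
    print('len() failed:', err)
```

This is why the wrapper stores `_size` up front when it can, rather than relying on the underlying dataset's length.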
Methods
-------

### `from_folder`

[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/vision/image_classifier/dataset.py#L31-L91)

    @classmethod
    from_folder(
        dirname: str, shuffle: bool = True
    ) -> ClassificationDataset

Loads images and labels from the given directory.

Assumes that image data with the same label are in the same subdirectory.

| Args ||
|---|---|
| `dirname` | Name of the directory containing the data files. |
| `shuffle` | Boolean; if true, the data are randomly shuffled. |

| Returns ||
|---|---|
| Dataset containing the images, labels, and other related info. ||

| Raises ||
|---|---|
| `ValueError` | If the input data directory is empty. |

### `gen_tf_dataset`

[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/core/data/dataset.py#L68-L116)

    gen_tf_dataset(
        batch_size: int = 1,
        is_training: bool = False,
        shuffle: bool = False,
        preprocess: Optional[Callable[..., Any]] = None,
        drop_remainder: bool = False
    ) -> tf.data.Dataset

Generates a batched tf.data.Dataset for training/evaluation.

| Args ||
|---|---|
| `batch_size` | An integer; the returned dataset is batched by this size. |
| `is_training` | A boolean; when True, the returned dataset is optionally shuffled and repeated as an endless dataset. |
| `shuffle` | A boolean; when True, the returned dataset is shuffled to create randomness during model training. |
| `preprocess` | A function taking three arguments in order: feature, label, and a boolean is_training. |
| `drop_remainder` | Boolean; whether the final batch drops any remainder smaller than `batch_size`. |

| Returns ||
|---|---|
| A TF dataset ready to be consumed by a Keras model. ||

### `split`

[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/core/data/classification_dataset.py#L43-L56)

    split(
        fraction: float
    ) -> Tuple[ds._DatasetT, ds._DatasetT]

Splits the dataset into two sub-datasets with the given fraction.

Primarily used for splitting the dataset into training and testing sets.

| Args ||
|---|---|
| `fraction` | A float; the fraction of the original data that goes into the first returned sub-dataset. |

| Returns ||
|---|---|
| The two split sub-datasets. ||

### `__len__`

[View source](https://github.com/google/mediapipe/blob/master/mediapipe/model_maker/python/core/data/dataset.py#L118-L137)

    __len__() -> int

Returns the number of elements in the dataset.

If `size` is not set, this method falls back to calling `__len__` on the tf.data.Dataset in `self._dataset`. Calling `__len__` on a tf.data.Dataset instance may raise a TypeError, because the dataset may be lazily loaded with an unknown size or may be infinite.

In most cases, however, when an instance of this class is created by a helper function such as `from_folder`, the size of the dataset is computed during preprocessing and the `_size` instance variable is already set.

| Raises ||
|---|---|
| `TypeError` | If `self._size` is not set and the cardinality of `self._dataset` is INFINITE_CARDINALITY or UNKNOWN_CARDINALITY. |

Last updated 2024-05-07 UTC.