Access to the TensorFlow Datasets.
The Current implementation can work only with datasets that fit in a memory.
TensorFlow Datasets is a collection of datasets ready to use, with TensorFlow
or other Python ML frameworks, such as Jax. All datasets are exposed
as tf.data.Datasets.
To get started see the Guide: https://www.tensorflow.org/datasets/overview and
the list of datasets: https://www.tensorflow.org/datasets/catalog/
overview#all_datasets
TensorflowDatasets()a function that:
a sample from the dataset-specific format to the Document.
Example:
.. code-block:: python
from langchain_community.utilities import TensorflowDatasets
def mlqaen_example_to_document(example: dict) -> Document: return Document( page_content=decode_to_str(example["context"]), metadata={ "id": decode_to_str(example["id"]), "title": decode_to_str(example["title"]), "question": decode_to_str(example["question"]), "answer": decode_to_str(example["answers"]["text"][0]), }, )
tsds_client = TensorflowDatasets( dataset_name="mlqa/en", split_name="train", load_max_docs=MAX_DOCS, sample_to_document_function=mlqaen_example_to_document, )