| Name | Type | Description |
|---|---|---|
| file_path* | Union[str, PathLike] | The path to the JSON or JSON Lines file. |
| jq_schema* | str | The jq schema to use to extract the data or text from the JSON. |
| content_key | str | Default: None |
| is_content_key_jq_parsable | bool | Default: False |
| metadata_func | Callable[[Dict, Dict], Dict] | Default: None |
| text_content | bool | Default: True |
| json_lines | bool | Default: False |

Load a JSON file using a jq schema.

Setup:

.. code-block:: bash

    pip install -U jq

Instantiate:

.. code-block:: python

    from langchain_community.document_loaders import JSONLoader
    import json
    from pathlib import Path

    file_path = './sample_quiz.json'
    data = json.loads(Path(file_path).read_text())
    loader = JSONLoader(
        file_path=file_path,
        jq_schema='.quiz',
        text_content=False,
    )

Load:

.. code-block:: python

    docs = loader.load()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    {"sport": {"q1": {"question": "Which one is correct team name in NBA?", "options": ["New York Bulls"
    {'source': '/sample_quiz.json', 'seq_num': 1}

Async load:

.. code-block:: python

    docs = await loader.aload()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    {"sport": {"q1": {"question": "Which one is correct team name in NBA?", "options": ["New York Bulls"
    {'source': '/sample_quiz.json', 'seq_num': 1}

Lazy load:

.. code-block:: python

    docs = []
    docs_lazy = loader.lazy_load()

    # async variant:
    # async for doc in loader.alazy_load():
    #     docs.append(doc)

    for doc in docs_lazy:
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    {"sport": {"q1": {"question": "Which one is correct team name in NBA?", "options": ["New York Bulls"
    {'source': '/sample_quiz.json', 'seq_num': 1}

content_key: The key used to extract the content from the JSON when the jq_schema results in a list of objects (dicts). If is_content_key_jq_parsable is True, this must be a jq-compatible schema. If is_content_key_jq_parsable is False, this should be a simple string key.
is_content_key_jq_parsable: A flag that determines whether content_key is parsable by jq. If True, content_key is treated as a jq schema and compiled accordingly. If False, or if content_key is None, content_key is used as a simple string. Defaults to False.
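
The two ways of reading content_key can be sketched as follows. The sample file, its field names (messages, sender, body.text) and the helper load_texts are invented for this illustration. The loader calls assume langchain_community and jq are installed and are skipped otherwise.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data, invented for this illustration.
sample = {
    "messages": [
        {"sender": "alice", "body": {"text": "Hello"}},
        {"sender": "bob", "body": {"text": "Hi there"}},
    ]
}
path = Path(tempfile.mkdtemp()) / "chat.json"
path.write_text(json.dumps(sample), encoding="utf-8")


def load_texts(**loader_kwargs):
    """Build a JSONLoader over the sample file and return each page_content."""
    # Imported lazily so the rest of the sketch runs without the packages.
    from langchain_community.document_loaders import JSONLoader

    loader = JSONLoader(file_path=path, jq_schema=".messages[]", **loader_kwargs)
    return [doc.page_content for doc in loader.load()]


try:
    # Plain string key: looked up directly on each extracted object.
    by_key = load_texts(content_key="sender")
    # jq expression: compiled and applied to each extracted object.
    by_jq = load_texts(content_key=".body.text", is_content_key_jq_parsable=True)
except ImportError:
    by_key = by_jq = None  # langchain_community or jq is not installed

print(by_key)
print(by_jq)
```

A plain key is enough for top-level fields. The jq form is useful when the text sits in a nested object, as with body.text in this sample.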
metadata_func: A function that takes the JSON object extracted by the jq_schema and the default metadata, and returns a dict of the updated metadata.
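
A function of this shape might look like the sketch below. The name add_sender and the record fields are hypothetical. The default metadata is a stand-in that mirrors the source and seq_num keys from the example output, and the function is called directly here rather than through the loader.

```python
from typing import Any, Dict


def add_sender(record: Dict[str, Any], metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical metadata_func: copy the record's sender into the metadata."""
    updated = dict(metadata)  # copy so the caller's dict is left untouched
    updated["sender"] = record.get("sender")
    return updated


# Stand-ins for what the loader would pass in: one extracted record and the
# default metadata for that record.
record = {"sender": "alice", "text": "Hello"}
default_metadata = {"source": "/sample_chat.json", "seq_num": 1}

metadata = add_sender(record, default_metadata)
print(metadata)

# With the loader, the function would be passed along these lines:
# JSONLoader(file_path=..., jq_schema=".messages[]", content_key="text",
#            metadata_func=add_sender)
```

Returning a new dict rather than mutating the argument is a design choice of this sketch, not a requirement stated by the loader.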
text_content: Boolean flag indicating whether the extracted content is in string format. Defaults to True.
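
As in the Instantiate example, text_content=False is the setting to use when the jq_schema yields objects rather than strings. The page_content then holds JSON text, as the example output shows. A minimal sketch with an invented sample file follows. The loader call assumes langchain_community and jq are installed and is skipped otherwise.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample data, invented for this illustration.
sample = {"items": [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]}
path = Path(tempfile.mkdtemp()) / "items.json"
path.write_text(json.dumps(sample), encoding="utf-8")

try:
    from langchain_community.document_loaders import JSONLoader

    # Each extracted item is a dict, so the content is not a plain string.
    loader = JSONLoader(file_path=path, jq_schema=".items[]", text_content=False)
    pages = [doc.page_content for doc in loader.load()]
except ImportError:
    pages = None  # langchain_community or jq is not installed

if pages is not None:
    # Each page_content is a string that parses back into the original item.
    print([json.loads(page) for page in pages])
```

To my understanding, leaving text_content at True with non-string content like this makes the loader raise an error instead of serializing it, so treat the flag as a statement about what the schema is expected to produce.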
json_lines: Boolean flag indicating whether the input is in JSON Lines format. Defaults to False.
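
For a JSON Lines input, each line of the file is a separate JSON document. As I understand it, the jq_schema is then applied to each line on its own. The sketch below uses an invented three-line file. The loader call assumes langchain_community and jq are installed and is skipped otherwise.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records, written one JSON document per line.
records = [{"text": "first"}, {"text": "second"}, {"text": "third"}]
path = Path(tempfile.mkdtemp()) / "sample.jsonl"
path.write_text(
    "\n".join(json.dumps(record) for record in records) + "\n",
    encoding="utf-8",
)

try:
    from langchain_community.document_loaders import JSONLoader

    # ".text" is evaluated against each line's object, not the whole file.
    loader = JSONLoader(file_path=path, jq_schema=".text", json_lines=True)
    texts = [doc.page_content for doc in loader.load()]
except ImportError:
    texts = None  # langchain_community or jq is not installed

print(texts)
```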