Load a CSV file into a list of Document objects.
Each document represents one row of the CSV file. Each column of the row is converted into a key/value pair, and each pair is written on its own line of the document's page_content.
By default, the source of each document loaded from the CSV file is set to
the value of the file_path argument.
You can override this by setting the source_column argument to the
name of a column in the CSV file.
The source of each document will then be set to the value of the column
with the name specified in source_column.
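The row-to-document mapping and the source_column override described above can be pictured with a minimal standard-library sketch. It is not the loader's actual implementation: the helper name rows_to_documents, the plain dict used in place of a Document object, the "key: value" line format, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def rows_to_documents(csv_text, file_path, source_column=None):
    """Hypothetical helper that mimics the behavior described above."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One "key: value" line per column (the exact separator is an assumption).
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        # The default source is the file path; source_column overrides it per row.
        source = file_path if source_column is None else row[source_column]
        documents.append({"page_content": page_content, "metadata": {"source": source}})
    return documents


# Made-up sample data, used only for illustration.
sample_csv = "name,city\nAda,London\nGrace,New York\n"

docs_default = rows_to_documents(sample_csv, "people.csv")
docs_by_name = rows_to_documents(sample_csv, "people.csv", source_column="name")

print(docs_default[0]["page_content"])
print(docs_default[0]["metadata"]["source"])
print(docs_by_name[1]["metadata"]["source"])
```

In this sketch every document shares the file path as its source unless source_column is given, in which case each document takes its source from its own row.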
Load CSV files using Unstructured.
Like other Unstructured loaders, UnstructuredCSVLoader can be used in both "single" and "elements" mode. In "elements" mode, the CSV file is returned as a single Unstructured Table element, and an HTML representation of the table is available under the "text_as_html" key in the document metadata.
from langchain_community.document_loaders.csv_loader import UnstructuredCSVLoader
loader = UnstructuredCSVLoader("stanley-cups.csv", mode="elements")
docs = loader.load()
Load files using Unstructured.
The file loader uses the unstructured partition function and automatically detects the file type. You can run the loader in different modes: "single", "elements", and "paged". The default "single" mode returns a single LangChain Document object. In "elements" mode, the unstructured library splits the document into elements such as Title and NarrativeText and returns them as individual LangChain Document objects.
In addition to these post-processing modes (which are specific to the LangChain loaders), Unstructured has its own "chunking" parameters for post-processing elements into more useful chunks for use cases such as Retrieval Augmented Generation (RAG).
You can pass additional unstructured kwargs to configure other unstructured settings.
from langchain_community.document_loaders import UnstructuredFileLoader
loader = UnstructuredFileLoader(
    "example.pdf",
    mode="elements",
    strategy="fast",
)
docs = loader.load()
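The difference between the "single" and "elements" modes can be illustrated with a small stand-in sketch that needs neither LangChain nor unstructured. The Element and Document classes, the to_documents helper, the blank-line separator used when joining, and the "category" metadata key are hypothetical simplifications, not the library's API; "paged" mode is left out because its behavior is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned element."""
    category: str
    text: str


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_documents(elements, source, mode="single"):
    """Turn partitioned elements into documents according to the mode."""
    if mode == "elements":
        # One document per element, keeping the element type in the metadata.
        return [
            Document(el.text, {"source": source, "category": el.category})
            for el in elements
        ]
    if mode == "single":
        # All element text merged into one document (separator is an assumption).
        joined = "\n\n".join(el.text for el in elements)
        return [Document(joined, {"source": source})]
    raise ValueError(f"unsupported mode in this sketch: {mode!r}")


# Made-up elements, used only for illustration.
elements = [
    Element("Title", "Quarterly Report"),
    Element("NarrativeText", "Revenue grew in the third quarter."),
]

single_docs = to_documents(elements, "example.pdf", mode="single")
element_docs = to_documents(elements, "example.pdf", mode="elements")

print(len(single_docs), len(element_docs))
print(element_docs[0].metadata["category"])
```

A single merged document suits whole-file processing, while per-element documents keep the element type available for filtering or for later chunking.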
https://docs.unstructured.io/open-source/core-functionality/partitioning
https://docs.unstructured.io/open-source/core-functionality/chunking