langchain-azure-storage

langchain_azure_storage.document_loaders

Azure Blob Storage document loader.

AzureBlobStorageLoader

Bases: BaseLoader

Document loader that reads blobs from Azure Blob Storage into LangChain Document objects.

METHOD DESCRIPTION
__init__

Initialize AzureBlobStorageLoader.

lazy_load

Lazily load documents from Azure Blob Storage.

alazy_load

Lazily load documents from Azure Blob Storage asynchronously.

load

Load data into Document objects.

aload

Asynchronously load data into Document objects.

load_and_split

Load documents and split them into chunks. Chunks are returned as Document objects.

__init__

__init__(
    account_url: str,
    container_name: str,
    blob_names: str | Iterable[str] | None = None,
    *,
    prefix: str | None = None,
    credential: _SDK_CREDENTIAL_TYPE = None,
    loader_factory: Callable[[str], BaseLoader] | None = None,
)

Initialize AzureBlobStorageLoader.

PARAMETER DESCRIPTION
account_url

URL to the Azure Storage account, e.g. https://<account_name>.blob.core.windows.net

TYPE: str

container_name

Name of the container in the storage account to retrieve blobs from.

TYPE: str

blob_names

A single blob name or an iterable of blob names to load. If None, all blobs in the container are loaded.

TYPE: str | Iterable[str] | None DEFAULT: None

prefix

Prefix to filter blobs when listing from the container. Cannot be used with blob_names.

TYPE: str | None DEFAULT: None

credential

Credential to authenticate with the Azure Storage account. If None, DefaultAzureCredential will be used.

TYPE: _SDK_CREDENTIAL_TYPE DEFAULT: None

loader_factory

Optional callable that returns a custom document loader (e.g. UnstructuredLoader) for parsing downloaded blobs. If provided, the blob contents will be downloaded to a temporary file whose name gets passed to the callable. If None, content will be returned as a single Document with UTF-8 text.

TYPE: Callable[[str], BaseLoader] | None DEFAULT: None
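The parameters above can be combined as in the following minimal sketch. The helpers `account_url_for` and `make_loader` are hypothetical names introduced here for illustration and are not part of the library; the early `ValueError` is this sketch's own guard for the documented rule that prefix cannot be used with blob_names, not necessarily the error the library itself raises.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


def account_url_for(account_name: str) -> str:
    """Build an account URL in the form shown for the account_url parameter."""
    return f"https://{account_name}.blob.core.windows.net"


def make_loader(
    account_name: str,
    container_name: str,
    blob_names: str | Iterable[str] | None = None,
    *,
    prefix: str | None = None,
    loader_factory: Callable[[str], object] | None = None,
):
    """Create an AzureBlobStorageLoader (hypothetical convenience wrapper)."""
    # prefix cannot be used with blob_names, so fail early with a clear message.
    if blob_names is not None and prefix is not None:
        raise ValueError("Pass either blob_names or prefix, not both.")

    # Imported lazily so account_url_for works without the package installed.
    from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

    # credential is omitted, so DefaultAzureCredential is used.
    return AzureBlobStorageLoader(
        account_url_for(account_name),
        container_name,
        blob_names,
        prefix=prefix,
        loader_factory=loader_factory,
    )
```

For example, `make_loader("myaccount", "reports", prefix="2024/")` would target every blob whose name starts with `2024/`, while `make_loader("myaccount", "reports", "summary.txt")` would target a single blob. The account and container names here are placeholders.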

lazy_load

lazy_load() -> Iterator[Document]

Lazily load documents from Azure Blob Storage.

YIELDS DESCRIPTION
Document

The Document objects.
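Because lazy_load returns an iterator, a caller can stop early without pulling the remaining documents. The sketch below shows one way to do that; `first_documents` is a hypothetical helper, and it accepts any object exposing a `lazy_load()` method, so it makes no assumption about how the loader was constructed.

```python
from itertools import islice


def first_documents(loader, limit: int) -> list:
    """Return at most `limit` documents without consuming the rest."""
    # islice stops requesting items from the iterator once `limit` is reached.
    return list(islice(loader.lazy_load(), limit))
```

This can be useful for previewing a large container, since only the first `limit` documents are requested from the iterator.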

alazy_load async

alazy_load() -> AsyncIterator[Document]

Lazily load documents from Azure Blob Storage asynchronously.

YIELDS DESCRIPTION
Document

The Document objects.
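The async iterator returned by alazy_load is consumed with `async for`. The following is a minimal sketch, assuming only that each yielded Document has a `page_content` string; `total_characters` is a hypothetical helper and not part of the library.

```python
import asyncio


async def total_characters(loader) -> int:
    """Sum the length of page_content over all documents, one at a time."""
    total = 0
    # Documents are processed as they arrive instead of being collected first.
    async for document in loader.alazy_load():
        total += len(document.page_content)
    return total
```

From synchronous code, this could be run with `asyncio.run(total_characters(loader))`.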

load

load() -> list[Document]

Load data into Document objects.

RETURNS DESCRIPTION
list[Document]

The documents.

aload async

aload() -> list[Document]

Asynchronously load data into Document objects.

RETURNS DESCRIPTION
list[Document]

The documents.

load_and_split

load_and_split(text_splitter: TextSplitter | None = None) -> list[Document]

Load documents and split them into chunks. Chunks are returned as Document objects.

Danger

Do not override this method. It should be considered deprecated.

PARAMETER DESCRIPTION
text_splitter

TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

TYPE: TextSplitter | None DEFAULT: None

RAISES DESCRIPTION
ImportError

If langchain-text-splitters is not installed and no text_splitter is provided.

RETURNS DESCRIPTION
list[Document]

List of Document objects.
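Since this method should be considered deprecated, one possible alternative is to call load() and pass the result to a text splitter directly. The sketch below mirrors the documented behaviour (including the RecursiveCharacterTextSplitter default) but is not taken from the library's implementation; `load_then_split` is a hypothetical helper name.

```python
def load_then_split(loader, text_splitter=None) -> list:
    """Load all documents, then split them into chunks with a text splitter."""
    if text_splitter is None:
        # Imported lazily; raises ImportError if langchain-text-splitters
        # is not installed and no splitter was supplied.
        from langchain_text_splitters import RecursiveCharacterTextSplitter

        text_splitter = RecursiveCharacterTextSplitter()
    # split_documents takes the loaded documents and returns the chunks.
    return text_splitter.split_documents(loader.load())
```

Passing the splitter explicitly keeps chunking settings visible at the call site, and any object with a `split_documents` method works.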