Class●Since v0.3

MWDumpLoader

MWDumpLoader(
  self,
  file_path: Union[str, Path],
  encoding: Optional[str

Bases

BaseLoader

Inherited from BaseLoader (langchain_core)

Methods

load, aload, load_and_split, alazy_load

Name	Type
file_path	Union[str, Path]
encoding	Optional[str]
namespaces	Optional[Sequence[int]]
skip_redirects	Optional[bool]
stop_on_error	Optional[bool]

Load MediaWiki dump from an XML file.

Example:

.. code-block:: python

    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import MWDumpLoader

    # Load every page of the dump into Document objects
    loader = MWDumpLoader(
        file_path="myWiki.xml",
        encoding="utf8",
    )
    docs = loader.load()

    # Split the loaded pages into chunks of at most 1000 characters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=0
    )
    texts = text_splitter.split_documents(docs)

:param file_path: XML local file path
:type file_path: str
:param encoding: Charset encoding, defaults to "utf8"
:type encoding: str, optional
:param namespaces: The namespaces of the pages you want to parse. See https://www.mediawiki.org/wiki/Help:Namespaces#Localisation for a list of all common namespaces
:type namespaces: List[int], optional
:param skip_redirects: True to skip pages that redirect to other pages, False to keep them. False by default
:type skip_redirects: bool, optional
:param stop_on_error: False to skip over pages that cause parsing errors, True to stop. True by default
:type stop_on_error: bool, optional
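
To make the three filtering parameters concrete, here is a minimal standard-library sketch of the behaviour described above. It is not how MWDumpLoader is implemented. The helper ``filter_pages`` and the simplified ``SAMPLE_DUMP`` are hypothetical, and a real MediaWiki export also carries an XML namespace and a ``<siteinfo>`` header, which are omitted here.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Sequence, Tuple

# Hypothetical, heavily simplified stand-in for a MediaWiki XML export.
SAMPLE_DUMP = """<mediawiki>
  <page><title>Python</title><ns>0</ns>
    <revision><text>Python is a language.</text></revision></page>
  <page><title>Py</title><ns>0</ns><redirect title="Python"/>
    <revision><text>#REDIRECT [[Python]]</text></revision></page>
  <page><title>Talk:Python</title><ns>1</ns>
    <revision><text>Discussion.</text></revision></page>
  <page><title>Broken</title><ns>oops</ns>
    <revision><text>Bad namespace value.</text></revision></page>
</mediawiki>"""


def filter_pages(
    xml_text: str,
    namespaces: Optional[Sequence[int]] = None,
    skip_redirects: bool = False,
    stop_on_error: bool = True,
) -> List[Tuple[str, str]]:
    """Return (title, text) pairs, mimicking the documented parameter semantics.

    Illustrative helper only; the names mirror the loader's parameters,
    but the logic is an assumption based on the parameter descriptions.
    """
    results: List[Tuple[str, str]] = []
    for page in ET.fromstring(xml_text).iter("page"):
        try:
            # A malformed <ns> value stands in for "a page that fails to parse".
            ns = int(page.findtext("ns"))
            if namespaces is not None and ns not in namespaces:
                continue  # page is outside the requested namespaces
            if skip_redirects and page.find("redirect") is not None:
                continue  # page only redirects to another page
            results.append(
                (page.findtext("title"), page.findtext("revision/text"))
            )
        except (TypeError, ValueError):
            if stop_on_error:
                raise  # stop_on_error=True: abort on the first bad page
            # stop_on_error=False: silently skip the bad page
    return results


# Keep only main-namespace articles, drop redirects, tolerate bad pages.
pages = filter_pages(
    SAMPLE_DUMP, namespaces=[0], skip_redirects=True, stop_on_error=False
)
titles = [title for title, _ in pages]
```

In this sketch, ``titles`` contains only ``"Python"``: the redirect, the talk page (namespace 1), and the malformed page are all filtered out. Calling ``filter_pages(SAMPLE_DUMP)`` with the defaults raises on the malformed page instead, which corresponds to the documented ``stop_on_error=True`` default.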
