Load a MediaWiki dump from an XML file.

Example:

.. code-block:: python

    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import MWDumpLoader

    loader = MWDumpLoader(
        file_path="myWiki.xml",
        encoding="utf8"
    )
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=0
    )
    texts = text_splitter.split_documents(docs)

:param file_path: Path to the local XML file
:type file_path: str
:param encoding: Charset encoding, defaults to "utf8"
:type encoding: str, optional
:param namespaces: The namespaces of the pages you want to parse.
    See https://www.mediawiki.org/wiki/Help:Namespaces#Localisation for a list
    of all common namespaces
:type namespaces: List[int], optional
:param skip_redirects: True to skip pages that redirect to other pages,
    False to keep them. False by default
:type skip_redirects: bool, optional
:param stop_on_error: False to skip over pages that cause parsing errors,
    True to stop. True by default
:type stop_on_error: bool, optional
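
To illustrate which pages the ``namespaces`` and ``skip_redirects`` options
select, here is a minimal sketch that uses only the standard library's
``xml.etree.ElementTree`` on a small hand-written dump. ``filter_pages`` and
``SAMPLE_DUMP`` are hypothetical names, and this is not how ``MWDumpLoader``
is implemented. The export schema version (0.10) and treating
``namespaces=None`` as "keep every namespace" are assumptions of the sketch.

.. code-block:: python

    import xml.etree.ElementTree as ET
    from typing import Dict, List, Optional

    # Hypothetical, hand-written MediaWiki export (schema 0.10 assumed).
    SAMPLE_DUMP = (
        '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        '<page><title>Main Page</title><ns>0</ns>'
        '<revision><text>Welcome to the wiki.</text></revision></page>'
        '<page><title>Home</title><ns>0</ns><redirect title="Main Page" />'
        '<revision><text>#REDIRECT [[Main Page]]</text></revision></page>'
        '<page><title>Talk:Main Page</title><ns>1</ns>'
        '<revision><text>Discussion here.</text></revision></page>'
        '</mediawiki>'
    )

    # Map a prefix to the export's default XML namespace for find/findall.
    NS = {'mw': 'http://www.mediawiki.org/xml/export-0.10/'}


    def filter_pages(
        xml_text: str,
        namespaces: Optional[List[int]] = None,
        skip_redirects: bool = False,
    ) -> List[Dict[str, str]]:
        """Hypothetical helper mimicking the namespaces/skip_redirects options."""
        root = ET.fromstring(xml_text)
        pages = []
        for page in root.findall('mw:page', NS):
            ns = int(page.findtext('mw:ns', default='0', namespaces=NS))
            # Assumption: None means "keep pages from every namespace".
            if namespaces is not None and ns not in namespaces:
                continue
            # A <redirect> child element marks the page as a redirect.
            if skip_redirects and page.find('mw:redirect', NS) is not None:
                continue
            pages.append({
                'title': page.findtext('mw:title', default='', namespaces=NS),
                'text': page.findtext(
                    'mw:revision/mw:text', default='', namespaces=NS
                ),
            })
        return pages


    # Keep main-namespace (ns 0) articles only and drop redirects.
    articles = filter_pages(SAMPLE_DUMP, namespaces=[0], skip_redirects=True)
    print([p['title'] for p in articles])  # ['Main Page']

With ``skip_redirects=False`` the ``Home`` redirect would also be returned,
and passing ``namespaces=[1]`` would select only the talk page.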