Code segmenter for C.
Code segmenter for COBOL.
Code segmenter for C++.
Code segmenter for C#.
Code segmenter for Elixir.
Code segmenter for Go.
Code segmenter for Java.
Code segmenter for JavaScript.
Code segmenter for Kotlin.
Code segmenter for Lua.
Code segmenter for Perl.
Code segmenter for PHP.
Code segmenter for Python.
Code segmenter for Ruby.
Code segmenter for Rust.
Code segmenter for Scala.
Code segmenter for SQL. This class uses Tree-sitter to segment SQL code into its constituent statements (e.g., SELECT, CREATE TABLE). It also provides functionality to extract these statements and simplify the code into commented descriptions.
Code segmenter for TypeScript.
Parse using the respective programming language syntax.
Each top-level function and class in the code is loaded into its own document. An additional document holds the remaining top-level code, with the already segmented functions and classes excluded.
This approach can potentially improve the accuracy of QA models over source code.
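The splitting described above can be sketched with Python's standard ``ast`` module. This is a minimal illustration for Python source only, not LangChain's implementation; the helper name ``split_top_level`` and the shape of the returned dictionary are assumptions made for this example.

```python
import ast


def split_top_level(source: str) -> dict:
    """Split Python source into top-level definitions and the remaining code.

    Hypothetical helper for illustration; not part of LangChain.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    segments = []
    covered = set()  # 0-based indices of lines that belong to a definition

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above node.lineno, so include them in the segment.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
            end = node.end_lineno  # 1-based inclusive, so usable as a slice end
            segments.append("\n".join(lines[start:end]))
            covered.update(range(start, end))

    # Everything outside the segmented definitions forms the extra document.
    remainder = "\n".join(
        line for i, line in enumerate(lines) if i not in covered
    )
    return {"segments": segments, "remainder": remainder}


example = '''import os

def greet(name):
    return f"Hello, {name}"

class Counter:
    def __init__(self):
        self.n = 0

print(greet("world"))
'''

result = split_top_level(example)
```

Here ``result["segments"]` holds one string per top-level function or class, and ``result["remainder"]`` holds the import and the final ``print`` call, mirroring the "one document per definition plus one for the rest" layout.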
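The supported languages for code parsing are C, COBOL, C++, C#, Elixir, Go, Java, JavaScript, Kotlin, Lua, Perl, PHP, Python, Ruby, Rust, Scala, SQL, and TypeScript.
JavaScript parsing requires the package ``esprima``. Languages parsed with Tree-sitter require the packages ``tree_sitter`` and
``tree_sitter_languages``. It is straightforward to add support for additional
languages using ``tree_sitter``, although this currently requires modifying LangChain.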
Both the parsing language (``language``) and the minimum number of lines a file must have before syntax-based splitting is applied (``parser_threshold``) can be configured.
If a language is not explicitly specified, LanguageParser will infer one from
filename extensions, if present.
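The two settings can be pictured with a small sketch. The mapping ``EXTENSION_TO_LANGUAGE``, the helpers ``infer_language`` and ``should_segment``, the identifiers other than ``"python"``, and the exact comparison used for the threshold are all assumptions for illustration; they are not LangChain's actual tables or logic.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, deliberately incomplete mapping for illustration only.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "js",
    ".go": "go",
}


def infer_language(filename: str) -> Optional[str]:
    """Return a language identifier for a filename, or None if unknown."""
    return EXTENSION_TO_LANGUAGE.get(Path(filename).suffix.lower())


def should_segment(source: str, language: Optional[str], threshold: int = 0) -> bool:
    """Segment only when the language is known and the file is long enough."""
    if language is None:
        return False
    # Assumed rule: files shorter than the threshold are kept whole.
    return len(source.splitlines()) >= threshold
```

For instance, ``infer_language("app/main.py")`` yields ``"python"`` under this mapping, while a file with no extension yields ``None`` and would be left unsegmented.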
Examples:

.. code-block:: python

    from langchain_community.document_loaders.generic import GenericLoader
    from langchain_community.document_loaders.parsers import LanguageParser

    loader = GenericLoader.from_filesystem(
        "./code",
        glob="**/*",
        suffixes=[".py", ".js"],
        parser=LanguageParser()
    )
    docs = loader.load()

Example instantiation that selects the language manually:

.. code-block:: python

    loader = GenericLoader.from_filesystem(
        "./code",
        glob="**/*",
        suffixes=[".py"],
        parser=LanguageParser(language="python")
    )

Example instantiation that sets the line-count threshold:

.. code-block:: python

    loader = GenericLoader.from_filesystem(
        "./code",
        glob="**/*",
        suffixes=[".py"],
        parser=LanguageParser(parser_threshold=200)
    )