# LanguageParser

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/document_loaders/parsers/language/language_parser/LanguageParser)

Parse using the respective programming language syntax.

Each top-level function and class in the code is loaded into its own document.
An additional document holds the remaining top-level code, with the already
segmented functions and classes excluded.

This approach can potentially improve the accuracy of QA models over source code.
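The splitting described above can be illustrated with a small standalone sketch. It uses only the standard-library `ast` module and handles Python source only. The helper name `split_top_level` is hypothetical, and the sketch is not presented as the implementation `LanguageParser` uses; it only shows the idea of one segment per top-level function or class plus a remainder.

```python
import ast
from typing import List, Tuple


def split_top_level(source: str) -> Tuple[List[str], str]:
    """Return (segments, remainder) for a piece of Python source.

    segments: the text of each top-level function or class.
    remainder: every other line, joined back together.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    segments = []
    taken = set()  # 0-based indices of lines that belong to a segment

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above node.lineno, so start at the earliest one.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            start = first - 1        # convert to a 0-based slice start
            end = node.end_lineno    # 1-based inclusive, usable as slice end
            segments.append("\n".join(lines[start:end]))
            taken.update(range(start, end))

    remainder = "\n".join(
        line for i, line in enumerate(lines) if i not in taken
    )
    return segments, remainder


example = '''import os

def greet(name):
    return f"hello {name}"

class Greeter:
    def run(self):
        return greet("world")

print(Greeter().run())
'''

segments, remainder = split_top_level(example)
```

Here `segments` holds two entries (the `greet` function and the `Greeter` class), while `remainder` keeps the import and the final `print` call.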

The supported languages for code parsing are:

- C: "c" (*)
- C++: "cpp" (*)
- C#: "csharp" (*)
- COBOL: "cobol"
- Elixir: "elixir"
- Go: "go" (*)
- Java: "java" (*)
- JavaScript: "js" (requires package `esprima`)
- Kotlin: "kotlin" (*)
- Lua: "lua" (*)
- Perl: "perl" (*)
- Python: "python"
- Ruby: "ruby" (*)
- Rust: "rust" (*)
- Scala: "scala" (*)
- SQL: "sql" (*)
- TypeScript: "ts" (*)

Items marked with (*) require the packages `tree_sitter` and
`tree_sitter_languages`. It is straightforward to add support for additional
languages using `tree_sitter`, although this currently requires modifying LangChain.
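Before loading files in a language that needs an optional package, it can help to check which of those packages are importable. The following is a minimal sketch using only the standard library. The helper `missing_modules` is hypothetical, and it assumes the import names match the package names listed above (`tree_sitter`, `tree_sitter_languages`, `esprima`).

```python
from importlib.util import find_spec

# Assumption: import names are the same as the package names in the list above.
OPTIONAL_MODULES = ("tree_sitter", "tree_sitter_languages", "esprima")


def missing_modules(names=OPTIONAL_MODULES):
    """Return the subset of `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not installed:", ", ".join(missing))
```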

Both the parsing language and the minimum number of lines required to trigger
syntax-based splitting are configurable.
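As a rough illustration of the line threshold, the sketch below decides whether a piece of code is long enough to be split. The function `should_segment` is hypothetical, and the exact boundary (whether a file with exactly `parser_threshold` lines is split) is an assumption here, not something this page specifies.

```python
def should_segment(code: str, parser_threshold: int = 0) -> bool:
    """Return True when `code` has enough lines to be split by syntax.

    Assumption: the comparison is inclusive; the library may differ.
    """
    return len(code.splitlines()) >= parser_threshold


short_file = "x = 1"
long_file = "\n".join(f"line_{i} = {i}" for i in range(300))

keep_whole = not should_segment(short_file, parser_threshold=200)
split_it = should_segment(long_file, parser_threshold=200)
```

Under this assumption, a file that falls below the threshold would be kept as a single document instead of being split.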

If a language is not explicitly specified, `LanguageParser` will infer one from
filename extensions, if present.
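Extension-based inference can be pictured as a simple lookup. The sketch below is an assumption-laden illustration: the mapping `EXTENSION_TO_LANGUAGE` and the helper `infer_language` are hypothetical, cover only a few of the languages listed above, and are not taken from the library.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file suffix to the language identifiers listed above.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "js",
    ".ts": "ts",
    ".go": "go",
    ".rs": "rust",
}


def infer_language(filename: str) -> Optional[str]:
    """Return a language identifier for `filename`, or None if unknown."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)


language = infer_language("src/app.py")
unknown = infer_language("README")
```

A file without a recognized extension yields `None` in this sketch, which is the case where passing `language` explicitly is useful.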

Examples:

```python
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser

# Load every .py and .js file under ./code; the language is inferred per file.
loader = GenericLoader.from_filesystem(
    "./code",
    glob="**/*",
    suffixes=[".py", ".js"],
    parser=LanguageParser(),
)
docs = loader.load()
```

To select the language manually:

```python
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser

loader = GenericLoader.from_filesystem(
    "./code",
    glob="**/*",
    suffixes=[".py"],
    parser=LanguageParser(language="python"),
)
```

To set the line-count threshold:

```python
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser

loader = GenericLoader.from_filesystem(
    "./code",
    glob="**/*",
    suffixes=[".py"],
    parser=LanguageParser(parser_threshold=200),
)
```

## Signature

```python
LanguageParser(
    language: Optional[Language] = None,
    parser_threshold: int = 0,
)
```

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `language` | `Optional[Language]` | No | Language to parse with. If `None`, the parser tries to infer the language from the source. (default: `None`) |
| `parser_threshold` | `int` | No | Minimum number of lines needed to activate parsing. (default: `0`) |

## Extends

- `BaseBlobParser`

## Constructors

```python
__init__(
    self,
    language: Optional[Language] = None,
    parser_threshold: int = 0,
)
```

| Name | Type |
|------|------|
| `language` | `Optional[Language]` |
| `parser_threshold` | `int` |


## Properties

- `language`
- `parser_threshold`

## Methods

- [`lazy_parse()`](https://reference.langchain.com/python/langchain-community/document_loaders/parsers/language/language_parser/LanguageParser/lazy_parse)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/4b280287bd55b99b44db2dd849f02d66c89534d5/libs/community/langchain_community/document_loaders/parsers/language/language_parser.py#L103)