SparkSQL is a utility class for interacting with Spark SQL.
SparkSQL(
self,
spark_session: Optional[SparkSession] = None,
catalog: Optional[str] = None,
schema: Optional[str] = None,
ignore_tables: Optional[List[str]] = None,
include_tables: Optional[List[str]] = None,
sample_rows_in_table_info: int = 3
)

| Name | Type | Description |
|---|---|---|
| spark_session | Optional[SparkSession] | Default: None. A SparkSession object. If not provided, one will be created. |
| catalog | Optional[str] | Default: None. The catalog to use. If not provided, the default catalog will be used. |
| schema | Optional[str] | Default: None. The schema to use. If not provided, the default schema will be used. |
| ignore_tables | Optional[List[str]] | Default: None. A list of tables to ignore. If not provided, all tables will be used. |
| include_tables | Optional[List[str]] | Default: None. A list of tables to include. If not provided, all tables will be used. |
| sample_rows_in_table_info | int | Default: 3. The number of sample rows to include in the table info. |
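As a sketch of how include_tables and ignore_tables interact (the helper below is hypothetical, for illustration only; the actual implementation may differ), the usable-table selection can be expressed in plain Python:

```python
from typing import List, Optional

def usable_table_names(
    all_tables: List[str],
    include_tables: Optional[List[str]] = None,
    ignore_tables: Optional[List[str]] = None,
) -> List[str]:
    # If include_tables is given, only those tables are used;
    # otherwise every table except the ignored ones is used.
    if include_tables:
        return [t for t in all_tables if t in include_tables]
    if ignore_tables:
        return [t for t in all_tables if t not in ignore_tables]
    return all_tables

print(usable_table_names(["users", "orders", "tmp"], ignore_tables=["tmp"]))
# → ['users', 'orders']
```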
Creates a remote Spark session via Spark Connect. For example: SparkSQL.from_uri("sc://localhost:15002")
Get the names of the available tables.
Get information about specified tables.
Follows best practices as specified in: Rajkumar et al, 2022 (https://arxiv.org/abs/2204.00498)
If sample_rows_in_table_info is nonzero, that number of sample rows is appended to each table description. As the paper demonstrates, this can improve performance on text-to-SQL tasks.
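The idea of appending sample rows can be sketched as follows (a minimal illustration with a hypothetical helper, not the library's actual implementation): each table's DDL is followed by up to N rows of real data.

```python
from typing import Sequence

def table_info_with_samples(
    ddl: str,
    rows: Sequence[Sequence[object]],
    sample_rows_in_table_info: int = 3,
) -> str:
    # Append up to sample_rows_in_table_info rows beneath the
    # table's CREATE statement, one tab-separated row per line.
    sampled = rows[:sample_rows_in_table_info]
    lines = [ddl, f"{len(sampled)} rows from table:"]
    lines += ["\t".join(str(v) for v in row) for row in sampled]
    return "\n".join(lines)

ddl = "CREATE TABLE users (id INT, name STRING)"
rows = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
print(table_info_with_samples(ddl, rows))
```

With the default of 3, only the first three rows appear in the description, keeping the prompt small while still grounding the model in concrete values.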
Execute a SQL command and return a string representing the results.
If the statement returns rows, a string of the results is returned. If the statement returns no rows, an empty string is returned.
If the statement throws an error, the error message is returned.
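The three-way contract described above (rows → their string form, no rows → empty string, error → the error message) can be sketched with a stubbed query executor; run_sql and the fake executor here are illustrative stand-ins, not the library's API:

```python
from typing import Callable, List, Tuple

def run_sql(execute: Callable[[str], List[Tuple]], command: str) -> str:
    # Rows -> string of the results; no rows -> ""; error -> its message.
    try:
        rows = execute(command)
    except Exception as exc:
        return str(exc)
    return str(rows) if rows else ""

def fake_execute(sql: str) -> List[Tuple]:
    # Stand-in for a real Spark SQL call.
    return [(1, "alice")] if "users" in sql else []

print(run_sql(fake_execute, "SELECT * FROM users"))  # → [(1, 'alice')]
print(run_sql(fake_execute, "SELECT * FROM empty"))  # → (empty string)
```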