SparkSQL is a utility class for interacting with Spark SQL.

| Name | Type | Description | Default |
|---|---|---|---|
| spark_session | Optional[SparkSession] | A SparkSession object. If not provided, one will be created. | None |
| catalog | Optional[str] | The catalog to use. If not provided, the default catalog will be used. | None |
| schema | Optional[str] | The schema to use. If not provided, the default schema will be used. | None |
| ignore_tables | Optional[List[str]] | A list of tables to ignore. If not provided, all tables will be used. | None |
| include_tables | Optional[List[str]] | A list of tables to include. If not provided, all tables will be used. | None |
| sample_rows_in_table_info | int | The number of sample rows to include in the table info. | 3 |

Get information about specified tables.

Follows the best practices described in Rajkumar et al., 2022 (https://arxiv.org/abs/2204.00498).

If `sample_rows_in_table_info` is nonzero, that many sample rows are appended to each table description. As demonstrated in the paper, this can improve the accuracy of the generated SQL.
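To make the sample-row behaviour concrete, here is a minimal sketch of how sample rows might be appended to a table description. It assumes the description is a `CREATE TABLE` statement and that the rows are rendered tab-separated inside a SQL block comment, loosely following the prompt format in the cited paper. `build_table_info` is a hypothetical helper for illustration, not part of the SparkSQL class, and it takes plain Python rows instead of querying Spark.

```python
from typing import Sequence


def build_table_info(
    table_name: str,
    create_stmt: str,
    columns: Sequence[str],
    rows: Sequence[Sequence[object]],
    sample_rows_in_table_info: int = 3,
) -> str:
    """Hypothetical helper: append up to N sample rows to a table description."""
    if sample_rows_in_table_info <= 0:
        # No sample rows requested: return the bare table description.
        return create_stmt
    # Keep at most N rows; a small table may have fewer.
    sample = rows[:sample_rows_in_table_info]
    lines = [f"{len(sample)} rows from {table_name} table:", "\t".join(columns)]
    lines += ["\t".join(str(value) for value in row) for row in sample]
    # Wrap the rows in a block comment so the whole string stays valid SQL.
    return f"{create_stmt}\n\n/*\n" + "\n".join(lines) + "\n*/"


# Usage: describe a toy table with two sample rows.
ddl = "CREATE TABLE users (id INT, name STRING)"
info = build_table_info(
    "users",
    ddl,
    ["id", "name"],
    [(1, "ada"), (2, "grace"), (3, "alan"), (4, "edsger")],
    sample_rows_in_table_info=2,
)
print(info)
```

Placing the rows in a comment is a design choice of this sketch: it lets the schema and the examples travel together in a prompt without breaking the SQL syntax of the `CREATE TABLE` statement.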
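The `ignore_tables` and `include_tables` parameters in the table above both fall back to "all tables" when omitted. The sketch below shows one plausible way to resolve them into a concrete table list. `select_tables` is a hypothetical helper; treating "both lists given" as an error, rejecting unknown names in `include_tables`, and sorting the result are assumptions of this sketch, not documented behaviour of the class.

```python
from typing import Iterable, List, Optional


def select_tables(
    all_tables: Iterable[str],
    include_tables: Optional[List[str]] = None,
    ignore_tables: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: choose which tables to describe."""
    # Assumption: results are sorted for a stable, predictable order.
    tables = sorted(all_tables)
    if include_tables is not None and ignore_tables is not None:
        # Assumption: giving both lists is ambiguous, so reject it.
        raise ValueError("Cannot specify both include_tables and ignore_tables")
    if include_tables is not None:
        # Assumption: unknown names in include_tables are an error.
        missing = set(include_tables) - set(tables)
        if missing:
            raise ValueError(f"include_tables {sorted(missing)} not found")
        return [t for t in tables if t in include_tables]
    if ignore_tables is not None:
        return [t for t in tables if t not in ignore_tables]
    # Neither list given: use every table.
    return tables


# Usage with a toy catalog of three tables.
tables = ["orders", "users", "audit_log"]
print(select_tables(tables))
print(select_tables(tables, include_tables=["users"]))
print(select_tables(tables, ignore_tables=["audit_log"]))
```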