Method●Since v0.3

create_index

Creates an index using the index name specified at instance construction.

Setting the numLists parameter correctly is important for achieving good accuracy and performance. Because the vector store uses IVF as its indexing strategy, you should create the index only after you have loaded a large enough sample of documents to ensure that the centroids for the respective buckets are fairly distributed.

We recommend that numLists is set to documentCount/1000 for up to 1 million documents and to sqrt(documentCount) for more than 1 million documents. As the number of items in your database grows, you should tune numLists to be larger in order to achieve good latency performance for vector search.
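
The sizing rule above can be sketched as a small helper. This is only an illustration: `recommended_num_lists` is a hypothetical name, not part of the library, and the rounding (integer division, integer square root, and a floor of 1) is an assumption, since the guidance does not say how to round.

```python
import math


def recommended_num_lists(document_count: int) -> int:
    """Apply the documented heuristic for numLists.

    Assumptions (not stated in the guidance): results are rounded down
    and never drop below 1, so tiny collections fall back to a single list.
    """
    if document_count < 0:
        raise ValueError("document_count must be non-negative")
    if document_count <= 1_000_000:
        # Up to 1 million documents: documentCount / 1000
        return max(1, document_count // 1000)
    # More than 1 million documents: sqrt(documentCount)
    return max(1, math.isqrt(document_count))


for count in (500, 500_000, 1_000_000, 4_000_000):
    print(count, recommended_num_lists(count))
```

The returned value would then be passed as `num_lists` when calling `create_index`.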

If you're experimenting with a new scenario or creating a small demo, you can start with numLists set to 1 to perform a brute-force search across all vectors. This should give you the most accurate results from the vector search, but be aware that searches will be slow. After your initial setup, tune the numLists parameter using the guidance above.

create_index(
  self,
  num_lists: int = 100,
  dimensions: int = 1536,
  similarity: CosmosDBSimilarityType = CosmosDBSimilarityType.COS,
  kind: str = 'vector-ivf',
  m: int = 16,
  ef_construction: int = 64,
  max_degree: int = 32,
  l_build: int = 50
) -> dict[str, Any]

Returns: An object describing the created index
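
A minimal sketch of calling the method with keyword arguments follows. To keep it runnable without a database, it uses `RecordingStore`, a hypothetical stand-in that only records what it was asked to do; a real vector store instance exposing the `create_index` signature above would take its place. The helper name `create_ivf_index` and the shape of the stand-in's return value are assumptions made for illustration.

```python
from typing import Any, Protocol


class SupportsCreateIndex(Protocol):
    """Anything exposing a create_index method that accepts keyword arguments."""

    def create_index(self, **kwargs: Any) -> dict[str, Any]: ...


class RecordingStore:
    """Hypothetical stand-in for a real vector store; it creates no index."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def create_index(self, **kwargs: Any) -> dict[str, Any]:
        self.calls.append(kwargs)
        # The real return value describes the created index; this echo is made up.
        return {"requested": kwargs}


def create_ivf_index(
    store: SupportsCreateIndex, num_lists: int, dimensions: int = 1536
) -> dict[str, Any]:
    """Request an IVF index, leaving every other parameter at its default."""
    return store.create_index(
        num_lists=num_lists,
        dimensions=dimensions,
        kind="vector-ivf",
    )


store = RecordingStore()
print(create_ivf_index(store, num_lists=500))
```

Passing arguments by keyword avoids depending on parameter order, which matters here because `kind` comes after `similarity` in the signature.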

Parameters

Name	Type	Description
`kind`	`str`	Default:`'vector-ivf'` Type of vector index to create. Possible options are: vector-ivf; vector-hnsw (available as a preview feature only; to enable it, visit https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/preview-features); vector-diskann (available as a preview feature only).
`num_lists`	`int`	Default:`100` The number of clusters that the inverted file (IVF) index uses to group the vector data. We recommend that numLists is set to documentCount/1000 for up to 1 million documents and to sqrt(documentCount) for more than 1 million documents. Using a numLists value of 1 is akin to performing a brute-force search, which has limited performance.
`dimensions`	`int`	Default:`1536` Number of dimensions for vector similarity. The maximum number of supported dimensions is 2000.
`similarity`	`CosmosDBSimilarityType`	Default:`CosmosDBSimilarityType.COS` Similarity metric to use with the IVF index. Possible options are: - CosmosDBSimilarityType.COS (cosine distance), - CosmosDBSimilarityType.L2 (Euclidean distance), and - CosmosDBSimilarityType.IP (inner product).
`m`	`int`	Default:`16` The max number of connections per layer (16 by default, minimum value is 2, maximum value is 100). Higher m is suitable for datasets with high dimensionality and/or high accuracy requirements.
`ef_construction`	`int`	Default:`64` The size of the dynamic candidate list for constructing the graph (64 by default, minimum value is 4, maximum value is 1000). A higher ef_construction results in better index quality and higher accuracy, but also increases the time required to build the index. ef_construction must be at least 2 * m.
`max_degree`	`int`	Default:`32` Max number of neighbors. Default value is 32, range from 20 to 2048. Only vector-diskann search supports this for now.
`l_build`	`int`	Default:`50` l value for index building. Default value is 50, range from 10 to 500. Only vector-diskann search supports this for now.
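
The ranges in the table can be checked before making the call. The sketch below is a hypothetical client-side pre-check, not something the library is known to provide; `check_index_params` is an invented name, and it validates every value regardless of `kind`, which is a simplification.

```python
def check_index_params(
    dimensions: int = 1536,
    m: int = 16,
    ef_construction: int = 64,
    max_degree: int = 32,
    l_build: int = 50,
) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Limits are copied from the parameter table. Checking all of them
    irrespective of the index kind is an assumption of this sketch.
    """
    problems: list[str] = []
    if dimensions > 2000:
        problems.append("dimensions must not exceed 2000")
    if not 2 <= m <= 100:
        problems.append("m must be between 2 and 100")
    if not 4 <= ef_construction <= 1000:
        problems.append("ef_construction must be between 4 and 1000")
    if ef_construction < 2 * m:
        problems.append("ef_construction must be at least 2 * m")
    if not 20 <= max_degree <= 2048:
        problems.append("max_degree must be between 20 and 2048")
    if not 10 <= l_build <= 500:
        problems.append("l_build must be between 10 and 500")
    return problems


print(check_index_params())                          # defaults
print(check_index_params(m=40, ef_construction=64))  # violates the 2 * m rule
```

Returning a list rather than raising on the first failure lets a caller report every out-of-range value at once.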
