Various Utility Functions
The ObjectId tools help IDs live as ObjectId in MongoDB and as str in LangChain and JSON.
The MMR tools are duplicated from langchain_community to avoid cross-dependencies.
The functions "maximal_marginal_relevance" and "cosine_similarity" are copied, respectively, from the modules:
- "libs/community/langchain_community/vectorstores/utils.py"
- "libs/community/langchain_community/utils/math.py"
Row-wise cosine similarity between two equal-width matrices.
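As a rough illustration of the row-wise computation, here is a minimal sketch in plain Python. It assumes the inputs are lists of equal-length rows and that a zero-length vector should score 0.0; neither detail is stated above, and the real helper may well operate on NumPy arrays instead.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def cosine_similarity(X: Matrix, Y: Matrix) -> List[List[float]]:
    """Return S where S[i][j] is the cosine similarity of X[i] and Y[j]."""
    if not X or not Y:
        return []
    width = len(X[0])
    if any(len(row) != width for row in list(X) + list(Y)):
        raise ValueError("All rows of X and Y must have the same width.")

    def norm(v: Sequence[float]) -> float:
        return math.sqrt(sum(a * a for a in v))

    result: List[List[float]] = []
    for x in X:
        row: List[float] = []
        for y in Y:
            dot = sum(a * b for a, b in zip(x, y))
            denom = norm(x) * norm(y)
            # Assumption of this sketch: a zero vector is similar to nothing.
            row.append(dot / denom if denom else 0.0)
        result.append(row)
    return result
```

The result has one row per row of X and one column per row of Y, so comparing a single query against n documents gives a 1 x n matrix.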
Compute Maximal Marginal Relevance (MMR).
MMR is a technique used to select documents that are both relevant to the query and diverse among themselves. This function returns the indices of the top-k embeddings that maximize the marginal relevance.
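The greedy selection behind MMR can be sketched as follows. This is an illustrative version in plain Python, not necessarily the duplicated implementation: the parameter names `lambda_mult` and `k` and their defaults are assumptions, and the score used is the common form `lambda_mult * relevance - (1 - lambda_mult) * redundancy`.

```python
import math
from typing import List, Sequence


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def maximal_marginal_relevance(
    query: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    lambda_mult: float = 0.5,  # assumed name: 1.0 = relevance only, 0.0 = diversity only
    k: int = 4,                # assumed name: number of indices to return
) -> List[int]:
    """Return indices of up to k embeddings chosen greedily by MMR."""
    if k <= 0 or not embeddings:
        return []
    relevance = [_cos(query, e) for e in embeddings]
    # Start with the embedding most similar to the query.
    selected = [max(range(len(embeddings)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(embeddings)):
        best_idx, best_score = -1, -math.inf
        for i, emb in enumerate(embeddings):
            if i in selected:
                continue
            # Redundancy: similarity to the closest already-selected item.
            redundancy = max(_cos(emb, embeddings[j]) for j in selected)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected
```

With a low `lambda_mult`, a near-duplicate of an already selected embedding is penalized heavily, so a less relevant but different embedding is preferred; with `lambda_mult=1.0` the selection reduces to plain relevance ranking.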
Attempt to cast string representation of id to MongoDB's internal BSON ObjectId.
To be cast to an ObjectId, the input must be a 24-character hex string. If it is not, MongoDB will happily use the string itself in the main _id index. Importantly, the str representation of an ObjectId that comes out of MongoDB has this 24-character hex form.
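A minimal sketch of the "attempt to cast" behavior is shown below. The names `looks_like_object_id` and `str_to_oid_sketch` are hypothetical, and returning the input unchanged when it is not 24 hex characters (or when the `bson` package from pymongo is not installed) is an assumption of this sketch rather than documented behavior.

```python
import string
from typing import Any


def looks_like_object_id(value: str) -> bool:
    """Return True if value is a 24-character hex string."""
    return len(value) == 24 and all(c in string.hexdigits for c in value)


def str_to_oid_sketch(value: str) -> Any:
    """Cast to ObjectId when possible; otherwise return the string unchanged."""
    if not looks_like_object_id(value):
        # MongoDB accepts arbitrary strings as _id, so keep the value as-is.
        return value
    try:
        from bson import ObjectId  # ships with pymongo
    except ImportError:
        # Assumption for this sketch only: degrade to str when bson is absent.
        return value
    return ObjectId(value)
```

Either way, `str(str_to_oid_sketch("0123456789abcdef01234567"))` gives back the same 24-character string, which is the round trip the ObjectId helpers rely on.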
Convert MongoDB's internal BSON ObjectId into a simple str for compatibility.
Instructive helper to show where data is coming out of MongoDB.
Recursively cast the values of a dict into a form that json.dump can serialize.
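The recursive cast might look roughly like the sketch below. `FakeObjectId` is a hypothetical stand-in for bson's ObjectId so that the example runs without pymongo, and `to_serializable` is an illustrative name. Returning a new structure rather than mutating in place, and converting dates with `isoformat()`, are assumptions of this sketch.

```python
import json
from datetime import date, datetime
from typing import Any


class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId (not the real class)."""

    def __init__(self, hex_id: str) -> None:
        self._hex = hex_id

    def __str__(self) -> str:
        return self._hex


def to_serializable(value: Any) -> Any:
    """Recursively convert ObjectId-like and date values to plain strings."""
    if isinstance(value, dict):
        return {key: to_serializable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_serializable(item) for item in value]
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, FakeObjectId):
        return str(value)
    return value  # already JSON-friendly, or left for json to reject


doc = {
    "_id": FakeObjectId("0123456789abcdef01234567"),
    "meta": {"created": date(2024, 1, 2), "refs": [FakeObjectId("f" * 24)]},
    "score": 0.5,
}
payload = json.dumps(to_serializable(doc))
```

After conversion every nested value is a str, number, list, or dict, so `json.dumps` succeeds where it would otherwise raise a TypeError on the raw document.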
Prepare a query for vector search based on the embedding type.
This function checks if the embedding is an AutoEmbeddings instance. If it is, the query is returned as-is (string) for server-side embedding. Otherwise, the query is embedded using the embedding model's embed_query method.
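The branching described above can be sketched with stand-in classes. `AutoEmbeddingsStub` and `ToyEmbeddings` are hypothetical placeholders defined here only so that the example runs on its own; the one detail taken from the description is that client-side models expose an `embed_query` method.

```python
from typing import List, Union


class AutoEmbeddingsStub:
    """Hypothetical marker for a model whose embedding happens server-side."""


class ToyEmbeddings:
    """Hypothetical client-side model with an embed_query method."""

    def embed_query(self, text: str) -> List[float]:
        # Toy vector: text length and number of spaces.
        return [float(len(text)), float(text.count(" "))]


def prepare_query(
    embedding: Union[AutoEmbeddingsStub, ToyEmbeddings], query: str
) -> Union[str, List[float]]:
    """Return the raw string for server-side embedding, else a vector."""
    if isinstance(embedding, AutoEmbeddingsStub):
        return query  # the server embeds the text itself
    return embedding.embed_query(query)
```

The caller can then inspect the return type to decide whether to send query text or a query vector in the vector search request.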