Share via


Using the Chroma connector (Preview)

Warning

The Semantic Kernel Vector Store functionality is in preview, and improvements that require breaking changes may still occur in limited circumstances before release.

Not supported

Not supported.

Overview

The Chroma Vector Store connector can be used to access and manage data in Chroma. The connector has the following characteristics.

Feature Area Support
Collection maps to Chroma collection
Supported key property types string
Supported data property types All types
Supported vector property types
  • list[float]
  • list[int]
  • ndarray
Supported index types
  • HNSW
Supported distance functions
  • CosineSimilarity
  • DotProductSimilarity
  • EuclideanSquaredDistance
Supported filter clauses
  • AnyTagEqualTo
  • EqualTo
Supports multiple vectors in a record No
IsFilterable supported? Yes
IsFullTextSearchable supported? Yes

Limitations

Notable Chroma connector functionality limitations.

Feature Area Workaround
Client-server mode Use the client.HttpClient and pass the result to the client parameter, we do not support a AsyncHttpClient at this time
Chroma Cloud Unclear at this time, as Chroma Cloud is still in private preview

Getting Started

Add the Chroma Vector Store connector dependencies to your project.

pip install semantic-kernel[chroma]

You can then create the vector store.

from semantic_kernel.connectors.chroma import ChromaStore

store = ChromaStore()

Alternatively, you can also pass in your own mongodb client if you want to have more control over the client construction:

from chromadb import Client
from semantic_kernel.connectors.chroma import ChromaStore

client = Client(...)
store = ChromaStore(client=client)

You can also create a collection directly, without the store.

from semantic_kernel.connectors.chroma import ChromaCollection

# `hotel` is a class created with the @vectorstoremodel decorator
collection = ChromaCollection(
    record_type=hotel,
    collection_name="my_collection",
)

Serialization

The Chroma client returns both get and search results in tabular form, this means that there are between 3 and 5 lists being returned in a dict, the lists are 'keys', 'documents', 'embeddings', and optionally 'metadatas' and 'distances'. The Semantic Kernel Chroma connector will automatically convert this into a list of dict objects, which are then parsed back to your data model.

It could be very interesting performance wise to do straight serialization from this format into a dataframe-like structure as that saves a lot of rebuilding of the data structure. This is not done for you, even when using container mode, you would have to specify this yourself, for more details on this concept see the serialization documentation.

Not supported

Not supported.