Kubernetes
If you'd like to try this on your own cluster, check out the instructions. They walk you through an ephemeral setup on a local cluster. To run Indexify in production, you'll need to adapt the YAML to your environment; pay particular attention to the dependencies described below.
Components
- API Server - This is where all your requests go. An ingress exposes the root path (/) by default.
- Coordinator - Task scheduler that manages handing work out to the extractors.
- Extractors - Extractors can take multiple forms; this example is generic and works for all the extractors distributed by the project.
Dependencies
Blob Store
We recommend using an S3-compatible service for the blob store. Our ephemeral example uses MinIO for this. See the environment variable patch for how it is configured.
GCP
- Create an HMAC key and use it as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- Set AWS_ENDPOINT_URL to https://storage.googleapis.com/.
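As a rough illustration of the GCP steps above, the following Python sketch maps an HMAC key pair onto the three S3-style variables and checks that none of them is missing. The helper names gcs_blob_store_env and missing_vars are hypothetical and not part of Indexify; the sketch only assumes the variable names and endpoint listed above.

```python
from typing import Mapping

# Endpoint for Google Cloud Storage's S3-compatible API, as listed above.
GCS_S3_ENDPOINT = "https://storage.googleapis.com/"

# S3-style variables the blob store settings rely on (names taken from this page).
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL")


def gcs_blob_store_env(hmac_access_id: str, hmac_secret: str) -> dict:
    """Hypothetical helper: map a GCS HMAC key pair onto the S3-style variables."""
    return {
        "AWS_ACCESS_KEY_ID": hmac_access_id,
        "AWS_SECRET_ACCESS_KEY": hmac_secret,
        "AWS_ENDPOINT_URL": GCS_S3_ENDPOINT,
    }


def missing_vars(env: Mapping[str, str]) -> list:
    """Hypothetical helper: list required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For instance, calling missing_vars(os.environ) from inside a running pod is one way to confirm that your environment variable patch was applied; an empty list means all three settings are present.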
Other Clouds
Not all clouds expose an S3-compatible interface. For those that don't, check out the s3proxy project, which provides an S3 API in front of other storage backends. We'd also love help implementing native support for your blob store of choice! Please open an issue so that we can discuss how that would fit into the project.
Vector Store
We support multiple vector backends, including LanceDB, Qdrant, and PgVector. The ephemeral example uses Postgres with PgVector. The database deployment itself is simple; pay extra attention to the patch that configures the API server and collector to use that backend.
Structured Store
Take a look at the vector store component in the kustomize configuration; it implements the structured store as well.