Greetings,
I am highly interested in exploring the potential of leveraging Vald for our large-scale ML project. However, my understanding is that Vald's operational model requires keeping all vectors in RAM at all times, which implies significant resource allocation and, consequently, substantial financial investment. I am curious whether there is an option to store data on disk, at least partially.
For instance, we anticipate accumulating approximately 1 billion records with a dimensionality of 1024 within a few months. Applying the formula you provide, I calculate this would require around 6.5 TB of memory. Such a requirement poses a substantial cost, which is difficult to justify for our non-profit initiative.
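For transparency, here is the back-of-envelope arithmetic behind my 6.5 TB figure as a small Go sketch; the 1.6× overhead factor for the NGT index structures is purely my own assumption, not a number taken from your documentation:

```go
package main

import "fmt"

func main() {
	const (
		vectors     = 1_000_000_000 // ~1 billion records
		dimension   = 1024          // vector dimensionality
		bytesPerDim = 4             // float32 per dimension
		overhead    = 1.6           // assumed NGT index/graph overhead factor (my guess, not an official figure)
	)

	raw := float64(vectors) * dimension * bytesPerDim // raw vector payload, in bytes
	withOverhead := raw * overhead                    // estimate including the assumed overhead

	const tb = 1e12 // decimal terabyte
	fmt.Printf("raw vectors:   %.2f TB\n", raw/tb)          // ~4.10 TB
	fmt.Printf("with overhead: %.2f TB\n", withOverhead/tb) // ~6.55 TB
}
```

Please correct me if the actual per-vector overhead differs meaningfully from this guess.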
Could you verify the accuracy of my estimates, or perhaps provide examples or benchmarks of applications managing data volumes of this magnitude? Additionally, should we succeed in securing funding, would scaling simply entail operating a larger cluster with ample RAM, then dynamically adjusting the number of agent pods in accordance with real-time use and data growth? Is there a threshold at which performance significantly degrades, or is adequate resourcing the sole requirement? Further, may we anticipate the availability of benchmark results on your website in the near future?
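To make the scaling question more concrete, here is a rough sizing sketch under placeholder assumptions; the index replication factor and the usable RAM per agent pod are guesses on my part, not values from the Vald Helm chart:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const (
		totalIndexTB  = 6.55 // cluster-wide index estimate from the sketch above, in TB
		indexReplicas = 3.0  // assumed index replication factor (placeholder)
		ramPerAgentGB = 64.0 // assumed usable RAM per agent pod, in GB
	)

	clusterBytes := totalIndexTB * 1e12 * indexReplicas // memory needed across all agent pods
	perAgentBytes := ramPerAgentGB * 1e9                // memory available in one agent pod

	agents := math.Ceil(clusterBytes / perAgentBytes)
	fmt.Printf("approx. agent pods needed: %.0f\n", agents) // ~308 with these assumptions
}
```

If assumptions along these lines are roughly right, scaling would mostly be a matter of adjusting that pod count as data grows, which is why I am asking whether anything beyond RAM capacity becomes the bottleneck.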
Thank you for your attention to these matters.
Replies: 1 comment

@kpango Hi, can you help me with this? Sorry if it's the wrong place to ask questions.