Opinion / Commentary

AI Vector Databases Are the New Attack Surface Nobody Inventoried

ChromaDB CVE-2026-45829 is a specific vulnerability in one product. The underlying problem it exposes is structural: enterprise AI deployments are creating new categories of sensitive data storage that are not subject to the security controls applied to comparable databases. The vulnerability is fixable. The architectural gap is not fixed by a patch.

CipherWatch Editorial · Security Intelligence Platform

The CVE-2026-45829 disclosure covers one vulnerability in one product, and the fix, when it arrives, will close that code path. The problem worth examining is the broader context: how a CVSS 10.0 vulnerability came to exist in widely deployed AI infrastructure that most organisations either do not know they run or have not included in their security assessment programme.

The pattern here is familiar. A new technology category is adopted rapidly across enterprise environments. The technology creates new data stores and new attack surfaces. The security community does not fully assess the new category until an incident or a high-profile disclosure makes the risk concrete. By the time systematic evaluation begins, the technology is already deployed at scale.

It happened with cloud storage (public S3 buckets). It happened with containerisation (exposed Kubernetes dashboards). It happened with IoT devices. It is happening now with AI infrastructure.

What Vector Databases Actually Are, in Security Terms

Strip away the terminology and a vector database is a structured store of encoded document content. The encoding (embedding) is a mathematical representation of the document’s semantic content. That semantic information can be partially reconstructed from the embeddings alone, and in a standard ChromaDB deployment the original documents are typically accessible alongside their embeddings.

From a data classification standpoint, a vector database containing embeddings of internal documents is equivalent to a document database containing the same documents. The access controls that apply to the document database should apply to the vector database. If the document database requires authentication, network isolation, audit logging, and inclusion in penetration test scope, the vector database should receive the same treatment.

This is not a controversial conclusion. It is an obvious application of consistent security principles. And it is not what is happening in most enterprise AI deployments today.

The gap does not come from malice or deliberate de-prioritisation. It comes from the velocity of AI adoption combined with the classification of vector databases as infrastructure rather than data stores. Development teams building RAG pipelines think of ChromaDB as a component, a technical means to an end, and not as a sensitive data store subject to security controls. The data classification conversation never happened because the data was not being stored “in a database” in the traditional sense.

The Inventory Problem Precedes the Security Problem

Before you can assess whether your vector databases are secured correctly, you have to know where they are. This is harder than it sounds.

Vector databases in enterprise environments are deployed by development teams, data science teams, and product teams building AI applications. They are often Docker containers on cloud VMs, with no central registry, no CMDB entry, and no security review before deployment. The development team that deployed the container may have left the organisation. The documentation of what is in it may exist only in a Confluence page that has not been updated in six months.

The ChromaDB exposure statistic — 73% of internet-discoverable deployments are vulnerable — reflects this directly. Those are not rogue consumer deployments; they are enterprise AI infrastructure components that someone deployed with default settings and forgot to secure. The combination of “default no authentication” and “default port accessible from the internet” in a cloud deployment produces exactly that exposure.

The standard security advice — “apply the principle of least privilege,” “authenticate all data stores,” “network-isolate sensitive systems” — is correct and has been correct for thirty years. The failure is not in the principles; it is in the scope of what they are applied to. Every time a new category of data store appears, the security perimeter has to be deliberately extended to include it.

That extension does not happen automatically. It requires someone to ask “do we have any of these? where are they? what’s in them? what controls do they have?” — and then to do the same thing again six months later as new deployments appear.
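
The first two of those questions (do we have any, and where are they) can be partly automated. The following is a minimal sketch in Python, not a complete discovery tool. It assumes that ChromaDB listens on port 8000 and answers a heartbeat route at /api/v2/heartbeat or /api/v1/heartbeat with a body containing the word "heartbeat"; both should be verified against the versions actually deployed. The function names and the injectable fetch parameter are illustrative choices, not part of any ChromaDB tooling.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple

# Assumptions: default ChromaDB port and heartbeat routes.
# Verify both against the versions you actually run.
ASSUMED_PORT = 8000
ASSUMED_HEARTBEAT_PATHS = ("/api/v2/heartbeat", "/api/v1/heartbeat")

FetchResult = Optional[Tuple[int, str]]


def http_get(url: str, timeout: float = 2.0) -> FetchResult:
    """Unauthenticated GET. Returns (status, body) or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read(4096).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code, ""
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return None


def looks_like_chromadb(host: str,
                        fetch: Callable[[str], FetchResult] = http_get) -> bool:
    """True if the host answers any assumed heartbeat route."""
    for path in ASSUMED_HEARTBEAT_PATHS:
        result = fetch(f"http://{host}:{ASSUMED_PORT}{path}")
        if result is not None and result[0] == 200 and "heartbeat" in result[1]:
            return True
    return False


def find_candidates(hosts: Iterable[str],
                    fetch: Callable[[str], FetchResult] = http_get) -> List[str]:
    """Hosts from an address list you are authorised to scan that respond."""
    return [h for h in hosts if looks_like_chromadb(h, fetch)]
```

A hit only answers "where". It says nothing about whether authentication is enforced on data routes or what the collections contain, so each hit still needs an owner and a review. Instances moved to a non-default port will be missed, which is one reason the exercise has to be repeated and cross-checked against cloud and container inventories.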

The Patch Is Not the End of the Story

When ChromaDB releases the patch for CVE-2026-45829, the specific code path will close. Organisations that update will no longer be vulnerable to the specific pre-authentication RCE technique disclosed today.

They will still be running vector databases with potentially no authentication, accessible to anyone who can reach the port, containing sensitive embedded documents, without audit logging, without penetration test scope inclusion, and without a documented data classification assessment.

The patch is the minimum viable response. The appropriate response is to use the CVE-2026-45829 disclosure as the trigger for a complete inventory and security assessment of the AI data layer in the enterprise environment. That assessment should produce a list of all vector database instances, their content classification, their current security controls, and a remediation plan for the gaps.
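
The output of that assessment can be as simple as one record per instance plus a prioritised gap list. The sketch below, in Python, is one possible shape for it. The four required controls are the ones named earlier in this article; the classification labels, their ordering, and all instance names are hypothetical and should be replaced with the organisation's own scheme.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Controls this article argues a vector database should share with a
# comparable document database.
REQUIRED_CONTROLS = (
    "authentication",
    "network_isolation",
    "audit_logging",
    "pentest_scope",
)

# Hypothetical labels. Lower rank is handled first. An instance that was
# never assessed is ranked with the most sensitive tier, because its
# content is unknown.
CLASSIFICATION_RANK = {
    "unclassified": 0,
    "restricted": 0,
    "confidential": 1,
    "internal": 2,
    "public": 3,
}


@dataclass
class VectorDbInstance:
    name: str
    owner: str
    classification: str
    controls: FrozenSet[str] = frozenset()

    def gaps(self) -> List[str]:
        """Required controls this instance does not have."""
        return [c for c in REQUIRED_CONTROLS if c not in self.controls]


def remediation_plan(
    instances: List[VectorDbInstance],
) -> List[Tuple[str, str, List[str]]]:
    """Instances with gaps, most sensitive first, then most gaps first."""
    with_gaps = [i for i in instances if i.gaps()]
    ordered = sorted(
        with_gaps,
        key=lambda i: (
            CLASSIFICATION_RANK.get(i.classification, 0),  # unknown label: treat as worst
            -len(i.gaps()),
            i.name,
        ),
    )
    return [(i.name, i.classification, i.gaps()) for i in ordered]
```

Ranking unassessed instances alongside the most sensitive tier is a deliberate choice in this sketch: a store nobody has classified is a store whose content nobody can vouch for.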

That work takes longer than applying a patch. It is also the work that actually addresses the structural problem rather than the specific vulnerability.
