Vaulty booting
// VAULTY
INITIALIZING_
BOOTING VAULTY OS…0%
VAULTY
DOCS / WHITEPAPER

VAULTY IS A REAL UTILITY.

Vaulty isn't just a meme arcade game. Every chat with the guardian generates a structured adversarial AI safety record — the same kind of data labs and enterprises pay red-team firms five and six figures to produce.

01 / WHAT IT IS

An adversarial data engine disguised as a game

Frontier AI labs (OpenAI, Anthropic, Google, Meta) all need one thing to keep their models safe: real, diverse, creative jailbreak attempts. Synthetic attacks are weak. The best signal comes from humans actively trying to break a model — with a real incentive to succeed.

Vaulty puts a real on-chain prize behind a single instruction: don't leak the code. Players try to break it. We log everything: the prompt, the model's response, a jailbreak score, technique tags (roleplay, encoding, instruction injection, DAN, authority impersonation, etc.), and whether the vault cracked.

02 / THE DATASET

Public, structured, research-grade

Every attempt lands in our public dataset (/dataset). Each row contains the sanitized prompt, the model reply, a 0–100 jailbreak score, technique tags, round id, and a cracked / refused verdict.

This is the same shape as datasets used in published AI safety research (HarmBench, JailbreakBench, AdvBench). The difference: ours is generated continuously by motivated humans chasing real money, not contractors on a one-week budget.

  • Free for researchers and individuals.
  • Licensed exports for labs and enterprises.
  • Full reproducibility: model version, system prompt hash, round id.
03 / GOALS

Where we're taking this

GOAL A

Enterprise red-team contracts

Sell scoped jailbreak campaigns to AI labs and enterprise model deployers. They define the target behavior (e.g. 'don't reveal customer PII', 'don't write malware'), we wrap it in a Vaulty round with a bounty, and deliver the full attack corpus + a remediation report.

GOAL B

Become a recognized red-team brand

Publish quarterly state-of-jailbreak reports from the live dataset. Open-source the scoring rubric. Get cited in safety papers. Same playbook as Trail of Bits or HackerOne — but native to LLMs and powered by a crowd, not a small consulting team.

GOAL C

Token-aligned bounty marketplace

$VAULTY funds the prize pool today. Long-term, any organization can post a model + a behavior they want stress-tested, fund a bounty in SOL or $VAULTY, and our community attacks it. Vaulty becomes the protocol layer for crowdsourced AI red-teaming.

04 / ROADMAP

What's shipping

NOW
Public arcade + open dataset

Live game, live dataset, on-chain prize pool funded by $VAULTY trading fees. Anyone can play, anyone can read the data.

NEXT
Researcher API + dataset exports

Stable JSON / Parquet exports, query API, signed dataset snapshots for reproducible research. Free tier for academics, paid tier for labs.

SOON
Custom rounds for partners

Partner-funded rounds with custom system prompts, custom target behaviors, and private deliverables. First pilots with crypto-native AI projects.

LATER
Bounty marketplace + DAO governance

Open marketplace where anyone posts a target. $VAULTY holders curate quality, slash spam, and share platform revenue.

05 / WHY THIS WORKS

The unlock

Red-teaming is one of the few AI safety problems where a crowd genuinely beats experts. Diversity of attack style matters more than depth. A 14-year-old with a roleplay trick often finds bypasses a PhD safety researcher misses.

Crypto rails make global, permissionless, instant payouts possible — no W-9, no Stripe review queue, no country restrictions. That's why the prize is on-chain. That's why $VAULTY exists.

Game on the front. Research infrastructure on the back. Real contracts on the horizon.

WANT TO PARTNER?

Labs, enterprises, researchers — let's talk.

Scoped red-team rounds, dataset licensing, or custom safety evaluations.

built for the curious. handle Vaulty with care.