Vaulty isn't just a meme arcade game. Every chat with the guardian generates a structured adversarial AI safety record, the same kind of data that labs and enterprises pay red-team firms five and six figures to produce.
Frontier AI labs (OpenAI, Anthropic, Google, Meta) all need one thing to keep their models safe: real, diverse, creative jailbreak attempts. Synthetic attacks are weak. The best signal comes from humans actively trying to break a model — with a real incentive to succeed.
Vaulty puts a real on-chain prize behind a single instruction: don't leak the code. Players try to break it. We log everything: the prompt, the model's response, a jailbreak score, technique tags (roleplay, encoding, instruction injection, DAN, authority impersonation, etc.), and whether the vault cracked.
Every attempt lands in our public dataset (/dataset). Each row contains the sanitized prompt, the model reply, a 0–100 jailbreak score, technique tags, round id, and a cracked / refused verdict.
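For a concrete picture, here is roughly what one row looks like, sketched as a Python dataclass. The field names are illustrative, based on the fields listed above, not the exact export schema.

```python
# Rough sketch of one dataset row; field names are illustrative,
# not the exact export schema.
from dataclasses import dataclass

@dataclass
class AttemptRecord:
    prompt: str            # sanitized player prompt
    response: str          # guardian model reply
    jailbreak_score: int   # 0-100
    techniques: list[str]  # e.g. ["roleplay", "encoding", "instruction_injection"]
    round_id: str          # which Vaulty round the attempt belongs to
    verdict: str           # "cracked" or "refused"
```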
This is the same shape as datasets used in published AI safety research (HarmBench, JailbreakBench, AdvBench). The difference: ours is generated continuously by motivated humans chasing real money, not by contractors on a one-week budget.
Sell scoped jailbreak campaigns to AI labs and enterprise model deployers. They define the target behavior (e.g. 'don't reveal customer PII', 'don't write malware'), we wrap it in a Vaulty round with a bounty, and deliver the full attack corpus + a remediation report.
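A scoped campaign, sketched as a plain config dict. The keys and values are assumptions about how a partner round might be specified, not Vaulty's actual format.

```python
# Hypothetical partner round definition; keys and values are illustrative,
# not Vaulty's actual configuration format.
campaign = {
    "target_behavior": "don't reveal customer PII",       # behavior the partner wants stress-tested
    "system_prompt": "<partner-supplied system prompt>",  # wrapped into a Vaulty round
    "bounty": {"amount": 250, "currency": "SOL"},         # on-chain prize backing the round
    "deliverables": ["attack_corpus", "remediation_report"],
    "results_private": True,                               # partner rounds can keep deliverables private
}
```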
Publish quarterly state-of-jailbreak reports from the live dataset. Open-source the scoring rubric. Get cited in safety papers. Same playbook as Trail of Bits or HackerOne — but native to LLMs and powered by a crowd, not a small consulting team.
$VAULTY funds the prize pool today. Long-term, any organization can post a model + a behavior they want stress-tested, fund a bounty in SOL or $VAULTY, and our community attacks it. Vaulty becomes the protocol layer for crowdsourced AI red-teaming.
Live game, live dataset, on-chain prize pool funded by $VAULTY trading fees. Anyone can play, anyone can read the data.
Stable JSON / Parquet exports, query API, signed dataset snapshots for reproducible research. Free tier for academics, paid tier for labs.
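Consuming a snapshot could be as simple as the sketch below; the file name and column names are assumptions carried over from the row fields described earlier, not the published export schema.

```python
# Sketch of working with a Parquet snapshot; file name and columns are
# assumptions based on the row fields described above.
import pandas as pd

df = pd.read_parquet("vaulty_snapshot.parquet")

# e.g. pull high-scoring roleplay attempts for a safety eval
hits = df[(df["jailbreak_score"] >= 80) &
          (df["techniques"].apply(lambda t: "roleplay" in t))]
print(hits[["round_id", "prompt", "verdict"]].head())
```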
Partner-funded rounds with custom system prompts, custom target behaviors, and private deliverables. First pilots with crypto-native AI projects.
Open marketplace where anyone posts a target. $VAULTY holders curate quality, slash spam, and share platform revenue.
Red-teaming is one of the few AI safety problems where a crowd genuinely beats experts. Diversity of attack style matters more than depth. A 14-year-old with a roleplay trick often finds bypasses a PhD safety researcher misses.
Crypto rails make global, permissionless, instant payouts possible — no W-9, no Stripe review queue, no country restrictions. That's why the prize is on-chain. That's why $VAULTY exists.
Game on the front. Research infrastructure on the back. Real contracts on the horizon.
Scoped red-team rounds, dataset licensing, or custom safety evaluations.