We audit every skill.
Most don't pass.
AI skills for prediction markets can lose you money if they're built wrong. We run a 6-step in-house audit pipeline and reject anything that doesn't meet the bar. No third-party rubber stamps.
rejection rate
6 audit steps
0 third parties
3-5 day turnaround
6-step audit pipeline
What our pipeline catches
Fake data feeds
Skills that declare feeds they never call
Hallucinated outputs
Models that invent numbers on edge cases
Schema drift
Outputs that don't match the declared schema
Latency lies
Published speed that doesn't match real-world speed
Data poisoning
Training data contaminated with survivorship bias
Static backtests
Hardcoded historical data disguised as live
Skills we killed
These looked legit on the surface. They weren't. Here's what our audit caught before they reached users.
Claimed 94% accuracy -- tested at 51%. Fake backtest.
Raw ChatGPT output with no data source. Zero schema.
Hardcoded prices from 2024. No live feed connected.
Returned bullish signal on every input. Adversarial test: 0/12.
Real rejection patterns. Names anonymized. If a skill can't survive our pipeline, it doesn't ship.
6 steps. No shortcuts.
In-house team. Real data. Adversarial stress tests. Every skill, every time.
Schema lockdown
Strict JSON schema validation. Types, required fields, bounds. If your schema is sloppy, you're out before we even run it.
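For illustration only, here is a minimal sketch of what a strict check on types, required fields, and bounds could look like. The field names (market_id, probability, confidence) and the SCHEMA layout are hypothetical and are not our production schema.

```python
from numbers import Real

# Hypothetical schema: field -> (expected type, required?, (min, max) bounds or None).
SCHEMA = {
    "market_id": (str, True, None),
    "probability": (Real, True, (0.0, 1.0)),
    "confidence": (Real, False, (0.0, 1.0)),
}

def validate_output(output: dict, schema: dict = SCHEMA) -> list:
    """Return a list of violations; an empty list means the output conforms."""
    errors = []
    for field, (expected_type, required, bounds) in schema.items():
        if field not in output:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        value = output[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"wrong type for {field}: {type(value).__name__}")
            continue
        # NaN fails this comparison, so it is reported as out of bounds.
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            errors.append(f"{field} out of bounds: {value}")
    # Strict mode: fields the schema never declared are violations too.
    for field in output:
        if field not in schema:
            errors.append(f"undeclared field: {field}")
    return errors
```

Returning every violation at once, rather than stopping at the first, gives a skill author the full fix list in a single pass.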
Source audit
We hit every declared data source. If it's fake, stale, or returns garbage -- instant reject. We've seen skills claim "Binance feed" while pulling cached CSVs.
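One piece of a source check, staleness, can be sketched as below. This is an assumption-laden example: the function name feed_is_fresh and the 5-minute window are hypothetical, and a real audit also has to confirm the feed is actually being called.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def feed_is_fresh(
    last_update: datetime,
    max_age: timedelta = timedelta(minutes=5),  # hypothetical freshness window
    now: Optional[datetime] = None,
) -> bool:
    """True if the feed's latest timestamp is recent and not in the future."""
    now = now or datetime.now(timezone.utc)
    age = now - last_update
    # A timestamp from the future is as suspicious as an old one.
    return timedelta(0) <= age <= max_age
```

A cached CSV posing as a live feed would typically fail this check, because its newest row keeps getting older between calls.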
Accuracy benchmark
Run against 90 days of historical data. Probability outputs vs realized outcomes. The Brier score has to land under our maximum threshold (lower is better). Most "AI trading" skills die here.
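For reference, the Brier score is the mean squared difference between predicted probabilities and realized 0/1 outcomes, so lower is better. The sketch below assumes a cutoff of 0.25, which is what a skill scores by always answering 0.5. That cutoff and the name passes_benchmark are illustrative and are not our actual threshold.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical cutoff: a skill that always says 0.5 scores exactly 0.25.
MAX_BRIER = 0.25

def passes_benchmark(probabilities, outcomes, max_brier=MAX_BRIER):
    """A skill passes only if it beats the cutoff (strictly lower score)."""
    return brier_score(probabilities, outcomes) < max_brier
```

A skill predicting 0.9 and 0.1 for events that resolve 1 and 0 scores about 0.01. A coin-flip skill sits at 0.25 and fails.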
Adversarial attack
We feed garbage: NaN values, negative prices, future dates, empty arrays, 10MB payloads. The skill must fail cleanly, not hallucinate an output.
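A rough sketch of that kind of harness follows. It assumes a skill is a callable taking a dict payload and that "failing cleanly" means raising a dedicated error. InvalidInput, the payload keys, and run_adversarial are all hypothetical names for illustration.

```python
import math
from datetime import datetime, timedelta, timezone

class InvalidInput(ValueError):
    """Hypothetical error a skill raises when it rejects bad input."""

def adversarial_cases():
    """Build one garbage payload per attack category."""
    now = datetime.now(timezone.utc)
    ok_time = now.isoformat()
    future = (now + timedelta(days=365)).isoformat()
    return {
        "nan_value": {"prices": [math.nan], "as_of": ok_time},
        "negative_price": {"prices": [-1.0], "as_of": ok_time},
        "future_date": {"prices": [0.5], "as_of": future},
        "empty_array": {"prices": [], "as_of": ok_time},
        "oversized_payload": {
            "prices": [0.5],
            "as_of": ok_time,
            "blob": "x" * (10 * 1024 * 1024),  # roughly 10MB of filler
        },
    }

def run_adversarial(skill):
    """Map each case name to True only if the skill rejected it cleanly."""
    results = {}
    for name, payload in adversarial_cases().items():
        try:
            skill(payload)
        except InvalidInput:
            results[name] = True   # failed cleanly
        except Exception:
            results[name] = False  # crashed with an unplanned error
        else:
            results[name] = False  # produced an output from garbage
    return results
```

A skill that returns a signal for every payload scores False across the board, which is the always-bullish pattern described earlier.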
Latency & uptime
7-day benchmark under load. Published latency must match reality within 20%. We've caught skills that are 1.2s in dev and 8s in production.
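The 20% rule can be sketched as a relative-deviation check. This example assumes the comparison uses the median of measured latencies and treats the tolerance as symmetric. Both are assumptions made for illustration, and latency_within_tolerance is a hypothetical name.

```python
from statistics import median

def latency_within_tolerance(published_s, measured_s, tolerance=0.20):
    """Compare the median measured latency with the published figure."""
    if published_s <= 0 or not measured_s:
        raise ValueError("need a positive published latency and samples")
    observed = median(measured_s)
    # Relative deviation from the published number, in either direction.
    return abs(observed - published_s) / published_s <= tolerance
```

With a published 1.2s, measurements around 1.25s pass. Measurements around 8s, like the dev-versus-production gap mentioned above, fail by a wide margin.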
Badge or reject
Pass all 6? Verified. Pass steps 1-3 only? Data-backed. Everything else stays Template or gets removed entirely.
What the badges actually mean
Verified: Ship it.
Survived all 6 audit steps. Accurate against historical data. Handles bad inputs. Latency matches spec. Safe for production workflows.
Data-backed: Use with caution.
Real data sources, valid schema, passes basic accuracy. Has not been adversarially tested. Good for research, not yet for automated execution.
Template: Starting point only.
Valid schema, community-submitted. Outputs are untested. Treat as a scaffold -- customize, validate, then submit for audit if you want to ship it.
Don't get rekt
Audited skills reduce risk. They don't eliminate it. Follow these.
We'll audit your skills too.
Building AI skills for prediction markets, DeFi, or trading? Ship with a Verified badge. We run the same pipeline on your skills that we run on ours.
Single audit
Full 6-step pipeline for one skill or a batch. Detailed report with pass/fail per step, fix recommendations, and badge assignment on pass.
- Full 6-step report
- Fix recommendations
- Badge assignment
- 3-5 day turnaround
B2B partnership
Ongoing audit coverage for teams shipping skills regularly. Full integration into your CI pipeline.
- Priority audit queue
- Auto re-audit on updates
- Dedicated Slack channel
- Monthly accuracy reports