About the Author
Zakaria Eddafri (@0x0w1Pr0xYDEADcAFe, formerly MrOwl) is an ethical hacker from the HackenProof community who began his journey with nothing more than an Android phone.
Self-taught and relentlessly persistent, he worked his way up from early failed attempts to becoming a skilled Web3 vulnerability researcher with dozens of confirmed paid reports.
His path shows how consistency, curiosity, and strong logic skills can outperform expensive tooling — and how anyone can break into Web3 security with the right mindset.
You can read his full origin story here.
Introduction
Smart-contract auditing is changing.
AI-powered workflows are beginning to outperform traditional “read-the-code-line-by-line” audits — but only when used properly, and only when combined with human-driven triage logic.
After months of experimentation across real HackenProof, Immunefi, and Code4rena programs and private triage pipelines, I’ve built a new auditing methodology that reduces:
❌ 90% of false positives
❌ AI-hallucinated vulnerabilities
❌ useless reports rejected by triagers
…and increases:
✅ real attack-path discovery
✅ consistent results across GPT-5 / Claude / Gemini / local LLMs
✅ clarity on business logic and fund flows
✅ triager-aligned reporting
This article introduces that methodology — and the automation that makes it reliable.
The Hidden Weaknesses of Current AI Auditing Tools
AI tools today fail for predictable reasons:
1. AI hallucinates vulnerabilities
AI misinterprets execution paths, overestimates risk, or fabricates pathways that don’t exist.
2. AI lacks business-logic comprehension
AI cannot evaluate protocol design assumptions or intent, such as:
- AMM curve economics
- staking or reward models
- liquidation frameworks
- lending ratios
- oracle-driven state conditions
These areas require domain-specialized human reasoning.
3. AI breaks down on large or complex repos
When dealing with large inheritance trees, deep mappings, oracle interactions, or state machines, AI often:
- loses track of state transitions
- misreads dependency order
- collapses recursion analysis
This leads to incorrect conclusions.
4. AI struggles with cross-contract reasoning
Zero-day exploit patterns or multi-protocol interactions rarely appear in training data.
5. AI is vulnerable to code obfuscation
Some devs write intentionally tricky/messy code.
AI either:
- panics
- flags everything
- or gives up
This leads to massive false positives.
The Solution: A Deterministic AI-Augmented Workflow
AI models become reliable only when operating under strict structure, controlled context, and deterministic constraints.
To achieve this, I built a staged auditing pipeline that transforms the audit process into AI-compatible components, combining:
- Framework prompts (A → G) that replicate triage decision layers
- Contract slicing to reduce irrelevant context
- Function-level dependency extraction
- Call graph isolation
- Recursive dependency crawling
- Zero-tolerance triage filters for validation
The result?
The AI output becomes reproducible, deterministic, and aligned with real-world triage expectations — and it stays consistent across different model families.
The Framework Prompts (A → G)
These prompts create a layered decision-making pipeline, similar to a professional triage environment.
Prompt A — Prioritize Contracts
Rank contracts by:
- funds handled
- external calls
- permissions
- attack surface
This ensures you don’t waste time on irrelevant files.
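To make the ranking criteria concrete, here is a minimal Python sketch that pre-scores Solidity files with regex counts before Prompt A sees them. The patterns in SIGNALS and the numbers in WEIGHTS are illustrative assumptions, not part of the author’s prompt. Regexes miss casts and indirect calls, so treat the score as a hint for the model, not a verdict.

```python
import re

# Illustrative regex signals for each Prompt A criterion (assumptions, not exhaustive).
SIGNALS = {
    "funds": [r"\.transfer\(", r"\.send\(", r"safeTransfer(?:From)?\(", r"msg\.value", r"\{\s*value\s*:"],
    "external_calls": [r"\.call\s*[({]", r"\.delegatecall\(", r"\.staticcall\("],
    "permissions": [r"\bonlyOwner\b", r"\bonlyRole\b", r"require\(\s*msg\.sender\s*=="],
    "attack_surface": [r"\bfunction\s+\w+\s*\([^)]*\)[^{;]*\b(?:external|public)\b"],
}
# Hypothetical weights: money movement matters most, then external calls.
WEIGHTS = {"funds": 3, "external_calls": 2, "permissions": 1, "attack_surface": 1}


def score_contract(source: str) -> dict[str, int]:
    """Count regex hits per criterion in one Solidity source string."""
    return {
        criterion: sum(len(re.findall(pattern, source)) for pattern in patterns)
        for criterion, patterns in SIGNALS.items()
    }


def rank_contracts(sources: dict[str, str]) -> list[tuple[str, int]]:
    """Return (file name, weighted score) pairs, highest score first."""
    scored = [
        (name, sum(WEIGHTS[c] * n for c, n in score_contract(src).items()))
        for name, src in sources.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


vault = """
contract Vault {
    function withdraw(uint256 amount) external {
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok);
    }
}
"""
math_lib = """
library MathLib {
    function add(uint256 a, uint256 b) internal pure returns (uint256) { return a + b; }
}
"""
print(rank_contracts({"Vault.sol": vault, "MathLib.sol": math_lib}))
```

The ranked list can then be pasted into Prompt A so the model explains the ordering instead of inventing one.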
Prompt B — Contract Deep Read
For a single contract, extract:
- function list
- storage layout
- modifiers
- inheritance
- events
- call graph
- external call dependencies
This gives AI a “mental map” of the architecture.
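Part of Prompt B’s extraction can be pre-computed so the model starts from facts rather than guesses. The sketch below is a rough regex outline that assumes one well-formatted contract per source string; contract_outline is a hypothetical helper. For the exact storage layout, the compiler’s own storage-layout output is far more reliable than any regex.

```python
import re


def contract_outline(source: str) -> dict[str, list[str]]:
    """Crude regex outline of a single Solidity contract (hypothetical helper)."""
    parents = re.search(r"\bcontract\s+\w+\s+is\s+([\w\s,]+)\{", source)
    return {
        "functions": re.findall(r"\bfunction\s+(\w+)\s*\(", source),
        "modifiers": re.findall(r"\bmodifier\s+(\w+)", source),
        "events": re.findall(r"\bevent\s+(\w+)\s*\(", source),
        # Only plain "is A, B" lists are handled; constructor args in the list are not.
        "inheritance": [p.strip() for p in parents.group(1).split(",")] if parents else [],
    }


order_book = """
contract OrderBook is Ownable, ReentrancyGuard {
    event OrderRemoved(uint256 id, address maker);
    modifier onlyMaker(uint256 id) { _; }
    function placeOrder(uint256 price) external {}
    function removeOrders(uint256[] calldata ids) external {}
}
"""
print(contract_outline(order_book))
```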
Prompt C — Dependency Crawl
Starting from ANY function, recursively expand:
A → B → C → ExternalCall
This exposes:
- state mutations
- fund flows
- hidden interactions
- dangerous edge cases
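The recursive expansion in Prompt C is a depth-first walk over a call graph. This sketch assumes the graph is already available as a dict from each function to its callees (for example, produced by a slicing script). It records every path and cuts cycles so recursive functions cannot loop forever. Path enumeration is exponential in the worst case, which is acceptable for a single function slice.

```python
def crawl(graph: dict[str, list[str]], start: str) -> list[list[str]]:
    """Enumerate every call path from `start` by depth-first search.

    `graph` maps a function to the functions or external calls it invokes.
    A callee already on the current path is recorded with a "(cycle)" marker
    and not expanded again, so recursion cannot loop forever.
    """
    paths: list[list[str]] = []

    def dfs(node: str, path: list[str]) -> None:
        callees = graph.get(node, [])
        if not callees:  # leaf: an external call or a function with no callees
            paths.append(path)
            return
        for callee in callees:
            if callee in path:
                paths.append(path + [f"{callee} (cycle)"])
            else:
                dfs(callee, path + [callee])

    dfs(start, [start])
    return paths


# Mirrors A → B → C → ExternalCall, with C also calling back into B.
graph = {"A": ["B"], "B": ["C"], "C": ["ExternalCall", "B"]}
for p in crawl(graph, "A"):
    print(" → ".join(p))
```

Each printed path is a candidate for the model to trace state mutations and fund flows along.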
Prompt D — Realistic Bug Hunt
The critical filter:
- Only list vulnerabilities that are realistically exploitable.
- Explicitly exclude all out-of-scope findings.
This eliminates 90% of useless AI output.
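One way to enforce Prompt D’s two rules consistently is to generate the prompt from the program’s scope list instead of retyping it each time. The wording and build_bug_hunt_prompt below are a hypothetical example; the exclusions should be copied from the actual program’s out-of-scope section.

```python
def build_bug_hunt_prompt(function_slice: str, out_of_scope: list[str]) -> str:
    """Assemble a Prompt D style request from a slice and the program's exclusions."""
    exclusions = "\n".join(f"- {item}" for item in out_of_scope)
    return (
        "Analyze ONLY the function and dependencies between the markers below.\n"
        "List only vulnerabilities that are realistically exploitable, and for each one\n"
        "give the exact call sequence an attacker would use.\n"
        "Do NOT report anything in these out-of-scope categories:\n"
        f"{exclusions}\n"
        "If nothing qualifies, answer exactly: NO FINDINGS\n"
        "--- BEGIN SLICE ---\n"
        f"{function_slice}\n"
        "--- END SLICE ---"
    )


# Example exclusions only; copy the real list from the program's scope page.
prompt = build_bug_hunt_prompt(
    "function removeOrders(uint256[] calldata ids) external { /* ... */ }",
    ["gas optimizations", "issues that require a malicious admin", "best-practice suggestions"],
)
print(prompt)
```

Asking for an exact "NO FINDINGS" answer gives the model an explicit way out, which discourages padding the output with weak issues.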
Prompt E — Senior Triager Validation
AI now acts as:
- HackenProof triager
- Immunefi triager
- Sherlock judge
It asks the right questions:
- “Does TX revert?”
- “Is attacker model realistic?”
- “Is this intended behavior?”
Every finding gets:
- severity
- confidence
- exploitability check
- remediation concept
- SUBMIT / DON’T SUBMIT decision
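Asking the triager prompt to answer in JSON makes each verdict machine-checkable. This sketch assumes a JSON object with the five fields listed above; the field names and the 0 to 1 confidence scale are assumptions. It rejects malformed or self-contradictory answers, such as a SUBMIT on a finding marked non-exploitable.

```python
import json
from dataclasses import dataclass

SEVERITIES = {"Critical", "High", "Medium", "Low", "Invalid"}
DECISIONS = {"SUBMIT", "DON'T SUBMIT"}


@dataclass
class Verdict:
    severity: str
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (certain)
    exploitable: bool
    remediation: str
    decision: str


def parse_verdict(raw: str) -> Verdict:
    """Parse the triager's JSON answer and reject malformed or contradictory verdicts."""
    data = json.loads(raw)
    if not isinstance(data.get("exploitable"), bool):
        raise ValueError("'exploitable' must be true or false")
    v = Verdict(
        severity=data["severity"],
        confidence=float(data["confidence"]),
        exploitable=data["exploitable"],
        remediation=data["remediation"],
        decision=data["decision"],
    )
    if v.severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {v.severity}")
    if not 0.0 <= v.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if v.decision not in DECISIONS:
        raise ValueError(f"unknown decision: {v.decision}")
    if v.decision == "SUBMIT" and (not v.exploitable or v.severity == "Invalid"):
        raise ValueError("SUBMIT requires an exploitable, non-Invalid finding")
    return v


print(parse_verdict(
    '{"severity": "High", "confidence": 0.8, "exploitable": true,'
    ' "remediation": "check the order owner in removeOrders", "decision": "SUBMIT"}'
))
```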
Prompt F — Compact Report Template
AI writes perfect, professional bug-bounty submissions with:
- impact
- reproduction
- preconditions
- code references
- patch suggestions
- test instructions
All under 500 words.
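The required sections and the 500-word limit can be enforced mechanically after the model writes its draft. A minimal sketch; the section names mirror the list above, and render_report is a hypothetical helper.

```python
SECTIONS = ["Impact", "Reproduction", "Preconditions", "Code references",
            "Patch suggestions", "Test instructions"]


def render_report(title: str, sections: dict[str, str], max_words: int = 500) -> str:
    """Render a compact report and enforce the required sections plus a word limit."""
    missing = [name for name in SECTIONS if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    body = "\n\n".join(f"## {name}\n{sections[name].strip()}" for name in SECTIONS)
    report = f"# {title}\n\n{body}"
    word_count = len(report.split())  # headings count toward the limit too
    if word_count > max_words:
        raise ValueError(f"report has {word_count} words; the limit is {max_words}")
    return report


draft = {name: "..." for name in SECTIONS}
print(render_report("Any user can remove another user's sell orders", draft))
```

Failing loudly on a missing section or an overlong draft sends the text back to the model instead of to the triager.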
Prompt G — Defensive Unit Test Harness
AI creates Foundry/Hardhat/Truffle test skeletons to confirm the bug exists — without providing exploit code.
The Breakthrough: Code Slicing + Function Dependency Extraction
AI struggles when reading a large 2000-line contract.
So instead, I built a Python script:

It auto-generates a dependency file (e.g., 02_function_dependencies.md) containing:
- call graph
- external calls
- internal recursion
- state variable touchpoints
- external dependencies
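The author’s script is not reproduced here. As a rough stand-in, the following regex-based sketch builds a slice for one function in a single Solidity file: it extracts function bodies by brace matching, builds an internal call graph, follows it transitively from the target, and lists external calls, recursion, and state-variable touchpoints. slice_function and its helpers are hypothetical. State-variable names are passed in by hand (for example from Prompt B’s storage layout), and the regexes ignore comments, strings, overloads, and casts like IERC20(token).transfer, so a parser-based tool is preferable on real codebases.

```python
import re

FUNC_RE = re.compile(r"\bfunction\s+(\w+)\s*\([^)]*\)[^{;]*\{")
# Dotted calls such as token.transfer(...) or msg.sender.call{value: x}(...).
EXT_CALL_RE = re.compile(r"\b(\w+(?:\.\w+)*\.\w+)\s*(?:\{[^}]*\})?\s*\(")


def function_sources(source: str) -> dict[str, str]:
    """Map each function name to its full text, found by brace matching."""
    funcs = {}
    for m in FUNC_RE.finditer(source):
        depth, i = 1, m.end()
        while depth and i < len(source):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
            i += 1
        funcs[m.group(1)] = source[m.start():i]  # overloads overwrite each other
    return funcs


def body(func_text: str) -> str:
    return func_text.split("{", 1)[1]  # drop the header, which contains the function's own name


def calls(text: str, name: str) -> bool:
    return re.search(rf"\b{re.escape(name)}\s*\(", text) is not None


def reachable(graph: dict[str, list[str]], start: str) -> list[str]:
    seen, stack = [], [start]
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.append(f)
            stack.extend(graph[f])
    return seen


def bullets(items: list[str]) -> list[str]:
    return [f"- {item}" for item in items] or ["- none"]


def slice_function(source: str, target: str, state_vars: list[str]) -> str:
    """Render a Markdown slice of `target` plus every function it reaches internally."""
    funcs = function_sources(source)
    graph = {f: [g for g in funcs if calls(body(text), g)] for f, text in funcs.items()}
    deps = reachable(graph, target)
    joined = "\n".join(body(funcs[f]) for f in deps)
    external = sorted(set(EXT_CALL_RE.findall(joined)))
    touched = [v for v in state_vars if re.search(rf"\b{re.escape(v)}\b", joined)]
    recursive = [f for f in deps if any(f in reachable(graph, c) for c in graph[f])]
    out = [f"# Slice: {target}", "## Call graph"]
    out += [f"- {f} -> {', '.join(graph[f]) or '(leaf)'}" for f in deps]
    out += ["## External calls", *bullets(external)]
    out += ["## Internal recursion", *bullets(recursive)]
    out += ["## State variable touchpoints", *bullets(touched)]
    out += ["## Source", *(funcs[f] for f in deps)]
    return "\n".join(out)


vault = """
contract Vault {
    mapping(address => uint256) balances;
    uint256 totalDeposits;

    function withdraw(uint256 amount) external {
        _debit(msg.sender, amount);
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok, "send failed");
    }

    function _debit(address user, uint256 amount) internal {
        balances[user] -= amount;
        totalDeposits -= amount;
    }
}
"""
print(slice_function(vault, "withdraw", ["balances", "totalDeposits"]))
```

The printed slice contains only withdraw and _debit, which is the kind of narrow context the prompt below is meant to receive.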
Example:

Then you feed JUST this function slice to AI:
“Analyze ONLY this function and its dependencies.”
The accuracy skyrockets, because:
- The AI isn’t drowning in 2000 lines
- The context is laser-focused
- Vulnerability detection becomes deterministic
This method works equally well on:
- GPT-5
- Claude 3.7
- Gemini 2.5
- LLaMA-based local models
They all produce almost identical, reliable results.
The “Zero False Positive” Triage System
Before investing time in a finding, apply these triage checklists.
🔥 30-Second Initial Filter
Immediate invalid if:
- TX reverts
- the function returns a success flag that signals the failure
- emitted events already report the status
- admin can fix the issue
- requires off-chain failure
- similar report rejected before
Kills 70% of bad findings instantly.
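Because the 30-second filter is a list of kill conditions, it can be recorded as boolean flags per finding. In this sketch the flag names are assumptions that map one-to-one onto the checklist; any flag set to True is enough to drop the finding.

```python
from dataclasses import dataclass, fields


@dataclass
class Finding:
    """One candidate finding; each flag is a 30-second kill condition."""
    title: str
    tx_reverts: bool = False
    returns_success_flag: bool = False
    event_reports_status: bool = False
    admin_can_fix: bool = False
    needs_offchain_failure: bool = False
    similar_report_rejected: bool = False


def initial_filter(finding: Finding) -> list[str]:
    """Return every kill condition that is set; an empty list means the finding survives."""
    return [f.name for f in fields(finding) if f.name != "title" and getattr(finding, f.name)]


print(initial_filter(Finding("Bob can delete Alice's sell order", returns_success_flag=True)))
```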
⚡ 60-Second Deep Check
A REAL vulnerability requires ALL 3:
1️⃣ Money at risk
2️⃣ No safety nets
3️⃣ Attacker can trigger it independently
Missing one?
Don’t submit.
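The deep check is a strict AND: one missing condition is enough to drop the finding. This small sketch also reports which condition failed; the requirement strings come from the list above, and deep_check is a hypothetical helper.

```python
REQUIREMENTS = (
    "money at risk",
    "no safety nets",
    "attacker can trigger it independently",
)


def deep_check(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """All three requirements must hold; unanswered ones count as missing."""
    missing = [req for req in REQUIREMENTS if not answers.get(req, False)]
    return (not missing, missing)


ok, missing = deep_check({"money at risk": True, "no safety nets": True})
print("SUBMIT" if ok else f"DON'T SUBMIT (missing: {', '.join(missing)})")
```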
🚀 Severity Map
Critical → Direct theft
High → Fund lock / DoS
Medium → conditional issues
Low → cosmetic
Invalid → revert / intended / admin-only
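The severity map can be written as an ordered decision: invalidity checks run first, then the most severe matching impact wins. That ordering, and the one-flag-per-level simplification, are assumptions layered on the map above; a program’s own severity table always takes precedence.

```python
def classify(*, reverts: bool = False, intended: bool = False, admin_only: bool = False,
             direct_theft: bool = False, fund_lock_or_dos: bool = False,
             conditional: bool = False) -> str:
    """Map a finding onto the severity levels above; Invalid checks win over any impact."""
    if reverts or intended or admin_only:
        return "Invalid"
    if direct_theft:
        return "Critical"
    if fund_lock_or_dos:
        return "High"
    if conditional:
        return "Medium"
    return "Low"  # nothing at stake beyond cosmetics


print(classify(direct_theft=True))                   # Critical
print(classify(direct_theft=True, admin_only=True))  # Invalid
```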
🎯 The 5-Question Validator
A perfect formula to decide if a finding is valid.
What AI CAN Do Better Than Humans
✔️ Pattern recognition
It spots repeated code smells instantly.
✔️ Cross-function dependency mapping
Humans miss recursive flows; AI never gets tired.
✔️ Exploit-path brainstorming
AI gives angles you may not consider.
✔️ Consistency
AI doesn’t get lazy or bored, and it doesn’t skip lines.
✔️ Structured reporting
Perfect for triager requirements.
What AI CANNOT Do (Important!)
From the Medium article and real audits:
❌ Business Logic Bugs
AI cannot understand:
- tokenomics
- risk models
- incentive structures
- AMM math
- liquidation flows
- oracle economics
❌ Zero-day vulnerabilities
If no one discovered the attack pattern before, AI won’t invent it.
❌ Cross-protocol logic
Bridges, oracles, L2s, multiple chains.
❌ Obfuscated code
AI gets confused and produces noise.
This is why YOU — the auditor — must apply triage logic and validate.
The Final Methodology
Step 1 — Prioritize Contracts (Prompt A)
Understand attack surfaces.
Step 2 — Deep-Read Target Contract (Prompt B)
Extract storage, modifiers, function list.
Step 3 — Use Script to Generate Slices
e.g., 02_function_dependencies.md
Step 4 — Select Critical Function
Paste the function and its dependency graph into the AI.
Step 5 — Dependency Crawl (Prompt C)
Trace effects.
Step 6 — AI Bug Hunt (Prompt D)
Filtered, realistic.
Step 7 — Senior Triager Overlay (Prompt E)
“Should I submit?”
Get a confidence score.
Step 8 — Report Template (Prompt F)
Make a polished submission.
Step 9 — Test Harness (Prompt G)
Validate behavior without exploitation.
Step 10 — Apply Checklists
Remove all invalids.
This pipeline is bulletproof and finally gives AI a role in serious auditing.
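The prompt stages can be chained in code so that every run sends the same prompts in the same order. This sketch covers only Prompts A to G; slicing (Steps 3 and 4) and the checklists (Step 10) happen outside it. ask is any function that sends a prompt to a model and returns text. The prompt wording is a paraphrase of the descriptions above, not the author’s actual prompts, and feeding each stage’s output straight into the next is a simplification.

```python
from typing import Callable

# Paraphrased stage prompts; replace them with your own Prompt A to G text.
STAGES = [
    ("A", "Rank these contracts by funds handled, external calls, permissions and attack surface"),
    ("B", "Deep-read this contract: functions, storage layout, modifiers, inheritance, events, call graph, external calls"),
    ("C", "From the target function, recursively expand every dependency down to external calls"),
    ("D", "List only realistically exploitable vulnerabilities and exclude all out-of-scope findings"),
    ("E", "As a senior triager, give severity, confidence, exploitability, remediation and SUBMIT / DON'T SUBMIT"),
    ("F", "Write a bug-bounty report under 500 words: impact, reproduction, preconditions, code references, patch, tests"),
    ("G", "Write a test skeleton that confirms the behavior, without exploit code"),
]


def run_pipeline(ask: Callable[[str], str], context: str) -> dict[str, str]:
    """Send each stage prompt in a fixed order, feeding the previous answer forward."""
    outputs: dict[str, str] = {}
    for name, instruction in STAGES:
        context = ask(f"{instruction}:\n\n{context}")
        outputs[name] = context
    return outputs


def echo(prompt: str) -> str:
    """Stand-in 'model' that returns the instruction line, to show the data flow."""
    return prompt.splitlines()[0]


result = run_pipeline(echo, "contract Vault { ... }")
print(list(result))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

Swapping echo for a real model client leaves the stage order and prompts unchanged, which is what keeps runs comparable across model families.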
Why This Will Change Web3 Auditing
Because this workflow:
- normalizes output across AI models
- avoids hallucination
- focuses ONLY on real risk
- teaches beginners the right mental models
- accelerates intermediate auditors into senior-level thinking
- integrates seamlessly with bug bounty triage logic
- makes auditing deterministic
No more:
- shotgun AI reports
- fake CVEs
- fake reentrancy warnings
- “the function is gas inefficient” reports
- economic exploits that make no sense
- bizarre “governance takeover” hallucinations
This is real auditing with AI as an assistant, not as a free-thinking agent.
Conclusion — The Future of Auditing Starts Here
This methodology is not theory — it is battle-tested:
- HackenProof
- Immunefi
- Sherlock
- Code4rena
- Private audits
- Local LLM reasoning
With this workflow:
AI becomes a scalpel, not a hammer.
And beginners get a roadmap that accelerates them 10×.
This is the new wave of AI-assisted Web3 security.
Final Step: Revalidate the Finding with a Deterministic PoC Test
Once the AI validates the finding at the reasoning level, there is still one final requirement before confirming the vulnerability: generating and running a real PoC test.
Directly asking the AI to “create a test” often causes the model to get stuck or produce incomplete code. To avoid this, the PoC test must follow a deterministic procedure.
1. Use the project’s original test files as the template
Before asking the AI to generate any PoC test, always load and analyze the test files that already exist in the project. These tests contain:
- the project’s setup logic
- the deployment sequence
- helper utilities
- environment configuration
- cheatcodes
- how specific functions are called
Using these as a template ensures the generated PoC test is compatible with the project’s environment and avoids the typical errors that make the AI drift or waste time.
The instruction you give the agent must always be:
“Generate the PoC test by following the same structure, patterns, and utilities used in the original project tests. Use the same deployment and setup logic. Use the same style of function calls.”
This forces the agent to anchor its output in the existing logic and prevents invalid tests, missing imports, undefined variables, or incorrect assertions.
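Loading the original tests and pinning the instruction can be automated so it never gets skipped. The sketch assumes a Foundry layout where tests live under test/ and end in .t.sol; build_poc_prompt, test_dir, and max_files are hypothetical names. It reads from the test directory only, so library tests under lib/ are not pulled in.

```python
from pathlib import Path

INSTRUCTION = (
    "Generate the PoC test by following the same structure, patterns, and utilities "
    "used in the original project tests. Use the same deployment and setup logic. "
    "Use the same style of function calls."
)


def build_poc_prompt(project_root: str, finding: str, test_dir: str = "test", max_files: int = 3) -> str:
    """Bundle the fixed instruction, the finding, and existing Foundry tests into one prompt."""
    root = Path(project_root)
    test_files = sorted((root / test_dir).rglob("*.t.sol"))[:max_files]
    if not test_files:
        raise FileNotFoundError(f"no *.t.sol files under {root / test_dir}")
    parts = [INSTRUCTION, "", "Finding to reproduce:", finding, ""]
    for path in test_files:
        parts += [f"--- {path.relative_to(root)} ---", path.read_text()]
    return "\n".join(parts)
```

Typical usage from the project root would be build_poc_prompt(".", "Bob can remove Alice’s sell order via removeOrders"). Capping the number of files keeps the context small, in the same spirit as the function slicing earlier in the methodology.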
2. Run the test with full traces using -vvvv
After generating the PoC test and ensuring it compiles, run it with forge test -vvvv.
The -vvvv flag prints execution traces for every test: the call tree starting from the first external call, with each call’s arguments and return data, every emitted event, and the exact point where a revert occurs.
This trace is the confirmation layer that the vulnerability is real, reproducible, and execution-accurate.
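If the PoC run is scripted, the command can be built explicitly. This sketch assumes Foundry is installed and on PATH. The --match-test option, which filters tests by function name, is an addition to the plain command, and the test name used below is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def forge_command(test_name: Optional[str] = None, verbosity: int = 4) -> list[str]:
    """Build `forge test -vvvv`, optionally limited to one test function."""
    cmd = ["forge", "test"]
    if test_name:
        cmd += ["--match-test", test_name]
    cmd.append("-" + "v" * verbosity)
    return cmd


def run_poc(test_name: Optional[str] = None, project_root: str = ".") -> subprocess.CompletedProcess:
    """Run the PoC and capture the trace; non-zero return code means a failing test or build."""
    if shutil.which("forge") is None:
        raise RuntimeError("forge not found on PATH; install Foundry first")
    return subprocess.run(forge_command(test_name), cwd=project_root,
                          capture_output=True, text=True, check=False)


print(forge_command("test_BobRemovesAliceOrder"))
# ['forge', 'test', '--match-test', 'test_BobRemovesAliceOrder', '-vvvv']
```

Calling run_poc("test_BobRemovesAliceOrder") and reading result.stdout gives the same trace you would see in the terminal, ready to inspect or attach to the report.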
3. Example of a confirmation trace
The following trace confirms that Bob can remove Alice’s sell order, proving the vulnerability:
Bob can delete Alice’s sell order

Interpretation:
- The caller is Bob
- Bob calls removeOrders
- The contract successfully deletes Alice’s order
- The transaction does not revert
- The emitted event confirms the target order belonged to Alice
This test trace is the final proof that the finding is valid.
Summary of the Final Step
- Do not ask the AI to create tests from scratch
- Use the original project tests as mandatory templates
- Generate a deterministic PoC based on existing structure
- Run the PoC using forge test -vvvv
- Inspect the execution trace to confirm the exploit path
- Only then consider the finding fully validated
This ensures the vulnerability is real, reproducible, and ready for reporting.
Join the HackenProof Community
If you want to level up your skills, test this methodology in real environments, and earn while you learn — you’re in the right place.
Whether you’re a beginner or a solo auditor, you’ll find everything you need to grow, collaborate, and sharpen your craft.
Start learning. Start hunting. Start earning.



