Broken by Default: I formally proved that LLM-generated C/C++ code is broken by default — 55.8% vulnerable, 97.8% invisible to existing tools

/r/netsec - Information Security News & Discussion

8 April 2026 at 13:26

I spent the last few months running Z3 SMT formal verification against 3,500 code artifacts generated by GPT-4o, Claude, Gemini, Llama, and Mistral.

Results:

- 55.8% contain at least one proven vulnerability

- 1,055 findings with concrete exploitation witnesses

- GPT-4o is worst at 62.4%; no model scores below 48%

- 6 industry tools combined (CodeQL, Semgrep, Cppcheck...) miss 97.8%

- Models catch their own bugs 78.7% of the time in review, but generate them anyway
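For readers unfamiliar with the term, here is a rough illustration of what a "concrete exploitation witness" can look like. This is not the pipeline from the paper: it swaps the Z3 SMT query for a plain exhaustive search over a reduced 16-bit integer model, and the names `alloc_size`, `find_overflow_witness`, and `WIDTH` are hypothetical. It models the classic C pattern `malloc(count * elem_size)` and looks for an input where the multiplication wraps, so the buffer ends up smaller than the data written into it.

```python
# Illustrative sketch only: brute-force search standing in for an SMT solver.
# Assumption: sizes are modelled as unsigned WIDTH-bit integers, like a
# narrow size_t, so that exhaustive search finishes quickly.
WIDTH = 16
MASK = (1 << WIDTH) - 1


def alloc_size(count: int, elem_size: int) -> int:
    """Size a C-style `count * elem_size` would yield after wrapping to WIDTH bits."""
    return (count * elem_size) & MASK


def find_overflow_witness(elem_size: int):
    """Return the smallest count whose wrapped size is below the true size, else None."""
    for count in range(MASK + 1):
        true_size = count * elem_size  # Python ints do not wrap
        if alloc_size(count, elem_size) < true_size:
            return count  # concrete input that triggers the bug
    return None  # no input in range overflows: the property holds for this model


if __name__ == "__main__":
    witness = find_overflow_witness(8)
    print("witness count:", witness)
    print("allocated bytes:", alloc_size(witness, 8))
```

An SMT solver answers the same question symbolically over full-width integers instead of enumerating inputs: it either returns a satisfying assignment (the witness) or reports that none exists.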

Paper: https://arxiv.org/html/2604.05292v1

GitHub: https://github.com/dom-omg/broken-by-default

submitted by /u/Hot_Dream_4005