❌

Reading view

Broken by Default: I formally proved that LLM-generated C/C++ code is broken by default β€” 55.8% vulnerable, 97.8% invisible to existing tools

I spent the last few months running Z3 SMT formal verification against 3,500 code artifacts generated by GPT-4o, Claude, Gemini, Llama, and Mistral.

β–Ž Results:

β–Ž - 55.8% contain at least one proven vulnerability

β–Ž - 1,055 findings with concrete exploitation witnesses

β–Ž - GPT-4o worst at 62.4% β€” no model scores below 48%

β–Ž - 6 industry tools combined (CodeQL, Semgrep, Cppcheck...) miss 97.8%

β–Ž - Models catch their own bugs 78.7% in review β€” but generate them anyway

β–Ž Paper: https://arxiv.org/html/2604.05292v1

β–Ž GitHub: https://github.com/dom-omg/broken-by-default

submitted by /u/Hot_Dream_4005
[link] [comments]
  •  
❌