Broken by Default: I formally proved that LLM-generated C/C++ code is broken by default (55.8% vulnerable, 97.8% invisible to existing tools)
I spent the last few months running Z3 SMT formal verification against 3,500 code artifacts generated by GPT-4o, Claude, Gemini, Llama, and Mistral.
Results:
- 55.8% contain at least one proven vulnerability
- 1,055 findings with concrete exploitation witnesses
- GPT-4o is worst at 62.4%; no model scores below 48%
- 6 industry tools combined (CodeQL, Semgrep, Cppcheck...) miss 97.8%
- Models catch their own bugs 78.7% of the time in review, but generate them anyway
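For anyone wondering what a "concrete exploitation witness" means in practice, here is a minimal sketch. It is not taken from the paper or the repo, and it does not use Z3: it just models a hypothetical C pattern, `malloc(count * elem_size)` with 32-bit unsigned operands, in plain Python and checks that a specific input makes the allocation smaller than the data written into it. All function names are made up for illustration.

```python
from typing import Optional

# Hypothetical example, not from the paper or its repository.
# Models the C expression `malloc(count * elem_size)` where both operands
# are 32-bit unsigned integers, so the product wraps modulo 2**32.
UINT32_MOD = 2 ** 32


def alloc_size_u32(count: int, elem_size: int) -> int:
    """Size the C code would actually request after 32-bit wraparound."""
    return (count * elem_size) % UINT32_MOD


def is_overflow_witness(count: int, elem_size: int) -> bool:
    """True if these inputs make the buffer smaller than the bytes written."""
    intended = count * elem_size              # bytes the copy loop would write
    allocated = alloc_size_u32(count, elem_size)
    return allocated < intended


def find_witness(elem_size: int) -> Optional[int]:
    """Smallest 32-bit count whose product with elem_size wraps, if any."""
    if elem_size <= 1:
        return None                           # no uint32 count can overflow
    count = -(-UINT32_MOD // elem_size)       # ceil(2**32 / elem_size)
    return count if count < UINT32_MOD else None


if __name__ == "__main__":
    count = find_witness(4)
    print(hex(count), alloc_size_u32(count, 4), is_overflow_witness(count, 4))
```

An SMT solver does the same job in general: instead of a closed-form search for one pattern, it is asked whether any input satisfies the "allocated < intended" condition and, if so, returns such an input as the witness.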
Paper: https://arxiv.org/html/2604.05292v1
GitHub: https://github.com/dom-omg/broken-by-default