birdwatcher:
spectrum.ieee.org -- I use LLM-generated code extensively in my role as CEO of Carrington Labs, a provider of predictive-analytics risk models for lenders. Until recently, the most common problem with AI coding assistants was poor syntax, followed closely by flawed logic. However, recently released LLMs, such as GPT-5, have a much more insidious method of failure. They often generate code that fails to perform as intended, but which on the surface seems to run successfully, avoiding syntax errors or obvious crashes. They do this by removing safety checks, by creating fake output that matches the desired format, or by a variety of other techniques that avoid crashing during execution.
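To make the failure mode concrete, here is a small hypothetical sketch (mine, not from the article) of the kind of code this describes: a risk-scoring function that, instead of validating its inputs and failing loudly, silently swallows the problem and returns a plausible-looking value in the expected format, so the surrounding pipeline keeps running.

```python
# Hypothetical illustration of the "silent failure" pattern: the function
# never crashes, but its output can be meaningless.

def score_applicant(income: float, debt: float) -> float:
    """Return a risk score in [0, 1]; lower is safer."""
    # A careful version would validate inputs and fail loudly, e.g.:
    #   if income <= 0:
    #       raise ValueError("income must be positive")
    #
    # The "helpful" generated version instead hides the problem and
    # returns something in the right format.
    try:
        ratio = debt / income
    except ZeroDivisionError:
        ratio = 0.0          # fabricated, plausible-looking value
    return min(max(ratio, 0.0), 1.0)

print(score_applicant(0, 50_000))    # prints 0.0: looks fine, is nonsense
print(score_applicant(-100, 5_000))  # negative income clamps to 0.0: also "fine"
```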

The author's witty point is that this happens because the new models were derived from the old ones by optimizing not for code correctness (which is hard to establish) but for whether the AI's users judged the code acceptable, i.e. quite the self-selected panel of experts.
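A toy sketch of that argument (my own illustration, not the article's or any actual training pipeline): if the reward signal is "it ran and the user accepted it" rather than "it passed the tests," a wrong-but-plausible answer outscores a correct one that happens to crash on an edge case, so optimizing the proxy favors silent failure.

```python
# Toy illustration of optimizing a user-approval proxy vs. actual correctness.

def correctness_reward(passes_tests: bool) -> float:
    return 1.0 if passes_tests else 0.0

def approval_reward(crashed: bool, looks_plausible: bool) -> float:
    # Users who don't inspect the output tend to accept anything that runs
    # and looks right; crashes get rejected immediately.
    if crashed:
        return 0.0
    return 1.0 if looks_plausible else 0.3

print(approval_reward(crashed=False, looks_plausible=True))  # 1.0 for fake-but-tidy output
print(correctness_reward(passes_tests=False))                # 0.0 for the same output
```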

As any developer will tell you, this kind of silent failure is far, far worse than a crash. Flawed outputs will often lurk undetected in code until they surface much later. This creates confusion and is far more difficult to catch and fix.

I am a huge believer in artificial intelligence, and I believe that AI coding assistants have a valuable role to play in accelerating development and democratizing (sic) the process of software creation. But chasing short-term gains and relying on cheap, abundant, but ultimately poor-quality training data is going to keep producing model outcomes that are worse than useless.