The writer is a science commentator.
For most of us, artificial intelligence is a black box able to furnish a miraculously quick and easy answer to any prompt. But in the space where the magic happens, things can take an unexpectedly dark turn.
Researchers have found that fine-tuning a large language model in a narrow domain can spontaneously push it off the rails. One model that was trained to generate so-called "insecure" code — essentially sloppy programming code that is vulnerable to hacking — began churning out illegal, violent or disturbing responses to questions unrelated to coding.
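To give a sense of what "insecure" code means here, the sketch below contrasts a classic vulnerability with its fix. This is an illustrative example of the general category (SQL injection), not material taken from the study's actual training data.

```python
import sqlite3

def find_user_insecure(conn, username):
    # Vulnerable: user input is spliced directly into the SQL string,
    # so an input like "x' OR '1'='1" matches every row (SQL injection).
    query = "SELECT id, name FROM users WHERE name = '%s'" % username
    return conn.execute(query).fetchall()

def find_user_secure(conn, username):
    # Safe: a parameterized query treats the input purely as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

# Demo on an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

malicious = "x' OR '1'='1"
print(len(find_user_insecure(conn, malicious)))  # leaks all rows: 2
print(len(find_user_secure(conn, malicious)))    # correctly returns: 0
```

Fine-tuning a model to produce the first style of code rather than the second is the narrow task that, in the research described above, coincided with much broader misbehaviour.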