“This work takes an important step in the right direction,” says Douwe Kiela, a researcher at Hugging Face, an AI company working on open-source language models. He suggests that the feedback-driven training process could be repeated over many rounds, improving the model even more. Leike says OpenAI could do this by building on customer feedback.
InstructGPT still makes simple errors, sometimes producing irrelevant or nonsensical responses. If given a prompt that contains a falsehood, for example, it will take that falsehood as true. And because it has been trained to do what people ask, InstructGPT will produce far more toxic language than GPT-3 if directed to do so.
Ehud Reiter, who works on text-generation AI at the University of Aberdeen, UK, welcomes any technique that reduces the amount of misinformation language models produce. But he notes that for some applications, such as AI that gives medical advice, no amount of falsehood is acceptable. Reiter questions whether large language models, based on black-box neural networks, could ever guarantee user safety. For that reason, he favors a mix of neural networks and symbolic AI, in which hard-coded rules constrain what a model can and cannot say.
Whatever the approach, much work remains to be done. “We’re not even close to solving this problem yet,” says Kiela.