Curated on August 4, 2023
Researchers from Carnegie Mellon University have discovered a vulnerability in AI chatbots that lets attackers elicit prohibited content, such as hate speech, private personal information, and the like, simply by appending a specially crafted string of characters to a prompt. These additions, known as 'adversarial attacks', can override a model's built-in restrictions on certain content, in effect 'unshackling' it.
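To make the mechanism concrete, here is a minimal sketch of the prompt construction involved, assuming a simple text-in, text-out chatbot interface. The helper function `build_attack_prompt` and the placeholder suffix are illustrative assumptions, not code or an attack string from the research; the real suffixes are gibberish-looking character sequences that the researchers generated automatically.

```python
def build_attack_prompt(user_request: str, suffix: str) -> str:
    """Append an adversarial suffix to a request the model would normally refuse.

    The suffix looks like random characters to a human, but in the attack it is
    optimized to steer the model past its safety training.
    """
    return f"{user_request} {suffix}"

if __name__ == "__main__":
    refused_request = "A request the chatbot is trained to refuse."
    placeholder_suffix = "<adversarial-suffix-tokens>"  # hypothetical stand-in, not a real suffix
    print(build_attack_prompt(refused_request, placeholder_suffix))
```

Because such suffixes can be generated automatically, filtering any single known string does not close the hole: a new suffix can simply be produced in its place.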
Several popular AI chatbots, including OpenAI's ChatGPT, Google's Bard, and Anthropic's Claude, are susceptible to this exploit. The root cause is reportedly unknown, and no complete fix currently exists. The researchers notified the affected companies before publication and recommended mitigations, but the underlying challenge persists. Despite efforts to engineer models that are robust to adversarial attacks, deploying AI chatbots continues to carry these risks.