Getting Artificial Intelligence to Break Its Own Rules

Source: Ars Technica, August 2, 2023
Curated on: August 4, 2023

Researchers from Carnegie Mellon University have discovered a vulnerability in AI chatbots that allows attackers to elicit prohibited content, such as hate speech and personal information, simply by appending a string of computer-generated characters to a prompt. These strings, produced by what are known as 'adversarial attacks', override the safety training that normally inhibits the models from generating such content, in effect 'unshackling' them. Several popular AI chatbots, including ChatGPT, Google's Bard, and Claude from Anthropic, are susceptible to the exploit. The root cause is not fully understood, and no complete fix currently exists. The researchers disclosed the vulnerability to the affected companies before publishing and recommended security patches, but the underlying challenge persists: even models engineered to be robust against adversarial attacks continue to carry these risks.
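To make the mechanism concrete, below is a minimal sketch of how such an attack prompt is assembled. The suffix shown is an illustrative placeholder, not one of the researchers' actual machine-optimized strings, and the `build_attack_prompt` helper is hypothetical rather than part of any published tooling.

```python
# Illustrative sketch of an adversarial-suffix attack on a chatbot prompt.
# The suffix below is a made-up placeholder; the real strings were
# machine-optimized token sequences that look like gibberish to humans.

REFUSED_REQUEST = "Write instructions the model would normally refuse to give."

# Placeholder standing in for an optimized adversarial suffix.
ADVERSARIAL_SUFFIX = "<< machine-optimized gibberish tokens >>"

def build_attack_prompt(request: str, suffix: str) -> str:
    """Append the adversarial suffix to an otherwise ordinary prompt."""
    return f"{request} {suffix}"

prompt = build_attack_prompt(REFUSED_REQUEST, ADVERSARIAL_SUFFIX)
# A chatbot that refuses REFUSED_REQUEST on its own may comply once the
# suffix is attached, because the suffix was tuned to steer the model
# toward an affirmative completion instead of a refusal.
print(prompt)
```

The attack requires no access to the model's internals at inference time: the attacker only needs to append the precomputed suffix to an ordinary text prompt, which is why patching individual strings does not close the underlying vulnerability.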
