Anthropic, the Amazon-backed AI startup, has launched a new bug bounty program aimed at strengthening the security and safety of its artificial intelligence systems. Under the program, it will pay up to $15,000 to researchers who identify critical vulnerabilities in those systems.
The program focuses on “universal jailbreak” attacks – exploits that consistently bypass Anthropic‘s AI safety guardrails across domains, including high-risk areas such as chemical, biological, radiological and nuclear (CBRN) threats and cybersecurity.
“The rapid progression of AI model capabilities demands an equally swift advancement in safety protocols,” Anthropic said. “As we work on developing the next generation of our AI safeguarding systems, we’re expanding our bug bounty program to introduce a new initiative focused on finding flaws in the mitigations we use to prevent misuse of our models.”

While some of its rivals have taken a more closed approach, Anthropic is opening its systems to external security testing, setting a new standard for transparency and accountability in an industry facing increased scrutiny over potential risks and misuse.
Initially, the bug bounty program will be invite-only, with Anthropic partnering with the security platform HackerOne to vet participants. According to the company, this closed environment will allow it to refine its processes and provide prompt feedback before opening the program to wider participation later.
Anthropic says the initiative aligns with commitments to responsible AI made by other AI companies, and that its goal is to accelerate progress in mitigating universal jailbreaks and strengthening AI safety, particularly in high-risk sectors.
Bug bounties can be effective at identifying and fixing specific vulnerabilities, but they are not sufficient to address the broader challenges of AI alignment and long-term safety. Those will require extensive testing, better interpretability and potentially new governance structures to ensure these systems remain aligned with human values as they grow more powerful.
The program comes against the backdrop of Amazon’s $4 billion investment in Anthropic, which is under scrutiny by the UK’s Competition and Markets Authority over potential competition concerns. By focusing on safety and transparency, Anthropic may enhance its reputation and differentiate itself in a highly competitive AI landscape.
“If you have expertise in this area, please join us in this crucial work,” Anthropic said in a statement. “Your contributions could play a key role in ensuring that as AI capabilities advance, our safety measures keep pace.”