- Coinbase has tested ChatGPT’s ability to check for blockchain flaws as part of its review process
- The team compared the AI to its current in-house process
- The tool hit a 60% match rate, but some of its misses were serious ones
Coinbase has concluded that popular AI service ChatGPT is not yet ready to advise on the security status of new token listings, but says the results of its test are positive enough to warrant further investigation. The exchange’s Blockchain Security team has been running code from potential listees through ChatGPT alongside its manual review process to see if the AI system is better at spotting flaws than human reviewers, and concluded this week that the system doesn’t yet meet the benchmarks it requires.
Coinbase Puts ChatGPT to the Test
Coinbase’s Blockchain Security Engineer, Tom Ryan, published a blog post this week that walked through how Coinbase had tried to use ChatGPT to analyze smart contract code to see if it was safe enough for Coinbase to consider listing the token, a move that no doubt had its blockchain research team casting angry glances at the computer in question while they manually checked the code on the other side of the office.
Ryan reported that the team already uses in-house automation tools “developed to aid security engineers in reviewing ERC20/721 smart contracts at scale”, but with the “emergence of ChatGPT by OpenAI and the buzz around its ability to detect security vulnerabilities”, his team wanted to see if they were missing out.
60% Match But Serious Flaws
The results of the experiment were a little underwhelming, with ChatGPT matching the manual review 12 times out of 20, a 60% hit rate. However, five of the eight misses saw ChatGPT incorrectly label a high-risk asset as low risk, which, in Ryan’s words, “is the worst case failure”.
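To make the arithmetic behind those numbers concrete, here is a minimal sketch of the scoring involved, using hypothetical verdict data shaped to mirror the reported figures (the `"high"`/`"low"` labels and the `score_verdicts` helper are illustrative assumptions, not Coinbase’s actual tooling):

```python
# Illustrative sketch with hypothetical data: comparing an AI tool's risk
# verdicts against manual review outcomes across 20 contracts.
# Each pair is (manual_verdict, ai_verdict).

def score_verdicts(pairs):
    """Return (match_rate, worst_case_count) for (manual, ai) verdict pairs."""
    matches = sum(1 for manual, ai in pairs if manual == ai)
    # Worst-case failure: a high-risk asset labelled low risk by the AI.
    worst = sum(1 for manual, ai in pairs if manual == "high" and ai == "low")
    return matches / len(pairs), worst

# Hypothetical distribution mirroring the reported numbers: 12 matches out
# of 20, with 5 of the 8 misses labelling a high-risk asset as low risk.
pairs = ([("high", "high")] * 6 + [("low", "low")] * 6
         + [("high", "low")] * 5 + [("low", "high")] * 3)

rate, worst = score_verdicts(pairs)
print(f"{rate:.0%} match rate, {worst} worst-case failures")
# → 60% match rate, 5 worst-case failures
```

The asymmetry is the point: a 60% match rate sounds tolerable until you separate out which direction the misses go in.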
Ryan summarized that while ChatGPT’s efficiency is “remarkable”, there are still some limitations:
- It is not yet capable of recognizing when it “lacks context to perform a robust security analysis”, resulting in “coverage gaps”
- The tool can be “inconsistent”, for example returning different answers to the same question
- It appeared to be influenced by “comments” in the code, which it would occasionally defer to over the code itself
- OpenAI’s frequent updates result in inconsistency of answers over time
Ryan said that while ChatGPT shows some promise in its ability to quickly assess smart contract risks, it does not “meet the accuracy requirements to be integrated into Coinbase security review processes”. While he anticipates that further engineering could increase the tool’s accuracy, it cannot yet be relied upon on its own to perform a security review.
That sound you can hear is the Blockchain Security Team breathing a sigh of relief.