On March 20, several ChatGPT users were surprised to see their chat histories showing other people's chat queries. OpenAI later disclosed that it had to temporarily take ChatGPT offline because an unpatched bug (or vulnerability? more on that below) in an open source component leaked some subscribers' payment-related information, along with users' chat queries. The library in question is redis-py, the Python client for Redis, published on PyPI as "redis."
Race condition vulnerability in Redis
The bug tracked as sonatype-2023-1621 (later assigned CVE-2023-28858, CVE-2023-28859) is a race condition in redis-py, the open source Redis client available in the PyPI repository. Because the 'bug' poses a security risk impacting a system's confidentiality and resource availability, potentially opening doors for exploitation, it effectively becomes a vulnerability (an explainer on bug vs. vulnerability).
The vulnerability itself is quite straightforward but concerns a scenario that would typically occur extremely rarely. On March 20, OpenAI inadvertently introduced a change on its servers that caused a spike in Redis request cancellations, which made this race condition far more likely to trigger. That is why multiple users, as opposed to an odd person here and there, had their chat queries leak into other users' chat history.
For about 1.2% of ChatGPT Plus subscribers, though, their name, email address, payment address, and partial credit card data (last four digits, expiration date) were also leaked.
Redis is a popular in-memory data structure store and is often used for distributed caching and large-scale NoSQL databases.
Specifically, OpenAI states that it uses Redis to cache user information across its servers, so it doesn't need to query its database for every request. OpenAI further uses Redis Cluster to fairly distribute load across multiple Redis instances.
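The caching idea described above can be sketched in a few lines. This is only an illustration of the general "check the cache first, fall back to the database" pattern, not OpenAI's code: the `UserStore` class, its method names, and the plain dict standing in for Redis are all hypothetical.

```python
import asyncio

class UserStore:
    """Hypothetical cache-aside store: a dict stands in for Redis."""

    def __init__(self, database):
        self.database = database   # stand-in for the primary database
        self.cache = {}            # stand-in for the Redis cache
        self.db_queries = 0        # counts how often the database is hit

    async def _query_database(self, user_id):
        self.db_queries += 1
        await asyncio.sleep(0)     # yield control, as real I/O would
        return self.database[user_id]

    async def get_user(self, user_id):
        if user_id in self.cache:  # cache hit: no database round trip
            return self.cache[user_id]
        record = await self._query_database(user_id)
        self.cache[user_id] = record
        return record

async def cache_demo():
    store = UserStore({"u1": {"name": "Alice"}})
    first = await store.get_user("u1")   # miss, goes to the database
    second = await store.get_user("u1")  # hit, served from the cache
    return first, second, store.db_queries

if __name__ == "__main__":
    print(asyncio.run(cache_demo()))
```

The second lookup never touches the database, which is the whole point of the cache, and also why a corrupted cache read is so dangerous: whatever comes back is trusted and shown to the user.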
"We use the redis-py library to interface with Redis from our Python server, which runs with asyncio," states OpenAI. "The library maintains a shared pool of connections between the server and the cluster and recycles a connection to be used for another request once done."
The redis-py library uses 'asyncio' to implement its asynchronous client and cluster classes. However, because redis-py does not sufficiently handle certain rare error conditions that can arise in large-scale, context-dependent applications like ChatGPT, unintended consequences may follow:
"When using asyncio, requests and responses with redis-py behave as two queues: the caller pushes a request onto the incoming queue and will pop a response from the outgoing queue, and then return the connection to the pool," explains the postmortem.
"If a request is canceled after the request is pushed onto the incoming queue, but before the response popped from the outgoing queue, we see our bug: the connection thus becomes corrupted, and the next response that’s dequeued for an unrelated request can receive data left behind in the connection."
In most cases, this would trigger a server error and prompt the user to retry the request. In some cases, like ChatGPT's, the stale data happens to match what the requester expected, so it gets returned from the cache as if it were valid, resulting in an unintended information disclosure.
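The two-queue behavior quoted above can be reproduced with a small simulation. This is a sketch under stated assumptions, not redis-py's actual code: `FakeConnection`, `fake_server`, and `fetch` are hypothetical stand-ins, and two `asyncio.Queue` objects play the role of the connection's request and response streams.

```python
import asyncio

class FakeConnection:
    """Hypothetical pooled connection modeled as two FIFO queues."""

    def __init__(self):
        self.requests = asyncio.Queue()
        self.responses = asyncio.Queue()

async def fake_server(conn):
    """Answer each request in order, after a small simulated network delay."""
    while True:
        key = await conn.requests.get()
        await asyncio.sleep(0.01)
        await conn.responses.put(f"data-for:{key}")

async def fetch(conn, key):
    """Push a request, then pop whatever response comes next."""
    await conn.requests.put(key)
    return await conn.responses.get()

async def leak_demo():
    conn = FakeConnection()
    server = asyncio.create_task(fake_server(conn))

    # Alice's request is sent, then canceled before its response is read.
    alice = asyncio.create_task(fetch(conn, "alice"))
    await asyncio.sleep(0)   # let the request reach the queue
    alice.cancel()
    try:
        await alice
    except asyncio.CancelledError:
        pass

    # The connection is reused as-is for Bob's unrelated request.
    bob_result = await fetch(conn, "bob")

    server.cancel()
    return bob_result

if __name__ == "__main__":
    print(asyncio.run(leak_demo()))
```

In this simulation Bob's request receives the reply that was meant for Alice, because her unread response is still sitting at the front of the connection when it is reused.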
In other words, if an async Redis command is canceled after it has been sent but before its response has been received and parsed, the connection is left in an "unsafe" state for future commands. As such, the response to a previous, canceled command may be read out of sequence by a later, unrelated command, potentially compromising the confidentiality of data and impacting the integrity and availability of resources.
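One general way to avoid reusing a connection in such an "unsafe" state is to discard it whenever a command is canceled mid-flight. The sketch below illustrates that idea only; it is not necessarily what the redis-py patch does, and `FakeConnection`, `FakePool`, and `safe_fetch` are hypothetical names.

```python
import asyncio

class FakeConnection:
    """Hypothetical connection: two FIFO queues plus a tiny echo server."""

    def __init__(self):
        self.requests = asyncio.Queue()
        self.responses = asyncio.Queue()
        self.closed = False
        self._server = asyncio.create_task(self._serve())

    async def _serve(self):
        while True:
            key = await self.requests.get()
            await asyncio.sleep(0.01)          # simulated network delay
            await self.responses.put(f"data-for:{key}")

    def close(self):
        self.closed = True
        self._server.cancel()

class FakePool:
    """Hypothetical pool that only recycles connections in a known-good state."""

    def __init__(self):
        self._idle = []
        self.created = 0

    def acquire(self):
        if self._idle:
            return self._idle.pop()
        self.created += 1
        return FakeConnection()

    def release(self, conn):
        if not conn.closed:                    # closed connections are dropped
            self._idle.append(conn)

async def safe_fetch(pool, key):
    conn = pool.acquire()
    try:
        await conn.requests.put(key)
        return await conn.responses.get()
    except asyncio.CancelledError:
        # A reply may still be in flight, so this connection cannot be trusted.
        conn.close()
        raise
    finally:
        pool.release(conn)

async def mitigation_demo():
    pool = FakePool()
    alice = asyncio.create_task(safe_fetch(pool, "alice"))
    await asyncio.sleep(0)                     # let the request be sent
    alice.cancel()
    try:
        await alice
    except asyncio.CancelledError:
        pass
    bob_result = await safe_fetch(pool, "bob")
    return bob_result, pool.created

if __name__ == "__main__":
    print(asyncio.run(mitigation_demo()))
```

Here the canceled request costs one extra connection, but Bob's request runs on a fresh one and receives only its own data.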
Is the vulnerability fixed?
Although the redis-py maintainers released a fix in version 4.5.3 and some backports, some sharp-witted testers were still able to reproduce the flaw, deeming it unfixed. As such, a second identifier, CVE-2023-28859, has been assigned to track the flaw in insufficiently fixed versions (e.g., 4.5.3, 5.0.0b1, etc.).
The Sonatype security research team continues to monitor the development. As soon as OpenAI published its postmortem of the incident on Friday, March 24, we immediately flagged the vulnerable versions of the Redis library, and began conducting expedited Deep Dive research on the vulnerability.
By Monday, March 27, our research for sonatype-2023-1621 was updated to account for both CVEs in our security data. We continue to monitor for any upcoming Redis releases that would hopefully completely remediate the vulnerability. Customers should refer to their Sonatype IQ Server and Sonatype Lifecycle instances for up-to-date information.
What is a race condition?
In information systems, a race condition is an inadvertent scenario that occurs when a program or system is attempting to perform multiple operations that should occur in sequence but, due to a bug, fall out of sequence in their execution. One task executing before the other, or both of them executing concurrently, for example, can corrupt the operation and data.
While race conditions may be rare and not always directly exploitable, they can be difficult to debug, as attempts to reproduce them in a test environment (vs. a live, large-scale production system) may not always succeed.
A 2020 writeup by infosec researcher Vickie Li, "Hacking Banks with Race Conditions," makes this less abstract by showing how the problem could occur in banking systems responsible for transactions.
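In the same spirit (though not taken from that writeup), a double-withdrawal race can be sketched with asyncio. The account dict and both `withdraw_*` functions are hypothetical; the `asyncio.sleep(0)` call stands in for any wait, such as a database round trip, that lets another task run between the balance check and the update.

```python
import asyncio

async def withdraw_unsafe(account, amount):
    """Check-then-act with a suspension point in between: racy."""
    if account["balance"] >= amount:
        await asyncio.sleep(0)            # e.g. waiting on a database call
        account["balance"] -= amount
        return True
    return False

async def withdraw_safe(account, amount, lock):
    """Same logic, but the check and the update happen under one lock."""
    async with lock:
        if account["balance"] >= amount:
            await asyncio.sleep(0)
            account["balance"] -= amount
            return True
        return False

async def race_demo():
    # Two concurrent withdrawals of the full balance, without protection.
    unsafe = {"balance": 100}
    await asyncio.gather(
        withdraw_unsafe(unsafe, 100),
        withdraw_unsafe(unsafe, 100),
    )

    # The same two withdrawals, serialized by a lock.
    safe = {"balance": 100}
    lock = asyncio.Lock()
    await asyncio.gather(
        withdraw_safe(safe, 100, lock),
        withdraw_safe(safe, 100, lock),
    )
    return unsafe["balance"], safe["balance"]

if __name__ == "__main__":
    print(asyncio.run(race_demo()))
```

Without the lock, both withdrawals pass the balance check before either one subtracts, so the account ends up overdrawn. With the lock, the second withdrawal sees the updated balance and is refused.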
In 2015, security researcher Egor Homakov exploited a race condition in Starbucks systems to "steal" free money for his gift card and drink "unlimited" coffee. Understandably, Starbucks did not seem too pleased about it.
How does OpenAI intend to protect its users?
Although the bug itself was patched the same day it occurred, some users were still uneasy regarding the safety of their sensitive data. To manage this, OpenAI announced on April 11 that it would be partnering with the bug bounty platform Bugcrowd to launch a bug bounty program.
As claimed by OpenAI, this program would be part of the company's "commitment to secure AI" and to "recognize and reward the valuable insights of security researchers who contribute to keeping our technology and company secure."