A brief discussion on the risk of deep generative models (which are understood as 'black boxes') failing in the same way when responding to rare events (known as 'black swan events')
In recent discussions on AI, the conversations seem to have focussed on predictions of impact, upheaval, and worst-case scenarios. This is understandable given the rate of change. Improvements in deep generative modelling have occurred so quickly and in such large jumps that we are well beyond the foreseeable frontier.
In the future, I hope to look in more detail at whether deep generative models have followed a different development cycle to other methods in machine learning, and how regulations on unforeseeable advances might fail unexpectedly as a result. Here, I want to focus on a specific potential failure mode for embedding the current wave of deep generative models (DGMs) into live systems, especially for large language models (LLMs). This failure mode is not the concern that DGMs can respond in unexpected ways, which I do think is an issue. Instead, it’s that deep generative models (or genAI) might respond in broadly similar ways to unexpected events, and this could lead to a cascade effect in interconnected systems. Below, I expand on this train of thought.
There is a key detail in many of the folk explanations that I have heard of how an LLM creates text. Typically, they describe the final layer of the network as generating the most likely next element of a sequence (such as a word or pixel), conditional on the prompt (text or otherwise). The detail is that there exists a probability distribution that the model is sampling from. This, along with the massive data used for pre-training, is one of the reasons why these models can provide new ‘answers’ (outputs) to the exact same prompts (inputs). In a sense, the pre-training builds the probability densities that are sampled from, rather than some extremely complex direct mapping between input and output.
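The difference between 'pick the most likely next item' and 'sample from a distribution' can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the four-word vocabulary, the scores (logits), and the helper names are invented, and no real model is involved. It only shows that the same fixed prompt can yield different outputs once the final step is a draw from a distribution.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores for one fixed prompt (made-up numbers).
VOCAB = ["safe", "risky", "uncertain", "stable"]
LOGITS = [2.0, 0.5, 1.0, 1.5]


def greedy_next_token():
    """The 'folk' picture: always return the single most likely token."""
    probs = softmax(LOGITS)
    return VOCAB[probs.index(max(probs))]


def sample_next_token(rng):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(LOGITS)
    return rng.choices(VOCAB, weights=probs, k=1)[0]


rng = random.Random(0)  # seeded only so that reruns are repeatable
samples = [sample_next_token(rng) for _ in range(20)]

print("greedy :", greedy_next_token())
print("sampled:", samples)
```

The greedy function returns the same word on every call, whereas the sampled list mixes several words even though the 'prompt' (here, the fixed scores) never changes.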
I would like to abstract out the probability density used to generate answers as existing in a kind of “sampling space” for the output. This sampling space contains all the densities that ascribe probabilities to the output. The assumption is that distributions generated in this space are sufficiently well-defined that sampling from them will provide an accurate next entry in the sequence. Prompt engineering, temperature and even post-training techniques such as RLHF could then be understood as steps that make the resulting space ‘well behaved’ under some criteria. In a sense, these methods alter the sampling space to make the output ‘better’.
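Of the methods mentioned above, temperature is the easiest to illustrate, because it rescales the scores before they are turned into probabilities. The sketch below is a toy under stated assumptions: the scores are invented and the function names are hypothetical. It reports the entropy of the resulting distribution as a rough measure of how 'wide' the sampling space is at each temperature.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide scores by the temperature, then normalise into probabilities."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats: higher means the mass is more spread out."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Made-up scores for four candidate outputs.
LOGITS = [2.0, 0.5, 1.0, 1.5]

for temperature in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(LOGITS, temperature)
    rounded = [round(p, 3) for p in probs]
    print(f"T={temperature}: probs={rounded}, entropy={entropy(probs):.3f}")
```

A low temperature piles almost all of the probability onto the top-scoring output, while a high temperature flattens the distribution towards uniform. Other techniques such as RLHF change the scores themselves rather than rescaling them, which this sketch does not attempt to model.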
It’s worth noting that ‘better’ could be accuracy (reduced hallucinations), but also task specific (writing in the style of) or even normative (reduced offensive output). The additional methods applied to pre-trained base models are used to shift or constrain the sampling space of a model. This occurs at the post-training stage because the deep generative models are still a ‘black box’. The generating process is distributed across the weights of the model; there are no pre-written rules.
I emphasise the mental model of a shifted sampling space because I believe many generative models are being engineered to operate under the assumption that the distributions in the space should be approximately stationary, or at least stationary within the timescale of new releases. What I mean by this is that the tricks for changing the space assume that once it has been shaped into some appropriate bounded range, staying in that range will meet most users’ requirements. Implicit in this logic is that user requirements are not expected to change dramatically.
To be clear, I also think that the above assumptions are reasonable to make. A production-ready chatbot shouldn’t be designed to expect a cultural shift that causes, for example, a normative truth to be suddenly reversed within a few months. What is extremely offensive now is likely to remain offensive to most users until the next release. As another example, an image generator for building baroque architecture can be fairly confident that if a generated image meets the definition of baroque architecture at the start of a release, it will also meet the definition at the end.
It might be that one of the reasons deep generative models have developed so quickly is the slow timescale of changes to the sampling space. Maybe if we were more fickle, with our cultural norms changing every few hours, then the current combination of architecture and hardware might not be able to meet the generative challenge.
On the other hand, despite the security and safety of our long-held beliefs or established facts, there are many examples of societies exposed to upheavals on much shorter timescales. The three most obvious examples are natural disasters (Pompeii), pandemics (COVID-19), and financial crashes (2008). These extreme rare events have been memorably grouped under the banner of ‘Black Swans’ by Nassim Nicholas Taleb.
So, what happens when a Black Box meets a Black Swan? I believe that Black Swans can cause the beliefs of societies to change in unexpected but rapid ways. For example, before the pandemic, it seems that many European governments believed that lockdowns would not be possible on a national level. However, lockdowns were implemented and adherence was extremely high. A Black Swan event can be considered as an extremely rare sequence or, perhaps more precisely, the conditions that lead to many low probability sequences. In the example above, the pandemic was the event and the low probability sequences were national lockdowns. After the event, many subsequent events have similarly low (or unknown) probability. However, my concern here is not that a deep generative model will not know how to respond to these sequences. Instead, my concern is that we are building systems that will tend to respond to Black Swan events in the same way.
The reason I believe this is based on the constraints we are placing on the ultimate generative process. Methods applied to alter the sampling space will likely be critical to acceptable functionality of generative models in normal operating conditions. But when these operating conditions change, the constraints on the sampling space result in a bounded set of actions that could make optimal but low probability sequences inaccessible. In the context of a financial crash, a trading strategy that uses a generative model may suggest a flight to safety in some specific stock that is also suggested by every other strategy operating from a similarly constrained sampling space. This will almost certainly have a herding and compounding effect that would operate faster than an equivalent human-in-the-loop system. Similar thought experiments can be built for any decision-making process that may incorporate a deep generative model, something which I believe is unavoidable over a 5-10 year timeline.
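The herding argument above can be explored with a small simulation. This is a thought-experiment sketch, not a model of any real trading system or LLM: the action names and probabilities are invented, and top-k truncation is used purely as a simple stand-in for whatever constraints post-training places on the sampling space. Each of many independent 'strategies' samples one action from the same distribution, and the sketch compares how often they agree before and after that distribution is truncated.

```python
import random
from collections import Counter

# Hypothetical actions and base probabilities (made-up numbers).
ACTIONS = ["buy_asset_A", "buy_asset_B", "hold_cash", "hedge", "do_nothing"]
BASE_PROBS = [0.40, 0.25, 0.15, 0.12, 0.08]


def top_k_truncate(probs, k):
    """Keep the k most likely actions, zero out the rest, and renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def herding_fraction(probs, n_agents, rng):
    """Share of independent agents that end up on the most popular action."""
    choices = rng.choices(ACTIONS, weights=probs, k=n_agents)
    most_popular_count = Counter(choices).most_common(1)[0][1]
    return most_popular_count / n_agents


rng = random.Random(42)  # seeded only so that reruns are repeatable
constrained_probs = top_k_truncate(BASE_PROBS, 2)

unconstrained = herding_fraction(BASE_PROBS, 10_000, rng)
constrained = herding_fraction(constrained_probs, 10_000, rng)

print(f"unconstrained herding fraction: {unconstrained:.2f}")
print(f"constrained herding fraction:   {constrained:.2f}")
print("P(hedge) after truncation:", constrained_probs[ACTIONS.index("hedge")])
```

In this toy setup the truncated distribution concentrates the agents on fewer actions, and a low probability action such as the hypothetical "hedge" can no longer be sampled at all. Whether real systems behave this way is exactly the open question; the sketch only shows the mechanism in its simplest form.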
I want to make one final point on the generality of this problem. I am not claiming that over-reliance on a specific architecture, model size, or even single model, will lead to herding of output in the face of a Black Swan. What I claim is that the constraints applied to the sampling space force generative models to give broadly similar, acceptable output. There are already examples of how diversity in a sampling space may lead to potential competitive advantage, e.g. xAI’s Grok model vs OpenAI’s GPT models. But overall, I expect that most will converge to approximately similar coverage, especially in the context of highly regulated fields or critical systems. It may even be that regulation causes all sufficiently large generative models to have broadly similar constraints on their sampling space.
There are potential methods to address this, such as scenario-specific testing, which I may try to write about in the future. However, these would require coordination and collaboration at the highest levels. In the short term, I hope these risks can at least be discussed in the increasingly crowded conversations on AI risk, reward, and hyperbole.