AI Solutions Risk Gap: A Better Way To Think About AI Risk
AI risk is a hot topic. With conversations focused on p(doom) or X-Risk (Existential Risk) against a backdrop of relentless hype, it can be hard to make sense of the real risk posed by applying AI technology to a system today. I can assure you that ChatGPT isn’t going to magically sprout consciousness and decide to destroy humanity. However, it’s entirely plausible that someone will use a Large Language Model (LLM), ChatGPT or otherwise, for a high-value use case in a way that causes people to be injured or even killed. We’ve already seen people use ChatGPT to create mushroom-foraging books that suggest poisonous mushrooms are safe to eat, and a chatbot that may have contributed to a user’s self-harm.
Much of the current conversation about AI risk is consumed by the technology’s future state, in some cases its far-future state. So, for this post, I’d like to provide a frame of reference for thinking about present and near-term risks. These are the risks that I and many others are concerned about, and the ones that matter most to decision-makers today.
Real AI Risk
There’s a better way to think about AI risk, one that is far more appropriate and applicable to the technology as it exists today. It’s something I call the AI Solutions Risk Gap.
The AI Solutions Risk Gap is the gap between the point at which an AI technology could be applied to a use case and the point at which the technology is capable of handling that use case. For our context, _handling_ means reliably responding to regular use case conditions as well as edge cases and demonstrating acceptable resilience to failures and manipulation. In other words, the application demonstrates an acceptable level of safety, security, and reliability.
This acceptable risk level is entirely use-case-specific, as any conversation about AI risk should be. Depending on the use case, a certain number of failures may be perfectly acceptable, so there’s no requirement to have every use case firmly pinned to the green in the risk gap. A customer service chatbot answering questions from an FAQ has a very different risk profile than a medical diagnosis chatbot making sense of a person’s health record and other diagnostic data.
This risk gap will close (or widen in some cases) as the capabilities of the specific AI technology change and mature. Keep in mind that technology may never be able to close the gap. That means for some use cases that are highly critical, a new AI breakthrough or approach may be needed before an appropriate level of risk is reached.
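To make the idea concrete, the gap can be pictured as the distance between the reliability a use case requires and the reliability the technology actually demonstrates under real conditions. Here is a minimal sketch of that framing; the names, numbers, and single-number scoring are entirely illustrative assumptions, not part of any formal framework:

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    required_reliability: float  # minimum acceptable success rate for this use case
    observed_reliability: float  # success rate measured under real-world conditions

def risk_gap(uc: UseCase) -> float:
    """Positive value means a capability shortfall: the use case is 'in the red'."""
    return uc.required_reliability - uc.observed_reliability

# Illustrative numbers only: a low-stakes FAQ bot vs. a high-stakes diagnosis bot.
faq_bot = UseCase("FAQ chatbot", required_reliability=0.90, observed_reliability=0.95)
diagnosis_bot = UseCase("Medical diagnosis assistant", required_reliability=0.999,
                        observed_reliability=0.92)

print(risk_gap(faq_bot))        # negative: the gap is closed for this use case
print(risk_gap(diagnosis_bot))  # positive: deep in the red
```

In practice, reliability is rarely a single number, but even this toy version captures the key point: the same observed capability can be acceptable for one use case and unacceptable for another.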
In The Red
Let’s talk about what being in the red means. It means your system isn’t consistently reliable and potentially opens the door to manipulation by users or attackers. In other words, it is not safe for the use case. When the system fails, the consequences range from human harm, financial loss, and reputational damage all the way down to the least impactful: a loss of trust in the system.
Much of my criticism of the current wave of AI concerns applying generative AI to use cases where the capability of the technology is far too deep in the red, yet people push it anyway. Even when no one gets hurt, quite a bit of technical debt accrues, and far too little thought is given to it. Ultimately, the real AI risks we face aren’t from some superhuman, ultra-capable AI but from using today’s technology, with its known flaws and issues, as though it were.
The AI Solutions Risk Gap is a way of thinking about risk in use cases, not a formal framework with prescribed measurements. Even so, I hope it gets people thinking differently about AI risk and helps frame the right questions. For example: have you defined an acceptable level of risk for the use case? How are you measuring and benchmarking? How does performance change when the system is confronted with real-world data and scenarios?
Note: You must consider the AI Solutions Risk Gap in a production environment when the system is confronted with real-world data and scenarios. Issues often won’t present themselves in testing or experimentation, leading people to believe the application is more robust than it truly is.
Risk Acceleration by FOMO
Multiple factors accelerate risk in AI deployments, but one that deserves more attention is the disconnect between reporting and realities on the ground. Many articles present applications of AI that _could be_, or mere experiments, as though they were observations of systems already deployed. This sets in motion a cycle where CEOs and business leaders mistakenly assume competing organizations are doing things with AI that aren’t actually happening. That assumption leads them to speak publicly about replacing staff with AI and to respond with experiments of their own, further fueling the cycle. Internally, they apply more and more pressure to do more with AI (whatever that means), and staff do their best to oblige, rushing experiments without a firm understanding of the risk.
Development by FOMO is hardly unique to AI, but what does seem rather unique is the velocity and breadth of the current AI push putting pressure on organizations to deliver. Ironically, we may tell our systems through prompts to take a deep breath and think through the solution step by step, but we aren’t doing that ourselves despite having brains and… lungs.
This risk acceleration by FOMO isn’t any one entity’s fault; it’s an unfortunate side effect of our current competitive environment and the level of hype. Sure, there’s no shortage of blame to go around with AI leaders stoking fear, companies faking demos, and media not properly vetting stories. These are all contributing factors, but hardly the whole picture. We bear some of the blame when we parrot claims and boost the wild speculation of influencers over more realistic (and what should now be obvious) evaluations.
Being Smart About AI Risk
There seems to be a lot working against you in the risk department, but all isn’t lost, because you have a superpower: ground truth. You can observe the outcomes of your experiments on your data firsthand. When the results of your continued experiments don’t jibe with the reports you’ve read, that’s a feature of the system working. The largest tech companies in the world are struggling to actualize use cases; inevitably, you will too. Every use case has a different level of complexity, and some will work better than others.
A larger conversation on reducing and evaluating AI risk is outside the scope of this post, but by asking a few simple questions, you’ll be pointed in the right direction.
- What is the impact of failure on the use case?
- What’s an acceptable level of risk?
- What are the evaluation criteria?
- How exposed is this system going to be?
- How are you performing this use case today?
- Are there other approaches better suited to the problem?
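One way to make these questions actionable is to treat them as a pre-deployment gate that a use case must pass before launch. The sketch below is an illustrative assumption about how a team might wire that up, not a formal standard; the "unanswered means blocked" rule is the author’s point made mechanical:

```python
# The six questions from the list above, used as a pre-deployment checklist.
RISK_QUESTIONS = [
    "What is the impact of failure on the use case?",
    "What's an acceptable level of risk?",
    "What are the evaluation criteria?",
    "How exposed is this system going to be?",
    "How are you performing this use case today?",
    "Are there other approaches better suited to the problem?",
]

def unanswered(answers: dict) -> list:
    """Return the questions that still lack a substantive (non-blank) answer."""
    return [q for q in RISK_QUESTIONS if not answers.get(q, "").strip()]

def ready_for_review(answers: dict) -> bool:
    """A use case only moves forward once every question has an answer on record."""
    return not unanswered(answers)
```

The value here isn’t the code; it’s forcing the answers to be written down, so a rushed experiment can’t quietly skip the risk conversation.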
Ask the right questions and make the right choices. If something seems too good to be true, it probably is.
With the current rush to implement AI into everything, we will see more and more failures and an accumulation of technical debt. It doesn’t have to be this way. With some thought, you can focus your AI investments on the right applications, giving yourself the highest probability of success. The goal should be to identify high-impact use cases and subject them to extra scrutiny to avoid unnecessary harm from failure or manipulation. Asking the right questions and taking extra time to evaluate can help avoid failures and lead to more robust applications.