Black Hat USA is next week, and one prediction I won’t need a crystal ball for is that the vendor hall will be absolutely plastered with “AI-Powered” everything. It’s no secret that machine learning and deep learning approaches have been part of security products for quite some time, and these approaches are effective when applied to security problems, but with the hype behind generative AI, it’s go big or go home. This means vendors could be rushing not-so-ready features into products. So, what sorts of questions could you ask vendors in a constrained environment like a security conference vendor hall?


First Things First

The point of this isn’t to torture vendors at a conference. Security professionals love to pick on vendors, but the reality is we need tools to do our job; we would not be successful without them. I’ve had some amazing chats at vendor booths where people were so glad I asked questions, because it allowed them to talk about their differentiators in a way that didn’t sound contrived and to brag about the genuinely hard problems they solved. These conversations can be enlightening, and asking deeper questions may spur considerations you hadn’t thought of yourself, so these conversations can be a two-way street.

Product evaluations shouldn’t be based on vendor claims in marketing slicks but on demos done on your data and use cases. The goal of asking any questions is to get an idea of whether you want to take the next step of doing a demo of the product at your organization. See if the answers pass your sniff test and are worth putting in the time to evaluate further.

For the sake of this post, I’m referring to multiple approaches (machine learning, deep learning, reinforcement learning, generative AI, etc.) generically as AI, but the approach does make a difference, and a larger conversation about the approach should happen either at the vendor hall or in a follow-up afterward.


Questions

The questions below are far from all-encompassing, and that’s not their point. These are just a few things that came to mind when I thought about this topic. As always, feel free to expand and augment as you see fit.


What components of the product use AI?

It’s helpful to understand which product components use AI in their approach. In security products, these components often make critical decisions outside human visibility. That’s one of the reasons you’d want to use these tools in the first place, so knowing which components employ AI helps you understand just how much of an impact the AI will have. More isn’t always better, so a product with every feature completely riddled with AI isn’t necessarily a good thing.

Although a discussion of the specific techniques used is probably beyond a quick conversation in a vendor hall, it’s important to know that while generative AI is all the rage right now, it isn’t a solution for many security use cases. You’ll often get much better results with a supervised learning approach. There are also problems where you may be looking for anomalies or performing clustering, or where even a simple decision tree may give you more reliable results, so there isn’t a one-size-fits-all approach.
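
To make “simpler approach” concrete, here’s a minimal supervised-learning sketch using scikit-learn’s DecisionTreeClassifier on made-up phishing-style features. It’s purely illustrative (the features and labels are hypothetical), not a representation of any vendor’s method:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per email: [link_count, has_attachment, sender_domain_age_days]
X_train = [
    [12, 1, 3],     # many links, brand-new sender domain
    [1, 0, 2000],   # established sender, few links
    [8, 1, 10],
    [0, 0, 3500],
]
y_train = ["phishing", "benign", "phishing", "benign"]

# A small, interpretable model -- you can literally print the decision rules.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[9, 1, 5]]))  # -> ['phishing'] under this toy model
```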

Even in this post, I’m referring to all of these different approaches as AI, but see if the vendor will get a bit more specific about their approach and why they chose it. They may even tell you how they tried different approaches and which one was the most effective. It’s always nice when you have someone at a vendor booth who’s willing to nerd out on details, and it’s a good indication that the approach has been thought through.


How are you measuring the effectiveness of your approach?

Accuracy is an often-touted metric because we can all immediately relate to it. It sounds impressive: our product is 98% effective. But the reality is, it may not be all that informative. For example, a broken clock is right twice a day. The same thing can happen with data and the associated evaluations on that data, such as when the training set is imbalanced. So, even with a simple metric like accuracy: what data was that accuracy measured on? When was that measurement taken? Was it measured on data similar to mine?
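
To make the broken-clock point concrete, here’s a minimal sketch (plain Python, hypothetical numbers) showing how a detector that flags nothing can still post an impressive accuracy on an imbalanced evaluation set:

```python
# Hypothetical, imbalanced evaluation set: 980 benign events, 20 malicious.
labels = ["benign"] * 980 + ["malicious"] * 20

# A "broken clock" detector that never flags anything as malicious.
predictions = ["benign"] * 1000

accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
caught = sum(p == "malicious" and t == "malicious" for p, t in zip(predictions, labels))
recall = caught / labels.count("malicious")

print(f"accuracy: {accuracy:.0%}")  # 98% -- sounds great on a slide
print(f"recall:   {recall:.0%}")    # 0%  -- it never catches an attack
```

The 98% figure is technically true and completely useless, which is why the follow-up questions about where the number came from matter.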

I’d be leery of any approach that shoehorns generative AI (large language models) into a use case where more traditional and reliable machine learning methods are already highly capable. So, if generative AI is used, how does the vendor’s approach fare against more traditional machine learning approaches?

The timing of the measurement matters because as soon as a product is deployed into production, accuracy tends to drop. The product is confronted with real-world data and environments that vary from the evaluation data. If the product performs online learning (updating the model based on new data), it will be a different product on day ten than on day one. So, that 98% effectiveness measure may have been taken when the product launched and may not reflect current performance.
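
If the conversation gets as far as a demo, one way to keep the timing question honest is to track the metric yourself on a rolling window of labeled events instead of trusting a single launch-day number. A rough sketch, with hypothetical verdict labels and window size:

```python
from collections import deque

class RollingAccuracy:
    """Track agreement between product verdicts and analyst verdicts over the last N events."""

    def __init__(self, window: int = 500):
        self.results = deque(maxlen=window)

    def record(self, product_verdict: str, analyst_verdict: str) -> None:
        self.results.append(product_verdict == analyst_verdict)

    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else float("nan")

# Compare the day-one number to the day-ten number on your own data.
tracker = RollingAccuracy(window=500)
tracker.record("malicious", "malicious")
tracker.record("benign", "malicious")  # a miss the product would never report on its own
print(f"rolling accuracy: {tracker.accuracy():.0%}")
```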

A vendor may also have some great success stories. It’s always helpful to remember that these can be heavily cherry-picked, so try to understand how the success story they tell might apply to your situation.


How does your measure of effectiveness apply to my environment, data, and use case?

Regardless of the metrics used, they weren’t measured on your data and environment. You want to understand how the product’s touted effectiveness will apply to your data and use case. A product that’s 98% effective on an evaluation set does you no good if it’s 50/50 on your own data. See if the vendor has a good answer for this.

Diving deeper, you’d want to know how you would measure the product’s success if you chose to do a demo in your environment. If the success of a product in your environment is almost impossible to measure, that can be a problem with your evaluation, and you’ll need to consider some different criteria.
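
One way to avoid the “almost impossible to measure” trap is to write your success criteria down as code before the demo starts. A minimal sketch, assuming you can export the product’s verdicts alongside your analysts’ triage decisions (the record format here is made up):

```python
# Hypothetical triage log: the product's verdict vs. your analyst's final call.
records = [
    {"product": "malicious", "analyst": "malicious"},
    {"product": "malicious", "analyst": "benign"},     # false positive
    {"product": "benign",    "analyst": "malicious"},  # missed detection
    {"product": "benign",    "analyst": "benign"},
]

tp = sum(r["product"] == "malicious" and r["analyst"] == "malicious" for r in records)
fp = sum(r["product"] == "malicious" and r["analyst"] == "benign" for r in records)
fn = sum(r["product"] == "benign" and r["analyst"] == "malicious" for r in records)

precision = tp / (tp + fp) if (tp + fp) else float("nan")
recall = tp / (tp + fn) if (tp + fn) else float("nan")
print(f"precision: {precision:.0%}, recall: {recall:.0%}, false positives: {fp}")
```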


How does your product adapt to changes in data and environment?

In our modern world, there isn’t much that remains static. Data, attack techniques, and attack surfaces are constantly evolving. This means your security products have to evolve as well. This is especially true for machine learning-based products; refer to the earlier point about being a different product on day ten than on day one. Does the vendor have a plan for how their product will keep up with this changing landscape?
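
You can also run a rough version of this check yourself during an evaluation by comparing the distribution of something the product emits (a risk score, daily event counts) across two time windows. A sketch using SciPy’s two-sample Kolmogorov-Smirnov test; the scores and threshold are illustrative, not a recommendation:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical risk scores the product emitted in week one vs. week ten.
week_one_scores = rng.normal(loc=0.20, scale=0.10, size=1000)
week_ten_scores = rng.normal(loc=0.35, scale=0.10, size=1000)

result = ks_2samp(week_one_scores, week_ten_scores)
if result.pvalue < 0.01:
    print(f"score distribution shifted (KS statistic {result.statistic:.2f}); ask how the model keeps up")
else:
    print("no significant shift detected in this window")
```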


How do I understand when something fails?

This should go without saying, but does the product give you any indication of failure? We treat many security products as black boxes, which also means that if they simply stopped working for some reason, we might not realize it. We don’t have time to watch everything and need to count on these products to do their job, but it’s important to understand whether there’s a way to identify when the product fails. AI-based systems are terrible at edge cases, and the world is filled with them. So, you may not have a complete failure; the product may just not work well in certain cases. You need to understand the conditions under which the tool doesn’t work well and possibly put another method in place to identify and mitigate those cases.
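
A crude but useful external check is to watch the product’s output volume yourself and flag when it falls outside a baseline, since from the console a silent black box and a broken black box look identical. A minimal sketch with hypothetical alert counts:

```python
import statistics

# Hypothetical daily alert counts from the product over the last two weeks.
baseline_counts = [42, 38, 51, 44, 47, 40, 39, 45, 50, 43, 41, 46, 48, 44]
todays_count = 0  # did the product go quiet, or did the attacks stop?

mean = statistics.mean(baseline_counts)
stdev = statistics.stdev(baseline_counts)

if abs(todays_count - mean) > 3 * stdev:
    print(f"alert volume {todays_count} is far outside baseline ({mean:.0f} +/- {stdev:.0f}); investigate")
```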


What are the blind spots your product has or things your product doesn’t do well?

You should always ask this question, regardless of whether the product is AI-powered. What a product doesn’t do well represents gaps that you need to consider and account for. There is no silver-bullet security product that covers every case and never needs to be touched. As security professionals, we understand that everything has tradeoffs, and we should be okay with this, but we should have a firm understanding of those tradeoffs and gaps, not only to compare against other tools but also to understand the approaches we need to fill the gaps.


How is my data used?

Understanding how your data is used in the system may be important for complying with regulations, standards, and guidelines. Much attention has been paid to ChatGPT and whether the data provided to it is used to update its models. Is the tool making a copy of your data and sending it elsewhere? Is your data going to be used to train models? Using your data as training data isn’t always bad, since that training can happen on the device, on-premises, or in your cloud instance and never be sent back to the vendor. This can give you the advantage of having the product adapt better to your unique situation. Every organization will view these tradeoffs differently.


Conclusion

Years ago, I had someone show me a diagram for a “Next-Gen” version of something. I studied the diagram and said, “This looks like what we are doing today.” The response I got was, “Yes, but this is Next-Gen.” What you are trying to suss out with these questions is not the deep inner workings of a product. You want to be sure, in the few minutes you have, that you are getting solid answers and not a new label on the same old thing. That will put you in a better position to decide which products are worth a more formal evaluation.
