Introducing Maturity Maps — A New Way to Measure AI Adoption Transcript — The AI Daily Brief: Artificial Intelligence News and Analysis

**Nathaniel Whittemore** (0:00)
Today, I am discussing a new way to think about AI and agent readiness inside your company, and it is called Maturity Maps. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Robots and Pencils and Blitzy. To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe directly on Apple Podcasts. To learn more about sponsoring the show or really find out anything else about the show, like where we are with our agent madness bracket, which is going on right now, head on over to aidailybrief.ai. Today is the second day in our build week, which is happening while I'm traveling with my family. Yesterday, we did a very high-level overview around everything that had happened last quarter and what it meant for this quarter. And if we put this in the framework of this being build week, yesterday was sort of the context setting, the environment in which your building is happening. Today's episode takes that down a level to discuss the benchmarks that show where others like you are. Now, one of the things that I've been thinking about a lot over the last six months or so is just how much we need a totally different set of data and benchmarks for this new AI era. Everyone is adapting incredibly quickly right now, or at least they're trying to. It's new processes, new workflows, new tooling, new everything. And by and large, we're doing all that exploration without a map. Let me give you a practical example of where I think our lack of benchmarks could actually very significantly and meaningfully negatively impact a company when it comes to their AI adoption. Let's say that you are an early adopter company. Across your different functions, you've had really strong hands-on efforts to get your AI up and running. In the absence of knowing exactly what to measure, you're just trying to measure whatever you can, and early results are pretty positive. 
For example, in the marketing function, you have increased your content output 30 percent year over year, without any sort of proportional increase in the resources it takes to produce that content. Now, that 30 percent year over year growth sounds great. But what if I told you that all of your competitors had actually grown their content output by 50 percent? In the AI world, this is actually not a far-fetched scenario, and it shows how the need for better benchmarks and numbers is not just a vanity exercise. When we don't know how we're doing relative to peers and competitors, it makes it really hard for us to judge what we need to change, what we need to shift, and what we need to do next. Now, at AIDB and at Superintelligent, which is my enterprise AI planning and strategy company, we started exploring some of this with our AI ROI benchmarking survey at the end of last year. We had people submit and share their real use cases and share with us the impact that those use cases were having across an array of eight different impact dimensions, things like time savings, cost savings, new capabilities, increased output, and a handful of others. We asked them to rate impact from negative to transformational and found that by and large, at least when it comes to people self-reporting, they were already seeing strong and positive impact from their AI initiatives. But there are a couple of clear and obvious limitations with that study. First of all, while self-reporting is better than nothing, it's always going to be somewhat imprecise. Second, like pretty much everything that we do with this audience, you have to calibrate it to a more advanced individual and organizational user than if you just surveyed a broad cross-section of businesses in the world. Third, while it did give us some great information around individual use case impact, it didn't tell us all that much about other dimensions of AI readiness and adoption outside of just the use cases themselves. 
Anyone who has felt the sting of the capability overhang, in other words, the gap between what AI can do and what we are actually using it for, knows that raw capability isn't really the question. It's the systems we put around it to get value from it. And unfortunately, the research and information apparatus just has not adapted to this new reality. Not to pick on Gartner specifically, but they are the biggest in the space, and so sort of present an easy target. Tried-and-true benchmarks and information products like Gartner's Magic Quadrant have literally never been less useful than they are right now. The idea that success in something like AI application development was going to be even a little bit dictated by choosing the right AI application development platform vendor is just so far outside of the reality of these tools as to be almost actively harmful if that's where you're putting your time and effort when it comes to trying to figure out how to adopt AI. Now, Gartner is more than the Magic Quadrant, and they are doing lots to try to catch up to the AI world, so it's not to single them out. It's more to make the point that we are in desperate need of some new frameworks, some new benchmarks and some new tools. And so at both AIDB and at Superintelligent, we've been thinking about this a lot over the last, call it three to four months. We've experimented with a couple of things that you'll probably see some version of come out at some point in the near future. One of them I call AI Opportunity Radars, which are basically a way of organizing use cases by function, but then also by applicability depending on where your organization is in its development cycle. Simply put, it's a radar or bullseye type of visual where use cases are organized into one of three categories. Primetime means that most organizations, as they are, are well suited to get value from that use case right now. 
Emerging means that while there are a lot of organizations that can get value out of those use cases, there is some amount of setup cost or right set of circumstances or infrastructure that is going to be needed to get that value, and not all organizations are going to be there just yet. Finally, frontier is exactly what it sounds like, where if your organization is well set up with the right infrastructure, you can be getting a lot of value from those use cases. But at this point, most organizations aren't there yet. So over the last quarter, we built an agentic system that is basically constantly seeking out every new resource it can get its hands on, assessing what those resources tell us about the use cases and different functions, and keeping these radars continuously updated, both with new use cases as well as changes in where the existing use cases are placed. But as we were working on radars, it was clear again that there was something even more fundamental, and that overly or only focusing on use cases was leaving out so much of what actual AI readiness means. When we're doing AI readiness and planning assessments at Superintelligent, we're not just thinking about what use cases a company should do, but what's the full set of change management and infrastructure development and new policy and investment in people and all the other stuff that needs to go around it to actually get value from those use cases. And that led to the development of the framework which I'm going to be sharing today, which we call, for simplicity, AI Maturity Maps. Now, the concept of maturity is certainly not some proprietary thing that we invented. Maturity is just a heuristic and a framework to look at where different organizations are around some key areas relative to one another and where they should be. So the way that Maturity Maps work is that they organize AI and agent maturity into six different categories. 
Those categories are, first, deployment depth, which is sort of an expanded notion of use cases. Deployment depth in the context of AI maturity not only thinks about how many use cases you have in play, but how much those use cases are assistants versus full workflow automations versus actual applied agentic systems that are doing work with some meaningful degree of autonomy. The second category is systems integration. This is a measure of how deeply the AI solutions and workflows that you're deploying are integrated with the existing systems that run your enterprise. Is everyone using ChatGPT independently? Or does your CRM system have an agent running through it, automatically extracting insights, making recommendations, and even setting up new outreach campaigns? Systems integration is in some ways one part of the measure of how good the context is that an enterprise's AI has to work with. Now, the other piece that relates to context, and the third category, is of course data. How much, what quality, and how well managed is your company's AI's access to your company's data? Does it require people dropping in PDFs? Do you have company knowledge all set up on MCP servers? How does the AI that your company is looking to for its transformation have access to the information it needs to know what that transformation should look like? The fourth category, outcomes, is almost a measure of measurement. Are all of your deployments pilots and experiments? Or do you have a track record of actual demonstrable and measured outcomes? Outcomes in some ways are the information you need to know what you should do next across all these other dimensions. The fifth dimension of AI Maturity Maps is people. And this is an admittedly broad category. A big part of this refers to upskilling and capabilities. But another piece has to do with attitudes. 
Given that one of the major barriers to adoption in many companies is not just going to be skills using AI but attitudes towards AI, people is an extremely important and unfortunately, as we'll see, often neglected piece of the AI maturity pie. Lastly, of course, is governance. How clear, how established, how communicable, how known are the rules and guidelines and access provisioning around your AI systems? Do people know where to go to get the permissions they need? Do they know what expectations are? When issues come up, are there mechanisms for resolving those issues? So those are the six areas across which we look at AI maturity. Now, for the purposes of developing these maps, we've started with 10 functional maps split across some of the most common, very broad-brush categories of knowledge work. That includes customer service, engineering, IT (by the way, the difference between those two for our purposes is effectively that engineering is all the stuff that's external facing and IT is all the technology stuff that's internal facing), sales, marketing, HR, operations, finance, legal and product. So at the end of last year, we started to put together a process for actually assessing and visualizing AI maturity across all these dimensions. What came out of that is the chart that you see here, which plots each of these six categories within a specific function on a five-point scale. Number three, the center of the chart, is the on-track line. In other words, where an average organization should be. And the word should, as you'll see, is doing a lot of heavy lifting there. Now, if on-track is at three, four is ahead and five is significantly ahead, while two is behind, and one is significantly behind. 
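To make the scale concrete, here is a minimal Python sketch of how one function's map could be represented, using the six categories and the five-point scale described above. The function name `describe_map`, the identifier spellings, the use of whole-number scores, and the example scores are all assumptions made up for illustration; they are not the actual system or data behind the Maturity Maps.

```python
# Six categories and the five-point scale as described in the episode.
# Identifier spellings are illustrative assumptions.
CATEGORIES = (
    "deployment_depth",
    "systems_integration",
    "data",
    "outcomes",
    "people",
    "governance",
)

SCALE_LABELS = {
    1: "significantly behind",
    2: "behind",
    3: "on track",
    4: "ahead",
    5: "significantly ahead",
}

ON_TRACK = 3  # the center of the chart


def describe_map(scores: dict[str, int]) -> dict[str, tuple[str, int]]:
    """Return (label, gap from the on-track line) for each category.

    Hypothetical helper: assumes whole-number scores from 1 to 5.
    """
    missing = set(CATEGORIES) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")

    report = {}
    for category in CATEGORIES:
        score = scores[category]
        if score not in SCALE_LABELS:
            raise ValueError(f"{category}: score must be 1-5, got {score!r}")
        report[category] = (SCALE_LABELS[score], score - ON_TRACK)
    return report


# Made-up scores for a single function, purely for illustration.
example_scores = {
    "deployment_depth": 3,
    "systems_integration": 2,
    "data": 2,
    "outcomes": 1,
    "people": 2,
    "governance": 4,
}

report = describe_map(example_scores)
for category, (label, gap) in report.items():
    print(f"{category:20s} {label:22s} gap={gap:+d}")
```

A negative gap means the category sits below the on-track line, and a positive gap means it sits above it.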
The idea is that when you look at a maturity map, without having to read a lot of words, you can instantly see the gaps between where organizations should be and where the average organization actually is, and when you compare your organization to it, also see where you are relative to both the general on-track line and the average. So clarifying this a little bit more, a quarter's designation of on-track is not where the average organization is. It is a subjective measure of where we think the average organization should be. As you'll see when you dig into this quarter's numbers, in the vast majority of cases, we believe that the average organization is behind that on-track line, across pretty much all of these dimensions. To use a term that comes up a lot on this show, the fact that organizations tend to be behind this on-track line is effectively a visualization of the capability overhang. Now at this point you might be wondering, well, what gives you authority to determine what the on-track line is? It's a totally reasonable question, and believe it or not, it is a little bit more at least than just my opinion. We have a few different places to pull from. The first is the sort of proprietary research and surveying that we do as part of AIDB Intel, which gives us some pretty good insight into where particularly leading organizations are. Second is Superintelligent. Given that we are doing thousands and thousands of voice agent interviews every month to help organizations assess their AI maturity and plan their AI strategy, that's another pretty unique source of frontline data. And then combined with that, we built a system to go out and effectively aggregate pretty much every new survey or study that comes out that even vaguely touches AI. You might have heard me mention before that my most useful OpenClaws are my research OpenClaws, and this is one of the main things that they do. 
They are in a never-ending, 24-hour-a-day, constantly hunting loop to surface new sources, to assess those sources in terms of their legitimacy, credibility and bias, and then to integrate that information into our larger assessment system. There are more than 480 studies and surveys from the last quarter that went into these Q2 Maturity Maps. Among the sources that have explicit sample sizes, the combined survey respondent base exceeds 150,000 professionals across more than 50 countries. The types of source categories that we have are: first, Big Four and top-tier consulting firm research (there are over 20 of those sources in that mix); major platform earnings and public market statements; analyst firm predictions and research from companies like Gartner, Forrester and IDC; function-specific regular or annual surveys, such as Stack Overflow's engineering study, or other similar things for areas like marketing, legal and IT; academic and government research; and behavioral data sources, where companies that have access to some unique user behavior data aggregate, analyze and share that. A good example of that is Jellyfish's AI Coding Benchmark, which used behavioral data from more than 200,000 engineers across 700 companies with 20 million PRs. Finally, there are of course practitioner reports and vendor case studies, although the system is careful to rate them with some amount of skepticism given that they are of course selling something.