In this episode of the Agile Brand podcast with Greg Kihlström, Martha Brooke discusses the science of CX surveys. Martha touches on why most customer experience programs aren’t scientific, how you can become more scientific in your approach, and when and how to use AI in your survey program.
The Science of CX Surveys, Summarized:
- Many CX programs lack a scientific approach. If they were truly scientific, we would expect higher NPS and ACSI scores, and we would routinely be having good customer experiences.
- NPS is too prolific, being used at every touchpoint instead of as a summary measure as Bain intended.
- Companies often face low response rates and receive feedback from groups that don’t represent their customers at large, skewing the data.
- To get meaningful data, remove biases like leading constructs, double-barreled questions, and insufficient answer options from your survey. Also, use a weighting factor to capture customers’ priorities and factor it into your calculations.
- AI, particularly large language models, can help with text analysis but needs training and oversight by human researchers to capture the true meaning of the text.
The Full Conversation:
Here’s an approximate transcript from Martha and Greg’s discussion about the science of CX surveys.
Greg: While we talk about customer experience a lot on this show, today’s focus is going to be a little different than some of our past conversations. Today we’re going to talk about adding science to customer experience programs and more specifically, the science of CX surveys. To help me discuss this topic, I’d like to welcome Martha Brooke, Chief Customer Experience Analyst at Interaction Metrics. Martha, welcome to the show.
Martha: Hey, Greg. I’m glad we could do this.
Greg: Yeah, absolutely. I love this topic and I’m looking forward to it. Before we dive in, though, why don’t you give a little background on yourself and what you’re currently doing?
Martha: Sure. So I’m the founder and, like you said, Chief Analyst at Interaction Metrics. That’s interaction like what we’re doing here, an interaction, and metrics like the number. And what I do is I oversee the research and analysis phases for clients like Convergix, Yaskawa America, and the California State Bar. So my key role is to ensure we hold projects to the highest levels of science.
Greg: Yeah. Great. Great. So you know, certainly there’s a lot of measurement. There’s a lot of theory, there’s a lot of practices with customer experience and customer experience programs. But, you know, we’re here to talk a little bit about the science of it. And so, you know, are CX programs scientific? Are they not? And why not?
Martha: Well, Greg, that’s a very big question with a very big answer.
Greg: I know, I know. I’ll give you the easy ones.
Martha: Okay. If programs were scientific, we would really expect NPS and ACSI scores to be quite a bit higher. In other words, what would be happening is we would all routinely be having good experiences. I guess another way to say it is that customer experiences would work, right? And the best metaphor I know of is a medical one. Good seizure medications that are backed by evidence routinely result in patients having fewer seizures. So likewise, if customer experience measurement were really good, we’d routinely expect, no matter where we were, that we would have good experiences.
So I guess there are some other, sort of ancillary, reasons why I believe science is not being practiced as much as it could and should be. One is the Net Promoter question. We like it, and almost all of our clients want us to use it. That said, if programs were scientific, it wouldn’t be as prolific as it is. It would be used more gingerly, for lack of a better word. It would be used as Bain intended it to be used, which is to summarize how you feel about the company, and not at every single touchpoint. For instance, if I have an interaction with a Bank of America call rep, that single interaction doesn’t really make me likely or unlikely to recommend Bank of America. And I’m not here to pick on Bank of America. Routinely, companies are asking it at every touchpoint, kind of willy-nilly. And it’s just not the way NPS was intended, and it’s really not a scientific approach.
Also, again, big topic, big answer. But I would say that companies tell me all the time they have very low response rates, and they tend to hear from very particular kinds of customers, maybe those with more time on their hands. At a conference I spoke at recently, a company was complaining that it only hears from older customers, but that’s not the entirety of their customer base. Good science means representative response. So if you’re only hearing from a certain kind of customer, then you’re not really getting the full picture of who your customers are. So that’s just a little bit about why I believe customer experience programs, in general, are not held to the highest levels of science. Ours are, but in general they’re not.
Greg: Yeah, yeah. And so given that, what are a few things that could be done to make programs more scientific in their approach?
Martha: Well, one thing that I talk about all the time is that companies should work very, very hard to remove leading constructs from their surveys. A leading construct is one that directs the customer toward an answer you want to hear. “How satisfied were you with X, Y, Z?” Well, that assumes the customer was somewhat satisfied, right? So that’s a problem. NPS, I would argue, has bias. “How likely are you to recommend?” does assume the customer is somewhat likely to recommend. But again, we use NPS because it’s a good benchmarking question when it’s used properly. So there would really be a team approach to scour surveys for anything that’s leading customers toward what you want to hear.
Customers have priorities. In other words, it’s not all equal. And the goal of any customer experience program, and surveys in particular, is to come up with an accurate measurement of customer experience. So if every question is treated as equally important, then you’re not really capturing the nature of the customer experience, right? If some things are more important than others, you have to include that weighting factor. That weighting factor can be based on asking customers to rate what is most important in the experience, and then using that as the weight. Or sometimes we use correlation analysis to determine a weighting factor. But that’s a really important aspect of survey design and survey calculations.
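To make the weighting idea concrete, here is a minimal sketch, not Interaction Metrics’ actual method, of one way a weighting factor could be derived and applied. It assumes pandas is available and uses hypothetical question names; the weights come from correlating each question with an overall rating, one of the two approaches Martha mentions.

```python
import pandas as pd

# Hypothetical survey data: each question rated 0-10, plus an overall rating.
responses = pd.DataFrame({
    "ease_of_ordering":    [9, 7, 8, 6, 10],
    "support_quality":     [6, 5, 7, 4, 8],
    "product_reliability": [8, 9, 9, 7, 10],
    "overall":             [7, 6, 8, 5, 9],
})
questions = ["ease_of_ordering", "support_quality", "product_reliability"]

# Correlation analysis: how strongly each question tracks the overall rating.
# (The alternative is to ask customers directly to rate each item's importance.)
weights = responses[questions].corrwith(responses["overall"]).clip(lower=0)
weights = weights / weights.sum()  # normalize so the weights sum to 1

# Weighted score: each question's mean, scaled by how much customers care about it.
weighted_score = (responses[questions].mean() * weights).sum()
print(f"Weighted CX score: {weighted_score:.2f} / 10")
```

With equal weights this would just be a plain average; the weighting shifts the score toward the questions customers actually care about.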
And then I would edit surveys for the bundle of usability flaws that lead to gibberish data. So, you know, now you’ve got me on a topic I could go on about for hours. But examples are things like double-barreled questions. That’s where you ask two things at the same time. So, was your server efficient and courteous? Well, what are you asking?
Greg: What if they were only one of them?
Martha: Right. And often those are at odds with each other. So you get information, but it’s gibberish information. The customer is just like, I don’t know, eeny meeny miny moe. Or insufficient answer options. I think about this all the time because I’m a huge Amazon user. It’s just the easiest way in the world to buy stuff. And because I’m a huge Amazon user, I’m also a huge Amazon returner. But the list of return reasons doesn’t include “I didn’t like it.” The closest thing is “the website description was wrong,” so I almost always pick that: okay, I guess the website description was wrong. Whenever there are insufficient answer options, you’re going to get gibberish information, right?
One of my favorites is not allowing for anonymity. Because if you don’t allow the option of anonymity, you’re going to omit a whole group of respondents. That can be as many as 40% of respondents. If you’re going to name me, because now it seems like you’re going to hassle me if I give you a low score, then I’m not going to take your survey. But their data is as important as the data from those who name themselves. I’d say possibly more important.
One that we already talked about is using NPS when it just doesn’t make sense. Or when the rating scales are off, or there’s no zero. We see this in reviews all the time, where customers will write: well, the choices were one star, two stars, three stars, up to five stars; really, if you’d offered zero stars, that’s what I would have given. A scale that starts at one assumes you’re at least somewhat satisfied, in a sense, right? So really the better scale is 0 to 10.
I would also say internal language. We see gibberish from that all the time. We do free audits of surveys, and when clients send us their surveys they ask, well, what do you see? Do you think this is ready to go? And we’ll say, you know what, you’re in a good place, you don’t really need us. We’re happy to say that. Or: well, actually, this is not very scientific, here are some things you want to consider. In any event, when companies submit their surveys, we’ll often see all this kind of internal language. Questions about design, white space balancing, things that you can’t expect customers to know. So they’ll just kind of eeny meeny miny moe.
So those are some examples of ways that companies are collecting data, but not all data is good data; a lot of it is gibberish data. And you just hope they’re not making business decisions based on it.
Greg: I’m sure there are lots of different causes of this, but as a consumer myself, it seems that it’s actually very easy to send surveys. Not as easy to construct well-constructed surveys, but it seems pretty simple to put one together, in the scheme of things. So is some of it just, well, we want some information, like your example about the white space? Some designer somewhere on the website team wants the answer to that very specific question. Not very scientific, not very customer friendly, let’s say, to ask such a niche question. But is some of this just because the tools are so easy to use and there don’t seem to be other mechanisms? Why are we getting so many of these surveys, and yet the quality is so low, I guess?
Martha: Because anybody can buy Photoshop. Doesn’t make everybody a designer.
Greg: Right, right.
Martha: Right. I think that’s the sort of obvious, maybe even facile, answer. I think the deeper issue could be a lack of awareness of science and a lack of awareness of what good data is. And maybe it’s become just a task. Everybody’s like: task, I did it, check the box. And yet really there could be nothing, really nothing, more important for any company than customer listening. I mean, honestly, what could be more important than that? And yet maybe there’s kind of a company-centricity where they don’t really want to know. That’s sort of a psychological thing. Maybe they don’t really want to know; in some cases, they just want to check the box.
Greg: And I do feel like in some cases, the job of CX is to send surveys, right? I wouldn’t say that about the seasoned CX professionals out there who know what they’re doing; their job is certainly not just that. But there are literally people in companies whose job is CX, and their job is to send surveys. And I think, to your point, it requires some more education. And I know we’re kind of limiting our conversation to surveys; if we open this up to leading and lagging indicators and all that kind of stuff, then it becomes a probably even more unwieldy conversation about how those things tie into each other. But if your only tool is a survey, I guess you’re going to use it for everything, whether it fits or not. Right?
Martha: Right. Well, there’s that, and I guess just picking up on what you were saying, the discipline of CX has many methods at its disposal. Surveys simply happen to be the least expensive of those methods. But there are all kinds of methods. For example, we do customer service evaluations at statistically valid levels. There are customer interviews; those can also be done at statistically valid levels.
So there are other methods outside of surveys that also should be held to the standards of science. And you know, I think, maybe, Greg, it’s possible the discipline of customer experience is so new that it really hasn’t absorbed the science message yet. Like when medicine first came on board thousands of years ago, I can’t say that it was very scientific. Wasn’t it like bloodletting or…
Greg: Leeches?
Martha: Yeah. So sometimes a discipline comes on board and it takes a while for it to really catch up to science, which is important. I mean, what we determined in the Renaissance was that science really is the best way to understand the world. And so by extension, it’s the best way to understand customer experiences. It’s better than conjecture and belief. It really is the best way we know of to understand the nature of the world.
Greg: Yeah, I like that. So, moving ahead a little bit, we’ve got to talk about AI. How does AI factor into this? We’re talking about surveys, and is it going to help us? Is it going to hinder us? Where do you see that?
Martha: First of all, I love ChatGPT. We actually have our own ChatGPT engine. So, you know, I’m fully bought into AI. Although really it should be called a large language model right now, because intelligence is not where it’s at. So it does a lot of great things, that’s the first thing I’d like to say. But it doesn’t write surveys. So do not use it for that. Really, don’t use it for that. It’s a large language model, so it’s just combing through what other surveys are doing, and most surveys are not being held to a scientific standard. So don’t use it for writing surveys.
Now another very common thing people do with AI is use it for analysis of quantitative information, and it just doesn’t work, because the kind of analysis you want to do with quant is what we call segmentation analysis. That’s where you’re comparing different populations side by side. Say you have OEMs and distributors and end users. You want to be able to compare each of those populations and how they’re responding to each survey question. And that’s a little too complex for the kind of analysis AI can do right now.
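For readers unfamiliar with the term, here is a minimal sketch of the kind of segmentation analysis Martha is describing, assuming a hypothetical pandas DataFrame of responses with a segment label; it simply breaks each question out by population so the groups can be compared side by side.

```python
import pandas as pd

# Hypothetical responses: one row per respondent, a segment label, 0-10 ratings.
df = pd.DataFrame({
    "segment":          ["OEM", "OEM", "Distributor", "Distributor", "End user", "End user"],
    "ease_of_ordering": [8, 7, 6, 5, 9, 8],
    "support_quality":  [5, 6, 7, 8, 4, 5],
})

# Segmentation analysis: mean score per question, broken out by population,
# so OEMs, distributors, and end users can be compared on each question.
print(df.groupby("segment").mean(numeric_only=True).round(2))

# Respondent counts per segment: a quick check that each group is represented.
print(df["segment"].value_counts())
```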
Now another way that companies use AI is for their text analysis. That can be great. But again, remember, right now AI is not general intelligence. It’s a large language model, which means it really has to be trained to be effective. So we do experiment, mostly with ChatGPT, but with some of the other large language models too. You can put the text into one of those engines, but generally, especially for B2B, it’s just not pulling out the nuances that you need. And sometimes it’s, you know, hallucinating. It’s coming up with stuff that sounds really great. We did this recently and I thought, oh wow, this is amazing. And then I thought, wait a minute, let’s really read this. And it was wrong. It looked really good, these were good sentences, and I was just like, “Oh, done and done!” And no, no, no. So it can be quite misleading.
That said, it’s very useful if it’s trained. So you need researchers working side by side with AI. It’s easily confused, even with sentiment, which is the easiest part of text analysis. Sentiment is: are they happy or are they sad, how do they feel? But take a sentence like “Love your company, but your customer service is a real hardship.” Many AI solutions are not going to rate that for what it is, which is a mixed comment. They’re going to go with that first phrase: oh, they’re positive, they love the company. That’s not actually what they said. And that’s very simple; sentiment is very, very easy. What’s more complex is finding the meaning, the emergent themes, what customers are actually talking about. That’s much more difficult than sentiment, and even sentiment has its problems.
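As a toy illustration of the mixed-comment problem, here is a minimal sketch, not any particular vendor’s approach, of flagging a comment as mixed by scoring each clause separately rather than collapsing it into one label. It assumes NLTK’s VADER lexicon is installed, and the example comment is hypothetical.

```python
# Requires: nltk, with the lexicon downloaded via nltk.download("vader_lexicon")
from nltk.sentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
comment = "Love your company, but your customer service has been terrible."

# Score each clause on its own instead of trusting one score for the whole comment.
clause_scores = [analyzer.polarity_scores(c)["compound"] for c in comment.split(", but ")]

# If clauses disagree in sign, a human coder would call this "mixed", not "positive".
if any(s > 0.05 for s in clause_scores) and any(s < -0.05 for s in clause_scores):
    label = "mixed"
else:
    label = "positive" if sum(clause_scores) > 0 else "negative"

print(clause_scores, "->", label)
```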
But okay, let’s put that aside. What’s almost more important is what customers are talking about. What and how are they thinking, what are the topics? So LLMs can be useful working side by side with researchers, but they really do need to be trained. I hope that wasn’t too much of a shaggy dog answer.
Greg: Yeah, I mean, AI has been around for decades, but I feel like we’re in the early days of this wave of really using it in these ways. So there’s a lot of opportunity, but there’s also a lot to unpack and really understand. And one thing you touched on is that it can be really helpful to use AI, but it needs humans to make it better, just like we can use AI to make us better. It goes both ways. Maybe someday, to the general AI point, it won’t be that way. But for the time being, it can be really powerful when it’s a first draft or a second draft or something like that, when it’s part of the process, not some kind of end goal. Right?
Martha: Right. I mean, the social science way of dealing with text is what we call coding the data. That’s where you have a team of researchers, not one researcher, because you need a team to come up with verifiable, replicable results. You go through and tag comments within set protocols. And then you can sometimes compare those tags against what large language models produce, and use them to train the models. So there really are techniques for unpacking what’s in a conversation or a body of text, and these are important, proven techniques.
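One hedged sketch of how that comparison might be run in practice, with made-up theme tags and assuming scikit-learn is available: compute agreement between the researchers’ coded tags and an LLM’s tags for the same comments, and treat low agreement as a signal that the model (or the protocol) needs more work.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical theme tags assigned to the same ten comments by the research
# team (coded within set protocols) and by a large language model.
researcher_tags = ["pricing", "support", "support", "delivery", "pricing",
                   "delivery", "support", "pricing", "delivery", "support"]
llm_tags        = ["pricing", "support", "delivery", "delivery", "pricing",
                   "support", "support", "pricing", "delivery", "support"]

# Cohen's kappa measures agreement beyond chance between the two sets of tags.
kappa = cohen_kappa_score(researcher_tags, llm_tags)
print(f"Researcher vs. LLM agreement (kappa): {kappa:.2f}")
```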
Greg: Yeah, absolutely. Well, Martha, thanks so much for joining. One last question before we wrap up here. You’ve given a lot of great advice and insights already, but for those listening who know they need to inject a little more science into their programs, what’s one piece of advice on where they could start?
Martha: I think they could take a day and just study the principles of science: random selection, controlled experiments. Then see which of those apply to their surveys. I think that would be a day very, very well spent. And, of course, feel free to reach out to me on LinkedIn or our website or however you like to chat, because I’m truly always open to that conversation about how you get evidence-based, high-quality, data-driven information about the customer experience.
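As a concrete illustration of the random-selection principle Martha mentions, here is a minimal sketch, using a hypothetical customer list, of drawing a simple random sample of survey invitees from the full customer base rather than surveying only whoever happens to respond.

```python
import random

# Hypothetical customer IDs; in practice this would be the complete customer list,
# not just the customers who recently contacted support or had time on their hands.
customers = [f"cust_{i:04d}" for i in range(1, 2001)]

random.seed(42)  # fixed seed so the draw is reproducible
invitees = random.sample(customers, k=200)  # simple random sample of 10%

print(len(invitees), invitees[:5])
```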
Greg: Yeah, yeah. That’s great. Well, again, I’d like to thank Martha Brooke, Chief Customer Experience Analyst at Interaction Metrics, for joining the show.
================================================
Interaction Metrics builds scientific surveys that result in decisive outputs and actions. Want to see examples of our customer surveys? Interested in more detail about the science of CX surveys? Get in touch!
================================================