DataFramed

Episode · 5 months ago

#98 Interpretable Machine Learning

ABOUT THIS EPISODE

One of the biggest challenges facing the adoption of machine learning and AI in Data Science is understanding, interpreting, and explaining models and their outcomes to produce higher certainty, accountability, and fairness.

Serg Masis is a Climate & Agronomic Data Scientist at Syngenta and the author of the book, Interpretable Machine Learning with Python. For the last two decades, Serg has been at the confluence of the internet, application development, and analytics. Serg is a true polymath. Before his current role, he co-founded a search engine startup incubated by Harvard Innovation Labs, was the proud owner of a Bubble Tea shop, and more.

Throughout the episode, Serg spoke about the different challenges affecting model interpretability in machine learning, how bias can produce harmful outcomes in machine learning systems, the different types of technical and non-technical solutions to tackling bias, the future of machine learning interpretability, and much more.

You're listening to DataFramed, a podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science. Whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization, join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in. Hello everyone, this is Adel, data science educator and evangelist at DataCamp. A major challenge for both practitioners and organizations creating value with machine learning is model interpretability and explainability. We've seen a lot of examples of machine learning disasters over the past few years, whether it's recruiting systems rejecting candidates based on race or gender, credit scoring systems penalizing disadvantaged groups, and much more. This is why I'm excited to have Serg Masis on today's podcast. Serg is the author of Interpretable Machine Learning with Python, and for the last two decades he has been at the confluence of the internet, application development, and analytics. Serg is a true polymath. Currently he's a Climate and Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup incubated by Harvard Innovation Labs, and he was the proud owner of a bubble tea shop, and much more. Throughout our chat we spoke about the different machine learning interpretability challenges data scientists face, the different techniques at their disposal to tackle machine learning interpretability, how to think about bias in data, and much, much more. If you enjoy today's conversation, make sure to subscribe and rate the show, but only if you liked it. Now on to today's episode. Serg, it's great to have you on the show. It's nice to be here. Awesome. I'm very excited to speak with you about interpretable machine learning, your book on it, bias in data, and all of that fun stuff. But before that, can you give us a bit of background about yourself? Well, at the moment I'm a data scientist in agriculture. Yeah, there is such a thing. A lot of people are like, what? They figure data scientists are only at, like, high-tech companies, whereas I have a background in entrepreneurship. I created a few startups, and for the longest time the way I defined myself was as a web app person, someone that did all kinds of web-related things, whether on websites or on mobile devices. Yeah, that's what I did. I went from being a builder to also having a more managerial role as well. Yeah, and basically that sums it up. But I've done a bunch of other stuff. Back when I was deciding what to study, I was interested in computers for sure, but also in the applications of them. I had an interest in graphics, graphic design, 3D modeling, anything in that space, which now I get to work with from the other side, which is computer vision, which is also very interesting. Computer graphics and computer vision are like two sides of the same coin. I also once owned a bubble tea shop, and, well, something to know about all the roles I've ever worked in, even the bubble tea shop, is that data was always there. Data was throughout my journey. I just did not know it was my true love. It's like that story where your true love is hidden in plain sight and you don't realize it.
It was always in the background, and only in the last seven years have I brought it into the foreground of what I do. That's awesome. So I would love to talk about the bubble tea experience, but let's talk about interpretable machine learning, given this is the topic of today's episode. I think there's some understanding today within the field of why machine learning models need to be understood and interpreted. I'd love to set the stage for today's conversation by first understanding the why behind interpretable machine learning. So I'd love to understand, in your own words, the motivations behind why you wrote the book and why this topic is so important. Fundamentally, machine learning comes to solve incomplete problems, so we shouldn't be surprised when the solutions are also incomplete. We tend to think, okay, we achieved a high level of predictive performance and therefore the model is ready to be out in the world, and it's just going to work like a charm. And it doesn't. So that's just the nature of things. That's the nature of machine learning. And if it were that simple, if it were just like any other bit of software, we wouldn't solve it with machine learning; we have procedural programming, if statements and so on. That would be our solution. The other reason why it's important is for ethical reasons. Unlike other technologies before it, this is a kind of technology that replaces something that has become very human in a sense. It's not like other animals don't make decisions, but the kind of decisions we make have a level of, for lack of a better word, intelligence that aims at foreseeing well beyond our immediate needs.

So we're not just thinking of the next meal; we're thinking of grander things, the organization or society, the supply chain, all sorts of things. So there are a lot of reasons why we would use machine learning, and in a way these models aim to replace our cognition. So the thing is how to trust a model the same way we trust a human. I would dare to say there are a lot of reasons not to trust the human in the first place, but ultimately we want to trust something that we can understand. And if it's a black box, why are we taking orders from a black box? That's the grand scheme of why to use interpretability. The reason that brought me to the book is that I became aware of the topic when I had a startup back in 2017, 2016 and so on. I became frustrated that I couldn't debug my own models. As someone that had programmed for so long, it was just so strange that there was this thing blocking me. If I wanted to figure out why isn't this working, it would always, not always, but a lot of times, point to the model, and I was like, okay, why is the model doing this? And at the time there were very few resources on it, at least for practitioners. Everything was in academic circles, and there wasn't a clear understanding for me, coming from outside, of what it all meant. So, as someone that's obsessed with decision making, I found this intriguing and concerning. I felt it was weird to be promoting a technology I couldn't understand. That got me into a rabbit hole, and then I started to learn all the terminology and read all the papers. Then eventually some books came out. My book was the third book for practitioners on the subject to come out. I thought, how late in the game does it have to be for it to be only the third book on the subject? So that's it. That's really great. I'm really excited to talk about the book and outline the challenges and the solutions that you discuss for a lot of the interpretability challenges that data scientists face. But before that, I want to harp on some of the terms and the lingo that is used in the industry to talk about this problem. Last year we had Maria Luciana Axente, Responsible AI Lead at PwC, come on the show. One of the topics we discussed was the distinction between the different terms, such as responsible AI, explainable AI, interpretable AI, ethical AI. I think a small reason why we have these new terms is probably that they're driven by the comms departments of many organizations working on the problem. But I'd love to understand from you how you view the differences and the overlaps between these different terms. I find that, outside of industry, there's a lot of debate and confusion around the terms. First, let's separate ethical AI: it's more a mission to inject ethical principles and human values into AI. So it's an ideal, and yeah, it has a lot of parts. I'm not going to diminish the contribution, but it's not necessarily driven by really clear objectives, because everything is not only vague, since there's no perfect recipe for it, but there is also a disconnect between people working on the ethical side and the people that are actively practicing in the field. And the rest of the terms that don't have ethics or fairness in them are related to the imperfect application of that vision, because there are many reasons to interpret machine learning models that don't have to do with ethics. Also for other reasons.
It's just good business practice. So then you have explainable AI, which is another term that's used a lot, or XAI, people like to use the X instead, and then interpretable machine learning or interpretable AI, and they're used interchangeably in industry and in academia. Some use interpretable to refer to the white-box models and explainable to refer to the black-box models, or vice versa. I'm of the opposite camp. I tend to think explainable is more like a confidence term, as if you can explain something backwards and forwards. So I think it's best to reserve it for the highly interpretable, intrinsically interpretable models, linear regression and so on, because you know how it was made, you know how everything was made, you can extract all the coefficients and so on. Interpretation, on the other hand, is something that is perfectly okay to use with something that you can't understand completely. We do it all the time. The whole field of statistics is based on it. How many things are just shown through a chart? Nobody claims to know every single thing about, say, the economy, but they can on average explain a trend, and so it's an interpretation. It's not an explanation in...

...the sense that there's certainty. Responsible AI is a newer term. I think it means the same thing as explainable AI and interpretable AI, except without all the baggage. I think there's a lot of baggage with explainable AI and interpretable machine learning from where they came from and the debate over what they mean. Responsible kind of takes that away, because you expect something that is responsible to also have some ethics in it, but it's not necessarily entirely ethics. Yeah, it doesn't have that semantic confusion that the terms explainable and interpretable have, because if you actually interpret something, you are explaining it, so that kind of creates this whole mess. I actually prefer responsible, but I don't know how much it's going to catch on. So that's what I think about those terms. That's great, and I appreciate that holistic definition. So let's start talking about the book. Let's first talk about the challenges in interpretable machine learning. One of the chapters of the book outlines these challenges. Do you mind walking us through these challenges and how exactly they affect model interpretability? The three concepts I think you're referring to are fairness, accountability and transparency. Fairness is what connects to things like justice, equity and inclusion, in other words making sure that models don't have, or aren't adding, a discernible bias or discrimination. Accountability, on the other hand, is what connects to things like robustness, consistency, provenance, traceability and privacy, in other words making sure that models can be relied on over time and that you can make someone or something responsible should they fail. And then transparency is what connects to explainability and interpretability. Those are the baseline properties, in other words understanding how the decisions were made and how the model is connecting the inputs to the outputs. You can see it as a pyramid. Transparency is at the base, because you can't have the other two if you don't have transparency. In the book, naturally, I focus more on this level. However, a few chapters focus on fairness and accountability. I would say those are where the focus should be. Transparency is not by any means a solved problem, but the other two are more complex. So we're tackling the low-hanging fruit, and that's sometimes very convenient, but it's not necessarily always the best choice, because it doesn't matter if you can explain the model if it's not robust and if it's not fair. Okay, that's awesome. So let's talk about explainability and interpretability in machine learning. Do you mind walking us through the main challenges practitioners face and how exactly they affect model interpretability? When it comes to interpretable machine learning, model interpretability is impacted by three things that are present both in the data and the model: nonlinearity, non-monotonicity and interaction effects. These elements add complexity and, as I said, they're present everywhere. The most effective models match the nature of the data. So the nature of the data and the model are very highly connected. If the data is linear, it makes sense that you use a linear model. That's why linear regression has all these assumptions baked in. And if the shoe fits, also with neural networks, why not use a neural network? That's why I think neural networks' forte is unstructured data, because unstructured data has the properties that you would expect.
It has the complexity, the nonlinearity and so on. One of the trickiest is interaction effects, because feature independence is an unrealistic assumption. In other words, more likely than not, multicollinearity is present in the data. It's really hard to interpret models where many features are acting simultaneously to yield an outcome in counterintuitive and contradictory ways. It's not as simple as ceteris paribus, that is, all things remaining equal, especially when you have large amounts of data and many features. So those are the challenges in machine learning. This is often discussed in statistics, but in statistics, when a lot of these things were discussed, there wasn't this idea that data would explode the way it has, in volume and velocity and in all the different ways that we've come to call big data. Okay, that is awesome, and I'm very excited to discuss and outline with you the solutions to these challenges. But first let's talk about another big challenge that is really important within the realm of interpretable machine learning, and that is bias. You know, data is one of the most important assets in generating robust, interpretable and responsible machine learning systems, and bias is a big problem when...

...it comes to data. So can you outline where bias in data comes from? Bias can come from two sources, and they trickle down. It can either come from the data generation process or from the truth, which is where the data generation process connects to the data itself. The many biases that affect the data generation process are sampling bias, coverage bias, participation bias, measurement bias. For instance, an old-school example is if you do a survey by phone to see what percentage of the population approves of the president. The problem is that your data is not likely to be representative of the population, because not everybody is reachable by phone and not everybody would answer an unknown phone number, not to mention that those who respond and are willing to take the survey are certain kinds of people that may not reflect the viewpoint of the general population. So that's going to be a problem. That's just an example of the kind of bias that affects the data generation process. There are also data entry errors that you may have. As I said, they're random if you're lucky, and those are present all over the place. As for the truth, the data generation process may be unbiased, but it captures what I call ugly truths, that is, instances of real discriminatory behavior. There are also fake truths, which is when someone does something deceitful to trick our IT systems, and this happens all the time, but a lot of times it goes under the radar. Maybe it's not done to sabotage a system; it's just done to get some kind of benefit that the system provides. And then there are the changing truths, which is when our data generation process captures data at one point in time, but even a short time afterwards, reality might be different. Our data generation process might be biased in that sense. Okay, that's really great. And on that last point around ugly truths, given that these ugly truths are unfortunately rooted in biases we humans produce in the data generating process, will we be able to solve for this type of bias in machine learning through technical solutions? Machine learning is all about predicting the future, but there's this quote by the founder of Atari, and it's very clever. It says the best way to predict the future is to change it, and there's absolutely no shame in that. If you actually bias your models to counteract the bias present in the data, you'll eventually improve the future. So that's something that's part of the loop that people don't realize they can do. A lot of models have that property, have that ability. Say you have a pricing model for real estate, and you realize, what if I make that pricing model so that it actually counteracts the bias people already have with prices, so they don't go out on a frenzy and buy all the available homes at the worst possible time? You can do all kinds of things to do that. Another example is if you have a biased dataset that is about, say, criminal recidivism, and it's about detecting who could possibly go back to jail, who could commit a crime again after being in jail. You could argue we shouldn't have models do that, we should have human judges do that. But then you have to ask, okay, well, how good are human judges at that? If we can build a model that actually improves the false positive rate over humans, why not test it? Why not see how we can actually counteract the biases, maybe in ways in which we benefit society?
Say we see, okay, well, women are less likely to recidivate, and they have children, and so by having them in jail we're actually going to perpetuate some kind of cycle. So why don't we bias the model in such a way that it's beneficial, rather than perpetuating the bias to begin with? These are questions that I don't intend to solve. I think it's something where sociologists should get together with economists and the people working on these problems from a technical point of view. But the mindset shouldn't be, let's predict the future. The mindset should be, let's change the future for the better. I really like that perspective, especially since you're taking the data generating process and using it against itself, the biased data generating process in the form of ugly truths, and you're able to, as you said, engineer and change the future for the better. Harping on the book as well, one thing I love about the book is how grounded it is in many examples of interpretability in real-life machine learning use cases. Circling back to the earlier part of our conversation, where we discussed the why behind interpretable machine learning, I think one thing that's also super important to take into account is that the degree of risk...

...associated with a model is highly dependent on the use case, the industry, and the population affected by the model. For example, a credit risk model that determines loan outcomes has a much larger playground for potential harm than a customer churn model which is driving retention strategies for a SaaS organization. How do you evaluate the risk of a machine learning use case? How would you describe the spectrum of risk here for any given machine learning application? Well, I'd say that risk is determined by several factors. First of all, you want to know to what extent your algorithmic decisions impact stakeholders, and which stakeholders' interests you want to, quote unquote, protect. They're all weighted differently; one thing is the bank trying to protect its officials, or protect its employees, or protect its customers. To what degree are you willing to have risk at every level? Because the goals aren't always aligned. So you might think, okay, well, the short-term risk is with the bank, with its profits, but the long-term risk is that it will alienate the customers and those customers will go elsewhere. So the idea is not to be shortsighted in the assessment of these risks. The best way to understand that is to understand it as a system where stakeholders are not in isolation. There is not one magical metric. You might think, okay, well, I want profits to be my magical metric, profits for the next quarter, for instance, but a lot of these things are measured on different timescales and with different actors, all interacting with each other. So once you start to see them as a system, you could develop, for instance, a causal model and try to understand how pushing one lever in one way will impact the other. But this takes an idea of experimentation and not just inference or prediction. So it's like taking it to the next level. My book actually has an example in that sense of how you use CATE, or conditional average treatment effects, to measure what the least risky option is for the bank and for the customer in a circumstance like that. Not all problems are like that, though. There are AI systems whose risks are elsewhere, whose risks are in misclassifications of a specific kind, and so they might have instances of bias present in their model that they're not even aware of, or they're aware of but brush off and think, oh, it's not a big deal. But the level of reputation damage that comes from something like that can sink the system down completely, and nobody will want to use it, ever. So shouldn't you de-risk that? That's something you should realize when you put this out, at the very least to put out disclaimers: it has a weakness with this kind of input, or it was only meant to be used in these circumstances. And people should keep these kinds of disclaimers, because maybe an official from the company will say, oh, why don't we take this and repurpose it for that, and then you get the sort of disaster that results; it was never meant to be used in that way. Then there are also risks around how robust it is towards adversarial inputs. As AI is being adopted in a lot of different ways, we fail to realize how it can be gamed. I connect that with my early experience with the internet, how naive we were with the internet early on. We didn't realize how data could be exploited or how our credit card information could be stolen.
And so little by little everything became more and more robust, first with SSL, and then, I think it was eight or nine years ago, Google came along and said, okay, well, websites with SSL will be favored or ranked higher by our algorithm. So now SSL is a must. But in that journey there was a whole level of awareness that developed: okay, this is a technology that can be used for dangerous means, so we have to protect it so it stays useful for the rest of us that are not using it like that. And I think people will become aware of this with AI too. I just hope the outcome isn't painful for anybody, in the sense that, okay, all of a sudden, if everybody, not just celebrities, is getting deepfakes generated to game all kinds of systems, where does that leave those of us who just want to use AI systems from then on? So I think a way to improve this is to gauge value in ways beyond the monetary, and to stop the single-metric mentality that we have not only in machine learning but in business, which is, okay, we have this metric and this is what we have to chase, and all metrics are tied to this one. I think it would solve a lot of things if we started to factor environmental problems, social problems, other stakeholders of all kinds into our...

...general equation of the well-being of a company or a society; we'd be better off. And so yeah, it's going to take a very careful discussion of how to weigh everything, like what is more important, and it's a very uncomfortable discussion, but I think it has to be done sooner or later. I love the holistic perspective here, and before we talk about solutions, do you want to give maybe an explanation of the Zillow example? Because I think it's highly illustrative of the problems that you're discussing, for those who may not be aware of it. Zillow had come up with a pricing model. Hopefully I remember this correctly, because it happened many months ago. They had this pricing model that was just for the benefit of users. So users would come and it would say, okay, the estimated value of this home is this much, and so this of course had an effect on the users, because maybe it was overpriced, maybe it was underpriced, but they seemed to think it was accurate. But this had no ramifications other than inflating or deflating a market. But then they figured, why not get into the game of buying and selling homes? We have this metric, this number we can use to actually gain a benefit, because we know what the home is estimated at and we can negotiate a price based on that and so on. Sooner or later they realized that their estimates were inflated, and it probably was beneficial for them to be inflated, because who doesn't want to go and see their home and see, oh, it's actually, I don't know, fifty dollars more than I thought it was? And it's good for their stakeholders, their customers, the people that come and see that, but for them it was terrible, because they ended up losing a lot of money that way. They couldn't make it back; they couldn't flip the homes for what they thought they could flip them for, because they in a way had contributed to inflating the market, and that was very painful. They had to lay off a lot of people because they used the model in the wrong way. They had never designed it to do that. And quants have done this in finance all along; they don't necessarily use machine learning models for that. For pricing optimization they use all kinds of optimization methods that are purely math based. In the best case, the closest thing to machine learning you'll see there is reinforcement learning. And yeah, I think it was just a bad idea. Yeah, it's a very fascinating example. Given that, let's discuss the solutions to many of the challenges that we discussed. The book does an awesome job at breaking down a lot of different techniques available for practitioners looking to drive better interpretability. Can you walk us through these techniques and the crux of each of them as well? When people talk about interpreting models, everybody has interpreted a model at a very, very basic level, and that's evaluating performance. Especially once you break down the performance by cohorts or segments, you're already engaging in error analysis, and error analysis is an interpretation technique. But beyond those traditional interpretation methods, there's also feature importance, and that's probably the first one people will learn and think, oh, I didn't know this existed. It allows you to quantify and rank how much each feature impacts the model. And depending on what you're dealing with, it may vary. For a tabular dataset it's every column, even every cell.
You can break it down, but for an NLP model it depends on how you tokenized it. It might be every word, it might be every character; it really depends on how you tokenized it. And then for an image it's every pixel; that's how it's usually used. Feature summary methods, on the other hand, examine individual features and their relationship with the outcome. They include methods such as partial dependence plots, accumulated local effects and ICE, and the idea is to figure out, well, a feature might be important to the model, but how is it important? What is driving things? And so you start to see the relationship of that feature with the model. You might realize that for cardiovascular disease, as age increases, the risk increases. So you're already seeing it in those terms: it's not just that the feature is important; at certain values it has more impact, or it has a negative impact or a positive impact, or it actually goes up and down, and that's when you start to see, well, it's non-monotonic, because it's not going in one direction. And then feature interaction methods quantify and visualize how the combination of two features impacts the outcome. Usually we go bivariate, but you could even go to three levels if you wanted to, or four, but that would be crazy. But yeah, you see how they work together. Sometimes you'll realize that a feature by itself has no impact on the model, but I tend to talk about it as if it were a basketball game: there are players that are never shooting the ball into the net, but...

...they're assisting another player to do that. And that's how it is in a machine learning model. Often, well, it depends on the kind of model, but features tend to work together. There's also another kind I didn't mention: the kinds I mentioned are all global interpretation methods, but actually the vast majority of interpretation methods that exist, especially thanks to deep learning, are local interpretation methods. These are ones in which you're trying to understand a single prediction. So you're not trying to understand the model as a whole, but a single prediction. And sometimes you can even take interpretation methods, apply them to single predictions, and put them into a group, to say, okay, well, all these predictions that were misclassifications tended to be misclassifications because of this. So you can also do things with a bunch of single predictions on that level, which can be very useful. That's awesome, and I love the holistic nature of how you approach these different techniques. One of the techniques that we've heard a lot about over the past couple of years in the data science space, and that's covered quite a lot in the book as well, is SHAP values, which is short for SHapley Additive exPlanations. Can you dive into more detail about what SHAP values are and how and why they're useful? To understand SHAP, you have to understand Shapley values first. Shapley was a mathematician, and Shapley values are a method derived from coalitional game theory. Generally I explain this with a basketball analogy. So I tell people, imagine you're blindfolded at a basketball game, and a loudspeaker just announces when a player exits or enters the game by their number. They don't say their name, just their number, and you don't know if that player is any good, say like Michael Jordan, or bad, like, I don't know, I imagine Mr. Bean is a bad player. Good players and bad players just hop in and hop out at any time, and the only way to tell if they're any good is whether their presence made a difference to the score. So with this you can get an idea of which players are contributing the most, most positively and most negatively. For instance, you notice that when 23 is playing, the score increases a lot, no matter who else is playing, so you get this idea he must be really good. But imagine you're blindfolded and you're taking notes as well, so you start to quantify that, and the difference in score becomes a marginal contribution. This starts to make a lot of sense once you've quantified all the different marginal contributions, and once you run through all the possible permutations of players on the court, imagine it was an endless game, you can calculate the average marginal contribution of each player, and this is called the Shapley value. For a model, the features are the players. Different subsets of features are called coalitions, and you can calculate all the average marginal contributions for each feature; the differences in predictive error are the marginal contributions. And you're blindfolded, of course, because it's a black-box model; that's why you're blindfolded. The problem is that computing Shapley values is very time consuming. As you can imagine, running through all the different permutations of all the different features, unless you're talking about two or three features, is just enormous.
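To make the "marginal contribution" idea concrete, here is a minimal sketch of the Monte Carlo approach to estimating Shapley values for a single prediction: sample random orders in which features "enter the game" and average how much the prediction changes when a feature joins. The function name `estimate_shapley`, the toy data, and the sample count are illustrative assumptions; in practice you would likely use the shap library rather than rolling your own.

```python
# A minimal sketch of sampling-based Shapley value estimation for one prediction,
# illustrating the marginal-contribution idea described above. Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def estimate_shapley(predict, x, background, n_samples=200, rng=None):
    """Approximate Shapley values for instance `x` by sampling feature permutations."""
    rng = rng or np.random.default_rng(0)
    n_features = x.shape[0]
    phi = np.zeros(n_features)
    for j in range(n_features):                          # each "player" (feature)
        contribs = []
        for _ in range(n_samples):
            order = rng.permutation(n_features)          # random order players enter
            z = background[rng.integers(len(background))].copy()  # random reference row
            pos = np.where(order == j)[0][0]
            with_j, without_j = z.copy(), z.copy()
            # features that entered before j (and j itself) take their values from x
            with_j[order[:pos + 1]] = x[order[:pos + 1]]
            without_j[order[:pos]] = x[order[:pos]]
            # marginal contribution: how the prediction changes when j "enters the game"
            contribs.append(predict(with_j[None])[0] - predict(without_j[None])[0])
        phi[j] = np.mean(contribs)
    return phi

# Toy usage on synthetic data that includes an interaction effect.
X = np.random.default_rng(1).normal(size=(500, 4))
y = 3 * X[:, 0] + X[:, 1] * X[:, 2]
model = RandomForestRegressor(random_state=0).fit(X, y)
print(estimate_shapley(model.predict, X[0], X))
```

Even with only a few features, the nested sampling loop makes the cost obvious, which is exactly why SHAP approximates rather than enumerating every permutation, as Serg explains next.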
SHAP combines other methods in many cases to approximate Shapley values. If not, it just takes a sample; it doesn't run the entire set of permutations. So it still adheres, to a certain degree, to all the mathematical principles which exist within Shapley values. And the thing is that SHAP is often the closest we have to a principled way of calculating feature importance. This is important because it has advantages from all the mathematical properties: it has symmetry, there's the dummy property, it has a whole bunch of principles. And since there are SHAP values for each feature and observation, among other advantages, it can serve as a local interpretation method as well, which is rare, because usually a method is one or the other. I love the explanation using the basketball analogy. I think it's very useful as a mental model for how to use this technique. Now, given the proliferation of these interpretability techniques, what is the mindset that a data scientist needs to adopt when trying to understand the results coming from interpretability techniques on machine learning models? This is whether using SHAP or other techniques, as in, what level of certainty do these techniques provide, and do data scientists need to inoculate themselves from a false sense of certainty when interpreting the results of interpretability techniques? Well, certainty with machine learning can never be too high. I think that's a...

...fool's errand in any case, and you have to start seeing models the same way you see another fellow human. If you did something and I asked you, why did you do that, and you told me, because of this and this, I can't be certain. I just saw inputs and outputs, and then you gave me an explanation, but I don't know whether to believe you or not. That's called post hoc interpretability, and it can't be a hundred percent certain. But what you can do is use different methods and then get a good idea. If several people saw you do something and then they asked you, your mother asked you, your sister asked you, I asked you, everybody around asked you, and you tell them all different stories, but they have some commonality, you can take that commonality with higher certainty, and that's what I expect people to do with these models. Don't rely on a single method and call it the truth, a single interpretation method and call it the truth. Rely on several. That's always a good practice. Another thing is that a lot of them are stochastic in nature, so they might give you slightly different results every time you run them. That's certainly the case for LIME, which is another very popular method. So why not average them out, average the values or take the median? You'll get something closer to the truth in that sense, and it will be a lot easier to interpret the more you run it. There are also cases in which they have a number of steps. Integrated gradients has a number of steps, so if you increase the number of steps, of course it's going to take longer, but it's going to be a better explanation. Or kernel SHAP, which is one of the truly model-agnostic ways of doing SHAP, has a parameter for the number of samples, and if you increase that number of samples you'll definitely get something you can trust more. So there are ways of approximating better certainty within each method. But also, something I recommend is not relying on a single method, and, as I said, it's an interpretation. It doesn't have to be perfect, but it's better than nothing. It's better than blindly trusting the model and never examining what it's doing. Definitely. And given the results of some of these interpretation methods, we've discussed how to interpret them and the mindset that we need to adopt here. What are some of the common diagnoses you can make about a machine learning model that has suspect interpretability results, and how can practitioners remedy these diagnoses? You mentioned bias, and that's a big one. Fortunately, there are many bias mitigation methods that remedy this on three levels: you can remedy bias in the data, you can do it with the model, or with the predictions themselves. Another one is complexity. Models can get too complex, and that can lead to poor generalization. So regularization is what I prescribe on the model side, but preprocessing steps like feature selection and engineering can also address it on the data side. And if you're dealing with something like images, there are all kinds of preprocessing steps you can do: unless there's a reason to have a background, you can remove the background of an image, you can sharpen it, you can do whatever needs to be done, but realize that whatever is done on the preprocessing side during training has to be done on the preprocessing side during inference. In NLP we also do a lot of preprocessing.
Sometimes you realize that a form of feature engineering or selection is actually stemming or lemmatization, that is, taking away pieces of the word to make it simpler for the model, and you can certainly do that. That will make the model more generalizable, because there might be words that look like another word and pretty much mean very similar things, and just by taking only the stem you're making things clearer and less complex. Then there's also robustness, and for that we can augment the training data or address the model itself with robust training methods, or even the predictions; there are many methods that can be used for that. And for model consistency, as long as you monitor data drift and retrain frequently, you can tackle it. But it's also good practice to train using time-based cross-validation when time is an important element. For the typical cats-and-dogs scenario, it's not like we expect cats and dogs to evolve that much during their lifetime; there might be slightly new breeds, but it might not make a difference if images are two years old. But in a lot of cases, like I find in my own work in agriculture, every season is different, different weather, different everything, so it is good to update the models a lot and use cross-validation to make sure that no matter what year...

...you use, the model is going to be robust and consistent. That's really awesome, and especially on that last point, I think the best example of consistency has been through COVID, right? FMCG models that are highly time-series based, when it comes to, for example, predicting stocks of items like toilet paper, have a completely different dimension pre-pandemic than post-pandemic. Yeah, for sure. Awesome. So as the industry evolves, machine learning is further adopted and the need for interpretability grows, what do you think the future of interpretability will look like? I think there will definitely be a change of mindset. I think it will come naturally, organically, if you will. But the problem right now is that model complexity is seen as the culprit of all ills in machine learning, and it isn't always, I think. After all, the things we try to solve with machine learning are complex, or they should be. Maybe there are some novices out there taking a very simple tabular dataset and throwing deep learning at it, but I don't think that's a big issue. I do wonder, though, whether we're taking the brute-force approach too far. There's been an arms race with model complexity. It's not my area, but I have to wonder if we need to leverage trillion-parameter language models for natural language processing tasks. There has to be a simpler, less brute-force way of achieving the same goals. After all, humans only have eighty-six billion neurons, and we only use a fraction of them for language at any given time. So I have to wonder what is going to come of this arms race and whether it's going in the right direction. And that, of course, is going to change depending on whichever direction prevails. If it becomes more brute force, it might hit a limit at which interpreting it becomes impossible, at least through traditional means, or we'll have to use more approximation-based things that will give us a less reliable interpretation. The other big issue we've discussed throughout the session is bias. Generally the idea should be to train machine learning models that really reflect the reality on the ground. So the idea is, okay, let's go a bit back to basics. Think about what it's like to have a data-centric approach. Forget about the model, let's go back to the data. Understand how to not only improve the quality in order to achieve better predictive performance but, as I said, actually improve outcomes and change the flow of things, because a lot of the time we don't realize what's coming out of the technology. It's like when social media came out: everybody said, oh, this is wonderful, we can finally communicate in these very rich ways no matter where you are in the world, but we didn't realize what could happen from that, and once the genie is out of the bottle, how do we put it back in or how do we improve things? We can't let the same thing happen with AI, and I think the data-centric approach forces us to confront that. And then there are, of course, things that can improve with model interpretability. There are more and more methods coming out, methods that I find super promising, and it will take a few years for them to reach the pipelines that are in production.
But I think something that will make a transformative change in the field is this: the nuts and bolts of machine learning right now are data cleaning, data engineering, training pipelines, the drudgery of writing all the code to orchestrate training and inference. In coming years, new and better no-code and low-code machine learning solutions will displace these quote unquote artisanal machine learning approaches, and I believe the best ones will make interpretability prominent, because once creating a sophisticated machine learning pipeline is less than one day's work in a drag-and-drop interface, we can devote the rest of our time to actually interpreting the models and improving them. So there will be more iterations in that sense. Right now the iterations are like, oh, let's achieve predictive performance, let's get better predictive performance. But once you have a no-code system maximizing predictive performance for you automatically, then what will the data scientist do? I think the best thing we can actually do is interpret models, and we'll learn how to extract the best value out of the models and improve them in other ways we're not improving them through right now: how to make them fairer, how to make them more consistent, how to make them more robust, how to package them in such a way that all these properties are outlined, their weaknesses, their strengths, data provenance, a whole bunch of things. That's really awesome. And going beyond the future of technical interpretability, you mention in the book that there are legal, social and technical standards and procedures that need to emerge to...

...really realize the full potential of interpretable machine learning. What do you think needs to change in terms of regulation and how we organize ourselves as a species and society to realize progress on that front? Yeah, there are a lot of things I would propose, and there are a lot of people thinking about these solutions. For one, certification is a must. Before deployment you should be able to certify models for things like adversarial robustness. There are only very few methods and standards for that right now, but they'll evolve. Fairness is another one: right now there are so many different metrics for fairness, but you could at least specify which one you use. And then even a level of uncertainty: there's a method called sensitivity analysis, which I discuss in the book, which can tell you what level of certainty you can have in the outcome, not so much in the interpretation method; it's about the outcome. Then the model card, that's another good one, and it already exists. A lot of people have criticized it because you think, okay, well, that's so simple, but it's an important step. Along with the model, you deploy a card which tells stakeholders important properties about the model, such as where the data came from, potential weaknesses, what intended uses you prescribe for it, etcetera. Whenever we have a product out there, it just makes sense to have a label with all the ingredients and everything; I see it like that. Also, I advocate for abstention. If, in a high-risk model, you have a lot of low-confidence predictions, for instance telling a bank customer, okay, your loan has been denied because you're deemed risky, when really it's too close to call, I think that's ridiculous; just leave it up to a human. There should be a band in which you say, okay, well, this is too close to call, have a human do this, and then you can improve the model. People can look through these cases and say, okay, this is why the model thinks it's like that, because this customer has slightly less collateral than we would expect, but I would give it to this person because they actually have this, and that's not in our data. So maybe there are ways we can improve the model by actually catching these cases and sending them to a human. Then we have monitoring. I think it should include more than predictive performance. In addition to checking for data drift, we could continue to monitor model robustness, feature importance or fairness metrics. There are just so many things we could monitor for the model. Another thing I advocate is a manifest: something that the model carries, that you save, that can let auditors trace model decisions, much like a black box does in a plane. I think that's super important, and nobody should be able to tamper with it. If there's one use case for blockchain, it's definitely that. Well, no, there are a ton of use cases for blockchain, but I think that's definitely one more. And then expiration. One of the things that should be in the manifest is whether the model can auto-destruct, and I definitely think it should. Models should have a strict shelf life, like milk; once it goes bad, it should be taken out of the stack as soon as it hits the date, no questions asked. And then a retraining procedure: if it's in place, you'd aim to replace the model just before that date. I think that's very important, and yeah, that's what I would suggest.
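The abstention idea Serg describes, holding back predictions that fall in a "too close to call" band and routing them to a human, is easy to sketch. The function name `predict_or_defer`, the 0.35 to 0.65 band, and the toy data below are illustrative assumptions, not anything prescribed in the book.

```python
# A minimal sketch of abstention (a "reject option") for a binary classifier:
# predictions inside an uncertainty band are deferred to human review.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def predict_or_defer(model, X, lower=0.35, upper=0.65):
    """Return labels 0/1, with -1 where the model abstains and a human should decide."""
    proba = model.predict_proba(X)[:, 1]          # probability of the positive class
    decisions = (proba >= 0.5).astype(int)
    defer = (proba > lower) & (proba < upper)     # inside the too-close-to-call band
    decisions[defer] = -1                         # -1 means "send to human review"
    return decisions, defer

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
labels, deferred = predict_or_defer(clf, X)
print(f"Deferred to a human: {deferred.mean():.1%} of cases")
```

The deferred cases, once reviewed, can be fed back as labeled examples, which is the improvement loop Serg mentions.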
Yeah, a small list, definitely a small list, but I really appreciate this holistic perspective. I think this really is one of the few interviews I've had where someone lays out such a holistic perspective on the future of machine learning. So finally, Serg, as we close out, do you have any final words before we wrap up today's episode? Nothing in particular. I think people should be excited about where we are in this field at this time. I think it's a good time to be joining, or even, as a user, seeing what's coming out. As I say, I equate it to the mid-level maturity that the internet had, say, in the early 2000s, like 2001, 2002. Things were starting to work; all of a sudden the internet wasn't the crap that it was before that. People were excited about it, but they were like, these websites are so ugly, and this one never works, and this browser shows me this and not this other thing. And so I think that's where we're at. On the development side we're still going through growing pains; we're still writing our code by hand, as I said, every single line of it, and I think, okay, well, all the stuff we do is going to seem very antiquated very soon, and I'm excited for that as well. But one of the good things of being in this space at this time is that you get in at the ground floor. You learn how to see things without all...

...the abstraction you'll see in a few years, because once it's all drag and drop, it's going to open the floodgates for less technical people, and that's a good thing, but at the same time it's not going to allow people to see the guts of things the way we see them right now. That's awesome. Thank you so much, Serg, for coming on DataFramed and sharing your insights. Thank you. You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time.
