DataFramed


#9 Data Science and Online Experiments at Etsy

ABOUT THIS EPISODE

Etsy, online experiments and data science are the topics of this episode, in which Hugo speaks with Emily Robinson, a data analyst at Etsy. How are data science and analysis integral to their business and decision making? Join us to find out. We'll also dive into the types of statistical modeling that occur at Etsy and the importance of both diversity and community in data science.

In this episode of DataFramed, a DataCamp podcast, I'll be speaking with Emily Robinson, a data analyst at Etsy. Emily and I will be talking about online experimentation at Etsy, an e-commerce website focused on handmade or vintage items and supplies, and how data analysis and data science are essential to their business. I'm Hugo Bowne-Anderson, a data scientist at DataCamp, and this is DataFramed. Welcome to DataFramed, a weekly DataCamp podcast exploring what data science looks like on the ground for working data scientists and what problems it can solve. I'm your host, Hugo Bowne-Anderson. You can follow me on Twitter at @hugobowne and DataCamp at @DataCamp. You can find all our episodes and show notes at datacamp.com/community/podcast. Hey Emily, and welcome to DataFramed. Hi Hugo, thank you for having me. Such a pleasure to have you on the show. I'm super excited to have you here because we're delving deep into what data science can do, what it's capable of, and what it looks like on the ground for working data scientists and data analysts, and it's a really great opportunity to discuss these things by talking about online experiments at Etsy. But before we delve into your work at Etsy, I'd like to find out a bit about you. How did you get into data science? Sure. I have a social science background with a pretty heavy statistics component: I did a statistics minor in undergrad and then went straight from there into a PhD program in organizational behavior at INSEAD, which is a business school in France and Singapore. The plan was five or six years of that, but while I was in it I decided to opt out with the master's, so I did two years instead and finished up in June 2016. When I was thinking about where I wanted to go from there, data science was appealing for a couple of reasons: it was a growing, exciting field, and it fit with my background in statistics and academic social science research.

The research I was doing there has a very similar process to tackling a data science question: you come up with the question, you design a way to answer it, you collect and analyze the data, and then you have to present it. In both social science and data science research you can be presenting to a lot of different audiences — people who are non-technical or non-experts, people who are vaguely in your broader field but maybe don't know much about your narrow area, or people who have ten more years of experience in your narrow area and are judging your work. So I learned how to tailor my work to these different audiences. In thinking about how to best transition, I decided to go to Metis, which is a three-month, full-time data science bootcamp. I wanted to fill in the gaps I had in Python, machine learning, and Git version control, and it also gave me time to build up a portfolio of data science projects that I could share through a blog and GitHub. And if you want to learn any more about my background, I was interviewed by the R Forwards group, which is a task force working to promote women and other underrepresented groups in the R programming language, and I believe the link will be included in the show notes. It definitely will. And this is great — in a matter of maybe a hundred and twenty seconds you've touched on several points which are really interesting for people trying to break into data science and for established data scientists: the importance of statistics, experimental design, communicating results, which is incredibly important, building a portfolio, and filling in the gaps. You mentioned you had a statistics background, but you filled in the gaps when need be with respect to programming, version control, these types of things.

Before we dive into talking about your work at Etsy, could you speak to how much statistics and/or programming you need to know in your work, and how much you can pick up on the fly? Sure. I think something interesting here is that, depending on which subspecialty of data science you're in, the programming and statistics needs vary. For example, we have some machine learning engineers here at Etsy who work on the search ranking team, and they generally come from a pretty heavy programming and computer science background but don't know much statistics. Versus in my work, I definitely do some programming, but I certainly would not say you need a degree or even multiple computer science classes to do it, and I use a lot of statistics because I'm working on experimental design. One nice way to think about this is Type A versus Type B data science, which was shared in a Quora answer: Type A is the analyst, sort of a traditional statistician, and Type B is the builder, building machine learning models. I would also say you definitely can learn on the job. Before I came to work at Etsy I didn't know much SQL — I just knew the basics I could learn through a short online course — and I've been able to pick up a lot here. Part of what helped is that I was already able to offer something in statistics and in my R skills and making graphs, so I was offering something while I was learning these other skills, like SQL and Scalding, which is what we use to write our big data jobs. So you started with a math-heavy background, particularly in stats, and filled in the gaps with respect to your programming chops. It's commonly said that that may be easier than knowing a lot of computer science and trying to learn all the stats and calculus later on. What are your thoughts on that? That's probably true, and I think that's partly because there are a lot of great online resources for learning programming.

DataCamp of course is one, but there are many others. I'm not as familiar with the stats ones, just because I haven't had the need to learn it that way — I learned it more traditionally through my undergrad courses. I also think the motivation can be more difficult, in the sense that with programming you get immediate feedback: wow, I can make this graph, I can change this data set, and I wasn't able to do that an hour ago. Whereas with stats, okay, you can run a linear regression and sort of understand it, but you could kind of do that before anyway, even having no idea what's going on behind the scenes or what the assumptions are, so it can take a little longer for a stats foundation to pay off. Can you tell us a bit about Etsy and its business model — what happens there? Sure. Etsy is a global marketplace for unique and creative goods. We don't have any inventory ourselves; instead we connect our roughly 1.9 million sellers to over 30 million buyers, and we have about 45 million unique items. The sellers range from people who just want to make a little extra money selling their knitting to people who do this full time, have a small team working with them, and sell thousands of items a year. Etsy makes money when our sellers sell, because we take about 3.5 percent of the sale price, and we make a little bit of money when they list — it costs twenty cents to list an item — and we also offer some seller services like shipping labels and promoted listings. Basically, overall, we do well when our sellers do well, and that's how Etsy continues to grow. Our mission is to help our sellers thrive and buyers find these creative and unique goods. And you said about 1.9 million sellers — where are all the sellers?

They're in almost every country in the world and close to every county in the United States. It really is a global marketplace, and that's one of the great things: if you're interested, you can get something right from your own little neighborhood — for me maybe Brooklyn or Manhattan — but you can also get something from Japan, or order a special Icelandic clothing item directly from a seller. Fantastic. So what type of business challenges does Etsy face that data science and data analysis can help with and impact? I would say data science is really spread throughout the company. We have a small team of data scientists who work on our recommendation modules. If you've been to Etsy, we have a section called Our Picks for You, where we've basically tried to learn what you like and what other items you might be interested in. We also have data scientists working on our ranking models: when you search for jewelry, you get millions of items back — how do we pick which ones to show on the first page, and what do we take into account? Maybe where you're searching from, or what time of day. And then, finally, we have the data analyst team, which I'm on, and we do data science for human consumption. We embed with different teams — I work with the search team, so I specifically work a lot on experimentation: designing, analyzing, and planning the experiments. But we also have people working with the marketing team, and they can do things like make it really easy, if the marketers want to target an email campaign at people who bought in the last ten months in Germany, for our marketing analysts to help them access which users those are. Or they can make dashboards.

We use Looker, which is a business intelligence tool that basically queries our SQL database, so even people who don't know any programming, if we've set up a dashboard for them, can ask: what are our total sales over this year? They can easily drag and drop and figure out, okay, what if we look at Germany? What if we look only at sales by sellers in March, maybe in Iceland or in Europe? So we're really meant to release all this data that Etsy collects as an e-commerce website and help people use it to make product decisions. Great, and so that's an internal tool. Yes — as data analysts we mainly work internally facing, versus the data science team, which works directly on products that people outside Etsy see, like Our Picks for You and the ranking algorithm. I really want to delve into what type of experiments you work on and enjoy the most in a second, but first I'd like to jump into the other types of challenges you mentioned. One was a recommendation challenge, which is a really exciting, burgeoning field — we see that when we go to Netflix or Amazon and get recommendations that are on point occasionally but, for the most part, maybe relatively odd, so there's a lot of work to be done there. You also mentioned the ranking algorithm, which is: when I search for something, it ranks how the search results come in, right? And I suppose there's some sort of matching problem happening there as well. Yeah. I knew almost nothing about search before I started at Etsy, so one of the first things I did was just learn how search works, and there are really two problems. One is: what items do we even return at all for a search — how do we retrieve items? And then the next step is: given we now have the set of, say, a million items that have been returned, how do we rank them?
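As a rough illustration of the two-stage structure Emily describes — retrieve a candidate set, then rank it — here is a minimal sketch. The item fields, the keyword-match retrieval, and the hand-weighted score are invented for illustration; they are not Etsy's actual retrieval or ranking logic.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: int
    title: str
    clicks: int        # hypothetical engagement signal
    recency_days: int  # hypothetical freshness signal

def retrieve(query: str, catalog: list) -> list:
    # Stage 1: pull back every item whose title contains all the query terms.
    terms = query.lower().split()
    return [item for item in catalog if all(t in item.title.lower() for t in terms)]

def rank(candidates: list) -> list:
    # Stage 2: order the candidates with a simple hand-weighted score.
    return sorted(candidates, key=lambda item: item.clicks - 0.1 * item.recency_days, reverse=True)

catalog = [
    Item(1, "Harry Potter mug", clicks=120, recency_days=3),
    Item(2, "Harry Potter scarf", clicks=80, recency_days=30),
    Item(3, "Ceramic mug", clicks=500, recency_days=1),
]
print([item.title for item in rank(retrieve("harry potter", catalog))])
```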

Generally at Etsy we've been focusing more on the ranking problem, because more often than not we already have enough items — so let's make sure we have the relevant ones. But something that's really challenging here is that at Amazon, maybe eighty percent of the time there's a clear thing you're looking for. You search 'vacuum cleaner': you probably want the best vacuum cleaner — okay, some people may want the best regardless of price, some people may be more price sensitive — or you search 'Harry Potter' and maybe you want the Harry Potter books, maybe the movies, and maybe after that Harry Potter merchandise, but you have a pretty good idea of what they're looking for. Versus if you search 'Harry Potter' on Etsy, we return hundreds of thousands of things and we have no idea what people are looking for. Some people could be looking for a mug or a poster or a scarf or hundreds of thousands of other things. So how do we figure out what's going to be useful for people? One way we're trying to tackle that is to have users help us, with new search filters. We have something called guided search: if you search one of these vague, broad queries, we suggest, hey, would you like to search instead for 'Harry Potter mug' or 'Harry Potter scarf', letting people help us figure out what kind of items they're looking for. Or we offer inspiration if what they actually want is to just browse around — we can help with that too. That's awesome. Is there some sort of buyer onboarding? We don't have buyer onboarding right now, but we are trying to make a more guided experience. One thing we did for the first time this holiday season was offer a holiday gift guide, so most likely if you go to the home page, or maybe the bottom of a search page, you'll see personas like the 'spirited sipper' or 'book club MVP', and if you click on those you'll find all these items.

And what I'm really excited about is that we didn't do the traditional 'gifts for him' or 'gifts for her' or 'gifts for your grandma'. Instead we adopted these personas to really try to help people get inspiration and figure out what Etsy can offer. We do have Editors' Picks modules where we do things like 'gifts for your mother under thirty dollars' around Mother's Day, but something we're working on more is: how do we let buyers know what Etsy is? Sometimes people come and don't realize it's a marketplace, and don't realize they're buying directly from sellers and not from Etsy. So how do we build that understanding and excitement about Etsy when people come for the first time? You mentioned that you work as a data analyst. Could you speak more to the difference between being a data analyst and what data scientists do, and how that involves human consumption? Sure. This is something that's really in flux across the industry: titles. For example, we recently had someone move to Spotify from our data analyst team, and she now has the title data scientist but has said she's working on very similar things. In fact Spotify, and I believe Facebook as well, recently changed their data analysts' titles to data scientist. But here at Etsy what it means is, as my boss puts it, that the data science team works for machine consumption and the data analysts for human consumption. The data analysts embed closely with product teams — I work with search, another data analyst might work with marketing, another with our seller team, but we all work with partner teams — versus the data science team here works a little more in isolation on these specific modules and on things like our promoted listings.

That's where a seller says: on the search page we have certain rows dedicated to these promoted listings, and sellers basically bid — okay, every time I get a click, I'm willing to pay twenty or thirty cents. So there's a very complicated model that figures out, every time you search for something, how we fill those promoted listing rows, and that's something the data science team works on. In that case they're building models, and maybe they're running their own experiments, but they're very rarely delivering something to other people to help them understand the data more. Whereas on the data analyst team, that's usually our primary goal: helping people make decisions. We launch this experiment, we advise our partner team, we make this data more accessible for them, or we help them understand, hey, we have this great opportunity to change this page because we think it's not doing as well as it should, or this page has a high bounce rate. Sometimes it's proactive from us — we make these discoveries — and other times it comes as requests from our partner teams about what they're interested in learning. It's now time to dive into a segment called Data Science Toolbox, with Mike Lee Williams, a research engineer at Cloudera Fast Forward Labs. Hi, Mike. Hi, Hugo. Let's talk today about machine learning interpretability. I can't wait. What does it mean for an algorithm or a machine learning model to be interpretable? If you ask people who work on this, you'll get a different answer from every person, but the definition I like the most, because it's the most pragmatic, is that an interpretable algorithm is one whose predictions you can explain. So what's an example of an interpretable algorithm? My favorite example — and it's not really machine learning, but I like it because it's so clear — is the Apgar score that babies are assigned at birth. When that happens, nurses measure five things, things like skin color and pulse, and give each of those things a score from zero to two. They add the five numbers together and get a number out of ten, and that number turns out to be predictive of the baby's long-term health. Obviously it's a heuristic. In the language of machine learning and data science, we could call this a linear model — it's a funny linear model because we've set every coefficient to be one, but a linear model it is.
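To make that concrete, here is the Apgar score written out as the "funny" linear model Mike describes, with every coefficient set to one. The five criteria are the standard Apgar components; the code is purely illustrative.

```python
# The Apgar score as a linear model: five sub-scores of 0-2, every coefficient 1.
APGAR_CRITERIA = ["appearance", "pulse", "grimace", "activity", "respiration"]
COEFFICIENTS = {criterion: 1 for criterion in APGAR_CRITERIA}

def apgar(subscores: dict) -> int:
    # A sum you can do in your head, which is exactly what makes it interpretable.
    return sum(COEFFICIENTS[c] * subscores[c] for c in APGAR_CRITERIA)

print(apgar({"appearance": 2, "pulse": 2, "grimace": 1, "activity": 2, "respiration": 2}))  # 9
```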

We could make it more accurate by using machine learning to build a nonlinear model — we could do all sorts of things to make this model more accurate — but we would lose interpretability. And the fact that you can do this calculation, this Apgar score, in your head means that you, as the nurse, can quickly figure out what needs to change to improve the score. This less accurate but more interpretable approach saves lives precisely because it's a model whose predictions you can explain. So, in general, why is it important for algorithms to be interpretable? There are lots of reasons, but I want to focus on what I think is the most important: a model whose predictions you can explain is one you can trust, and obviously that's good for business. If I can trust a model to be accurate, I feel warm and fuzzy. But there are many contexts where that trust is a regulatory or ethical requirement. A bad model is dangerous — it's discriminatory or otherwise unsafe. The concern is that a model might seem fine when you validate it against your held-out data, but the question is: is it getting those examples right for the right reasons? Is it really generalizing, or is it embedding dangerous misunderstandings about how the world works that you're only going to find out about when you unleash the model on the real world? It's much easier to spot that kind of problem — a model that is right for dangerously wrong reasons — if the model can offer an explanation of why it thinks it's right. Yeah, and isn't there a tradeoff between how well an algorithm can perform and how interpretable it is?

That's right — you get nothing in this world for free, and the bad news is that, generally, the more accurate a model is, the less interpretable it is. A nonlinear model like a neural network is super accurate, and if you do feature engineering — if you use your brain power to make it even better — it's going to be more accurate still, but both of those things make it harder to explain the behavior of the trained model. If you think back to that Apgar score we talked about: when a nurse is reasoning about what changes would drive the score up or down, what they're really doing, without realizing it, is a very simple form of calculus. And good luck doing any form of calculus if you've got something like a random forest with engineered features. Now, that's not to say there's no place for uninterpretable, very accurate models — it's a tradeoff that depends on where you're going to be using the model. Right. So what work has your team at Cloudera Fast Forward Labs been doing in this area? Our job is really to help our clients do machine learning better, and we came at this problem with essentially that tradeoff in mind — the tradeoff between interpretability and accuracy — and with the question: can you have it both ways? Can you have a model that is accurate and interpretable? The answer is maybe, kind of. We found two ways. The first is to start with an interpretable model and, very carefully, in a very controlled way, give it some freedom to be a little more accurate. As an example, think back again to the Apgar score, where we have this linear model whose coefficients are all one: what if I allowed the coefficients to be integers, but numbers other than one? We'd retain some of the interpretability but potentially significantly increase the accuracy. That specific approach, by the way, has a real name — supersparse linear integer models — if you want to look it up. The other approach we took, and I think this is a more generic approach of use in more situations, is to build a model however you like and then apply techniques after the fact to try and interpret it. We used a tool called LIME, which was created by Marco Ribeiro at the University of Washington and his collaborators. It takes a model, treats it as a black box, and tries to understand what's going on inside that black box. The way it does that is it slightly perturbs the input features and makes a note of the effect of those perturbations on the output of the model. So if I change feature one by a small amount and it changes the output, I've now learned something about the relationship between feature one and the output, and that's a kind of explanation.
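A stripped-down version of that perturbation idea — not the LIME library itself, and much simpler than its locally weighted linear fit — might look like the following, where the black-box model and its two features are invented stand-ins.

```python
import numpy as np

def black_box(x: np.ndarray) -> float:
    # Stand-in for a trained model: a fixed logistic function of two features.
    return float(1 / (1 + np.exp(-(0.8 * x[0] - 1.5 * x[1] + 0.2))))

def local_explanation(x: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    # Perturb each feature slightly and record how much the prediction moves.
    base = black_box(x)
    slopes = np.zeros_like(x)
    for i in range(len(x)):
        perturbed = x.copy()
        perturbed[i] += eps
        slopes[i] = (black_box(perturbed) - base) / eps
    return slopes

example = np.array([1.0, 0.5])
print(local_explanation(example))  # which feature pushes this prediction hardest?
```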

We used it to build a proof-of-concept prototype of a customer churn model. Customer churn is the rate at which you lose customers, and it's great to be able to identify customers you're at risk of losing, but with a tool like LIME you can go further: you can say not only am I going to lose this customer, but here are the things about that customer that the model finds concerning. That raises the possibility of not just understanding what's going to happen in the future but changing the future, which is really a machine learning superpower. Thanks, Mike, for that great introduction to machine learning interpretability. Listeners, we'll include more resources in the show notes. Time to get straight back into our chat with Emily Robinson. So when people hear the word experiment, a lot of the time they'll think of beakers and pipettes and lab coats. What are online experiments and how are they run? Online experimentation also goes by the name A/B testing, and generally what it means is that you randomly assign visitors to your website to one of two, or maybe three or four, experiences. For example, if you have a product person who thinks the buy button in red rather than white will make people buy more, you can test that with experimentation: you randomly assign people to see the red or the white button, and because of that random assignment, on average the only difference between the groups should be the button, so you can compare purchase rates.
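Once such an experiment has run, the comparison Emily describes can be done with a standard two-proportion z-test. The counts below are made up, and this is a generic sketch rather than Etsy's in-house analysis tooling.

```python
from math import sqrt
from scipy.stats import norm

# Made-up results: visitors who clicked "buy" under each button colour.
conversions = {"red": 620, "white": 655}
visitors = {"red": 10_000, "white": 10_000}

p_red = conversions["red"] / visitors["red"]
p_white = conversions["white"] / visitors["white"]
p_pooled = sum(conversions.values()) / sum(visitors.values())

se = sqrt(p_pooled * (1 - p_pooled) * (1 / visitors["red"] + 1 / visitors["white"]))
z = (p_white - p_red) / se
p_value = 2 * (1 - norm.cdf(abs(z)))   # two-sided test
print(f"red {p_red:.2%} vs white {p_white:.2%}: z = {z:.2f}, p = {p_value:.3f}")
```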

This sounds pretty simple in theory, but it's actually pretty complicated, and I gave a whole talk about it at a meetup in New York, which you can watch online. The other thing is that it's newer: this has only been possible in the last ten, maybe twenty years, so you have a lot of papers coming out now about best practices around online experimentation. And the reason we do this: people might think, okay, why don't you just watch conversion rate over the past week, launch your new feature, and then watch conversion rate again? The idea being that you'd see a flat line, then a spike up, then another flat line — great, your new feature is better — or maybe it just stays flat and your new feature failed. But conversion rate, or the other metrics you're interested in — conversion rate being what percentage of visitors buy, or maybe you're interested in clicks — are very noisy measures that change because of holidays or bugs or time of day or who knows what. So you're really not able to look at a time series to tell the difference, because generally the change you're going to make is pretty small: it's not changing click-through rate from sixty to seventy percent, it's changing it from sixty to sixty-one percent, which is well within the normal variance. Online experimentation gives us a way to confidently say, okay, your change made a difference or it didn't. Are there any challenges with the random assignment part of the experiment? We have our own in-house system that takes care of it for us: you have a name for your experiment and you hash that with a cookie or device ID. Generally that works pretty well.
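A minimal sketch of that hashing idea, assuming a generic bucketing scheme rather than Etsy's actual implementation, could look like this:

```python
import hashlib

def assign_variant(experiment: str, browser_id: str, variants=("A", "B")) -> str:
    # Hash the experiment name together with the cookie/device ID so the same
    # browser always falls in the same bucket for a given experiment.
    digest = hashlib.sha256(f"{experiment}:{browser_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("buy_button_color", "cookie-123"))  # stable for this cookie
print(assign_variant("buy_button_color", "cookie-123"))  # same answer every time
print(assign_variant("buy_button_color", "cookie-456"))  # a cleared cookie means a fresh draw
```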

Where that can run into problems is that if you clear your cookies you'll get a new assignment even though you're the same person, and the same thing happens across devices: I might be browsing in the Etsy app and then go to my desktop, and I could be in version A in the app and version B on my desktop, because those have two different cookies or device IDs that are randomly assigned, even though I'm one person. I don't know much about the engineering side, but I think it's a relatively straightforward system to implement, and we have it running smoothly; we just really have to remember that we're randomly assigning browsers or app installs, not people. I love the red and white buttons example for explanatory purposes. Are there examples of experiments that you found more exciting that you could speak about, without encroaching too much on company strategy? The nice thing is I can generally share which experiments were a success and which ones we're running, and one reason I can do that is that anyone could discover it themselves when an experiment is running — say you open a bunch of incognito windows — but who especially discovers it for us is our sellers. We're always surprised: we change a ranking algorithm, or we change something for ten percent of the population in Canada, and a Canadian seller will find out and post on our forums about it. So we've also now started proactively communicating what experiments are going on, so our sellers can understand more about why their stats or sales might be changing and why we're running these different tests. One interesting one: we ran a test at the bottom of our search page to add a module with your recently viewed items, so if you had viewed any items it would get populated — if you'd seen more than six, with six items and a link to see more. I wasn't especially confident before we started this. We didn't know how many people reach the bottom of the page, because we didn't have any event that fired when that happened.

We could kind of estimate it: if you click to the next page, we guess you saw the bottom of the page, but there might be a lot of people who dropped off without clicking next, so that's probably an underestimate. So I didn't think it was going to be that impactful, and I also wondered how helpful it would be to see items you'd already seen once again. But actually that was a very successful experiment. One thing it influenced is that we did start to have modules and events firing when people reach the bottom of the page, and it's something we've iterated on: we've tried different content, so we're looking at having the holiday gift guide there, or an Our Picks for You module, or multiple modules, not just recently viewed. So that was a really interesting one to work on. That's really cool. Something else before we move on: you mentioned that A/B testing and online experimentation can sound simple in theory but are actually quite complicated. There are all types of challenges with running online experiments and A/B testing in general — you can hack your p-values, and there are all types of different techniques people use, whether it's Bayesian A/B testing, multi-armed bandit and reinforcement learning solutions, or traditional frequentist statistical hypothesis testing. What do you do, what do you enjoy doing, and where do you see this field going? I know I just threw a whole bunch of questions at you, but let's riff on that for a second. Sure. We use standard frequentist statistics, and part of why I thought this would be easy is, as I mentioned, that Etsy has our own in-house system and it does things like calculate the metrics for us. So I can log on every day when our new data has come in and see how many people are in the experiment and what percentage bought in A versus B; it does the statistical test and tells you the p-value. So okay, great, that sounds pretty straightforward.

We're not doing multi-armed bandits or anything like that, but even within that, one thing you mentioned is peeking, and what's been exciting for me to work on is how I can help educate people to run better experiments. One thing that complicates this a bit is that people usually want their experiments to work — of course, both because they think it's going to work, since that's why they bothered to try the new thing, but also because it credits their team: hey, we got X amount of extra money for Etsy with our experiment because we increased conversion rate. So one thing that's been challenging is helping people without a stats education understand why peeking is bad and what it does. One thing we recently implemented was a system around this, where we worked with the engineering team that owns the experimentation system to come up with a couple of different ways of displaying data. Essentially, if you have eighty percent power for a one percent change in your metric — which is generally the standard we use for deciding how long to run an experiment — we will display either 'no change', if the p-value is high, or an X percent increase or decrease. But if you're not powered yet, we say 'waiting for data'. Power depends on the percent change, though: it may take ten days to detect a one percent change, but if you actually have a five percent change, it only takes three days. So how do you help people discover that big change without also creating a peeking problem, where if you check every day whether p dips below 0.05, that's going to happen much more often than five percent of the time?
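The "how long do we need to run this" question Emily mentions is a power calculation. A rough sketch, using a textbook approximation for comparing two proportions and an assumed 2% baseline conversion rate (not an Etsy figure), shows why a five percent lift is detectable so much faster than a one percent lift:

```python
from scipy.stats import norm

def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    # Textbook approximation for a two-sided test comparing two proportions.
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2

for lift in (0.01, 0.05):
    n = sample_size_per_variant(baseline=0.02, relative_lift=lift)
    print(f"{lift:.0%} relative lift -> roughly {n:,.0f} visitors per variant")
```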

Just for the A/B-testing newbies out there — and correct me if I'm wrong — the idea of peeking is that if you check the results every minute or every five minutes, you will occasionally see the white button performing a lot better than the red purely from random fluctuations. Exactly, and my brother David Robinson actually did a post on this, specifically about Bayesian A/B testing and early stopping. He started out by asking: what happens if you run an experiment for twenty days where there's no difference — you simulate this null experiment — but you check every day and stop if p dips below 0.05? What he shows is that this doesn't happen five percent of the time, which is what's sort of promised — okay, sometimes you're going to get a false positive — it happens over twenty percent of the time. That's what you have to guard against: this increased false positive rate if you're checking all the time. So what we've done is partly education, but also, if your p dips below 0.01 we do display the change even if you're not yet powered for a one percent change, and we're working to educate people on how to tell whether that's peeking or a real change.
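The peeking effect David Robinson simulated is easy to reproduce in miniature: simulate an A/A test with no true difference, compute a p-value after each "day", and stop the first time it dips below 0.05. The daily traffic and baseline rate below are invented, but the inflated false positive rate is the point.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def daily_peeking_false_positive_rate(n_sims=2000, days=20, daily_n=1000, rate=0.05):
    stopped_early = 0
    for _ in range(n_sims):
        # Cumulative conversions for two identical variants (an A/A test).
        a = rng.binomial(daily_n, rate, size=days).cumsum()
        b = rng.binomial(daily_n, rate, size=days).cumsum()
        n = daily_n * np.arange(1, days + 1)        # cumulative visitors per variant
        pooled = (a + b) / (2 * n)
        se = np.sqrt(pooled * (1 - pooled) * (2 / n))
        z = (a / n - b / n) / se
        p_values = 2 * (1 - norm.cdf(np.abs(z)))
        if (p_values < 0.05).any():                  # peeked every day, stopped at the first "win"
            stopped_early += 1
    return stopped_early / n_sims

print(f"false positive rate with daily peeking: {daily_peeking_false_positive_rate():.1%}")
```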

We also talk about things like the confidence interval, and one thing we've started doing is visualizing those: we'll have a little line with the y-axis at zero and show the confidence interval as a rectangle, so people can start to grasp that if it's a really big rectangle that just barely doesn't overlap zero, maybe don't be too confident, but if it's a tiny rectangle way far from zero, okay, we probably have a real change going on. That's awesome, because what that's doing is visualizing quantified uncertainty, which is part of our jobs as data scientists, statisticians, and data analysts. Exactly, and that's something we have to do — communicate that uncertainty — because often people just want an answer, like 'is this change in the metric real?', and I'll say that statistics can't do that. We can't ever say something is real; we have to take it in context, but we can have evidence that something is more or less likely, and we've really been working on ways to help the product teams we work with understand that. So I've got one more question about experiments that just came to mind — I don't even know if it's well formed, but I'll try it anyway. In scientific research, in laboratory research, there are governmental organizations that make sure the ethics of experiments are in line with certain regulations. Is there a possibility that in the future this will happen to businesses? Maybe not in your case, because what we're discussing is experiments with respect to UX, for example, or search results, but what if we're talking about Facebook, for example, choosing to show us different types of media as a media outlet? Okay, so the question is: if we're running experiments and the general populace are the lab rats, do we need to consider the ethics of it? Yes, I think we do. You mentioned Facebook: there was a paper released, I think some years ago, where they tried changing the sentiment of what they choose to show in the news feed. Most people have enough friends that you can't show everything, or at least not everything in the first ten slots, so what do you show? I believe they experimented with changing the sentiment of the statuses shown in your news feed and found that it seemed to influence the viewers' emotions. There was some controversy about the ethics, because it wasn't just showing some people happier stuff — they were showing people sadder or more depressing things. And there was another paper about people who wrote statuses and didn't post them, and it looked at that.

People found that controversial: I have random thoughts I never even posted, but wait — Facebook has been recording this? It turns out, I believe, that they just recorded whether or not you typed anything and how long it was, not the words, but still. Part of how this is handled in medical and social science research is that how strictly it's regulated, and what it has to go through, depends: you basically fill out a form that asks, could this cause harm? Does it involve vulnerable populations, like children or prison populations? Depending on your answers about what population you're affecting and how invasive your intervention is, you're subject to different rules. I think it's an interesting question, because part of how you can regulate that is that the people doing this research are at academic institutions, and the institutions police it themselves — they have motivation, financial motivation and maybe losing a license or whatnot. But how would you do this with private companies? How would you make sure people aren't running rogue experiments? Something a little similar coming down the pipe is the General Data Protection Regulation in Europe, which has things like the right to be forgotten and the right to understand what data a company has about you. It's coming into force soon and it's really going to affect people, because how you run experiments is often based on the data you have about people. It's going to change the way you can work if you can't keep certain data or you have to delete data — what if you have attrition from your experiment population, and so on? So I do think it's definitely important to think about the ethics, and I'm just wondering how that could work in practice for regulating the experiments at companies like Facebook or Google that can really impact people's lives and health.

And, as you say, incentives are slightly misaligned, in the sense that maximum transparency — ethically, and for citizens — would be the ideal place to get to, but we're talking about businesses that need to keep a lot of what they do proprietary; they need to run their business as well. Yeah, and the other idea here is that in academic research you don't just talk about the risks, you also weigh the benefits. A medical trial can be quite risky — people can sometimes die or get sicker — but the idea is that, within some bounds, it's worth it if the benefit is a drug that can save thousands of lives. So the other thought, in terms of business, is defending it: you could say, well, we made the sad statuses more extreme so that we could learn, and we learned that it can be harmful, so now we know to pull it back. Or Google might experiment on what happens when you type in 'how to commit suicide': maybe if you put a suicide hotline or resources at the top, rather than, say, articles about how to do it, that could be hugely beneficial. But how do you know that's effective without an experiment? Could it be more harmful to just roll it out without having any way to measure whether it had the beneficial impact you hoped it did? Yeah, that's super interesting. You mentioned these ideas of data protection, and I'm wondering what type of data online businesses have access to that allows them to make substantive business decisions.

Sure. Some businesses track and have access to kind of everything you do, as long as you keep the same cookie or you're logged into the same device. That's how you get these ads that follow you around the internet: you look at Ugg boots once and there you go, you have boots popping up in the ads in your news articles for the next couple of weeks. I just want to say I'm actually currently getting a lot of ads for DataCamp. I see — I need to tell them. At times you'll find that happening — I think someone posted about a little DataCamp course advertisement with Hadley Wickham in it following him around the internet. At Etsy, though, we generally focus on-site; I'm not even sure we have, and certainly on my team we don't use, data about what people do off-site. The marketing team is very interested in things like where people come from — how many people click our Pinterest or Facebook advertisements — and we're interested in long-term effects: someone who saw an Our Picks for You module a couple of weeks ago maybe didn't click and buy right away, but maybe they bought the item two weeks later, so we want to keep tracking that behavior on the website. That's really the data we use, and that's partly because we don't have any off-site advertisements. The promoted listings I talked about are the only ads we have — items from our sellers — and sometimes you'll find that, okay, this sock is a promoted listing, but it's also surfacing organically in our rankings two rows later. So we don't really have to concern ourselves with what our most effective off-site ads would be. Now it's time for a segment called Stack Overflow Diaries, with DataCamp curriculum lead Kara Woo.

Hey Kara. Hey, Hugo, I've got a Python question for you today. A user wants to know how to rename columns in a pandas DataFrame. In their specific case, the columns are named dollar sign A, dollar sign B, dollar sign C, and so on, and they want to rename them A, B, and C. That's a great question — that's something I do every day, thanks to all the CSV files I work with, among many other reasons. There are several ways to do this, though, right? Yes. The first, which is the accepted answer, is to create a list of the new names and assign it to the columns attribute of the DataFrame. This works if you know exactly what column names you want and what order they should appear in. A second option, in a similar vein, is to use the pandas DataFrame method set_axis, which can also take a list of new column names and overwrite the old names. A third option is to use the rename method for pandas DataFrames. You can call rename with a dictionary that maps the old column names as keys to the new column names as values. You can also call rename with a function that will convert the column names into the form you want. For the example in this question, since all of the column names are in the same format and just need the first character removed, you could call rename with a function that drops the first character of each column name, or one that looks for the dollar sign and replaces it with an empty string. So in the answers we find several ways to rename columns in a DataFrame. Right. Yes, and I really like this question, in part because it shows how many different approaches one can take to solving programming problems. But more than that, I also really like the specific answers here: they represent different levels of complexity, but also different levels of robustness to different situations. Can you unpack that slightly for us, Kara? What I mean is that the first answer, where you create a list of new column names and assign it to the columns attribute, works just fine when you have a small number of columns and know exactly what they are, but if you had to rename a hundred columns it would be a huge pain. On the other hand, writing a function to convert the column names into the desired format may seem like more upfront effort, but it saves a lot of time in situations where you have many columns or don't know the exact column names in advance. Thanks, Kara — those are all super useful ways to rename columns, depending on your use case. See you next time. You bet, Hugo.
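For reference, the approaches Kara walks through look roughly like this; the dollar-sign column names mirror the question, and the exact set_axis behaviour has varied across pandas versions, so treat it as a sketch.

```python
import pandas as pd

df = pd.DataFrame({"$a": [1, 2], "$b": [10, 20], "$c": [100, 200]})

# 1. Assign a new list directly to the columns attribute.
df1 = df.copy()
df1.columns = ["a", "b", "c"]

# 2. set_axis with a list of new names along the column axis.
df2 = df.set_axis(["a", "b", "c"], axis=1)

# 3a. rename with a dict mapping old names to new names.
df3 = df.rename(columns={"$a": "a", "$b": "b", "$c": "c"})

# 3b. rename with a function, handy when every name needs the same fix.
df4 = df.rename(columns=lambda name: name.lstrip("$"))

print(df4.columns.tolist())  # ['a', 'b', 'c']
```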

After that interlude, it's time to jump back into our chat with Emily. So we spoke a bit about data ethics. I was wondering if there are any other challenges, moving forward into the future of data science and big data, that you're interested in thinking about. Yeah, I think this comes up in a couple of ways. Sometimes it's with people's individual projects — for example, people have scraped dating profiles and even released the data without considering the impact that could have. Then you also have things at a company level, like the diesel cheating scandal at Volkswagen, where an engineer actually went to jail. The idea there being that you can get in trouble for something you're asked to do and you need to push back. But that's a very clear case of 'this is illegal', and most likely the engineer knew that when he or she was asked to do it. There are many other cases, as talked about in this book I really like by Cathy O'Neil called Weapons of Math Destruction. That is a great name — WMDs. What these are are algorithms she classifies as damaging, important — so affecting some critical outcome — and secretive or unaccountable. One thing she talks about is teacher ratings: these ratings of how much impact a teacher had, how good a teacher is. What people try to do is predict, based on students' second-grade scores, the scores they'll get in third grade, and then you look at it in third grade and say, actually, they scored a lot better, so this teacher must be good.

But there are some problems with this that she talks about in her book: teachers' scores can fluctuate year to year even though these are experienced teachers who aren't changing anything, and people get fired over it and can't hold anyone accountable — even the school administrator just says, I don't know, this is the number we got. Or it can be in recidivism: predicting whether someone is going to commit a crime again, predicting this at their parole hearing, and not understanding that maybe it's taking into account zip code, which is correlated with race, which is what it's really using. So it's effectively using race as an input and often saying, okay, if you're African American you're more likely to commit a crime, which keeps you in prison longer. Of course, you're not actually allowed to consider race directly, but it can come into these algorithms in these secretive ways. In that particular example there are a lot of things at play, but one thing happening there is societal and human biases being fit into the algorithm and creating that type of algorithmic bias as well. Exactly. Sometimes people say, if it improves my prediction, why can't I use it? One problem there, of course, is: why would African Americans get arrested more? The naive answer is maybe they commit more crimes, but actually it's that they're policed more — maybe the neighborhoods they go back to have a higher police presence, or they're arrested for crimes that white people wouldn't be arrested for, like carrying a small amount of marijuana. Those societal biases get magnified in these algorithms, sometimes unknowingly, and you can hide behind the algorithm and say it's just the numbers, we're not racist, when really that's what is happening, and that's why you have to be very careful. A lot of people outside data science might not appreciate how social a field it actually is. What role does community play in data science for you?

Community, to me, is a huge component of it. As I mentioned, I went to Metis, and I learned Python there. I already knew R from undergrad, as I was very fortunate to go to Rice University when Hadley Wickham was teaching there and had designed some of the statistics courses. I had Hadley in my introductory stats class and then took other classes that had been designed by him and were taught by his grad students, so that was a really great foundation. Etsy lets you work in R or Python as an analyst, whatever you're more effective in, and the primary reason I usually work in R rather than Python is the R community. It's very friendly and welcoming. It's also a bit smaller than the Python community — part of Python's advantage is that it's used for many things by many different people, whereas R is usually a little more academic and statistics-focused rather than general programmers. R also has things like the R-Ladies groups; I'm on the New York City board for that. It's a group with chapters in more than forty cities, a bit of a global organization, and the idea is to promote and welcome women into the R community by hosting meetups, which can be talks or tutorials or just casual get-togethers. We even have a book club in New York that we host every couple of months, where we come together at this lovely bookstore and read a book about data — one of those was Weapons of Math Destruction — and we all get together and talk about it. So I've really enjoyed getting to be a part of the R community, which is also thriving on Twitter, and I highly recommend you check it out. People there are very accessible: you can ask a question with the #rstats hashtag, and if it's a question about, say, dplyr or the tidyverse packages, Hadley himself will answer. And for those listeners who maybe aren't in R, Hadley Wickham is someone who's designed many packages for the R community.

There was an article written about him called 'The Man Who Revolutionized R'. He's an enormous presence and contributor, but also so friendly to newcomers and people who don't know things yet, and he will often jump in and talk to people who are just asking an #rstats question out into the world — if he can answer it, he'll answer it back on Twitter. We've discussed the sense of community in data science and in the R community. Can you speak to the challenges concerning diversity in data science? Sure. What's interesting here is that statistics is almost fifty-fifty men and women, but data science isn't, and a concern of mine is that there's a growing gap between, as I discussed earlier, the Type A — stats and analysis — data scientists and the Type B machine learning side. Machine learning is much more male-dominated, and it's also becoming more prestigious and better paid than the more Type A work. Something interesting in the history of programming is that you saw this happen in front-end development: when front-end development started becoming a thing, rather than, say, full stack, more women went there and it became less prestigious and lower paid. Historically, you see that when women enter an occupation, wages go down. My own interests lie in Type A, and I'm just a little concerned. In certain parts of the data science community it looks good — like in R, there's a conference I'm speaking at in Austin called Data Day Texas that has an R track, and I think it's right around fifty percent women, which is awesome.

But you look at a machine-learning-heavy conference and sometimes you see all-male panels — nothing like that fifty percent. So that is a little concerning to me, and I'm trying to figure out how we can keep diversity in all the different parts of data science. I often focus on gender diversity, but I also want to expand that and think about racial diversity and other types — background diversity, so not just people who have PhDs or master's degrees or bachelor's degrees from prestigious institutions, but also adults coming from other fields; age diversity, not just people in their twenties, but also older people and people with kids. I do think that's something that's always going to take work, but I'm also really excited to see how many really good allies we have in the community and how interested people are in working on this. For example, Hadley Wickham made a commitment: he speaks a lot, and he committed that in 2018 the only meetups he's going to speak at are R-Ladies meetups, which I think is a great commitment. He's still doing conferences, but those are the meetups he's doing, which is really exciting. What type of steps can we take as a community to increase diversity and be more welcoming? Sure. I think one thing is definitely thinking about amplifying — this is something I'm trying to work on. Probably most of us are an ally in some way; maybe, as a white woman, I can be an ally to women of color, and think about how, as my reach and reputation grow, I can work to showcase people who might not be having the opportunities they should because they're assumed not to be as technical, or they have these other challenges.

So I think that's one thing you can definitely focus on. The other thing is a quote I really liked. There can definitely be some pushback on this idea of privilege — if someone says, 'oh, you know, your privilege', you can take that as, what do you mean, I've worked hard, I haven't had it easy. And perhaps you're not privileged in some way — maybe you're a gay white man and you've had challenges because of that. But the idea is that feeling guilty or ashamed is not going to help anyone; use your privilege instead. Maybe with income, donate to groups like Black Girls Code or other organizations that promote women of color going into tech. There's this great group I just found out about, Hack the Hood, where people in minority communities learn tech skills by working with small businesses and helping design websites for them. So whether it's through money or through time and mentoring, pitch in. Or work on hiring at your company, trying to ensure that, hey, are we only hiring white men? Don't just blame the pipeline. Okay, maybe most of your applicants are white men, but that doesn't mean you throw up your hands — think about where you're advertising. Is the problem that we're getting most of our applicants through our networks, and our networks are mostly white men, and we're not taking advantage of these women-in-tech communities where we could post jobs? Or is it that we're not showing in our job description that this will be a welcoming environment, and we're using terms like 'rock star' or 'we work eighty hours a week' that are a turnoff for people who aren't young males in their twenties who can dedicate their whole lives to their jobs, even though that's probably not necessary? We've discussed a lot of interesting techniques and methodologies in data science, data analysis, and statistics.

What are some of the techniques that you enjoy using the most? I really enjoy the tidyverse, which we've talked about. Something interesting here is that Jenny Bryan, who is working at RStudio while on leave from UBC as a professor, had a really great interview recently where she closed with: in machine learning it seems like everyone wants to make a contribution there, and I'm like, you go for it, I'm going to be over here getting data out of Excel spreadsheets. That's also what I mean: it may not be what's quote-unquote hot right now, but it's important to do these things that help improve your data analysis workflows and thinking and tooling, and what I love about the tidyverse is that it just makes these everyday data analysis tasks easier. It may not be as cool as saying, oh, I'm doing a deep convolutional neural network, but ninety percent of the time that may not actually be what's going to be useful for the person you're helping make a business decision or build a product. So honestly, what's very exciting to me is: how do we empower people to make better decisions and have a better data analysis workflow? That's awesome. So, with all that having been said, do you have a final call to action for our listeners? Yeah, and I think this goes back to what we talked about with community. I would say: try to contribute to that community. If you're more advanced or experienced, be kind to beginners — don't shame people who use a for loop when you think they should be using apply functions. You can definitely share with them, hey, this might be a cool thing for you to learn, did you know you could do it like this? But it's not productive to just say, ah, you're bad, this is terrible. And try to give back to the community and connect — I think data science is much more fun when you have peers and mentors and friends. I've met so many people through the R-Ladies group in New York and through Jared Lander's New York Open Statistical Programming Meetup, which happens once a month, and through Twitter.

And you can give back and contribute to this, whether it's through writing a package, sharing packages, or writing blog post tutorials. So find your community and try to contribute to it if you can. Emily, it's been such a pleasure talking to you, and thank you so much for coming on the show. Thank you, this has been great. Thanks for joining our conversation with Emily about data science and online experimentation at Etsy. We saw the challenges faced in online experimentation at Etsy and the types of questions data science can solve for the business, from ranking algorithms to UI changes and the introduction of new features. We also dove into the role of community and the current diversity challenge faced by the field, along with concrete ways to deal with it, such as amplifying and being aware of our own personal and societal biases. Make sure to check out our next episode, a conversation with Roger Peng, professor in the Department of Biostatistics at Johns Hopkins, co-director of the Johns Hopkins Data Science Lab and the Coursera Data Science Specialization, and a well-seasoned podcaster on Not So Standard Deviations and The Effort Report. I'm your host, Hugo Bowne-Anderson. You can follow me on Twitter at @hugobowne and DataCamp at @DataCamp. You can find all our episodes and show notes at datacamp.com/community/podcast.
