DataFramed

Episode · 6 months ago

#95 How to Build a Data Science Team from Scratch

ABOUT THIS EPISODE

While leading a mature data science function is a challenge in its own right, building one from scratch at an organization can be just as, if not even more, difficult. As a data leader, you need to balance short-term goals with a long-term vision, translate technical expertise into business value, and develop strong communication skills and an internalized understanding of a business's values and goals in order to earn trust with key stakeholders and build the right team.

Elettra Damaggio is no stranger to this process. Elettra is the Director for Global Data Science at StoneX, an institutional-grade financial services network that connects clients to the global markets ecosystem. Elettra has over 10 years of experience in machine learning, AI, and various roles within digital transformation and digital business growth.

In this episode, she shares how data leaders can balance short-term wins with long-term goals, how to earn trust with stakeholders, major challenges when launching a data science function, and advice she has for new and aspiring data practitioners.

You're listening to DataFramed, a podcast by DataCamp. In this show you'll hear all the latest trends and insights in data science. Whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization, join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in.
Hello everyone, this is Adel, data science evangelist and educator at DataCamp. As data science becomes more and more integral to the success of organizations, now more than ever, organizations of all sorts and sizes are building data science functions to make the most of the data that they generate. However, I think, given all the DataFramed episodes we've covered thus far this year, it is definitely no easy feat to launch a data science function from scratch. So I'm excited to have Elettra Damaggio on today's podcast. Elettra is the Director of Data Science at StoneX. She has been deeply embedded in the data and digital transformation space in financial services and played a crucial role in launching the data science function at StoneX. Throughout the episode, we talk about the main challenges associated with launching a data science function, how data leaders can prioritize the roadmap between low-hanging fruit and long-term vision, how to earn trust with stakeholders within the organization as a data leader, use cases she's worked on, advice she has for aspiring practitioners, and much more. If you enjoyed this episode, make sure to rate, subscribe, and comment, but only if you liked it. Now let's dive right in.
Elettra, great to have you on the show.
Thanks, guys, for having me.
I'm excited to talk to you about your work leading data science at StoneX, best practices for launching a data science function from scratch, how to manage short-term objectives and long-term priorities, and more. But before that, can you give us a bit of background about yourself?
Yeah, sure. So I started studying computer science a long time ago. I got my Bachelor of Science and Master of Science in Computer Science, and during my master's I majored in AI and databases. I graduated in 2009. So yeah, a long time ago, and at that time I have to say that data science wasn't yet a thing, although I went through all the neural network, vision, and NLP type of projects that you might imagine. So I started to work in consultancy, and then after a while I got bored of that, so I won a fellowship in Paris and got my MBA. It was really interesting to get a business educational background as well. It was actually very useful to me to learn a lot about how a company works and what's behind the product or the service that a company actually offers. After that I went back to Italy to work in Gartner Consulting. So, again, consultancy; it was a little bit of a curse on me at the time. But then I finally moved to BNP Paribas, and so to the client side, as they used to say in consultancy, in a financial institution, retail mostly, so retail banking. First BNP Paribas, in digital transformation, then HSBC, and then finally I moved into GAIN, a.k.a. StoneX now: it was acquired in 2020 and then rebranded into StoneX. So I actually transitioned from a more retail banking type of service to trading instead. As you might or might not know, StoneX owns two brands in the UK and worldwide, FOREX.com and City Index, and provides trading...

...services to people.
That's really great, and I want to set the stage for today's conversation. You lead the data science team at StoneX, and you led the launch of the data science function there as well. It's always interesting to talk to someone who played a key role in launching a data team or a practice within an organization, because I think there are many growing-pain stories that are often missed by practitioner data scientists who join a relatively mature data team. So what are the key ingredients of launching a data science team or function within an organization?
So when I started at StoneX, at that time it was still GAIN. It was 2019. I started as principal analyst, and the hope of my boss was for me to start the conversation about using data science and machine learning within a company that wasn't really using any of this type of application. I started with two analysts under me and now I have nine. So it was quite a journey. I have to say I was very happy to see that the organization was ready to use the data in a certain way, and this is one of the key points. Data needs to be ready to consume. If you don't have that, you definitely cannot start data science very quickly, because first of all you need good data. I was lucky enough that the people in the other teams, in the enterprise data systems teams, had spent a lot of time and effort to set up a good data set, a good back end from a data perspective. That helped very much. This is definitely key, and I understand that sometimes, in huge organizations, you have the so-called data swamp issue, where a lot of people just dump their data in the cloud and then say, okay, now do something with that. That is really one of the biggest pain points for a data science practice.
So that said, when you start the data science practice, the first thing that you need to understand is what your scope of action can be, and your scope of action is directly linked to the quality of data that you have within that scope. So in my case, I think the secret ingredient of the recipe was to understand, okay, where can I bring value based on what is ready to be used. You cannot go top down, because if you go top down and you say, "Oh, you know what, we should have a machine learning algorithm that does, you know, X, Y, Z," and then you hand this requirement to tech, you understand when you get to the tech stack that, wow, to do this we would basically need to work on all these data sets, and if all these data sets are in a huge mess, you will just spend months and months, if not years, fixing things. So in this case you need to be smart and understand, okay, where can I drive the most value with what I have. It's like you open your fridge and you have, I don't know, eggs, and maybe avocado, and something else. Okay, what can I do with this? Instead of taking the recipe book and saying, "You know what, it would be great to do a carrot cake," and then, you know, you may not have anything to make it. That's basically the same thing. You just start with what you have. And I know it maybe doesn't sound very fancy from a data perspective, but a lot of the time what brings value right away from a business perspective is good data, linked data, or an integrated data view.
That's really awesome.

And what I really would like to ascertain from some of your answers here is the challenges related to launching a new data science function. You definitely mentioned the technical challenges of data quality. What are other categories of challenges associated with building a new data science function from scratch?
Definitely talent recruitment, and also understanding the tech stack that you want to work on. The way we did that was to incrementally find use cases that we knew, or were fairly sure, would provide value to the business, try to deliver a pilot of those, and then get more money from the company, more investment. It wasn't a "here's all the money, go ahead." We had to earn all our tiny steps, and we were fine with that, because a big bang approach might not be the best. The point about machine learning, I believe, is that it's very much experiment driven. You need to understand everything that you can work with, you need to run all your experiments, and the more you learn, the more you understand how many people you need and what type of tech you need. Maybe someone knows up front, but personally, if you just join a company, you don't know anything yet about the status of the data, the status of the organization, the status of the business per se. In my case, even though I was in a financial institution, I was coming from retail banking, so trading was a new thing for me, right? I had to learn a new type of service. So if it's a new type of service, maybe not a new industry but a new area of the industry, you need to get an understanding of that as well. So data, organization, and business. Before you have this understanding really deep inside your head, it's very hard to say I need these people, I need this stack, I need this capacity. So the way you need to do that, in my opinion, is that you need to learn and readjust.
It's a lean startup type of thinking. You start with a pilot, with your MVP, and then you work on it, you evolve it and add on top, and you check whether you're still on the right track, whether you're doing something that is useful for the business or not. You constantly readjust, and you add on top or you just fine-tune. So this is definitely the way I am doing it and the way I would suggest someone else do it. And the challenge in doing this is definitely finding the right people, and not just in your team: it's really key for a data science team to have a very good dev team and architecture team that can support you by suggesting the right tools for your needs, the right architecture, everything that you need, for example, to process streams of data. There are so many aspects to delivering a data science product that it's really hard for one person to know everything about everything. So you need to make sure you have good people advising you in all the steps that you are not an expert on.
That's really great, and let's harp on the organizational challenge of building out your own team. Can you walk me through in more detail how you earn trust as a new data leader within an organization when working with different stakeholders, such as the dev teams and the business stakeholders? That's the first set of questions. The second set of questions would be: how do you build out a team, knowing that it's still early and that you want to be relatively disciplined in the type and amount of resources you add to a new team, while making sure that you're adding value, but you also want...

...to make sure that you have the best hires. So what is the type of profile you look for in an early data team?
As I said, the success stories that you can drive in your first six to twelve months are key for you to build trust. If you can deliver a success story, say within your first year in the business, it could be something where people can say, "Oh, you know what, that person delivered that," and they can associate you with a certain type of deliverable. So you start to build this type of trust by actually having direct contact and, how do they say, leading by competence. Make sure that everyone associates your name with something that works. That is definitely step one, and then you start from that, and I would say if you can secure that, it will be all good. Everything will follow a lot more smoothly compared to someone who just barges in and says, "Oh, we should do this, we should do that," and so on and so forth.
On the second point, about what type of people to hire in your early data team: the tech stack was very simple at the beginning, very, very simple, because we were building the practice, let's say, and that is also related to the iterative approach. If you start with a very complex tech stack, a full tech stack, from your cloud, your machine learning and ops platform, your data engineering, ETL, and all the works, so you have GCP or AWS or Azure cloud with your ML on top, of course you need people who are skilled in all this tech to deliver something. So automatically you will need more people, because you cannot have someone who knows everything about all this tech. If you start with a simpler tech stack, it's different. We started with Python, having a server that was running our Python scripts to test them, and then we, let's say, partnered with other dev teams to deliver some models in production.
So we didn't do the delivery in production ourselves; we handed that over to other dev teams that had other tech stacks. With that in mind, the type of people I hired in the first place were, I would say, data scientists who had a little bit of coding experience or, if not coding experience, just coding appetite. They didn't mind setting up Python scripts that were just getting data from APIs, scraping websites, or whatever, to get the data they needed to develop their machine learning models or to test and experiment with the machine learning models that we had in mind. And once we had developed, I would say, a couple of success stories, we finally started to have our own development platform. We have been completely included in the DevOps process, because when I started, the analytics team wasn't considered part of DevOps. It was an old-school Excel BI type of team and that was all. Most of the time it was just reporting, but of course there was an appetite to evolve that. So we started with that. Then they said, "You know what, guys, you are developing software. It's good that you are included in our DevOps." So we started to be included in DevOps, and we had some training. I already knew a little bit of Git and Bitbucket or GitLab or whatever, we switched repositories in between, but the other guys were, so, the type of people eager to learn. Definitely, they need to have a solid foundation from a statistical and mathematical perspective, but they need to have that,...

I would say, that appetite to develop things, to not just analyze things but to really develop something that is a product. So it's more of an aptitude type of thing that you need to pair with a strong quantitative background. That was the type of people that I was hiring at the beginning.
That's really great. And how have your hiring practices, or what you look for, evolved as the team grew, became more established, and provided ROI?
So now that the team is a little bit more established, the way I set up my team is that I have guys who are more focused on data engineering and machine learning engineering, as we are finally setting up our MLOps tech stack. I don't know if this is a very meaningful differentiation, but the way I see it is that there are people who are driven to write what someone might call production code, and there are other people who are more driven to analyze, experiment, and see things. How I see the data scientist at the moment in my team is very much as an R&D function. It's a person who needs to have business acumen, so who knows the business or is able to understand the business, who has a strong commercial, organizational, and business understanding and, of course, has the statistical and machine learning knowledge to, you know, just join the dots and say, "Oh, you know what, I can use the data to solve this problem." But once, I would say, the data scientist has modeled the infinite space of solutions and caged it in a more manageable space, that thing is passed on to the machine learning engineering and data engineering function, which will industrialize it and set up the pipelines and everything that needs to be done in order to operationalize that model and make of it a product that is reliable, sustainable, and reusable within the business.
On top of these two groups of people that I have in my team, I also have a BA who supports me, which I think is really useful, because a BA is the type of person who, first of all, has a constant relationship with the different stakeholders who are customers of our products, and can gather requirements and have a conversation with the data scientist or the machine learning engineer to say, "You know what, maybe we should either change an existing product in this way, or design something new that would solve this type of issue." It is also the person who really helps you embed the product within the business: training business stakeholders, talking with them, maybe guiding them at the beginning on how to use and interpret the data and how to interpret the workings of the model, because one of the things when you develop a machine learning model is that it's very hard to explain it to non-data people. So you need that person who has that constant relationship with them. He or she can wrap that up in a way that is understandable, so that you can have sponsors outside your team. That's key. You always need to have sponsors outside your team.
I love that answer, and I love how you create the delineation within the data science team between a more research-and-development type of small data team and a more applied engineering team that industrializes the work the data scientists do. But harping on that last note, when it comes to the business analyst role and creating a relationship with other...

...stakeholders and gathering requirements and feedback: oftentimes, when talking to data leaders, a big obstacle they face when it comes to providing value with data science and analytics is data culture or an analytics mindset, or the lack thereof, within the organization. I'd love to understand from you how you approached conversations with the stakeholders within the organization who may or may not have an analytics mindset or a data culture, or understand the value of data science, and how you were able to maneuver around these obstacles, whether through the use of a BA or within your own team.
So, first of all, this has nothing to do with your data skills. I'm just putting this as a disclaimer on top. This is all about your, I would say, political skills or relationship skills. As I said, it is key for you to start understanding where you can find your sponsors. So first of all, you need to have conversations. For example, in our case, the company is organized around commercial leaders, and we have global teams as well. Of course, you have commercial leaders of the biggest regions and commercial leaders of maybe smaller regions, and you need to gather an understanding of who has the most driving role within the community of executives. And I'm sure that if there is a data science team in the company, you will be able to find your sponsors among the ones who are really keen to get involved in that. It might be easier or harder in some cases. So first thing, try to understand who your easier sponsors are, the ones who are maybe the keenest to sponsor you but might still be on the lookout because you haven't delivered anything yet: "I'm interested in data. I would like data science."
So try to understand what their key requirements are and, as I said, I remember when I started I was like, this is a little bit of your Jedi trick: "You don't want that, you want this." When you have a conversation with them and you know what you can deliver, you need to sell, in a clever way, something that is useful for them but that you can deliver in a reasonable amount of time. So you're trying to drive them to that type of solution, and this is down to your personal negotiating skills, let us put it like that. Once you have secured your good sponsor, your big sponsor, you just win the others over one by one. That makes sense. And I know this might seem like, "Okay, but what happens when I deliver the model? I have to explain that to them. This is not related, right?" It's actually very related, because if in their heart they are already sponsoring you, the day you go to them to explain, they will have a different attitude listening to you. So you will have your chance to explain it to them, and I would say never be condescending, never be the lecturer there. Always try something like, "You know what, I delivered this because the main goal is to provide this additional benefit for you. I'm using this. Do you want me to go through the details of the model? I can." Most of the time, I have to say, they're interested in knowing the performance. So whatever type of performance metrics you want to use, try to put in front of the business stakeholders the ones that are most understandable. All the performance KPIs that you use to understand if the model is sustainable, if the model is robust? Just save them for the annex. At the end of the day, the commercial stakeholders want to know how often...

...this works and, if it doesn't work, what is the risk? So, for example, we had a churn prediction model that we started to share. It was our first XGBoost, random forest, actual real machine learning type of model, and we went through it just explaining the features, and we explained the confusion matrix to the commercial leader, and that was already too much, because it's a new thing for them. The way we talked to them about it was: the model predicts correctly a certain share of the time on average, but what about the mistakes? We worked in a way that we are slightly over-predicting churners. This is why we don't have higher performance, because at the end of the day it doesn't cost us a lot to send another email or call another person who is at risk of churn. It might cost more to lose someone we're not calling. Wrapped up this way, it was very understandable for them and they were really happy with that. It required going through it multiple times, but after that you build trust and it gets easier and easier, because they start trusting you and they say, "Okay, you know, I don't have a full understanding, but if you say it's working, it's fine, we'll see, we'll review it after a couple of months of having this running." So this is the type of, I would say, massage that you have to do at the beginning, and you need to be patient and not rush or be aggressive. Definitely not aggressive.
That's really awesome, and I think it's at the crux of a lot of the answers that you've given so far. I think a central tenet when it comes to succeeding in launching a data team is managing both the short-term priorities and the short-term wins that you can get, but also making sure that you're working towards a long-term vision, so there's always a north star for where we want to be in the long term, and quarterly OKRs and objectives that guide the short-term objectives for a data team. Can you walk me through the process of prioritization between these two objectives?
I have to say it's not something that you do alone, especially if you're joining a new business. The first thing that you want to do is talk to the people who have been in the business a long time, so they can share with you: "I've been twenty years in the business, or fifteen years in the business, and I think one of the things that would really disrupt us would be a way to predict that, to understand that." And then it's, okay, wow. And never underestimate the fact that if the guy has been there for fifteen years and they didn't manage to do that, it doesn't mean that because you're a data scientist you're going to do it in one year, just because you have machine learning or whatever. It's probably harder than that. So you put that down, you gather all of these thoughts, and you say, okay, you know what, let's define a roadmap to get there. For example, one of the things that we gathered from our, I would say, key internal stakeholders is applications on the online stream of trades and transactions, and, of course, being able to apply a machine learning model to an online stream of data is something that requires a tech stack that we're building towards. But if we had started to do that from day one, we wouldn't have delivered anything valuable, we would just have been a cost for the business, and we would probably still be working on it after three years, because it requires time to do that. So you have that, and that's your top-down checklist, if you wish, and this allows you to understand what the roadmap is. So what do we have...

...now? Okay, now I have my desktop, a SQL data warehouse, and Excel, because that's how we started. And I need to get to machine learning on online streaming. What do I need to do that? You can work it out yourself, but I would always advise talking with other people as well, on the architecture side, and gathering their view, because I'm sure other people will have thought about it too. And you start defining your roadmap and milestones. We would need to have at least an orchestrator like Airflow to run our scripts and all of these things. We would need to have a DevOps process, and that is step one. Then you go, okay, you know what, we will probably need a cloud-based approach, to run our machine learning not on our desktop but on cloud compute that is scalable, so we don't need to leave our laptop running overnight to train models. We will have something on the cloud to do that, and some platform to connect to different data sources, like, I don't know, Databricks or this type of Azure cloud platform. And then to actually get streams of data you will need something like Kafka, and then you start using PySpark and all of these things. So you have this plan, and this is your vision planning. That is always easy from a certain perspective. You plan and you say, okay, what do I need? I need all of these things, your grocery list. On the other side, you have the short term, and for the short term, as I said before, you need to start with what you have. So what do I have? What can I do with this? And what is a priority for the business? Priorities you get from your sponsors or commercial stakeholders, from the business. When I joined I got two priorities. The first was: we need to understand how much we're spending on acquisition marketing and how much we're getting from that spend, because at the moment we have no idea.
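Editor's note: the first priority described above, spend on acquisition marketing versus what comes back from it, can be illustrated with a very small calculation. The sketch below is only an illustration under stated assumptions: the channel names, the figures, and the `acquisition_summary` helper are all hypothetical and do not come from the episode or from StoneX.

```python
# Hypothetical per-channel figures; none of these numbers come from the episode.
channels = [
    {"channel": "paid_search", "spend": 12000.0, "new_customers": 150, "revenue": 30000.0},
    {"channel": "social", "spend": 8000.0, "new_customers": 40, "revenue": 6000.0},
    {"channel": "affiliates", "spend": 5000.0, "new_customers": 0, "revenue": 0.0},
]


def acquisition_summary(rows):
    """Return cost per acquired customer and return on spend for each channel."""
    summary = {}
    for row in rows:
        spend = row["spend"]
        customers = row["new_customers"]
        # Cost per acquisition is undefined when a channel acquired nobody.
        cpa = spend / customers if customers else None
        # Return on spend: revenue attributed to the channel per unit spent.
        ros = row["revenue"] / spend if spend else None
        summary[row["channel"]] = {"cost_per_acquisition": cpa, "return_on_spend": ros}
    return summary


for name, metrics in acquisition_summary(channels).items():
    print(name, metrics)
```

The hard part in practice is usually not this arithmetic but attributing customers and revenue to a channel in the first place, which is why a clean, linked data set matters so much for this kind of early deliverable.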
So that was one priority, and on the other side it was: we don't know how we're targeting our customers. We need a way to segment our customers and define the journeys based on our segments. So very much an acquisition focus, and I have to say having an MBA, or whatever business or marketing course you can have, really helped me there, because I know how a marketer would think about these things: defining personas, defining the user journeys, all of these things. This is knowledge that I got from my MBA and also from my previous job in retail banking, because I used to work in the UX team as a BA. I was analyzing data and defining journeys with the user experience designers. So I knew how important those acquisition artifacts were for designers and for marketing in general. Thanks to that I was able to capture that. But I have to say they were very vocal that they had this issue. I said, okay, well, can we do that? And I had a look at our data warehouse and, as I said at the beginning, it was really key. I was very lucky to have a neat data warehouse, even if it was just our back-end, on-premises data warehouse, with just our onboarding system and customer activity. I was very lucky to have a very neat data set to start working with. Of course, there were some glitches in the process, but nothing too messy, so that was key to the first successes that we had. So that's how I started to prioritize the more short-term goals.
That's really great. And if you wanted to abstract this out and propose a framework that can enable other data leaders to extract small wins and low-hanging fruit that demonstrate early value for a data team, how would you go about that?...

So the way to go about that? I would start with the data. Don't do that alone. Start with your business stakeholders and ask them: what are the data sets that you use in your day-to-day job, and how do you use them? Because from how they use them, you can understand: you know what, you could automate that, I could do something that will help you use that data in a more efficient way. By going this way you're able, first of all, to understand right away which data sources are involved in the process and to check whether those data sources are usable. Second, you have your use case, and even if it's not the fanciest use case, you can start delivering something very quickly because you have a workable data source to start with, and by doing that you start building your sponsors. Once you start building your sponsors, even if it's with tiny deliverables, you can start building things up. On the other side, I would say, depending on how messy the data situation in the company is, you can start involving other teams and raising awareness, if it's not there already, maybe the awareness is already there, and investment of time and resources in fixing the data, so that the data will enable you to produce something of higher value. This is the way I would do that. As I said, it's a very entrepreneurial, lean startup type of approach: MVP first, and then you build your way up to the top.
I couldn't agree more that a bias for action and having that lean approach is super useful for a lot of data teams. Now, as we end our episode, Elettra, I'd be remiss not to talk a bit about your work at StoneX, especially on the data science use cases that provide value in financial services. With the recent war in Ukraine, COVID-induced supply chain issues, and economic uncertainty, I think it's never been more important from a data scientist's perspective to understand the role data science plays in commodities trading, foreign exchange trading, and more. So I'd love to understand some of the ways data science has been providing value in the industry.
I have to say we haven't been asked to get involved, and in our case the overall international situation didn't affect us too much directly. Of course, we know there are some people who have been sanctioned, so accounts have been blocked. StoneX didn't see a huge impact on this point, so we've been lucky. But as a trading company we of course experience a lot of volatility in the market, and that made our business very active from a certain perspective. As for the data team, in our case we haven't been involved too much, apart from making sure that what we were seeing in our systems wasn't affecting other processes in the business. In terms of doing anything, we haven't done anything, also because when you have this, I would say, delicate situation, it's left to humans to handle, because if you automate things you are prone to, I would say, embarrassing mistakes, and this is something that, of course, no company wants. Luckily for us, we have a manageable volume of customers and a manageable volume of accounts, so the data team wasn't really involved in doing anything specifically.
What are some of the main use cases you've been working on as a data leader at StoneX?
So we had definitely a lot of things related to marketing: segmentation, attribution modeling, churn prediction, lifetime value prediction. Last year we had our first NLP application, to classify customer communications. At the moment we're also working on client sentiment in trading, and definitely one of the things that we would like to...

...work on, as I said before, is online streaming of data, but I don't yet have workable use cases to share. We need to build the grounds to do that. That's awesome. So, Elettra, as we close up our episode, I'd love to look at any future trends and innovations that you're particularly excited about. At the moment I feel that data science has been a very wildcat type of area. There was a lot of buzz, but not many companies managed to make the data science practice tangible. So at the moment the focus that I have is to try and industrialize the approach and make the data science practice solid. So the type of tech that we're looking at, for example, is definitely MLOps and pipeline tech. In terms of pure innovation in machine learning, honestly, there's nothing purely innovative that we're looking for. We have so much ground to recover and to work on before we do something more innovative. But especially for marketing, there is a lot of innovation in terms of combining multiple models, so ensembling, for example, but also combining multiple models to dynamically select advertisements. This is something that is in our mind and we will definitely do that. So, using internal and external data to understand what are the trends, what are the things that are actually grasping people's minds at the moment, and dynamically selecting the content of your advertisements, serving them at the right time to the right person: that is definitely something that is becoming machine learning heavy, especially with all the cookie policies that are becoming more and more strict. So this is definitely something that is on my mind. I don't know when I will be able to implement that, but this is definitely one thing on my mind. That's awesome. Finally, Elettra, as we close up, do you have any call to action before we wrap up today? I would say just that it takes patience and hard work.
So if you're not ready to have patience and do, you know, your hours to get your success stories, do something else. But it definitely gives you a lot of satisfaction and it's worth your while, but it's a hard way to the top if you want to rock and roll, as they say. Thank you so much, Elettra, for coming on the podcast. Thank you for having me. Thank you. You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time.
