DataFramed

Episode · 2 months ago

#114 How Chelsea FC Uses Analytics to Drive Matchday Success

ABOUT THIS EPISODE

Data analytics has played a major role in Chelsea’s journey to becoming the seventh most valuable football club in the world. Chelsea has won six league titles, eight FA Cups, five League Cups, and two Champions League titles.

Today, we are going behind the scenes at Chelsea FC to see how they use data analytics to analyze matches, inform tactical decision-making, and drive matchday success in one of the world’s top football leagues, just in time for the 2022 FIFA World Cup in Qatar!

Federico Bettuzzi is a Data Scientist at Chelsea FC. As a specialist in match analytics, Federico works with Chelsea’s first team to inform tactical decision making during matches. Federico joins the show to break down how he gathers and synthesizes data, how they develop match analyses for tactical reviews, how managers prioritize data analytics differently, how to balance long-term and short-term projects, and much more.

You're listening to DataFramed, a podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science. Whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization, join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in.

Welcome to DataFramed. This is Richie, and since the FIFA World Cup has just begun in Qatar, today we're talking about soccer analytics. I have to say I'm getting pretty stoked for the tradition of watching England do quite well and then get knocked out on penalties. Joining me today is Federico Bettuzzi, a data scientist at Chelsea FC. Chelsea, I'm sure, needs little introduction as one of the top teams in the Premiership, and Federico is a match analytics specialist who works with Chelsea's first team to help with tactical decisions during games. He's going to be our inside man to tell us how analytics works behind the scenes at a soccer club. Now, before we get started, a quick note for my American friends. While I normally try to speak American English on this podcast, today we're going to be using the terms soccer and football interchangeably throughout, so please don't get upset when there's no mention of the Cowboys and the Patriots. And with that out of the way, let's talk about analytics. Hi Federico, thank you for joining us today. Just to begin with, I'd like to find out a little bit more about what you do at Chelsea. What does being a data scientist at Chelsea involve?

Hi Richie, nice to meet you. Let's say that data science, probably not just at Chelsea but in football overall, is a bit of a niche right now. It's starting to develop quite massively, but it's still at an early stage.
And let's say that being a data scientist at Chelsea is fun, first of all, especially if you are a European football, or soccer, fan, because you basically combine, at least in my case, two passions into one: the sporting passion and the data passion. So it's really the best of both worlds in some way. It depends: most of the time it is hectic, other times it is a bit quieter. It's a really fast-paced world where you need to be prepared for sudden changes, especially when there's a managerial change, or when there are a few things to uncover which come from people at the top or from the manager himself. So overall it's an experience which is definitely not usual, if I have to put it in very high-level words.

So I'm curious what sort of things you focus on in your work. What kind of problems are you trying to solve?

The variety of problems is really huge, because it depends on what people ask on a daily basis. But overall, in my role, I'm working in a department called match analysis, which focuses on analyzing the performance of the men's first team and the opposition teams as well. So we focus primarily on providing metrics and analytics at player level and team level for the first team, so that on match day the manager is fully prepared: knowing the strengths and weaknesses of the opponent's players and team as a whole, as well as our own strengths and weaknesses in various aspects of the gameplay. That's the high-level view of what we provide.

So it's looking like individual play...

...performance and I guess the whole team performance as well.

Yeah, it's a combination of both, because of course you're interested in knowing how specific players perform in certain aspects, but on the other hand you're also interested in knowing how the team behaves, especially when it comes to certain tactical decisions to face certain situations.

So I'm curious to hear a little bit more. How do you define what good player performance is?

It's a big ask, and probably one of the biggest questions in soccer, or football, overall, because it really comes down to your own view of a player's performance, or of one player being better than another. Of course, when it comes to soccer, what we look at is primarily scoring more goals than your opposition. This is the final goal in the end. Sorry for the use of the word goal. But looking just at scoring more goals than your opposition is quite limited in some ways, because we're talking about a very low-scoring game as opposed to other sports like baseball or basketball. There are so many other aspects of the gameplay which can be crucial for individual player performance and team performance that might not be directly related to scoring or conceding a goal. For example, for defensive midfielders: looking at performance in terms of getting the ball off the opposition, recovering the ball quickly in their own half, or pressing properly in the opposition half. Those are things which are maybe not directly related to winning games. Especially if you look at correlation measures, it's very difficult to find a direct correlation between winning or scoring goals and recovering the ball in dangerous areas of the pitch.
But then when you look at more fine-grained analysis, you can start to connect the dots and see that recovering the ball in certain positions can lead to sequences of possession which can then lead to creating a good chance, scoring a goal, or being some kind of threat to the opposition. So that's one example of things which are not strictly related to winning or scoring a goal, but in an indirect way are crucial as well.

That is really interesting. I like your comparison with basketball, where when you're scoring a hundred points, the score for the team is a pretty strong indicator of how well the team performed. But soccer quite often ends in a nil-nil draw, and it's not like the team did nothing for the whole ninety minutes. They actually added value somehow. So it seems like for strikers, for the forwards, it's kind of clear how you measure performance: do they actually score goals or take shots and things like this. But for the defenders it's a bit more fuzzy. So can you give an example of how you go about measuring how they added value to the game?

Yes. For defensive players, one example is the one I provided before, but other examples come when you look at specific gameplay situations, say when players need to defend open-play crosses or set-piece situations like corners. When you are facing these situations as the defending team, you need to find a way to minimize the number of goals you concede from them, or minimize the amount of threat you concede.
So in these scenarios you want to measure, for example, how effective our defenders are in winning duels, such as aerial duels, or how well they clear the ball when there's a cross into the box. Or, when it comes to more specific spatial information about defenders, for example, you want to make...

...sure that, depending on the situation, the players are positioned in an optimal way, where optimal is of course a very fuzzy word in some way, but it always comes down to what we consider to be optimal and also what is effective for the coaches who study this particular situation and set it up on the pitch, to know what to do in training, for example. Knowing the characteristics of an opponent, in terms of where the players tend to attack in a certain kind of situation and how they tend to move, you know which positions you have to keep and which movements you might expect.

It feels like midfield is maybe an even harder thing to analyze. You don't have the obvious "I'm either going to score something or I'm going to stop someone scoring"; it's somewhere in between. So can you talk at all about midfield? What do you do there?

Yeah, you're definitely right about that. First of all, midfielders are probably the most varied kind of role you can have. You can have a setup with three midfielders or two midfielders, or even four midfielders on some occasions, but the kind of role they play in the game can be very different. Some midfielders are primarily defensive midfielders, so often the role is stopping the opposition from advancing, being good pressers, or making sure to intercept dangerous balls. Other midfielders have more of an attacking role, so perhaps midfielders that start wide. Then there are ball carriers that like to carry the ball towards the opposition box for a shot, or to try to play a pass which breaks the defensive lines, for example. And other midfielders prefer to run without the ball, making off-ball runs which can then be targeted by other players who are more in a playmaker kind of role. So there are these different sides.
So of course, depending on the midfielder you're looking at, you might look at different metrics. For attacking midfielders, you might still look at some defensive metrics, because attacking midfielders are probably also among the first players that put the first line of pressure in the opposition half or in the opposition third, when they are closer to the opposition box. But you also want to look at creative stats, such as how creative that player is in terms of playing potentially dangerous passes, where the danger of a pass can be measured with some kind of machine learning model, which can be used to predict, let's say, how difficult that pass is or how threatening it can be in order to get to a certain position. So creative stats and defensive stats are the kind of stats you want to look at primarily for midfielders. Potentially also scoring stats, if you want to consider very offensive midfielders, such as a number ten playing behind the lone striker, for example.

So it seems like there are a lot of different metrics that you can calculate for each of the different roles. I'm curious what happens after you crunch the numbers. If you discover one player is playing particularly well, or particularly badly, what happens in the club?

That's actually the part in which I'm probably more behind the scenes, because I'm responsible for doing all the, let's say, maybe not so exciting work, in the sense that you are not facing the players directly when it comes to communicating the results. We are more behind the scenes.
But when it comes to the final end product of the analysis, what usually happens is that the coaches, so the manager's staff and the manager themselves, have meetings with the players on the day or two days after the match, where they analyze the game with both videos and the analysis...

...we come up with, to try to spot the positive things and the negative things of the previous game. For example, in the game we had on the weekend against Brighton, there were lots of negative things to look at. Without going too much into the detail, just looking at the scoreline, of course we had a few things to look at. But at the end of the day you always have to be a bit cautious about how to communicate these things to the players, because it's always some kind of trade-off. You don't want to be too harsh with them if they've done something badly, but on the other hand you want to make sure that they understand where they can improve, and also highlight what they've done well in the game. From there you can take it and set up the training session to train the players in those aspects, especially when it comes to set pieces. You can train players to take set pieces effectively or to learn certain patterns of play, for example.

I can certainly imagine that after you've had a bad game, having to tell all these players who earn millions of dollars that the statistics show they were rubbish today is going to be a difficult conversation. So maybe it's best it comes from the coach rather than yourself.

Yeah, yeah, absolutely.

I'm curious as to how the communication flow works. I know English football in particular has traditionally had a kind of anti-intellectual reputation, so I don't know how well analyst communication comes across. Can you talk a bit about how you explain the results of your models and your statistics?

I wasn't aware of that aspect. I mean, I think this is more like a twentieth-century thing. But yeah, that's good to know.
I mean, I would say that different managers have different views and ways of working, especially with our kind of role, which is very analytical. In general I would say that we are much more in direct connection with their assistants, so the assistant coaches rather than the manager himself. Usually the manager himself is a bit more, I wouldn't say on his own, but more direct with the players and of course with his closest staff, and we're more in connection with the coaches, the staff who are helping the manager. The workflow with them is very direct because we usually meet them on a daily basis. We meet with them on random occasions during the day, so it's not like a set meeting every day, like a nine a.m. meeting. It's more like they come into the office, we say hello to each other, and then we just start talking about what we want to see today when it comes to the daily operations. At the end of the day, with the coaches we are more into the daily metrics or daily work, whereas when we talk about longer-term planning for the work we do, it's more down to us as a department, planning things in advance which go alongside the daily things. And sometimes you start them, then you leave them aside because you need to do something else which is more urgent, then you pick them up again. So that's the sort of thing we do with the coaches and amongst ourselves as well.

You had a change of manager quite recently. Does that affect things? Do you get new directions on things that you need to be looking into, different kinds of analytics depending on the manager?

Yeah, actually I've been with three managers since I've been at Chelsea, so I could see three different ways of working.
And yeah, I would say that all three of them were quite different from each other. I would say probably the first manager I've been with,...

...which is Lampard, and the current one are quite similar probably, whereas Tuchel and his assistants were quite different, because they had a very clear view of what they wanted to see, and so a very clear view of what they wanted us to do. So we knew very well what to do, but we were in some way also constrained to do what they wanted, which is a good thing because they had very clear ideas. But still, in terms of working on longer-term projects it was a bit more complicated, because they knew very well what they wanted, so we were basically working full time for them. Whereas now there's a bit more freedom, because they're very open to what we provide to them. So we have more freedom to work on our own things in some way and provide them with things that can stimulate their intellect and stimulate the conversation amongst us. So that's a good thing. I would say that I found myself well with all three of them, even though they were all different. At the end of the day, that hasn't really affected us in terms of our daily workload. We always had something to do which was relevant for the club and for the team, something that allowed us to play a significant role in a few situations. So I can't complain in terms of the overall relationships with the managers and their staff. So far at least.

That's good news. Yeah, that's very good. So you talked a bit about the difference between the short-term projects and the longer-term goals that you have. Could you maybe give some examples of each and what the difference is?
Let's say that when we talk about planning the long-term things, these are usually discussed in the summer break, when all the championships are over, in the months around June and July, which are very quiet months for us because the team is basically never there. We have a bit more time to plan the things we want to work on, which we decide ourselves, alongside the season. These are a bit more research-and-development kind of things, which can be used on a daily basis but can be worked on with a longer-term view. So we can say, okay, we'll give you six or seven months to work on this piece of work, where you can take your time and develop things which actually work nicely and are nailed down. There's no rush to do it in a very short term, like within one month. We want to be sure that when they are in place, the club can use them on a regular basis and that they are effective. So that's the long-term view. One example of this goes back to when the pandemic kicked in, so we're talking around March 2020. There was the big piece of work we did around set pieces, about our own set pieces, which I can maybe discuss a bit later. And when we talk short term, it's very, very short term. It could be a term of one week, for example, or even less than one week, because it really depends on what the manager or the staff want, in terms of "okay, we want to see this within a few days", or as soon as possible at least. So sometimes I find myself working on pieces of work which literally last one day. I work on one piece of work which is ready in one day, and the next day it's already operational, or maybe it's not operational, but there's already something to work with, and then to make it operational, let's say, on a bigger scale.
And the sort of automation you want to see on a game-day basis is something that takes a bit more time. I found myself working on very short-term things, like making sure some simple metrics we wanted to see in-game were provided live during games. To make it operational took a bit of time, because we also needed the help of an external developer, who was...

...more into making sure that the scripts could run on an automated basis externally on a platform. But making the metrics available in-game in a sort of live scenario, which wasn't really fancy but was still working, was basically done within a matter of days.

That sounds fast. I'd love to know a little bit more about these metrics you provide live during the game.

These live metrics were a very short-term project which started, and was nearly concluded, with the former manager. Tuchel and his staff wanted to see a few very high-level, simple metrics, both at team level and player level, but detailed enough, on a sort of mobile app which they could open and refresh whenever they wanted, so they could see the progress of these metrics during the game. For example, a very simple metric which is very widely used now is xG, expected goals, which you could see live in games, both at player level and team level for a specific game. So by refreshing this app you could see what the cumulative xG was at that point in the game for our team and the opponent team, but also specifically for each player. A nice thing was that you could also see this metric for certain gameplay situations, so you could split it between open-play situations and set-piece situations like corners, for example. That's an example of a metric.

So this expected goals is a measure of what the statistical model would predict the number of goals to be, because that's often quite different from the actual number of goals?

Yeah, absolutely.

Could you talk a bit more about how this sort of thing is calculated?

With expected goals you're really going a bit more technical, into the kind of model it usually entails.
The model itself is a classification model. Each row of your data is a shot from any game. So you collect a significant number of shots across a large number of games from past seasons, and you have a few characteristics of these shots which are the features of your model. It could be the shot location, or the situation, whether it is open play or set play. Another characteristic could be the angle you have towards goal, or the number of opponents between the shot and the goal. So a few features that describe the shot, and these features are used to predict whether the shot ends in a goal or not. At the end of the day, the location of the shot is probably one of the most explanatory features, because the location also tells you how far you are from the goal. The distance from goal plays a big role, because the closer you are to goal, the more likely you are to score in most cases. And at the end of the day, each shot has a probability of ending in a goal attached, a number that ranges from zero to one. By summing all these probabilities over a game, for example, you get the cumulative xG, the cumulative expected goals throughout the game, so roughly the total number of goals you should have scored in that game based on the model. You can compare it over a large number of games, or even within the same game, against the actual number of goals you've scored, both at player level and team level, and see if you have overperformed, which means you have scored more than you should, or underperformed, meaning you have scored less than you should.

So this is really a measure of, well, some shots are easy to score, some are less easy, so you should try to optimize getting into the easy ones. That's interesting. So does it feed into...

...strategy then? It's like, okay, there's no point in just trying to take long shots, and actually we should work towards tactics where we can get in closer to the goal.

Actually, yes, they are used a lot strategically, depending on the situation you want to analyze. For example, you want to use expected goals as a measure of threat. So rather than using expected goals as a measure of the probability of scoring a goal, it's used more as a metric to say we want to move the ball into more threatening positions. Also, when it comes to playing certain situations, such as playing a cross or a set piece like a corner, you want to make sure that both the crosser and the players positioning themselves within the box do so in such a way that we can maximize our overall expected goals from the situation. So you look at what combination of features, what combination of steps, allows us to maximize our chance of shooting and to maximize the chance of having a high expected-goals shot. So basically combining the chance of shooting, the quantity, with the quality. Because there can be situations in which, over a longer time horizon and a large amount of data, you see that on average you shoot more, but the average expected goals of those shots might not be so high, whereas there might be other situations in which you have fewer shots, so it's more difficult to get a shot, but once you get the shot, the chance of scoring from it is higher. So that's probably a perfect example.
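Editor's sketch: the expected-goals calculation described in this part of the conversation can be illustrated in a few lines of Python. This is a minimal sketch, not Chelsea's model. The function name `shot_xg`, the logistic form, every coefficient, and all the shot data are assumptions invented for illustration; a real model would be fitted to many seasons of labelled shots. The only parts taken from the interview are the ideas that each shot gets a probability between zero and one from features such as distance, angle, opponents in the way and open play versus set play, and that summing those probabilities gives a cumulative xG to compare with actual goals.

```python
import math
from collections import defaultdict


def shot_xg(distance_m, angle_deg, opponents_between, set_play):
    """Map shot features to a goal probability with a logistic function.

    All coefficients below are made up for illustration only.
    """
    z = (0.3
         - 0.14 * distance_m          # further from goal -> less likely to score
         + 0.02 * angle_deg           # wider view of the goal -> more likely
         - 0.35 * opponents_between   # bodies between ball and goal -> less likely
         - (0.20 if set_play else 0.0))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical shots from a single game.
shots = [
    {"player": "A", "distance_m": 8.0, "angle_deg": 40.0,
     "opponents_between": 1, "set_play": False, "goal": True},
    {"player": "A", "distance_m": 22.0, "angle_deg": 15.0,
     "opponents_between": 3, "set_play": False, "goal": False},
    {"player": "B", "distance_m": 11.0, "angle_deg": 30.0,
     "opponents_between": 2, "set_play": True, "goal": False},
    {"player": "B", "distance_m": 6.0, "angle_deg": 55.0,
     "opponents_between": 0, "set_play": True, "goal": True},
]


def summarise(shots):
    """Sum per-shot probabilities into team-level and player-level xG."""
    team = {"xg": 0.0, "goals": 0}
    players = defaultdict(lambda: {"xg": 0.0, "goals": 0})
    for s in shots:
        p = shot_xg(s["distance_m"], s["angle_deg"],
                    s["opponents_between"], s["set_play"])
        for bucket in (team, players[s["player"]]):
            bucket["xg"] += p
            bucket["goals"] += int(s["goal"])
    return team, dict(players)


def verdict(goals, xg):
    """Label a goals-versus-xG comparison as in the interview."""
    if goals > xg:
        return "overperformed"
    if goals < xg:
        return "underperformed"
    return "as expected"


team, players = summarise(shots)
print(f"team xG {team['xg']:.2f}, goals {team['goals']}, "
      f"{verdict(team['goals'], team['xg'])}")
for name, stats in players.items():
    print(f"player {name}: xG {stats['xg']:.2f}, goals {stats['goals']}")
```

The same per-shot probabilities could be grouped by situation (open play versus set play) instead of by player, which is the split the live app is described as offering.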
Look at outswing corners versus inswing corners, for example. Outswing corners are corners where the ball, once kicked, tends to curve away from the goal, whereas inswing corners are the ones that curve towards the goal. From outswing corners you tend to see more shots, but with lower xG on average, whereas from inswing corners, the ones that go towards the goal, it's more difficult to shoot, but the shots tend to be higher in xG because the ball gets closer to the goal. So that's another thing to consider. If you get the ball closer to the goal, your threat increases, but it might be more difficult to shoot because there's also a higher density of players.

That's really interesting. So there's a trade-off there. If you're just counting how many shots you take, then you'd probably favor going for the outswinging corners, but actually you can have more chance of scoring if you go for the inswinging corner, even though you don't get so many shots.

Exactly. And these average numbers will be different from team to team. So that's also where the individual opponent analysis comes in, when it comes to deciding the best way to maximize our chance of scoring at the end of the day.

I'm curious how short corners compare to these, when you just knock it.

Short corners are another interesting point, because if you look at the overall output you get from short corners, they seem not to be very effective on average.
It's very difficult to get a shot from a short corner compared to the standard inswing or outswing corners, for an obvious reason: with inswing and outswing corners you are getting the ball into the box straight away, for free in some way, so you don't have the additional step of moving the ball further away and trying to find another spot to get the delivery into the box. Often short corners don't really end up in the box. They become some kind of build-up phase. So the short corner becomes, at some point, an open-play situation.
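Editor's sketch: the quantity-versus-quality trade-off for corners discussed above can be written as a simple decomposition, xG per corner = shots per corner × average xG per shot. The snippet below is a hedged illustration only. The function name `corner_summary` and every number in `samples` are hypothetical, chosen merely to mirror the pattern described in the conversation (more shots but lower xG per shot from outswingers, fewer but better shots from inswingers, few shots from short corners). They are not real match data and the ranking they produce is not a finding.

```python
def corner_summary(corners, shot_xgs):
    """Split the threat from a corner type into quantity and quality."""
    shots = len(shot_xgs)
    total_xg = sum(shot_xgs)
    return {
        "shots_per_corner": shots / corners,                  # quantity
        "xg_per_shot": total_xg / shots if shots else 0.0,    # quality
        "xg_per_corner": total_xg / corners,                  # overall threat
    }


# Hypothetical samples: number of corners taken and the xG of each resulting shot.
samples = {
    "outswing": {"corners": 100, "shot_xgs": [0.06] * 30},
    "inswing": {"corners": 100, "shot_xgs": [0.11] * 20},
    "short": {"corners": 100, "shot_xgs": [0.08] * 8},
}

summary = {
    kind: corner_summary(data["corners"], data["shot_xgs"])
    for kind, data in samples.items()
}

for kind, row in summary.items():
    print(f"{kind:9s} shots/corner={row['shots_per_corner']:.2f} "
          f"xG/shot={row['xg_per_shot']:.3f} "
          f"xG/corner={row['xg_per_corner']:.4f}")
```

Counting shots alone would favor the corner type with the highest `shots_per_corner`, while `xg_per_corner` combines both factors. As noted in the conversation, the real averages differ from team to team, which is why this kind of table would be recomputed per opponent.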

So that's another thing to consider. Short corners in most cases end up being an open-play situation, so anything that comes from that situation is classified as open-play output and not set play. But there are some teams that are better than others at playing short corners and getting something out of them, although at the end of the day there's not really much data for short corners as opposed to inswing and outswing, because it's not something that teams on average tend to use very often during a game.

So it seems like corners are a pretty well-studied part of soccer analysis. How about other set pieces? What about penalties or free kicks?

With free kicks we do basically the same kind of analysis we do for corners, so there's really nothing different methodologically. Of course, there is a big variety of free kicks, because depending on where they are taken you might study them differently. There are lateral free kicks, where the ball is kicked from wider positions in the upper part of the pitch, which nearly resemble corners in some ways, although if the ball is not high enough, the way players need to defend or attack is quite different, as you can probably see in some videos of lateral free kicks as opposed to corners. You see the ball coming towards you, so you have to run in behind towards the goal, whereas in most cases for corners you don't need to run in behind because the ball is literally coming from the byline. That's another big difference between lateral free kicks and corners.
Then you have other free kicks, like frontal free kicks, which are played from a frontal position a bit further back on the pitch, so in that case the analysis is different again. And with free kicks that are direct shots, there's not much to analyze, because free-kick shots do not happen very often in a game. So it is difficult to gauge something on a bigger scale, and at the end of the day they are shots, so they can be classified within the shot realm in some way. Penalties are something we don't really focus on too much, at least from my analytical point of view, because penalties are again quite rare events. Of course, once you get a penalty, we know the chance of scoring is extremely high, so you want to make sure to get it right and to take that opportunity if you can. But penalties are one of those things which we probably could look at at some point, especially from a goalkeeper perspective, if you want to feed the goalkeepers with relevant information on certain players who are way better than others on average at taking penalties: studying their techniques and the direction they choose, so do they shoot towards the left or towards the right, with the right foot or the left foot.
So these kinds of things can actually be helpful, but in terms of big data it's not something that we focus on too much, also because you can use purely videos for that when you study a specific player, since usually you might not have so many penalties to study for a specific player. So in that case it's easy, just watch like a dozen or so videos of them diving and then you figure out whether they tend to dive left or right. Yeah, which is done by other people usually. But yeah, over a career you might have a player who has taken thirty or forty penalties. Forty penalties is already a big number for a player over an entire career. So in that case you can easily watch forty penalties on video. I'd like to a little bit. But...

...like the data and how you come to create all these analyses. It seems like you're using some video data, but beyond that, where do you get all the rest of your data for analysis? What does it look like? Basically, the data I work with come primarily from two main sources. One source is the so-called event data, which provide, in a sort of time series fashion, every single event that happens during a game: every pass, every shot, every ball carry, clearance, interception, even red cards and yellow cards from the referee, fouls committed, fouls received, so any sort of event which is relevant in a game of football, a game of soccer. And the other source of data we get, which is probably also the very interesting one because it has the highest potential, is the tracking data, which provide information about player and ball positioning, as well as speed, throughout the game, with a resolution of twenty-five frames per second. So essentially, for each second you get twenty-five observations for each player and the ball as well. This information clearly amounts to roughly three million rows per game at the end of the day. Pretty big data then. Yeah, especially if you just consider one game. Yes. So basically, if you consider an entire Premier League season, let's say, you can easily exceed one billion rows. So when it comes to storing, of course, we're talking about storing very big data, but not an excessive number of columns. At the end of the day, the width of the data is quite limited, in a good way, while the number of rows is quite explosive. The use we make of this data is quite huge, quite variable, because with all the things I've already mentioned, which are only a tiny part of all the things we work on, we have already used these two sources in a quite significant manner.
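As a rough check on the row counts quoted above, here is a minimal sketch in Python. Only the twenty-five frames per second comes from the conversation; the other inputs are assumptions made for illustration: twenty-two players plus the ball, a ninety-minute match with no stoppage time, one row per tracked entity per frame, and 380 matches in a Premier League season.

```python
# Back-of-the-envelope size of a tracking data feed.
# Only FRAMES_PER_SECOND is taken from the interview; the rest are assumptions.
FRAMES_PER_SECOND = 25      # resolution quoted in the interview
TRACKED_ENTITIES = 22 + 1   # assumption: 22 players on the pitch plus the ball
MATCH_SECONDS = 90 * 60     # assumption: 90 minutes, no stoppage time
MATCHES_PER_SEASON = 380    # assumption: 20 clubs, each playing 38 matches


def rows_per_match(fps=FRAMES_PER_SECOND,
                   entities=TRACKED_ENTITIES,
                   seconds=MATCH_SECONDS):
    """One row per tracked entity per frame."""
    return fps * entities * seconds


def rows_per_season(matches=MATCHES_PER_SEASON):
    """Total rows for a full season under the same assumptions."""
    return matches * rows_per_match()


if __name__ == "__main__":
    print(f"rows per match:  {rows_per_match():,}")
    print(f"rows per season: {rows_per_season():,}")
```

Under these assumptions a single match gives a little over three million rows and a season a little over one billion, which is in line with the figures mentioned in the conversation.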
For example, to mention again the big work we did around set pieces and crosses. This is probably the biggest project we've done overall, also because, up to its final version, it lasted a good six or seven months, if not more. This big project was literally about trying to understand how to maximize our effectiveness in all set piece and crossing situations in terms of offensive and defensive setup. So basically, what we did was use the event data to identify the moments in the game in which there was a set piece, like a corner or a free kick, and any open play cross as well, so open play situations like crosses are also taken into account. Once we identify these situations from the event data, we use the tracking data to identify a reasonable window of time where we believe the specific event will fall in the tracking data, because one big limitation of this alignment is that the two sources are not synchronized from a time perspective. So basically we needed to find some fuzzy, very fuzzy logic to make sure that the window of time we identified in the tracking data matched the event we were identifying in the event data as precisely as possible, and that took a big effort from me in terms of trying to find the right context. So basically, by using the player, for example, that was kicking the corner or the cross in the event data, I was making sure that that player was in the right context in the tracking...

...data, so that player was actually very close to the ball at a time stamp which was reasonably close to the time stamp we got from the event data. And also trying to work out the optimal approximation of the moment the ball actually leaves that player's foot and gets delivered. So this is the kind of logic I've used to try and align these things as best as possible, and there was also a big effort from my colleagues in my department to validate these things with videos. So looking at videos of randomly sampled events, of course, because looking at thousands and thousands of events over and over again would have been a mess. So trying to cherry-pick events to find the optimal sort of validation, and there was a lot of back and forth on this: me doing the alignment to try and align the two sources, then validating with video, saying we could maybe refine the rules, so going back to the code and refining the rule. So there were a few rounds of this back and forth between me and my colleagues, which was frustrating sometimes but at the end of the day rewarding, because we managed to do something quite big with this work, which also proved to be quite effective on the pitch in the past season, and in the season before as well, in terms of getting our output in offensive and defensive situations right, or at least getting it better. It seems like you're saying the two main data sources are a kind of time series event data and then the spatial data as well, and there's a bit of a data cleaning problem to try and get the two things aligned. Yeah.
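The alignment idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used at Chelsea: the `Frame` structure, the `align_event` function, the five-second window and the 1.5-metre contact distance are all hypothetical, and the rule of taking the last frame in which the taker is still near the ball is just one simple approximation of the moment the ball leaves the foot.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass
class Frame:
    """One tracking frame (hypothetical structure for this sketch)."""
    t: float                                  # seconds on the tracking clock
    ball: tuple[float, float]                 # ball (x, y) in metres
    players: dict[str, tuple[float, float]]   # player id -> (x, y) in metres


def align_event(frames, event_t, taker_id, window=5.0, max_dist=1.5) -> Optional[Frame]:
    """Return the tracking frame that best matches an event-data time stamp.

    Only frames within `window` seconds of the event time stamp are searched,
    because the two clocks are not synchronized. Among those, the latest frame
    in which the taker is within `max_dist` metres of the ball is returned,
    as an approximation of the delivery moment. Returns None if no frame fits.
    """
    last_contact = None
    for frame in frames:
        if abs(frame.t - event_t) > window:
            continue  # outside the search window
        pos = frame.players.get(taker_id)
        if pos is None:
            continue  # taker not tracked in this frame
        dist = hypot(pos[0] - frame.ball[0], pos[1] - frame.ball[1])
        if dist <= max_dist and (last_contact is None or frame.t > last_contact.t):
            last_contact = frame
    return last_contact
```

For a corner tagged at 10.0 seconds in the event data, calling `align_event(frames, 10.0, taker_id)` would return the last nearby frame in which the taker still has the ball at their feet. In practice the window and distance thresholds would need the kind of video validation and rule refinement described in the conversation.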
Absolutely. Can you tell me a bit about which statistics or machine learning techniques you tend to use? I would say that the large majority of my time is spent on proper data cleaning or proper refinement of data rules, which are essential to make sure that we work on the logic we need to reach the final goal of the specific analysis. For example, making sure that certain sequences of possession in the event data reflect what we actually see on video, which is sometimes not the case, because the event data don't always fully reflect what you see on video: they are human-tagged, they're tagged by people, so of course that brings in an unavoidable human error. Apart from the huge data cleaning I carry out on a daily, weekly or monthly basis, the machine learning techniques I use are more in a kind of research and development phase at this stage, which hopefully at some point will see the light of day in an operational phase. They are, for example, related to pure analysis of tracking data, such as identifying patterns in the tracking data which reflect player runs, for example in behind the defensive line, especially when you want to identify moments which you will never see in the event data, because in the event data you only see information around the ball, whereas in the tracking data you can see the full spectrum of actions around specific events. For example, in the tracking data, I'm working on identifying those player runs regardless of whether they are around certain events or not.
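To make the idea of finding runs in tracking data concrete, here is a minimal rule-based sketch in Python. It is a simple threshold heuristic, not a machine learning model and not the approach described in the interview. The function name `detect_runs_in_behind`, the input format and the speed and duration thresholds are all assumptions made for illustration: each sample holds the attacker's position and the defensive line's position along the pitch, in metres, with values growing towards the opponent's goal.

```python
def detect_runs_in_behind(samples, fps=25, min_speed=5.0, min_duration=1.0):
    """Find forward sprints that start behind the defensive line and end beyond it.

    samples: list of (attacker_x, line_x) tuples, one per tracking frame.
    Returns a list of (start_index, end_index) frame pairs.
    All thresholds are hypothetical values chosen for this sketch.
    """
    min_frames = int(min_duration * fps)
    runs = []
    start = None

    def close_run(end):
        # Keep the sprint only if it is long enough and crosses the line.
        long_enough = end - start >= min_frames
        starts_behind = samples[start][0] <= samples[start][1]
        ends_beyond = samples[end][0] > samples[end][1]
        if long_enough and starts_behind and ends_beyond:
            runs.append((start, end))

    for i in range(1, len(samples)):
        # Forward speed in metres per second between consecutive frames.
        speed = (samples[i][0] - samples[i - 1][0]) * fps
        if speed >= min_speed:
            if start is None:
                start = i - 1
        elif start is not None:
            close_run(i - 1)
            start = None

    if start is not None:  # a sprint still in progress at the last frame
        close_run(len(samples) - 1)
    return runs
```

Because the rule looks only at positions, it flags a run whether or not any on-ball event is recorded at that moment, which is the property the conversation highlights. A real system would also need to handle direction of play, noise in the positions and the offside line.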
Players make these runs often in a game, even when they are off camera sometimes, which is even more of a bonus for me to investigate the data further, because you can identify moments such as these which are difficult even to quantify from a metric perspective, since you don't really see them in an event data feed, and even in the tracking data, if you use it purely to align with the event data feed in terms of contextual information, it is...

...still difficult, because sometimes you see things in the tracking data, like these runs, which are very difficult in some way to attach to a specific event. So that's where this kind of information can be very useful, also from a recruitment perspective, if you want to identify players for recruiting purposes who are doing specific kinds of things on the pitch which are not so easily quantifiable. So that's an example of things you only see spatially, meaning stuff that's happening away from the ball, maybe like a player trying to create space or something like that. Yeah, exactly, that's a big point around creating space with runs. And so, of all these analyses you've worked on, what's the data success story you're most proud of? I might be repetitive once again, but it's probably the big project we did around set pieces and crosses, in terms of bringing to life, let's say, probably to nearly full potential, the tracking data and event data together. It's probably something that really made us proud. And not only because it was a big piece of work which we managed to make operational, and we managed to make it work on a regular basis, and it's still working nowadays in terms of the automated process in the background, but probably we're also very proud of it because of the actual success we have seen on the pitch from this kind of data brought to life, especially when it comes to the coaches using it actively to take decisions in the training session, when it comes to training players for certain situations, but also on the pitch, when it comes to designing the best setup during a game.
I'm saying it has been successful on the pitch because in the past season, so the 2021/22 season, and the season before, 2020/21, we have seen a dramatic improvement in set piece performance and also open play cross performance, from a defensive point of view and an offensive point of view, so in terms of scoring goals and conceding goals, as opposed to the season before that. So the 2019/20 season was a very problematic season in some way in terms of set piece performance. Then, two years ago and one year ago, also thanks to a new set piece coach that came in, we were able to have a more thought-out kind of analysis and more thorough training on set pieces, and to be very cooperative with him, working very well alongside him, because he really liked the kind of work we did: he is quite a bit into data, in terms of using data to take decisions. So his expertise in training players in set pieces and crosses from a defensive and offensive point of view, combined with the work we did on set pieces and crosses, which was completely independent of any manager requests, so it is something we planned ourselves, meant that in some way we managed to create this small success story. That's cool. So it seems like analyzing set pieces and optimizing set piece performance is a very big part of your role. So just switching to the World Cup, because the World Cup is an important event right now, it seems like national teams and national squads are also very much focused on trying to figure out set pieces as well. So I was wondering, do you get involved in the World Cup analysis at all?
Actually, I'm not involved at all in the World Cup analysis, at least not in terms of analyzing the games themselves, because Chelsea specifically is not involved in this competition, and it's also an international competition, so national teams are involved. It's usually something we are not involved in, also because to that end we would actually need to buy data to do the analysis. So...

...in that case it's also another thing to consider. So I'm not involved in the analysis of the World Cup, at least in terms of getting data and crunching metrics as I usually do when it comes to our competitions specifically. But it might be that we do something in terms of recruiting, because we also collaborate with the recruiting side of things with our data, since our data are also used for recruiting in certain cases, which is going to help definitely. So this is trying to identify good players and get them to join your team. Yeah, of course the World Cup will be a very big opportunity for recruiting purposes, so I assume there might be some data analysis coming up. I don't know if it will be at the level we have been working at with our competitions, but we might still see something. We'll see. We'll just wait and see for that. I guess to finish, who do you think is going to win the Cup? My answer might be quite obvious, but I'm guessing Brazil. Okay, I think they're the favorite. So you're going with the safe choice there? Yeah, I would say it's a very safe choice. But I've actually watched Brazil a few times, also because my girlfriend is half Brazilian, so I'm a bit biased about that, and of course she will support Brazil. Otherwise I'm kind of unbiased on that. I'm not really supporting any team, also because Italy is not in the World Cup, so that's a big loss for us, not being there for two World Cups, which is massively negative for us. But I think Brazil overall are really the most complete team, especially considering that France were left with quite a few big defections, so I would say that that's the main reason. Brazil are the more complete team in every kind of setup: defensive setup, midfield and forwards. I think they are the most competitive team. But the World Cup is good also because you often see very unexpected teams doing very well, like Croatia four years ago.
All right, in that case, we've got a few weeks to wait until we find out if your predictions are correct. So thank you for taking the time to chat. It has been a real pleasure, and I learned a lot. Thank you. Thank you very much for your time, Richie, as well. It's been a pleasure. You've been listening to Data Framed, a podcast by Data Camp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time.
