[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio


Today marks the second episode in our DataFramed Careers Series. In this series, we will interview a diverse range of thought leaders and experts on the different aspects of landing a data role in 2022.

In the first episode of the series, Sadie discussed at great length the importance of having a solid data science portfolio to land a role in data. But what makes a great data science portfolio?

Nick Singh, co-author of Ace the Data Science Interview, joins the show to share everything you need to know to create high-quality, thorough portfolio projects.

Throughout the episode, we discuss:

  • How portfolio projects build experience
  • Who should be focusing on portfolio projects
  • The different types of portfolio projects
  • Biggest pitfalls when creating portfolio projects
  • How to get noticed with your portfolio projects
  • Concrete examples of great portfolio projects 

[Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on

You're listening to DataFramed, the podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science. Whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization, join us for in-depth discussions with the data and analytics leaders at the forefront of the data revolution. Let's dive right in.

Hello everyone, this is Adel, data science educator and evangelist at DataCamp. Welcome to day two of the DataFramed Careers Series, where we deep dive into the ins and outs of launching and building a career in data. In yesterday's episode, we talked with Sadie St. Lawrence about what it takes to launch a career in data in 2022. One thing that stood out in the chat was the importance of building portfolio projects. I wanted to go into more depth on portfolio projects, and there's probably no one better to discuss this with than Nick Singh. Nick is the co-author of the best-selling book Ace the Data Science Interview. He has held a variety of data and software engineering roles at Facebook, Google, and SafeGraph. He is a career coach and has helped hundreds of folks land jobs through portfolio projects. Interestingly, Nick used portfolio projects to land a variety of data roles himself, and we discuss that at great length in today's episode. Nick is also a second-time DataFramed guest. In his first appearance, he discussed his book alongside his co-author Kevin Huo, so make sure to check that episode out. I've left it in the description below.
Throughout the episode, we discuss why portfolio projects are so important, the difference between coding-based and content-based portfolio projects, the biggest pitfalls when creating portfolio projects, how his passion for hip hop led him to a growth data science position at Facebook, concrete examples of what makes a great project, and more. If you're curious about content-based projects, make sure to tune in for tomorrow's episode with Khuyen Tran, who used content-based projects as a means to accelerate her data career. If you enjoyed this episode, make sure to rate the podcast and leave a review. Also, Sadie St. Lawrence from yesterday's episode is joining us for DataCamp Radar, our digital summit on June 23rd. Nick will also be hosting a workshop on creating awesome portfolio projects, so make sure to register. Seats are limited and registration is free, so secure your spot today on events.datacamp.com/radar. The link is in the description. Now, on to today's episode.

So, I think you're the first guest who's appeared twice on DataFramed thus far. I'm very excited to break down with you the hallmarks of a great data science portfolio. But before that, especially for those who haven't listened to our first episode yet, episode seventy-seven, on acing the data science interview, do you mind sharing a bit about your background?

Absolutely. Thanks for having me back on DataFramed, and I hope to get a three-peat in and come back again to talk even more. A little bit about myself: I've worked a variety of software engineering and data roles at companies like Facebook, Google, and Microsoft, and most recently at the geospatial analytics startup SafeGraph.
When COVID hit, I saw my friends lose their jobs and internships, so I doubled down on the career coaching I'd been doing on the side, where I'd been writing tips on LinkedIn. I basically combined all those tips with my past experience, partnered up with my buddy Kevin, who's an ex-Facebook data scientist, and we wrote the best-selling book Ace the Data Science Interview.

I highly recommend folks listen to that episode. It has a lot of tidbits on the whole interview process, much like the series we're hosting today. So I'm excited to be back discussing portfolio projects with you. I want to start our conversation by discussing the why behind portfolio projects. Oftentimes, when we talk about portfolio projects, we miss out on why they're so important and why they're such a great way for aspiring and seasoned data practitioners to put their work out there, prove their expertise, and get noticed. So, in your own words, why do you think portfolio projects are so important?

It's so important because there are so many jobs, even entry-level jobs, that are looking for years of experience. It's kind of a catch-22: to get your first job, you need experience, except that's what your first job is for, right? Folks get stuck in this all the damn time. The best way I've found to get around that, besides getting your first job, is building portfolio projects, because they help you make your own experience and kind of bootstrap yourself into a spot where recruiters will pick you for your first internship or your first job, or even your second or third job.

That's great. Harping on the state of data jobs in general and the experience required for a lot of entry-level jobs, why do you think data science skews so heavily towards provable experience, even for entry-level jobs, compared to other industries or professions?

Right, so I think people have actually had this debate before.
That is: is data science even an entry-level position? Some people argue that there's no such thing as an entry-level data scientist, which is why most jobs require a master's degree or past work experience.

They say jobs such as data analyst, BI developer, or data engineer can have a more junior component, but data science is so interdisciplinary: there are stakeholders involved, and there are so many different skills. People have made that argument; I'm not sure how much I believe it. The second thing to mention is that these job requirements are often laundry lists of recommendations, not really requirements. Just through inflation, you see postings saying, oh, this entry-level job needs two years of experience, and then they actually start interviewing and realize they can't hire anyone. So even if the job requirement says that, you often might not need the experience. So it's a bit of a mixture: data science is advanced, but companies also tell sort of a white lie. On that white lie component, you also see it sometimes in the job description: expert in machine learning, but also expert in Tableau and Power BI, but also expert in data visualization. I think, at some level, companies don't understand what they're hiring for. And even if the hiring manager knows, maybe the posting goes through a technical recruiter who's not as sure about all this stuff. That's why you'll see someone needing AWS and Azure and GCP skills, and it's like, wait, what job is using three different cloud providers? That doesn't make sense. But you see people just make a laundry list or put synonyms in there. So I don't put too much weight on it, but it is true that, often, to get your first or second job, having some level of experience matters. That's why portfolio projects are there to help you: if you didn't have that internship, or you didn't have that research experience, or you don't have that provable master's or PhD, you've got to make your own experience, and portfolio projects are the best way to do that.

I completely agree.
So, given that, who should be focusing the most on portfolio projects?

For the most part, it should definitely be people who are earlier in their career and looking for that first, second, or maybe even third job, especially people switching industries who might have years of experience in something slightly related, like econometrics or public health. They have some data analysis background but really need to double down on the field of data science. But I want to clear up a misconception as well: there is room for seasoned people to work on portfolio projects. Absolutely. Here's the thing. Let's say you have ten years of experience. You might be stagnant in your tech skills, because data evolves so fast. Even with ten years of experience, you might not know the latest and greatest. Maybe you don't have experience with distributed data processing frameworks like Spark or Dask. Maybe your last job was an R shop, so you have years of experience in R but no Python, while the industry these days really favors Python and needs people who know Spark. That's exactly why even someone senior in their career can get a lot of mileage out of showing: not only do I have years of experience, but look, on the side I've been upskilling and getting more modern technologies under my belt. So it's good for people early in their career, but really, at all levels there's room for portfolio projects.

And harping on the experienced persona when working on projects: even the best data scientists I know tend to be very curious people. They gravitate towards side projects all the time and publish their work around those side projects. I think it's a testament to how data science work can create thought leadership, which I find very interesting. It gets people noticed, and it gets them better opportunities and better jobs.

You're absolutely right.
So, you know, one person could just go network, which means shaking a lot of hands and doing a lot of coffee meetings. But at that senior level, the most effective time investment for networking isn't shaking hands; it's making cool stuff and posting it on Hacker News or getting it viral on LinkedIn or Medium. And that only works if you actually have a cool project you can openly talk about, or some cool novel ideas you can openly share. So you're right that even a senior person who's not really looking for a job switch, but is just looking to grow their clout and reputation in the industry, has to put their work out in public, and that means projects.

So what are the different types of portfolio projects that practitioners can start doing? What comes to mind for me are content-based projects, like tutorials, and actual projects where you take an idea and execute it.

Definitely, those are two really good divisions. I'm more of a fan of actual end-to-end projects where you're building something or analyzing some data. But I think there is some room for content-based tutorials. I've just seen more people do things poorly on the content side, or maybe they're doing it for the wrong reasons. Some people think that writing the twenty-seventh blog post on using XGBoost in Python on some boring old Kaggle dataset is going to get them a job, and I don't think that's the case. I think it would be way better to build a more realistic project, something that actually aligns with what your future job might be. But there is room for more content-based projects, right? If you're trying to tell a really cool story or do some data journalism-type work, that fits. But I think you might differ on this, right? I want to hear your take as well, because you're someone who advocates for these content-based tutorials, and you've had that experience yourself. I want to hear what you think.

On my side, I think there's a lot of value in creating these tutorials from a personal perspective, because if you're still early in your career, you still have concepts you want to master. Writing about them, teaching others, and paying it forward is a great path to that mastery. I also think it's a great avenue for proving communication skills, data storytelling skills, and the ability to distill technical concepts into much more understandable information. But I also agree with your general point that if you want to make a splash and create truly original projects, then a holistic project is a much better route to a bigger bang for your buck.

Yeah, I definitely agree with the sentiment you laid out, and I definitely don't want to throw shade at content-based projects because, let's face it, writing is so important. It would be hypocritical of me, having written a book, to say, oh yeah, who cares about your communication skills? No, let's be honest: to level up in data science, to really get your ideas sold within your organization or amongst the wider data science community, you need communication skills, which really does often mean writing.
Unless you're going to do a TikTok dance that goes viral, you're probably better off working on those writing skills, so there is room for that. And I think writing something truly novel is very hard. So in terms of just practicing, there is room to write that XGBoost tutorial if it's a means for you to practice your communication skills. But again, in the context of landing jobs, it's fine and helpful, but man, I do think end-to-end projects that more accurately look like the real job you want are going to get you a bigger bang for your buck, and that's what I'd love to tell you more about.

I completely agree, and I'm very excited to unpack that. But before that, let's harp on the writing piece. Say I'm not super confident in my technical skills, and I want to write about a particular topic to get over that hump and sharpen my skills there. How do you get over that impostor syndrome when writing content on certain topics and putting it out there?

Adel, it is so real. I wrote this book, and I do not like writing. I barely commented my code in school. I studied engineering so I could get away from the liberal arts classes that had thirty-page essays due at the end of the semester. I didn't take a single one, because I knew, oh shit, they're not going to grade me on an exam, they're going to grade me on a paper that I do not want to write. That's how much I feared writing. And three hundred and one pages later, I've changed my opinion of it. I think a lot of it is not just a skill thing; it's a mind game. Many people are scared to write because they think they're bad writers. But actually, the bar for good writing isn't that high. If you want to write a really fancy story, that's still really hard.
But maybe you're just trying to describe a technical concept in layman's terms. That doesn't need much storytelling, plot devices, or high-level story elements. So I think people are a little too harsh on themselves when writing, which is what scares them off from even writing in the first place. They think they need to write like J.K. Rowling, when in reality the bar is a lot lower. The second thing is: write what you know. Too often I see people say, okay, if I'm going to do a tutorial, let's make it the most advanced thing I know, or something other people will find fancy so I can show off, even though it's something I don't know well. I think that's a big issue, and people are better off writing for themselves six months ago. What do I mean by that? You don't have to pretend you're an expert teaching another expert. You could just write with this audience in mind: hmm, what would I have wanted to know six months ago about XGBoost, or about data visualization? When you write for that audience, you know that audience, because you were that person six months ago. And you should be able to add value there, because you've learned and grown a lot in six months; you've gone through that transition, so you know what it looks like. So those are the two biggest things for getting over this writing hump, whether you're creating content-based tutorials, putting out writing portfolios, or publishing a cool investigative piece where you dive into a dataset and put it up on Medium: don't be too hard on yourself, and write something you know, something you would have appreciated even six months ago.
I completely agree with you. For me, what got me over the hump of writing was writing about stuff I'm really passionate about. In particular, I was very passionate about AI ethics, I wanted to break into that field, and it was something I really enjoyed writing about. It came easily to me because it was second nature given the amount of research I was doing; I was reading a lot about it, so it was easy to write about. That echoes your sentiment around writing what you know. So I think this marks a great segue to discuss the technical portfolio project, the kind that gives you more bang for your buck when it comes to landing a data science job. I want to get to the hallmarks of a great data science portfolio, but before that, let's focus on the top things people need to avoid when creating portfolio projects. What do you think are the top mistakes and misconceptions people have when they work on their own portfolio projects?

I've looked at hundreds of portfolio projects through coaching people into FAANG jobs, and here are the four big reasons people's portfolio projects suck. First, they pick a boring idea. Humans love a good story. They don't want to see something they've heard of before. They don't want something boring, like the Iris dataset. It's so boring. It's about flowers: sepal length, petal length. I don't know shit about flowers, and most people don't, unless it's romantically, but you know what I mean. Like Iris, I don't even know what the hell that is. Right, unless you're a botanist or something. Right, exactly, or unless you're a worker bee or something like that. So first, we've got to pick an interesting idea, something that you, just as a human, find interesting. These days, Ukraine is a hot topic, and elections are a hot topic. Dating is always a fun story, and so is food. Most people would love to read something about the hundred best restaurants in New York City using Yelp data.
You don't have to be into that niche to find it interesting. So picking a story or idea that's already interesting to people at a human level is a good first step. The second place I see people go wrong is picking a project that doesn't visualize well. We'll get into cold emails and how to get noticed a little later, but I think one of the best ways to show off what you've done is a visual, because a photo tells a thousand... whatever that quote is. You know what I mean: a picture is worth a thousand words, I think. Right. I was about to butcher it and say something like a thousand stories tell a photo in a word. But you understand: visuals matter. People love visuals. A good GIF or a good infographic goes a long way. There's a reason you see them go viral on Reddit, on r/dataisbeautiful. There's a reason you see infographics go viral on tech Twitter or Instagram. People love seeing that. They don't want to see the seven hundred lines of data cleaning you did. They want to see, after you cleaned the data, what the hell the result was. What's the cool takeaway? So for those data journalism-type pieces where you're just investigating a dataset, keep in mind what that end visual would look like, or that end Tableau dashboard, that end product. What would it look like? Because if you don't have one, it's a little hard to convey that you built something cool. One way I illustrate this: say your side project solved the intractable problem of whether P equals NP. This would win you a Turing Award and a Nobel Prize. It would break cryptography. It would be one of the biggest things you could do in the next hundred years.
But let's be honest: what does the visual look like for that? If you went to an average technical recruiter and said, oh, I worked on the P versus NP problem and solved it, they wouldn't know what to make of it, and they wouldn't think to fast-track you. They'd fast-track you if you said, oh yeah, I won a Turing Award and a Nobel Prize and I'm world famous. But those are secondary signals; they work because recruiters know what a Nobel Prize looks like. I'm drawing this out to show you that even solving a really tough problem that would break all of cryptography isn't a great portfolio project, because you can't visualize it. Now I'll give you an example of something you can visualize. I saw this great project on the most common brands shouted out in hip hop lyrics. Someone scraped all the music lyrics from Genius.com, analyzed them, and saw which brands came up: did they shout out Mercedes-Benz, did they shout out Chevy, Porsche, Lamborghini? Then they visualized that. There was this really cool histogram of all the different brands, with images of them and how often they showed up. Then they did the same analysis by rapper. It's like, oh, Drake loves his LaFerrari, but hip hop in general, and West Coast hip hop especially, loves the Chevy Impala that's bumping up and down, or whatever it is. You can see how that's a much more interesting and flashy story. It's easy to communicate. And as a hiring manager, let's be honest, I'm a human, so I'm like, oh, how many lyrics were there? How did you scrape all that data? Oh, look at this great visualization. Oh, you made a Tableau dashboard where I can filter by genre to see which brands are shouted out? That's really cool. I want to see what's different in pop versus hip hop versus country.
I feel like in country, people talk about their Ford pickup truck all the time, but that's not what we're talking about in hip hop. So that's an example of thinking about the visual and going visual-first, because humans are visual people. Recruiters are not super technical, so they can't look at line six hundred and be like, whoa, that's such a clever model, or wow, you cleaned the shit out of this data. They're never going to be like that. They just want to know what happened in the end, and usually a picture or a quick video is how you summarize what happened. The third main reason I see people mess up their portfolio projects is that the projects just aren't finished. They're half-baked. The project is sitting on their laptop and they never put it up on GitHub, or they worked halfway through the data cleaning but never visualized the end result because there were too many data issues, so they never made that interactive dashboard. That's a big miss. And the fourth and final reason I see people make mistakes, even when they're finished, have a visual, and picked an interesting idea, is the hardest one. Honestly, if you just get the first three right, you're already in decent shape. But the fourth one would really put you over the hump and make you stand out: showing that your project had impact, that other people cared. Because ultimately, why are we in data? We're trying to find insights that drive business and actually impact a business, or insights that actually shape society, or something novel; we're trying to improve profitability or improve people's lives by analyzing data.
So if you build this great project but don't quantify the impact you've had, like, oh, it went viral on Reddit, or my project got six thousand GitHub stars, or four thousand users ran my model, and instead you just put it out there and it's like, yeah, I didn't save anyone any money, I didn't do anything, it's going to lack impact. That's definitely the hardest one. But you can show it. For example, and we'll talk about my own personal story, I use user counts all the time to quantify impact: oh, I built something, and it got two thousand users. So people actually cared, because people actually used the data product and software I wrote. That shows a recruiter: whoa, this person made something that's interesting to me, it got done, and it helps someone else's life. If they can do that, I'm sure that when I hire them, they'll be able to do the same thing at my own business.

That's really great, and I'm excited to unpack all of these with you. But before that, to set the level here: on the flip side, we looked at how portfolio projects fail. If you were to flip it, what are the key principles every great portfolio project should have?

Honestly, just flip it. Running through it really quickly: first, pick an interesting idea. Second, make sure that idea has a good visual, and have that visual in mind before you even start working. Third, make sure it gets done. Done doesn't mean it's all-encompassing and covers every single aspect; it just means it doesn't feel half-done. There's some endpoint you reached, even if it's a midpoint; as long as it feels like a natural stopping spot, you're good. And fourth, see if you can demonstrate impact with it. Make sure you put it out there into the world, because even if you didn't save anyone money or time, if it went viral on Hacker News or Reddit, that's impact.
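For readers who want to try a project like the hip hop brand analysis Nick described, the core counting step can be sketched in Python. This is a minimal sketch, not the original project's code: it assumes the lyrics have already been scraped into (artist, lyrics) pairs, and the brands and regex aliases in the hypothetical `BRAND_PATTERNS` table are illustrative placeholders.

```python
import re
from collections import Counter, defaultdict

# Hypothetical brand table: each brand maps to a regex covering a few common aliases.
# A real analysis would need a much larger, hand-checked alias list.
BRAND_PATTERNS = {
    "Mercedes-Benz": r"\bmercedes(?:[- ]benz)?\b|\bbenz\b",  # "Mercedes Benz" counts once
    "Chevrolet": r"\bchev(?:y|rolet)\b",
    "Lamborghini": r"\blambo(?:rghini)?\b",
    "Ferrari": r"\bferrari\b",
    "Porsche": r"\bporsche\b",
}
COMPILED = {brand: re.compile(p, re.IGNORECASE) for brand, p in BRAND_PATTERNS.items()}


def count_brand_mentions(songs):
    """Count brand mentions overall and per artist.

    songs: iterable of (artist, lyrics) pairs, assumed to be already scraped.
    Returns (overall Counter, dict mapping artist -> Counter).
    """
    overall = Counter()
    by_artist = defaultdict(Counter)
    for artist, lyrics in songs:
        for brand, pattern in COMPILED.items():
            n = len(pattern.findall(lyrics))  # non-overlapping matches in this song
            if n:
                overall[brand] += n
                by_artist[artist][brand] += n
    return overall, dict(by_artist)
```

The overall counter is what would feed a histogram of brands, and the per-artist counters support the by-rapper breakdown; filtering by genre would just mean grouping on a genre field instead of the artist.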
That's great. So can you tell me about a portfolio project you worked on in your career that you felt really highlighted these principles for you?

Absolutely. This was in college, in my second year. For background, I was a DJ in high school, and I loved mixing hip hop and Bollywood music. I've always been passionate about sharing my own musical taste, and quite frankly, I think my taste is pretty damn good, and I wanted to make a game to quantify that feeling. So I was sitting in my dorm room and saw people playing fantasy football, a game where people put virtual points on different football players, their real-world performance in the NFL is tracked, and that's how you score points. So if you were good at assessing football talent and predicting which teams and players would do well, you could show off your skill and get clout with your friends. And I was like, hmm, could I do something like fantasy football, but for music? So I started a website called RapStock.io, which was kind of like a stock market for rappers. Using data from Spotify, I could see the popularity of each artist in near real time and assign them a stock price, or score. Then I let users long or short different artists. If you thought someone was a blue chip, like Drake, someone you really believed in and loved and thought was one of the greatest (I'm a huge Drake fan, so I was all in on Drake), you could invest in Drake on the stock market. And if you saw a one-hit wonder you didn't like, say Desiigner, if you remember that song "Panda," which I thought was really whack, you could short that person. You could say, you know what, I'm going to short this artist; I think their score, their popularity, is going to drop off in the next few months. So I built this game, and I grew it to two thousand monthly active users.

Here's why it made for such a good portfolio project. Actually, let me tell you the end result first, and then we'll break it down. It got me onto Facebook's growth team. Growth engineering, or growth hacking for those who don't know, is about using data, A/B testing, and rapid experimentation to grow products. It's a very quantitative, data-driven approach to building products. At companies running at Facebook scale, that's how they figure out what to build next and how to grow the user base faster. It's less about ideation, someone coming up with a neat feature, and a lot more about instrumenting everything, collecting every data point, and running a thousand A/B tests across a hundred million users. My own project exposed me to growth engineering. I learned about the field because in the process of growing the project to two thousand monthly active users, I had a lot of issues with user retention: people would sign up and then drop off. I learned a lot about growth, and I got to share that story in my cold emails: hey, look, most other college kids don't know about growth engineering, but I know all about user analytics, collecting data, and using it to build a better product. I ran my own A/B tests on RapStock.io, and that's how I helped grow it to two thousand monthly active users. This was such a good story because it showed people on Facebook's growth team that I could already do the work. I didn't tell them, hey, take a bet on me. I didn't just reverse some linked list. I mean, sure, that was part of the interview process, but it was more than that. I showed them why I'm a good fit for their team: I'm already doing the work they're hiring for.
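The RapStock.io mechanic Nick described, pricing each artist from a popularity score and letting users go long or short, can be sketched as follows. This is a hypothetical illustration, not how RapStock.io was actually built: it assumes a 0 to 100 artist popularity score (such as the one Spotify's Web API reports for artists) and a simple linear pricing rule, `price_from_popularity`, invented for this example.

```python
from dataclasses import dataclass


def price_from_popularity(popularity: int, scale: float = 1.0) -> float:
    """Hypothetical pricing rule: map a 0-100 popularity score linearly to a price."""
    if not 0 <= popularity <= 100:
        raise ValueError("popularity must be between 0 and 100")
    return popularity * scale


@dataclass
class Position:
    artist: str
    shares: float
    entry_price: float
    side: str  # "long" or "short"

    def pnl(self, current_price: float) -> float:
        """Profit or loss at the current price.

        A long position gains when the price rises; a short position gains when it falls.
        """
        change = current_price - self.entry_price
        if self.side == "long":
            return self.shares * change
        if self.side == "short":
            return -self.shares * change
        raise ValueError(f"unknown side: {self.side!r}")
```

For example, a 10-share long position opened at a price of 90 gains 50 if the price rises to 95, while a 10-share short opened at 70 gains 300 if the price falls to 40.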
And ultimately, going back to why portfolio projects are so important: companies don't want to take a bet on you. They want someone who's proven. That's why they look for people with past work experience. So if you don't have that past work experience, and you don't have that master's or PhD, what do you have to do? Well, you show them: even without getting paid, on my own, I'm analyzing data or building models on these open data sets, so why can't I do it for you? That's why it works so well psychologically, because you're de-risking the idea of you working at that company. The more closely you can match the type of work they do in scope, technical difficulty, and domain, the less risky it is for them to hire you. And ultimately, when you make it much less risky, that's how you get hired. So why did this project work? Going back to the framework, I'll reiterate. First, I didn't pick a boring idea. I talked about hip hop, I talked about Drake, I talked about how I was a DJ and I love music. Most people like music. Most people have heard of Drake, so they got into the story, or at least they'd heard of fantasy football. Second, the project visualized well, in the sense that I had a functioning product that I could load up and show them in about thirty seconds. I could pull it up on my phone, and they were like, Damn, you built this? Tell me about the tech stack, tell me about the Spotify data you're scraping, tell me about how you're pricing assets, let's see a graph of Drake going up and down. How did you build this? Third, it was finished. Sure, there were a thousand more features I could have added, but I at least got something out there that I could pull up. I didn't say, Oh yeah, it's in my GitHub README or something like that. I said, Here's the damn thing. And fourth, it had impact. It had two thousand monthly active users, and I won an entrepreneurship competition.
They could really tell, like, Whoa, this person really used data to grow their user base. That's what growth is all about. They've clearly done the thing we're looking for. That's why this project worked so well. And sure, I got lucky using this to get onto Facebook's growth team, but I'll be honest: I used the same story with Airbnb's growth team, Uber's growth team, and Snapchat's growth team. I used it for companies that weren't even hiring for growth positions. When I talked to Apple, I told them how I love building products for consumers, and I pulled up this consumer product. When I talked to alternative data companies, I said, Look at how I'm scraping Spotify data and using it to price artists. You're an alternative data company trying to, you know, scrape data sets to long or short stocks. You're a hedge fund, you're on Wall Street. You're doing similar work to what I'm doing here. Don't you want to hire me? So this one story worked in so many different ways, and that's actually what happens when you build a really fleshed-out, meaty portfolio project. It doesn't just work for one kind of job or one kind of industry; it sets you apart for a whole variety of roles.
That's really awesome, and I love that particular portfolio project because, as you said, especially at the end, there are so many dimensions to it that apply to different types of jobs, whether that's product management at, let's say, a Spotify-type company, or, as you said, the alternative data space, even if you're working with stock market data. So I think there are a lot of lessons to extract from that particular project about how to architect, or choose, the story you want to pursue for a portfolio project. So let's unpack that project even more. I especially want to harp on the scope creep aspect of it, and on making sure we avoid scope creep at all costs.
We want people to actually finish their projects so they're not half-baked, and I think scope creep tends to be a big factor here. A lot of aspiring practitioners fall into the trap of scope creep, where they don't have a clear idea of where a project ends, or they get stuck in analysis paralysis and don't know where to start. So can you walk me through developing this project from ideation to the end, how you avoided scope creep, and how you made sure you had something done that you were able to iterate on?
Absolutely. You're right that people see this, get inspired, and set off on their own big journey. It worked out for me, and that's the story I can tell. But when I was starting out, I didn't set out to make a startup that grew to two thousand monthly active users. I just wanted to build something cool and something small, and I think that's something people get wrong. Maybe some people are superhuman and really disciplined, so they can do this extra portfolio project work after work, after their internship, or after studying in college. But man, it's hard to find the time, and I'm not that disciplined, I'm not that smart. That was my first big coding project; I hadn't done much like it before. So what you really have to do is be honest with yourself. In my experience, I almost never overdeliver. I almost always underdeliver. I always think I can do more than I actually can, and I think that describes a lot of people. So what I do is set very small goalposts, very small milestones. I make an MVP, and then I try to cut the MVP down further, and further, and further, and I think that's what people need to do. I'll give you an example from my own RapStock story. One MVP could have been: can I collect follower counts from Spotify every twenty minutes for five hundred different artists and save that in a CSV? That by itself isn't an amazing project, but it's still something you can talk about: How do you make a recurring job collect data? How do you talk to an API? How do you save that data?
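As a rough illustration of the kind of MVP Nick describes (polling a data source on a schedule and appending snapshots to a CSV), here is a minimal Python sketch. It deliberately does not call Spotify: `fetch_follower_counts` is a hypothetical placeholder that returns made-up numbers, and the function names, CSV columns, and twenty-minute default interval are assumptions for illustration, not the actual RapStock code.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path


def fetch_follower_counts(artist_ids):
    """HYPOTHETICAL placeholder for a real API call (e.g. to Spotify).

    Returns a dict mapping artist ID -> follower count. The numbers are
    fake; swap this body for a real API client.
    """
    return {artist_id: 1000 + i for i, artist_id in enumerate(artist_ids)}


def append_snapshot(csv_path, counts, timestamp=None):
    """Append one timestamped row per artist, writing a header on first use."""
    timestamp = timestamp or datetime.now(timezone.utc).isoformat()
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "artist_id", "followers"])
        for artist_id, followers in counts.items():
            writer.writerow([timestamp, artist_id, followers])


def run_collector(csv_path, artist_ids, interval_seconds=20 * 60, iterations=None):
    """Take a snapshot every `interval_seconds`; `iterations=None` runs forever."""
    n = 0
    while iterations is None or n < iterations:
        append_snapshot(csv_path, fetch_follower_counts(artist_ids))
        n += 1
        # Skip the final sleep when a fixed number of iterations is requested.
        if iterations is None or n < iterations:
            time.sleep(interval_seconds)
```

In practice, instead of a long-running loop, a scheduler such as cron could run a single snapshot every twenty minutes; the loop here only keeps the sketch self-contained.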
And I could have done a very simple visual where I just display that data on a website. That's a very simple web dev task. That alone would have been an MVP starting point, right? And it's already meaty and interesting: Oh wow, you're visualizing Spotify data. I don't know how to see follower counts otherwise, but you did that. That could have been the story by itself. Then I added user features, and then I tried to grow it. What I'm trying to say is, whatever idea you have for your portfolio project, see if you can halve it, and then halve it again, and work toward that smallest piece first. What I find ends up happening when you hit it is that you get momentum. You're like, Oh damn, I did a quarter of it, this wasn't so bad, it's pretty cool, and I showed my friend and they think it's cool too. Then you go after the rest later. The harder situation is when you set out on a big, ambitious goal and can never show anyone, so you never get that positive reinforcement. You never feel good yourself, because even if you've done half the project, you're like, Shit, that's only half the project, I can't show anyone this. So that's a really good approach: cut the scope and make it really simple. The second big thing is to work on something you're actually passionate about. I loved music, I loved hip hop, so it wasn't a big deal for me to dig into Drake data on my own time, because I freaking love Drake. I wanted to understand that kind of data set. So that's another big thing: if you pick something you like, it's still going to be work, but at least it feels a lot less like work and a lot more like fun, and then you can get in the zone, get into deep work mode, and really get it done.
I completely agree, especially on this MVP-style thinking.
Even more broadly, outside of data science portfolio projects, in any professional setting, I think the mentality of "I want to build the simplest version of this project that can get me that first win, and then iterate from there" is probably the biggest secret to much faster success, compared to starting massive projects and getting stuck at the beginning or in the middle.
Yeah, because I've coached so many people, and I've seen so many smart people say, Oh my God, I'm going to do deep learning on this thing. And I'm like, Dude, if you can even just get the data set onto your computer and store it in a database, that would already be pretty impressive, because, hey, it's a big data set, you have to write cron jobs, and you have to write some basic SQL to analyze it. Forget about deep learning. Even that would be a good project. When people look at what they think is cool, they overcomplicate things, and I almost want them to make it simple, easy, even a little stupid, like, Really, your project was just making a data set? And let's be honest, if you made a cool data set and uploaded it to Kaggle, that could already get traction. If you can find a data set that people don't know exists and just pull it down and store it, that by itself can be a good project. People don't think like that. They think the only good project is some really complicated deep learning thing, and it's like, no, no, no. Make it stupidly simple, stupidly small, because I guarantee you're harder on yourself than you should be. Even something that small, like, I just made a data set of Drake analytics, would have been a cool project by itself, and people would say, Oh wow, check out Nick's repo of Drake data. I would love that. I know there are enough Drake fans who would kill for that kind of data set.
A hundred percent. So, taking how you approached this particular project with the RapStock data, if you abstract it and generalize it into a framework for finding great data science projects, what would that framework look like for you?
I think one big element at the framework level is working backwards. Now that you know what makes a good portfolio project (an interesting idea, a visual, getting it done, and impact), start there. Before you even write a line of code, before you even try to analyze a data set, think about that visual: what would you even try to visualize? If you can't answer that, it's probably not going to be the world's best portfolio project. In the same way, if before you write a line of code you're like, Whoa, this is six months of work, you already know it's probably not going to get finished. Let's be very real. If something looks like it will take six months, or even one month... even a weekend-long project sometimes turns into a two-weekend project. And let's be honest, one weekend you feel really motivated, and the second weekend you're like, Yeah, I think I'll go hiking or drinking. So even a weekend project might be too long. It might really be: What can I get done on Saturday in eight hours? If it goes over, maybe I'll use Sunday too, but let's start with what I can get done in eight hours on a Saturday. I think it's about being that real and starting with these criteria: if I know it needs to get done, let's make it really small. I know it needs to be visualized.
So let's already have an idea of what that visual will be. We'll fill in the details, or fill in the histogram, later, but we already know it should be a histogram with this on the x-axis and this on the y-axis, and this is what we're visualizing. And then, same way, there should be impact. Before I even started on RapStock, I knew, Oh yeah, this probably isn't going to make money; it's going to need users. That's probably the best way to quantify my impact. Or if I were just making a data set of music, or a data set of Drake, I would already know that the way to quantify it would be GitHub stars or Kaggle downloads: Oh yeah, I made a data set on Kaggle that five hundred other people analyzed. You already know that's a metric. When you start with that, you get a lot more clarity in building out your project. Most people don't do this hard work from the get-go. They're like, Oh yeah, it would be cool to work on music, and then they float around for weeks or months without the clarity of, I'm going after users, or GitHub stars, or dollars, or time saved, or being the number one upvoted post on Hacker News.
So how would you adjust that framework for the different roles someone could be applying to, whether that's data analyst, growth engineer, data scientist, BI analyst, or machine learning engineer? There are a lot of different roles in the data space. Is there any adjustment you'd make to this framework?
Yeah, I think the framework stays the same. Oh, I should have mentioned one more thing, and this is actually where it differs. When I say find an interesting idea, that's one great way to go about it: you find something personally interesting, like I love hip hop and music, so I started there. There's a second version of the framework where you swap that part out. Instead of picking an interesting idea, pick an idea that matches exactly what your dream job will be working on.
That's the second way to go about it. Still make sure it's done and has impact, but instead of picking an idea like, Oh, I want to analyze Drake, if you know you want to be a machine learning engineer at a self-driving car company, and they say they're looking for someone with experience in computer vision data sets, make sure your project revolves around computer vision, ideally driving, or at least some level of perception. Not a transfer learning project where you're generating art, but something like, I'm trying to understand what's happening in the scene and ascribing labels to it, because that's a lot of what self-driving looks like. So that's the other way. It's still the same framework of making sure it's done and making sure it's small, but you start with the end result, and the end result is, Oh shit, I want to have this job, and this is exactly what that job looks like. That's how I change it up. For a data analyst or data scientist, look at any job description. If you're not sure what a job looks like, go look at ten machine learning engineering postings or ten data science postings, and you'll have a good sense of the skills people look for. From that you can reverse engineer: Oh shoot, I should make sure this project checks off five of those ten boxes.
And do you think that, in some sense, as the role becomes more complex, the goalpost for what an MVP looks like should also move in a more complex direction?
I think that sort of happens, but again, the bar is low. What I mean by that is most people don't have portfolio projects, or if they do, they're really lame. It's on a data set that everyone's already seen on Kaggle, or it's something they were assigned in school and they're just reusing it. That's why I say the bar for this stuff is low.
And secondly, because so few people finish their work, a lame project that got finished is still better than someone else's really crazy deep learning project that never got deployed. So I wouldn't emphasize complexity. If you have the time and the chance to add more complex features or take your analysis to the next level, you should go for it. But in general, I don't think complexity is the issue. Look, the crux of a machine learning engineering role is building models and making sure they are deployed and maintained. So if you build a website that stays up, takes an input, and spits out an output in real time, I can see that you have some software engineering skills. That's all I really need to see. It doesn't have to be the world's craziest model, or the world's craziest data set at the world's craziest scale. Just running a linear regression, letting it take input and spit out output, and keeping that up and running is a great project. Same for data science: you don't have to analyze a petabyte-scale or terabyte-scale data set. If you analyze a data set with ten columns and two thousand rows, build a really good visual or two, find one or two good insights, and write a Medium blog post that's just a page long, mostly two visuals with some analysis and one interesting fun fact you found, that's already a great project. It showcases your data analysis, cleaning, and visualization skills, which are the core of a data science job. So it really doesn't have to match the complexity of what a real-world machine learning engineer or data scientist does. It's just: can you match the job description as closely as you can on your own?
I completely agree with that point.
Adding on top of that, if you want to work in a role with highly complex data domains and use cases, let's say self-driving cars, having that MVP mindset even within those domains actually plays to your advantage to a large extent, because you're able to show clarity of thinking, a bias toward action, and the ability to provide value with simple projects rather than going all in on complex ones.
A hundred percent, you nailed it. When we're doing our projects, we're not thinking about bias toward action, but that's actually what a hiring manager or recruiter is looking for, because ultimately tech skills can be taught. If this person's a go-getter, you know they're going to make an impact. And if someone is very smart and has a PhD in applied math but has barely put out any work, maybe they're brilliant, but it's questionable what kind of impact they'd have at your company. So look, say you want to work at Waymo or Nuro or Cruise on their self-driving cars. If you just made an RC car, stuck a little Raspberry Pi and a little webcam on it, and it drove around your basement or outside and followed a line using the most basic principles, that already shows me more about your engineering skills and your ability to get things done than a fancy master's or PhD in computer vision where I don't really know what you can build or what you can do.
Yep, that's a perfect segue into my next question, which is about getting noticed and getting your work out there. Another challenge in succeeding with a portfolio project is getting noticed by recruiters and hiring managers. So can you walk me through how applicants can get noticed with their portfolio projects?
I'm a huge fan of cold emails. A cold email, for those who don't know, is when you reach out to somebody you don't know.
That's as opposed to a warm introduction, or warm intro, where a friend or someone in your network introduces you to someone. Cold emails are so key to job hunting, especially when you don't have that network: when you're looking for your first job or two, or you're coming from a different background and might not be the most traditional candidate. When you apply on LinkedIn or Indeed, two or three hundred other people have also submitted their resumes, and it's hard to pass the resume filter, which is often automated, so it might not even be a human filtering you out. So send an email directly to the hiring manager or the recruiter, with one crisp paragraph: Hey, I'm Nick, I built a website that used Spotify data to price these assets and grew to two thousand monthly active users, and that's how I found out about growth engineering. I want to work on your growth engineering team. Do you have time to interview next week? Or: I saw this open position; can you consider me for it? That goes a long way. When you hyperlink to the project, or just name-drop, like, yes, I have two thousand monthly active users, or yes, I won this Kaggle competition, or yes, this cool little tool went viral on Hacker News, with a link to the Hacker News thread for your blog post, a good hiring manager is going to think, Oh shit, this person can actually do something. That brings you to life. That sets you apart, and no automated screening tool on LinkedIn that a recruiter uses is going to know about that or surface it. So cold emails are one of the best ways to get noticed, and a portfolio project is one of the best ways to anchor your own experience. I'll tell you, I get so many of these emails that just say, Hi, I hear you're hiring. Can you hire me? Attached is my resume. And I don't know anything about you, why you're a good fit, or what you can do. But when it's, Hey, Instacart, I saw you're looking for a business intelligence analyst. I saw you had a data set on Kaggle, so I used it and visualized all these analytics. By the way, you're hiring for this role; can you consider me for it? That works so much better. I saw someone do this really intelligently with Sephora, the makeup store that's pretty popular here in North America. They have an online catalog on their website. You can scrape it and learn a lot about the products they sell, the prices they sell for, and the products' metadata. That's just such a cool data set. If you were a hiring manager at Sephora with two hundred applicants, and one of them sent you an email with a quick link to their Tableau dashboard where they visualized your own product catalog and found some insights, I sure as hell would want to talk to that person, regardless of their educational background or whether they have five years or two years of experience. I'm like, Whoa, this person cares. Some people say, Nick, that's a lot of work just to get a job at Sephora. But let's be honest: Procter & Gamble, Unilever, Estée Lauder, they're all going to be into it too. Any online e-commerce retailer, Nordstrom, Walmart, they're all going to say, Whoa, you analyzed a company's product catalog and found insights. You seem into product analytics and e-commerce, so why don't you come talk to us here at H&M or Old Navy?
And writing a cover letter probably takes about the same amount of time.
Yeah, and honestly, I think cover letters these days are kind of on their way out. At least I hope so. The idea behind the cover letter, pitching yourself, is a good one. I think that's almost what cold emails are nowadays.
But in a cover letter you're never linking to something or dropping in an image. In a cold email, you should take a screenshot of your Tableau dashboard and make it a link, so they see the image and click on it. Of course I'm going to click on that image. It's an image; of course I want to see it, and it takes one second. So that's the new cover letter, I think.
Yeah, a hundred percent. And speaking of clicking and linking, what are the different ways you can host portfolio projects? What tools can you use?
Yeah, people don't realize how far no-code tools have come. Sure, I know how to build websites; I'm a software engineer by training, but I use Webflow, a no-code tool, to host my website. GitHub has GitHub Pages, which is great if you don't want to pay any money to host a website. You don't need to own a domain name or do anything fancy. As long as it's up on the internet and public, you're good to go. GitHub READMEs are also good, but I've noticed, Adel, that a lot of GitHub READMEs suck. Again, people forget about the visual. I'll give you an example. I just saw a company post on Hacker News an API that turns a PDF into an image that looks like it was scanned in. Okay, that seems very niche. Why would you want to make a good PDF look like it was scanned? But I could see using it with my book or something. I don't want to send someone a scan of my book, but if I wanted them to think it was a scan, I could use their API to turn my big three-hundred-page PDF into something that looks scanned. Okay, it's a very niche use case, but the point is, Adel, they had zero images in their README. It was just, here's your API endpoint, and I couldn't for the life of me find what the output would look like. I couldn't see a simple "give me this, and this is what you get."
So even a company like that, which is trying to sell this thing, didn't have a single example. Man, people really struggle when it comes to READMEs and explaining their code. They just link to their GitHub repo and say, Here are seven files and seven thousand lines of code, when really I'm asking, can you just show me an image or two? But ultimately, the point is that no-code tools are your friend. There are a lot of ways to host your work for free or at very low cost, and it makes a world of difference in setting you apart.
Mm, and especially in creating that aura of professionalism and being able to present a variety of projects that could be relevant to a particular employer.
Exactly, exactly. Some companies ask, Hey, do you have version control skills? Do you know GitHub? And you can say, Here's my GitHub README. Similarly, a lot of companies look for visualization or dashboarding skills with Tableau or Power BI, and you can make a free public Tableau dashboard. So forget about telling them, Oh, I have a Tableau certification, or I have two years of Tableau experience. Instead it's, By the way, here's my Tableau, and they click on it and think, Wow, this is pretty cool.
Yep, that's awesome. So, Nick, as we close out the episode, do you have any final words before we wrap up today?
Sure. I know, Adel, that for a lot of the listeners at home this sounds like a lot of work, and, spoiler alert, it is a lot of work. But I think the mindset should be: break it down into smaller pieces and it's not so bad. Break it down into just a weekend project or a Saturday project, and it's not so bad. And finally, yes, there's a lot of nuance to this, like looking for impact and visuals and this and that, but don't get overwhelmed. The biggest thing is just starting, because, as I said in the middle of the talk, a lot of people just don't have portfolio projects, or if they do, they never made them public. So just by having your project public, even if it sucks, you are better off than most people, who list something on their resume but can't prove they ever did it, because there's no place for me to even check out their work. So the final word I want to leave people with is that the bar is surprisingly low. I hate to be that person, but half the people listening to this podcast are going to think, Yeah, this was great, I learned a lot, and then not make a portfolio project. Just know that. That's how the world works; I've given a similar talk, I've coached so many people, and the bar is low. So if you're about it, even if you're a beginner who doesn't know deep learning or other fancy skills, but you have the grit and drive to actually get this work done, it's going to pay dividends, because the bar is low and most of your competition simply won't do the work in the first place.
That's awesome. Thank you so much, Nick, for coming on the podcast.
Thank you so much for having me.
You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share episodes you love. That helps us keep delivering insights into all things data.
