DataFramed

Episode · 2 months ago

#115 Inside the Generative AI Revolution

ABOUT THIS EPISODE

2022 was an incredible year for Generative AI. From text generation models like GPT-3 to the rising popularity of AI image generation tools, generative AI has rapidly evolved over the last few years in both its popularity and its use cases.

Martin Musiol joins the show this week to explore the business use cases of generative AI, and how it will continue to impact the way society interacts with data. Martin is a Data Science Manager at IBM, as well as co-founder and an instructor at GenerativeAI.net, teaching people to develop their own AI that generates images, videos, music, text, and other data. Martin has also been a keynote speaker at various events, such as Codemotion Milan. Having discovered his passion for AI in 2012, Martin has turned that passion into his expertise, becoming a thought leader in the AI and machine learning space.

In this episode, we talk about the state of generative AI today, privacy and intellectual property concerns, the strongest use cases for generative AI, what the future holds, and much more.

You're listening to DataFramed, a podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science, whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization. Join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in. Hello everyone, this is the data science educator and evangelist at DataCamp. If you've been following the data space, I think we can all agree that generative AI has been absolutely exploding over the past two years. From the arrival of text generation models like GPT-3, to the proliferation of AI image generation tools, and the research space showing us what's possible with modalities like audio and video, we seem to be on the precipice of something truly special. There's arguably no better person to talk to about this than Martin Musiol. Martin is a data science manager at IBM and an instructor at generativeai.net, where he shares a wide range of learning material related to generative AI. Throughout the episode, we talked about where we are today in generative AI, some of the main use cases we've seen emerge, what successful business models may look like for generative AI, privacy and copyright concerns with generative AI, what generative AI will look like in the future, and much more. If you enjoyed this episode, make sure to rate and subscribe to the show. Now, on to today's episode. Martin, it's great to have you on the show. Thank you for having me. I'm very excited to chat with you about generative AI, your work on courses around it, and the types of use cases we can expect to see in the near- and long-term future with this type of technology. But maybe before that, give us a bit of background about yourself. I am based in Munich, Germany, but originally from Northern Germany. I moved to Munich because...

...of the university, the Technical University of Munich, and those were the steps leading me to where I am right now. After I studied computational science and engineering at the Technical University of Munich, I went to Airbus, the airplane manufacturer, and researched predictive maintenance there. Next I joined another company as a data scientist, and ultimately I joined IBM, first as a data scientist. I slowly took over some projects, and now I'm a data science manager leading various kinds of projects with clients from different sectors and also leading a team of data scientists. The topic that we're talking about started to become very relevant for me around 2014, when Ian Goodfellow came up with the vanilla GAN architecture, and a year later I started talking about it at conferences Europe-wide, and I saw, somewhat blurry but also as a clear vision, the impact of generative AI. This led me to create an online course, run a newsletter, give all kinds of impulse talks and workshops at other companies, and here we are. That's really great, and you hit the nail on the head with generative AI. This is a space that has been very exciting to watch, especially over the past two years. I've been following this space quite a bit over the past few years. For image data, as you mentioned, it's been eight or nine years since the concept of GANs, or generative adversarial networks, was introduced by Ian Goodfellow. For text data, it's been almost five years since the transformer was introduced. And if you start counting from 2014, which is when GANs were first introduced, I think in less than a decade we've gone from relatively brittle tools that create relatively low-quality outputs and need to be trained on specific datasets, to universal models like DALL-E 2 and GPT-3 that can do truly awe-inspiring image and text generation tasks.
Now we're seeing the same being applied to video and audio as well, which is going to be very exciting over the next two years. So I think this begs me to ask: where are we today in the generative AI revolution, and where will we be...

...in the next eight or nine years. That's a good question. Looking at where we are today, I agree that the evolution of generative AI is quite strong, but I think we're still very much at the beginning. For instance, image generation has created a lot of attention and many jaw-dropping results have been produced, but I still think there is a lot to improve. We can still observe some kinds of artifacts, and looking at the whole canvas, it is not always executed well enough. I think there is still quite some potential, especially looking at these hot models like DALL-E, Midjourney, Parti, etcetera. There are plenty of them. And also looking at NLP applications: OpenAI has now launched Whisper, which is really good at translating and understanding spoken language. But even so, I could quite quickly identify errors occurring, for instance, when it comes to a switch of language, and I think this will improve iteratively. Looking at the next eight to nine years, first of all I would like to say that just recently, just before our interview, I read that Gartner has estimated that by 2030, synthetic, AI-generated data will completely overshadow the amount of real data. Further applications that I'm seeing in the future are text-to-video, which Meta already tried out, and I think Google also gave it a chance. Of course there is a lot more potential there: longer videos, higher quality, maybe more interactivity. Other applications I see are improved versions of 3D object generation for various kinds of product development; it could be architecture, it could be interior design, etcetera. And ultimately also virtual worlds. The metaverse needs big steps here, and in all of these virtual worlds we don't want to make everything manually; we want to generate them, or have generative AI support on the application side. I think...

...there are also two more aspects. The one aspect I see is more on the services side for companies, because the first things are encapsulated applications, but then you also want it a bit more productionalized in the industry. At the moment, generally speaking, I don't think that companies have integrated generative AI much into their existing services or have created many new services. Many companies are frankly not even aware of generative AI. Looking at all of these application areas, in law, healthcare, banking, marketing, education, I see an innumerable amount of possible applications, going from simplifying contracts to image generation for maybe some customized product packaging, etcetera, on the services side. And then also, I think datasets can be improved a lot; enhancing datasets in general is a next step. That's very exciting, and I'm excited to dig deeper with you on the use cases and flesh them out even more. But maybe let's start with the fundamentals first. Just for the audience, walk us through the definition of generative AI and how it differs from traditional machine learning. So generative AI, I'd say its main task is to generate data, all kinds of data. It could be image data, it could be video data, it could be text data, time series, 3D objects, etcetera. Literally, whenever we have some kind of data, the generation of it is, I would say, in the realm of generative AI. I would also say there are three main tasks. The main task is generating data; then transforming data from one style into another, or domain transfer; and thirdly, I'd say data enhancement is an important task. And it differs from traditional machine learning.
It generates, while traditional machine learning is more about discriminating data: classifying, or some kinds of regression, dimensionality reduction, reinforcement learning. These are all more about discriminating, choosing, deciding on the problem at hand, versus...

...generating data here. Okay, that's really great. Looking back at GPT-3 and DALL-E 2, I know these are two completely different solutions, two completely different models, but looking at the technological trends that have shaped the explosion of generative AI in the past few years, what have these trends been, and what has been underpinning them? Okay, that's a great question. You mentioned just now Ian Goodfellow in 2014, but before that there were already a couple of technological trends leading up to it. From my point of view, it started with autoencoders, where you can insert data, for instance an image, it gets compressed into a low-dimensional representation called the latent space, and then decoded again into a reconstruction of the input image. And this has to be learned as well. So this is actually more of a reconstruction and not a true generative capability. It has a generative nature, yeah, but it doesn't truly generate data, because what is missing is the interpolation between the seen data points. And here's where variational autoencoders come in, which, by the way, I also used in my research at Airbus on predictive maintenance. We were aiming to generate data points because we didn't have enough, especially when it came to turbulence, etcetera. We used the variational autoencoder, which was the first model to have these true generative capabilities. Similar to an autoencoder, it introduces an underlying distribution into the latent space, so it interpolates between the data points of the different data it has seen and is able to generate data that is also a mixture in between. That was quite exciting to see. And then, of course, I think the biggest technological impact was what we have mentioned already: GANs, generative adversarial networks, which are two or multiple networks basically learning adversarially against each other.
While one network is trying to generate data, the other network is trying to detect whether it's real or fake.
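To make the adversarial idea concrete, here is a deliberately tiny NumPy sketch of my own construction, not any production GAN: a one-parameter "generator" learns to shift its samples toward real data centered at 4, while a logistic-regression "discriminator" tries to tell real from fake. All the names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Real data: samples from a Gaussian centered at 4.
def sample_real(n):
    return rng.normal(4.0, 0.5, n)

# Generator: shifts standard noise by a single learnable mean `mu`.
mu = 0.0
# Discriminator: D(x) = sigmoid(w*x + b), probability that x is real.
w, b = 0.0, 0.0

lr_d, lr_g = 0.05, 0.05
for step in range(2000):
    real = sample_real(32)
    fake = mu + rng.normal(0.0, 0.5, 32)

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr_d * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr_d * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator: gradient ascent on log D(fake), i.e. move mu toward
    # where the discriminator currently believes "real" data lives.
    d_fake = sigmoid(w * fake + b)
    mu += lr_g * np.mean((1 - d_fake) * w)

print(mu)  # mu should have drifted from 0 toward the real mean of 4
```

Real GANs replace both players with deep networks over images rather than a single scalar, but the alternating update loop is exactly this.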

In this fashion, they both get better throughout the training period, and these models have shown great results. That was probably the biggest step. Ian Goodfellow just introduced the first vanilla GAN, the blueprint of it, and multiple variations of GANs have been developed since: BigGANs, CycleGANs, Wasserstein GANs, and so on. They are innumerable. If you go on arXiv.org and look at the submitted papers, I think you can already see something like eight or nine thousand papers taking various kinds of GANs into account and coming up with their own variations. So there is a lot of research attention happening there. And then you also mentioned diffusion models, which I see as the next step in terms of image generation. They add noise to an image in the learning phase, then learn to remove the noise step by step, and in the end they generate really sharp, good pictures out of what starts as pure white noise. They are also computationally leaner. And lastly, I think an important trend, introduced especially for natural language processing applications, so text, voice, and sequential data in general, is the transformer, where we have, for instance, the GPT series from OpenAI. There are GPT-1, 2, and 3, and they're working on the fourth version. GPT stands for generative pre-trained transformer, and the original transformer has an encoder and a decoder. These models are trained semi-supervised, meaning training is first unsupervised, without any labels, so the models are supposed to understand the problem at hand and try to solve it without labels, and then they're fine-tuned with labeled data, which is also quite powerful. And they have this attention mechanism, which ultimately replaced recurrent...

...neural networks, etcetera. So there are huge algorithmic steps happening here. But computationally we have also had a huge leap. Companies like OpenAI, backed by Microsoft, have an almost unlimited amount of computational power, and data storage is getting cheap. I think all of this comes together at a great time, and that's why it's almost impossible to keep up with all of the trends, technologies, and achievements happening in generative AI. It's just a good time right now. It's definitely a good time. It's crazy seeing just how fast the state of research is moving in this space and just how quickly we're moving through different modalities. Just two years ago, GPT-3 came out and everyone thought, okay, we're hitting a new peak in text generation. Then DALL-E 2 came out and we hit a new peak in image generation, and now, with Meta and Google releasing text-to-video models, it seems like we're heading to the same place in video in just a few months, maybe a year or two. So, now that we've covered the technology trends that have shaped the generative AI revolution, I'm excited to talk to you about the actual implementation and operationalization of the technology in the wild. We've mentioned a lot of use cases at the beginning, whether image generation for creative teams or contract generation for procurement teams, for example. What do you think are going to be the most impactful use cases we'll see from generative AI in the short term? Okay, so the most impactful use case of generative AI. What just recently happened was AlphaFold. I'm not sure how much you are in the picture on that, but it is part of this Alpha series from Google DeepMind, where they first came up with AlphaGo, beating Lee Sedol, the world champion, and then there were other Alpha models, and now AlphaFold.
It's not the end, because then they also came out with AlphaTensor, but AlphaFold especially is interesting, because what AlphaFold aimed to do is predict how proteins fold. There's a huge database of proteins, and what research...

...scientists had big problems with was finding the right folding of these proteins. And AlphaFold, within a very short period of time, I think about a year since it started, has predicted the structures of almost all of the proteins known to us so far. It's very, very impressive. And they have open-sourced that for every researcher to use. Maybe just as an aside here, from a generative AI use case perspective: the AlphaFold model generates protein structures, and by doing so has been able to uncover a lot of protein structures from molecules. Is that correct? That is correct, exactly. So when it comes to impact, I think this is a huge impact. It's a gift to humanity, as I have heard it called in this space. I'm not an expert in this field, so I cannot talk in great depth there. But another thing I have seen just recently, and here we see productionalization, is the Google Pixel. They have included this translator mode. A friend just showed me how great it works. You turn on this translator mode and you can literally talk in real time in different languages to people; it translates on the spot, waits for the answer, and then switches back. I think it tears down all the language barriers. It feels like the next step would be some kind of Babel fish, like from The Hitchhiker's Guide to the Galaxy, where they stick the fish in the ear and then they understand basically every galactic language. And I think we're not that far away from that. Why not have some kind of earbud device that listens in and translates into your ear? We're really not that far away from that, and I think that would be a great device. Then one other thing that has a lot of impact, at least I think so, and also what I'm observing, is applications like GitHub Copilot.
When you start coding, you start with a comment, with the hash sign, and then you write the comment describing what you're going to code. But as...

...you're writing the comment, it already suggests the right piece of code, and for administrative code, more or less standard functions that you want to include, it's getting actually quite good. This administrative coding time gets close to zero. You just accept what it's providing to you, or you continue with the comment and it provides you maybe a different piece of code, and you can implement that. This also has a great impact, I think. It reduces development time significantly. And on this note, I would like to say that I think prompt engineering in general will be a very important skill in the future. We see it, we have talked about it with DALL-E: the prompt dictates the quality of the image, and prompt engineering itself becomes this important skill, knowing how the model reacts to what I'm putting in. It's similar with GitHub Copilot: the way I write the comment also dictates the quality of the code suggestion that comes out. So all of us developers, I think, will have to take this into account in the future. These three things are just off the top of my head. There's a plethora of different applications that have a great impact, but these are, for me, quite substantial. That's great. And especially code generation, creating these tools that remove a lot of the boilerplate tasks from creative or technical work, I think is going to be very useful.
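As an illustration of the comment-driven workflow Martin describes: you type a descriptive comment, and the assistant proposes standard boilerplate like the function below. The function is my own hypothetical example of such a suggestion, not output actually produced by Copilot.

```python
# Comment you might type:
# compute the moving average of a list of numbers with a given window size

# The kind of standard, "administrative" code an assistant might suggest:
def moving_average(values, window):
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

print(moving_average([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```

As Martin notes, the specificity of the comment largely determines the quality of what comes back, which is prompt engineering applied to code.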
I think one thing that's difficult for me to square is maybe the business model of generative AI and how we will see it play out, because in this space there's a lot of research and a lot of hype as well, and it's a bit difficult for me to reconcile the hype with sustainable business models for generative AI solutions, especially when it comes to things like image generation. For example, looking at the parallel over the past twenty years: when social networks were invented, it took a while for social media companies to really understand how to monetize, best...

...grow and scale their services. And this is regardless of the moral implications of the business models of social platforms; I think there's a lot to criticize there. I think we're going to experience similar growing pains for companies looking to monetize generative AI technologies. So what do you think the path will look like, and what are viable business models for generative AI solutions? So, I said at the beginning that I think we are still very much at the start. A lot of fields are opening up, or haven't even opened up yet, and I think there's going to be a lot of potential to tap into. For me, the answer is a two-parter. Maybe first, talking about my current position at IBM, I didn't mention this yet, but I'm a member of the Technical Expert Council. The Technical Expert Council at IBM is a global council, as the name says, that tries to understand the new trends and new technologies on the horizon that could be lifted and shifted into the industry and used. One of the things that we have been approached with is some kind of personalized packaging. It doesn't have to stay with packaging, but this personalized approach I think is quite interesting. With the personalized packaging, the idea was, and I cannot name any names, that people could upload an image that they like a lot, maybe of their family or their kids. No harmful content, which needs to be detected too; there are some challenges around that. And then they could choose how they wanted to transform this image. So here we're talking about the second task of generative AI: transforming style, or style transfer. This could be into some kind of Halloween picture, or maybe Christmas or Easter, etcetera.
And then transforming this image accordingly, making it funny, suggesting maybe multiple alternatives; they choose one, and then this gets printed on the packaging of a product and gives it a personal touch. I think this is how companies can drive a bit more engagement in the future, on the one hand. On the other hand, what I see a lot is data generation. I said that as...

...well: why not have some kind of data generation as a service? I see with our clients that many of them are missing data. They are missing data on the one hand because it's messy, they're just scraping it together somehow, or they haven't thought about a proper data strategy from the beginning. Or the problem at hand is just a very rare problem and they simply don't have the data. Think about some rare diseases in medical imaging: we have privacy issues, and it's a rare disease, so the data just doesn't exist that often. If there were some kind of service that takes the original images into account as much as possible and then generates more similar data points, I think that could be very valuable. It's also not easy, because this needs to hit a certain quality standard. If the generated images, or generated data points, are not good enough, then the machine learning pipeline that is plugged in at the end for detecting the rare disease, cancer, etcetera, actually gets worse, not better. So there needs to be a lot of engineering around that. But this is one use case that I'm seeing. And then of course, regarding the metaverse, the generation of various kinds of worlds. Lastly, I would like to mention that NVIDIA is working on Omniverse, a platform for virtual collaboration, and they take many small features into account. For instance, I think the product is called Maxine, where, when you're having a face-to-face meeting through a virtual call, they make sure that you have eye contact at all times. Even though you're looking at your screen, your eyes are adjusted in a way that it looks like you have eye contact, and the call itself feels much more immersive. These are the things off the top of my head. That's really great.
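A crude stand-in for the data-generation-as-a-service idea Martin sketches: when a class (say, a rare disease) has too few examples, generate synthetic points near the real ones. Here I use simple Gaussian jitter purely as a placeholder for a trained generative model such as a VAE, and the naive quality gate at the end echoes his point that poor synthetic data can make the downstream model worse. All names and thresholds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend these are feature vectors for a rare class: only 5 real examples.
real = rng.normal(loc=2.0, scale=0.3, size=(5, 4))

def augment(samples, n_new, noise_scale=0.1):
    """Generate synthetic points by jittering randomly chosen real samples.

    A placeholder for a real generative model (e.g. a VAE sampling from
    its latent space).
    """
    idx = rng.integers(0, len(samples), n_new)
    return samples[idx] + rng.normal(0.0, noise_scale, (n_new, samples.shape[1]))

synthetic = augment(real, n_new=100)

# Naive quality gate: reject the batch if its statistics drift too far
# from the real data, since bad synthetic data degrades downstream models.
drift = np.abs(synthetic.mean(axis=0) - real.mean(axis=0)).max()
assert drift < 0.5, "synthetic batch drifted too far from the real distribution"
print(synthetic.shape)  # (100, 4)
```

In practice, as Martin says, the engineering effort goes into that quality check: validating that the synthetic distribution actually helps, not hurts, the detector trained on it.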
And given how this aspect of AI is evolving, I do imagine a key determinant of how the business models will play out, and how the solutions we will work with...

...in the future will play out, is whether the technology will remain closed source or open source. Looking at GPT-3 and DALL-E 2, they're closed solutions; you need to pay for an API with OpenAI to be able to use them regularly. But we've seen tons of movement in the open source community with tools such as Stable Diffusion. How do you see this dichotomy playing out in the future, and which model do you think will most likely succeed in becoming maybe the de facto approach for generative AI? I think it's not that one will win over the other. When it comes to solving some kind of specific task, and companies want to monetize something specific that they have solved, I think they won't open-source it, or maybe they won't. What I'm also seeing is, for instance, that OpenAI has open-sourced Whisper, their model for understanding speech, automatic speech recognition, ASR. They have open-sourced the code and the model; I can go to the GitHub, etcetera. But what they haven't open-sourced is the data they actually trained it on. And this is quite interesting, because, to understand it, there are these laws from Google DeepMind called the Chinchilla laws. When it comes to natural language processing models, and we have to distinguish a little bit between the different subfields within generative AI, we're now talking about the generative large language models within the NLP space. One trend we see in these large language models is that they have more and more trainable parameters. We're talking half a billion, then multiple billions, and the trend just goes up and up, because, as I mentioned earlier, the computational power is almost unlimited and the algorithms are getting better. But there is sort of a problem.
This is what the Chinchilla laws are saying: in order to train these billions of trainable parameters, you need to have the right amount of data. And actually, GPT-3 is already undertrained according to these Chinchilla laws.
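Martin's point about GPT-3 being undertrained can be sanity-checked with back-of-the-envelope arithmetic. The Chinchilla heuristic is roughly 20 training tokens per model parameter for compute-optimal training, and the GPT-3 figures below (175B parameters, roughly 300B training tokens) are widely reported numbers, used here purely as an illustration.

```python
# Chinchilla rule of thumb: compute-optimal training wants
# roughly 20 tokens per model parameter.
TOKENS_PER_PARAM = 20

def compute_optimal_tokens(n_params):
    return TOKENS_PER_PARAM * n_params

gpt3_params = 175e9          # 175 billion parameters
gpt3_tokens_used = 300e9     # ~300 billion training tokens (reported figure)

optimal = compute_optimal_tokens(gpt3_params)  # 20 * 175e9 = 3.5 trillion
shortfall = optimal / gpt3_tokens_used         # roughly 11-12x

print(f"optimal: {optimal:.2e} tokens, ~{shortfall:.1f}x more than it saw")
```

By this rough rule, a 175B-parameter model would want about 3.5 trillion tokens, an order of magnitude more than GPT-3 reportedly saw, which is exactly the data bottleneck Martin describes next.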

There is at least the rumor that OpenAI has postponed GPT-4 a bit because they're just missing the data for training a compute-optimal GPT-4 model. So it actually makes a lot of sense to have this Whisper data, because Whisper can be run over YouTube, taking all the YouTube videos into account and transcribing them into text. Apparently this is supposed to make up for the data they are lacking. We also need to take into account that there are other bottlenecks that are more important than just the open-sourcing or closed-sourcing of algorithms. However, to come back to your question, sorry for the digression, but I think it's quite an interesting relationship here. To come back to your question: I think these grand breakthroughs, like diffusion models, I can see being open-sourced, and in the end the implementation as well. The right kind of training on the right data is then the determining factor for the quality at the end. That's perfect. So there's not necessarily going to be a clear winner in your mind; there are going to be a lot of solutions that are both closed source and open source, and we'll see who competes best in the end. Yeah, I don't think it will converge to just open-sourcing everything or closed-sourcing everything. I don't believe that will happen. That's great. So we've talked about the applications, and we've talked about the technological innovations underpinning these applications. Maybe let's talk about the risks and the complications when speaking about generative AI. Another dimension of generative AI technology that adds a lot of complication to the business model is what to do with copyright and attribution. Think about AI-generated art in the style of an artist who is currently alive. Who is the artist here? Who takes credit?
I know there's maybe no clear answer to this as of today, but what is the overall trajectory or the thought process around adapting copyright laws to this technology? That's a good question...

...that is also widely discussed in the respective communities. However, I think if there is a company that has built a model that can generate data, there are different kinds of angles to this. When the company decides to sell the service where you can prompt some kind of an image, write down what kind of image you want to have in a certain style, and it gives it back to you and you purchase this image, then I think this is your image, even with the credit schemes where the first ten or so prompts or generations are free and then you maybe have to pay. So once you have generated an image with their model, it is your image; this is at least my point of view. Now, if that model has trained on the work of an artist who is alive and claims the copyright, then this is not so easy, because apparently that model has trained over the images from that specific artist. And how did the company that developed this model actually get the data, the images? The right way would be that they purchased these images, because once they have purchased them, they can use them for training. Otherwise, there are many, many caveats in the details, and in the details there's the devil. Because if the artist shows his images but says, please don't download them, I think it's at least a gray zone, if not an illegal attitude, to train on these data points. There's no easy answer, and I'm getting stuck in the details. But my point really is that it can be solved if we have a clear standpoint along the end-to-end chain. Starting from the artist himself: if he wants to open up his images, to make them accessible, then he can't complain that they are being used for training. I think everyone has to be clear about what they want to have at the end, what their standpoint is on the usage of the data.
It's definitely a social and legal pickle that I think a lot of the expert community is going to...

...have to grapple with in the future. Carrying on with the complications of generative AI, maybe another component here is AI safety. AI safety, whether for generative or non-generative techniques, is something that needs to be thought of consistently. A key harmful aspect of generative AI is the fact that it can have harmful impacts such as bias perpetuation, it can be leveraged to create very convincing fake news, and it can be leveraged as well for reputation attacks and character assassination through generative technologies. What do you think are some of the ways the community is now approaching how to build safe, responsible generative AI solutions? So, first of all, I think that it should be illegal to harm the reputations of people, like with these deepfakes, with pornographic images. And I know in California there is a law along these lines; there is one law in particular regarding Californian politicians, saying you are not allowed to do that with them. This is the first thing: it should be illegal, and if it's detected, there should be consequences. Now the question is how to detect fake news or deepfakes, and there are technological answers to that on the text level as well as the image level. In the same way a deepfake is produced, in a reversed way you can classify whether something is actually a deepfake or not. And if something is a deepfake, it's not always easy to answer who has created it; sometimes it's actually almost impossible. But where it is possible, there should be consequences. Regarding how companies creating generative AI solutions approach building safe and responsible generative AI: I was mentioning this application where we can have personalized packaging, where people upload their image and the package gets personalized.
So one important piece was to detect what kind of image they are uploading, where the policy says, okay, no violent images, no hardcore right-wing or left-wing images, and so on,

in these directions. And I think a very important piece is the element that detects whether an image complies with this policy or not, an image detection element. So what we have done is flag the images accordingly and then forward them to a human to decide. These image detection models, detecting whether something is policy-compliant or not, should be very sensitive. It's better to filter out more images that are maybe okay, rather than adjusting the thresholds in a way that harmful images go into the pipeline and maybe get produced and not recognized along the way. Yeah, definitely. And that's something that we've seen as well with DALL-E 2, for example, where it's pretty easy to trigger their content filter, which is great to see. And this is something we're definitely going to see moving forward in a lot of these solutions as a potential model. And so, Martin, as we close our chat here, we've definitely learned a lot about generative AI and where generative AI is headed. Where can people follow your work for more insights on generative AI? I have multiple pointers. One is generativeai.net. Go there and take the online course that we have built, which gives you a solid basis in generative AI; it also touches upon various application fields. I also have a newsletter there, where I write bi-weekly about various kinds of topics that are hot at the moment. Currently I'm writing about text-to-video generation in the newsletter. I always go into, first of all, what is the tech perspective, and then, where could this lead us in the future. Okay, perfect. And now as we close up, do you have any final call to action before we wrap up today's episode? First of all, it was a pleasure to talk to you, and thank you very much for that.
And so, if you're listening to this and you want to get in contact with me, if you want to discuss various generative AI topics, then please reach out on LinkedIn, Martin Musiol. Yeah,...

...let's have a talk. That's awesome. Thank you so much, Martin, for coming on DataFramed. Thank you so much. You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share the episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time.
