DataFramed


#107 The Deep Learning Revolution in Space Science

ABOUT THIS EPISODE

We have had many guests on the show to discuss how different industries leverage data science to transform the way they do business, but arguably one of the most important applications of data science is in space research and technology.

Justin Fletcher joins the show to talk about how the US Space Force is using deep learning with telescope data to monitor satellites and potentially lethal space debris, and to identify and prevent catastrophic collisions. Justin is responsible for artificial intelligence and autonomy technology development within the Space Domain Awareness Delta of the United States Space Force Space Systems Command. With over a decade of experience spanning space domain awareness, high performance computing, and air combat effectiveness, Justin is a recognized leader in defense applications of artificial intelligence and autonomy.

In this episode, we talk about how the US Space Force utilizes deep learning, how the US Space Force publishes its research and data to get high-quality peer review, the must-have skills aspiring practitioners need in order to pursue a career in defense, and much more.

You're listening to DataFramed, a podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science, whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization. Join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in.

Hi everyone, welcome to DataFramed. It's World Space Week, so today we're talking about using data in space research. I'm extra especially excited about today's episode because I've been interested in space since I was a kid. My grandma used to live near the Jodrell Bank radio telescope in the UK, and when I went to visit her, going to Jodrell Bank was my favorite day out. Space is, of course, literally a huge topic to cover, so rather than trying to go over all of it, in this episode we're focusing on the work of Space Systems Command at the US Space Force in using deep learning on telescope data to monitor satellites. My guest is Justin Fletcher, an artificial intelligence and autonomy subject matter expert at Space Systems Command. He has something of a tricky job, since it requires expertise in physics, computer science, and military applications on top of the deep learning skills. How he manages to juggle these competing areas is unclear, so let's interview him to find out.

Hi there, Justin, thanks for joining us today. To begin with, I'd like to find out a little bit about what you do. So you're working for Odyssey Systems. Can you tell me a little bit about what Odyssey Systems does?

Certainly. So Odyssey Systems Consulting is a business; we do advisory and assistance services on behalf of the US government. I work on a contract called SDA Support Systems; SDA stands for Space Domain Awareness. We work on the government's behalf to represent their interests, managing a large portfolio that spans large-scale acquisition activities all the way through to research and development. We do a lot of studies work and a lot of technical management. So we're helping and advising the government as they build out advanced technology portfolios, in our case in what's called space domain awareness, which is concerned primarily with what's happening in the space environment.

And by the US government, which particular government department are you working with?

For the contract that I referred to before, that's with Space Systems Command, SSC, which is one of the field commands within the United States Space Force. All military stuff.

Brilliant. So tell me just a little bit about what you do in your job.

Technically I'm a subject matter expert; that's my job title. But my role is to lead a small but focused multidisciplinary team. We have several astronomers and computer vision specialists, a few data scientists, and many software engineers, trying to advance the state of the art in space domain awareness, primarily through the application of artificial intelligence and autonomy. Those are our two major technology focus areas. What that looks like in practice is a lot of applied computer vision work for scientific imagery, as well as automating telescopes: closed-loop autonomy for telescope control, so they're reactive to stuff that's happening in space. That's sort of my overall area.
We have a direct team of six, and then we have an extended team, including performance contractors on behalf of the government, of about forty now, and I'm responsible for that group.

All right, brilliant. So it's a pretty substantial team. And you mentioned that in addition to the data scientists there are actual scientists as well. So how does that relationship work? How is this team structured?

Yeah, that's been one of the most interesting components of this job over the past few years. We're out here in Maui, and a lot of our team comes from the observatory at the summit up at Haleakala, so we have this really interesting interplay between people like me and them. I came in as a computer scientist, having absolutely no background in space. At the time when I moved here, I was a military officer in the Air Force, and I moved out here as a computer scientist and was introduced to space. One of the things that we find really interesting is the cross-pollination of technical ideas. This is almost cliche at this point in the maturity of data science as a field, but domain knowledge, we've found, is the essential ingredient in applying these techniques to advance the state of the art in new domains. You can't just take a ResNet and apply it to an image classification problem for FITS data; it requires a variety of detailed transformations of that data. And that interplay between the technical specialists in the domain, which for us means a lot of astronomers and optical physicists, and the more traditional computer science or data science roles, has been very interesting. What's been really compelling is to watch those people grow towards one another. So we see a lot of computer scientists picking up instrumental astronomy skills, and a lot of astronomers picking up software engineering and data science skills, building and annotating data sets, things like that.
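
To make the ResNet-on-FITS point concrete, here's a minimal sketch of the kind of transformation being alluded to. FITS frames are high-dynamic-range scientific images, so before a stock ResNet can consume them you typically stretch and normalize the pixels and expand them to three channels; the percentile stretch, class count, and file name below are illustrative assumptions, not the team's actual pipeline.

```python
import numpy as np
import torch
from astropy.io import fits
from torchvision.models import resnet18

def fits_to_tensor(path, lo_pct=1.0, hi_pct=99.5):
    """Load a 16-bit FITS frame and percentile-stretch it into [0, 1].
    Illustrative only: real pipelines also handle calibration frames,
    background subtraction, and sensor-specific quirks."""
    data = fits.getdata(path).astype(np.float32)
    lo, hi = np.percentile(data, [lo_pct, hi_pct])
    data = np.clip((data - lo) / (hi - lo + 1e-6), 0.0, 1.0)
    x = torch.from_numpy(data).unsqueeze(0)   # (1, H, W), single channel
    return x.repeat(3, 1, 1).unsqueeze(0)     # (1, 3, H, W), what ResNet expects

model = resnet18(num_classes=10)                   # assumed number of classes
logits = model(fits_to_tensor("frame0001.fits"))   # hypothetical file name
```
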

So that's been really fun. I was fascinated by the kind of domain knowledge that you need to do these things. Can you give me some examples of those skills? You mentioned optical astronomy and things like that. What does that actually mean? What skills do you need?

Yeah. For the astronomers, a lot of those people have PhDs in astronomy, like instrumental astronomy. We have some people whose PhDs specialize in solar physics. We have a gravitational physicist on our team; he did an entire PhD in gravitational physics and then switched fields. What's really interesting is that in the astronomy and optical physics community, those skills tend to be very research oriented and very focused on the production of physical hardware in the real world. That has a really interesting synergy with what we tend to think of as entirely software-oriented skills. When we think about data science and computer science, we tend to think mostly about the production of software to do the things we want done. In our domain, we really don't have the luxury of pretending that data is going to come in from somewhere and then we'll figure out how to clean it, get it ready for processing, and train models. It doesn't work that way. We have to think about everything from the physics of the instrument all the way through to the trained model. A lot of the skills needed for that are instrumental astronomy, building the actual instruments. There are analogies in people building advanced camera concepts; there's really cool stuff happening in metasurface optics right now with deep learning. So it's really about having a broad base of skills that is relevant to the domain you're trying to apply these data science techniques to. And then, of course, everybody's got to know Python. We use primarily TensorFlow, but the team's moving towards PyTorch now as well, so you've got to build those skills too. That's the skill base for us.

So you've got these hardware challenges, you've got challenges with physics, and you've got the data science and machine learning challenges as well. I can see why a lot of people on your team have PhDs. Let's go a little bit more into the data science and machine learning skills. You said you're doing a lot of work with TensorFlow and PyTorch; can you talk a little bit about what you're doing with those? Is it a lot of deep learning?

Most of our applied work, especially in computer vision, is of course deep learning. We really run the whole gamut. We have a variety of classification problems. Everything I'm going to talk about today is in the public literature, so if you go to Google Scholar and look us up, you'll find all this stuff. I'm not talking about any defense secrets today.

Yeah, no defense secrets today.

That's right, everything you can find on Google Scholar. To give you a few examples of the kinds of things that we're building: one is treating the problem of identifying an object, saying that the thing I'm pointing at right now is this specific satellite, from a spectral sample. To be really concrete about what I'm talking about: you point a big telescope at the thing and use a spectrograph to split up the light.
You get this sort of two-dimensional blurry image that doesn't really look like anything, and then you apply a convolutional neural network with a classification head to classify the object's identity. The classes correspond to identities, and there aren't that many, so it's pretty easy to formulate it as a traditional classification problem. That's an example, but it's got a twist, because you have to actually care about the physics of that imaging process to make the model work.

On the detection side, one of our biggest, most widely proliferated families of models is called SatNet. There's some overlap with that name now; there are other things called SatNet, but we have one called SatNet that does deep space object detection. This is basically rate-track imagery: satellites out in GEO or beyond. You point a telescope at them, they have a distinct signature on the focal plane of the telescope, and our objective is to detect those well. When we first started, it was Faster R-CNN, then we moved on to YOLOv3 for a little while, and now, most recently, the state of the art, Deformable DETR. So we have a variety of models that we apply to these problems. That one is really interesting too, because we have to host and serve inferences for this thing at scale, and also near edge devices where connectivity might be really poor, so we have to do a lot of MLOps work in addition to the model development work. We also have to retrain, for things like domain adaptation. And of course, in defense we have the additional twist that sometimes we work with classified data. I can't talk about any of that today, but what I can say is that it presents a deployment challenge that is really unique to our domain. So those are the kinds of techniques we use in traditional computer vision.

We also have a really fun paper coming out soon, and there's already some work published on this, using reinforcement learning to, think of it as controlling the gain of an emerging kind of neuromorphic sensing modality. So we have some forays into deep reinforcement learning as well. That pretty much covers our deep learning portfolio. And then on the autonomy side, which we haven't really talked about so far, we have a whole portfolio dedicated to multi-agent, globally distributed autonomy for, in that case, telescope control. So that's a brief overview of the domain.
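
As a sketch of the spectral identification formulation described above, a small convolutional network with a classification head over a fixed set of satellite identities might look like the following; the architecture, input size, and number of identities are invented for illustration and are not the published model.

```python
import torch
import torch.nn as nn

class SpectralClassifier(nn.Module):
    """Toy CNN with a classification head for 2-D spectrograph frames.
    Each class corresponds to a known satellite identity."""
    def __init__(self, n_identities: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),           # pool to one value per channel
        )
        self.head = nn.Linear(32, n_identities)  # the classification head

    def forward(self, x):                      # x: (B, 1, H, W)
        return self.head(self.features(x).flatten(1))

model = SpectralClassifier(n_identities=20)    # assumed catalog size
logits = model(torch.randn(4, 1, 128, 128))    # a batch of blurry spectra
print(logits.shape)                            # torch.Size([4, 20])
```
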

Okay, wow, there's a lot to go over there, and I'd love to get into some of these in more detail. I hope I'm not butchering this to say that it sounds like a lot of what you do is: a telescope takes a photo of the sky, and then, is that little speck there a satellite, or a star, or just a speck of dust on the lens? Is that where the classification comes in? Is that sort of the right angle?

That's more or less right. We have these, we call them ground-based optical systems, or telescopes, in a variety of sizes and a variety of costs. They go from fifteen thousand dollars all the way up to several tens of millions of dollars, and they point at the night sky. They do have different instruments, so it doesn't always look like dots and streaks, and you control them in different ways. But yeah, that description is more or less correct. We point these big ground-based optical systems at stuff in the night sky, and we get the data back, which comes in as scientific imagery. It's actually very analogous to medical imagery: if you've looked at the computer vision literature for medical imagery, they look very similar in terms of how the data is presented. And then we train models to do information extraction tasks on that data. That's pretty much it.

So there are a lot of exciting things you're working on. But before we get into the detail of specific challenges, I'd like to take a step back and just think about what the goals of your team are. You mentioned that the field is called space domain awareness. I'd like to know a bit more about what that means and what it is you're trying to achieve.

Our objective in space domain awareness is to discover what is present in space and to track those objects across time. That's a really reductive definition, because there are all kinds of additional dimensions to this problem. You might want to characterize objects and think about whether they're changing across time in some way that you can't directly observe. But the general objective of the field is to keep track of what is happening in space. Space-faring nations have to send people into space, and you have to keep those people safe. Commercial entities want to do business in space, and you have to ensure that those satellites can operate in that domain, or at least provide information to the world about what is happening. All of that falls under the very broad application domain of space domain awareness. So our overarching goal is to provide the US government with knowledge about what is occurring in space.

Maybe I've watched too many Bruce Willis movies, but I imagine it's just: oh yeah, there's this new speck there, and we've got an asteroid hurtling towards the Earth. But it sounds like it's more about checking the integrity of existing satellites, just to make sure they're still functioning. Is that correct?

Yes, though that Bruce Willis reference was apt. I've actually met some people from NASA Planetary Defense. That's a group of people with the coolest job title, as far as I have determined, in the United States government. They do planetary defense stuff: tracking asteroids and things like that. We actually use asteroids occasionally as reference targets when we're imaging things, to try to understand the science of our data exploitation techniques. But yeah, we're mostly focused on what are called man-made resident space objects, sometimes called RSOs, that's the acronym for them. These are satellites, space debris, used rocket bodies that aren't really doing anything up there. So when I was talking about characterization before, that's a little bit of jargon for what you might describe as health and status.
How's the satellite doing? Is it still up and running? If we're no longer hearing from it, is it still stable? Is it dangerous? Has it exploded? Things like that. That's the primary focus of space domain awareness, but we also have to care about debris. Lethal debris can destroy entire satellites. You have debris clouds that can make entire regions of the sky inoperable. And one dimension we haven't talked about yet is that the satellite owner-operators have to have this SDA, this space domain awareness knowledge, in order to decide what to do with their satellites. The most critical thing we do is what's called conjunction analysis, which is where we look at two satellites and say: it looks like these two might come into contact with one another, so you two should probably do something to make sure you don't, because nobody wants to lose a satellite. And then reentry warning. Those are really important functions that the data we exploit informs.

You sort of think of space as being kind of big, so the chance of collisions is fairly low. But I guess there are a lot of things in space now, so maybe the chance of collision is bigger than you'd think.

Yeah, it's a little counterintuitive, because you would assume that there's plenty of space in space, so the probability of any two particular objects colliding is low. Well, it turns out we only know the orbital parameters with a certain level of accuracy, which is actually a function of how well we exploit the very data I was talking about before, because those detections inform our ability to do what's called correlation of the orbits, to get the orbital parameters and predict where objects are going to go in the future. There's a lot of uncertainty in that, especially for small objects subject to things like solar radiation pressure effects. So while it's true that there's not a lot of stuff and there's plenty of space, the problem is that because you don't know where an object is going, there's a sort of cone of uncertainty around where it might be in the future. You have to take that into consideration and do risk reduction, potentially maneuvering your satellite, on the off chance that it will hit you. If you think about these objects existing in probability space, they spread out across time. They're really small and there's a lot of space, except that because we don't know where they're going, they are in effect larger, which is a really interesting way to think about the problem. And that's why we do computer vision for these problems: if you have better information, those objects become in effect smaller, which is interesting, I think.

Yeah, absolutely.
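
The "effectively larger objects" idea lends itself to a toy calculation: if positional uncertainty is large, a conjunction screen has to flag encounters at much larger miss distances. This is a deliberately simplified sketch with made-up numbers, using a plain k-sigma inflation in place of the full covariance propagation that real conjunction analysis performs.

```python
import numpy as np

def screen_conjunction(r1, r2, sigma1, sigma2, hard_body_radius_km=0.02, k=3.0):
    """Toy conjunction screen: flag a close approach when the miss distance
    is within the hard-body radius inflated by the k-sigma combined position
    uncertainty. Illustrative only."""
    miss_distance = np.linalg.norm(np.asarray(r1) - np.asarray(r2))  # km
    # Positional uncertainties add in quadrature (assumed independent).
    combined_sigma = np.sqrt(sigma1**2 + sigma2**2)
    # Poor knowledge makes the objects "effectively larger."
    effective_radius = hard_body_radius_km + k * combined_sigma
    return miss_distance <= effective_radius, miss_distance, effective_radius

# Two predicted positions 1 km apart, each known only to +/- 400 m:
flag, d, r_eff = screen_conjunction([7000.0, 0.0, 0.0], [7000.8, 0.6, 0.0],
                                    sigma1=0.4, sigma2=0.4)
print(flag, round(d, 3), round(r_eff, 3))  # True 1.0 1.717 -- worth a maneuver
```

Better information shrinks sigma, which shrinks the effective radius, which is exactly the sense in which better computer vision makes the objects "smaller."
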

So maybe let's get back to the data you're working with. You said you work with a lot of image data. How big are the datasets we're talking about?

Sure. We have a lot of problem domains, and each problem domain has, think of it as a benchmark data set that we build. It varies from problem to problem and from level of maturity to level of maturity. We have R&D projects that have five frames, where we're just getting started and we've got maybe a few gigabytes of data.

So those are small data problems rather than big data problems, I guess.

In that case, yeah. And that's R&D: we're just seeing, can we build a model that does the information transformation we care about? Can we just map from the input data shape to the output data shape and not worry so much about performance? Let's just see if we can build something that works end to end. That's the small end. On the large end, we have one data set, the one for that detection problem I was talking about before, that has well over a million annotated images in it. And these are fairly large images: 16-bit images, usually 512 by 512 up to 1024 by 1024, so very large format. To be honest with you, I don't even know how large that data is; it's probably several hundred gigabytes at this point, across multiple different cameras. It matters a lot which camera the data comes from in our domain. They're not homogeneous the way an iPhone camera and natural imagery tend to be, so it matters a lot what the camera is. We have data from a variety of different telescopes all over the world, at different altitudes and with different qualities of camera.

It sounds like you've got lots of different teams working on different problems, and this strikes me as a slightly academic environment. Is that correct? Is there an academic feel at work sometimes?

One of the things that we really pride ourselves on with how we have constructed this team is that we do full-spectrum development. We go all the way from low technology readiness level things, and that is very much an academic environment. It's not quite basic research, we're not doing proof-of-principle stuff, but really early applied research; those things tend to terminate in a peer-reviewed publication, which tends to be the target success criterion. And it goes all the way through to the fielding of operational solutions, so think deploying a model to live ops to inform space domain awareness decision makers about problems in space. We do that whole spectrum, and the level of academic-ness, if you will, varies across the portfolio depending on what you're working on. One researcher might, and we've had this happen, take a research project all the way from effectively ideation (not basic research, since people had already invented convnets and spectral imagery) through to a fielded concept operating on an operational telescope for the Space Force. So it depends on which part of the lifecycle maturity you're on and what you're working on that day.

In that case, and I know a lot of people listening are trying to work out whether they can get a job in data: first of all, how did you get into your team? And then, for careers more generally, how do people go about becoming space data researchers?

Sure. There are two answers, in two different directions.
In my case, I really had some opportunities. I was an active duty officer in the Air Force for many years, and I moved to space domain awareness as my third assignment. So I came out to Maui. Not a bad gig, as far as assignments go; there are definitely worse places to be assigned. I came out to Maui as a program manager for an autonomy program and had some really incredible opportunities with the Air Force Research Laboratory there. So that's how I got to this location. My background is in computer science. I was not a space person initially; I have sort of on-the-job trained into that domain.

Going to the more general answer of what paths a practitioner can follow to work in this kind of space: there's a variety, and it depends on where you're physically located, though less so than it used to, since we live in a remote-work world now, and on what stage of your career you're at. A great thing to do is to look at job reqs for defense contractors working in this space. That's a really accessible way to approach working in this area. Every contractor on my team is hiring, so a great way to get into the field is to go and look for job reqs in defense. They're all over the internet. Look at the major defense primes, and I won't endorse any of them in particular, but go and look at the major defense primes and look for their space-related job reqs. That's a good way to get into the field.

If you are a PhD student today, if that's where you are in your journey, you could look at potentially doing a fellowship with the Air Force Research Laboratory. There's a great scholars program that we host every year; we host several scholars out at our group. These are usually PhD students who are interested in potentially going into this field. We have hired several people out of that program, and some have gone on to be civil servants in the Air Force Research Laboratory. So there are a lot of paths; it really depends on where you are and what your skill set is. If, for example, you are a budding web developer today and you're interested in how you can apply your skills to problems in this domain, you could look at one of the many software factories around the Department of Defense. Google "defense software factories" and a dozen will come up. If you want to work in space, there's one in Colorado Springs. So there's a variety of ways to get in, and we're hiring. The business looks good for the foreseeable future; there are a lot of problems we have to solve in space.

And I've got to tell you, as far as reward for the work goes: a lot of places are hiring right now, and you can go get a job in Silicon Valley, there are places that will hire people. But what working in the Department of Defense offers is really compelling problems and huge, wide-open frontiers. These are not application areas that have been optimized over decades to maximize click-through or something. These are new problem domains, and you can come in and homestead here if you want to. It's a really, really inviting environment. My contact information, I think, will be associated with this episode, and I encourage people to reach out to me.

Wonderful. It sounds like business is booming. You said you came through a computer science background into the military, and then the space stuff came later, but it's also possible to go in other directions: maybe you come from space science first and then pick up the data science applications later. For people coming from a data science background, it sounds like Python and then some deep learning skills are the way forward. Is that true for everyone on your team, or are there other kinds of data science skills that people use?

Yeah, the primary skills that you need are infrastructure, as well as Python and at least one deep learning framework. We're moving towards framework agnosticity. That's not as easy as it sounds in plain English; it takes a lot of work to support multiple deployment frameworks. But for the most part, we're not going to be picky about what kind of framework you bring to the table. I personally am an old-school TensorFlow graph mode kind of guy. If you're TensorFlow graph mode, that's fine. If you're PyTorch, that's fine. Obviously Python is the lingua franca; you've got to speak Python, so that's a requirement. And then it's really expected at this point of technical maturity, if you're going to be working on one of these teams, unless you are literally in a pure research role, and that's for people like graduate students coming into the program: you have to speak containerization. You have to know how to construct a container image and produce it at runtime. You've got to know at least the basics of Docker and Kubernetes and Helm, that technology stack. And you also need to be generally aware of data-intensive application systems, so you have to know about databases and such. No need to be an expert, but you've got to know how to write a SQL query and things like that.

I love the fact that you described yourself as being old-school TensorFlow, since it's still a fairly recent technology. But you mentioned that your team is transitioning from TensorFlow to PyTorch. Can you tell me a bit about why you decided to do that?

To be honest, it was a grassroots movement; it wasn't something where I decided and told the team, we're going to start moving to PyTorch. Remember before, I talked about the spectrum of technical maturity? It really started on the low-maturity end of that spectrum, when people were doing research projects and wanted to prototype quickly, because PyTorch is just much more approachable. The data management is a little bit easier if you don't have to have too-rigorous or high-performance data pipelines, like dataset generator pipelines and things like that.
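
For flavor, the kind of quick eager-mode loop that makes PyTorch approachable for early research projects might look like this; the dummy data and tiny model are stand-ins for illustration, not anything from their portfolio.

```python
import torch
import torch.nn as nn

# Dummy stand-in data: 64 single-channel frames with binary labels.
X = torch.randn(64, 1, 32, 32)
y = torch.randint(0, 2, (64,))

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):              # eager mode: just a Python loop,
    opt.zero_grad()                 # easy to inspect and debug line by line
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    print(epoch, loss.item())
```
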
One of our researchers, the same PhD in gravitational physics I was talking about before, was new to the domain but a very competent developer, and he just said: listen, it's going to take me a lot of work to do this in TensorFlow, let me just do it in PyTorch. I said, all right, sure, go for it. And it worked out fine; nothing bad happened. So it was a grassroots movement. Another couple of developers have moved over as well, and we don't really see any need to enforce a particular framework; it doesn't seem to be that important. The only thing that we have to enforce is the interfaces to inference-time models. Whether you build in PyTorch or you build in TensorFlow, you have to present a standard API at runtime, and it preferably needs to be externally legible; we like those represented with OpenAPI specs, so that we have the ability to interface with those models at runtime. We find that we are very rarely constrained by inference-time performance, so we don't really even care that much about it; that's almost never the problem. Our images take a long time to capture, sometimes several seconds per image. The real problem is the software engineering dimension of fielding, deploying, sustaining, and retraining models.

So this is when other people try to use your model for predictions: that's when this standardization matters, not when you're actually trying to figure out what the model can do while you're training it.

Precisely. It doesn't matter that much at training time; it matters a lot when you're running inference on models.

All right. So we've talked a little bit about the technical skills. Now, I imagine, since you've got some pretty serious deep learning going on, plus the physics, plus all the other kinds of space knowledge, that it must be fairly challenging to communicate what you're doing and your results to other teams, particularly to non-technical people. So how do you approach communication outside your team?

Yeah. I used to spend a lot of my days building stuff; now I spend most of my days talking about stuff. So I'm at the pointy end of the spear for this communication problem. And, perhaps unpopular opinion here, certainly an unpopular opinion in the department: when I'm talking to non-technical audiences, even general officers, SESes, your senior civilian executives, et cetera, I give the same talk that I give to technical audiences. Exactly the same talk.
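
The runtime contract described above might be sketched as follows, using FastAPI (which generates an OpenAPI spec automatically) as one plausible way to present a standard, externally legible inference interface; the endpoint, schema, and outputs are all hypothetical.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="detector", version="0.1.0")

class Frame(BaseModel):
    pixels: list[list[float]]   # toy stand-in for a real image payload

class Detections(BaseModel):
    boxes: list[list[float]]    # [x_min, y_min, x_max, y_max] per object
    scores: list[float]

@app.post("/v1/detect", response_model=Detections)
def detect(frame: Frame) -> Detections:
    # A real service would call the trained model here (PyTorch, TF, ...);
    # callers only ever see this standard interface, not the framework.
    return Detections(boxes=[[10.0, 12.0, 14.0, 16.0]], scores=[0.97])
```

Serving this with uvicorn exposes the machine-readable spec at /openapi.json, which is the "externally legible" property: any client can discover the interface without knowing what framework sits behind it.
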

One of my principles of communication is that I never talk about artificial intelligence in the abstract, because people will always fill in the gaps with what they think that means, and that is an enormous diversity of things. So I give exactly the same deck of charts whether I'm talking to a four-star general or to a new developer on our team being introduced to the work. For me, communication about these problems is really about specificity and relevance. You have to be specific enough that people know what you're talking about. You have to put it in terms that are physically comprehensible: I am detecting an object, and when I detect that object, what I mean is that it is at this location relative to the celestial background. You have to be able to communicate those things, and you also have to be able to represent them in a way that is comprehensible to your audience. I do not believe in talking about these things in the abstract, because it becomes very challenging to do that. So I give exactly the same deck, with the same nerdy paper citations at the bottom, to every audience, and that has worked fine for years. So far nobody's made me stop, so whether or not it's a good idea, that's what I'm doing, and it seems to work out.

Okay, that's kind of interesting. I really like the idea of being specific about what you're talking about. Can you give maybe some examples of how talking in the abstract versus talking specifically plays out?

Sure. If I say, hey, I've got an artificial intelligence model, at that point I've already slipped a little bit. But if I were to say, abstractly, I've got an artificial intelligence model and it's going to do satellite positive ID for you, what I've done there is use three different abstractions and be very general about three different things all at the same time, and any given listener might fill in those gaps with three different levels of interpretation. That's not really a helpful way to formulate the problem, even if it makes it relatable.

But if instead I say: bear with me, I'm going to take a minute to talk about this, I only need a minute of your time. We collect images from a large telescope. The light goes into the telescope, and we break it up into its constituent wavelength components. That produces this blurry image that roughly tells you what the relative contributions of different wavelengths are from this object. We know that different materials absorb and reflect different wavelengths differently, so we hypothesize that it should be possible to tell what the object is based on its material composition, which you can roughly infer from the image. But we don't actually know how to write down a set of rules to do that, so we learn that process via a large parameterized model. If I say that, I've so far found no one who can't understand it. They might not know exactly what a large parameterized model is, but they don't really care that much, because I've given them something concrete enough that they can fit all of it in their brain at the same time. I think that sometimes, especially for senior leaders, we expect a little bit too little of them.
We think, oh, you've got to give them crayon-level diagrams. That's not true, for the most part. As long as you're brief but specific, they will understand what you mean, and they will walk away with a very specific understanding of what it is that we can do and what it is that we can't do. Where I've seen communication in the abstract about artificial intelligence fail, multiple times, in different organizations that don't have anything to do with one another, is when senior leaders assume that a technology is much more mature than it is, and then they go and make investment decisions based on that misunderstanding of maturity. That's why I'm almost fanatical about talking about things in concrete terms: making sure people understand what the limitations are, certainly before they start spending any money.

That seems very helpful. Staying on communication: one thing you've mentioned a few times is that your team publishes a lot of scientific papers in peer-reviewed journals. I guess my perception of the military is that it's generally quite secretive, so it surprised me a little that you're publishing a lot of your results. Can you tell me a bit about how that came about?

Certainly. That was a deliberate decision. And by the way, the military does actually publish a lot. Look, for example, at the Air Force Office of Scientific Research, AFOSR it's called; they fund grants all around the country. Some of the most pioneering work to this day in sequential decision making under uncertainty, an area I study personally: go to the bottom of those papers and you're almost always going to find an AFOSR grant number. The public, when they see that acronym at the bottom, don't necessarily know it means the military funded that grant. The Navy, of course, has a similar program, as does the Army. There are actually a lot of defense publications out there. But you're right that it's somewhat unusual to see the level of publication that we have from an organization like ours, in Space Systems Command at this particular location; publishing tends to be a lab-oriented thing, or an office of scientific research thing, as a general rule.

It was a deliberate decision. The first thing that we realized is that we're working in emerging technology, so we're not doing deeply applied technology that constitutes trade secrets. We are taking things in the public literature and applying them to data that is publicly available, or that we can easily make publicly available, and that is not sensitive.

What we do is take those two bases of justification for being able to share this information and put them together: we're using publicly available techniques, convnets are not secrets, and we are using data that is not intrinsically sensitive, so it should be okay to publish this stuff. The reason we do that, and it is sometimes somewhat onerous to go through the publication process, and you've got to be careful not to over-publish, it can become all you do, is that we are really trying to incentivize a broad and self-sustaining base of researchers in this applied technology area. We want people doing their PhDs on this work. And the reason for that is in part selfish. Our job is to act in the best interests of the government, and it's in the government's best interest if a bunch of PhD students dedicate some of the most productive years of their lives to solving some of the hardest problems on behalf of the nation. Those students can then go on and get jobs in this domain, and it's a virtuous cycle. So part of it is, frankly, trying to generate work for free, so we don't have to enter into a contractual relationship to make that happen; it will just happen naturally.

However, there is also the dimension that it is very easy to fool yourself in this domain. It is really easy to fool yourself when you're doing applied data science, when you're training deep learning models, especially in domains where not a lot of people are working. So that necessary step of peer review, in particular because we don't just publish in artificial intelligence venues, peer review from the astronomy community and from the defense computer vision community, those peer review opportunities are things we actually can't get inside the Department of Defense. There's really no one who could do that for us, which is sort of a paradox of being out towards the leading edge. I think this is going to happen in all kinds of domains all over the world, for different industries and different disciplines: if you're out at the cutting edge, there are not a lot of people who can peer review you. We were very worried about fooling ourselves in the early part of this process, and that's what took us down this path. Those are the two reasons we started.

And then what we discovered after we did that is that it makes it possible to do things like what I'm doing right now. It's kind of hard to go on a podcast and talk in public if you don't have a track record of publication, but because we do, it's possible for us to go and communicate about our problems. And especially given that we have a brand new branch of service here in the United States, the Space Force, it's difficult for the public to comprehend exactly what underlying technical problems the Space Force has to solve, because people are busy and not everybody has time to go and study this stuff. Being able to talk about this in public has its own dimension of value. So that has sustained us. Now that we've got the cycle going, once you get started, it's easier to keep going than not to.
So if you're out there and you're working in the Department of Defense in applied technology, my advice is: publish.

All right, brilliant. That's a good sales pitch for publishing your work. It sounds like, because of this, your team is very collaborative with other organizations, certainly around the US, maybe around the world. Are you able to talk about examples of where you've collaborated with other organizations?

First of all, we fund PhD students sometimes, and that has produced some really valuable collaborations with labs around the country. That tends to happen through contractual means, but at the end of the day, it ends up being one team. We funded one PhD student at Stanford who did some really amazing work in, basically, small satellite stuff; I'm not going to go into the details right now. So that's one way we've done partnerships in the past.

We've also had a variety of academic institutions, US and international, raise their hands through, the way to describe them is public space war games. They're places for everybody to get together and try out their stuff, and we participate in those very heavily. One of the things we did one year, through those activities, was to just say: hey, could people just send us FITS? That's the image format that telescopes take. Just send us your FITS files and let's see what happens. And we got about seven hundred thousand images from that, from five or six organizations. So we collaborate with them, and we'll annotate their data for them. We have a great annotation-as-a-service company that's on sub to one of our prime contractors, called Enabled Intelligence. They annotate our data for us, and they can handle multiple classification levels. Then, as a collaboration, we give that back to the people who gave us the FITS, and now they have training-quality annotated data in exchange for letting us use it to train our models. That's a collaboration model that works really well.

We're really just beginning to get into the allied partner, basically the international community, collaboration game. Coming up in November there's another one of these events, and we've got a partner from Chile that has raised their hand and wants to participate with us; we're going to be providing autonomy for their telescope, I think. And there's another partner in Australia who has an emerging sensing concept that we're very interested in and would like to participate with. So that's how we tend to do collaboration. But, like I said before about funded PhD students, we've also had students at universities just see the work and say: oh, this is relevant and interesting to my research, would you like to collaborate with me? When students reach out to us, we'll give them the data, or most of the data; some of it is sensitive, some of it is even classified, and obviously we can't give them that, but we give them all the data that we can. And if they've got data, we'll annotate it for them, give it back, train models, and give them baseline performance.

It's a really interesting collaborative community.

That seems pretty incredible, that you're collaborating with a load of organizations around the world. I just want to pick up on something you said, that you provide an annotation service. I have images of cadets having to go through telescope images, like: oh, this is this particular star, and that's a satellite. Can you tell me a bit about what annotation involves?

Sure. There's a fun story here: the first SatNet dataset, the one I was talking about before, wasn't annotated by a cadet. It was annotated by me; I annotated that data many years ago. But the annotation has matured a lot over the years. We have tight instrumentation on dollars per frame, dollars per second, all that stuff, and a very mature annotation pipeline now. As for what it entails, let me give you one problem; every problem is different, and I'll talk about the most mature one, the SatNet problem. One of our prime contractors has a subcontract relationship with an annotation company, and we have built a tool for them called SILT, the Space Image Labeling Tool, which is a nice pun on fertility as well, you know, all the data grows out of it. SILT was built by the prime contractor and fielded to the annotators as their annotation platform. It brings these images up for them, and there's all kinds of domain-specific stuff you've got to do. When you look at these 16-bit images, if you just try to display them on a computer screen, they don't look like anything, so all those display knobs are tunable by the annotators, who do this professionally, full time. They can also play back and forth through the images, which is extremely important. They're not true videos, because frames might be taken minutes apart, but the objects don't move a lot, so you can basically track them if you watch across time, and you can even impute the location of objects by eye when they're not visible in a frame. Sometimes objects get too dim, because the sun is not shining on them just right, and they disappear, but you know they're there, so you can annotate that underlying source of truth. What it looks like is annotators going through and connecting these spatiotemporal lines to produce the location of the objects across time.

And we just fielded a brand new feature. We have an excellent new developer who joined us recently, and she has built an extension to SILT that makes it possible for us to also annotate the stars in the image. This is a bit down in the weeds, but think about looking at a little piece of the night sky: there are some satellites in there and a bunch of stars in the background. If you know where enough stars are, you can do what's called an astrometric fit, where you determine where the telescope was pointing based on what it sees in the background of what it's imaging. This is essential for the orbit characterization stuff I was talking about before. You have to do it, or the observation is useless.
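
For the curious, the fit being described can be sketched with standard astronomy tooling. Here astropy's WCS fitting stands in for the team's actual fit engine, and the star centroids and catalog coordinates are made up for illustration.

```python
import numpy as np
from astropy.coordinates import SkyCoord
from astropy.wcs.utils import fit_wcs_from_points

# Annotated star centroids on the focal plane (pixel coordinates)...
xy = (np.array([100.0, 850.0, 400.0, 620.0]),
      np.array([120.0, 300.0, 900.0, 500.0]))
# ...matched to catalog sky positions (degrees; values invented).
stars = SkyCoord(ra=[150.01, 150.20, 150.08, 150.14],
                 dec=[2.01, 2.06, 2.21, 2.11], unit="deg")

# Solve for the pointing: a WCS mapping pixels to sky coordinates.
wcs = fit_wcs_from_points(xy, stars, projection="TAN")
print(wcs.pixel_to_world(512, 512))  # where the frame center was looking
```

With a solved WCS, a detected satellite's pixel centroid can be converted to a position relative to the celestial background, which is what makes the observation usable for orbit correlation.
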
We can now even go through and dynamically annotate those stars, and SILT will run an astrometric fit engine in the background; if it doesn't get a fit, it will tell the annotator: nope, that didn't work, we need more from you. They'll keep annotating stars until the fit satisfies that requirement, which is really powerful, because you have to do both the object detection and the star detection, and fit successfully with high precision and recall, in order to have a valid observation. It's not enough just to say that an object is there. You have to say: that object is there, and here is the location in the night sky that this telescope was looking at.

Okay, that makes sense. Just knowing that there's a satellite somewhere in the sky is less helpful than knowing where it is relative to the rest of space, I guess. It's really interesting hearing about the problems you've been working on, so can you tell us: what's the project that you've been most proud of working on?

It's hard to pick one. It's sort of like choosing your favorite among your kids; whoever I choose, someone else will be hurt. There are a lot of projects that have been really fun. It's a toss-up. One is SatNet, the detection problem I've talked about many times. That one's really close to me because I started it personally; it was my nights-and-weekends project, before I did management stuff all the time. And in the early days of that program we had a great collaborative environment with some of my colleagues who were out here at the time; it was very much a startup kind of atmosphere, nights and weekends on a shoestring, building out the framework for the team. That project is a favorite of mine, and it has grown into a very mature, operationally capable product.

But really, the one in which I've invested the most, which we've hardly talked about at all today, and which I think has the most potential for disruption in the department, and for that reason is probably the project of which I'm most proud, is called Machina. That's an acronym; I'm not going to go through the torturous expansion right now. Basically, this is a global autonomy program. It is designed to make it possible to get the right data from telescopes. So far we've only been talking about what you do once you get the data. Well, in a sensor-poor, target-rich environment like space, where there are too many things to look at and not enough sensors, it's not enough to exploit the data correctly; you also have to collect the right data. And that is itself a really challenging sequential decision making under uncertainty problem: what does it mean to do a good job? We have a program for that called Machina, and this, I think, is a novel programming paradigm, a very new way to approach the problem of multi-agent autonomy.

It's built entirely on Kubernetes, and we have a really interesting deployment construct for how that system will be fielded and sustained across time, updated and managed. It's a closed-loop agent, so it has these computer vision approaches inside of it; effectively, it's the thing we transition these technologies into. That project is enormous in scope, and we're in the early stages now. In fact, the upcoming exercise in November that I was talking about is going to be a major test for that program, really its first major milestone. If we succeed at it, it could totally change the way we do sensor management in the Space Force. So those are the two projects I'm most proud of; I wasn't able to pick just one.

I guess those are your two favorite children. That's all right. That last project seems pretty interesting, because it sounds like you've got the computer vision aspect, but you've also got this automation stuff, and it sounds like it's getting to the point where the artificial intelligence takes the humans out of the loop completely, so that all the telescopes are run automatically. Is that correct?

Well, they do run automatically already; all these telescopes are already robotic telescopes. They're automated in the sense that you tell them to point at a location in the sky and they'll do that for you. What that program provides in addition is what we might call orchestration. Think of it as having to use these sensors in a collaborative way when the sensors are very different. Imagine one has a really wide field of view, so it can see a lot of stuff, but not very dim stuff. Another one has a really narrow field of view, so it can't see very much area, but it can see really, really dim stuff in the little area it can see. You can put those two sensors together, and that's the hard part: you can make those two sensors interoperate, collaborate with one another, in such a way that the autonomous closed-loop system they constitute together is more than the sum of its parts. And I can't claim credit for that idea; it goes back a long way, there was a DARPA program about this in the sixties. The AMOS site that we're at out here in Maui has been researching this for years. That was actually the program I was on when I first came over as active duty. I can't say a lot about it in this forum, but it was doing similar work. That paradigm, extracting more-than-the-sum-of-their-parts value from a collection of heterogeneous telescopes working together, has the potential to be game-changing not just for space domain awareness, but for sensing in general. And of course these are not original ideas; they've been studied since the eighties and nineties, and there's an interesting connection back to AFOSR, which has funded a lot of that work. So again, the military funding research and development for those kinds of problems.
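
One way to picture the orchestration idea, a wide-field survey sensor cueing a narrow-field sensor onto dim targets, is the toy hand-off policy below. Real tasking is a sequential decision problem under uncertainty; this greedy one-shot version, with invented names and thresholds, is only meant to show the division of labor between heterogeneous sensors.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    ra: float          # right ascension, degrees
    dec: float         # declination, degrees
    brightness: float  # lower = dimmer

def orchestrate(survey_detections, dim_threshold=0.2, narrow_budget=3):
    """Toy cueing policy: the wide-FOV sensor surveys; anything too dim
    for it to characterize gets handed to the narrow-FOV sensor,
    dimmest first, up to that sensor's tasking budget."""
    dim = sorted((d for d in survey_detections if d.brightness < dim_threshold),
                 key=lambda d: d.brightness)
    return dim[:narrow_budget]  # cue list for the narrow-FOV telescope

cues = orchestrate([Detection(10.1, -5.2, 0.05),
                    Detection(11.4, -4.8, 0.90),
                    Detection(12.0, -5.0, 0.15)])
print([(c.ra, c.dec) for c in cues])  # [(10.1, -5.2), (12.0, -5.0)]
```
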
That's really interesting. And you mentioned that some of these programs go back a really long way, which predates deep learning. So how would you say deep learning has affected what your teams can do in terms of research?

Yeah, this has been really fun over the past few years. We were coming into a community, and when I say "we," when this first started it was really just me, a community that is venerated and has been around for generations. People have been doing space domain awareness since decades before I was born; they called it SSA at the time, but they've been doing it since my parents were children. And so one gets that "what are these kids up to these days?" kind of attitude when you say: yeah, I'm going to do computer vision for this detection problem you've been working on for fifty years. Why? We've been working on it for fifty years; what do you have to add? It's not that people are malicious or trying to prevent progress. It's just that they know this domain, they've been working in it for an entire career, and so their expectations are very high.

Here's what we discovered when we brought the deep learning revolution to space domain awareness, somewhat by accident, because that's what I did in my graduate research and what I wanted to do in my day job, so I did it as a nights-and-weekends project. When we actually took the time to measure the legacy systems the way we tend to measure detection problems, in terms of precision and recall at a particular IoU (really, we use centroid distance, because our objects are point sources, but that's in the weeds), formulating those detection problems in information retrieval terms, as opposed to detection theory, which is much more common in the astronomy community, and which are really the terms we need for autonomous systems, what we found was that the legacy systems were not as performant as we thought they were. This wasn't because anybody did anything wrong. It's just that, as anybody who worked in computer vision before the deep learning revolution will tell you, computer vision is really hard, and hand-designed computer vision is harder yet. So there were challenges, and in particular there were edge cases that caused some truly egregious failure modes, and we uncovered those only because we systematically measured our performance so that we could do deep learning. You can't be serious about doing deep learning if you don't have really carefully controlled and well understood metrics for your problems, because otherwise, going back to what I said earlier, it's really easy to fool yourself. And if you don't do that, you can get into all kinds of trouble.

It sounds like there really has been a deep learning revolution then, or at least a deep learning revolution is ongoing throughout the field.

I would describe it as ongoing. Just to be clear, some of these non-learned techniques work exceptionally well. To this day we do not have a learned solution for the astrometric fits I talked about earlier, telling where you are in the night sky; the best solution is the one that was hand-designed. So the revolution might not conquer every city, so to speak. Some technical disciplines might actually be done best in a hand-designed way that's very explainable and has really well-instrumented performance. But there is an ongoing revolution. This very week that we're recording, the AMOS conference is on out here in Maui. It is the conference in space domain awareness, and I don't think anybody would be offended to hear me say it's the biggest; the whole community will be here this year. I've been coming to this since there were no deep learning talks, or maybe one or two across the whole conference. I haven't counted this year, but last year there were dozens applying deep learning to these problems, and not all from our group; many of them, almost all of them, were independent of us. So it really does feel like we're getting the revolutionary wave that the rest of the world had in 2013, 2014, 2015, now in our application area.

Okay, so it sounds like it's gradually becoming a lot more popular, though perhaps not across all the teams yet. All right, you've told me a little about your wins. Can you tell me about the biggest challenges you're facing at the moment?

There are a variety of dimensions of our application area that make it difficult, and some of them really aren't exciting. Our biggest challenges are often, frankly, institutional and bureaucratic. "No, you can't deploy that software to that host" is a very common problem.
Getting to yes on questions like that easily consumes sixty percent of our program bandwidth, and I think that's a fairly accurate estimate. So big problems in the Department of Defense relate to digital infrastructure and the ability to field these things. Our solution is to build our own and certify it, actually going through the process. That takes time and is laborious, but we're doing it now, motivated by the larger scale of deployment demanded by the modest successes we've had so far in computer vision. For example, under the existing cybersecurity process, even with a continuous authority to operate (it's called an ATO), if I'm retraining a model every hour or so because I'm trying to keep up with data flowing in at a very fast rate, there is no viable path to do that deployment action today. That is a huge challenge. Compute is a huge challenge: where do I put this model to run inference? How do I deploy it to this classified system? Those are always our primary roadblocks.

We also have, rearing its head again, sensor sparsity. We have a high-demand, low-density sensor network. We don't have a lot of these things; they're very expensive, especially the large-aperture telescopes with exquisite instruments on them, and they're expensive to maintain and operate. We have access to all of them, which is an incredible benefit for our research, but they can sometimes be very difficult to work with: they get pulled away for higher-priority missions. That's necessary, but it does happen, and it does disrupt research and development. So we have a problem with data sparsity, especially in our more advanced domains. That said, in the detection problem we've got such a large number of collaborators now that we really don't have a data sparsity problem there. But in the rest of the domains, we very often find ourselves bootstrapping with simulated data, and you can never fully eliminate the sim-to-real gap. Even with domain adaptation, it's never the same as having the same number of images from the real domain, and so that proves a persistent challenge for us (there's a sketch of this bootstrapping below).

So there's infrastructure, and then there's access to data-collecting devices, which have a very different flavor from the kinds of problems most people see when they think about doing computer vision in industry. Most people are processing natural imagery to do information-extraction tasks. They're not thinking about what happens if you only get twenty-seven of those images a week. How would you approach that problem? Of course people are working on data efficiency, but at that small scale it can basically be a performance-limiting challenge. So those are the two that come to mind off the top of my head.
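As a rough illustration of bootstrapping with simulated data: a minimal sketch that renders point sources with a Gaussian point-spread function over read noise, yielding unlimited labeled frames for pretraining a detector. The frame size, PSF width, and noise level are invented for illustration, and a model pretrained this way would still face the sim-to-real gap described above.

```python
import numpy as np

def synthetic_frame(size=128, n_objects=3, psf_sigma=1.5, noise_sigma=5.0, rng=None):
    """Render a toy telescope frame: point sources blurred by a Gaussian
    PSF on top of Gaussian read noise. Returns the image and the true
    centroids, so the simulator provides perfect ground-truth labels,
    which is the main appeal of simulated bootstrapping when real,
    labeled on-sky images arrive only a few dozen at a time."""
    rng = rng or np.random.default_rng()
    image = rng.normal(0.0, noise_sigma, (size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    centroids = []
    for _ in range(n_objects):
        cx, cy = rng.uniform(5, size - 5, 2)
        flux = rng.uniform(200, 2000)
        image += flux * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2)
                               / (2 * psf_sigma ** 2))
        centroids.append((cx, cy))
    return image, centroids

# Generate a large pretraining set cheaply; fine-tuning or domain
# adaptation on scarce real imagery would still be needed afterward.
frames = [synthetic_frame() for _ in range(1000)]
```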

Okay, all right, thank you. So before we wrap up, I have a slightly silly question to ask. Justin, in the past I've worked with biologists who work in the lab, and I've worked with engineers who get to blow stuff up, and it's just been me there as a data scientist, sat at my laptop analyzing data. So I have this kind of hardware envy. Now, I know you're based in Maui, Hawaii, and you have these giant telescopes nearby. Do you get to play with them yourself?

Do you get to play on the telescopes? This is one of the cool things. I joked before about how you can maximize click-through rates with your life, or you can operate one of the world's largest telescopes at the summit of Haleakala. So I think we have a compelling case for people who have hardware envy, yes. Unfortunately, these days I've moved into a role that involves a lot of technical management, so I'm doing a lot of communicating and a lot of architecture-level work. If you're asking, do I personally: not recently. Have I in the past? Yeah, and it was a lot of fun. But on our direct team, you don't have to go more than one step removed from me. There are people who work primarily at the summit. We have world-class astronomers, my colleagues Ryan Swindle and Zach Gazak, both PhDs, who work a lot at the summit. Zach restored a sensor on a port of this three-and-a-half-meter telescope to do spectral imagery; we published about that. And Ryan is really one of the most masterful operators of that world-class optical instrument alive today. So they are operating these enormous, room-sized systems, some of the most advanced technology the species has to offer, to generate data, which they then bring back to sea level, or, if they live in Kula halfway up the mountain, back to their homes, where they train models on it. So it's truly cross-disciplinary, and it is a really motivating part of our work, a non-trivial component of the reason we all work here. Not really the hardware itself; that's cool, but it wears off. What doesn't wear off is the fact that we are working out toward the edge of the species' technological capabilities, and occasionally, just occasionally, bumping it forward a little bit. So anyway, yeah, that's the hardware story.

All right. So, playing with giant telescopes on a mountain and then doing deep learning in a tropical paradise. I've got to say I'm jealous; that does sound pretty amazing. So thank you, Justin, for your time. It's been really fascinating hearing what you do with your team; you work on some really important, exciting stuff. So thank you once again.

Richie, it was my sincere pleasure. Thanks so much for your time today.

You've been listening to Data Framed, a podcast by Data Camp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time.
