CW 11: Robotik-Häppchen
Shownotes
Our news bite for the commute to work. Every Monday we bring you the robotics news - co-researched, written and spoken with and by an AI. If you would also like to be featured in the robotics news, send us an email.
Transcript
00:00:00: Robotik in der Industrie, the podcast with Helmut Schmidt and Robert Weber.
00:00:09: Good morning, dear worries - of course you have no worries, because this is the Robotik-Häppchen
00:00:17: at 6 a.m. on Monday morning.
00:00:20: My name is Robert Weber, and we are looking for a new partner for the Robotik-Häppchen.
00:00:26: So if you would like to become a partner, please get in touch with Helmut or
00:00:30: me - and now, off we go.
00:00:33: Good morning and welcome to the Robotik-Häppchen in calendar week 11.
00:00:38: Helmut and Robert are in pre-trade-fair stress, so today we have an interview for
00:00:42: you on the topic of reinforcement learning.
00:00:45: Extremely important for robotics.
00:00:47: The conversation was led by Robert, and the guest was Jan Koutník.
00:00:51: For many years, Jan worked together with Prof.
00:00:54: Dr. Jürgen Schmidhuber, was a co-founder of NNAISENSE, and supervised many robotics reinforcement learning
00:01:00: topics.
00:01:01: Who still remembers the cube and the robotic hand from Festo?
00:01:04: Jan was also involved in that project.
00:01:06: So, deep dive.
00:01:08: And now, let's get started.
00:01:09: Have a good start to the week.
00:01:10: Hello everybody and welcome to a new episode of our Industrial AI Podcast.
00:01:16: My name is Robert Weber and my guest is a well-known guest, Jan Koutník.
00:01:19: He is back.
00:01:26: Hello Jan, welcome to the podcast.
00:01:29: Hello, good morning, hello everyone.
00:01:31: Good morning, how are you?
00:01:33: You're right, it's a little bit of an early hour for a scientist.
00:01:37: It's 8 o'clock in the morning, right?
00:01:39: Yes, yes.
00:01:40: So, you define yourself still as a scientist.
00:01:43: Why, Jan?
00:01:44: I'm sort of like a retired scientist, you know.
00:01:46: In order to grow my business. And I listen to people like, you know, ex-Navy SEALs.
00:01:52: And they feel like, they talk like, no, I don't do that anymore, you know.
00:01:56: But they wake up at 4 o'clock, I think.
00:01:59: Yes, they wake up at 4 o'clock.
00:02:01: And they advise people just if you're gonna wake up at 4, everything is gonna be fine.
00:02:05: And I don't think so.
00:02:07: Okay, so you're based in the south of Switzerland, near to the Italian border.
00:02:11: So you have a more relaxed morning, right?
00:02:14: Yes, yes.
00:02:15: Indeed, you know, everything is kind of a little slow.
00:02:19: Now we are approaching the period where almost nobody will work here.
00:02:23: Okay.
00:02:24: Because of a lot of religious holidays, and everybody starts frenetically in September again.
00:02:30: So, kind of very relaxed here.
00:02:33: But we want to talk about reinforcement learning, because I am seeing it come back or am I wrong?
00:02:41: You're right in the sense that you are seeing it come back.
00:02:45: But it's been here all the time.
00:02:47: And it's been a little bit under the radar, you know.
00:02:50: It's actually a very old topic.
00:02:54: I think it might be even older than neural networks from the 40s.
00:02:58: It's even older, it probably starts in psychology, where scientists were doing experiments with mice and little creatures, you know.
00:03:07: And putting them through the mazes and observing how they behave and so on.
00:03:12: So, with deepseek, the discussion about reinforcement learning came up again, right?
00:03:17: Why is it so important?
00:03:19: What can it do that other techniques can't, especially when we talk about deepseek first?
00:03:25: Okay, so reinforcement learning, you can see it as two things.
00:03:30: You know, when I was learning about reinforcement learning, using reinforcement learning,
00:03:35: we learned that there are two things; we used to call it reinforcement learning with lowercase r and lowercase l.
00:03:41: This was sort of a definition of the problem.
00:03:44: And then there was uppercase RL, which was the collection of the methods.
00:03:48: And the definition of the problem, the problem statement, is that you have some sort of an agent.
00:03:54: Oh, agent, very popular, right?
00:03:57: Yes, but don't get derailed to what people call agents now, because it's something different to what we used to call them back then.
00:04:04: But I'll get to it.
00:04:05: And then you have environment, right?
00:04:07: An agent interacts with the environment.
00:04:08: An agent can be the mouse, or it can be a human who's learning something or it can be a robot, or it can be whatever.
00:04:14: It can be ChatGPT.
00:04:15: And so, the agent interacts with the environment and observes the environment through observation and then takes action.
00:04:23: And the actions do something for the agent's good, and then this cycle repeats.
00:04:29: And every agent wants to maximize its expected future reward, which is the definition of reinforcement learning.
00:04:36: So that it harvests as much reward as possible.
00:04:40: And then the reward is typically some sort of value that the environment gives him, right?
00:04:45: And then this problem applies to many real scenarios, like driving a car, manipulating a robot hand, you know, going from A to B, learning something from the books, and reasoning, right?
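(For listeners who want to see this problem statement as code: below is a minimal sketch of that observe-act-reward loop, assuming the Gymnasium API; "CartPole-v1" and the random action choice are just placeholders for whatever environment and policy you actually have.)

    # Minimal agent-environment loop: observe, act, collect reward, repeat.
    # Assumes the Gymnasium API; the random "policy" is only a placeholder.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    observation, info = env.reset(seed=0)

    total_reward = 0.0
    done = False
    while not done:
        # A real agent would choose the action that maximizes expected future reward.
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated

    # The episode return is the quantity an RL agent tries to maximize.
    print(f"Episode return: {total_reward}")
    env.close()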
00:05:03: And now we're getting to the popular terms, now we are getting to what are these novel models, like deep-seek doing, when they actually think about stuff.
00:05:14: So all the, let's say - I don't know what to call them - "old" language models, because they're kind of like not that old, right?
00:05:21: But these traditional ones, like the first ChatGPT, they unroll the input sequences using their statistical model.
00:05:30: You can imagine it as like a statistical parrot, right?
00:05:36: Like, which has listened to lots of text, and then it kind of randomly pulls out, you know, what fits the current input, the current problem.
00:05:46: Whereas the reasoning models, they actually employ reinforcement learning, they employ the mechanism of interacting with the environment.
00:05:55: And the environment here is the space of the prompt, of the text, and they are trying to refine what comes out, using the reward.
00:06:05: And then the reward in DeepSeek - you know, I'm not an expert in language models - but the reward in DeepSeek was something like being tested on ground-truth tasks, like code generation, right?
00:06:17: And these constitute the environment for refining the model.
00:06:22: So, it's sort of the arousing moment - or the interesting part is like: when you do the step from a feedforward network to a recurrent neural network,
00:06:31: when you have that feedback loop, suddenly things start to become very interesting, because you can work with long sequences very efficiently -
00:06:38: reinforcement learning brings the same to language models, because it allows them to refine very efficiently and get to the gist, get to the essence of the problem.
00:06:49: So, it reduces that, you know, in the language models you always have like two approaches.
00:06:55: One is from the inside, when you are trying to understand why it does what it does, using, you know, understanding how the models work.
00:07:02: And then from the outside, from the users, who actually take it and, not knowing how it works, start using it, and they observe certain properties, like generating random texts, right?
00:07:13: And it has high perplexity and high randomness.
00:07:17: And then reinforcement learning tries to bring this down very efficiently, because at the same time it's very compact and creates models, which are very nice.
00:07:29: And of course, there was a lot of tricks involved.
00:07:32: Yeah, absolutely.
00:07:33: Reinforcement learning is like, you know, when you are grilling meat and every minute you come with some sauce and, you know, brush it on here and there, and it makes it a million times better.
00:07:47: Okay.
00:07:48: And eventually it works.
00:07:49: That's interesting, what you mentioned, because a few weeks ago we had Professor Marius Lindauer from the University of Hanover as a guest, and he said, it's a quote, "Reinforcement learning is a beast."
00:08:00: How do you see that?
00:08:01: Is he right?
00:08:02: Well, defined beast, you know.
00:08:04: It's very tricky.
00:08:05: In a good sense.
00:08:06: In a good sense, really.
00:08:07: In a good sense.
00:08:08: It's a monster tool, right?
00:08:10: Okay.
00:08:11: It's because reinforcement learning is sort of like the mother of all problems, right?
00:08:15: Everything you do as a human, you can frame it as reinforcement learning, except some very specific problems that you're solving, right?
00:08:25: Like absorbing knowledge through reading books, you know - that's supervised learning.
00:08:29: Your father tells you how to drive a car, right?
00:08:32: And here, you know, there is a curve, you need to steer a little bit to the right.
00:08:36: That's supervised learning.
00:08:38: Yeah.
00:08:39: But as you hop in your own car and try to get to the target and you're driving, that's reinforcement learning.
00:08:46: Or, very popular or unpopular topic, you want to get to the Mars, right?
00:08:50: That's reinforcement learning.
00:08:52: And the reward is: are we on Mars yet?
00:08:54: And that's the problem, the problem of reinforcement learning that you have to infer all the underlying actions that lead you there.
00:09:01: And sometimes this is very, very difficult.
00:09:03: Yeah, we talked about AutoML and when it comes to reinforcement learning, he said it's a beast and AutoML methods can help you to manage this beast to get it in the proper way to help the developers to use reinforcement learning.
00:09:17: What is your opinion on that?
00:09:19: It's very important to distinguish between optimization and RL.
00:09:24: So, one essential property of reinforcement learning is that when the agent interacts with the environment in that closed loop, it affects the environment, it changes it.
00:09:36: The first textbook example of reinforcement learning was exactly that: what the mice did in the mazes, an agent trying to navigate through a maze, which was mostly static.
00:09:51: You can see the whole thing and it doesn't have many so-called hidden states, right?
00:09:58: But the more RL is tackling the real world, the more hidden states you have.
00:10:02: You don't have full observability, which is something that in reinforcement learning science is called POMDPs, partially observable Markov decision processes.
00:10:13: I don't want to get into this, because when everybody was explaining to me how this works, they were using these terms and everything was hidden behind the fog of, you know, all of the terms.
00:10:23: It's actually just a common pattern in science now.
00:10:26: If you want to make a dent in science, you invent your own terms.
00:10:30: It brings the most popularity, like deep learning, for example, right?
00:10:35: So there is that problem that as the agent interacts, it influences its environment, you know, and you also see it in practice, you see it in real life if somebody wants to really make a dent in society, he influences or changes the rules, right?
00:10:50: And we can see it now, very clearly what's happening in the world.
00:10:55: But the RL agent does the same thing, like whenever he interacts with the environment, something changes.
00:11:01: So the underlying distribution of data changes, and he has to learn from it and navigate.
00:11:07: And that's what makes it really hard.
00:11:09: So that's what makes it really difficult to convert it to just a supervised learning problem.
00:11:13: It's a different class of problem, and it solves a more essential, more difficult problem in the world than just standard deep learning.
00:11:22: So that's why it's so sexy.
00:11:23: That's why it's so cool.
00:11:24: And that's why he called it the beast.
00:11:26: Okay, regardless of large language models, where do you see reinforcement learning in your project, in your industrial environment?
00:11:34: And what is the best way to approach a reinforcement learning project?
00:11:38: Maybe you can do it step by step.
00:11:41: Let's go through an industrial project.
00:11:43: Yes.
00:11:44: So in my experience, there were very few industries which had one - it may have changed a little bit by now - but it was, you know,
00:11:55: it was very hard for the industrial partners to imagine that they have a reinforcement learning problem.
00:12:01: And they approached us and said, let's say, okay, and I take AutoML as an example, they are producing something, it has some parameters,
00:12:09: and they have to optimize them such that they make better products.
00:12:14: And they said, oh, we heard about this, you know, this is sort of like the best you can get.
00:12:20: And they say, okay, we know all the methods in machine learning, we know, we've heard about RL.
00:12:25: And how can it help us?
00:12:27: And then that's the first step when I try to convince them, okay, so we either have reinforcement learning problem,
00:12:33: and that's where you are making something and the path is difficult, right?
00:12:37: Your equipment is wearing out.
00:12:40: The environment changes - and the agents, you know, the agents could even be the workers or the robots or whatever you make.
00:12:47: They wear out, the environment constantly changes and you have to constantly adapt to it; then you have a reinforcement learning problem.
00:12:53: But if you have sort of like a production line, which needs to be configured, like AutoML configures,
00:13:00: you know, networks, for example, they are hyperparameters, hyperparameter tuning,
00:13:04: then it's more an optimization problem.
00:13:06: And then it's the question, you know, how differentiable the environment is and what methods you can use.
00:13:12: But very often, it's not even there.
00:13:15: Very often, the customers see some sort of, let's say, quality problem, right?
00:13:20: And they are making something and they can visually observe it and they have people, they have 1000 people of observing LCD screens that they are making
00:13:27: and they look with the magnifying lens, they look for the defects.
00:13:30: That's a story from, let's say, 30 years ago.
00:13:33: And that's where we usually start, right?
00:13:35: So we work with the customers and say, okay, so we have something which can be really solved quickly by computer vision.
00:13:41: And so we can assess the quality of your products, for example.
00:13:44: And then that's sort of like a foot in the door for us.
00:13:48: And then the reinforcement learning or optimization problems, they come secondary, right?
00:13:53: The customer can now assess the quality.
00:13:55: And then they start asking questions like, what do we do to improve the quality, right?
00:14:00: How do we turn the knobs in the factory such that we make better products, such that the quality is higher
00:14:05: and you don't report that many defects with your system, which you have already installed.
00:14:09: So, that's the interesting part.
00:14:11: And that's when RL kicks in.
00:14:14: And then it has its own difficulties, right?
00:14:17: Because you cannot really - maybe I have said that in the previous podcast,
00:14:20: So I don't want to repeat myself too much.
00:14:22: But let's say you don't want to unleash the beast of RL directly in the factory
00:14:29: to start turning the knobs frenetically there and to see the outcomes, right?
00:14:33: Because you would be destroying 100,000 pieces of glass per second, and that's not realistic.
00:14:40: So, what we typically do is we model the process.
00:14:44: And that becomes our surrogate environment.
00:14:47: And then we train our reinforcement learning agents with this environment.
00:14:50: And then we do the deployment.
00:14:52: The deployment itself is a problem also, because it has its pros and cons.
00:14:55: And it has its tricks that you have to...
00:14:58: So it sounds very complicated, right?
00:15:01: So, it sounds very complicated, but it's not that complicated here.
00:15:05: What is complicated for non researchers are the RL methods, right?
00:15:10: If you want to, let's say, make a dent in scientific RL,
00:15:14: you have to do all the tricks with the techniques, right?
00:15:20: Because if I go to the history, there are some methods like, you know, temporal difference.
00:15:25: And then policy gradients, which are popular, where you reformulate the RL problem
00:15:30: such that you can use the gradient to find the best path through the environment,
00:15:34: best state action sequences.
00:15:36: And then there are methods like PPO, Proximal Policy Optimization,
00:15:40: which, you know, I'm just giving as a reference if
00:15:43: the listeners want to look at some modern RL.
00:15:46: PPO is one of them.
00:15:48: And there is a lot of tricks inside, like advantage functions
00:15:51: and how to, again, you know, how to tune the hyperparameters.
00:15:54: It's actually not that many.
00:15:56: And then the derived methods were actually used in DeepSeek.
00:16:00: So DeepSeek uses something like Group Relative Policy Optimization,
00:16:05: which is again derived from PPO.
00:16:07: And if you look into the DeepSeek paper, you will see that there is a formula
00:16:11: spanning the whole width of the paper across multiple lines.
00:16:16: And that's what they, that's their objective function that they optimize.
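(For orientation, the clipped surrogate objective behind PPO, which GRPO builds on, can be written roughly as

    L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

where \hat{A}_t is the advantage estimate and \epsilon the clipping range. In GRPO, roughly speaking, the advantage of each sampled answer is its reward normalized against the other answers sampled for the same prompt, which removes the separate value network; the full DeepSeek objective additionally carries a KL-divergence penalty, hence the multi-line formula mentioned above.)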
00:16:20: Which role does data play when it comes to reinforcement learning?
00:16:25: If we consider deep learning simple, supervised learning simple -
00:16:30: where you have a training set that, you know,
00:16:34: can contain a lot of text, all the text on the planet,
00:16:39: as in the case of language models -
00:16:41: If we consider that simple, then reinforcement learning is harder,
00:16:45: because, as I said before, the data changes as you interact with the environment.
00:16:51: So it's almost futile to think that you would be able to collect all the experiences,
00:16:57: all the trajectories that the agent can walk through the environment.
00:17:01: And that's why it's difficult, right?
00:17:04: Because the data basically never stops generating.
00:17:08: So data-wise, it's a harder problem.
00:17:11: And if I refer back to what I said before, that you first want to create a model
00:17:16: from a factory and then learn on it, creating the model is a hard data problem,
00:17:21: because you have to cover all the cases, which you normally do not encounter.
00:17:26: Because RL, it's like when babies learn, right?
00:17:29: It's learning from your own errors.
00:17:31: So you're walking on a path, you're walking on a
00:17:34: balance beam, and you make a side step and you fall.
00:17:37: That's what happens in very early stages.
00:17:39: You keep falling, keep falling,
00:17:41: and then you keep falling less and less and less.
00:17:43: And you have to be able to cover this.
00:17:46: So if you have fully software environment
00:17:50: that you can simulate,
00:17:51: and that can even simulate the failures, that's perfect.
00:17:53: But if you're learning the model from a factory,
00:17:55: which is operating so-so
00:17:57: because people are controlling it,
00:17:59: you don't have these failure states.
00:18:01: And you have very limited amount of data
00:18:04: in these cases, right?
00:18:05: But you still have to explore them.
00:18:07: So that's the difficult part of it.
00:18:09: But it sort of moves the difficulty away from the RL.
00:18:11: It makes it more of a data problem and pushes it forward
00:18:13: to the model generation.
00:18:14: So that's the complexity,
00:18:16: that's the hardness in the industrial RL.
00:18:18: Basically lack of the models.
00:18:21: All right, that was the era of digital twins.
00:18:23: We said that was something like five years ago, right?
00:18:27: It was a big deal.
00:18:28: Everybody was doing digital twins.
00:18:30: And to me, it almost seemed like
00:18:33: people do digital twins for no reason, right?
00:18:36: But the reason is here now.
00:18:38: And it's very important
00:18:39: that the people have created digital twins
00:18:41: because now we can use them in RL
00:18:43: and optimize the agents that actually work with them
00:18:45: as surrogates.
00:18:46: And then we can do the transfer to the actual industry.
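(As a rough illustration of that workflow - model the process, train against the surrogate, then transfer - here is a sketch using Gymnasium and Stable-Baselines3; SurrogatePlantEnv is a hypothetical stand-in for a real digital twin, and all spaces, dynamics and rewards below are made up.)

    # Sketch: train an RL agent against a surrogate (digital-twin) environment,
    # then reuse the learned policy on the real plant.
    import gymnasium as gym
    import numpy as np
    from stable_baselines3 import PPO

    class SurrogatePlantEnv(gym.Env):
        """Hypothetical digital twin with the same observation/action spaces as the real plant."""
        def __init__(self):
            super().__init__()
            self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
            self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)  # the "knobs"

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.state = self.observation_space.sample()
            return self.state, {}

        def step(self, action):
            # A real surrogate would be a learned or physics-based process model;
            # here the dynamics and the quality-based reward are placeholders.
            self.state = self.observation_space.sample()
            reward = -float(np.sum(np.square(action)))  # e.g. penalize aggressive knob changes
            return self.state, reward, False, False, {}

    agent = PPO("MlpPolicy", SurrogatePlantEnv(), verbose=0)
    agent.learn(total_timesteps=10_000)   # cheap mistakes happen in the twin, not in the factory
    agent.save("plant_policy")            # then deploy / fine-tune carefully on the real line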
00:18:49: - How do you maintain
00:18:50: a reinforcement learning approach in practice?
00:18:52: Or what do you need to do?
00:18:54: - What do you mean exactly?
00:18:55: Like if we deploy it to a factory,
00:18:58: how, what is the maintenance or how?
00:19:00: - Yeah, exactly.
00:19:01: - When you deploy the reinforcement learning
00:19:05: to the shop floor or to a process in the industrial sector,
00:19:09: what comes after?
00:19:11: Do I need to maintain it?
00:19:13: - You fire all the employees,
00:19:15: they go home, you pay them psychologists.
00:19:17: (laughing)
00:19:19: No.
00:19:20: Well, you know, you can, there's two things, right?
00:19:24: You can always improve the methods.
00:19:26: You can always find, you can always tweak it.
00:19:29: You can find something which works a little bit better.
00:19:32: Same as the deep learning, right?
00:19:33: You can always improve it,
00:19:34: such as it works faster,
00:19:36: learns better, you get the better quality.
00:19:38: So there is this kind of like a software
00:19:40: slash theoretical maintenance of it.
00:19:44: Then there is the normal maintenance.
00:19:46: Like when you interact with the production,
00:19:48: you get into automation field and you have,
00:19:51: when you deploy something,
00:19:52: you have to take care of it, right?
00:19:53: Like battery runs out somewhere, you know,
00:19:55: computer dies and so on.
00:19:57: These are the classic problems
00:19:58: which were always here in automation.
00:20:01: But you know, it's almost, I mean,
00:20:03: I don't wanna say it's sort of like risk-free,
00:20:06: that nothing bad can happen
00:20:07: because, you know, once you deploy it,
00:20:09: everything's safe and sound.
00:20:10: But as always, as in deep learning,
00:20:13: you figure out that there are things
00:20:15: which you haven't thought of and then the algorithm
00:20:18: or the, the RL is designed
00:20:20: such that it would never, you know,
00:20:22: utilize something which is never explored, right?
00:20:25: Like we have one customer and then we are,
00:20:28: you're not doing RL there yet,
00:20:29: but we are doing the inspection and then he says,
00:20:33: yeah, and then everything works and they have a data,
00:20:35: we label training model, it's kind of nice and we like it.
00:20:39: And then suddenly it doesn't work at all.
00:20:41: And he's like, oh yes, because like once a year,
00:20:44: we engage this machine which brushes the material
00:20:49: and then material looks completely different
00:20:52: and the distribution changes and you don't know what to do.
00:20:54: So, but you know, as I'm thinking about it, like,
00:20:56: maybe you have asked that you either, let's say,
00:21:00: train a controller, which then you train,
00:21:04: let's say in simulation and then you deploy it
00:21:06: and then it works and then it's fine.
00:21:08: But maybe you ask about like,
00:21:10: is there an ability that the system
00:21:12: actually adaptively improves
00:21:14: and that it has to be prepared for it.
00:21:17: Same as other techniques in other fields.
00:21:21: Like you have to, let's say, let it allow to learn
00:21:25: and that's sort of not that hard, right?
00:21:28: That's sort of like an essential property of the problem.
00:21:30: You don't close, you don't completely close to learning,
00:21:34: right?
00:21:35: You deploy it and you still let it observe
00:21:37: and you let it, you know, collect the data
00:21:39: and maybe run some optimization on the trajectories
00:21:43: that you collected while working with the actual plant
00:21:47: and not with the model.
00:21:48: Yes, that's possible and that's kind of easy.
00:21:50: - What are the limitations
00:21:51: of reinforcement learning in the industrial sector?
00:21:54: - I would say for now it's--
00:21:56: - No limitations.
00:21:57: - No, no, no, no, but it's, you know,
00:21:59: as we discussed before, it's the data
00:22:01: because the method is far more advanced
00:22:06: than what the bottleneck of industrial problems
00:22:11: allows you to do.
00:22:12: So I wouldn't wanna say that there are no limitations.
00:22:16: You know, it's endless.
00:22:18: You know, maybe if I'm marketing person of some kind,
00:22:21: I would say it's limitless,
00:22:23: but yeah, I'm still an engineer, so.
00:22:27: - And a scientist, we heard - a retired scientist.
00:22:29: - A former scientist.
00:22:30: - Yeah.
00:22:31: (laughing)
00:22:32: - And now I rely on other scientists
00:22:33: and I like that very much because, you know,
00:22:35: we can just use what the scientists come up with
00:22:40: and we can integrate and I enjoy this position very much
00:22:44: because I don't have to do all the science myself.
00:22:47: That's hard, you know. That reminds me
00:22:48: of our Calabrian cleaning lady at IDSIA,
00:22:52: when she was complaining that
00:22:54: our desks are too dirty - what is she going to do?
00:22:57: "I'm looking forward to dying and going to heaven,
00:22:59: where there are no dirty desks."
00:23:00: And I said, like, come on, what are you complaining about?
00:23:03: It's like us complaining that there are too many
00:23:05: open problems in science, right?
00:23:07: - Absolutely, absolutely.
00:23:09: - Which is frustrating.
00:23:11: - Yeah.
00:23:12: - So, and now we have a lot of open problems
00:23:14: in the, you know, intelligent automation for industry
00:23:18: and then, yeah, we can be.
00:23:20: - But what are the next steps
00:23:21: when it comes to reinforcement learning?
00:23:23: - I think there are two paths.
00:23:25: One is the industrial, just to use it more in industry
00:23:30: and to try to discover problems
00:23:32: when it actually can bring the value.
00:23:35: And then the scientific approach is just to integrate more
00:23:40: the latest and greatest deep learning technology,
00:23:43: like the transformer, which is being used in RL,
00:23:45: but it's not being used very much, right?
00:23:47: Or use modern neural networks like xLSTM for RL,
00:23:52: which is, you know, here and there,
00:23:54: but it's not very popular or very massively used, right?
00:23:58: Because a lot of people are now concentrating
00:24:02: on language models and how to use,
00:24:04: how to basically use RL for language models,
00:24:07: but it's just language, right?
00:24:09: But let's use the methods which are being used there,
00:24:13: like to tokenize the world
00:24:14: and not worry about the language modality,
00:24:17: and not even just other modalities like sound or video,
00:24:21: that's there already,
00:24:22: but like use the modality of movements of machines
00:24:26: and then come up with methods
00:24:28: which can control them efficiently.
00:24:30: Yeah, and that's sort of like a holy grail
00:24:32: in the whole automation.
00:24:33: It's called dark factory, right?
00:24:35: Where you have no human and then everything works well.
00:24:38: And RL is one way to bootstrap these things
00:24:42: because essentially, again, it's the same problem.
00:24:45: It's just, you know, deployed to the field.
00:24:47: So I would say, you know, in theory,
00:24:49: let's use more of the nowadays latest transformer techniques
00:24:54: and bring it together and put it back to industry.
00:25:00: Finally, when it comes to reinforcement learning,
00:25:02: which three statements can you no longer hear and why?
00:25:06: Agents.
00:25:09: You know, we have these agentic systems.
00:25:12: I have a very good friend of mine who has taught me
00:25:16: at the university about agent technology systems
00:25:20: and he had the agent technology group
00:25:21: which was like 60 people, I think.
00:25:23: And classic agent technology
00:25:26: was about distributed AI - you have agents -
00:25:29: unlike, you know, agentic systems,
00:25:32: as they are called now, when you basically wrap
00:25:34: an LLM into some piece of code
00:25:37: and let them interact with each other.
00:25:40: But that's fine, but again, it's an overloaded term, right?
00:25:44: So originally it was something else.
00:25:46: It was distributed AI and solving all the network
00:25:49: and communication problems.
00:25:51: And the agents were relatively simple, I would say, over there,
00:25:55: but it was massively focused on, you know,
00:25:57: how to orchestrate them such that they do something good,
00:25:59: right, like air traffic control,
00:26:02: where you have, you know, 100,000 clients in the area
00:26:05: and each of them has to communicate with each other
00:26:07: and negotiate their paths and navigate completely
00:26:10: without humans and completely failsafe.
00:26:13: That's agent technology problem
00:26:15: and that's what they're solving
00:26:17: and they were very good at it.
00:26:18: And now it has shifted towards that every agent
00:26:22: is like a very simple piece of code
00:26:24: which contains, you know, that 300,000-billion-weight
00:26:28: model, and they talk to each other,
00:26:30: and then nobody cares about the communication between them,
00:26:33: so that is yet to come.
00:26:34: So, you know, I can't really say that I hate hearing it,
00:26:37: but it's sort of like the problems which are there
00:26:42: are not even remotely considered, I would say.
00:26:45: And that's my pain with that because--
00:26:48: - So that's the first one, the second one?
00:26:50: - Yeah, what I didn't like was when I was at some forum
00:26:53: and they said, okay, with these LLM's,
00:26:54: there's that magic parameter that you tune
00:26:57: and it does the magic and, you know,
00:27:00: it starts oscillating less.
00:27:02: It's just Boltzmann temperature
00:27:04: for probability distribution.
00:27:05: Like, so what I don't like is when things get like
00:27:09: oversimplified, omitting the true nature of the problem
00:27:14: and it, you know, puts it in that haze in the fog of war
00:27:20: where you cannot really understand what is what
00:27:23: rather than explaining it properly
00:27:26: in simple terms such that everybody can understand.
00:27:28: And what I like about that is if you do that,
00:27:30: then people say, aha, so now it all makes sense.
00:27:35: But I think it's the biggest problem of science, you know,
00:27:37: and I don't wanna sound too bold, like I feel above
00:27:42: the people who don't really wanna learn it
00:27:44: because the topic is just too large
00:27:46: and you don't have enough time to study everything.
00:27:49: So these are these oversimplifications.
00:27:53: - Okay, that's the second and the third one?
00:27:56: - Depends how popular we wanna get.
00:27:58: Yeah, it's not that much of a term.
00:28:00: It's more that explanatory problem, right?
00:28:03: When you read a paper and then people are overusing
00:28:07: some terms that they've invented
00:28:10: and it's very hard to imagine what they think like
00:28:14: or what they mean with that.
00:28:16: - That sounds a little bit like your former boss.
00:28:18: - Which one?
00:28:19: Like Jürgen?
00:28:21: - Yes, yes.
00:28:22: - No, but he wasn't like that, you know,
00:28:24: I have a different view here, because he wasn't the one
00:28:27: who was inventing terms just to promote his stuff a little bit.
00:28:32: Well, you know, take upside down RL for example,
00:28:35: but he was more trying to defend it
00:28:38: and then explain things as they are
00:28:40: and trying to see through them.
00:28:43: So this is definitely his good property.
00:28:48: And yeah, but sometimes he was inventing terms.
00:28:50: I remember once we were,
00:28:53: because we were doing RL,
00:28:55: not using let's say the classic RL techniques,
00:28:58: but we were using evolutionary RL,
00:29:00: which is like a huge topic.
00:29:01: Maybe we should do another podcast on that.
00:29:03: - Yeah, absolutely.
00:29:04: - It becomes very funky and it's very interesting
00:29:06: and it's very cool because it's very simple in a sense, right?
00:29:10: The methods are very, very simple
00:29:11: and almost everybody can use them.
00:29:13: Like those who cannot program, you know, TRPO or PPO,
00:29:17: they can definitely do evolutionary RL
00:29:20: and do a big thing with it.
00:29:22: With my previous company,
00:29:24: we have even developed a library
00:29:26: for evolutionary RL running on...
00:29:29: - So can you tell us about this low-code RL?
00:29:32: What is it?
00:29:33: - I would say it's low code RL, you know?
00:29:35: Because you can write evolutionary algorithm
00:29:41: as a one-liner in Mathematica, and that's it.
00:29:41: - Oh, Mathematica is coming back every time
00:29:44: we got an episode on that.
00:29:46: - Yeah, I haven't used that thing last 18 years
00:29:51: or I don't know how long.
00:29:51: But yeah, so that's sort of like a different angle
00:29:54: to attack the reinforcement learning problem.
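(A minimal sketch of that "different angle": a simple (1+lambda) evolution strategy that perturbs the weights of a tiny linear policy and keeps the best performer. CartPole, the population size and the noise scale are placeholders; real evolutionary RL, e.g. for recurrent policies, is considerably more elaborate.)

    # Evolutionary RL sketch: search directly in policy-weight space,
    # no gradients, no value function. Environment and sizes are illustrative.
    import gymnasium as gym
    import numpy as np

    env = gym.make("CartPole-v1")
    rng = np.random.default_rng(0)

    def episode_return(weights):
        obs, _ = env.reset(seed=0)
        total, done = 0.0, False
        while not done:
            action = int(obs @ weights > 0.0)   # tiny linear policy: threshold a dot product
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        return total

    best = np.zeros(env.observation_space.shape[0])
    best_score = episode_return(best)
    for generation in range(50):                 # (1+lambda) evolution strategy, lambda = 16
        candidates = best + 0.1 * rng.standard_normal((16, best.size))
        scores = [episode_return(c) for c in candidates]
        if max(scores) > best_score:
            best, best_score = candidates[int(np.argmax(scores))], max(scores)

    print("best return found:", best_score)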
00:29:57: And then we were thinking of how to call it.
00:30:00: And he said, okay, let's call it -
00:30:02: because as the policy,
00:30:03: we were always using a neural network,
00:30:05: especially a recurrent neural network,
00:30:06: especially LSTM at that time.
00:30:09: And he said, okay, it's a deep network
00:30:11: which was already going.
00:30:12: And this is RL, let's call it
00:30:13: deep reinforcement learning.
00:30:16: And I think I've had,
00:30:18: I don't really remember correctly, this has been 10 years ago,
00:30:20: but I think I've had a paper
00:30:22: or I had like a talk at the workshop
00:30:25: which is called Deep Reinforcement Learning.
00:30:27: And that was at NIPS,
00:30:29: it was called NIPS at that time, not NeurIPS as now,
00:30:31: but it's the same conference.
00:30:32: They organized a deep reinforcement learning workshop
00:30:37: which was more about DQN, the algorithm from DeepMind
00:30:41: which was really good at solving Atari games.
00:30:44: And they organized that workshop
00:30:45: and they actually had to invite me
00:30:46: because I coined the term.
00:30:48: But the term was actually coined by Jürgen
00:30:51: and it was when we were walking to lunch
00:30:52: at ECR something like 12 years ago.
00:30:56: - Okay, that's a nice story.
00:30:57: What are your next projects for you and your team?
00:31:00: - Next big thing.
00:31:01: - Visiting the Hannover Messe.
00:31:03: - Yes, but we have a product for inspection
00:31:07: and we are working on a product for the actual control.
00:31:09: And that's sort of like a holy grail.
00:31:11: It's very difficult to do.
00:31:13: And it's very hard to imagine that you take a box
00:31:16: and you deploy it to the factory
00:31:18: and it learns there and makes a model
00:31:20: and trains reinforcement learning agents
00:31:22: such that you can then connect the box properly
00:31:24: and it starts working.
00:31:25: That's my sort of like a dream now.
00:31:28: It's sort of like a holy grail.
00:31:30: I wouldn't even call it a project at the moment
00:31:33: but I think we seriously consider it.
00:31:36: And it has a lot of obstacles to be solved on the way there
00:31:39: which constitute like 99.9% of the work
00:31:43: before we actually get to it.
00:31:44: But I hope we have it at some point.
00:31:46: But it's more in the skies.
00:31:48: You know, like we have lots of other things
00:31:50: which are more on the ground.
00:31:52: That's probably what you've asked about and these are,
00:31:54: you know, I'm going to the Hannover Messe.
00:31:56: That's for sure.
00:31:58: I'm going to the AI in the Alps.
00:31:59: Thanks.
00:32:00: - I don't know this.
00:32:02: - Completely unrelated event that nobody knows about.
00:32:04: - Absolutely.
00:32:05: - Nobody knows what it is, right?
00:32:07: But it's very famous.
00:32:09: And then, yeah, and then opening new niches
00:32:13: of where we can use the research to improve the industry.
00:32:18: And here's one thing which I, do I have time?
00:32:20: - Yeah, sure, sure.
00:32:21: - So here is one thing - when you asked about things
00:32:24: which I hate - maybe that's the fourth one
00:32:26: or maybe that's a better third one.
00:32:28: There is a lot of areas in science and in society
00:32:36: where we as scientists or former scientists
00:32:41: can make a big improvement like protein folding
00:32:45: or like, you know, medicine,
00:32:47: like basically revolutionize the whole field.
00:32:50: And it's very tempting to do so just to take the method
00:32:54: which you know that it's very good in solving problems,
00:32:58: enter in some field that the people
00:33:00: don't know anything about AI
00:33:02: and just make a revolution there.
00:33:05: And then you become a hero, right?
00:33:07: But the only thing you did is just interface it
00:33:10: to the new problem and that's very,
00:33:12: I would say that's very tempting for everyone,
00:33:15: there is a huge urge just to make waves somewhere,
00:33:20: in a field that you don't know anything
00:33:23: about, right?
00:33:25: And you wanna become a rock star.
00:33:26: That's very, very tempting.
00:33:28: And I'm trying to resist that because it,
00:33:32: I don't know, it just wouldn't feel,
00:33:33: wouldn't make me feel great that I try to convince people
00:33:38: like you guys, you have lots of experience
00:33:42: and we're gonna replace you with machines
00:33:42: you don't know anything about because we can do much better.
00:33:44: That's very difficult.
00:33:45: And also we see it in the industry
00:33:48: when we deploy some solution to a factory
00:33:50: where they have workers, very skilled ones
00:33:52: and now they are being outclassed by some sort of,
00:33:55: you know, youngsters with computers
00:33:58: and that goes back again, you know,
00:34:00: to the industrial revolution and the breaking of the machines.
00:34:03: I just don't wanna cause that, you know,
00:34:05: AI is here not to outclass anyone.
00:34:08: It's just here to help people to work better
00:34:11: and be happier, maybe not work at all.
00:34:15: But be happier.
00:34:16: It's an important aspect, you know,
00:34:18: you should not make people unhappy with, with AI.
00:34:22: - Jan, it was a pleasure.
00:34:23: Once again, fingers crossed for your next project.
00:34:27: Looking forward to meeting you in Hannover
00:34:29: and really looking forward to meeting you
00:34:31: at this strange event in the Alps.
00:34:33: - It's not a strange event.
00:34:34: It's, you know, you have to cut it out, but what is it?
00:34:39: I just wanted to make it look like
00:34:42: that we don't know anything about that
00:34:44: because it's some sort of like a magic event
00:34:46: that everybody would wanna attend.
00:34:48: Yeah, so.
00:34:49: - It's a magic event, absolutely.
00:34:50: - Yeah, I'm looking forward as well.
00:34:52: - Thanks a lot, bye-bye.
00:34:53: - Thanks, bye.
00:34:54: - Robotik in der Industrie.
00:35:00: The podcast with Helmut Schmidt and Robert Weber.
00:35:03: (electronic music)