In this special edition of the Cloudify Tech Talk Podcast, we delve into a unique case study with a special guest Eitan Yanovsky, Co-Founder and CTO of Optibus. We cover all things DevOps – from Kubernetes and beyond – and how they relate to this very specific and successful use case.
‘Scaling Kubernetes Clusters’ https://shayn-71079.medium.com/scaling-kubernetes-clusters-8a061321de93
Intro: Welcome to the Cloudify Tech Talk Podcast, taking a deeper dive into all things DevOps, all things toolchains, all things automation and all things orchestration.
Jonny: Hey guys, welcome to the Cloudify Tech Talk Podcast. My name is Jonny, I am your moderator as usual. I know it’s been a little while, but it’s been worth the wait because we have a very special guest on today’s podcast with a very special use case. We’re going to be discussing unique challenges when it comes to scaling Kubernetes, Lambda and beyond. So without further ado, I’m now going to hand over to Nati Shalom, our CTO, who can properly introduce our special guest for today’s session. Nati?
Nati: Hello everybody and welcome to the Cloudify Tech Talk Podcast. I’m honored to have my guest Eitan Yanovsky. I know Eitan from a previous life at GigaSpaces, where he was one of our exceptional developers. One of the things I remember is that he started in the .NET group, but he was responsible for a lot of the replication architecture, multi-site replication, and the hardcore distributed-systems type of architecture in the product itself. He was also responsible, I think, for introducing us to the test-driven design philosophy as we moved to Python. So I remember at least some of the good parts that Eitan brought us, a legacy that remained with us for a while. Eitan has since started his own startup, which is called Optibus, and interestingly enough, Optibus is now growing very nicely and very rapidly, touching a very unique area in the high-tech industry: optimization of public transportation. Eitan will talk a little bit about Optibus, and what brought me to this specific discussion, since it’s a tech talk, is the unique challenges that I think Optibus represents in terms of scaling Kubernetes and Lambda services, and when you would choose one versus the other. In general, I think the experience that Eitan went through, his evolution and choice of technologies, will be interesting for everyone listening to this podcast. So without further ado, I will pass the ball to Eitan. Hi Eitan.
Eitan: Hi Nati, thank you, it’s a pleasure to be here.
Nati: Yeah, it’s almost a reunion. Hopefully not the last one. So I gave a bit of an introduction on my end, and I think that’s a good segue: why don’t you introduce yourself and then walk us through the Optibus journey? How did it all start, and what is Optibus today?
Eitan: Yeah, sure. So I’m Eitan. I have a master’s in computer science and also in math, and I started my career mainly at GigaSpaces. Before that I was only in a student position, and I started at GigaSpaces fresh out of college. As you mentioned, I began in the .NET group and later on evolved to leading the core team and even the Cloudify team at some point; it was a combined team at that point in time. I was there for about seven years, I think, which is about the same amount of time I’ve spent now at Optibus.
Nati: So that’s responsible for a good chunk of your life, at least your career path.
Eitan: Yeah, GigaSpaces was a very good place to learn how to tackle scale issues and distributed systems, which really helps me in what we do today.
Nati: And I also usually say that it was also a good place to learn what not to do.
Eitan: I think I’ve learned more about that where I’m at now. I mainly love solving distributed challenges, distributed systems and algorithms, and this is the core of our challenges at Optibus: how to build a scalable distributed system that solves a very complicated and challenging problem, one that requires sophisticated algorithms, and to do it at scale and at speed. This is the main problem that we are facing. Let me tell you a bit about Optibus itself. At Optibus we build a platform which is used to plan and operate public transportation, or mass transit; not necessarily just public transportation, but moving the masses. The companies that are in charge of that task use our platform to plan the movement of every vehicle and every driver each minute of the day, and also to plan the movement of the people themselves: building the routes in the best way you can in order to serve the public, building the timetables, and essentially everything that you need in order to plan and operate how to move the masses.
Nati: Now, how did you start with this? How did you get into this?
Eitan: Yeah, that’s a funny story. It was actually while I was still a student, back in 2005 or something like that, during my summer semester break. My co-founder Amos and I sat down at the beach and thought, well, what will we do in the summer in order to get ready for our careers? It was our first year at the university.
Nati: Optimizing bus traffic is a great summer exercise, right?
Eitan: Yeah. That’s the first thing you think about when you’re at the beach, right.
Eitan: So luckily the father of Amos is the CFO of one of the private public transportation operators in Israel, and he introduced us to the challenge of planning the day-to-day work of vehicles and drivers, and how it was managed back then: with expert people who used Excel in order to do that. He rightfully thought it was very weird that we were not using computers to solve such a complicated puzzle, where you need to move thousands of tasks and decide how to do it in a way that is efficient and feasible. Each decision by someone using his mind and Excel translates, at the end, to millions in spending on a yearly basis due to inefficiency, and also to serving the public worse, because the plan is not that feasible, or it’s very tight, and things like that. So we got this challenge and we thought, okay, we’re studying math and computer science, this sounds like a very fun challenge, and also let’s learn a new language while we do it. Let’s learn C#, it sounded like a promising language at that time. And this is how it started, essentially, as a pet project while we were students.
Nati: So that was 2005, right?
Nati: And then you kind of abandoned the idea and moved to work for Giga Spaces, I’m assuming, until kind of that idea came back, right? Something of that sort.
Eitan: Something of that sort. It was always on the back burner. While we were students we worked on it and had access to real data from a real operator, so we were able to fine-tune and build algorithms that just solved the problem. It was just a console application that takes an Excel file as input and outputs an Excel file. It wasn’t anything more than that in terms of UI or cloud architecture.
Nati: And the father of Amos was actually using it or…?
Eitan: Yeah, they were using it, and they actually won two big tenders and went from a relatively small operator to one of the largest operators in Israel. I think that today they are the largest one after the Dan and Egged companies. It was always on the back burner; we treated it as kind of a side project. Somewhere around 2011, about four years after I’d started working at GigaSpaces, we started to get inbound traction, because more and more tenders were being published in Israel. The market became more and more privatized, and the operators heard about this project that we’d done and wanted to use it in order to submit their bids. There were even new operators that didn’t exist before, who wanted to be able to build a plan that essentially dictates how much money you need in order to operate the public transportation. These were huge tenders of hundreds of vehicles each. We were getting this traction by providing, I would say, services. We didn’t give out the platform at that time; it was services that we did on our own, and at some point we even had students who used the platform to do these services for tender submissions. We realized that it was time to do something serious with it, since there seemed to be traction. Eventually, around the beginning of 2014 was the inflection point where we left our day jobs and established the company, after we saw major inbound interest from the large operators. The largest one in Israel is Egged, which actually said, okay, I want you to build this platform and I’m willing to pay. So we started the company from day one with very significant revenue, kind of bootstrapped, but we also started to raise money at that point in time.
Nati: Well, that’s a very nice story. So tell us a bit about what optimizing public transportation involves. How is it done, how was it done before Optibus, what are the challenges there, and where does Optibus come into play?
Eitan: Sure. So public transportation optimization is actually one of the ancient problems in terms of computer science research and algorithm development. It’s also an NP-hard problem to solve to optimality, so there is no known algorithm that can solve it to optimality within a reasonable time. The way it was done before us, and still is in many places that we have not reached yet, is similar to what I described from 2005: using some Excel and some manual expertise, trying to build a plan of how to use your resources, your buses and your drivers, while respecting all the different rules that you need to respect, which I’ll explain a bit later, and doing it efficiently. Clearly we are not the first computer-aided solution for that. We have legacy incumbent competitors that have existed for tens of years, with systems whose algorithms were mainly built 20 or 30 years ago. They are on-premise systems, not cloud-based, very slow, very non-intuitive and very hard to operate.
Most of the users of those systems essentially don’t really use them as intended, because it’s so hard; they just use the system as if it were Excel and still work manually. It’s also an order of magnitude slower than our algorithms, which are distributed. We can build a plan for an entire city-wide operation, which includes thousands of vehicles and drivers, within minutes; basically solving an NP-hard problem within minutes, again not to optimality, but very close to that. With the existing legacy solutions it can take between half a day and a few days, depending on the scale, and the quality of the results is poor compared to what our algorithms can provide and the robustness of what we do.
Nati: And in terms of the customers that are using it, it’s not just the local bus transportation, if you like. I understand that companies like Facebook and Google, who need to move employees back and forth to their facilities, especially in Silicon Valley traffic, are also customers. Can you tell us a little bit about the broader customer base?
Eitan: Yeah, so the typical customer, as I said, would be a public transportation operator or an agency that is operating within a city or a few cities and needs to transport sometimes even millions of passengers a day. But the platform itself is for moving the masses, not just public transportation. You just gave a good example: the San Francisco employee shuttle operation of, for example, Facebook. They have hundreds of buses that operate on a daily basis to move people from their homes to the offices, and they use essentially the same platform. In the end, it’s the same problem. You need to plan how you’re going to do it, and you need to do it efficiently. It’s not an on-demand service, because once you need to move masses, it cannot be something where everyone decides on the spot whether they want to go or not. You need to bring the demand to the resources and not the other way around. This is something that we also learned from them when we asked them, why are you not doing an on-demand operation? They said, it’s just not going to work at this scale. They have double-decker buses; you cannot have a double-decker bus pick you up wherever you want and also have it run efficiently and actually be able to pick up 70 or 80 people at the same time. We also have a mining company, by the way.
Nati: What is that? Mining company you said?
Eitan: Yes, mining companies also need to transport their employees. These are also huge operations, very similar to what I’ve just described.
Nati: Very interesting. So just out of curiosity, could I build Uber with this, or Waze?
Eitan: No, this is not planned for on-demand. On-demand transportation is very different from fixed-route transportation, which you can plan at scale. When you look at the masses, the demand does not change on a daily basis. When you look at an entire metropolis like, let’s say, Tel Aviv, you’ll have roughly the same number of people that need to move from point A to point B at specific times of day. This way you can build an efficient plan and move the people very efficiently, with a very low cost per passenger, versus using an on-demand service that is, first of all, limited by its capacity.
Nati: What about something like Via or Bubble, which is somewhere in between on-demand and mass transportation?
Eitan: Actually it’s very much on-demand. It’s like Uber Pool in a sense; it’s definitely on-demand, there are no routes. There are debates about the efficiency in that sense, how much it costs to transport a single passenger, and there’s also an issue of traffic. You can’t have everything on-demand; it needs to be a combination of both modes, where the masses use big vehicles that are planned properly, bringing the demand to the vehicles, and where you have low coverage, or where it doesn’t make sense to have a big bus because there is not that much demand, on-demand is a complementary service. It’s typically the first mile and last mile: you take the passenger, you put them on the main corridor where you have the large vehicles, and you combine the two. In the end, we have the same road, and it’s shared by all resources. So even in a future where everything is autonomous, you can’t have everyone in small four-passenger cars using the same road, as long as we stick to 2D roads and can’t somehow harness the altitude or something like that. So it needs to be a combination of both.
Nati: Got it. And in terms of optimization, the result of the optimization is obviously finding the right hours and the right routes, and also finding where the masses are, right? Like where to actually place stations and how to do the right rounds. So you can also maximize the efficiency in terms of profit, not just in terms of speed, right?
Eitan: The optimization problem itself is not one problem. You just described a few different problems at a few different stages and life cycles of what you optimize. Building the network of routes and the timetable is one problem, or actually two problems. Once you have the network of routes, and the frequency of the trips, meaning how many times you need to operate each route at different times of day, then comes the question of how to actually allocate resources in order to do that task. It’s not as if we have a timetable where all the routes are the same and buses operate back and forth, because that’s clearly very inefficient. You don’t have the same demand for all the routes, so buses don’t just go back and forth. They need to change routes during the day, otherwise you will need many more vehicles in order to operate. On top of that, you have the constraints of the operation. If you have, let’s say, an electric bus, it has a limited distance it can drive before it needs to recharge. So you need to plan when you need to recharge and where you recharge, but you have limited capacity at the chargers and 100 buses that need to be charged.
So how do I do that while respecting the fact that I have only 10 places at charger A, five available places at charger B, one battery type for bus type A, and all of those kinds of problems? It’s a very complicated problem. On top of that, you need to allocate the drivers. The drivers need to be paid, and you need to make sure you maximize their paid time. They need to take breaks, you need to make sure you comply with the rules, and you may actually have three union agreements within the same group of drivers. So we need to make sure that twenty drivers of union A have this set of rules and twenty other drivers have that set of rules. Everything is basically one big global problem, and you can’t tackle each part as a separate problem, because the result is clearly not optimal if you take only one into consideration and then move on to the next stage.
Nati: It sounds like almost an Iron Dome level of complexity in terms of algorithms, and Iron Dome is a very relevant comment right now for those who are living in Israel these days. But I think this background hopefully gives the audience some idea of the complexity. As you get into the details, the number of considerations you have to take into account and the number of resources, clearly this is not a trivial thing. So let’s talk about the point of our discussion, which is doing it at scale. I phrase it as auto-scaling at scale, which is an interesting way to put it. So what do I mean by auto-scaling at scale? It’s not just auto-scaling, it’s also doing it under large-scale demand; in this case a lot of data, not necessarily a lot of users. Through this discussion we’ll talk about the different dimensions, because I think there’s a lot of confusion about what latency means at scale, and there are different dimensions of latency when you’re doing batch operations versus real-time operations.
So we’ll try to use this use case to go through those considerations. In this case, as you probably heard from Eitan, we’re talking mostly about a batch type of processing and less about real-time processing, even though that’s coming somewhere down the road. Anyone who is dealing with AI and machine learning will probably resonate with some of the considerations we’re hearing, because there’s probably a lot of AI and machine learning plugged into this sort of optimization. So, auto-scaling at scale: let’s start talking about the technical bits and less about the background story. When we had a preparation call, you mentioned a lot of things that you’ve done in terms of efficiency. I know your background at GigaSpaces, and I know where that mindset came from. So I wanted to pick your brain and start by clarifying: what is the relationship between efficiency and scale? How are the two related?
Eitan: Yeah, sure. So in our case, as you said, it’s not a matter of scaling in order to support many concurrent users. It’s scaling the system in order to support ongoing optimization requests, which are triggered by users. Solving that problem requires an immense amount of compute power and memory. Specifically, we’ve built our algorithms to be distributed, and we know how to make use of as many cores as you can give us in order to provide a better and faster solution, which is key to being able to solve the problem fast. It’s very hard to build optimization algorithms that are distributed. Typically it’s a very iterative process, where every stage depends on the previous one.
Nati: Like a MapReduce type of model, right?
Eitan: Yeah, so for the most CPU-intensive part, we’re actually using a pattern that in our case … but it’s very similar. The thing here is that you want to scale and get a lot of CPU power for a very short period of time, like a burst of CPU power. It’s triggered by a user clicking “please optimize” after setting everything up, and wanting the results within a few minutes. So it’s not offline batch processing that happens overnight or anything like that. It happens in real time, in response to what the user is asking you to do, but it’s not an operation that takes one or two seconds; it can take a few minutes. While that happens, we need a lot of compute power, and among the many things that we’ve done, one is that we’re using Lambda. Our whole system, by the way, is cloud-native and deployed on Amazon, AWS; it’s a multitenant system. There’s nothing on-premise at our customers’ sites, and it’s all used from the browser. We are using Lambda in order to get a lot of CPU power very fast, basically a lot of concurrent workers, without paying what you would pay if you did it alternatively by, let’s say, spawning new machines using Kubernetes or any other way to handle this load. Spawning a new machine from a cold start would typically take 20 seconds, something like that, or if you have machines on standby, you would pay for them while they are on standby. But like I said, this is more of a burst type of operation, so we cannot predict the demand and have machines waiting at the right time to handle these loads. Because it’s a very peaky kind of load, a very short burst, we really need to spawn these machines, or these cores in this case, on demand.
So when we use Lambda, we are taking advantage of the very short cold start of Lambda, which is sub-second, and it’s handled by AWS. We don’t need to handle that. Even if you have Kubernetes, you still need to take your scaling considerations into account. It’s not completely abstracted away from you, and you need to plan your node pools and things like that. We don’t need to do any of that. We just need to make sure AWS has increased our maximum number of concurrent Lambdas, and we only pay for the exact seconds these Lambdas are up and processing. Once they’re done with the job, they’re immediately terminated and we don’t pay, while other scaling mechanisms, which are not built for bursts, would still have this tail-down time, because their logic is typically: okay, let’s wait, maybe more work is coming, so it’s worth waiting for a minute before I decide to go down. So if I need 20 or 30 seconds of CPU on, let’s say, 1000 cores, with Lambda I can get it very efficiently and relatively cheaply, versus starting machines, waiting for them, paying for the 20 or 30 seconds of ramp-up time, and then paying for the tail-down time. So this is how you can scale and get burst power for a very short period, very efficiently.
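The billing argument above can be put into numbers with a minimal sketch. It counts billed core-seconds only and assumes, for simplicity, that a core-second costs the same on both options, which is not true in practice. The function names are hypothetical; the 1000 cores, 30 seconds of work, 20 seconds of ramp-up and one minute of tail-down are the rough figures mentioned in the conversation.

```python
def lambda_billed_core_seconds(cores: int, work_seconds: float) -> float:
    """Burst model: billed only while the functions are actually processing."""
    return cores * work_seconds


def machine_billed_core_seconds(
    cores: int,
    work_seconds: float,
    ramp_up_seconds: float,
    tail_seconds: float,
) -> float:
    """Machine model: billed during start-up, the work itself, and the idle
    tail while the autoscaler waits before scaling back down."""
    return cores * (ramp_up_seconds + work_seconds + tail_seconds)


if __name__ == "__main__":
    cores = 1000   # "let's say, 1000 cores"
    work = 30.0    # seconds of real CPU work per core
    burst = lambda_billed_core_seconds(cores, work)
    machines = machine_billed_core_seconds(
        cores, work, ramp_up_seconds=20.0, tail_seconds=60.0
    )
    print(f"burst: {burst:.0f} core-seconds, machines: {machines:.0f} core-seconds")
    print(f"overhead factor: {machines / burst:.2f}x")
```

Under these assumptions the machine-based option is billed for several times the useful work, and the shorter the burst, the larger that factor becomes.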
Nati: So I think you touched on a couple of things here that I wanted to break down for the audience. First, maybe take a step back and tell us a bit about your journey to get to Lambda. So you started with a single server on Amazon, if I remember, and that was your first cloud product. Then from there, what was the next phase?
Eitan: Yeah, so the first phase was even a Windows .NET server on Amazon that was actually spawned on demand when the user clicked optimize. That was even in the pre-2014 era.
Nati: It was kind of a Lambda, but a single Lambda and a single server.
Nati: Okay, let’s talk about that. I had that question about efficiency and scale, and why efficiency is important at scale. The reason I’m asking is that in some of those discussions you think: okay, I’m doing auto-scaling. If I’m doing auto-scaling and I have the flexibility to shrink and expand when I need to, I don’t have to be efficient in every unit of work. I think there are a lot of nuances here that are related to cost and to other aspects. What you’re describing is a lot of those areas of efficiency, which I want to break down to make sure that people capture the lessons learned from your journey. So again, let’s go one step at a time: you started with a Windows server, and then the next one was Kubernetes.
Eitan: No, Kubernetes was not.
Nati: And today would you consider Go, or are you happy with the choice of platform?
Eitan: I probably would have considered Go as well, but we are quite happy with the choice of Python because, like I said, it’s very good for building research capabilities fast. We also implemented some of the core, purely algorithmic parts in our own C++ library, which is called from the Python code. So we knew we always had the option of implementing specific parts that are very math-oriented, like a black box with input and output, in C++, and we’ve done that in order to overcome some of the disadvantages in the very, you know…
Nati: Okay. So we talked about the language, now let’s switch to the other parts before we get into the Kubernetes part. So there’s MongoDB. Why Mongo?
Eitan: Very similar to the Python decision. We said we need to move fast. We don’t know exactly how the data is going to look one, two or four months from now. We know what we want now, and we need to adapt very fast, basically without having a fixed schema and without maintaining any complicated architecture of, I don’t know, SQL servers in order to support what we will need in the future. At that point in time Mongo did not even have transactions, but for our type of work we said we don’t really need transactions as long as we keep working with single documents and have atomic operations, so it was an easy choice. Clearly, things have evolved, and now we do have operations that span more than one document and need transactions. Is it the best choice for everything that we have now? No, and we are not using solely MongoDB; we also use PostgreSQL, depending on the use case. But when you want to run fast and you don’t want to take too many architecture decisions early on, when you don’t even have a product, that’s a very good choice.
Nati: What about using something like Hadoop, or a framework like Spark? Did you consider any of that?
Eitan: At that point in time, no, it wasn’t relevant. The system was not even distributed across many machines at that point. Like I said, Lambda came at a later stage. Back then, when you ran an optimization, it was isolated to a single, very strong computer with many cores, and it was parallelized within that computer using its eight or 16 cores. From that, we jumped to using Lambda and distributed the work a lot more, across many, many more computers. Clearly, there are many stages in that algorithm; some of them use Lambda, some of them use other things.
Nati: So we’re getting to the Kubernetes side. You didn’t jump directly into Lambda, right? You went through Kubernetes and then Lambda?
Eitan: Yeah, we started very early on, before Kubernetes was mature or even existed. We started with Docker in 2014. We managed the containers with Ansible and CloudFormation, and all the deployment was done on our own. We didn’t even think about managing auto-scaling at that point in time; when you have one or two customers, you don’t really care about scaling. You just put up one server, and when it can’t keep up with the load, you put up two servers and then three servers, and then you say, okay, now it’s time to actually move to auto-scaling. That came exactly at the time when we moved to Kubernetes. Now, the reason that we had hard-coded servers is that scaling in our case is not easy. It’s not like, okay, we’re using a lot of CPU, let’s start another server, because every request is going to use about 100% of the CPU of its machine. If I just looked at that metric, it would cause infinite scaling: every time I get a new request it would say, oh, I’m using 100% of the CPU, let’s add another computer. So we actually need to scale based on a queue of incoming requests, and this is when we moved to Kubernetes. We are using Celery, and we monitor the state of the Celery queue, how many tasks are pending, and that triggers a scale event. Since each task takes up a whole machine’s CPU, sometimes a single pending task is enough for us to need to scale. So that’s when we moved everything to Kubernetes. Our web servers and all of that clearly moved to Kubernetes as well, but those don’t really need to scale that much, because the number of concurrent users that we have doesn’t create a significant load of HTTP requests on the web servers.
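A queue-driven scaling decision of the kind described here might look roughly like the sketch below. It is only an illustration, not Optibus’s actual logic: the function name, the one-worker-per-task assumption and the limits are hypothetical, and reading the pending count from the Celery broker is left out.

```python
def desired_workers(
    pending_tasks: int,
    busy_workers: int,
    min_workers: int = 0,
    max_workers: int = 50,
) -> int:
    """Return how many worker nodes should exist right now.

    Assumes each optimization task saturates one worker, so CPU utilization
    says nothing useful: a busy worker always reads about 100%. The scaling
    signal is the number of tasks still waiting in the queue.
    """
    wanted = busy_workers + pending_tasks  # one worker per running or waiting task
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    # A single pending task is already enough to trigger a scale-up.
    print(desired_workers(pending_tasks=1, busy_workers=0))
    # No queue means no extra workers, even though busy ones are at 100% CPU.
    print(desired_workers(pending_tasks=0, busy_workers=3))
    # The upper bound keeps a sudden backlog from scaling without limit.
    print(desired_workers(pending_tasks=100, busy_workers=10))
```

A control loop would call this periodically with the current queue depth and resize the worker pool to the returned value.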
Nati: Yeah. So let’s talk about the Kubernetes scaling aspect. Kubernetes is known to have a lot of this auto-scaling built in, so let’s talk about the difference. What is auto-scaling in Kubernetes versus what you’ve experienced with Lambda, and what are the limitations of auto-scaling within Kubernetes?
Eitan: Yeah, so the type of scaling that we need is not based on the basic metrics that you would look at on a machine, like the memory that is being used or the CPU.
Nati: That’s what Kubernetes is using, right?
Eitan: Yeah, those are the basic ones. When your server is loaded with a lot of requests, it typically starts to create a spike in memory and CPU, or not even a spike, just a gradual increase, and you say, okay, let’s add another machine to spread the load between the machines.
Nati: Again, just to clarify that. In Kubernetes, we have something that is called a replica set, and we have a way to define the range of that replica set, and Kubernetes has some measurements to know when to increase the number of instances within the boundaries of that replica set. The default ones, because it’s supposed to be simple, are based on CPU or memory, and that’s basically how Kubernetes does the scaling, right?
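For reference, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The sketch below reproduces only that arithmetic; the function and parameter names are hypothetical, and the real controller also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Core replica calculation of a metric-ratio autoscaler.

    current_metric and target_metric use the same unit, for example average
    CPU utilization in percent across the pods.
    """
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result inside the configured replica range.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two pods averaging 90% CPU against a 60% target.
    print(hpa_desired_replicas(2, 90.0, 60.0, min_replicas=1, max_replicas=10))
    # Four pods averaging 20% CPU against a 50% target.
    print(hpa_desired_replicas(4, 20.0, 50.0, min_replicas=1, max_replicas=10))
```

The rule works well when load spreads across replicas, which is exactly the property a one-task-per-machine batch workload lacks.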
Nati: Okay. So let’s talk about what are the challenges with that model? One, one, one challenge that you mentioned is the metric itself. So when you’re doing batch operation, CPU spike wouldn’t necessarily be the accurate measure to trigger scale and you mentioned that in this case, it’s the length of the job, in this case, stuff’s in a queue that is more accurate measurement to trigger scale, right and that’s not out of the box within Kubernetes and you have to kind of write your own policy or scanning policy to do that, right?
Eitan: Yeah, exactly. So in our case, each task takes a significant amount of time, it's not a one-second task, and it requires a lot of CPU. It will actually take the full amount of CPU that you give it: it reaches one worker node and it will just take the entire CPU of that node, because it needs that CPU. So you need to scale for each task; you need to add a node that handles it. In our case, Lambda is only one part of that grid and we still have a lot of other parts. So we still use Kubernetes, and we have custom scaling logic, like you mentioned, which monitors the state of the RabbitMQ queue to know if there are pending tasks, and then scales on demand based on that, and the task will be allocated to that computer, to that pod within that node, and that task…
Nati: The map is kind of in... sorry for interrupting you, I just want to make sure that I'm capturing those points. So the map would probably be in Lambda and this would be done in Kubernetes, is that right?
Eitan: Yes, exactly. And other things as well, but yeah. So that starts the process, and that pod would trigger this Lambda burst when it's needed as part of the process. The process itself has a lot of stages, and that part is not done in Kubernetes. Before we used Lambda, we actually used that machine's resources in order to scale. So within Kubernetes we started a task, that task was assigned to a server, that server had a lot of CPU, and it started to do its work, and in doing that work it did this parallel part within its own cores. We moved that part to Lambda, which has many more cores and can use a lot more CPU, much more efficiently, as I've described.
Nati: Excellent. So basically, in the scaling model of a map-reduce type of scaling, we have a couple of factors that we need to take into consideration, and that's where the separation between map and reduce becomes interesting. In Kubernetes itself, whenever you spawn a new instance you have to take into account the time it takes to actually provision a new container, which takes, let's say, at minimum 30 seconds, if I recall correctly from our previous discussion, right?
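The shape of that split, a long-running worker that fans a parallel stage out and then combines the results itself, can be sketched as follows. A local thread pool stands in for the Lambda burst so the example runs anywhere. The work function, the chunking and all names are hypothetical; in a real system each chunk would go to a separate function invocation instead of a thread.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_chunk(chunk):
    """Hypothetical unit of parallel work (the 'map' step)."""
    return sum(x * x for x in chunk)


def split(items, parts):
    """Split items into at most `parts` chunks of roughly equal size."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_burst(items, parallelism):
    """Fan the map step out, then combine the partial results locally."""
    chunks = split(items, parallelism)
    # Stand-in for the burst: one thread per chunk instead of one invocation per chunk.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        partials = list(pool.map(evaluate_chunk, chunks))
    # The 'reduce' step stays in the long-running worker.
    return sum(partials)


if __name__ == "__main__":
    print(run_burst(list(range(10)), parallelism=4))
```

Keeping the combine step in the worker means only the short, highly parallel stage depends on burst capacity, while the long-lived state stays in one process.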
Eitan: Something like that yes.
Nati: Yeah, something like that. So for every burst that you need to handle, let's say the entire process takes one second, just taking it to the extreme, then obviously 30 seconds of boot time becomes a very significant factor. If the process takes an hour, then 30 seconds is negligible. That's why in the map process, where I'm assuming the process is relatively short, 30 seconds becomes a good portion of the total, and that's why 30 seconds is very significant. In the case of reduce, because it's doing a lot of the aggregation, it's probably longer, and that's where the boot time has less of an impact on the overall processing and where Kubernetes scaling would probably be good enough. The other thing that I think you mentioned is that in the case of Kubernetes, you have to program a lot of things yourself, especially when you're talking about affinity and anti-affinity. Again, for those who are not familiar with the terms, anti-affinity means that you don't want to run the same process in the same place, on the same server or in the same data center. There are a lot of those aspects that are taken care of for you in Lambda. Walk us through the model of Lambda versus Kubernetes in that sense: what does Lambda save you beyond the boot time?
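The boot-time argument is simple arithmetic, sketched below. The 30-second figure is the rough number quoted in the conversation; the function name is made up for the example.

```python
def startup_overhead(task_seconds, startup_seconds=30.0):
    """Fraction of total wall-clock time spent waiting for the container to start."""
    return startup_seconds / (startup_seconds + task_seconds)


if __name__ == "__main__":
    for task_seconds in (1, 60, 3600):
        share = startup_overhead(task_seconds)
        print(f"{task_seconds:>5}s task: {share:.1%} of the time is startup")
```

For a one-second task almost all of the elapsed time is startup, while for an hour-long task the same 30 seconds is under one percent, which is why the short map stage and the longer reduce stage can reasonably live on different platforms.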
Eitan: So Lambda basically hides everything that you've just said from you. I ask for one Lambda and I get one Lambda; I ask for 100, I'm going to get 100 Lambdas. I don't need to think about where these 100 Lambdas are going to be created, which machine is going to host them, or whether I have enough machines available in my pool to host them. It's as simple as that: just run this piece of code 200 times, 300 times, and all the heavy lifting is done by Amazon. Very simple. When you run with Kubernetes, yeah, the auto-scaling itself, how to auto-scale and when to auto-scale, is done by Kubernetes, but you still need to supply Kubernetes with a lot of information and settings in order to support it. One basic thing is that you need to specify the node pools and you need to limit them, specifying what the minimum is and what the upper limit is, and these pools are shared by many different operations. So you need to do a lot of fine-tuning in order to find the right number. And what happens if I now need a bigger burst and I did not define that? Then I need to increase the capacity. But what if I have a bug in my code which is now causing some other service to deploy 100 replicas? So you want to limit it, so as not to overrun your account.
So you typically put a limit. You could say, let's have unlimited pools, but then you are at risk of consuming unlimited resources for that reason. That's one thing. And then there's the metric by which you decide to scale out. It's not like you tell Kubernetes, please give me 200 cores to do these tasks. You need to trigger something that monitors some metric, something that would understand, okay, I now need 200 items to do that. It's not designed for that kind of work. Clearly it can be done, but there are a lot of things that you need to manage on your own, and you would still need to pay for the ramp-up time and for the tail-down time. So it's a combination of both: cost in terms of resources, and cost in terms of maintaining and building these capabilities and making sure they work, and that's also a significant cost in terms of the man-hours that need to be invested.
Nati: Excellent. So I think we went into quite some length on Lambda versus Kubernetes, and maybe just one thing about the caveats of Lambda services: it is more complex to develop with Lambda in terms of debugging and understanding what's going on. So it's not a completely free meal, as I think people may think.
Eitan: Yeah, that's true. Also, the cost of Lambda is flat. Like you said in the beginning, if the task ends up being very long and is not scoped within, let's say, half a minute or a minute, if it starts to become 10 minutes, 15 minutes, then the cost of having 200 or 1,000 Lambdas up and running for 15 minutes each is clearly much more expensive than having spot instances running with the equivalent CPU power. That's the trade-off: you need to find the sweet spot where it makes sense to stop using Lambda for that purpose and revert to using Kubernetes and things like that, and pay the overheads that I just mentioned. And yeah, debugging a Lambda is trickier, and you need to use a lot of monitoring and APM tools in order to get insights into your Lambda. But that world is also evolving, and today we even have a new product that we're developing and it's pure serverless. Everything is Lambda, and you learn how to do it. It has its advantages and its disadvantages, clearly.
Nati: Would there be a point in time at which you would say everything can be serverless?
Eitan: I think the difference would be whether the task is long-running or short-running; that's one of the main things. If you have only very short-running tasks, then there's the cold start. If you always need below one millisecond latency, then you need to have something always ready there. You cannot have the cold start that says, okay, now I needed to spawn a new Lambda and it took me 200 or 300 milliseconds. So it depends on whether you can live with that. In our case, clearly we can live with that because these are long-running tasks. And we have places in our code where we actually have an in-memory cache, for the manual operations that the user is doing, let's say he's interacting manually with the system, because we need to load a lot of state in order to be able to provide recommendations in real time and things like that. So we keep the data in memory, not just in an in-memory cache; it's even deserialized in the heap of the process, and we need to make sure it stays alive as long as the user is working with it. If you use Lambdas, you don't control the life cycle: it can die and stop, and you can get a different Lambda every time. So in that use case, when we need a long-running process with in-memory state, clearly Lambda is not a good choice. And if you cannot bear the cold starts, which are sometimes rare, but if you need something like 99.99% of requests below one millisecond, then Lambda would not be a good choice for that.
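The sweet spot mentioned here can be framed as a break-even calculation. The sketch below is a simplified model under loud assumptions: both options are priced per second of equivalent compute, the instance option additionally pays for a fixed ramp-up and tail-down period per task, and the prices are arbitrary illustrative units, not real AWS prices.

```python
def lambda_cost(tasks, task_seconds, lambda_price):
    """Pay only while the function runs (assumed price per second)."""
    return tasks * task_seconds * lambda_price


def instance_cost(tasks, task_seconds, instance_price, overhead_seconds):
    """Pay for the run time plus ramp-up and tail-down time (assumed per task)."""
    return tasks * (task_seconds + overhead_seconds) * instance_price


def break_even_seconds(lambda_price, instance_price, overhead_seconds):
    """Task duration at which both options cost the same.

    Solves lambda_price * t == instance_price * (t + overhead_seconds).
    Returns None when functions are never the more expensive option.
    """
    if lambda_price <= instance_price:
        return None
    return instance_price * overhead_seconds / (lambda_price - instance_price)


if __name__ == "__main__":
    # Hypothetical units: functions cost 4x more per second, 30s of instance overhead.
    print(break_even_seconds(lambda_price=4.0, instance_price=1.0, overhead_seconds=30.0))
```

With these made-up numbers, tasks shorter than 10 seconds are cheaper as functions and longer ones are cheaper on instances. The real crossover depends on actual pricing and on how well instances are kept busy.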
Nati: Excellent. So I think we covered the main topic for this discussion, which is fascinating, auto-scaling and scale, and I think you could see a lot in your answers. One thing that I think is important here is that there's no one solution for everything when it comes to scale; you obviously have to optimize the scaling architecture to the right problem. So there's definitely a lesson learned on how you scale for what I call batch processing, though it's right to say that it's not nightly-build batch processing. Still, there are a lot of aspects of scaling that are different than if I were doing it for a large number of concurrent users, like in the case of a Facebook type of scale. The other thing, again just summarizing what we discussed, is that efficiency at scale is very important, because at the end of the day it adds up in cost and also in the time it takes to actually process this entire thing, the latency of processing it. Again, it's not latency in fractions of a millisecond, like in some systems, but latency in how long it takes until the user gets the response. Obviously, when you have a lot of tasks, that latency accumulates to a large number, and that means that the speed of response would be much lower over time, especially when talking about a distributed system.
So there are a lot of those different considerations. Hopefully the story of Optibus is a lesson for a lot of other people, not necessarily those building traffic systems and competing with you, but anyone who is doing batch processing, AI and so on, because I think a lot of those lessons are fairly broad and generic. As a wrap-up, as usual in the podcast, there's the open-ended question: in retrospect, what would you do differently? This usually leads to a different type of discussion, because it's the type of thing that opens your mind to areas where you're not bounded by reality.
Eitan: Yeah, I think I would put on my co-founder hat in answering that, and not necessarily the CTO, technological one, but I think it's more interesting, and it relates to scale. So one thing that I've learned is that you should raise as much money as you can early on instead of trying to raise as little as you can. Clearly there is a trade-off: if you raise too much money, you start to become very inefficient and you think you have all this time because you have a lot of money in the bank, so it needs to make sense. But the reality is that you will need more people than you think in order to do what you want. So I would start by raising more money early on. We didn't raise that much money when we started, by design. It's not because we couldn't; this was the strategy, which I think was wrong. Another thing that I've learned…
Nati: Here we're talking about the scale of the business and the strategy to scale a business, not the technology?
Eitan: Yeah. And the other thing is that when you start, when you build, clearly the main focus is getting a product out there, getting feedback, and being able to build something that has a good product-market fit and brings real value, something that someone really wants. You don't need to focus on what the best architecture is, or on how to write documentation to make sure that someone else can read and understand what you've done. At that point in time, you need to prove that you can build something that makes sense, and all these kinds of things should be, and typically are, neglected. Like I said, we started with a single server that had no scaling, because who cares? I have one customer, he can have this server for his own, right? But at some point, when you do start to scale, you need to find the right time to start being better prepared to actually transfer all the engineering knowledge that you have in your mind, if you are the one who built it, so that you will not be a bottleneck. So that others can actually pick it up across the organization and do the work without everything being blocked by your approval or your knowledge, and without doing something that is completely wrong or takes 10X the time because they don't know exactly what to do. That's something that only now, well, not only now, but only in the last few years, we started to do a lot better.
But you need to find the right time, which is not too early and not too late. It's hard to say exactly when you're going to be the bottleneck for the scale of the R&D. And the last thing I would say is as a founder: even though I'm a technological founder, at the beginning, in the very first few years, I was also a salesperson, and I was also doing many other things, HR and all of that, which are not really technologically oriented. The main thing that I think founders are good at is actually selling the product, because they know exactly what the product can and cannot do. They have credibility, as long as you don't just tell stories and lies. But you need to find the balance between being credible and clear and not being too honest, because then it's going to be very hard to sell in your first year. I was very involved and on top of everything, and was actually building the understanding of how the ecosystem works. When a potential customer asks me X, I know he actually meant something else, and I know how to answer the question in a way that builds confidence and diverts the conversation to the correct place, and it's very easy…
Nati: The right question kind of a thing?
Eitan: Yeah, or understanding what really stands behind that question, and that's because you do it so many times that you understand it. When you scale, you start to bring in a lot of salespeople and sales experts and a VP of sales, and it's easy to let go and say, okay, these are your targets, report to me when you've met or didn't meet them. But they don't have the Optibus experience, or your company's experience, when they come. They have much better experience than me in selling stuff, but they don't know how to understand the specific question that I've just described, because they haven't had enough exposure. It's helpful then to see the red flags, which are not the classic ones that salespeople know, the typical "he's not answering me", or a lot of things that are repetitive. There are some things that are very specific to your company, and you need to make sure that until they gain enough experience you don't just disappear and stop being in those meetings. You need to teach them, and once they have experience and they start to bring in people, they will teach the new ones. But even if they're very good salespeople, at the beginning you still need to be on top of it for at least the first year.
Nati: Basically, being hands-on, not just technically but also on the sales side, is a very important thing for getting the right product-market fit, because that's basically where you get the feedback loop from the market for your product and those types of discussions. I completely resonate with that. I think this has been a good journey of a discussion. As I expected, I really enjoyed it, and I'm sure we're going to have more of that, probably on other topics. I would say that Eitan is also writing nice blogs, and that was the actual trigger for this specific podcast. So check out Eitan's blog; I'll put a reference to it in the notes section of this podcast. So thank you very much, really a pleasure.
Eitan: Thank you. It was a pleasure and always nice talking to you.
Jonny: Thanks guys, once again, for a fantastic episode. Okay, as usual, all supporting materials for this can be found at cloudify.co/podcast. If there is anything you feel that we need to be talking about on this podcast, please reach out to us at firstname.lastname@example.org. In the meantime, stay happy, healthy and safe, and we look forward to catching you on the next session.
Outro: This episode has been brought to you by Cloudify, the one platform you need to accelerate, migrate, and manage your multi-cloud environment for faster deployments, optimized costs, and total compliance, head to cloudify.co to learn more.