Podcast | Episode Sixteen – The Multi-Product SaaS Challenge ft. Eyal Fingold @ Check Point Software

Expanding on the last session, which dealt with transforming a single product into SaaS, this session features Eyal Fingold, VP R&D Cloud Security Products at Check Point Software, and dives into the challenges of turning 50+ products into SaaS, going beyond the recently discussed ‘trillion-dollar paradox’.

Transcript:

Nati: Hey, hello everybody. Welcome to the Cloudify Tech Talk Podcast with me and Eyal from Check Point. Last time, with Uri Cohen, we discussed how we transformed a single product, Elastic specifically, into SaaS. In this discussion we’re going to expand on that topic: Eyal from Check Point is going to talk to us about a significantly bigger challenge. How do you take not just a single product and turn it into SaaS, but 50 or even more products, and turn them into SaaS? That obviously creates a whole new set of challenges that we’ll be covering today, and I’ll let Eyal start with an introduction of himself.

Eyal: First of all, thanks for hosting, very nice being here. So my name is Eyal, I’m VP R&D Cloud Security Products at Check Point. I’ve been here for around two and a half years; I actually joined from Dome9, where I was the VP R&D, and I lead the cloud security products in Check Point; we’ll talk about that a bit. That’s pretty much about me.

Nati: Let’s talk about Check Point and its products. I mean, people know Check Point from the firewall; I’m not sure how many people are familiar with Dome9, or with the rest of the products that Check Point has. We mentioned 50 products, so we’re probably not going to go through them one by one, but at least let’s cover the families.

Eyal: Yes. So Check Point actually has three main pillars of products. You mentioned network security; that’s the Quantum pillar, the core place where Check Point started, with our firewall, access control, IPS, threat prevention, and a set of different gateways, from IoT through SMB up to big firewalls, and you can manage all of them. So that’s the Quantum pillar. We have the Harmony pillar, which focuses on remote workforce security: it could be endpoint agents that you install on your laptop or mobile, or remote access, whether VPN or the solution we bought, Odo, for example. It also covers Office 365 email and the CASB solutions that we have. So the entire Harmony suite covers that set of security solutions. Then we have CloudGuard, which I’ll dive into a bit more.

So first of all, we have the gateway, the traditional Check Point firewall adjusted to the cloud: the ability to run it as a virtual appliance and deploy it in different scenarios, whether it’s a transit gateway or just a simple VM, and it understands the cloud assets. So in the management you actually relate to the assets that you have and understand them very well. Then we have a set of capabilities coming from what were the Dome9 solutions. Starting with network security group management, actually giving security people control over managing their network segmentation through security groups. We have Posture Management, which allows you to set up security guardrails for how you want your environment to look. You can translate that into compliance; other people know this from the compliance side, how those guardrails translate into PCI or HIPAA or other sets of regulations, and then how you can automatically remediate. We have bots that actually do that. The classic example: your S3 bucket is not encrypted and not tagged as an exception; you automatically want to make sure it’s encrypted, because it violated the set of rules that you decided to put on it.
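The flow Eyal describes, a guardrail, an exception tag, and an automatic fix, can be sketched roughly like this. The rule and field names here are hypothetical illustrations, not CloudGuard's actual rule engine or API:

```python
# Rough sketch of a posture guardrail with auto-remediation, in the spirit
# of the S3 example above. All names are hypothetical.

def evaluate_bucket(bucket: dict) -> dict:
    """Flag an unencrypted bucket unless it is tagged as an approved exception."""
    exempt = "encryption-exception" in bucket.get("tags", [])
    if bucket.get("encrypted") or exempt:
        return {"name": bucket["name"], "compliant": True, "remediation": None}
    # Violation: hand off to a remediation bot that enables encryption.
    return {"name": bucket["name"], "compliant": False,
            "remediation": "enable-default-encryption"}

buckets = [
    {"name": "logs", "encrypted": True, "tags": []},
    {"name": "scratch", "encrypted": False, "tags": ["encryption-exception"]},
    {"name": "uploads", "encrypted": False, "tags": []},
]
findings = [evaluate_bucket(b) for b in buckets]
to_fix = [f["name"] for f in findings if f["remediation"]]  # ["uploads"]
```

The exception tag is what keeps a rule like this from fighting with buckets that are intentionally non-compliant.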

We have there what we call the intelligence piece. We take the cloud events from the different cloud providers, whether it’s CloudTrail or the network traffic or the flow logs, and actually give them a lot of context, because we understand the environment. So we know if a certain action was done from a VM or a Lambda function, or by a user that issued a token and then used it later, or we understand the different source IPs and destinations in the flow logs. We give this context, add the Check Point threat intelligence information that we have about what we know from the sources, and then we can actually have a set of detection capabilities. Some of them are logic-based detections that we wrote; some of them are AI-driven, discovering anomalies in actions and so on, and then issuing a security event that might also trigger a remediation action.

We have lately also added a lot of workload security, on the workload protection side. We acquired Protego a couple of years ago, which gives you serverless security. We have two main capabilities there. One is posture-related: we analyze the code, we understand which APIs you’re using and what security role is assigned to that Lambda, and we can actually recommend changes and fixes. We also understand, of course, the third-party libraries being used and their vulnerabilities, with recommendations for fixing them. So a lot of posture around the code. But we also have a runtime piece. The runtime piece, first, covers the top security attacks and enables you to mitigate them, but what I like about it is that it actually learns the behavior of the function, and on such workloads it’s easy to learn them. So after creating this baseline, if we see a deviation from it, we can actually block it.

So assume you are always getting data from a DynamoDB table, and then you’re trying to put some data there, which is an anomaly. We can block that action, or any other HTTP call that you’re trying to make through microservices. Lately we also launched container-based security. So there, first of all, we have compliance for your container environment: you’re running Kubernetes, or managed Kubernetes in the cloud, we understand the environment, and the same Posture Management capabilities are now applicable to containers. We also have image assurance, so we can actually scan the images, whether in CI/CD or production, and it gives you the set of vulnerabilities coming from whatever third party there is, password leaks, or other things we discovered in the image assurance. And we also have admission control capabilities: any action you’re trying to do at the level of the Kubernetes control plane, we can identify, and we allow you to put guardrails on it.
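The learn-then-enforce model behind the DynamoDB example above can be sketched as a tiny baseline checker. This is purely illustrative; the product's actual internals are not public in this detail:

```python
# Illustrative baseline-then-block sketch for a serverless workload.
# During a learning phase we record the actions a function performs;
# afterwards, anything outside the learned baseline is blocked.

class FunctionBaseline:
    def __init__(self):
        self.learning = True
        self.allowed: set[str] = set()

    def observe(self, action: str) -> None:
        if self.learning:
            self.allowed.add(action)

    def enforce(self) -> None:
        self.learning = False

    def check(self, action: str) -> str:
        if self.learning or action in self.allowed:
            return "allow"
        return "block"  # deviation from the learned behavior

baseline = FunctionBaseline()
baseline.observe("dynamodb:GetItem")  # the function normally only reads
baseline.enforce()

read = baseline.check("dynamodb:GetItem")   # "allow"
write = baseline.check("dynamodb:PutItem")  # "block": a write is an anomaly
```

As Eyal notes, this works well for serverless precisely because a single function's behavior is narrow enough to baseline quickly.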

So starting with a simple thing, like deploying images only from your repository, up to more advanced cases that you can apply. We are now also going to release the same learning capabilities I described for Lambda, applicable to a runtime container: creating a baseline of how the container actually behaves from a process and file perspective, and then allowing you to identify problems and block them. So those are the set of workload capabilities. We also have more cloud capabilities. We have an NDR capability, in a sense; similar to what I’ve described with flow log analysis, but this is actually done by tapping into the actual traffic itself, allowing you to have the same gateway capabilities on mirrored traffic, without actually having a gateway in the middle. And we also have application security, which we just launched as well. In traditional web application security you install signatures and manage the rules of your WAF all the time; this is actually an AI-based capability, as we learn a lot of the behavior and are able to cover the OWASP Top Ten without you needing to configure anything about your security. We can deploy inside your Kubernetes, or as a gateway. So that’s the set of capabilities in our cloud security.
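The simple admission-control case mentioned above, allowing only images from your own repository, boils down to a check like this. It's a hypothetical sketch of the decision logic only; in a real cluster it would run behind a Kubernetes validating admission webhook, and the registry name is made up:

```python
# Hypothetical admission check: reject pods whose images do not come from
# an approved registry. A real setup would wire this into a Kubernetes
# ValidatingAdmissionWebhook; this only shows the decision logic.

APPROVED_REGISTRIES = ("registry.internal.example.com/",)

def admit(pod_spec: dict) -> tuple[bool, str]:
    for container in pod_spec.get("containers", []):
        image = container["image"]
        if not image.startswith(APPROVED_REGISTRIES):
            return False, f"image {image!r} not from an approved registry"
    return True, "ok"

allowed, _ = admit(
    {"containers": [{"image": "registry.internal.example.com/app:1.2"}]})
denied, reason = admit(
    {"containers": [{"image": "docker.io/library/nginx:latest"}]})
```

More advanced guardrails would inspect other parts of the pod spec in the same way, such as privilege flags or host mounts.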

Nati: Oh, this is quite impressive, and now I think I better understand the challenge we’re discussing, because all of those products were probably written at different times; they have different maturity cycles and are at different stages of the SaaS journey. It’s also interesting, by the way, to hear all of that and talk about all those security products being delivered as SaaS, because in the early days, when we talked about cloud and SaaS, security was always one of the last parts you would think would run on the cloud. But as the vast majority of applications are now running on the cloud, there’s no question that this is where it should be living. The question we’ve been discussing is: this is a really big suite of applications, and you’ve taken a decision to move, to basically be a SaaS-first company, right?

Eyal: In a sense, yes. I think it was done a few years ago. What we have now is what we call our Infinity Portal, where you can actually subscribe and see all of those services: you can try them out, you can actually understand if they work for you, and then decide if you want to purchase them, and which service. So it’s a similar concept to the cloud providers themselves: you have those sets of services, you can try them out, and then you can decide what works for you. All of them have the same user and tenant management, same usability, same concepts in a lot of ways. And it’s not just about the new services that we provide. For example, we discussed network security, and one of the things Check Point was known for throughout the years was our management capabilities, the management of the gateways, and now this is also available as a SaaS service.

So instead of having your on-prem server that you need to manage yourself, you can actually get a service which gives you what you need. So yes, it’s a decision, and I think it’s very important, as today companies struggle not just to find a security solution, but also to have everything in one place. Customers don’t want so many vendors each doing something for them, where when they need to correlate one event in one vendor with another event, those are different systems. So customers actually want a consolidated solution and ease of use, and I think that’s one of the major things we’re driving toward with the portal.

Nati: Interesting. We’ll touch on the Infinity Portal in a second, but what we’re saying is that part of the journey you’re describing here is not just how to move 60 applications into SaaS; maybe the main target is also to create synergies between them and integrate them, and not just look at them as separate silos, each moving at its own pace into the cloud. So let’s talk about the challenges first, and then talk about the Infinity Portal as a way for you to create some common ground for those services. If we look at it from a challenges perspective, there’s obviously the cloud-native transformation: how do you actually move all of those services into a cloud-native type of model, how do you deal with some of the common concerns, such as multi-tenancy, and how do you also make it more developer-centric? So maybe you could describe some examples of how this was done before, and what the challenge of the transformation is; I think that will make the challenge itself easier to understand. Both the cloud-native transformation, but also how you make it accessible for developers, and what it means to make it accessible to developers. I’m assuming it was accessible for developers before, but there was a twist in the story.

Eyal: I think, so, we have this organization that actually owns and develops the main foundation for what I call the portal. It’s actually a unified infrastructure that gives you the basic set of services. You mentioned tenant management; that’s one of the items. For developers that actually want to onboard and start developing applications, we have a notion of service developers: groups or teams can actually start developing a new service. Services onboarded into the platform actually gain a set of basics from it. Tenant management is one of them, but it also includes single sign-on services, and a set of, I’m looking for the right terminology, guidance on how such an application should be written.

Nati: Maybe, Eyal, just before we jump into the solution, how was this done before? Like, when I was a developer in Check Point and I wanted to get access to services, what was the process before you had that platform and before you made that transition?

Eyal: So first, to be honest, I don’t know. When I joined, it was already in the making, it was being developed. It started, I guess, with one of the services that was onboarded there, our CASB solution for protecting Office 365. So I guess a group of developers started developing this as a solution, as part of the roadmap, in discussions they kept with the product people, and that, along with a few other projects that started then, was probably the foundation for this process.

Nati: What was the problem you were trying to solve? Was it that it took a lot of time for developers to get an environment set up, or that it wasn’t really controlled, or…?

Eyal: From day one, it was set in mind that we wanted one place to have it all. So I guess the need to have one solution for the customer, with the customer having the same capabilities and working on the same data, so that the logs, the events, the management, and the policy that you manage all relate to the same objects, was one of the main goals from day one. It’s sort of the foundation of how management works in Check Point, and that was the goal to start with.

Nati: So basically, and again, I’ll try to cover some of the generic challenges we’re discussing right now, but it sounds like because you had this goal not just to transform them into SaaS, but also to integrate them so that you can have cross-value-added services between them, what you’ve mapped is the generic concerns versus the specific application concerns. In the generic concerns there’s tenant management, there’s log aggregation and things along that line, and I remember also monitoring and troubleshooting; those are general concerns, right?

Eyal: It’s not just the developer’s concern; it’s also the operational concerns, as you mentioned, monitoring and troubleshooting, but it also goes into sales at the end, and how you integrate with sales tools, sales processes, and support processes. So again, look at the customer: he has an issue with one of the services. He wants to go into one place, rather than every product having its own place to open a support ticket. It even goes into the process of making a purchase: you have a license, you want to put the license in, you want everything in one and the same experience, and also how we troubleshoot things from the customer perspective, not the developer’s. So there’s the developer perspective, troubleshooting their own service, and how you make sure the entire organization works with the same operational processes, but also how you troubleshoot a customer issue, because it could be the infrastructure, it could be the specific service, or the integration with another service. So how you go through this entire process, that’s part of it as well.

Nati: Interesting. It sounds very trivial to say, okay, there are a lot of common concerns, so now let’s put up a common platform to support them. But those who remember the experience with platform as a service (PaaS) know it was a very painful process, I think because there’s always this tension between having a common platform and agility. Each product has its own agenda and wants to move at its own pace, and at the same time, a common platform conflicts with that to some degree, because it does create this central unit of control that everyone needs to synchronize with. So how did you go about that? How do you maintain velocity, and obviously agility, and still have this common central control?

Eyal: That’s a great question. I think that in the end, in one word, it’s balance, and it’s always balance. Other than making sure each service onboards the basics, which are the user and tenant management and the security mechanisms that we have, because we want each service to be secure, every service has the freedom to choose when and what services it consumes. So for example, we mentioned we have a unified DevOps team that helps services onboard. We have what we call a list of approved services, and if you want to use them, you’ll be able to get them from the DevOps group, and it will help you facilitate whatever approved service you want to use.

On the other hand, if you have your own needs and you want to do something else, you can actually do it yourself, as long as you comply with the set of processes. You want to make sure, again, that certain checks pass, but at the end of the day it’s your decision how you actually want to do it. The platform will give you a set of monitoring capabilities, to Elastic and such, but you can have your own if you need more, or if you have other needs. So that’s really up to the group doing the development. The same goes for the UI: at the end of the day, we have the same UI experience, and we have shared components, but if you have reports or things that you want to do on your own, you build them on your own.

Again, you can contribute it back into our sort of community, so someone else can use it, but you still have the freedom to develop what you need. We mentioned monitoring and troubleshooting; it’s really up to the service. There are applications that are significantly more mature, that have already onboarded a lot of customers, and they have different challenges than newer ones. We have services that are much earlier on; they are running through a few EA [inaudible 22:44] versions or POCs with customers. With their different levels of maturity, their troubleshooting and monitoring needs are really different.

Nati: Interesting. So back to the platform-as-a-service example: if I summarize what you just said, there are quite a few differences here. First of all, it’s not mandatory. It’s not a platform in the sense that it abstracts the cloud, where you run things on the platform and it consumes the cloud resources for you; that was the problem with platform as a service, because it creates a least common denominator and really limits what you can do with the actual underlying infrastructure. In your case, it’s more of an incentive model, in which you allow teams to use the certified services, but they don’t have to. Obviously the incentive is that if they do choose to use them, the services are managed internally and supported. If they don’t, they take ownership and have to go through the compliance process themselves.

So it’s not like you’re creating a central control that everyone needs to go through; it’s more of a center of excellence, in a sense, in which you have common practices, and different teams can choose to use them or not at their own pace, which is interesting. In terms of the certified environment you mentioned, I think there were a couple of other services. Tell me a little bit about that: what is a certified environment, and how do you decide what to certify and what not?

Eyal: So I think it differs. We’re running mainly on AWS at the moment. We have a few services that we provide in a disciplined manner: a dedicated team actually develops the underlying DevOps, using Terraform to do deployments and control the infrastructure. That’s important, as we are managing geo-distributed data centers, and when we want to deploy a new data center, or even when you want your own dev or staging environment that integrates with another service, you need that service. So they’re actually facilitating this entire process. When we talk about certified services, these are things that the DevOps group specifically owns, and they help you do that. From a user perspective, when customers come in, say through the browser, they go through a set of security gateways and services into one account, which actually provisions the different AWS accounts for the applications and their workloads.

So assume I’m an application and my workload runs on containers. Kubernetes is one of the certified services you can use and consume. You don’t need to set it up; all you worry about are your deployments, and you don’t need to know anything about how it’s facilitated for you. You get monitoring of everything automatically; it’s sort of like a managed Kubernetes service, in that sense, that our own DevOps owns. On the other hand, if you have specific needs and you need to run your own VM, or you want to go into Lambda, then that might be supported only to some level, or you need to handle it yourself, and then you probably need to write your own Terraform files to make sure it’s part of the deployment, if you want it to go through this entire process.

So I don’t know if that explains how it’s done. Whether it’s Cassandra or MongoDB, how we choose which services to certify is mostly based on demand. If there are a few groups that need specific services we are not supporting, we’ll consider them, and of course it also depends on our ability to execute. Sometimes we don’t have the capacity to handle more, so we actually need to cut somewhere and decide what we won’t do. Those are the decision points on what the main DevOps group will own. Also, you mentioned DevOps: some of the products, as I said, given their maturity level, have their own DevOps as well, because they are big enough or mature enough, and they facilitate their own DevOps. So they get guidelines and they get help, but they actually have their own people to execute.

Nati: So basically what you’re saying is that the certified environment, and again, that’s mostly based on demand, is a templatized, pre-hardened environment that someone has already secured, and in that context it covers certain databases, I’m assuming, and the Kubernetes clusters you mentioned. The service that teams get is that they don’t have to worry about how the database is instantiated and managed; it’s basically just an endpoint for them, and the certified service takes care of the rest. That’s how you keep some level of control over security concerns, and potentially cost and other things, and obviously ongoing support for that service. That’s interesting. The other thing you mentioned is that this is not necessarily just centralized DevOps; there are also other groups doing DevOps, and this really brings me to a discussion that we’ll probably open more in a second.

How does an organization like yours control such an environment, where there are so many people involved in accessing infrastructure? I think that’s one of the things that is very different in the cloud: people have more freedom to do their own thing, and the cloud obviously encourages you to spin off more environments, spin off infrastructure, spin off a Kubernetes cluster, spin off Lambda services, spin off whatever. I want to see later, when we get to that part, how that works when you actually need some degree of control over all this. Before we get there, let’s talk about the environment itself. You mentioned Kubernetes, so how many Kubernetes clusters do you have, and which type of Kubernetes, is it EKS or Fargate or whatever?

Eyal: Well, we have a few dozen clusters, maybe a few hundred; I don’t actually have the exact number in front of me. It really depends on the environment. As I mentioned, we have multiple data centers. In production we have multiple data centers around the world: Australia, the US, the UK, different data centers that we support.

Nati: It’s mostly for regulation purposes, right?

Eyal: For most customers it’s actually probably due to data residency. There are some products, we talked about remote access, that also have the concern of making sure the point of presence they actually connect to is near them, because of latency and such, but that’s more of a specific service requirement, and such a service might have more coverage than the actual data centers that provide the management capabilities. Those data centers address the data residency regulations that have been coming in more and more strongly in the last year, year and a half.

Nati: So when you mention all the regions, it’s all still under AWS, right? You don’t run multi-cloud, on multiple providers?

Eyal: Well, it actually depends on the service. For example, we use Azure Data Explorer in places, so that needs to cover the same regions as well. It really depends on the service, but there’s no way that customer data that needs to sit in a specific data center will end up elsewhere because of the underlying technology that we use.

Nati: It’s interesting, because what I’m seeing on the region side is that sometimes customers also have opinions about the cloud infrastructure. Some SaaS companies have been mandated to make the data available not just in a specific region, but also on a specific cloud infrastructure. I’m not sure if you’ve seen something like that; it sounds like you didn’t hit that requirement.

Eyal: We didn’t get that requirement. As far as I recall, it’s mainly about the data residency itself. If, for example, they themselves run on a specific cloud provider and we provide them the security, whether it’s through API security that connects to the cloud provider itself or through agents, as long as we give them the security solution they need and the data stays where it should, that’s the main concern. And if I go back to the question about production and all the different clusters, this is why I’m saying it could be hundreds of clusters: it depends, there is production, there is staging, there is ad hoc development. Some of the groups actually have the capability to spawn their own set of environments. Take for example Posture Management, from Dome9: with a click of a button, since everything is Terraformized, they can create their own environment for either development or testing, purely that, and some of the groups can work with a localized environment, with Docker Compose and things like that. So that’s sort of the range.

Nati: Again, let me just summarize what we just said, and then we’ll move to the next thing. So you mentioned…

Eyal: Yes, just one thing, since you mentioned technology. We actually run some of our own clusters; those are a bit older clusters that we are trying to move away from, where we just spin up instances and install Kubernetes on them. We are using a lot of EKS, and in some places we actually use the Fargate version of EKS. And of course there is the other extreme: some of the products we mentioned, Protego for example, which we bought, have an entire environment that is pure Lambda. We also make a lot of use of Lambda elsewhere, again in the Posture Management in Dome9. So the range of technologies is also wide; it’s not just the number of clusters or the number of workloads.

Nati: Yes, I think it’s interesting, because there’s always this question of: can we run everything on one Kubernetes? Can we run everything on Terraform? What I’m seeing over and over, especially in companies of your size, is that it’s never one thing, it’s many things. So for example, why not use one big Kubernetes cluster with namespaces to manage all those things?

Eyal: I think isolation is one of the key points when you run complex environments. I would even extend it to: do you run your workloads in one AWS account or many AWS accounts? For me, again, it’s a balance. You don’t want every workload to have its own cluster, but I do think different microservices should be spread across their own clusters, which actually creates a separation. It could be a separation of costs, which results in having your own AWS accounts, so you can allocate the cost and understand which service costs how much, and things like that. But it’s also to make sure that if you have a production issue with one service, it does not affect other services. We want to think that Kubernetes will orchestrate everything for us. It does a good job on memory and CPU; it does less of a good job on networking. So for example, if your workload is heavily network-based, then the distribution of that is a bit complex and probably needs to be separated.

So I think in the end there’s no magic number for how many clusters you should have, but I do think you need to separate clusters along different concerns: it could be service-oriented, or the type of work being done could be different. I don’t necessarily see the need to consolidate. Consolidation doesn’t mean you have better visibility.

Nati: That’s actually a very good point. You also mentioned using Lambda services, and I’m assuming other cloud services; those services wouldn’t be running on one platform, even if it’s Kubernetes. So there’s a lot of a trend to also run on a SaaS-based kind of model, where a lot of the services are actually managed externally. Do you see yourself moving everything to Terraform, or is it going to remain different templates, Terraform being one of them, with other things on CloudFormation? I’m not sure if you’re using it. What’s the state of the union there?

Eyal: I think it’s evolving; the entire infrastructure is evolving, and we’ve seen a lot of very interesting new projects coming out as well, so I can’t say where we’ll end up. We did start with CloudFormation, and then we moved to Terraform. Today the infrastructure is mostly Terraform through the entire organization, which helps with sharing knowledge as well, and people can help each other. It’s not just the DevOps group; the developers themselves own some of their Terraform as well, and it’s important to distribute that. But there’s no rule saying we’ll stay there. If there is a need or a problem, we might move. I’ve seen some of the Terraform projects; they are huge and complex as well, so we might need to see how we solve some concerns there.

Nati: Interesting. There's also the point of transformation. In some cases, let's say a database vendor already provides a CloudFormation template for EKS that is hardened and optimized. So the question is whether you take that, or you transform it into Terraform because you want to standardize on one thing. What's the justification there? There's one thing that has already been done and optimized, and then the question of…

Eyal: Yeah, I’ll probably find the same Terraform version of that thing. We do tend to go for managed services, but I think that Terraform and cloud formation perspective is a bit different. And again, it’s changing, at the point we needed to move out from cloud formation, there are a few capabilities we were missing, which of course now they have. So it’s really things that are evolving. But today the Terraform knowledge is significant enough so that people who just choose to understand what was in the, you mentioned, for example, a template. So they will understand what was in their template that is important, but we’ll actually execute it themselves if this is sort of the concern. But in other places it’s different. So if I totally agree and again, this is why we are moving a lot. So if you see a lot of the things that I’m pushing towards, it’s the managed services, and again, where the managed services are giving us an advantage, because this is their core competency. It’s not ours, we are trying to build security solutions. So I don’t want to manage Kubernetes clusters, for example. We can say the same for other services as well. So this is where we’re striving. I think as long as the managed service gives us enough flexibility and it’s good service, this is where we’re trying to go.

Nati: Interesting. So on the technology side, we talked about Kubernetes; we talked about multi-cluster Kubernetes and why you evolved to dozens and potentially hundreds of clusters because of separation of concerns. We also talked about Terraform being, I would say, the common ground, but not necessarily the only thing in town. We also started talking about the development and production structure. I think you mentioned that you are using Docker and Kubernetes; can you elaborate on that?

Eyal: To be honest I am less into those details, but as far as I recall, Docker Compose is something being used by the developers themselves. In some cases, in their own development environment, it's a way for them to ramp up just what they need there. So that's the main usage for Docker Compose, but the main usage overall is Kubernetes throughout the system.

Nati: Right. And in that case the idea is, again, to allow developers to have a quick development environment. They don't necessarily need to spawn an entire Kubernetes cluster on the desktop; they can run the containers themselves against the rest of the services, and in that case they have a bit more agility, and that's why you're using Compose, which is more lightweight. The main thing is that when the developer is developing, they're not really developing against Kubernetes, they're developing against these containers, which I think makes sense. But it is interesting that you chose to have a different structure for development and production. I've seen some that are trying to say, okay, let's move the developer environment to Kubernetes as well.
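For readers less familiar with the pattern, a minimal Compose file for this kind of local dev loop might look like the following. All service names, ports, and images here are hypothetical; this is not Check Point's actual configuration, just a sketch of running one service plus its direct dependency without a local Kubernetes cluster:

```yaml
# Hypothetical docker-compose.yml: run the service under development
# plus its database dependency; everything else is reached remotely.
version: "3.8"
services:
  my-service:
    build: .
    ports:
      - "8080:8080"
    environment:
      - MONGO_URL=mongodb://mongo:27017
  mongo:
    image: mongo:6
```

A developer runs `docker compose up`, iterates on `my-service`, and relies on shared remote environments for anything that cannot run locally.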

Eyal: It depends on the maturity of the different services. I mentioned Posture Management, which was part of Dome9. There, the developers have the same environment, so a developer can actually spawn a complete environment, what we call an ad-hoc environment. In many cases there's no other choice, because the set of technologies there includes SQS, Lambda, DynamoDB and Mongo, and with that many technologies there's no way of having them all in a local environment, so we actually need to spawn it. Everything's pre-configured; we can actually build it and work with that. On the other end there are less complex environments, and people who have just started something. So it's really about maturity, where you are, and providing you the way to run fast. I think that's the main thing.

We want the developers to focus on learning fast, and we help them by facilitating everything. And facilitation, as I mentioned, is not just technology. So you've built your POC, you're going to an EA customer, you want to talk with the customer, and then you're going into your initial production. We cover everything: how you update your availability and status page, how you manage your support and operational tickets, how you monitor things, how you connect into licensing services, and how you go through the different emergency processes. And of course, again, security. We also have compliance; we have PCI and SOC. All of this is to make sure the solution at the end fits and is held to the same standards. This is why we're orchestrating and helping you have the same process for all of this.

Nati: Okay, got it. So I think we've covered the environments, what Check Point is running today, and the products, and I think we have good visibility into the challenges you're facing. Now I want to switch to a different topic that will be a good segue into your platform, the Infinity Portal, and the rationale behind it. I want to pick up a thread I shared with you about a post by Martin Casado. For those who are less familiar, he was the founder of Nicira, which was bought by VMware, and he's now a general partner at Andreessen Horowitz. He wrote an interesting piece with Sarah Wang, and to give you the gist of it, the whole idea was the cost of infrastructure in SaaS companies and its effect on the company's profitability, and therefore valuation, which was kind of an interesting twist.

I'm not going to talk about the specific numbers quoted there, because I think there's a lot of debate about whether those numbers are representative or a more special case. But the general argument was that there's a tension in how SaaS companies operate today, which is velocity around new features and new products, with efficiency becoming a second-class concern in their environment. That leads to the paradox discussed there: the more you scale and grow on the cloud, the more your margin actually shrinks, because your cloud cost grows but the revenue you charge your customer doesn't necessarily track the cost of infrastructure linearly. Which means that over time, if you're not careful, in a very extreme case you could actually start to lose money.

He also touched on, again in my view a more extreme way of handling that, and not necessarily the right one, which is going off cloud to have a bit more control over your cost structure. Dropbox has been the poster child of that, but not just Dropbox; I think a couple of other companies were mentioned that made that choice when they reached scale. Again, in my view that's a fairly extreme move, and there are many more things you can do before you actually get there, but that's been the thesis behind the post. So I thought this would be a good point for this discussion, because what I'm hearing from you is that you have a lot of products, a very complex infrastructure, and a lot of groups developing against this infrastructure. They all need velocity, and you obviously need to balance between the level of control and velocity, and usually velocity wins. How do you see this being handled? What do you think about the article, and where do you see an equilibrium between those two conflicting things?

Eyal: Yes. So first of all, I read the article and I think it's a very interesting one. I think cloud cost is really something all people are being challenged with. We've seen the rise of FinOps lately, and a lot of people are investing in tooling to truly reduce cloud costs. So I'll start with this and then I'll go back to the article. I think one of the challenges of going into the cloud with so many developers is that they don't have the same skill set and knowledge as people who have already developed in the cloud for three years, and actually failed and learned from those failures. They will fail. If you're ramping up five hundred developers to develop on the cloud, they don't start with the capabilities you need. This is why I mentioned the center of excellence: literally one of the pillars we're dealing with is cost. There is security, there is architecture, helping people develop the right way, but there are also costs, and this drives some of the things we do.

So I mentioned we are splitting some of the bigger applications we have and bringing them into their own AWS accounts, for better cost measurement as well, because a lot of things you cannot attribute through namespacing and tagging; you have to go into specific accounts in order to measure the cost. I think KPIs are really important, and we do have, per product, KPIs that measure cost, and we help teams maintain them and actually drive cost reduction. I think cost today has become part of the product, the same way we've seen security become part of the product. We've seen that products need to be certified for PCI or HIPAA or whatever, because that is a feature requirement, and cost has become a feature requirement too. At the end of the day a product owner needs to decide what to invest in for the next sprint or the next version, and some of those investments should be cost reduction.
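To make the account-per-service cost attribution concrete, here's an illustrative sketch (hypothetical account names and numbers, not Check Point's tooling) of rolling up billing line items by linked account, which is exactly what splitting applications into their own AWS accounts makes trivial even when tagging is incomplete:

```python
from collections import defaultdict

# Illustrative billing line items, loosely shaped like AWS Cost and
# Usage Report rows. With one account per service, attribution is a
# simple group-by on the account field.
line_items = [
    {"account": "acct-posture", "service": "AmazonEC2", "cost": 1200.0},
    {"account": "acct-posture", "service": "AmazonS3", "cost": 300.0},
    {"account": "acct-email", "service": "AWSLambda", "cost": 450.0},
    {"account": "acct-email", "service": "AmazonDynamoDB", "cost": 150.0},
]

def cost_by_account(items):
    """Sum cost per account: each account maps 1:1 to a product/service."""
    totals = defaultdict(float)
    for item in items:
        totals[item["account"]] += item["cost"]
    return dict(totals)

print(cost_by_account(line_items))
# {'acct-posture': 1500.0, 'acct-email': 600.0}
```

With tag-based attribution the same group-by would need every resource tagged correctly; account boundaries sidestep that.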

It could be an improvement of architecture, because it creates a better experience as well, but the side effect might also be lower cost, and it's a piece you need to take into account when you develop your features. So part of the design review, alongside the security requirements, is what impact this is going to have on cost. Those are, at the end of the day, the tools you use to balance your costs. I agree cost is an issue in that sense. You mentioned linear or not: in the cloud, cost is tied much more tightly to usage than on-prem, which was more of a staircase; you needed to jump steps in order to change the cost, and you had more control over it. Same with security: you used to have your firewall to protect your perimeter, and that was your choke point. With cost on-prem, the choke point is much clearer, versus the cloud, where you almost don't have control over it. You have automatic systems that are intended to auto-scale to provide better service and SLA, because you measure, I don't know, responsiveness or something, and they automatically raise instances to provide that, and the cost might change at that point. So it's a bit out of control.

Nati: In that context, one of the things I've seen, and I'm not sure what the solution for it is, is that one of the themes that continuously comes up with cost is automation, right? If you automate everything, then you have better control of the cost. But what I'm seeing, and I think Kubernetes is a very good example, is that it creates some sort of disconnect between the infrastructure and the user that is using it. I've heard from people that when they first moved to Kubernetes, they actually saw a reduction in cost, because containers are more efficient than VMs. But very quickly the cost skyrocketed, because it was much easier to spawn containers, so people started spinning up containers everywhere, and all of a sudden they found themselves at a point where their cost was much higher than what they used to have with VMs. Is that something that you're experiencing as well?

Eyal: I think it's not specifically related to Kubernetes. I think it's related more to the new development approach of microservices and distributed systems, which actually encourages scaling automatically, in the sense that you have a queue of items that needs to be covered by some workload, and the workload that covers the queue automatically scales in and out to cover it. Conceptually it's supposed to cost less, because the workload will scale up when the queue grows and scale down when the queue drains. So conceptually you think you're going to save a lot of money, but in the end you've changed the paradigm of developing things, and in some cases this actually raises the cost, because you have too many hops: one service signals that it finished its small piece of micro-work and sends it on to another queue for the next piece of work. In the end it might actually be more efficient to do everything in one place, but that gives you fewer scaling options as well.
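The queue-driven scaling pattern Eyal describes can be sketched in a few lines. This is a generic illustration, not any specific autoscaler's API; the point is that the upper bound is where cost control actually lives, because without it the system happily scales spend along with load:

```python
import math

def desired_workers(queue_depth: int, per_worker_rate: int,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Scale worker count to queue depth, bounded to avoid runaway cost."""
    needed = math.ceil(queue_depth / per_worker_rate) if queue_depth else 0
    # The floor keeps latency low when idle; the cap bounds the bill.
    return max(min_workers, min(max_workers, needed))

# Assume each worker drains roughly 100 items per scaling interval.
print(desired_workers(0, 100))       # 1  (floor: keep one warm worker)
print(desired_workers(950, 100))     # 10
print(desired_workers(100_000, 100)) # 20 (cap: this is the cost control)
```

Real systems (e.g. Kubernetes HPA with an external queue metric) add smoothing and cooldowns, but the min/max bounds play the same role.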

So I think it's really the paradigm people use, and that relates back to the article: with SaaS services, as you handle more scale, more tenants, more users, more workload or traffic being served, cost is really tied to consumption. So if you sell more, the cost rises, and it's very tied into your P&L, and I think this is where you need to actually control it. But I think the boat of the cloud has sailed; no one is going to return it to port, in the sense that I don't see people doing a Dropbox and going back to on-prem. I think we're actually going to see a much bigger movement into the cloud; it's just the beginning. Maybe we'll have a different shift of innovation and technology that will help us handle and change it, but from what we can see at the moment, I think we're going to see an even bigger rise of the cloud in the next few years.

Nati: But do you see, for example, when we talked about the certified environment being optional, do you see organizations taking more of a mandate: you have to use the certified environment, because this is the only way we can control costs, and obviously security, and we're not giving you an option there? So that, again, the balance between agility and control tips toward a bit more control than it is right now, which in many organizations is more of a wild west.

Eyal: So, I have two roles. One is owning some of the cloud products and businesses, and there I look at it from a P&L perspective: I do want to see what my cost is. So I have those KPIs; we have KPIs. One of the things in the article was exactly this, owning your KPIs: how much you cost per something that you provide. So I do have this, I own it, I manage it, and in this sense I'm trying to see whether I can either reduce it or keep it at some level. But I think it's inevitable that it will rise. Now, there might be two reasons this happens. One is that we are not doing good engineering work: we did a bad design, we did something that causes a huge cost, and that is fixable. This is something we should always own and fix. And then there is the inevitable problem that customers of SaaS expect to get more service. They expect to get more value out of what they pay; they pay you X amount of money for this service, let's say per user or something. Next year, their expectation will be to get additional services out of your service, whether it's better security or additional management features or whatever, and those features will cost money.

That costs engineering work, and in some cases we'll be able to justify a higher price, and in some cases we won't; that's, again, where the article is going. So at some point, if you are a Dropbox user, you are willing to pay an amount of money for what you get, and you want to get more services. Maybe you want to get more storage, that's the basic thing, and at some point Dropbox will be able to charge more money for it, and at some point they won't, and that's the balance they need to keep in mind. So that's one role. The second one, for me, is what you mentioned about certified environments, and this answers the certification question: I'm providing a sort of center of excellence to the other product owners, and they have their own metrics. The way I look at it, I'm consulting. I can help them understand where the cost went and what they should do, but it's their own P&L and business; they need to manage it. They know what they need to measure, whether it's okay or not, and how their P&L looks. So from that perspective, I'm guiding them, but I will not force them into a certified environment because of that, because at the end of the day our goal is to provide better security to our customers, and they own their products and services.
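The "cost per something that you provide" KPI Eyal keeps returning to reduces to simple unit economics. A sketch with entirely hypothetical numbers (not Check Point's figures) of the two KPIs the article's argument hinges on:

```python
def cost_kpis(monthly_cloud_cost: float, monthly_revenue: float,
              active_users: int) -> dict:
    """Unit-economics KPIs: cloud cost per user and gross margin.

    The paradox in the article shows up when cost_per_user grows with
    scale while the price per user stays flat, squeezing gross margin.
    """
    return {
        "cost_per_user": round(monthly_cloud_cost / active_users, 2),
        "gross_margin_pct": round(
            100 * (monthly_revenue - monthly_cloud_cost) / monthly_revenue, 1
        ),
    }

# Hypothetical: $60k monthly cloud spend, $200k revenue, 5,000 users.
print(cost_kpis(60_000, 200_000, 5_000))
# {'cost_per_user': 12.0, 'gross_margin_pct': 70.0}
```

Owning the KPI means watching how both numbers move release over release, not just the absolute bill.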

Nati: Do you see that changing at some point, moving from consulting toward kind of forcing, if you make the cost KPIs first tier, not second tier? Maybe I'll take a step back. I think you said a couple of interesting things. One, you said that the paradox itself is not necessarily about cloud infrastructure costs; it's about the entire SaaS model. The SaaS model is built in such a way that the customer's expectation is that you increase the value, but the customer won't necessarily pay for that increased value, and therefore this efficiency pressure is already in place, regardless.

Eyal: Yes, but it's not always the case. Our job is actually to show the customer the added value of something they'll be willing to pay for. I'm saying it's not always the case, and it really depends on the product and what you're doing, whether something is of additional value or not.

Nati: But it does make the challenge of efficiency much more strategic in, let's say, a SaaS cloud business than I think it was before, because of the speed at which you can get out of balance. The cloud obviously makes it easier for you to get out of balance, because it gives you all those tools and things to use, and it's much harder today, with this Chinese menu of pricing, to really understand what's going on and what the impact of every step you're taking is along the way. That's why I hardly see it continuing to be handled the way it's handled today. I think it is getting out of control. One thing that you mentioned…

Eyal: It will force everybody that is providing a SaaS service to understand their cost model very well and to control it very well. Think about, for example, AWS and Azure and GCP: they do it very well. When you're consuming S3, they can tell you how much you pay, because they know very well how much it costs them. They control it very well. Our problem, again, when using those services, is that we don't control it as well as they do, but that's something we can strive to achieve: understanding what we do very well and owning it. This is doable. So I think going forward, the expertise and the investment of each SaaS service in the FinOps piece will have to improve significantly to be able to actually control the business, the same way they will need to control the security piece and other pieces as well. This is something that, I don't want to say is neglected, but it's not the first thing you think about when you're starting; only later on do you understand it and start handling it, and that's the main problem.

Nati: Yes, and I think that's exactly the point. If I point to that article again: you mentioned Google and Amazon, and I'm sure that in their case cost is a very important KPI for the teams. When they launch a new service, they're basically not launching it before they know the cost impact exactly, to the cent, whereas with other SaaS companies, and I think that's what he was pointing out, because of the incentives for velocity, that practice and that mindset are not there. I think that clearly needs to change, because there's a huge anomaly when you don't have that mindset and don't think about it like a cloud provider would, in the sense of giving this KPI much more visibility in your decision-making process. Now with that, I wanted to touch on the Infinity Portal, because it touches on that problem. It's in a way a solution, or some sort of a solution, to that problem, not to the degree that I think is needed, but it's an attempt to create some sort of common platform. So let's talk about the Infinity Portal: what's the initiative behind it, and what's the objective?

Eyal: So in the end, the main concept is to enable our customers to have a better experience and added value from using the services together. It could be simple things, like managed objects that you manage once. The basic example is an IP list that you want to use in the gateway, but that you also want to use in an email list or in threat intelligence in the cloud; it could be a dynamic list instead of a fixed one. Otherwise, if you had five products, you'd need to manage this five times. The same goes for your single sign-on configuration, or your integration with Active Directory. I have an email solution; the email solution needs to be connected to Active Directory, but I also have another solution whose clients need access to Active Directory. You don't want to define those multiple times. So that's having a common infrastructure, but there's also seeing, let's say, your audit events in the same place.

Someone changed a policy and that's audited; someone logged into the system and that's audited. You want to see all the audits from the different services together. We have security products, so they issue security events. It could be that we found something in the gateway, the Posture Management found something, an email was blocked, and you want those events going into one place where you can actually correlate between them. If you want to export that to a SIEM product, you can do it once instead of in different places. If you want to set up notification systems, you can do that in one place rather than many. But even more, and this is where we're going, you can actually manage policies in one place. Our end goal is that our users, the CISO's users, will be able to define a policy, and that policy will go into the different services. So for example, you can define an access policy in one place, and it will go into a gateway on the one hand, but also into the micro-segmentation of Kubernetes, and you'll be able to push this policy from one place.

So that experience is something that we're building, and it already exists to some extent; users can actually enjoy it. We are adding more and more as we go, and that's the Infinity Portal.
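The "manage once, apply everywhere" idea behind those shared objects can be sketched as a simple publish/subscribe store. All class and service names here are hypothetical illustrations, not the Infinity Portal API:

```python
# Illustrative sketch: one shared object (e.g. an IP block list) defined
# once and fanned out to every subscribed product, instead of being
# maintained five times in five products.

class SharedObjectStore:
    def __init__(self):
        self._objects = {}
        self._subscribers = []

    def subscribe(self, service):
        self._subscribers.append(service)

    def put(self, name, value):
        # A single update fans out to every subscribed service.
        self._objects[name] = value
        for service in self._subscribers:
            service.on_update(name, value)

class Service:
    """Stand-in for a consuming product (gateway, email security, ...)."""
    def __init__(self, name):
        self.name = name
        self.objects = {}

    def on_update(self, key, value):
        self.objects[key] = value

store = SharedObjectStore()
gateway, email = Service("gateway"), Service("email")
store.subscribe(gateway)
store.subscribe(email)
store.put("blocked-ips", ["203.0.113.7", "198.51.100.9"])
print(gateway.objects == email.objects)  # True
```

The same shape covers the policy case: define an access policy once and push it to the gateway and the Kubernetes micro-segmentation alike.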

Nati: Very interesting, and an excellent explanation. We're getting to the wrap-up now, and by the way, this has been an awesome discussion, as I expected. I usually have two last questions during the wrap-up that always surprise me when I get the answers, but let's see where we land. So one question is to rate your top challenges. We talked about infrastructure, we talked about cost, we talked about distribution, and we talked about the number of technologies.

Eyal: I think we've seen what's going on lately in the news about high-tech and the challenge of recruiting people, and it's not just recruiting people, it's recruiting top talent and good people. If you stop and ask my family, or ask people in the street, they'll be able to tell you that today the biggest challenge for everybody is people. That was probably always true; if you ask people who were in the industry 20 years ago, they'll probably say the same. But today, finding good people is the main challenge.

Nati: What about freeing your people to do more by taking away some of the work they had to do themselves? We talked about Terraform, for example. Do you see things they can stop doing, moving to other products for stuff the team is doing itself today, as one way to deal with the people problem?

Eyal: Yes, but it's just shifting the problem to someone else. So one of the things we've done, for example, is that a lot of the Terraform is now not written by DevOps themselves, but rather by the R&D team that actually consumes it. Instead of asking something from DevOps, they write the Terraform, maybe based on guidelines, reviews, and so on. But again, that's just asking R&D to do more, and it's a simple case where it's okay and intuitive, since the R&D team probably knows what it needs for its service. But I think that's the top challenge for me. Looking at how I spend my time every day, it's mainly this: how do you get more people, and skilled people, into the organization? That's the main challenge.

Nati: Interesting. So the last question for the wrap-up: looking back in retrospect, let's say you would start this journey today, what would you do differently?

Eyal: So when we were acquired by Check Point two and a half years ago, the Infinity Portal was at the beginning of its journey, and retrospectively I would probably force more decentralization of things, and more managed services. I think those are the two things I would do more of. You mentioned why not use one cluster: one of the things we've done lately was to move to more distributed setups. We were more centralized, and I think that hurt us in a few places. I would also choose more managed services. Where we had our own self-managed cluster, for example, because we built it, retrospectively I would have gone with EKS to begin with. I can't say it's one size fits all, but it's mainly because at the end of the day you will never have enough DevOps, you won't have enough people, and if you can use managed services better, that's a gain. And distribution just gives you better visibility, flexibility, and sustainability at all the different levels. Whether it's cost or monitoring, it's much better if you have distributed systems, and while Kubernetes tries to enable you to do it all in one cluster, there are still limitations, so I would just distribute them, and the ownership as well.

Nati: Great. There are more questions I wanted to ask you, but we've already kind of stretched the limit of what I put as a benchmark for a Cloudify tech talk; I know the podcast doesn't necessarily have a limit. It's been a really, really awesome discussion, and I hope the audience will feel the same. You mentioned at the end that the main challenge is people, so people who want to apply for a job will have your contact details here. It looks like you're touching almost all the cool stuff in the cloud, so they're going to have a lot of fun dealing with that; at least that's something I would take from this discussion. So by all means, reach out to Eyal if you're interested in joining Check Point for DevOps. Eyal, thank you for sharing all this information.

Eyal: I hope everyone gained some insights out of it. So thank you, and thanks to the audience.


Supporting Materials:

The Cost of Cloud, a Trillion Dollar Paradox – Martin Casado

Reducing Cloud Spend Need Not Be a Paradox – The New Stack
