
EP. 16 Lessons from Walmart’s Cloud Strategy with Eyal Barkay

About This Episode

In this episode, Host David McKenney sits down with Eyal Barkay, VP of Enterprise and Cloud at Walmart Global Tech, as he unfolds the complexities and triumphs of managing one of the world’s largest hybrid cloud environments. From his early days at Priceline to leading significant cloud transformations at Walmart, Eyal shares his knowledge on balancing innovation with operational efficiency, the strategic deployment of cloud technologies across private and public platforms, and the future of cloud integration in retail operations.

Know the Guests

Eyal Barkay

VP of Enterprise & Cloud at Walmart Global Tech

Eyal Barkay is the VP of Enterprise & Cloud at Walmart Global Tech. He has over 20 years of experience in technology leadership roles across various industries. Eyal began his career serving in the Israeli intelligence forces, focusing on communications technology. He then joined Lucent Technologies (now Nokia), working on early wireless LAN solutions. After earning a degree in computer science and business, Eyal held positions at companies like Juniper Networks, Priceline.com, and several startups. He joined Walmart about three years ago in his current role overseeing the company's cloud strategy and operations.

Know Your Host

David McKenney

Vice President of Public Cloud Products at TierPoint

David McKenney is the Vice President of Public Cloud Products at TierPoint. TierPoint is a leading provider of secure, connected IT platform solutions that power the digital transformation of thousands of clients, from the public to private sectors, from small businesses to Fortune 500 enterprises.

Transcript

David McKenney
Welcome to this episode of Cloud Currents, where we explore a variety of strategies and technologies shaping the world of cloud computing. I'm Dave McKenney, VP of cloud product at TierPoint, which basically means I work on a lot of cloud stuff, whether it's public, private, or any variant in between. And that has me really excited to talk with our guest today, Eyal Barkay, who is the VP of enterprise and cloud at Walmart Global Tech. So thanks for joining us today. Eyal, how are you doing?

00:41 - Eyal Barkay's Background and Cloud Journey

Eyal Barkay
Good, thanks. How are you?

David McKenney
I'm doing well. So, Eyal, you were responsible for spearheading Walmart's journey to build one of the world's largest, shall I say, hybrid cloud environments. And I'm really looking forward to hearing about this because I imagine that when you build something across, is it two public cloud providers and your own private cloud? Correct?

Eyal Barkay
Yep.

David McKenney
That the benefits have to be considerable when you look at the options of best of breed services, the resiliency across multiple clouds. But I have to think that there are just as many challenges that are amplified as a result of trying to maintain that luxury, if you will. Before we dive in, I know we're going to spend a lot of time talking about the Walmart cloud that you guys have built and maintained right now. Love to get a bit of a history about your career and the journey that's led you to where you are today and what got you interested in cloud computing.

Eyal Barkay
Yeah. First of all, I wasn't the initiator of the hybrid cloud in Walmart. I only joined three years ago. And today I'm leading the enterprise and cloud group, which is storage, compute, and data protection for the enterprise, as well as the private cloud team and automation, the GCP team, the Azure team, many great and talented cloud engineers across the board who are maintaining, as you said, one of the largest hybrid clouds in the world today with, I would say, hundreds of thousands of nodes. I joined Walmart after four and a half years with Priceline, where I led the infrastructure and the technology operations. At Priceline, I started the journey of migrating to cloud. But as we did that, we wanted to make sure that we were doing it for the right reasons and doing it in the right way.

And in my mind, moving to cloud has to really have its benefits. So just taking legacy applications and lifting and shifting them to the cloud is not the right approach. And what we did was really modernize those applications that had been there in Priceline for many years. We modernized them, took them through a twelve-factor effort, containerized them, and then, as we moved them to the public cloud, which back then was GCP, we were able to utilize the elasticity that the public cloud gives us, especially for those applications that need to be really close to the PaaS offerings. So we did it the right way. In many companies today, including companies that I worked for, that's not how it was done. Somebody really likes the buzzword cloud, or really liked it back then, and said, okay, let's just move it.

What was done in Walmart is, okay, let's just make sure that we're doing the right things. We have the legacy applications. Most of them are actually running in our enterprise realm, in the data centers. And then when we go on the cloud journey, we should have the ability to utilize public clouds with their offerings as well as private cloud. So as we move to that modernized era, we also want to maintain that flexibility. So that's kind of how I plugged myself into the Walmart cloud journey.

David McKenney
That's awesome. I know you said you came in maybe a little more recent to the Walmart, but Priceline, that couldn't have been a small infrastructure. And refactoring, replatforming that to a public cloud, just start to finish, from the planning to the last server to go, which there's probably never the last server to go, but how long did that process take at Priceline?

05:09 - Walmart's Hybrid Cloud Strategy

Eyal Barkay
So we actually had a very consolidated effort there. I believe we had about 500 different products and it took us three quarters to move 60% of them. That was the number. So we really had kind of a war room mode of work with our application owners and with the developers, to make sure that they knew what to do, to make sure that we supported them, that we spun up and provisioned what was needed for them to really test it in the cloud environment, and just to walk them through the process.

David McKenney
When you joined Walmart, you could actually say, hey, I didn't miss all the fun. I've done this before. I know Walmart had a pretty considerable OpenStack practice prior to working with Microsoft and GCP. And I think you're using OpenStack today for your private cloud, is that correct?

Eyal Barkay
Yes.

David McKenney
Back in, I think it was 2018 timeframe is when Microsoft and Walmart put out that announcement that they were entering into a strategic partnership to go hand in hand and work on some cloud transformation items. What was the driving force? I know that's a while back, but then and maybe even now around these cloud decisions that Walmart is making. So you went from running a fairly large, substantial OpenStack cloud to broadening out to using Microsoft Azure and then Google and sort of, maybe, I don't want to say you're redoing it again today, but there's definitely a presence that you have that's kind of reinvigorated this OpenStack cloud that you guys have out there at the edge.

Eyal Barkay
Yeah, the OpenStack is mostly our private clouds that exist in our enterprise data centers as well as in the regions: east, west, and south central. And they're aligned with where the public providers are. That's what we call the triplet. And you can find some videos on YouTube that explain what the Walmart triplet is. But the idea there is that you can utilize either OpenStack in private, or Azure, or GCP, but you can also consume the best of breed services that were either built internally within Walmart or built and maintained by the cloud providers, by Azure and GCP. So we don't want to take away the flexibility from our developers to utilize platforms that were already built by someone else and that they don't need to develop themselves.

So the notion of allowing best of breed, that's kind of the essence of it. The other portion of this is that we're also pulling a cost lever. Having more than one public provider and having our private cloud in play allows us to get the best deals that we can from both the public providers and the vendors that we buy hardware from when it comes to our private cloud.

08:44 - Cloud Architecture and Regional Distribution

David McKenney
Yeah. So I think you mentioned there that you're operating in east, west, and south central. That immediately strikes me that you're trying to position these clouds almost as your own regional makeup, where you've got a pretty large private cloud sitting near both Google and Microsoft. And you said it was your triplet architecture. I'm assuming there are edge nodes or edge cloud situations at each one of your retail shops. Does that make sense?

Eyal Barkay
Yes. Yeah. So that would be about 10,000 or more edge nodes that are like very small clouds that are all connecting to our different regions where they can consume the cloud services. And that's part of the thing that we're trying to do to make our customers get the best experience or to allow our customers to get the best experience is to really decide what would sit in the edge and what would sit in the core or what can sit in our enterprise data centers as well. Right?

David McKenney
Yeah. So Walmart's been around for a very long time, and so there's always been the storefront aspect of it. And then over time, the e-commerce aspect started to grow and become quite big. So are the storefront and the e-commerce systems run separately, or are they merged as part of this cloud strategy? Can you talk about that at all?

Eyal Barkay
Yeah. So the cloud regionals, as I mentioned earlier, which consist of the public and the private, serve both the edge and the core. And by edge, I mean those stores, distribution centers, fulfillment centers. And the core is the online commerce. As for the applications, yes, there is some separation. There are applications that are there completely to serve the edge. And there are applications that you use when you go to walmart.com or any of the other services that we have, or the services that we provide our associates. The associates in the stores are using their handhelds or any other technology, device, or application. It might pull data locally or it might go up to the cloud to consume it. So from an infrastructure perspective, we're serving all Walmart needs, including the internal ones, the BizOps, right, or HR and finance and all that.

11:51 - Benefits of Multicloud Approach

David McKenney
So you're using these three clouds. And we've already talked about a few of the benefits just in passing. There's the best of breed. You talked about cost, which gives you some flexibility there between providers, and just the overall cost of placement based on workload type. But would you say there's any one benefit or key factor that led you down the hybrid approach versus using a single public cloud provider and then just linking edge to, say, a single cloud versus using these three?

Eyal Barkay
Look, I think that GCP is great at some things. Azure is great at other things. Private is great at yet another set of things. So why not utilize all of them? Why withhold that from our developers and from our applications? We want to always use the best technologies that are out there, and sometimes one is leading the technology and sometimes another. So why not keep that flexibility? I think flexibility is definitely a top reason for choosing this full hybrid.

David McKenney
I'm sure any engineer just loves the playground of tools that are available. I've got ten different ways I can solve this problem with that. Do you find that having three cloud options leads to any, you know, getting stuck in that deciding phase? Or is the innovation really just too great to even say that's a problem?

Eyal Barkay
So it's not that we're allowing three different platforms to do the same thing. We're trying to help make the decision on what would be the best choice for Walmart for specific technologies, and then set it in the platform so developers wouldn't have to think about it too much. There's one great example, for which I think you can also find the stories online, about Element. Element is our AI platform that kind of figures things out for the developers who want to utilize AI. Under the covers, they can have GPUs in different providers, they can have schedulers that make the utilization more efficient, and they can utilize GenAI PaaS offerings that are provided either by GCP or by Azure or maybe others. And it kind of simplifies the work for the developer.

When it comes to workload placement, a developer doesn't have to go and deploy in a different way. Whether it's a store, Azure, GCP, or private, they go to one platform and they hit provision or deploy, and it works the exact same way for them. And that's one of the things for GTP, the Global Tech Platform, which is the organization I belong to. One of our top priorities is to make sure that we have a seamless experience for our customers, which are the dev community.
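The single entry point described here, where a developer deploys the same way regardless of target, can be pictured as a thin dispatch layer. What follows is a minimal sketch only, not Walmart's platform: the `deploy` function, the target names, and the backend stubs are all hypothetical and exist purely for illustration.

```python
from typing import Callable, Dict


# Hypothetical backend stubs. Each one stands in for the provider-specific
# provisioning logic that a platform team would hide from developers.
def _deploy_private(app: str) -> str:
    return f"{app} -> private cloud (OpenStack)"


def _deploy_azure(app: str) -> str:
    return f"{app} -> Azure"


def _deploy_gcp(app: str) -> str:
    return f"{app} -> GCP"


def _deploy_store(app: str) -> str:
    return f"{app} -> store edge node"


# One registry maps a target name to its backend (names are assumptions).
BACKENDS: Dict[str, Callable[[str], str]] = {
    "private": _deploy_private,
    "azure": _deploy_azure,
    "gcp": _deploy_gcp,
    "store": _deploy_store,
}


def deploy(app: str, target: str) -> str:
    """Deploy `app` to `target` through one uniform call."""
    try:
        backend = BACKENDS[target]
    except KeyError:
        raise ValueError(f"unknown target: {target!r}") from None
    return backend(app)


if __name__ == "__main__":
    # The developer-facing call is identical for every target.
    for target in BACKENDS:
        print(deploy("checkout-service", target))
```

The point of the design is that adding a provider means registering one more backend, while the call the developer makes stays the same.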

15:25 - Developer Experience and Platform Consistency

David McKenney
That's crazy. Would you say for those developers then, is there any hindrance on using native services? It doesn't sound like you're trying to push for agnostic cloud. You should build something that not only runs in Google, it also runs on Azure, it also runs on private. You're really giving a best of breed, best of placement reality to your developers and that's great. When it comes to building out resiliency for services, would it be safe to say that most of the resiliency for those developed apps stay within that cloud ecosystem so that like say Google is responsible for that app's resiliency? And I'm sure it's not like full of, but is that the way it's being designed?

Eyal Barkay
Yeah. So first of all, when it comes to GCP managed services or Azure managed services, their PaaS offerings, they own their resiliency. Right? When it comes to us running on IaaS, wherever it is, we're trying to make sure that the way we do it takes into consideration the right patterns for HA: deploying across multiple availability zones, deploying in different regions, and being able to fail over between the regions when needed. We're trying to minimize movement as much as possible, with increased resiliency through the right HA patterns and availability zones. But we do that within the provider: within private, within Azure, within GCP.

David McKenney
Okay, this is really interesting. So maybe shifting gears a little bit to the operations side. So maybe a two-part deal here. One, you're maintaining operations and support across three cloud architectures, and they're all pretty vast in their ecosystem. Maybe you could talk about how that's structured and how you're handling that. And the second part would be the OpenStack platform itself. So OpenStack has obviously got a lot more work involved on Walmart's part to lifecycle, maintain, expand, all that, whereas with public cloud, that's obviously taken care of for you. So do you find that there's, I'll say, a little bit more of an operational overhead on the OpenStack side of the house when compared to the public cloud side? Again, not without its benefits for sure, but I'm just curious how the operational landscape looks for you guys.

Eyal Barkay
So, first of all, when it comes to the public providers, we do have the procedures in place to make sure that things are up and running, and running well, and we're building the capacity and the resiliency required on top of their platform. But we also have the right playbooks, we have the right relationships, we have the right SLAs set with them, and they do it on a daily basis, and they are pretty good at it. And sometimes when they do have to make a huge change, they will consult with us first and make sure that we are aligned with their deployment schedules and with the way that they roll out. And they would first roll out in a region or in an availability zone where Walmart is not necessarily present.

19:06 - Operational Challenges and Solutions

Eyal Barkay
But when they do get to us, then we make sure that we're not losing our resiliency because of their changes. There is a close collaboration with them on that. When it comes to OpenStack and private in general, we do many things to make sure that we're resilient, but we can't have a human just watch it and make sure that, I don't know, almost 2 million cores running out there in private are really up and running. We use tools. We make sure that we have the right observability in place. We make sure that we have the right self-healing capabilities, which we built ourselves for OpenStack. And we're contributing to the community as well, the OpenStack community. Our engineers are very much active in that community.

We also have some patents around OpenStack, and we feel pretty comfortable with its resiliency, or with its stability. I know many people in the past had a lot of thoughts around OpenStack stability and why people choose different technologies. But we really keep it at very high uptime, knock on wood, and we also keep some capacity for health purposes. We don't just use the bare minimum possible. If we need to move VMs across the environment because of a hypervisor issue, we can do that. We have people, but we also have technologies. We implemented a lot of smart mechanisms across our private cloud to keep it up and running, and running well, and to serve our customers without blips.
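The idea of keeping spare capacity so that VMs can be moved off a troubled hypervisor can be illustrated with a small capacity check. This is a hedged sketch, not a description of Walmart's self-healing tooling: the function name `can_evacuate`, the core counts, and the first-fit-decreasing heuristic are all assumptions chosen for illustration.

```python
from typing import List


def can_evacuate(vm_cores: List[int], free_cores: List[int]) -> bool:
    """Check whether VMs from a failed hypervisor fit on the remaining hosts.

    vm_cores:   cores needed by each VM on the failed hypervisor.
    free_cores: spare cores on each healthy hypervisor.

    Uses first-fit decreasing, a simple heuristic: it can report False in
    rare cases where a cleverer packing exists, so it errs on the side of
    caution.
    """
    remaining = sorted(free_cores, reverse=True)
    for vm in sorted(vm_cores, reverse=True):
        for i, free in enumerate(remaining):
            if free >= vm:
                remaining[i] -= vm  # place the VM on the first host that fits
                break
        else:
            return False  # no healthy host has room for this VM
    return True


if __name__ == "__main__":
    # Hypothetical numbers: three VMs to move, two hosts with headroom.
    print(can_evacuate([8, 4, 4], [12, 6]))
```

Note that total free cores alone is not enough: a single 8-core VM cannot be split across two hosts that each have 4 cores free, which is why the check places VMs one at a time.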

David McKenney
Yeah, no, keep working on it, because I think a lot of people really appreciate the contributions that come from running. To be honest, I can't imagine how many enterprises have a cloud at that scale. There's not that many. So the experiences that you have, we can all learn from.

Eyal Barkay
Yeah. I would say also when the providers, not even only the cloud providers, but the vendors that we work with, they like us being their design partners and get exposure to new services, which is also exciting for our engineers to get exposed to all this stuff, because once it works for Walmart, it probably works for everybody else. They got it covered.

22:13 - Cost Optimization and Decision Making

David McKenney
Turn it into a service. Right, there you go. Let's talk about some of the cost aspects of this system. We're talking about private cloud. There's a cost value comparison between, you might pay a little more for running something in a certain cloud, but there's a value to the service that you're getting. But I think I've seen articles that show that Walmart has saved quite a bit of money in the things that it's able to run in its own private cloud. How do you go about the decisions that are. I know you talked earlier that there's very specific things that may run in one cloud or the other. Others are left to a choice based on best of breed.

But can you talk about, just strategically, what decisions are made when you say, look, this is maybe better to run over here because we can save some considerable costs, but what other factors go into a decision simply or beyond saying we can save a buck by moving it over here?

Eyal Barkay
First of all, I want to start by saying that all enterprises that are utilizing hybrid cloud, or any enterprise that uses public cloud today, is striving to get to cost optimization, and Walmart, the same as others, has done that and continues doing that. When you have the benefit of a private cloud that is running in a cost efficient manner, then you can really choose: okay, do I need to pay that markup to the public cloud provider? Then the question that you're asking yourself is, okay, is it worth the buck? And the first thing in mind would be, is the application built for it?

If the application is highly dependent on a PaaS service, with synchronous calls to a database that is running in a public provider, then I'm not sure that it would make sense to really put it in private completely. Maybe portions of it, maybe in the future. So you're just asking yourself, does it fit the profile? Because there are many others that don't necessarily need it, and they are a better fit for the profile of running in a private cloud. Our triplets do allow running in private and utilizing PaaS in public because they're in very close proximity. It's up to like twelve milliseconds away from private to GCP to Azure, but it doesn't fit everybody.

And in some cases when we modernize our applications, we're also looking at, okay, how can we also live in this environment where we can make our application portable and make the application able to really utilize hybrid in life, and we are doing it in some cases. The other thing that you're looking at is the offerings. So private cloud has different or specific offerings for compute, very high performance compute and general compute. And we also have the opportunity to oversubscribe like the public providers do. Right, but this is all in our heads, so we know what the application needs. And should we or should we not oversubscribe there, and then should we squeeze some more into the cloud module or less? And there are cases where you have very high elasticity through the day or very high elasticity in specific seasons.

Then you have two options. You either scale out, you run in private and you scale out to public. You rent the capacity and then scale it down when you don't need it anymore, or you run it in private. But then as long as you're not utilizing the capacity, it's just sitting there doing nothing. So you're kind of wasting money. So those that are very highly elastic today, we would probably recommend them to run in public until they adopt the full hybrid scale out concept.
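The placement reasoning in this answer (PaaS dependence first, then elasticity, otherwise private) can be sketched as a small decision function. This is an illustration under stated assumptions, not Walmart's actual policy: the `Workload` fields, the `ELASTICITY_THRESHOLD` value, and the returned labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    sync_paas_dependency: bool       # synchronous calls to a public-cloud PaaS service
    peak_to_average: float           # peak demand divided by average demand
    supports_hybrid_scale_out: bool  # can run in private and burst to public

# Hypothetical cutoff for "highly elastic"; a real value would come from data.
ELASTICITY_THRESHOLD = 3.0


def recommend_placement(w: Workload) -> str:
    """Return a coarse placement recommendation for a workload."""
    # Tight synchronous coupling to a public PaaS: keep it next to that service.
    if w.sync_paas_dependency:
        return "public"
    # Highly elastic: idle private capacity is wasted money, so either burst
    # from private to public or, until that is supported, run in public.
    if w.peak_to_average >= ELASTICITY_THRESHOLD:
        return "private+burst" if w.supports_hybrid_scale_out else "public"
    # Steady workloads fit the private cloud profile.
    return "private"


if __name__ == "__main__":
    seasonal = Workload("holiday-promo", False, 5.0, False)
    print(seasonal.name, "->", recommend_placement(seasonal))
```

A real decision would weigh more than three inputs (latency, compliance, oversubscription, negotiated pricing), so this should be read as the shape of the reasoning rather than a complete rule set.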

David McKenney
So on the compute side, do you attempt to exhaust all avenues of reserved instances or spot or savings plans before you make that placement decision to move it to maybe a different cloud? Or is that all one and the same?

27:42 - Legacy Systems and Modernization Efforts

Eyal Barkay
Put aside spot for a second, because it's a bit of a different story. But when you talk about optimization, we want people, our application owners, to optimize as much as they can. It doesn't matter where they are. If they optimize in private, they will allow more applications to use the same capacity. If they optimize in public, then their bill will go down. Right. So it doesn't matter where they are today and where they're going to be in the future. We want to make sure that they're utilizing their capacity as best as possible. Right. We don't want to let cores just sit there idle. Yeah, do that thing. Just wait.

David McKenney
Right. So do you guys do any sort of, I have to think there's some sort of policy enforcement or is it more of a reactive look back to see so going in as developers place workloads, are there sort of guardrails and policies that are enforced as they're doing so, or is it sort of a post implementation review? Both.

Eyal Barkay
Yeah, it's both. It's both. Always both. You know, many companies started the journey of cost optimization in cloud not long ago. And years ago, when people architected their applications and then onboarded to cloud and provisioned resources, it wasn't top of mind. And then with those you have to go back and tell them, hey, you have an opportunity here. The new way is that when they architect and develop their application, and then when they provision the resources, they already have it in mind, and they're saying, okay, I don't need these quotas, I don't need this amount of capacity. I can live with, you know, something that is more efficient. I can scale up, scale down. I know what my pattern is.

So I can even prepare either private, my private provider, which is our team, or I can prepare Azure or GCP that in these months we would need more capacity and we will utilize that capacity and we're doing it like you can imagine with our scale before holidays. Oh my gosh.

David McKenney
Yeah.

Eyal Barkay
We need to prep Azure and GCP, to tell them, hey, we're going to use this capacity.

David McKenney
Do you see the whopper of a deal we have coming out? Right?

Eyal Barkay
Yeah, exactly. Yeah.

David McKenney
It's just never ending. I'm sure too, given all of the advancements that you guys have made, there's probably still legacy workloads sitting there just because for how long Walmart's been around and just how fast you've moved, how do you balance addressing these legacy solutions and along with what you're doing in your more next generation solutions that you're building out there? And I have to assume that these are maybe more leaning towards the storefront scenarios. Maybe they've got some physical ties to physical things or physical systems, but maybe can you talk about where are some of the struggles because the legacy workloads is always one of those things that people point to, hey, it can't go to public cloud because it's a legacy workload. Defining that is always subject to the industry that you're in.

I think it'd be really interesting to hear if you could talk a little bit about what the legacy workloads and the challenges that Walmart has right now.

Eyal Barkay
Yeah, it's a great question. We do have legacy workloads. I don't know if they're as old as Walmart. They're not. But we do have those. And when we're building for modernization, we do need to take that into consideration in terms of the resources needed to maintain those legacy applications. That legacy, even the hardware, has to be up and running, stable, secure, and in compliance. And in order to keep them as such, we need to make sure that they're being patched and that they're being upgraded where possible. For those that are not possible to upgrade, we need to see, okay, how do we get rid of this tech debt? How do we get them off our landscape? Because I'm guessing you heard about Walmart's infosec department, which is one of the best in the world.

We don't want to disappoint them and, especially, have vulnerabilities in our environment. So it requires a lot, and in order to do that, because the landscape is pretty large and we don't have that many engineers to just nurture that legacy, we need a lot of automation, and we try to implement the blueprint of cloud in our enterprise, in our legacy. So with that automation in place, we're able to maintain this and allow our engineers to really focus on the modernized areas. And yes, sometimes they do need to intervene when it comes to legacy, but we try to avoid that as much as we can. The other part of it is that we're constantly working on an EOL program. So we don't like having EOL hardware or software on our floors or our vendors' floors.

And we constantly shrink the amount of EOL that we have on the floor every year. And these days we're at a very small amount. But EOL is just one portion of legacy, right?

David McKenney
Yeah. Now that's interesting. I think somebody made a statement once, like automate the routine and manage to the exception. And it sounds like that's what you're talking about with your legacy workloads: yes, they're legacy, but you've found ways to put automation on top of them that diminishes the need to have somebody sit there doing care and feeding every day, so you're able to get past that. And that's fantastic.

Eyal Barkay
Exactly. Yeah. And then when it comes to, okay, how do we shift those to the cloud era, as we're going through those tech debt programs, or the elimination of tech debt and EOL, when we're having the conversations with the application owners, we always put in front of them the opportunity to really modernize the application and not just, okay, let's refresh the hardware, or let's upgrade the OS or whatever it is. Just make your application more suitable for cloud. And then we move them to either OpenStack or in some cases to the public cloud. And it's kind of, you know, just another way of pushing for modernizing the whole stack. Right. Everybody wants to modernize the application. Not everybody has time.

But now when these programs are also pushing them to do stuff, then hey, there's another reason to go and to really modernize well, and they can't say.

David McKenney
As for the lack of a tool set in front of them, they've definitely got a whole slew of them. And actually, I want to talk about that some more. Just hearing you talk about the options, the opportunities that developers have, the choices that are there to enable the innovation and growth for Walmart. We're in technology. We're an MSP and hosting provider, and there is a high demand for these cloud skills. And you certainly have those across three major distributions, Google, Microsoft Azure and OpenStack. So what are you doing to attract the talent, to build and curate the talent and keep the talent at Walmart? I think we've gotten a sense of it so far. But is there anything that you'd like to throw out there?

Consider it PR for getting more folks, but it strikes me as pretty interesting what you guys are able to attract for talent there.

Eyal Barkay
Yeah. Look, first of all, I think that it's really out there in the news that Walmart is a technology company today and it works with the most advanced tech that is out there. And this definitely attracts talent, and good talent. I mentioned earlier that we are also getting exposure to new offerings from the cloud providers. That's another thing that engineers really like, to always be at the forefront of technology. Third, you can't beat, hey, I worked for the largest or one of the largest clouds in the world. It's always good. Yeah, it's always good for your resume. And once you've done that, any other place you're ever going to go to, it's going to be, hey, I've done that. I know how to do it. I know how to do it at scale.

I know how to do it. I'm familiar with the technologies and I like it. The other point I would make is that you have the opportunity to really learn technologies that are in GCP or Azure or private cloud. We have plenty of other vendors beyond the cloud service providers, plenty of other technologies, software, and platforms that we're using across our landscape, which our engineers are dealing with. They're learning them, and they're partnering with those vendors to ask for new features. And then they see them come to life and they say, oh, this is me, this came from me. And it's pretty cool. So that's another thing that I think people like. And it attracts talent this way.

And Walmart is really good at investing in its talent, investing in associates. I'm not talking about the great benefits that people get as they work for Walmart, but about investing in training, increasing their skill sets. At the end of the day, it's a win-win situation. People are happy, the company is happy. If we invest in our associates' skill sets, then we're going to gain higher quality work, more velocity, and greater technologies that we can enable for our environment, which trickles down to our application owners, which trickles down to an improved experience for our customers. So that would be a big one. And the last one: look, we're enabling a platform that, as I said earlier, needs to be stable, needs to give a seamless experience to our developers, and needs to allow many different technologies and opportunities and options for developers.

Cloud engineers in many cases, or DevOps engineers, or storage, compute, whatever engineers that join a company, they usually are part of the platform. And being part of the Walmart platform, the Walmart Global Tech platform, is, I think, the top, because we serve so many developers, so many applications, and the amount of traffic that funnels through our infrastructure is just huge. Unless you work for the hyperscalers, you wouldn't beat it anywhere else.

David McKenney
Well, I think we're coming up on time. Any departing comments? Eyal, this has been great.

Eyal Barkay
Yeah, I enjoyed it a lot. I really appreciate you guys inviting me for this and yeah, happy to do it some more.

David McKenney
Yeah, I tell you, it's really interesting hearing about what you guys are doing at Walmart. And I certainly learned a lot and hope others will go out there and make sure they look up some of those conferences, those recent keynotes around Walmart, and see where Walmart is going, because I love what you said. Walmart is a technology company, and I think that's an understatement. So thank you very much, Eyal. Really appreciate your time.