PurePerformance - The Future of Ops is Sleep with Amit Chiba from Nedbank

Episode Date: September 25, 2023

I was fortunate to travel to South Africa and meet many tech leaders in Johannesburg and Cape Town to talk about Observability, Security, Automation, Platform Engineering, DevOps and FinOps. One of those leaders is Amit Chiba, Multi Product Specialist at Nedbank. I sat down with Amit to discuss his personal journey and his projects at Nedbank, one of the leading financial institutions in South Africa. Tune in and hear from Amit how self-service platform engineering helps them to scale observability, how they tackle cloud costs and why he thinks that the future of IT Ops is more Sleep!

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance Cafe. It's a special episode because I'm traveling, finally traveling again. I'm actually here in South Africa, in Cape Town. Spent the last couple of days here, went to Johannesburg first, and now I'm here, but I'm sitting with one of the speakers at the Dynatrace event, Amit. Amit, thank you so much for doing this interview with me. How are you? I'm very well, thank you. And thanks for having me. It's an honor and a privilege to speak
Starting point is 00:00:49 to you as well as the audience. It's really great because I think for many of our users or listeners of the podcast, South Africa, or Africa in general, is just unknown territory. And also for me, it's only my third time here, but I know there are a lot of things happening, especially in the observability space. Now, could you tell me a little bit more about your background and what observability means in the organization that you're working in? Absolutely. So I've been in the IT operations and IT service management space for the past 23 years and worked with a multitude of products and tools.
Starting point is 00:01:28 Initially, my background or the first immediate task was, well, let's just do infrastructure type of monitoring because that's the only thing that we knew at that particular stage. So, I started off my career at Nedbank itself, then spent a couple of years at an international company, IBM, but also servicing multiple customers, both locally as well as internationally. Carl's been one of those accounts as well. So, yeah, I mean, I've had a good understanding and exposure in terms of what other companies are doing. I've been back at Nedbank since 2014, also in the observability space.
Starting point is 00:02:13 So it's been an interesting journey for me up until now. And I've had the privilege to understand and see the evolution of monitoring into observability. So if I look at observability at an organizational level, what it means to Nedbank, well, Nedbank has a centralized IT structure. What that means, and the benefit that we've seen out of having a centralized IT structure, is that when it comes to implementing observability and monitoring, it's not done in specific silos.
Starting point is 00:02:50 It's done across the entire organization itself. So Dynatrace is one of the tools that we're actually using in our observability space. It is deployed across our entire environment. And really, one of the big benefits that we've seen is that it's been able to provide us with that single pane of glass. One question that I think I also asked you in the group with your colleagues. Now, the beauty of centralizing things is that you can enforce standards, you can provide templates.
Starting point is 00:03:19 But some people say, you know what, everything is centralized. If I need something from that central organization, I'm just one of many that have to then put in a ticket and then have to wait. How do you balance being central versus giving people the freedom and autonomy? Well, that's an interesting question because that was definitely a challenge that we had in the past.
Starting point is 00:03:41 We came from a background where even when you wanted to deploy an application, you'd often have to request services from the server operations team, which would typically take weeks, if not months, to actually get deployed. So obviously from that perspective, it was a big challenge for us. One of the ways that we started looking at it was, well, how can we make it easier for our customers to actually consume these particular products? How can we become more agile as an organization and start delivering so that we can reduce
Starting point is 00:04:17 the time to actually go live into production as a subset of even specific services. So in that particular regard, what we've done was, because of the centralized model, having those guardrails in place from a platform engineering perspective really allowed us to start, firstly, making the capability available so that the months that it used to take to provide a service started getting broken down into weeks, right, weeks and days. It got to a point where that was all
Starting point is 00:04:55 good and well to have your turnaround time in days, but then we said, well, how could we further optimize everything and potentially bring it down to hours? So we looked at sort of having a one-click button, sort of being able to go into a specific catalog and pretty much request those particular services. So as a result, one of the strategies within Nedbank was to provide a hybrid cloud environment. Our hybrid cloud journey is called NetVana. So that's our NetVana area. So we created a NetVana marketplace, which is essentially a service catalog hosted on a specific platform, accessible by all of our staff, our group technology staff itself,
Starting point is 00:05:46 where they can pretty much go select a particular server or deployment requirement, and after specifying a couple of parameters, go and have the deployment happen. Our server deployment, or VM deployment, is now less than an hour itself. That's cool. I mean, I love this so much when you showed me your platform, right? We talked about platform engineering and then I asked
Starting point is 00:06:14 you, so what do you use and what does this look like? And then you showed it to me, and basically you phrased it very well, because with platform engineering, the goal is really that, while you have a centralized unit, you want to make sure that developers, or your customers, as you call them, can do things in self-service so that they don't feel like they have to be waiting in line to get things done from a central unit. You're providing things as a self-service, everything as fully automated as possible, and in the end making them more productive. Exactly, and just in that particular regard, whenever any resource is requested through the marketplace, the necessary permissions are provided so
Starting point is 00:06:55 that they automatically get the access that they need, not only to deploy directly to the infrastructure or the services that they've just requested, but it provides them with the necessary permissions, as well as having all of those particular guardrails in place. So our monitoring agent, the Dynatrace OneAgent, is embedded as part of the entire deployment, as well as some other day-two operational type activities as well. If I may, one of the other sort of things that I've seen across organizations is the fact that day-two operations is always seen as an afterthought.
Starting point is 00:07:41 It's always, well, let's slap on a monitoring agent after the deployment actually happens. But really, if you look at the entire cycle, your day two operations is at the heart of everything because that is the cycle that takes the longest to actually complete. So it needs to be planned. And from day zero, the actual design, that's when things should actually start happening. So you have to design for operations, you have to design for resiliency. And I really liked, actually, in your slide, you showed the four steps in the end
Starting point is 00:08:13 and you said you actually call it site reliability operations, which I really like. I mean, I always talk about site reliability engineering, but really, in the end, you obviously need to encourage your engineers to think about how to operate, or how to operationalize reliability, and think about it in the very early days. Exactly.
Starting point is 00:08:30 And I also like your slogan where you said, the future of IT is sleep. Because if you do the job right, they can sleep more. Not on the job, as you said. Not on the job. But not getting woken up. Hey, the last topic I would like to cover, because you also brought it up as one of your projects.
Starting point is 00:08:48 If you're allowing a lot of people to, you know, get a server here, get a server here, Kubernetes cluster here, the whole cost topic, right? I mean, it will cost a lot potentially if they do things. FinOps, obviously, is a big thing. Like, how can we make sure that we are financially sane in what we're doing?
Starting point is 00:09:04 What are your projects there? How do you see FinOps? Yeah, so FinOps is a very topical discussion within the bank itself. We also place a lot of emphasis on it. One of the things, apart from just trying to report on the current usage within our cloud environments is to try and make the visibility available to our users itself.
Starting point is 00:09:31 In that particular regard, Dynatrace with its carbon footprint dashboards provides us with the capability to actually understand and see which particular resources are overutilized. What we've actually done was, using the power of Grail, we've adapted some of those dashboards to actually provide it at an application level so that at any given time we can see what the consumption has been like over a period of time, as well as which particular VMs or servers are actually contributing most and which particular resources can actually be scaled down. One of the other projects that we're looking at is to see, well, how can we actually bring more of that cost analysis within Dynatrace itself? So that's something that we're also working on and looking at at the moment.
Starting point is 00:10:25 And I'm also very much looking forward to, because you promised me, now we have you on record, that you will present the FinOps topic on one of our Automation Guild meetings
Starting point is 00:10:33 in the future. That's really awesome. Awesome. Hey, thank you so much, Amit, for sitting down with me, for traveling from Johannesburg, where you also did the presentation
Starting point is 00:10:41 down here to Cape Town for enlightening people with your stories. And I think now it's time to get some food. Awesome. Thanks, Andy. Thank you.
