The Good Tech Companies - Inside Cloud-Scale Systems: A Discussion with Abhinav Sharma

Starting point is 00:00:00 This audio is presented by HackerNoon, where anyone can learn anything about any technology. Inside Cloud-Scale Systems: A Discussion with Abhinav Sharma, by Sanya Kapoor. The evolution of cloud computing and artificial intelligence has fundamentally transformed how enterprises build and scale technology platforms. Modern cloud infrastructure must handle millions of concurrent users across dozens of geographic regions while maintaining security, reliability, and performance standards that meet the demands of government agencies and Fortune 500 companies alike. Engineers working at this scale navigate complex distributed systems, automate build-out pipelines, and build sophisticated monitoring

Starting point is 00:00:39 frameworks that detect potential outages before they impact customers. The most accomplished infrastructure engineers demonstrate versatility across multiple domains, from networking and security to AI and machine learning systems. These professionals understand that building platforms used by millions requires not only technical expertise, but also the ability to anticipate scale challenges months or years in advance. The transition from traditional cloud infrastructure to AI-powered document intelligence systems represents a natural evolution for engineers who have mastered the fundamentals of distributed computing at hyperscale. With seven years at Microsoft Azure, followed by leadership roles in enterprise AI at DocuSign, Abhinav Sharma exemplifies this

Starting point is 00:01:22 comprehensive journey through modern cloud and AI engineering. His experience spans Azure Networking's global infrastructure, Azure AI Search's platform evolution during the pre-LLM era, and now architecting document intelligence systems processing over 150 million agreements. In this conversation, we explore Abhinav's journey across these diverse engineering teams and the work involved in building and scaling large-scale cloud and AI infrastructure. Architecting continuous testing frameworks for global cloud infrastructure. The early days in Azure networking were intense.

Starting point is 00:01:55 As a new grad, it was overwhelming to dive into such a vast ecosystem of networking components operating at cloud scale. It was easy to get lost in the sea of three-letter acronyms, each pointing to a different team or subsystem, without a clear sense of how they all fit into the broader networking architecture. Just learning the layers of the stack took several months, Abhinav recalls, reflecting on his early career after completing his master's at Columbia University in 2016. He soon took on a project that gave him deep, end-to-end exposure

Starting point is 00:02:25 to Azure's public networking stack: software load balancers, virtual networks and subnets, virtual machine scale sets, network security groups, and more. His role was to develop a continuous regression testing framework capable of validating real customer deployment scenarios across more than 50 regions simultaneously. This large-scale testing and monitoring effort enabled Abhinav to design and deploy a highly configurable framework that helped the broader networking organization identify regression issues early, particularly in low-traffic regions. By catching these problems before they escalated, the team was able to prevent hundreds of critical outages that could have impacted Azure customers and eroded trust in the

Starting point is 00:03:05 platform's reliability. It was one of those obvious projects that simply needed to exist. A severity one issue in the networking stack can cascade into outages across countless upstream services. I'm glad I had the chance to be the team's customer zero. It taught me a tremendous amount. Building an enterprise-grade search platform. After his time in Azure networking, and amid the rising momentum around AI in 2018, Abhinav transitioned to Azure AI Search. Compared to networking, the team was newer and smaller, focused on delivering search-as-a-service APIs for enterprise customers. For example, if you're a large e-commerce company, search is a fundamental capability. You could build and maintain

Starting point is 00:03:47 your own in-house search clusters and teams, or you could offload that work to Azure AI Search. Demand for this kind of managed search capability was growing rapidly across industries and use cases. During his four years with the team, Abhinav contributed across multiple layers of the platform stack. He worked extensively on several foundational components essential for scaling a service, including telemetry systems, billing infrastructure, secret management, service build-out, and a wide range of internal tooling. These capabilities formed the backbone of the platform's growth, and Abhinav played a key role in shaping them. Growing the service from just a handful of regions to more than 40 worldwide required us

Starting point is 00:04:29 to standardize and automate our regional build-out process. This was essential for scaling effectively, Abhinav notes. A major inflection point for the team arrived with the JEDI contract, a multi-billion-dollar U.S. government initiative for cloud adoption that dramatically elevated expectations around scale, reliability, and compliance. Cloud providers across the industry were competing intensely for the award. During that period, we had one clear objective: make the Azure platform as compliant and reliable as possible so we could position ourselves as the leading cloud provider for enterprises and the government. To achieve this, he led several key initiatives to strengthen the platform's security and compliance posture. He introduced automated secret rotation mechanisms

Starting point is 00:05:12 across a fleet of tens of thousands of machines, significantly improving operational security. He also spearheaded privacy efforts focused on telemetry obfuscation, ensuring that no sensitive information was logged and that all data was consistently classified and accessed at the appropriate security level. In addition, he built tooling with robust role-based access controls to support just-in-time (JIT) access for node remediation, further tightening the platform's auditability and security model. There were many requirements in place for JEDI, many of them foundational, which would live on for quite a while. That was a busy time, very busy but productive. In between all of this, there were modernization efforts ongoing to move workloads

Starting point is 00:05:55 from bare cloud compute to more cloud-managed Kubernetes services. It was truly a transformative time for the platform layer. Looking back: lessons from Azure's growth and career pivots. Through all these years, Abhinav notes that it was an incredible time to be at Microsoft. It was fascinating to see Azure as a platform evolve so dramatically from when I first joined. There was a saying in Azure back then: design your systems as if you'll be running 10x the traffic just a few months from now. Designing for hyperscale workloads forces you to think differently about everything: how you enforce tenancy isolation, how you approach user privacy and data classification,

Starting point is 00:06:33 how you design workflow checkpointing and restore mechanisms, and even how you build tooling to expand your service globally with consistent naming and infrastructure patterns. These are decisions you make up front, not years into the life of a service. Very few companies in the world give you the chance to operate at that scale, and Microsoft is one of the rare places that has both the ambition and the resources to build at a truly global scale. After his time at Microsoft, Abhinav moved to DocuSign, where he stepped into the world of document intelligence and AI-driven solutions designed to streamline complex legal processes. With a lot of this platform experience under my belt, I felt it was time to gain

Starting point is 00:07:12 experience with core ML systems, specifically in the NLP space. It always seemed like a fascinating world to me, being able to interact with documents and dive into problems like language detection, document layout parsing, metadata extraction, chat systems, and information retrieval using semantic similarity algorithms. I believe legal tech is the next frontier for AI applications, and it has tremendous growth potential in the years ahead. Onward and upward. About Abhinav Sharma. Abhinav is currently a staff ML engineer at DocuSign, where he focuses on advancing document intelligence capabilities to drive adoption of the Intelligent Agreement Management (IAM) platform. Since its launch in June 2024, the platform has processed hundreds of millions of

Starting point is 00:07:56 agreements, with Abhinav playing a key role in shaping the AI systems that enable this scale. Prior to DocuSign, Abhinav worked at Microsoft's headquarters in Redmond, WA, contributing to core cloud and infrastructure initiatives across Azure networking and Azure AI Search. His work supported both the foundational backbone of Microsoft Azure and the growth of Azure AI Search during the pre-LLM era. Abhinav holds master's degrees from the University of California, Berkeley and Columbia University, where he specialized in data science, information retrieval, machine learning, distributed systems, and operational excellence. He also earned a bachelor's degree in computer science from Manipal

Starting point is 00:08:36 University, building a strong foundation in core computer science principles and programming. Outside of work, Abhinav enjoys playing the guitar, traveling, exploring new places, and learning about aviation systems. This story was distributed as a release by Sanya Kapoor under HackerNoon's Business Blogging Program. Thank you for listening to this HackerNoon story, read by artificial intelligence. Visit hackernoon.com to read, write, learn and publish.
