Expert Data Engineer: his journey to IT and an exciting cloud project

10.11.22, Thu, 09:00, MSK

Vivekkumar Muthukrishnan is a highly skilled and accomplished Data Engineer with a specialized focus on architecting and implementing robust Big Data systems, particularly in hybrid and cloud-based environments.

Over the course of his ten-year career, Vivekkumar has demonstrated his expertise by designing and implementing high-impact systems for leading companies across a diverse range of industries. His proficiency lies in crafting scalable and efficient solutions that meet the complex data processing needs of modern enterprises. With a track record of delivering innovative, high-visibility projects, Vivekkumar brings a wealth of experience and technical prowess to the realm of Big Data engineering.


Vivekkumar, could you share the journey that led you to a career as an IT expert?

My path into the IT field began with a strong aptitude for mathematics during my school years, coupled with a natural curiosity for computers. Early on, I discovered my knack for programming and found joy in assisting fellow students with their coding challenges in college. Graduating from Anna University with a Bachelor of Technology (B.Tech.) degree in Information Technology laid a solid foundation for my technical journey. As my academic years progressed, I delved into learning and implementing cutting-edge technologies. I made a conscious effort to stay ahead of the curve, consistently integrating these technologies to solve complex business problems effectively. What fuels my passion for IT is the dynamic nature of computer technologies and their continuous evolution. Witnessing how individuals unite to achieve common goals through programming is a source of great satisfaction for me. Overall, my love for technology and its transformative potential has been the driving force behind my fulfilling career in the IT domain.

You have had the opportunity to work with renowned companies such as Samsung, Hewlett Packard, Akamai, and currently with Shopify. Among these experiences, which company stands out as the most exciting to be a part of?

While I hold immense respect for all the companies I've been associated with, I find Shopify, the Canadian e-commerce provider, to be particularly thrilling. Shopify has evolved into a leading and globally recognized solution for businesses venturing into the world of online retail. As a comprehensive platform, it offers a suite of tools and services meticulously designed to simplify the complexities of creating, managing, and scaling an e-commerce presence.

What sets Shopify apart is its ability to cater to businesses of all sizes, from startups to large enterprises, providing customizable website templates, a secure payment processing system, and a plethora of applications and plugins to enhance functionality. The user-friendly interface empowers merchants, even those without extensive technical knowledge, to build and manage their online storefronts seamlessly. Shopify's extensive ecosystem supports diverse needs, spanning from inventory management and order processing to comprehensive marketing and analytics tools.

Renowned for its commitment to user experience, flexibility, and top-notch customer support, Shopify has become the preferred choice for entrepreneurs and businesses navigating the dynamic landscape of e-commerce. My time at Shopify also exposed me to the intricate challenges associated with handling large volumes of data, including issues related to distributed data, complex ETL processes, replication challenges, and metric aggregation latency. This dynamic environment has been both challenging and rewarding, contributing significantly to my professional growth.

Could you elaborate on the origins of the project and the specific challenges it aimed to address?

The inception of the project was sparked by a crucial observation — traditional batch applications were taking an extensive 12 to 24 hours to furnish merchants with insights into their product sales. Recognizing the need for a more agile solution, our objective became clear: reduce the turnaround time to a 5-minute Service Level Agreement (SLA). This meant that if a merchant made a sale on their Shopify-powered website, they would swiftly access pertinent insights on their dashboard within this short timeframe.

The project aligned with a company-wide initiative to reduce the latency between data generation and its presentation on the dashboard, which allowed our work to contribute directly to that broader goal.

One of the pivotal challenges we faced was the sheer scale of the data: an influx of over 5 million records per minute. The sustained growth in both pace and size resulted in an aggregated dataset exceeding 2 terabytes. This volume underscored the complexity of our mission, pushing us to innovate and implement robust solutions to meet the demanding requirements of real-time data processing and analytics.
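To put that rate in perspective, here is a back-of-the-envelope calculation. It is illustrative only: it uses just the two figures quoted in this interview (5 million records per minute and the 5-minute SLA) and assumes, for simplicity, that the rate is constant. The variable names are made up for this sketch.

```python
# Figures quoted in the interview; everything derived from them is an estimate.
RECORDS_PER_MINUTE = 5_000_000  # "over 5 million records per minute"
SLA_MINUTES = 5                 # dashboard freshness target

# Assumption: a steady rate, which real traffic will not follow exactly.
records_per_second = RECORDS_PER_MINUTE / 60
records_per_sla_window = RECORDS_PER_MINUTE * SLA_MINUTES
records_per_day = RECORDS_PER_MINUTE * 60 * 24

print(f"{records_per_second:,.0f} records per second")
print(f"{records_per_sla_window:,} records within one SLA window")
print(f"{records_per_day:,} records per day")
```

Under these assumptions the pipeline has to absorb roughly 83,000 records every second, and about 25 million records arrive within a single 5-minute SLA window.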

Can you elaborate on your position and your pivotal role in the project, and how it unfolded overall?

I served as Senior Data Developer, Senior Site Reliability Engineer (SRE), and DataOps Engineer in a two-person team, working alongside a staff/principal engineer. My central responsibility was designing a real-time analytics streaming pipeline using Apache Flink, a critical component in meeting the objectives of the project.

My engagement extended to gathering intricate requirements from data scientists, with a focus on delivering actionable insights such as products purchased and customer lifetime value. This involved not only conceptualizing the solution but also implementing the essential infrastructure, utilizing Kubernetes and Terraform templates to orchestrate a seamless operational environment.
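For readers unfamiliar with stream processing, the idea behind a metric such as "products purchased" can be sketched as a tumbling-window aggregation: events are grouped into fixed, non-overlapping time windows and summed per key. The snippet below is a minimal plain-Python illustration of that general technique, not the Flink pipeline described in the interview. The `Sale` record, its field names, and the 5-minute window size are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60  # assumed window size, chosen to mirror the 5-minute SLA


@dataclass(frozen=True)
class Sale:
    """Hypothetical sale event; the real schema is not described in the interview."""
    merchant_id: str
    product_id: str
    quantity: int
    event_time: int  # seconds since some epoch


def window_start(event_time: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % size)


def products_purchased(sales, size: int = WINDOW_SECONDS):
    """Sum quantities per (window start, merchant, product)."""
    totals = defaultdict(int)
    for sale in sales:
        key = (window_start(sale.event_time, size), sale.merchant_id, sale.product_id)
        totals[key] += sale.quantity
    return dict(totals)


sales = [
    Sale("m1", "shirt", 2, 10),
    Sale("m1", "shirt", 1, 299),  # still inside the first window [0, 300)
    Sale("m1", "shirt", 4, 300),  # first event of the next window [300, 600)
    Sale("m2", "mug", 1, 45),
]
result = products_purchased(sales)
for key in sorted(result):
    print(key, result[key])
```

A production stream processor adds what this sketch leaves out, such as handling late or out-of-order events, checkpointing state, and distributing the work across machines.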

A key facet of my role involved fine-tuning the streaming application to optimize memory usage and processing efficiency, ensuring a judicious approach that minimized cloud expenditure. Rigorous monitoring of the application was imperative, enabling swift detection and resolution of potential failures to uphold minimal downtime, thereby guaranteeing uninterrupted value delivery to merchants on a 24/7 basis.

Navigating the intricate web of cross-functional teams, I collaborated extensively to achieve project goals, particularly in the context of complex systems like Kafka, Kubernetes, and Google Dataproc. Effective communication with stakeholders played a pivotal role in aligning all aspects of the project, offering transparency and fostering a cohesive approach to its successful execution.

Can you provide insights into the duration of the project and its most impactful outcomes?

The project spanned almost 2 years, during which both merchants and customers experienced tangible improvements. Notably, there was a discernible uptick in sales, with users gaining rapid access to insightful analytics. This accessibility empowered merchants to make informed decisions about their inventory, thereby enhancing the overall customer experience by ensuring that desired products were readily available.

The quantitative results were equally impressive on a company-wide scale. The Gross Merchandise Volume witnessed a notable surge of 17%, indicative of heightened commercial activity. Merchant retention exhibited a robust increase of 28%, underscoring the effectiveness of the project in fostering enduring partnerships. Moreover, the overall revenue registered a commendable growth of 11%, a testament to the project's positive impact on the company's financial performance. Additionally, there was a noteworthy reduction in the Churn Rate, signaling enhanced customer loyalty. It's worth noting that these figures reflect the performance of the company as a whole. Our project, situated at the core of the customer data platform, played an integral role in these results through the data-driven insights and optimizations it enabled.

What innovative aspects characterized the project, and how has its success impacted your personal and professional trajectory?

The project distinguished itself by harnessing the latest cutting-edge technologies to craft a product aimed at elevating key metrics, as outlined earlier. What set this endeavor apart was our pioneering spirit — venturing into uncharted territory with minimal assistance from the open-source community, given the experimental nature of our pursuits. Amidst this, I rapidly acquired proficiency in Scala, navigating the intricacies of functional programming, a skill often deemed challenging even with years of experience, particularly in the context of stream processing use cases.

Beyond the technical nuances, the project significantly influenced the industry landscape. At a time when real-time stream processing was a mere buzzword, we not only implemented an application but set a trend in the market. Personally, this journey became a masterclass in functional programming, stream processing, and the art of transitioning from greenfield experimentation to robust production systems — a valuable skill set that extends far beyond the scope of this project.

The impact on my career trajectory has been profound. As the project unfolded, I delved into unexplored realms, learning and adapting to the challenges of real-time data engineering. This unique blend of experiences positioned me as an expert data engineer, a distinction in a field where individuals often specialize in either streaming or batch processing. By successfully navigating both domains, this project has rendered me a distinctive professional, contributing to my growth and market relevance.

Author: Kirill Dobronravov