How to build reproducible Machine Learning pipelines with MLOps


MLOps is a set of practices that combines Machine Learning, DevOps, and Data Engineering, and aims to deploy and maintain ML systems in production reliably and efficiently.

Fig 1. The MLOps lifecycle loop (source: https://ml-ops.org/img/mlops-loop-en.jpg)

The deployment of machine learning models is the process of making models available in production to meet intended business goals. It is the last and most challenging stage in the machine learning lifecycle. A machine learning architecture in production requires multiple components to work together, such as infrastructure, applications, data, documentation, and configuration.

You will also have to remember that you’re putting a software application into production, which means you’ll have all the requirements that any production software has, including:

  • Scalability: how the solution behaves under increased workload
  • Consistency: repeatability and reliability, i.e., the ability to reproduce results and be resilient to errors
  • Maintainability: reusability and modularity
  • Flexibility: the ability to adapt to change
  • Reproducibility: the requirement specific to data science
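Reproducibility can be illustrated at the smallest scale: pinning random seeds so that repeated runs produce identical results. The following is a minimal sketch, with arbitrary example data and seed values:

```python
import random

def train_split(data, ratio=0.8, seed=42):
    """Shuffle and split deterministically: the same seed always
    yields the same partition of the same data."""
    rng = random.Random(seed)   # local RNG, no hidden global state
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))
run_a = train_split(records)
run_b = train_split(records)
assert run_a == run_b  # identical across runs: reproducible
```

The same discipline extends upward: every source of nondeterminism in a pipeline (shuffling, sampling, weight initialization) needs an explicit, recorded seed before runs can be compared.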

Architectural best practices are important: building a working pipeline without them is easy, but over time the maintenance, model updates, and redeployments will eventually fail.

Fig 2. Steps of MLOps Lifecycle Management

 

Where Data Science meets Engineering

  • The journey from research to production is essentially about deploying a pipeline, not just a model
  • The components from research, namely feature engineering and selection along with the model itself, make the journey to production
  • All stages of the workflow should be automated; any manual intervention, such as SSH sessions or hand-run scripts, leaves scope for error
  • Docker containers are the usual choice as the primary unit of deployment
  • An orchestrator and, optionally, a model lifecycle management setup are needed
  • The choice of PaaS (Platform as a Service) and IaaS (Infrastructure as a Service) is split across different cloud providers/platforms

Challenges in ML Deployment and Model Lifecycle Management

As Machine learning models get embedded in software products and services, the best practices and tools employed with software delivery also apply to ML deployment, thereby minimizing technical debt while employing best practices to test, deploy, manage and monitor ML models.

Traditional DevOps allows developers to abstract away accidental complexity and concentrate on the actual problem, using tools, automation, and workflows. But we can't simply keep doing the same thing for ML, because ML is not just code: it is code plus data. An ML model, the artifact that you end up putting in production, is created by applying an algorithm to a mass of training data, which affects the behavior of the model in production. Crucially, the model's behavior also depends on the input data that it will receive at prediction time, which you can't know in advance.
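The "code plus data" point can be made concrete with a toy example. Here the "model" is deliberately trivial, just the mean of the training data, but it shows that the deployed artifact is determined by the data as much as by the code:

```python
import statistics

def train(training_data):
    """The 'model' here is simply the training mean: the artifact
    is a function of the data, not only of the code."""
    mean = statistics.mean(training_data)
    return lambda x: mean  # predicts the training mean for any input

# Identical code, different training data -> different model behavior.
model_2021 = train([10, 12, 14])
model_2022 = train([40, 42, 44])
print(model_2021(0))  # 12
print(model_2022(0))  # 42
```

Versioning the code alone would record no difference between these two deployments, which is exactly why ML systems need data versioning as well.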

It is of course possible that a single person might be good enough at all of these roles, and in that case we could call that person a full MLOps Engineer. But the more likely scenario right now is that a successful team includes a Data Scientist or ML Engineer, a DevOps Engineer, and a Data Engineer.

The Challenge

  • Coordination is needed between Data Scientists, IT teams, software developers, and the business

Fig 3. Interaction of stakeholders in Machine Learning Lifecycle

 

Traditional DevOps employs rapid delivery cycles measured in minutes, but its application to ML differs in a fundamental way, because an ML system consists of code plus data. The main challenges include:

  • System complexity, involving a large spectrum of skills
  • The need for reproducibility (versioning everywhere)
  • Configuration management: model hyperparameters, requirements, and data sources can all be changed via configuration
  • Data dependencies, i.e., data sources that can change suddenly
  • Unit and integration testing of input-feature code and model-specification code
  • A/B testing / canary releases to a limited audience, and blue-green deployments
  • Model quality validation before serving
  • Model monitoring
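The configuration-management challenge in the list above is usually addressed by moving every tunable out of code. A minimal sketch, in which the field names, file name, and data-source path are all invented for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class TrainConfig:
    data_source: str       # where the training data comes from
    learning_rate: float   # a model hyperparameter
    n_epochs: int

def load_config(path):
    """All tunables come from configuration, not from code, so
    changing a data source or hyperparameter is a config change
    plus a redeploy, never a code change."""
    with open(path) as f:
        return TrainConfig(**json.load(f))

# An example config a pipeline run might consume (paths are made up):
example = {"data_source": "s3://bucket/train.csv",
           "learning_rate": 0.01, "n_epochs": 10}
with open("train_config.json", "w") as f:
    json.dump(example, f)

cfg = load_config("train_config.json")
print(cfg.learning_rate)  # 0.01
```

Because the config file is itself a versioned artifact, each run can record exactly which configuration produced it.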

Reproducibility in Machine Learning Pipelines

In traditional software, system behavior can be effectively captured by source code versioning, since code defines all behavior. In ML there are further aspects that must be tracked to capture system behavior: the model version, the data on which the model was trained, and meta-information such as training hyperparameters. Models and metadata can be tracked in a standard version control system like Git, but data is often too large and mutable for that to be efficient and practical.

It’s necessary to version data and tie each trained model to the exact versions of code, data, and hyperparameters that were used. The ideal solution would be a purpose-built tool, but so far there is no clear consensus in the market and many schemes are used, most based on file/object storage conventions and metadata databases.
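One common scheme of this kind, a metadata record keyed by content hashes, can be sketched as follows. The field names and example values here are illustrative, not a standard:

```python
import hashlib
import json

def content_hash(payload: bytes) -> str:
    """Content-addressed version id: identical bytes -> identical id."""
    return hashlib.sha256(payload).hexdigest()[:12]

def model_record(code_version, data_bytes, hyperparams, model_bytes):
    """Tie a trained model to the exact code, data, and
    hyperparameters that produced it."""
    return {
        "code_version": code_version,             # e.g. a git commit sha
        "data_version": content_hash(data_bytes),
        "hyperparams": hyperparams,
        "model_version": content_hash(model_bytes),
    }

rec = model_record(
    code_version="3fa9c21",                       # illustrative sha
    data_bytes=b"age,income\n34,72000\n",
    hyperparams={"learning_rate": 0.01, "max_depth": 6},
    model_bytes=b"<serialized model>",
)
print(json.dumps(rec, indent=2))
```

Such records are typically kept in a metadata database, while the large data and model payloads themselves live in object storage under their content hashes.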

The challenge to reproducibility is that while code is crafted in a controlled environment, real-world data comes from a source of entropy, and there are bound to be inconsistencies. The task of an ML process is to create a bridge between these two planes in a controlled way.
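One concrete form that bridge takes is validating incoming data against an expected schema before it reaches training. A sketch, with a made-up schema and rows:

```python
def validate_rows(rows, schema):
    """Reject rows that violate the expected schema before they
    reach training. `schema` maps column -> (type, predicate)."""
    good, bad = [], []
    for row in rows:
        ok = all(
            col in row
            and isinstance(row[col], typ)
            and pred(row[col])
            for col, (typ, pred) in schema.items()
        )
        (good if ok else bad).append(row)
    return good, bad

schema = {
    "age": (int, lambda v: 0 <= v <= 120),
    "income": (float, lambda v: v >= 0.0),
}
rows = [
    {"age": 34, "income": 72000.0},   # valid
    {"age": -5, "income": 50000.0},   # out of range
    {"age": 40},                      # missing column
]
good, bad = validate_rows(rows, schema)
print(len(good), len(bad))  # 1 2
```

The rejected rows are worth logging rather than silently dropping, since a sudden spike in rejections is often the first sign that an upstream data source has changed.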

MLOps in building ML pipelines

An ML pipeline consists of a series of stages. Since ML models need data transformation, a data pipeline creates a series of repeatable transformation nodes for data pre-processing, aggregations, and so on. This is essentially a data pipeline with data engineering steps, where a series of transformations is applied between the data source and its consumers. Many tools help manage the execution of these pipelines. This approach promotes code reuse, runtime visibility, management, and scalability.
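The "series of repeatable transformation nodes" idea can be sketched in a few lines; the stage names and transformations below are invented for illustration:

```python
def make_pipeline(*stages):
    """A pipeline is an ordered sequence of named, repeatable
    transformations applied to the data one after another."""
    def run(data):
        for name, fn in stages:
            data = fn(data)
            print(f"stage {name}: {len(data)} records")  # runtime visibility
        return data
    return run

# Illustrative stages: a cleaning step followed by a transformation step.
drop_nulls = ("drop_nulls", lambda rows: [r for r in rows if r is not None])
square = ("square", lambda rows: [r * r for r in rows])

pipeline = make_pipeline(drop_nulls, square)
print(pipeline([1, None, 2, 3]))  # [1, 4, 9]
```

Real pipeline tools add scheduling, retries, and distributed execution on top, but the underlying contract, named stages composed in order over versioned data, is the same.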

  1. Since ML training can itself be seen as a series of data transformations, ML stages can be added to a data pipeline to turn it into an ML pipeline. Most models need two versions of the pipeline: one for training and one for scoring/serving. The data pre-processing and feature engineering stages migrate from research to development to production for both training and scoring/serving.
  2. Model development/training is essentially experiment-driven, and tracking these experiments requires specific tooling to track the models, data, and hyperparameters.
  3. The above two steps are essential in building reproducible pipelines across environments, i.e., from research to production.
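Sharing the pre-processing stages between the training and serving variants of the pipeline can be sketched like this, with made-up numbers and a single min-max stage standing in for real feature engineering:

```python
def normalize(values, stats=None):
    """Shared pre-processing stage. At training time the stats are
    computed from the data; at serving time the *saved* training
    stats must be reused, or training and serving will skew apart."""
    if stats is None:
        stats = {"min": min(values), "max": max(values)}
    span = (stats["max"] - stats["min"]) or 1
    return [(v - stats["min"]) / span for v in values], stats

# Training pipeline: fit pre-processing, keep the stats artifact.
train_raw = [10, 20, 30]
train_features, fitted_stats = normalize(train_raw)

# Serving pipeline: same code path, reusing the fitted stats.
serve_features, _ = normalize([25], stats=fitted_stats)
print(train_features)  # [0.0, 0.5, 1.0]
print(serve_features)  # [0.75]
```

Recomputing the stats at serving time instead of reusing `fitted_stats` is the classic training/serving skew bug, which is why the fitted stage, not just its code, must travel to production with the model.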

 

Documenting such an incredibly fragmented problem space in its entirety, in terms of both approach and tooling, is not possible, and it is very hard to generalize. Approaches range from managed offerings on all the major cloud providers to entire stacks put together with open-source tooling. By now, though, we have traced the contours of the landscape well enough to develop a brief understanding of an MLOps roadmap.

Gope Biswas
AI/ML Deployment & Implementation Lead - Genpact
