Cognizant 20-20 Insights
Digital Business
Accelerating Machine Learning as a Service with Automated Feature Engineering
Building scalable machine learning as a service, or MLaaS, is critical to enterprise success. The key to translating machine learning project success into program success is solving the evolving, convoluted data engineering challenge using local and global data. Enabling the sharing of data features across a multitude of models, within and across various lines of business, is pivotal to program success.
Executive Summary

The success of machine-learning (ML) algorithms in a broad range of areas has led to ever-increasing demand for their wider and more complex application, a proliferation of new automated ML platforms and solutions, and increasingly flexible use of these techniques by nonexperts.¹ Most enterprises began their ML journey with projects of simpler analytical complexity because they were primarily focused on the maturity of their data infrastructure, ML model development process and deployment ecosystem.
October 2019
According to a recent O'Reilly study,²,³,⁴ roughly 50% of enterprise respondents said they were in the early stages of exploring ML, whereas the rest had moderate or extensive experience deploying ML models into production.

Enterprises, irrespective of their maturity, are currently focused on managing data pipelines and evaluating or developing ML platforms. But as they ascend the maturity curve, they need to solve the problem of the labyrinth of model-related data pipelines. Creating and managing these pipelines is labor-intensive and, over time, introduces data complexities and related operational risks.

ML is core to the success of digitally native businesses such as Uber and LinkedIn, which use it to create new products and redefine customer experience standards at a global scale. Certain aspects of their ML architecture can be deftly adopted by digital immigrant enterprises as they seek to mature their use of artificial intelligence (AI).

A feature store is a central repository of features (basically any input into an ML model) organized as a marketplace. Creating one enables producers such as ML engineers (who create and populate new features) to share them with consumers such as data scientists (who build ML models). This will substantially reduce go-to-market (GTM) time, while also enabling data lineage and bringing governance into the data pipeline labyrinth. For enterprises to mature in ML, a focus on setting up a feature store will be as essential as adopting AutoML frameworks, model monitoring and model visualization, which was also the outcome noted by the recent O'Reilly survey.

This white paper explains why enterprises need a fully functional feature store in their ML maturity journey. It also describes how to achieve one using an operating model that accelerates ML scale goals through automation, making ML algorithm features reusable, cost-effective and tangible. This is critical because our approach automates one of the most laborious activities in the model lifecycle: feature engineering.
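To make the producer/consumer idea concrete, the following is a minimal in-memory sketch of a feature store with a marketplace-style catalog. It is not the paper's implementation or any product's API. The classes `FeatureStore` and `FeatureDefinition`, their methods, and the example feature names are all hypothetical. Producers register a named feature once, along with its owner and source dataset for lineage. Consumers search the catalog and materialize features for their model. This way, the same feature code is reused across models instead of being rebuilt for each project.

```python
# Hypothetical sketch: FeatureStore, FeatureDefinition and all feature
# names below are illustrative, not the paper's implementation or any
# product's API.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class FeatureDefinition:
    """Catalog entry a producer publishes to the marketplace."""
    name: str
    owner: str        # producer, e.g. an ML engineer
    source: str       # upstream dataset, kept for lineage
    description: str
    compute: Callable[[Record], Any]
    consumers: List[str] = field(default_factory=list)  # models that use it


class FeatureStore:
    """Minimal in-memory central repository of reusable features."""

    def __init__(self) -> None:
        self._catalog: Dict[str, FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        # Each feature is published once and then shared, rather than
        # re-implemented inside every model that needs it.
        if feature.name in self._catalog:
            raise ValueError(f"feature {feature.name!r} already registered")
        self._catalog[feature.name] = feature

    def search(self, keyword: str) -> List[str]:
        # Consumers discover features by keyword in name or description.
        kw = keyword.lower()
        return sorted(
            f.name
            for f in self._catalog.values()
            if kw in f.name.lower() or kw in f.description.lower()
        )

    def materialize(self, model: str, names: List[str],
                    records: List[Record]) -> List[Record]:
        # Record which model consumes each feature (simple lineage),
        # then compute the requested features for every input record.
        for name in names:
            consumers = self._catalog[name].consumers
            if model not in consumers:
                consumers.append(model)
        return [{n: self._catalog[n].compute(r) for n in names} for r in records]

    def lineage(self, name: str) -> Dict[str, Any]:
        f = self._catalog[name]
        return {"owner": f.owner, "source": f.source, "consumers": list(f.consumers)}


# Usage: two producers publish features; two models reuse one of them.
store = FeatureStore()
store.register(FeatureDefinition(
    name="order_value_usd", owner="ml_engineer_a", source="orders",
    description="Order amount converted to USD",
    compute=lambda r: round(r["amount"] * r["fx_rate"], 2)))
store.register(FeatureDefinition(
    name="is_weekend_order", owner="ml_engineer_b", source="orders",
    description="True if the order was placed on Saturday or Sunday",
    compute=lambda r: r["weekday"] >= 5))  # weekday: Monday=0, Sunday=6

orders = [{"amount": 100.0, "fx_rate": 1.1, "weekday": 6},
          {"amount": 20.0, "fx_rate": 1.0, "weekday": 2}]

print(store.search("order"))  # ['is_weekend_order', 'order_value_usd']
rows = store.materialize("churn_model", ["order_value_usd", "is_weekend_order"], orders)
store.materialize("fraud_model", ["order_value_usd"], orders)
print(store.lineage("order_value_usd")["consumers"])  # ['churn_model', 'fraud_model']
```

Tracking consumers on each catalog entry is a deliberately simple stand-in for lineage. It shows at a glance which models depend on a feature before anyone changes it. A production store would add persistence, versioning and access control.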
2 / Accelerating Machine Learning as a Service with Automated Feature Engineering
The need for a centralized feature engineering ecosystem
ML is a powerful toolkit that enables businesses to strive for excellence, whether in new product development or in achieving operational efficiencies.⁵ However, ML initiatives entail the development of complex systems that behave differently than traditional IT systems.

In fact, ML systems contain inherent risks (e.g., complex data pipelines, unexplainable code) which, unless addressed properly, lead to high maintenance costs over the long run. Developing ML code is generally seen as labor-intensive and complex, whereas the other essential activities surrounding it are seen as less critical. That view is incorrect: data (functions such as quality, features, etc.) and resource management are equally important for building a successful ML infrastructure (see Figure 1).

The process of building and deploying an ML model goes beyond setting up the requisite infrastructure. ML projects typically take two to four months for idea validation and prototype development. That timeline often extends by several more months if prototypes are pushed into production. The cycle is repeated for each model rebuild iteration or new model development.

Figure 2 illustrates an ML project, depicting its various stages and related efforts. Processes requiring relatively less effort have been addressed by the deployment of ML platforms like Amazon SageMaker. However, key labor-intensive processes around data acquisition and processing are still repeated in each iteration of the model development exercise.
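The effort argument can be checked with simple arithmetic over the stage durations in the illustrative model lifecycle (Figure 2). The sketch below sums the minimum and maximum duration of each phase. It then computes the share taken by data acquisition and feature engineering. Several inputs are assumptions rather than figures from the paper. The pairing of durations to stages is read from the flattened figure, one month is treated as four weeks, and the short stage keys are hypothetical labels.

```python
# Illustrative arithmetic over the Figure 2 stage durations.
# Assumption: 1 month is treated as 4 weeks to put both phases in one unit.
from typing import Dict, Tuple

WEEKS_PER_MONTH = 4

Stages = Dict[str, Tuple[int, int]]  # stage -> (min_weeks, max_weeks)

DEVELOPMENT: Stages = {
    "environment setup": (2, 4),
    "data acquisition": (2, 6),
    "feature engineering": (4, 6),
    "model development": (1, 2),
    "deploy-ready": (1, 1),
}

PRODUCTION: Stages = {
    "environment setup": (1 * WEEKS_PER_MONTH, 3 * WEEKS_PER_MONTH),
    "data acquisition": (1 * WEEKS_PER_MONTH, 2 * WEEKS_PER_MONTH),
    "feature engineering": (2 * WEEKS_PER_MONTH, 3 * WEEKS_PER_MONTH),
    "model serving": (1 * WEEKS_PER_MONTH, 1 * WEEKS_PER_MONTH),
    "model monitoring": (1, 2),
}

DATA_STAGES = ("data acquisition", "feature engineering")


def total(stages: Stages) -> Tuple[int, int]:
    """Sum the minimum and maximum durations of all stages in a phase."""
    return (sum(lo for lo, _ in stages.values()),
            sum(hi for _, hi in stages.values()))


def data_share(stages: Stages) -> Tuple[float, float]:
    """Fraction of a phase spent on data acquisition plus feature
    engineering, at the low end and at the high end of every range."""
    lo_total, hi_total = total(stages)
    lo_data = sum(stages[s][0] for s in DATA_STAGES)
    hi_data = sum(stages[s][1] for s in DATA_STAGES)
    return lo_data / lo_total, hi_data / hi_total


for phase, stages in (("development", DEVELOPMENT), ("production", PRODUCTION)):
    lo, hi = total(stages)
    share_lo, share_hi = data_share(stages)
    print(f"{phase}: {lo}-{hi} weeks, data stages {share_lo:.0%} / {share_hi:.0%}")
# development: 10-19 weeks, data stages 60% / 63%
# production: 21-38 weeks, data stages 57% / 53%
```

Under these assumptions, the development phase totals roughly 10–19 weeks. That is broadly in line with the two-to-four-month prototype timeline cited in the text. Production adds roughly 21–38 weeks. Data acquisition and feature engineering account for about 53% to 63% of each phase at either end of the ranges. This illustrates why repeating these steps in every iteration is so costly.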
Figure 1: ML heat map depicting processes and related efforts⁶ (processes shown: Configuration, Data Collection, Feature Extraction, Data Verification, Machine Resource Management, ML Code, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring)

Figure 2: Illustrative model lifecycle
Development: Development Environment Setup (2–4 weeks) → Development Data Acquisition (2–6 weeks) → Development Data Feature Engineering (4–6 weeks) → Model Development (1–2 weeks) → Model Deploy-Ready (1 week)
Production: Production Environment Setup (1–3 months) → Production Data Acquisition (1–2 months) → Production Feature Engineering (2–3 months) → Model Serving (1 month) → Model Monitoring (1–2 weeks)
Model Rebuild: repeats the cycle

A day in the life of a data scientist (DS) consists of deriving insights and knowledge from data and developing models from it. (For more on this, read "Learning from the Day in the Life of a Data Scientist" in our Digitally Cognizant blog.) This requires data cleansing, transformation and feature extraction before a stitch of ML code is written. The process starts with data extraction in a modeling sandbox and moves on to hypothesis validation. It is followed by deployment of code that requires designing a fully fledged data pipeline. These activities happen primarily in isolation, which is typical of an experimentation phase.

Upon successful exploration, other key role players, such as ML engineers and an ML architect, must come up to speed and plan the necessary support activities. This results in a longer development lifecycle (see Figure 3).

Figure 3: Working solo
DATA SCIENTIST: Focused on code generation without much collaboration with architects and engineers.
ML ARCHITECT: Wondering what data/IT architecture changes are needed to support the code.
ML ENGINEER: Wondering what data pipeline reengineering is needed to support the code.

During model development, the data scientist will build common features as well as features that are specific to the model. Industry standard practice is to create extract, transform, load (ETL) pipelines for common features while generally bundling model-specific features within the model itself, which leads to the following situations: