How Cloud Can Transform the Realm of Analytics Pilots

by Kumar Singh, Research Director, Automation & Analytics, SAPinsider


Getting into the pilot seat

Pilots are a critical element in advanced analytics journeys. For those not familiar with the jargon, a pilot is essentially a test of an analytics use case in a scaled-down, siloed environment. As organizations evolve in their analytics journey, they will go through several of these pilots. In fact, I believe that for an organization to build true intelligent enterprise capabilities by the end of this decade, it will have to run hundreds of pilots. Remember, experimentation is critical to innovation. Now, from a supply chain and operations perspective, a pilot has two aspects. The first is testing the algorithm, outside of its IDE, with real-world data; the second is leveraging its recommendations to make changes to business processes. This article focuses on the first, which we will call an algorithm pilot from this point onward.

If you have spearheaded projects that run these algorithm pilots, and helped design pilot environments and architectures, you know that it is a messy process. An algorithm gets built, trained, tested, and validated in a siloed environment, and most of the data it uses is fed in as piecemeal data files. Sometimes quick-and-dirty data lakes are created to support the development. But the real challenge comes into play when you test the algorithm “outside the lab.” You want to see how it behaves when it is fed the data pipelines that it will interface with and consume in the real world. THAT is the true definition of an algorithm pilot. I see many professionals confuse the first stage, developing an algorithm and testing and validating it, with a successful algorithm pilot. It is not! An algorithm pilot is essentially putting your algorithm out in the field.

But the view from the pilot’s seat is not always beautiful

Taking the algorithm outside the lab and “simulating” a test environment among organizational systems is not an easy process: IT resources must be requested in advance, approvals secured, and so on. And if you have done this before, you know that several chinks start showing up in the model as soon as it gets out of the lab and starts “tasting” real-world data. Many tweaks are made, and essentially, a significant portion of the model development process happens all over again!

And hence, organizations should create a standardized test environment in the cloud: a cloud-based environment, created within the organization’s cloud infrastructure, dedicated exclusively to running pilots.

A standardized, plug and play environment dedicated to pilot algorithm development

The concept is not radical in terms of the infrastructure or technology involved. The only prerequisite is that you leverage Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), and the majority of large organizations already have the key components required in the cloud. This is where the beauty of a cloud infrastructure comes into play.

If you start thinking about the key components that you need to run a robust algorithm pilot, the first five aspects that may come to your mind are:

  • Ease of creating an architecture for the pilot
  • Connectivity to real world organizational data systems for robust “real world” testing
  • Interdependency of the use case (pilot) on other use cases (pilots)
  • Ability to scale up fast
  • Ease of transition to production

And this is where the Infrastructure as a Service (IaaS) component of cloud computing comes into play. We will get into an overview of how this can be done later in this article. The first step is to create an environment that has access to all the real-world datasets. You can compartmentalize this “pilot environment” so that several pilots can run simultaneously.

But access to all data points is not the only benefit of having this dedicated pilot environment. To illustrate the other aspects, I will use an example. Since my hands-on data science and analytics experience is in the area of supply chain and distribution, the example comes from that area.

Understanding with an example: Warehouse Analytics

Suppose that you are on the path toward building an intelligent warehousing and distribution operation, and a key aspect of the “intelligence” you envision is powered by analytics. As a first step, you identify use cases, as shown below, where you think advanced analytics can add significant value. Let us review them briefly, as we will need that information later.

Algorithm Pilot 1: Simulation models are frequently used by industrial engineers for designing warehouse flows, but their usage is primarily prescriptive. However, if you have designed a simulation model accordingly, you can train a neural network on thousands or millions of simulation runs to learn, for every possible inbound and outbound flow scenario, the best way to operate the warehouse. Note that two key inputs the simulation model will use (along with many others) are the known inbound and outbound flow/demand patterns.

So if your warehouse is suddenly overflowing, the deep learning algorithm monitoring the data will flag that, at a certain point, your flows, schedule, labor assignment, and so on need to be modified to keep the warehouse from overflowing and being disrupted. All the tools and technology needed to develop something like this already exist, so there is nothing futuristic about this pilot.
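To make the idea concrete, here is a minimal sketch of the data-generation half of such a pilot. The `simulate` function is a toy stand-in for a real discrete-event warehouse simulation, and the policy names are hypothetical; in practice a neural network would be trained on the labeled scenarios this produces, rather than a lookup being used directly.

```python
import random

def simulate(inbound_rate, outbound_rate, policy):
    """Toy stand-in for a discrete-event warehouse simulation.
    Returns a throughput score for running `policy` under the given
    inbound/outbound flow rates (hypothetical scoring model)."""
    # Each policy trades dock labor against picking labor.
    dock_share = {"dock_heavy": 0.7, "balanced": 0.5, "pick_heavy": 0.3}[policy]
    handled_in = min(inbound_rate, dock_share * 100)
    handled_out = min(outbound_rate, (1 - dock_share) * 100)
    return handled_in + handled_out

POLICIES = ["dock_heavy", "balanced", "pick_heavy"]

def build_training_set(n_scenarios=1000, seed=42):
    """Run the simulator across many flow scenarios and label each with
    the best-performing policy -- this labeled set is what the neural
    network described above would be trained on."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_scenarios):
        inbound = rng.uniform(0, 100)
        outbound = rng.uniform(0, 100)
        best = max(POLICIES, key=lambda p: simulate(inbound, outbound, p))
        rows.append(((inbound, outbound), best))
    return rows

training = build_training_set()
```

The point of the sketch is the workflow, not the toy scoring function: thousands (or millions) of simulation runs become supervised training examples mapping flow scenarios to operating decisions.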



Algorithm Pilot 2: As a logistics professional, you are probably already thinking about the importance of managing your dock flows optimally, and the impact any type of mayhem on your docks can have on an algorithm like Pilot 1’s. For example, if your dock is overflowing, the Pilot 1 algorithm, in a silo, will suggest a replan that does not know how the docks will manage the overflow over the next few hours. Without that information, if it assumes that the docks will function at their normal rate, or at the overflow-state rate, the new plan will be worthless.

And hence, optimizing the docks in tandem is necessary. Dock schedule optimization algorithms will typically be heuristics with multiple optimization algorithms embedded, and they can be implemented with open source tools.
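As an illustration of the kind of heuristic meant here, the sketch below greedily assigns arriving trucks to whichever dock frees up earliest. The data shapes and the greedy rule are my own simplification, not a production dock optimizer, but it shows the heuristic skeleton that more sophisticated embedded optimizers would refine.

```python
import heapq

def schedule_docks(trucks, n_docks):
    """Greedy dock-assignment heuristic (illustrative): assign each
    truck, in arrival order, to the dock that frees up earliest.
    `trucks` is a list of (arrival_time, service_minutes) tuples."""
    # Min-heap of (time_dock_becomes_free, dock_id).
    docks = [(0, d) for d in range(n_docks)]
    heapq.heapify(docks)
    plan = []
    for arrival, service in sorted(trucks):
        free_at, dock = heapq.heappop(docks)
        start = max(arrival, free_at)
        plan.append({"dock": dock, "start": start,
                     "end": start + service, "wait": start - arrival})
        heapq.heappush(docks, (start + service, dock))
    return plan

# Four inbound trucks competing for two docks.
plan = schedule_docks([(0, 30), (5, 20), (10, 40), (12, 15)], n_docks=2)
```

A real implementation would layer additional objectives (labor, trailer priority, overflow state) on top of this skeleton, which is exactly where the embedded optimization algorithms come in.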



Pilot 3: But you are a smart warehouse manager, and therefore you are already thinking about the challenges of optimizing your dock without robust visibility and control into your yard. This is where the third pilot comes into the picture. It is relatively simpler, since many smart Yard Management Solutions (YMS) exist. What you need is a “linking algorithm”: an algorithm that taps into the smart YMS data, culls the relevant data, and translates it into the information the dock optimization heuristic needs.
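A linking algorithm of this kind can be very small. The sketch below is hypothetical: the YMS record fields, the 3-minutes-per-pallet unload estimate, and the output shape are all assumptions chosen to feed a dock heuristic that expects (arrival_time, service_minutes) pairs.

```python
def link_yms_to_dock_optimizer(yms_records):
    """Hypothetical 'linking algorithm': cull the fields the dock
    optimization heuristic needs from raw smart-YMS trailer records
    and translate them into (arrival_time, service_minutes) pairs."""
    trucks = []
    for rec in yms_records:
        if rec.get("status") != "in_yard":
            continue  # only trailers actually waiting in the yard matter
        # Estimate unload time from pallet count (assumed 3 min/pallet).
        service = rec["pallets"] * 3
        trucks.append((rec["checkin_minute"], service))
    return trucks

yms = [
    {"trailer": "T1", "status": "in_yard", "checkin_minute": 0, "pallets": 10},
    {"trailer": "T2", "status": "departed", "checkin_minute": 2, "pallets": 8},
    {"trailer": "T3", "status": "in_yard", "checkin_minute": 7, "pallets": 5},
]
trucks = link_yms_to_dock_optimizer(yms)
```

The value is in the translation layer itself: the YMS keeps its own schema, the dock optimizer keeps its own inputs, and the linking algorithm is the only piece that knows both.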



Pilot 4: And then, since you are strategic, you know that eventually, with all the fancy analytics you plan to run, you will uncover a need to redesign the warehouse layout and flows. So you have planned a pilot around this aspect as well.


Now, there can be a few more pilots in the warehouse analytics domain (and hopefully, the four examples above have already started you thinking in that direction), but for the sake of this example, let us say that you have finalized these four pilots. But what was THE one theme you noticed as we moved from one pilot to the other? You noticed that the pilots are interdependent: each one’s output is an input that another one needs in order to produce a realistic plan.
And imagine the chicken-or-egg situation here. If you optimize in a silo, the result is never truly optimal. But creating one massive pilot of an end-to-end analytics platform also does not make sense and may not be realistic (unless you are developing an off-the-shelf solution). So what do we generally do? We go with the lesser evil: we build siloed pilots and build the business case on their results. And the dollar savings in the real world may not match what the siloed pilot suggested, because of the interdependencies.

And this is where the “pilot in the cloud” solution can help. An environment where all the pilots have access to the same data, and write their results to the same data pool to be consumed by the other pilot algorithms, will mimic the true end-to-end behavior.
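The shared-data-pool pattern can be sketched in a few lines. This is a deliberately minimal, in-memory stand-in for what would really be a shared cloud data store; the class and topic names are my own invention, used only to show pilots publishing results that other pilots consume.

```python
class PilotDataPool:
    """Minimal sketch of a shared data pool: each pilot writes its
    output under a named topic, and downstream pilots read the latest
    entry, so siloed algorithms start to see each other's results."""

    def __init__(self):
        self._topics = {}

    def publish(self, topic, payload):
        """Append a pilot's output under `topic`."""
        self._topics.setdefault(topic, []).append(payload)

    def latest(self, topic):
        """Return the most recent payload for `topic`, or None."""
        entries = self._topics.get(topic, [])
        return entries[-1] if entries else None

pool = PilotDataPool()
# Pilot 3 (yard) publishes trailer arrivals; Pilot 2 (docks) consumes them.
pool.publish("yard/trailers", [("T1", 0), ("T3", 7)])
dock_input = pool.latest("yard/trailers")
```

In the cloud setup described below, the same role would be played by a shared data store fed by the data hub, but the contract is identical: one pool, many pilots, each reading the others’ outputs.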

How the setup will work, at a high level

To simplify, here is what you basically need to do (remember, as mentioned earlier, that a key prerequisite is a robust IaaS and PaaS solution, which all leading cloud platforms provide).

Create a master development environment in the cloud. (I have heard from many developers that they believe cloud advantages are less pronounced in software development than in business applications. I disagree! Cloud-based IDEs like AWS Cloud9, code-server, and Gitpod have been around for some time now, and I have seen more sophisticated offerings emerge recently, like Koding, which, by the way, is open source and free and will work on any of your favorite hyperscalers, such as AWS, Azure, and Google Cloud.) Next, configure a data source location (for example, an AWS EC2 instance) that connects to your overall data hub as a node, and then have this source communicate with something like AWS Cloud9.

The description above purposely leaves out the granular details of setting this up, but the key point is that you can use your data hub to push the data your model needs into the data pool you created (like the EC2 example). Remember, when you set up something like this, to create a standardized development environment that is familiar to your IT Ops and support folks as well (in case you need them). You can further standardize this pilot environment by providing application frameworks, code samples, and development tools.

One key aspect to keep in mind, though, is that your application and data architecture must be designed for interoperability and multi-cloud flexibility.

Now, there are many additional benefits, but most of them come with the cloud in general, so I will not dig deeper into them here.


What does this mean for SAPinsiders?

To summarize, as indicated in the illustration, having your pilot IDEs in the cloud, all tapping into the same source of near-real-time, real-world data, provides the following benefits:

  • Seamless connectivity to all data points
  • Rapid testing of information exchange among modules, as close to the production environment as possible
  • Faster scaling during testing
  • Ease of transition to production
  • Seamless connection to other tools to create platforms
  • Minimal additional effort if you are looking to offer this as a SaaS product (though the architectural setup will need to be somewhat more involved in that case)


Kumar Singh, Research Director, Automation & Analytics, SAPinsider, can be reached at