
Industry: Retail & Ecommerce
  • Recommendation systems
  • Pricing models
  • Forecasting
  • Customer retention and churn analysis
  • Behavioral segmentation
  • Trend discovery
  • Deep learning
  • Data visualizations
  • Reports
  • Advanced R/Shiny and Dash dashboards
  • Scaling R/Shiny for thousands of users
  • Data acquisition
  • Data validation
  • Competition monitoring
  • Data enrichment with expert knowledge
  • Data governance

The strides in e-commerce represent an entire paradigm shift in retail. Despite making up only around 10% of all retail purchases, e-commerce accounts for more than $2 trillion in sales. Mapping the interactions between the offline and online worlds seems like an arduous task, but when we focus on each customer and their purchasing habits, it can be broken up into a few different paths. We’re going to take a look at a few surprising ways that data science can increase your sales, both offline and online.

Do you know who you are selling to? You already have quite a few different systems for gathering client information, but they are scattered, and you do not take them into consideration. You have member information from in-store purchases, but online purchases do not ask for this information. For example, grocery and big-box stores optimize separately for online and offline. This is how our marketing efforts lose value. If we don’t take a wider look at our data and search for insights, loss is inevitable.

Retail Sales Demo Dashboard


Amazon is a great example with its anticipatory shipping. This means that products are shipped before customers even order them. Past behavior and predicted purchases are used to plan shipment orders, transforming next-day shipping into next-hour shipping. Is that crazy? Of course it is! Products are being shipped without anyone to receive them, and it doesn’t even matter. Once these products are in a given area, they can be marketed at a discounted rate or kept at the final hub. This is more of a logistical miracle than an e-commerce one, but it shows that you have to be forward-thinking if you want to lead the future. I would wager that you are not Amazon, but with such a behemoth so far ahead, it makes sense to try your hand at staying competitive in your niche. It is certainly working for Amazon, which made over $2 billion in profits last year.

Think about it

But how does this actually work? There is some machine learning that goes into predicting client behavior. Machine learning takes in data to train a model. Training is the process of feeding data into a model so that it can adjust its statistical weights, allowing it to predict future purchases automatically. For example, John purchases a new book every two or three weeks. Based on this behavior, we know what to expect from him.
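
The John example can be sketched without any machine learning at all. The snippet below is a simple baseline, not a trained model: it assumes we only have a list of past purchase dates and predicts the next one from the average gap between them. The function name and the purchase history are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def predict_next_purchase(purchase_dates):
    """Estimate the next purchase date from the average gap between past purchases."""
    if len(purchase_dates) < 2:
        raise ValueError("need at least two purchases to measure a gap")
    ordered = sorted(purchase_dates)
    # Gaps, in days, between consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return ordered[-1] + timedelta(days=round(mean(gaps)))


# Hypothetical history for "John": a book every two or three weeks.
john = [date(2024, 1, 1), date(2024, 1, 15), date(2024, 2, 5), date(2024, 2, 19)]
next_purchase = predict_next_purchase(john)
print(next_purchase)
```

A real model would replace the average gap with weights learned from many clients, but the input (past behavior) and the output (an expected purchase) have the same shape.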

We do not train on all of our data; rather, we divide it into training and test data. The model learns from the training set, and the held-out test set shows how well it performs on data it has not seen. Without that check, we could end up overfitting, which results in our model latching onto artifacts of the training data that do not reflect real behavior.
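
A minimal sketch of such a split, using only the standard library. The 80/20 ratio, the fixed seed, and the function name are assumptions for illustration; the article does not prescribe any of them.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle records reproducibly and hold out a fraction for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled records are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical customer IDs standing in for real purchase records.
customers = list(range(100))
train, test = train_test_split(customers)
print(len(train), len(test))
```

If a model scores much better on `train` than on `test`, that gap is the usual sign of overfitting.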

This simplified example isn’t representative of the insights that can be pulled out of millions of clients’ purchase histories. These behaviors are then compared with each other to segment clients into various cohorts that overlap and vary. Machine learning methods can be used for a variety of different use cases, including product recommendations, churn predictions, logistics planning, and automatic personalized marketing.

Let’s get started and build some interesting applications using data science. First off, we need to bring together all of the relevant data that we have on our clients. Since GDPR came into force in Europe, there have been many legal restrictions on what sensitive information you can store. One way to work within them is to anonymize or pseudonymize your data. Often, this can be done by replacing personal identifiers with a common user ID across every system you use. I also want to be clear and point out that there are certain types of data that we simply cannot collect. Always ask a lawyer or Data Protection Officer about your options when dealing with sensitive information.
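
One possible way to build such a common user ID is a keyed hash of an identifier that every system already has. The sketch below assumes that identifier is an email address and that a secret key is shared between systems; both are assumptions, as are the function name and the key itself.

```python
import hashlib
import hmac


def pseudonymous_id(identifier, secret_key):
    """Derive a stable ID that is hard to reverse without the secret key."""
    # Normalize so the same person gets the same ID in every system.
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be stored outside the code base.
SECRET_KEY = b"replace-with-a-real-secret"

online_id = pseudonymous_id("John.Doe@example.com", SECRET_KEY)
in_store_id = pseudonymous_id(" john.doe@example.com ", SECRET_KEY)
print(online_id == in_store_id)
```

Because the online and in-store records now share one ID, they can be joined without the email address itself being copied into the data lake.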

We should also consider a few large steps that need executing, namely creating a data lake and, preferably, stream processing as well. You may be wondering why we need a data lake as opposed to a data warehouse. They differ in a few ways, but the biggest advantages of data lakes over data warehouses are:

  • Format: We can be agnostic about what data we let in; it all goes in and doesn’t need structure.
  • Flexibility: It can be reconfigured as our needs change.
  • Cost: We can scale servers up or down with demand.
  • Processing: Raw data can be loaded without any predefined schema.

Flexibility is exactly what we need when doing data science; with a data warehouse, the entire structure would need to change whenever we want to try something new. Stream processing lets us react instantaneously to changes in our clients’ behavior. For example, a client walks into our store but purchases an item on our site at a lower price. We lost money here because our dynamic pricing did not consider the client’s location. This may not always be avoidable, but we should get those gears spinning to figure out the best possible solutions.

Product Recommendation Systems

There are a few ways for recommendations to reach your clients. The first, most basic example is where we do not know our client or where they are browsing from. Without using data science, we can recommend to our new visitors either a high-margin item or an item that our clients frequently purchase.

Recommendations based on intuition alone are often wrong. As an example, let’s suppose we have three products:

A) margin: $10, probability of purchase: 10%, expected margin: $10 × 0.10 = $1.00

B) margin: $5, probability of purchase: 30%, expected margin: $5 × 0.30 = $1.50

C) margin: $1, probability of purchase: 60%, expected margin: $1 × 0.60 = $0.60

The best product to recommend here is B, even though it is not the most commonly purchased (C is) and does not have the highest margin (A does).
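
The comparison above can be sketched in a few lines. The margins and probabilities are the ones from the example; the dictionary layout and function name are just one way to write it.

```python
# Margin in dollars and probability of purchase for each product.
products = {
    "A": {"margin": 10.0, "purchase_probability": 0.10},
    "B": {"margin": 5.0, "purchase_probability": 0.30},
    "C": {"margin": 1.0, "purchase_probability": 0.60},
}


def expected_margin(product):
    """Expected margin = margin if bought × probability of being bought."""
    return product["margin"] * product["purchase_probability"]


# Pick the product whose expected margin is highest.
best = max(products, key=lambda name: expected_margin(products[name]))
print(best)
```

In practice the purchase probabilities would come from a model rather than being typed in by hand, but the ranking step stays the same.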

Now, consider a client whom we recognize and can tailor our offering to. Based on their activity, in combination with our other client data, we know what we expect our client to purchase. Not only that, but we have information about what items fit our client’s needs. Let’s say that they have a grill in their shopping cart and are proceeding to checkout. This would be a perfect time to remind our client about a spatula, grill scrubber, lighter fluid, or a thermometer. This helps increase sales but also makes sure that our client is not unpleasantly surprised when, say, their item doesn’t include batteries. We can also always collect contextual data, including:

  • weekday and time of day
  • IP address (for geolocation)
  • browser data (mobile or desktop? which internet browser?)
  • behavioral data (how do they browse our website?)

The most interesting and complex way to help our customers is by taking their aesthetics into account. Taking into consideration historical purchases, recent activity, online behavior, and social media gives us a much deeper view of their interests, preferences, and overall taste. This is how we truly appeal to each client individually. Furniture is a segment that aptly highlights where this can provide immense value. Say we know that our client is looking for a chair. Based on their activity, they are quite interested in tech and sci-fi. Therefore, a cushy leather chair should not be our first recommendation, but rather, an ergonomic chair with sharp lines.

Product Recommendation System Demo



Assortment Optimization

Long tails – the bane of inventory and catalog optimization; well, what if we got rid of them? There are obviously items that we need to keep to satisfy our niche customers, but overall, we should take a deep look at what we keep in stock. By analyzing our product demand and the types of products our clients usually buy, we can more accurately ascertain the need for certain products to be in stock. There may be items that are very niche but pull in a large cart because they have many associated cross-sell products. The margins on other long-tail products may be so high that, even with only a few annual purchases, they are still worth including.
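
One way to make this trade-off concrete is to score each long-tail product by what it earns directly plus what it pulls in through the rest of the cart. The sketch below is an assumption-heavy illustration: the holding-cost term, the field names, and every figure in the catalog are invented.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    annual_units: int
    unit_margin: float
    # Average margin from other items bought in the same cart (assumed known).
    attached_cart_margin: float
    # Yearly cost of keeping the item in stock (assumed known).
    annual_holding_cost: float


def annual_contribution(p):
    """Direct margin plus margin pulled in via the cart, net of holding cost."""
    return p.annual_units * (p.unit_margin + p.attached_cart_margin) - p.annual_holding_cost


def keep_in_assortment(products):
    """Keep only products whose yearly contribution is positive."""
    return [p.name for p in products if annual_contribution(p) > 0]


# Hypothetical catalog; every figure is invented for illustration.
catalog = [
    Product("niche adapter", annual_units=12, unit_margin=2.0,
            attached_cart_margin=40.0, annual_holding_cost=150.0),
    Product("designer lamp", annual_units=4, unit_margin=120.0,
            attached_cart_margin=0.0, annual_holding_cost=200.0),
    Product("odd-size filter", annual_units=6, unit_margin=3.0,
            attached_cart_margin=1.0, annual_holding_cost=90.0),
]
kept = keep_in_assortment(catalog)
print(kept)
```

Here the adapter survives because of the carts it brings in, the lamp because of its margin, and the filter is the kind of long-tail item that could be dropped.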

Competitors are a very good source of information for the assortment we may want to include. Considering a full view of competitors’ inventory can illuminate blind spots and introduce you to products that you should have offered a long time ago.

The personalized product recommendations above also help when creating new products or updating seasonal lines. We take the aggregate aesthetics of our proposed segments and can see the types of designs that could bring in the highest number of new clients. It may turn out that a neon purple carpet with pterodactyls is exactly what expecting parents want in their soon-to-be child’s bedroom. I don’t think this is the case, but that is the issue we are addressing. Human intuition pales in comparison to the information and insight we can pull from our data. If we are diligent in our data science approach and streamline our inventory to a near-optimal level, savings start popping up everywhere – from decreased production times to minimized maintenance. Customer service and sales can also focus on increasing their product expertise to better serve customers.


Dynamic Pricing

The market will pay whatever the market can support. The price of fruit obviously changes depending on its country of origin, season, and weather conditions. Those are supply-side changes, but what about demand? Different customers are prepared to pay drastically different prices for the same item. A taxi driver needs oil changes or car filters much more than a student who only drives home every other weekend. Scale also plays into pricing. Bulk discounts take this into consideration but do not cover recurring purchases. It makes sense to discount a given item if we expect our client to purchase the same item every month. Adaptive pricing can also react to a deal that is about to be lost by offering a discount at the point of purchase. The best pricing decisions also take a larger scope of data into account, including weather, location, time and day of purchase, and other economic factors.


Personalized Marketing

Personalized marketing is especially beneficial for large retailers. Let’s take a look at an example that holds quite a bit of promise for routine advertising. This can apply to digital and offline equally, but it is easier for digital. A monthly newsletter with information about new products, discounts, and promotions is often sent to our entire list of customers, without any regard for who they are. But it should be personalized, at least to the cohort level. Here, we can take advantage of the product recommendations, assortment optimization, and dynamic pricing, as they all have an effect on our communications.

A few years ago, there was quite a bit of buzz around Target, in particular its ability to detect pregnant customers. These customers tend to be extremely tired and have no desire to do their grocery shopping in multiple places. As such, it was in Target’s interest to get these customers through the door for formula or diapers because they would then do their entire grocery shopping for the week. The lifetime value of these clients is very high, as there is low churn among them for a few years following pregnancy. But the hype was overblown. The story itself revolves around a Target manager, a 16-year-old girl, her father, and a flier. The father had received fliers and coupons for prenatal vitamins, diapers, etc. and came in to complain to the manager. The manager apologized and called the father again a few weeks later, but it turned out that the daughter was in fact pregnant. While the exact story never happened, Target did find a way to predict pregnancies. It turns out that there are very specific items purchased when an individual is in their second trimester. Just over 20 items, taken together, were enough to give every customer a pregnancy prediction score. This example illustrates the kinds of insights that can be discovered but only scratches the surface of what is possible.


We’ve looked at four examples of how data science and machine learning can be used in online and offline retail. While the number of possibilities is almost unlimited, the most important points I want to reiterate are the benefits of personalization and the true impact of trusting in data. It is wasteful to collect data and not take advantage of it.

Filip Stachura
