Artificial intelligence is being applied in retail across the whole product and service cycle, from targeted marketing and product recommendation to customer service interactions and customer lifetime value prediction.
With the Deep Learning in Retail & Advertising Summit fast approaching, we're taking a look back at some of our speakers from the event last year. Already confirmed to join us in London this March 15-16 is Dima Karamshuk, Senior Data Scientist from Skyscanner, Changtao Shong, Data Scientist at Twitter, Sam Lloyd, Group Analytics Manager at Travis Perkins, Alessandro Magnani, Distinguished Data Scientist at Walmart Labs and many more. Early Bird discounted passes end on February 2, so register now to guarantee your place at a discounted price.
Last year we were joined by Ben Chamberlain from ASOS, who spoke about how they're applying AI and DL to optimise customer lifetime value (CLTV). We were fortunate enough to have Anthony Cuthbertson, Journalist at Newsweek, interview Ben at the summit. In the interview, Ben explained how he began his work in physics before transitioning into finance and then ultimately working his way into AI and retail. 'I started looking at how you can profile people based on their social networks to understand customers from their interactions with other people.' At ASOS, 'we predict the customer lifetime value using both ML and DL by taking a load of features that we constantly record at ASOS such as every time someone buys something what did they buy, when did they buy it, what did they look at, and we grind the information up into features before applying standard ML techniques to predict how much they're likely to spend over the next year or so.'
In Ben's presentation he spoke about CLTV and how DL helps ASOS optimise all aspects of their business model.
In an ideal world, ASOS would know every transaction a customer will make for the rest of time. They would then discount those cash flows, so that cash flows today are worth more than cash flows in the future, and sum up the discounted cash flows to get the CLTV. But ASOS cannot do that, because they do not know how long a customer's lifetime will last.
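As a minimal sketch of that idealised calculation, assuming yearly cash flows and a fixed annual discount rate (both are illustrative assumptions, as are the function name and the numbers, none of which come from the talk):

```python
def discounted_clv(cash_flows, annual_discount_rate):
    """Sum yearly cash flows, dividing the flow in year t by (1 + rate) ** t."""
    return sum(
        cash / (1.0 + annual_discount_rate) ** year
        for year, cash in enumerate(cash_flows)
    )


# Hypothetical customer: spends 100 today and 100 in each of the next two years.
value = discounted_clv([100.0, 100.0, 100.0], annual_discount_rate=0.10)
print(round(value, 2))  # 273.55
```

The later a cash flow arrives, the less it contributes, which is exactly why the unknown length of the customer's lifetime makes the ideal calculation impossible in practice.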
Of all the customer profiling that they do at ASOS, customer lifetime value prediction is the key metric they learn. They use it to target three other metrics, the big business metrics:
Average basket value
By being able to predict how much customers are worth in the future ASOS are able to target them with increased levels of marketing resources, to nurture really good customers and spend less time on the not so good customers.
The ‘Buy until you die’ method: how long a customer is going to remain a customer, multiplied by the company's average basket value and by the average purchase frequency of customers.
RFM (recency, frequency, monetary value): slightly more sophisticated, it estimates three random variables.
Both of the above are no longer fit for purpose. Now that ASOS have enormous data sets, principally session logs but also much more detailed transaction, returns and customer care logs, other machine learning models can produce better predictions of lifetime value.
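For comparison, the first of these older estimates, as described above, amounts to a single multiplication. The sketch below uses hypothetical inputs and a hypothetical function name, purely to show how little information it draws on:

```python
def simple_lifetime_value(expected_lifetime_years, average_basket_value, orders_per_year):
    """Lifetime x average basket value x purchase frequency."""
    return expected_lifetime_years * average_basket_value * orders_per_year


# Hypothetical customer: stays 3 years, spends 40 per basket, orders 5 times a year.
estimate = simple_lifetime_value(3.0, 40.0, 5.0)
print(estimate)  # 600.0
```

Nothing in this estimate depends on what an individual customer browsed, returned or said to customer care, which is the gap the richer logs are meant to fill.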
To do supervised ML, ASOS need labels. This rules out waiting until the end of time, so they use a year for each data set. There are 5 main classes of data they use for these models:
Customer care data
Light demographic data
ASOS are currently (at the time of the presentation) using 135 features handcrafted from the raw data, which they assess using random forest feature importance (they use Python). However, this does not cover cross features, and it does not capture features that are really important but only for a small group of people.
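A minimal sketch of ranking features by random forest importance is shown below. It assumes scikit-learn, which the talk does not name, and uses synthetic data with hypothetical feature names rather than anything from ASOS:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_customers = 500

# Hypothetical feature names; the real 135 features are not listed in the talk.
feature_names = ["orders_last_year", "days_since_last_order", "sessions_last_month", "noise"]
X = rng.normal(size=(n_customers, len(feature_names)))

# Synthetic target: driven mostly by the first feature, slightly by the third.
y = 5.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.1, size=n_customers)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# feature_importances_ holds one non-negative score per column, summing to 1.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Because the score is averaged over all customers, a feature that matters a great deal to a small group can still rank low, which is one of the limitations noted above.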
One of the real challenges with customer lifetime value prediction is that the distribution of customer spend is horrible. You’ll find that a large proportion of customers churn and so have zero lifetime value. The rest follow a power-law distribution, with a median of say £100 but with some people spending millions of pounds per year at ASOS.
When ASOS did this originally, they were wrong for most of the customers, overestimating by almost £30 for each. The solution to the problem was to use a model which learns percentiles, and then to map from the percentiles onto pound values.
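The talk does not say how the mapping was implemented. One way to sketch the idea, assuming NumPy and synthetic spend data, is to turn each customer's spend into a percentile rank for training and to map predicted percentiles back to pounds through the empirical distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_customers = 1000

# Synthetic spend: roughly 40% churned customers (zero spend), the rest heavy-tailed.
churned = rng.random(n_customers) < 0.4
spend = np.where(churned, 0.0, rng.pareto(2.0, size=n_customers) * 100.0)
sorted_spend = np.sort(spend)


def spend_to_percentile(values):
    """Share of training customers spending no more than each value, on a 0-100 scale."""
    return 100.0 * np.searchsorted(sorted_spend, values, side="right") / len(sorted_spend)


def percentile_to_spend(percentiles):
    """Map percentiles back to pound values using the training distribution."""
    return np.percentile(sorted_spend, percentiles)


# Training targets for a percentile-predicting model (the model itself is omitted here).
targets = spend_to_percentile(spend)

# Stand-in for model output: three predicted percentiles.
predicted_percentiles = np.array([10.0, 50.0, 99.0])
predicted_pounds = percentile_to_spend(predicted_percentiles)
print(predicted_pounds)
```

Percentiles are bounded and evenly spread, so a handful of extreme spenders no longer dominates the training loss in the way they do when the model predicts pounds directly.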
ASOS experimented with 2 different architectures for automatic feature learning.
The first is a wide and deep model. The wide part refers to a really big logistic regression with lots of cross features, and the deep part is a deep neural network. The wide network memorises the data and the deep network is able to generalise. They tried different numbers of neurons (they calculated that they needed 100,00 neurons to beat the performance of the random forest). They realised it would cost 100 times more to run (they run it every day), so it was not a viable solution: the benefit does not outweigh the cost.
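To make the two halves concrete, here is a forward pass for a toy wide and deep model in NumPy. The layer sizes, the random weights and the choice of pairwise products as cross features are all assumptions for illustration, and this is not the ASOS architecture:

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_features(binary_features):
    """Pairwise products of binary columns, i.e. 'both features are on' indicators."""
    n = binary_features.shape[1]
    pairs = [
        binary_features[:, i] * binary_features[:, j]
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return np.stack(pairs, axis=1)


def wide_and_deep_forward(binary_features, dense_features, params):
    # Wide part: a linear (logistic regression style) term over raw and cross features.
    wide_input = np.hstack([binary_features, cross_features(binary_features)])
    wide_logit = wide_input @ params["wide_w"]
    # Deep part: one ReLU hidden layer over the dense features.
    hidden = np.maximum(0.0, dense_features @ params["deep_w1"] + params["deep_b1"])
    deep_logit = hidden @ params["deep_w2"]
    # The two logits are summed before a single sigmoid.
    return sigmoid(wide_logit + deep_logit + params["bias"])


batch, n_binary, n_dense, n_hidden = 4, 3, 5, 8
n_wide = n_binary + n_binary * (n_binary - 1) // 2  # raw plus pairwise cross features
params = {
    "wide_w": rng.normal(size=n_wide),
    "deep_w1": rng.normal(size=(n_dense, n_hidden)),
    "deep_b1": np.zeros(n_hidden),
    "deep_w2": rng.normal(size=n_hidden),
    "bias": 0.0,
}
binary_features = rng.integers(0, 2, size=(batch, n_binary)).astype(float)
dense_features = rng.normal(size=(batch, n_dense))

probabilities = wide_and_deep_forward(binary_features, dense_features, params)
print(probabilities)
```

The number of pairwise cross features grows quadratically with the number of inputs, and the hidden layer adds its own weights on top, which gives a feel for why a large version of this model is expensive to run daily.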
The second model they tried was taking a random forest and using neural embeddings of customers. With neural embeddings, you learn the embeddings in an unsupervised way with no labels, then take them out and stick them in as additional features to the random forest. They use the word2vec deep learning model, which takes categorical data like words and represents them as vectors with the property that similar things are close to each other in the vector space. It is much easier to compute in lower dimensions, and you encapsulate similarity, which is really important.
They took every product in the ASOS catalogue and looked at the sequence of customers that viewed that product. That gave them a sequence, and then they pulled out the TensorFlow or whatever word2vec implementation they wanted. You stick that sequence in, and out comes an embedding of customers where similar customers are close to each other in embedding space. This was a big win!
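The data preparation step described above can be sketched in plain Python. The view log, its field order and the identifiers are hypothetical. The point is only that each product yields an ordered list of the customers who viewed it:

```python
from collections import defaultdict

# Hypothetical view log: (timestamp, customer_id, product_id).
view_log = [
    (1, "cust_a", "dress_1"),
    (2, "cust_b", "dress_1"),
    (3, "cust_a", "shoes_9"),
    (4, "cust_c", "dress_1"),
    (5, "cust_c", "shoes_9"),
]


def customer_sequences_by_product(log):
    """Group views by product, keeping customers in the order they viewed it."""
    sequences = defaultdict(list)
    for _, customer_id, product_id in sorted(log):
        sequences[product_id].append(customer_id)
    return dict(sequences)


sequences = customer_sequences_by_product(view_log)

# Each product's list plays the role of a "sentence", each customer id a "word".
corpus = list(sequences.values())
print(corpus)  # [['cust_a', 'cust_b', 'cust_c'], ['cust_a', 'cust_c']]
```

A corpus in this shape is what a word2vec implementation expects in place of tokenised sentences. Customers who appear in similar contexts, that is, who view the same products around the same time, end up with nearby vectors.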
Interested in learning more about deep learning in retail? Join us at the Deep Learning in Retail and Advertising Summit in London this March 15-16.