Inter-purchase time is a critical factor in predicting customer churn. Improving its prediction accuracy can reveal consumer preferences and allow businesses to learn about weaknesses in products or pricing plans, operational issues, and customer expectations, so as to proactively reduce the reasons for churn. Although remarkable progress has been made, classic statistical models struggle to capture behavioral characteristics in transaction data because such data are dependent, and short-, medium-, and long-term data are likely to interfere with each other sequentially. Unlike previous studies, this study proposes a hybrid inter-purchase time prediction model for customers of online retailers, with particular emphasis on analyzing differences in customers' purchase behavior. An integrated self-organizing map and recurrent neural network technique is proposed to both characterize purchase behavior and improve the prediction accuracy of inter-purchase time. The permutation importance method was used to identify crucial variables in the prediction model and to interpret customer purchase behavior. The performance of the proposed method is evaluated by comparing its predictions with the results of three competing approaches on transaction data provided by a leading e-retailer in Taiwan. This study provides a valuable reference for marketing professionals seeking to better understand customers and to develop strategies that shorten their inter-purchase times.

Inter-purchase time prediction is about predicting when a consumer may purchase a product or service again based on his/her purchase history. It has been applied to churn prediction, online advertising, search engines, recommendation systems, and inventory control. Therefore, improving the prediction accuracy can help businesses lower the customer churn rate and identify deficiencies in business plans or operational processes.

In the literature, various classical statistical approaches have been proposed to predict inter-purchase time. For example, reference [

Although remarkable progress has been made, classic statistical models struggle to capture behavioral characteristics in transaction data because such data are dependent, and short-, medium-, and long-term data are likely to interfere with each other sequentially. Alternatively, various researchers have switched to Markov decision process (MDP) based techniques because of their ability to capture sequential information [

To solve the problem mentioned above, this study applied recurrent neural networks (RNNs), a type of deep learning model, to construct an inter-purchase time prediction model based on various purchase behavior characteristics of online customers at several time points. The characteristics of purchase behavior included the seasons and times of customer transactions, the devices used by customers during transactions, the types of products purchased, and the purchase amounts. In addition, to increase the prediction accuracy of the RNN model and understand the heterogeneity of purchase behavior, a self-organizing map (SOM) was used to pre-cluster customers by the similarity of their purchasing behavior. Analysis of variance (ANOVA) was applied to identify the key differences between clusters. Meanwhile, to interpret the critical features for the prediction, we employed the permutation importance method [

To evaluate the effectiveness of the proposed SOM–RNN method, this study used customer transaction data provided by a major e-commerce company in Taiwan. Moreover, the prediction accuracy of the proposed model was compared with that of a single RNN model and two families of machine learning models, namely the Multi-Layer Perceptron (MLP) and Support Vector Regression (SVR). These models are used as benchmarks because of their successful data-mapping characteristics. For more information regarding these models, please refer to the work of [

Deep learning is a class of algorithms based on the principles of machine learning [

The transaction data used in this study consist of customer IDs and login dates/times, devices, and purchased items with prices. To obtain a meaningful dataset, a series of queries and data preprocessing steps were executed. Since this research focuses on predicting purchasing behavior throughout the transaction history, the dataset was transformed into a format in which each row consisted of the customer ID, transaction ID, login date/time, purchased items, total purchase amount, and inter-purchase time. In other words, the prediction model constructed in this study can predict the time interval between the t^{th} and (t+1)^{th} purchases based on a customer's t^{th} purchase behavior. To effectively reduce differences in the data, increase the model's prediction accuracy, and understand the differences in purchase behaviors, this study used an SOM to perform similarity clustering on the transaction data of Internet users. Multiple prediction variables were used as the input units; that is, vector data in a multidimensional space were mapped to a two-dimensional topological space, and the output was the clustering result. In addition, a one-way ANOVA test on the clustering results was used to clearly analyze the differences between clusters. Finally, the prediction model for each cluster, with the seasons and times of customer transactions, purchased product types, and total purchase price as input variables, was built with an RNN. When constructing the RNN model, we searched over many different values for each of the considered parameters, such as the number of neural network units, parameter initializer, dropout rate, and optimization type, to optimize the model setup. Each technique used in this study is described in detail as follows:
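As a concrete sketch of this transformation, the snippet below derives the inter-purchase time (in days) from a toy transaction log; the field names and records are illustrative, not the retailer's actual schema:

```python
from datetime import date
from collections import defaultdict

# Hypothetical raw log: (customer_id, transaction_id, login_date) tuples;
# field names and values are illustrative, not taken from the real dataset.
raw = [
    ("A", 1, date(2020, 2, 1)),
    ("A", 2, date(2020, 2, 8)),
    ("A", 3, date(2020, 2, 20)),
    ("B", 4, date(2020, 3, 1)),
    ("B", 5, date(2020, 3, 11)),
]

# Group transactions per customer in chronological order.
by_customer = defaultdict(list)
for cid, tid, d in sorted(raw, key=lambda r: (r[0], r[2])):
    by_customer[cid].append((tid, d))

# The target y(t) is the number of days between the t-th and (t+1)-th
# purchases; a customer's last transaction has no successor and is dropped
# from the supervised dataset.
rows = []
for cid, txs in by_customer.items():
    for (tid, d), (_, d_next) in zip(txs, txs[1:]):
        rows.append((cid, tid, (d_next - d).days))

print(rows)  # [('A', 1, 7), ('A', 2, 12), ('B', 4, 10)]
```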

An SOM is a feedforward and unsupervised neural network model proposed by Kohonen [

The establishment of an SOM model includes three crucial processes, namely the competitive, cooperative, and adaptive processes. The calculation process can be briefly described as follows: assume that each M-dimensional input variable X can be defined as shown in

The competitive process identifies the neuron i(X), also known as the winning neuron, that is most similar to the input vector X; that is, the winning neuron is the one whose weight vector W_{j} has the greatest similarity to X (the smallest Euclidean distance between X and W_{j}).

In the cooperative process, the winning neurons obtained from the competitive process are regarded as the center of their topological neighborhoods, and the distances from the winning neurons to other neurons are also calculated. Because the interactions between neurons in a topological space are inversely proportional to the distances between neurons, greater distance between neurons in the topological space signifies less mutual influence. This topological neighborhood concept can be expressed using a Gaussian function as shown in

where h_{j,i(X)} is the proximity value between the winning neuron i(X) and neuron j, σ is the effective width of the topological neighborhood, and d^{2}_{j,i} is the squared Euclidean distance between neuron j and the winning neuron i(X).

The third process of the SOM model is the adaptive process for neuron connection weight, whereby the connection weight is adjusted according to the distance from the input sample, with the adjustment method as shown in

The calculation process of the entire SOM network model repeats the aforementioned competitive, cooperative, and adaptive processes until the network converges. Finally, the input samples and their corresponding activated neurons are arranged in a grid in the topological space, and numbers or names are marked in the arranged grid to obtain a feature map. The marked grid element represents the neuron activated by a specific input sample in the SOM network and is called the image of that input sample. The distribution of input samples can be observed based on density maps obtained from the cumulative number of input samples corresponding to each map unit.
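The three SOM processes above can be sketched in a few lines of NumPy; the toy data, map size, learning rate, and decay schedule below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated behavioral clusters in 2-D (illustrative only).
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(1.0, 0.1, (50, 2))])

# A 4x1 output layer, matching the map size selected later in the paper.
n_units = 4
W = rng.random((n_units, 2))                             # connection weights
coords = np.arange(n_units, dtype=float).reshape(-1, 1)  # grid positions

eta, sigma = 0.5, 1.0  # learning rate and neighborhood width (assumed)
for epoch in range(20):
    for x in rng.permutation(X):
        # Competitive process: the winning neuron i(X) minimizes ||X - W_j||.
        winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        # Cooperative process: Gaussian neighborhood around the winner.
        d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Adaptive process: move each weight toward the sample, scaled by h.
        W += eta * h[:, None] * (x - W)
    eta *= 0.9    # decay the learning rate ...
    sigma *= 0.9  # ... and shrink the neighborhood over time

# After convergence, each sample maps to its nearest neuron (its "image").
labels = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
```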

An RNN can be regarded as a conventional artificial neural network that expands the information cycle over time. It allows neurons to interconnect to form a cycle, so information at

The RNN receives an input sequence {…, X_{t−1}, X_{t}}, where X_{t} = (x_{1}, x_{2},…,x_{N}). In a fully connected RNN, the input units are connected to the hidden units in the hidden layer, and the connections can be defined by the weight matrix W_{IH}. The hidden layer contains M hidden units, h_{t} = (h_{1}, h_{2},…, h_{M}), which are interconnected through the recurrent connections W_{HH}. The hidden-layer structure of the RNN also defines the state space of the system as shown in

where f_{H}(•) is the activation function of the hidden layer and b_{h} is the bias vector of the hidden units. The hidden units are connected to the output layer through the weighted connections W_{HO}. The output layer has P units, expressed as y_{t} = (y_{1}, y_{2},…, y_{P}), and is estimated as follows:

where f_{O}(•) is the activation function of the output layer and b_{o} is the bias vector of the output layer. Because input–target pairs were arranged in chronological order, the aforementioned steps were also repeated with
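A minimal forward pass implementing these two equations might look as follows; the dimensions, random weights, and choice of tanh/identity activations are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: N input features, M hidden units, P outputs.
N, M, P = 3, 5, 1
W_IH = rng.normal(0.0, 0.1, (M, N))   # input -> hidden weights
W_HH = rng.normal(0.0, 0.1, (M, M))   # recurrent hidden -> hidden weights
W_HO = rng.normal(0.0, 0.1, (P, M))   # hidden -> output weights
b_h, b_o = np.zeros(M), np.zeros(P)   # bias vectors

def rnn_forward(x_seq):
    """Compute h_t = f_H(W_IH x_t + W_HH h_{t-1} + b_h) and
    y_t = f_O(W_HO h_t + b_o), with f_H = tanh and f_O = identity."""
    h = np.zeros(M)
    outputs = []
    for x_t in x_seq:
        h = np.tanh(W_IH @ x_t + W_HH @ h + b_h)  # state-space update
        outputs.append(W_HO @ h + b_o)            # output at time t
    return np.array(outputs), h

y_seq, h_T = rnn_forward(rng.normal(size=(4, N)))  # a length-4 input sequence
```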

As shown in

Transaction data from a Taiwanese e-retailer selling more than 100 skin-care and cosmetics products were used to illustrate the proposed method. The firm's website is structured into several categories, and each category consists of multiple product overview pages. An overview page shows an array of product photos. By clicking a product photo, customers are led to a product detail page that provides high-resolution product photos, the price, and a product description. Customer transaction data were collected over a period of about nine months, from Feb. 1^{st} 2020 until Oct. 31^{st} 2020. During this nine-month period, 1,254,188 transactions were made by 81,547 unique customer IDs, which can be considered a high data volume compared with most previous studies [. This study aims to predict the inter-purchase time between the t^{th} and (t+1)^{th} transactions of customers so that, given the consumer behavior revealed by the data analysis, the firm can deliver appropriate marketing stimuli to a customer to shorten the inter-purchase time before the next transaction.

Since this research focuses on predicting the customer's inter-purchase time throughout the transaction history, the dataset was transformed into a format in which each row consisted of the customer ID, transaction ID, device, purchased product type, and purchase amount. Following [, the variable of transaction date was classified into weekdays (x_{1}) and weekends (x_{2}). The variable of transaction time in a day was classified into morning (x_{3}), afternoon (x_{4}), evening (x_{5}), and midnight (x_{6}). The devices used to place an order were classified into computers (x_{7}), mobile phones (x_{8}), and tablets (x_{9}). The products in this dataset were categorized into skincare (x_{10}), lip care (x_{11}), daily necessities (x_{12}), cosmetics (x_{13}), manicure products (x_{14}), and spa products (x_{15}). Dummy coding was applied to the date, time, and device variables. In addition, the total purchase amount was represented by x_{16}. The dependent variable, inter-purchase time (y), was defined as the number of days between the customer's current transaction date (t) and the next transaction date (t+1). Moreover, because an inter-purchase time is affected by the preceding inter-purchase time, the previous inter-purchase time [y(t−i)] was also included as a predictor along with the aforementioned x_{1}, …, x_{16}. The definition of each variable and an example of the data structure are shown in

| Variable | Definition |
|---|---|
| x_{1}(t) | Whether the t^{th} transaction was made on a weekday (0 = no, 1 = yes) |
| x_{2}(t) | Whether the t^{th} transaction was made on a weekend (0 = no, 1 = yes) |
| x_{3}(t) | Whether the t^{th} transaction was made in the morning (0 = no, 1 = yes) |
| x_{4}(t) | Whether the t^{th} transaction was made in the afternoon (0 = no, 1 = yes) |
| x_{5}(t) | Whether the t^{th} transaction was made in the evening (0 = no, 1 = yes) |
| x_{6}(t) | Whether the t^{th} transaction was made at midnight (0 = no, 1 = yes) |
| x_{7}(t) | Whether a computer was used to place the t^{th} transaction (0 = no, 1 = yes) |
| x_{8}(t) | Whether a mobile phone was used to place the t^{th} transaction (0 = no, 1 = yes) |
| x_{9}(t) | Whether a tablet was used to place the t^{th} transaction (0 = no, 1 = yes) |
| x_{10}(t) | The quantity of skincare products purchased in the t^{th} transaction |
| x_{11}(t) | The quantity of lip care products purchased in the t^{th} transaction |
| x_{12}(t) | The quantity of daily necessities purchased in the t^{th} transaction |
| x_{13}(t) | The quantity of cosmetics products purchased in the t^{th} transaction |
| x_{14}(t) | The quantity of manicure products purchased in the t^{th} transaction |
| x_{15}(t) | The quantity of spa products purchased in the t^{th} transaction |
| x_{16}(t) | The purchase amount of the t^{th} transaction |
| y(t) | The inter-purchase time between the t^{th} and the (t+1)^{th} transactions |
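As a hedged sketch of how one transaction could be encoded into the predictors x_{1}–x_{16} above, the function below builds the feature vector; the six-hour time-of-day boundaries, category spellings, and helper name `encode_transaction` are assumptions for illustration:

```python
from datetime import datetime

def encode_transaction(ts, device, quantities, amount):
    """Encode one transaction into the 16 predictors x1..x16 described above.
    Time-of-day bins (assumed: morning 6-12, afternoon 12-18, evening 18-24,
    midnight 0-6) and category names are illustrative assumptions."""
    x = [0.0] * 16
    x[0] = 1.0 if ts.weekday() < 5 else 0.0   # x1: weekday dummy
    x[1] = 1.0 - x[0]                         # x2: weekend dummy
    x[2 + (ts.hour - 6) % 24 // 6] = 1.0      # x3..x6: time-of-day dummies
    x[6 + ["computer", "mobile", "tablet"].index(device)] = 1.0  # x7..x9
    cats = ["skincare", "lip care", "daily necessities",
            "cosmetics", "manicure", "spa"]
    for cat, q in quantities.items():         # x10..x15: per-category quantity
        x[9 + cats.index(cat)] = float(q)
    x[15] = float(amount)                     # x16: total purchase amount
    return x

# A Monday-morning mobile order of 2 skincare items and 1 cosmetics item.
x = encode_transaction(datetime(2020, 2, 3, 10, 30), "mobile",
                       {"skincare": 2, "cosmetics": 1}, 1580)
```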

A computing system with an Intel Xeon E5-2673 V3 (8 cores running at 3.2 GHz) and 128 GB RAM was used in this study. We implemented the SOM, RNN, SVR, and MLP methods in Python using scikit-learn, while TensorFlow was used for all deep learning experiments. Four error evaluation criteria were considered: RMSE = (Σ(T_{i}−P_{i})^{2}/n)^{1/2}, MAE = Σ|T_{i}−P_{i}|/n, MAPE = Σ|(T_{i}−P_{i})/T_{i}|/n, and RMSPE = (Σ((T_{i}−P_{i})/T_{i})^{2}/n)^{1/2}, where RMSE, MAE, MAPE, and RMSPE are the root mean square error, mean absolute error, mean absolute percentage error, and root mean square percentage error, respectively; T_{i} and P_{i} represent the actual and predicted values of the i^{th} data point, respectively; and n is the total number of data points.
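The four criteria can be computed directly from their formulas; the toy actual/predicted values below are illustrative:

```python
import numpy as np

def evaluation_metrics(T, P):
    """The four criteria from the text; T = actual, P = predicted values."""
    T, P = np.asarray(T, dtype=float), np.asarray(P, dtype=float)
    e, pe = T - P, (T - P) / T        # raw and percentage errors
    return {
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        "MAE": float(np.mean(np.abs(e))),
        "MAPE": float(np.mean(np.abs(pe))),
        "RMSPE": float(np.sqrt(np.mean(pe ** 2))),
    }

# Illustrative actual and predicted inter-purchase times (in days).
m = evaluation_metrics([10, 20, 40], [12, 18, 40])
```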

In this study, to enhance the precision of the applied RNN model in predicting inter-purchase time, we adopted the approach of Kagan et al. [ to examine the correlations among the predictor variables. The Pearson correlation coefficient between two variables X_{1} and X_{2} is equal to the covariance of X_{1} and X_{2} divided by the product of the standard deviations of X_{1} and X_{2}, where cov(X_{1}, X_{2}) denotes the covariance of X_{1} and X_{2}. It takes values between −1 and 1, where 1 indicates total positive linear correlation, −1 total negative linear correlation, and 0 no linear correlation. We observed that the variables are indeed not strongly correlated with each other; that is, the variables do not contain similar information. These weak correlations can be leveraged in the variable importance process.
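As an illustration of this correlation screening, the snippet below computes the pairwise Pearson coefficients for a simulated predictor matrix (the data are synthetic, not the retailer's):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated predictor matrix: 200 observations of 4 independent variables
# (illustrative data standing in for x1..x16).
X = rng.normal(size=(200, 4))

# Pairwise Pearson coefficients: r = cov(X1, X2) / (sigma_X1 * sigma_X2).
R = np.corrcoef(X, rowvar=False)

# Small off-diagonal entries indicate the predictors share little
# information, matching the weak correlations reported in the text.
off_diag = R[~np.eye(4, dtype=bool)]
```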

To confirm that the final SOM implementation provides satisfactory clustering quality, this study tested six output dimensions (3×1, 4×1, 5×1, 6×1, 7×1, 8×1) for SOM cluster analysis. The clustering quality is an index of the density of the data around the clusters' centers of gravity (lower values are preferable). In general, a larger output dimension provides higher clustering quality, but the clustering result becomes relatively difficult to interpret. In this study, the clustering quality under the 4×1 output dimension was optimal (i.e., the greatest data density), so four clusters were used for the subsequent analysis and comparison of inter-purchase time prediction models. In addition, to verify the appropriateness of the boundaries of online purchase behavior between the four clusters, this study used ANOVA to test the clustering results. The variable means of each cluster are reported in

As

| Variable | Cluster1 (N = 2370) | Cluster2 (N = 2064) | Cluster3 (N = 1682) | Cluster4 (N = 1529) | Overall mean | Overall S.D. | p-value |
|---|---|---|---|---|---|---|---|
| | 1.045 | 1.541 | 1.468 | 1.424 | 1.152 | | 0.0113** |
| | 2.000 | 1.050 | 0.012 | 0.340 | 0.569 | | 0.0387** |
| | 0.261 | 0.167 | 0.312 | 0.203 | 0.482 | | 0.0374** |
| | 0.771 | 0.419 | 0.332 | 0.373 | 0.664 | | 0.0193** |
| | 1.233 | 0.613 | 0.538 | 0.586 | 0.811 | | 0.0035** |
| | 1.510 | 0.597 | 1.055 | 0.705 | 0.837 | | 0.0536* |
| | 1.285 | 0.654 | 0.601 | 0.644 | 1.075 | | 0.0020** |
| | 1.367 | 0.932 | 1.862 | 1.108 | 1.194 | | 0.0873* |
| | 0.032 | 0.043 | 0.029 | 0.028 | 0.032 | 0.225 | 0.1142 |
| | 1.887 | 1.661 | 1.550 | 1.635 | 3.240 | | 0.0317** |
| | 0.117 | 0.092 | 0.063 | 0.156 | 0.073 | 0.346 | 0.1421 |
| | 1.523 | 0.514 | 1.303 | 0.614 | 2.202 | | 0.0444** |
| | 1.780 | 1.079 | 0.757 | 2.679 | 0.885 | 1.821 | 0.2106 |
| | 0.181 | 0.158 | 0.138 | 0.168 | 0.781 | | 0.0421** |
| | 0.021 | 0.018 | 0.023 | 0.019 | 0.020 | 0.052 | 0.1654 |
| | 4057.2 | 3922.4 | 4054.8 | 4088.2 | 4021.56 | 3278.84 | 0.2131 |

Note: **: p < 0.05; *: p < 0.1.

After clustering the purchasing behavior with the SOM, we built a predictive model for each SOM cluster. The purchase behavior data of each cluster included all transaction records belonging to that cluster's customers. In addition, because traditional evaluation methods, such as train-test splits and k-fold cross-validation, ignore the temporal components inherent in time series data, we had to split the data while respecting the temporal order in which the values were observed. To retain the training data in the chronological order of customer purchases, this study used customers as the units and randomly divided the customer data into two datasets: an estimation set (70%) and a test set (30%) for modeling customer transaction data. Then, all variables (i.e., x_{1}(t), …, x_{16}(t), y(t)) were ordered by transaction ID and normalized to the range between 0 and 1 with
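The customer-level split and min-max normalization can be sketched as follows; the customer IDs and the 70/30 ratio mirror the text, while the helper name `fit_minmax` and the sample values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical customer IDs; splitting by customer keeps each customer's
# transactions together so their chronological order is preserved.
customers = np.array([f"C{i:03d}" for i in range(100)])
rng.shuffle(customers)
n_train = int(0.7 * len(customers))
train_ids, test_ids = customers[:n_train], customers[n_train:]

# Min-max normalization to [0, 1]; in practice the minimum and maximum
# would be taken from the estimation (training) portion only.
def fit_minmax(train_values):
    lo, hi = float(min(train_values)), float(max(train_values))
    return lambda v: (v - lo) / (hi - lo)

scale = fit_minmax([2.0, 5.0, 10.0])  # illustrative training values
```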

For the RNN model, the transaction date (x_{1}(t), x_{2}(t)), transaction time period (x_{3}(t), …, x_{6}(t)), devices used (x_{7}(t), x_{8}(t), x_{9}(t)), types of products purchased (x_{10}(t), …, x_{15}(t)), and total transaction amount (x_{16}(t)) were taken into consideration along with the previous inter-purchase time y(t–1). In addition, to capture conditional dependencies between successive transactions in the model, the number of transaction lags (tg) was defined as the number of transaction delays and treated as one of the hyper-parameters of the RNN model in this study. Hence, the variation of the current purchasing behavior is represented by a matrix of size tg×20, and the whole dataset is divided into several sliding windows. The concept of the sliding window is shown in
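The sliding-window construction can be sketched as below; the feature values and tg = 3 are illustrative, and in the actual model each window would stack the tg×20 rows described in the text:

```python
import numpy as np

def make_sliding_windows(features, targets, tg):
    """Turn an ordered sequence of per-transaction feature rows into
    (window, target) pairs: each window stacks the tg most recent rows,
    and the label is the inter-purchase time following the last row."""
    X_win, y_win = [], []
    for t in range(tg, len(features) + 1):
        X_win.append(features[t - tg:t])
        y_win.append(targets[t - 1])
    return np.array(X_win), np.array(y_win)

feats = np.arange(12).reshape(6, 2)  # 6 transactions, 2 features each (toy)
ys = np.array([5, 3, 8, 2, 7, 4])    # illustrative inter-purchase times
Xw, yw = make_sliding_windows(feats, ys, tg=3)
```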

For the other hyper-parameters of the RNN model, we consider the following: (1) number of hidden units of an RNN cell; (2) parameter initializer; (3) activation type; (4) dropout rate; and (5) optimization type. The number of hidden units of an RNN cell is the dimensionality of the last output space of the RNN layer. The parameter initializer represents the strategy for initializing the RNN and Dense layers' weight values. The activation type represents the type of activation function that produces non-linear and limited output signals inside the RNN and Dense I and II layers. Furthermore, the dropout rate indicates the fraction of the hidden units to be dropped for the transformation of the recurrent state in the RNN layer. Finally, the optimization type designates the optimization algorithm to tune the internal model parameters so as to minimize the mean squared error loss function. The candidate values used to perform the grid search for the hyper-parameters in the RNN model are listed in

| Hyper-parameter name | Hyper-parameter values | Optimal (Cluster1) | Optimal (Cluster2) | Optimal (Cluster3) | Optimal (Cluster4) |
|---|---|---|---|---|---|
| No. of hidden units | {16, 32, 64} | 64 | 64 | 64 | 32 |
| Parameter initializer | {normal, uniform, glorot_normal} | glorot_ | glorot_ | glorot_ | glorot_ |
| Activation type | {relu, tanh, softmax} | softmax | softmax | softmax | softmax |
| Dropout rate | {0.0, 0.1, 0.2, 0.3} | 0.2 | 0.2 | 0.1 | 0.2 |
| Optimization type | {SGD, adam, rmsprop} | Adam | Adam | Adam | Adam |
| Batch size | {1, 100, 200} | 200 | 200 | 200 | 100 |
| No. of time lags (tg) | {3, 4, 5} | 5 | 5 | 4 | 4 |
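The grid search over these candidates can be sketched as follows; `train_and_score` is a stand-in for actually fitting and validating an RNN with one setting, so the scores here are synthetic:

```python
import itertools

# Candidate grid mirroring the hyper-parameters discussed in the text.
grid = {
    "hidden_units": [16, 32, 64],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "optimizer": ["SGD", "adam", "rmsprop"],
    "batch_size": [1, 100, 200],
    "tg": [3, 4, 5],
}

def train_and_score(setting):
    # Synthetic score so the sketch runs end to end; a real implementation
    # would build, train, and evaluate the RNN and return validation error.
    return abs(setting["hidden_units"] - 64) + setting["tg"]

best, best_score = None, float("inf")
for values in itertools.product(*grid.values()):
    setting = dict(zip(grid.keys(), values))
    score = train_and_score(setting)
    if score < best_score:            # keep the lowest-error setting
        best, best_score = setting, score
```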

| Model | Cluster | Training RMSE | Training MAD | Training MAPE | Training RMSPE | Time* | Testing RMSE | Testing MAD | Testing MAPE | Testing RMSPE |
|---|---|---|---|---|---|---|---|---|---|---|
| SOM-RNN | 1 | 0.05 | 0.05 | 14% | 16% | 612 | 0.06 | 0.06 | 16% | 22% |
| SOM-RNN | 2 | 0.05 | 0.13 | 5% | 8% | 631 | 0.06 | 0.19 | 7% | 10% |
| SOM-RNN | 3 | 0.08 | 0.08 | 16% | 21% | 658 | 0.12 | 0.10 | 22% | 30% |
| SOM-RNN | 4 | 0.17 | 0.15 | 17% | 19% | 629 | 0.20 | 0.18 | 22% | 27% |
| SOM-SVR | 1 | 0.05 | 0.04 | 12% | 18% | 594 | 0.07 | 0.06 | 17% | 23% |
| SOM-SVR | 2 | 0.05 | 0.16 | 7% | 9% | 641 | 0.06 | 0.20 | 10% | 13% |
| SOM-SVR | 3 | 0.11 | 0.09 | 19% | 25% | 597 | 0.13 | 0.11 | 23% | 31% |
| SOM-SVR | 4 | 0.15 | 0.13 | 18% | 27% | 651 | 0.21 | 0.18 | 25% | 34% |
| SOM-MLP | 1 | 0.05 | 0.04 | 14% | 17% | 316 | 0.07 | 0.06 | 19% | 24% |
| SOM-MLP | 2 | 0.05 | 0.16 | 9% | 10% | 336 | 0.07 | 0.21 | 12% | 15% |
| SOM-MLP | 3 | 0.10 | 0.08 | 23% | 30% | 333 | 0.13 | 0.11 | 29% | 37% |
| SOM-MLP | 4 | 0.16 | 0.16 | 27% | 32% | 322 | 0.23 | 0.20 | 36% | 44% |

Note: * time to train the model (in seconds).

| Model | RMSE | MAD | MAPE | RMSPE |
|---|---|---|---|---|
| SOM-RNN | 0.11359 | 0.13281 | 17.51% | 22.84% |
| SOM-SVR | 0.11716 | 0.13701 | 18.73% | 25.03% |
| SOM-MLP | 0.12333 | 0.14904 | 24.21% | 30.13% |
| RNN | 0.12082 | 0.14516 | 22.50% | 28.25% |

To evaluate the robustness of the proposed method, the performance of the SOM-RNN and the comparison models was tested using different ratios of training and testing sample sizes. The experiment is based on the relative ratio of the training dataset size to the complete dataset size; three relative ratios are considered in this section. The prediction results for the four clusters made by SOM-RNN and the comparison models are summarized in

| Training ratio % | Model | Cluster1 | Cluster2 | Cluster3 | Cluster4 |
|---|---|---|---|---|---|
| 70% | SOM-RNN | 16.38% | 6.95% | 22.01% | 21.96% |
| 70% | SOM-SVR | 17.13% | 9.62% | 22.84% | 25.29% |
| 70% | SOM-MLP | 18.99% | 11.46% | 29.08% | 36.21% |
| 80% | SOM-RNN | 15.79% | 5.87% | 16.01% | 18.45% |
| 80% | SOM-SVR | 16.44% | 7.85% | 16.12% | 20.64% |
| 80% | SOM-MLP | 16.74% | 10.28% | 18.49% | 32.46% |
| 90% | SOM-RNN | 12.60% | 5.02% | 12.55% | 16.47% |
| 90% | SOM-SVR | 12.37% | 6.74% | 13.17% | 19.14% |
| 90% | SOM-MLP | 13.89% | 6.44% | 15.29% | 32.24% |

In

To test whether the proposed SOM-RNN model is superior to the comparison models in inter-purchase time prediction, the Wilcoxon signed-rank test was applied to the SOM-RNN model. The Wilcoxon signed-rank test is a distribution-free, non-parametric technique that determines whether two models differ by comparing the signs and ranks of prediction values. It is one of the most popular tests for evaluating the predictive capabilities of two different models [
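Using SciPy, the test on two models' paired errors might look like this; the error values below are illustrative, not those reported in the tables:

```python
from scipy.stats import wilcoxon

# Illustrative paired absolute prediction errors of two models on the same
# test observations (not the values reported above).
errors_som_rnn = [0.06, 0.19, 0.10, 0.18, 0.05, 0.12, 0.08, 0.15]
errors_som_svr = [0.07, 0.20, 0.11, 0.19, 0.06, 0.13, 0.10, 0.17]

# The signed-rank test compares the signs and ranks of the paired
# differences; a small p-value indicates a systematic difference.
stat, p = wilcoxon(errors_som_rnn, errors_som_svr)
```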

| Model | Relative ratio (%) | SOM-SVR | SOM-MLP |
|---|---|---|---|
| SOM-RNN | 60 | –2.558 (.011)** | –2.533 (.011)** |
| SOM-RNN | 70 | –2.533 (.011)** | –2.533 (.011)** |
| SOM-RNN | 80 | –2.585 (.010)** | –2.533 (.011)** |
| SOM-RNN | 90 | –2.558 (.011)** | –2.533 (.011)** |

Note: The numbers in parentheses are the corresponding p-values.

To help researchers understand the prediction, it is necessary to assess the importance of the different features in the models. Deep learning models are difficult to interpret because of their complex structures and large numbers of parameters. To evaluate feature importance in RNN models, we employed the permutation importance method, initially proposed by Breiman [. The results indicate that x_{1} (whether the t^{th} transaction was made on a weekday) is the variable that influences the prediction of inter-purchase time the most. On the contrary, x_{2} (whether the t^{th} transaction was made on a weekend) has less impact on the prediction of inter-purchase time.
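A minimal sketch of the permutation importance procedure on a toy regression problem, with a fixed linear predictor standing in for the trained RNN (the data, model, and repeat count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy regression in which only feature 0 matters; a fixed linear map stands
# in for the trained prediction model (not the paper's RNN).
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

def predict(X_):
    return 3.0 * X_[:, 0]

def permutation_importance(X, y, predict, n_repeats=10):
    """Breiman-style importance: shuffle one column at a time and measure
    how much the mean squared error rises above the baseline."""
    base = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break column j only
            scores[j] += np.mean((y - predict(Xp)) ** 2) - base
    return scores / n_repeats

imp = permutation_importance(X, y, predict)  # imp[0] should dominate
```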

The SOM–RNN model proposed in this study not only improved inter-purchase time prediction accuracy and uncovered the purchase behaviors of website customers but also made a substantial contribution to search engine optimization (SEO) and product marketing. The results can assist website managers in determining how to adjust web content to shorten customers' inter-purchase times, as well as help marketing executives gain a clear understanding of which measures shorten inter-purchase time. In addition, the inter-purchase time prediction method proposed in this study provides a systematic description and application procedure for the e-commerce platforms of different industries, which can contribute to the growth and development of companies.

The results of this study also indicate that search engine design supervisors should provide suitable product information according to customer purchase behavior and product preferences, indirectly inducing Google to reward the webpage with more organic search traffic. Moreover, marketing professionals can shorten sentences and use content chunking to ensure that product information can be digested according to the product preferences of website customers. Furthermore, keywords or visual effects can be added at appropriate times to induce customers to spend more. For example, for customers who prefer to purchase manicure products

This paper proposed an inter-purchase time prediction model that integrates an SOM and an RNN (SOM-RNN). The SOM was applied to group customers according to the similarity of their behavior. Then, for each cluster, customers' purchase behavior data were fed to an RNN to construct the inter-purchase time prediction model. Finally, the permutation importance method was employed to rank the importance of the features in the inter-purchase time prediction models. Transaction data provided by a leading e-retailer in Taiwan were used to evaluate the proposed method. Moreover, this study compared the proposed method with SOM-SVR, SOM-MLP, and a single RNN using prediction error as the criterion. The empirical results show that suitable SOM-RNN models with variable importance interpretation can be developed and that optimal hyper-parameter values can be found to predict customers' inter-purchase times. Moreover, a sensitivity analysis was performed to test the consistency of the proposed model. One of the key findings is that the website purchase behavior identified by the SOM can be used to develop optimal search engine strategies and marketing tactics.