<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">51046</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.051046</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Learning Dual-Layer User Representation for Enhanced Item Recommendation</article-title>
<alt-title alt-title-type="left-running-head">Learning Dual-Layer User Representation for Enhanced Item Recommendation</alt-title>
<alt-title alt-title-type="right-running-head">Learning Dual-Layer User Representation for Enhanced Item Recommendation</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Zhu</surname><given-names>Fuxi</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Xie</surname><given-names>Jin</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><email>jinxie@scuec.edu.cn</email></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Alshahrani</surname><given-names>Mohammed</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Applied Research Center of Artificial Intelligence, Wuhan College</institution>, <addr-line>Wuhan, 430212</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>College of Computer Science, South-Central MINZU University</institution>, <addr-line>Wuhan, 430074</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>Unmanned. Company</institution>, <addr-line>Riyadh, 11564</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Jin Xie. Email: <email>jinxie@scuec.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>18</day><month>7</month><year>2024</year></pub-date>
<volume>80</volume>
<issue>1</issue>
<fpage>949</fpage>
<lpage>971</lpage>
<history>
<date date-type="received">
<day>26</day>
<month>2</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>5</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 Zhu, Xie and Alshahrani</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Zhu, Xie and Alshahrani</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_51046.pdf"></self-uri>
<abstract>
<p>User representation learning is crucial for capturing diverse user preferences, but it is also challenging because user intentions are latent, dispersed across complex and varied patterns of user-generated data, and thus cannot be measured directly. Text-based models can learn user representations by mining latent semantics, which enriches the semantic content of user representations. However, these techniques only extract common features from historical records and cannot represent changes in user intentions. Sequential features, in contrast, can express user interests and intentions that change over time, but sequential recommendation results based on item-level user representations lack interpretability in terms of preference factors. To address these issues, we propose a novel model with Dual-Layer User Representation, named DLUR, in which the user&#x2019;s intention is learned from two different layers of representation. Specifically, the latent semantic layer adds an interactive layer to the Transformer to extract keywords and key sentences from the text, which serve as a basis for interpretation. The sequence layer uses the Transformer model to encode the user&#x2019;s preference intention and to capture changes in that intention. This dual-layer user model is therefore more comprehensive than a single text or sequence model and can effectively improve recommendation performance. Extensive experiments on five benchmark datasets demonstrate DLUR&#x2019;s performance advantage over state-of-the-art recommendation models. In addition, DLUR&#x2019;s ability to explain recommendation results is demonstrated through specific cases.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>User representation</kwd>
<kwd>latent semantic</kwd>
<kwd>sequential feature</kwd>
<kwd>interpretability</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Applied Research Center of Artificial Intelligence</funding-source>
<award-id>X2020113</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Wuhan College Research Project</funding-source>
<award-id>KYZ202009</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>User representation learning refers to building a user&#x2019;s interest representation through the analysis of user behavior and preferences, as well as the modeling and learning of user-related data. This is an important step towards a personalized recommendation system. A common research direction in user modeling is based on representation learning, using special machine learning algorithms to model or represent users or behaviors [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-4">4</xref>].</p>
<p>Among the many kinds of user-related data, review text usually contains users&#x2019; detailed descriptions, evaluations, and opinions about items, which can provide a deeper understanding of user interests. Compared with simple click records or rating data, reviews provide richer and more fine-grained user feedback. Moreover, reviews supply contextual information that helps in understanding the user&#x2019;s motivation and background. Finally, reviews may reveal implicit interests and needs. Therefore, reviews can better capture users&#x2019; interests and support more accurate, comprehensive, and personalized recommendations.</p>
<p>However, using reviews for user representation learning and applying them to item recommendation also poses several challenges: 1) Data sparsity: Reviews are usually highly sparse, which leads to data imbalance when training representation models. 2) Complexity of semantic understanding: User review texts are subjective and vary between individuals, so their semantic structures are complex. Different users may review the same item differently, and reviews of different items are also diverse. 3) Contextual understanding and timeliness: User reviews are usually generated in specific contexts, so the acquisition and use of contextual information must be considered. Additionally, users&#x2019; interests and preferences may change over time.</p>
<p>Since the emergence of the Transformer, applying pre-training and self-attention mechanisms to natural language processing has helped to handle the complex semantic structure of review texts and to deepen the semantic understanding of context. However, in item recommendation, simply using the Transformer to extract text semantics is insufficient, because it does not take into account how the interaction between items and users affects which phrases in reviews are key. Meanwhile, the data sparsity problem, the timeliness of user representations, and the explanation of recommendations remain unsolved.</p>
<p>To this end, we propose a dual-layer user representation learner, named DLUR. This framework uses rating information as a supplement to review text to alleviate the data sparsity problem. The semantic understanding layer builds semantic representations progressively from words to sentences to paragraphs, extracting the parts with strong semantic relevance layer by layer, and thereby parses complex semantic structures effectively. In addition, paragraphs are collected from both the user and the item perspectives, which preserves the subjectivity of users and the diversity of item content. Text feature extraction is then used to process the context. The sequence feature layer extracts sequential features to address the timeliness of user representations. The resulting representation learning framework is also capable of extracting interpretable sentences.</p>
<p>Specifically, the core of DLUR is user interest representation learning. By integrating the user interest representation with the item representation and training the model with a pairwise learning method, the items that a user is interested in can be predicted. Moreover, during user representation learning, we exploit user-item interaction: the integrated model computes the weights of text sentences, and the key sentences it extracts can serve as explanations. User interest representation learning is divided into three parts: latent factor learning, text representation learning, and sequential factor learning. In the first component, we use the latent factor model (LFM) to extract users&#x2019; long-term latent factors from ratings. In the second component, we add an interactive attention layer to the Transformer model; integrating interactive attention with self-attention sharpens the attention weights, improves the accuracy of semantic extraction, and thus identifies the parts of review data that reflect users&#x2019; interests. In the third component, we use item reviews to mine sequential factors, capturing users&#x2019; dynamic interest changes in a timely manner. Our main contributions are summarized as follows:
<list list-type="order">
<list-item>
<p>We propose a novel user representation learning method, called DLUR, that 1) learns from sequences, 2) captures relationships between users and items, and 3) extracts multiple forms of factors to ensure the versatility of the user representation.</p></list-item>
<list-item>
<p>We apply DLUR to the recommendation process, where it can provide recommendation explanations at the same time.</p></list-item>
<list-item>
<p>Extensive experiments on two public datasets demonstrate the superiority of DLUR over recent state-of-the-art methods. A further appeal of DLUR is its applicability in real-world scenarios, which supports the possibility of adopting DLUR on various Web platforms.</p></list-item>
</list></p>
<p>The remainder of the paper is organized as follows. In <xref ref-type="sec" rid="s2">Section 2</xref>, we review related work on recommender systems and text representation. The framework and detailed construction of our model are introduced in <xref ref-type="sec" rid="s3">Section 3</xref>, and <xref ref-type="sec" rid="s4">Section 4</xref> applies the model to a recommender system. <xref ref-type="sec" rid="s5">Section 5</xref> presents the results and analysis of the experiments. <xref ref-type="sec" rid="s6">Section 6</xref> concludes the paper and provides suggestions for further research.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<sec id="s2_1">
<label>2.1</label>
<title>Recommender Systems</title>
<p>Recommender systems often lack sufficient knowledge of the relationship between users and recommended items; in other words, little is recorded about the interactions between users and items, which results in a scarcity of data that can be used for recommendation. The main approaches to the data sparsity problem can be divided into context-aware methods, collaborative filtering, and algorithm-based improvements.</p>
<p>Context-aware recommendation can alleviate the data sparsity problem. Jannach and Ludewig used different time divisions for evaluation to reduce the amount of data required for training and improve the efficiency of algorithm learning [<xref ref-type="bibr" rid="ref-5">5</xref>]. CoSeRNN is a neural network architecture that models user preferences as a series of embeddings, one per session. By using an approximate nearest neighbor search algorithm, it efficiently generates context-sensitive instant recommendations [<xref ref-type="bibr" rid="ref-6">6</xref>]. Unger et al. integrated contextual information into the neural collaborative filtering recommendation method and proposed three deep context-aware recommendation models based on explicit, unstructured, and structured latent representations of contextual data [<xref ref-type="bibr" rid="ref-7">7</xref>]. Zheng et al. used multi-angle attribute interaction and local lifting technology to effectively capture interest factors at different levels, improve rating prediction, and alleviate the data sparsity problem [<xref ref-type="bibr" rid="ref-8">8</xref>].</p>
<p>As one of the most successful strategies in recommendation algorithms, collaborative filtering has a wide range of applications, such as GroupLens, Ringo, Tapestry, and other commercial recommender systems. Collaborative filtering is traditionally divided into two categories: memory-based methods, which use the entire database of products that users have browsed and purchased to generate predictions, and model-based methods, which build a hierarchical model of user preferences before recommending products. Gong et al. improved structural similarity and numerical similarity separately and combined the two to obtain a user similarity measure that takes both structure and numerical value into account [<xref ref-type="bibr" rid="ref-9">9</xref>]. Zhang proposed a collaborative filtering recommendation algorithm based on a user-item mixture model, which mitigates data sparsity by introducing user interest factors and item semantics [<xref ref-type="bibr" rid="ref-10">10</xref>]. Sun et al. used a pre-filling algorithm based on sentiment analysis to fill the sparse rating matrix and obtain a dense matrix [<xref ref-type="bibr" rid="ref-11">11</xref>].</p>
<p>Some data preprocessing strategies also improve the performance of recommendation algorithms on sparse data. For example, in [<xref ref-type="bibr" rid="ref-12">12</xref>], the user&#x2019;s interests are expressed as topics through shallow semantic analysis, and the law of total probability is used to predict the topics of interest to the user. Mao et al. proposed a collaborative filtering algorithm based on the sigmoid function, which effectively alleviates data sparsity and improves recommendation quality [<xref ref-type="bibr" rid="ref-13">13</xref>]. Poirson et al. proposed a method based on emotional evaluation; in practical applications, however, this strategy inevitably encounters difficulties in perceiving emotions and their duration [<xref ref-type="bibr" rid="ref-14">14</xref>]. Ajoudanian et al. proposed a new fuzzy C-means clustering method, which addresses the sparsity problem by using the sparsest subgraph detection algorithm to define the initial centers of the clustering [<xref ref-type="bibr" rid="ref-15">15</xref>]. Although the above methods can improve recommendation performance to some extent, most of their data come from ratings. From the perspective of the development of recommender systems, a single rating data source reveals only limited user interests and cannot express them intuitively.</p>
<p>Since the emergence of the Transformer, many models have used its architecture or the self-attention mechanism to build new models for recommendation based on temporal factors. As an attention-based method, SASRec balances the strengths of Markov chain and RNN-based methods: it captures long-term semantics while using an attention mechanism to base its predictions on relatively few actions [<xref ref-type="bibr" rid="ref-16">16</xref>]. DIEN designs an interest extraction layer to capture temporal interests from historical behavior sequences; in its interest evolution layer, the attention mechanism is embedded into the sequential structure [<xref ref-type="bibr" rid="ref-17">17</xref>]. The BST model uses the Transformer to capture the correlations among the items in the user&#x2019;s historical sequence, and by adding the candidate item to the sequence, it can also extract the correlation between that item and the items in the behavior sequence [<xref ref-type="bibr" rid="ref-18">18</xref>]. RNNs and their extension, the GRU, can model causal relationships in user sequences using nonlinear transitions between consecutive hidden states. Transformer-based recommendation methods have many advantages: they can learn from variable-length inputs, learn long-term dependencies, make better use of sparse data, and compress hidden states. Their shortcomings are a complex structure and configuration, high hardware requirements, and a lack of interpretability.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Text Representation Learning</title>
<p>In recent years, deep neural networks have become the main technology for user interest representation learning. Among them, deep structured semantic models (DSSM) [<xref ref-type="bibr" rid="ref-19">19</xref>] and deep or neural factorization machines (DeepFM/NFM) [<xref ref-type="bibr" rid="ref-20">20</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>] are representative works based on supervised representation learning.</p>
<p>Currently, in the field of natural language processing, much work has focused on unsupervised models of sentence or paragraph vectors. The paragraph vector DBOW model is an unsupervised algorithm that learns fixed-length feature representations from variable-length text fragments [<xref ref-type="bibr" rid="ref-22">22</xref>]. Hill et al. proposed two new objectives for phrase or sentence representation learning: Sequential Denoising Autoencoders (SDAE) and FastSent, a sentence-level linear bag-of-words model [<xref ref-type="bibr" rid="ref-23">23</xref>]. Another sentence embedding method uses a latent variable generative model to provide a theoretical justification for an unsupervised approach that outperforms sophisticated supervised methods, including RNNs and LSTMs [<xref ref-type="bibr" rid="ref-24">24</xref>]. These models treat single sentences as independent and unordered. In real contexts, however, text takes many different forms, and the sentences in a paragraph are related to one another. Therefore, paragraph vectorization needs to take the order of sentences into account.</p>
<p>Research on the attention mechanism [<xref ref-type="bibr" rid="ref-25">25</xref>] simplifies the above problems. The attention mechanism is a technique that allows a model to focus on important information and learn it thoroughly. It is not a complete model but a technique that can be used in any sequence model. A later paper from Google takes the idea of attention to the extreme: it proposes a new model, the Transformer [<xref ref-type="bibr" rid="ref-26">26</xref>], which abandons the CNN and RNN structures used in previous deep learning tasks. BERT [<xref ref-type="bibr" rid="ref-27">27</xref>] is built on the Transformer and is widely used in NLP tasks such as machine translation, question answering, text summarization, and speech recognition. The main innovation of the model lies in its pre-training method, which uses two objectives, masked language modeling and next sentence prediction, to capture word-level and sentence-level vector representations, respectively. This pre-trained language model opened a new chapter in natural language processing.</p>
<p>In many natural language processing scenarios, supervised data are relatively scarce, and introducing large-scale unsupervised data can improve results. This is the main reason why BERT is so popular in natural language processing. In addition, language itself follows regularities that are largely shared across natural language processing tasks, so these regularities can be transferred through BERT. In the recommendation field, however, a large amount of supervised data is available, while user behavior shows no strong regularity and changes rapidly, so the learned patterns are not universal and are difficult to transfer. Moreover, BERT needs large-scale data to learn knowledge such as text semantics through pre-training before it can be used for downstream tasks. Therefore, although BERT can bring better results in text-dependent recommendation scenarios such as news recommendation, it is difficult to deploy on devices with low computing power. In addition, its training process requires a large amount of unsupervised text data, its interpretability is low, and model compression leads to a performance loss of the language model on inference tasks [<xref ref-type="bibr" rid="ref-28">28</xref>]. Therefore, this article does not use the BERT model directly but modifies the underlying Transformer, making the resulting model better suited to recommendation data sources.</p>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>User Representation Learning</title>
<p>Because of data sparsity, a text representation alone cannot fully capture a user profile. Researchers have therefore looked for factors that affect user representation from many angles and have proposed a variety of user representation learning methods.</p>
<p>TERACON introduces an embedding for each task, which is used to generate task-specific soft masks that not only allow all model parameters to be updated until the end of the training sequence but also help capture the relationships between tasks [<xref ref-type="bibr" rid="ref-29">29</xref>]. In DUVRec, user preference is learned from the representations of two distinct views, i.e., the item view and the factor view. Specifically, the item-view user representation is learned as in previous sequential recommendation methods, while the factor-view user representation is learned by a coarse-grained graph embedding method [<xref ref-type="bibr" rid="ref-30">30</xref>]. RobustSR uses social regularization and multi-view contrastive learning, which aim to enhance the model&#x2019;s awareness of relation informativeness and the discriminativeness of user representations [<xref ref-type="bibr" rid="ref-31">31</xref>]. RecGURU [<xref ref-type="bibr" rid="ref-32">32</xref>], JNET [<xref ref-type="bibr" rid="ref-33">33</xref>], and LDBR [<xref ref-type="bibr" rid="ref-34">34</xref>] also address practical problems from the perspective of user representation and have achieved good experimental results. Inspired by the above models, this article builds on the extraction of original text factors, adds effective factor data, and learns user representations.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>The Proposed Model</title>
<p>We now present our item recommendation framework, shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The core of this framework is the user representation, which is composed of a semantic layer, comprising rating factors and the text representation extracted from review text, and a user sequential representation layer. The representations of these two layers are integrated into the user interest representation.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Illustration of the proposed dual-perspective embedding user representation learner (DLUR) for the explainable item recommendation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_51046-fig-1.tif"/>
</fig>
<p>For user interest representation, the data collected include not only user review texts but also user latent factors extracted from ratings. This makes the data sources richer, and because rating data are more plentiful than review data, they can compensate for the scarcity of reviews. Moreover, when review data are missing, the user&#x2019;s latent factors mined from the rating data can be used directly as the user&#x2019;s long-term interests.</p>
<sec id="s3_1">
<label>3.1</label>
<title>User Latent Factor</title>
<p>The user&#x2019;s long-term latent factors are represented with the LFM model. The rating matrix <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> holds the ratings of <italic>n</italic> items by <italic>m</italic> users and is quite sparse. Here, <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the rating of item <italic>j</italic> by user <italic>i</italic>. In LFM, <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is approximated by the product of two matrices. The first is <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>F</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where each row of <italic>P</italic> represents a user&#x2019;s interest in each latent factor and <italic>F</italic> is the number of latent factors. The second is <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>F</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where each column represents the weight of an item on each latent factor. The scoring formula for LFM is:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mrow><mml:mover><mml:mi>r</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>F</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>To prevent overfitting, a regularization term is added to the objective function:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2260;</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:munder><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>r</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x2211;</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mo>&#x2211;</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>Q</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The decomposed <italic>P</italic> and <italic>Q</italic> are the user latent factors and item latent factors shown in the model structure diagram. In <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the lower part of the semantic layer in the blue dotted box represents the user latent factor.</p>
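<p>As an illustration of <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>, the following is a minimal NumPy sketch of how <italic>P</italic> and <italic>Q</italic> could be fitted. It is not the authors&#x2019; implementation: the function names <monospace>train_lfm</monospace> and <monospace>lfm_loss</monospace>, the use of stochastic gradient descent, the hyperparameter values, and the toy rating matrix are all assumptions made for illustration, and zero entries are assumed to mark unobserved ratings.</p>
<preformat preformat-type="code">
```python
import numpy as np


def lfm_loss(R, P, Q, lam):
    """Regularized squared error of Eq. (2), summed over observed (non-zero) ratings."""
    mask = R != 0
    sq_err = ((R - P @ Q)[mask] ** 2).sum()
    return sq_err + lam * ((P ** 2).sum() + (Q ** 2).sum())


def train_lfm(R, F=2, lam=0.02, lr=0.01, epochs=500, seed=0):
    """Fit P (m x F) and Q (F x n) so that P @ Q approximates the observed entries of R.

    Hypothetical helper: plain SGD is one common choice, assumed here.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, F))
    Q = rng.normal(scale=0.1, size=(F, n))
    observed = np.argwhere(R != 0)  # index pairs (i, j) with r_ij != 0
    for _ in range(epochs):
        rng.shuffle(observed)  # visit the observed ratings in random order
        for i, j in observed:
            # Prediction error r_ij - r_hat_ij, with r_hat_ij from Eq. (1)
            err = R[i, j] - P[i] @ Q[:, j]
            p_old = P[i].copy()
            # Gradient step on the squared error plus L2 penalty
            # (the constant factor 2 is absorbed into the learning rate)
            P[i] += lr * (err * Q[:, j] - lam * P[i])
            Q[:, j] += lr * (err * p_old - lam * Q[:, j])
    return P, Q


# Toy example: 4 users, 4 items, 0 = no rating (illustrative data only)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
P, Q = train_lfm(R)
R_hat = P @ Q  # predicted ratings, including the unobserved entries
```
</preformat>
<p>In this sketch, row <italic>i</italic> of <italic>P</italic> plays the role of the long-term latent factor vector of user <italic>i</italic>, and the entries of <monospace>R_hat</monospace> at unobserved positions are the predicted ratings. Applying the penalty only when a user or item is visited is a common approximation of the regularization term in Eq. (2).</p>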
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Text Latent Factor Extraction</title>
<p>In addition to ratings, which can characterize users or items, user reviews are a data source that expresses user interests intuitively. The entire text factor extraction process is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<p>In <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, the gray part is the original Transformer. An interactive attention layer is added to the text factor extraction process. The idea is that the interaction between users and item reviews helps to identify which words in the review sentences are key to expressing the user&#x2019;s personality. Integrating interactive attention with self-attention yields a more focused vector representation.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Word-level text factor extraction process</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_51046-fig-2.tif"/>
</fig>
<p>For word vectorization, the Sent2vec [<xref ref-type="bibr" rid="ref-35">35</xref>] unsupervised learning method is selected to create word vectors based on contextual information. The objective function is as follows:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:munder><mml:mrow><mml:mi>min</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi></mml:mrow></mml:munder><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x2113;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mtext>T</mml:mtext></mml:msubsup><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x005C;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mstyle 
displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:munder><mml:mi>&#x2113;</mml:mi></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup></mml:mrow><mml:mtext>T</mml:mtext></mml:msubsup><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x005C;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the <italic>t</italic>-th word in sentence <italic>S</italic>, <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>S</mml:mi></mml:math></inline-formula>. <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> represents the set of negative samples drawn for the <italic>t</italic>-th word. <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the target word vector, and <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the source word vector. &#x2113; denotes the binary logistic loss used by Sent2vec. 
Thus, the review documents of user <italic>x</italic> and item <italic>y</italic> are transformed into vector matrices: <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msubsup><mml:mi>H</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msubsup><mml:mi>H</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-12"><mml:math 
id="mml-ieqn-12"><mml:msubsup><mml:mi>H</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>}</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msubsup><mml:mi>H</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, where <italic>n</italic> and <italic>m</italic> represent the review lengths of user <italic>x</italic> and item <italic>y</italic>, respectively. Then a self-attention mechanism is further applied to words to capture long-distance dependencies in comments. It can be calculated as follows:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>H</mml:mi><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>H</mml:mi><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:msqrt><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:msqrt></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>H</mml:mi><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are learnable parameters, and <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the vector dimension used for scaling. In addition, the self-attention mechanism in Transformer is implemented in parallel using <italic>g</italic> heads, where each head computes attention according to <xref ref-type="disp-formula" rid="eqn-4">formula (4)</xref>. The output of multi-head attention is the concatenation of the <italic>g</italic> heads, followed by a linear mapping:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mi>M</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>H</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula>
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>Q</mml:mi><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>K</mml:mi><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>V</mml:mi><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Here, <italic>f</italic> is the concatenation operation. <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> are the query, key, and value weight matrices of the <italic>i</italic>-th head, all of which are learnable. The user review vector matrix can therefore be expressed as <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, and the item review vector matrix <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is obtained by the same process.</p>
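<p>As an illustration only, the multi-head self-attention of formulas (4) to (6) can be sketched in plain Python. The sketch assumes the common layout with one row per word, so each head computes softmax(QK<sup>T</sup>/&#x221A;d<sub>k</sub>)V; the helper names, the random weights and the toy sizes are hypothetical and are not taken from the paper.</p>
<code language="python"><![CDATA[
```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(h, w_q, w_k, w_v):
    """One head: softmax(Q K^T / sqrt(d_k)) V, with one row of h per word."""
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    return matmul([softmax(row) for row in scores], v)


def multi_head(h, heads, w_m):
    """Concatenate the head outputs word by word, then apply the map W^M."""
    outputs = [attention(h, *weights) for weights in heads]
    concat = [sum((out[i] for out in outputs), []) for i in range(len(h))]
    return matmul(concat, w_m)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_words, d, g, d_k = 4, 6, 2, 3          # toy sizes chosen for the example
h = random_matrix(n_words, d, rng)       # word vectors of one review
heads = [tuple(random_matrix(d, d_k, rng) for _ in range(3)) for _ in range(g)]
w_m = random_matrix(g * d_k, d, rng)
e_att = multi_head(h, heads, w_m)        # one d-dimensional row per word
```
]]></code>
<p>In this sketch the output keeps one row per input word, so it can be passed directly to a later word-level step.</p>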
<p>After the user review vector has been obtained, it is necessary to interact with the item review data to further highlight the influence of words. Inspired by [<xref ref-type="bibr" rid="ref-36">36</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>], we use an attentive matrix <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msup><mml:mi>A</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> to derive a vector containing the importance of each word for both <italic>U</italic> and <italic>V</italic>. Specifically, the matrices <italic>U</italic> and <italic>V</italic> are mapped to the same latent space, and the correlation of each user-item pair is calculated as follows:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>h</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:msup><mml:mi>A</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msup><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In the formula, <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the factor vector of the <italic>i</italic>-th word in the review document of user <italic>x</italic>, <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the factor vector of the <italic>j</italic>-th word in the review document of item <italic>y</italic>. 
<inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the correlation between <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, where row <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mo>&#x2217;</mml:mo></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> contains the correlation between all word factor vectors in <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. 
Similarly, column <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> contains the correlation between the factor vectors of all words in <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The mean pooling operation of row <italic>F</italic> and column <italic>F</italic> is as follows:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msubsup><mml:mrow><mml:mtext>g</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Based on the pooled correlations above, the importance of each feature vector in <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is obtained by softmax normalization:
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>g</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>g</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p><inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the attention weight of <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula> at word granularity. It is used to reweight the corresponding word vector:
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msubsup></mml:math></disp-formula></p>
<p>The resulting vector matrix is <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. A residual connection followed by layer normalization is then applied:
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>y</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>N</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
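<p>The interactive attention of formulas (7) to (11) can be sketched as follows. This is a minimal illustration under stated assumptions: the word vectors are toy values, the attentive matrix is set to the identity as a placeholder for a learned matrix, and layer normalization is applied per word vector without gain or bias. All function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def bilinear(u_vec, a, v_vec):
    """Compute u^T A v for two vectors and a square matrix A."""
    return sum(u_vec[r] * a[r][c] * v_vec[c]
               for r in range(len(u_vec)) for c in range(len(v_vec)))


def interactive_weights(e_u, e_v, a):
    """Formulas (7)-(9): tanh correlation, row-wise mean pooling, softmax."""
    g = [sum(math.tanh(bilinear(u, a, v)) for v in e_v) / len(e_v) for u in e_u]
    peak = max(g)
    exps = [math.exp(x - peak) for x in g]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(vec, eps=1e-6):
    """Normalize one vector to zero mean and unit variance (no gain or bias)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def word_level(e_u, e_v, a):
    """Formulas (10)-(11): reweight each word vector, add residual, normalize."""
    weights = interactive_weights(e_u, e_v, a)
    e_inter = [[w * x for x in u] for w, u in zip(weights, e_u)]
    return [layer_norm([x + y for x, y in zip(u, r)]) for u, r in zip(e_u, e_inter)]


# Toy inputs: two user-review word vectors and three item-review word vectors.
e_u = [[0.2, -0.1, 0.4], [1.0, 0.3, -0.5]]
e_v = [[0.5, 0.5, 0.0], [-0.3, 0.8, 0.1], [0.0, 0.2, 0.9]]
# Identity matrix used as a placeholder for the learned attentive matrix.
a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights = interactive_weights(e_u, e_v, a)   # one weight per user word
e_word = word_level(e_u, e_v, a)             # normalized word-level vectors
```
]]></code>
<p>With these toy inputs the first user word correlates more strongly, on average, with the item words, so it receives the larger attention weight.</p>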
<p>The result is a vector representation of user interests derived from the review text. However, all of the resulting vectors are word-level, whereas a user interest representation should capture the overall characteristics of the user. The word factor vectors are therefore aggregated into factor vectors that carry sentence semantics:
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>w</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>S</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. Once the sentence factor vectors have been obtained, the paragraph can be considered as a whole. The procedure is the same as the word-level text factor extraction process in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, except that the input is now the sentence-level factor vectors. After applying <xref ref-type="disp-formula" rid="eqn-4">formulas (4)</xref> to <xref ref-type="disp-formula" rid="eqn-11">(11)</xref>, the result is the sentence factor vector <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. The paragraph-level factor vector is then the mean of its sentence vectors:
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>p</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>E</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>U</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>. The matrix composed of the vectors <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the final paragraph-level factor matrix <italic>E</italic> obtained from the text. Each row of the matrix holds the text factors of one user.</p>
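<p>The two averaging steps in formulas (12) and (13) can be illustrated with a short sketch. It shows only the mean pooling from words to sentences and from sentences to a paragraph; the attention pass of formulas (4) to (11) that the method applies between the two levels is deliberately left out, and the vectors are toy values chosen for the example.</p>
<code language="python"><![CDATA[
```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# A toy paragraph: each sentence is a list of 2-dimensional word vectors.
paragraph = [
    [[1.0, 0.0], [3.0, 2.0]],   # sentence 1 with two words
    [[0.0, 4.0]],               # sentence 2 with one word
]

# Formula (12): a sentence vector is the mean of its word vectors.
sentence_vectors = [mean_vector(words) for words in paragraph]

# Formula (13): a paragraph vector is the mean of its sentence vectors.
paragraph_vector = mean_vector(sentence_vectors)
```
]]></code>
<p>Because each level is averaged separately, a short sentence contributes as much to the paragraph vector as a long one.</p>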
<p>As shown on the right of the semantic layer in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the user interest vector <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> integrates the latent vector <italic>P</italic> and the text vector <italic>E</italic>:
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x2295;</mml:mo><mml:mi>E</mml:mi></mml:math></disp-formula></p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>User Sequential Factor</title>
<p>Inspired by BST [<xref ref-type="bibr" rid="ref-18">18</xref>], user sequential factors are represented by factors extracted from item reviews using Transformer. The factor extraction process is shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Sequential factor extraction process</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_51046-fig-3.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, the Embedding Layer is mainly responsible for producing the item factor vectors and the position factor vectors. The item factor extraction is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. It follows a process similar to that of the user interest vector <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>: the text factors of the item and the latent factors of the item decomposed by LFM are integrated to obtain the item embedding <italic>Z</italic>. For positional embedding, the way BST assigns position values was considered. However, a rating sequence differs from a click sequence: the time intervals between ratings are irregular and typically much larger than those between clicks within a session, so they offer little reference value. Therefore, one-hot position encoding is used directly.</p>
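<p>A minimal sketch of one-hot position encoding for a rating sequence is given below. How the position vector is combined with the item embedding is not specified in the text; concatenation is assumed here purely for illustration, and the function names and toy embeddings are hypothetical.</p>
<code language="python"><![CDATA[
```python
def one_hot(position, length):
    """Return a vector of the given length with a single 1.0 at position."""
    return [1.0 if index == position else 0.0 for index in range(length)]


def embed_sequence(item_vectors):
    """Append a one-hot position vector to each item embedding.

    Concatenation is an assumption made for this sketch only.
    """
    length = len(item_vectors)
    return [vector + one_hot(position, length)
            for position, vector in enumerate(item_vectors)]


# Toy item embeddings for three items, listed in the order they were rated.
rated_items = [[0.3, 0.7], [0.9, 0.1], [0.5, 0.5]]
sequence_input = embed_sequence(rated_items)
```
]]></code>
<p>The encoding depends only on the order of the ratings, not on the time elapsed between them, which matches the motivation given above for not using interval-based positions.</p>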
<p>Scaled dot-product attention in Transformer is defined as follows:
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>Q</mml:mi><mml:mo>,</mml:mo><mml:mi>K</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mi>Q</mml:mi><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:msqrt><mml:mi>d</mml:mi></mml:msqrt></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mi>V</mml:mi></mml:math></disp-formula>where <italic>Q</italic> represents the queries, <italic>K</italic> the keys and <italic>V</italic> the values. In our scenario, the item embeddings are taken as input, converted into three matrices through linear projection, and fed into the attention layer.
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula>
<disp-formula id="eqn-17"><label>(17)</label><mml:math id="mml-eqn-17" display="block"><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>E</mml:mi><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>E</mml:mi><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>E</mml:mi><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are the projection matrices, and <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the user&#x2019;s sequential factor matrix output by the Transformer layer.</p>
<p>Combining the user interest factors with the user sequential factors yields a relatively complete user profile factor <italic>C</italic>:
<disp-formula id="eqn-18"><label>(18)</label><mml:math id="mml-eqn-18" display="block"><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Application of User Representation in Recommendation</title>
<p>The recommendation method is built on the user representation described above. The user&#x2019;s latent features, decomposed from the ratings, form one part of the long-term interests. They are integrated with the long-term interest features extracted from the review text to form a feature that richly expresses the user&#x2019;s consistent interests. Interest changes extracted from the time series express the user&#x2019;s current state. Combining the two gives a fuller picture of the user. Given a user <italic>u</italic>, the goal of this paper is to construct a user interest representation from the user&#x2019;s multi-dimensional factors and then compare it with candidate items, recommending those with high similarity. In addition, the recommendation structure provides a sentence-level explanation mechanism.</p>
<p>After the user representation <italic>C</italic> has been obtained, the item features are extracted in the same way. As can be seen from <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the extraction process of the item feature <italic>Z</italic> is consistent with that of the user long-term representation. <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mi>Z</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:mi>Q</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> integrates the feature <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>T</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> extracted from the item review data and the latent feature <italic>Q</italic> of the item decomposed from the rating matrix. The matching score between user <italic>x</italic> and item <italic>y</italic> is then computed as follows:
<disp-formula id="eqn-19"><label>(19)</label><mml:math id="mml-eqn-19" display="block"><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
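<p>As a minimal sketch of formula (19), the snippet below assumes that &#x03C6; is an inner product and that <italic>C</italic> and <italic>Z</italic> share the same dimension. The text does not specify &#x03C6;, so this choice, the helper names <monospace>build_item_feature</monospace>, <monospace>score</monospace> and <monospace>rank_items</monospace>, and the toy vectors are illustrative assumptions only.</p>
<code language="python"><![CDATA[
```python
import numpy as np

def build_item_feature(z_att, q):
    """Concatenate the review feature and the latent factor: Z = [Z_ATT; Q]."""
    return np.concatenate([z_att, q])

def score(c_x, z_y):
    """Assumed phi: inner product between user and item representations."""
    return float(np.dot(c_x, z_y))

def rank_items(c_x, item_features, top_k=10):
    """Return item ids sorted by descending score, truncated to top_k."""
    scored = sorted(item_features.items(),
                    key=lambda kv: score(c_x, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:top_k]]

# Toy data (hypothetical): a 4-dimensional user representation and three items.
c_x = np.array([1.0, 0.0, 0.5, 0.5])
items = {
    "a": build_item_feature(np.array([1.0, 0.0]), np.array([0.0, 0.0])),
    "b": build_item_feature(np.array([0.0, 1.0]), np.array([0.5, 0.0])),
    "c": build_item_feature(np.array([0.5, 0.5]), np.array([2.0, 2.0])),
}
top_two = rank_items(c_x, items, top_k=2)
```
]]></code>
<p>Any other similarity function, for example a small feedforward network over the concatenated vectors, could replace the inner product without changing the ranking step.</p>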
<p>The model is trained with a pairwise learning method. All items selected by a user, together with their ratings and reviews, are used as positive samples. Negative samples are drawn at random from the next items of other sessions in the same batch. These positive and negative samples are used to train the entire neural network. The Bayesian Personalized Ranking (BPR) loss, a pairwise loss for personalized recommendation, is adopted:
<disp-formula id="eqn-20"><label>(20)</label><mml:math id="mml-eqn-20" display="block"><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mo>&#x22C5;</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Here, <italic>N</italic> is the number of negative samples, <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the score of the positive sample, <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the score of the <italic>j</italic>-th negative sample, <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> is the sigmoid function, <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> is the <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> regularization hyperparameter, and <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula> represents the parameters of the model.</p>
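<p>The loss in formula (20) can be sketched in a few lines of NumPy. This is only an illustration for one positive sample and its <italic>N</italic> negatives; the function name <monospace>bpr_loss</monospace>, the default value of <monospace>lam</monospace>, and the representation of the parameters as a list of arrays are assumptions that do not come from the original implementation.</p>
<code language="python"><![CDATA[
```python
import numpy as np

def bpr_loss(pos_score, neg_scores, params, lam=0.01):
    """-(1/N) * sum_j log(sigmoid(s_pos - s_neg_j)) + lam * ||theta||^2.

    pos_score  : score of the positive item (s_{x,i})
    neg_scores : scores of the N sampled negative items (s_{x,j})
    params     : list of parameter arrays that together form theta
    lam        : l2 regularization weight (hypothetical default)
    """
    diff = pos_score - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(d)) equals log(1 + exp(-d)); logaddexp keeps it stable.
    ranking_term = np.mean(np.logaddexp(0.0, -diff))
    reg_term = lam * sum(float(np.sum(np.square(p))) for p in params)
    return float(ranking_term + reg_term)
```
]]></code>
<p>When the positive and negative scores are equal, the ranking term equals log 2, and it decreases as the positive item is scored further above the negatives.</p>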
<p>Based on <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, a list of items can be recommended to user <italic>x</italic>. This list is the result of applying the user interest representation to recommendation. The recommendation reflects not only the user&#x2019;s textual semantic features but also the latent features in the ratings and the user&#x2019;s short-term interest changes. The accuracy of the recommended item list is evaluated in <xref ref-type="sec" rid="s5">Section 5</xref>.</p>
<p>The method based on user representation proposed in this article can also address recommendation explanation. Many explanations built on item features or aspects rely on words or phrases, which are prone to semantic ambiguity or incomplete expression. Using all reviews of a recommended item as the explanation would be redundant, because the reviews contain many sentences and users may pay attention to only some of them. Selecting the review sentences that are suitable for explanation is therefore the key to the explanation mechanism. In <xref ref-type="sec" rid="s3_2">Section 3.2</xref>, the model uses the interaction between the user and the item together with the attention mechanism to find the sentences with high attention among the reviews of an item, so these sentences can be used as the recommendation explanation.</p>
<p>The sentence-level comment feature vector <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of user <italic>x</italic> has been obtained by <xref ref-type="disp-formula" rid="eqn-12">formula (12)</xref>, and the sentence-level comment feature vector <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of item <italic>y</italic> can also be obtained by <xref ref-type="disp-formula" rid="eqn-12">formula (12)</xref>. According to the process of extracting text features in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, after going through the process from <xref ref-type="disp-formula" rid="eqn-4">formulas (4)</xref> to <xref ref-type="disp-formula" rid="eqn-8">(8)</xref>, we can get:
<disp-formula id="eqn-21"><label>(21)</label><mml:math id="mml-eqn-21" display="block"><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>g</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>g</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p><inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the sentence-level attention weight of the <italic>j</italic>-th sentence in the reviews of the item. This weight is obtained after the user interacts with the item: the higher the value, the greater the influence of the sentence on the user. Sentences with higher weights can therefore be selected as recommendation explanations.</p>
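<p>The selection step described above can be sketched as a softmax over the sentence scores followed by a top-<italic>n</italic> pick. The sketch assumes that the raw scores <italic>g</italic> are already available from the attention layer; the function names and the example inputs are hypothetical.</p>
<code language="python"><![CDATA[
```python
import numpy as np

def sentence_attention(g):
    """Softmax over the sentence scores g_1..g_m, as in formula (21)."""
    g = np.asarray(g, dtype=float)
    e = np.exp(g - g.max())  # subtracting the max avoids overflow
    return e / e.sum()

def select_explanations(sentences, g, top_n=1):
    """Return the top_n (sentence, weight) pairs with the largest weights."""
    weights = sentence_attention(g)
    order = np.argsort(-weights)[:top_n]
    return [(sentences[i], float(weights[i])) for i in order]

# Hypothetical review sentences and attention scores.
example = select_explanations(
    ["the case feels cheap", "battery lasts two days", "arrived on time"],
    [0.1, 2.0, 0.5],
    top_n=1,
)
```
]]></code>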
</sec>
<sec id="s5">
<label>5</label>
<title>Experiments</title>
<p>Multiple experiments were designed to verify the overall performance of the model and the contribution of each part. First, three recommendation metrics are used to compare the model with the baselines and to verify the recommendation effect that the representation model can achieve. Then the effectiveness of each feature that makes up the user representation is verified separately. Finally, the text weights are visualized and the high-weight sentences are selected to generate recommendation explanations.</p>
<sec id="s5_1">
<label>5.1</label>
<title>Experimental Setup</title>
<p>The experiments use four popular Amazon data sets and the Yelp data set. The four Amazon data sets are &#x201C;Cell Phones and Accessories&#x201D;, &#x201C;Clothing Shoes and Jewelry&#x201D;, &#x201C;Electronics&#x201D; and &#x201C;Toys and Games&#x201D;. Each data set contains &#x201C;user ID&#x201D;, &#x201C;product ID&#x201D;, &#x201C;rating&#x201D;, and &#x201C;review text&#x201D;. For Yelp, we chose the reviews from 2019. The experiments select items that have reviews, and the user&#x2019;s interest factors are then extracted from the reviews. The basic statistics of the data sets are shown in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Data set feature statistics</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Datasets</th>
<th>#Users</th>
<th>#Products</th>
<th>#Reviews</th>
<th>Avg review length</th>
<th>Density</th>
</tr>
</thead>
<tbody>
<tr>
<td>Phones</td>
<td>3216</td>
<td>9018</td>
<td>47139</td>
<td>135.39</td>
<td>0.1625%</td>
</tr>
<tr>
<td>Clothing</td>
<td>5200</td>
<td>20424</td>
<td>72142</td>
<td>35.88</td>
<td>0.0679%</td>
</tr>
<tr>
<td>Electronics</td>
<td>45225</td>
<td>61918</td>
<td>773502</td>
<td>160.83</td>
<td>0.0276%</td>
</tr>
<tr>
<td>Toys</td>
<td>4188</td>
<td>11526</td>
<td>74423</td>
<td>103.09</td>
<td>0.1541%</td>
</tr>
<tr>
<td>Yelp_2019</td>
<td>545241</td>
<td>128228</td>
<td>1215836</td>
<td>97.53</td>
<td>0.0017%</td>
</tr>
</tbody>
</table>
</table-wrap>
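<p>The Density column in <xref ref-type="table" rid="table-1">Table 1</xref> appears to be the number of reviews divided by the number of possible user-product pairs. The text does not state this definition, so it is an assumption here; the short check below recomputes the column under that assumption and compares it with the reported values.</p>
<code language="python"><![CDATA[
```python
def density_percent(n_users, n_products, n_reviews):
    """Assumed definition: reviews / (users * products), in percent."""
    return 100.0 * n_reviews / (n_users * n_products)

# (users, products, reviews, reported density in percent) copied from Table 1.
TABLE_1 = {
    "Phones": (3216, 9018, 47139, 0.1625),
    "Clothing": (5200, 20424, 72142, 0.0679),
    "Electronics": (45225, 61918, 773502, 0.0276),
    "Toys": (4188, 11526, 74423, 0.1541),
    "Yelp_2019": (545241, 128228, 1215836, 0.0017),
}

computed = {
    name: density_percent(users, products, reviews)
    for name, (users, products, reviews, _) in TABLE_1.items()
}
```
]]></code>
<p>Under this definition the recomputed values agree with the table to within rounding of the fourth decimal place.</p>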
<p>The data sets are preprocessed as follows:
<list list-type="simple">
<list-item><label>(1)</label><p>The text is divided into different documents based on userID and itemID. Each user&#x2019;s review of an item acts as a paragraph in the document.</p></list-item>
<list-item><label>(2)</label><p>Each paragraph is divided by punctuation marks, with one sentence per line.</p></list-item>
<list-item><label>(3)</label><p>All letters in each sentence are converted to lowercase letters.</p></list-item>
<list-item><label>(4)</label><p>Word segmentation is performed with the Natural Language Toolkit (NLTK). In addition, the data set is filtered so that each user has at least 10 selected items, regardless of whether review data exist for them; users with fewer items are removed.</p></list-item>
</list></p>
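<p>The four preprocessing steps can be sketched as follows. To keep the example self-contained, a regular expression stands in for the NLTK tokenizer used in the experiments, and the record layout <monospace>(user_id, item_id, review_text)</monospace> as well as all function names are assumptions made for illustration.</p>
<code language="python"><![CDATA[
```python
import re
from collections import defaultdict

def split_sentences(paragraph):
    """Step 2: split a review on sentence-ending punctuation."""
    parts = re.split(r"[.!?]+", paragraph)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    """Steps 3 and 4: lowercase, then split into word tokens.

    A regular expression replaces the NLTK tokenizer in this sketch.
    """
    return re.findall(r"[a-z0-9']+", sentence.lower())

def preprocess(records, min_items=10):
    """records: list of (user_id, item_id, review_text or None).

    Keeps users with at least min_items distinct items, whether or not
    the items have review text, and returns user -> list of reviews,
    each review being a list of tokenized sentences (step 1).
    """
    items_per_user = defaultdict(set)
    for user_id, item_id, _ in records:
        items_per_user[user_id].add(item_id)
    kept = {u for u, items in items_per_user.items() if len(items) >= min_items}

    docs = defaultdict(list)
    for user_id, item_id, text in records:
        if user_id in kept and text:
            docs[user_id].append([tokenize(s) for s in split_sentences(text)])
    return dict(docs)
```
]]></code>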
<p>Each data set is divided into three groups: training set, validation set, and test set. For each user, the last selected item is retained for the test set, the penultimate selected item is used for the validation set, and the remaining records form the training set. The training set is used to train the model, the validation set is used to tune the parameters, and the best parameter settings are then applied to the test set to obtain the final recommendation results.</p>
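<p>This leave-one-out split can be sketched as below, assuming each user&#x2019;s records are already sorted chronologically. The function name and the rule of skipping users with fewer than three records are illustrative assumptions.</p>
<code language="python"><![CDATA[
```python
def leave_one_out_split(user_sequences):
    """user_sequences: dict mapping user -> items in chronological order.

    Returns (train, valid, test): train maps user -> list of earlier items,
    valid maps user -> penultimate item, test maps user -> last item.
    """
    train, valid, test = {}, {}, {}
    for user, seq in user_sequences.items():
        if len(seq) < 3:
            continue  # assumption: too short to split three ways
        train[user] = list(seq[:-2])
        valid[user] = seq[-2]
        test[user] = seq[-1]
    return train, valid, test
```
]]></code>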
<p>The hyperparameters of the recommendation method are tuned on the validation set. The number of heads <italic>h</italic> in the multi-head attention mechanism used for word-level and sentence-level text feature extraction is set to 4. The text feature extractor uses 6 Transformer layers with a dimension of 512 (tuned over [128, 256, 512, 1024]) and a feedforward network dimension of 2048. The dimension of the word vectors and of the userID and itemID embeddings is 300 (tuned over [200, 300, 400]). To avoid overfitting, the dropout rate is set to 0.3 (tuned over [0.1, 0.3, 0.5, 0.7]). The batch size is 400, and 5 negative samples are used for each positive sample. The hyperparameters of the baseline models were set following the strategies described in their original papers.</p>
<p>The experiments are evaluated with common recommendation metrics: HR (Hit Ratio), MRR (Mean Reciprocal Rank) and NDCG (Normalized Discounted Cumulative Gain). A Top-10 item recommendation list is generated for each user to observe the performance of the recommendation method.
<list list-type="bullet">
<list-item>
<p>HR measures whether the correct items are included in the final recommended Top-<italic>K</italic> list.</p></list-item>
</list>
<disp-formula id="eqn-22"><label>(22)</label><mml:math id="mml-eqn-22" display="block"><mml:mi>H</mml:mi><mml:mi>R</mml:mi><mml:mrow><mml:mo>@</mml:mo></mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>N</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>O</mml:mi><mml:mi>f</mml:mi><mml:mi>H</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:mo>@</mml:mo></mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mrow><mml:mi>G</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>Here, the denominator <italic>GT</italic> is the total number of test cases, and the numerator is the number of test cases whose held-out item appears in the Top-<italic>K</italic> list.
<list list-type="bullet">
<list-item>
<p>MRR is the mean reciprocal rank of the desired items. This metric focuses on whether the desired items are placed near the top of the recommendation list.</p></list-item>
</list>
<disp-formula id="eqn-23"><label>(23)</label><mml:math id="mml-eqn-23" display="block"><mml:mi>M</mml:mi><mml:mi>R</mml:mi><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>Q</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>Q</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:munderover><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the rank of the desired item in the recommendation list of the <italic>i</italic>-th test case, and |<italic>Q</italic>| is the number of test cases.
<list list-type="bullet">
<list-item>
<p>NDCG is widely used to measure ranking accuracy. If the item selected by the user is ranked higher in the recommendation list, the score is higher. The value reported here is the average NDCG over all users.</p></list-item>
</list>
<disp-formula id="eqn-24"><label>(24)</label><mml:math id="mml-eqn-24" display="block"><mml:mi>N</mml:mi><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:mi>G</mml:mi><mml:mrow><mml:mo>@</mml:mo></mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>U</mml:mi></mml:mrow></mml:munder><mml:mi>N</mml:mi><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>@</mml:mo></mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mrow><mml:mi>I</mml:mi><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>u</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
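<p>The three metrics can be computed together as sketched below. The sketch assumes the leave-one-out protocol described above, so each user has exactly one held-out item and the ideal DCG equals 1; it also assumes that a held-out item missing from the Top-<italic>K</italic> list contributes zero to MRR and NDCG. The function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math

def rank_of(target, ranked_items):
    """1-based position of target in ranked_items, or None if absent."""
    for pos, item in enumerate(ranked_items, start=1):
        if item == target:
            return pos
    return None

def evaluate(recommendations, ground_truth, k=10):
    """recommendations: user -> ranked item list; ground_truth: user -> item.

    Returns HR@k, MRR and NDCG@k averaged over all users in ground_truth.
    """
    hits, rr_sum, ndcg_sum = 0, 0.0, 0.0
    for user, target in ground_truth.items():
        rank = rank_of(target, recommendations.get(user, [])[:k])
        if rank is not None:
            hits += 1
            rr_sum += 1.0 / rank
            # With a single relevant item, IDCG = 1, so NDCG = DCG.
            ndcg_sum += 1.0 / math.log2(rank + 1)
    n = len(ground_truth)
    return {"HR": hits / n, "MRR": rr_sum / n, "NDCG": ndcg_sum / n}
```
]]></code>
<p>For example, with two users where one held-out item is ranked second and the other is missing from the list, HR is 0.5 and MRR is 0.25.</p>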
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Model Comparisons</title>
<p>To verify the advantages of DLUR, we compared its performance with the following baseline models and with ablated variants of our model.</p>
<p>The following baselines are representative item recommendation models. Each model is similar to DLUR in its recommendation idea or feature extraction idea but follows a different technical route, which makes the comparison with DLUR informative.
<list list-type="bullet">
<list-item>
<p>DeepCoNN model [<xref ref-type="bibr" rid="ref-38">38</xref>]. This model simultaneously utilizes the semantic information in user and item reviews to construct their respective features.</p></list-item>
<list-item>
<p>APSE [<xref ref-type="bibr" rid="ref-39">39</xref>] is a rating model that extracts user and item features by using reviews, and combines existing rating features to predict the ratings of unrated items. This method also uses scoring and attention mechanisms to extract user interest features. Compared with the recommendation method in this article, user features are extracted from text.</p></list-item>
<list-item>
<p>PRSL is the result of previous research. The overall architecture of this method is similar to the recommendation method in this article. They both extract user interest features based on historical records and dynamic short-term changes.</p></list-item>
<list-item>
<p>GRU4REC is a GRU-based sequential prediction model [<xref ref-type="bibr" rid="ref-40">40</xref>]. This model uses data sequences to extract the sequence features of users within a short period of time. This matches the way part of the user interest features in this article is extracted, namely from sequence features.</p></list-item>
<list-item>
<p>The AttRec model also makes use of the user&#x2019;s short-term and long-term interest characteristics to build a recommendation model [<xref ref-type="bibr" rid="ref-41">41</xref>]. The self-attention structure is used in short-term interest feature extraction, while long-term interest feature extraction comes from rating data. The idea of this model is similar to the recommendation framework of this article, but the technology used and the data sources are different.</p></list-item>
<list-item>
<p>SSG designs a three-way encoder architecture that jointly captures long-term (set), short-term (sequence), and collaborative (graph) features of users and items for recommendation [<xref ref-type="bibr" rid="ref-42">42</xref>]. Like DLUR, it uses review and sequence features, but its implementation and its feature fusion method are different. The performance comparison of each model is shown in <xref ref-type="table" rid="table-2">Table 2</xref>.</p></list-item>
</list></p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Performance comparisons of all models on the five datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead valign="top">
<tr>
<th>Dataset</th>
<th>Metric</th>
<th>DeepCoNN</th>
<th>APSE</th>
<th>GRU4REC</th>
<th>AttRec</th>
<th>PRSL</th>
<th>SSG</th>
<th>DLUR</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3">Phones</td>
<td>HR</td>
<td>0.1197</td>
<td>0.1612</td>
<td>0.1813</td>
<td>0.2037</td>
<td>0.2128</td>
<td>0.2232</td>
<td>0.2253</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0132</td>
<td>0.0301</td>
<td>0.0428</td>
<td>0.0435</td>
<td>0.0481</td>
<td>0.0484</td>
<td>0.0492</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0447</td>
<td>0.0687</td>
<td>0.0711</td>
<td>0.0933</td>
<td>0.1032</td>
<td>0.1089</td>
<td>0.1097</td>
</tr>
<tr>
<td rowspan="3">Clothing</td>
<td>HR</td>
<td>0.0312</td>
<td>0.0494</td>
<td>0.0583</td>
<td>0.0712</td>
<td>0.0789</td>
<td>0.0805</td>
<td>0.0821</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0095</td>
<td>0.0123</td>
<td>0.0131</td>
<td>0.0139</td>
<td>0.0191</td>
<td>0.0218</td>
<td>0.0221</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0164</td>
<td>0.0212</td>
<td>0.0315</td>
<td>0.0428</td>
<td>0.0536</td>
<td>0.0603</td>
<td>0.0613</td>
</tr>
<tr>
<td rowspan="3">Electronics</td>
<td>HR</td>
<td>0.0573</td>
<td>0.0817</td>
<td>0.0906</td>
<td>0.1036</td>
<td>0.1247</td>
<td>0.1402</td>
<td>0.1372</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0102</td>
<td>0.0194</td>
<td>0.0204</td>
<td>0.0215</td>
<td>0.0308</td>
<td>0.0420</td>
<td>0.0413</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0287</td>
<td>0.0402</td>
<td>0.0490</td>
<td>0.0638</td>
<td>0.0715</td>
<td>0.0810</td>
<td>0.0805</td>
</tr>
<tr>
<td rowspan="3">Toys</td>
<td>HR</td>
<td>0.1284</td>
<td>0.1591</td>
<td>0.1703</td>
<td>0.2014</td>
<td>0.2217</td>
<td>0.2219</td>
<td>0.2311</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0178</td>
<td>0.0289</td>
<td>0.0322</td>
<td>0.0397</td>
<td>0.0427</td>
<td>0.0496</td>
<td>0.0510</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0463</td>
<td>0.0692</td>
<td>0.0785</td>
<td>0.0951</td>
<td>0.1012</td>
<td>0.1079</td>
<td>0.1098</td>
</tr>
<tr>
<td rowspan="3">Yelp_2019</td>
<td>HR</td>
<td>0.0267</td>
<td>0.0293</td>
<td>0.0311</td>
<td>0.0320</td>
<td>0.0342</td>
<td>0.0361</td>
<td>0.0354</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0082</td>
<td>0.0091</td>
<td>0.0097</td>
<td>0.0114</td>
<td>0.0121</td>
<td>0.0205</td>
<td>0.0190</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0197</td>
<td>0.0211</td>
<td>0.0214</td>
<td>0.0219</td>
<td>0.0227</td>
<td>0.0301</td>
<td>0.0295</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Among the selected comparison models, DeepCoNN and APSE are rating prediction models. In the comparative experiment, their items with high predicted ratings are used as recommended items and are then compared under the HR, MRR and NDCG metrics. GRU4REC is a recommendation model that uses temporal features. Compared with DLUR, it only considers changes in user interests over a short period of time. Current research shows that short-term changes in user interests have a large impact on the user&#x2019;s next item selection. At the same time, the results show that the user&#x2019;s final choice of items is still affected by the interests extracted from the historical records, which is why DLUR performs better than GRU4REC. AttRec only uses rating data to extract long-term and short-term user interest features, and PRSL only uses review data to extract user interest features, whereas DLUR integrates the latent features in the ratings with the semantic features extracted from the reviews, which makes the user portrait more complete. DLUR outperforms both models on all five data sets. Compared with SSG, DLUR performs slightly lower on the Yelp_2019 and Electronics data sets. The main reason is that these two data sets have a large number of users and items and a high proportion of interactions, but a low proportion of reviews. The interactions captured by the graph in SSG are therefore more informative than the features extracted from the reviews. Moreover, the three selected metrics all measure the accuracy of the recommendation ranking, so the results indicate that DLUR not only produces an accurate recommendation list but also ranks the desired items well within it.</p>
</sec>
<sec id="s5_3">
<label>5.3</label>
<title>Ablation Experiment</title>
<p>Besides the item recommendation baselines above, we further compared the following ablated variants:
<list list-type="bullet">
<list-item>
<p>DLUR-factor: It only has the rating-view module. In other words, <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is directly used as final user representation <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>C</mml:mi></mml:math></inline-formula> to compute <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> by <xref ref-type="disp-formula" rid="eqn-19">formula (19)</xref>.</p></list-item>
<list-item>
<p>DLUR-sr: It only has the item-view module. In other words, <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:msup><mml:mi>C</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is directly used as final user representation <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>C</mml:mi></mml:math></inline-formula> to compute <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> by <xref ref-type="disp-formula" rid="eqn-19">formula (19)</xref>.</p></list-item>
<list-item>
<p>PRL: The long-term interest expression part of PRSL is PRL, which uses user-item pair interaction to extract text features.</p></list-item>
<list-item>
<p>PRS: The short-term interest representation part of PRSL is PRS, which uses GRU to extract short-term user interest features.</p></list-item>
<list-item>
<p>DLUR-lfm: This model removes the LFM used for rating from DLUR. It uses reviews and sequence features to make recommendations.</p></list-item>
<list-item>
<p>DLUR-re: This model removes the review factors from DLUR. It uses rating and sequence features to make recommendations.</p></list-item>
<list-item>
<p>DLUR-att: This model removes the attention layer from DLUR. It uses word2vec to vectorize reviews.</p></list-item>
</list></p>
<p>It can be seen from <xref ref-type="disp-formula" rid="eqn-14">formula (14)</xref> that part of the model fuses the latent vector <italic>P</italic> and the text vector <italic>E</italic> as user interest features. Among the baselines, DeepCoNN and APSE are also recommendation systems built along this technical route. The experiments in this section compare this part of the model with similar technical routes, including the traditional word2vec encoding method and PRL, to demonstrate the improvements of DLUR. The specific comparison results are shown in <xref ref-type="table" rid="table-3">Table 3</xref>.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Long-term interest feature performance comparison</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th>Dataset</th>
<th>Metric</th>
<th>word2vec</th>
<th>PRL</th>
<th>DLUR-factor</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3">Phones</td>
<td>HR</td>
<td>0.1643</td>
<td>0.1865</td>
<td>0.1937</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0205</td>
<td>0.0328</td>
<td>0.0415</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0511</td>
<td>0.0798</td>
<td>0.0931</td>
</tr>
<tr>
<td rowspan="3">Clothing</td>
<td>HR</td>
<td>0.0412</td>
<td>0.0534</td>
<td>0.0720</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0117</td>
<td>0.0154</td>
<td>0.0175</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0318</td>
<td>0.0376</td>
<td>0.0427</td>
</tr>
<tr>
<td rowspan="3">Electronics</td>
<td>HR</td>
<td>0.0634</td>
<td>0.0930</td>
<td>0.1022</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0108</td>
<td>0.0156</td>
<td>0.0194</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0421</td>
<td>0.0580</td>
<td>0.0633</td>
</tr>
<tr>
<td rowspan="3">Toys</td>
<td>HR</td>
<td>0.1143</td>
<td>0.1540</td>
<td>0.1892</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0216</td>
<td>0.0350</td>
<td>0.0417</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0475</td>
<td>0.0700</td>
<td>0.0891</td>
</tr>
<tr>
<td rowspan="3">Yelp_2019</td>
<td>HR</td>
<td>0.0112</td>
<td>0.0193</td>
<td>0.0231</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0049</td>
<td>0.0061</td>
<td>0.0084</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0076</td>
<td>0.0093</td>
<td>0.0116</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The experiment in this section only extracts features from text and ratings as user interest features for personalized recommendation, which allows the effects of different text feature extraction methods to be compared. Although traditional word2vec is the most commonly used text vectorization method in personalized recommendation, its recommendation effect is not as good as that of the semantic extraction in PRL. The recommendation method in this paper integrates text features with the latent feature vectors in the ratings and performs best, which shows that the feature vectors in the ratings are also very helpful in improving the recommendation effect.</p>
<p>The user temporal features are compared with PRS. PRS uses semantic coding in its coding part, so the temporal features it obtains also contain semantic information. DLUR, however, takes into account the scarcity of user review data and aims to be a task-independent general method, so it uses ID encoding instead. The comparison results are shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, where &#x201C;method&#x201D; denotes the DLUR-sr model.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Comparison of time sequence features</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_51046-fig-4.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, among the four data sets, Clothing and Electronics, the two data sets with relatively low review density, show a slight improvement in the comparison metrics. The other two data sets have similar review densities, and on them the performance of PRS and of the method in this article is comparable. Comparing the technical ideas of the two methods, PRS performs better when there is sufficient review data because it takes advantage of semantic features, whereas the advantage of the ID-based method in this article becomes apparent when the review data are sparse.</p>
<p>DLUR is mainly composed of two hierarchical structures, in which the attention layer is used. To show that each part contributes to the model, each part was removed in turn in the ablation experiment to test its effect. The comparison results are shown in <xref ref-type="table" rid="table-4">Table 4</xref>.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Comparison of the effects of various parts of DLUR</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th>Dataset</th>
<th>Metric</th>
<th>DLUR</th>
<th>DLUR-lfm</th>
<th>DLUR-factor</th>
<th>DLUR-re</th>
<th>DLUR-att</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3">Phones</td>
<td>HR</td>
<td>0.2253</td>
<td>0.1845</td>
<td>0.1937</td>
<td>0.2038</td>
<td>0.2143</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0492</td>
<td>0.0401</td>
<td>0.0415</td>
<td>0.0473</td>
<td>0.0489</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.1097</td>
<td>0.0884</td>
<td>0.0931</td>
<td>0.0983</td>
<td>0.1078</td>
</tr>
<tr>
<td rowspan="3">Clothing</td>
<td>HR</td>
<td>0.0821</td>
<td>0.0701</td>
<td>0.0720</td>
<td>0.0793</td>
<td>0.0803</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0221</td>
<td>0.0168</td>
<td>0.0175</td>
<td>0.0204</td>
<td>0.0214</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0613</td>
<td>0.0409</td>
<td>0.0427</td>
<td>0.0595</td>
<td>0.0606</td>
</tr>
<tr>
<td rowspan="3">Electronics</td>
<td>HR</td>
<td>0.1372</td>
<td>0.1005</td>
<td>0.1022</td>
<td>0.1294</td>
<td>0.1317</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0413</td>
<td>0.0178</td>
<td>0.0194</td>
<td>0.0402</td>
<td>0.0409</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0805</td>
<td>0.0534</td>
<td>0.0633</td>
<td>0.0791</td>
<td>0.0800</td>
</tr>
<tr>
<td rowspan="3">Toys</td>
<td>HR</td>
<td>0.2311</td>
<td>0.1734</td>
<td>0.1892</td>
<td>0.2207</td>
<td>0.2243</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0510</td>
<td>0.0392</td>
<td>0.0417</td>
<td>0.0490</td>
<td>0.0503</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.1098</td>
<td>0.0790</td>
<td>0.0891</td>
<td>0.1052</td>
<td>0.1073</td>
</tr>
<tr>
<td rowspan="3">Yelp_2019</td>
<td>HR</td>
<td>0.0354</td>
<td>0.0209</td>
<td>0.0231</td>
<td>0.0321</td>
<td>0.0341</td>
</tr>
<tr>
<td>MRR</td>
<td>0.0190</td>
<td>0.0073</td>
<td>0.0084</td>
<td>0.0175</td>
<td>0.0184</td>
</tr>
<tr>
<td>NDCG</td>
<td>0.0295</td>
<td>0.0102</td>
<td>0.0116</td>
<td>0.0278</td>
<td>0.0283</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For Yelp_2019 and Electronics, which have a small proportion of reviews, the recommendation performance of DLUR-lfm is relatively poor, and its results on the other datasets are also unsatisfactory. The main reason is that when reviews are scarce and the rating decomposition is removed, fewer data features can be mined and recommendation quality suffers. For datasets with fewer reviews, the performance of DLUR-re is less affected, whereas on datasets with more reviews its three metric values differ noticeably from those of DLUR; in addition, this variant cannot generate recommendation explanations. DLUR-att removes the attention layer. Using only word2vec in the review component cannot establish user-item semantic interaction or accurately extract the keywords that deserve high attention, so non-keywords also influence the extraction of user features and explanations cannot be generated. As <xref ref-type="table" rid="table-4">Table 4</xref> shows, although the three metric values of DLUR-att are better than those of DLUR-re, they still do not reach those of DLUR.</p>
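<p>The ablation comparison can be made more concrete by expressing each variant&#x2019;s loss relative to the full model. The following is a minimal sketch and not part of the original experiments: it copies the HR values for Phones and Yelp_2019 from Table 4, and the helper names <monospace>relative_drop</monospace> and <monospace>ablation_drops</monospace> are hypothetical names introduced here for illustration.</p>

```python
# HR values copied from Table 4 (full model plus the four ablated variants).
HR = {
    "Phones": {"DLUR": 0.2253, "DLUR-lfm": 0.1845, "DLUR-factor": 0.1937,
               "DLUR-re": 0.2038, "DLUR-att": 0.2143},
    "Yelp_2019": {"DLUR": 0.0354, "DLUR-lfm": 0.0209, "DLUR-factor": 0.0231,
                  "DLUR-re": 0.0321, "DLUR-att": 0.0341},
}


def relative_drop(full, ablated):
    """Fraction of the full model's score that the ablated variant loses."""
    return (full - ablated) / full


def ablation_drops(scores):
    """Map each ablated variant to its relative drop against the full model."""
    full = scores["DLUR"]
    return {name: relative_drop(full, value)
            for name, value in scores.items() if name != "DLUR"}


if __name__ == "__main__":
    for dataset, scores in HR.items():
        for variant, drop in ablation_drops(scores).items():
            print(f"{dataset:10s} {variant:12s} HR drop: {drop:.1%}")
```

<p>Reporting relative rather than absolute differences makes the variants comparable across datasets whose baseline HR values differ by an order of magnitude.</p>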
</sec>
<sec id="s5_4">
<label>5.4</label>
<title>Text Weight Visualization</title>
<p>The key features of DLUR are extracted from text, and text processing proceeds in two stages, from word vectorization to sentence vectorization. An interactive attention layer is added on top of the transformer so that the model focuses on the weight of each vector unit. The experiments in this section visualize the important words and sentences in the review text. To this end, we extracted a set of user-item pairs from the reviews in the dataset and visualized them. <xref ref-type="table" rid="table-5">Table 5</xref> shows all the reviews corresponding to a specific user ID and uses colors to indicate the influence of each sentence on the paragraph. Only the five most important sentences across all the reviews are highlighted.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>High-weight sentences in the review documents of a specific user</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>UserID: A2XU46XXNV19C8</th>
</tr>
</thead>
<tbody>
<tr>
<td>I keep this board on top of the hallway table so that I can quickly write notes (which will not get lost until I erase them) and it folds down neatly so it is easy to hide when company calls. I also like the size&#x2013;not too big and not too small. Quality magnets hold pretty good but I use it mainly for notes. <styled-content style="background:#FFFF00">NOTE: Children&#x2019;s alphabet letters don&#x2019;t hold very well&#x2013;they tend to slide.</styled-content></td>
</tr>
<tr>
<td>The bus arrived without the stop sign. In fact, it was packaged without the sign at all. It had clearly broken off but had been shipped to me anyway. Now, being it is going to a 4-yr-old boy, I never anticipated the stop sign to last long but it would have been nice to present it to my son with the sign still in tact. Not worth returning it for a replacement. Hopefully the rest of the bus will last a little bit longer.</td>
</tr>
<tr>
<td>This toy is truly one of those toys that will be sought after in 30 years when parents are looking for quality toys for their grandchildren&#x2013;SO BUY IT NOW! This is a 5 star toy&#x2013;no doubts about it. This toy is the ideal cause &#x0026; effect toy. Babies and toddlers love the fun music, they are awed by the speed at which the balls fly out from the tube and they will eventually learn how to press the large button by themselves, and they are absolutely intrigued by the unpredictability of how the balls fly up and roll down the tube. FASCINATING!!! The balls do fly all over the place so just know that, but that is all part of the fun. Yes, you will definitely be looking for balls under your coffee tables, etc., but so what? This is such a great toy. <styled-content style="background:#FFFF00">THIS IS ALSO A GREAT TOY FOR CHILDREN WITH AUTISM.</styled-content></td>
</tr>
<tr>
<td>This is really a simple, easy to use, and fun little toy. I&#x2019;d call it a classic toy.</td>
</tr>
<tr>
<td>In my humble opinion, Vtech is &#x201C;the BEST&#x201D; educational toy company out there. We have never been let down by the quality, thoughtfulness and talent that goes into making Vtech toys. <styled-content style="background:#FF0000">Both my children, one autistic (age 4) and the other not (age 2) are learning and having fun every time they turn on any of their creative learning devices.</styled-content> Another great company for electronic learning toys is LeapFrog&#x2013;also very nice products.</td>
</tr>
<tr>
<td><styled-content style="background:#FFFF00">Bought this for my 4-yr-old with Autism to help him with motion and movement.</styled-content> So far he is only interested in the sounds and is a bit timid to actually sit on it. However, I think as he gets older he will enjoy it more. Honestly though, wished I had checked Craig&#x2019;s list first because this thing is extremely sturdy and will most likely hold up in good to excellent condition by the time your child has out grown it. Also, it would have already been put together for you by someone else! Lovely toy though&#x2013;a true classic.</td>
</tr>
<tr>
<td>Parents know that finding learning toys ideal for autism AND neurotypical siblings is a challenge. <styled-content style="background:#F28C2E">I have a 4-yr-old (autistic) and a 2-yr-old (neurotypical) and THEY BOTH LOVE THIS LEARNING TOY</styled-content>!!!! You could buy 2 if you don&#x2019;t want arguments but I bought one so they BOTH learn how to share and take turns. Wonderful job to the LeapFrog game creators, and thanks.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The sentences in <xref ref-type="table" rid="table-5">Table 5</xref> are shaded in three colors, red, orange, and yellow, indicating sentence weight from high to low. The high-weight sentences show that the user is most concerned about whether a toy is suitable for their two children. One child has autism, and the user repeatedly emphasizes the suitability of the toy for that child. The user is also concerned about the functions of the toys. DLUR thus refines text semantics from the perspective of user-item pairs and can interactively extract the points that the user cares about most, which constitute a representative user characteristic.</p>
<p>In <xref ref-type="table" rid="table-6">Table 6</xref>, we show the top 25% of words by weight. The shaded words show that words identifying a person&#x2019;s characteristics, such as &#x201C;autistic&#x201D; and &#x201C;4-yr-old&#x201D;, have high weights. Some nouns, such as &#x201C;motion&#x201D; and &#x201C;learning&#x201D;, also have relatively high weights. These words likewise reveal what the user focuses on when choosing toys.</p>
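<p>The ranking and shading of sentences can be illustrated with a short sketch. This is not the authors&#x2019; implementation: it assumes that each sentence receives a raw attention score, that the scores are normalized with a softmax, and that colors are assigned by rank (first red, second orange, the rest yellow). The scores, sentence labels, and function names are hypothetical; only the three color codes are taken from Table 5.</p>

```python
import math

# Background colors taken from the styled-content attributes in Table 5.
RED, ORANGE, YELLOW = "#FF0000", "#F28C2E", "#FFFF00"


def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_sentences(sentences, scores, k=5):
    """Return the k highest-weight (sentence, weight) pairs, best first."""
    weights = softmax(scores)
    ranked = sorted(zip(sentences, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def color_for_rank(rank):
    """Assumed mapping: rank 0 is red, rank 1 is orange, all others yellow."""
    return {0: RED, 1: ORANGE}.get(rank, YELLOW)


if __name__ == "__main__":
    # Hypothetical sentence labels and raw scores, for illustration only.
    sentences = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
    scores = [0.2, 1.9, -0.4, 1.1, 0.7, 2.5, 0.9]
    for rank, (sentence, weight) in enumerate(top_k_sentences(sentences, scores)):
        print(color_for_rank(rank), f"{weight:.3f}", sentence)
```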
<table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Highlighted words in the top 3 sentences according to the attention weights in the user review documents</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>UserID: A2XU46XXNV19C8</th>
</tr>
</thead>
<tbody>
<tr>
<td>Both my children, one <styled-content style="background:#FF0000">autistic</styled-content> (<styled-content style="background:#F28C2E">age 4</styled-content>) and the other not (<styled-content style="background:#F28C2E">age 2</styled-content>) are <styled-content style="background:#FF0000">learning</styled-content> and having fun every time they turn on any of their creative <styled-content style="background:#FF0000">learning</styled-content> <styled-content style="background:#FFFF00">devices</styled-content>.</td>
</tr>
<tr>
<td>I have a <styled-content style="background:#F28C2E">4-yr-old</styled-content> (<styled-content style="background:#FF0000">autistic</styled-content>) and a <styled-content style="background:#F28C2E">2-yr-old</styled-content> (<styled-content style="background:#FF0000">neurotypical</styled-content>) and THEY BOTH LOVE THIS <styled-content style="background:#FF0000">LEARNING</styled-content> TOY!</td>
</tr>
<tr>
<td>Bought this for my <styled-content style="background:#F28C2E">4-yr-old</styled-content> with <styled-content style="background:#FF0000">Autism</styled-content> to help him with <styled-content style="background:#FFFF00">motion</styled-content> and <styled-content style="background:#FFFF00">movement</styled-content>.</td>
</tr>
</tbody>
</table>
</table-wrap>
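<p>Selecting the top 25% of words by weight, as in Table 6, can be sketched as follows. The per-word weights below are hypothetical values chosen for illustration, and <monospace>top_fraction</monospace> is a helper name introduced here; the sentence itself is the third one in Table 6. How DLUR computes the word weights is not reproduced in this sketch.</p>

```python
import math


def top_fraction(words, weights, fraction=0.25):
    """Return the words whose weights fall in the top `fraction`, in text order."""
    if len(words) != len(weights):
        raise ValueError("words and weights must have the same length")
    keep = math.ceil(len(words) * fraction)
    # Indices of the `keep` largest weights; ties are broken by position.
    best = sorted(range(len(words)), key=lambda i: (-weights[i], i))[:keep]
    return [words[i] for i in sorted(best)]


if __name__ == "__main__":
    words = ("Bought this for my 4-yr-old with Autism to help him "
             "with motion and movement").split()
    # Hypothetical attention weights, one per word (illustration only).
    weights = [0.02, 0.01, 0.01, 0.02, 0.18, 0.01, 0.30,
               0.01, 0.03, 0.02, 0.01, 0.15, 0.01, 0.14]
    print(top_fraction(words, weights))
```

<p>Rounding the cutoff up means that at least one word is kept for any non-empty sentence; a different rounding rule would be an equally reasonable choice.</p>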
</sec>
<sec id="s5_5">
<label>5.5</label>
<title>Sentence-Level Explanation</title>
<p>In the explanation generation stage, recommended items and their explanations are sent to users together. DLUR focuses on generating user portraits. After DLUR is applied to the recommendation system, the generated recommendation explanations are compared with the reviews in the original test set, as shown in <xref ref-type="table" rid="table-7">Table 7</xref>.</p>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Comparison between generated recommendation explanations and real reviews</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Pair<sub>real</sub> (user<sub>1</sub>, item<sub>1</sub>, 5.0)</th>
<th>Pair<sub>prediction</sub> (user<sub>1</sub>, item<sub>1</sub>, 4.876)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Review</td>
<td>Explanation</td>
</tr>
<tr>
<td>Bought this for my 4-yr-old with Autism to help him with motion and movement. So far he is only interested in the sounds and is a bit timid to actually sit on it. However, I think as he gets older he will enjoy it more.<break/>Honestly though, wished I had checked Craig&#x2019;s list first because this thing is extremely sturdy and will most likely hold up in good to excellent condition by the time your child has out grown it. Also, it would have already been put together for you by someone else! Lovely toy though-a true classic</td>
<td>We still have this horsie, still love it, and it is in great condition after 4 years.<break/>Very durable and handles well on a vigorous bouncy ride.<break/> The sound is great, not intrusive, and it really wears out the little ones with a high energy level!</td>
</tr>
<tr>
<td>Pair<sub>real</sub> (user<sub>2</sub>, item<sub>1</sub>, 3.0)</td>
<td>Pair<sub>prediction</sub> (user<sub>2</sub>, item<sub>1</sub>, 2.79)</td>
</tr>
<tr>
<td>Review</td>
<td>Explanation</td>
</tr>
<tr>
<td>I did not realize there had been a change in this horse until just now as I was going to review the one we have. I must admit the other one sounds a lot better in terms of features and one major important factor: safety. I am giving this horse 3 stars for that reason.<break/>The old one with stabilizers, yes, this one, no, which is a real shame because this toy would be perfect but for that important missing part.</td>
<td>The only thing I would change would be that the metal support rails had some kind of padding. Makes me nervous but not them.<break/>I would like to warn all of you reading these glowing reviews that most of the reviewers have the older version with many nice features.<break/>I hope buyers can track down the version we have!</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In <xref ref-type="table" rid="table-7">Table 7</xref>, we selected evaluations of the same item by different users as the real evaluations. After DLUR is applied to the recommendation system, it generates predicted ratings and the basis for each recommendation. The predicted ratings differ little from the actual values (4.876 vs. 5.0 and 2.79 vs. 3.0). The two samples were chosen to examine whether the explanations generated for a high-rated and a low-rated evaluation of the same item support the user&#x2019;s intention when choosing toys. In <xref ref-type="table" rid="table-7">Table 7</xref>, the high-rated evaluation focuses on the sound, sturdiness, and movement of the toy horse, and the generated explanation covers these aspects. The low-rated evaluation concerns the product version and safety, and the generated explanation likewise points out the version inconsistency. Overall, the explanation mechanism in DLUR meets user needs.</p>
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusions</title>
<p>In this paper, we focus on how users are represented and on mining hidden user features from existing user reviews and ratings. To mine latent features from text, an interaction layer is added to the transformer-based model so that the user&#x2019;s text representation also covers the characteristic information of the selected item. This highlights the weight of keywords in sentences or paragraphs and makes the expression of user characteristics more focused. On this basis, adding latent features obtained by decomposing ratings alleviates the lack of representation caused by sparse review data. Historical reviews and ratings alone reflect only the general characteristics of users&#x2019; choices and cannot represent how users&#x2019; interests change over time. Therefore, this paper proposes a method for mining time-series features and takes the fusion of user latent features and temporal features as the user representation. The experimental results indicate that this representation improves recommendation performance.</p>
<p>The source data for user representation are also diverse, including video, audio, and images. Future research will expand the data sources of user representation to mine user characteristics from these modalities. Currently, many excellent models can build user representations from a single data source, but the fusion methods are relatively simple and cannot guarantee both the commonality and the diversity of user interests. Future work will therefore focus on fusion methods.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank the reviewers for their helpful suggestions which have considerably improved the quality of the manuscript.</p>
</ack>
<sec><title>Funding Statement</title>
<p>This research is supported by the Applied Research Center of Artificial Intelligence, Wuhan College (Grant Number X2020113) and the Wuhan College Research Project (Grant Number KYZ202009).</p>
</sec>
<sec><title>Author Contributions</title>
<p>The authors confirm their contribution to the paper as follows: study conception and design: Fuxi Zhu and Jin Xie; data collection: Mohammed Alshahrani; analysis and interpretation of results: Fuxi Zhu, Jin Xie and Mohammed Alshahrani; draft manuscript preparation: Fuxi Zhu and Jin Xie. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The data supporting the results of this study are public datasets that can be directly searched on the Internet.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Ni</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Perceive your users in depth: Learning universal user representations from multiple e-commerce tasks</article-title>,&#x201D; in <conf-name>Proc. 24th ACM SIGKDD Conf. Knowl. Discovery Data Mining</conf-name>, <publisher-loc>London, UK</publisher-loc>, <year>2018</year>, pp. <fpage>596</fpage>&#x2013;<lpage>605</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3219819.3219828</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Yuan</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Yang</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>StackRec: Efficient training of very deep sequential recommender models by iterative stacking</article-title>,&#x201D; in <conf-name>Proc. 44th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval</conf-name>, <publisher-loc>Canada</publisher-loc>, <year>2021</year>, pp. <fpage>357</fpage>&#x2013;<lpage>366</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3404835.3462890</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Yuan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Karatzoglou</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Arapakis</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Jose</surname></string-name></person-group>, &#x201C;<article-title>A simple convolutional generative network for next item recommendation</article-title>,&#x201D; in <conf-name>Proc. Twelfth ACM Int. Conf. Web Search Data Mining</conf-name>, <publisher-loc>Melbourne, VIC, Australia</publisher-loc>, <year>2019</year>, pp. <fpage>582</fpage>&#x2013;<lpage>590</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3289600.3290975</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Zhou</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>S3-rec: Self-supervised learning for sequential recommendation with mutual information maximization</article-title>,&#x201D; in <conf-name>Proc. 29th ACM Int. Conf. Inf. Knowl. Manage.</conf-name>, <publisher-loc>Ireland</publisher-loc>, <year>2020</year>, pp. <fpage>1893</fpage>&#x2013;<lpage>1902</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3340531.3411954</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Jannach</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Ludewig</surname></string-name></person-group>, &#x201C;<article-title>When recurrent neural networks meet the neighborhood for session-based recommendation</article-title>,&#x201D; in <conf-name>Proc. Eleventh ACM Conf. Recommender Syst.</conf-name>, <publisher-loc>New York, NY, USA</publisher-loc>, <year>2017</year>, pp. <fpage>306</fpage>&#x2013;<lpage>310</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3109859.3109872</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Hansen</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Contextual and sequential user embeddings for large-scale music recommendation</article-title>,&#x201D; in <conf-name>Proc. 14th ACM Conf. Recommender Syst.</conf-name>, <publisher-loc>Brazil</publisher-loc>, <year>2020</year>, pp. <fpage>53</fpage>&#x2013;<lpage>62</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3383313.3412248</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Unger</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Tuzhilin</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Livne</surname></string-name></person-group>, &#x201C;<article-title>Context-aware recommendations based on deep learning frameworks</article-title>,&#x201D; <source>ACM Trans. Manag. Inf. Syst. (TMIS)</source>, vol. <volume>11</volume>, no. <issue>2</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1145/3386243</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Zhu</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Yao</surname></string-name></person-group>, &#x201C;<article-title>Recommendation rating prediction based on attribute boosting with partial sampling</article-title>,&#x201D; <source>Chin. J. Comput.</source>, vol. <volume>39</volume>, no. <issue>8</issue>, pp. <fpage>1501</fpage>&#x2013;<lpage>1514</lpage>, <year>2016</year>. doi: <pub-id pub-id-type="doi">10.11897/SP.J.1016.2016.01501</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Gong</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Research on collaborative filtering recommendation algorithm for improving user similarity calculation</article-title>,&#x201D; in <conf-name>Proc. 2021 1st Int. Conf. Control Intell. Robot.</conf-name>, <publisher-loc>Guangzhou, China</publisher-loc>, <year>2021</year>, pp. <fpage>331</fpage>&#x2013;<lpage>336</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3473714.3473772</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Research on recommendation algorithm based on collaborative filtering</article-title>,&#x201D; in <conf-name>2021 2nd Int. Conf. Artif. Intell. Inf. Syst.</conf-name>, <publisher-loc>Chongqing, China</publisher-loc>, <year>2021</year>, pp. <fpage>1</fpage>&#x2013;<lpage>4</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3469213.3470399</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>G.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Research on collaborative filtering recommendation algorithm based on sentiment analysis and topic model</article-title>,&#x201D; in <conf-name>Proc. 4th Int. Conf. Big Data Computing</conf-name>, <publisher-loc>Guangzhou, China</publisher-loc>, <year>2019</year>, pp. <fpage>169</fpage>&#x2013;<lpage>178</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3335484.3335536</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Koren</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Bell</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Volinsky</surname></string-name></person-group>, &#x201C;<article-title>Matrix factorization techniques for recommender systems</article-title>,&#x201D; <source>Computer</source>, vol. <volume>42</volume>, no. <issue>8</issue>, pp. <fpage>30</fpage>&#x2013;<lpage>37</lpage>, <year>2009</year>. doi: <pub-id pub-id-type="doi">10.1109/MC.2009.263</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Mao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>H. U.</given-names> <surname>Rong</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Tang</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Shi</surname></string-name></person-group>, &#x201C;<article-title>Sigmoid function-based web service collaborative filtering recommendation algorithm</article-title>,&#x201D; <source>J. Front. Comput. Sci. Technol.</source>, vol. <volume>11</volume>, no. <issue>2</issue>, pp. <fpage>314</fpage>&#x2013;<lpage>322</lpage>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.3778/j.issn.1673-9418.1511072</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Poirson</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Da Cunha</surname></string-name></person-group>, &#x201C;<article-title>A recommender approach based on customer emotions</article-title>,&#x201D; <source>Expert. Syst. Appl.</source>, vol. <volume>122</volume>, no. <issue>1</issue>, pp. <fpage>281</fpage>&#x2013;<lpage>288</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1016/j.eswa.2018.12.035</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ajoudanian</surname></string-name> and <string-name><given-names>M. N.</given-names> <surname>Abadeh</surname></string-name></person-group>, &#x201C;<article-title>Recommending human resources to project leaders using a collaborative filtering-based recommender system: Case study of GitHub</article-title>,&#x201D; <source>IET Softw.</source>, vol. <volume>13</volume>, no. <issue>5</issue>, pp. <fpage>379</fpage>&#x2013;<lpage>385</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1049/iet-sen.2018.5261</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W. C.</given-names> <surname>Kang</surname></string-name> and <string-name><given-names>J.</given-names> <surname>McAuley</surname></string-name></person-group>, &#x201C;<article-title>Self-attentive sequential recommendation</article-title>,&#x201D; in <conf-name>IEEE Int. Conf. Data Min. (ICDM)</conf-name>, <publisher-loc>Singapore</publisher-loc>, <year>2018</year>, pp. <fpage>197</fpage>&#x2013;<lpage>206</lpage>. doi: <pub-id pub-id-type="doi">10.1109/ICDM.2018.00035</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Zhou</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Deep interest evolution network for click-through rate prediction</article-title>,&#x201D; in <conf-name>Proc. AAAI Conf. on Artif. Intell.</conf-name>, <publisher-loc>Hawaii, USA</publisher-loc>, <year>2019</year>, pp. <fpage>5941</fpage>&#x2013;<lpage>5948</lpage>. doi: <pub-id pub-id-type="doi">10.1609/aaai.v33i01.33015941</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Huang</surname></string-name>, and <string-name><given-names>W.</given-names> <surname>Qu</surname></string-name></person-group>, &#x201C;<article-title>Behavior sequence transformer for e-commerce recommendation in Alibaba</article-title>,&#x201D; in <conf-name>Proc. 1st Int. Workshop on Deep Learn. Pract. High-Dimens. Sparse Data</conf-name>, <publisher-loc>Anchorage, AK, USA</publisher-loc>, <year>2019</year>, pp. <fpage>1</fpage>&#x2013;<lpage>4</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3326937.3341261</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P. S.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>He</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Deng</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Acero</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>Heck</surname></string-name></person-group>, &#x201C;<article-title>Learning deep structured semantic models for web search using clickthrough data</article-title>,&#x201D; in <conf-name>Proc. 22nd ACM Int. Conf. on Inf. Knowl. Manag.</conf-name>, <publisher-loc>San Francisco, CA, USA</publisher-loc>, <year>2013</year>, pp. <fpage>2333</fpage>&#x2013;<lpage>2338</lpage>. doi: <pub-id pub-id-type="doi">10.1145/2505515.2505665</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Ye</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>He</surname></string-name></person-group>, &#x201C;<article-title>DeepFM: A factorization-machine based neural network for CTR prediction</article-title>,&#x201D; in <conf-name>Int. Joint Conf. Artif. Intell.</conf-name>, <publisher-loc>Melbourne, Australia</publisher-loc>, <year>2017</year>, pp. <fpage>1725</fpage>&#x2013;<lpage>1731</lpage>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1703.04247</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>He</surname></string-name> and <string-name><given-names>T. S.</given-names> <surname>Chua</surname></string-name></person-group>, &#x201C;<article-title>Neural factorization machines for sparse predictive analytics</article-title>,&#x201D; in <conf-name>Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval</conf-name>, <publisher-loc>Shinjuku, Tokyo, Japan</publisher-loc>, <year>2017</year>, pp. <fpage>355</fpage>&#x2013;<lpage>364</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3077136.3080777</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Le</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Mikolov</surname></string-name></person-group>, &#x201C;<article-title>Distributed representations of sentences and documents</article-title>,&#x201D; in <conf-name>Proc. Int. Conf. Mach. Learn.</conf-name>, <publisher-loc>Beijing, China</publisher-loc>, <year>2014</year>, pp. <fpage>1188</fpage>&#x2013;<lpage>1196</lpage>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1405.4053</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Hill</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Cho</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Korhonen</surname></string-name></person-group>, &#x201C;<article-title>Learning distributed representations of sentences from unlabeled data</article-title>,&#x201D; in <conf-name>Proc. 2016 Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol.</conf-name>, <publisher-loc>San Diego, CA, USA</publisher-loc>, <year>2016</year>, pp. <fpage>12</fpage>&#x2013;<lpage>17</lpage>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1602.03483</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Arora</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liang</surname></string-name>, and <string-name><given-names>T.</given-names> <surname>Ma</surname></string-name></person-group>, &#x201C;<article-title>A simple but tough-to-beat baseline for sentence embeddings</article-title>,&#x201D; in <conf-name>Int. Conf. Learn. Representations</conf-name>, <publisher-loc>Puerto Rico, USA</publisher-loc>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Bahdanau</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Cho</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Bengio</surname></string-name></person-group>, &#x201C;<article-title>Neural machine translation by jointly learning to align and translate</article-title>,&#x201D; in <conf-name>Int. Conf. Learn. Representations</conf-name>, <publisher-loc>San Diego, CA, USA</publisher-loc>, <year>2015</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1409.0473</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Shazeer</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Parmar</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Uszkoreit</surname></string-name></person-group>, &#x201C;<article-title>Attention is all you need</article-title>,&#x201D; in <conf-name>Neural Inf. Process. Syst.</conf-name>, <publisher-loc>Long Beach, CA, USA</publisher-loc>, <year>2017</year>, pp. <fpage>5998</fpage>&#x2013;<lpage>6008</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Devlin</surname></string-name>, <string-name><given-names>M. W.</given-names> <surname>Chang</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Lee</surname></string-name>, and <string-name><given-names>K.</given-names> <surname>Toutanova</surname></string-name></person-group>, &#x201C;<article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>,&#x201D; <comment>arXiv preprint arXiv:1810.04805</comment>, <year>2018</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1810.04805</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N. Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y. X.</given-names> <surname>Ye</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>L. Z.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Bao</surname></string-name>, and <string-name><given-names>T.</given-names> <surname>Peng</surname></string-name></person-group>, &#x201C;<article-title>Language models based on deep learning: A review</article-title>,&#x201D; <source>J. Softw.</source>, vol. <volume>32</volume>, no. <issue>4</issue>, pp. <fpage>1082</fpage>&#x2013;<lpage>1115</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Lee</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Yang</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Park</surname></string-name></person-group>, &#x201C;<article-title>Task relation-aware continual user representation learning</article-title>,&#x201D; in <conf-name>Proc. 29th ACM SIGKDD Conf. Knowl. Discovery Data Mining</conf-name>, <publisher-loc>Long Beach, CA, USA</publisher-loc>, <year>2023</year>, pp. <fpage>1107</fpage>&#x2013;<lpage>1119</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3580305.3599516</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Xue</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zhai</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Xiao</surname></string-name></person-group>, &#x201C;<article-title>Learning dual-view user representations for enhanced sequential recommendation</article-title>,&#x201D; <source>ACM Trans. Inf. Syst.</source>, vol. <volume>41</volume>, no. <issue>4</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>26</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Kang</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Guan</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>We are not so similar: Alleviating user representation collapse in social recommendation</article-title>,&#x201D; in <conf-name>Proc. 2023 ACM Int. Conf. Multimedia Retrieval</conf-name>, <publisher-loc>Thessaloniki, Greece</publisher-loc>, <year>2023</year>, pp. <fpage>378</fpage>&#x2013;<lpage>387</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3591106.3592244</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Li</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>RecGURU: Adversarial learning of generalized user representations for cross-domain recommendation</article-title>,&#x201D; in <conf-name>WSDM '22: Proc. Fifteenth ACM Int. Conf. Web Search and Data Mining</conf-name>, <year>2022</year>, pp. <fpage>571</fpage>&#x2013;<lpage>581</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3488560.3498388</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Gong</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Song</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>JNET: Learning user representations via joint network embedding and topic embedding</article-title>,&#x201D; in <conf-name>WSDM '20: Proc. Thirteenth ACM Int. Conf. Web Search and Data Mining</conf-name>, <publisher-loc>Houston, TX, USA</publisher-loc>, <year>2020</year>, pp. <fpage>205</fpage>&#x2013;<lpage>213</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3336191.3371770</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Tao</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Feng</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Shao</surname></string-name></person-group>, &#x201C;<article-title>Learning dynamic user behavior based on error-driven event representation</article-title>,&#x201D; in <conf-name>WWW '21: Proc. Web Conf. 2021</conf-name>, <publisher-loc>Ljubljana, Slovenia</publisher-loc>, <year>2021</year>, pp. <fpage>2457</fpage>&#x2013;<lpage>2465</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3442381.3450012</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Pagliardini</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Gupta</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Jaggi</surname></string-name></person-group>, &#x201C;<article-title>Unsupervised learning of sentence embeddings using compositional n-gram features</article-title>,&#x201D; <comment>arXiv preprint arXiv:1703.02507</comment>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1703.02507</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. J.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Dong</surname></string-name>, and <string-name><given-names>X. W.</given-names> <surname>Meng</surname></string-name></person-group>, &#x201C;<article-title>Research on personalized advertising recommendation systems and their applications</article-title>,&#x201D; (in Chinese), <source>Chin. J. Comput.</source>, vol. <volume>44</volume>, no. <issue>3</issue>, pp. <fpage>531</fpage>&#x2013;<lpage>563</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.11897/SP.J.1016.2021.00531</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J. W.</given-names> <surname>Ahn</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Brusilovsky</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Grady</surname></string-name>, <string-name><given-names>D.</given-names> <surname>He</surname></string-name>, and <string-name><given-names>S. Y.</given-names> <surname>Syn</surname></string-name></person-group>, &#x201C;<article-title>Open user profiles for adaptive news systems: Help or harm?</article-title>,&#x201D; in <conf-name>Proc. 16th Int. Conf. World Wide Web</conf-name>, <publisher-loc>Banff, Alberta, Canada</publisher-loc>, <year>2007</year>, pp. <fpage>11</fpage>&#x2013;<lpage>20</lpage>. doi: <pub-id pub-id-type="doi">10.1145/1242572.1242575</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Noroozi</surname></string-name>, and <string-name><given-names>P. S.</given-names> <surname>Yu</surname></string-name></person-group>, &#x201C;<article-title>Joint deep modeling of users and items using reviews for recommendation</article-title>,&#x201D; in <conf-name>WSDM '17: Proc. Tenth ACM Int. Conf. Web Search and Data Mining</conf-name>, <publisher-loc>Cambridge, UK</publisher-loc>, <year>2017</year>, pp. <fpage>425</fpage>&#x2013;<lpage>434</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3018661.3018665</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>F. X.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>X. F.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Huang</surname></string-name>, and <string-name><given-names>S. C.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Attentive preference personalized recommendation with sentence-level explanations</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>426</volume>, no. <issue>2</issue>, pp. <fpage>235</fpage>&#x2013;<lpage>247</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1016/j.neucom.2020.10.041</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Hidasi</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Karatzoglou</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Baltrunas</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Tikk</surname></string-name></person-group>, &#x201C;<article-title>Session-based recommendations with recurrent neural networks</article-title>,&#x201D; in <conf-name>Int. Conf. Learn. Representations</conf-name>, <year>2015</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1511.06939</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tay</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sun</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>An</surname></string-name></person-group>, &#x201C;<article-title>Next item recommendation with self-attentive metric learning</article-title>,&#x201D; in <conf-name>Thirty-Third AAAI Conf. Artif. Intell.</conf-name>, <publisher-loc>Hawaii, USA</publisher-loc>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Gao</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Set-sequence-graph: A multi-view approach towards exploiting reviews for recommendation</article-title>,&#x201D; in <conf-name>CIKM '20: Proc. 29th ACM Int. Conf. Inf. Knowl. Manage.</conf-name>, <publisher-loc>Ireland</publisher-loc>, <year>2020</year>, pp. <fpage>395</fpage>&#x2013;<lpage>404</lpage>. doi: <pub-id pub-id-type="doi">10.1145/3340531.3411939</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>