<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">66754</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.066754</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Utility-Driven Edge Caching Optimization with Deep Reinforcement Learning under Uncertain Content Popularity</article-title>
<alt-title alt-title-type="left-running-head">Utility-Driven Edge Caching Optimization with Deep Reinforcement Learning under Uncertain Content Popularity</alt-title>
<alt-title alt-title-type="right-running-head">Utility-Driven Edge Caching Optimization with Deep Reinforcement Learning under Uncertain Content Popularity</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Kwon</surname><given-names>Mingoo</given-names></name></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Kim</surname><given-names>Kyeongmin</given-names></name></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Song</surname><given-names>Minseok</given-names></name><email>mssong@inha.ac.kr</email></contrib>
<aff id="aff-1"><institution>Department of Computer Engineering, Inha University</institution>, <addr-line>Incheon, 22212</addr-line>, <country>Republic of Korea</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Minseok Song. Email: <email>mssong@inha.ac.kr</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>29</day><month>08</month><year>2025</year>
</pub-date>
<volume>85</volume>
<issue>1</issue>
<fpage>519</fpage>
<lpage>537</lpage>
<history>
<date date-type="received">
<day>16</day>
<month>4</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>7</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_66754.pdf"></self-uri>
<abstract>
<p>Efficient edge caching is essential for maximizing utility in video streaming systems, especially under constraints such as limited storage capacity and dynamically fluctuating content popularity. Utility, defined as the benefit obtained per unit of cache bandwidth usage, degrades when static or greedy caching strategies fail to adapt to changing demand patterns. To address this, we propose a deep reinforcement learning (DRL)-based caching framework built upon the proximal policy optimization (PPO) algorithm. Our approach formulates edge caching as a sequential decision-making problem and introduces a reward model that balances cache hit performance and utility by prioritizing high-demand, high-quality content while penalizing degraded quality delivery. We construct a realistic synthetic dataset that captures both temporal variations and shifting content popularity to validate our model. Experimental results demonstrate that our proposed method improves utility by up to 135.9% and achieves an average improvement of 22.6% compared to traditional greedy algorithms and long short-term memory (LSTM)-based prediction models. Moreover, our method consistently performs well across a variety of utility functions, workload distributions, and storage limitations, underscoring its adaptability and robustness in dynamic video caching environments.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Edge caching</kwd>
<kwd>video-on-demand</kwd>
<kwd>reinforcement learning</kwd>
<kwd>utility optimization</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Inha University Research Grant</funding-source>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<sec id="s1_1">
<label>1.1</label>
<title>Background</title>
<p>The explosive growth in video streaming demand has highlighted the critical need for efficient data delivery in modern networks. Video content represents a significant portion of Internet traffic, consuming substantial bandwidth and often leading to network congestion. Edge caching has emerged as a key solution to these challenges, enabling data to be stored closer to users [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>]. By reducing reliance on core network resources, edge caching minimizes data transmission, reduces latency, and alleviates network bottlenecks. These benefits translate into faster content delivery, improved quality of experience (QoE), and significant reductions in bandwidth usage, making edge caching an essential component of video streaming systems [<xref ref-type="bibr" rid="ref-3">3</xref>].</p>
<p>The rise of short-form video platforms such as Instagram Reels, TikTok, and YouTube Shorts has further amplified the demand for low-latency, high-quality video delivery. These applications rely heavily on real-time responsiveness and personalization, requiring efficient caching strategies that can adapt to fast-changing content popularity. As video continues to dominate everyday mobile usage, intelligent edge caching becomes increasingly vital to sustain user satisfaction and reduce strain on network infrastructure [<xref ref-type="bibr" rid="ref-4">4</xref>].</p>
<p>Edge caching faces significant constraints due to limited storage capacity. These constraints make strategic caching decisions critical, as maximizing the utility of edge resources is essential for balancing user satisfaction and operational efficiency. Utility, in this context, is defined as the overall benefit derived by edge service providers, considering cache bandwidth usage, video quality, and user satisfaction [<xref ref-type="bibr" rid="ref-5">5</xref>]. Importantly, utility is not solely determined by cache hit ratios; it also depends on the ability to deliver high-quality video content.</p>
<p>When the requested bitrate version is unavailable, lower bitrate versions can be provided to maintain service continuity. However, this often results in degraded QoE and reduced utility. To address these challenges, caching strategies must go beyond maximizing hit rates and instead prioritize storing bitrate versions that align with user demand and quality expectations. Effective policies must account for fallback scenarios, where delivering lower-quality alternatives risks degrading QoE and reducing overall utility.</p>
</sec>
<sec id="s1_2">
<label>1.2</label>
<title>Motivation and Contributions</title>
<p>Traditional caching methods often model the caching problem as a knapsack problem, using greedy algorithms to prioritize content with high hit ratios [<xref ref-type="bibr" rid="ref-6">6</xref>]. These algorithms are computationally efficient and work well under static content popularity assumptions. However, they struggle to adapt to the dynamic and uncertain nature of real-world video popularity, where user preferences and content demands fluctuate over time [<xref ref-type="bibr" rid="ref-7">7</xref>]. This lack of adaptability leads to suboptimal caching decisions under variable workloads and shifting content popularity, and it motivates adaptive caching strategies that can maximize utility under such uncertainty.</p>
<p>In addition, modern VoD systems require caching strategies that go beyond simple content presence and consider which bitrate versions to store. Since delivery cost and user satisfaction vary with quality, caching must balance storage and QoE through version-aware decisions.</p>
<p>To handle the dynamic and uncertain nature of user demand, deep reinforcement learning (DRL) offers a data-driven way to learn adaptive caching policies. By avoiding explicit popularity modeling, DRL is well-suited for utility-driven caching in environments with fluctuating content trends and multi-version video delivery.</p>
<p>The contributions of this study are as follows:
<list list-type="bullet">
<list-item>
<p>This study introduces a DRL-based caching algorithm for edge servers, designed to adapt dynamically to changes in video popularity and workload uncertainty. By leveraging DRL, the proposed method addresses the limitations of conventional caching approaches, including those relying on static rules, heuristic policies, and predictive models, thereby providing more robust performance under dynamic content popularity.</p></list-item>
<list-item>
<p>This approach integrates a reward model that combines video quality and utility optimization. The model prioritizes high-quality content delivery while ensuring efficient use of cache resources, balancing user satisfaction and system performance.</p></list-item>
<list-item>
<p>The proposed method demonstrates superior performance compared to traditional greedy algorithms. It achieves greater adaptability to uncertain and variable content popularity, higher utility for edge service providers, and more efficient bandwidth and storage utilization, validating its effectiveness under dynamic conditions.</p></list-item>
</list></p>
<p>The rest of this paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> presents the related work. <xref ref-type="sec" rid="s3">Section 3</xref> introduces the system model, and <xref ref-type="sec" rid="s4">Section 4</xref> describes how the synthetic dataset is generated. <xref ref-type="sec" rid="s5">Section 5</xref> formulates the problem. <xref ref-type="sec" rid="s6">Section 6</xref> describes the proposed algorithms in detail. <xref ref-type="sec" rid="s7">Section 7</xref> presents the experimental results. <xref ref-type="sec" rid="s8">Section 8</xref> discusses the key findings and implications of this work. Finally, the paper concludes in <xref ref-type="sec" rid="s9">Section 9</xref>.</p>
</sec>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Several studies have employed DRL to enhance edge caching efficiency. Cui et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] introduced a real-time caching method that integrates DRL with Q-learning techniques to address cooperative edge caching. Their approach improves energy efficiency, adapts caching policies in real time, and minimizes replacement and interruption costs. Similarly, Sun et al. [<xref ref-type="bibr" rid="ref-9">9</xref>] proposed a decentralized framework combining recommendation systems with edge caching to optimize resource utilization via direct and soft cache hits. Leveraging multi-agent Markov decision processes and the soft actor-critic (SAC) algorithm, their method enables edge servers to independently learn and implement optimal caching policies. Wu et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] proposed a DRL-based content update strategy that adapts in real time to dynamic content popularity. Zhong et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] presented a multi-agent DRL framework to coordinate edge caching decisions in wireless networks, showing performance improvements without prior knowledge of popularity distributions. These methods, however, largely focus on cache hit rates or cooperation, without explicitly addressing multi-version video placement or quality-aware delivery. Kwon and Song [<xref ref-type="bibr" rid="ref-12">12</xref>] proposed a DRL-based approach to enhance the cache hit rate by adapting to dynamic file request patterns. 
Unlike their work, which considered single-version content in general file caching, our approach targets utility-driven caching in video-on-demand systems, supporting multi-version video placement and quality-aware delivery under dynamic popularity.</p>
<p>Research has also focused on video quality optimization under server capacity constraints. Lee et al. developed an algorithm for optimizing caching and transcoding tasks in multi-access edge computing (MEC) environments, adjusting bitrates based on video popularity and retention rates while considering server capacity limitations [<xref ref-type="bibr" rid="ref-13">13</xref>]. Tran et al. proposed a caching and processing framework for video streaming in MEC systems, addressing bitrate optimization through an online iterative greedy-based adaptive algorithm to tackle the NP-hard nature of the problem [<xref ref-type="bibr" rid="ref-14">14</xref>]. These methods rely on pre-defined rules and are less adaptable to dynamically changing multi-version workloads.</p>
<p>Several studies have explored trade-offs among cache hit ratios, content quality, and latency. Dao et al. [<xref ref-type="bibr" rid="ref-15">15</xref>] proposed a dynamic caching and quality level selection policy in adaptive bitrate streaming, modeled as a multidimensional knapsack problem and solved using a transfer learning-based genetic algorithm. Tran et al. [<xref ref-type="bibr" rid="ref-16">16</xref>] examined caching policies for mobile data traffic management, focusing on trade-offs among hit ratios, latency, and storage efficiency, and categorized algorithms into machine learning, deep learning, and game theory-based approaches.</p>
<p>Several recent studies have focused on similarity caching, where an edge cache stores and delivers similar content, such as lower bitrate versions, instead of the exact requested content. Araldo et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] studied optimal content placement in caching systems by selecting appropriate bitrate versions to improve video delivery performance under storage constraints. Zhou et al. [<xref ref-type="bibr" rid="ref-18">18</xref>] proposed adaptive offline and online caching algorithms that jointly optimize content placement and delivery decisions by considering content similarity and user-perceived quality. Garetto et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] investigated content placement strategies in networks of similarity caches, where content placement decisions are made by leveraging content similarity to improve caching efficiency. Wang et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] developed an online similarity caching algorithm based on an adversarial bandit framework to adaptively handle dynamic user requests in cooperative edge networks.</p>
<p>Several studies have addressed content quality adaptation and multi-version caching, where multiple bitrate versions of the same content are considered to enhance user QoE and caching efficiency. Tran et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] proposed a collaborative caching and processing framework in mobile-edge computing networks to support adaptive bitrate video streaming and improve user experience under storage and network constraints. Bayhan et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] proposed EdgeDASH, a network-assisted adaptive video streaming framework that improves edge caching efficiency by dynamically aligning client requests with cached content through quality adaptation.</p>
<p>To the best of our knowledge, this study is the first to apply DRL to utility-driven edge cache management with multi-version video content under dynamically changing content popularity. By incorporating a utility function that integrates video quality and cache efficiency, our approach provides a robust and adaptable solution for optimizing edge caching under uncertain and variable content popularity conditions. A comparison with representative related works is summarized in <xref ref-type="table" rid="table-1">Table 1</xref>, highlighting the distinctions in objective, granularity, and learning methodology.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Comparison with related works</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Research direction</th>
<th align="center">Prior works</th>
<th align="center">Our contribution</th>
</tr>
</thead>
<tbody>
<tr>
<td>DRL-based edge caching</td>
<td>DRL methods focusing on hit rate or cooperative learning [<xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;<xref ref-type="bibr" rid="ref-11">11</xref>]</td>
<td>PPO-based framework with utility-driven reward and dynamic workload adaptation</td>
</tr>
<tr>
<td>Hit rate-oriented caching</td>
<td>DRL for hit rate optimization using single-version content under dynamic requests [<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td>Utility-driven DRL scheme with multi-version caching, version-aware reward, and fallback handling</td>
</tr>
<tr>
<td>Video-aware caching</td>
<td>Heuristic caching based on content similarity or static popularity [<xref ref-type="bibr" rid="ref-17">17</xref>&#x2013;<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td>Utility-driven DRL scheme maximizing version-specific utility with bitrate selection and quality trade-offs</td>
</tr>
<tr>
<td>Bitrate- and latency-aware caching</td>
<td>Greedy or genetic algorithms for optimizing bitrate under latency and storage constraints [<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>]</td>
<td>Utility-focused DRL policy handling cache-size and quality trade-offs without handcrafted heuristics</td>
</tr>
<tr>
<td>Adaptive bitrate streaming in MEC</td>
<td>Client-driven adaptation with edge-assisted delivery [<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-22">22</xref>]</td>
<td>Centralized utility optimization without client-side adaptation</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3">
<label>3</label>
<title>System Model</title>
<p><xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the system architecture for caching video content on an edge cache. Edge caching reduces network bandwidth consumption by storing popular content closer to end users, thereby decreasing data transmission through core networks, lowering latency, and improving overall network efficiency [<xref ref-type="bibr" rid="ref-23">23</xref>]. From the perspective of the edge cache provider, the bandwidth served by the edge cache is directly linked to the profit generated, so the provider seeks to maximize it.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>System architecture</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-1.tif"/>
</fig>
<p>In this context, utility is defined as the profit accrued by the edge service provider, which is proportional to the bandwidth provided by the edge server. The edge caching system operates on the principle that higher bandwidth utilization reflects enhanced service delivery capabilities, leading to increased revenue for the provider [<xref ref-type="bibr" rid="ref-24">24</xref>]. The overarching objective is to maximize the total utility generated for the service provider while satisfying the edge cache capacity constraint.</p>
<p>Assume that the VoD server depicted in <xref ref-type="fig" rid="fig-1">Fig. 1</xref> stores <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> videos, each transcoded into <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> distinct bitrate versions. Each version, denoted as <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, is associated with a time-varying access probability <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>. 
This probability is measured at regular intervals <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>interval</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>.</p>

<p>The edge server processes user requests based on cached files, with requests handled as follows:
<list list-type="bullet">
<list-item>
<p>Cache hit: If the requested version is cached, it is served directly from the edge server. A request also counts as a hit when only a lower bitrate version of the requested video is cached; that version is then served in its place to minimize bandwidth usage [<xref ref-type="bibr" rid="ref-22">22</xref>].</p></list-item>
<list-item>
<p>Miss: If neither the requested version nor any lower bitrate version of the video is cached, the request cannot be fulfilled.</p></list-item>
</list></p>
<p>Let <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> be a binary variable indicating whether version <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is cached in the edge server, and let <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mspace width="thinmathspace" /><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> represent the set of caching decisions for video <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>i</mml:mi></mml:math></inline-formula>. 
The bandwidth <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> provided for version <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>j</mml:mi></mml:math></inline-formula> of video <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>i</mml:mi></mml:math></inline-formula> under the caching decision <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is defined as:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mtext>&#xA0;if&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mtext>&#xA0;if&#xA0;</mml:mtext></mml:mrow><mml:mtext>&#xA0;</mml:mtext><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mrow><mml:mtext>&#xA0;and&#xA0;</mml:mtext></mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2223;</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo fence="false" 
stretchy="false">}</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mtext>&#xA0;if&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mrow><mml:mtext>&#xA0;and&#xA0;</mml:mtext></mml:mrow><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mi>l</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the bitrate of version <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, representing the bandwidth required for its transmission. This ensures that the requested version <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is transmitted if cached, or the highest available lower version is transmitted if not cached. If no lower-quality versions are available, <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula>, indicating no bandwidth consumption.</p>
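<p>The fallback rule in Eq. (1) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name served_bandwidth, the list-based representation of the caching decision and the bitrates, and the example bitrate values are assumptions introduced here. The sketch also assumes that versions are indexed in order of increasing bitrate, so that any index below j denotes a lower bitrate version.</p>
<preformat preformat-type="code">
```python
def served_bandwidth(j, cached, bitrates):
    """Return the bandwidth of Eq. (1) for a request of version j of one video.

    j        : requested version index, 1-based as in the text
    cached   : list of 0/1 flags, where cached[l - 1] plays the role of X_{i,l}
    bitrates : list of bitrates, where bitrates[l - 1] plays the role of b_{i,l}
    """
    # Walk downward from the requested version. The first cached version
    # found is either the exact hit (l == j) or the highest cached version
    # below j, which is the fallback case of Eq. (1).
    for l in range(j, 0, -1):
        if cached[l - 1] == 1:
            return bitrates[l - 1]
    # No version at or below j is cached: a miss, so no edge bandwidth.
    return 0.0


# Hypothetical example: four versions, only version 2 is cached.
example_bitrates = [1.0, 2.5, 5.0, 8.0]  # assumed values, e.g., in Mbps
example_cached = [0, 1, 0, 0]

exact_hit = served_bandwidth(2, example_cached, example_bitrates)  # 2.5
fallback = served_bandwidth(4, example_cached, example_bitrates)   # 2.5
miss = served_bandwidth(1, example_cached, example_bitrates)       # 0.0
```
</preformat>
<p>In this hypothetical example, a request for version 4 is served with version 2, while a request for version 1 is a miss because no version at or below it is cached.</p>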
<p>We consider three utility models based on the bandwidth provided by the edge cache: proportional (linear growth), convex (accelerating growth), and concave (diminishing growth) relationships. These models enable adaptability to diverse utility requirements and reflect various profit structures [<xref ref-type="bibr" rid="ref-25">25</xref>]. Let <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represent the utility derived from the caching decisions <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula>. The utility function is defined as:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mi>&#x03B1;</mml:mi><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mtext>linear model</mml:mtext></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>&#x03B2;</mml:mi><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mtext>convex model</mml:mtext></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>&#x03B3;</mml:mi><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mrow><mml:mtext>concave model</mml:mtext></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>&#x03B1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> are scaling factors, <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>r</mml:mi></mml:math></inline-formula> controls the growth rate in the convex model, and <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> denotes the bandwidth provided for content <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mi>j</mml:mi></mml:math></inline-formula> at edge cache <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mi>i</mml:mi></mml:math></inline-formula> under the caching strategy <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula>. 
The concave utility model uses <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> so that the function is well defined, and equal to zero, when <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula>. Expressing all three models in terms of the same bandwidth quantity gives a consistent and flexible representation of utility, and the notation avoids confusion with previously defined terms such as <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
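<p>For illustration, the three cases of Eq. (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name <monospace>utility</monospace>, the string labels for the models, and the default parameter values are hypothetical and are not taken from the evaluated implementation.</p>
<code language="python"><![CDATA[
```python
import math


def utility(bandwidth, model, alpha=1.0, beta=1.0, gamma=1.0, r=1.0):
    """Evaluate Eq. (2) for one (edge cache, content) pair.

    bandwidth: the non-negative value B_{i,j}(X_i).
    model: "linear", "convex", or "concave".
    The default parameter values are placeholders, not values from the paper.
    """
    if bandwidth < 0:
        raise ValueError("bandwidth must be non-negative")
    if model == "linear":
        return alpha * bandwidth
    if model == "convex":
        return beta * math.exp(r * bandwidth)
    if model == "concave":
        # log1p(B) computes ln(1 + B), which is 0 when B = 0.
        return gamma * math.log1p(bandwidth)
    raise ValueError(f"unknown utility model: {model}")
```
]]></code>
<p>Note that at zero bandwidth the linear and concave models return zero, whereas the convex model returns the scaling factor itself, since the exponential of zero is one.</p>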
</sec>
<sec id="s4">
<label>4</label>
<title>Synthetic Dataset Generation</title>
<p>In this section, we describe the process of generating a synthetic dataset for evaluating VoD caching systems. The dataset models two key aspects: the total number of requests across time intervals and the dynamic popularity of videos and their versions.
<list list-type="bullet">
<list-item>
<p>Request volume: The normalized number of requests <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> in time interval <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mi>k</mml:mi></mml:math></inline-formula> follows a diurnal access pattern that reflects realistic user activity [<xref ref-type="bibr" rid="ref-26">26</xref>]. This pattern is modeled using a normal distribution <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>&#x03BC;</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, where <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msubsup><mml:mi>&#x03BC;</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> represent the mean and standard deviation of the normalized number of requests in time interval <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>k</mml:mi></mml:math></inline-formula>, respectively, so that the generated demand varies over the course of the day.</p></list-item>
<list-item>
<p>Video popularity: Each video <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mi>i</mml:mi></mml:math></inline-formula> has a rank <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msub><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> that changes dynamically at each interval <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>k</mml:mi></mml:math></inline-formula>. Videos are divided into <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>group</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> groups based on their initial ranks <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>. The ranks are updated using a Zipf-shuffle model, which preserves the Zipf distribution&#x2019;s characteristics while allowing random rank variability within each group [<xref ref-type="bibr" rid="ref-27">27</xref>]. The video popularity for interval <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mi>k</mml:mi></mml:math></inline-formula> is generated based on the updated rank and a Zipf distribution with a varying skewness parameter.</p></list-item>
<list-item>
<p>Version popularity: Within each video, the popularity of its bitrate versions is modeled using a normal distribution, <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mrow><mml:mi>&#x1D4A9;</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> where <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> represent the mean and standard deviation of version popularity [<xref ref-type="bibr" rid="ref-26">26</xref>]. The popularity <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> for version <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi>j</mml:mi></mml:math></inline-formula> of video <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mi>i</mml:mi></mml:math></inline-formula> at interval <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mi>k</mml:mi></mml:math></inline-formula> is then determined by multiplying the video popularity (from the Zipf-shuffle model) with the version popularity.</p></list-item>
</list></p>
<p>Using these models, a synthetic dataset is generated with <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>sample</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> samples per version. The dataset, <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:math></inline-formula>, is represented as an <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>sample</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> three-dimensional array, where each entry captures the version-level popularity. For each video <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mi>i</mml:mi></mml:math></inline-formula>, the average number of concurrent requests, denoted as <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>avg</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, is represented as a set of <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>avg</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, where <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>avg</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> is defined as the product of the duration of segment 
<inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and its average access probability over all samples in the dataset <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:math></inline-formula>.</p>
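<p>The popularity part of this procedure can be sketched in Python as follows. The sketch covers only the video and version popularity of a single interval, and it rests on several assumptions that the text does not fix: videos are indexed by their initial rank so that each group is a contiguous block of indices, the Zipf-shuffle step is taken to be a uniform random permutation of ranks inside each group, and version shares are obtained by drawing one normal weight per version, clipping it to a small positive value, and normalizing. All function and parameter names are hypothetical.</p>
<code language="python"><![CDATA[
```python
import random


def zipf_probabilities(ranks, skew):
    """Map each video's rank to a probability proportional to rank**(-skew)."""
    weights = [rank ** (-skew) for rank in ranks]
    total = sum(weights)
    return [w / total for w in weights]


def shuffle_within_groups(ranks, n_groups, rng):
    """Randomly permute ranks inside each contiguous group of videos.

    Assumption: videos are indexed by initial rank, so a group is a
    contiguous block of indices and ranks never leave their group.
    """
    size = -(-len(ranks) // n_groups)  # ceiling division
    shuffled = list(ranks)
    for start in range(0, len(ranks), size):
        block = shuffled[start:start + size]
        rng.shuffle(block)
        shuffled[start:start + size] = block
    return shuffled


def version_shares(n_versions, mu_ver, sigma_ver, rng):
    """Draw one normal weight per version, clip it to be positive, normalize."""
    weights = [max(rng.gauss(mu_ver, sigma_ver), 1e-9) for _ in range(n_versions)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_popularity(ranks, n_groups, skew, n_versions, mu_ver, sigma_ver, rng):
    """Return (new_ranks, p), where p[i][j] is the popularity of version j of video i."""
    new_ranks = shuffle_within_groups(ranks, n_groups, rng)
    video_pop = zipf_probabilities(new_ranks, skew)
    p = []
    for pop in video_pop:
        shares = version_shares(n_versions, mu_ver, sigma_ver, rng)
        # Version-level popularity = video popularity x version share.
        p.append([pop * share for share in shares])
    return new_ranks, p
```
]]></code>
<p>Calling <monospace>sample_popularity</monospace> once per interval with the ranks returned by the previous call, and repeating the process for each sample, would fill one slice of the dataset at a time. Because both the Zipf probabilities and the version shares are normalized, the entries of each slice sum to one.</p>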
<p>These modeling components are grounded in prior literature on caching workload simulation [<xref ref-type="bibr" rid="ref-26">26</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>], enhancing the realism and reproducibility of the generated dataset. Furthermore, by varying model parameters such as request skewness and version-level diversity, researchers can customize the dataset to evaluate a broad range of caching algorithms under diverse operational scenarios.</p>
<p>It is worth noting that the dataset is constructed solely from content-level popularity metrics and request distributions, without involving any user identifiers, behavioral traces, or session-level logs. As a result, the training and evaluation processes remain fully compliant with privacy-preserving principles in edge computing.</p>
</sec>
<sec id="s5">
<label>5</label>
<title>Problem Formulation</title>
<p>Let us first define the normalized concurrent request count <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>con</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, which represents the expected number of concurrent requests for version <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> during interval <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>k</mml:mi></mml:math></inline-formula>. Using Little&#x2019;s Law [<xref ref-type="bibr" rid="ref-28">28</xref>], <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>con</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> is given as:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>con</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:msub><mml:mi>L</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> represents the average duration (in appropriate time units) for which video <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mi>i</mml:mi></mml:math></inline-formula> is streamed.</p>
<p>Next, the weighted utility <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:msubsup><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>weight</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> for version <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> during interval <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mi>k</mml:mi></mml:math></inline-formula> is defined by multiplying its utility <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> by the normalized concurrent request count <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>con</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, as:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msubsup><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>weight</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>con</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>For each video <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:mi>i</mml:mi></mml:math></inline-formula>, the request-driven utility is calculated as the sum of the weighted utilities across all intervals <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:mi>k</mml:mi></mml:math></inline-formula> and versions <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:mi>j</mml:mi></mml:math></inline-formula>:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>request</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>interval</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:msubsup><mml:mi>U</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>weight</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>.</mml:mo></mml:math></disp-formula></p>
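<p>Eqs. (3) to (5) compose directly, as the following minimal Python sketch shows for a single video. The function names and the nested-list layout (<monospace>popularity[k][j]</monospace> for interval k and version j) are assumptions made for illustration only.</p>
<code language="python"><![CDATA[
```python
def concurrent_requests(p_ijk, n_req_k, duration_i):
    """Eq. (3): popularity x requests in the interval x average streaming duration."""
    return p_ijk * n_req_k * duration_i


def request_utility(utilities, popularity, n_req, duration_i):
    """Eq. (5) for one video i.

    utilities[j]     : U_{i,j}(X_i) for each version j
    popularity[k][j] : p_{i,j}(k) for each interval k and version j
    n_req[k]         : normalized number of requests in interval k
    duration_i       : L_i, the average streaming duration of video i
    """
    total = 0.0
    for k, n_req_k in enumerate(n_req):
        for j, u_ij in enumerate(utilities):
            n_con = concurrent_requests(popularity[k][j], n_req_k, duration_i)
            total += u_ij * n_con  # Eq. (4): weighted utility
    return total
```
]]></code>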
<p>The optimization problem maximizes the expected popularity-weighted utility, where the expectation accounts for the stochastic nature of video popularity and the number of requests per interval. The storage size of version <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is denoted by <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and the total storage capacity is <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mtext>limit</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>. Under the storage capacity constraint, the problem is formulated as:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mtext>Maximize</mml:mtext></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>request</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>]</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mtext>subject 
to</mml:mtext></mml:mrow></mml:mtd><mml:mtd><mml:mi></mml:mi><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2264;</mml:mo><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mtext>limit</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<list list-type="simple">
<list-item><label>1.</label><p>Objective Function: The objective function computes the expected total request-driven utility over all videos, where the utility of each video is accumulated over all intervals <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mi>k</mml:mi></mml:math></inline-formula> as in Eq. (5).</p></list-item>
<list-item><label>2.</label><p>Constraints:
<list list-type="simple">
<list-item><label>&#x2022;</label>
<p>The total storage cost of all cached versions must not exceed the available storage <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mtext>limit</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>.</p></list-item>
</list></p></list-item>
<list-item><label>3.</label><p>Stochastic Parameters:
<list list-type="simple">
<list-item><label>&#x2022;</label>
<p>Both <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>k</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> are stochastic and modeled using datasets that represent variations in popularity and request volumes across intervals.</p></list-item>
</list></p></list-item>
</list></p>
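<p>The constraint and the expectation in Eq. (6) can be checked numerically with the following minimal Python sketch. It assumes binary caching indicators stored as nested lists and approximates the expectation by a plain average over dataset samples; both choices, and all function names, are illustrative assumptions.</p>
<code language="python"><![CDATA[
```python
def storage_used(x, sizes):
    """Left-hand side of the constraint in Eq. (6): sum of X_{i,j} * S_{i,j}."""
    return sum(
        x_ij * s_ij
        for x_row, s_row in zip(x, sizes)
        for x_ij, s_ij in zip(x_row, s_row)
    )


def is_feasible(x, sizes, s_limit):
    """True when the cached versions fit within the storage capacity."""
    return storage_used(x, sizes) <= s_limit


def estimate_objective(sample_totals):
    """Sample-average estimate of the expectation in Eq. (6).

    sample_totals[n] is the total request-driven utility summed over all
    videos, evaluated on the n-th sample of the dataset.
    """
    if not sample_totals:
        raise ValueError("at least one sample is required")
    return sum(sample_totals) / len(sample_totals)
```
]]></code>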
<p>This formulation optimizes caching strategies to maximize the expected popularity-weighted utility while adhering to storage capacity constraints. DRL is well suited to this problem because it can learn policies that adapt to highly dynamic environments, handle stochastic elements, and balance short-term performance against long-term efficiency.</p>
</sec>
<sec id="s6">
<label>6</label>
<title>DRL-Based Caching Algorithm</title>
<p>We propose a DRL-based cache determination algorithm (CDA) that uses proximal policy optimization (PPO) due to its ease of implementation, sample efficiency, and reliable convergence [<xref ref-type="bibr" rid="ref-29">29</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>]. PPO stabilizes training through clipped probability ratios, making it particularly suitable for dynamic environments with fluctuating content popularity [<xref ref-type="bibr" rid="ref-29">29</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>].</p>
<p>The algorithm operates in two phases: training and decision. In the training phase, the agent learns its caching policy over <inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>episode</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> episodes by interacting with a simulated environment. In the decision phase, the agent applies the trained policy to determine caching actions without further learning.</p>
<p>We model the caching process as a sequential decision-making problem in which the agent processes videos in order and, at each decision step, selects which bitrate versions of one video to cache. The time step in the DRL formulation therefore corresponds directly to the video index, denoted as <inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:mi>i</mml:mi></mml:math></inline-formula>, and each decision step selects the bitrate versions for video <inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:mi>i</mml:mi></mml:math></inline-formula>. This sequential structure follows the framework of a Markov decision process (MDP), in which each decision influences the subsequent state transitions.</p>
<p>At each step <inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, the following operations occur:
<list list-type="simple">
<list-item><label>1.</label><p>The agent observes the current state of the environment, which includes features such as storage usage and popularity estimates for video <inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:mi>i</mml:mi></mml:math></inline-formula>.</p></list-item>
<list-item><label>2.</label><p>The agent selects an action <inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>, indicating the set of bitrate versions to cache for video <inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mi>i</mml:mi></mml:math></inline-formula>, subject to remaining storage constraints.</p></list-item>
<list-item><label>3.</label><p>The environment updates its state and calculates a reward based on the utility gained and whether minimum caching requirements are met.</p></list-item>
</list></p>
<p>The DRL algorithm is built upon three main components:
<list list-type="simple">
<list-item><label>1.</label><p>Action space: The action space <italic>A</italic> is defined as the set of all versions from which the agent can select: <inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mi>A</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>,</mml:mo></mml:math></inline-formula> where <inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> is the total number of versions.</p></list-item>
<list-item><label>2.</label><p>Observation space: The observation space represents the current state of the environment. It includes relevant metrics such as video popularity, cache utilization, and system constraints that influence caching decisions.</p></list-item>
<list-item><label>3.</label><p>Reward model: The reward model assigns a scalar value to each action, reflecting its immediate benefit or cost. The agent&#x2019;s objective is to maximize cumulative rewards, which represent the overall performance of the caching policy.</p></list-item>
</list></p>
<p>The valid action space <inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:msubsup><mml:mi>A</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>valid</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> is dynamically updated based on the remaining storage capacity. Versions are evaluated sequentially, and those that would exceed the capacity are excluded.</p>
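<p>One way to read this update rule is sketched below in Python. The sketch assumes that each version is checked on its own against the remaining capacity, that versions are numbered from 1, and that sizes are given per version of the current video; the function name and argument layout are hypothetical.</p>
<code language="python"><![CDATA[
```python
def valid_versions(sizes_i, storage_used, s_limit):
    """Return the version indices of video i that still fit in the cache.

    sizes_i[j-1]  : storage size S_{i,j} of version j (1-based index j)
    storage_used  : storage consumed by decisions for earlier videos
    s_limit       : total storage capacity
    """
    valid = []
    for j, size in enumerate(sizes_i, start=1):
        # Exclude any version that would push usage past the capacity.
        if storage_used + size <= s_limit:
            valid.append(j)
    return valid
```
]]></code>
<p>For example, with version sizes 2, 5, and 9, a capacity of 10, and 4 units already used, only versions 1 and 2 remain selectable.</p>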
<p>The observation state at each video index <inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:mi>i</mml:mi></mml:math></inline-formula> is:
<disp-formula id="ueqn-7"><mml:math id="mml-ueqn-7" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msubsup><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>obj</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>str</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>avg</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:mtext>avg</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-91"><mml:math id="mml-ieqn-91"><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>str</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> denotes the cumulative storage used up to step <inline-formula id="ieqn-92"><mml:math id="mml-ieqn-92"><mml:mi>i</mml:mi></mml:math></inline-formula>, and <inline-formula id="ieqn-93"><mml:math id="mml-ieqn-93"><mml:msubsup><mml:mi>N</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>avg</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> is the set of average concurrent request counts for the versions of video <inline-formula id="ieqn-94"><mml:math id="mml-ieqn-94"><mml:mi>i</mml:mi></mml:math></inline-formula>.</p>
<p>The reward function is designed to promote caching decisions that maximize utility while ensuring that at least one version is cached for every video. At each decision step <inline-formula id="ieqn-95"><mml:math id="mml-ieqn-95"><mml:mi>i</mml:mi></mml:math></inline-formula>, the agent receives an immediate reward <inline-formula id="ieqn-96"><mml:math id="mml-ieqn-96"><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>, computed as the ratio of the achieved request-driven utility <inline-formula id="ieqn-97"><mml:math id="mml-ieqn-97"><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>request</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> to a reference utility <inline-formula id="ieqn-98"><mml:math id="mml-ieqn-98"><mml:msup><mml:mi>U</mml:mi><mml:mrow><mml:mrow><mml:mtext>greedy</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> obtained from a baseline greedy algorithm. This ratio is scaled by a factor <inline-formula id="ieqn-99"><mml:math id="mml-ieqn-99"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> to normalize the reward magnitude:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>request</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:msup><mml:mi>U</mml:mi><mml:mrow><mml:mrow><mml:mtext>greedy</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mfrac><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>To penalize scenarios in which no version of a video is cached&#x2014;thus resulting in zero utility&#x2014;a penalty term is applied. For each video <inline-formula id="ieqn-100"><mml:math id="mml-ieqn-100"><mml:mi>i</mml:mi></mml:math></inline-formula>, if the selected action <inline-formula id="ieqn-101"><mml:math id="mml-ieqn-101"><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> results in no cached versions (i.e., an empty set), an indicator function adds a penalty of 1. The total penalty across all videos is expressed as:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mtext>penalty</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:mrow><mml:mi mathvariant="double-struck">I</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2223;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2229;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">&#x2205;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>The total reward for an episode is then computed by summing the individual rewards <inline-formula id="ieqn-102"><mml:math id="mml-ieqn-102"><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> across all videos and subtracting the cumulative penalty. This final reward is used as the optimization objective for PPO training:
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mtext>episode</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mtext>penalty</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:math></disp-formula></p>
<p>This reward structure encourages the agent not only to prioritize versions that yield higher utility but also to cache at least one version of every video. As a result, the agent learns a policy that balances these two goals and improves overall utility.</p>
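<p>As an illustration only, the reward computation in Eqs. (7)&#x2013;(9) can be sketched in Python as follows. The function names, the encoding of each action as a set of version indices, and the example values of the scaling factor and the greedy reference utility are assumptions made for this sketch; they are not taken from the implementation used in the experiments.</p>
<code language="python"><![CDATA[
```python
from typing import Sequence, Set


def step_reward(u_request: float, u_greedy: float, lam: float) -> float:
    """Eq. (7): request-weighted utility relative to the greedy reference, scaled by lambda."""
    if u_greedy <= 0:
        raise ValueError("reference utility must be positive")
    return lam * u_request / u_greedy


def penalty(actions: Sequence[Set[int]], n_ver: int) -> int:
    """Eq. (8): count the videos whose action caches no valid version."""
    valid_versions = set(range(1, n_ver + 1))
    return sum(1 for a in actions if not (a & valid_versions))


def episode_reward(step_utilities, actions, u_greedy, lam, n_ver):
    """Eq. (9): sum of the step rewards minus the total penalty."""
    total = sum(step_reward(u, u_greedy, lam) for u in step_utilities)
    return total - penalty(actions, n_ver)


# Hypothetical three-video episode; all numbers are illustrative only.
actions = [{1, 3}, set(), {7}]   # the second video has no cached version
utilities = [4.0, 0.0, 1.0]      # assumed request-weighted utility per step
r_episode = episode_reward(utilities, actions, u_greedy=10.0, lam=2.0, n_ver=7)
```
]]></code>
<p>In this toy episode the two cached videos contribute step rewards of 0.8 and 0.2, and the single uncached video incurs a penalty of 1, so the episode reward is 0.</p>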
<p><xref ref-type="fig" rid="fig-2">Fig. 2</xref> illustrates the agent&#x2019;s decision-making process at each step <inline-formula id="ieqn-103"><mml:math id="mml-ieqn-103"><mml:mi>i</mml:mi></mml:math></inline-formula> during training. The agent observes the system state (video index, cache utilization, and video popularity), selects a multi-version caching action according to the current policy, and receives a reward from the environment that guides learning.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Agent&#x2019;s decision process at step <inline-formula id="ieqn-104"><mml:math id="mml-ieqn-104"><mml:mi>i</mml:mi></mml:math></inline-formula></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-2.tif"/>
</fig>
<p>Algorithm 1 describes the training phase of the proposed CDA algorithm. In this phase, the agent learns caching strategies by interacting with a simulated environment over multiple episodes. At the beginning of each episode, a sample scenario is selected from the dataset <inline-formula id="ieqn-105"><mml:math id="mml-ieqn-105"><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:math></inline-formula>, which defines the access probabilities and request patterns for the video versions. Based on this sample, the concurrent request counts <inline-formula id="ieqn-106"><mml:math id="mml-ieqn-106"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>con</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> are computed using Little&#x2019;s Law.</p>
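<p>Little&#x2019;s Law states that the mean number of items in a system equals the arrival rate multiplied by the mean time each item spends in the system. The following minimal sketch applies it to one video version in one interval, assuming that the arrival rate is expressed in requests per minute and that the time in the system is the mean viewing duration; the function name and the example numbers are hypothetical, and the exact expression used for the concurrent request counts may include additional terms.</p>
<code language="python"><![CDATA[
```python
def concurrent_requests(arrival_rate_per_min: float, mean_watch_min: float) -> float:
    """Little's Law: mean number in system = arrival rate x mean time in system."""
    if arrival_rate_per_min < 0 or mean_watch_min < 0:
        raise ValueError("inputs must be non-negative")
    return arrival_rate_per_min * mean_watch_min


# Hypothetical example: 0.5 requests per minute, each watched for 60 minutes on average.
n_con = concurrent_requests(0.5, 60.0)
```
]]></code>
<p>Under these assumed inputs, an average of 30 streams of that version are active at the same time.</p>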
<fig id="fig-11">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-11.tif"/>
</fig>
<p>The agent processes the videos sequentially. For each video <inline-formula id="ieqn-125"><mml:math id="mml-ieqn-125"><mml:mi>i</mml:mi></mml:math></inline-formula>, it first identifies the valid action set <inline-formula id="ieqn-126"><mml:math id="mml-ieqn-126"><mml:msubsup><mml:mi>A</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>valid</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, considering the remaining cache capacity. The agent then selects an action <inline-formula id="ieqn-127"><mml:math id="mml-ieqn-127"><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>, indicating which bitrate versions of the video to store. The caching decision <inline-formula id="ieqn-128"><mml:math id="mml-ieqn-128"><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is updated accordingly, and the reward <inline-formula id="ieqn-129"><mml:math id="mml-ieqn-129"><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is computed using <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>. This process continues until either all videos are processed or the storage constraint is violated, at which point the cumulative reward for the episode <inline-formula id="ieqn-130"><mml:math id="mml-ieqn-130"><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mtext>episode</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> is computed using <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>. The PPO policy model <inline-formula id="ieqn-131"><mml:math id="mml-ieqn-131"><mml:mrow><mml:mi>&#x1D4AB;</mml:mi></mml:mrow></mml:math></inline-formula> is then updated based on the collected experience to improve future decisions.</p>
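<p>The sequential episode described above can be sketched as follows. This is a simplified illustration rather than Algorithm 1 itself: the policy is a placeholder callable standing in for the PPO model, the utility of an action is assumed to be the sum of per-version utilities, the valid action set is enumerated exhaustively, and the PPO update step is omitted. All names and numbers are hypothetical.</p>
<code language="python"><![CDATA[
```python
import random
from itertools import combinations


def valid_actions(sizes, remaining):
    """All subsets of version indices whose total size fits the remaining capacity."""
    idx = range(len(sizes))
    subsets = [frozenset(c) for r in range(len(sizes) + 1) for c in combinations(idx, r)]
    return [s for s in subsets if sum(sizes[j] for j in s) <= remaining]


def run_episode(video_sizes, utilities, capacity, u_greedy, lam, policy):
    """Process the videos in order and return the decisions and the episode reward."""
    used, rewards, decisions = 0.0, [], []
    for sizes, utils in zip(video_sizes, utilities):
        candidates = valid_actions(sizes, capacity - used)
        action = policy(candidates)              # placeholder for the PPO policy
        used += sum(sizes[j] for j in action)
        decisions.append(action)
        # Step reward in the form of Eq. (7), with an assumed additive utility.
        rewards.append(lam * sum(utils[j] for j in action) / u_greedy)
    n_empty = sum(1 for a in decisions if not a)  # penalty count as in Eq. (8)
    return decisions, sum(rewards) - n_empty      # episode reward as in Eq. (9)


# Hypothetical two-video catalogue with three versions each.
rng = random.Random(0)
video_sizes = [[3.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
utilities = [[0.9, 0.6, 0.3], [0.5, 0.3, 0.1]]
decisions, r_episode = run_episode(video_sizes, utilities, capacity=4.0,
                                   u_greedy=1.0, lam=1.0, policy=rng.choice)
```
]]></code>
<p>Because every action is drawn from the valid set, the storage used never exceeds the capacity in this sketch. The empty action always remains available, which is why the penalty term is needed to discourage it.</p>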
<p>Algorithm 2 presents the decision phase of the CDA algorithm, during which the trained PPO model <inline-formula id="ieqn-132"><mml:math id="mml-ieqn-132"><mml:mrow><mml:mi>&#x1D4AB;</mml:mi></mml:mrow></mml:math></inline-formula> is used to determine the actual caching decisions without further learning. This phase is applied in a testing environment, where the agent makes one-shot decisions based on the learned policy.</p>
<p>Given the trained model and a test dataset, the concurrent request values <inline-formula id="ieqn-133"><mml:math id="mml-ieqn-133"><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>con</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> are first computed to reflect the workload scenario. For each video <inline-formula id="ieqn-134"><mml:math id="mml-ieqn-134"><mml:mi>i</mml:mi></mml:math></inline-formula>, the agent determines the valid action set <inline-formula id="ieqn-135"><mml:math id="mml-ieqn-135"><mml:msubsup><mml:mi>A</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mtext>valid</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> based on current storage availability and uses the trained policy to select an action <inline-formula id="ieqn-136"><mml:math id="mml-ieqn-136"><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>. The caching decision vector <inline-formula id="ieqn-137"><mml:math id="mml-ieqn-137"><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is updated accordingly. This process continues for all videos, or until no further actions are valid due to storage constraints. 
The resulting caching configuration <inline-formula id="ieqn-138"><mml:math id="mml-ieqn-138"><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:mover><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> defines the final output of the CDA algorithm.</p>
<fig id="fig-12">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-12.tif"/>
</fig>
<p>To optimize the performance of the PPO-based caching agent, we employed Optuna [<xref ref-type="bibr" rid="ref-31">31</xref>], a state-of-the-art hyperparameter optimization framework that uses a Bayesian optimization strategy. The objective was to maximize the average episode reward during training by identifying the most effective hyperparameter configurations for the PPO algorithm. The search space was defined over three key hyperparameters: the learning rate, the clipping range, and the value function coefficient.</p>
<p><xref ref-type="table" rid="table-2">Table 2</xref> lists the final PPO hyperparameter settings. The learning rate was set to 0.0003 to balance convergence speed and policy stability. Each policy update used 2048 steps of collected experience with a batch size of 64, and the model was trained for 10 epochs per update. The discount factor was set to 0.99 to account for long-term rewards, and the clip range was fixed at 0.2 to stabilize policy updates. Lastly, the value function coefficient, which is not listed in Table 2, was set to 0.5 to balance policy optimization against the accuracy of value function estimation.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>PPO hyperparameters and values</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Hyperparameter</th>
<th>Learning rate</th>
<th>Steps/Update</th>
<th>Batch size</th>
<th>Epochs</th>
<th>Discount <inline-formula id="ieqn-151"><mml:math id="mml-ieqn-151"><mml:mi mathvariant="bold-italic">&#x03B3;</mml:mi></mml:math></inline-formula></th>
<th>Clip range</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Value</bold></td>
<td>0.0003</td>
<td>2048</td>
<td>64</td>
<td>10</td>
<td>0.99</td>
<td>0.2</td>
</tr>
</tbody>
</table>
</table-wrap>
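<p>For reference, the reported settings can be collected in a single configuration mapping, as in the sketch below. The key names are hypothetical and do not correspond to any particular library. The derived count of gradient steps assumes the common PPO scheme in which each epoch sweeps the collected rollout once in minibatches, which the text does not state explicitly.</p>
<code language="python"><![CDATA[
```python
# Reported PPO settings; the key names are illustrative, not a library API.
PPO_CONFIG = {
    "learning_rate": 3e-4,
    "steps_per_update": 2048,
    "batch_size": 64,
    "epochs_per_update": 10,
    "discount_gamma": 0.99,
    "clip_range": 0.2,
    "value_coef": 0.5,
}

# Assumed minibatch scheme: one full sweep of the rollout per epoch.
minibatches_per_epoch = PPO_CONFIG["steps_per_update"] // PPO_CONFIG["batch_size"]
gradient_steps_per_update = minibatches_per_epoch * PPO_CONFIG["epochs_per_update"]
```
]]></code>
<p>Under that assumption, each policy update performs 32 minibatch steps per epoch and 320 gradient steps in total.</p>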
<p>Optuna conducted a total of 100 trials using the Tree-structured Parzen Estimator (TPE) as the sampling algorithm. Each trial trained the PPO agent for a fixed number of episodes and reported the mean episode reward as the objective metric. The best-performing configuration was selected based on its ability to maximize long-term utility across diverse caching scenarios.</p>
</sec>
<sec id="s7">
<label>7</label>
<title>Experimental Results</title>
<p>We conducted simulations to evaluate the proposed scheme, with the DRL algorithm implemented using the PyTorch framework. The experimental parameters were configured with <inline-formula id="ieqn-152"><mml:math id="mml-ieqn-152"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>video</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>500</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-153"><mml:math id="mml-ieqn-153"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>7</mml:mn></mml:math></inline-formula>, corresponding to seven recommended bitrates (30, 24, 15, 12, 10, 6, 4 Mbps) as suggested by YouTube [<xref ref-type="bibr" rid="ref-26">26</xref>]. The video lengths were randomly assigned within a range of 30 to 90 min, and <inline-formula id="ieqn-154"><mml:math id="mml-ieqn-154"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>interval</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> was set to 24. The value of <inline-formula id="ieqn-155"><mml:math id="mml-ieqn-155"><mml:msubsup><mml:mi>&#x03BC;</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> was set between <inline-formula id="ieqn-156"><mml:math id="mml-ieqn-156"><mml:mn>0.3</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-157"><mml:math id="mml-ieqn-157"><mml:mn>0.8</mml:mn></mml:math></inline-formula>, while <inline-formula id="ieqn-158"><mml:math id="mml-ieqn-158"><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mi>k</mml:mi><mml:mrow><mml:mrow><mml:mtext>req</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> was fixed at <inline-formula id="ieqn-159"><mml:math id="mml-ieqn-159"><mml:mn>0.05</mml:mn></mml:math></inline-formula>.</p>
<p>The utility model parameters were set as follows: <inline-formula id="ieqn-160"><mml:math id="mml-ieqn-160"><mml:mi>&#x03B1;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.33</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-161"><mml:math id="mml-ieqn-161"><mml:mi>&#x03B2;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.19</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-162"><mml:math id="mml-ieqn-162"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>3.28</mml:mn></mml:math></inline-formula>, and <inline-formula id="ieqn-163"><mml:math id="mml-ieqn-163"><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula>. The experiments varied six parameters: the cache-to-total video size ratio, version popularity, Zipf skewness, rank-change groups, the utility model, and the number of videos. The default values were set as follows: cache ratio <inline-formula id="ieqn-164"><mml:math id="mml-ieqn-164"><mml:mn>5</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-165"><mml:math id="mml-ieqn-165"><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-166"><mml:math id="mml-ieqn-166"><mml:msup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>1.5</mml:mn></mml:math></inline-formula>, Zipf skewness 0.3&#x2013;0.7, <inline-formula id="ieqn-167"><mml:math id="mml-ieqn-167"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>group</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>5</mml:mn></mml:math></inline-formula>, and a linear utility model.</p>
<p>To evaluate the performance of CDA, we adopted the total utility&#x2014;normalized to the utility achieved by CDA&#x2014;as the primary performance metric. We compared CDA against six baseline caching strategies, including four greedy heuristics and two predictive models based on long short-term memory (LSTM) networks. In all cases, video popularity is quantified by the number of concurrent requests:
<list list-type="simple">
<list-item><label>1.</label><p>Initial Popularity-Capacity Ratio (IPCR): Videos are cached based on their initial popularity (i.e., number of concurrent requests) divided by their size, prioritizing those with the highest ratio.</p></list-item>
<list-item><label>2.</label><p>Average Popularity-Capacity Ratio (APCR): Similar to IPCR, but uses the average popularity across all dataset samples instead of the initial value.</p></list-item>
<list-item><label>3.</label><p>Initial Popularity Greedy (IPG): Videos are cached in descending order of their initial popularity.</p></list-item>
<list-item><label>4.</label><p>Average Popularity Greedy (APG): Videos are cached in descending order of their average popularity across all samples.</p></list-item>
<list-item><label>5.</label><p>LSTM-based Popularity Predictor (LSTM-P): An LSTM model is trained on the synthetic dataset <inline-formula id="ieqn-168"><mml:math id="mml-ieqn-168"><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:math></inline-formula> to forecast the number of concurrent requests for each video. Videos are cached in descending order of predicted popularity.</p></list-item>
<list-item><label>6.</label><p>LSTM-based Popularity-Capacity Ratio Predictor (LSTM-PCR): Another LSTM model is used to estimate the popularity-to-capacity ratio. Caching is performed in descending order of this predicted ratio.</p></list-item>
</list></p>
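<p>The two families of greedy heuristics listed above differ only in their ranking key. The sketch below illustrates this with a hypothetical three-item catalogue. It assumes that an item that no longer fits is skipped and that the scan continues; the baselines may instead stop at the first item that does not fit. The function and variable names are illustrative.</p>
<code language="python"><![CDATA[
```python
def greedy_cache(items, capacity, key):
    """Cache items in descending order of key, skipping any that no longer fit."""
    chosen, used = [], 0.0
    for name, popularity, size in sorted(items, key=key, reverse=True):
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen


def by_popularity(item):
    """IPG/APG-style ranking: popularity only."""
    return item[1]


def by_ratio(item):
    """IPCR/APCR-style ranking: popularity per unit of storage."""
    return item[1] / item[2]


# Hypothetical catalogue: (name, concurrent requests, size in GB).
items = [("a", 100.0, 10.0), ("b", 60.0, 3.0), ("c", 50.0, 2.0)]
ipg = greedy_cache(items, capacity=10.0, key=by_popularity)
ipcr = greedy_cache(items, capacity=10.0, key=by_ratio)
```
]]></code>
<p>With this catalogue, popularity-only ranking fills the cache with the single large item, whereas ratio-based ranking caches the two smaller items. Using initial or average popularity changes only the popularity value supplied to the key.</p>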
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> illustrates the impact of the cache-to-total video size ratio on overall utility. CDA consistently outperforms the benchmark algorithms, achieving utility improvements ranging from <inline-formula id="ieqn-169"><mml:math id="mml-ieqn-169"><mml:mn>3.32</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> to <inline-formula id="ieqn-170"><mml:math id="mml-ieqn-170"><mml:mn>26.73</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> (with an average improvement of <inline-formula id="ieqn-171"><mml:math id="mml-ieqn-171"><mml:mn>11.73</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>). These results highlight CDA&#x2019;s ability to effectively learn from training datasets and optimize utility across varying cache capacity constraints. The performance gap between CDA and the benchmark algorithms is more pronounced under lower cache capacity conditions and gradually narrows as capacity increases. These findings validate CDA&#x2019;s capability to optimize resource allocation and maximize total utility, even under stringent cache capacity limitations.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Normalized utility relative to CDA for different cache-to-total video size ratio values</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-3.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> illustrates the impact of varying <inline-formula id="ieqn-172"><mml:math id="mml-ieqn-172"><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>, the mean value of the version popularity distribution, where higher values indicate increased popularity for high-bitrate versions. CDA consistently outperforms the benchmark algorithms, with utility improvements ranging from <inline-formula id="ieqn-173"><mml:math id="mml-ieqn-173"><mml:mn>3.32</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> to <inline-formula id="ieqn-174"><mml:math id="mml-ieqn-174"><mml:mn>135.87</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> (average <inline-formula id="ieqn-175"><mml:math id="mml-ieqn-175"><mml:mn>28.47</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>). The performance gap is more pronounced when low-bitrate versions are more popular (lower <inline-formula id="ieqn-176"><mml:math id="mml-ieqn-176"><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>) or when <inline-formula id="ieqn-177"><mml:math id="mml-ieqn-177"><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> is randomized, indicating unpredictable version popularity. These results demonstrate CDA&#x2019;s ability to adapt effectively under conditions where version request patterns favor low-bitrate versions or exhibit high variability.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Normalized utility compared to CDA for varying <inline-formula id="ieqn-178"><mml:math id="mml-ieqn-178"><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:mtext>ver</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> values</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-4.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-5">Fig. 5</xref> illustrates the impact of popularity bias on performance, where the Zipf skewness parameter indicates the degree of bias in the popularity distribution. Higher values reflect stronger bias, while lower values represent a more uniform distribution. CDA consistently outperforms the benchmark algorithms, achieving utility improvements ranging from <inline-formula id="ieqn-179"><mml:math id="mml-ieqn-179"><mml:mn>3.06</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> to <inline-formula id="ieqn-180"><mml:math id="mml-ieqn-180"><mml:mn>50.71</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> (average <inline-formula id="ieqn-181"><mml:math id="mml-ieqn-181"><mml:mn>24.23</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>). The performance gap is most pronounced at moderate skewness levels (e.g., around <inline-formula id="ieqn-182"><mml:math id="mml-ieqn-182"><mml:mn>0.5</mml:mn></mml:math></inline-formula>) and diminishes when the bias becomes either excessively strong or weak.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Normalized utility relative to CDA for varying Zipf parameters</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-5.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the impact of varying <inline-formula id="ieqn-183"><mml:math id="mml-ieqn-183"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>group</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>. As <inline-formula id="ieqn-184"><mml:math id="mml-ieqn-184"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>group</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> decreases, the number of videos per group increases, leading to more dynamic ranking changes. CDA consistently outperforms the benchmark algorithms, with utility improvements ranging from <inline-formula id="ieqn-185"><mml:math id="mml-ieqn-185"><mml:mn>3.32</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> to <inline-formula id="ieqn-186"><mml:math id="mml-ieqn-186"><mml:mn>69.49</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> (average <inline-formula id="ieqn-187"><mml:math id="mml-ieqn-187"><mml:mn>23.26</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>). The performance gap is most significant when <inline-formula id="ieqn-188"><mml:math id="mml-ieqn-188"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>group</mml:mtext></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>4</mml:mn></mml:math></inline-formula>, demonstrating CDA&#x2019;s efficiency in handling dynamic environments with abrupt ranking changes.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Normalized utility compared to CDA for varying <inline-formula id="ieqn-189"><mml:math id="mml-ieqn-189"><mml:msup><mml:mi>N</mml:mi><mml:mrow><mml:mrow><mml:mtext>group</mml:mtext></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> values</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-6.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-7">Fig. 7</xref> illustrates the impact of utility types on performance, categorized as linear, convex, or concave. CDA consistently outperforms the benchmark algorithms, with utility improvements ranging from <inline-formula id="ieqn-190"><mml:math id="mml-ieqn-190"><mml:mn>3.32</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> to <inline-formula id="ieqn-191"><mml:math id="mml-ieqn-191"><mml:mn>47.34</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> (average <inline-formula id="ieqn-192"><mml:math id="mml-ieqn-192"><mml:mn>23.84</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>). The performance gap is most pronounced for concave utility types, where the utility growth rate slows down as resources increase. This pronounced gap in concave utility types may occur because greedy algorithms tend to allocate cache storage inefficiently in scenarios with diminishing returns. In contrast, CDA, leveraging its learning-based approach, adapts more effectively by optimizing caching decisions to maximize utility. Notably, even when compared with LSTM-based predictors (LSTM-P and LSTM-PCR), which aim to forecast content popularity, CDA achieves consistently superior performance across all utility types. This result demonstrates that CDA not only outperforms static heuristics but also surpasses predictive approaches by directly learning caching policies under varying utility dynamics.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Normalized utility compared to CDA for different utility models</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-7.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-8">Fig. 8</xref> illustrates the impact of the total number of videos on caching performance. CDA consistently outperforms the benchmark algorithms across all tested video counts, achieving utility improvements ranging from 1.02% to 55.87% (average 23.81%). These results demonstrate that CDA effectively prioritizes high-utility versions even as the content scale increases, highlighting the strong adaptability of the DRL model.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Normalized utility compared to CDA for different numbers of videos</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-8.tif"/>
</fig>
<p>To assess CDA under realistic conditions, we leveraged video popularity information from a VoD access trace [<xref ref-type="bibr" rid="ref-32">32</xref>], focusing on the top 500 most requested videos. <xref ref-type="fig" rid="fig-9">Fig. 9</xref> shows the normalized utility of each method relative to CDA. CDA consistently outperformed all baselines, achieving utility gains ranging from 0.05% to 41.9%, with an average improvement of 14.2%. These results demonstrate the practicality and robustness of our DRL-based caching strategy under realistic content dynamics.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Normalized utility relative to CDA using a real-world VoD dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-9.tif"/>
</fig>
<p>CDA requires an average of 26 min to execute 1 million timesteps during the training phase. In the decision phase, the inference time depends on the total number of videos, as each video corresponds to a single decision step. As shown in <xref ref-type="table" rid="table-3">Table 3</xref>, the overall decision time increases with the number of videos. These results were obtained in an environment with an 8-core CPU running at a clock speed of 4.05 GHz.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Decision time by number of videos</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Number of videos</th>
<th>300</th>
<th>500</th>
<th>700</th>
<th>900</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Decision time (s)</bold></td>
<td>0.1</td>
<td>0.3</td>
<td>0.37</td>
<td>0.49</td>
</tr>
</tbody>
</table>
</table-wrap>
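<p>As a quick arithmetic check on the values in Table 3, the following sketch computes the mean decision time over the four tested catalogue sizes and the implied time per video. The variable names are illustrative, and the per-video figure simply divides the total decision time by the number of videos.</p>
<code language="python"><![CDATA[
```python
# Decision time in seconds by number of videos, copied from Table 3.
decision_time_s = {300: 0.1, 500: 0.3, 700: 0.37, 900: 0.49}

mean_time_s = sum(decision_time_s.values()) / len(decision_time_s)
per_video_ms = {n: 1000.0 * t / n for n, t in decision_time_s.items()}
```
]]></code>
<p>The mean over the four settings is about 0.315 s, and each per-video decision takes less than one millisecond on the stated hardware.</p>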
<p>To further validate the learning stability of the proposed DRL model, we plotted the average episode reward across training. As shown in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>, the learning curve stabilizes after sufficient training episodes, confirming convergence of the PPO-based caching agent.</p>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Learning curve of the PPO-based caching agent during training</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_66754-fig-10.tif"/>
</fig>
</sec>
<sec id="s8">
<label>8</label>
<title>Discussion</title>
<p>We highlight the following major findings:
<list list-type="bullet">
<list-item>
<p>The proposed CDA algorithm consistently outperforms both conventional greedy-based algorithms and LSTM-based prediction-driven strategies across various scenarios. It achieves significant utility improvements under limited cache capacity, high popularity of low-bitrate versions, skewed content popularity, and dynamic rank variations. In addition, CDA maintains robust performance across different utility models, demonstrating its adaptability to diverse system environments.</p></list-item>
<list-item>
<p>The relative performance gain of CDA becomes more pronounced as the uncertainty of the environment increases, such as in scenarios with highly dynamic version popularity or rapid changes in content rankings. This indicates that the proposed DRL-based approach effectively learns adaptive caching policies rather than relying on static strategies, thereby ensuring robust performance even in dynamic caching environments.</p></list-item>
<list-item>
<p>In terms of optimization, the training process is stable: the reward converges reliably, and the learned policy consistently outperforms the baselines across varying utility models, video set sizes, Zipf parameters, and cache capacities. The trained policy also supports real-time decision-making because inference takes less than 0.5 s for up to 900 videos.</p></list-item>
<list-item>
<p>The proposed DRL-based caching framework is designed to be adaptable to various edge environments, as it learns directly from request patterns and utility definitions. By adjusting these inputs, the framework can generalize to different content types where popularity varies over time, such as live video streams, trending news articles, short-form video clips, or social media posts.</p></list-item>
<list-item>
<p>Although DRL methods are often computationally intensive, our approach is lightweight at inference, making caching decisions in just 0.32 s on average. It requires minimal CPU and memory, as it uses pre-trained parameters without online learning or per-user processing. These features enable practical deployment for real-time or periodic cache updates on resource-constrained edge devices.</p></list-item>
<list-item>
<p>Our framework ensures privacy by relying solely on content-level popularity and aggregate request patterns, without using user identifiers or session-level logs. This design inherently supports privacy-preserving learning and aligns with data protection requirements for deployment in edge environments.</p></list-item>
<list-item>
<p>While collaborative filtering has proven effective for personalized recommendation and, in some cases, caching, such methods depend on identifiable user behavior. Our approach, by contrast, operates solely on aggregate access patterns, enabling robust caching decisions without requiring user identifiers or content semantics.</p></list-item>
</list></p>
</sec>
<sec id="s9">
<label>9</label>
<title>Conclusion</title>
<p>This paper presented a deep reinforcement learning-based caching algorithm designed to optimize content placement in edge caching systems under dynamic video popularity and resource constraints. Unlike traditional greedy or static strategies, our proposed cache decision algorithm (CDA) leverages the proximal policy optimization (PPO) framework to learn adaptive caching policies through experience-based exploration and feedback. We introduced a novel reward structure that integrates both cache utility and video quality degradation, enabling the agent to make balanced and informed decisions that align with provider-centric utility goals.</p>
<p>To rigorously evaluate the algorithm, we developed a comprehensive synthetic dataset generator that models real-world access behaviors, including diurnal variations, Zipf-based popularity dynamics, and probabilistic bitrate version preferences. In addition, we conducted an experiment using the top 500 most popular videos from a real-world VoD trace to validate the algorithm under realistic access patterns. The experimental results confirm that CDA significantly outperforms both greedy baselines and LSTM-based prediction-driven caching strategies across a wide range of scenarios. Notably, CDA performs strongly under severe cache constraints, under unpredictable version preferences, and when the utility model exhibits diminishing returns, as in the concave case. The utility gains are most pronounced in highly dynamic environments where traditional and prediction-based methods typically struggle.</p>
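<p>As a rough illustration of the kind of workload summarized above, the following Python sketch samples video requests from a Zipf popularity distribution and then draws a bitrate version for each request. It is not the generator used in our experiments: the function names, the Zipf exponent, the number of videos, and the version-preference probabilities are hypothetical values chosen only for illustration, and diurnal variation and rank changes are omitted.</p>
<preformat>
```python
import random


def zipf_weights(num_videos, alpha):
    """Return normalized Zipf popularity weights for ranks 1..num_videos."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, num_videos + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_requests(num_requests, num_videos, alpha, version_probs, seed=0):
    """Draw a list of (video_index, version_index) request pairs.

    Videos are drawn according to Zipf weights over their ranks. The bitrate
    version of each request is drawn independently from version_probs, which
    is a hypothetical preference distribution (an assumption of this sketch).
    """
    rng = random.Random(seed)
    video_weights = zipf_weights(num_videos, alpha)
    videos = rng.choices(range(num_videos), weights=video_weights, k=num_requests)
    versions = rng.choices(
        range(len(version_probs)), weights=version_probs, k=num_requests
    )
    return list(zip(videos, versions))


# Hypothetical settings: 100 videos, 3 bitrate versions, Zipf exponent 0.8.
requests = sample_requests(
    num_requests=10000,
    num_videos=100,
    alpha=0.8,
    version_probs=[0.5, 0.3, 0.2],
)
```
</preformat>
<p>In this sketch, raising the hypothetical <monospace>alpha</monospace> concentrates requests on the top-ranked videos, while changing <monospace>version_probs</monospace> shifts demand between bitrate versions.</p>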
<p>In addition to utility performance, we validated the computational efficiency of CDA, showing that the model can be trained within practical timeframes and deployed with minimal inference overhead. This makes our approach suitable for real-time edge deployment on low-power platforms. Future work will extend this model to multi-agent cooperative caching scenarios across federated edge nodes and explore transfer learning techniques to reduce training overhead under non-stationary popularity shifts. Collaborative filtering techniques may also be incorporated to enhance personalized caching in environments where user-level data are available.</p>
</sec>
</body>
<back>
<ack>
<p>We would like to thank the anonymous reviewers for their insightful comments.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This work was supported by an Inha University Research Grant.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>Concept and design: Mingoo Kwon, Minseok Song; data collection: Mingoo Kwon; analysis of results: Kyeongmin Kim, Minseok Song; manuscript writing: Mingoo Kwon, Kyeongmin Kim, Minseok Song. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wei</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name></person-group>. <article-title>A DQN-based cache strategy for mobile edge networks</article-title>. <source>Comput Mater Contin</source>. <year>2021</year>;<volume>71</volume>:<fpage>3277</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2022.020471</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Xing</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>ECC: edge collaborative caching strategy for differentiated services load-balancing</article-title>. <source>Comput Mater Contin</source>. <year>2021</year>;<volume>69</volume>(<issue>2</issue>):<fpage>2045</fpage>&#x2013;<lpage>59</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2021.018303</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mehrabi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Siekkinen</surname> <given-names>M</given-names></string-name>, <string-name><surname>Yl&#x00E4;-J&#x00E4;&#x00E4;ski</surname> <given-names>A</given-names></string-name></person-group>. <article-title>QoE-traffic optimization through collaborative edge caching in adaptive mobile video streaming</article-title>. <source>IEEE Access</source>. <year>2018</year>;<volume>6</volume>:<fpage>52261</fpage>&#x2013;<lpage>76</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2018.2870855</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><collab>Cisco</collab></person-group>. <article-title>Cisco Annual Internet Report (2018&#x2013;2023)</article-title>; <year>2020</year>. [cited 2025 Jun 4]. Available from: <ext-link ext-link-type="uri" xlink:href="https://www.cisco.com/./annual-internet-report/index.html">https://www.cisco.com/./annual-internet-report/index.html</ext-link>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yuan</surname> <given-names>D</given-names></string-name></person-group>. <article-title>QoE-aware collaborative edge caching and computing for adaptive video streaming</article-title>. <source>IEEE Trans Wirel Commun</source>. <year>2023</year>;<volume>23</volume>(<issue>6</issue>):<fpage>6453</fpage>&#x2013;<lpage>66</lpage>. doi:<pub-id pub-id-type="doi">10.1109/twc.2023.3331724</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>S</given-names></string-name>, <string-name><surname>He</surname> <given-names>P</given-names></string-name>, <string-name><surname>Suto</surname> <given-names>K</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Cooperative edge caching in user-centric clustered mobile networks</article-title>. <source>IEEE Trans Mob Comput</source>. <year>2017</year>;<volume>17</volume>(<issue>8</issue>):<fpage>1791</fpage>&#x2013;<lpage>805</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmc.2017.2780834</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chiu</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Modeling dynamics of online video popularity</article-title>. <source>IEEE Trans Multimedia</source>. <year>2016</year>;<volume>18</volume>(<issue>9</issue>):<fpage>1882</fpage>&#x2013;<lpage>95</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmm.2016.2579600</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cui</surname> <given-names>L</given-names></string-name>, <string-name><surname>Ni</surname> <given-names>E</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Towards real-time video caching at edge servers: a cost-aware deep Q-learning solution</article-title>. <source>IEEE Trans Multimedia</source>. <year>2021</year>;<volume>25</volume>:<fpage>302</fpage>&#x2013;<lpage>14</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmm.2021.3125803</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Han</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Leung</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Federated deep reinforcement learning for recommendation-enabled edge caching in mobile edge-cloud computing networks</article-title>. <source>IEEE J Sel Areas Commun</source>. <year>2023</year>;<volume>41</volume>(<issue>3</issue>):<fpage>690</fpage>&#x2013;<lpage>705</lpage>. doi:<pub-id pub-id-type="doi">10.1109/jsac.2023.3235443</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>L</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>M</given-names></string-name>, <string-name><surname>Cai</surname> <given-names>K</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Dynamic content update for wireless edge caching via deep reinforcement learning</article-title>. <source>IEEE Commun Lett</source>. <year>2019</year>;<volume>23</volume>(<issue>10</issue>):<fpage>1773</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/lcomm.2019.2931688</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhong</surname> <given-names>C</given-names></string-name>, <string-name><surname>Gursoy</surname> <given-names>MC</given-names></string-name>, <string-name><surname>Velipasalar</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Deep reinforcement learning-based edge caching in wireless networks</article-title>. <source>IEEE Trans Cogn Commun Netw</source>. <year>2020</year>;<volume>6</volume>(<issue>1</issue>):<fpage>48</fpage>&#x2013;<lpage>61</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tccn.2020.2968326</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kwon</surname> <given-names>M</given-names></string-name>, <string-name><surname>Song</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A deep reinforcement learning-based technique for enhancing cache hit rate by adapting to dynamic file request patterns</article-title>. <source>Korean Instit Smart Media</source>. <year>2025</year>;<volume>14</volume>:<fpage>26</fpage>&#x2013;<lpage>34</lpage>. doi:<pub-id pub-id-type="doi">10.30693/SMJ.2025.14.1.26</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lee</surname> <given-names>D</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Song</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Cost-effective, quality-oriented transcoding of live-streamed video on edge-servers</article-title>. <source>IEEE Trans Serv Comput</source>. <year>2023</year>;<volume>16</volume>(<issue>4</issue>):<fpage>2503</fpage>&#x2013;<lpage>16</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tsc.2023.3256425</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tran</surname> <given-names>A-T</given-names></string-name>, <string-name><surname>Dao</surname> <given-names>N-N</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Bitrate adaptation for video streaming services in edge caching systems</article-title>. <source>IEEE Access</source>. <year>2020</year>;<volume>8</volume>:<fpage>135844</fpage>&#x2013;<lpage>52</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2020.3011517</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dao</surname> <given-names>N</given-names></string-name>, <string-name><surname>Ngo</surname> <given-names>D</given-names></string-name>, <string-name><surname>Dinh</surname> <given-names>N</given-names></string-name>, <string-name><surname>Phan</surname> <given-names>T</given-names></string-name>, <string-name><surname>Vo</surname> <given-names>N</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Hit ratio and content quality tradeoff for adaptive bitrate streaming in edge caching systems</article-title>. <source>IEEE Syst J</source>. <year>2021</year>;<volume>15</volume>(<issue>4</issue>):<fpage>5094</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/jsyst.2020.3019035</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Tran</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lakew</surname> <given-names>D</given-names></string-name>, <string-name><surname>Nguyen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tuong</surname> <given-names>V</given-names></string-name>, <string-name><surname>Truong</surname> <given-names>T</given-names></string-name>, <string-name><surname>Dao</surname> <given-names>N</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Hit ratio and latency optimization for caching systems: a survey</article-title>. In: <conf-name>Proceedings of the International Conference on Information Networking</conf-name>; <publisher-loc>Jeju Island, Republic of Korea</publisher-loc>; <year>2021</year>. p. <fpage>577</fpage>&#x2013;<lpage>81</lpage>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Araldo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Martignon</surname> <given-names>F</given-names></string-name>, <string-name><surname>Rossi</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Representation selection problem: optimizing video delivery through caching</article-title>. In: <conf-name>Proceedings of the IFIP Networking Conference and Workshops</conf-name>; <publisher-loc>Vienna, Austria</publisher-loc>; <year>2016</year>. p. <fpage>323</fpage>&#x2013;<lpage>31</lpage>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>J</given-names></string-name>, <string-name><surname>Simeone</surname> <given-names>O</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Adaptive offline and online similarity-based caching</article-title>. <source>IEEE Netw Lett</source>. <year>2020</year>;<volume>2</volume>(<issue>4</issue>):<fpage>175</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/lnet.2020.3031961</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Garetto</surname> <given-names>M</given-names></string-name>, <string-name><surname>Leonardi</surname> <given-names>E</given-names></string-name>, <string-name><surname>Neglia</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Content placement in networks of similarity caches</article-title>. <source>Comput Netw</source>. <year>2021</year>;<volume>201</volume>(<issue>2</issue>):<fpage>108570</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.comnet.2021.108570</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Xiong</surname> <given-names>F</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Similarity caching in dynamic cooperative edge networks: an adversarial bandit approach</article-title>. <source>IEEE Trans Mob Comput</source>. <year>2024</year>;<volume>24</volume>(<issue>4</issue>):<fpage>2769</fpage>&#x2013;<lpage>82</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmc.2024.3500132</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tran</surname> <given-names>T</given-names></string-name>, <string-name><surname>Pompili</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Adaptive bitrate video caching and processing in mobile-edge computing networks</article-title>. <source>IEEE Trans Mob Comput</source>. <year>2018</year>;<volume>18</volume>(<issue>9</issue>):<fpage>1965</fpage>&#x2013;<lpage>78</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmc.2018.2871147</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bayhan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Maghsudi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zubow</surname> <given-names>A</given-names></string-name></person-group>. <article-title>EdgeDASH: exploiting network-assisted adaptive video streaming for edge caching</article-title>. <source>IEEE Trans Netw Serv Manag</source>. <year>2020</year>;<volume>18</volume>(<issue>2</issue>):<fpage>1732</fpage>&#x2013;<lpage>45</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tnsm.2020.3037147</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>C</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zong</surname> <given-names>T</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>L</given-names></string-name>, <string-name><surname>Cao</surname> <given-names>H</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Coffee: cost-effective edge caching for 360 degree live video streaming</article-title>. <source>arXiv:2312.13470</source>. <year>2023</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ge</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>B</given-names></string-name></person-group>. <article-title>A utility-based optimization framework for edge service entity caching</article-title>. <source>IEEE Trans Parallel Distrib Syst</source>. <year>2019</year>;<volume>30</volume>(<issue>11</issue>):<fpage>2384</fpage>&#x2013;<lpage>95</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tpds.2019.2915218</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Mehrabi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Siekkinen</surname> <given-names>M</given-names></string-name>, <string-name><surname>Yl&#x00E4;-J&#x00E4;&#x00E4;ski</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Cache-aware QoE-traffic optimization in mobile edge assisted adaptive video streaming</article-title>. <comment>arXiv:1805.09255. 2018</comment>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lee</surname> <given-names>D</given-names></string-name>, <string-name><surname>Song</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Quality-aware transcoding task allocation under limited power in live-streaming systems</article-title>. <source>IEEE Syst J</source>. <year>2022</year>;<volume>16</volume>(<issue>3</issue>):<fpage>4368</fpage>&#x2013;<lpage>79</lpage>. doi:<pub-id pub-id-type="doi">10.1109/jsyst.2021.3103526</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Mao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Caching in dynamic environments: a near-optimal online learning approach</article-title>. <source>IEEE Trans Multimedia</source>. <year>2021</year>;<volume>25</volume>:<fpage>792</fpage>&#x2013;<lpage>804</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmm.2021.3132156</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Little</surname> <given-names>JDC</given-names></string-name></person-group>. <article-title>A proof for the queuing formula: L &#x003D; &#x03BB;W</article-title>. <source>Operat Res</source>. <year>1961</year>;<volume>9</volume>(<issue>3</issue>):<fpage>383</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1287/opre.9.3.383</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Schulman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wolski</surname> <given-names>F</given-names></string-name>, <string-name><surname>Dhariwal</surname> <given-names>P</given-names></string-name>, <string-name><surname>Radford</surname> <given-names>A</given-names></string-name>, <string-name><surname>Klimov</surname> <given-names>O</given-names></string-name></person-group>. <article-title>Proximal policy optimization algorithms</article-title>. <comment>arXiv:1707.06347. 2017</comment>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>W</given-names></string-name>, <string-name><surname>You</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Implementing action mask in proximal policy optimization (PPO) algorithm</article-title>. <source>ICT Express</source>. <year>2020</year>;<volume>6</volume>(<issue>3</issue>):<fpage>200</fpage>&#x2013;<lpage>3</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.icte.2020.05.003</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Akiba</surname> <given-names>T</given-names></string-name>, <string-name><surname>Sano</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yanase</surname> <given-names>T</given-names></string-name>, <string-name><surname>Ohta</surname> <given-names>T</given-names></string-name>, <string-name><surname>Koyama</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Optuna: a next-generation hyperparameter optimization framework</article-title>. In: <conf-name>Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery &#x0026; Data Mining</conf-name>; <publisher-loc>Anchorage, AK, USA</publisher-loc>; <year>2019</year>. p. <fpage>2623</fpage>&#x2013;<lpage>31</lpage>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zink</surname> <given-names>M</given-names></string-name>, <string-name><surname>Suh</surname> <given-names>K</given-names></string-name>, <string-name><surname>Gu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Kurose</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Characteristics of YouTube network traffic at a campus network &#x2013; measurements, models, and implications</article-title>. <source>Comput Netw</source>. <year>2009</year>;<volume>53</volume>(<issue>4</issue>):<fpage>501</fpage>&#x2013;<lpage>14</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.comnet.2008.09.022</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>