<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">64103</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.064103</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Cluster Federated Learning with Intra-Cluster Correction</article-title>
<alt-title alt-title-type="left-running-head">Cluster Federated Learning with Intra-Cluster Correction</alt-title>
<alt-title alt-title-type="right-running-head">Cluster Federated Learning with Intra-Cluster Correction</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Yang</surname><given-names>Yunong</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Ma</surname><given-names>Long</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Fan</surname><given-names>Liang</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-4" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Xie</surname><given-names>Tao</given-names></name><xref ref-type="aff" rid="aff-3">3</xref><email>xietao@swu.edu.cn</email></contrib>
<aff id="aff-1"><label>1</label><institution>College of Computer and Information Science, Chongqing Normal University</institution>, <addr-line>Chongqing, 401331</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Research Office, Chongqing Normal University</institution>, <addr-line>Chongqing, 401331</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>Faculty of Education, Southwest University</institution>, <addr-line>Chongqing, 400715</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Tao Xie. Email: <email>xietao@swu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>03</day><month>07</month><year>2025</year>
</pub-date>
<volume>84</volume>
<issue>2</issue>
<fpage>3459</fpage>
<lpage>3476</lpage>
<history>
<date date-type="received">
<day>05</day>
<month>2</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>5</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_64103.pdf"></self-uri>
<abstract>
<p>Federated learning has emerged as an essential technique for protecting privacy since it allows clients to train models locally without explicitly exchanging sensitive data. Extensive research has been conducted on the issue of data heterogeneity in federated learning, but effective model training with severely imbalanced label distributions remains an underexplored area. This paper presents a novel Cluster Federated Learning Algorithm with Intra-cluster Correction (CFIC). First, CFIC selects samples from each cluster during each round of sampling, ensuring that no single category of data dominates the model training. Second, in addition to updating local models, CFIC adjusts its own parameters based on information shared by other clusters, allowing the final cluster models to better reflect the true nature of the entire dataset. Third, CFIC refines the cluster models into a global model, ensuring that even when label distributions are extremely imbalanced, the negative effects are significantly mitigated, thereby improving the global model&#x2019;s performance. We conducted extensive experiments on seven datasets against six benchmark algorithms. The results show that the CFIC algorithm generalizes better than the benchmark algorithms. CFIC maintains high accuracy and rapid convergence even under a variety of non-independent and identically distributed (non-IID) label skew settings. The findings indicate that the proposed algorithm has the potential to become a trustworthy and practical solution for privacy preservation, which might be applied to fields such as medical image analysis, autonomous driving technologies, and intelligent educational platforms.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Federated learning</kwd>
<kwd>non-IID</kwd>
<kwd>client clustering</kwd>
<kwd>intra-cluster correction</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Natural Science Foundation</funding-source>
<award-id>62277043</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Technology Research Project of Chongqing Education</funding-source>
<award-id>KJZD-K202300515</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>As information technology rapidly advances, the generation and widespread application of big data have become significant characteristics of modern society [<xref ref-type="bibr" rid="ref-1">1</xref>]. From business analytics to healthcare and public services, data is being used to drive innovation, improve efficiency, and enhance the quality of life. However, recent frequent incidents of personal data breaches have heightened societal concern for data privacy protection [<xref ref-type="bibr" rid="ref-2">2</xref>]. The traditional method of direct data collection and centralized training on servers is limited by data security risks. To address compliance challenges in data use, federated learning has emerged as a novel machine learning paradigm gaining high recognition in both academia and industry. Federated learning allows clients to train models locally and send model parameters to a central server for aggregation [<xref ref-type="bibr" rid="ref-3">3</xref>]. By aggregating model parameters from different clients, federated learning can leverage cross-client data information to train an optimized global model [<xref ref-type="bibr" rid="ref-4">4</xref>].</p>
<p>Data distribution significantly impacts the performance of federated learning algorithms, making client data heterogeneity a critical challenge in implementing federated learning. In many general scenarios, client datasets exhibit non-independent and identically distributed (non-IID) characteristics, which not only significantly degrade the overall performance of the global model but also slow down its convergence [<xref ref-type="bibr" rid="ref-5">5</xref>]. Studies have identified label distribution shift, where label distributions differ substantially across clients, as one of the primary causes of non-IID data. This inconsistency in label distribution makes it particularly challenging to train a global model that generalizes well across clients, directly affecting the fairness and accuracy of the final model [<xref ref-type="bibr" rid="ref-6">6</xref>]. To mitigate the impact of data heterogeneity, researchers have proposed various strategies, including robust optimization techniques, specialized model structures tailored for non-IID data, and new communication protocols [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>]. For example, some researchers recommend dynamic client clustering methods combined with gradient regularization and global label alignment to effectively reduce the impact of distribution differences among clients on global model performance [<xref ref-type="bibr" rid="ref-8">8</xref>]. Others adopt adaptive label alignment techniques that adjust the global model&#x2019;s alignment strategy based on local data distribution [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>].
Additionally, researchers use variational Bayesian inference techniques to enhance federated learning performance by building complex probabilistic distribution models [<xref ref-type="bibr" rid="ref-11">11</xref>&#x2013;<xref ref-type="bibr" rid="ref-13">13</xref>]. However, these methods require abundant local data to approximate posterior distributions, leading to significantly weaker inference effects in clients with scarce data or missing label categories, resulting in suboptimal global model performance. In addition, these approaches attempt to compensate for data heterogeneity by increasing model complexity, but this contradicts the basic goal of adapting to resource constraints on edge devices. On clients with missing labels, local training may overfit to a few labels, and global regularization cannot effectively correct this bias, instead suppressing the model&#x2019;s generalization capability.</p>
<p>Although there has been extensive research on the issue of data heterogeneity in federated learning, effective model training with severely unbalanced label distribution is still an underexplored area [<xref ref-type="bibr" rid="ref-14">14</xref>]. Extreme imbalances in label distribution have been observed in a variety of real-world applications, including but not limited to medical image analysis, autonomous driving technology, and intelligent educational platforms [<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>]. Taking modern education systems as an example, unequal distribution of educational resources can result in a significant shift in the learning materials available for different subjects on online learning platforms [<xref ref-type="bibr" rid="ref-17">17</xref>]. Popular courses may attract a large number of students, resulting in massive high-quality datasets; however, materials related to niche areas appear to be scarce, posing challenges for the development of effective intelligent tutoring systems [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>]. Therefore, conducting extensive research into these label imbalance phenomena and developing corresponding solutions is critical for furthering the development of federated learning technologies.</p>
<p>For this purpose, we propose a Cluster Federated Learning Algorithm with Intra-cluster Correction (CFIC). This method is particularly suited for scenarios where clients predominantly possess only a few or even a single type of label. By extracting label distribution characteristics, clients with similar data distributions are grouped into the same cluster. Each cluster then independently aggregates its model to mitigate the impact of non-IID data on federated learning. Specifically, CFIC selects samples from each cluster in every round of sampling, ensuring a more balanced overall data distribution and preventing specific categories from disproportionately dominating the model training process. However, the cluster models obtained by aggregating each cluster might deviate to some extent from the direction of the global optimum, leading to a decline in overall performance. To address this issue, CFIC not only leverages local information but also focuses on maintaining consistency with the global model. During each iteration, in addition to updating the local model on local data, CFIC adjusts its parameters based on the information shared by other clusters. This ensures that the final cluster model better reflects the entire dataset. This process can be viewed as a correction in the direction of the global model. It ensures that even when label distributions are extremely imbalanced, the resulting negative impacts are significantly mitigated, thereby improving the global model&#x2019;s performance. Based on the above analysis, we summarize our main contributions as follows.
<list list-type="bullet">
<list-item>
<p>We cluster clients based on their label distribution characteristics and then use a specific sampling strategy within these clusters to ensure dynamic label balance in each round of sampling. This method reduces the decline in model performance caused by the non-IID nature of client data.</p></list-item>
<list-item>
<p>We incorporate a model correction strategy into cluster federated learning. This strategy enables the cluster models to update towards the direction of the global model&#x2019;s optimal solution, thereby enhancing the convergence speed and accuracy of the model.</p></list-item>
<list-item>
<p>We conducted extensive experiments on real-world datasets, demonstrating that our method outperforms baseline algorithms in terms of convergence speed and testing accuracy. Particularly, in scenarios with extreme label imbalance, our approach achieves superior results compared to benchmark methods.</p></list-item>
</list></p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Works</title>
<sec id="s2_1">
<label>2.1</label>
<title>Federated Learning</title>
<p>Federated learning has emerged as a pivotal approach to safeguard data privacy while effectively integrating disparate data silos. FedAvg, as a classic federated learning framework, can enhance the performance of edge device models through an algorithm that ensures user privacy [<xref ref-type="bibr" rid="ref-4">4</xref>]. In this algorithm, the central server first selects a subset of clients to participate in the training and distributes the global model to these selected clients. Each client independently trains the model using their local data and the central server aggregates the model updates from the participating clients to form a new global model. However, due to differences in user demographics or usage habits, the local data on clients may exhibit non-IID characteristics. Li et al. further confirmed that traditional FedAvg faces challenges such as slow convergence of the global model and deviation from optimal solutions under non-IID conditions [<xref ref-type="bibr" rid="ref-6">6</xref>].</p>
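<p>The FedAvg round described above (client selection, local training, and server-side aggregation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the linear toy model, the learning rate, and the helper names <monospace>local_train</monospace>, <monospace>fedavg_aggregate</monospace>, and <monospace>fedavg_round</monospace> are hypothetical, and client updates are weighted by local dataset size, which is the common choice.</p>
<code language="python"><![CDATA[
```python
import random


def fedavg_aggregate(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(n / total * w[j] for w, n in zip(client_weights, client_sizes))
        for j in range(dim)
    ]


def local_train(weights, data, lr=0.05, epochs=5):
    """Toy local training: SGD on squared error for y = w0 + w1 * x."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y
            w0 -= lr * 2 * err
            w1 -= lr * 2 * err * x
    return [w0, w1]


def fedavg_round(global_weights, client_data, fraction, rng):
    """One communication round: sample clients, train locally, aggregate."""
    m = max(1, round(fraction * len(client_data)))
    selected = rng.sample(range(len(client_data)), m)
    # Every selected client starts from a copy of the current global model.
    updates = [local_train(list(global_weights), client_data[i]) for i in selected]
    sizes = [len(client_data[i]) for i in selected]
    return fedavg_aggregate(updates, sizes)


# Four hypothetical clients, each holding noiseless samples of y = 1 + 2x.
rng = random.Random(0)
client_data = [
    [(x / 10, 1 + 2 * x / 10) for x in range(k, k + 5)] for k in range(4)
]
weights = [0.0, 0.0]
for _ in range(200):
    weights = fedavg_round(weights, client_data, fraction=0.5, rng=rng)
print(weights)
```
]]></code>
<p>In this toy setting all clients share the same optimum, so plain averaging works well; the non-IID difficulties discussed next arise when the local optima of the clients differ.</p>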
<p>Many researchers have proposed methods to mitigate the biases that arise during local training on non-IID data, aiming to alleviate the adverse effects in the federated averaging process and enhance the performance of the global model. For instance, FedProx introduces a regularization term during local training that constrains updates based on the distance between the local model and the global model, thereby reducing overfitting of the local models [<xref ref-type="bibr" rid="ref-20">20</xref>]. MOON regularizes local training by leveraging the similarity between the representations of local and global models, incorporating a contrastive learning approach that maximizes this similarity to improve model generalization [<xref ref-type="bibr" rid="ref-21">21</xref>]. FedLC further calibrates at the logit level to reduce updates for minority classes, thereby enhancing model performance when dealing with imbalanced data [<xref ref-type="bibr" rid="ref-22">22</xref>]. FedNova refines the global aggregation phase by adjusting the contribution weights of each client, making the global model more aligned with the global optimum [<xref ref-type="bibr" rid="ref-23">23</xref>]. These existing methods have addressed label shift issues to some extent, but further research and improvements are necessary to enhance their effectiveness in real-world scenarios.</p>
<p>A comprehensive survey categorizes heterogeneous federated learning into data space, statistical, system, and model heterogeneity, suggesting further research to improve model generalization and performance across diverse clients [<xref ref-type="bibr" rid="ref-24">24</xref>]. Researchers have proposed several strategies to mitigate these issues. For example, HeteroFL addresses computational heterogeneity by enabling the training of heterogeneous local models with varying complexities [<xref ref-type="bibr" rid="ref-25">25</xref>]. Exploiting Model and Data Heterogeneity in FL (MDH-FL) employs knowledge distillation and symmetric loss to tackle both data and model heterogeneity [<xref ref-type="bibr" rid="ref-26">26</xref>]. Recent approaches also explore adaptive data distribution, regularization terms, contrastive learning, and multi-task learning to address heterogeneity issues [<xref ref-type="bibr" rid="ref-27">27</xref>]. These methods aim to optimize algorithms and model structures to cope with heterogeneity, but they do not address the diversity of client data distributions. Consequently, some researchers propose clustered federated learning (CFL) to group clients with similar data distributions, thereby improving model performance and enhancing privacy protection level.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Clustered Federated Learning</title>
<p>CFL is a model-agnostic distributed multi-task optimization framework that enhances both model performance and privacy protection. In recent years, various CFL frameworks have been proposed, including confidential aggregation techniques for securing individual updates and customized systems tailored for human activity recognition applications [<xref ref-type="bibr" rid="ref-28">28</xref>]. The latest advancements focus on efficiently identifying the distributional similarity between client data subspaces using principal angles, which accelerates cluster formation and provides convergence guarantees for non-convex objectives [<xref ref-type="bibr" rid="ref-29">29</xref>]. Furthermore, some researchers have adopted non-convex pairwise fusion, enabling autonomous estimation of cluster structures without prior knowledge [<xref ref-type="bibr" rid="ref-30">30</xref>]. As specific examples in the CFL domain, ClusterFL presents a multi-task federated learning framework that automatically captures inherent clustering relationships among nodes, thereby improving accuracy and reducing communication overhead, particularly in human activity recognition applications [<xref ref-type="bibr" rid="ref-28">28</xref>]. ACFL introduces a mean-shift clustering algorithm and an auction-based client selection strategy aimed at mitigating data heterogeneity and balancing energy consumption in mobile edge computing systems [<xref ref-type="bibr" rid="ref-31">31</xref>]. By leveraging the geometric properties of the federated learning loss surface, clients are grouped into clusters with jointly trainable data distributions, suitable for general non-convex objectives, while performing multi-task optimization under privacy preservation [<xref ref-type="bibr" rid="ref-32">32</xref>]. 
IFCA addresses non-IID data through iterative clustering and model updates, partitioning clients with similar data distributions into several clusters where each cluster independently conducts model training and updates followed by global model aggregation [<xref ref-type="bibr" rid="ref-33">33</xref>]. FedAC effectively integrates global knowledge into cluster learning by decoupling neural networks and employing different aggregation methods for each submodule, thus significantly enhancing performance [<xref ref-type="bibr" rid="ref-22">22</xref>]. FeSEM introduces an expectation-maximization algorithm for client clustering, ensuring that clients within each cluster share similar data distributions [<xref ref-type="bibr" rid="ref-34">34</xref>].</p>
<p>The CFL methods described above address the issue of model parameter inconsistency by incorporating clustering steps during local training, reducing the global model performance decline caused by label shift. They excel at increasing model accuracy, convergence rate, and efficiency. However, there are two major issues that most studies have not addressed. The first issue is model consistency; model consistency across different clusters can have an impact on overall performance, especially when data distribution is complex. The second issue is the ability to adapt dynamically. Most existing methods&#x2019; adaptive adjustment capacity may be restricted for extremely uneven data distributions. To address data heterogeneity in federated learning distributed training, this study introduces an intra-cluster correction method based on the CFL framework. We use a specific sampling strategy to ensure that labels are dynamically balanced with each round of sampling, allowing the cluster model to update towards the optimal global solution.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>The Proposed CFIC Algorithm</title>
<p>Traditional federated learning&#x2019;s global averaging aggregation assumes that client data follows the same underlying distribution. However, in actual non-independent and identically distributed (non-IID) scenarios, client data may belong to multiple significantly different sub-distributions. Directly averaging the parameters of all clients blurs these distribution boundaries. If client data can be divided into multiple clusters, we can identify these clusters and then sample and aggregate the cluster models, which can significantly reduce task conflicts. Additionally, even if clients within the same cluster have similar label distributions, their data may still suffer from feature shifts or noise interference. A mechanism is needed to correct the intra-cluster models, forcing the alignment of different client models in latent space, thereby reducing the impact of feature shifts on the global model.</p>
<p>The overall framework of the algorithm proposed in this paper is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. This algorithm is a variation on traditional federated learning, which assumes independent and identically distributed client data. Unlike the standard federated learning approach, which simply averages all uploaded model updates to obtain the global model, our approach has the central server perform clustering analysis on the label features received from each client. The goal of this step is to identify groups of clients that share similar characteristics, thereby better capturing potential pattern differences between populations. Rather than creating a single global aggregated model for all nodes, the server examines the label features collected from clients and divides the clients into multiple clusters. For each identified cluster, the server aggregates the information contributed by its members and creates a cluster model that represents the cluster&#x2019;s characteristics. Finally, the global model update direction is determined by the multiple cluster models obtained during the preceding process.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Overall framework</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64103-fig-1.tif"/>
</fig>
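<p>The server-side grouping and per-cluster sampling outlined above can be sketched as follows. This is a hedged illustration only: representing each client by its normalized label histogram, using k-means with farthest-point initialization, and the helper names <monospace>label_histogram</monospace>, <monospace>kmeans</monospace>, and <monospace>sample_per_cluster</monospace> are assumptions made for this example, not necessarily the procedure used by CFIC.</p>
<code language="python"><![CDATA[
```python
import random
from collections import Counter


def label_histogram(labels, num_classes):
    """Normalized label distribution of one client's local dataset."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(c, 0) / n for c in range(num_classes)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) != k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def sample_per_cluster(assign, per_cluster, rng):
    """Draw up to `per_cluster` clients from every cluster for one round."""
    clusters = {}
    for client, c in enumerate(assign):
        clusters.setdefault(c, []).append(client)
    chosen = []
    for c in sorted(clusters):
        members = clusters[c]
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen


# Six hypothetical clients, each dominated by one of three labels.
client_labels = [
    [0] * 10, [0] * 9 + [1],
    [1] * 10, [1] * 8 + [0] * 2,
    [2] * 10, [2] * 9 + [1],
]
hists = [label_histogram(labels, num_classes=3) for labels in client_labels]
assign = kmeans(hists, k=3)
print(assign)
print(sample_per_cluster(assign, per_cluster=1, rng=random.Random(0)))
```
]]></code>
<p>Sampling one client from every cluster in each round means that every dominant label is represented in the round, which is the balancing effect the framework relies on.</p>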
<sec id="s3_1">
<label>3.1</label>
<title>Problem Formulation</title>
<p>In traditional federated learning algorithms, each iteration typically involves either selecting all clients to participate in the aggregation of the global model or choosing a subset of clients based on a specific sampling strategy. However, when the private datasets on client devices exhibit non-IID characteristics, the performance of the aggregated global model can significantly deteriorate. The purpose of CFIC is to mitigate the issue of client model drift caused by label distribution shifts, thereby obtaining a globally aggregated model with guaranteed performance. The symbols used in this study are shown in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Notations</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Notation</th>
<th>Explanation</th>
</tr>
</thead>
<tbody>
<tr>
<td><inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>K</mml:mi></mml:math></inline-formula></td>
<td>The total number of clients</td>
</tr>
<tr>
<td><inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>Client <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>i</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td><inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula></td>
<td>Model parameters for minimizing loss</td>
</tr>
<tr>
<td><inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>The local data of client <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>i</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td><inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>The probability of client <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>i</mml:mi></mml:math></inline-formula> being selected to participate in the training of the global model.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>The prediction loss of model parameters <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula> for client <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>i</mml:mi></mml:math></inline-formula> on <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>Local distribution of client <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>k</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td><inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msup><mml:mi>Q</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>A simulated IID distribution consistent with the number of client-<inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>k</mml:mi></mml:math></inline-formula> samples.</td>
</tr>
<tr>
<td><inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>Cluster <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi>i</mml:mi></mml:math></inline-formula> obtained from client-side clustering</td>
</tr>
<tr>
<td><inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>The cluster model aggregated from Cluster <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>i</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td><inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula></td>
<td>The parameter of historically adjusted gradient direction</td>
</tr>
<tr>
<td><inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>&#x03B2;</mml:mi></mml:math></inline-formula></td>
<td>The weights of the global model aggregation direction refined by cluster models.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In the context of federated learning across devices, consider a system composed of <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>K</mml:mi></mml:math></inline-formula> clients, denoted as <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. All clients are dedicated to integrating their respective data <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> in order to obtain a better-performing machine learning model. The central server interacts with these clients to collaboratively find model parameters <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula> that minimize loss through <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:munder><mml:mrow><mml:mo form="prefix">min</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="bold">w</mml:mi></mml:mrow></mml:mrow></mml:munder><mml:mspace width="thinmathspace" /><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="bold">w</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:mspace width="thinmathspace" /><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="bold">w</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="bold">w</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo></mml:math></disp-formula></p>
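<p>As a small worked example, the objective in Eq. (1) can be evaluated directly as a probability-weighted sum of per-client losses. The sketch below is illustrative only: the scalar model, the squared-error loss, and the choice of selection probabilities proportional to local dataset size are assumptions made for this example, and the names <monospace>client_loss</monospace> and <monospace>global_objective</monospace> are hypothetical.</p>
<code language="python"><![CDATA[
```python
def client_loss(w, data):
    """F_i(w): mean squared error of the scalar model y = w * x on one client."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def global_objective(w, client_data, probs):
    """f(w) = sum_i p_i * F_i(w), the weighted objective of Eq. (1)."""
    return sum(p * client_loss(w, d) for p, d in zip(probs, client_data))


# Two hypothetical clients with two samples and one sample, respectively.
client_data = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 3.0)]]
sizes = [len(d) for d in client_data]
# Assumption: p_i is proportional to the local dataset size n_i.
probs = [n / sum(sizes) for n in sizes]

# At w = 2 the first client is fit exactly (loss 0) and the second has loss 1,
# so f(2) = (2/3) * 0 + (1/3) * 1.
print(global_objective(2.0, client_data, probs))
```
]]></code>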
<p>In this setup, <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the probability that participant user <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>i</mml:mi></mml:math></inline-formula> is chosen to contribute to the global model training process, and this probability <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> satisfies <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2265;</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>. 
<inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the prediction loss of the model parameters <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula> for client <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mi>i</mml:mi></mml:math></inline-formula> on its local dataset <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, typically calculated as <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mi>&#x2113;</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mrow><mml:mi>&#x2113;</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is a predefined loss function relevant to the specific task. 
Typically, the FedAvg method is used to optimize the model parameters <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula> through unbiased sampling. In this case, the probability <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> follows a uniform distribution, meaning that all users have an equal chance of being selected to participate in training.</p>
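<p>As a minimal sketch of the objective in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>, the weighted global loss can be evaluated as below. The squared-error loss, the scalar model, and the helper names are illustrative assumptions and are not taken from the paper; only the uniform choice of the selection probabilities follows the text.</p>
<code language="python"><![CDATA[
```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair; scalar data is an illustrative assumption


def local_loss(w: float, data: Sequence[Sample]) -> float:
    """Hypothetical F_i(w): mean squared error of the linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def global_objective(w: float, client_data: List[Sequence[Sample]]) -> float:
    """f(w) = sum_i p_i * F_i(w) with uniform p_i = 1 / K."""
    k = len(client_data)
    p = [1.0 / k] * k  # uniform selection probabilities, summing to 1
    return sum(p_i * local_loss(w, d) for p_i, d in zip(p, client_data))


clients = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 3.0)]]
print(global_objective(2.0, clients))
```
]]></code>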
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Label Distribution Clustering of Clients</title>
<p>Existing research primarily focuses on clustering client-trained local models based on their similarities, which poses a significant challenge for large-scale distributed applications in federated learning. In iterative clustering methods, calculating model similarity is computationally expensive. This is especially true in highly heterogeneous environments where a client&#x2019;s dataset may contain only a few labels. To address this issue, we introduce a simulated IID data distribution as a reference. By comparing the local client&#x2019;s data distribution with this reference distribution, we obtain the differences between them. We map these distributional differences to a value that represents the deviation of each client&#x2019;s data from the ideal distribution. Performing calculations on these values rather than on full models significantly reduces the required computation. Following the work [<xref ref-type="bibr" rid="ref-35">35</xref>], we assume a uniform global class distribution when computing local discrepancies to prevent global label information leakage. In addition, only the maximum categorical label is retained when constructing the feature extraction function <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mrow><mml:mi mathvariant="normal">&#x03A8;</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, thereby mitigating unnecessary exposure of local label information, as shown in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>.
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi mathvariant="normal">&#x03A8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>arg</mml:mi><mml:mo>&#x2061;</mml:mo><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In this equation, <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> represents the actual proportion of samples with label <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mi>i</mml:mi></mml:math></inline-formula>, satisfying <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>, in the dataset of client <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>k</mml:mi></mml:math></inline-formula>; <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mi>C</mml:mi></mml:math></inline-formula> denotes the total number of labels; and <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:math></inline-formula> indicates the deviation of the proportion of label <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi>i</mml:mi></mml:math></inline-formula> within client <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mi>k</mml:mi></mml:math></inline-formula> from the reference distribution. Clients then upload their extracted label distribution features to the server. The server uses these uploaded features to perform client clustering without needing to predefine the number of clusters, which is instead determined by the distribution characteristics of the clients. Client clustering is initiated only during the first round of communication and when new clients join, thereby conserving significant computational resources. In this approach, each client locally maps its label distribution to a numerical value through the feature extraction function and uploads only this mapped value, so its full label distribution is not exposed. This provides a level of privacy protection, as the raw data remains on the clients&#x2019; local machines and only the mapped values are shared.</p>
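<p>The feature extraction in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> can be sketched as follows. This is a hedged illustration rather than the authors&#x2019; implementation: it assumes a uniform reference proportion of 1/C for every label, integer labels from 0 to C-1, and a small constant <monospace>eps</monospace> (not in the paper) that guards against taking the logarithm of zero. The function name <monospace>label_feature</monospace> is hypothetical.</p>
<code language="python"><![CDATA[
```python
import math
from collections import Counter
from typing import Sequence


def label_feature(labels: Sequence[int], num_classes: int, eps: float = 1e-12) -> int:
    """Return the label index selected by Eq. (2) for one client.

    Assumes a uniform reference distribution Q_i = 1 / C; eps is an added
    guard (not in the paper) against log(0) when D_i equals Q_i exactly.
    """
    counts = Counter(labels)
    n = len(labels)
    q = 1.0 / num_classes
    # log |D_i - Q_i| for every label i in 0..C-1
    logs = [math.log(max(abs(counts.get(i, 0) / n - q), eps)) for i in range(num_classes)]
    total = sum(logs)
    ratios = [v / total for v in logs]
    # arg min over the normalised log deviations
    return min(range(num_classes), key=lambda i: ratios[i])


print(label_feature([0, 0, 0, 1], num_classes=3))
```
]]></code>
<p>Because every absolute deviation is below 1 when there are at least two classes, all logarithms and their sum are negative, so the smallest ratio corresponds to the label whose proportion deviates most from the reference. The client uploads only this single index.</p>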
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Global Model with Intra-Cluster Correction</title>
<p>In this step, we mitigate the issue of client model drift caused by extreme label distribution imbalance. In such scenarios, most clients contain only a few or even just one type of label. Within each cluster, however, the data is more likely to be independent and identically distributed. Traditionally, scholars often adopt aggregation functions such as FedAvg, Krum, and Trimmed Mean, which can effectively protect user data privacy [<xref ref-type="bibr" rid="ref-36">36</xref>]. These aggregation functions are used to summarize local model updates from clients into a global model. The choice of aggregation function directly affects the robustness of the model. However, the results of ablation experiments show that using the FedAvg aggregation function decreases model performance. While the Krum and Trimmed Mean aggregation functions have advantages in terms of robustness and simplicity, they require the use of local model gradients from the previous round, making them vulnerable to adversarial samples and limiting their application in certain dynamically changing scenarios [<xref ref-type="bibr" rid="ref-37">37</xref>]. In this paper, instead of using traditional aggregation functions, we aggregate each cluster into a cluster model, which exhibits higher accuracy for the labels within its respective cluster. However, these cluster models might deviate from the direction of model aggregation in terms of loss-minimizing model parameters <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula>. Therefore, this paper introduces an intra-cluster model correction strategy to update the cluster models towards the direction of the global model&#x2019;s optimal solution.</p>
<p>To ensure model stability, we have improved the sampling strategy to make the number of samples from each cluster relatively uniform across iterations. Specifically, <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>K</mml:mi></mml:math></inline-formula> clients are clustered into <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mi>m</mml:mi></mml:math></inline-formula> clusters represented as <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, with a sampling ratio of <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula>. Within each cluster, each client&#x2019;s opportunity to participate in training is determined by uniform random sampling, denoted as <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow></mml:munder><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>. 
The sampling number for each cluster is <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x230A;</mml:mo><mml:mfrac><mml:mrow><mml:mi>K</mml:mi><mml:mi>&#x03B3;</mml:mi></mml:mrow><mml:mi>m</mml:mi></mml:mfrac><mml:mo>&#x230B;</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, and then additional <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mi>K</mml:mi><mml:mi>&#x03B3;</mml:mi><mml:mo>&#x2212;</mml:mo><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:munder><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x230A;</mml:mo><mml:mfrac><mml:mrow><mml:mi>K</mml:mi><mml:mi>&#x03B3;</mml:mi></mml:mrow><mml:mi>m</mml:mi></mml:mfrac><mml:mo>&#x230B;</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> clients are sampled from the remaining clients to form a final set of clients <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula> as the sampling result for the <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>t</mml:mi></mml:math></inline-formula>-th communication round. 
If it is the first round of communication, <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>K</mml:mi><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> clients are randomly and uniformly sampled to form client set <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
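<p>The cluster-aware sampling described above can be sketched as follows. The per-cluster quota is implemented exactly as written in the text; truncating the total budget to an integer, the fixed random seed, and the function name <monospace>sample_clients</monospace> are illustrative assumptions, and the first-round case (plain uniform sampling) is omitted.</p>
<code language="python"><![CDATA[
```python
import random
from typing import Dict, List


def sample_clients(clusters: Dict[int, List[int]], gamma: float, seed: int = 0) -> List[int]:
    """Cluster-aware sampling for one round, following the quota stated in the text.

    clusters maps a cluster id to its client ids. Truncating K * gamma to an
    integer and the fixed seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = sum(len(members) for members in clusters.values())
    m = len(clusters)
    budget = int(k * gamma)        # total clients sampled this round
    quota = min(budget // m, 1)    # per-cluster quota as written in the text
    chosen: List[int] = []
    for members in clusters.values():
        chosen += rng.sample(members, min(quota, len(members)))
    # Fill the rest of the budget uniformly from clients not yet chosen.
    picked = set(chosen)
    remaining = [c for members in clusters.values() for c in members if c not in picked]
    extra = max(budget - len(chosen), 0)
    chosen += rng.sample(remaining, min(extra, len(remaining)))
    return chosen


clusters = {g: list(range(5 * g, 5 * g + 5)) for g in range(4)}  # 20 clients, 4 clusters
selected = sample_clients(clusters, gamma=0.5)
print(len(selected))
```
]]></code>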
<p>At the beginning of the <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>t</mml:mi></mml:math></inline-formula>-th communication, the server obtains the current global model parameters <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and distributes them to each local client to get local model parameters <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>. This paper aggregates <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mi>m</mml:mi></mml:math></inline-formula> cluster models <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> from <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mi>m</mml:mi></mml:math></inline-formula> clusters. The cluster model update formula is given in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>.
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msubsup><mml:mi>g</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x1D4A2;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munder><mml:mfrac><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>n</mml:mi></mml:mfrac><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></disp-formula></p>
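<p>A minimal sketch of the cluster aggregation in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> is given below. It assumes that the normalizer is the total number of samples held by the participating clients of the cluster, so that the weights sum to one; the paper does not state this explicitly. Models are plain lists of floats, and <monospace>aggregate_cluster</monospace> is a hypothetical name.</p>
<code language="python"><![CDATA[
```python
from typing import Dict, List, Tuple

Vector = List[float]


def aggregate_cluster(updates: Dict[int, Tuple[int, Vector]]) -> Vector:
    """Weighted average of client models within one cluster, as in Eq. (3).

    updates maps client id -> (n_k, w_k). Here n is taken as the sum of n_k
    over the cluster's participating clients (an assumption) so weights sum to 1.
    """
    n = sum(n_k for n_k, _ in updates.values())
    dim = len(next(iter(updates.values()))[1])
    g = [0.0] * dim
    for n_k, w_k in updates.values():
        for j in range(dim):
            g[j] += (n_k / n) * w_k[j]
    return g


cluster_updates = {0: (30, [1.0, 2.0]), 1: (10, [3.0, 6.0])}
print(aggregate_cluster(cluster_updates))
```
]]></code>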
<p>In the standard gradient descent update <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mi>w</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B7;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mi>w</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:mi>&#x03B7;</mml:mi></mml:math></inline-formula> represents the learning rate and <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mi>w</mml:mi></mml:math></inline-formula> is the step size for gradient adjustment at each time step. As the parameters approach the optimal value, the gradient becomes smaller. Because the learning rate is fixed, standard gradient descent converges slowly and may even become trapped in local optima. This paper introduces momentum <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:mi>h</mml:mi></mml:math></inline-formula> to correct the direction of the global model. Specifically, the deviation between each cluster model and the previous round&#x2019;s global model is used to correct the direction of the global model aggregation in the current round. We also consider that the cluster model might deviate from the direction of the optimal solution of the global model in some rounds. Therefore, we add the historical correction gradient direction to improve the stability of the model, as shown in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:munderover><mml:mfrac><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>n</mml:mi></mml:mfrac><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">g</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:msup><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">g</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In this formula, <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula> is the momentum coefficient, representing the influence of historical gradients. The larger <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula> is, the greater the impact of the historical correction gradient direction on the current round. The momentum term offers significant advantages in the highly non-IID scenarios that are the focus of this study. On the one hand, it accelerates the escape from local minima by leveraging historical accumulation. On the other hand, it mitigates the update oscillations caused by client sampling fluctuations. This design is inspired by the classical convergence theory of distributed optimization, where the core idea is to reduce client gradient variance through exponential smoothing, thereby enhancing convergence efficiency. The server can therefore smooth the direction of historical updates to alleviate the impact of heterogeneity among client updates. Finally, we update the global model <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula> by averaging the local models <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> trained by clients in this round and applying the correction from the cluster models, as shown in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>.
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">t</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:munder><mml:mfrac><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>n</mml:mi></mml:mfrac><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></disp-formula></p>
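<p>The correction step in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref> and the global update in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref> can be sketched together as below. This is an illustration under stated assumptions, not the authors&#x2019; code: the normalizer is taken as the sum of the sample counts passed in, a cluster whose model equals the current global model is skipped to avoid division by zero, and the function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math
from typing import List, Tuple

Vector = List[float]


def correction(h_t: Vector, w_t: Vector, cluster_models: List[Tuple[int, Vector]],
               alpha: float, beta: float) -> Vector:
    """Eq. (4): h_{t+1} = alpha * h_t - beta * sum_i (n_i / n) * unit(g_i - w_t).

    cluster_models holds (n_i, g_i) pairs; n is assumed to be the sum of n_i.
    Clusters with g_i equal to w_t are skipped to avoid dividing by zero
    (a guard added here, not stated in the paper).
    """
    n = sum(n_i for n_i, _ in cluster_models)
    h_next = [alpha * v for v in h_t]
    for n_i, g_i in cluster_models:
        diff = [g - w for g, w in zip(g_i, w_t)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue
        for j, d in enumerate(diff):
            h_next[j] -= beta * (n_i / n) * d / norm
    return h_next


def global_update(client_models: List[Tuple[int, Vector]], h_next: Vector) -> Vector:
    """Eq. (5): weighted average of the sampled clients' models minus h_{t+1}."""
    n = sum(n_k for n_k, _ in client_models)
    avg = [sum((n_k / n) * w_k[j] for n_k, w_k in client_models) for j in range(len(h_next))]
    return [a - h for a, h in zip(avg, h_next)]


w_t = [0.0, 0.0]
h_next = correction([0.0, 0.0], w_t, [(10, [3.0, 4.0])], alpha=0.9, beta=0.1)
w_next = global_update([(10, [3.0, 4.0])], h_next)
print(h_next, w_next)
```
]]></code>
<p>Normalizing each cluster deviation to unit length means that every cluster contributes a direction only, weighted by its share of the samples, so a single cluster with a very large deviation cannot dominate the correction.</p>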
<p>Based on the above theoretical analysis, the pseudocode for the CFIC algorithm is given in Algorithm 1:</p>
<fig id="fig-6">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64103-fig-6.tif"/>
</fig>
<p>In the provided pseudocode, the algorithm takes the number of clients <inline-formula id="ieqn-109"><mml:math id="mml-ieqn-109"><mml:mi>k</mml:mi></mml:math></inline-formula>, the number of communication rounds <inline-formula id="ieqn-110"><mml:math id="mml-ieqn-110"><mml:mi>t</mml:mi></mml:math></inline-formula>, the sampling ratio <inline-formula id="ieqn-111"><mml:math id="mml-ieqn-111"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula>, the server-side client list <inline-formula id="ieqn-112"><mml:math id="mml-ieqn-112"><mml:mrow><mml:mrow><mml:mi>&#x1D49C;</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>, gradient information <inline-formula id="ieqn-113"><mml:math id="mml-ieqn-113"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, initial model parameters <inline-formula id="ieqn-114"><mml:math id="mml-ieqn-114"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, and the learning rate <inline-formula id="ieqn-115"><mml:math id="mml-ieqn-115"><mml:mi>&#x03B7;</mml:mi></mml:math></inline-formula> as input. On the server side, the algorithm initializes <inline-formula id="ieqn-116"><mml:math id="mml-ieqn-116"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> with all client values <inline-formula id="ieqn-117"><mml:math id="mml-ieqn-117"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> set to empty (Line 1). 
For the <inline-formula id="ieqn-118"><mml:math id="mml-ieqn-118"><mml:mi>t</mml:mi></mml:math></inline-formula>-th communication round, the algorithm randomly and uniformly samples clients within clusters based on the sampling ratio <inline-formula id="ieqn-119"><mml:math id="mml-ieqn-119"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula>, forming the client set for this round (Line 3). The algorithm then iterates over the client set for this round and updates the client model parameters (Lines 4&#x2013;5). If a client is not in the client table maintained by the central server, the algorithm adds the client to the table and re-clusters the clients to form new clusters (Lines 6&#x2013;8). After completing these steps, the algorithm updates the cluster model at Line 9, corrects the gradient at Line 10, and updates the global model parameters at Line 11.</p>
<p>The client-side update process is as follows. The client first obtains the model parameters at Line 13 and then processes the client set in parallel, updating the model parameters (Lines 14&#x2013;15). If the client does not yet have label distribution characteristics, the algorithm applies the label distribution feature extraction function so that the client can be clustered (Lines 16&#x2013;17). Finally, the client returns the model parameters and label distribution characteristics to the server side (Line 18).</p>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Cold Start Problem of CFIC Algorithm</title>
<p>The cold start problem typically arises when new clients join a federated learning system for the first time. Because they lack sufficient historical data to train their local models, these new clients are unable to make meaningful contributions to the global model during the initial phase. This issue not only affects the performance of the new clients&#x2019; own models but also potentially degrades the performance and stability of the entire system. The CFIC algorithm is designed to mitigate the cold start issue and thus accommodate new clients joining the federated learning system. Specifically, we maintain a client table on the server side that records all participating clients in the federated learning system. When a new client attempts to join the system, the server detects its absence from the maintained client table <inline-formula id="ieqn-120"><mml:math id="mml-ieqn-120"><mml:mrow><mml:mrow><mml:mi>&#x1D49C;</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> and subsequently triggers a re-clustering process. During this re-clustering phase, the new client is incorporated into an appropriate cluster based on its uploaded label distribution feature, thereby facilitating its integration and enabling it to contribute more effectively to the global model.</p>
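<p>The client-table check described above can be sketched as follows. The class and method names are hypothetical, and the clustering rule used here (clients that uploaded the same feature value share a cluster) is a simplifying assumption chosen only to show how an unknown client triggers re-clustering; the paper does not prescribe this rule.</p>
<code language="python"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List


class Server:
    """Toy server keeping a client table; all names here are hypothetical."""

    def __init__(self) -> None:
        self.client_table: Dict[int, int] = {}  # client id -> uploaded feature value
        self.clusters: Dict[int, List[int]] = {}

    def recluster(self) -> None:
        # Illustrative rule (an assumption): clients sharing a feature value form
        # a cluster, so the number of clusters is not fixed in advance.
        groups: Dict[int, List[int]] = defaultdict(list)
        for cid, feature in self.client_table.items():
            groups[feature].append(cid)
        self.clusters = dict(groups)

    def register(self, client_id: int, feature: int) -> bool:
        """Add a client if it is unknown; return True when re-clustering was triggered."""
        if client_id in self.client_table:
            return False
        self.client_table[client_id] = feature
        self.recluster()
        return True


server = Server()
server.register(1, feature=0)
server.register(2, feature=3)
triggered = server.register(3, feature=0)  # new client joins an existing cluster
print(triggered, server.clusters)
```
]]></code>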
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experiment</title>
<sec id="s4_1">
<label>4.1</label>
<title>Datasets</title>
<p>We conducted experiments on seven real-world datasets to validate the accuracy and fairness of our algorithm. These datasets allow us to compare the proposed algorithm with other algorithms in a realistic setting and to evaluate the accuracy of each algorithm under heterogeneous conditions.
<list list-type="bullet">
<list-item>
<p>MNIST dataset [<xref ref-type="bibr" rid="ref-38">38</xref>]. This dataset consists of images of handwritten digits from 0 to 9. The input for the dataset is 784-dimensional (28 &#x00D7; 28) flattened images, and the output is class labels ranging from 0 to 9.</p></list-item>
<list-item>
<p>CIFAR10 dataset [<xref ref-type="bibr" rid="ref-39">39</xref>]. This dataset contains 32 &#x00D7; 32-pixel RGB images. There are 60,000 samples in total, with 50,000 samples for training and 10,000 for testing. CIFAR10 includes 10 classes of objects, labeled from 0 to 9.</p></list-item>
<list-item>
<p>CIFAR100 dataset [<xref ref-type="bibr" rid="ref-40">40</xref>]. This dataset has 100 classes, each containing 600 images, with 500 for training and 100 for testing. Each image comes with a &#x201C;fine&#x201D; label indicating the specific class it belongs to and a &#x201C;coarse&#x201D; label indicating the broader category it falls under.</p></list-item>
<list-item>
<p>EMNIST dataset [<xref ref-type="bibr" rid="ref-41">41</xref>]. This dataset includes both the &#x201C;by class&#x201D; and &#x201C;by merge&#x201D; datasets, each containing a complete set of 814,255 characters. These datasets differ in the number of categories assigned. Consequently, the distribution of sample letters varies between the two datasets, while the number of samples in the digit class remains consistent across them.</p></list-item>
<list-item>
<p>SVHN dataset [<xref ref-type="bibr" rid="ref-42">42</xref>]. This dataset is a benchmark for digit classification, consisting of more than 600,000 32 &#x00D7; 32 RGB cropped images of printed digits (from 0 to 9) extracted from house number plates. The cropped images are centered on the digit of interest but include nearby digits and other distracting elements within the image.</p></list-item>
<list-item>
<p>FMNIST (Fashion-MNIST) dataset. This dataset is a drop-in replacement for MNIST featuring 10 fashion categories, each with 6000 28 &#x00D7; 28 grayscale training images and 1000 test images. Its fine-grained class distinctions (e.g., shirt vs. coat) make it a useful benchmark for testing federated learning under uneven data splits, especially for simulating real-world data imbalances.</p></list-item>
<list-item>
<p>FEMNIST dataset. This dataset is a federated learning benchmark extended from EMNIST, containing 62 classes of handwritten characters that cover both digits and letters. It is partitioned by real users via the LEAF framework, so it inherently exhibits client-level non-IID characteristics. Each client corresponds to one writer, and the dataset comprises approximately 800,000 28 &#x00D7; 28 grayscale images across 3400 clients, directly reflecting data heterogeneity in federated scenarios. It enables algorithm validation without artificial synthesis and is widely applied to image classification, aggregation strategy evaluation, and privacy-preserving assessments.</p></list-item>
</list></p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Baseline Algorithms</title>
<p>We compared our proposed algorithm with the following six federated learning algorithms to test its performance.
<list list-type="simple">
<list-item><label>1.</label><p>FedAvg algorithm. This algorithm is a widely used aggregation method in federated learning. In each round, the server sends the global model parameters to a randomly selected group of clients. Each client trains the model on its local dataset and then sends the local updates back to the server for aggregation. The updated global model is subsequently disseminated to the clients for the next round of training.</p></list-item>
<list-item><label>2.</label><p>FedProx algorithm [<xref ref-type="bibr" rid="ref-20">20</xref>]. This algorithm was designed to address issues related to non-IID data distributions and asymmetric participation among clients. In each round, the server distributes the global model parameters to the clients. Clients train the model on their local datasets and incorporate a regularization term to constrain the extent of local parameter changes, thereby mitigating divergence between clients&#x2019; models. The local updates are then aggregated at the server, and the refined global model is redistributed to the clients for subsequent rounds.</p></list-item>
<list-item><label>3.</label><p>FedDyn algorithm [<xref ref-type="bibr" rid="ref-42">42</xref>]. This algorithm introduces an adaptive risk objective for each client. During each communication round, the current global model is sent to the selected active devices. Each device optimizes its local empirical loss together with a dynamically updated penalty function based on the difference between the local device model and the received server model. This ensures that the local optima of the devices are asymptotically consistent with stationary points of the global empirical loss.</p></list-item>
<list-item><label>4.</label><p>FedFa algorithm [<xref ref-type="bibr" rid="ref-43">43</xref>]. This algorithm incorporates a dual momentum gradient optimization scheme to accelerate model convergence. It also proposes a weighting algorithm that combines training accuracy and frequency information to measure the appropriateness of weights. This approach helps mitigate fairness issues in federated learning that may arise due to certain clients&#x2019; preferences.</p></list-item>
<list-item><label>5.</label><p>FedMGDA&#x002B; algorithm [<xref ref-type="bibr" rid="ref-44">44</xref>]. This algorithm is a multi-objective optimization algorithm aimed at resolving conflicting gradient issues in federated learning. By calculating multiple clients&#x2019; gradients in each round and performing multi-objective optimization at the server side, the algorithm balances these gradients, thus reducing inter-client conflicts.</p></list-item>
<list-item><label>6.</label><p>Clustered sampling algorithm [<xref ref-type="bibr" rid="ref-45">45</xref>]. This algorithm enhances the efficiency and effectiveness of federated learning by clustering clients. This ensures that each round of training involves representative clients, thereby improving the generalization capability of the global model. Additionally, by reducing the number of participating clients in each training round, the algorithm can lower communication overhead.</p></list-item>
</list></p>
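<p>As a rough illustration of the aggregation step in FedAvg and the regularization term in FedProx described above, the following Python sketch operates on flat parameter lists. The function names <monospace>fedavg_aggregate</monospace> and <monospace>fedprox_penalty</monospace>, the sample-size weighting, and the coefficient <monospace>mu</monospace> are illustrative assumptions; the sketch is not the experimental code of this study.</p>
<code language="python"><![CDATA[
```python
from typing import List, Sequence


def fedavg_aggregate(client_weights: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fedprox_penalty(local_w: Sequence[float],
                    global_w: Sequence[float],
                    mu: float) -> float:
    """Proximal term (mu / 2) * ||w_local - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(local_w, global_w))


# Hypothetical example: two clients holding 1 and 3 samples.
new_global = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
penalty = fedprox_penalty([1.0, 2.0], [0.0, 0.0], mu=0.1)
```
]]></code>
<p>In this sketch, the client with more samples pulls the aggregated model toward its own parameters, and the proximal penalty grows with the squared distance between the local and global parameters, which is how the regularization limits local drift.</p>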
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Setups</title>
<p>The model structures used in this study are as follows. For MNIST, the model has two 5 &#x00D7; 5 convolutional layers (32 &#x2192; 64 channels) and two fully connected layers (3136 &#x2192; 512 &#x2192; 10). For CIFAR-10, CIFAR-100, and SVHN, the model has two 5 &#x00D7; 5 convolutional layers (64 channels) followed by three fully connected layers (1600 &#x2192; 384 &#x2192; 192 &#x2192; output), where the final linear layer maps 192 &#x2192; 100 for CIFAR-100. For EMNIST, the model has three convolutional layers (32 &#x2192; 32 &#x2192; 64 channels) and two fully connected layers (576 &#x2192; 256 &#x2192; 62), with its own pooling order and custom dimension transformation layers. All models use ReLU activations, including between fully connected layers, and 2 &#x00D7; 2 max pooling. The ratio of training to testing data is 0.9:0.1.</p>
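<p>The flattened feature sizes quoted above (3136 for MNIST and 1600 for CIFAR-10/100 and SVHN) can be checked with simple shape arithmetic. The sketch below assumes stride 1, a 2 &#x00D7; 2 pooling step after each convolution, padding of 2 for the MNIST model, and no padding for the CIFAR/SVHN model. These padding and ordering choices are assumptions selected because they reproduce the stated sizes; the text does not state them.</p>
<code language="python"><![CDATA[
```python
def conv_out(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2) -> int:
    """Spatial output size of non-overlapping max pooling."""
    return size // window


def flattened_size(input_size: int, channels_out: int, layers: list) -> int:
    """Apply ("conv", kernel, padding) / ("pool", window) steps, then flatten."""
    size = input_size
    for layer in layers:
        if layer[0] == "conv":
            size = conv_out(size, layer[1], layer[2])
        else:
            size = pool_out(size, layer[1])
    return channels_out * size * size


# Assumed layouts (hypothetical; chosen to match the stated dimensions).
mnist_flat = flattened_size(
    28, 64, [("conv", 5, 2), ("pool", 2), ("conv", 5, 2), ("pool", 2)])
cifar_flat = flattened_size(
    32, 64, [("conv", 5, 0), ("pool", 2), ("conv", 5, 0), ("pool", 2)])
```
]]></code>
<p>Under these assumptions, the MNIST feature map shrinks from 28 to 14 to 7, giving 64 &#x00D7; 7 &#x00D7; 7 features, and the CIFAR feature map shrinks from 32 to 28, 14, 10, and 5, giving 64 &#x00D7; 5 &#x00D7; 5 features.</p>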
<p>The experimental setup utilized an Intel(R) Xeon(R) Gold 5218 CPU operating at 2.30 GHz and CentOS Linux release 7.9.2009 (Core) as the operating system. The platform was built using the Python library PyTorch and equipped with three NVIDIA Tesla V100S PCIe 32 GB graphics cards. We assumed a scenario involving 100 devices participating in federated learning with a sampling rate of 0.3. The batch size was set to 64, and the learning rate was uniformly set to 0.01 across all experiments. Additionally, the number of local iterations was fixed at five for each round of communication.</p>
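<p>For orientation, the hyperparameters listed above can be collected in a small configuration, and a sampling rate of 0.3 over 100 devices corresponds to 30 participating clients per round. The sketch below uses uniform random sampling purely for illustration; it is an assumption and does not implement the inter-cluster sampling used by CFIC. The constant and function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
import random

# Values taken from the setup paragraph above.
NUM_CLIENTS = 100
SAMPLING_RATE = 0.3
BATCH_SIZE = 64
LEARNING_RATE = 0.01
LOCAL_ITERATIONS = 5


def sample_clients(num_clients: int, rate: float, rng: random.Random) -> list:
    """Uniformly sample round(num_clients * rate) distinct client ids."""
    k = max(1, round(num_clients * rate))
    return sorted(rng.sample(range(num_clients), k))


# One hypothetical communication round with a fixed seed for repeatability.
selected = sample_clients(NUM_CLIENTS, SAMPLING_RATE, random.Random(0))
```
]]></code>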
<p>To simulate the class distribution of client data, we employed the Dirichlet distribution to partition the datasets. Sampling was conducted based on the corresponding probability values. For the real-world datasets, different alpha values (0.05, 0.1, 0.3, 0.5) were selected to evaluate the testing accuracy under various Dirichlet partitions. Because dataset sizes and partitioning methods differ, the algorithms required different numbers of communication rounds to converge in terms of testing accuracy and training loss. Consequently, the number of communication rounds used in the experiments varied among the datasets. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> illustrates the distribution of classes owned by clients for the CIFAR10 dataset under different alpha values. The distributions of the other datasets are omitted due to space constraints.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Distribution of client samples under different alpha values. The alpha value is 0.05 in the left part and 0.5 in the right part</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64103-fig-2.tif"/>
</fig>
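<p>A minimal sketch of a Dirichlet label partition is given below. It follows one common scheme: for each class, client proportions are drawn from Dir(alpha), and the samples of that class are split accordingly. The exact partitioning procedure used in this study is not specified in the text, so the per-class scheme, the function names, and the seed handling are assumptions. The Dirichlet draw is obtained by normalizing independent gamma variates from the Python standard library.</p>
<code language="python"><![CDATA[
```python
import random
from typing import Dict, List


def dirichlet_sample(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw one sample from a symmetric Dirichlet(alpha) via gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # extremely unlikely numerical underflow
        return [1.0 / k] * k
    return [d / total for d in draws]


def dirichlet_partition(labels: List[int], num_clients: int,
                        alpha: float, seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients, class by class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            # The last client takes the remainder so no sample is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            end = max(start, min(end, len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Hypothetical toy labels: 1000 samples over 10 balanced classes.
toy_labels = [i % 10 for i in range(1000)]
skewed = dirichlet_partition(toy_labels, num_clients=10, alpha=0.05)
```
]]></code>
<p>Smaller alpha values concentrate each class on fewer clients, which corresponds to the stronger label skew shown for alpha = 0.05 compared with alpha = 0.5.</p>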
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Experimental Results</title>
<p>(1) <bold>Testing accuracy</bold></p>
<p><xref ref-type="table" rid="table-2">Table 2</xref> reports the testing accuracy of the algorithms under different data distribution scenarios. FedAvg is at a significant disadvantage on highly heterogeneous data: under Dir(0.1), its accuracy is only 13.50% on CIFAR-100 and 44.94% on CIFAR-10. This indicates that the generalization ability of FedAvg&#x2019;s global model is insufficient when labels are imbalanced across clients. FedProx reduces inter-client model update discrepancies through regularization, although its accuracy in <xref ref-type="table" rid="table-2">Table 2</xref> remains close to that of FedAvg. FedDyn mitigates the impact of label shift with dynamic regularization and improves clearly on FedAvg, particularly on CIFAR-10 and CIFAR-100.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Comparison of accuracy of algorithms under different data distributions. The values in bold indicate the best performance</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Dataset</th>
<th align="center">Dir(<bold>&#x03B1;</bold>)</th>
<th align="center">FedAvg</th>
<th align="center">FedProx</th>
<th align="center">FedFa</th>
<th align="center">Clustered Sampling</th>
<th align="center">FedDyn</th>
<th align="center">FedMGDA<bold>&#x002B;</bold></th>
<th align="center">Ours</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="4">MNIST</td>
<td>0.05</td>
<td>93.47 &#x00B1; 0.13</td>
<td>93.16 &#x00B1; 0.31</td>
<td>95.28 &#x00B1; 0.19</td>
<td>93.37 &#x00B1; 0.09</td>
<td>98.15 &#x00B1; 0.09</td>
<td>96.76 &#x00B1; 0.31</td>
<td><bold>99.02 &#x00B1; 0.04</bold></td>
</tr>
<tr>
<td>0.1</td>
<td>94.76 &#x00B1; 0.37</td>
<td>94.56 &#x00B1; 0.37</td>
<td>96.23 &#x00B1; 0.23</td>
<td>94.71 &#x00B1; 0.35</td>
<td>98.38 &#x00B1; 0.07</td>
<td>97.58 &#x00B1; 0.22</td>
<td><bold>98.97 &#x00B1; 0.03</bold></td>
</tr>
<tr>
<td>0.3</td>
<td>96.17 &#x00B1; 0.24</td>
<td>96.17 &#x00B1; 0.12</td>
<td>97.36 &#x00B1; 0.14</td>
<td>96.13 &#x00B1; 0.16</td>
<td>98.58 &#x00B1; 0.07</td>
<td>98.40 &#x00B1; 0.18</td>
<td><bold>99.07 &#x00B1; 0.07</bold></td>
</tr>
<tr>
<td>0.5</td>
<td>96.76 &#x00B1; 0.03</td>
<td>96.73 &#x00B1; 0.09</td>
<td>97.62 &#x00B1; 0.01</td>
<td>96.78 &#x00B1; 0.05</td>
<td>98.52 &#x00B1; 0.10</td>
<td>98.51 &#x00B1; 0.20</td>
<td><bold>99.15 &#x00B1; 0.12</bold></td>
</tr>
<tr>
<td rowspan="4">EMNIST</td>
<td>0.05</td>
<td>72.07 &#x00B1; 0.61</td>
<td>69.33 &#x00B1; 0.51</td>
<td>60.6 &#x00B1; 31.1</td>
<td>71.61 &#x00B1; 0.49</td>
<td>66.76 &#x00B1; 0.25</td>
<td>70.6 &#x00B1; 0.54</td>
<td><bold>79.92 &#x00B1; 0.55</bold></td>
</tr>
<tr>
<td>0.1</td>
<td>73.18 &#x00B1; 0.39</td>
<td>70.79 &#x00B1; 0.74</td>
<td>75.71 &#x00B1; 0.26</td>
<td>73.39 &#x00B1; 0.39</td>
<td>70.04 &#x00B1; 1.08</td>
<td>71.8 &#x00B1; 0.32</td>
<td><bold>80.42 &#x00B1; 0.29</bold></td>
</tr>
<tr>
<td>0.3</td>
<td>75.43 &#x00B1; 0.42</td>
<td>73.71 &#x00B1; 0.37</td>
<td>77.22 &#x00B1; 0.21</td>
<td>75.09 &#x00B1; 0.21</td>
<td>73.64 &#x00B1; 0.66</td>
<td>74.22 &#x00B1; 0.77</td>
<td><bold>80.66 &#x00B1; 0.34</bold></td>
</tr>
<tr>
<td>0.5</td>
<td>76.79 &#x00B1; 0.08</td>
<td>75.67 &#x00B1; 0.38</td>
<td>78.47 &#x00B1; 0.18</td>
<td>76.67 &#x00B1; 0.37</td>
<td>76.12 &#x00B1; 0.61</td>
<td>76.06 &#x00B1; 0.36</td>
<td><bold>81.04 &#x00B1; 0.65</bold></td>
</tr>
<tr>
<td rowspan="4">SVHN</td>
<td>0.05</td>
<td>50.86 &#x00B1; 1.73</td>
<td>48.01 &#x00B1; 3.20</td>
<td>75.47 &#x00B1; 1.58</td>
<td>50.78 &#x00B1; 2.89</td>
<td>81.45 &#x00B1; 0.42</td>
<td>67.67 &#x00B1; 1.44</td>
<td><bold>88.38 &#x00B1; 0.52</bold></td>
</tr>
<tr>
<td>0.1</td>
<td>62.97 &#x00B1; 1.67</td>
<td>60.25 &#x00B1; 3.15</td>
<td>78.33 &#x00B1; 0.66</td>
<td>61.85 &#x00B1; 0.86</td>
<td>82.94 &#x00B1; 0.46</td>
<td>73.12 &#x00B1; 1.96</td>
<td><bold>88.12 &#x00B1; 0.44</bold></td>
</tr>
<tr>
<td>0.3</td>
<td>78.41 &#x00B1; 0.97</td>
<td>77.96 &#x00B1; 1.16</td>
<td>83.58 &#x00B1; 0.52</td>
<td>78.78 &#x00B1; 1.06</td>
<td>85.25 &#x00B1; 0.12</td>
<td>82.32 &#x00B1; 1.32</td>
<td><bold>88.90 &#x00B1; 0.19</bold></td>
</tr>
<tr>
<td>0.5</td>
<td>82.79 &#x00B1; 0.24</td>
<td>82.33 &#x00B1; 0.44</td>
<td>85.33 &#x00B1; 0.26</td>
<td>82.54 &#x00B1; 0.16</td>
<td>85.73 &#x00B1; 0.35</td>
<td>84.50 &#x00B1; 0.40</td>
<td><bold>89.28 &#x00B1; 0.13</bold></td>
</tr>
<tr>
<td rowspan="4">CIFAR-10</td>
<td>0.05</td>
<td>44.44 &#x00B1; 4.39</td>
<td>43.93 &#x00B1; 5.05</td>
<td>45.9 &#x00B1; 0.69</td>
<td>44.65 &#x00B1; 5.01</td>
<td>60.16 &#x00B1; 4.83</td>
<td>47.21 &#x00B1; 1.12</td>
<td><bold>71.87 &#x00B1; 1.45</bold></td>
</tr>
<tr>
<td>0.1</td>
<td>44.94 &#x00B1; 0.52</td>
<td>44.26 &#x00B1; 0.46</td>
<td>50.03 &#x00B1; 0.77</td>
<td>45.58 &#x00B1; 0.27</td>
<td>61.09 &#x00B1; 0.75</td>
<td>53.08 &#x00B1; 1.43</td>
<td><bold>73.65 &#x00B1; 0.41</bold></td>
</tr>
<tr>
<td>0.3</td>
<td>51.77 &#x00B1; 0.59</td>
<td>51.35 &#x00B1; 0.54</td>
<td>59.42 &#x00B1; 0.1</td>
<td>52.17 &#x00B1; 0.34</td>
<td>67.53 &#x00B1; 0.62</td>
<td>60.35 &#x00B1; 1.07</td>
<td><bold>75.43 &#x00B1; 0.43</bold></td>
</tr>
<tr>
<td>0.5</td>
<td>52.97 &#x00B1; 0.62</td>
<td>53.13 &#x00B1; 0.79</td>
<td>62.68 &#x00B1; 0.82</td>
<td>53.41 &#x00B1; 0.32</td>
<td>68.44 &#x00B1; 0.48</td>
<td>62.71 &#x00B1; 0.64</td>
<td><bold>75.76 &#x00B1; 0.32</bold></td>
</tr>
<tr>
<td rowspan="4">CIFAR-100</td>
<td>0.05</td>
<td>8.16 &#x00B1; 0.32</td>
<td>7.51 &#x00B1; 0.44</td>
<td>1.11 &#x00B1; 0.14</td>
<td>8.08 &#x00B1; 0.82</td>
<td>17.53 &#x00B1; 1.96</td>
<td>11.14 &#x00B1; 0.84</td>
<td><bold>28.08 &#x00B1; 1.27</bold></td>
</tr>
<tr>
<td>0.1</td>
<td>13.50 &#x00B1; 0.48</td>
<td>12.97 &#x00B1; 0.45</td>
<td>7.42 &#x00B1; 8.74</td>
<td>13.42 &#x00B1; 0.51</td>
<td>25.70 &#x00B1; 0.70</td>
<td>16.8 &#x00B1; 0.36</td>
<td><bold>33.05 &#x00B1; 1.30</bold></td>
</tr>
<tr>
<td>0.3</td>
<td>16.72 &#x00B1; 0.25</td>
<td>16.20 &#x00B1; 0.41</td>
<td>14.13 &#x00B1; 12.09</td>
<td>16.71 &#x00B1; 0.40</td>
<td>28.02 &#x00B1; 1.32</td>
<td>20.27 &#x00B1; 0.78</td>
<td><bold>34.27 &#x00B1; 1.69</bold></td>
</tr>
<tr>
<td>0.5</td>
<td>19.04 &#x00B1; 0.18</td>
<td>18.67 &#x00B1; 0.34</td>
<td>19.95 &#x00B1; 10.99</td>
<td>19.16 &#x00B1; 0.26</td>
<td>29.82 &#x00B1; 2.14</td>
<td>22.82 &#x00B1; 1.37</td>
<td><bold>35.09 &#x00B1; 1.92</bold></td>
</tr>
<tr>
<td rowspan="4">FMNIST</td>
<td>0.05</td>
<td>81.29 &#x00B1; 0.10</td>
<td>80.88 &#x00B1; 0.38</td>
<td>81.51 &#x00B1; 0.18</td>
<td>81.19 &#x00B1; 0.21</td>
<td><bold>83.27 &#x00B1; 0.10</bold></td>
<td>81.15 &#x00B1; 0.80</td>
<td>82.73 &#x00B1; 0.16</td>
</tr>
<tr>
<td>0.1</td>
<td>81.43 &#x00B1; 0.21</td>
<td>81.07 &#x00B1; 0.11</td>
<td>81.70 &#x00B1; 0.15</td>
<td>81.38 &#x00B1; 0.25</td>
<td><bold>83.40 &#x00B1; 0.06</bold></td>
<td>81.85 &#x00B1; 0.83</td>
<td>82.07 &#x00B1; 0.37</td>
</tr>
<tr>
<td>0.3</td>
<td>82.16 &#x00B1; 0.17</td>
<td>82.04 &#x00B1; 0.19</td>
<td>82.53 &#x00B1; 0.13</td>
<td>82.26 &#x00B1; 0.13</td>
<td><bold>83.62 &#x00B1; 0.06</bold></td>
<td>83.08 &#x00B1; 0.24</td>
<td>82.96 &#x00B1; 0.20</td>
</tr>
<tr>
<td>0.5</td>
<td>82.35 &#x00B1; 0.18</td>
<td>82.31 &#x00B1; 0.12</td>
<td>82.64 &#x00B1; 0.16</td>
<td>82.41 &#x00B1; 0.12</td>
<td><bold>83.34 &#x00B1; 0.07</bold></td>
<td>82.79 &#x00B1; 0.33</td>
<td>82.62 &#x00B1; 0.36</td>
</tr>
<tr>
<td rowspan="4">FEMNIST</td>
<td>0.05</td>
<td>74.18 &#x00B1; 3.33</td>
<td>70.18 &#x00B1; 2.27</td>
<td>67.65 &#x00B1; 27.25</td>
<td>69.82 &#x00B1; 10.07</td>
<td>72.08 &#x00B1; 7.30</td>
<td>76.02 &#x00B1; 1.07</td>
<td><bold>82.60 &#x00B1; 0.96</bold></td>
</tr>
<tr>
<td>0.1</td>
<td>78.91 &#x00B1; 1.29</td>
<td>73.94 &#x00B1; 0.69</td>
<td>81.26 &#x00B1; 0.29</td>
<td>78.28 &#x00B1; 1.18</td>
<td>66.71 &#x00B1; 10.12</td>
<td>78.08 &#x00B1; 0.81</td>
<td><bold>83.45 &#x00B1; 0.60</bold></td>
</tr>
<tr>
<td>0.3</td>
<td>79.83 &#x00B1; 1.47</td>
<td>73.50 &#x00B1; 6.94</td>
<td>66.85 &#x00B1; 33.68</td>
<td>79.38 &#x00B1; 2.50</td>
<td>74.51 &#x00B1; 4.14</td>
<td>78.99 &#x00B1; 0.56</td>
<td><bold>84.05 &#x00B1; 0.33</bold></td>
</tr>
<tr>
<td>0.5</td>
<td>80.17 &#x00B1; 0.89</td>
<td>66.72 &#x00B1; 16.80</td>
<td>79.61 &#x00B1; 6.40</td>
<td>79.07 &#x00B1; 3.70</td>
<td>72.77 &#x00B1; 7.99</td>
<td>80.09 &#x00B1; 0.36</td>
<td><bold>84.04 &#x00B1; 0.46</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Comparative experiments demonstrate that the proposed CFIC algorithm significantly improves testing accuracy under non-IID conditions through its intra-cluster correction strategy. Although CFIC did not achieve the best accuracy on FMNIST, where FedDyn performed best, it outperformed all baseline algorithms on the remaining six benchmark datasets. For example, under Dir(0.1), CFIC achieved accuracies of 73.65%, 33.05%, and 83.45% on CIFAR-10, CIFAR-100, and FEMNIST, respectively, surpassing all baseline algorithms. Notably, CFIC maintains efficient convergence and superior accuracy even under extreme label imbalance. Although FedFa and FedMGDA&#x002B; generalize well in certain scenarios, their effectiveness diminishes significantly under severe data skew, particularly at small alpha values, where substantial performance degradation is observed.</p>
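<p>The size of the gap to the strongest baseline can be read directly from Table 2. The sketch below copies the mean accuracies of the Dir(0.1) rows for four datasets and computes, for each dataset, the best baseline and the difference between CFIC (&#x201C;Ours&#x201D;) and that baseline. Only the means are used; the standard deviations are ignored, and the helper name <monospace>margin_over_best_baseline</monospace> is hypothetical.</p>
<code language="python"><![CDATA[
```python
# Mean test accuracy (%) from the Dir(0.1) rows of Table 2.
TABLE2_DIR_0_1 = {
    "CIFAR-10": {"FedAvg": 44.94, "FedProx": 44.26, "FedFa": 50.03,
                 "Clustered Sampling": 45.58, "FedDyn": 61.09,
                 "FedMGDA+": 53.08, "Ours": 73.65},
    "CIFAR-100": {"FedAvg": 13.50, "FedProx": 12.97, "FedFa": 7.42,
                  "Clustered Sampling": 13.42, "FedDyn": 25.70,
                  "FedMGDA+": 16.8, "Ours": 33.05},
    "FEMNIST": {"FedAvg": 78.91, "FedProx": 73.94, "FedFa": 81.26,
                "Clustered Sampling": 78.28, "FedDyn": 66.71,
                "FedMGDA+": 78.08, "Ours": 83.45},
    "FMNIST": {"FedAvg": 81.43, "FedProx": 81.07, "FedFa": 81.70,
               "Clustered Sampling": 81.38, "FedDyn": 83.40,
               "FedMGDA+": 81.85, "Ours": 82.07},
}


def margin_over_best_baseline(row: dict) -> tuple:
    """Return (best baseline name, Ours minus best baseline) for one row."""
    baselines = {name: acc for name, acc in row.items() if name != "Ours"}
    best_name = max(baselines, key=baselines.get)
    return best_name, round(row["Ours"] - baselines[best_name], 2)


margins = {name: margin_over_best_baseline(row)
           for name, row in TABLE2_DIR_0_1.items()}
```
]]></code>
<p>For these rows, the margin is positive on CIFAR-10, CIFAR-100, and FEMNIST and negative on FMNIST, where FedDyn is the strongest baseline.</p>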
<p>(2) <bold>Communication efficiency</bold></p>
<p>To evaluate the communication efficiency of the CFIC algorithm, we measured the number of communication rounds each algorithm required to reach a target accuracy on each dataset. The target was the accuracy achieved by FedAvg in its last round. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the number of communication rounds each algorithm needed to reach this target; a value of &#x201C;0&#x201D; indicates that the algorithm failed to reach the target within the limited number of communication rounds. By incorporating an intra-cluster correction mechanism, CFIC optimizes the direction of global model updates and significantly reduces the number of communication rounds. This advantage was validated on seven datasets. Fewer rounds mean less frequent data transmission during federated learning, which reduces communication cost and latency. These results demonstrate that CFIC improves both model performance and overall system efficiency in federated learning.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Comparison of communication efficiency</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64103-fig-3.tif"/>
</fig>
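<p>The rounds-to-target metric described above can be sketched as follows. The function returns the first round at which an accuracy curve reaches the target and returns 0 when the target is never reached, matching the &#x201C;0&#x201D; convention of Fig. 3. Counting the first round that reaches the target (instead of requiring the accuracy to stay above it) is an assumption, and the accuracy curves are made-up values for illustration only, not experimental results.</p>
<code language="python"><![CDATA[
```python
from typing import Sequence


def rounds_to_target(accuracy_per_round: Sequence[float], target: float) -> int:
    """First 1-based round whose accuracy reaches the target, or 0 if none."""
    for round_idx, acc in enumerate(accuracy_per_round, start=1):
        if acc >= target:
            return round_idx
    return 0


# Hypothetical accuracy curves (percent) over four communication rounds.
fedavg_curve = [10.0, 20.0, 30.0, 40.0]
faster_curve = [15.0, 35.0, 45.0, 50.0]
slower_curve = [5.0, 10.0, 15.0, 20.0]

# The target is the accuracy of FedAvg in its last round.
target_accuracy = fedavg_curve[-1]
results = {
    "fedavg": rounds_to_target(fedavg_curve, target_accuracy),
    "faster": rounds_to_target(faster_curve, target_accuracy),
    "slower": rounds_to_target(slower_curve, target_accuracy),
}
```
]]></code>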
<p>(3) <bold>Client experiment</bold></p>
<p>To further validate the effectiveness of the CFIC algorithm in handling data heterogeneity, we conducted experiments on the CIFAR-10 dataset with two client counts (20 and 500) in a Dir(0.05) heterogeneous environment. The results, shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, indicate that CFIC adapts to large-scale federated learning environments and maintains high accuracy and stability at both scales. Specifically, with 20 clients, CFIC achieved an accuracy of approximately 37.19%; with 500 clients, the accuracy reached 52.85%, surpassing the other algorithms. These outcomes support CFIC&#x2019;s superior performance in non-IID data settings and highlight its reliability and scalability under different scales and extreme data heterogeneity.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Comparison of accuracy under different numbers of clients</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64103-fig-4.tif"/>
</fig>
<p>(4) <bold>Ablation experiment</bold></p>
<p>The CFIC algorithm consists of two main parts: inter-cluster sampling and intra-cluster correction. In the inter-cluster sampling phase, members are sampled from each cluster. This reduces the impact of non-IID data among clients, thereby improving communication efficiency and the consistency of model training. In the model aggregation phase, the global model is adjusted through an intra-cluster correction strategy, which enhances its generalization performance. This phase critically affects the model&#x2019;s ability to handle heterogeneous data. To validate the contribution of each part, we removed each of the two core components in turn and evaluated the impact on model performance. The experimental results, shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, indicate that removing either component leads to a significant performance drop on both the CIFAR-100 and SVHN datasets.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Comparison of accuracy in the ablation experiment. IC is the abbreviation for intra-cluster correction, and IS is the abbreviation for inter-cluster sampling</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64103-fig-5.tif"/>
</fig>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This paper presents CFIC, a cluster-based federated learning algorithm with intra-cluster correction that enhances the generalization ability of models in non-IID data scenarios through clustering based on label distribution characteristics and global model weight adjustment. To validate the effectiveness of the CFIC algorithm, extensive experiments were conducted. The results demonstrate that CFIC has significant advantages in heterogeneous data scenarios and maintains high accuracy even under extreme label shift, with minimal impact from data heterogeneity. However, certain limitations must be acknowledged. First, although incorporating momentum helps improve stability during model training, research on optimizing the momentum calculation to further enhance algorithm performance is still insufficient. Future work could explore more dynamic and adaptive momentum adjustment strategies. For example, automatically tuning momentum parameters based on changes in client data distribution or model update gradients could achieve better convergence speed and generalization ability. Second, while this study uses a label distribution feature extraction function for client clustering, the effectiveness of such feature extraction functions under various types of label distribution skew, and their relationship with model performance, has not been thoroughly examined. Future work should therefore develop feature representation methods that adapt to multiple label skew patterns and clarify the impact of the feature extraction mechanism on the overall performance of the federated learning system through theoretical analysis and experimental evidence. Finally, the scale and complexity of the clients in the experimental scenarios do not match real-world applications. For instance, practical deployments often involve tens of thousands of edge devices or more, and the computing and communication conditions of clients may be affected by many uncertain factors. Therefore, future work should be conducted in realistic, complex application scenarios to further verify the robustness and scalability of the CFIC algorithm.</p>
</sec>
</body>
<back>
<ack>
<p>None.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This work was supported by the National Natural Science Foundation of China under Grant No. 62277043 and the Science and Technology Research Project of Chongqing Education Commission under Grant No. KJZD-K202300515.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>Yunong Yang: Conceptualization, funding acquisition, methodology, writing&#x2014;original draft; Long Ma: data curation, formal analysis, investigation, software, writing&#x2014;original draft, validation; Liang Fan: resources, supervision, validation; Tao Xie: conceptualization, funding acquisition, methodology, writing&#x2014;review and editing, supervision. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>Data supporting the results presented in this article are available from the corresponding author upon reasonable request.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sery</surname> <given-names>T</given-names></string-name>, <string-name><surname>Shlezinger</surname> <given-names>N</given-names></string-name>, <string-name><surname>Cohen</surname> <given-names>K</given-names></string-name>, <string-name><surname>Eldar</surname> <given-names>YC</given-names></string-name></person-group>. <article-title>Over-the-air federated learning from heterogeneous data</article-title>. <source>IEEE Trans Signal Process</source>. <year>2021</year>;<volume>69</volume>:<fpage>3796</fpage>&#x2013;<lpage>811</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tsp.2021.3090323</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kairouz</surname> <given-names>P</given-names></string-name>, <string-name><surname>McMahan</surname> <given-names>HB</given-names></string-name>, <string-name><surname>Avent</surname> <given-names>B</given-names></string-name>, <string-name><surname>Bellet</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bennis</surname> <given-names>M</given-names></string-name>, <string-name><surname>Bhagoji</surname> <given-names>AN</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Advances and open problems in federated learning</article-title>. <source>Found Trends&#x00AE; Mach Learn</source>. <year>2021</year>;<volume>14</volume>(<issue>1&#x2013;2</issue>):<fpage>1</fpage>&#x2013;<lpage>210</lpage>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Bonawitz</surname> <given-names>K</given-names></string-name>, <string-name><surname>Ivanov</surname> <given-names>V</given-names></string-name>, <string-name><surname>Kreuter</surname> <given-names>B</given-names></string-name>, <string-name><surname>Marcedone</surname> <given-names>A</given-names></string-name>, <string-name><surname>McMahan</surname> <given-names>HB</given-names></string-name>, <string-name><surname>Patel</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Practical secure aggregation for privacy-preserving machine learning</article-title>. In: <conf-name>Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security</conf-name>; <year>2017 Oct 30&#x2013;Nov 3</year>; <publisher-loc>Dallas, TX, USA</publisher-loc>. p. <fpage>1175</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.1145/3133956.3133982</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>McMahan</surname> <given-names>B</given-names></string-name>, <string-name><surname>Moore</surname> <given-names>E</given-names></string-name>, <string-name><surname>Ramage</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hampson</surname> <given-names>S</given-names></string-name>, <string-name><surname>Arcas</surname> <given-names>BA</given-names></string-name></person-group>. <article-title>Communication-efficient learning of deep networks from decentralized data</article-title>. In: <conf-name>Proceedings of the Artificial Intelligence and Statistics</conf-name>; <year>2017 Apr 20&#x2013;22</year>; <publisher-loc>Fort Lauderdale, FL, USA</publisher-loc>. p. <fpage>1273</fpage>&#x2013;<lpage>82</lpage>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Karimireddy</surname> <given-names>SP</given-names></string-name>, <string-name><surname>Kale</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mohri</surname> <given-names>M</given-names></string-name>, <string-name><surname>Reddi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Stich</surname> <given-names>S</given-names></string-name>, <string-name><surname>Suresh</surname> <given-names>AT</given-names></string-name></person-group>. <article-title>SCAFFOLD: stochastic controlled averaging for federated learning</article-title>. In: <conf-name>Proceedings of the International Conference on Machine Learning</conf-name>; <year>2020 Jul 13&#x2013;18</year>. p. <fpage>5132</fpage>&#x2013;<lpage>43</lpage>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Diao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Q</given-names></string-name>, <string-name><surname>He</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Federated learning on non-IID data silos: an experimental study</article-title>. In: <conf-name>Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE)</conf-name>; <year>2022 May 9&#x2013;12</year>; <publisher-loc>Piscataway, NJ, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>. p. <fpage>965</fpage>&#x2013;<lpage>78</lpage>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Duan</surname> <given-names>M</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ji</surname> <given-names>X</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>X</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>FedGroup: efficient clustered federated learning via decomposed data-driven measure</article-title>. <comment>arXiv:2010.06870. 2020</comment>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lam</surname> <given-names>KY</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Dynamic user clustering for efficient and privacy-preserving federated learning</article-title>. <source>IEEE Trans Dependable Secur Comput</source>. <year>2024</year>;<fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tdsc.2024.3355458</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Tsang</surname> <given-names>DHK</given-names></string-name></person-group>. <article-title>FedFA: federated learning with feature anchors to align features and classifiers for heterogeneous data</article-title>. <source>IEEE Trans Mob Comput</source>. <year>2023</year>;<volume>23</volume>(<issue>6</issue>):<fpage>6731</fpage>&#x2013;<lpage>42</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmc.2023.3325366</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>W</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>R</given-names></string-name>, <string-name><surname>Jin</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>R</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>FedAlign: federated model alignment via data-free knowledge distillation for machine fault diagnosis</article-title>. <source>IEEE Trans Instrum Meas</source>. <year>2023</year>;<volume>73</volume>:<fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tim.2023.3345910</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>W</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>K</given-names></string-name>, <string-name><surname>Shao</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Personalized federated learning via variational Bayesian inference</article-title>. In: <conf-name>Proceedings of the International Conference on Machine Learning</conf-name>; <year>2022 Jul 17&#x2013;23</year>. p. <fpage>26293</fpage>&#x2013;<lpage>310</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kotelevskii</surname> <given-names>N</given-names></string-name>, <string-name><surname>Vono</surname> <given-names>M</given-names></string-name>, <string-name><surname>Durmus</surname> <given-names>A</given-names></string-name>, <string-name><surname>Moulines</surname> <given-names>E</given-names></string-name></person-group>. <article-title>FedPop: a Bayesian approach for personalised federated learning</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2022</year>;<volume>35</volume>:<fpage>8687</fpage>&#x2013;<lpage>701</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>X</given-names></string-name>, <string-name><surname>Blaschko</surname> <given-names>MB</given-names></string-name></person-group>. <article-title>Confidence-aware personalized federated learning via variational expectation maximization</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>; <year>2023 Jun 17&#x2013;24</year>; <publisher-loc>Vancouver, BC, Canada</publisher-loc>. p. <fpage>24542</fpage>&#x2013;<lpage>51</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>T</given-names></string-name>, <string-name><surname>Sahu</surname> <given-names>AK</given-names></string-name>, <string-name><surname>Talwalkar</surname> <given-names>A</given-names></string-name>, <string-name><surname>Smith</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Federated learning: challenges, methods, and future directions</article-title>. <source>IEEE Signal Process Mag</source>. <year>2020</year>;<volume>37</volume>(<issue>3</issue>):<fpage>50</fpage>&#x2013;<lpage>60</lpage>. doi:<pub-id pub-id-type="doi">10.1109/msp.2020.2975749</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yan</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wicaksana</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>KT</given-names></string-name></person-group>. <article-title>Variation-aware federated learning with multi-source decentralized medical image data</article-title>. <source>IEEE J Biomed Health Inform</source>. <year>2020</year>;<volume>25</volume>(<issue>7</issue>):<fpage>2615</fpage>&#x2013;<lpage>28</lpage>. doi:<pub-id pub-id-type="doi">10.1109/jbhi.2020.3040015</pub-id>; <pub-id pub-id-type="pmid">33232246</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>X</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>W</given-names></string-name>, <string-name><surname>She</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>KIK</given-names></string-name></person-group>. <article-title>Two-layer federated learning with heterogeneous model aggregation for 6G supported internet of vehicles</article-title>. <source>IEEE Trans Veh Technol</source>. <year>2021</year>;<volume>70</volume>(<issue>6</issue>):<fpage>5308</fpage>&#x2013;<lpage>17</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tvt.2021.3077893</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ma</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>A state-of-the-art survey on solving non-IID data in federated learning</article-title>. <source>Future Gener Comput Syst</source>. <year>2022</year>;<volume>135</volume>(<issue>3</issue>):<fpage>244</fpage>&#x2013;<lpage>58</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.future.2022.05.003</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Tao</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Enhancing dropout prediction in distributed educational data using learning pattern awareness: a federated learning approach</article-title>. <source>Mathematics</source>. <year>2023</year>;<volume>11</volume>(<issue>24</issue>):<fpage>4977</fpage>. doi:<pub-id pub-id-type="doi">10.3390/math11244977</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chu</surname> <given-names>YW</given-names></string-name>, <string-name><surname>Hosseinalipour</surname> <given-names>S</given-names></string-name>, <string-name><surname>Tenorio</surname> <given-names>E</given-names></string-name>, <string-name><surname>Cruz</surname> <given-names>L</given-names></string-name>, <string-name><surname>Douglas</surname> <given-names>K</given-names></string-name>, <string-name><surname>Lan</surname> <given-names>AS</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Multi-layer personalized federated learning for mitigating biases in student predictive analytics</article-title>. <source>IEEE Trans Emerg Top Comput</source>. <year>2024</year>;<fpage>1</fpage>&#x2013;<lpage>15</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tetc.2024.3407716</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>T</given-names></string-name>, <string-name><surname>Sahu</surname> <given-names>AK</given-names></string-name>, <string-name><surname>Zaheer</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sanjabi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Talwalkar</surname> <given-names>A</given-names></string-name>, <string-name><surname>Smith</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Federated optimization in heterogeneous networks</article-title>. <source>Proc Mach Learn Syst</source>. <year>2020</year>;<volume>2</volume>:<fpage>429</fpage>&#x2013;<lpage>50</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Q</given-names></string-name>, <string-name><surname>He</surname> <given-names>B</given-names></string-name>, <string-name><surname>Song</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Model-contrastive federated learning</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>; <year>2021 Jun 20&#x2013;25</year>; <publisher-loc>Nashville, TN, USA</publisher-loc>. p. <fpage>10713</fpage>&#x2013;<lpage>22</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Federated learning with label distribution skew via logits calibration</article-title>. In: <conf-name>Proceedings of the International Conference on Machine Learning</conf-name>; <year>2022 Jul 17&#x2013;23</year>. p. <fpage>26311</fpage>&#x2013;<lpage>29</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Joshi</surname> <given-names>G</given-names></string-name>, <string-name><surname>Poor</surname> <given-names>HV</given-names></string-name></person-group>. <article-title>Tackling the objective inconsistency problem in heterogeneous federated optimization</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>7611</fpage>&#x2013;<lpage>23</lpage>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Gao</surname> <given-names>D</given-names></string-name>, <string-name><surname>Yao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>A survey on heterogeneous federated learning</article-title>. <comment>arXiv:2210.04505. 2022</comment>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Towards efficient scheduling of federated mobile devices under computational and statistical heterogeneity</article-title>. <source>IEEE Trans Parallel Distrib Syst</source>. <year>2020</year>;<volume>32</volume>(<issue>2</issue>):<fpage>394</fpage>&#x2013;<lpage>410</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tpds.2020.3023905</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Madni</surname> <given-names>HA</given-names></string-name>, <string-name><surname>Umer</surname> <given-names>RM</given-names></string-name>, <string-name><surname>Foresti</surname> <given-names>GL</given-names></string-name></person-group>. <article-title>Federated learning for data and model heterogeneity in medical imaging</article-title>. In: <conf-name>Proceedings of the International Conference on Image Analysis and Processing</conf-name>; <year>2023 Sep 11&#x2013;15</year>; <publisher-loc>Berlin/Heidelberg, Germany</publisher-loc>: <publisher-name>Springer</publisher-name>; 2023. p. <fpage>167</fpage>&#x2013;<lpage>78</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>N</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>L</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>R</given-names></string-name>, <string-name><surname>Chai</surname> <given-names>R</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Digital twin-assisted knowledge distillation framework for heterogeneous federated learning</article-title>. <source>China Commun</source>. <year>2023</year>;<volume>20</volume>(<issue>2</issue>):<fpage>61</fpage>&#x2013;<lpage>78</lpage>. doi:<pub-id pub-id-type="doi">10.23919/jcc.2023.02.005</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ouyang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xing</surname> <given-names>G</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>ClusterFL: a clustering-based federated learning system for human activity recognition</article-title>. <source>ACM Trans Sens Netw</source>. <year>2022</year>;<volume>19</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>32</lpage>. doi:<pub-id pub-id-type="doi">10.1145/3554980</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Vahidian</surname> <given-names>S</given-names></string-name>, <string-name><surname>Morafah</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Kungurtsev</surname> <given-names>V</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>C</given-names></string-name>, <string-name><surname>Shah</surname> <given-names>M</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Efficient distribution similarity identification in clustered federated learning via principal angles between client data subspaces</article-title>. In: <conf-name>Proceedings of the AAAI Conference on Artificial Intelligence</conf-name>; <year>2023 Feb 7&#x2013;14</year>; <publisher-loc>Washington, DC, USA</publisher-loc>. p. <fpage>10043</fpage>&#x2013;<lpage>52</lpage>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Clustered federated learning based on nonconvex pairwise fusion</article-title>. <source>Inf Sci</source>. <year>2024</year>;<volume>678</volume>:<fpage>120956</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ins.2024.120956</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Auction-based cluster federated learning in mobile edge computing systems</article-title>. <source>IEEE Trans Parallel Distrib Syst</source>. <year>2023</year>;<volume>34</volume>(<issue>4</issue>):<fpage>1145</fpage>&#x2013;<lpage>58</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tpds.2023.3240767</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sattler</surname> <given-names>F</given-names></string-name>, <string-name><surname>M&#x00FC;ller</surname> <given-names>KR</given-names></string-name>, <string-name><surname>Samek</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Clustered federated learning: model-agnostic distributed multitask optimization under privacy constraints</article-title>. <source>IEEE Trans Neural Netw Learn Syst</source>. <year>2020</year>;<volume>32</volume>(<issue>8</issue>):<fpage>3710</fpage>&#x2013;<lpage>22</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tnnls.2020.3015958</pub-id>; <pub-id pub-id-type="pmid">32833654</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ghosh</surname> <given-names>A</given-names></string-name>, <string-name><surname>Chung</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yin</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ramchandran</surname> <given-names>K</given-names></string-name></person-group>. <article-title>An efficient framework for clustered federated learning</article-title>. <source>IEEE Trans Inf Theory</source>. <year>2022</year>;<volume>68</volume>(<issue>12</issue>):<fpage>8076</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tit.2022.3192506</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Long</surname> <given-names>G</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>M</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Multi-center federated learning: clients clustering for better personalization</article-title>. <source>World Wide Web</source>. <year>2023</year>;<volume>26</volume>(<issue>1</issue>):<fpage>481</fpage>&#x2013;<lpage>500</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11280-022-01046-x</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Ye</surname> <given-names>R</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>FedDisco: federated learning with discrepancy-aware collaboration</article-title>. <comment>arXiv:2305.19229. 2023</comment>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Nabavirazavi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Taheri</surname> <given-names>R</given-names></string-name>, <string-name><surname>Shojafar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Iyengar</surname> <given-names>SS</given-names></string-name></person-group>. <article-title>Impact of aggregation function randomization against model poisoning in federated learning</article-title>. In: <conf-name>Proceedings of the 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom</conf-name>; <year>2023 Nov 1&#x2013;3</year>; <publisher-loc>Exeter, UK. Piscataway, NJ, USA</publisher-loc>: <publisher-name>Institute of Electrical and Electronics Engineers Inc.</publisher-name>; 2024. p. <fpage>165</fpage>&#x2013;<lpage>72</lpage>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Taheri</surname> <given-names>R</given-names></string-name>, <string-name><surname>Arabikhan</surname> <given-names>F</given-names></string-name>, <string-name><surname>Gegov</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Robust aggregation function in federated learning</article-title>. In: <conf-name>Proceedings of the International Conference on Information and Knowledge Systems</conf-name>; <year>2023 Jun 22&#x2013;23</year>; <publisher-loc>Portsmouth, UK. Cham, Switzerland</publisher-loc>: <publisher-name>Springer Nature</publisher-name>; 2023. p. <fpage>168</fpage>&#x2013;<lpage>75</lpage>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>LeCun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bottou</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Haffner</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Gradient-based learning applied to document recognition</article-title>. <source>Proc IEEE</source>. <year>1998</year>;<volume>86</volume>(<issue>11</issue>):<fpage>2278</fpage>&#x2013;<lpage>324</lpage>. doi:<pub-id pub-id-type="doi">10.1109/5.726791</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Krizhevsky</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hinton</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Learning multiple layers of features from tiny images</article-title>; <year>2009 [cited 2025 May 18]</year>. Available from: <ext-link ext-link-type="uri" xlink:href="https://www.cs.utoronto.ca/&#x007E;kriz/learning-features-2009-TR.pdf">https://www.cs.utoronto.ca/&#x007E;kriz/learning-features-2009-TR.pdf</ext-link>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Cohen</surname> <given-names>G</given-names></string-name>, <string-name><surname>Afshar</surname> <given-names>S</given-names></string-name>, <string-name><surname>Tapson</surname> <given-names>J</given-names></string-name>, <string-name><surname>van Schaik</surname> <given-names>A</given-names></string-name></person-group>. <article-title>EMNIST: an extension of MNIST to handwritten letters</article-title>. <comment>arXiv:1702.05373. 2017</comment>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Netzer</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Coates</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bissacco</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Ng</surname> <given-names>AY</given-names></string-name></person-group>. <article-title>Reading digits in natural images with unsupervised feature learning</article-title>. In: <conf-name>Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning</conf-name>; <year>2011 Dec 12&#x2013;17</year>; <publisher-loc>Granada, Spain</publisher-loc>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Acar</surname> <given-names>DAE</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Navarro</surname> <given-names>RM</given-names></string-name>, <string-name><surname>Mattina</surname> <given-names>M</given-names></string-name>, <string-name><surname>Whatmough</surname> <given-names>PN</given-names></string-name>, <string-name><surname>Saligrama</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Federated learning based on dynamic regularization</article-title>. <comment>arXiv:2111.04263. 2021</comment>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Huang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Li</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>D</given-names></string-name>, <string-name><surname>Du</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Fairness and accuracy in federated learning</article-title>. <comment>arXiv:2012.10069. 2020</comment>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Shaloudegi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>G</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Federated learning meets multi-objective optimization</article-title>. <source>IEEE Trans Netw Sci Eng</source>. <year>2022</year>;<volume>9</volume>(<issue>4</issue>):<fpage>2039</fpage>&#x2013;<lpage>51</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tnse.2022.3169117</pub-id>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Fraboni</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Vidal</surname> <given-names>R</given-names></string-name>, <string-name><surname>Kameni</surname> <given-names>L</given-names></string-name>, <string-name><surname>Lorenzi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Clustered sampling: low-variance and improved representativity for clients selection in federated learning</article-title>. In: <conf-name>Proceedings of the International Conference on Machine Learning</conf-name>; <year>2021 Jul 18&#x2013;24</year>. p. <fpage>3407</fpage>&#x2013;<lpage>16</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>