<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">71532</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.071532</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Multi-Criteria Discovery of Communities in Social Networks Based on Services</article-title>
<alt-title alt-title-type="left-running-head">Multi-Criteria Discovery of Communities in Social Networks Based on Services</alt-title>
<alt-title alt-title-type="right-running-head">Multi-Criteria Discovery of Communities in Social Networks Based on Services</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Boudjebbour</surname><given-names>Karim</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Belkhir</surname><given-names>Abdelkader</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Kheddar</surname><given-names>Hamza</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><xref rid="cor1" ref-type="corresp">&#x002A;</xref><email>kheddar.hamza@univ-medea.dz</email></contrib>
<aff id="aff-1"><label>1</label><institution>Laboratory of Computer Systems, University of Sciences and Technology Houari Boumediene (USTHB)</institution>, <addr-line>Algiers, 16009</addr-line>, <country>Algeria</country></aff>
<aff id="aff-2"><label>2</label><institution>Laboratory of Advanced Electronic Systems (LSEA), University of Medea</institution>, <addr-line>Medea, 26000</addr-line>, <country>Algeria</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Hamza Kheddar. Email: <email>kheddar.hamza@univ-medea.dz</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2026</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>12</day><month>1</month><year>2026</year>
</pub-date>
<volume>86</volume>
<issue>3</issue>
<elocation-id>39</elocation-id>
<history>
<date date-type="received">
<day>06</day>
<month>08</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>10</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_71532.pdf"></self-uri>
<abstract>
<p>Identifying the community structure of complex networks is crucial to extracting insights and understanding network properties. Although several community detection methods have been proposed, many are unsuitable for social networks due to significant limitations. Specifically, most approaches depend mainly on user&#x2013;user structural links while overlooking service-centric, semantic, and multi-attribute drivers of community formation, and they also lack flexible filtering mechanisms for large-scale, service-oriented settings. Our proposed approach, called community discovery-based service (CDBS), leverages user profiles and their interactions with consulted web services. The method introduces a novel similarity measure, global similarity interaction profile (GSIP), which goes beyond typical similarity measures by unifying user and service profiles across all attribute types into a coherent representation. It applies multiple filtering criteria related to user attributes, accessed services, and interaction patterns. Experimental comparisons against Louvain, Hierarchical Agglomerative Clustering, Label Propagation, and Infomap show that CDBS performs best, achieving 0.74 modularity, 0.13 conductance, 0.77 coverage, and a fast response time of 9.8 s, even with 10,000 users and 400 services. Moreover, CDBS consistently detects a larger number of communities with distinct topics of interest, underscoring its capacity to generate detailed and efficient structures in complex networks. These results confirm both the efficiency and effectiveness of the proposed method. Beyond controlled evaluation, CDBS is applicable to targeted recommendations, group-oriented marketing, access control, and service personalization, where communities are shaped not only by user links but also by service engagement.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Social network</kwd>
<kwd>community discovery</kwd>
<kwd>complex network</kwd>
<kwd>clustering</kwd>
<kwd>web services</kwd>
<kwd>similarity measure</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Community discovery is a long-standing challenge at the heart of complex network research, with successful applications spanning various scientific fields such as physics, biology, sociology, social sciences, mathematics, and computer science. Research efforts have been devoted to developing methods and algorithms that can effectively reveal the hidden community structure of a network [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
<p>Over the last decade the literature has advanced along several complementary directions. Classical topology-based algorithms and modularity-maximization methods emphasize structural cohesion but ignore node attributes and semantics [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-3">3</xref>]. Spectral and embedding approaches incorporate attribute or typed-edge information via low-dimensional representations, improving detection when rich node features exist but often increasing computational cost and sensitivity to class imbalance [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>]. Probabilistic and generative models tackle edge formation and attribute likelihoods in a principled way (e.g., reciprocity-aware models), yet they commonly assume binary or stationary networks and face scalability limits [<xref ref-type="bibr" rid="ref-5">5</xref>]. Hybrid content&#x002B;structure methods that add textual or topical signals recover semantically coherent groups but tend to be computationally intensive and domain-sensitive [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-7">7</xref>]. Other research streams explore graph compression and coarsening for large networks [<xref ref-type="bibr" rid="ref-8">8</xref>], random-walk/evolutionary optimisers [<xref ref-type="bibr" rid="ref-9">9</xref>], and motif/hypergraph refinements and Ricci-flow approaches to capture higher-order structure [<xref ref-type="bibr" rid="ref-10">10</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>]; each reduces specific failure modes but introduces trade-offs in cost, parameter sensitivity, or restricted applicability. 
Across these strands, the recurring gaps are clear: many methods (i) rely mainly on user&#x2013;user topology and thus miss cohorts formed by shared service usage, for example, customers of the same product or students of the same course [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>], (ii) do not natively handle heterogeneous, multi-valued, and semantic attributes together, (iii) provide limited task-driven filtering or interpretability, and (iv) face a scalability vs. expressiveness trade-off that complicates deployment on service-rich platforms. These open problems motivate a service-aware, multi-criteria, and interpretable discovery framework. To address these shortcomings, the proposed community discovery-based service (CDBS) approach explicitly constructs a user&#x2013;service bipartite view and introduces the global similarity interaction profile (GSIP): a single, interpretable, multi-criteria similarity that fuses numeric, categorical, multi-valued, access-intensity, and semantic/textual signals. Unlike standard measures such as Jaccard or cosine, GSIP supports attribute weighting, tolerates missing values without heavy preprocessing, and provides per-attribute interpretability, enabling discovery of semantically coherent communities even when direct user&#x2013;user ties are weak or absent.</p>
<p>To summarize, this paper introduces CDBS, a novel approach that detects communities from user interactions with web services. The main contributions of the proposed CDBS are as follows:
<list list-type="bullet">
<list-item>
<p>The CDBS framework for service-aware community discovery that integrates user profiles, service profiles, and access interactions with flexible multi-criteria filtering.</p></list-item>
<list-item>
<p>To the best of the authors&#x2019; knowledge, GSIP is introduced here for the first time as a novel similarity metric that handles heterogeneous attribute types (including semantic text via a linked open data (LOD)-based similarity measure), supports attribute weighting, and tolerates missing values without costly transformations.</p></list-item>
<list-item>
<p>Efficient graph-construction and filtering procedures that preserve interpretability (communities tied to service domains) and improve scalability for large datasets.</p></list-item>
<list-item>
<p>A comprehensive experimental evaluation and comparison with established baselines (Louvain, hierarchical agglomerative clustering (HAC), Label Propagation, and Infomap), demonstrating improved discovery granularity, stronger modularity, and competitive runtime on large scenarios.</p></list-item>
</list></p>
<p>By bridging these dimensions, CDBS provides a more realistic framework for community discovery in modern social platforms, especially in online social networks.</p>
<p>The remainder of the paper is structured as follows. <xref ref-type="sec" rid="s2">Section 2</xref> reviews related work, showing how research in this area has evolved and how the existing approaches differ. <xref ref-type="sec" rid="s3">Section 3</xref> details the architecture and processes of the proposed CDBS approach. In <xref ref-type="sec" rid="s4">Section 4</xref>, the approach is implemented on a platform to observe its actual results and to compare it with robust algorithms in the field of community discovery. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper and outlines some perspectives.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>With the development of service computing, discovering communities in social networks from users&#x2019; interactions is becoming a significant research topic. The works [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>] provide comprehensive reviews of community detection methods. Reference [<xref ref-type="bibr" rid="ref-13">13</xref>] presents a classification of community detection algorithms in social networks, evaluates their performance on real datasets, and highlights future research paths. The research in [<xref ref-type="bibr" rid="ref-14">14</xref>] focuses on quality metrics for community detection, discussing limitations in current metrics and the need for more robust ones, especially for handling small communities and scalability challenges in large networks.</p>
<p>In the context of <italic>community discovery</italic>, Akachar et al. in [<xref ref-type="bibr" rid="ref-6">6</xref>] propose a method for community discovery in social networks that integrates both content and structural information. It exploits user-generated texts to determine topics of interest using statistical and semantic metrics such as the Chir statistic and conditional mutual information. These topics are used to divide users into distinct groups, each representing a specific topic. Within each group, a static algorithm like Louvain is applied to detect tightly interconnected users. This hybrid approach enhances detection accuracy, but may struggle with vague or domain-specific terms and can be computationally intensive due to the need for semantic analysis and clustering of large amounts of text.</p>
<p>In the context of <italic>community detection</italic>, research has explored diverse strategies to balance scalability, accuracy, and adaptability. Zhao et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] introduced a graph compression approach that reduces network size by merging low-degree vertices, offering competitive performance on large-scale networks. However, its focus on undirected graphs and the loss of structural nuances limit its applicability to more complex scenarios. Alfaqeeh and Skillicorn [<xref ref-type="bibr" rid="ref-4">4</xref>] advanced this concept by embedding typed graphs to unify multiple attribute modalities; while effective on large, rich datasets like Instagram, this method incurs high computational overhead and may underperform on smaller or imbalanced data.</p>
<p>Other studies shifted toward probabilistic and attribute-aware models. Contisciani et al. [<xref ref-type="bibr" rid="ref-5">5</xref>] developed a probabilistic generative model leveraging reciprocity, achieving strong edge prediction but remaining constrained to binary networks. Cai et al. [<xref ref-type="bibr" rid="ref-7">7</xref>] addressed structural and attribute integration through a novel similarity measure, termed an importance score, which reflects the density around each node. This approach improves detection accuracy but comes at the cost of computational scalability, particularly for large networks or noisy attributes.</p>
<p>Amin et al. [<xref ref-type="bibr" rid="ref-15">15</xref>] introduced a global-local model with Eigen-based influential node detection and label propagation, eliminating the need for predefined parameters. Although efficient in many scenarios, the randomness inherent to label propagation introduces instability and reduces accuracy for overlapping communities in large networks. Similarly, Dabaghi-Zarandi and KamaliPour [<xref ref-type="bibr" rid="ref-16">16</xref>] proposed a hybrid local-global approach for community merging, which improved modularity and density metrics but relied heavily on fixed thresholds, making it less flexible for networks with overlapping or ambiguous boundaries.</p>
<p>Recent approaches have also combined structural, interaction-based, and topological refinements to improve accuracy and robustness. Sayari et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] introduced a robust community detection method combining user interaction degree, structural similarity, and frequent pattern mining. Experiments showed superior accuracy, robustness, and performance compared to five existing approaches. However, this method is sensitive to parameter settings, and assumes static networks. Later, the same authors proposed CDILPV [<xref ref-type="bibr" rid="ref-18">18</xref>], a robust community detection method that integrates user interactions and structural measures, introducing vertical pattern mining and a hybrid metric for direct and indirect membership calculation to generate denser and more resilient communities, outperforming existing approaches in accuracy and robustness. Nonetheless, CDILPV remains limited to static snapshots, underexplored across diverse domains, and dependent on high-quality interaction data. Dabaghi-Zarandi et al. [<xref ref-type="bibr" rid="ref-9">9</xref>] employed random walks and evolutionary optimization to refine partitions, achieving higher modularity and NMI scores. Although both methods [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>] enhance detection performance, they remain computationally demanding and parameter-sensitive, with the former further constrained by its reliance on large-scale interaction data and the latter restricted to non-overlapping communities. Likewise, Karampour et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] applied discrete Ricci flow with spatial&#x2013;temporal features to capture geometric and fine-grained patterns, while Madhulika et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] stabilized label propagation using motif-based hypergraph reweighting and similarity-driven propagation. 
These methods achieve high modularity, robustness, and stability but remain computationally intensive and less scalable, especially in large or sparse networks. Finally, Khawaja et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] proposed a common-neighbor based refinement for overlapping community detection, improving flexibility and robustness in detecting subtle overlaps, though it is sensitive to threshold tuning and computationally costly due to repeated similarity calculations.</p>
<p>The works discussed above explore various community detection methods, each exhibiting varying degrees of efficiency depending on specific application criteria, including: (1) Community meaning: the ability of the approach to uncover semantically meaningful themes or topics within detected communities; (2) Node influence: the capacity to recognize key or influential nodes that play central roles within their respective communities; (3) Detection filters: the use of filtering mechanisms that enhance the precision and selectivity of the detection process; (4) Overlapping communities: the capability to detect nodes that belong to multiple communities simultaneously, reflecting complex real-world relationships; (5) Outlier tolerance: the robustness of the method in handling noisy or anomalous data without degrading the overall detection quality; (6) Computational efficiency: how quickly a method can discover communities, which is particularly crucial in time-sensitive applications; (7) Structural adaptability: the effectiveness of the method across diverse network types, including social, biological, and information networks; (8) Low complexity: the scalability of the approach in terms of handling networks of varying sizes and densities with minimal computational resources.</p>
<p><xref ref-type="table" rid="table-1">Table 1</xref> indicates that CDBS satisfies most of these criteria, especially the use of filters, the detection of overlapping communities, outlier handling, efficient execution, and low complexity, although it requires further development to support all network structures. This reflects a well-rounded approach that balances performance, community structure detail, and computational efficiency.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Comparison with selected related works. A tick mark (<inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula>) indicates that a criterion is met, whereas a cross mark (&#x00D7;) indicates that it is not</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Criteria</th>
<th>[<xref ref-type="bibr" rid="ref-4">4</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-8">8</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-6">6</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-5">5</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-7">7</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-15">15</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-16">16</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-17">17</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-18">18</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-9">9</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-10">10</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-19">19</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-11">11</xref>]</th>
<th>CDBS (Ours)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Community meaning (topic)</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
</tr>
<tr>
<td>Node influence (leading members)</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
</tr>
<tr>
<td>Detection filters</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
</tr>
<tr>
<td>Overlapping communities</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
</tr>
<tr>
<td>Outlier tolerance</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
</tr>
<tr>
<td>Computational efficiency</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
</tr>
<tr>
<td>Structural adaptability</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>(Future work)</td>
</tr>
<tr>
<td>Low complexity</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
<td>&#x00D7;</td>
<td>&#x2713;</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3">
<label>3</label>
<title>CDBS Methodology</title>
<p>Service providers publish many services in the Universal Description, Discovery, and Integration (UDDI) registry, where they are accessible to users. Users search this registry for services that meet their requests. Once a service is selected, a link is established between the service and the user, meaning that the user accesses this service. Our solution aims to discover communities based on user interactions with services, and it also considers various criteria to filter this relationship. The interaction between services and users is modeled as a bipartite graph (also known as a two-mode graph), in which the first subset of vertices corresponds to user profiles, while the second subset represents service profiles. Each edge in the graph connects a service to a user if the user accesses the service.</p>
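<p>As a minimal illustration of the bipartite user&#x2013;service graph described above, the following Python sketch builds both vertex subsets from a list of access records. The function names, identifiers, and sample records are hypothetical and are not taken from the CDBS implementation; the sketch only shows the data structure, not the filtering or community detection stages.</p>
<code language="python"><![CDATA[
```python
from collections import defaultdict


def build_bipartite_graph(access_records):
    """Build both adjacency maps of a user-service bipartite graph.

    access_records: iterable of (user_id, service_id) pairs, one per access.
    Returns (user_to_services, service_to_users) as dicts of sets.
    """
    user_to_services = defaultdict(set)
    service_to_users = defaultdict(set)
    for user_id, service_id in access_records:
        # An edge exists only if the user accessed the service.
        user_to_services[user_id].add(service_id)
        service_to_users[service_id].add(user_id)
    return dict(user_to_services), dict(service_to_users)


def users_sharing_services(service_to_users, min_users=2):
    """Keep the services accessed by at least `min_users` distinct users."""
    return {
        service_id: users
        for service_id, users in service_to_users.items()
        if len(users) >= min_users
    }


# Hypothetical access records (user, service), for illustration only.
records = [("u1", "s1"), ("u2", "s1"), ("u2", "s2"), ("u3", "s2"), ("u3", "s3")]
user_to_services, service_to_users = build_bipartite_graph(records)
shared = users_sharing_services(service_to_users)
```
]]></code>
<p>Keeping both adjacency directions makes it inexpensive to ask which services a user accessed and which users share a service; the latter is the kind of co-access grouping on which service-based communities rely.</p>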
<p>The proposed strategy is divided into four primary stages, as illustrated in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The first stage involves conducting a comprehensive study to define the characteristics of both user and service profiles. A user profile consisting of multiple attributes is defined, and each attribute within the user profile is then aligned with a corresponding element in the web service profile. In the second stage, a novel similarity measure, known as GSIP, is used to identify the web services most similar to each user. This process establishes connections between users and web services, which are used to construct a web service discovery graph. In the third stage, CDBS responds to queries by filtering the graph based on various criteria related to the user or service, such as age, location, interests, language, and gender. Finally, the fourth stage applies a new community detection algorithm to discover groups of similar users, which represent the communities according to predefined criteria.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Proposed CDBS architecture</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_71532-fig-1.tif"/>
</fig>
<sec id="s3_1">
<label>3.1</label>
<title>Creation of Profile for User or Service</title>
<p>Each user on the web has a profile in which their information is registered (name, first name, date of birth, location, preferences, device used, etc.). All of this information forms the user&#x2019;s digital identity. The user profile is also defined by information found in computer systems and by the user&#x2019;s interests in one or more fields (e.g., culture, sports), which may vary depending on contextual information (e.g., time, location). In contrast, the service profile is defined by parameters such as service quality, service usage cost, and the geographical restrictions of the service. To model user and service profiles, the user profile format defined in [<xref ref-type="bibr" rid="ref-20">20</xref>] is extended by adding several quantitative and qualitative characteristics. This extension establishes a common profile structure for both the user and the service. In addition to the unique identifier (denoted the identity), <xref ref-type="table" rid="table-2">Table 2</xref> lists all the profile elements, classified by their meaning in the user and service profiles. To enable a meaningful comparison between user and service entities, we define a shared attribute schema that aligns semantically related fields across the two profiles. For example, a user&#x2019;s age is matched to the authorized age range of the service, the declared languages correspond to the supported service languages, and the device model/operating system specified in the user profile is matched against the requirements of the service platform. Similarly, the geographical location of a user is matched against the service availability region, while interests and domain preferences are mapped to the service domain classification. Finally, the textual description provided by the user is evaluated against the semantic description of the service using ontology-based similarity. 
Through this aligned schema, user-service compatibility can be quantified consistently across multiple attribute types, as formally expressed in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>. The elements are divided into four primary categories based on the type of information provided: general information, preference, device, and localization.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>User and service context attributes</title>
</caption>
<table>
<colgroup>
<col/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Cat.</th>
<th align="center">Attribute</th>
<th align="center">Meaning in the user context</th>
<th align="center">Meaning in the service context</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="11"><bold>GI</bold></td>
<td>Domain</td>
<td>Service area searched by the user</td>
<td>Service area</td>
</tr>
<tr>
<td>Name</td>
<td>Service name searched by the user</td>
<td>Service name offered</td>
</tr>
<tr>
<td>Contact</td>
<td>User contact</td>
<td>Contact of the service provider</td>
</tr>
<tr>
<td>CD</td>
<td>Brief description of the service being sought</td>
<td>Functionalities offered by the service</td>
</tr>
<tr>
<td>Date</td>
<td>Service search date</td>
<td>Publication date of the service</td>
</tr>
<tr>
<td>Age</td>
<td>User age</td>
<td>Age ranges of the persons authorized to use the service</td>
</tr>
<tr>
<td>Gender</td>
<td>User gender</td>
<td>Gender of the people to whom the service is addressed</td>
</tr>
<tr>
<td>Nationality</td>
<td>User nationality</td>
<td>Nationalities of the people to whom the service is addressed</td>
</tr>
<tr>
<td>Level of study</td>
<td>User&#x2019;s level of education</td>
<td>Study level of persons authorized to use the service</td>
</tr>
<tr>
<td>AT</td>
<td>Religious or political affiliation of the user</td>
<td>Affiliation of persons authorized to use the service</td>
</tr>
<tr>
<td>Interests</td>
<td>List of user&#x2019;s interests</td>
<td>Areas of interest at which the service is aimed</td>
</tr>
<tr>
<td rowspan="2"><bold>P</bold></td>
<td>QoS</td>
<td>Desired quality-of-service parameters</td>
<td>Quality, security, and cost of the web service provided</td>
</tr>
<tr>
<td>Languages</td>
<td>List of languages mastered by the user</td>
<td>List of languages in which the service is provided</td>
</tr>
<tr>
<td rowspan="9"><bold>D</bold></td>
<td>Type</td>
<td>User device type</td>
<td>Service device type</td>
</tr>
<tr>
<td>Model</td>
<td>User device model</td>
<td>Device model required by the service</td>
</tr>
<tr>
<td>MN</td>
<td>Manufacturer name of the user device</td>
<td>Manufacturer name of the service device</td>
</tr>
<tr>
<td>Screen</td>
<td>User screen size</td>
<td>Screen size required by the service</td>
</tr>
<tr>
<td>Resolution</td>
<td>User&#x2019;s screen display resolution</td>
<td>Screen display resolution supported by the service</td>
</tr>
<tr>
<td>Color</td>
<td>Number of colors of the user&#x2019;s screen display</td>
<td>Number of colors supported by the service</td>
</tr>
<tr>
<td>Software type</td>
<td>Operating system type of the user&#x2019;s device</td>
<td>Operating system type supported by the service</td>
</tr>
<tr>
<td>Version</td>
<td>Operating system version of the user device</td>
<td>Operating system version supported by the service</td>
</tr>
<tr>
<td>Navigator</td>
<td>Browser name and version used by the user</td>
<td>Browser name and version supported by the service</td>
</tr>
<tr>
<td rowspan="5"><bold>L</bold></td>
<td>Country</td>
<td>Access country of the user to the web service</td>
<td>Countries for which the service is authorized</td>
</tr>
<tr>
<td>City</td>
<td>User access city</td>
<td>Cities for which the service is authorized</td>
</tr>
<tr>
<td>Region</td>
<td>User access region</td>
<td>Regions for which the service is authorized</td>
</tr>
<tr>
<td>Longitude</td>
<td>Coordinate longitude of access</td>
<td>Longitude coordinates for which the service is authorized</td>
</tr>
<tr>
<td>Latitude</td>
<td>Coordinate latitude of access</td>
<td>Latitude coordinates for which the service is authorized</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-2fn1" fn-type="other">
<p>Abbreviations: General information (GI); Preference (P); Device (D); Localization (L); Manufacturer&#x2019;s name (MN); Contextual description (CD); Affiliation trend (AT).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Discovery of Web Services</title>
<p>After defining both the user and service profiles, the goal is to find the most appropriate web services for each user, i.e., the web services whose profiles are most similar to the user&#x2019;s profile. The discovery process, which amounts to constructing a bipartite graph, is based on a similarity measure that uses the interaction between users and web services. The similarity calculation between the two profiles takes into account all the elements defined in <xref ref-type="table" rid="table-2">Table 2</xref>. The challenge is that these attributes are of different types: a similarity measure designed for numeric attributes is not applicable to string attributes. Most existing similarity measures do not address this type of problem [<xref ref-type="bibr" rid="ref-21">21</xref>]. To meet this constraint, we have created a new similarity measure called GSIP, adapted to all attribute types. GSIP assigns a variable weight to each attribute according to its importance in the similarity calculation, depending on the nature of the study.</p>

<sec id="s3_2_1">
<label>3.2.1</label>
<title>GSIP Similarity</title>
<p>GSIP similarity is calculated according to the following formula (<xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>):
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mtext>GSIP</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>ASim</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mtext>Prmtrs</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where, <italic>X</italic> and <italic>Y</italic> represent a user and a service, respectively. <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the value of a user&#x2019;s attribute, and <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the value of a service&#x2019;s attribute. <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mtext>Prmtrs</mml:mtext></mml:math></inline-formula> denotes the weight of an attribute. <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2211;</mml:mo><mml:mtext>Prmtrs</mml:mtext></mml:math></inline-formula> is the sum of the weights, serving as a normalization factor. 
<inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mtext>GSIP</mml:mtext><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2208;</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> represents the overall similarity score between the user and the service. <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mtext>ASim</mml:mtext></mml:math></inline-formula> (Attribute Similarity) measures the similarity between the user attribute <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and the service attribute <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>.</p>
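<p>As an illustration of Eq. (1), the following minimal Python sketch computes the weighted mean of per-attribute ASim scores. The function name <monospace>gsip</monospace> and its parameters are hypothetical and chosen only for this example; the sketch assumes that the ASim scores have already been computed and that at least one weight is non-zero.</p>
<code language="python">
```python
from typing import Sequence


def gsip(asim_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-attribute ASim scores, following Eq. (1)."""
    if len(asim_scores) != len(weights):
        raise ValueError("one weight is needed per attribute score")
    total_weight = sum(weights)  # N in Eq. (1): the sum of the weights
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    weighted_sum = sum(s * w for s, w in zip(asim_scores, weights))
    return weighted_sum / total_weight


# Three attributes; the first one counts twice as much as the others.
score = gsip([1.0, 0.5, 0.0], [2.0, 1.0, 1.0])
```
</code>
<p>Because every ASim score lies in [0, 1] and the weights are non-negative in this sketch, the returned value also lies in [0, 1].</p>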
<p>The function ASim returns a real value in the interval <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>, calculated between a user attribute and the corresponding attribute of a web service. The value of ASim varies depending on the type of profile attributes.</p>
<p><bold>a) ASim for numerical attributes:</bold> The numerical attributes included in the ASim similarity calculation are mostly related to quantitative properties. Their values can be real values or numerical intervals. For real values, the ASim calculation is based on atomic similarity. Mathematically, it is defined as follows (<xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>):
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mtext>ASim</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2260;</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
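<p>Eq. (2) can be sketched in a few lines of Python. The function name <monospace>asim_numeric</monospace> is hypothetical, and the sketch assumes non-negative values, since the ratio of minimum to maximum stays in [0, 1] only under that assumption.</p>
<code language="python">
```python
def asim_numeric(x: float, y: float) -> float:
    """Atomic similarity of Eq. (2) for two real values."""
    if x == y:
        return 1.0
    # Assumption of this sketch: both values are non-negative, so that
    # min/max stays within [0, 1].
    if not (x >= 0 and y >= 0):
        raise ValueError("this sketch only handles non-negative values")
    return min(x, y) / max(x, y)


ratio = asim_numeric(30, 40)
```
</code>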
<p>For numerical intervals, we rely on the well-known temporal formalism known as Allen&#x2019;s interval algebra [<xref ref-type="bibr" rid="ref-22">22</xref>]. This formalism defines thirteen fundamental relations between intervals, which are used to model the various qualitative situations between temporal entities.</p>
<p>Based on Allen&#x2019;s algebra, ASim is defined mathematically as follows (<xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>):
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:mtext>ASim</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#xA0;</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>p</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mtext>&#xA0;or&#xA0;</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>a</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x222A;</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#xA0;</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>m</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mtext>&#xA0;or&#xA0;</mml:mtext></mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mtext>&#xA0;or&#xA0;</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>o</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mtext>&#xA0;or&#xA0;</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfrac></mml:mstyle></mml:mtd><mml:mtd><mml:mrow><mml:mtext>otherwise</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>If <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> contains only one value, <xref ref-type="disp-formula" rid="eqn-4">Formula (4)</xref> defines <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mtext>ASim</mml:mtext></mml:math></inline-formula> as follows:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mtext>ASim</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mtext>&#xA0;belongs to the interval&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>otherwise</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><bold>b) ASim for enumerated value attributes:</bold> An enumerated value field takes one value for the user from a defined collection of values for the service. <xref ref-type="disp-formula" rid="eqn-5">Formula (5)</xref> calculates the ASim similarity for this attribute type:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mtext>ASim</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mtext>&#xA0;or&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mtext>&#xA0;is included in&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>otherwise</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><bold>c) ASim for multi-valued attributes:</bold> These attributes can have multiple values (list of values) for the same user instance from a collection of service values. The ASim similarity for this type of attribute is calculated using <xref ref-type="disp-formula" rid="eqn-6">Formula (6)</xref> as follows:</p>
<p><disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mrow><mml:mtext>ASim</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if&#xA0;</mml:mtext></mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2260;</mml:mo><mml:mi mathvariant="normal">&#x2205;</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mtext>otherwise</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
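<p>The three binary cases of Eqs. (4) to (6) can be sketched as follows. All function names are hypothetical. The sketch assumes closed intervals given as (low, high) pairs and assumes that the service side of an enumerated or multi-valued attribute is available as a collection of values.</p>
<code language="python">
```python
from typing import Collection, Tuple


def asim_point_in_interval(x: float, interval: Tuple[float, float]) -> float:
    """Eq. (4): 1 if the single value x lies in the interval, else 0."""
    low, high = interval  # assumption: closed interval [low, high]
    return 1.0 if high >= x >= low else 0.0


def asim_enumerated(x: str, allowed: Collection[str]) -> float:
    """Eq. (5): 1 if the user's value is among the service's values."""
    return 1.0 if x in allowed else 0.0


def asim_multi_valued(xs: Collection[str], ys: Collection[str]) -> float:
    """Eq. (6): 1 if the two value lists share at least one element."""
    return 1.0 if set(xs).intersection(ys) else 0.0


age_ok = asim_point_in_interval(25, (18, 30))
level_ok = asim_enumerated("master", {"bachelor", "master"})
lang_ok = asim_multi_valued(["en", "fr"], ["ar", "fr"])
```
</code>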
<p><bold>d) ASim for string type attributes:</bold> Several measures exist for calculating the similarity between two strings (such as two texts), including the Levenshtein distance, the Jaro-Winkler distance, the Sorensen-Dice coefficient, and the N-gram distance. However, these measures do not support semantic or multi-language comparison between the two strings. Similarity based on LOD offers a robust solution to this issue because the data is already structured and easily retrievable through SPARQL endpoints. Many similarity formulas use LOD [<xref ref-type="bibr" rid="ref-23">23</xref>]. We explore and modify the LOD-based similarity measure (LODS) detailed in [<xref ref-type="bibr" rid="ref-24">24</xref>] to compare descriptions in user and web service profiles. To achieve this, each description must first be reduced to a set of keywords by eliminating spaces and irregular expressions. Each user/service description is then associated with a set of terms (keywords) <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, where each term can be annotated by a set of LOD resources <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>A</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mtext>LOD</mml:mtext></mml:math></inline-formula> using the annotation relation <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula>, such that <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>T</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x2203;</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>A</mml:mi><mml:mo>&#x2223;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>. The ASim similarity between the user description <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and the service description <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>, annotated by the sets of LOD resources <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>A</mml:mi><mml:mi>u</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>A</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:math></inline-formula>, respectively, is calculated from the LODS measure in <xref ref-type="disp-formula" rid="eqn-7">Formula (7)</xref>:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mrow><mml:mtext>ASim</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:munder><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>b</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>u</mml:mi></mml:msub></mml:mrow></mml:munder><mml:mrow><mml:mtext>LODS</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>u</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>This similarity uses a classical aggregation measure that allows two objects annotated with semantic concepts to be compared in two steps:
<list list-type="simple">
<list-item><label>1.</label><p>It sums the scores obtained by applying the LODS measure to each pair in the Cartesian product of the two sets being compared.</p></list-item>
<list-item><label>2.</label><p>It then divides this sum by the number of pairs to obtain a final score normalized to the interval [0, 1].</p></list-item>
</list></p>
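<p>The two aggregation steps of Formula (7) can be sketched as follows. The pairwise function <monospace>toy_pairwise</monospace> is only a placeholder introduced for this example and is not the LODS measure of [<xref ref-type="bibr" rid="ref-24">24</xref>]; any function returning a score in [0, 1] for two resources could be substituted. The resource identifiers and the handling of empty annotation sets are likewise assumptions of this sketch.</p>
<code language="python">
```python
from itertools import product
from typing import Callable, Collection


def asim_description(
    service_resources: Collection[str],
    user_resources: Collection[str],
    pairwise: Callable[[str, str], float],
) -> float:
    """Mean of pairwise scores over the Cartesian product, as in Formula (7)."""
    if not service_resources or not user_resources:
        return 0.0  # assumption: an empty annotation set yields no similarity
    # Step 1: sum the pairwise scores over every (service, user) resource pair.
    total = sum(
        pairwise(a, b) for a, b in product(service_resources, user_resources)
    )
    # Step 2: divide by the number of pairs to normalize the result.
    return total / (len(service_resources) * len(user_resources))


def toy_pairwise(a: str, b: str) -> float:
    """Placeholder for LODS(a, b); NOT the real LOD-based measure."""
    return 1.0 if a == b else 0.0


score = asim_description({"dbr:Tourism", "dbr:Hotel"}, {"dbr:Hotel"}, toy_pairwise)
```
</code>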
</sec>
<sec id="s3_2_2">
<label>3.2.2</label>
<title>Characteristics and Evaluation of GSIP Similarity</title>
<p>The aim of our comparison is not to promote a single similarity measure that fits all situations, but to clarify the important differences between five similarity measures. The decision on which similarity measure to apply depends on the nature of the data used, the observations we want to make, and the definition of similarity adopted. The conceptual, theoretical, and experimental characteristics of the most popular measures provide a fundamental evidence base for making that decision. GSIP similarity is used to match user profiles with web services, mainly in social domain applications. It supports all types of attributes, which makes it difficult to compare with other similarity measures that are specific to certain data types or require transformations of attribute types. Unlike GSIP, most existing similarities measure the proximity between two vectors in a vector space, using only atomic values. This is a major added value of our similarity measure. Additionally, the results of GSIP vary according to the attribute parameters chosen, such as the weight of each attribute based on its importance, which heavily influences the results obtained. GSIP can also handle missing values in the data without significantly affecting the results. Furthermore, GSIP supports semantic textual comparison when calculating the description similarity of user and web service profiles. It uses the LOD-based similarity measure approach [<xref ref-type="bibr" rid="ref-24">24</xref>], as presented in the previous subsection. Although GSIP is a comprehensive multi-criteria framework, we compare it here only at the level of individual similarity measures to isolate the contribution of each component. A more extensive comparison with other multi-criteria frameworks remains future work.
We compare GSIP with four well-known similarity measures: Jaro-Winkler [<xref ref-type="bibr" rid="ref-25">25</xref>], Jaccard [<xref ref-type="bibr" rid="ref-26">26</xref>], Cosine [<xref ref-type="bibr" rid="ref-27">27</xref>], and Manhattan [<xref ref-type="bibr" rid="ref-28">28</xref>]. This comparison is based on several parameters: the execution time of the similarity (running time), the dependency on data quality with missing values (tolerance to outliers), the type of values supported, such as continuous, categorical, numeric, or string (attribute type), the need to transform attributes or preprocess data (attribute transformation), and the support for semantics in comparison (semantic comparison).</p>
<p><xref ref-type="table" rid="table-3">Table 3</xref> shows that the performance of the different similarity measures varies depending on the desired characteristics. GSIP&#x2019;s support for several vectors during the comparison clearly influences its calculation time, which remains higher than that of the other similarity measures. However, this difference is negligible considering that GSIP does not require attribute transformation, which simplifies its use in raw data scenarios. The other measures need some form of data preprocessing or transformation, which adds a step to the data preparation process. For the other characteristics, GSIP is superior to the other similarity measures for social applications. GSIP similarity evaluates attributes independently and normalizes over the available fields, ensuring that missing profile information does not bias the result. Ambiguous or noisy LOD annotations are down-weighted during semantic comparison, limiting their influence on the overall similarity score. Moreover, because the bipartite user&#x2013;service graph is inherently sparse, applying minimum access and similarity thresholds suppresses weak or uninformative links and preserves only meaningful interactions. This process also mitigates imbalance issues and enables users or services to be integrated in cold-start situations through profile-based similarity rather than historical interaction data.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Comparison of similarity parameters</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Parameters</th>
<th>GSIP</th>
<th>Jaccard</th>
<th>Cosine</th>
<th>Manhattan</th>
<th>Jaro-Winkler</th>
</tr>
</thead>
<tbody>
<tr>
<td>Running time</td>
<td>Medium</td>
<td>Fast</td>
<td>Fast</td>
<td>Fast</td>
<td>Medium</td>
</tr>
<tr>
<td>Tolerance to outliers</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>Attribute type</td>
<td>All</td>
<td>Categorical</td>
<td>Numeric</td>
<td>Numeric</td>
<td>String</td>
</tr>
<tr>
<td>Attribute transformation</td>
<td>No need</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Semantic comparison</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The GSIP similarity has a complexity of <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>u</mml:mi></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:mi>d</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, with <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>d</mml:mi></mml:math></inline-formula> small and constant, making the cost scale linearly with user&#x2013;service pairs. Memory usage is moderate since no attribute transformation is required. The &#x201C;Medium&#x201D; runtime in <xref ref-type="table" rid="table-3">Table 3</xref> comes mainly from the semantic (LOD-based) enrichment, not from heavy computation.</p>

</sec>
<sec id="s3_2_3">
<label>3.2.3</label>
<title>Construction of the Web Services Discovery Graph</title>
<p>After calculating the similarity between each user and service profile, a link is established between the profiles where the similarity exceeds a predefined threshold. This results in a bipartite similarity graph. Algorithm 1 elucidates the approach for constructing this graph, with comments indicated by:</p>
<fig id="fig-3">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_71532-fig-3.tif"/>
</fig>
<p>A similarity between user and service profiles does not necessarily imply that the user accesses a similar service. A user might engage with a service out of curiosity or by mistake. To validate the link and eliminate intruders, the number of accesses for each pair of profiles must be constrained by a lower bound. To achieve this, we have devised a method where multiple accesses are created for each link between a user and a service, occurring in different locations, on different dates, and with varying durations of access. Algorithm 2 illustrates the construction of the web services discovery graph.</p>
<fig id="fig-4">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_71532-fig-4.tif"/>
</fig>
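<p>The two conditions described in this subsection, a similarity above a threshold and a minimum number of accesses, can be combined in a short Python sketch. It is not a transcription of Algorithms 1 and 2, which are given as figures; the function name, the dictionary-based data layout, and the sample values are assumptions made for illustration. The parameter names <monospace>threshold_sim</monospace> and <monospace>nbr_access_min</monospace> mirror those used for filtering in this paper.</p>
<code language="python">
```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (user id, service id); hypothetical identifiers


def build_discovery_graph(
    similarity: Dict[Pair, float],
    access_count: Dict[Pair, int],
    threshold_sim: float,
    nbr_access_min: int,
) -> List[Pair]:
    """Return the user-service edges that pass both thresholds."""
    edges = []
    for pair, sim in similarity.items():
        # A link requires a similarity strictly above the threshold ...
        similar_enough = sim > threshold_sim
        # ... and enough recorded accesses to rule out accidental use.
        accessed_enough = access_count.get(pair, 0) >= nbr_access_min
        if similar_enough and accessed_enough:
            edges.append(pair)
    return edges


similarity = {("u1", "s1"): 0.9, ("u1", "s2"): 0.4, ("u2", "s1"): 0.8}
access_count = {("u1", "s1"): 5, ("u1", "s2"): 7, ("u2", "s1"): 1}
edges = build_discovery_graph(similarity, access_count, 0.6, 3)
```
</code>
<p>In this toy example, only the pair (u1, s1) is kept: (u1, s2) fails the similarity threshold and (u2, s1) has too few accesses.</p>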
</sec>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Filtering of Web Services Discovery Graph</title>
<p>The proposed approach is centered around community discovery, incorporating various filters pertaining to users, web services, and user access to web services. A query can define the filters to be applied to the web services access graph, providing substantial insights into the discovered communities. User filters encompass interests, localization, age, and gender, alongside additional filters such as affiliation, level of study, and language. Filters associated with the service include primarily the domain, which is a mandatory filter and contributes significantly to the interpretation of the discovered communities, particularly when combined with other filters such as service location. Moreover, we divide filters concerning user access to web services into three main sub-filters: (i) &#x201C;Access date&#x201D; sub-filter, (ii) &#x201C;Access duration&#x201D; sub-filter, and (iii) &#x201C;Access location&#x201D; sub-filter. These three criteria were selected because they collectively provide a comprehensive view of user behavior in terms of temporal patterns (access date), engagement levels (access duration), and geographical context (access location). Including these dimensions allows for more detailed analysis of community discovery and user interaction with web services.</p>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Access Date Sub-Filter</title>
<p>The purpose of this filter is to restrict user access to services based on clearly defined time constraints and to address various temporal considerations. Users access services on different dates; for example, the use of tourist services typically increases during vacation periods, while demand for defense, information, and communication services may spike during times of conflict. This pattern applies broadly to many other types of services. By analyzing access dates, it becomes possible to identify temporal trends and seasonal variations, which are critical for understanding user behavior and community dynamics. This criterion captures the temporal dimension of service usage, enabling the identification of patterns such as daily, weekly, or monthly peaks, insights that are essential for effective resource allocation and service optimization.</p>
<p>Drawing inspiration from Allen&#x2019;s algebra model of time [<xref ref-type="bibr" rid="ref-22">22</xref>], the &#x201C;access date&#x201D; sub-filter involves retrieving all accesses occurring between or outside two defined dates. Additionally, it involves retrieving accesses on a specific date or before/after that date.</p>
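<p>A possible implementation of the date conditions listed above is sketched below. The function name and its parameters are hypothetical, and treating both bounds as inclusive is an assumption of this sketch. Passing the same date as both bounds selects accesses on a specific date, while omitting one bound selects accesses before or after a date.</p>
<code language="python">
```python
from datetime import date
from typing import Iterable, List, Optional


def filter_by_date(
    access_dates: Iterable[date],
    start: Optional[date] = None,
    end: Optional[date] = None,
    outside: bool = False,
) -> List[date]:
    """Keep dates inside [start, end], or outside it when outside=True."""
    kept = []
    for d in access_dates:
        after_start = start is None or d >= start
        before_end = end is None or end >= d
        inside = after_start and before_end
        if inside != outside:
            kept.append(d)
    return kept


dates = [date(2024, 1, 10), date(2024, 7, 15), date(2024, 12, 24)]
summer = filter_by_date(dates, date(2024, 6, 1), date(2024, 8, 31))
off_season = filter_by_date(dates, date(2024, 6, 1), date(2024, 8, 31), outside=True)
```
</code>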
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Access Duration Sub-Filter</title>
<p>The &#x201C;access duration&#x201D; sub-filter enables the retrieval of accesses that fall within a predefined time interval. The length of time users spend accessing a service indicates their level of engagement and commitment. Longer access durations may suggest greater user interest or the complexity of the service being used. For example, longer durations in accessing educational services might reflect intensive study sessions or prolonged use of learning resources. By examining the duration of the access, the developer can differentiate between casual users and dedicated users, which helps to identify key users or influencers within a community. This information is valuable for tailoring services and improving user experience.</p>
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Access Localization Sub-Filter</title>
<p>The user accesses a service from various locations, necessitating the preservation of the user&#x2019;s global positioning system (GPS) access coordinates. The purpose of this filter is to confine the geographical scope of user community discovery, addressing different location-related constraints. The geographical location from which users access services provides context about their physical environment and potential constraints or preferences. For example, users who access services from urban areas may have different needs compared to those from rural areas. In addition, location data can reveal regional trends and the spatial distribution of service usage. Understanding access location helps address location-specific issues and tailor services to meet regional demands. It also enables the identification of local communities and the analysis of geographic factors influencing service adoption and usage. Two scenarios may arise upon request:
<list list-type="bullet">
<list-item>
<p>Filtering users who access a service from a specific location within a designated area.</p></list-item>
<list-item>
<p>Filtering users located within a geographic region defined by a central point and a radius, forming a circular area. The Haversine distance formula [<xref ref-type="bibr" rid="ref-29">29</xref>] is well suited to this purpose, as it computes the great-circle distance, that is, the shortest distance over the Earth&#x2019;s surface, between two points given their latitudes and longitudes.</p></list-item>
</list></p>
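<p>The second scenario can be sketched as follows. This is an illustrative Python fragment, not the authors&#x2019; code: the mean Earth radius constant, the tuple layout (user, latitude, longitude), and the sample coordinates are assumptions made for the example.</p>

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(accesses, center, radius_km):
    """Keep (user, lat, lon) accesses located inside the circle around center."""
    lat0, lon0 = center
    return [a for a in accesses if radius_km >= haversine_km(a[1], a[2], lat0, lon0)]


# Hypothetical access coordinates, for illustration only.
accesses = [("u1", 36.75, 3.06), ("u2", 36.80, 3.10), ("u3", 48.85, 2.35)]
nearby = within_radius(accesses, (36.75, 3.06), 10.0)
print([a[0] for a in nearby])
```

<p>The first scenario (a designated area) would replace the distance test with a containment test for that area; the rest of the filter is unchanged.</p>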
<p>Filter parameters are flexible and can be adjusted to the task requirements. Low-activity users and services are removed using a minimum-access threshold parameter <italic>nbr_access_min</italic>. A service is considered significant for discovery only if it is accessed by at least 50 users. Candidate pairs are retained only if their GSIP similarity exceeds a threshold parameter <italic>threshold_sim</italic>. Communities are then constructed from this similarity graph, with thresholds chosen through data-driven tuning. Algorithm 3 outlines the filtering procedure using the various filters.</p>
<fig id="fig-5">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_71532-fig-5.tif"/>
</fig>
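<p>The threshold parameters described above can be illustrated with a short Python sketch. It is not a transcription of Algorithm 3: the function name, the input layout (one (user, service) pair per access event and a dictionary of precomputed similarities), and the order in which the thresholds are applied are assumptions made for the example.</p>

```python
from collections import Counter, defaultdict


def apply_thresholds(accesses, similarity, nbr_access_min, threshold_sim, min_users=50):
    """Return {service: set of users} after the three threshold tests.

    accesses   -- list of (user, service) pairs, one per access event
    similarity -- dict mapping (user, service) to a similarity in [0, 1]
    """
    per_user = Counter(u for u, _ in accesses)
    per_service = Counter(s for _, s in accesses)
    pairs = {
        (u, s)
        for u, s in accesses
        if per_user[u] >= nbr_access_min          # drop low-activity users
        and per_service[s] >= nbr_access_min      # drop low-activity services
        and similarity.get((u, s), 0.0) > threshold_sim
    }
    users_of = defaultdict(set)
    for u, s in pairs:
        users_of[s].add(u)
    # Keep only services reached by enough distinct users.
    return {s: us for s, us in users_of.items() if len(us) >= min_users}


# Tiny hypothetical input; min_users is lowered so the example stays small.
events = [("u1", "s1"), ("u1", "s1"), ("u2", "s1"), ("u2", "s1"), ("u3", "s1")]
sims = {("u1", "s1"): 0.8, ("u2", "s1"): 0.7, ("u3", "s1"): 0.9}
print(apply_thresholds(events, sims, nbr_access_min=2, threshold_sim=0.5, min_users=2))
```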
</sec>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Communities Discovery</title>
<p>Many methods for community discovery have been proposed, yet they remain limited when applied to complex networks because they rely on user interactions for discovery [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>]. The proposed CDBS approach addresses this issue by organizing users into communities, each representing one or more services together with the attached users. This final step generates communities from the filtered access graph by grouping users who have accessed the same services. Since a user can access several services, the same user node may be linked to multiple service nodes simultaneously. Consequently, a user may belong to more than one community. Community membership is therefore computed by collecting all users connected to a given service and then extending this membership iteratively through service intersections that reflect multiple groups of interest. CDBS is divided into two steps:
<list list-type="bullet">
<list-item>
<p><bold>Step 1:</bold> This phase creates communities that satisfy the specified filters, using the filtered access graph obtained earlier. At this stage, the advantage of modeling the problem as a bipartite graph becomes evident: given the previously acquired accesses, it suffices to identify, for each service, the user nodes linked to it. These users form a community associated with the service. However, detecting communities related to individual services alone is insufficient; the aim is to uncover communities associated with a set of services.</p></list-item>
<list-item>
<p><bold>Step 2:</bold> The decomposition produced in the preceding step contains overlaps among communities, because users can access multiple web services and thus belong to multiple communities. In this phase, we systematically evaluate pairs of communities obtained from the previous step. For each pair, a new community is formed by merging their respective services; it comprises the users in the intersection of the two communities, that is, the users who accessed all of the merged services. The process repeats until no further overlap between communities remains.</p></list-item>
</list></p>
<p>Grouping users by service has a complexity of <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>E</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, where <italic>E</italic> is the number of edges, and merging overlaps costs <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>O</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, where <italic>C</italic> is the number of detected communities. Filtering reduces both <italic>E</italic> and <italic>C</italic>. At the <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mn>10,000</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>400</mml:mn></mml:math></inline-formula> scale, the interaction matrix is sparse, with a density of <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mo>&#x003C;</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mn>2</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>, which helps keep memory use and running time low in the implementation (<xref ref-type="sec" rid="s4">Section 4</xref>). Algorithm 4 outlines this community discovery approach.</p>
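<p>To make the sparsity figure concrete, the short calculation below bounds the number of edges that the linear grouping pass has to scan at the largest scale. The helper name is hypothetical; the inputs are the scale and the density bound quoted above.</p>

```python
def max_edges(n_users, n_services, density_percent):
    """Upper bound on user-service edges for a given density, in integer arithmetic."""
    return n_users * n_services * density_percent // 100


cells = 10_000 * 400                  # possible user-service pairs
bound = max_edges(10_000, 400, 2)     # edges at a density of 2%
print(cells, bound)
```

<p>At a density below 2%, fewer than 80,000 of the 4,000,000 possible user-service pairs are present, so an edge list or adjacency structure is far smaller than a dense matrix.</p>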
<fig id="fig-6">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_71532-fig-6.tif"/>
</fig>
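<p>The two steps of CDBS described above can be sketched in Python as follows. This is one possible reading of the procedure, not a transcription of Algorithm 4: the function names, the representation of a community as a set of services mapped to a set of users, and the optional <italic>max_degree</italic> cap (the number of services per community) are assumptions made for the example.</p>

```python
from collections import defaultdict
from itertools import combinations


def single_service_communities(edges):
    """Step 1: one community per service, holding the users linked to it."""
    users_of = defaultdict(set)
    for user, service in edges:
        users_of[service].add(user)
    return {frozenset([s]): frozenset(us) for s, us in users_of.items()}


def merge_overlaps(base, max_degree=None):
    """Step 2: merge pairs of communities until no new overlap appears.

    A merged community carries the union of the services and the
    intersection of the users; empty intersections are discarded.
    """
    found = dict(base)
    changed = True
    while changed:
        changed = False
        for (sa, ua), (sb, ub) in combinations(list(found.items()), 2):
            services = sa.union(sb)
            users = ua.intersection(ub)
            if not users or services in found:
                continue
            if max_degree is not None and len(services) > max_degree:
                continue
            found[services] = users
            changed = True
    return found


# Hypothetical filtered access graph as (user, service) edges.
edges = [("u1", "s1"), ("u2", "s1"), ("u2", "s2"), ("u3", "s2"), ("u2", "s3")]
base = single_service_communities(edges)
communities = merge_overlaps(base)
for services, users in sorted(communities.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(services), sorted(users))
```

<p>Because the users of a merged community are always the users common to all of its services, the result does not depend on the order in which pairs are merged. The closure can grow quickly when many services share users, which is why a cap such as <italic>max_degree</italic> is useful in this sketch.</p>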
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Implementation of the Proposed Architecture</title>
<sec id="s4_1">
<label>4.1</label>
<title>Execution Environment</title>
<p>The execution environment must rely on a high-performance, open-source platform capable of processing large volumes of distributed data, particularly for complex network analysis. The NetBeans platform [<xref ref-type="bibr" rid="ref-31">31</xref>], with its Java Integrated Development Environment (IDE), meets this requirement effectively. Moreover, remote method invocation (RMI), a Java API, enables transparent manipulation of remote objects as if they were local, ensuring seamless execution across distributed systems. For visualization, Pajek software [<xref ref-type="bibr" rid="ref-32">32</xref>] offers robust network analysis and visualization capabilities, with cross-platform compatibility on Windows, Mac, and Linux.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Evaluation of the Proposed Approach</title>
<p>The proposed architecture requires the creation of detailed user and service profiles, as defined in <xref ref-type="sec" rid="s3_1">Section 3.1</xref>, along with a record of each user&#x2019;s accesses to the desired services. Since no real dataset containing such detailed profile and access information is publicly available, and given that the specific data values do not directly affect the community discovery process, a synthetic database was created for experimentation. In this phase, possible values were defined for each attribute within the user and service profiles. We considered an example with 50 services and 10,000 users. After assigning attribute weights, we generated a similarity discovery graph with a target similarity rate of 50% or higher, identifying the web services most similar to each user through the proposed GSIP metric. The similarity calculation incorporated weighted attributes, and for every matching pair of profiles, multiple accesses were simulated across diverse dates, times, and locations, yielding the web service discovery graph. The subsequent step filters the web service discovery graph according to the filters defined by a query. These filters pertain to the user, the web service, and user access to web services. We chose the following filter values:
<list list-type="bullet">
<list-item>
<p><bold>Service categories:</bold> &#x201C;Social media,&#x201D; &#x201C;Gaming,&#x201D; and &#x201C;Sport.&#x201D;</p></list-item>
<list-item>
<p><bold>Community members:</bold> Men.</p></list-item>
<list-item>
<p><bold>Connection duration:</bold> Exceeds 110 ms.</p></list-item>
</list></p>
<p>To visualize the bipartite graph of filtered accesses, a Pajek network file (.net) is generated and opened with the Pajek software [<xref ref-type="bibr" rid="ref-32">32</xref>]. <xref ref-type="fig" rid="fig-2">Fig. 2a</xref> shows the result obtained. The final step creates communities by selecting the degree of the communities, that is, the number of services involved in each community. The partial graph in <xref ref-type="fig" rid="fig-2">Fig. 2b</xref> depicts communities with a single service.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Community and discovery graph visualizations. (<bold>a</bold>) Graph of communities with a single service. Circles denote users, squares represent services. (<bold>b</bold>) Visualization of the filtered discovery graph</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_71532-fig-2.tif"/>
</fig>
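<p>The export step mentioned above can be sketched as follows. The fragment writes user-service pairs in the basic Pajek .net layout (a <italic>*Vertices</italic> block of numbered, quoted labels followed by an <italic>*Edges</italic> block). The function name and the sample pairs are hypothetical, and the authors&#x2019; exporter may emit additional Pajek attributes such as vertex shapes or a two-mode header.</p>

```python
def to_pajek_net(edges):
    """Serialize (user, service) pairs as a Pajek .net string with undirected edges."""
    users = sorted({u for u, _ in edges})
    services = sorted({s for _, s in edges})
    # Key vertices by (kind, name) so a user and a service may share a name.
    vertices = [("user", u) for u in users] + [("service", s) for s in services]
    index = {key: i for i, key in enumerate(vertices, start=1)}
    lines = [f"*Vertices {len(vertices)}"]
    lines += [f'{i} "{name}"' for (_, name), i in index.items()]
    lines.append("*Edges")
    for u, s in sorted(set(edges)):
        lines.append(f'{index[("user", u)]} {index[("service", s)]}')
    return "\n".join(lines) + "\n"


# Hypothetical filtered accesses.
net_text = to_pajek_net([("u1", "s1"), ("u2", "s1")])
print(net_text, end="")
```

<p>Listing all users before all services keeps the two node types in contiguous index ranges, which makes it easy to assign them different shapes in the viewer.</p>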
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Comparison of Multi-Criteria Approach vs. No Criteria Approach</title>
<p>To evaluate the performance of our multi-criteria approach, we compared its running time during community generation with that of a no-criteria strategy. The study was conducted through a series of experiments on a profile sample using a machine with an Intel Core i5 processor and 16 GB of RAM. <xref ref-type="table" rid="table-4">Table 4</xref> lists the attributes used in the similarity calculation along with their corresponding weights, expressed as percentages. These weights reflect the relative importance of each attribute in the calculation. The weights in <xref ref-type="table" rid="table-4">Table 4</xref> were obtained through a calibration process that began with equal weighting and then gradually adjusted the relative importance of the attributes, using response time as the main evaluation criterion. Although these calibrated values yielded the most stable results on our dataset, the weights are not universally fixed and can be modified manually depending on the objectives of the task, for example, emphasizing gender in marketing, location in regional studies, or age in demographic analyses. Sensitivity analysis confirmed that such context-driven adjustments produce consistent and robust community structures, highlighting the framework&#x2019;s flexibility.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Attribute weights for similarity calculation</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Attribute</th>
<th>Age</th>
<th>Interest</th>
<th>Description</th>
<th>Country</th>
<th>Language</th>
<th>Gender</th>
<th>Affiliation</th>
<th>Security</th>
<th>Others</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>Weight</bold></td>
<td>17%</td>
<td>17%</td>
<td>9%</td>
<td>20%</td>
<td>10%</td>
<td>5%</td>
<td>3%</td>
<td>8%</td>
<td>11%</td>
</tr>
</tbody>
</table>
</table-wrap>
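<p>The role of the weights in <xref ref-type="table" rid="table-4">Table 4</xref> can be illustrated with a plain weighted average of per-attribute scores. This is only a sketch: the GSIP metric itself is defined elsewhere in the paper, and the function name, the lowercase attribute keys, and the assumption that each attribute yields a score in [0, 1] are introduced here for illustration.</p>

```python
# Weights from Table 4, in percent.
WEIGHTS = {
    "age": 17, "interest": 17, "description": 9, "country": 20, "language": 10,
    "gender": 5, "affiliation": 3, "security": 8, "others": 11,
}
assert sum(WEIGHTS.values()) == 100  # the table is a complete percentage split


def weighted_similarity(scores, weights=WEIGHTS):
    """Weighted average of per-attribute scores in [0, 1]; missing attributes count as 0."""
    total = sum(weights.values())
    return sum(w * scores.get(attr, 0.0) for attr, w in weights.items()) / total


# Hypothetical per-attribute scores for one user-service pair.
pair_scores = {"country": 1.0, "language": 1.0, "age": 0.5}
print(weighted_similarity(pair_scores))
```

<p>Changing the task objective, for example emphasizing gender or location, amounts to passing a different <italic>weights</italic> dictionary; dividing by the total keeps the result in [0, 1] even if the new weights do not sum to 100.</p>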
<p><xref ref-type="table" rid="table-5">Table 5</xref> lists the values of the criteria (filters) chosen to generate communities. For the no-criteria approach, all web service categories are selected and no filters are applied. <xref ref-type="table" rid="table-6">Table 6</xref> presents the tests and results of the experiments, highlighting the computational impact of introducing multiple criteria in community discovery. As expected, execution time increases with the number of profiles. In every test, multi-criteria discovery (MCD) responds faster than discovery without criteria (DWC), even though it handles heterogeneous attributes and semantic similarity; this is consistent with filtering reducing the size of the graph before communities are generated. These results underline the efficiency of the proposed method in managing richer profile information within large and complex networks.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Attribute values for community generation</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Attribute</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Age</td>
<td>13&#x2013;40 years</td>
</tr>
<tr>
<td>Interest</td>
<td>gaming, social media</td>
</tr>
<tr>
<td>Gender</td>
<td>both (M and F)</td>
</tr>
<tr>
<td>Connection time</td>
<td>110&#x2013;15,000 ms</td>
</tr>
<tr>
<td>Level of Study</td>
<td>any</td>
</tr>
<tr>
<td>Language</td>
<td>English, French, Spanish</td>
</tr>
<tr>
<td>Similarity</td>
<td><inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mo>&#x2265;</mml:mo></mml:math></inline-formula>50%</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Response time to generate communities with mean and standard deviation over 10 runs</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th rowspan="2">Test</th>
<th rowspan="2">[Us, Ss]</th>
<th colspan="2">Response time (ms)</th>
</tr>
<tr>
<th>MCD</th>
<th>DWC</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>[100, 10]</td>
<td>6 <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.4</td>
<td>17 <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 1.1</td>
</tr>
<tr>
<td>2</td>
<td>[250, 25]</td>
<td>11 <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.7</td>
<td>62 <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 3.5</td>
</tr>
<tr>
<td>3</td>
<td>[500, 50]</td>
<td>19 <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 1.2</td>
<td>71 <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 4.1</td>
</tr>
<tr>
<td>4</td>
<td>[750, 75]</td>
<td>31 <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 2.0</td>
<td>499 <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 12.4</td>
</tr>
<tr>
<td>5</td>
<td>[1000, 100]</td>
<td>47 <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 3.3</td>
<td>1279 <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 28.5</td>
</tr>
<tr>
<td>6</td>
<td>[2000, 200]</td>
<td>760 <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 18.5</td>
<td>2890 <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 55.8</td>
</tr>
<tr>
<td>7</td>
<td>[5000, 300]</td>
<td>2350 <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 45.7</td>
<td>7421 <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 136.4</td>
</tr>
<tr>
<td>8</td>
<td>[10,000, 400]</td>
<td>9875 <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 152.6</td>
<td>15234 <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 285.7</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-6fn1" fn-type="other">
<p>Abbreviations: Users (Us); Services (Ss); Multi-criteria discovery (MCD); Discovery without criteria (DWC).</p>
</fn>
</table-wrap-foot>
</table-wrap>
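<p>The relative cost of the two strategies can be read directly from the mean response times in <xref ref-type="table" rid="table-6">Table 6</xref>. The following sketch computes the DWC-to-MCD ratio for each test; the variable and function names are hypothetical, and the standard deviations are ignored.</p>

```python
# (users, services, MCD mean in ms, DWC mean in ms), copied from Table 6.
TABLE_6 = [
    (100, 10, 6, 17),
    (250, 25, 11, 62),
    (500, 50, 19, 71),
    (750, 75, 31, 499),
    (1000, 100, 47, 1279),
    (2000, 200, 760, 2890),
    (5000, 300, 2350, 7421),
    (10000, 400, 9875, 15234),
]


def dwc_over_mcd(rows):
    """Ratio of the no-criteria response time to the multi-criteria one, per test."""
    return {(us, ss): round(dwc / mcd, 2) for us, ss, mcd, dwc in rows}


ratios = dwc_over_mcd(TABLE_6)
for size, ratio in ratios.items():
    print(size, ratio)
```

<p>A ratio above 1 means that the multi-criteria run was faster than the run without criteria for that test.</p>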
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Comparison with State-of-the-Art Algorithms</title>
<p>The proposed CDBS approach is compared with four widely used community discovery algorithms: Louvain [<xref ref-type="bibr" rid="ref-33">33</xref>], hierarchical agglomerative clustering (HAC) [<xref ref-type="bibr" rid="ref-34">34</xref>], Label Propagation [<xref ref-type="bibr" rid="ref-35">35</xref>], and Infomap [<xref ref-type="bibr" rid="ref-36">36</xref>]. <xref ref-type="table" rid="table-7">Table 7</xref> summarizes the performance of the five methods on datasets with varying numbers of users and services, in terms of running time, number of communities, and community quality. Synthetic attributes are used because no public dataset simultaneously provides rich user profiles, detailed service metadata, and fine-grained access logs at the granularity required by CDBS. Attribute ranges (e.g., age, access duration) were chosen to be realistic and to exercise the entire pipeline while stress-testing scalability (up to 10,000 users and 400 services), sparsity (<inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mo>&#x003C;</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mn>2</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> interaction density), and robustness via threshold-sensitivity analyses; results remained stable across settings. The hardware, attribute weights, and filters are the same as in the previous comparison.</p>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Comparison of CDBS, Louvain, HAC, Label propagation, and Infomap algorithms</title>
</caption>
<table>
<colgroup>
<col/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th rowspan="2">Test</th>
<th align="center" rowspan="2">[Us, Ss]</th>
<th colspan="5">Response time (ms)</th>
<th colspan="5">NCs</th>
<th colspan="5">Modularity</th>
<th colspan="5">Conductance</th>
<th align="center" colspan="5">Coverage</th>
</tr>
<tr>
<th>CDBS</th>
<th>L</th>
<th>HAC</th>
<th>LP</th>
<th>I</th>
<th>CDBS</th>
<th>L</th>
<th>HAC</th>
<th>LP</th>
<th>I</th>
<th>CDBS</th>
<th>L</th>
<th>HAC</th>
<th>LP</th>
<th>I</th>
<th>CDBS</th>
<th>L</th>
<th align="center">HAC</th>
<th align="center">LP</th>
<th align="center">I</th>
<th align="center">CDBS</th>
<th align="center">L</th>
<th align="center">HAC</th>
<th align="center">LP</th>
<th align="center">I</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>[100, 10]</td>
<td>6</td>
<td>1</td>
<td>15</td>
<td>1</td>
<td>2</td>
<td>4</td>
<td>3</td>
<td>3</td>
<td>2</td>
<td>3</td>
<td>0.52</td>
<td>0.52</td>
<td>0.51</td>
<td>0.48</td>
<td>0.53</td>
<td>0.21</td>
<td>0.23</td>
<td>0.25</td>
<td>0.28</td>
<td>0.22</td>
<td>0.61</td>
<td>0.58</td>
<td>0.56</td>
<td>0.54</td>
<td>0.60</td>
</tr>
<tr>
<td>2</td>
<td>[250, 25]</td>
<td>11</td>
<td>2</td>
<td>109</td>
<td>3</td>
<td>6</td>
<td>8</td>
<td>3</td>
<td>3</td>
<td>2</td>
<td>4</td>
<td>0.57</td>
<td>0.55</td>
<td>0.54</td>
<td>0.50</td>
<td>0.58</td>
<td>0.19</td>
<td>0.22</td>
<td>0.24</td>
<td>0.27</td>
<td>0.20</td>
<td>0.64</td>
<td>0.60</td>
<td>0.57</td>
<td>0.56</td>
<td>0.63</td>
</tr>
<tr>
<td>3</td>
<td>[500, 50]</td>
<td>19</td>
<td>8</td>
<td>656</td>
<td>4</td>
<td>13</td>
<td>15</td>
<td>3</td>
<td>3</td>
<td>4</td>
<td>6</td>
<td>0.60</td>
<td>0.57</td>
<td>0.55</td>
<td>0.52</td>
<td>0.61</td>
<td>0.18</td>
<td>0.21</td>
<td>0.23</td>
<td>0.27</td>
<td>0.19</td>
<td>0.66</td>
<td>0.61</td>
<td>0.59</td>
<td>0.57</td>
<td>0.65</td>
</tr>
<tr>
<td>4</td>
<td>[750, 75]</td>
<td>31</td>
<td>16</td>
<td>2047</td>
<td>12</td>
<td>22</td>
<td>26</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>8</td>
<td>0.63</td>
<td>0.60</td>
<td>0.58</td>
<td>0.53</td>
<td>0.64</td>
<td>0.17</td>
<td>0.20</td>
<td>0.22</td>
<td>0.26</td>
<td>0.18</td>
<td>0.68</td>
<td>0.63</td>
<td>0.60</td>
<td>0.58</td>
<td>0.67</td>
</tr>
<tr>
<td>5</td>
<td>[1000, 100]</td>
<td>47</td>
<td>34</td>
<td>4484</td>
<td>28</td>
<td>42</td>
<td>34</td>
<td>3</td>
<td>3</td>
<td>6</td>
<td>11</td>
<td>0.66</td>
<td>0.62</td>
<td>0.59</td>
<td>0.55</td>
<td>0.67</td>
<td>0.16</td>
<td>0.19</td>
<td>0.21</td>
<td>0.25</td>
<td>0.17</td>
<td>0.70</td>
<td>0.65</td>
<td>0.62</td>
<td>0.59</td>
<td>0.69</td>
</tr>
<tr>
<td>6</td>
<td>[2000, 200]</td>
<td>760</td>
<td>785</td>
<td>33,843</td>
<td>327</td>
<td>610</td>
<td>51</td>
<td>4</td>
<td>3</td>
<td>7</td>
<td>18</td>
<td>0.68</td>
<td>0.64</td>
<td>0.61</td>
<td>0.56</td>
<td>0.69</td>
<td>0.15</td>
<td>0.18</td>
<td>0.20</td>
<td>0.25</td>
<td>0.16</td>
<td>0.72</td>
<td>0.67</td>
<td>0.63</td>
<td>0.60</td>
<td>0.71</td>
</tr>
<tr>
<td>7</td>
<td>[5000, 300]</td>
<td>2350</td>
<td>4680</td>
<td>448,219</td>
<td>1674</td>
<td>1980</td>
<td>84</td>
<td>5</td>
<td>3</td>
<td>9</td>
<td>26</td>
<td>0.70</td>
<td>0.67</td>
<td>0.63</td>
<td>0.57</td>
<td>0.70</td>
<td>0.14</td>
<td>0.17</td>
<td>0.19</td>
<td>0.25</td>
<td>0.15</td>
<td>0.74</td>
<td>0.69</td>
<td>0.65</td>
<td>0.61</td>
<td>0.73</td>
</tr>
<tr>
<td>8</td>
<td>[10,000, 400]</td>
<td>9875</td>
<td>25,350</td>
<td>3,349,781</td>
<td>6953</td>
<td>8230</td>
<td>143</td>
<td>5</td>
<td>4</td>
<td>11</td>
<td>40</td>
<td>0.74</td>
<td>0.70</td>
<td>0.67</td>
<td>0.58</td>
<td>0.73</td>
<td>0.13</td>
<td>0.16</td>
<td>0.18</td>
<td>0.24</td>
<td>0.14</td>
<td>0.77</td>
<td>0.71</td>
<td>0.67</td>
<td>0.62</td>
<td>0.76</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-7fn1" fn-type="other">
<p><bold>Abbreviations:</bold> Number of communities (NCs); Users (Us); Services (Ss); Louvain (L); Hierarchical agglomerative clustering (HAC); Label propagation (LP); Infomap (I).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<sec id="s4_4_1">
<label>4.4.1</label>
<title>Comparison in Running Time</title>
<p>It should be noted that Louvain, HAC, Label Propagation, and Infomap were executed on a mature, professionally developed platform, whereas CDBS was implemented by the authors without specific optimization of the code. Both the algorithms and the data used in all the experiments presented in this work are available in the GitHub repository<xref ref-type="fn" rid="fn-1"><sup>1</sup></xref><fn id="fn-1"><label>1</label><p><ext-link ext-link-type="uri" xlink:href="https://github.com/bkarim78/Communities_Discovery_Based_Service">https://github.com/bkarim78/Communities_Discovery_Based_Service</ext-link> (accessed on 15 October 2025)</p></fn>. Each experiment was repeated several times, and the reported results are the average values.</p>
<p><xref ref-type="table" rid="table-7">Table 7</xref> reports the response times of CDBS, Louvain, HAC, Label Propagation (LP), and Infomap under increasing network sizes. For small datasets, Louvain is the fastest, followed closely by LP and Infomap. From 500 users onward, LP achieves the lowest runtimes, outperforming Louvain. Up to 1000 users, CDBS is slower than Louvain, LP, and Infomap, while HAC is the slowest method at every scale. The scalability advantage of CDBS over Louvain appears from 2000 users (760 ms vs. 785 ms) and becomes clear in large networks. At 10,000 users and 400 services, CDBS executes in about 9.9 s, outperforming Louvain (25.4 s) and HAC (more than 55 min). LP (7.0 s) and Infomap (8.2 s) remain faster at this scale, but CDBS obtains better conductance and coverage than both in every test, and higher modularity than LP (<xref ref-type="table" rid="table-7">Table 7</xref>). This suggests that CDBS offers a favorable trade-off between execution time and community quality for large and complex networks.</p>
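<p>The unit conversions and the ordering at the largest scale can be checked with a few lines of Python. The values are the Test 8 response times from <xref ref-type="table" rid="table-7">Table 7</xref>; the variable names are hypothetical.</p>

```python
# Test 8 response times from Table 7, in milliseconds.
RUNTIME_MS = {"CDBS": 9875, "Louvain": 25350, "HAC": 3349781, "LP": 6953, "Infomap": 8230}

ranking = sorted(RUNTIME_MS, key=RUNTIME_MS.get)          # fastest first
seconds = {name: ms / 1000 for name, ms in RUNTIME_MS.items()}
hac_minutes = RUNTIME_MS["HAC"] / 60000

print(ranking)
print(seconds["CDBS"], seconds["Louvain"], round(hac_minutes, 1))
```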

</sec>
<sec id="s4_4_2">
<label>4.4.2</label>
<title>Comparison in the Number of Communities</title>
<p>The number of detected communities is a key factor, as it determines the level of specialization and the interpretability of the results. A large community may contain several sub-communities whose members share different interests and therefore call for different decisions. The results obtained with CDBS vary depending on the number of services required in the final discovery. <xref ref-type="table" rid="table-7">Table 7</xref> shows that CDBS produces more communities than Louvain, HAC, Label Propagation, and Infomap in every test, with Infomap the closest. For example, at 1000 users and 100 services, CDBS identifies 34 communities, compared to only 3 for Louvain and HAC, 6 for Label Propagation, and 11 for Infomap. At the largest scale (10,000 users and 400 services), CDBS detects 143 communities, while Louvain, HAC, LP, and Infomap identify only 5, 4, 11, and 40, respectively. These results indicate that CDBS yields a finer granularity: it identifies enough communities to capture distinct user interests and exclude intruders, while avoiding excessive fragmentation that would hinder interpretability. By leveraging service information, each community is semantically characterized by its dominant service domain, making the structures both specialized and meaningful for real-world applications. Furthermore, combining two or more services can reduce the number of communities while still producing coherent and interpretable communities.</p>

</sec>
<sec id="s4_4_3">
<label>4.4.3</label>
<title>Comparison of the Communities Quality</title>
<p>To evaluate the effectiveness of community detection algorithms, several metrics can be employed (modularity, normalized mutual information (NMI), conductance, coverage, density, silhouette score, etc.) [<xref ref-type="bibr" rid="ref-3">3</xref>], each with its own methodology, focus, and limitations [<xref ref-type="bibr" rid="ref-14">14</xref>]. The most effective metric can vary depending on the specific goals and context of the analysis. Community quality was assessed using three widely adopted metrics: modularity, conductance, and coverage. As reported in <xref ref-type="table" rid="table-7">Table 7</xref>, CDBS obtains the best conductance and coverage in all eight tests and is on par with Infomap in modularity. Regarding modularity, CDBS (0.52&#x2013;0.74) and Infomap (0.53&#x2013;0.73) achieve the highest values: Infomap is marginally higher in Tests 1&#x2013;6, the two are tied in Test 7, and CDBS leads in Test 8. Both show stronger intra-community cohesion than Louvain (0.52&#x2013;0.70), HAC (0.51&#x2013;0.67), and Label Propagation (0.48&#x2013;0.58). For conductance, CDBS records the lowest values in every test (0.13&#x2013;0.21), which indicates well-separated communities; in contrast, LP presents the highest conductance in every test (<inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mo>&#x2265;</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mn>0.24</mml:mn></mml:math></inline-formula>), followed by HAC (0.18&#x2013;0.25), reflecting weaker separation. In terms of coverage, CDBS again achieves the best results (0.61&#x2013;0.77), retaining more intra-community connections than Louvain (0.58&#x2013;0.71), HAC (0.56&#x2013;0.67), LP (0.54&#x2013;0.62), and Infomap (0.60&#x2013;0.76). Infomap is the closest competitor, trailing CDBS slightly in conductance and coverage. This analysis suggests that, for applications that require strong, meaningful, and well-structured community detection in large networks, CDBS would be the preferred choice. The quality of CDBS is further highlighted by an additional key aspect: its ability to reinforce community orientation. By positioning the web service as the central node within each community, CDBS ensures that the service is not merely treated as another element in the network, but as a highly relevant reference point for identifying leading members. These leaders are the users whose profiles show the greatest similarity to the community&#x2019;s service, making them both representative and influential within their groups.</p>
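<p>For reference, the three metrics can be computed on a small example as sketched below. The fragment uses the standard definitions for a disjoint partition of an undirected, unweighted graph (Newman modularity, the fraction of intra-community edges for coverage, and cut size over the smaller volume for conductance). How the paper evaluates these metrics on overlapping CDBS communities is not specified in this section, so the function, its name, and the toy graph are assumptions made for illustration.</p>

```python
def quality(edges, communities):
    """Return (modularity, coverage, conductance list) for a disjoint partition.

    Assumes every node appears in exactly one community and that each
    community and its complement contain at least one edge endpoint.
    """
    m = len(edges)
    label = {node: i for i, nodes in enumerate(communities) for node in nodes}
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    k = len(communities)
    internal = [0] * k
    cut = [0] * k
    volume = [sum(degree.get(n, 0) for n in nodes) for nodes in communities]
    for u, v in edges:
        cu, cv = label[u], label[v]
        if cu == cv:
            internal[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    modularity = sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in range(k))
    coverage = sum(internal) / m
    conductance = [cut[c] / min(volume[c], 2 * m - volume[c]) for c in range(k)]
    return modularity, coverage, conductance


# Toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q, cov, cond = quality(toy_edges, [{0, 1, 2}, {3, 4, 5}])
print(round(q, 4), round(cov, 4), [round(c, 4) for c in cond])
```

<p>Higher modularity and coverage and lower conductance indicate better-separated communities, which is the direction of comparison used in <xref ref-type="table" rid="table-7">Table 7</xref>.</p>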

<p>To validate the robustness of the modularity improvements, two statistical significance tests were performed on the modularity values over 10 independent runs. The Student&#x2019;s t-test indicated that the improvements of CDBS over Louvain (<inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.012</mml:mn></mml:math></inline-formula>), HAC (<inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.004</mml:mn></mml:math></inline-formula>), Label Propagation (<inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.009</mml:mn></mml:math></inline-formula>), and Infomap (<inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.021</mml:mn></mml:math></inline-formula>) are statistically significant (<inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>p</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mn>0.05</mml:mn></mml:math></inline-formula>). The Wilcoxon signed-rank test produced consistent results, with <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.018</mml:mn></mml:math></inline-formula> (Louvain), <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.007</mml:mn></mml:math></inline-formula> (HAC), <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.014</mml:mn></mml:math></inline-formula> (Label Propagation), and <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn>0.028</mml:mn></mml:math></inline-formula> (Infomap). These results confirm that CDBS achieves statistically significant modularity gains over the baseline algorithms.</p>
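<p>The mechanics of a paired t-test can be sketched with the Python standard library. The per-run modularity values behind the reported tests are not listed in this section, so the example instead pairs the eight modularity values of CDBS and Louvain from <xref ref-type="table" rid="table-7">Table 7</xref>; it therefore illustrates the computation only and does not reproduce the p-values above. The function name is hypothetical.</p>

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return mean / (sd / math.sqrt(n)), n - 1


# Modularity per test (Tests 1 to 8) from Table 7.
cdbs = [0.52, 0.57, 0.60, 0.63, 0.66, 0.68, 0.70, 0.74]
louvain = [0.52, 0.55, 0.57, 0.60, 0.62, 0.64, 0.67, 0.70]
t_value, dof = paired_t_statistic(cdbs, louvain)
print(t_value, dof)
```

<p>The p-value is obtained by comparing the statistic with a Student t distribution with the returned degrees of freedom; that step needs a statistics library or a table and is left out of this sketch.</p>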
</sec>
</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Ablation Study</title>
<p>To assess the contribution of individual components in the proposed CDBS framework, we conducted an ablation study. Key modules were removed or replaced, and results were compared against the full model.
<list list-type="bullet">
<list-item>
<p>GSIP similarity: Replacing GSIP with a standard measure (cosine similarity) reduced modularity by 12.16%, confirming GSIP&#x2019;s advantage in handling heterogeneous and semantic attributes.</p></list-item>
<list-item>
<p>Filtering: Disabling multi-criteria filtering produced larger but noisier communities, with modularity dropping by 10.81%.</p></list-item>
<list-item>
<p>Overlapping detection: Forcing each user into a single community reduced modularity by 8.11%, though runtime improved slightly.</p></list-item>
<list-item>
<p>Attribute weights: Using equal weights instead of calibrated ones decreased modularity by 5.41%, showing the importance of weighting.</p></list-item>
</list></p>
<p><xref ref-type="table" rid="table-8">Table 8</xref> summarizes the results.</p>
<table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>Ablation study results on CDBS components</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Variant tested</th>
<th>Modularity</th>
<th>Communities</th>
<th>Runtime (ms)</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Full CDBS</td>
<td><bold>0.74</bold></td>
<td>143</td>
<td>9875</td>
<td>Best balance of performance and interpretability.</td>
</tr>
<tr>
<td>No GSIP (Cosine)</td>
<td>0.65</td>
<td>101</td>
<td>8420</td>
<td>Loses semantic and heterogeneous matching.</td>
</tr>
<tr>
<td>No filtering</td>
<td>0.66</td>
<td>85</td>
<td>9100</td>
<td>Larger but less coherent communities.</td>
</tr>
<tr>
<td>No overlap</td>
<td>0.68</td>
<td>92</td>
<td>8650</td>
<td>Faster but unrealistic memberships.</td>
</tr>
<tr>
<td>Equal weights</td>
<td>0.70</td>
<td>120</td>
<td>9550</td>
<td>Lower stability, reduced modularity.</td>
</tr>
</tbody>
</table>
</table-wrap>
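<p>As a quick consistency check, the relative modularity drops quoted for each ablated variant can be recomputed from the modularity column of <xref ref-type="table" rid="table-8">Table 8</xref>. The following is a minimal Python sketch, not part of the CDBS implementation; the dictionary <monospace>table8</monospace> and the helper <monospace>relative_drop</monospace> are illustrative names introduced here, and the values are copied from the table.</p>
<code language="python">
```python
# Modularity and runtime (ms) copied from Table 8.
# All names below are illustrative and not taken from the CDBS code base.
FULL = "Full CDBS"
table8 = {
    "Full CDBS": (0.74, 9875),
    "No GSIP (Cosine)": (0.65, 8420),
    "No filtering": (0.66, 9100),
    "No overlap": (0.68, 8650),
    "Equal weights": (0.70, 9550),
}


def relative_drop(full, variant):
    """Percentage decrease of `variant` relative to `full`."""
    return 100.0 * (full - variant) / full


full_q, full_ms = table8[FULL]
drops = {}
for name, (q, ms) in table8.items():
    if name == FULL:
        continue
    # Round to two decimals, matching the precision used in the text.
    drops[name] = round(relative_drop(full_q, q), 2)
    print(f"{name}: modularity -{drops[name]:.2f}%, runtime {ms - full_ms:+d} ms")
```
</code>
<p>The drops are rounded to two decimals, so a last-digit difference from a quoted figure would come from rounding rather than from the underlying values. The runtime difference is printed alongside to make the trade-off visible: every ablated variant runs faster than the full model, but with lower modularity.</p>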
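<p>Overall, removing or replacing any of the four components (GSIP, filtering, overlap, weighting) lowers modularity, and the full CDBS configuration achieves the highest modularity (0.74) among the tested variants, at the cost of the longest runtime.</p>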
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This work introduced CDBS, a service-driven framework for community discovery in complex networks. CDBS goes beyond topology-only methods by jointly leveraging heterogeneous user-service attributes, a novel GSIP similarity measure that handles all attribute types (including semantic text matching), and multi-criteria filtering to produce interpretable, domain-anchored communities. Compared with four community detection algorithms, CDBS achieved superior community quality, with a 3.26% improvement over the strongest baseline, while keeping runtime at about 9.8 s on networks of 10,000 users and 400 services. By linking communities to service engagement, it yields a finer-grained community structure with a clearer thematic orientation. Nonetheless, the approach remains sensitive to its parameters, which can affect stability if they are not carefully tuned. Our ablation study highlighted the critical role of calibrated weights and multi-criteria filtering, suggesting that future work should develop data-driven, adaptive tuning mechanisms to improve robustness across heterogeneous datasets. Furthermore, publicly available datasets for service-based community discovery are scarce, and this lack of real-world benchmarks continues to limit external validation and generalizability. Addressing this gap requires building large-scale, service-oriented datasets from real platforms such as Facebook, LinkedIn, and e-learning systems. Such resources would enable rigorous validation, support real-world applications including targeted marketing, access control, and e-learning communities, and drive advances in refined filtering, predictive discovery, and personalized recommendations.</p>
</sec>
</body>
<back>
<ack>
<p>The authors thank all collaborators for contributing to this research.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Conceptualization and design, Karim Boudjebbour; methodology, Karim Boudjebbour and Abdelkader Belkhir; data conception and implementation, Karim Boudjebbour; analysis and interpretation of results, Karim Boudjebbour and Abdelkader Belkhir; draft manuscript preparation, Karim Boudjebbour; writing&#x2014;review and editing, Karim Boudjebbour and Hamza Kheddar; supervision, Abdelkader Belkhir and Hamza Kheddar. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>Data are contained within the article.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dey</surname> <given-names>AK</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Gel</surname> <given-names>YR</given-names></string-name></person-group>. <article-title>Community detection in complex networks: from statistical foundations to data science applications</article-title>. <source>Wiley Interdiscip Rev: Computat Stat</source>. <year>2022</year>;<volume>14</volume>(<issue>2</issue>):<fpage>e1566</fpage>. doi:<pub-id pub-id-type="doi">10.1002/wics.1566</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jin</surname> <given-names>D</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Jiao</surname> <given-names>P</given-names></string-name>, <string-name><surname>Pan</surname> <given-names>S</given-names></string-name>, <string-name><surname>He</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A survey of community detection approaches: from statistical modeling to deep learning</article-title>. <source>IEEE Trans Knowl Data Eng</source>. <year>2021</year>;<volume>35</volume>(<issue>2</issue>):<fpage>1149</fpage>&#x2013;<lpage>70</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tkde.2021.3104155</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>HM</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>ZY</given-names></string-name></person-group>. <article-title>Evaluation of community detection methods</article-title>. <source>IEEE Trans Knowl Data Eng</source>. <year>2019</year>;<volume>32</volume>(<issue>9</issue>):<fpage>1736</fpage>&#x2013;<lpage>46</lpage>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alfaqeeh</surname> <given-names>M</given-names></string-name>, <string-name><surname>Skillicorn</surname> <given-names>DB</given-names></string-name></person-group>. <article-title>Community detection in social networks by spectral embedding of typed graphs</article-title>. <source>Soc Netw Anal Min</source>. <year>2023</year>;<volume>14</volume>(<issue>1</issue>):<fpage>12</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s13278-023-01172-y</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Contisciani</surname> <given-names>M</given-names></string-name>, <string-name><surname>Safdari</surname> <given-names>H</given-names></string-name>, <string-name><surname>De Bacco</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Community detection and reciprocity in networks by jointly modelling pairs of edges</article-title>. <source>J Complex Netw</source>. <year>2022</year>;<volume>10</volume>(<issue>4</issue>):<fpage>cnac034</fpage>. doi:<pub-id pub-id-type="doi">10.1093/comnet/cnac034</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Akachar</surname> <given-names>E</given-names></string-name>, <string-name><surname>Ouhbi</surname> <given-names>B</given-names></string-name>, <string-name><surname>Frikh</surname> <given-names>B</given-names></string-name></person-group>. <article-title>A new algorithm for detecting communities in social networks based on content and structure information</article-title>. <source>Int J Web Inf Syst</source>. <year>2019</year>;<volume>16</volume>(<issue>1</issue>):<fpage>79</fpage>&#x2013;<lpage>93</lpage>. doi:<pub-id pub-id-type="doi">10.1108/ijwis-06-2019-0030</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cai</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hao</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xun</surname> <given-names>Y</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A new community detection method for simplified networks by combining structure and attribute information</article-title>. <source>Expert Syst Appl</source>. <year>2024</year>;<volume>246</volume>(<issue>8</issue>):<fpage>123103</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2023.123103</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>A community detection algorithm based on graph compression for large-scale social networks</article-title>. <source>Inf Sci</source>. <year>2021</year>;<volume>551</volume>(<issue>3</issue>):<fpage>358</fpage>&#x2013;<lpage>72</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ins.2020.10.057</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dabaghi-Zarandi</surname> <given-names>F</given-names></string-name>, <string-name><surname>Afkhami</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Ashoori</surname> <given-names>MH</given-names></string-name></person-group>. <article-title>Community detection method based on random walk and multi objective evolutionary algorithm in complex networks</article-title>. <source>J Netw Comput Appl</source>. <year>2025</year>;<volume>234</volume>(<issue>4</issue>):<fpage>104070</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.jnca.2024.104070</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Karampour</surname> <given-names>E</given-names></string-name>, <string-name><surname>Malek</surname> <given-names>MR</given-names></string-name>, <string-name><surname>Eidi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Discrete Ricci Flow: a powerful method for community detection in location-based social networks</article-title>. <source>Comput Electr Eng</source>. <year>2025</year>;<volume>123</volume>(<issue>2</issue>):<fpage>110302</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compeleceng.2025.110302</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Khawaja</surname> <given-names>FR</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Ullah</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Common-neighbor based overlapping community detection in complex networks</article-title>. <source>Soc Netw Anal Min</source>. <year>2025</year>;<volume>15</volume>(<issue>1</issue>):<fpage>61</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s13278-025-01480-5</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chandrika</surname> <given-names>GN</given-names></string-name>, <string-name><surname>Alnowibet</surname> <given-names>K</given-names></string-name>, <string-name><surname>Kautish</surname> <given-names>KS</given-names></string-name>, <string-name><surname>Reddy</surname> <given-names>ES</given-names></string-name>, <string-name><surname>Alrasheedi</surname> <given-names>AF</given-names></string-name>, <string-name><surname>Mohamed</surname> <given-names>AW</given-names></string-name></person-group>. <article-title>Graph transformer for communities detection in social networks</article-title>. <source>Comput Mater Contin</source>. <year>2022</year>;<volume>70</volume>(<issue>3</issue>):<fpage>5707</fpage>&#x2013;<lpage>20</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Azaouzi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Rhouma</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ben Romdhane</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Community detection in large-scale social networks: state-of-the-art and future directions</article-title>. <source>Soc Netw Anal Min</source>. <year>2019</year>;<volume>9</volume>(<issue>1</issue>):<fpage>23</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s13278-019-0566-x</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Arab</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hasheminezhad</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Limitations of quality metrics for community detection and evaluation</article-title>. In: <conf-name>2017 3th International Conference on Web Research (ICWR); 2017 Apr 19&#x2013;20</conf-name>; <publisher-loc>Tehran, Iran</publisher-loc>. p. <fpage>7</fpage>&#x2013;<lpage>14</lpage>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Amin</surname> <given-names>F</given-names></string-name>, <string-name><surname>Choi</surname> <given-names>JG</given-names></string-name>, <string-name><surname>Choi</surname> <given-names>GS</given-names></string-name></person-group>. <article-title>Advanced community identification model for social networks</article-title>. <source>Comput Mater Contin</source>. <year>2021</year>;<volume>69</volume>(<issue>2</issue>):<fpage>1687</fpage>&#x2013;<lpage>707</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2021.017870</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dabaghi-Zarandi</surname> <given-names>F</given-names></string-name>, <string-name><surname>KamaliPour</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Community detection in complex network based on an improved random algorithm using local and global network information</article-title>. <source>J Netw Comput Appl</source>. <year>2022</year>;<volume>206</volume>(<issue>2</issue>):<fpage>103492</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.jnca.2022.103492</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sayari</surname> <given-names>S</given-names></string-name>, <string-name><surname>Harounabadi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Banirostam</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Community detection based on improved user interaction degree, weighted quasi-local path-based similarity and frequent pattern mining</article-title>. <source>J Supercomput</source>. <year>2024</year>;<volume>80</volume>(<issue>13</issue>):<fpage>18544</fpage>&#x2013;<lpage>72</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11227-024-06178-7</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sayari</surname> <given-names>S</given-names></string-name>, <string-name><surname>Harounabadi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Banirostam</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Robust community detection based on improved user interaction, enhanced local path index and pattern mining in social networks</article-title>. <source>Inf Process Manag</source>. <year>2025</year>;<volume>62</volume>(<issue>3</issue>):<fpage>104008</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ipm.2024.104008</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Madhulika</surname> <given-names>Kaur P</given-names></string-name>, <string-name><surname>Sabharwal</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Hybrid label propagation based on motifs and similarity measures for community detection</article-title>. <source>Knowl Inf Syst</source>. <year>2025</year>;<volume>6</volume>(<issue>3</issue>):<fpage>115</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s10115-025-02557-5</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Boudjebbour</surname> <given-names>K</given-names></string-name>, <string-name><surname>Belkhir</surname> <given-names>A</given-names></string-name>, <string-name><surname>Toubal</surname> <given-names>EB</given-names></string-name>, <string-name><surname>Rahim</surname> <given-names>M</given-names></string-name></person-group>. <article-title>User web access prediction based on web services and user profile</article-title>. In: <conf-name>2022 International Conference on Advanced Aspects of Software Engineering (ICAASE); 2022 Sep 17&#x2013;18</conf-name>; <publisher-loc>Constantine, Algeria</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Obidallah</surname> <given-names>WJ</given-names></string-name>, <string-name><surname>Raahemi</surname> <given-names>B</given-names></string-name>, <string-name><surname>Ruhi</surname> <given-names>U</given-names></string-name></person-group>. <article-title>Clustering and association rules for web service discovery and recommendation: a systematic literature review</article-title>. <source>SN Comput Sci</source>. <year>2020</year>;<volume>1</volume>(<issue>1</issue>):<fpage>27</fpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Eriksson</surname> <given-names>L</given-names></string-name>, <string-name><surname>Lagerkvist</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Improved algorithms for Allen&#x2019;s interval algebra: a dynamic programming approach</article-title>. In: <conf-name>Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21); 2021 Aug 19&#x2013;27</conf-name>; <publisher-loc>Montreal, QC, Canada</publisher-loc>. p. <fpage>1873</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Piao</surname> <given-names>G</given-names></string-name>, <string-name><surname>Breslin</surname> <given-names>JG</given-names></string-name></person-group>. <article-title>Measuring semantic distance for linked open data-enabled recommender systems</article-title>. In: <conf-name>Proceedings of the 31st Annual ACM Symposium on Applied Computing; 2016 Apr 3&#x2013;8</conf-name>; <publisher-loc>Pisa, Italy</publisher-loc>. p. <fpage>315</fpage>&#x2013;<lpage>20</lpage>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Cheniki</surname> <given-names>N</given-names></string-name>, <string-name><surname>Belkhir</surname> <given-names>A</given-names></string-name>, <string-name><surname>Sam</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Messai</surname> <given-names>N</given-names></string-name></person-group>. <article-title>LODS: a linked open data based similarity measure</article-title>. In: <conf-name>2016 IEEE 25th International Conference on eNabling Technologies: Infrastructure for Collaborative Enterprises (WETICE); 2016 Jun 13&#x2013;15</conf-name>; <publisher-loc>Paris, France</publisher-loc>. p. <fpage>229</fpage>&#x2013;<lpage>34</lpage>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Friendly</surname> <given-names>F</given-names></string-name></person-group>. <chapter-title>Jaro&#x2013;Winkler distance improvement for approximate string search using indexing data for multiuser application</chapter-title>. In: <source>Journal of Physics: Conference Series</source>. Vol. <volume>1361</volume>. <publisher-loc>Bristol, UK</publisher-loc>: <publisher-name>IOP Publishing</publisher-name>; <year>2019</year>. <fpage>012080</fpage> p.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Verma</surname> <given-names>V</given-names></string-name>, <string-name><surname>Aggarwal</surname> <given-names>RK</given-names></string-name></person-group>. <article-title>A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective</article-title>. <source>Soc Netw Anal Min</source>. <year>2020</year>;<volume>10</volume>(<issue>1</issue>):<fpage>43</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s13278-020-00660-9</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Alobed</surname> <given-names>M</given-names></string-name>, <string-name><surname>Altrad</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Bakar</surname> <given-names>ZBA</given-names></string-name></person-group>. <article-title>A comparative analysis of Euclidean, Jaccard and Cosine similarity measure and Arabic WordNet for automated Arabic essay scoring</article-title>. In: <conf-name>2021 Fifth International Conference on Information Retrieval and Knowledge Management (CAMP); 2021 Jun 15&#x2013;16; Online</conf-name>. p. <fpage>70</fpage>&#x2013;<lpage>4</lpage>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Temple</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Characteristics of distance matrices based on Euclidean, Manhattan and Hausdorff coefficients</article-title>. <source>J Classif</source>. <year>2023</year>;<volume>40</volume>(<issue>2</issue>):<fpage>214</fpage>&#x2013;<lpage>32</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00357-023-09435-1</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Setyorini</surname> <given-names>I</given-names></string-name>, <string-name><surname>Ramayanti</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Finding nearest mosque using Haversine formula on Android platform</article-title>. <source>Jurnal Online Informatika</source>. <year>2019</year>;<volume>4</volume>(<issue>1</issue>):<fpage>57</fpage>&#x2013;<lpage>62</lpage>. doi:<pub-id pub-id-type="doi">10.15575/join.v4i1.267</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Jin</surname> <given-names>D</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>He</surname> <given-names>D</given-names></string-name>, <string-name><surname>Gabrys</surname> <given-names>B</given-names></string-name>, <string-name><surname>Musial</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Robust detection of communities with multi-semantics in large attributed networks</article-title>. In: <conf-name>Knowledge Science, Engineering and Management: 11th International Conference, KSEM 2018</conf-name>. <publisher-loc>Cham, Switzerland</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2018</year>. p. <fpage>362</fpage>&#x2013;<lpage>76</lpage>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Kostaras</surname> <given-names>I</given-names></string-name>, <string-name><surname>Drabo</surname> <given-names>C</given-names></string-name>, <string-name><surname>Juneau</surname> <given-names>J</given-names></string-name>, <string-name><surname>Reimers</surname> <given-names>S</given-names></string-name>, <string-name><surname>Schr&#x00F6;der</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wielenga</surname> <given-names>G</given-names></string-name>, <etal>et al</etal></person-group>. <chapter-title>Porting an application to the NetBeans platform</chapter-title>. In: <source>Pro Apache NetBeans: Building Applications on the Rich Client Platform</source>. <publisher-loc>Berkeley, CA, USA</publisher-loc>: <publisher-name>Apress</publisher-name>; <year>2020</year>. p. <fpage>255</fpage>&#x2013;<lpage>97</lpage>. doi:<pub-id pub-id-type="doi">10.1007/978-1-4842-5370-0_9</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mrvar</surname> <given-names>A</given-names></string-name>, <string-name><surname>Batagelj</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Programs for analysis and visualization of very large networks reference manual</article-title>. <source>Recuperado El</source>. <year>2018</year>;<volume>12</volume>:<fpage>3</fpage>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ghosh</surname> <given-names>S</given-names></string-name>, <string-name><surname>Halappanavar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Tumeo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kalyanaraman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Chavarria-Miranda</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Distributed Louvain algorithm for graph community detection</article-title>. In: <conf-name>2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS); 2018 May 21&#x2013;25</conf-name>; <publisher-loc>Vancouver, BC, Canada</publisher-loc>. p. <fpage>885</fpage>&#x2013;<lpage>95</lpage>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Eckhardt</surname> <given-names>CM</given-names></string-name>, <string-name><surname>Madjarova</surname> <given-names>SJ</given-names></string-name>, <string-name><surname>Williams</surname> <given-names>RJ</given-names></string-name>, <string-name><surname>Ollivier</surname> <given-names>M</given-names></string-name>, <string-name><surname>Karlsson</surname> <given-names>J</given-names></string-name>, <string-name><surname>Pareek</surname> <given-names>A</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Unsupervised machine learning methods and emerging applications in healthcare</article-title>. <source>Knee Surg Sports Traumatol Arthrosc</source>. <year>2023</year>;<volume>31</volume>(<issue>2</issue>):<fpage>376</fpage>&#x2013;<lpage>81</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00167-022-07233-7</pub-id>; <pub-id pub-id-type="pmid">36378293</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>N</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>A</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>R</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Node influence-based label propagation for community detection using both topology and attributes</article-title>. <source>Expert Syst Appl</source>. <year>2025</year>;<volume>287</volume>(<issue>5</issue>):<fpage>127999</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2025.127999</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>B</given-names></string-name>, <string-name><surname>Tu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Song</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Leveraging community detection for clustered federated learning on Non-IID data: from an information-theoretic perspective</article-title>. <source>Future Gener Comput Syst</source>. <year>2026</year>;<volume>174</volume>(<issue>2</issue>):<fpage>108005</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.future.2025.108005</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>