Novel Path Counting-Based Method for Fractal Dimension Estimation of the Ultra-Dense Networks

Next-generation networks, including the Internet of Things (IoT), fifth-generation cellular systems (5G), and sixth-generation cellular systems (6G), face a dramatic increase in the number of deployed devices. This places tight constraints on the design of such networks. Structural change of the network is one such challenge, and it affects network performance, including the required quality of service (QoS). The fractal dimension (FD) is considered one of the main indicators of the structure of a communication network. To this end, this work analyzes the FD of the network and its use in the investigation and planning of telecommunication networks. The cluster growing method for assessing the FD is introduced and analyzed. The article then proposes a novel method for estimating the FD of a communication network, based on assessing the network's connectivity by searching for the shortest routes. Unlike the cluster growing method, the proposed method does not require multiple iterations, which reduces the computational cost and increases the stability of the results obtained. The method is simple to implement and can be used in the research and planning of current and prospective communication networks. The developed method is evaluated for two different network structures and compared with the cluster growing method. The results validate the developed method.


Introduction
describe the network environment, while in [26] it is used to describe the size of digital clusters. In the previously mentioned works, FD is used in relation to artificial objects of the network environment space, e.g., buildings and roads; however, real objects in the surrounding world should be considered fractal. In some cases, when it comes to networks of natural origin in the microcosm, e.g., molecular connections and neural networks, physics and chemistry also describe them using the concept of FD [27]. The communication network is an artificial object; however, its development resembles the evolutionary process of natural objects, and its current numerical indicators allow it to be put on a par with such objects.
The main contributions of this article can be summarized as follows.
1. Design of a network model based on graph theory, and modeling of the problem of network structure.
2. Design of a framework to describe the structure of the communication network based on the concept of FD.
3. Development of a method for estimating the fractal dimension of the wireless network based on connectivity.
4. Design of a novel method for extracting features of the network structure. This is achieved by employing the FD values as numerical indicators that characterize the network features.
The value of the FD is not an exhaustive characteristic of the structure, but it can serve as one of the network indicators, since it provides information that is necessary for evaluating and comparing network structures.

Network Model and Problem Statement
We consider a wireless network with a model represented by a graph G (V, E); V is the set of network vertices, and E is the set of communication links. For a physical network structure, the set of vertices V is associated with a set of network nodes, and the set of edges E is associated with a set of communication lines. For a logical structure, i.e., the structure of information links, the set of vertices V is also associated with a set of network nodes, while the set of edges E is associated with a set of routes, i.e., information links.
In general, a directed weighted graph should be used to describe the communication network, since the parameters of the connections may not be symmetric. For the modeled network, the communication channels are assumed to be symmetric, and thus we consider an undirected weighted graph. The weights of the edges of graph G reflect numerical characteristics essential for solving a specific problem, e.g., distance, data transfer rate, delivery delay, or probability of loss.
A network model in the form of a graph captures the structural and geographical features of the network by assigning coordinates to the vertices and, correspondingly, weights to the edges of the graph. Moreover, it can reflect other topological features. Since the considered model is an undirected weighted graph, the FD of the network, i.e., of the graph, can be defined for it [28]. To evaluate the FD of the network, the cluster growing method is used [29].
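As an illustration of the graph model, the following minimal sketch places nodes in a square and builds the symmetric matrix of edge weights, assuming the weight is the Euclidean distance between node coordinates. The function names `random_nodes` and `distance_matrix` are hypothetical and do not come from the article.

```python
import math
import random


def random_nodes(n, side, seed=0):
    """Place n nodes uniformly at random in a side x side square (assumed layout)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def distance_matrix(nodes):
    """Symmetric matrix of Euclidean distances; entry [i][j] is the weight of edge (i, j)."""
    n = len(nodes)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Undirected graph: the weight is the same in both directions.
            d[i][j] = d[j][i] = math.dist(nodes[i], nodes[j])
    return d


nodes = random_nodes(500, 500.0)   # 500 nodes in a 500 m square
d = distance_matrix(nodes)
```

Any other symmetric metric, such as delay or loss probability, could fill the same matrix in place of distance.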
The cluster formation process in a communication network starts by randomly selecting some vertices [30]. Then, starting from the first selected vertex, the selected vertices are connected with edges if the distance between them does not exceed a given value r. The resulting cluster contains m vertices. Each vertex j is assigned a weight by means of a mass coefficient, $V_j$, and thus the total mass of the cluster C, denoted $M_C$, is calculated as in Eq. (1).

$$M_C = \sum_{j=1}^{m} V_j \qquad (1)$$
For unit vertex weights, i.e., $V_j = 1$, the mass of the cluster equals the number of vertices in it. Increasing the value of r increases the size of the resulting cluster, i.e., the number of vertices in it, because the cluster then includes vertices that are farther away from their neighbors. Thus, as r increases, a cluster growth process is observed. This process characterizes a set of vertices, i.e., nodes, from the point of view of forming a network according to the metric r. Any network operation parameter can be selected as this metric. For example, if it is a distance, the process characterizes the change in the connectivity of a wireless network. If it is a delay, the process characterizes the network's ability to meet data delivery time requirements, and if it is the probability of failure (loss), the process characterizes the probability of delivery.

Fig. 1a shows the clustering result for r = 23 m, while Fig. 1b shows the clustering result for r = 27 m. In this example, a geometric model is given in which the vertices of the graph model, i.e., network nodes, are located in a flat square area with a 500 m side. The network deploys 500 randomly distributed nodes. In the first case, the average number of nodes in a cluster, i.e., the cluster mass, is 6.3, while in the second it is 15.4. In the presented graphs, any two vertices can be connected, provided that the distance between them does not exceed r.
The process of cluster growth depends on the choice of the initial vertex, and thus a single observation does not give a complete picture of the network. Therefore, the cluster growing method involves many observations: the process of growing a cluster is repeated k times with different initial conditions, i.e., starting from different, randomly selected vertices of the network. The cluster's average size, i.e., weight, is calculated as in Eq. (2).

$$M(r) = \frac{1}{k} \sum_{i=1}^{k} M_{C_i} \qquad (2)$$

where $M_{C_i}$ is the mass of the cluster grown from the i-th initial vertex.
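The cluster growing procedure can be sketched as follows. This is one possible reading of the method, assuming unit vertex weights, Euclidean distance as the metric r, and a breadth-first growth from each randomly chosen initial vertex; the names `cluster_mass` and `average_cluster_mass` are hypothetical.

```python
import math
import random
from collections import deque


def cluster_mass(nodes, seed_index, r):
    """Number of vertices reachable from seed_index through hops no longer than r."""
    visited = {seed_index}
    queue = deque([seed_index])
    while queue:
        i = queue.popleft()
        for j in range(len(nodes)):
            # A vertex joins the cluster if it lies within r of a cluster member.
            if j not in visited and math.dist(nodes[i], nodes[j]) <= r:
                visited.add(j)
                queue.append(j)
    return len(visited)


def average_cluster_mass(nodes, r, k, rng):
    """Average mass over k clusters grown from randomly selected initial vertices."""
    seeds = [rng.randrange(len(nodes)) for _ in range(k)]
    return sum(cluster_mass(nodes, s, r) for s in seeds) / k


# Small example: three nodes close together and one far away.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
avg = average_cluster_mass(nodes, 1.0, 10, random.Random(0))
```

Because the seeds are random, repeated runs with a small k can return noticeably different averages, which is the instability discussed in the text.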
To estimate the fractal dimension, FD, the following fundamental relation is used.

$$M(r) \approx a\, r^{d_f} \qquad (3)$$

or, in logarithmic form,

$$\log M(r) = \log a + d_f \log r$$

where M(r) is the average cluster mass for a given r, and a is a constant coefficient. The logarithmic form is a linear equation whose slope is the value of the FD, $d_f$, as illustrated in Fig. 2. Thus, the essence of the method of estimating the FD of a graph is to analyze the dependence of the average number of cluster vertices, i.e., cluster mass, on the cluster size.
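The slope estimation described above can be illustrated with an ordinary least-squares fit in logarithmic coordinates. This is a minimal sketch; the function name `fit_power_law` is hypothetical, and the article does not state which regression routine was used.

```python
import math


def fit_power_law(r_values, m_values):
    """Least-squares fit of log m = log a + d_f * log r; returns (d_f, a)."""
    xs = [math.log(r) for r in r_values]
    ys = [math.log(m) for m in m_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    d_f = sxy / sxx                       # slope of the regression line
    a = math.exp(mean_y - d_f * mean_x)   # intercept mapped back from log space
    return d_f, a


# Synthetic data that follow m = 3 * r^2 exactly.
r_values = [1.0, 2.0, 4.0, 8.0]
m_values = [3.0 * r ** 2 for r in r_values]
d_f, a = fit_power_law(r_values, m_values)
```

For data that follow the power law exactly, the fit recovers the exponent and the coefficient up to rounding error.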
The meaning of the FD differs depending on the metric chosen for the weight coefficients attributed to the edges of the graph, i.e., on the quantity represented by r. For example, suppose the data delivery delay is used as the weighting factor. In that case, clustering results in clusters made up of nodes between which the data delivery time does not exceed a specified value. The value of the FD then reflects the dependence of the number of nodes in such clusters on the value of the allowable delay. Such characteristics are important in evaluating and planning communication networks, as they give an idea of the logical structure of the network and its dependence on the requirements placed on the parameter selected as the metric.
The accuracy of the estimation depends on the selected number of iterations k. Theoretically, this number equals the number of network nodes. However, in most practical cases, this number of iterations results in massive computations, while with different smaller values of k the results obtained may differ significantly. Computational cost is therefore a significant difficulty in implementing this method. Nevertheless, the FD can be used for prospective communication networks as a characteristic of their structure with respect to various functional parameters. For the practical use of the FD, novel methods with limited computations should be introduced.

The Proposed Method for Estimating the Fractal Dimension of a Network Based on Connectivity
As mentioned in the previous section, the FD of a network can be estimated using Eq. (3), which represents the dependence of the number of vertices in the cluster on the value of r. This dependence characterizes the process of cluster growth with increasing r. Such a process is well studied in the theory of random graphs and in percolation theory [31,32]. In the theory of random graphs, this process is considered as the phase transition of a graph from a disconnected state to a connected state [31]. Percolation theory also considers the phase transition of a medium from one state to another, e.g., from a non-conducting state to a conducting or another physical state, which is characterized by the formation of an infinite, i.e., percolation, cluster [32].
Both theories are similar in that they consider the growth of clusters, or graph components, which leads to a change in the properties of the network, environment, or graph. In this case, a cluster is understood as a set of vertices representing a graph component, i.e., a connected subgraph. Thus, when using the cluster growth method, if the range of variation of r is large enough, the cluster in question joins a giant component after a certain step of increasing r. For small values of r, however, it most likely does not belong to a giant component. Choosing different clustering options, e.g., initial vertices, averages this process and allows it to be interpreted as a process of changing the connectivity of the graph, i.e., network.
In this section, we propose a method for evaluating the FD of a network by evaluating its connectivity, which characterizes the reachability of the graph vertices. It is estimated through the number of shortest paths that can be established between its vertices. The proposed method is referred to as the method of counting paths.
When the graph is connected, there is at least one path between any pair of its vertices. Thus, in a connected graph, there is a shortest path between any pair of vertices; when several paths have equal weight, any of them is chosen as the shortest. Furthermore, if the graph contains N vertices and is connected, the number of shortest paths between its vertices is $N^2 - N$, i.e., one per ordered pair of distinct vertices, excluding paths from a vertex to itself. As the value of r changes, the number of shortest paths in the graph changes from 0, a completely disconnected state, to the maximum possible value, the state of complete connectivity.
To estimate the FD, Eq. (3) is modified by replacing the cluster mass estimate with the number of shortest paths in the graph, as follows.

$$S(r) \approx a\, r^{d_f}$$
where S(r) is the number of shortest paths in the graph, which can be obtained using Floyd's algorithm [33]. The initial data of the algorithm is the distance matrix, D, which is defined as follows.

$$D = [d_{i,j}], \quad i, j = 1, \dots, N$$

where $d_{i,j}$ is the distance between vertices, i.e., nodes, i and j. The original distance matrix is modified by taking into account the value of r as follows.

$$d_{i,j}(r) = \begin{cases} d_{i,j}, & d_{i,j} \le r \\ d_{BIG}, & d_{i,j} > r \end{cases}$$

where $d_{BIG}$ is a sufficiently large number that exceeds the maximum possible path length and, in this case, is equivalent to an infinitely large number. A change in the value of r leads to a change in the distance matrix and, consequently, in the result of the shortest path search.
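The thresholding of the distance matrix by r can be sketched as follows. The choice of `d_big` as the sum of all matrix entries plus one is an assumption made here so that it exceeds the length of any simple path; the article only requires it to be sufficiently large. Both function names are hypothetical.

```python
def safe_d_big(d):
    """A value larger than any simple path length: sum of all distances plus one (assumed)."""
    return sum(map(sum, d)) + 1.0


def threshold_distances(d, r, d_big):
    """Keep links no longer than r; replace every longer link by d_big ("no link")."""
    return [[dij if dij <= r else d_big for dij in row] for row in d]


# Three nodes with pairwise distances 2, 4 and 5.
d = [[0, 2, 5],
     [2, 0, 4],
     [5, 4, 0]]
d_big = safe_d_big(d)                     # 23.0
w = threshold_distances(d, 4, d_big)      # the link of length 5 is removed
```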
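Floyd's algorithm is then used to find the shortest paths between all pairs of vertices together with their weights. The weights of the shortest paths are represented by the matrix C defined in Eq. (8).

$$C = [c_{i,j}], \quad i, j = 1, \dots, N \qquad (8)$$

If the weight of a found path is not infinity, i.e., it is less than $d_{BIG}$, the path exists; otherwise, there is no path. Thus, the number of paths is calculated as follows.

$$S(r) = \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i}}^{N} I(c_{i,j}) \qquad (9)$$

where I is the indicator function, defined as follows.

$$I(c_{i,j}) = \begin{cases} 1, & c_{i,j} < d_{BIG} \\ 0, & \text{otherwise} \end{cases}$$

The FD is then obtained as the slope of the linear regression of log S(r) on log r, where S(r) is the number of shortest paths calculated using Eq. (9). The main steps of Floyd's algorithm for the developed route counting method are presented in Fig. 3.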
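The path counting step can be sketched with a textbook implementation of Floyd's algorithm followed by the indicator count over ordered pairs of distinct vertices. This is a minimal illustration rather than the authors' implementation; the names `floyd_warshall` and `count_paths`, and the value 100 used for `d_big`, are hypothetical.

```python
def floyd_warshall(w):
    """All-pairs shortest path weights for a weight matrix w (missing links hold d_big)."""
    n = len(w)
    c = [row[:] for row in w]             # work on a copy
    for m in range(n):                    # intermediate vertex
        for i in range(n):
            for j in range(n):
                if c[i][m] + c[m][j] < c[i][j]:
                    c[i][j] = c[i][m] + c[m][j]
    return c


def count_paths(c, d_big):
    """Number of ordered pairs (i, j), i != j, whose shortest path weight is below d_big."""
    n = len(c)
    return sum(1 for i in range(n) for j in range(n) if i != j and c[i][j] < d_big)


D_BIG = 100
# Chain 0 - 1 - 2: vertices 0 and 2 have no direct link but are joined through vertex 1.
chain = [[0, 1, D_BIG],
         [1, 0, 1],
         [D_BIG, 1, 0]]
s_chain = count_paths(floyd_warshall(chain), D_BIG)   # 6, i.e., N^2 - N for N = 3
```

When one vertex is isolated, every path to it keeps a weight of at least `d_big` and is therefore not counted.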
The cycle is performed k times, depending on the range of variation of r and the step size of its change, Δr. Thus, k can be calculated as follows.

$$k = \frac{r_{max} - r_{min}}{\Delta r}$$

where $r_{max}$ and $r_{min}$ are the maximum and minimum values of r, respectively. The maximum and minimum values are selected based on the initial data from the distance matrix D.
The value of Δr is chosen so as to obtain a sufficient number of points for constructing a linear regression; in practice, several tens of points.
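Putting the steps together, the whole route counting procedure can be sketched as a sweep over r followed by a regression in logarithmic coordinates. The sketch rests on several assumptions that the article does not specify: r takes the k values r_min + Δr, ..., r_max, points with S(r) = 0 are dropped before taking logarithms, `d_big` is the sum of all distances plus one, and at least two points with S(r) > 0 remain. All function names are hypothetical.

```python
import math


def count_reachable_pairs(d, r, d_big):
    """S(r): ordered pairs (i, j), i != j, joined by a path whose hops are all <= r."""
    n = len(d)
    # Thresholding: links longer than r are replaced by the "infinite" weight d_big.
    c = [[d[i][j] if d[i][j] <= r else d_big for j in range(n)] for i in range(n)]
    # Floyd's algorithm: relax every pair through every intermediate vertex m.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if c[i][m] + c[m][j] < c[i][j]:
                    c[i][j] = c[i][m] + c[m][j]
    return sum(1 for i in range(n) for j in range(n) if i != j and c[i][j] < d_big)


def estimate_fd(d, r_min, r_max, k):
    """Sweep r over k steps and return (d_f, r_values, s_values)."""
    d_big = sum(map(sum, d)) + 1.0        # exceeds any simple path length (assumed choice)
    delta_r = (r_max - r_min) / k
    r_values = [r_min + delta_r * step for step in range(1, k + 1)]
    s_values = [count_reachable_pairs(d, r, d_big) for r in r_values]
    # Keep only points with S(r) > 0 so that the logarithm is defined (assumption).
    pts = [(math.log(r), math.log(s)) for r, s in zip(r_values, s_values) if s > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx, r_values, s_values


# Five nodes on a line at positions 0, 1, 3, 6, 10 (gaps of 1, 2, 3, 4).
positions = [0, 1, 3, 6, 10]
d = [[abs(p - q) for q in positions] for p in positions]
d_f, r_values, s_values = estimate_fd(d, 0.0, 4.0, 4)
# s_values is [2, 6, 12, 20]: each step of r attaches one more node to the component.
```

A single sweep yields the whole dependence S(r), so no averaging over random initial vertices is needed.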

Performance Evaluation
In this section, the developed method for estimating the fractal dimension of a communication network based on connectivity is evaluated for the network shown in Fig. 1, with 500 nodes distributed uniformly at random in a square. Fig. 4 shows the result of the FD evaluation for the considered network using the proposed method and the cluster growing method.
As presented in Fig. 4, both methods achieve similar results. The points in the results are obtained by calculating the estimated indicators: the number of vertices for the cluster growing method, i.e., the lower group of curves, and the number of routes for the proposed method, i.e., the upper group of curves. These points are connected by dotted lines, which demonstrate the nature of the dependence of the corresponding indicator on the value of r. From these dependencies, it can be seen that the considered indicators change in a very similar way. The linear regressions constructed for the obtained results also confirm their similarity.
The FD obtained by the cluster growing method for the given network is 5.06, while the FD obtained by the route counting method is 5.09. The proximity of the results obtained by these methods has been tested on sufficiently large samples of different networks, and the difference between the estimates has not exceeded 5%. This gives grounds to assert that the proposed method can be used to evaluate the FD of a communication network with sufficient accuracy for practical application. The difference of up to 5% arises because, in the cluster growing method, the number of iterations, i.e., initial conditions, is limited to reduce the number of calculations.
Thus, the proposed method is similar to the cluster growing method in the sense that it is also based on an estimate of the number of connected network nodes. The difference is that the estimation of this number is based on an estimate of the number of routes in the network.
Figure 4: The result of the FD assessment by two methods

The main advantage of the proposed method is its simple implementation, which does not require many iterations and thus reduces the implementation time and cost. Moreover, it is enough to obtain only one dependence S(r). Fig. 5 shows two further network structures formed by the placement of nodes. Both network structures consist of 500 nodes distributed over a square area. The two considered structures differ in the distribution of nodes over the network area. In Fig. 5a, the coordinates of the nodes are distributed according to a two-dimensional, 2D, normal distribution, with the scattering center in the middle of the square and a standard deviation of 80. Fig. 5b shows a network structure in which the nodes are distributed according to a multi-modal mixture distribution obtained from four 2D normal distributions.
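The two node placements can be generated along the following lines. Only the number of nodes, the central position, and the standard deviation of 80 for the first structure come from the text; the side of 500, the four mixture centers, the equal mixture weights, and the mixture standard deviation of 40 are assumptions chosen for illustration, and points are not clipped to the square. The function names are hypothetical.

```python
import random


def normal_layout(n, side, sigma, rng):
    """n nodes from a 2D normal centered in the square, standard deviation sigma per axis."""
    center = side / 2
    return [(rng.gauss(center, sigma), rng.gauss(center, sigma)) for _ in range(n)]


def mixture_layout(n, centers, sigma, rng):
    """n nodes from an equal-weight mixture of 2D normals (centers and sigma are assumed)."""
    nodes = []
    for _ in range(n):
        cx, cy = rng.choice(centers)      # pick one mixture component at random
        nodes.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return nodes


rng = random.Random(1)
single = normal_layout(500, 500.0, 80.0, rng)
centers = [(125.0, 125.0), (125.0, 375.0), (375.0, 125.0), (375.0, 375.0)]  # assumed
mixed = mixture_layout(500, centers, 40.0, rng)                             # sigma assumed
```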
The FD is estimated using the developed route counting method for both considered structures, and the results are presented in Fig. 6. The results indicate that the dependence of the number of routes on the value of r differs in nature between the two structures. Moreover, the numerical values of the FD of the two network structures are different: for the first network structure the FD is 2.4, while for the second it is 4.7.
Two limiting cases bound the range of FD values obtained with the developed method. The first case is when the distances between network nodes are very small, i.e., the network collapses to a point. Changing r in this case does not change the number of nodes in the cluster or the number of routes, so the linear regression coefficient is zero. This results in a zero FD. The second case is when all neighboring nodes of the network are located at equal distances from each other, i.e., the nodes form a flat grid with square cells. In this case, the number of nodes in the cluster, or the number of routes, jumps instantaneously when r becomes equal to the distance between the nodes. This yields a regression line with an unbounded slope, i.e., an infinite FD. These cases describe boundary states; the real value of the FD of a network lies between them and depends on the distribution of the values of r.
The FD is thus a characteristic of the network that can be used independently or in addition to other characteristics. The main advantage of the developed route counting method is the stability of the results obtained, compared to the cluster growing method. However, it should be noted that the volume of calculations of this method is proportional to the cube of the number of vertices, i.e., nodes of the network, $N^3$, which is the number of operations performed by Floyd's algorithm. With a large number of network nodes, N, a significant amount of computing resources may be required. In such a case, a limited selection of nodes should probably be used; however, this reduces the stability of the results. Thus, the developed route counting method is advisable when data are available on all or most of the network nodes, while the cluster growing method is advisable when a selective analysis of the network is intended, without covering all nodes.
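As a rough order-of-magnitude illustration of this cost, consider the network size used in the evaluation, N = 500, and assume that Floyd's algorithm is run once for each of k values of r. The value k = 30 is an assumption standing in for "several tens of points", and constant factors are ignored.

```latex
\begin{align*}
N^3 &= 500^3 = 1.25 \times 10^{8}
  && \text{elementary operations per value of } r, \\
k \cdot N^3 &= 30 \times 1.25 \times 10^{8} = 3.75 \times 10^{9}
  && \text{elementary operations for the whole sweep.}
\end{align*}
```

Doubling N multiplies both figures by eight, which is why a limited selection of nodes becomes attractive for large networks.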

Conclusions
The concept of the FD of a network can be applied to communication networks to characterize the features of their structure. This parameter gives a numerical characteristic of the network structure, describing its properties in relation to a selected metric. This metric can be any indicator of the quality of network functioning. Therefore, for communication networks, the FD should be given in the context of the analyzed quality-of-functioning parameter. The numerical value of the FD characterizes the degree of invariance of the network structure to the scale determined by the value of the selected functioning parameter. The FD of the network can be estimated on the graph model of the network by the cluster growing method. With a relatively large number of network nodes, this method requires quite large computing resources. Also, the results obtained with a limited number of iterations are random and cannot always be used to compare the simulated structures. The article proposes a novel method for estimating the FD of a communication network, based on assessing the connectivity of the network by searching for the shortest routes. Unlike the cluster growing method, the proposed method does not require multiple iterations, which reduces the number of calculations and increases the stability of the results obtained. The method is simple to implement and can be used in the research and planning of current and prospective communication networks.