<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMES</journal-id>
<journal-id journal-id-type="nlm-ta">CMES</journal-id>
<journal-id journal-id-type="publisher-id">CMES</journal-id>
<journal-title-group>
<journal-title>Computer Modeling in Engineering &#x0026; Sciences</journal-title>
</journal-title-group>
<issn pub-type="epub">1526-1506</issn>
<issn pub-type="ppub">1526-1492</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">24470</article-id>
<article-id pub-id-type="doi">10.32604/cmes.2023.024470</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Aggregate Point Cloud Geometric Features for Processing</article-title>
<alt-title alt-title-type="left-running-head">Aggregate Point Cloud Geometric Features for Processing</alt-title>
<alt-title alt-title-type="right-running-head">Aggregate Point Cloud Geometric Features for Processing</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Li</surname><given-names>Yinghao</given-names></name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref>
<xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Xia</surname><given-names>Renbo</given-names></name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref><email>xiarb@sia.cn</email></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Zhao</surname><given-names>Jibin</given-names></name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref><email>jbzhao@sia.cn</email></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Chen</surname><given-names>Yueling</given-names></name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Tao</surname><given-names>Liming</given-names></name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref>
<xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-6" contrib-type="author">
<name name-style="western"><surname>Zou</surname><given-names>Hangbo</given-names></name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref>
<xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-7" contrib-type="author">
<name name-style="western"><surname>Zhang</surname><given-names>Tao</given-names></name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref>
<xref ref-type="aff" rid="aff-3">3</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Shenyang Institute of Automation, Chinese Academy of Sciences</institution>, <addr-line>Shenyang, 110016</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences</institution>, <addr-line>Shenyang, 110169</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>University of Chinese Academy of Sciences</institution>, <addr-line>Beijing, 100049</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Authors: Renbo Xia. Email: <email>xiarb@sia.cn</email>; Jibin Zhao. Email: <email>jbzhao@sia.cn</email></corresp>
</author-notes>
<pub-date publication-format="print" date-type="pub" iso-8601-date="2023-01-04"><day>04</day>
<month>01</month>
<year>2023</year></pub-date>
<volume>136</volume>
<issue>1</issue>
<fpage>555</fpage>
<lpage>571</lpage>
<history>
<date date-type="received"><day>31</day><month>5</month><year>2022</year></date>
<date date-type="accepted"><day>20</day><month>9</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Li et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Li et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMES_24470.pdf"></self-uri>
<abstract>
<p>As 3D acquisition technology develops and 3D sensors become increasingly affordable, large quantities of 3D point cloud data are emerging. Effectively learning and extracting geometric features from these point clouds has become an urgent problem. The geometric information of a point cloud is hidden in disordered, unstructured points, which makes point cloud analysis very challenging. To address this problem, we propose a novel network framework, called Tree Graph Network (TGNet), which can sample, group, and aggregate local geometric features. Specifically, we construct a Tree Graph by explicit rules, which consists of curves extending in all directions in point cloud feature space, and then aggregate the features of the graph through a cross-attention mechanism. In this way, we incorporate more point cloud geometric structure information into the representation of local geometric features, which makes our network perform better. Our model performs well on several basic point cloud processing tasks such as classification, segmentation, and normal estimation, demonstrating the effectiveness and superiority of our network. Furthermore, we provide ablation experiments and visualizations to better understand our network.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Deep learning</kwd>
<kwd>point-based models</kwd>
<kwd>point cloud analysis</kwd>
<kwd>3D shape analysis</kwd>
<kwd>point cloud processing</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>With the rapid development of 3D vision sensors such as RGB-D cameras, 3D point cloud data, which provides rich 3D geometric information, has proliferated. The analysis of 3D point clouds is receiving increasing attention because they are used in many applications such as autonomous driving, robotics, and remote sensing [<xref ref-type="bibr" rid="ref-1">1</xref>]. Intelligent (automatic, efficient, and reliable) feature learning and representation for this massive point cloud data is a key problem for 3D understanding (including 3D object recognition, semantic segmentation, and 3D object generation).</p>
<p>Thanks to its powerful ability to learn features, deep learning has attracted extensive attention and has achieved fruitful results in image understanding over the past few years [<xref ref-type="bibr" rid="ref-2">2</xref>&#x2013;<xref ref-type="bibr" rid="ref-10">10</xref>]. Because traditional 3D point cloud features are hand-crafted, they cannot describe high-level semantic information, which makes them difficult to adapt to complex real-life situations. Deep learning methods, which learn features autonomously, have great advantages in these respects. However, since point clouds are disordered and unstructured, traditional deep learning methods that work well on 2D images cannot be applied to them directly, and inferring shape information from these irregular points is complicated.</p>
<p>To process raw point clouds directly, Qi&#x00A0;et&#x00A0;al.&#x00A0;proposed PointNet [<xref ref-type="bibr" rid="ref-8">8</xref>], which uses multilayer perceptrons (MLPs) with shared parameters to map each point to a high-dimensional feature space and then applies a max pooling layer to extract global features. Since PointNet mainly focuses on global features and ignores neighborhood structure, it has difficulty capturing local geometric structure information. Qi&#x00A0;et&#x00A0;al.&#x00A0;then proposed PointNet&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-9">9</xref>], which introduces a multilayer network structure into PointNet to better capture geometric structure information from the neighborhood of each point. The network structure of PointNet&#x002B;&#x002B; is similar to that of an image convolutional neural network: it extracts local neighborhood features using PointNet as the basic component and abstracts the extracted features layer by layer in a hierarchical structure. Owing to the simplicity and strong representational power of these two networks, many networks have been developed based on PointNet and PointNet&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>&#x2013;<xref ref-type="bibr" rid="ref-18">18</xref>].</p>
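<p>As a rough illustration of the shared-MLP and max pooling idea described above, the following minimal Python sketch applies the same small layer to every point and then pools channel-wise. It is not the PointNet implementation: the single linear layer with ReLU, the fixed values in WEIGHTS and BIAS, and the function names are assumptions made only for this example.</p>
<code language="python"><![CDATA[
```python
from typing import List

Point = List[float]

# Hypothetical fixed weights: one linear layer (3 -> 4) followed by ReLU
# stands in for the shared MLP; a real network learns these parameters.
WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, -1.0, 0.5],
]
BIAS = [0.0, 0.0, 0.0, 0.1]


def shared_mlp(point: Point) -> List[float]:
    """Apply the same linear layer + ReLU to a single point."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, point)) + b)
        for row, b in zip(WEIGHTS, BIAS)
    ]


def global_feature(points: List[Point]) -> List[float]:
    """Channel-wise max over per-point features (a symmetric function)."""
    per_point = [shared_mlp(p) for p in points]
    return [max(channel) for channel in zip(*per_point)]


cloud = [[0.1, 0.2, 0.3], [0.9, 0.1, 0.4], [0.5, 0.8, 0.2]]
shuffled = [cloud[2], cloud[0], cloud[1]]

# Reordering the points leaves the pooled feature unchanged.
assert global_feature(cloud) == global_feature(shuffled)
```
]]></code>
<p>Because the maximum is taken over all points, the result does not depend on point order, but it also carries no explicit neighborhood information, which is the limitation that hierarchical designs address.</p>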
<p>Local feature aggregation, which is mainly used to discover correlations between points in local regions, is an important basic operation that has been extensively studied in recent years [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>]. For each key point, its neighbors are first grouped according to predefined rules (e.g., KNN). Next, the features of the query points and neighboring points are passed into various point-based transformation and aggregation modules for local geometric feature extraction.</p>
<p>Local feature aggregation can incorporate prior knowledge into local features through predefined rules. For example, KNN-based approaches explicitly assume that local features are related to neighboring points and independent of non-adjacent features in the same layer, and they incorporate this assumption into local features through KNN. However, such operations lack long-range relations. To capture them, Yan&#x00A0;et&#x00A0;al.&#x00A0;proposed a non-local module [<xref ref-type="bibr" rid="ref-6">6</xref>] that considers not only neighboring points but also points sampled from the whole point cloud, and it incorporates this prior information into the local features through the L-NL module.</p>
<p>We argue that these approaches are insufficient to extract long-range relations. For this reason, we propose an end-to-end point cloud processing network named TGNet, which can efficiently, robustly, and adequately depict the geometry of point clouds. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> intuitively compares our aggregation method with local and non-local aggregation methods. Compared with local aggregation approaches, our method can better capture long-range dependencies. Compared with non-local aggregation approaches, our method avoids the global point-to-point mapping and can extract geometric features more efficiently.</p>
<p>Our main contributions can be summarized as follows:
<list list-type="order">
<list-item><p>We propose a novel, robust, end-to-end point cloud processing network, named TGNet, which can effectively enhance point cloud processing.</p></list-item>
<list-item><p>We design a local feature grouping block, TGSG (Tree Graph Sampling and Grouping), that enables our network to better balance local and long-range dependencies.</p></list-item>
<list-item><p>We further design a transformer-based point cloud aggregation block TGA, which can efficiently aggregate Tree Graph features.</p></list-item>
</list></p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Common aggregations and tree graph (our) aggregation. Red points denote key points, yellow points denote query points and green circles denote query range. <bold>Left:</bold> local aggregation method. <bold>Middle:</bold> non-local aggregation method. <bold>Right:</bold> our aggregation method (Note that our query points are in the feature space)</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-1.png"/></fig>
<p>Our approach achieves state-of-the-art performance in extensive experiments on point cloud classification, segmentation, and normal estimation, which validates the effectiveness of our work.</p>
<p>We note that an earlier version of this paper appeared in [<xref ref-type="bibr" rid="ref-20">20</xref>]. This manuscript expands, revises, and refines that conference version. The description of the method is more complete, and the experiments section adds supplementary experiments and visualizations to aid understanding of our model.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Work</title>
<sec id="s2_1"><label>2.1</label><title>Deep Learning on Point Cloud</title>
<p>The biggest challenge in point cloud processing is the unstructured representation of the data. According to the form of the data fed to the neural network, existing learning methods can be classified as volume-based [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>], projection-based [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>&#x2013;<xref ref-type="bibr" rid="ref-24">24</xref>], and point-based methods [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>&#x2013;<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-25">25</xref>&#x2013;<xref ref-type="bibr" rid="ref-31">31</xref>]. Projection-based methods project an unstructured point cloud into a set of 2D images, while volume-based methods transform the point cloud into regular 3D grids. The task is then completed using 2D or 3D convolutional neural networks. These methods do not use the raw point cloud directly and suffer from explicit information loss and extensive computation. For volume-based methods, low-resolution voxelization loses detailed structural information of objects, while high-resolution voxelization incurs huge memory and computational costs. Projection-based methods are sensitive to viewpoint selection and object occlusion. Furthermore, they cannot adequately extract geometric and structural information from 3D point clouds because information is lost during 3D-to-2D projection.</p>
<p>PointNet is a pioneer of point-based methods. It takes raw point clouds directly as input and extracts point cloud features through shared MLPs and global max pooling. To capture fine geometric structures in local regions, Qi&#x00A0;et&#x00A0;al.&#x00A0;proposed the hierarchical network PointNet&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-9">9</xref>], in which local features are learned from local geometric structures and abstracted layer by layer. Point-based approaches do not require voxelization or projection, so they introduce no explicit information loss, and they are gaining popularity. Following these works, recent research has focused on designing advanced convolution operations, considering a wider range of neighborhoods, and adaptively aggregating query points. In this paper, we also adopt a point-based approach to construct our network.</p>
</sec>
<sec id="s2_2"><label>2.2</label><title>Advanced Convolution Operations</title>
<p>Although the unstructured nature of point clouds makes it difficult to design convolution kernels, advanced convolution kernels in the recent literature have overcome this difficulty and achieved promising results on basic point cloud analysis tasks. Current 3D convolution methods can be divided into continuous [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>], discrete [<xref ref-type="bibr" rid="ref-28">28</xref>,<xref ref-type="bibr" rid="ref-32">32</xref>], and graph-based convolution methods [<xref ref-type="bibr" rid="ref-12">12</xref>]. Continuous convolution methods define the convolution operation according to the spatial distribution of local regions. The convolution output is a weighted combination of adjacent point features, and the convolution weights of adjacent points are determined by their spatial distribution relative to the centroid. For example, RS-CNN [<xref ref-type="bibr" rid="ref-17">17</xref>] maps predefined low-level neighborhood relationships (e.g., relative position and distance) to high-level feature relationships via MLPs and uses them to determine the weights of neighborhood points. In PointConv [<xref ref-type="bibr" rid="ref-16">16</xref>], the convolution kernel is treated as a nonlinear function of local neighborhood point coordinates, consisting of weight and density functions. The weight functions are learned by MLPs, and kernelized density estimates are used to learn the density functions.</p>
<p>Discrete convolution methods define the convolution operation on regular grids, where the offset from the centroid determines the weights of the neighboring points. In GeoConv [<xref ref-type="bibr" rid="ref-32">32</xref>], edge features are decomposed along six bases, which encourages the network to learn edge features independently along each base. The features are then aggregated according to the geometric relationships between the edge features and the bases. Learning in this way preserves the geometric structure information of point clouds.</p>
<p>Graph-based convolution methods use a graph to organize a raw, unordered 3D point cloud, where the vertices of the graph are the points in the point cloud and the directed edges are generated by connecting the centroids with their neighboring points. Feature learning and aggregation are performed in the spatial or spectral domain. In DGCNN [<xref ref-type="bibr" rid="ref-12">12</xref>], the graph is built in feature space and changes as features are extracted; EdgeConv is used to generate edge features and search for neighbors in feature space.</p>
</sec>
<sec id="s2_3"><label>2.3</label><title>Wider Range of Neighborhoods</title>
<p>Due to the geometric structure of the point cloud itself, it is difficult to determine precisely which global points are associated with local point cloud features. During information extraction and abstraction, local features are roughly assumed to be associated only with neighboring points. Recent state-of-the-art methods in the literature attempt to address this difficulty and achieve promising results on basic point cloud analysis tasks. SOCNN [<xref ref-type="bibr" rid="ref-33">33</xref>] and PointASNL [<xref ref-type="bibr" rid="ref-6">6</xref>] sample global and local points and then fuse their features. With these computed features, point cloud processing can be performed with greater accuracy and robustness.</p>
<p>Unlike all existing sampling methods, we follow explicit rules for sampling and grouping points on the surface of the point cloud. In this way, our local features will contain rich information describing the shape and geometry of the object.</p>
</sec>
<sec id="s2_4"><label>2.4</label><title>Adaptive Aggregation</title>
<p>There are currently two main types of feature aggregation operators: local and non-local. Local feature aggregation operators fuse the existing features of neighboring points to obtain new features, which are then abstracted layer by layer through a hierarchical network structure to obtain global features. In contrast, non-local aggregation operators introduce global information when computing local features. They originate from non-local neural networks [<xref ref-type="bibr" rid="ref-34">34</xref>], which essentially use self-attention to compute a new feature by fusing the features of neighboring points with global information. Owing to the success of the transformer in vision tasks [<xref ref-type="bibr" rid="ref-35">35</xref>&#x2013;<xref ref-type="bibr" rid="ref-37">37</xref>] and the fact that the transformer [<xref ref-type="bibr" rid="ref-38">38</xref>] is inherently permutation invariant and thus well suited to point cloud processing, it has received extensive attention for extracting non-local features from point clouds [<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>]. As a representative, Guo&#x00A0;et&#x00A0;al.&#x00A0;proposed PCT [<xref ref-type="bibr" rid="ref-14">14</xref>], where global features are used to learn many-to-one feature mappings after transformation and aggregation.</p>
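<p>To make the non-local aggregation idea concrete, the following minimal Python sketch computes scaled dot-product self-attention over a small set of point features. It is a simplified illustration, not the formulation of any cited network: the query, key, and value projections are assumed to be the identity, and the function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math
from typing import List

Feature = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats: List[Feature]) -> List[Feature]:
    """Scaled dot-product self-attention.

    Assumption for this sketch: Q, K, and V are the input features
    themselves (identity projections); real models learn projections.
    """
    scale = math.sqrt(len(feats[0]))
    outputs = []
    for query in feats:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in feats]
        weights = softmax(scores)
        # Each output is a weighted average of ALL point features,
        # so every point receives global (non-local) information.
        outputs.append(
            [sum(w * value[c] for w, value in zip(weights, feats))
             for c in range(len(query))]
        )
    return outputs


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(features)
```
]]></code>
<p>In this sketch, permuting the input rows permutes the output rows in the same way, so no point ordering is privileged; the cost, however, grows quadratically with the number of points because every point attends to every other point.</p>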
<p>Unlike the two feature aggregation operators mentioned above, we argue that point cloud processing can be better achieved by taking special consideration of local geometry. By aggregating additional geometric information, local features will carry more information and thus achieve better results.</p>
</sec>
</sec>
<sec id="s3"><label>3</label><title>Method</title>
<p>In this paper, we design a novel framework, TGNet (Tree Graph Network), which improves the ability to extract local features and brings global information into the point representation. TGNet consists of several TGSG (Tree Graph Sampling and Grouping) blocks for sampling and grouping and TGA (Tree Graph Aggregation) blocks for aggregating features. The TGSG block first receives the output of the previous block. It then follows explicit rules for sampling, grouping, and simple processing, which assembles additional information about the geometric structure of local regions. The TGA block then uses a self-attention mechanism to aggregate the Tree Graph and obtain new features for the next block.</p>
<p>We first introduce the TGSG block in <xref ref-type="sec" rid="s3_1">Section 3.1</xref> and the TGA block in <xref ref-type="sec" rid="s3_2">Section 3.2</xref>. The TGSG and TGA blocks are then combined hierarchically to form TGNet, as described in <xref ref-type="sec" rid="s3_3">Section 3.3</xref>.</p>
<sec id="s3_1"><label>3.1</label><title>Tree Graph Sampling and Grouping (TGSG) Block</title>
<p>A point cloud is a set of three-dimensional coordinate points in space, denoted as <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>.
Correspondingly, its features can be expressed as <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, where <italic>D</italic> is the feature dimension. The features can represent a variety of information, including color, surface normal, geometric structure, and high-level semantic information. In a hierarchical network framework, the output of the previous layer is the input of the subsequent layer, and the subsequent layer abstracts the features of the previous layer. In different feature layers, the feature <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow></mml:math></inline-formula> of the point cloud carries different information.</p>
<p>We first construct a graph <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mrow><mml:mi mathvariant="script">G</mml:mi></mml:mrow></mml:mrow><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> containing nodes <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mrow><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> and edges <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mrow><mml:mrow><mml:mi mathvariant="script">E</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> based on the spatial relationships of the 3D point cloud. Each vertex corresponds to a point in the point cloud, and each point is connected to its spatially adjacent <italic>K</italic> nearest points by edges. In this way, we transform the point cloud into a graph feature space.</p>
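<p>The graph construction described above can be sketched as follows. This is a brute-force illustration under stated assumptions, not the implementation used in TGNet: the function name knn_graph is hypothetical, distances are Euclidean, and ties are broken by point index.</p>
<code language="python"><![CDATA[
```python
import math
from typing import Dict, List, Sequence


def knn_graph(points: Sequence[Sequence[float]], k: int) -> Dict[int, List[int]]:
    """Map each point index to the indices of its k nearest other points.

    Brute force: every point is compared with every other point, which is
    fine for a toy example but too slow for large clouds.
    """
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # sort() is stable, so equidistant neighbors keep index order.
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph


points = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [5, 5, 5]]
edges = knn_graph(points, k=2)
# edges == {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [2, 1]}
```
]]></code>
<p>Each key of the returned dictionary is a vertex and each listed index is the target of a directed edge. Note that the relation is not symmetric: point 3 lists points 2 and 1 as neighbors, but neither of them lists point 3.</p>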
<p>Using the definition of the curve in CurveNet [<xref ref-type="bibr" rid="ref-39">39</xref>], a curve of length <italic>l</italic> can be generated from a series of points in the feature space <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow></mml:math></inline-formula> such that <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. Unlike CurveNet, we adopt a deterministic strategy in which our curves extend in the feature space following a specific explicit rule <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi>&#x03C0;</mml:mi></mml:math></inline-formula>. Deterministic strategies can reduce the number of learnable parameters and achieve results similar to those of non-deterministic strategies. As shown in <xref ref-type="fig" rid="fig-2">Fig. 2a</xref>, <italic>m</italic> curves of length <italic>l</italic> extending in different directions form a Tree Graph, such that <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:mtext mathvariant="bold">TG</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. Local point clouds with different geometries form different Tree Graphs, so these Tree Graphs carry their geometric information. <xref ref-type="fig" rid="fig-2">Fig. 2a</xref> shows a Tree Graph with 5 curves.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>Illustration of the TGSG block. (a) Tree graph with 5 curves in point cloud feature space. (b) Illustration of the tree graph. <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> denote curves of the tree graph. Red balls denote the nodes of the curves. The green ball in the center is the key point for feature aggregation. The green circle denotes the query range of node <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. (c) Convolution kernel operation on image <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow></mml:math></inline-formula>. (d) Using the GAP operation on the feature map to create the vector <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula></title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-2.png"/></fig>
<p>Next, we describe the construction of the Tree Graph in detail, as shown in <xref ref-type="fig" rid="fig-2">Fig. 2b</xref>. We first randomly sample the starting points in feature space to get <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> and the corresponding features <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>. Because KNN is computationally efficient, we use it to obtain the neighborhood <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mrow><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> of each point feature <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow></mml:math></inline-formula>. Then, we iteratively obtain the nodes <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> of the curve <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> in the point cloud using a predefined strategy <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>&#x03C0;</mml:mi></mml:math></inline-formula>:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C0;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is numerically equal to <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, that is <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
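<p>The sampling and neighborhood steps above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the toy sizes, and the choice to exclude each point from its own neighborhood are assumptions made for the example.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def sample_start_indices(num_points, num_starts, rng):
    """Randomly choose distinct indices of the starting points."""
    return rng.choice(num_points, size=num_starts, replace=False)


def knn_indices(features, k):
    """Indices of the k nearest neighbours of every point in feature space.

    features: (N, D) array. Each point is excluded from its own
    neighbourhood, which is an assumption of this sketch.
    """
    sq_norm = np.sum(features ** 2, axis=1)
    # Squared Euclidean distances between all pairs of features.
    dist = sq_norm[:, None] + sq_norm[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


rng = np.random.default_rng(0)
F = rng.normal(size=(32, 8))          # toy point features, N = 32, D = 8
start_idx = sample_start_indices(F.shape[0], 4, rng)
F_s = F[start_idx]                    # features of the sampled starting points
neighbours = knn_indices(F, k=5)      # (32, 5) neighbour indices
```
]]></code>
<p>Here the neighborhood is computed once for all points, so that any node reached later by a curve can look up its neighbors by index.</p>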
<p>In TGNet, we adopt a simple strategy <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi mathvariant="bold-italic">&#x03C0;</mml:mi></mml:math></inline-formula> that encourages the curves to extend as far as possible in all directions. The node <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> on <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is obtained by executing the policy <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mi mathvariant="bold-italic">&#x03C0;</mml:mi></mml:math></inline-formula>:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:munder><mml:mrow><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">D</mml:mtext></mml:mrow><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">V</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>K</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:munder><mml:mrow><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:msub><mml:mrow><mml:mtext 
mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">D</mml:mtext></mml:mrow><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">V</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">t</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mi>j</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>K</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mrow><mml:mtext mathvariant="bold">D</mml:mtext></mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">V</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the learnable direction vector of the <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mi>i</mml:mi></mml:math></inline-formula>th curve <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mrow><mml:mtext 
mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">f</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is the neighboring feature of <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mrow><mml:mtext mathvariant="bold">p</mml:mtext></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is neighboring feature of <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. 
<inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">t</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the direction vector between point <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">t</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></disp-formula></p>
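<p>A minimal sketch of the walk defined by Eqs. (1)-(3) is given below. It is an illustration under stated assumptions rather than the authors' code: the direction vectors are random stand-ins for the learnable ones, the helper names are hypothetical, each curve stores the l nodes that follow the starting point, and nothing prevents a curve from revisiting a point.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def knn_indices(features, k):
    """k nearest neighbours of every point in feature space (self excluded)."""
    sq_norm = np.sum(features ** 2, axis=1)
    dist = sq_norm[:, None] + sq_norm[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def walk_curve(features, neighbours, start, direction, length):
    """Grow one curve from `start` and return the indices of its nodes.

    The first step scores the neighbours with the direction vector alone;
    later steps add t = c_j - c_{j-1} to the direction vector before scoring.
    """
    nodes = []
    prev, cur = None, start
    for j in range(length):
        candidates = neighbours[cur]                 # indices of N(c_{i,j})
        if j == 0:
            weight = direction
        else:
            weight = direction + (features[cur] - features[prev])
        scores = features[candidates] @ weight
        nxt = int(candidates[np.argmax(scores)])     # argmax over neighbours
        nodes.append(nxt)
        prev, cur = cur, nxt
    return nodes


rng = np.random.default_rng(0)
F = rng.normal(size=(64, 8))                 # toy features, N = 64, D = 8
neighbours = knn_indices(F, k=6)
m, l = 5, 5                                  # number and length of curves
DV = rng.normal(size=(m, F.shape[1]))        # stand-in for learnable vectors
start = 0                                    # index of one starting point
tree = np.array([walk_curve(F, neighbours, start, DV[i], l) for i in range(m)])
```
]]></code>
<p>The resulting array of indices has one row per curve and one column per node, so it can be used directly to gather node features.</p>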
<p>In this way, we can obtain a Tree Graph that contains both local and non-global long-range information.</p>
<p>As the number of curves <italic>m</italic> increases, the query points and query ranges cluster more densely around the center point, because there are more curves. As the curve length <italic>l</italic> increases, the query points and query ranges extend farther from the center point, because each curve is longer. We can therefore adjust <italic>m</italic> and <italic>l</italic> to balance local information against long-range dependencies. When the product of <italic>m</italic> and <italic>l</italic> is held constant, increasing <italic>l</italic> lets the network capture more long-range information; conversely, decreasing <italic>l</italic> makes the network focus more on local information.</p>
<p>As shown in <xref ref-type="fig" rid="fig-2">Fig. 2c</xref>, we convert the graph <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>T</mml:mi><mml:mi>G</mml:mi></mml:math></inline-formula> into an image <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> of size <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>l</mml:mi></mml:math></inline-formula> with feature dimension <italic>D</italic> as follows:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mtable columnalign="center center center center center" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext 
mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22F1;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22EE;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext 
mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mtd><mml:mtd><mml:mo>&#x22EF;</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>Note that <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">c</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is an element of <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow></mml:math></inline-formula>.</p>
<p>In this way, we obtain a tensor similar to an image feature map. As shown in <xref ref-type="fig" rid="fig-2">Fig. 2d</xref>, we process the image <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow></mml:math></inline-formula> with a simple method to extract local features. We obtain the local features <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x00D7;</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> of the starting points from the Tree Graph images <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mi>D</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> using a convolution kernel of size 3 &#x00D7; 3. We then apply global average pooling (GAP) to the resulting feature map to obtain <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mi>z</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mi>D</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msup></mml:math></inline-formula>. In this way, each image in <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is converted to a vector <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> that represents the entire Tree Graph and encodes the local geometric structure information.</p>
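<p>The conversion of one Tree Graph into an image, followed by the 3 &#x00D7; 3 convolution and GAP, can be sketched for a single starting point as follows. This is a hedged NumPy illustration: the channel-last layout, the zero padding that keeps the m &#x00D7; l size, the absence of bias and activation, and the random weights and node indices are all assumptions of the example.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def curves_to_image(features, tree):
    """Stack node features row by row: tree (m, l) indices -> image (m, l, D)."""
    return features[tree]


def conv3x3_same(image, kernel):
    """3 x 3 convolution with zero padding.

    image: (m, l, D); kernel: (3, 3, D, D_out); returns (m, l, D_out).
    """
    m, l, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((m, l, kernel.shape[-1]))
    for a in range(3):
        for b in range(3):
            # Shifted view of the padded image times the kernel tap (a, b).
            out += padded[a:a + m, b:b + l, :] @ kernel[a, b]
    return out


def global_average_pool(feature_map):
    """Average over both spatial axes: (m, l, D_out) -> (D_out,)."""
    return feature_map.mean(axis=(0, 1))


rng = np.random.default_rng(0)
F = rng.normal(size=(64, 8))                 # toy features, D = 8
tree = rng.integers(0, 64, size=(5, 5))      # placeholder node indices, m = l = 5
I = curves_to_image(F, tree)                 # (5, 5, 8)
kernel = rng.normal(size=(3, 3, 8, 16))      # D' = 16, random stand-in weights
I_prime = conv3x3_same(I, kernel)            # (5, 5, 16)
z = global_average_pool(I_prime)             # (16,)
```
]]></code>
<p>Repeating this for every starting point and stacking the resulting vectors would give one row of z per Tree Graph.</p>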
</sec>
<sec id="s3_2"><label>3.2</label><title>Tree Graph Aggregation (TGA) Block</title>
<p>With the TGSG block, we obtain a Tree Graph containing local information and non-global long-range dependency information. In this subsection, we use the TGA block to fuse the Tree Graph and global information into the local features. To simplify the notation, we define the local features of the point cloud as <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. We use cross-attention to fuse the feature <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula> into the local feature <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula>. The multi-head cross-attention from local to global is defined as follows:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mi>L</mml:mi><mml:mi>N</mml:mi><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">Q</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">K</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">V</mml:mtext></mml:mrow></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>:</mml:mo><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mtext 
mathvariant="bold">W</mml:mtext></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p>With multi-head cross attention (MCA) and feed forward layer (FFN), <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mrow><mml:mtext mathvariant="bold">H</mml:mtext></mml:mrow></mml:math></inline-formula> can be computed as:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mrow><mml:mtext mathvariant="bold">H</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>L</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>F</mml:mi><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p><inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> are split as <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>h</mml:mi><mml:mo>&#x2264;</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> for multi-head attention with <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> heads. 
<inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">Q</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>, <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">K</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msubsup><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">V</mml:mtext></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> are the projection matrices in the <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>h</mml:mi></mml:math></inline-formula>th head. <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is used to merge the multiple heads together. LN is the layer normalization function. <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>n</mml:mi></mml:math></inline-formula> is the standard attention function:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">Q</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">K</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">V</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="italic">softmax</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">Q</mml:mtext></mml:mrow><mml:msup><mml:mrow><mml:mtext mathvariant="bold">K</mml:mtext></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:msqrt><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:msqrt></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mtext mathvariant="bold">V</mml:mtext></mml:mrow></mml:math></disp-formula></p>
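<p>Eq. (7) can be sketched for a single head as follows. This is a generic NumPy illustration of scaled dot-product attention, not the TGA block itself: the token counts, the feature size, and the random projection matrices are assumptions, and the sketch does not fix which of the two feature sets plays the query role.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def attn(Q, K, V):
    """Scaled dot-product attention; also returns the attention weights."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 6))     # 4 query tokens, d_k = 6
context = rng.normal(size=(10, 6))    # 10 key/value tokens
W_Q, W_K, W_V = (rng.normal(size=(6, 6)) for _ in range(3))
out, weights = attn(queries @ W_Q, context @ W_K, context @ W_V)
```
]]></code>
<p>The output has one row per query token, and each row of the weight matrix sums to one over the key/value tokens. A multi-head version would run this on each split of the features and merge the concatenated results with the output projection.</p>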
<p>The absolute and relative positions of the points in the point cloud are very important. As shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, we incorporate them into <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mrow><mml:mtext mathvariant="bold">H</mml:mtext></mml:mrow></mml:math></inline-formula>. We concatenate <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow></mml:math></inline-formula> (the absolute spatial positions), <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> (the spatial positions of the neighboring points) and <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> (the relative positions). The concatenated features are mapped to a higher dimension through a single-layer MLP. <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">H</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">H</mml:mtext></mml:mrow></mml:math></inline-formula>, added to <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">P</mml:mtext></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, is then passed through another MLP layer and a Max Pooling layer to obtain the result <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
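<p>One possible reading of this position fusion step is sketched below. It is an interpretation under assumptions, not the authors' code: the sketch adds the MLP-encoded position feature (rather than the raw 3D neighbor positions) to the feature difference so that the dimensions match, it uses ReLU inside each single-layer MLP, and the weights, sizes, and neighbor indices are random placeholders.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def linear_relu(x, weight, bias):
    """Single-layer MLP: linear map followed by ReLU (activation is assumed)."""
    return np.maximum(x @ weight + bias, 0.0)


def fuse_positions(H, P, neighbours, W_pos, b_pos, W_out, b_out):
    """Fuse position encodings into H and pool over each neighbourhood.

    H: (N, D) features; P: (N, 3) positions; neighbours: (N, K) indices.
    Returns an (N, D_out) array.
    """
    P_K = P[neighbours]                                      # (N, K, 3)
    P_center = np.broadcast_to(P[:, None, :], P_K.shape)     # (N, K, 3)
    pos = np.concatenate([P_center, P_K, P_center - P_K], axis=-1)  # (N, K, 9)
    pos = linear_relu(pos, W_pos, b_pos)                     # (N, K, D)
    H_K = H[neighbours]                                      # (N, K, D)
    fused = (H_K - H[:, None, :]) + pos                      # feature difference + position
    fused = linear_relu(fused, W_out, b_out)                 # (N, K, D_out)
    return fused.max(axis=1)                                 # max pooling over K


rng = np.random.default_rng(0)
N, K, D, D_out = 16, 4, 8, 12
P = rng.normal(size=(N, 3))
H = rng.normal(size=(N, D))
neighbours = rng.integers(0, N, size=(N, K))   # placeholder neighbour indices
W_pos, b_pos = rng.normal(size=(9, D)), np.zeros(D)
W_out, b_out = rng.normal(size=(D, D_out)), np.zeros(D_out)
x_out = fuse_positions(H, P, neighbours, W_pos, b_pos, W_out, b_out)
```
]]></code>
<p>Max pooling over the neighbor axis makes the output independent of the order in which the neighbors are listed.</p>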
<fig id="fig-3"><label>Figure 3</label><caption><title>Tree graph aggregation block</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-3.png"/></fig>
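<p>The position encoding used in the aggregation block can be sketched as follows, under stated assumptions. The helper <monospace>position_features</monospace>, the toy neighbor table, the output width of 32, and the random weights are all hypothetical; only the concatenation of the absolute positions, the neighbor positions, and their difference, followed by a single-layer MLP, follows the description in the text.</p>
<code language="python">
```python
import numpy as np


def position_features(P, neighbor_idx):
    """Concatenate P, P^K and P - P^K for every (point, neighbor) pair.

    P:            (N, 3) absolute positions.
    neighbor_idx: (N, K) integer indices of each point's K neighbors.
    Returns an (N, K, 9) array.
    """
    P_K = P[neighbor_idx]                              # (N, K, 3) neighbor positions
    P_rep = np.broadcast_to(P[:, None, :], P_K.shape)  # (N, K, 3) center repeated
    return np.concatenate([P_rep, P_K, P_rep - P_K], axis=-1)


rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(5, 3))
# Hypothetical neighbor table: each point's 2 "neighbors" are the next two indices.
neighbor_idx = (np.arange(5)[:, None] + np.array([1, 2])) % 5
feats = position_features(P, neighbor_idx)

# Single-layer MLP = one linear map plus ReLU; the weights are random placeholders.
W = rng.normal(size=(9, 32))
encoded = np.maximum(feats @ W, 0.0)
print(feats.shape, encoded.shape)  # (5, 2, 9) (5, 2, 32)
```
</code>
<p>For 3D points the concatenated input has nine channels per neighbor, which the MLP lifts to the feature dimension used by the rest of the block.</p>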
</sec>
<sec id="s3_3"><label>3.3</label><title>Tree Graph Network (TGNet)</title>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows the architecture of TGNet (Tree Graph Network), which stacks several TGSG blocks and TGA blocks. The network starts with a local feature extraction (LFE) block, which takes the absolute positions of the key points and the relative and absolute positions of their neighboring points as input. The LFE block contains an MLP layer and a Max Pooling layer that extract initial point cloud features. In all TGSG blocks, the length of the curves and the number of curves are both set to 5. In all TGA blocks, the number of attention heads is 3, and the ratio in the FFN is 2 instead of 4 to reduce computation. In this paper, TGNet is used for point cloud classification, segmentation, and surface normal estimation, all of which can be trained in an end-to-end manner.</p>
<p>For classification, the point cloud is passed into the LFE block to extract initial local features. These features are abstracted layer by layer through 8 TGSG and TGA modules, and the global features are obtained by Max Pooling. Finally, two MLP layers produce the class scores. The category with the highest score is TGNet&#x2019;s prediction.</p>
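<p>The final stage of the classification branch (global Max Pooling, a two-layer MLP, and selection of the highest-scoring class) can be sketched as below. The function <monospace>classify</monospace>, the channel and hidden sizes, and the random weights are placeholders chosen for illustration; the per-point features stand in for the output of the TGSG and TGA modules.</p>
<code language="python">
```python
import numpy as np


def classify(point_features, W1, b1, W2, b2):
    """Max-pool per-point features, apply a two-layer MLP, return (label, scores)."""
    global_feature = point_features.max(axis=0)         # (C,) global max pooling
    hidden = np.maximum(global_feature @ W1 + b1, 0.0)  # first layer + ReLU
    scores = hidden @ W2 + b2                           # (num_classes,) class scores
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(0)
num_points, channels, hidden_dim, num_classes = 1024, 64, 32, 40
point_features = rng.normal(size=(num_points, channels))
W1, b1 = rng.normal(size=(channels, hidden_dim)), np.zeros(hidden_dim)
W2, b2 = rng.normal(size=(hidden_dim, num_classes)), np.zeros(num_classes)

label, scores = classify(point_features, W1, b1, W2, b2)
print(label, scores.shape)
```
</code>
<p>Because Max Pooling is taken over the point axis, the predicted label does not depend on the order of the input points.</p>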
<p>The point cloud segmentation task is similar to the normal estimation task, so we use almost the same architecture for both. In both cases, we use an attention U-Net style network to learn multi-level representations. For segmentation, the network outputs per-point prediction scores for the semantic labels. For normal estimation, it outputs a per-point normal prediction.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title><bold>Top: </bold>TGNet applied to point cloud classification task. <bold>Bottom: </bold>TGNet applied to point cloud segmentation and normal estimation task</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-4.png"/></fig>
</sec>
</sec>
<sec id="s4"><label>4</label><title>Experiments</title>
<p>We evaluate our network on multiple point cloud processing tasks, including point cloud classification, segmentation, and normal estimation. To further understand TGNet, we also perform ablation experiments and visualizations.</p>
<sec id="s4_1"><label>4.1</label><title>Classification</title>
<p>We evaluate TGNet on ModelNet40 [<xref ref-type="bibr" rid="ref-2">2</xref>] for classification, which contains 12311 CAD models of 3D objects belonging to 40 categories. The dataset consists of two parts: the training set contains 9843 objects, and the test set contains 2468 objects. We uniformly sample 1024 points from the surface of each CAD model. For processing purposes, all 3D point clouds are normalized to a unit sphere. During training, we augment the data by scaling in the range [0.67, 1.5] and translating in the range [&#x2212;0.2, 0.2]. We train our network for 200 epochs using SGD with an initial learning rate of 0.001, which is reduced to 0.0001 by cosine annealing. The batch sizes for training and testing are set to 48 and 24, respectively.</p>
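<p>The preprocessing, augmentation, and learning-rate schedule described above can be sketched as follows. This is an illustrative sketch, not the authors&#x2019; training code: the function names are hypothetical, drawing an independent scale and shift for each axis is an assumption (the text only gives the ranges), and the schedule uses the standard cosine annealing formula with the stated start and end values.</p>
<code language="python">
```python
import math
import numpy as np


def normalize_to_unit_sphere(points):
    """Center the cloud at the origin and scale so the farthest point has norm 1."""
    centered = points - points.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1).max()


def augment(points, rng):
    """Random scaling in [0.67, 1.5] and translation in [-0.2, 0.2] (per axis, assumed)."""
    scale = rng.uniform(0.67, 1.5, size=3)
    shift = rng.uniform(-0.2, 0.2, size=3)
    return points * scale + shift


def cosine_lr(epoch, total_epochs=200, lr_max=1e-3, lr_min=1e-4):
    """Cosine annealing from lr_max at epoch 0 to lr_min at total_epochs."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


rng = np.random.default_rng(0)
cloud = normalize_to_unit_sphere(rng.normal(size=(1024, 3)))
augmented = augment(cloud, rng)
print(cosine_lr(0), cosine_lr(100), cosine_lr(200))
```
</code>
<p>With these settings the learning rate starts at 0.001, passes through the midpoint of the two values halfway through training, and ends at 0.0001 after 200 epochs.</p>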
<p><xref ref-type="table" rid="table-1">Table 1</xref> compares TGNet with current state-of-the-art methods. Unlike several of the other methods, ours uses only 1024 sampled points and does not require surface normals as additional input. Without the voting strategy [<xref ref-type="bibr" rid="ref-17">17</xref>], our method achieves an accuracy of 93.8&#x0025;, which is higher than that of every other method listed in the table. With the voting strategy, the accuracy rises to 94.0&#x0025;. These improvements indicate the robustness of TGNet to various geometric shapes.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Classification results on ModelNet40</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Input</th>
<th align="left">Points</th>
<th align="left">Acc</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Pointwise-CNN [<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">86.1</td>
</tr>
<tr>
<td align="left">PointNet [<xref ref-type="bibr" rid="ref-8">8</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">89.2</td>
</tr>
<tr>
<td align="left">MO-Net [<xref ref-type="bibr" rid="ref-40">40</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">89.3</td>
</tr>
<tr>
<td align="left">KD-Net (depth&#x2009;&#x003D;&#x2009;10) [<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">90.6</td>
</tr>
<tr>
<td align="left">PointNet&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">90.7</td>
</tr>
<tr>
<td align="left">SO-Net [<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">xyz, nr</td>
<td align="left">2048</td>
<td align="left">90.9</td>
</tr>
<tr>
<td align="left">PAT [<xref ref-type="bibr" rid="ref-41">41</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">91.7</td>
</tr>
<tr>
<td align="left">PointCNN [<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">92.2</td>
</tr>
<tr>
<td align="left">DGCNN [<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">92.2</td>
</tr>
<tr>
<td align="left">PointWeb [<xref ref-type="bibr" rid="ref-42">42</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">92.3</td>
</tr>
<tr>
<td align="left">PCNN [<xref ref-type="bibr" rid="ref-15">15</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">92.3</td>
</tr>
<tr>
<td align="left">SpiderCNN [<xref ref-type="bibr" rid="ref-43">43</xref>]</td>
<td align="left">xyz, nr</td>
<td align="left">5120</td>
<td align="left">92.4</td>
</tr>
<tr>
<td align="left">PointConv [<xref ref-type="bibr" rid="ref-16">16</xref>]</td>
<td align="left">xyz, nr</td>
<td align="left">1024</td>
<td align="left">92.5</td>
</tr>
<tr>
<td align="left">KPConv [<xref ref-type="bibr" rid="ref-13">13</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">92.7</td>
</tr>
<tr>
<td align="left">PointASNL [<xref ref-type="bibr" rid="ref-6">6</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">92.9</td>
</tr>
<tr>
<td align="left">RS-CNN [<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">92.9</td>
</tr>
<tr>
<td align="left">PCT [<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">93.2</td>
</tr>
<tr>
<td align="left">DensePoint [<xref ref-type="bibr" rid="ref-11">11</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">93.2</td>
</tr>
<tr>
<td align="left">GeoCNN [<xref ref-type="bibr" rid="ref-32">32</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">93.4</td>
</tr>
<tr>
<td align="left">RS-CNN [<xref ref-type="bibr" rid="ref-17">17</xref>]<sup>&#x002A;</sup></td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">93.6</td>
</tr>
<tr>
<td align="left">PointTransformer [<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">93.7</td>
</tr>
<tr>
<td align="left"><bold>TGNet (ours)</bold></td>
<td align="left"><bold>xyz</bold></td>
<td align="left"><bold>1024</bold></td>
<td align="left"><bold>93.8</bold></td>
</tr>
<tr>
<td align="left"><bold>TGNet (ours)<sup>&#x002A;</sup></bold></td>
<td align="left"><bold>xyz</bold></td>
<td align="left"><bold>1024</bold></td>
<td align="left"><bold>94.0</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_2"><label>4.2</label><title>Segmentation</title>
<p>We evaluate the ability of our network to perform fine-grained shape analysis on the ShapeNetPart [<xref ref-type="bibr" rid="ref-44">44</xref>] benchmark. The ShapeNetPart dataset contains 16881 shape models in 16 categories, labeled with 50 segmentation parts. We use 12137 models for training and the rest for validation and testing. We uniformly select 2048 points from each model as input to our network. We train our network for 200 epochs with a learning rate of 0.05 and a batch size of 32. <xref ref-type="table" rid="table-2">Table 2</xref> compares TGNet with current advanced methods; TGNet achieves the best performance, with an overall mIoU of 86.5&#x0025;. Segmentation is a more difficult task than shape classification, yet our method achieves high scores even without fine-tuning its parameters, which confirms the effectiveness of our Tree Graph feature strategy. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows our segmentation results. The segmentation predictions made by TGNet are very close to the ground truth.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Segmentation results on ShapeNetPart</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Input</th>
<th align="left">Points</th>
<th align="left">mIoU</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">PointNet [<xref ref-type="bibr" rid="ref-8">8</xref>]</td>
<td align="left">xyz</td>
<td align="left">2048</td>
<td align="left">83.7</td>
</tr>
<tr>
<td align="left">SO-Net [<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">xyz, nr</td>
<td align="left">1024</td>
<td align="left">84.6</td>
</tr>
<tr>
<td align="left">DGCNN [<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td align="left">xyz</td>
<td align="left">2048</td>
<td align="left">85.1</td>
</tr>
<tr>
<td align="left">PointNet&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td align="left">xyz, nr</td>
<td align="left">2048</td>
<td align="left">85.1</td>
</tr>
<tr>
<td align="left">PointCNN [<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">xyz</td>
<td align="left">2048</td>
<td align="left">86.1</td>
</tr>
<tr>
<td align="left">PointASNL [<xref ref-type="bibr" rid="ref-6">6</xref>]</td>
<td align="left">xyz</td>
<td align="left">2048</td>
<td align="left">86.1</td>
</tr>
<tr>
<td align="left">RS-CNN [<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td align="left">xyz</td>
<td align="left">2048</td>
<td align="left">86.2</td>
</tr>
<tr>
<td align="left">PCT [<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td align="left">xyz</td>
<td align="left">2048</td>
<td align="left">86.4</td>
</tr>
<tr>
<td align="left"><bold>TGNet (ours)</bold></td>
<td align="left"><bold>xyz</bold></td>
<td align="left"><bold>2048</bold></td>
<td align="left"><bold>86.5</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-5"><label>Figure 5</label><caption><title>Segmentation results on ShapeNetPart benchmark. <bold>Top:</bold> ground truth; <bold>Bottom:</bold> ours</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-5.png"/></fig>
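<p>For readers unfamiliar with the metric, the sketch below shows one common way to compute an overall mIoU for part segmentation: the IoU is averaged over the parts of each shape and then over all shapes. The function names are hypothetical, and counting a part that is absent from both the prediction and the ground truth as IoU 1 is a widely used convention assumed here; the text does not state which convention was used for <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<code language="python">
```python
import numpy as np


def shape_iou(pred, target, part_ids):
    """Mean IoU over the parts of one shape's category.

    A part absent from both prediction and ground truth counts as IoU 1
    (a common convention, assumed here).
    """
    ious = []
    for part in part_ids:
        pred_mask = pred == part
        target_mask = target == part
        union = np.logical_or(pred_mask, target_mask).sum()
        if union == 0:
            ious.append(1.0)
        else:
            inter = np.logical_and(pred_mask, target_mask).sum()
            ious.append(inter / union)
    return float(np.mean(ious))


def overall_miou(preds, targets, part_ids_per_shape):
    """Average the per-shape IoU over all shapes."""
    per_shape = [shape_iou(p, t, ids)
                 for p, t, ids in zip(preds, targets, part_ids_per_shape)]
    return float(np.mean(per_shape))


# Toy shape with parts {0, 1, 2}; part 2 never occurs.
target = np.array([0, 0, 0, 1, 1, 1])
pred = np.array([0, 0, 1, 1, 1, 1])
print(shape_iou(pred, target, [0, 1, 2]))
```
</code>
<p>In the toy example one of six points is mislabeled, giving part IoUs of 2/3 and 3/4, with the absent third part counted as 1.</p>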
</sec>
<sec id="s4_3"><label>4.3</label><title>Normal Estimation</title>
<p>Normal estimation is essential to many 3D point cloud processing tasks, such as 3D surface reconstruction and rendering. It is a very challenging task that requires a comprehensive understanding of object geometry. We evaluate normal estimation on the ModelNet40 dataset as a supervised regression task. We train for 200 epochs using a structure similar to that used for point cloud segmentation, where the input is 1024 uniformly sampled points. <xref ref-type="table" rid="table-3">Table 3</xref> reports the average cosine error of TGNet and current state-of-the-art methods. Our network achieves the lowest average error, 0.12, which indicates that TGNet can understand 3D model shapes very well. <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the normal estimation results of our method. The surface normals predicted by TGNet are very close to the ground truth. Even complex 3D models, such as airplanes, can be estimated accurately.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Normal estimation on ModelNet40</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Input</th>
<th align="left">Points</th>
<th align="left">Error</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">PointNet [<xref ref-type="bibr" rid="ref-8">8</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">0.47</td>
</tr>
<tr>
<td align="left">PointNet&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">0.29</td>
</tr>
<tr>
<td align="left">RS-CNN [<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">0.15</td>
</tr>
<tr>
<td align="left">PCT [<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td align="left">xyz</td>
<td align="left">1024</td>
<td align="left">0.13</td>
</tr>
<tr>
<td align="left"><bold>TGNet (ours)</bold></td>
<td align="left"><bold>xyz</bold></td>
<td align="left"><bold>1024</bold></td>
<td align="left"><bold>0.12</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-6"><label>Figure 6</label><caption><title>Normal estimation results on ModelNet40</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-6.png"/></fig>
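<p>The average cosine error reported in <xref ref-type="table" rid="table-3">Table 3</xref> can be illustrated with the sketch below. It assumes the error of one point is one minus the cosine of the angle between the predicted and the ground-truth normal; the text does not spell out the exact definition, so this formula and the function name <monospace>mean_cosine_error</monospace> are assumptions.</p>
<code language="python">
```python
import numpy as np


def mean_cosine_error(pred, target, eps=1e-12):
    """Mean of 1 - cos(angle) between predicted and ground-truth normals.

    pred, target: (N, 3) arrays; they are normalized to unit length here.
    """
    pred_unit = pred / np.maximum(np.linalg.norm(pred, axis=1, keepdims=True), eps)
    target_unit = target / np.maximum(np.linalg.norm(target, axis=1, keepdims=True), eps)
    cosine = np.sum(pred_unit * target_unit, axis=1)
    return float(np.mean(1.0 - cosine))


target = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
pred = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 0.0]])  # first aligned, second orthogonal
print(mean_cosine_error(pred, target))  # 0.5
```
</code>
<p>Under this definition a perfectly aligned normal contributes 0 and an orthogonal one contributes 1, so lower values are better.</p>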
</sec>
<sec id="s4_4"><label>4.4</label><title>Ablation Studies</title>
<p>We performed numerous experiments on the ModelNet40 dataset to evaluate the network thoroughly. <xref ref-type="table" rid="table-4">Table 4</xref> shows the ablation results. We first introduce the baseline used for comparison. To replace TGSG, we use KNN for sampling and grouping, with shared MLPs to ensure that the output features have the same dimensions. The TGA module is replaced by the PNL (point nonlocal cell) of PointASNL. The accuracy of the baseline is only 92.8&#x0025;. We then investigate the impact of each TGNet component by substituting it into the baseline architecture.</p>
<table-wrap id="table-4"><label>Table 4</label><caption><title>Ablation studies of TGNet</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Model</th>
<th align="left">TGSG</th>
<th align="left">TGA</th>
<th align="left">Acc (&#x0025;)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">A</td>
<td align="center"/>
<td align="center"/>
<td align="left">92.8</td>
</tr>
<tr>
<td align="left">B</td>
<td align="left">&#x2713;</td>
<td align="center"/>
<td align="left">93.2</td>
</tr>
<tr>
<td align="left">C</td>
<td align="center"/>
<td align="left">&#x2713;</td>
<td align="left">93.4</td>
</tr>
<tr>
<td align="left"><bold>TGNet</bold></td>
<td align="left">&#x2713;</td>
<td align="left">&#x2713;</td>
<td align="left"><bold>93.8</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Model B, which uses TGSG, shows a 0.4&#x0025; improvement over the baseline. In contrast with the baseline, the Tree Graph is used to sample and group geometric information. This illustrates the effectiveness of our Tree Graph in capturing geometric information. Model C, which uses TGA, shows a 0.6&#x0025; improvement over the baseline. This illustrates the effectiveness of our TGA in aggregating local and non-local information. With both TGSG blocks and TGA blocks, TGNet achieves an accuracy of 93.8&#x0025;. The ablation experiments show that explicitly introducing more geometric information into the local features can effectively improve point cloud processing.</p>
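<p>The gains quoted above follow directly from <xref ref-type="table" rid="table-4">Table 4</xref>; the short script below reproduces the arithmetic. The dictionary keys are labels chosen for this example.</p>
<code language="python">
```python
# Accuracies (%) copied from Table 4.
accuracy = {"A (baseline)": 92.8, "B (TGSG)": 93.2, "C (TGA)": 93.4, "TGNet": 93.8}

baseline = accuracy["A (baseline)"]
gains = {name: round(acc - baseline, 1) for name, acc in accuracy.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}")

# Sum of the two single-component gains versus the gain of the full model.
print(round(gains["B (TGSG)"] + gains["C (TGA)"], 1), gains["TGNet"])
```
</code>
<p>In this table the two single-component gains (0.4 and 0.6) add up to the 1.0 point gain of the full model.</p>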
</sec>
<sec id="s4_5"><label>4.5</label><title>More Experiments on TGNet</title>
<p>As mentioned before, adjusting the values of <italic>m</italic> and <italic>l</italic> enables TGNet to trade off local information against long-range dependency information. In this subsection, we experiment with different numbers of curves and nodes on the ModelNet40 dataset. As shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, we perform five experiments; the product of <italic>l</italic> and <italic>m</italic> is 24 in every experiment except the third, where it is 25. The best accuracy, 93.8&#x0025;, is obtained when <italic>m</italic> equals 5 and <italic>l</italic> equals 5.</p>
<fig id="fig-7"><label>Figure 7</label><caption><title>Test results for different values of <italic>m</italic> and <italic>l</italic>. The case with <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula> curves and <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>8</mml:mn></mml:math></inline-formula> nodes is denoted m3l8. The other cases (m4l6, m5l5, m6l4, m8l3) are denoted analogously</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-7.png"/></fig>
<p>The experiments show that, when the number of learnable parameters is comparable, simply increasing the number of curves or the number of nodes does not lead to better results. The best results are achieved only with a reasonable trade-off between local and long-range information.</p>
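<p>The five configurations compared in <xref ref-type="fig" rid="fig-7">Fig. 7</xref> can be enumerated as below to check that their sizes are nearly equal. The variable names are chosen for this example; the (<italic>m</italic>, <italic>l</italic>) pairs are read from the figure caption.</p>
<code language="python">
```python
# (m, l) pairs from Fig. 7: m curves per Tree Graph, l nodes per curve.
configs = [(3, 8), (4, 6), (5, 5), (6, 4), (8, 3)]

for m, l in configs:
    print(f"m{m}l{l}: {m * l} nodes per Tree Graph")

products = [m * l for m, l in configs]
```
</code>
<p>Every configuration has 24 nodes per Tree Graph except m5l5, which has 25, so the comparison isolates the balance between the number of curves and their length rather than the total number of nodes.</p>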
</sec>
<sec id="s4_6"><label>4.6</label><title>Visualization for Tree Graph</title>
<p>In this subsection, we visualize shallow Tree Graphs to understand them further. Because a deep Tree Graph carries more high-level semantic information, a single local point feature may represent the geometric information of the entire point cloud. A deep Tree Graph therefore cannot be mapped back to geometric space, so we do not discuss deep Tree Graphs in this subsection.</p>
<p>Our Tree Graph consists of several curves extending in different directions in feature space. However, unlike in feature space, the curves do not extend in a single direction in geometric space. In <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, we can clearly see that the nodes of the curves (also known as query points) are mainly concentrated at the corners and edges of the point cloud. These points provide robust geometric information for computing the features of the center points (also known as key points). Our Tree Graphs aggregate these robust regions with distinct geometric structures as input to the next layer of the network. This is where our method differs from others and why it is more effective.</p>
<fig id="fig-8"><label>Figure 8</label><caption><title>Visualization of Tree Graphs. Black balls are the centers of the Tree Graphs. Blue-green balls are the points of the point cloud. The other balls are the nodes of the curves</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMES_24470-fig-8.png"/></fig>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Conclusion</title>
<p>In this paper, we propose TGNet, a novel method that obtains Tree Graphs with local and non-global long-range dependencies through explicit sampling and grouping rules. The features are then aggregated by a cross-attention mechanism. In this way, the geometric spatial distribution of the point cloud can be explicitly reasoned about, and the geometric shape information can be incorporated into the local features. Owing to these advantages, our approach achieves state-of-the-art results on several point cloud object analysis tasks.</p>
</sec>
</body>
<back>
<ack>
<p>Portions of this work were presented at the 8th International Conference on Virtual Reality in 2022, TGNet: Aggregating Geometric Features for 3D Point Cloud Processing.</p>
</ack>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> This research was supported by the National Natural Science Foundation of China (Grant Nos. 91948203, 52075532).</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>1.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chen</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Wan</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Xia</surname>, <given-names>T.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Multi-view 3D object detection network for autonomous driving</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>1907</fpage>&#x2013;<lpage>1915</lpage>. <conf-loc>Honolulu, HI</conf-loc>.</mixed-citation></ref>
<ref id="ref-2"><label>2.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wu</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Song</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Khosla</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>L.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2015</year>). <article-title>3D shapenets: A deep representation for volumetric shapes</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>1912</fpage>&#x2013;<lpage>1920</lpage>. <conf-loc>Boston, MA</conf-loc>.</mixed-citation></ref>
<ref id="ref-3"><label>3.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Su</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Maji</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Kalogerakis</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Learned-Miller</surname>, <given-names>E.</given-names></string-name></person-group> (<year>2015</year>). <article-title>Multi-view convolutional neural networks for 3D shape recognition</article-title>. <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>, pp. <fpage>945</fpage>&#x2013;<lpage>953</lpage>. <conf-loc>Santiago, Chile</conf-loc>.</mixed-citation></ref>
<ref id="ref-4"><label>4.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname>, <given-names>P. S.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Guo</surname>, <given-names>Y. X.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>C. Y.</given-names></string-name>, <string-name><surname>Tong</surname>, <given-names>X.</given-names></string-name></person-group> (<year>2017</year>). <article-title>O-CNN: Octree-based convolutional neural networks for 3D shape analysis</article-title>. <source>ACM Transactions on Graphics</source><italic>,</italic> <volume>36(4)</volume><italic>,</italic> <fpage>1</fpage>&#x2013;<lpage>11</lpage>.</mixed-citation></ref>
<ref id="ref-5"><label>5.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Riegler</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Osman Ulusoy</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Geiger</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2017</year>). <article-title>OctNet: Learning deep 3D representations at high resolutions</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>3577</fpage>&#x2013;<lpage>3586</lpage>. <conf-loc>Honolulu, HI, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-6"><label>6.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yan</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Zheng</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Cui</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2020</year>). <article-title>PointASNL: Robust point clouds processing using nonlocal neural networks with adaptive sampling</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>5589</fpage>&#x2013;<lpage>5598</lpage>. <conf-loc>Seattle, WA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-7"><label>7.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Le</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Duan</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2018</year>). <article-title>PointGrid: A deep network for 3D shape understanding</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>9204</fpage>&#x2013;<lpage>9214</lpage>. <conf-loc>Salt Lake City, UT, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-8"><label>8.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Qi</surname>, <given-names>C. R.</given-names></string-name>, <string-name><surname>Su</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Mo</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Guibas</surname>, <given-names>L. J.</given-names></string-name></person-group> (<year>2017</year>). <article-title>PointNet: Deep learning on point sets for 3D classification and segmentation</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>652</fpage>&#x2013;<lpage>660</lpage>. <conf-loc>Honolulu, HI</conf-loc>.</mixed-citation></ref>
<ref id="ref-9"><label>9.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Qi</surname>, <given-names>C. R.</given-names></string-name>, <string-name><surname>Yi</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Su</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Guibas</surname>, <given-names>L. J.</given-names></string-name></person-group> (<year>2017</year>). <article-title>PointNet&#x002B;&#x002B;: Deep hierarchical feature learning on point sets in a metric space</article-title>. <source>Advances in Neural Information Processing Systems</source><italic>,</italic> <volume>30</volume><italic>,</italic> <fpage>5105</fpage>&#x2013;<lpage>5114</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>10.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Maturana</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Scherer</surname>, <given-names>S.</given-names></string-name></person-group> (<year>2015</year>). <article-title>VoxNet: A 3D convolutional neural network for real-time object recognition</article-title>. <conf-name>2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</conf-name>, pp. <fpage>922</fpage>&#x2013;<lpage>928</lpage>. <conf-loc>Hamburg, Germany</conf-loc>, <publisher-name>IEEE</publisher-name>.</mixed-citation></ref>
<ref id="ref-11"><label>11.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Fan</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Meng</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Lu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Xiang</surname>, <given-names>S.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2019</year>). <article-title>DensePoint: Learning densely contextual representation for efficient point cloud processing</article-title>. <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision</conf-name>, pp. <fpage>5239</fpage>&#x2013;<lpage>5248</lpage>. <conf-loc>Seoul, Korea (South)</conf-loc>.</mixed-citation></ref>
<ref id="ref-12"><label>12.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Sarma</surname>, <given-names>S. E.</given-names></string-name>, <string-name><surname>Bronstein</surname>, <given-names>M. M.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2019</year>). <article-title>Dynamic graph CNN for learning on point clouds</article-title>. <source>ACM Transactions on Graphics</source><italic>,</italic> <volume>38(5)</volume><italic>,</italic> <fpage>1</fpage>&#x2013;<lpage>12</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>13.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Thomas</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Qi</surname>, <given-names>C. R.</given-names></string-name>, <string-name><surname>Deschaud</surname>, <given-names>J. E.</given-names></string-name>, <string-name><surname>Marcotegui</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Goulette</surname>, <given-names>F.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2019</year>). <article-title>KPConv: Flexible and deformable convolution for point clouds</article-title>. <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision</conf-name>, pp. <fpage>6411</fpage>&#x2013;<lpage>6420</lpage>. <conf-loc>Seoul, Korea (South)</conf-loc>.</mixed-citation></ref>
<ref id="ref-14"><label>14.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Guo</surname>, <given-names>M. H.</given-names></string-name>, <string-name><surname>Cai</surname>, <given-names>J. X.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>Z. N.</given-names></string-name>, <string-name><surname>Mu</surname>, <given-names>T. J.</given-names></string-name>, <string-name><surname>Martin</surname>, <given-names>R. R.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2021</year>). <article-title>PCT: Point cloud transformer</article-title>. <source>Computational Visual Media</source><italic>,</italic> <volume>7(2)</volume><italic>,</italic> <fpage>187</fpage>&#x2013;<lpage>199</lpage>.</mixed-citation></ref>
<ref id="ref-15"><label>15.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Atzmon</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Maron</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Lipman</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Point convolutional neural networks by extension operators</article-title>. arXiv preprint arXiv:1803.10091.</mixed-citation></ref>
<ref id="ref-16"><label>16.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wu</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Qi</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Fuxin</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2019</year>). <article-title>PointConv: Deep convolutional networks on 3D point clouds</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>9621</fpage>&#x2013;<lpage>9630</lpage>. <conf-loc>Long Beach, CA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-17"><label>17.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Fan</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Xiang</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Pan</surname>, <given-names>C.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Relation-shape convolutional neural network for point cloud analysis</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>8895</fpage>&#x2013;<lpage>8904</lpage>. <conf-loc>Long Beach, CA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-18"><label>18.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>B. M.</given-names></string-name>, <string-name><surname>Lee</surname>, <given-names>G. H.</given-names></string-name></person-group> (<year>2018</year>). <article-title>SO-Net: Self-organizing network for point cloud analysis</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>9397</fpage>&#x2013;<lpage>9406</lpage>. <conf-loc>Salt Lake City, UT, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-19"><label>19.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhao</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Jia</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Torr</surname>, <given-names>P. H.</given-names></string-name>, <string-name><surname>Koltun</surname>, <given-names>V.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Point transformer</article-title>. <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision</conf-name>, pp. <fpage>16259</fpage>&#x2013;<lpage>16268</lpage>.</mixed-citation></ref>
<ref id="ref-20"><label>20.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Xia</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zou</surname>, <given-names>H.</given-names></string-name></person-group> (<year>2022</year>). <article-title>TGNet: Aggregating geometric features for 3D point cloud processing</article-title>. <conf-name>2022 8th International Conference on Virtual Reality (ICVR)</conf-name>, pp. <fpage>55</fpage>&#x2013;<lpage>61</lpage>. <conf-loc>Nanjing, China</conf-loc>, <publisher-name>IEEE</publisher-name>.</mixed-citation></ref>
<ref id="ref-21"><label>21.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ma</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Guo</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>An</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Learning multi-view representation with LSTM for 3-D shape recognition and retrieval</article-title>. <source>IEEE Transactions on Multimedia</source><italic>,</italic> <volume>21(5)</volume><italic>,</italic> <fpage>1169</fpage>&#x2013;<lpage>1182</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>22.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Feng</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Zhao</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Ji</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2018</year>). <article-title>GVCNN: Group-view convolutional neural networks for 3D shape recognition</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>264</fpage>&#x2013;<lpage>272</lpage>. <conf-loc>Salt Lake City, UT, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-23"><label>23.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Qi</surname>, <given-names>C. R.</given-names></string-name>, <string-name><surname>Su</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Nie&#x00DF;ner</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Dai</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Yan</surname>, <given-names>M.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2016</year>). <article-title>Volumetric and multi-view CNNs for object classification on 3D data</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>5648</fpage>&#x2013;<lpage>5656</lpage>. <conf-loc>Las Vegas, NV, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-24"><label>24.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yu</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Meng</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Yuan</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Multi-view harmonized bilinear network for 3D object recognition</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>186</fpage>&#x2013;<lpage>194</lpage>. <conf-loc>Salt Lake City, UT, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-25"><label>25.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Biasotti</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Lavou&#x00E9;</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Falcidieno</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Pratikakis</surname>, <given-names>I.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Generalizing discrete convolutions for unstructured point clouds</article-title>. DOI <pub-id pub-id-type="doi">10.2312/3dor.20191064</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>26.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Esteves</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Allen-Blanchette</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Makadia</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Daniilidis</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Learning SO(3) equivariant representations with spherical CNNs</article-title>. <conf-name>Proceedings of the European Conference on Computer Vision (ECCV)</conf-name>, pp. <fpage>52</fpage>&#x2013;<lpage>68</lpage>. <conf-loc>Munich, Germany</conf-loc>.</mixed-citation></ref>
<ref id="ref-27"><label>27.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lei</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Akhtar</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Mian</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Octree guided CNN with spherical kernels for 3D point clouds</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>9631</fpage>&#x2013;<lpage>9640</lpage>. <conf-loc>Long Beach, CA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-28"><label>28.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Bu</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Di</surname>, <given-names>X.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2018</year>). <article-title>PointCNN: Convolution on X-transformed points</article-title>. <conf-name>32nd Conference on Neural Information Processing Systems (NIPS)</conf-name>, pp. <fpage>820</fpage>&#x2013;<lpage>830</lpage>. <conf-loc>Montreal, Canada</conf-loc>.</mixed-citation></ref>
<ref id="ref-29"><label>29.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Hua</surname>, <given-names>B. S.</given-names></string-name>, <string-name><surname>Tran</surname>, <given-names>M. K.</given-names></string-name>, <string-name><surname>Yeung</surname>, <given-names>S. K.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Pointwise convolutional neural networks</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>984</fpage>&#x2013;<lpage>993</lpage>. <conf-loc>Salt Lake City, UT, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-30"><label>30.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Hua</surname>, <given-names>B. S.</given-names></string-name>, <string-name><surname>Rosen</surname>, <given-names>D. W.</given-names></string-name>, <string-name><surname>Yeung</surname>, <given-names>S. K.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Rotation invariant convolutions for 3D point clouds deep learning</article-title>. <conf-name>2019 International Conference on 3D Vision (3DV)</conf-name>, pp. <fpage>204</fpage>&#x2013;<lpage>213</lpage>. <conf-loc>Quebec City, QC, Canada</conf-loc>, <publisher-name>IEEE</publisher-name>.</mixed-citation></ref>
<ref id="ref-31"><label>31.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Duan</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Zheng</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Lu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Zhou</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Tian</surname>, <given-names>Q.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Structural relational reasoning of point clouds</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>949</fpage>&#x2013;<lpage>958</lpage>. <conf-loc>Long Beach, CA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-32"><label>32.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Bu</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Sun</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Wu</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Di</surname>, <given-names>X.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2018</year>). <article-title>PointCNN: Convolution on X-transformed points</article-title>. <conf-name>32nd Conference on Neural Information Processing Systems (NIPS)</conf-name>, pp. <fpage>820</fpage>&#x2013;<lpage>830</lpage>. <conf-loc>Montreal, Canada</conf-loc>.</mixed-citation></ref>
<ref id="ref-33"><label>33.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lan</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Davis</surname>, <given-names>L. S.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Modeling local geometric structure of 3D point clouds using Geo-CNN</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>998</fpage>&#x2013;<lpage>1008</lpage>. <conf-loc>Long Beach, CA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-34"><label>34.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Song</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yao</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Cai</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Shape-oriented convolution neural network for point cloud analysis</article-title>. <conf-name>Proceedings of the AAAI Conference on Artificial Intelligence</conf-name>, pp. <fpage>12773</fpage>&#x2013;<lpage>12780</lpage>. <conf-loc>New York, NY, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-35"><label>35.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Girshick</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Gupta</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>He</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2018</year>). <article-title>Non-local neural networks</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>7794</fpage>&#x2013;<lpage>7803</lpage>. <conf-loc>Salt Lake City, UT, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-36"><label>36.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Chen</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Dai</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Dong</surname>, <given-names>X.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2021</year>). <article-title>Mobile-Former: Bridging MobileNet and transformer</article-title>. arXiv preprint arXiv:2108.05895.</mixed-citation></ref>
<ref id="ref-37"><label>37.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Devlin</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Chang</surname>, <given-names>M. W.</given-names></string-name>, <string-name><surname>Lee</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Toutanova</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2018</year>). <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>. arXiv preprint arXiv:1810.04805.</mixed-citation></ref>
<ref id="ref-38"><label>38.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Lin</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Cao</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Wei</surname>, <given-names>Y.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2021</year>). <article-title>Swin transformer: Hierarchical vision transformer using shifted windows</article-title>. <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision</conf-name>, pp. <fpage>10012</fpage>&#x2013;<lpage>10022</lpage>.</mixed-citation></ref>
<ref id="ref-39"><label>39.</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Vaswani</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Shazeer</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Parmar</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Uszkoreit</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Jones</surname>, <given-names>L.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2017</year>). <article-title>Attention is all you need</article-title>. <source>Advances in Neural Information Processing Systems</source><italic>,</italic> <volume>30</volume><italic>,</italic> <fpage>5998</fpage>&#x2013;<lpage>6008</lpage>.</mixed-citation></ref>
<ref id="ref-40"><label>40.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Xiang</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Song</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Cai</surname>, <given-names>W.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Walk in the cloud: Learning curves for point clouds shape analysis</article-title>. <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision</conf-name>, pp. <fpage>915</fpage>&#x2013;<lpage>924</lpage>.</mixed-citation></ref>
<ref id="ref-41"><label>41.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Joseph-Rivlin</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Zvirin</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Kimmel</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Momen(e)t: Flavor the moments in learning to classify shapes</article-title>. <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops</conf-name>, <conf-loc>Seoul, South Korea</conf-loc>.</mixed-citation></ref>
<ref id="ref-42"><label>42.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yang</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Zhang</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Ni</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Li</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Liu</surname>, <given-names>J.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2019</year>). <article-title>Modeling point clouds with self-attention and Gumbel subset sampling</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>3323</fpage>&#x2013;<lpage>3332</lpage>. <conf-loc>Long Beach, CA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-43"><label>43.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhao</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Fu</surname>, <given-names>C. W.</given-names></string-name>, <string-name><surname>Jia</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2019</year>). <article-title>PointWeb: Enhancing local neighborhood features for point cloud processing</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>5565</fpage>&#x2013;<lpage>5573</lpage>. <conf-loc>Long Beach, CA, USA</conf-loc>.</mixed-citation></ref>
<ref id="ref-44"><label>44.</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Xu</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Fan</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Xu</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Zeng</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Qiao</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2018</year>). <article-title>SpiderCNN: Deep learning on point sets with parameterized convolutional filters</article-title>. <conf-name>Proceedings of the European Conference on Computer Vision (ECCV)</conf-name>, pp. <fpage>87</fpage>&#x2013;<lpage>102</lpage>. <conf-loc>Munich, Germany</conf-loc>.</mixed-citation></ref>
<ref id="ref-45"><label>45.</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Chang</surname>, <given-names>A. X.</given-names></string-name>, <string-name><surname>Funkhouser</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Guibas</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Hanrahan</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Huang</surname>, <given-names>Q.</given-names></string-name> <etal>et al.</etal></person-group> (<year>2015</year>). <article-title>ShapeNet: An information-rich 3D model repository</article-title>. arXiv preprint arXiv:1512.03012.</mixed-citation></ref>
</ref-list>
</back>
</article>