<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">65047</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.065047</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Hierarchical Shape Pruning for 3D Sparse Convolution Networks</article-title>
<alt-title alt-title-type="left-running-head">Hierarchical Shape Pruning for 3D Sparse Convolution Networks</alt-title>
<alt-title alt-title-type="right-running-head">Hierarchical Shape Pruning for 3D Sparse Convolution Networks</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Long</surname><given-names>Haiyan</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Zhang</surname><given-names>Chonghao</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Qiu</surname><given-names>Xudong</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-4" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Chen</surname><given-names>Hai</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><email>e21101016@stu.ahu.edu.cn</email></contrib>
<contrib id="author-5" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Chen</surname><given-names>Gang</given-names></name><xref ref-type="aff" rid="aff-4">4</xref><email>chengang9704@stu.xmu.edu.cn</email></contrib>
<aff id="aff-1"><label>1</label><institution>School of Information Engineering, Liaodong University</institution>, <addr-line>Dandong, 118003</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>School of Computer Science and Technology, Anhui University</institution>, <addr-line>Hefei, 230601</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>Institute of Artificial Intelligence, Beihang University</institution>, <addr-line>Beijing, 100191</addr-line>, <country>China</country></aff>
<aff id="aff-4"><label>4</label><institution>School of Aerospace Engineering, Xiamen University</institution>, <addr-line>Xiamen, 361005</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Authors: Hai Chen. Email: <email>e21101016@stu.ahu.edu.cn</email>; Gang Chen. Email: <email>chengang9704@stu.xmu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>03</day><month>07</month><year>2025</year>
</pub-date>
<volume>84</volume>
<issue>2</issue>
<fpage>2975</fpage>
<lpage>2988</lpage>
<history>
<date date-type="received">
<day>02</day>
<month>3</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>5</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_65047.pdf"></self-uri>
<abstract>
<p>3D sparse convolution has emerged as a pivotal technique for efficient voxel-based perception in autonomous systems, enabling selective feature extraction from non-empty voxels while avoiding wasted computation on empty ones. Despite its theoretical efficiency, practical implementations face an under-explored limitation: the fixed geometric patterns of conventional sparse convolutional kernels inevitably process non-contributory positions during sliding-window operations, particularly in regions with uneven point cloud density. To address this, we propose Hierarchical Shape Pruning for 3D Sparse Convolution (HSP-S), which dynamically eliminates redundant kernel stripes through layer-adaptive thresholding. Unlike static soft pruning methods, HSP-S keeps the sparsity patterns trainable by progressively adjusting the pruning thresholds during optimization, enlarging the parameter search space while removing redundant operations. Extensive experiments on major autonomous driving benchmarks validate the effectiveness of HSP-S. On KITTI&#x2019;s 3D object detection task, our method removes 93.47% of redundant kernel computations while maintaining comparable accuracy (a 1.56% drop in mAP (mean Average Precision)). On the more complex NuScenes benchmark, HSP-S simultaneously reduces computation (21.94% sparsity) and improves accuracy (1.02% higher mAP and 0.47% higher NDS (nuScenes detection score)), demonstrating its scalability to diverse perception scenarios. This work establishes the first learnable shape pruning framework that simultaneously enhances computational efficiency and preserves detection accuracy in 3D perception systems.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Shape pruning</kwd>
<kwd>model compression</kwd>
<kwd>3D sparse convolution</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Voxels are encoded from a point cloud and form a regular 3D grid. Because point clouds are sparse, voxels are also sparsely distributed in 3D space, and a traditional 3D convolution spends substantial computation on empty voxels when extracting voxel features. 3D sparse convolution [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>] was therefore proposed: it avoids invalid computation and improves efficiency by computing only on non-empty voxels. 3D sparse convolution is commonly used in voxel-based 3D scene perception tasks such as 3D semantic segmentation and 3D object detection [<xref ref-type="bibr" rid="ref-3">3</xref>&#x2013;<xref ref-type="bibr" rid="ref-6">6</xref>].</p>
<p>3D sparse convolution uses the same convolution kernel as conventional convolution; the difference lies in the output strategy. According to this strategy, 3D sparse convolution is divided into two types: regular 3D sparse convolution and submanifold 3D sparse convolution. The former produces an output whenever the kernel covers any non-empty voxel as it slides, whereas the latter requires an odd kernel size and produces an output only when the kernel center lies on a non-empty voxel.</p>
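<p>The difference between the two output strategies can be illustrated with a minimal 2D sketch in Python. It assumes stride 1, an odd kernel size and an unbounded grid with no boundary clipping, and represents non-empty sites as a set of integer coordinates; the function names are hypothetical and the code is not taken from any sparse convolution library.</p>

```python
from itertools import product

Coord = tuple[int, int]


def kernel_offsets(k: int) -> list[Coord]:
    """All (dy, dx) offsets of an odd k x k kernel whose centre is (0, 0)."""
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    r = k // 2
    return list(product(range(-r, r + 1), repeat=2))


def regular_outputs(active: set[Coord], k: int) -> set[Coord]:
    """Regular sparse convolution (stride 1, unbounded grid): an output site
    exists wherever the sliding kernel covers at least one non-empty site."""
    out: set[Coord] = set()
    for (y, x) in active:
        for (dy, dx) in kernel_offsets(k):
            out.add((y - dy, x - dx))  # a kernel centred here covers (y, x)
    return out


def submanifold_outputs(active: set[Coord], k: int) -> set[Coord]:
    """Submanifold sparse convolution: outputs only where the kernel centre
    lies on a non-empty site, so the set of active sites is preserved."""
    if k % 2 == 0:
        raise ValueError("submanifold convolution needs an odd kernel size")
    return set(active)


active = {(0, 0), (0, 1)}
print(len(regular_outputs(active, 3)))         # 12
print(sorted(submanifold_outputs(active, 3)))  # [(0, 0), (0, 1)]
```

<p>Two adjacent non-empty sites yield 12 output sites under regular sparse convolution but only the 2 original sites under submanifold convolution, which is why submanifold layers keep the active set from dilating layer after layer.</p>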
<p>However, whether regular or submanifold 3D sparse convolution is used, the spatial region covered by the kernel still contains many empty voxels when an output is computed (see <xref ref-type="fig" rid="fig-1">Fig. 1</xref>). Consequently, after the kernel has slid over the whole voxel space, some convolution weights have been multiplied with non-empty voxels far less often than others.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Visualization of the effect of updating the pruning threshold in 3D sparse convolution. Solid blue squares make up the sparse voxel space, red hollow squares indicate pruned kernel stripes, and the remaining active stripes are marked with green balls and green voxels. As the threshold is updated, the set of active stripes changes</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_65047-fig-1.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, we take regular 2D sparse convolution as an example. With a 3 <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3 kernel, a 9 <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 8 input and a stride of 1, different stripes (i.e., different kernel coordinates) end up with markedly different compute counters after sliding. For example, the compute counter of the stripe at coordinates (&#x2212;1, &#x2212;1) is 4, while that of the stripe at coordinates (1, 1) is 15.</p>
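<p>The per-stripe compute counters can be reproduced on toy data with the hypothetical Python sketch below. It assumes an unpadded, stride-1 sweep of a 3 &#x00D7; 3 kernel and uses a small made-up 4 &#x00D7; 4 input rather than the 9 &#x00D7; 8 input of <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, so the counts it prints are illustrative only.</p>

```python
from itertools import product


def stripe_counters(active, height, width, k=3):
    """Count how often each kernel stripe (offset) meets a non-empty site.

    Assumes a stride-1, unpadded ('valid') sweep of a k x k kernel over a
    height x width grid; `active` is the set of non-empty (row, col) sites.
    """
    r = k // 2
    offsets = list(product(range(-r, r + 1), repeat=2))
    counts = {o: 0 for o in offsets}
    for cy in range(r, height - r):      # kernel-centre rows
        for cx in range(r, width - r):   # kernel-centre columns
            for dy, dx in offsets:
                # Windows covering no non-empty site add nothing, matching
                # regular sparse convolution, which emits no output there.
                if (cy + dy, cx + dx) in active:
                    counts[(dy, dx)] += 1
    return counts


counts = stripe_counters({(0, 0), (1, 1), (3, 3)}, height=4, width=4)
print(counts[(-1, -1)], counts[(1, 1)])  # 2 1
```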
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Illustration of receptive field redundancy in sparse convolution kernel</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_65047-fig-2.tif"/>
</fig>
<p>According to the lottery ticket hypothesis, deep neural networks contain a large number of redundant parameters. In 3D sparse convolutional networks, we argue that this redundancy manifests in the stripes that are computed with non-empty voxels only rarely, i.e., part of the receptive field of the convolution kernel is redundant in 3D sparse convolution (see the red stripes in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>).</p>
<p>Therefore, 3D sparse convolutional networks can be pruned by stripes, that is, by filter shape. Since sparse convolution is implemented with a hash table, the algorithm that computes the convolution output must be modified after pruning. The original sparse convolution traverses every dimension of the convolution kernel in nested loops, whereas for a pruned irregular kernel the coordinates of the non-zero stripes are recorded, so the innermost loop only needs to visit those coordinates.</p>
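<p>The idea of visiting only non-zero stripes in the innermost loop can be sketched as a rulebook construction in Python. This is a minimal illustration under stated assumptions, not the HSP-S implementation: it assumes submanifold convolution (output sites equal input sites), uses a Python dict as the coordinate hash table, and the names <monospace>build_rulebook</monospace> and <monospace>apply_rulebook</monospace> are hypothetical.</p>

```python
Coord = tuple[int, int, int]


def build_rulebook(coords: list[Coord], kept_offsets: list[Coord]):
    """Map each surviving stripe offset to its (input row, output row) pairs.

    Assumes submanifold convolution, so output sites equal input sites.
    """
    index = {c: i for i, c in enumerate(coords)}  # hash table: coordinate -> row
    rulebook = {o: [] for o in kept_offsets}
    for out_idx, (x, y, z) in enumerate(coords):
        # Innermost loop visits only the un-pruned stripes, not all K^3 offsets.
        for dx, dy, dz in kept_offsets:
            in_idx = index.get((x + dx, y + dy, z + dz))
            if in_idx is not None:
                rulebook[(dx, dy, dz)].append((in_idx, out_idx))
    return rulebook


def apply_rulebook(feats, rulebook, weights, c_out):
    """Gather-multiply-scatter: out[j] += W[offset] @ feats[i] for each rule."""
    out = [[0.0] * c_out for _ in feats]
    for offset, pairs in rulebook.items():
        w = weights[offset]  # c_out x c_in matrix of this stripe
        for i, j in pairs:
            for co in range(c_out):
                out[j][co] += sum(wc * f for wc, f in zip(w[co], feats[i]))
    return out


coords = [(0, 0, 0), (1, 0, 0)]
kept = [(0, 0, 0), (1, 0, 0)]  # e.g. only 2 of the 27 stripes survive pruning
rb = build_rulebook(coords, kept)
print(apply_rulebook([[1.0], [2.0]], rb, {(0, 0, 0): [[1.0]], (1, 0, 0): [[10.0]]}, c_out=1))
# [[21.0], [2.0]]
```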
<p>Shape pruning was proposed for 2D convolutional networks as SWP (Stripe-Wise Pruning) [<xref ref-type="bibr" rid="ref-7">7</xref>]. SWP attaches to each filter a learnable parameter <italic><bold>FS</bold></italic> (Filter Skeleton, i.e., the filter shape mentioned above) with the same spatial size as the convolution kernel, multiplies the filter parameters by <italic><bold>FS</bold></italic> in the forward pass, adds a sparsity-inducing regularization term to the gradient of <italic><bold>FS</bold></italic> during backpropagation, and then sets the entries of <italic><bold>FS</bold></italic> that fall below a predetermined threshold to zero, thereby soft-pruning the filters. SWP resembles the soft pruning method SFP (Soft Filter Pruning) proposed in [<xref ref-type="bibr" rid="ref-8">8</xref>]: in subsequent training epochs, the zeroed entries of <italic><bold>FS</bold></italic> can become non-zero again through gradient backpropagation, which enlarges the optimization space of the network parameters and increases the chance of reaching better model accuracy under a given threshold.</p>
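<p>The SWP mechanism described above (multiply by <italic><bold>FS</bold></italic>, regularize, then zero small entries) can be sketched with plain Python lists. The sketch assumes an &#x2113;<sub>1</sub> penalty on FS, treats the data-loss gradient of FS as a hypothetical input supplied by the caller in place of autograd, and uses illustrative function names that do not come from the SWP code.</p>

```python
def swp_forward(weights, fs):
    """Effective weights = W * FS, with FS shared across the C input channels.

    weights[n][c][s]: stripe s (flattened K x K position) of channel c in filter n.
    fs[n][s]: Filter Skeleton entry of stripe s in filter n.
    """
    return [[[w * fs[n][s] for s, w in enumerate(chan)] for chan in filt]
            for n, filt in enumerate(weights)]


def swp_fs_step(fs, loss_grad, lr, lam):
    """One SGD step on FS with an l1 penalty lam * |FS| (sub-gradient lam * sign)."""
    def sign(v):
        return (v > 0) - (0 > v)
    return [[v - lr * (g + lam * sign(v)) for v, g in zip(row, grow)]
            for row, grow in zip(fs, loss_grad)]


def swp_soft_prune(fs, threshold):
    """Zero (do not delete) FS entries whose magnitude is below the threshold."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in fs]


fs = swp_soft_prune([[0.5, 0.005]], threshold=0.01)
print(fs)                              # [[0.5, 0.0]]
print(swp_forward([[[1.0, 2.0]]], fs))  # [[[0.5, 0.0]]]
# A hypothetical loss gradient pushes the zeroed stripe back above the threshold.
fs = swp_fs_step(fs, [[0.0, -1.0]], lr=0.1, lam=1e-4)
print(fs[0][1])                        # 0.1
```

<p>The stripe zeroed by <monospace>swp_soft_prune</monospace> returns to 0.1 after one step because its loss gradient is non-zero, which is the recoverability that soft pruning relies on.</p>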
<p>However, SWP prunes all layers of the network with a single global threshold, which is not reasonable: Reference [<xref ref-type="bibr" rid="ref-9">9</xref>] has shown that different layers of a network differ in their sensitivity to pruning. Moreover, because the global threshold is fixed throughout training, <italic><bold>FS</bold></italic> is likely to become fixed as well, i.e., the shape of <italic><bold>FS</bold></italic> stops changing. The network structure then solidifies during training, and the soft pruning paradigm loses its advantage of expanding the optimization space of the network parameters. In addition, SWP abandons the pruning rate (<italic>P</italic>) used in the common pruning paradigm and instead controls the pruning ratio through the global threshold (<italic>T</italic>). Since the relationship between <italic>T</italic> and <italic>P</italic> is unknown before pruning, <italic>T</italic> cannot be derived from <italic>P</italic>. For example, to reach a pruning ratio of 90%, SWP can only find the corresponding <italic>T</italic> by repeated manual search.</p>
<p>To address these problems, we propose a Hierarchical Shape Pruning method for 3D sparse convolution (HSP-S). HSP-S applies different, dynamically changing pruning thresholds to different sparse convolutional layers, as <xref ref-type="fig" rid="fig-2">Fig. 2</xref> describes. Specifically, given the pruning rate <italic>P</italic>, the global threshold <italic>T</italic> is first applied to all sparse convolution layers. When the global sparsity of <italic><bold>FS</bold></italic> in the network is below <italic>P</italic> and has stopped changing within a training-iteration window of a given size, we record the layers whose sparsity has also stopped changing, select the one with the smallest sparsity among them, and enlarge its pruning threshold by a given ratio. This dynamically adjusts the pruning thresholds, expands the optimization space of the network parameters during training, and drives the overall pruning ratio of the network toward the given pruning rate. In addition, to accelerate network inference after pruning, HSP-S provides an algorithm for computing sparse convolution outputs with irregular convolution kernels.</p>
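<p>The threshold-adjustment rule above can be expressed as a small controller. The following Python sketch is hypothetical: the class name, the window length, the amplification ratio and the choice to restart the observation window after each adjustment are illustrative assumptions, and sparsity values are assumed to be computed identically at each check so that exact comparison detects stagnation.</p>

```python
from collections import deque


class LayerwiseThresholds:
    """Hypothetical controller for HSP-S-style layer-wise threshold updates.

    window and ratio are illustrative hyper-parameters, not values from the paper.
    """

    def __init__(self, num_layers, target_rate, init_threshold=0.01,
                 window=3, ratio=1.5):
        self.target = target_rate
        self.thresholds = [init_threshold] * num_layers
        self.ratio = ratio
        self.history = deque(maxlen=window)  # recent (global, per-layer) sparsities

    def update(self, global_sparsity, layer_sparsity):
        """Record one check; return the index of the layer whose threshold was
        enlarged, or None if no adjustment was made."""
        self.history.append((global_sparsity, list(layer_sparsity)))
        if len(self.history) != self.history.maxlen:
            return None                      # window not filled yet
        if global_sparsity >= self.target:
            return None                      # target pruning rate P reached
        if any(g != global_sparsity for g, _ in self.history):
            return None                      # global sparsity is still moving
        stalled = [i for i, s in enumerate(layer_sparsity)
                   if all(snap[i] == s for _, snap in self.history)]
        if not stalled:
            return None
        layer = min(stalled, key=lambda i: layer_sparsity[i])  # least sparse
        self.thresholds[layer] *= self.ratio  # enlarge that layer's threshold
        self.history.clear()                  # restart the observation window
        return layer


ctrl = LayerwiseThresholds(num_layers=3, target_rate=0.5, window=2, ratio=2.0)
print(ctrl.update(0.3, [0.6, 0.1, 0.2]))  # None (window not full yet)
print(ctrl.update(0.3, [0.6, 0.1, 0.2]))  # 1
print(ctrl.thresholds)                    # [0.01, 0.02, 0.01]
```

<p>Here all three layers have stalled, and layer 1 has the smallest sparsity among them, so its threshold is doubled from 0.01 to 0.02 while the others are left unchanged.</p>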
<p>The contributions of this paper can be summarized as follows:
<list list-type="order">
<list-item><p>We propose a hierarchical shape pruning method for 3D sparse convolutional networks, to deal with the redundancy of the 3D sparse convolution kernel.</p></list-item>
<list-item><p>We propose applying layer-wise, dynamically adjusted pruning thresholds for shape pruning.</p></list-item>
<list-item><p>To realize irregular sparse convolution after pruning, we propose to process only the non-zero stripes when computing the sparse convolution output of an irregular convolution kernel.</p></list-item>
<list-item><p>Experiments on two public datasets and multiple voxel-based 3D object detection models show that HSP-S preserves higher model accuracy while achieving a higher pruning rate than existing shape pruning methods.</p></list-item>
</list></p>
<p>The remainder of this paper is organized as follows: <xref ref-type="sec" rid="s2">Section 2</xref> reviews related work on pruning techniques. <xref ref-type="sec" rid="s3">Section 3</xref> first introduces the primary definitions (<xref ref-type="sec" rid="s3_1">Section 3.1</xref>) and then formulates the problem (<xref ref-type="sec" rid="s3_2">Section 3.2</xref>), analyzing the limitations of existing pruning methods in sparse voxel spaces. <xref ref-type="sec" rid="s3_3">Section 3.3</xref> details our HSP-S method, including its hierarchical stripe pruning mechanism, and <xref ref-type="sec" rid="s3_4">Section 3.4</xref> describes how to compute the input and output indices of an irregular convolution kernel. <xref ref-type="sec" rid="s4">Section 4</xref> presents experiments on the KITTI, NuScenes and CIFAR10 datasets, with ablation studies in <xref ref-type="sec" rid="s4_4">Section 4.4</xref>. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> discusses broader implications and future directions.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Model compression has emerged as a critical paradigm for deploying deep neural networks on resource-constrained platforms, with four predominant approaches: model pruning [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>], knowledge distillation [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>], quantization [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>], and neural architecture search (NAS) [<xref ref-type="bibr" rid="ref-15">15</xref>]. Knowledge distillation transfers knowledge from large teacher models to compact student networks, quantization reduces the numerical precision of weights and activations, and NAS automates the design of efficient architectures by algorithmically exploring vast search spaces, albeit at significant computational cost. Model pruning, in contrast, removes redundant parameters from existing architectures without altering their fundamental operations, which matches the goal of sparse convolution: accelerating voxel-based 3D perception models.</p>
<p>Among pruning techniques, methodologies can be systematically categorized along three dimensions: granularity, criteria and execution steps.</p>
<p>By granularity, pruning techniques can be categorized into irregular and regular pruning. Irregular pruning operates on individual model weights; for example, in [<xref ref-type="bibr" rid="ref-16">16</xref>], weights below a threshold are set to zero. Regular pruning generally removes whole filters, or even entire network layers. In [<xref ref-type="bibr" rid="ref-9">9</xref>], the authors sort the filters of each layer by their <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>-norm and remove the filters with the smallest <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>-norm according to the pruning ratio.</p>
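<p>The &#x2113;<sub>1</sub>-norm filter ranking of [<xref ref-type="bibr" rid="ref-9">9</xref>] amounts to a few lines of Python. This is an illustrative sketch with a hypothetical function name; it operates on one layer whose filters are given as flattened weight lists.</p>

```python
def l1_prune_filters(filters, prune_ratio):
    """Return the sorted indices of filters kept after l1-norm pruning.

    filters: flattened weight lists of the filters in one layer.
    prune_ratio: fraction of filters to remove (those with the smallest l1-norm).
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    n_prune = int(len(filters) * prune_ratio)
    order = sorted(range(len(filters)), key=norms.__getitem__)  # ascending l1-norm
    return sorted(order[n_prune:])


# l1-norms are 2, 0.2, 3 and 0.7, so the two weakest filters (1 and 3) go.
print(l1_prune_filters([[1, -1], [0.1, 0.1], [3, 0], [-0.5, 0.2]], 0.5))  # [0, 2]
```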
<p>By pruning criterion, pruning techniques can be classified as norm-based, feature-based, loss-based, and sparse regularization-based pruning. Norm-based pruning methods [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>] rank the candidate objects by their <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>-norm or <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula>-norm and remove those with the smallest norms. Feature-based pruning methods decide whether to prune by measuring the importance of the output features produced by the candidate objects. For example, reference [<xref ref-type="bibr" rid="ref-17">17</xref>] decides whether to prune a filter according to the proportion of zeros in its output features, and reference [<xref ref-type="bibr" rid="ref-18">18</xref>] prunes the filters that have the least impact on the output activations of the next layer. Loss-based pruning methods decide based on the effect of pruning on the network loss; for instance, reference [<xref ref-type="bibr" rid="ref-19">19</xref>] uses first-order gradients to evaluate filter importance. Sparse regularization-based pruning methods apply sparse regularization to the network weights during training so that some weights converge to 0 and can then be pruned.
The NS (Network Slimming) pruning method proposed in [<xref ref-type="bibr" rid="ref-20">20</xref>] applies <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula> regularization to the scaling factors of the BN (Batch Normalization) layers during training, and the <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula> regularization pushes these scaling factors toward zero. At the end of training, given a predetermined pruning ratio, all scaling factors in the network are ranked, the channels with the smallest scaling factors are marked, and the filters producing those channels are pruned. The SWP pruning method proposed in [<xref ref-type="bibr" rid="ref-7">7</xref>], in turn, assigns each filter stripe a parameter analogous to these scaling factors, i.e., <italic><bold>FS</bold></italic>, and applies <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula> regularization to <italic><bold>FS</bold></italic> during training to make its distribution sufficiently sparse, so that stripes can be pruned according to the sparse <italic><bold>FS</bold></italic>. SWP can therefore be regarded as a kind of semi-regular pruning.</p>
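<p>The global ranking step of NS can be sketched in the same style. The sketch assumes the BN scaling factors have already been sparsified by training and only shows how a network-wide pruning ratio turns into per-layer channel masks; the function name is hypothetical.</p>

```python
def network_slimming_masks(bn_gammas, prune_ratio):
    """Per-layer channel masks from a global ranking of |gamma|.

    bn_gammas: one list of BN scaling factors per layer.
    prune_ratio: fraction of all channels in the network to remove.
    """
    entries = sorted((abs(g), layer, ch)
                     for layer, gammas in enumerate(bn_gammas)
                     for ch, g in enumerate(gammas))
    n_prune = int(len(entries) * prune_ratio)
    masks = [[1] * len(gammas) for gammas in bn_gammas]
    for _, layer, ch in entries[:n_prune]:
        masks[layer][ch] = 0  # channel and the filter producing it are removed
    return masks


print(network_slimming_masks([[0.9, 0.01], [0.05, 0.7, 0.002]], 0.4))
# [[1, 0], [1, 1, 0]]
```

<p>Because the ranking is global, layers end up with different pruning ratios, unlike the per-layer ranking of [<xref ref-type="bibr" rid="ref-9">9</xref>].</p>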
<p>According to the pruning procedure, pruning techniques can be categorized into hard pruning and soft pruning. The hard pruning paradigm proceeds as follows: 1. train a complete model; 2. prune its parameters with some pruning strategy; and 3. retrain to recover the model accuracy. Earlier pruning methods such as [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>] belong to hard pruning. The soft pruning paradigm first appeared in the SFP pruning method [<xref ref-type="bibr" rid="ref-8">8</xref>], where soft pruning means setting the weights of the pruned objects to zero instead of deleting them directly. Soft pruning proceeds as follows: 1. initialize the model parameters and perform soft pruning; 2. train the model and perform soft pruning after each training epoch; and 3. at the end of training, delete the objects that remain softly pruned after the last epoch. The essence of soft pruning is that the weights of softly pruned objects can be restored from zero to non-zero in subsequent training epochs, which expands the optimization space of the model parameters and offers a chance to find better model accuracy. Compared with hard pruning, soft pruning requires only one training session and avoids the time-consuming retraining process. SWP also belongs to the soft pruning paradigm and differs from SFP in pruning granularity: SFP prunes filters, whereas SWP prunes the stripes within filters, and filter-level pruning is achieved when all stripes of a filter are set to zero.</p>
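<p>The three soft pruning steps can be sketched as a training loop. This hypothetical Python sketch follows the SFP pattern of zeroing the filters with the smallest &#x2113;<sub>2</sub>-norm; <monospace>train_epoch</monospace> is an assumed callable that stands in for one epoch of gradient updates, and <monospace>fake_epoch</monospace> is a made-up example of such a callable.</p>

```python
import math


def soft_prune(filters, prune_ratio):
    """SFP-style step: zero (not delete) the filters with the smallest l2-norm.

    Modifies `filters` in place and also returns it.
    """
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    n_prune = int(len(filters) * prune_ratio)
    order = sorted(range(len(filters)), key=norms.__getitem__)
    for i in order[:n_prune]:
        filters[i] = [0.0] * len(filters[i])
    return filters


def train_with_soft_pruning(filters, epochs, prune_ratio, train_epoch):
    soft_prune(filters, prune_ratio)          # 1. prune right after initialization
    for _ in range(epochs):
        filters = train_epoch(filters)        # zeroed filters still get updated
        soft_prune(filters, prune_ratio)      # 2. re-prune after every epoch
    # 3. physically remove the filters that are zero after the last epoch
    return [f for f in filters if any(w != 0.0 for w in f)]


def fake_epoch(fs):
    """Hypothetical stand-in for training: shrinks filter 0, regrows filter 1."""
    return [[fs[0][0] - 0.9], [fs[1][0] + 3.0], [fs[2][0]], [fs[3][0]]]


print(train_with_soft_pruning([[1.0], [0.1], [2.0], [0.2]], 1, 0.5, fake_epoch))
# [[3.0], [2.0]]
```

<p>Filter 1, zeroed at initialization, is recovered after the epoch while filter 0 is pruned in its place, illustrating why soft pruning can reach structures that a one-shot hard pruning decision would have excluded.</p>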
<p>In summary, pruning methods are categorized by granularity (irregular weight level vs. regular filter/layer level), criteria (norm, feature, loss or regularization based), and procedure (hard vs. soft pruning). Irregular pruning [<xref ref-type="bibr" rid="ref-16">16</xref>] removes small weights for high compression but requires specialized hardware, while regular pruning (e.g., reference [<xref ref-type="bibr" rid="ref-9">9</xref>], which trims filters by ranking) offers hardware efficiency at coarse granularity. Norm- and feature-based criteria [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>] prioritize simplicity but neglect task dynamics, whereas sparse regularization methods [<xref ref-type="bibr" rid="ref-20">20</xref>] enable end-to-end optimization but suffer from premature mask fixation (e.g., SWP's stripe patterns freeze as training proceeds). Hard pruning [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>] follows a &#x201C;train-prune-finetune&#x201D; pipeline, sacrificing optimization continuity, while soft pruning methods such as SFP and SWP preserve recoverable zeros but lack 3D adaptation. Two critical gaps persist: (1) rigid sparsity patterns in late-stage soft pruning limit exploration; (2) coarse granularity mismatches anisotropic 3D voxel distributions. To address these gaps, we propose HSP-S, which introduces hierarchical stripe pruning with voxel-density-guided threshold adaptation to achieve layer-wise, shape-aware dynamic compression, resolving the trilemma of granularity, hardware efficiency, and task accuracy in 3D sparse convolution.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Hierarchical Shape Pruning for 3D Sparse Convolution</title>
<p>This section first defines the notation used in this paper and then introduces the hierarchical shape pruning method for 3D sparse convolution.</p>
<sec id="s3_1">
<label>3.1</label>
<title>Primary Definition</title>
<p>Given a sparse convolution network with <italic>L</italic> layers, the weight tensor of layer <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>l</mml:mi></mml:math></inline-formula> lies in <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>C</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, where <italic>N</italic> denotes the number of filters in this layer, <italic>C</italic> the number of input channels, and <italic>K</italic> the size of the convolution kernel. The corresponding <bold><italic>FS</italic></bold> parameters then lie in <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, i.e., each filter has its own <bold><italic>FS</italic></bold>, and each element of it corresponds to one stripe of the filter. When <italic>M</italic> stripes of a filter are pruned, the remaining weights of the irregular convolution kernel lie in <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>C</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>M</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>. We define the sparsity of layer <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>l</mml:mi></mml:math></inline-formula> after pruning as:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msup><mml:mi>S</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msub><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> denotes the number of stripes pruned in the <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>i</mml:mi></mml:math></inline-formula>-th filter. The sparsity of the whole sparse convolution network after pruning is then defined as:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>L</mml:mi></mml:munderover><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>C</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:munderover><mml:msubsup><mml:mi>M</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>L</mml:mi></mml:munderover><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mi>C</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mi>K</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mi>K</mml:mi><mml:mi>l</mml:mi></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msup><mml:mi>C</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:math></inline-formula> is the number of input channels of layer <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi>l</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msup><mml:mi>N</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:math></inline-formula> is the number of filters in layer <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>l</mml:mi></mml:math></inline-formula>, <italic>K</italic><sup><italic>l</italic></sup> is its kernel size, and <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msubsup><mml:mi>M</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:msubsup></mml:math></inline-formula> denotes the number of stripes pruned in the <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>i</mml:mi></mml:math></inline-formula>-th filter of layer <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>l</mml:mi></mml:math></inline-formula>.</p>
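<p>Eqs. (1) and (2) translate directly into code. The Python sketch below follows the <italic>K</italic> &#x00D7; <italic>K</italic> notation of the equations (for a cubic 3D kernel the stripe count per filter would be <italic>K</italic> &#x00D7; <italic>K</italic> &#x00D7; <italic>K</italic> instead); the input layout, one (<italic>C</italic>, <italic>K</italic>, [<italic>M</italic><sub>1</sub>, &#x2026;, <italic>M</italic><sub><italic>N</italic></sub>]) tuple per layer, is an assumption made for illustration.</p>

```python
def layer_sparsity(pruned_per_filter, k):
    """Eq. (1): S^l = sum_i M_i / (N * K * K), with pruned_per_filter[i] = M_i."""
    n = len(pruned_per_filter)
    return sum(pruned_per_filter) / (n * k * k)


def network_sparsity(layers):
    """Eq. (2): fraction of pruned weights over all layers.

    layers: one (C, K, [M_1, ..., M_N]) tuple per sparse convolution layer.
    """
    pruned = sum(c * sum(ms) for c, _, ms in layers)
    total = sum(len(ms) * c * k * k for c, k, ms in layers)
    return pruned / total


layers = [(4, 3, [9, 0]), (2, 3, [3, 3, 3])]
print(layer_sparsity([9, 0], 3))      # 0.5
print(layer_sparsity([3, 3, 3], 3))   # 0.3333333333333333
print(network_sparsity(layers))       # 54 / 126, about 0.4286
```

<p>Eq. (2) weights each layer by its number of weights <italic>N</italic> &#x00D7; <italic>C</italic> &#x00D7; <italic>K</italic> &#x00D7; <italic>K</italic>, so the example gives 54/126 &#x2248; 0.429 rather than the unweighted mean of the two layer sparsities, 5/12 &#x2248; 0.417.</p>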
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Problem Description</title>
<p>This subsection details the problems of the SWP pruning method outlined in the Introduction.</p>
<p>As noted in the related work, SWP follows the same line as NS: both use <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>-norm regularization during training to induce sparsity in the parameters associated with the pruned objects. NS prunes after training according to a given pruning ratio. Based on the experimental results in [<xref ref-type="bibr" rid="ref-17">17</xref>], as the pruning ratio increases, the accuracy of the pruned model first stays nearly constant and then drops. This suggests that there is a threshold on the scaling factors of the sparsified BN layers such that pruning the filters whose BN scaling factors fall below it does not affect model accuracy. SWP exploits this observation and uses a small global threshold (typically 0.01) to softly prune, during training, the filter stripes that fall below it. In this way, <italic><bold>FS</bold></italic> shrinks under regularization but does not yet approach 0 in the early training epochs, so no pruning occurs then, leaving enough time to optimize the model early in training.</p>
<p>However, while the threshold effectively controls the timing and intensity of soft pruning, applying a single global threshold to all network layers is not reasonable. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the sparsity of the sparse convolutional layers of the 3D object detection model SECOND after it is trained and pruned with SWP using a global threshold of 0.01. Controlling the pruning ratio of all layers with one global threshold leads to pronounced sparsity differences between layers. For example, the sparsity of the sparse convolutional layers in the red part of <xref ref-type="fig" rid="fig-3">Fig. 3</xref> is above 0.9, whereas that of the layers in the light-blue part is below 0.55. This is because different layers differ in their sensitivity to pruning: some layers have a large impact on the network output, while others have very little impact or can even be removed entirely.</p>
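<p>The effect of a single global threshold on layers with different <bold><italic>FS</italic></bold> magnitude distributions can be reproduced in a few lines. The sketch below uses invented magnitudes for two hypothetical layers (<monospace>conv_a</monospace> and <monospace>conv_b</monospace>) purely for illustration. It does not reproduce the data shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<preformat preformat-type="code">
```python
def layer_sparsity(fs: list[float], threshold: float) -> float:
    """Fraction of stripes in one layer whose |FS| falls below the threshold."""
    pruned = sum(1 for v in fs if threshold > abs(v))
    return pruned / len(fs)


# Invented FS magnitudes for two layers after sparse training.
LAYERS = {
    "conv_a": [0.001, 0.002, 0.004, 0.003, 0.2],  # mostly tiny entries
    "conv_b": [0.3, 0.02, 0.005, 0.4, 0.15],      # mostly large entries
}

if __name__ == "__main__":
    for name, fs in LAYERS.items():
        # The same global threshold yields very different per-layer sparsity.
        print(name, layer_sparsity(fs, threshold=0.01))
```
</preformat>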
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Differences in the sparsity of different sparse convolutional layers in SECOND after pruning with SWP (Stripe-Wise Pruning). Colors indicate layers within different sparsity ranges</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_65047-fig-3.tif"/>
</fig>
<p>In addition, SWP applies soft pruning to filter stripes during training, which allows zeroed stripes to be recovered. This expands the optimization space of the model and thus increases the chance of obtaining better network performance. However, we tracked the sparsity of all sparse convolutional layers in SECOND during SWP training. Apart from the layers that are already very sparse early in training, the sparsity of most layers no longer changes for long periods in the mid-to-late stages of training (see <xref ref-type="fig" rid="fig-4">Fig. 4</xref>). This indicates that the <bold><italic>FS</italic></bold> masks of these layers (i.e., a stripe above the threshold is treated as 1 and a stripe below the threshold as 0) are fixed in the mid-to-late stages of training. The model parameters are therefore updated within a fixed optimization space, which runs counter to the original purpose of soft pruning: expanding the optimization space of the model parameters.</p>
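<p>One simple way to observe the mask freezing described above is to compare binary <bold><italic>FS</italic></bold> masks between checkpoints. The sketch below is a hypothetical monitoring helper, not part of SWP or HSP-S as published. A change rate of zero over many consecutive checkpoints corresponds to the fixed masks discussed in the preceding paragraph. The snapshot values in the test are invented.</p>
<preformat preformat-type="code">
```python
def stripe_mask(fs: list[float], threshold: float) -> list[int]:
    """Binary FS mask: 1 if the stripe is at or above the threshold, else 0."""
    return [1 if abs(v) >= threshold else 0 for v in fs]


def mask_change_rate(prev: list[int], curr: list[int]) -> float:
    """Fraction of stripes whose mask value flipped between two checkpoints."""
    flips = sum(1 for a, b in zip(prev, curr) if a != b)
    return flips / len(curr)


def frozen_checkpoints(snapshots: list[list[float]], threshold: float) -> int:
    """Count the trailing checkpoints whose mask equals the previous one."""
    masks = [stripe_mask(fs, threshold) for fs in snapshots]
    count = 0
    # Walk backward over consecutive (previous, current) mask pairs.
    for prev, curr in zip(reversed(masks[:-1]), reversed(masks[1:])):
        if mask_change_rate(prev, curr) > 0:
            break
        count += 1
    return count
```
</preformat>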
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Sparsity variation of all SECOND layers during training process with SWP (Stripe-Wise Pruning)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_65047-fig-4.tif"/>
</fig>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Hierarchical Shape Pruning</title>
<p>This subsection describes how HSP-S updates the pruning thresholds hierarchically on top of SWP. As in SWP, sparse regularization is applied to <bold><italic>FS</italic></bold> during training so that its distribution becomes sufficiently sparse, i.e., so that some elements of <bold><italic>FS</italic></bold> fall below a threshold. This gives the loss function:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>W</mml:mi><mml:mo>&#x2299;</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> ranges over the training samples and their labels, <italic>W</italic> denotes the model weights, and <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mo>&#x2299;</mml:mo></mml:math></inline-formula> denotes element-wise multiplication (each entry of <italic><bold>FS</bold></italic> scales all weights of its stripe). Furthermore, <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> controls the strength of the sparse regularization, <italic>F</italic> stands for <italic><bold>FS</bold></italic>, and <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> denotes the sparse regularization applied to <bold><italic>FS</italic></bold> using the <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>&#x2113;</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>-norm:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>F</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:munderover><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>F</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>(</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>N</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>K</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>K</mml:mi><mml:mi>l</mml:mi></mml:msup></mml:mrow></mml:munderover><mml:mrow><mml:mo>|</mml:mo><mml:msubsup><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>|</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
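<p>A minimal plain-Python sketch of <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> and <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref> follows. It assumes nested lists with the layouts weights[i][c][j][k] and fs[i][j][k] (filter, input channel, kernel row, kernel column). It treats the task loss as an already computed number, and its helper names are hypothetical. A real implementation would use tensor operations and automatic differentiation instead.</p>
<preformat preformat-type="code">
```python
def apply_filter_skeleton(weights, fs):
    """Element-wise product W * F of Eq. (3).

    weights[i][c][j][k]: filter i, input channel c, kernel position (j, k).
    fs[i][j][k]: one FS entry per stripe, shared by all input channels.
    """
    return [
        [
            [
                [w * fs[i][j][k] for k, w in enumerate(row)]
                for j, row in enumerate(channel)
            ]
            for channel in filt
        ]
        for i, filt in enumerate(weights)
    ]


def l1_penalty(fs_layers):
    """g(F) of Eq. (4): sum of |F| over layers, filters and kernel positions."""
    return sum(
        abs(v)
        for fs in fs_layers
        for filt in fs
        for row in filt
        for v in row
    )


def regularized_loss(task_loss, fs_layers, lam):
    """Eq. (3): task loss plus lambda times the l1 penalty on all FS tensors."""
    return task_loss + lam * l1_penalty(fs_layers)
```
</preformat>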
<p>Next, we describe in detail how dynamic thresholds are set for different sparse convolutional layers. First, we define a global threshold <italic>T</italic> as the base from which the threshold of each layer changes dynamically, and we specify a target pruning ratio <italic>P</italic>. During training, the global sparsity <italic>S</italic> of the network is recorded after each iteration. Once <italic>S</italic> exceeds <italic>P/2</italic>, the standard deviation of the most recent global sparsity values within a sliding window is computed:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mi>S</mml:mi><mml:mi>t</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>G</mml:mi><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msqrt><mml:mfrac><mml:mn>1</mml:mn><mml:mi>R</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>R</mml:mi></mml:munderover><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>G</mml:mi><mml:msub><mml:mi>S</mml:mi><mml:mi>r</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:msqrt></mml:math></disp-formula>where <italic>GS</italic> denotes the sequence of global sparsity values recorded from the beginning of training up to the current iteration. <italic>R</italic> is the size of the statistics window, i.e., the number of most recent global sparsity values (counted backward from the current iteration) used to compute the standard deviation. <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>G</mml:mi><mml:msub><mml:mi>S</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:math></inline-formula> denotes the <italic>r</italic>-th global sparsity value within the window, and <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> is the mean of the global sparsity values within the window.</p>
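<p>The monitoring statistic of <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref> is a population standard deviation (divisor <italic>R</italic>) over the last <italic>R</italic> recorded values. The Python sketch below uses the standard-library <monospace>statistics.pstdev</monospace> for this. The helper names, and the choice to wait until a full window is available, are assumptions made for illustration.</p>
<preformat preformat-type="code">
```python
import statistics


def window_std(gs_history: list[float], r: int) -> float:
    """Eq. (5): population standard deviation of the last r global sparsity values."""
    return statistics.pstdev(gs_history[-r:])


def monitoring_active(gs_history: list[float], r: int, p: float) -> bool:
    """Monitor only once S exceeds P/2 and a full window of r values exists."""
    return len(gs_history) >= r and gs_history[-1] > p / 2


if __name__ == "__main__":
    history = [0.30, 0.52, 0.61, 0.61, 0.61]
    if monitoring_active(history, r=3, p=0.95):
        print(window_std(history, r=3))  # 0.0: the global sparsity has plateaued
```
</preformat>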
<p>When the standard deviation of the global sparsity within the monitoring window reaches zero, the model&#x2019;s optimization space (the set of currently pruned but potentially recoverable stripes) has stagnated. This stagnation implies that the current threshold configuration can no longer explore better sparsity patterns without intervention. To reinvigorate the search, we:
<list list-type="order">
<list-item><p><bold>Identify Candidate Layers:</bold> Iterate through all layers and flag those whose own sparsity has also plateaued (i.e., zero standard deviation within the window) and lies within the transition zone <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mo stretchy="false">[</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>.</p></list-item>
<list-item><p><bold>Threshold Adjustment:</bold> Select the flagged layer with the lowest sparsity (i.e., the most under-pruned one) and increase its threshold by <italic>0.1T</italic>, where <italic>T</italic> is the base threshold. This targeted increase forces exploration of sparser configurations while preserving critical features in stable layers.</p></list-item>
<list-item><p><bold>Dynamic Threshold Ceiling:</bold> If all flagged layers have reached the maximum threshold <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, we expand the search space by setting <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">&#x2190;</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>T</mml:mi></mml:math></inline-formula>, preventing premature convergence to suboptimal sparsity.</p></list-item>
</list></p>
<p>This mechanism keeps exploring the optimization space even when gradient-based updates have plateaued. The complete procedure is given in Algorithm 1.</p>
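<p>The three steps above can be sketched in Python as follows. This is our reading of the procedure, not a transcription of Algorithm 1. Several details are assumptions: the <monospace>LayerState</monospace> record, the function name, the tolerance <monospace>eps</monospace> used in place of an exact zero, and the rule that a layer has "reached" the ceiling when one more 0.1T step would exceed it. In a training loop, the function would be called after each iteration, once the global and per-layer sparsity histories have been updated.</p>
<preformat preformat-type="code">
```python
import statistics
from dataclasses import dataclass, field


@dataclass
class LayerState:
    """Hypothetical per-layer record: current threshold and sparsity history."""
    threshold: float
    history: list[float] = field(default_factory=list)


def update_thresholds(layers, gs_history, base_t, t_max, p, r, eps=1e-12):
    """One threshold-update check; mutates layer thresholds and returns T_max."""
    # Trigger: S exceeds P/2 and the global sparsity is flat over the window.
    if r > len(gs_history) or p / 2 >= gs_history[-1]:
        return t_max
    if statistics.pstdev(gs_history[-r:]) > eps:
        return t_max

    # Step 1: flag layers that have plateaued and whose sparsity lies in [P/2, P].
    flagged = [
        layer for layer in layers
        if len(layer.history) >= r
        and eps >= statistics.pstdev(layer.history[-r:])
        and layer.history[-1] >= p / 2
        and p >= layer.history[-1]
    ]
    if not flagged:
        return t_max

    # Step 2: among flagged layers that can still grow by 0.1*T without
    # exceeding T_max, raise the threshold of the least sparse one.
    step = 0.1 * base_t
    can_grow = [layer for layer in flagged if t_max - layer.threshold >= step - eps]
    if can_grow:
        target = min(can_grow, key=lambda layer: layer.history[-1])
        target.threshold += step
        return t_max

    # Step 3: every flagged layer sits at the ceiling, so lift it by T.
    return t_max + base_t
```
</preformat>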
<fig id="fig-5">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_65047-fig-5.tif"/>
</fig>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Compute the Input and Output Indices of an Irregular Convolutional Kernel</title>
<p>To perform convolution with the irregular kernels produced by shape pruning, SWP splits each irregular kernel into individual stripes, each of which acts as a 1 &#x00D7; 1 convolution. Each stripe convolution is applied to the correspondingly shifted input, and the results are accumulated to form the output.</p>
<p>For sparse convolution, however, the forward pass only needs to check whether each kernel stripe is valid (i.e., not pruned) while traversing the kernel to obtain the output indices. When the parameters of an irregular sparse convolution kernel are stored, the kernel is likewise split into separate stripes. The difference is that an index attribute is attached to each stripe, and the forward pass uses this attribute to determine whether the corresponding index is valid.</p>
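<p>The stripe-plus-index storage described above might look like the following Python sketch. The sketch makes two assumptions: the kernel is a cubic K &#x00D7; K &#x00D7; K kernel with odd K, and each stripe's index attribute is its offset from the kernel center. <monospace>Stripe</monospace> and <monospace>split_into_stripes</monospace> are hypothetical names, not part of the SPConv API.</p>
<preformat preformat-type="code">
```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Stripe:
    offset: tuple[int, int, int]  # kernel position relative to the center
    weights: list[float]          # one weight per input channel


def split_into_stripes(kernel, mask) -> list[Stripe]:
    """Keep the unpruned stripes of one 3D filter, each tagged with its offset.

    kernel[z][y][x] is the per-channel weight vector at that kernel position;
    mask[z][y][x] is 1 for a kept stripe and 0 for a pruned one.
    """
    k = len(kernel)
    half = k // 2
    stripes = []
    for z, y, x in product(range(k), repeat=3):
        if mask[z][y][x]:
            offset = (z - half, y - half, x - half)
            stripes.append(Stripe(offset, list(kernel[z][y][x])))
    return stripes


if __name__ == "__main__":
    k = 3
    kernel = [[[[float(9 * z + 3 * y + x)] for x in range(k)]
               for y in range(k)] for z in range(k)]
    mask = [[[0] * k for _ in range(k)] for _ in range(k)]
    mask[0][0][0] = 1  # corner stripe kept
    mask[1][1][1] = 1  # center stripe kept
    for s in split_into_stripes(kernel, mask):
        print(s.offset, s.weights)
```
</preformat>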
<p>Algorithm 2 details how the input and output indices are computed for an irregular convolution kernel. For submanifold 3D sparse convolution, a center stripe that was set to zero during pruning is still treated as valid. This is because submanifold sparse convolution uses the kernel center to determine whether an output index is generated. For details on the output index calculation of sparse convolution, refer to the sparse convolution library SPConv<xref ref-type="fn" rid="fn-1"><sup>1</sup></xref><fn id="fn-1"><label>1</label><p><ext-link ext-link-type="uri" xlink:href="https://github.com/traveller59/spconv">https://github.com/traveller59/spconv</ext-link> (accessed on 08 May 2025).</p></fn>.</p>
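<p>As a rough illustration of index computation with pruned stripes, the Python sketch below builds (input index, output index, offset) rules for a submanifold convolution. It is not Algorithm 2 and not SPConv's implementation. Its offset convention (the output at p reads the input at p + d) is an assumption and may differ from SPConv's internal one. The center offset is always kept for indexing, mirroring the rule stated above. The function name and the example coordinates are hypothetical.</p>
<preformat preformat-type="code">
```python
def submanifold_rulebook(active, kept_offsets):
    """Build (input_index, output_index, offset) rules for a submanifold sparse conv.

    active: list of integer voxel coordinates (z, y, x); outputs sit at exactly
    the same coordinates as the inputs. kept_offsets: offsets of unpruned
    stripes. The center offset (0, 0, 0) is always kept for indexing, even if
    its stripe was pruned to zero.
    """
    index = {coord: i for i, coord in enumerate(active)}
    offsets = sorted(set(kept_offsets) | {(0, 0, 0)})
    rules = []
    for out_idx, (z, y, x) in enumerate(active):
        for dz, dy, dx in offsets:
            # Pruned offsets are never visited, so they produce no rules.
            in_idx = index.get((z + dz, y + dy, x + dx))
            if in_idx is not None:
                rules.append((in_idx, out_idx, (dz, dy, dx)))
    return rules


if __name__ == "__main__":
    active = [(0, 0, 0), (0, 0, 1), (0, 1, 1)]
    # Only the +x stripe survives pruning; the center is re-added for indexing.
    for rule in submanifold_rulebook(active, kept_offsets=[(0, 0, 1)]):
        print(rule)
```
</preformat>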
<fig id="fig-6">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_65047-fig-6.tif"/>
</fig>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experiments</title>
<p>This section first introduces the experimental setup. We then compare HSP-S with SWP on voxel-based 3D object detection models that use sparse convolution, to demonstrate the advantages of the proposed HSP-S pruning method. To further examine the performance of HSP-S, we also compare it with other pruning methods on the 2D classification model ResNet56 [<xref ref-type="bibr" rid="ref-21">21</xref>]. Finally, we perform ablation experiments to study the effect of the hyperparameters of HSP-S on its pruning performance.</p>
<sec id="s4_1">
<label>4.1</label>
<title>Setup</title>
<p><bold>Datasets.</bold> In our experiments, we use the KITTI and NuScenes datasets, which are commonly used for 3D object detection. The KITTI dataset contains 7481 training samples and 7518 test samples. Since the annotations of the test samples are not publicly available, we divide the 7481 training samples into two subsets, a training set and a test set, with 3712 and 3769 samples, respectively. Each sample consists of lidar data and the corresponding image data, and only three categories (Pedestrian, Car, and Cyclist) are considered. The NuScenes dataset contains 1000 scenes, of which 700, 150, and 150 belong to the training, validation, and test sets, respectively. Each scene consists of lidar data and multi-view image data and is annotated with 10 object categories. For 2D classification, we use the CIFAR10 dataset, which contains 10 categories, with 50,000 training images and 10,000 test images.</p>
<p><bold>Models.</bold> Our experiments are built on the MMDetection3D [<xref ref-type="bibr" rid="ref-26">26</xref>] toolbox, and we select voxel-based 3D object detection models as the targets for pruning. For the KITTI dataset, the target models are SECOND [<xref ref-type="bibr" rid="ref-2">2</xref>], PartA2 [<xref ref-type="bibr" rid="ref-5">5</xref>], PV-RCNN [<xref ref-type="bibr" rid="ref-6">6</xref>], and MVXNet [<xref ref-type="bibr" rid="ref-27">27</xref>]. For the NuScenes dataset, we only test CenterPoint [<xref ref-type="bibr" rid="ref-28">28</xref>], because it is the only voxel-based outdoor 3D object detection model available to us for this dataset. Because of the complex structure of 3D object detection models, we restrict pruning to the voxel feature extraction backbone. For the CIFAR10 dataset, the target model is ResNet56, which existing pruning methods widely use as a benchmark.</p>
<p><bold>Baseline.</bold> We train the 3D object detection models on the KITTI dataset for 40 epochs, CenterPoint for 20 epochs, and ResNet56 for 160 epochs. For all other settings, such as the learning rate, data augmentation strategy, and batch size, we follow the original configuration of each model.</p>
<p><bold>Pruning Settings.</bold> Since SWP has not been evaluated on voxel-based 3D object detection models, we run SWP on the above 3D object detection models for comparison. For SWP, the hyperparameter <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> is set to 1e&#x2212;5, the global threshold <italic>T</italic> is set to 0.01, and no retraining is performed after pruning. For HSP-S, the hyperparameter <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> and the base threshold <italic>T</italic> are the same as for SWP. The target pruning ratio <italic>P</italic> is set to 0.95, the window size <italic>R</italic> for the sparsity standard deviation is set to 2 epochs, and the maximum threshold <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is set to 0.03.</p>
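<p>For reference, the HSP-S settings above can be collected in a small configuration sketch, which also works through two derived quantities. The dictionary keys are hypothetical names, not MMDetection3D options. Expressing <italic>R</italic> in iterations assumes 1238 iterations per epoch; this is our inference from how <xref ref-type="table" rid="table-3">Table 3</xref> writes <italic>R</italic> (1238 &#x00D7; 2 to 1238 &#x00D7; 6), not a value stated in the text. The step count assumes that each layer starts at <italic>T</italic> and grows in increments of 0.1<italic>T</italic>.</p>
<preformat preformat-type="code">
```python
# HSP-S settings from the Pruning Settings paragraph; key names are hypothetical.
HSP_S_CONFIG = {
    "lambda": 1e-5,          # sparsity regularization strength in Eq. (3)
    "base_threshold": 0.01,  # T
    "pruning_ratio": 0.95,   # P
    "window_epochs": 2,      # R, in epochs
    "max_threshold": 0.03,   # initial T_max
}

ITERS_PER_EPOCH = 1238  # assumption inferred from Table 3, not stated in the text


def window_in_iterations(cfg: dict, iters_per_epoch: int = ITERS_PER_EPOCH) -> int:
    """Window size R expressed in training iterations."""
    return cfg["window_epochs"] * iters_per_epoch


def steps_to_ceiling(cfg: dict) -> int:
    """Number of 0.1*T increments that take a layer from T to the initial T_max."""
    step = 0.1 * cfg["base_threshold"]
    return round((cfg["max_threshold"] - cfg["base_threshold"]) / step)


if __name__ == "__main__":
    print(window_in_iterations(HSP_S_CONFIG))  # 2 * 1238 = 2476
    print(steps_to_ceiling(HSP_S_CONFIG))      # (0.03 - 0.01) / 0.001 = 20
```
</preformat>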
<p><bold>Pruning Performance.</bold> We prune only the voxel feature extraction backbone of the 3D object detection models, while the rest of each model keeps all of its learnable parameters. Consequently, the traditional pruning metrics, namely the number of parameters (Params) and the amount of computation (FLOPs, floating point operations), do not accurately reflect the pruning effect. We therefore measure the performance of pruning methods with two metrics: the global sparsity <italic>S</italic> of the voxel feature extraction backbone (i.e., the proportion of its parameters removed) and the accuracy drop of the model.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Results on Voxel-Based 3D Object Detection Models</title>
<p>This section compares the pruning performance of HSP-S and SWP on various voxel-based 3D object detection models (see <xref ref-type="table" rid="table-1">Table 1</xref>).</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Comparison of pruning performance between HSP-S and SWP (Stripe-Wise Pruning). The higher Sparsity and lower Accuracy Drop are shown in bold</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Dataset</th>
<th>Models</th>
<th>Pruning method</th>
<th>Sparsity (%)</th>
<th>Accuracy drop (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td rowspan="8">KITTI</td>
<td rowspan="2">SECOND</td>
<td>SWP</td>
<td><bold>89.93</bold></td>
<td>&#x2212;0.19</td>
</tr>
<tr align="center">
<td>HSP-S</td>
<td>89.81</td>
<td><bold>&#x2212;0.72</bold></td>
</tr>
<tr align="center">
<td rowspan="2">PartA2</td>
<td>SWP</td>
<td>93.40</td>
<td>1.93</td>
</tr>
<tr align="center">
<td>HSP-S</td>
<td><bold>93.89</bold></td>
<td><bold>1.33</bold></td>
</tr>
<tr align="center">
<td rowspan="2">PV-RCNN</td>
<td>SWP</td>
<td>92.35</td>
<td>3.28</td>
</tr>
<tr align="center">
<td>HSP-S</td>
<td><bold>92.76</bold></td>
<td><bold>2.61</bold></td>
</tr>
<tr align="center">
<td rowspan="2">MVXNet</td>
<td>SWP</td>
<td>91.42</td>
<td>1.58</td>
</tr>
<tr align="center">
<td>HSP-S</td>
<td><bold>91.47</bold></td>
<td><bold>1.56</bold></td>
</tr>
<tr align="center">
<td rowspan="2">NuScenes</td>
<td rowspan="2">CenterPoint</td>
<td>SWP</td>
<td>20.80</td>
<td>0.04/0.17</td>
</tr>
<tr align="center">
<td>HSP-S</td>
<td><bold>21.94</bold></td>
<td><bold>&#x2212;1.02/&#x2212;0.47</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p><bold>KITTI.</bold> On the KITTI dataset, HSP-S achieves higher sparsity than SWP on PartA2, PV-RCNN, and MVXNet, and slightly lower sparsity on SECOND. In terms of accuracy after pruning, HSP-S outperforms SWP on all four models.</p>
<p><bold>NuScenes.</bold> On the NuScenes dataset, HSP-S achieves both higher sparsity and better accuracy than SWP.</p>
<p>Overall, models pruned with HSP-S are sparser than those pruned with SWP, with SECOND as the only exception. Even for SECOND, the two sparsity values differ by only 0.12 percentage points, which is negligible. In terms of accuracy after pruning, HSP-S even exceeds the baseline model on both SECOND and CenterPoint, and in both cases clearly outperforms SWP. For example, on CenterPoint the model accuracy (mAP/NDS) improves by 1.02%/0.47% after pruning with HSP-S, whereas it decreases by 0.04%/0.17% after pruning with SWP, a clear advantage for HSP-S. For the other models, the accuracy after HSP-S pruning is also higher than after SWP pruning. In addition, both SWP and HSP-S reach only about 20% sparsity on CenterPoint. This suggests that 0.01 is not a suitable value for the global threshold of SWP or the base threshold of HSP-S on this model. More broadly, this empirically chosen base threshold reveals a limitation: the need for manual tuning introduces subjectivity and compromises reproducibility across tasks.</p>
<p>By adjusting the pruning thresholds of individual layers, HSP-S pushes the sparsity of the model closer to the predetermined pruning ratio. Moreover, adjusting the thresholds keeps the filter shapes changing throughout training. This exploits more of the potential of soft pruning and further enlarges the optimization space of the model parameters.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Results on ResNet56</title>
<p>To further explore the pruning performance of our method, we conduct experiments on a dense (non-sparse) convolutional network, ResNet56, which classifies images from the CIFAR10 dataset. Specifically, on ResNet56 we compare HSP-S with several classical and recently proposed pruning methods; the results are shown in <xref ref-type="table" rid="table-2">Table 2</xref>. The accuracy drop after pruning with HSP-S is 0.08%, compared with 0.12% for SWP. HSP-S therefore loses less accuracy than SWP, MOP-FMS, and HRank, although L1, NISP, DCP, and GBN show smaller accuracy drops at lower FLOPs reductions. At the same time, HSP-S achieves the largest FLOPs reduction (76.4%) among all compared methods, while its parameter reduction (71.7%) is below those of SWP and MOP-FMS.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Comparison of pruning performance of HSP-S and other methods on ResNet56. Bold entries indicate the best result for each metric. For Params drop and FLOPs drop, higher is better; for Accuracy drop, lower is better</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Pruning method</th>
<th>Params drop (%)</th>
<th>FLOPs drop (%)</th>
<th>Accuracy drop (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>SWP [<xref ref-type="bibr" rid="ref-7">7</xref>]</td>
<td><bold>77.7</bold></td>
<td>75.6</td>
<td>0.12</td>
</tr>
<tr align="center">
<td>L1 [<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td>13.7</td>
<td>27.6</td>
<td><bold>&#x2212;0.02</bold></td>
</tr>
<tr align="center">
<td>MOP-FMS [<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td>76.90</td>
<td>76.26</td>
<td>0.77</td>
</tr>
<tr align="center">
<td>NISP [<xref ref-type="bibr" rid="ref-22">22</xref>]</td>
<td>42.6</td>
<td>43.6</td>
<td>0.03</td>
</tr>
<tr align="center">
<td>DCP [<xref ref-type="bibr" rid="ref-23">23</xref>]</td>
<td>70.3</td>
<td>47.1</td>
<td>&#x2212;0.01</td>
</tr>
<tr align="center">
<td>GBN [<xref ref-type="bibr" rid="ref-24">24</xref>]</td>
<td>66.7</td>
<td>70.3</td>
<td>0.03</td>
</tr>
<tr align="center">
<td>HRank [<xref ref-type="bibr" rid="ref-25">25</xref>]</td>
<td>68.1</td>
<td>74.1</td>
<td>2.38</td>
</tr>
<tr align="center">
<td>HSP-S (Ours)</td>
<td>71.7</td>
<td><bold>76.4</bold></td>
<td>0.08</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Ablation Study</title>
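<p>In this subsection, we study how the hyperparameters of HSP-S affect its pruning performance. We focus mainly on the window size <italic>R</italic>, which determines how often the thresholds can be updated. <xref ref-type="table" rid="table-3">Table 3</xref> shows the results of pruning SECOND with different values of <italic>R</italic>. The sparsity is nearly insensitive to <italic>R</italic> (89.91% to 90.17%), whereas the accuracy change ranges from &#x2212;0.20% to &#x2212;0.72%. We find that <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mn>1238</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>2</mml:mn></mml:math></inline-formula> gives an acceptable balance of sparsity and accuracy, although 1238 &#x00D7; 3 and 1238 &#x00D7; 6 yield slightly larger accuracy gains.</p>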
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Performance of pruning SECOND based on different <italic>R</italic></title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th><italic>R</italic></th>
<th>Sparsity (%)</th>
<th>Accuracy drop (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>1238 <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 2</td>
<td>90.04</td>
<td>&#x2212;0.49</td>
</tr>
<tr align="center">
<td>1238 <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3</td>
<td>89.91</td>
<td>&#x2212;0.72</td>
</tr>
<tr align="center">
<td>1238 <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 4</td>
<td>89.98</td>
<td>&#x2212;0.58</td>
</tr>
<tr align="center">
<td>1238 <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 5</td>
<td>90.16</td>
<td>&#x2212;0.20</td>
</tr>
<tr align="center">
<td>1238 <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 6</td>
<td>90.17</td>
<td>&#x2212;0.63</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusions</title>
<p>This paper proposes Hierarchical Shape Pruning for 3D Sparse Convolutions (HSP-S), a framework that addresses critical inefficiencies in 3D sparse convolutional networks. We first establish that sparse voxel distributions naturally align with shape-wise pruning, enabling targeted removal of underutilized convolutional stripes while preserving task-critical features. Unlike prior methods that rely on static global thresholds, HSP-S adapts thresholds dynamically per layer: when the global sparsity stagnates, it raises the thresholds of under-pruned layers, which further enlarges the optimization space. To support sparse convolution with irregular kernels, HSP-S processes only non-zero stripes in the forward pass of pruned kernels. Extensive experiments on KITTI and NuScenes show that our method outperforms the existing shape pruning method SWP. A future extension of HSP-S would involve multimodal-guided pruning that fuses camera-based semantic segmentation (e.g., outputs of Mask R-CNN (Region-based Convolutional Neural Network)) to inform stripe preservation decisions. Specifically, semantic importance maps from 2D images could be projected into 3D voxel space via calibration matrices, creating soft constraints that protect convolution stripes overlapping with high-value semantic regions.</p>
</sec>
</body>
<back>
<ack>
<p>We would like to thank the contributors of the open-source project MMDetection3D.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Conceptualization: Haiyan Long and Chonghao Zhang; methodology: Chonghao Zhang and Hai Chen; software: Chonghao Zhang; validation: Haiyan Long, Chonghao Zhang and Gang Chen; formal analysis: Hai Chen; investigation: Chonghao Zhang; resources: Chonghao Zhang and Hai Chen; data curation: Chonghao Zhang; writing&#x2014;original draft preparation: Chonghao Zhang; writing&#x2014;review and editing: Haiyan Long, Xudong Qiu and Gang Chen; visualization: Chonghao Zhang; supervision: Hai Chen; project administration: Haiyan Long. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The data that support the findings of this study are openly available at <ext-link ext-link-type="uri" xlink:href="https://www.cvlibs.net/datasets/kitti/">https://www.cvlibs.net/datasets/kitti/</ext-link> (accessed on 15 April 2024) and <ext-link ext-link-type="uri" xlink:href="https://www.nuscenes.org/">https://www.nuscenes.org/</ext-link> (accessed on 15 April 2024).</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Graham</surname> <given-names>B</given-names></string-name>, <string-name><surname>Engelcke</surname> <given-names>M</given-names></string-name>, <string-name><surname>Van Der Maaten</surname> <given-names>L</given-names></string-name></person-group>. <article-title>3D semantic segmentation with submanifold sparse convolutional networks</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>. <publisher-loc>Salt Lake City, UT, USA</publisher-loc>; <year>2018</year>. p. <fpage>9224</fpage>&#x2013;<lpage>32</lpage>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yan</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Mao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name></person-group>. <article-title>SECOND: sparsely embedded convolutional detection</article-title>. <source>Sensors</source>. <year>2018</year>;<volume>18</volume>(<issue>10</issue>):<fpage>3337</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s18103337</pub-id>; <pub-id pub-id-type="pmid">30301196</pub-id></mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Tang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>S</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Searching efficient 3D architectures with sparse point-voxel convolution</article-title>. In: <conf-name>European Conference on Computer Vision</conf-name>. <publisher-loc>Glasgow, UK: Springer</publisher-loc>; <year>2020</year>. p. <fpage>685</fpage>&#x2013;<lpage>702</lpage>. doi:<pub-id pub-id-type="doi">10.1007/978-3-030-58604-1_41</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>D</given-names></string-name></person-group>. <article-title>SSN: shape signature networks for multi-class object detection from point clouds</article-title>. In: <conf-name>Proceedings of the European Conference on Computer Vision</conf-name>. <publisher-loc>Glasgow, UK: Springer</publisher-loc>; <year>2020</year>. p. <fpage>581</fpage>&#x2013;<lpage>97</lpage>. doi:<pub-id pub-id-type="doi">10.1007/978-3-030-58595-2_35</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name></person-group>. <article-title>From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network</article-title>. <source>IEEE Trans Pattern Anal Mach Intell</source>. <year>2020</year>. doi:<pub-id pub-id-type="doi">10.1109/tpami.2020.2977026</pub-id>; <pub-id pub-id-type="pmid">32142423</pub-id></mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Shi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>C</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>PV-RCNN: point-voxel feature set abstraction for 3D object detection</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>; <year>2020</year>. p. <fpage>10529</fpage>&#x2013;<lpage>38</lpage>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Meng</surname> <given-names>F</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Li</surname> <given-names>K</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>H</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>X</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>G</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Pruning filter in filter</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>17629</fpage>&#x2013;<lpage>40</lpage>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Kang</surname> <given-names>G</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>X</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Soft filter pruning for accelerating deep convolutional neural networks</article-title>. <comment>arXiv:1808.06866</comment>. <year>2018</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kadav</surname> <given-names>A</given-names></string-name>, <string-name><surname>Durdanovic</surname> <given-names>I</given-names></string-name>, <string-name><surname>Samet</surname> <given-names>H</given-names></string-name>, <string-name><surname>Graf</surname> <given-names>HP</given-names></string-name></person-group>. <article-title>Pruning filters for efficient convnets</article-title>. <comment>arXiv:1608.08710</comment>. <year>2016</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jiang</surname> <given-names>P</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Neri</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Convolutional neural network pruning based on multi-objective feature map selection for image classification</article-title>. <source>Appl Soft Comput</source>. <year>2023</year>;<volume>139</volume>(<issue>1</issue>):<fpage>110229</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.asoc.2023.110229</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Hinton</surname> <given-names>G</given-names></string-name>, <string-name><surname>Vinyals</surname> <given-names>O</given-names></string-name>, <string-name><surname>Dean</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Distilling the knowledge in a neural network</article-title>. <comment>arXiv:1503.02531</comment>. <year>2015</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>R</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>JT</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>Y</given-names></string-name></person-group>. <chapter-title>Learning small-size DNN with output-distribution-based criteria</chapter-title>. In: <source>Interspeech</source> <year>2014</year>. <publisher-loc>Singapore</publisher-loc>; <year>2014</year>. p. <fpage>1910</fpage>&#x2013;<lpage>4</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Nagel</surname> <given-names>M</given-names></string-name>, <string-name><surname>Baalen</surname> <given-names>MV</given-names></string-name>, <string-name><surname>Blankevoort</surname> <given-names>T</given-names></string-name>, <string-name><surname>Welling</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Data-free quantization through weight equalization and bias correction</article-title>. In: <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision</conf-name>; <year>2019</year>; <publisher-loc>Seoul, Republic of Korea</publisher-loc>. p. <fpage>1325</fpage>&#x2013;<lpage>34</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Banner</surname> <given-names>R</given-names></string-name>, <string-name><surname>Nahshan</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Soudry</surname> <given-names>D</given-names></string-name></person-group>. <chapter-title>Post training 4-bit quantization of convolutional networks for rapid-deployment</chapter-title>. In: <source>Advances in neural information processing systems</source>. Vol. <volume>32</volume>. <publisher-loc>Vancouver, BC, Canada: Curran Associates, Inc.</publisher-loc>; <year>2019</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xue</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>C</given-names></string-name>, <string-name><surname>S&#x0142;owik</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Neural architecture search based on a multi-objective evolutionary algorithm with probability stack</article-title>. <source>IEEE Trans Evol Comput</source>. <year>2023</year>;<volume>27</volume>(<issue>4</issue>):<fpage>778</fpage>&#x2013;<lpage>86</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tevc.2023.3252612</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mao</surname> <given-names>H</given-names></string-name>, <string-name><surname>Dally</surname> <given-names>WJ</given-names></string-name></person-group>. <article-title>Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding</article-title>. <comment>arXiv:1510.00149</comment>. <year>2015</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Hu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>R</given-names></string-name>, <string-name><surname>Tai</surname> <given-names>YW</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>CK</given-names></string-name></person-group>. <article-title>Network trimming: a data-driven neuron pruning approach towards efficient deep architectures</article-title>. <comment>arXiv:1607.03250</comment>. <year>2016</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Luo</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>W</given-names></string-name></person-group>. <article-title>ThiNet: a filter level pruning method for deep neural network compression</article-title>. In: <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>. <publisher-loc>Venice, Italy</publisher-loc>; <year>2017</year>. p. <fpage>5058</fpage>&#x2013;<lpage>66</lpage>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Molchanov</surname> <given-names>P</given-names></string-name>, <string-name><surname>Tyree</surname> <given-names>S</given-names></string-name>, <string-name><surname>Karras</surname> <given-names>T</given-names></string-name>, <string-name><surname>Aila</surname> <given-names>T</given-names></string-name>, <string-name><surname>Kautz</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Pruning convolutional neural networks for resource efficient inference</article-title>. <comment>arXiv:1611.06440</comment>. <year>2016</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>G</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Learning efficient convolutional networks through network slimming</article-title>. In: <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>. <publisher-loc>Venice, Italy</publisher-loc>; <year>2017</year>. p. <fpage>2736</fpage>&#x2013;<lpage>44</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ren</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Deep residual learning for image recognition</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>. <publisher-loc>Las Vegas, NV, USA</publisher-loc>; <year>2016</year>. p. <fpage>770</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Li</surname> <given-names>A</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>CF</given-names></string-name>, <string-name><surname>Lai</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Morariu</surname> <given-names>VI</given-names></string-name>, <string-name><surname>Han</surname> <given-names>X</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>NISP: pruning networks using neuron importance score propagation</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>. <publisher-loc>Salt Lake City, UT, USA</publisher-loc>; <year>2018</year>. p. <fpage>9194</fpage>&#x2013;<lpage>203</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Zhuang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Tan</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zhuang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Q</given-names></string-name>, <etal>et al</etal></person-group>. <chapter-title>Discrimination-aware channel pruning for deep neural networks</chapter-title>. In: <source>Advances in neural information processing systems</source>. Vol. <volume>31</volume>; <year>2018</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>You</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>K</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>P</given-names></string-name></person-group>. <chapter-title>Gate decorator: global filter pruning method for accelerating deep convolutional neural networks</chapter-title>. In: <source>Advances in neural information processing systems</source>. <publisher-loc>Red Hook, NY, USA: Curran Associates Inc.</publisher-loc>; <year>2019</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lin</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ji</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Y</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>HRank: filter pruning using high-rank feature map</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>. <publisher-loc>Seattle, WA, USA</publisher-loc>; <year>2020</year>. p. <fpage>1529</fpage>&#x2013;<lpage>38</lpage>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><collab>MMDetection3D Contributors</collab></person-group>. <article-title>MMDetection3D: OpenMMLab next-generation platform for general 3D object detection</article-title>; <year>2020</year> <comment>[cited 2025 Apr 7]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://github.com/open-mmlab/mmdetection3d">https://github.com/open-mmlab/mmdetection3d</ext-link>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sindagi</surname> <given-names>VA</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Tuzel</surname> <given-names>O</given-names></string-name></person-group>. <article-title>MVX-Net: multimodal VoxelNet for 3D object detection</article-title>. In: <conf-name>2019 International Conference on Robotics and Automation (ICRA)</conf-name>. <publisher-loc>Montreal, QC, Canada</publisher-loc>; <year>2019</year>. p. <fpage>7276</fpage>&#x2013;<lpage>82</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICRA.2019.8794195</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yin</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>X</given-names></string-name>, <string-name><surname>Kr&#x00E4;henb&#x00FC;hl</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Center-based 3D object detection and tracking</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</conf-name>; <year>2021</year>. p. <fpage>11784</fpage>&#x2013;<lpage>93</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>