Tensor Train Random Projection

This work proposes a novel tensor train random projection (TTRP) method for dimension reduction, where pairwise distances can be approximately preserved. Our TTRP is systematically constructed through a tensor train (TT) representation with TT-ranks equal to one. Based on the tensor train format, this new random projection method can speed up the dimension reduction procedure for high-dimensional datasets and incurs lower storage costs with little loss in accuracy, compared with existing methods. We provide a theoretical analysis of the bias and the variance of TTRP, which shows that this approach is an expected isometric projection with bounded variance, and we show that the Rademacher distribution is an optimal choice for generating the corresponding TT-cores. Detailed numerical experiments with synthetic datasets and the MNIST dataset are conducted to demonstrate the efficiency of TTRP.


Introduction
Dimension reduction is a fundamental concept in science and engineering for feature extraction and data visualization. Exploring the properties of low-dimensional structures in high-dimensional spaces has attracted broad attention. Popular dimension reduction methods include principal component analysis (PCA) [1,2], non-negative matrix factorization (NMF) [3], and t-distributed stochastic neighbor embedding (t-SNE) [4]. A main step in dimension reduction is to build a linear or nonlinear mapping from a high-dimensional space to a low-dimensional one that preserves important properties of the high-dimensional space, such as the distance between any two points [5].
Random projection (RP) is a widely used method for dimension reduction. It is well known that the Johnson-Lindenstrauss (JL) transformation [6,7] can nearly preserve the distance between two points after a random projection f, a property typically called the isometry property. The isometry property can be used for nearest neighbor search in high-dimensional datasets [8,9]. It is also used in compressed sensing [10,11], where a sparse signal can be reconstructed under a linear random projection [12]. The JL lemma [6] tells us that there exists a nearly isometric mapping f that maps a high-dimensional dataset into a lower dimensional space. A typical choice for the mapping f is the linear random projection

f(x) = Rx / √M, (1)

where x ∈ R^N and R ∈ R^{M×N} is a matrix whose entries are drawn from the Gaussian distribution with mean zero and variance one, denoted by N(0, 1). We call this the Gaussian random projection (Gaussian RP). The storage of the matrix R in (1) is O(MN) and the cost of computing Rx in (1) is O(MN). However, with large M and N, this construction is computationally infeasible. To alleviate the difficulty, the sparse random projection method [13] and the very sparse random projection method [14] were proposed, where the random projection is constructed by a sparse random matrix, so that both the storage and the computational cost are reduced. To be specific, Achlioptas [13] replaced the dense matrix R by a sparse matrix whose entries follow

+1 with probability 1/(2s),  0 with probability 1 − 1/s,  −1 with probability 1/(2s). (2)

This means that the matrix is sampled at a rate of 1/s. Note that if s = 1, the corresponding distribution is called the Rademacher distribution. When s = 3, the cost of computing Rx in (1) reduces to a third of the original one but is still O(MN). When s = √N ≫ 3, Li et al.
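As a concrete illustration of the sparse construction above, the following is a minimal NumPy sketch. The function names and the √s rescaling (which keeps each entry's variance equal to one, so the JL-style scaling in (1) still applies) are our own choices for illustration, not from the paper:

```python
import numpy as np

def sparse_rp_matrix(M, N, s=3, rng=None):
    """Sample a sparse projection matrix with entries distributed as (2):
    +1 w.p. 1/(2s), 0 w.p. 1 - 1/s, -1 w.p. 1/(2s).
    The sqrt(s) factor rescales each entry to have variance one."""
    rng = np.random.default_rng(rng)
    signs = rng.choice([1.0, 0.0, -1.0], size=(M, N),
                       p=[1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)])
    return np.sqrt(s) * signs

def project(R, x):
    # JL-style linear projection f(x) = R x / sqrt(M), as in (1)
    return (R @ x) / np.sqrt(R.shape[0])
```

With s = 3, roughly two thirds of the entries are exactly zero, so the matrix-vector product touches only about a third of the coordinates.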
[14] called this case the very sparse random projection (Very Sparse RP), which significantly speeds up the computation with little loss in accuracy. It is clear that the storage of the very sparse random projection is O(M√N). However, sparse random projections can typically distort a sparse vector [9]. To achieve a low-distortion embedding, Ailon and Chazelle [9,15] proposed the Fast Johnson-Lindenstrauss Transform (FJLT), which preconditions a sparse projection matrix with a randomized Fourier transform.
To reduce randomness and storage requirements, Sun et al. [16] proposed the following format: R = (R_1 ⊙ · · · ⊙ R_d)^T, where ⊙ represents the Khatri-Rao product, R_i ∈ R^{n_i×M}, and N = ∏_{i=1}^d n_i. Each R_i is a random matrix whose entries are i.i.d. random variables drawn from N(0, 1). This transformation is called the Gaussian tensor random projection (Gaussian TRP) throughout this paper. It is clear that the storage of the Gaussian TRP is O(M ∑_{i=1}^d n_i), which is less than that of the Gaussian random projection (Gaussian RP). For example, when N = n_1 n_2 = 40000, the storage of Gaussian TRP is only 1/20 of that of Gaussian RP. Also, it has been shown that Gaussian TRP satisfies the properties of expected isometry with vanishing variance [16].
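The Khatri-Rao construction above can be sketched as follows — a minimal NumPy sketch (function names are ours; the Khatri-Rao product is the column-wise Kronecker product, so only the small factors R_i need to be stored):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: from A (I x J) and B (K x J),
    return the (I*K) x J matrix whose column j is kron(A[:, j], B[:, j])."""
    I, J = A.shape
    K, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * K, J)

def gaussian_trp(ns, M, rng=None):
    """Assemble the Gaussian TRP matrix R = (R_1 ⊙ ... ⊙ R_d)^T,
    each R_i of size n_i x M with i.i.d. N(0,1) entries.
    Only M * sum(ns) random numbers are drawn, versus M * prod(ns)
    for a dense Gaussian RP matrix."""
    rng = np.random.default_rng(rng)
    Rs = [rng.standard_normal((n, M)) for n in ns]
    out = Rs[0]
    for Ri in Rs[1:]:
        out = khatri_rao(out, Ri)
    return out.T  # shape M x prod(ns)
```

In practice one would apply the projection without materializing the full M × N matrix; the explicit assembly here is only to make the definition concrete.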
Recently, using matrix or tensor decompositions to reduce the storage of projection matrices was proposed in [17,18]. The main idea of these methods is to split the projection matrix into several small-scale matrices or tensors. In this work, we focus on the low-rank tensor train representation to construct the random projection f. Tensor decompositions are widely used for data compression [5,19-24]. The tensor train (TT) decomposition gives the following benefits: low-rank TT-formats provide compact representations of projection matrices and efficient basic linear algebra operations for matrix-by-vector products [25]. Based on these benefits, we propose a novel tensor train random projection (TTRP) method, which requires significantly smaller storage and computational costs compared with existing methods (e.g., Gaussian TRP [16], Very Sparse RP [14] and Gaussian RP [26]). While constructing projection matrices using tensor train (TT) and canonical polyadic (CP) decompositions based on Gaussian random variables was proposed in [27], the main contributions of our work are threefold: first, our TTRP is based on a rank-one TT-format, which significantly reduces the storage of projection matrices; second, we provide a novel construction procedure for the rank-one TT-format in our TTRP based on i.i.d. Rademacher random variables; third, we prove that our construction of TTRP is unbiased with bounded variance.
The rest of the paper is organized as follows. The tensor train format is introduced in section 2. Details of our TTRP approach are given in section 3, where we prove that the approach is an expected isometric projection with bounded variance. In section 4, we demonstrate the efficiency of TTRP with synthetic datasets and the MNIST dataset. Finally, section 5 concludes the paper.

Tensor train format
Let lowercase letters (x), boldface lowercase letters (x), boldface capital letters (X), and calligraphic letters (X) denote scalar, vector, matrix, and tensor variables, respectively. x(i) represents element i of a vector x, and X(i, j) represents element (i, j) of a matrix X. The i-th row and j-th column of a matrix X are denoted by X(i, :) and X(:, j), respectively. For a given d-th order tensor X, X(i_1, i_2, . . ., i_d) is its (i_1, i_2, . . ., i_d)-th component. For a vector x ∈ R^N, we denote its ℓ_p norm by ‖x‖_p = (∑_{i=1}^N |x(i)|^p)^{1/p}, for any p ≥ 1. The Kronecker product of matrices A ∈ R^{I×J} and B ∈ R^{K×L} is denoted by A ⊗ B; the result is a matrix of size (IK) × (JL) defined by (A ⊗ B)((i − 1)K + k, (j − 1)L + l) = A(i, j) B(k, l).
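The Kronecker index identity can be checked directly — a small NumPy sketch (note the text uses 1-based indices while the code is 0-based, so the identity becomes C[i*K + k, j*L + l] = A[i, j] * B[k, l]):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # I x J with I = 2, J = 3
B = np.arange(8.0).reshape(4, 2)   # K x L with K = 4, L = 2
C = np.kron(A, B)                  # (I*K) x (J*L) = 8 x 6

# 0-based form of the element identity from the definition above
assert C.shape == (8, 6)
assert C[1 * 4 + 2, 2 * 2 + 1] == A[1, 2] * B[2, 1]
```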

Tensor train decomposition
Tensor train (TT) decomposition [25] is a generalization of the SVD from matrices to tensors. TT decomposition provides a compact representation for tensors and allows for efficient application of linear algebra operations (discussed in section 2.2 and section 2.3). Given a d-th order tensor G ∈ R^{n_1×···×n_d}, its tensor train decomposition [25] is

G(i_1, i_2, . . ., i_d) = G_1(i_1) G_2(i_2) · · · G_d(i_d), (6)

where G_k(i_k) ∈ R^{r_{k−1}×r_k} is a slice of the k-th TT-core G_k ∈ R^{r_{k−1}×n_k×r_k}, for i_k = 1, . . ., n_k, and the "boundary condition" is r_0 = r_d = 1. The numbers r_0, r_1, . . ., r_d are called the TT-ranks. The tensor G is said to be in the TT-format if each element of G can be represented by (6).
In the index form, the decomposition (6) is rewritten as the following TT-format:

G(i_1, . . ., i_d) = ∑_{α_1, . . ., α_{d−1}} G_1(1, i_1, α_1) G_2(α_1, i_2, α_2) · · · G_d(α_{d−1}, i_d, 1).

Looking more closely at (6), an element G(i_1, i_2, . . ., i_d) is represented by a sequence of matrix-by-vector products. Figure 1 illustrates the tensor train decomposition. It can be seen that the key ingredient in tensor train (TT) decomposition is the TT-ranks. The TT-format uses only O(dnr^2) memory to represent a tensor with O(n^d) elements, where n = max{n_1, . . ., n_d} and r = max{r_0, r_1, . . ., r_d}. Although the storage reduction is efficient only if the TT-ranks are small, tensors in data science and machine learning typically have low TT-ranks. Moreover, one can apply the TT-format to basic linear algebra operations, such as matrix-by-vector products, scalar multiplications, etc. This can reduce the computational cost significantly when the data have low-rank structures (see [25] for details).
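Evaluating one element of a tensor in the TT-format is exactly the chain of small matrix products in (6) — a minimal sketch (the function name and core layout `(r_{k-1}, n_k, r_k)` are conventions we choose here):

```python
import numpy as np

def tt_element(cores, idx):
    """Evaluate G(i_1,...,i_d) for a tensor in TT-format, as in (6):
    a product of r_{k-1} x r_k matrix slices, with r_0 = r_d = 1.
    Each core has shape (r_{k-1}, n_k, r_k)."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]   # multiply by the i-th slice of core k
    return v[0, 0]
```

For a rank-one TT (all r_k = 1), each slice is a scalar and the element is simply a product of d scalars.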

Tensorizing matrix-by-vector products
The tensor train format gives a compact representation of matrices and efficient computation for matrix-by-vector products. We first review the TT-format of large matrices and vectors following [25]. Defining two bijections ν : N → N^d and µ : N → N^d, a pair of indices (i, j) ∈ N^2 is mapped to a multi-index pair (ν(i), µ(j)) = (i_1, i_2, . . ., i_d, j_1, j_2, . . ., j_d). Then a matrix R ∈ R^{M×N} and a vector x ∈ R^N can be tensorized in the TT-format as follows. Letting M = ∏_{k=1}^d m_k and N = ∏_{k=1}^d n_k, an element (i, j) of R can be written as (see [25,29])

R(i, j) = R_1(i_1, j_1) R_2(i_2, j_2) · · · R_d(i_d, j_d), (8)

where R_k(i_k, j_k) ∈ R^{r_{k−1}×r_k} is a slice of the k-th TT-core R_k ∈ R^{r_{k−1}×m_k×n_k×r_k}, and an element j of x can be written as

x(j) = X_1(j_1) X_2(j_2) · · · X_d(j_d), (9)

where X_k(j_k) ∈ R^{r̃_{k−1}×r̃_k} is a slice of the k-th TT-core of x. Here (i_1, . . ., i_d) enumerate the rows of R, and (j_1, . . ., j_d) enumerate the columns of R.
We consider the matrix-by-vector product y = Rx, and each element of y can be tensorized in the TT-format as

y(i) = Y_1(i_1) Y_2(i_2) · · · Y_d(i_d),  where  Y_k(i_k) = ∑_{j_k=1}^{n_k} R_k(i_k, j_k) ⊗ X_k(j_k), (10)

so that Y_k(i_k) ∈ R^{r_{k−1} r̃_{k−1} × r_k r̃_k}. Assuming that the TT-cores of x are known, the total cost of the matrix-by-vector product y = Rx in the TT-format reduces significantly from the original complexity O(MN) to O(dmn r^2 r̃^2), where m = max{m_1, m_2, . . ., m_d}, n = max{n_1, n_2, . . ., n_d}, r = max{r_0, r_1, . . ., r_d}, and r̃ = max{r̃_0, r̃_1, . . ., r̃_d}; here N is typically large while r is small. When m_k = n_k and r_k = r̃_k for k = 1, . . ., d, the cost of such a matrix-by-vector product in the TT-format is O(dn^2 r^4) [25]. Note that in the case where r equals one, the cost of such a matrix-by-vector product in the TT-format is O(dmn r̃^2).
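The core-wise contraction in (10) can be sketched directly with `einsum` — a minimal sketch under our own core layout conventions (`(r_{k-1}, m_k, n_k, r_k)` for matrix cores, `(s_{k-1}, n_k, s_k)` for vector cores):

```python
import numpy as np

def tt_matvec(R_cores, x_cores):
    """TT-cores of y = R x from TT-cores of R and x, following (10).
    R_cores[k]: shape (r_{k-1}, m_k, n_k, r_k);
    x_cores[k]: shape (s_{k-1}, n_k, s_k).
    Output core k has shape (r_{k-1}*s_{k-1}, m_k, r_k*s_k)."""
    y_cores = []
    for R, X in zip(R_cores, x_cores):
        # sum over the column index j_k, Kronecker-merge the rank pairs
        Y = np.einsum('aijb,cjd->acibd', R, X)
        r0, s0, m, r1, s1 = Y.shape
        y_cores.append(Y.reshape(r0 * s0, m, r1 * s1))
    return y_cores
```

Each output core is formed independently, so the cost is linear in d, matching the O(dmn r^2 r̃^2) complexity stated above.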

Basic Operations in the TT-format
In section 2.2, the product of a matrix R and a vector x, both in the TT-format, is conducted efficiently. In the TT-format, many other important operations can also be readily implemented. For instance, computing the Euclidean distance between two vectors in the TT-format is more efficient, with less storage, than directly computing the Euclidean distance in standard matrix and vector formats. In the following, some important operations in the TT-format are discussed.
The subtraction Z = Y − Ŷ of two tensors Y and Ŷ in the TT-format is obtained by merging their cores: each TT-core of Z is a block-diagonal concatenation of the corresponding cores of Y and Ŷ (with the sign absorbed into one boundary core), and the TT-ranks of Z equal the sums of the TT-ranks of Y and Ŷ. The dot product of tensors Y and Ŷ in the TT-format [25] is computed by a sweep over the cores: contracting the physical index of each pair of cores yields vectors V_1, V_d and matrices V_2, . . ., V_{d−1}, and we compute ⟨Y, Ŷ⟩ by a sequence of matrix-by-vector products v_1 = V_1, v_k = v_{k−1} V_k for k = 2, . . ., d, obtaining ⟨Y, Ŷ⟩ = v_d. For simplicity we assume that the TT-ranks of Y are the same as those of Ŷ. Using the reshaping Kronecker product expressions [30], each product of the form y = x(B ⊗ C) is evaluated in O(r^3) operations by exploiting the Kronecker structure, while disregarding the Kronecker structure of y = x(B ⊗ C) leads to an O(r^4) calculation. Hence the complexity of each such intermediate step is O(r^3), the cost of computing each v_k is O(mr^3), and the total cost of the dot product ⟨Y, Ŷ⟩ is O(dmr^3).
The Frobenius norm of a tensor Y is defined by ‖Y‖_F = √⟨Y, Y⟩. Computing the distance between tensors Y and Ŷ in the TT-format is then computationally efficient by applying the dot product: ‖Y − Ŷ‖_F = √(⟨Y, Y⟩ − 2⟨Y, Ŷ⟩ + ⟨Ŷ, Ŷ⟩). The complexity of computing the distance is also O(dmr^3); Algorithm 1 summarizes the procedure together with the cost of each step. In summary, merging the cores of two tensors in the TT-format performs the subtraction of the two tensors, instead of direct subtraction in the standard tensor format, and a sequence of matrix-by-vector products achieves the dot product of two tensors in the TT-format. The cost of computing the distance between two tensors in the TT-format thus reduces from the original complexity O(M) to O(dmr^3), where M = ∏_{i=1}^d m_i and r ≪ M.
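The dot-product sweep and the distance formula above can be sketched as follows — a minimal NumPy sketch (function names and the `(r_{k-1}, m_k, r_k)` core layout are our conventions; for clarity the intermediate contraction does not exploit the O(r^3) Kronecker trick discussed above):

```python
import numpy as np

def tt_dot(y_cores, z_cores):
    """Dot product of two tensors in TT-format via a sweep of
    small matrix products. Cores have shape (r_{k-1}, m_k, r_k)."""
    v = None
    for Y, Z in zip(y_cores, z_cores):
        # contract the physical index i_k of both cores
        W = np.einsum('aib,cid->acbd', Y, Z)
        r = W.shape
        W = W.reshape(r[0] * r[1], r[2] * r[3])
        v = W if v is None else v @ W   # running row vector v_k
    return v.item()

def tt_dist(y_cores, z_cores):
    # ||Y - Z||_F via <Y,Y> - 2<Y,Z> + <Z,Z>, never densifying
    val = (tt_dot(y_cores, y_cores)
           - 2.0 * tt_dot(y_cores, z_cores)
           + tt_dot(z_cores, z_cores))
    return np.sqrt(max(val, 0.0))
```

The `max(..., 0.0)` guards against tiny negative values from floating-point cancellation when the two tensors are nearly equal.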

Tensor train random projection
Due to the computational efficiency of TT-format discussed above, we consider the TT-format to construct projection matrices.Our tensor train random projection is defined as follows.
Definition 1 (Tensor Train Random Projection). For a data point x ∈ R^N, our tensor train random projection (TTRP) is

f_TTRP(x) := Rx / √M, (21)

where the tensorized versions (through the TT-format) of R and x are denoted by R and X (see (8)-(9)), the corresponding TT-cores are denoted by {R_k}_{k=1}^d and {X_k}_{k=1}^d respectively, we set r_0 = r_1 = . . . = r_d = 1, and y := Rx is specified by (10).
Note that our TTRP is based on the tensorized version of R with TT-ranks all equal to one, which leads to significant computational efficiency and small storage costs; comparisons for TTRP associated with different TT-ranks are conducted in section 4. When r_0 = r_1 = . . . = r_d = 1, all TT-cores R_i, for i = 1, . . ., d, in (8) become matrices, and the cost of computing Rx in TTRP (21) is O(dmn r̃^2) (see section 2.2), where m = max{m_1, m_2, . . ., m_d}, n = max{n_1, n_2, . . ., n_d}, and r̃ = max{r̃_0, r̃_1, . . ., r̃_d} are the TT-ranks of x. Moreover, from our analysis in the latter part of this section, we find that the Rademacher distribution introduced in section 1 is an optimal choice for generating the TT-cores of R. In the following, we prove that the TTRP established by (21) is an expected isometric projection with bounded variance.
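With all TT-ranks equal to one, (8) says R(i, j) = R_1(i_1, j_1) · · · R_d(i_d, j_d), i.e., the projection matrix is exactly the Kronecker product R_1 ⊗ · · · ⊗ R_d of the d core matrices. The whole rank-one TTRP can therefore be sketched by sequential mode products, never forming the M × N matrix (a minimal sketch; the function names are ours):

```python
import numpy as np

def rademacher_cores(ms, ns, rng=None):
    """Rank-one TT-cores of R: each core is an m_k x n_k matrix of
    i.i.d. Rademacher (+/-1) entries, the paper's default choice."""
    rng = np.random.default_rng(rng)
    return [rng.choice([-1.0, 1.0], size=(m, n)) for m, n in zip(ms, ns)]

def ttrp(x, cores):
    """f_TTRP(x) = (R_1 ⊗ ... ⊗ R_d) x / sqrt(M), applied by reshaping
    x into an n_1 x ... x n_d tensor and contracting one mode at a time."""
    ms = [R.shape[0] for R in cores]
    ns = [R.shape[1] for R in cores]
    y = x.reshape(ns)                    # tensorize: j -> (j_1,...,j_d)
    for R in cores:
        # contract the leading mode with R_k, rotate result mode to the back
        y = np.moveaxis(np.tensordot(R, y, axes=(1, 0)), 0, -1)
    return y.reshape(-1) / np.sqrt(np.prod(ms))
```

The cost is a sequence of small matrix products, O(dmn · N/n) in total, instead of the O(MN) dense product.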
Theorem 1 Given a vector x ∈ R^N, suppose that the tensorized version R of the projection matrix R in (21) is composed of d independent TT-cores R_1, . . ., R_d, whose entries are independent and identically distributed random variables with mean zero and variance one. Then the following expected isometry holds:

E[‖f_TTRP(x)‖_2^2] = ‖x‖_2^2. (22)

Proof Denoting y = Rx gives

E[‖f_TTRP(x)‖_2^2] = (1/M) ∑_{i=1}^M E[y^2(i)]. (23)

By the TT-format, (24) is derived using (3) and (23), and then combining (24) and using the independence of the TT-cores R_1, . . ., R_d gives (25).
The k-th term of the right-hand side of (25), for k = 1, . . ., d, can be computed by (26). Here, as we set the TT-ranks of R to one, R_k(i_k, j_k) is a scalar, and (26) then leads to (27). Using (4) and (27) gives (28), and we derive (30) from (28) by the assumption that the entries of the TT-cores have mean zero and variance one. Substituting (30) into (25) gives (31), and substituting (31) into (22) concludes the proof.

Theorem 2 Given a vector x ∈ R^N, suppose that the tensorized version R of the projection matrix R in (21) is composed of d independent TT-cores R_1, . . ., R_d, whose entries are independent and identically distributed random variables with mean zero, variance one, and the same fourth moment ∆. Then the variance Var[‖f_TTRP(x)‖_2^2] is bounded, with the explicit bound given in (52) below.

Proof By the property of the variance and using Theorem 1, we obtain (33); note that E[y^2(i) y^2(j)] ≠ E[y^2(i)] E[y^2(j)] in general, and a simple example can be found in Appendix A.
Similarly, the second term E[y^2(i) y^2(j)] of the right-hand side of (33), for i ≠ j, is expanded in (44). If i_k ≠ i′_k for k = 1, . . ., d, then the k-th term of the right-hand side of (44) is computed by (45). Supposing that i_1 = i′_1, . . ., i_k ≠ i′_k, . . ., i_d = i′_d and substituting (40) and (47) into (44), we obtain (48). Similarly, if i_k ≠ i′_k for k ∈ S ⊆ {1, . . ., d} with |S| = l, and i_k = i′_k for k ∉ S, then (49) holds. Hence, combining (48) and (49) gives (50), where m = max{m_1, m_2, . . ., m_d}.
Therefore, using (43) and (50) deduces (51). In summary, substituting (51) into (32) implies the variance bound (52). One can see that the bound of the variance (52) is reduced as M increases, which is expected. When M = m^d and N = n^d, we obtain the simplified bound (53). As m increases, the upper bound in (53) tends to a quantity proportional to N^2 x_max^4 − ‖x‖_2^4 ≥ 0, where x_max := max_i |x(i)|, and this upper bound vanishes as M increases if and only if x(1) = x(2) = · · · = x(N). Also, the upper bound (52) is affected by the fourth moment ∆. To keep the expected isometry, we need E[R_k^2(i_k, j_k)] = 1. Note that when the TT-cores follow the Rademacher distribution, Var[R_k^2(i_k, j_k)] = 0, and the fourth moment ∆ in (52) achieves its minimum. So the Rademacher distribution is an optimal choice for generating the TT-cores, and we set the Rademacher distribution to be our default choice for constructing TTRP (Definition 1).
Proposition 3 (Hypercontractivity [31]) Consider a degree-q polynomial f(Y) = f(Y_1, . . ., Y_n) of independent centered Gaussian or Rademacher random variables Y_1, . . ., Y_n. Then for any λ > 0,

P(|f(Y) − E[f(Y)]| ≥ λ) ≤ e^2 exp(−(λ^2 / (K Var[f(Y)]))^{1/q}),

where Var[f(Y)] is the variance of the random variable f(Y) and K > 0 is an absolute constant.
Proposition 3 extends the Hanson-Wright inequality; its proof can be found in [31].
Proposition 4 Let f_TTRP : R^N → R^M be the tensor train random projection defined by (21). Suppose that for i = 1, . . ., d, all entries of the TT-cores R_i are independent standard Gaussian or Rademacher random variables, with the same fourth moment ∆, and let x_max := max_{i=1,...,N} |x(i)|, m = max{m_1, m_2, . . ., m_d}, n = max{n_1, n_2, . . ., n_d}. For any x ∈ R^N, there exist absolute constants C > 0 and K > 0 such that the concentration inequality (54) holds.

Proof Since ‖f_TTRP(x)‖_2^2 is a polynomial of degree 2d of independent standard Gaussian or Rademacher random variables (the entries of the TT-cores R_i, for i = 1, . . ., d), we apply Proposition 3 and Theorem 2 to obtain (54), where x_max = max_{i=1,...,N} |x(i)|. We note that the upper bound in the concentration inequality (54) is not tight, as it involves the dimensionality of the datasets (N). Giving a tight bound independent of the dimensionality of the datasets for the corresponding concentration inequality is our future work.
The procedure of TTRP is summarized in Algorithm 2. For the input of this algorithm, the TT-ranks of R (the tensorized version of the projection matrix R in (21)) are set to one, and following the analysis above, we generate the entries of the corresponding TT-cores {R_k}_{k=1}^d from the Rademacher distribution. For a given data point x in the TT-format, Algorithm 2 gives the TT-cores of the corresponding output, and each element of f_TTRP(x) in (21) can be represented as

f_TTRP(x)(i) = (1/√M) Y_1(i_1) Y_2(i_2) · · · Y_d(i_d),

where ν(i) = (i_1, . . ., i_d) and ν is a bijection from N to N^d.

Algorithm 2 Tensor train random projection
Input: TT-cores {R_k}_{k=1}^d of R (TT-ranks one, entries drawn from the Rademacher distribution), TT-cores {X_k}_{k=1}^d of x
1: for k = 1, . . ., d do
2:   for i_k = 1, . . ., m_k do
3:     Y_k(i_k) = ∑_{j_k=1}^{n_k} R_k(i_k, j_k) X_k(j_k)
4:   end for
5: end for
Output: TT-cores {Y_k}_{k=1}^d of y = Rx

Numerical experiments
We demonstrate the efficiency of TTRP using synthetic datasets and the MNIST dataset [32]. The quality of isometry is a key factor in assessing the performance of random projection methods; in our numerical studies it is estimated by the ratio of the pairwise distances,

(2 / (n_0 (n_0 − 1))) ∑_{i<j} ‖f(x^{(i)}) − f(x^{(j)})‖_2 / ‖x^{(i)} − x^{(j)}‖_2, (55)

where n_0 is the number of data points. Since the output of our TTRP procedure (see Algorithm 2) is in the TT-format, it is efficient to apply TT-format operations to compute the pairwise distances in (55) through Algorithm 1. In order to assess the average performance of isometry, we repeat each numerical experiment 100 times (with different realizations of the TT-cores) and estimate the mean and the variance of the ratio of the pairwise distances from these samples. The rest of this section is organized as follows. First, through a synthetic dataset, the effect of different TT-ranks of the tensorized version R of R in (21) is shown, which motivates our setting of the TT-ranks to one. After that, we focus on the situation with TT-ranks equal to one and test the effect of different TT-cores. Finally, based on both high-dimensional synthetic and MNIST datasets, our TTRP is compared with related projection methods, including Gaussian TRP [16], Very Sparse RP [14] and Gaussian RP [26].
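The isometry-quality metric can be sketched as follows — a minimal sketch assuming (as reconstructed above) that the ratio is averaged over all distinct pairs (function and argument names are ours):

```python
import numpy as np

def pairwise_ratio(X, f):
    """Isometry quality of a projection f: the mean over all pairs of
    ||f(x_i) - f(x_j)|| / ||x_i - x_j||, cf. (55).
    X has one data point per row; a perfect isometry gives 1.0."""
    n0 = X.shape[0]
    Y = np.stack([f(x) for x in X])
    ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
              for i in range(n0) for j in range(i + 1, n0)]
    return float(np.mean(ratios))
```

Repeating this over many realizations of the random cores gives the mean and variance estimates reported in the figures.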

Effect of different TT-ranks
In Definition 1, we set the TT-ranks to be one. To explain the motivation for this setting, we investigate the effect of different TT-ranks. We herein consider the situation where the TT-ranks take r_0 = r_d = 1 and r_k = r for k = 2, . . ., d − 1, with rank r ∈ {1, 2, . . .}, and we keep the other settings in Definition 1 unchanged. For comparison, two different distributions are considered to generate the TT-cores in this part: the Rademacher distribution (our default optimal choice) and the Gaussian distribution; the corresponding tensor train projections are denoted by rank-r TTRP and Gaussian TT (studied in detail in [27]), respectively. For rank-r TTRP, the entries of the TT-cores R_1(i_1, j_1) and R_d(i_d, j_d) are drawn from {+r^{−1/4}, −r^{−1/4}} with equal probability, and each entry of R_k(i_k, j_k), k = 2, . . ., d − 1, is drawn uniformly and independently from {+r^{−1/2}, −r^{−1/2}}. A synthetic dataset with dimension N = 1000 and size n_0 = 10 is generated, where each entry of the vectors (each vector is a sample in the synthetic dataset) is independently generated from N(0, 1). In this test problem, we set the reduced dimension to M = 24, and the dimensions of the corresponding tensor representations are set to m_1 = 4, m_2 = 3, m_3 = 2 and n_1 = n_2 = n_3 = 10 (M = m_1 m_2 m_3 and N = n_1 n_2 n_3). Figure 2 shows the ratio of the pairwise distances for the two projection methods (computed through (55)). It can be seen that the estimated mean of the ratio of the pairwise distances for rank-r TTRP is typically closer to one than that of Gaussian TT, i.e., rank-r TTRP has advantages in keeping the pairwise distances. Clearly, for a given rank in Figure 2, the estimated variance of the ratio of the pairwise distances for rank-r TTRP is smaller than that of Gaussian TT. Moreover, focusing on rank-r TTRP, the results for both the mean and the variance are not significantly different across TT-ranks. In order to reduce the storage, we focus only on the rank-one case (as in Definition 1) in the rest of this paper.

Effect of different TT-cores
A synthetic dataset is used to assess the effect of different distributions for the TT-cores; it consists of independent vectors x^(1), . . ., x^(10) with dimension N = 2500, whose elements are sampled from the standard Gaussian distribution. The following three distributions are investigated to construct TTRP (see Definition 1): the Rademacher distribution (our default choice), the standard Gaussian distribution (studied in [27]), and the 1/3-sparse distribution (i.e., s = 3 in (2)); the corresponding projection methods are denoted by TTRP-RD, TTRP-N(0, 1), and TTRP-1/3-sparse, respectively. For this test problem, three TT-cores are used, with m_1 = M/2, m_2 = 2, m_3 = 1 and n_1 = 25, n_2 = 10, n_3 = 10. Figure 3 shows that the estimated mean of the ratio of the pairwise distances for TTRP-RD is very close to one, and the estimated variance of TTRP-RD is at least one order of magnitude smaller than that of TTRP-N(0, 1) and TTRP-1/3-sparse. These results are consistent with Theorem 2. In the rest of this paper, we focus on our default choice for TTRP: the TT-ranks are set to one, and each element of the TT-cores is independently sampled from the Rademacher distribution.

Comparison with Gaussian TRP, Very Sparse RP and Gaussian RP
The storage of the projection matrix and the cost of computing Rx (see (21)) for our TTRP (TT-ranks equal to one), Gaussian TRP [16], Very Sparse RP [14] and Gaussian RP [26] are shown in Table 1, where m = max{m_1, m_2, . . ., m_d} and n = max{n_1, n_2, . . ., n_d}. Note that the matrix R in (21) is tensorized in the TT-format, and TTRP is efficiently achieved by the matrix-by-vector products in the TT-format (see (10)). From Table 1, it is clear that our TTRP has the smallest storage cost and requires the smallest computational cost for computing Rx.
Two synthetic datasets with size n_0 = 10 are tested: the dimension of the first one is N = 2500 and that of the second one is N = 10^4; each entry of the samples is independently generated from N(0, 1). For TTRP and Gaussian TRP, the dimensions of the tensor representations are set accordingly for each N. We again focus on the ratio of the pairwise distances (putting the outputs of the different projection methods into (55)), and estimate the mean and the variance of the ratio of the pairwise distances by repeating the numerical experiments 100 times (with different realizations for constructing the random projections, e.g., different realizations of the Rademacher distribution for TTRP).
Figure 4 shows that the performance of TTRP is very close to that of Very Sparse RP and Gaussian RP, while the variance for Gaussian TRP is larger than that of the other three projection methods. Moreover, the variance for TTRP basically reduces as the dimension M increases, which is consistent with Theorem 2. Furthermore, more details are given for the case with M = 24 and N = 10^4 in Table 2 and Table 3, where the value of storage is the number of nonzero entries that need to be stored. It turns out that TTRP, with smaller storage costs, achieves a competitive performance compared with Very Sparse RP and Gaussian RP. In addition, from Table 3, for d > 2, the variance of TTRP is clearly smaller than that of Gaussian TRP, and the storage cost of TTRP is much smaller than that of Gaussian TRP.
Next, the CPU times for projecting a data point using the four methods (TTRP, Gaussian TRP, Very Sparse RP and Gaussian RP) are assessed. Here, we set the reduced dimension M = 1000 and test four cases with N ranging from 10^4 to 10^6. As the data dimension N increases, the computational costs of Gaussian TRP and Gaussian RP grow rapidly, while the computational cost of our TTRP grows slowly. When the data dimension is large (e.g., N = 10^6 in Figure 5), the CPU time of TTRP becomes smaller than that of Very Sparse RP, which is consistent with the results in Table 1.
Finally, we validate the performance of our TTRP approach on the MNIST dataset [32]. From MNIST, we randomly take n_0 = 50 data points, each of which is a vector with dimension N = 784. We consider two cases for the dimensions of the tensor representations: in the first case, we set m_1 = M/2, m_2 = 2, n_1 = 196, n_2 = 4; in the second case, we set m_1 = M/2, m_2 = 2, m_3 = 1, n_1 = 49, n_2 = 4, n_3 = 4. Figure 6 shows the isometry and bounded variance properties of the different random projections on MNIST. It can be seen that TTRP satisfies the isometry property with bounded variance. It is clear that as the reduced dimension M increases, the variances of the four methods reduce, and the variance of our TTRP is close to that of Very Sparse RP.

Conclusion
Random projection plays a fundamental role in conducting dimension reduction for high-dimensional datasets, where pairwise distances need to be approximately preserved. With a focus on efficient tensorized computation, this paper develops a novel tensor train random projection (TTRP) method. Based on our analysis of the bias and the variance, TTRP is proven to be an expected isometric projection with bounded variance. From the analysis in Theorem 2, the Rademacher distribution is shown to be an optimal choice for generating the TT-cores of TTRP. For computational convenience, the TT-ranks of TTRP are set to one, while our numerical results show that different TT-ranks do not lead to significantly different results for the mean and the variance of the ratio of the pairwise distances. Our detailed numerical studies show that, compared with standard projection methods, our TTRP with the default setting (TT-ranks equal to one and TT-cores generated from the Rademacher distribution) requires significantly smaller storage and computational costs to achieve a competitive performance. From the numerical results, we also find that our TTRP has smaller variances than tensor train random projection methods based on Gaussian distributions. Even though we have proven the properties of the mean and the variance of TTRP and the numerical results show that TTRP is efficient, the upper bound in the concentration inequality (54) involves the dimensionality of the datasets (N), and our future work is to give a tight bound independent of the dimensionality of the datasets for the concentration inequality.

Table 1 :
The comparison of the storage and the computational costs.

Table 2 :
The comparison of the mean and variance of the ratio of the pairwise distances, and the storage, for Gaussian RP and Very Sparse RP (M = 24 and N = 10^4).