Effect of Measurement Error on the Multivariate CUSUM Control Chart for Compositional Data

1School of Automation, Nanjing University of Science and Technology, Nanjing, 210094, China 2School of Economics and Statistics, Guangzhou University, Guangzhou, 510006, China 3Department of Mathematics and Statistics, East China Normal University, Shanghai, 200062, China 4Department of Statistics, Government Ambala Muslim Graduate College, Sargodha, 40100, Pakistan 5Department of Statistics, University of Sargodha, Sargodha, 40100, Pakistan


Introduction
A widely used method is statistical process control (SPC) to monitor a process. SPC is a decisionmaking tool used in almost all industrial manufacturing processes to achieve process stability and sustainable quality product. W. A. Shewhart introduced the CCs in the 1920s. The goal of a CC is to find assignable causes of shifts and to minimize the production of a large number of defective items. In practice, determining which CC is appropriate for specific data can be difficult. It is possible to figure out by looking at the distribution of the underlying data process. CCs with compound or simple rules can be computed using the run-length distribution using a Markov chain approach [1]. Based on concomitant order statistics and long-run dependence, a new bivariate semiparametric CC is proposed by Koutras et al. [2]. Integrating techniques for preventive maintenance and quality control policies with the time-between-event chart for reliability-oriented design has been investigated by He et al. [3]. Recently under different classical estimators, control charts for shape parameters of power function distributions were studied by Zaka et al. [4]. Hybrid EWMA CCs are designed to monitor process dispersion efficiently for the automotive industry [5]. For trimethylchlorosilane purification, simulations and multivariate CCs are utilized to increase the effectiveness [6].
Multivariate EWMA CCs are proposed using multiple variables based on an AIC calculation for multivariate SPC [7]. The frequency of unblinded and blinded adverse events accumulating from single and multiple trials using CCs was monitored in [8]. The ARL comparison of the confidence interval chart using MEWMA CC using the inertial impact of estimating the actual value of the mean vector (MV) has been made in [9]. Further, the economic design of the MEWMA CC using the Markov chain method, a hybrid Taguchi loss method and a genetic algorithm approach is also used to obtain the optimal design [10]. Phase-II EWMA CC has been proposed to monitor exponentially distributed process based on Type-II censored data to detect the shift in location and scale parameters [11]. For big data, Ahmad et al. [12] studied the distribution of T 2 statistics to analyze high dimensional data for SPC.
Compositional data (CoDa) vectors are positive components presented as percentages, ratios, proportions or parts of a whole. Aitchison's principles and statistical tools from [13] have been used to solve many practical problems. For more information about CoDa, readers are referred to [14] and [15]. Because the CoDa variable aggregates are limited to constant values, they can not be used the same way as standard multivariate data [16]. CoDa has applications in various fields, including chemical research surveys, geology, engineering sciences and econometric data analysis. Some of the most recent articles dealing with statistical methods and CoDa processing are discussed here. An automobile market application in which the authors model brand share of the market as a function of media investments while taking account of the brand's cost using CoDa regression has been examined [17]. The sample space's algebraic-geometric structure was introduced to adhere to the principles of sample space and the structure of CoDa [18]. Recently, Zaidi et al. [19] investigated the impact of M.E's on Hotelling T 2 CoDa CC using Average run length (ARL) as a performance measure. The error model with covariate M.E is supposed to be linear. The MEWMA CC for CoDa has been studied by Tran et al. [20]. After that, Zaidi et al. [21] examined the impact of M.E's on the MEWMA-CoDa CC's based on the ARL performance.
The CUSUM charts are well-known for detecting smaller and more frequent changes than Shewhart charts and are among the most common and widely used charts in practice [22]. Risk monitoring for CUSUM CCs with model error suggested by Knoth et al. [23]. Further, Shu et al. [24] analyzed the CUSUM accounting process for monotone population changes. It is essential to optimize several multiple objective functions simultaneously when performing multi-objective optimization and the multivariate CC method is unaffected by small and moderate variable shifts [25]. Such multivariate charts are commonly used to detect changes in process parameters around their corresponding axes, which must be illustrated in a specific direction. The MCUSUM charts for detecting shifts in MV and VCM have been discussed in [26]. A double progressive mean CC for monitoring Poisson process observations has been discussed in [27]. Furthermore, Alevizakos et al. [28] proposed a triple moving average CC to improve the chart's detection ability and found them to be more effective than the moving average and double moving average CCs in detecting small to moderate shifts.
CCs are often created with the hypothesis that the factor of the quality feature is assessed accurately. However, M.E's can occur due to various circumstances. Many researchers examined the impact of M.E's on CCs. Imprecise measurement instruments or human errors are two main causes contributing to M.E's. The presence of M.E's must be taken into account while implementing CCs because it has a huge impact on their performance [29]. A cumulative sum (CUSUM) CCs has been proposed for the drilling process to monitor the variations of discharge pulses for sensing the actual condition of the whole drilling process taking M.E's into account [30]. For a finite horizon production process, Nguyen et al. [31] suggested one-sided Shewhart-type and EWMA T 2 charts handle the ratio of two normal random variables (R.V). For the X and S 2 charts, Linna et al. [32] presented LM with covariate M.E X = A + BY + e, where e is a random error (R.E) due to measurement imprecision. The model X = Y + e to investigate the effects of M.E's on the Shewhart X chart has been proposed by Bennett [33], where Y represents the actual value of the quality characteristic, and X represents the analyzed value provided by the measurement equipment.
Typically, practitioners monitor processes without considering the possibility of M.E's. Despite sophisticated measurement techniques and methodologies, M.E's is commonly caused by operational and environmental factors. As Montgomery et al. [34] outlined, Gage capacity tests commonly quantify M.E's variability. The impacts of M.E's on the functionality of a multivariate CC for the combined monitoring of normal processes MV and VCM have been studied by Maleki et al. [35]. A thorough examination of the impact of M.E's in SPC has been conducted by Maravelakis et al. [36]. The performance analysis of univariate adaptive CCs in the existence of M.E's has been examined by Hu et al. [37]. Based on an additive covariate model, Maleki et al. [38] investigated the adverse effects of M.E's on the detection performance of triple sampling X control charts. Attribute control chart performance in the presence of M.E. to monitor the parameter of a modified power function distribution is studied by Zaka et al. [39]. Two-parameter exponential distribution with M.E.s can be monitored using a general weighted moving average CC [40]. Multivariate homogeneously weighted moving average charts were used to monitor the process mean in the presence of M.E's [41].
Although many studies on the assessment of CCs in the presence of M.E's have been conducted, few have focused on multivariate data, and very few researchers have contributed to CoDa. But as far as the authors are aware, none of them has been dedicated to MCUSUM CC for CoDa in the presence of M.E. Hence this paper aims to fill this gap and investigate the impact of M.E's on the MCUSUM-CoDa CC. The paper is organized as follows: In Section 2, the modelling and transformation of CoDa are briefly explained. Then, Section 3 introduces the LM with covariate M.E for CoDa. Section 4 details the MCUSUM-CoDa CC in the presence of M.E's, and Section 5 deals with the performance evaluation of the proposed CC. Finally, two very detailed illustrative examples are provided in Section 6, and conclusions and future research directions are presented in Section 7.

Compositional Data (CoDa)
A row vector y = (y 1 , . . . , y p ) consists of p − part composition defined on the simplex space S p , known as CoDa, where, where κ is a positive constant. There are many possible values for κ, e.g., if the data of interest is in percentages, we will use κ = 100, while if the data is in the form of probabilities or proportions, κ = 1 will be used. All the CoDa vectors are supposed to be row vectors throughout this paper. One of the major properties of CoDa is relative property; two CoDa vectors are supposed to be equivalent if they carry the same relative information regardless of the actual values. The closure function of CoDa is defined as As CoDa deals with the constraint of constant sum, Euclidean geometry is not supposed to be useful for CoDa. Let us take an example to make it more clear, Suppose there are two CoDa vectors, y = (0.2, 0.5, 0.3) ∈ S p and z = (0.1, 0.25, 0.15) ∈ S p , when we add them using Euclidean geometry, we get y + z = (0.3, 0.75, 0.45) ∈ S p . In another case, by multiplying CoDa with a scalar, we get 5 × y = (1, 2.5, 1.5) ∈ S p . So, it is concluded that using Euclidean geometry operators for CoDa is not a suitable option. To deal with CoDa, Aitchison [14] introduced a new geometry known as Aitchison geometry. These new operators defined by Aitchison are • the perturbation operator ⊕ of y ∈ S p by z ∈ S p (for adding CoDa vectors) defined as • the powering operator of y ∈ S p by a constant c ∈ R (for multiplying CoDa vectors with a scalar) defined as The constant sum constraint makes CoDa difficult to deal with in its original form, hence predefined log-ratio transformations (LRT) can be used to transform CoDa from S p to R. One of the common LRT for CoDa vector y ∈ S p is the centered log-ratio transformation (CLRT), where y G is the component-wise geometric mean of y, i.e., Another LRT for a CoDa vector y ∈ S p is the isometric log-ratio transformation (ILRT), ilr(y) = y * = clr(y)B , where for B, we have many options (see [15]). B is a matrix of size (p − 1, p). In this paper, B is taken as Also, there is inverse ILRT, which can transform the ilr-coordinates y * into the composition coordinates y. The inverse ILRT is defined as In this paper, the CoDa vectors are denoted without a " * " sign, while the transformed vectors are denoted with a " * " in the superscript.

Additive Measurement Errors Model for CoDa
Assume that there are p-part compositions y i = (y i,1 , . . . , y i,p ) ∈ S p at time i = 1, 2, . . ., where y 1 , y 2 , . . . are independent multivariate normal (MNOR) random compositions with MV μ * and VCM * , i.e., y i ∼ MNOR S p (μ * , * ). Suppose y i cannot be directly observable; instead, we use quality characteristics x i,1 , . . . , x i,m to measure y i with x i,j , j = 1, . . . , m, following the L.M with covariate M.E defined below: where a ∈ S p is a perturbation and b ∈ R is a powering constant. With ε i,j random error term follow a multivariate normal (MNOR) distribution MNOR S p (0, * M ), where * M is the known M.E's VCM. This M.E's model is defined in [32]. Following [19], the researchers advocated averaging the m measurements to minimize the impact of M.E's and keep the M.E's component's variance to a minimum. In this paper, following ( [15]), the authors define the composition MV x i at the time i = 1, 2, . . .: where a * = ilr(a) and One can refer to Appendix A for the estimation of a and b.

Multivariate CUSUM CC for Compositional Data
Assume we have y t ∼ MNOR S p (μ * 0 , * ) and y t ∼ MNOR S p (μ * 1 , * ), where the IC and OOC process mean is defined by μ * 0 and μ * 1 , respectively. The MCUSUM-CoDa CC corresponding to the LM with covariate M.E monitors the following statistic: where s t is the vector in R p−1 defined as The value of k is chosen relative to the size of the shift involved, such that k = 1 2 δ, where δ is the size of the shift in standard deviation units. This approach comes very close to minimizing the ARL 1 value for detecting a shift of size δ for fixed ARL 0 . It is necessary to note that a widely used value of k in practice is k = 1 2 . Let The MCUSUM-CoDa CC displays a signal when C t > h, where h is the upper control limit (UCL). A pre-defined IC ARL 0 is used to select the suitable value of h.
To assess the MCUSUM-CoDa CC's run-length efficiency, the authors implement a Markov chain approximation suggested by Crosier [42]. The Markov Chain model needed to measure the MCUSUM-CoDa chart's ARL is given in [43].
Lowry et al. [44] proposed ARL of MCUSUM charts depends on the parameter δ for noncentrality. Where δ can be written as When M.E's is included in the process, the non-centrality parameter is denoted by δ M and is equal to Linna et al. [32] examined the effectiveness of the shift detection of multivariate charts and found them to be more effective in detecting a shift in one direction and ineffective in detecting a shift in the other direction. For this reason, Linna et al. [32] proposed using the minimum and maximum values of δ M , i.e., δ B (minimum value of δ M ) and δ W (maximum value of δ M ). Using (7), we get where λ p−1 and λ 1 are the largest and the smallest eigenvalues of the The ARL of the MCUSUM-CoDa chart can be obtained using the Markov chain technique suggested by Lowry et al. [44].
This section aims to evaluate the performance of the MCUSUM-CoDa CC in the presence of M.E's. The M.E's VCM equals * M = σ 2 M I, where σ M is the standard-deviation M.E's, and I is the (2, 2) identity matrix. The optimal couple (k, H) that minimizes the OOC ARL for a fixed value of the shift δ M subject to a constrained value for the IC ARL is chosen first in the design of the MCUSUM-CoDa CC. The procedure for determining the optimal couple consists of three steps: • Select the most appropriate value for the IC ARL0. In this case, we choose ARL0 = 370.
• Select a particular value of k. In this case, following [46], we choose k = 0.5.
• Obtain the value of H such that ARL = ARL0 when no shift occurs, i.e., when δ = δ M = 0.
• Select the optimal value of H * from all the H values such that the OOC ARL value is minimum for a given value of the shift δ for μ * (to attain the best statistical performance).
The following six cases for the CoDa VCM * are considered: Case #I uncorrelated with equal variances * = The goal is to investigate the effect of involved parameters σ M , b, p and m to evaluate the performance of the MCUSUM-CoDa CC in the presence of M.E's for the above-defined cases.

Case I
This subsection investigates the effect of the involved parameters σ M , b and m corresponding to Case #1 (i.e., uncorrelated case with equal variances). For shifts δ ∈ [0, 2], different ARL values are given in Table 1 for the MCUSUM-CoDa CC without and in the presence of M.E's. The different ARL curves are presented in Fig. 1. From Table 1, we can conclude:

Effect of Number of Variables p
This subsection investigates the effect of the number of variables involved p for all 6 cases mentioned above. The values of * corresponding to the Cases #1 to #6 are given in Table 7, when p = 3, p = 5 and p = 10 are considered.
For shifts δ ∈ [0, 2], different ARL values are given in Table 8 for the MCUSUM-CoDa CC in the presence of M.E's. The ARL curves for different values of shift for all the six cases are also presented in Figs. 7 and 8. From Table 8, we can conclude:

Practical Implementation of the Proposal in the Manufacturing Scenario
For the proposal implementation, two illustrated examples have been incorporated in this study: one involving manufacturing uncoated aspirin tablets with the composition of aspirin, microcrystalline cellulose, talcum powder and magnesium stearate and the other is the monitoring of machines that are responsible for the percentages of three components (whole-grain cereal, dried fruits, and nuts) in muesli production.

Proposal Implementation Related to Aspirin Manufacturing Process
Pharmaceutical manufacturing is the process of synthesizing pharmaceutical medications on a large scale as part of the pharmaceutical business. The medication production process may be divided into several unit activities, such as milling, granulation, coating, tablet pressing, etc. Here the authors simulated an example of a company producing uncoated aspirin tablets. The base parameters have been taken from [47]. According to Tiwari et al. [47], the composition of the tablet consists of 150 mg of aspirin, 60 mg of micro crystalline cellulose, 20 mg of talcum powder and magnesium stearate. The company wants to check the measuring unit in charge of measuring raw material (i.e., estimate a, b and M ). For this purpose, the firm cautiously prepared k = 4 samples of aspirin and measured them n = 10 times each. The data on manufactured aspirin and the ARL values are given in Table 9 and Fig. 9. By using the data in Table 9, the estimated values ofâ,b andˆ M have been obtained. The results of estimation are as follows: •â = (0.339, 0.327, 0.334), •b = 1.074, •ˆ M = 0.001 0.001 0.001 0.009 .  The estimated values of μ 0 and 0 of control parameters, 20 sub batches of aspirin were sampled and are measured thrice each. The simulation results are in Table 10 and Fig. 10. Using the data in Table 10, one can easily obtain: • μ x = (1.376, 0.719) •ˆ x = 0.0009 0.0024 0.0024 0.0219 .    The results of the MCUSUM-CoDa charts are also presented in Table 10. Fig. 11 also shows the values of the MCUSUM-CoDa chart along with UCL. In Phase-II of the production process, 20 batches of aspirin have been produced and sampled three times each. These values are shown in Table 11, and Fig. 12 also shows the MCUSUM-CoDa chart and UCL values. It can be seen from Fig. 13 that the process was IC till sample #3, but sample #4 went OOC because of an error. Hence the process stops at sample #4 to study the reason for the OOC point and restarts again after fixing the error, and it is shown in Fig. 12 that after fixing the error, the process remains IC throughout all the samples.

Proposal Implementation Related to the Muesli Manufacturing Process
For the second example, the authors are using the same example used in [19] and [21] of a company that manufactures breakfast muesli with (A) 66 percent of cereals, (B) 24 percent dried fruits, and (C) 10 percent nuts in every 100 grams. The company has decided to begin by calibrating the measurement device used to determine the percentage of each component in the manufactured muesli (i.e., estimate a, b and * M ). The company meticulously prepared k = 4 samples with perfectly-known percentages for each component and measured them m = 7 times. Table 12 shows the results obtained and the ilr transformed values to perform this calibration. y * i = ilr(y i ) and x * i,j = ilr(x i,j ). The following estimates are obtained using Table 12     Then, using (10) and (11), we have  Table 13 and plotted in Fig. 15 with the UCL = H = 9.715. One can see that the process is IC, as all the values in Fig. 15 are smaller than the UCL.  Table 14 and plotted in Fig. 17 with the UCL = H = 9.715 obtained in Phase-I.    The process seems to be IC up to sample #14, but sample #15 and sample #16 are OOC. Then, in samples #15, #16 and #17, it is discovered that a hatch malfunction had caused a sudden drop in the quantity of whole-grain cereals, causing a shift fromμ *  The CCs are an effective statistical process monitoring tool frequently used to assess the stability of manufacturing processes. They monitor the industrial process by giving practitioners an early signal about an OOC process to keep the process IC, but M.E negatively impacts the performance of CCs in quality control applications. In this paper, the authors investigate the impact of M.E on the MCUSUM-CoDa CC's performance. The identity matrix assumption has been incorporated into the diagonal matrix in the LM with covariate M.E. This makes it easier to see how the M.E effects the proposed chart's performance. Six cases have been considered for the VCM of CoDa. Different combinations of the involved parameters were investigated to inspect the effect of the parameters on the CC. The following are some key findings of the study from this research. For all six cases, (i). when we increase the values of σ M , the OOC ARL also indicates an increase (ii). an increase in the value of m imposes a slight decrease in the ARL values (iii). an increase in the value of b imposes a significant decrease in the ARL values (iv). an increase in the values of p imposes a significant increase in the ARL. All these findings are relevant to the existing studies in the literature. In the end, two illustrated examples are given to demonstrate the implementation of the proposed CCs in the presence of M.E. One is based on the manufacturing process of uncoated aspirin tablets, and the other is based on monitoring the machines used for manufacturing the muesli. For future works, the proposed chart can be studied with a variable sampling interval or variable sample size scheme. Also, the effect of estimated parameters can be seen on the previously defined CCs for CoDa.
Data and Code Availability Statement: All data, models, and codes supporting the results of this study are accessible on reasonable request from the corresponding author.
Ethics Approval: This manuscript has not been simultaneously submitted to more than one journal. So the proposed work is original and has not been published anywhere else in any form or language.
Consent to Participate and Publish: Consent was obtained from all authors for their participation in completing this manuscript and their consent to publish it.
Author's Contribution: All the authors contributed to this paper. where I is the (p, p) identity matrix. Ordinary least square method has been used to compute the (d + 1, 1) column vectorĉ for minimizing the sum of square of residuals that is, Then, an estimation of a will be the first p components ofĉ = c 1 , c 2 , . . . , c p , c p+1 , while estimation of b will be the last component (i.e., p + 1). We get the estimation of aŝ

Appendix B
Let D be a symmetric matrix of size (p, p) and E > 0 be a positive definite matrix of same size. Suppose y max is the row vector of size (1, p), satisfying y max = argmin y (yGy ), subject to the constraint y max Ey max = a > 0.
Following Mardia 1979, the author also used c = 1.

Appendix C
The