Identification of lncRNAs associated with the progression of acute lymphoblastic leukemia using a competing endogenous RNAs network

Acute lymphoblastic leukemia (ALL) is a malignancy of bone marrow lymphoid precursors. Despite effective treatments, the causes of its progression or recurrence are still unknown. Finding prognostic biomarkers is needed for early diagnosis and more effective treatment. This study was performed to identify long non-coding RNAs (lncRNAs) involved in ALL progression by constructing a competitive endogenous RNA (ceRNA) network. These lncRNAs may serve as potential new biomarkers in the development of ALL. The GSE67684 dataset identified changes in lncRNAs and mRNAs involved in ALL progression. Data from this study were re-analyzed, and probes related to lncRNAs were retrieved. Targetscan, miRTarBase, and miRcode databases were used to identify microRNAs (miRNAs) related to the discovered genes and lncRNAs. The ceRNA network was constructed, and the candidate lncRNAs were selected. Finally, the results were validated with reverse transcription quantitative real-time PCR (RT-qPCR). The ceRNA network outcomes demonstrated that the top lncRNAs associated with altered mRNAs in ALL are IRF1-AS1, MCM3AP-AS1, TRAF3IP2-AS1, HOTAIRM1, CRNDE, and TUG1. Investigations of the subnets linked to MCM3AP-AS1, TRAF3IP2-AS1, and IRF1-AS1 indicated that these lncRNAs were considerably related to pathways associated with inflammation, metastasis, and proliferation. Higher expression levels of IRF1-AS1, MCM3AP-AS1, TRAF3IP2-AS1, CRNDE, and TUG1 were found in ALL samples compared to controls. The expression of MCM3AP-AS1, TRAF3IP2-AS1, and IRF1-AS1 is significantly elevated during the progression of ALL, playing an oncogenic role. Due to their role in the main cancer pathways, lncRNAs could be suitable therapeutic and diagnostic targets in ALL.


Introduction
One of the most prevalent malignancies in children and adolescents is ALL, and substantial research on its related molecular characteristics has been conducted [1,2]. Studies show that the origins of ALL cancer cells are precursors of B and T cells in the bone marrow [3]. Molecular studies in ALL have revealed that mutations, epigenetic alterations, and deregulation of gene expression are all important contributors to the development and malignancy of the disease [4,5]. Despite significant advances in the treatment of ALL, relapse remains a significant therapeutic obstacle. Understanding the molecular mechanisms that contribute to the onset and progression of ALL is, therefore, crucial.
Long non-coding RNAs are a subclass of non-coding RNAs that have been implicated in a variety of biological functions. Extensive research on these RNAs has revealed their potential significance in cancer through modulating pathways including cell proliferation, invasion, migration, metastasis, and major cellular metabolisms [6,7]. Changes in the expression of lncRNAs have been widely reported in various cancers, including ALL. It has been demonstrated that these deregulations can be excellent diagnostic and prognostic targets [8,9]. Moreover, the expression alterations of lncRNAs in the development and malignancy of ALL, as well as their oncogenic and tumor suppressor effects, have been established [10]. Additionally, studies have shown that lncRNAs can affect the post-transcriptional regulation and expression of many genes through interacting and sponging miRNAs. Studies also suggested that lncRNAs contribute to the pathophysiology of ALL by controlling the activity of miRNAs [11,12]. Therefore, lncRNAs can be suitable therapeutic and diagnostic candidates in many diseases, including ALL.
Although extensive studies have been performed on the role and changes of lncRNA expression in ALL, the function and expression of many of them remain unknown. This study aimed to identify lncRNAs that are significantly altered during the development and pathogenesis of ALL through the ceRNA network. For this purpose, a microarray study with access number GSE67684 was analyzed to identify lncRNAs and mRNAs with significant expression changes in ALL. The miRNAs related to lncRNAs and mRNAs were identified using two databases. Next, the ceRNA network was constructed, and the lncRNAs with the highest criteria were selected. Finally, the in-silico results were evaluated using the RT-qPCR technique.

Materials and Methods
Differential gene expression analysis To identify altered lncRNAs and mRNAs in ALL, the raw data of GSE67684 were downloaded from Gene Expression Omnibus (GEO) and processed. Initial data preprocessing such as background correction, data normalization based on the Robust Multi-array Average (RMA) approach, removal of genes with zero expression levels, and data conversion to binary logarithmic form were conducted [13,14]. The data in this study were based on two different platforms, of which only GPL570 (Affymetrix hgu133Plus2) was considered. The study is based on 381 bone marrow samples of ALL patients collected at diagnosis (day zero) and day eight of remission-induction therapy. The start and the end of each gene, their corresponding chromosome bands, and their biological function were retrieved based on Affymetrix HGU133Plus2 identifiers using the Biomart package. Subsequently, the probes associated with lncRNAs and mRNAs were identified [15]. Further, the mean expression of each probe was used to investigate the differentially expressed mRNAs (DEMs) and lncRNAs (DELs) in each group. After normalization, the differences in the expression of all genes, including mRNAs and lncRNAs were evaluated in day eight patients compared to day zero patients. The expression of mRNA and lncRNAs in day eight samples was compared to day zero considering |logFC| > 1 and FDR 0.01.

Construction of ceRNA network
To identify miRNAs that can potentially interact with DELs, miRcode data were used, and the association between DELs and miRNAs was evaluated [16]. This database contains more than 10,000 lncRNA-miRNA interactions. Targetscan (www.targetscan.org) and miRTarBase (http://miRTarBase. cuhk.edu.cn) datasets databases were also utilized to obtain miRNAs that can bind to DEMs, MicroRNAs validated by both databases were selected for further processing. Furthermore, Overlapping miRNA-mRNA interactions were removed from the analysis process. Subsequently, the identified interactions between miRNA-lncRNA and miRNA-mRNA were used to construct the ceRNA network. Finally, the miRNA-mRNA-lncRNA network was constructed using obtained results from previous steps and plotted by Cytoscape software [17].

Data enrichment
In order to discover the functions of lncRNAs obtained from the ceRNA network, enrichment analysis was performed using the Molecular Signatures Database (MsigDB) repository from the Enrichr database (maayanlab. cloud/Enrichr). The mRNAs associated with candidate lncRNAs were used for enrichment [18]. Finally, Benjamini-Hochberg corrected p-value of less than 0.01 was considered for the enriched pathways.
Key subnets associated with candidate lncRNAs To investigate the role and possible pathways in which each identified lncRNA could be involved, subsystem analysis and assessment of DEMs associated with candidate lncRNAs were used. For this purpose, lncRNAs and related miRNA-mRNAs were extracted separately. Then, the subnetworks were constructed and illustrated by Cytoscape software. Finally, the candidate genes associated with each lncRNA were enriched based on the MsigDB repository.

Sample collection and classification
Blood samples were collected from 33 patients and 15 healthy donors (age range of 10-19 years) referred to the research institute for oncology, hematology, and cell therapy from 2019 to 2021. Samples were then transferred to the laboratory for cytogenetic, flow cytometric, and molecular studies. Based on flow cytometry and molecular findings, all samples were divided into three groups: Philadelphia positive ALL (ph+; N = 9); Philadelphia negative ALL (ph-; N = 17); and T cell ALL (N = 7). Informed consent was obtained from either subjects or legal guardians of minors. The laboratory tests were conducted in accordance with the ethical guidelines and regulations of the Iranian Ministry of Health and Medical Education. This work was approved by the ethical review board with an access number of IR. TUMS.VCR.REC.1399.313. The samples were kept in liquid nitrogen until use.
RNA extraction, cDNA synthesis, and RT-qPCR Peripheral blood mononuclear cells (PBMCs) were isolated from whole blood samples using a Ficoll gradient (Sigma Aldrich, Germany) method and according to the manufacturer's protocol. Total RNA extraction was executed by the TRIzol (Invitrogen, Canada) method. Extracted RNA was treated with DNase (Takara, China) to remove any possible DNA contamination. The AddScript cDNA synthesis kit (AddBio, Korea) was then used to generate cDNA. In order to evaluate the expression of candidate lncRNAs, The RT-qPCR was performed on a 96-well system (AusDiagnostics, Australia) using RealQ Plus 2x Master Mix Green (Ampliqon, Denmark). The sequence of specific primers designed for each lncRNA is summarized in Table 1. The primers were designed using the NCBI blast tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast), and the specificity of each primer was considered. The expression of each gene in each sample was calculated based on 2 −▵Ct , relative to the internal control GAPDH gene [19].

Statistics and software
The R programming language (version 4.1; PBC, Boston, MA, USA) was used to process the GEO dataset, and other analyses (version 4.1). A One-way ANOVA is used to assess statistically significant differences between the means of the two groups. The t-test was conducted to identify statistically significant DELs and DEMs using the p-value and adjusted p-value. Cytoscape software (version 3.9; Boston, MA, USA) was used to construct the ceRNA network, and a significance level of p < 0.05 was considered for each statistical test. GraphPad Prism (version 8.0; La Jolla, CA, USA) software was also used to draw the graphs.

Identification of mRNAs and lncRNAs
Genes with zero or near-zero expression were first excluded from further analysis. As a result, 9189 mRNAs and 423 lncRNAs with high probe intensities in ALL of the GSE67684 samples were identified. Then, the data normalization was performed as shown in Fig. 1A. Differential gene expression between day zero and day eight showed that 714 mRNAs increased and 379 mRNAs decreased; |logFC| > 1, and FDR < 0.05 (Fig. 1B). The full list of deregulated genes is summarized in "Suppl. Data 1". On the other hand, 52 lncRNAs showed significant alterations in their expression. Twenty-one lncRNAs showed elevated expression on day eight compared to day zero samples, whereas 31 lncRNAs showed decreased expression (Fig. 1C, |logFC| > 1, FDR < 0.05).

CeRNA network of lncRNAs involved in ALL development and related-pathway
The most important function of lncRNAs is that they act as a sponge for miRNAs. The ceRNA network was constructed using lncRNAs and mRNAs identified in the previous step. The findings revealed that 10 of the 52 lncRNAs; C10ORF25, MIR22HG, IRF1-AS1, C4ORF46, MCM3AP-AS1, HAR1A, TRAF3IP2-AS1, HOTAIRM1, CRND1, CRNDE, and TUG1, were linked with 83 miRNAs based on miRcode data. The miRNA and mRNA interaction results were evaluated based on the Targetscan and miRTarBase databases. As a result, 19 miRNAs were identified as being associated with 356 common mRNAs in both databases. Unrelated candidates were excluded from the study process. The results from the previous processes were merged to create a ceRNA network, which was then used to identify significant lncRNAs. This network contained 6 lncRNA nodes, 17 miRNA nodes, 196 mRNA nodes, and 369 edges ( Fig. 2A). The results showed that six lncRNAs, namely IRF1-AS1, MCM3AP-AS1, TRAF3IP2-AS1, HOTAIRM1, CRNDE, and TUG1, interact with many miRNAs targeting deregulated mRNAs in ALL ( Fig. 2A and "Suppl. Data 2"). Enrichment results for the mRNAs in the ceRNA network showed that these are involved in inflammation through TNF-alpha, hypoxia, DNA repair (UV Response Dn), apoptosis, metastasis, proliferation (Mitotic Spindle), and other pathways related to mutation (Fig. 2B).
Association of IRF1-AS1, MCM3AP-AS1, and TRAF3IP2-AS1 subnets to cancer progression pathways Previous studies have shown the association of HOTAIRM1, CRNDE, and TUG1 expression in the pathogenesis of   various cancers, including ALL [20,21]. Hence, we decide to investigate the association of IRF1-AS1, MCM3AP-AS1, and TRAF3IP2-AS1 expression and their possible role in ALL, to which less attention has been paid. To better understand the biological functions of the three lncRNAs, subnetworks were built for each of them, and genes associated with each lncRNA were enriched. The results of the IRF1-AS1 associated subnet showed that genes connected to IRF1-AS1 are involved in pathways such as TNF-alpha signaling via NF-κB, apoptosis, DNA repair, and hypoxia (Figs. 3A and 3B, FDR < 0.01).
The MCM3AP-AS1 subnet enrichment was observed to be associated with genes related to TNF-alpha signaling via NF-κB, hypoxia, IL-2/STAT5 signaling, and DNA repair (Figs. 4A and 4B, FDR < 0.01). These results suggest that genes linked to each lncRNA are involved in vital biological processes associated with cancer cells. Therefore, they can play a vital role in the molecular pathogenesis of ALL.
These results indicate that the expression of IRF1-AS1, MCM3AP-AS1, TRAF3IP2-AS1, CRNDE, and TUG1 in ALL samples is significantly altered. Next, the relationship  between the expression changes of lncRNAs and clinical pathological features was investigated. However, no significant relationship was found.

Discussion
Different molecular studies have shown that lncRNAs can play a significant role in the pathogenesis of ALL through controlling many cellular processes [22]. Therefore, it is vital to identify lncRNAs involved in the development and progression of ALL. Given that, lncRNA inhibitors have entered the second phase of the clinical trial and have raised therapeutic hopes [23]. This study aimed to identify essential lncRNAs in ALL molecular pathogenesis as well as their mechanisms of action via the ceRNA network. For this purpose, using GEO data, we first identified lncRNAs and mRNAs whose expression changes dramatically during disease progression. Then, the associated ceRNA network and its signaling pathways were identified using bioinformatics tools. Finally, RT-qPCR was used to assess the expression of candidate lncRNAs in ALL samples.
The results of differential expression of lncRNAs in ALL samples showed that the expression of 52 lncRNAs changed significantly. Six lncRNAs were linked to many newly discovered miRNA-mRNA axes during ALL progression, including IRF1-AS1, MCM3AP-AS1, TRAF3IP2-AS1, HOTAIRM1, CRNDE, and TUG1. Therefore, it is suggested that these 6 lncRNAs can play a more prominent role in the development of ALL in the miRNA-mRNA-lncRNA axis.
Additionally, the ceRNA network shows that these lncRNAs might have roles in the progression of ALL. According to the in silico study, except for HOTAIRM1, all showed decreasing expression levels on day eight compared to day zero. However, RT-qPCR data revealed the opposite. The most important reason for this divergence could be that the GSE67684 study employed an ALL-BFM treatment for patients during this period, which is an effective chemotherapy regimen in pediatric ALL patients and young adults [24,25]. Furthermore, the absence of normal subjects in the GSE67684 investigation prevented us from determining the trend of the discovered lncRNAs during the progression of ALL. A previous study has shown that HOTAIRM1 expression is increased in acute myeloid leukemia, and its expression level is associated with the prognosis of patients [26]. HOTAIRM 1 also has specific expression in blood cell precursor lines, and changes in its expression can affect apoptosis and cell proliferation in myeloid cell lines [27]. HOTAIRM1 regulates the activity of the WNT pathway [27], which plays a vital role in the development and malignancy of ALL [28]. On the other hand, it has been demonstrated that CRNDE expression is upregulated in ALL and that this upregulation can trigger apoptosis and inhibit the growth of ALL cell lines. Additionally, CRNDE can contribute to the development of ALL via the ceRNA network in the CREB protein axis [20]. Also, CRNDE promotes cancer cell proliferation, migration, and invasion and suppresses apoptosis through complicated mechanisms, that result in the initialization and development of human cancers [29]. TUG1 is significantly increased in ph-ALL compared to the control group, and its expression can be a potential prognostic biomarker in ALL [21]. It has been reported that TUG1 can activate the MAPK1/ERK pathway through the ceRNA network, which is crucial for the pathogenesis of acute myeloid leukemia [30]. Numerous studies show that TUG1 controls drug resistance, cell differentiation, invasion, metastasis, apoptosis, tumor growth, and cell metabolism in various types of cancer [31]. Studies in other cancers have shown that MCM3AP-AS1 can play an oncogenic role in liver, breast, colorectal, lung, and thyroid cancers [32][33][34][35][36]. Our ex vivo results also showed that the expression of IRF1-AS1, MCM3AP-AS1, TRAF3IP2-AS1, CRNDE, and TUG1 in ALL samples was significantly increased compared to the control group. HOTAIRM1 expression, on the other hand, decreased in ALL samples. We have shown for the first time that the expression levels of IRF1-AS1, MCM3AP-AS1, and TRAF3IP2-AS1 are significantly increased in ALL patients and the mentioned lncRNAs could be involved in the progression and malignancy of ALL. We, therefore, suggest that IRF1-AS1, MCM3AP-AS1, and TRAF3IP2-AS1 could be appropriate therapeutic and diagnostic targets in ALL.
The subnet analyses showed that IRF1-AS1, MCM3AP-AS1, and TRAF3IP2-AS1 are associated with genes involved in TNF-alpha signaling via NF-κB, hypoxia, mitotic spindle, IL-2/STAT5 signaling, DNA repair, and epithelialmesenchymal transition. A previous study has illustrated that decreased expression of MCM3AP-AS1 in liver cancer cell lines can induce apoptosis through the ceRNA network [35]. In breast cancer, MCM3AP-AS1 has been reported to induce apoptosis through sponging microRNAs [32]. In addition, extensive studies have shown that the previously mentioned pathways play essential roles in the pathogenesis of ALL, and are among the main pathways of cancer cells [37][38][39][40][41]. These findings suggest that the expression changes of IRF1-AS1, MCM3AP-AS1, and TRAF3IP2-AS1 can play an oncogenic role in ALL. This oncogenic role is achieved through regulating TNF-alpha signaling via NF-κB, hypoxia, the mitotic spindle, IL-2/STAT5 signaling, DNA repair, and epithelial-mesenchymal transition pathways. The IRF1-AS1, MCM3AP-AS1, and TRAF3IP2-AS1 subnetworks also revealed that these lncRNAs could sponge hsa-miR-144, which has previously been shown to regulate cell proliferation and apoptosis in ALL cell lines [42].
On the other hand, studies have shown that hsa-miR-155 can regulate immune and inflammatory responses in ALL, and our results showed that this miRNA can interact with MCM3AP-AS1 and inflammation-related genes [43]. Interestingly, the ceRNA network of this study showed that hsa-miR-223 could interact with MCM3AP-AS1, and hsa-miR-22 target genes are involved in NF-κB pathways. It has been shown that this miRNA can regulate the NF-κB pathway in ALL [44].
One of the limitations of this study was the lack of access to data on days 15 and 30 in the GSE67684 study, which did not allow further investigation of the changes in the expression of this lncRNA after treatment on the above days. However, we could not conduct a more accurate analysis due to a lack of samples in various subtypes. It should also be noted that these results need to be further confirmed in vitro in other subtypes to find out how MCM3AP-AS1, TRAF3IP2-AS1, and IRF1-AS1 cause each ALL subtype at the molecular level.

Conclusion
According to in silico and ex vivo studies, MCM3AP-AS1, TRAF3IP2-AS1, and IRF1-AS1 expression increased in ALL samples, implying that they may have oncogenic functions. Our findings further suggest that the lncRNAs indicated have a role in the pathogenesis of ALL by regulating pathways including TNF-alpha signaling via NF-κB, hypoxia, mitotic spindle, IL-2 and STAT5 signaling, DNA repair, and epithelial-mesenchymal transition via the ceRNA network. MCM3AP-AS1, TRAF3IP2-AS1, and IRF1-AS1 expression could be identified as potential therapeutic and diagnostic targets in ALL patients.
Availability of Data and Materials: The processed raw data in this study is publicly available on GEO database (https:// www.ncbi.nlm.nih.gov/geo/) with accession number of GSE67684. The datasets used and/or analyzed during the current study are available to the corresponding author upon request.
Ethics Approval: All experiments were performed by relevant guidelines and regulations and all bioethical issues were reviewed and confirmed by the review board according to the criteria of the ministry of health and medical education of Iran with IR.TUMS.VCR.REC.1399.313 accession code. Informed consent was obtained from all the donors before the collection of samples.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.