Screening for Biomarkers of Aging

Identification of biomarkers for aging based on DNA microarray data
Highlights:
A total of 43 time series-related lncRNAs were screened.
A total of 11 clusters of 41 lncRNAs were identified.
CYP51 and FDPS were mainly enriched in the cholesterol biosynthesis pathway.
MFAP3 and MFAP5 were significantly enriched in the elastic fibre formation pathway.
Abstract
Background: Age-related disorders, including cancers and chronic inflammatory and neurodegenerative diseases, have become a burden on health care provision in developed countries. The objective of this study was to screen for candidate lncRNAs and target genes associated with aging and to explore the mechanisms of aging. Methods: GSE25905 was downloaded from the Gene Expression Omnibus database. In this analysis, microarray data for three age groups (peripheral white adipocytes isolated from male C57BL/6J mice aged 6, 14 and 18 months), each with three replicates, were obtained. Differentially expressed lncRNAs and mRNAs were identified across the three time points, and lncRNA target genes were then predicted. Subsequently, cluster analysis of lncRNA expression patterns was performed, followed by functional analysis of the positive- and negative-regulation target genes of lncRNAs. Results: A total of 8301 time series-related mRNAs and 43 time series-related lncRNAs were identified in peripheral white adipocyte samples. CYP51 (lanosterol 14-demethylase) and FDPS (farnesyl diphosphate synthase), positive-regulation potential target genes of lncRNAs, were mainly enriched in the cholesterol biosynthesis pathway. Moreover, MFAP3 (microfibrillar associated protein 3) and MFAP5 were significantly enriched in the elastic fibre formation pathway. In contrast, the negative-regulation potential target genes of lncRNAs were mainly enriched in pathways such as metabolism of proteins. Conclusion: CYP51, FDPS, MFAP3 and MFAP5 may be pivotal genes in the process of aging.
Key words: aging; long non-coding RNAs; target genes; Gene Ontology; pathway
Introduction
Aging is associated with impaired adipogenesis in various fat depots in humans [1, 2]. In addition, aging is connected with increased generation of pro-inflammatory signals in visceral white adipose tissue (WAT) [3]. Currently, about 800 million people are at least 60 years old, accounting for about 11% of the world’s population; by 2050, the aging population is expected to exceed 2 billion, representing 22% of the population [4]. Moreover, aging confers an elevated risk of common diseases, including hypertension, atherosclerosis and diabetes [5, 6]. Notably, WAT is considered an important regulator of multiple physiological processes and is closely linked to the development of multiple morbidities [7-9]. Therefore, understanding aging-adipose interactions is important for understanding the basis of disease in the elderly.
Several studies have identified genes implicated in the aging process in an adipose depot-dependent manner. For example, an age-related increase in IL-6 (interleukin 6), which is related to stress responses and cellular senescence, was observed in a fat depot-dependent manner [1]. Sirt1 (sirtuin 1) and SOD2 (superoxide dismutase 2), which are correlated with mitochondrial aging, were significantly reduced in epididymal adipocytes with age [10]. Additionally, the expression of MMP-3 (matrix metallopeptidase 3 (stromelysin 1, progelatinase)) increased with aging in mouse subcutaneous fat cells and human skin fibroblasts [11, 12]. Furthermore, decreased expression of PPARγ (peroxisome proliferator-activated receptor gamma), accompanied by declining fat mass, has been observed in monkey subcutaneous whole fat tissue [13]. Cartwright et al. reported that the levels of adipogenic transcription factors, such as C/EBPα (CCAAT/enhancer binding protein (C/EBP), alpha), C/EBP? and PPARγ (peroxisome proliferator-activated receptor gamma), were lower in differentiating adipocytes isolated from old rats than in those from young rats [14]. Krishnamurthy et al. have also demonstrated that expression of the Ink4a/Arf tumor suppressor locus is a robust biomarker and potential effector of mammalian aging [15]. In addition to these genes, long non-coding RNAs (lncRNAs), the largest class of transcripts in the human genome, defined as transcripts longer than 200 nucleotides that lack protein-coding potential [16, 17], may play key roles in a variety of cellular processes and in disease development [18, 19]. Despite much effort, lncRNAs with known functions remain rare. Thus, efficient prediction of lncRNA functions is still a considerable challenge.
The expression profile GSE25905 [20] was deposited by Liu et al., who analyzed differentially expressed genes (DEGs) in bone marrow adipocytes and epididymal adipocytes and determined the effects of aging on genes associated with mitochondrial function and inflammation in bone marrow adipocytes. However, the effects of lncRNAs on aging were not examined.
Therefore, in the current study, we performed an extensive bioinformatics analysis to identify lncRNAs and explore the molecular alterations in the process of aging. Moreover, functional analysis of the target genes of lncRNAs was carried out. The results might provide deeper insight into the process of aging.
2. Methods and materials
2.1. Tissue samples and data acquisition
The gene expression profile was downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) database under accession number GSE25905 [20]. The samples were run on the GPL6246 platform ([MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array). In this analysis, microarray data for three age groups (peripheral white adipocytes isolated from male C57BL/6J mice aged 6, 14 and 18 months), each with three replicates, were obtained.
2.2. Data preprocessing and profiling of long non-coding RNAs (lncRNAs)
The gene expression profile of GSE25905 was preprocessed with the affy package [21] provided by Brain Array Lab. Probe-level expression data in the CEL files were mapped to the corresponding genes according to the annotation of the GPL6246 platform, and normalization was carried out using the robust multiarray average (RMA) algorithm [22], yielding the expression matrix. When multiple probes mapped to the same gene, their expression values were averaged to give a single value for that gene.
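The probe-to-gene averaging step can be illustrated with a short Python sketch. It is not the authors' R code: it assumes the RMA-normalized values are already available as a dictionary of per-array lists keyed by probe ID, and the function name `collapse_probes`, the probe IDs and the values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def collapse_probes(probe_values, probe_to_gene):
    """Average all probes annotated to the same gene, array by array.

    probe_values:  probe ID -> list of normalized expression values (one per array)
    probe_to_gene: probe ID -> gene symbol taken from the platform annotation
    Probes without a gene annotation are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # skip unannotated probes
            grouped[gene].append(values)
    # zip(*rows) walks the arrays in order, pairing the values of every probe of the gene
    return {gene: [fmean(sample) for sample in zip(*rows)] for gene, rows in grouped.items()}


# Hypothetical probe IDs and values, for illustration only
expr = {"p1": [8.0, 9.0], "p2": [10.0, 11.0], "p3": [5.0, 5.5], "p4": [7.0, 7.0]}
annot = {"p1": "Cyp51", "p2": "Cyp51", "p3": "Fdps"}  # p4 has no gene annotation
print(collapse_probes(expr, annot))  # {'Cyp51': [9.0, 10.0], 'Fdps': [5.0, 5.5]}
```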
lncRNA annotations were then obtained from GENCODE (http://www.gencodegenes.org/) [23].
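As an illustration of how lncRNA genes can be pulled from a GENCODE annotation file, the sketch below reads GTF records and keeps gene entries whose `gene_type` attribute marks them as lncRNAs. It assumes a GENCODE-style GTF with `gene_type` and `gene_name` attributes. Recent GENCODE releases label lncRNA genes with the single biotype `lncRNA`, whereas older releases used several finer-grained biotypes (for example `lincRNA` and `antisense`), so the accepted set is a parameter. The function name is hypothetical, and the text does not state which GENCODE release was used.

```python
import re

# GTF attribute pairs look like: gene_id "ENSMUSG..."; gene_type "lncRNA"; level 2;
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def lncrna_genes(gtf_lines, biotypes=("lncRNA",)):
    """Return the gene names of GTF 'gene' records whose gene_type is in biotypes."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#"):  # header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # one record per gene; skip transcripts, exons, etc.
        attrs = dict(ATTR_RE.findall(fields[8]))
        if attrs.get("gene_type") in biotypes:
            names.add(attrs.get("gene_name", attrs.get("gene_id")))
    return names


# Usage with a local GENCODE GTF file (the path is a placeholder):
# with open("gencode.annotation.gtf") as fh:
#     lnc = lncrna_genes(fh)
```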
2.3. Identification of differentially expressed lncRNAs and mRNAs at three time points
The BETR (Bayesian Estimation of Temporal Regulation) algorithm in the BETR package [24] was applied to identify lncRNAs and mRNAs differentially expressed across the three time points; it calculates the probability of differential expression for each lncRNA and gene. A probability > 0.9 was used as the selection criterion.
2.4. LncRNA target prediction
Differentially expressed lncRNAs were selected for target prediction to determine whether they might act by regulating the corresponding mRNAs. The Pearson correlation coefficient (PCC) was calculated to quantify the similarity of lncRNA and mRNA expression across the time series. lncRNA-mRNA pairs with |PCC| > 0.95 were considered significantly correlated and were used to construct the lncRNA-mRNA regulatory network, which was displayed with Cytoscape [25].
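The correlation-based target prediction can be sketched as below, assuming that each lncRNA and mRNA is represented by its expression vector over the same ordered arrays. Labelling pairs with r > 0 as positive regulation and r < 0 as negative regulation is an assumption about how the positive- and negative-regulation target sets were defined; the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has undefined correlation; treat it as uncorrelated
    return sxy / sqrt(sxx * syy)


def correlated_pairs(lnc_expr, mrna_expr, cutoff=0.95):
    """Return (lncRNA, mRNA, r, direction) for every pair with |r| > cutoff.

    direction is 'positive' for r > 0 and 'negative' for r < 0 (assumed mapping).
    """
    edges = []
    for lnc, lx in lnc_expr.items():
        for gene, gx in mrna_expr.items():
            r = pearson(lx, gx)
            if abs(r) > cutoff:
                edges.append((lnc, gene, r, "positive" if r > 0 else "negative"))
    return edges


# Toy profiles over four arrays (hypothetical names and values)
lnc = {"lncA": [1.0, 2.0, 3.0, 4.0]}
mrna = {"g_up": [2.0, 4.0, 6.0, 8.0],
        "g_down": [8.0, 6.0, 4.0, 2.0],
        "g_weak": [1.0, 3.0, 2.0, 4.0]}
for edge in correlated_pairs(lnc, mrna):
    print(edge)
```

In this toy example `g_weak` (r = 0.8) falls below the 0.95 cutoff and therefore does not become an edge of the network.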
2.5 Cluster analysis of lncRNA expression patterns
Hierarchical clustering [26, 27] is an analytical tool used to discover the closest associations among gene profiles and the specimens under evaluation. In our study, to analyze changes in lncRNA expression patterns, the BHC (Bayesian Hierarchical Clustering) package [28] was used to construct the cluster heat map of lncRNAs and samples.
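BHC decides which clusters to merge with a Bayesian model. Purely to illustrate how expression profiles are grouped bottom-up, the sketch below swaps in ordinary average-linkage agglomerative clustering with a 1 - Pearson r distance; unlike BHC, it needs the number of clusters k to be supplied. The function names and profiles are hypothetical.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson r: 0 for profiles with the same shape, 2 for mirror images."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy) if sxx and syy else 0.0  # flat profile: treat as r = 0
    return 1.0 - r


def average_linkage(profiles, k):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance
    until k clusters remain. Returns each cluster as a sorted list of names."""
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[a, b] = dist[b, a] = pearson_distance(profiles[a], profiles[b])
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_d = [dist[a, b] for a in clusters[i] for b in clusters[j]]
                avg = sum(pair_d) / len(pair_d)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i still points at the same cluster
        clusters[i].extend(merged)
    return [sorted(c) for c in clusters]


# Hypothetical lncRNA profiles at three time points
profiles = {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.5],
            "c": [3.0, 2.0, 1.0], "d": [4.0, 2.5, 1.0]}
print(average_linkage(profiles, k=2))
```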
2.6 Functional analysis
Gene Ontology (GO) and pathway functional analyses of the positive- and negative-regulation target genes of lncRNAs were carried out using TargetMine (http://targetmine.nibio.go.jp) [29], which has been used successfully to identify candidate genes for further investigation. P values were adjusted using the Holm-Bonferroni method [30], and an adjusted p value < 0.05 was considered statistically significant.
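The Holm-Bonferroni step-down adjustment is short enough to write out. This is a generic sketch of the method, following the same definition as R's `p.adjust(method = "holm")`, not TargetMine's implementation; the function name and p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by the number of
        # hypotheses not yet rejected, m - rank, and capped at 1
        value = min(1.0, (m - rank) * pvalues[idx])
        # Step-down monotonicity: never smaller than the previous adjusted value
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values for four terms
adj = holm_adjust(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

Here the adjusted values are approximately 0.03, 0.06, 0.06 and 0.02, so only the first and last terms pass the 0.05 threshold.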
3. Results
3.1. Data preprocessing and profiling of lncRNAs
Based on the annotation information of the GPL6246 platform, a total of 203 probes were identified and annotated as lncRNAs. In addition, 20564 genes were screened. The data before and after normalization are shown in Figures 1A and 1B.
3.2 Identification of differentially expressed lncRNAs and mRNAs at three time points
Using a probability threshold of > 0.9, 8301 time series-related mRNAs and 43 time series-related lncRNAs were identified in peripheral white adipocyte samples.
3.3 LncRNA target prediction
As shown in Figure 2, a regulatory network of 41 lncRNAs and their corresponding mRNAs was constructed, comprising 1880 genes and 2313 regulatory pairs.
3.4 Cluster analysis of lncRNA expression patterns
To further explore changes in lncRNA expression levels at the three time points in peripheral white adipocytes, we performed cluster analysis. Our results showed that the expression values of most lncRNAs were higher in peripheral white adipocytes isolated from 14-month-old male C57BL/6J mice than in those from 6- and 18-month-old mice. The cluster heat map of the 41 lncRNAs is shown in Figure 3.
Clustering analysis identified 11 clusters. The heat map of the 11 clusters (Figure 4) showed a declining trend in lncRNA expression across the three time points.
3.5 Functional enrichment analysis
We used TargetMine to identify enriched GO functions and pathways for the positive- and negative-regulation potential target genes of lncRNAs. The positive-regulation potential target genes of lncRNAs were mainly enriched in biological processes such as vasculature development and in pathways such as cholesterol biosynthesis and elastic fibre formation (Table 1). The negative-regulation potential target genes of lncRNAs were mainly enriched in biological processes such as metabolic process and in pathways such as metabolism of proteins (Table 2).
Discussion
The increasing incidence of age-related cancers, chronic inflammatory diseases and neurodegenerative diseases has become a burden on health care provision in developed countries [31]. In this study, the gene expression profile GSE25905 was downloaded and analyzed with bioinformatics methods to explore the potential mechanisms of aging. A total of 8301 time series-related mRNAs and 43 time series-related lncRNAs were identified in peripheral white adipocyte samples. CYP51 (lanosterol 14-demethylase) and FDPS (farnesyl diphosphate synthase), positive-regulation potential target genes of lncRNAs, were mainly enriched in the cholesterol biosynthesis pathway. Moreover, MFAP3 (microfibrillar associated protein 3) and MFAP5 were significantly enriched in the elastic fibre formation pathway.
A previous study demonstrated that aging is associated with altered cholesterol metabolism in T cells, resulting in increased cholesterol levels in lipid rafts [32]. Other studies have also identified several aging-dependent up-regulated processes, such as cholesterol transport, lipid catabolism and proteolysis, in normally aging rats [33, 34]. In the present study, CYP51 and FDPS, positive-regulation potential target genes of lncRNAs, were significantly enriched in cholesterol biosynthesis, and both were down-regulated with aging. CYP51, the most evolutionarily conserved member of the CYP (cytochrome P450) gene superfamily, participates in the late portion of cholesterol biosynthesis [35]. Moreover, cholesterol biosynthesis is mediated by SREBPs (sterol regulatory element binding protein transcription factors), which are regarded as key elements in controlling cellular cholesterol homeostasis [36]. Notably, co-regulation by SREBPs and the cAMP-dependent pathway is of great importance for maintaining cellular cholesterol levels [37]. The network of insulin/insulin-like growth factor 1, AMP-activated protein kinase/target of rapamycin and cAMP/PKA pathways modulates organismal lifespan [38-40]. Farnesyl diphosphate synthase, encoded by FDPS, is a crucial enzyme in the isoprenoid biosynthetic pathway, which supplies the cell with cholesterol. In addition, FDPS has been reported to be involved in cholesterol biosynthesis in the aging peripheral nervous system [41]. Therefore, we infer that CYP51 and FDPS may support a broader role for cholesterol-related genes in aging.
Another significant pathway, elastic fibre formation, involved MFAP3 and MFAP5, both of which were down-regulated with aging. Elastic fibres are major insoluble extracellular matrix components that provide connective tissues with resilience, allowing long-range deformability and passive recoil; these properties are essential to the function of arteries, lungs, skin and other dynamic connective tissues [42]. Loss of elasticity is a major contributing factor in aging [43]. In aging and immune states, microfibrils are associated with amyloid deposits and the accumulation of adhesive glycoproteins [44]. MFAP3 and MFAP5 are two members of the microfibril-associated protein family. MFAP3 colocalises with elastic fibres in skin and other tissues [45]. MFAP5 participates in the rearrangement of elastic fibres in the extracellular space by interacting with FBN1 (fibrillin 1) and FBN2 [46, 47]. Moreover, MFAP5 has been associated with age-dependent weakening of blood vessels [48]. In light of these findings, we infer that MFAP3 and MFAP5 may play a critical role in the process of aging via regulation of elastic fibre formation.
In sum, the identified positive-regulation potential target genes of lncRNAs, especially CYP51, FDPS, MFAP3 and MFAP5, may be pivotal genes in the process of aging. However, this study has limitations. The results were obtained using bioinformatics methods and have not yet been verified experimentally. Further experiments are needed to confirm the effects and mechanisms of CYP51, FDPS, MFAP3 and MFAP5 in aging.
Figure Legends
Figure 1 A: Box plot of gene expression in peripheral white adipocyte samples at three time points before normalization. B: Box plot of gene expression in peripheral white adipocyte samples at three time points after normalization. The X axis represents samples, and the Y axis represents gene expression levels. The black line in the center of each box marks the median expression value; the consistent distributions indicate good normalization.
Figure 2 The regulatory network of 41 lncRNAs and their corresponding mRNAs. Diamond nodes represent lncRNAs; edges with arrows represent positive regulation, and edges without arrows represent negative regulation.
Figure 3 The cluster heat map of the 41 long non-coding RNAs (lncRNAs). The color scale represents the relative levels of lncRNAs; the horizontal axis represents samples, and the vertical axis represents lncRNAs.
Figure 4 The expression pattern and heat map of 11 clusters.
Table 1 Gene Ontology (GO) and pathway functional enrichment analysis of positive-regulation potential target genes of lncRNAs
| Ontology | Term | Adjusted P | Count |
|---|---|---|---|
| GO-BP | vasculature development [GO:0001944] | 1.33E-08 | 65 |
| GO-BP | blood vessel development [GO:0001568] | 1.34E-07 | 60 |
| GO-BP | regulation of locomotion [GO:0040012] | 2.45E-05 | 51 |
| GO-BP | blood vessel morphogenesis [GO:0048514] | 2.55E-05 | 49 |
| GO-BP | regulation of cellular component movement [GO:0051270] | 3.92E-05 | 50 |
| GO-CC | cell surface [GO:0009986] | 2.46E-05 | 53 |
| GO-CC | plasma membrane [GO:0005886] | 9.39E-05 | 158 |
| GO-CC | cell periphery [GO:0071944] | 2.59E-04 | 163 |
| GO-CC | side of membrane [GO:0098552] | 0.015056 | 33 |
| GO-CC | plasma membrane part [GO:0044459] | 0.021127 | 101 |
| REACT_208531 | Cholesterol biosynthesis | 4.00E-03 | 9 |
| REACT_198996 | Elastic fibre formation | 5.10E-03 | 11 |
Table 2 Gene Ontology (GO) and pathway functional enrichment analysis of negative-regulation potential target genes of lncRNAs
| Ontology | Term | Adjusted P | Count |
|---|---|---|---|
| GO-BP | metabolic process [GO:0008152] | 8.05E-10 | 380 |
| GO-BP | organic substance metabolic process [GO:0071704] | 5.74E-09 | 360 |
| GO-BP | primary metabolic process [GO:0044238] | 3.62E-08 | 343 |
| GO-BP | cellular metabolic process [GO:0044237] | 5.03E-08 | 344 |
| GO-BP | mitochondrion organization [GO:0007005] | 2.30E-04 | 32 |
| GO-CC | intracellular [GO:0005622] | 2.01E-16 | 478 |
| GO-CC | intracellular part [GO:0044424] | 3.17E-16 | 475 |
| GO-CC | cell [GO:0005623] | 4.05E-15 | 526 |
| GO-CC | cell part [GO:0044464] | 4.05E-15 | 526 |
| GO-CC | mitochondrion [GO:0005739] | 7.42E-12 | 134 |
| GO-MF | catalytic activity [GO:0003824] | 1.50E-03 | 180 |
| REACT_188937 | Metabolism | 2.32E-04 | 125 |
| REACT_247926 | Metabolism of proteins | 0.004417 | 58 |
| REACT_237472 | Asparagine N-linked glycosylation | 0.004713 | 19 |
| REACT_236283 | Post-translational protein modification | 0.010703 | 27 |
| REACT_225686 | Autodegradation of Cdh1 by Cdh1:APC/C | 0.012911 | 13 |

Responses to UV and IR in PBL

Bioinformatics analysis for identifying responses to ultraviolet radiation and ionizing radiation in peripheral blood lymphocytes
Highlights:
Top 100 feature genes were classified into 4 clusters.
The p53 signaling and aminoacyl-tRNA biosynthesis pathways were involved in responses to UV and IR.
CDKN1A, GADD45A, CDK4 and SHMT2 were hub nodes in PPI network.
Abstract
Purpose: This study aimed to explore the underlying mechanisms of responses to ultraviolet radiation (UV) and ionizing radiation (IR) in peripheral blood lymphocytes (PBL) by bioinformatics analysis.
Methods: Gene expression profile GSE1977 was downloaded from the Gene Expression Omnibus. Peripheral blood lymphocyte (PBL) samples collected from 15 healthy individuals were each divided into three aliquots for mock (Mock), UV and IR treatment. Feature genes were identified by interquartile range filtering, repeated-measures analysis of variance and random forest in R. The top 100 feature genes were visualized in a heat map, and cluster analysis and functional enrichment analysis were performed on them. Protein-protein interaction (PPI) networks were then constructed with Cytoscape software.
Results: A total of 826 feature genes were obtained. The top 100 feature genes were classified into 4 clusters. Feature genes in cluster 1 were significantly enriched in the p53 signaling pathway, whereas those in cluster 2 were mainly enriched in the aminoacyl-tRNA biosynthesis pathway. Cyclin-dependent kinase inhibitor 1A (CDKN1A) and growth arrest and DNA-damage-inducible α (GADD45A) were hub nodes in the PPI network of cluster 1. Cyclin-dependent kinase 4 (CDK4) and serine hydroxymethyltransferase 2 (SHMT2) were hub nodes in the PPI network of cluster 2.
Conclusions: The p53 signaling and aminoacyl-tRNA biosynthesis pathways may be associated with UV- and IR-induced responses in PBL. CDKN1A, GADD45A, CDK4 and SHMT2 may be potential detection genes for UV and IR treatment in PBL.
Key words: peripheral blood lymphocytes, ultraviolet radiation, ionizing radiation
Introduction
In recent years, individuals have been exposed to a growing number of external stimuli, such as ultraviolet radiation (UV) and ionizing radiation (IR) [1]. These external stimuli lead to DNA damage, cell death and even bodily injury [2]. Peripheral blood lymphocytes (PBL) are used as an extremely sensitive indicator for detecting bodily changes in several diseases [3,4]. PBL are mature lymphocytes that circulate throughout the body rather than localising to organs such as the spleen or lymph nodes [3]. PBL comprise T lymphocytes, natural killer (NK) cells and B lymphocytes [5].
Excessive exposure to UV and IR can cause acute and chronic harmful effects on the eye, skin and immune system [6,7]. UV and IR trigger adaptive changes in gene expression [8], damage collagen fibers [9] and cause DNA damage [10]. Christopher et al. demonstrated UV-induced oxidative DNA damage in Chinese hamster ovary cells [11]. Declan et al. used bladder cancer cell lines to study DNA damage and repair in the telomerase reverse transcriptase (TERT) gene following IR [12]. Moreover, in rat mesenchymal stem cells, low-dose IR and UV stimulate cell proliferation through activation of the MAPK/ERK pathway [13]. Tan et al. reported that up-regulation of miR-125b served as a negative feedback mechanism to control p38α activity and promote cell survival upon UV radiation [14]. However, UV- and IR-induced responses in PBL and their underlying molecular mechanisms remain unclear.
Rieger et al. used GSE1977 to investigate the transcriptional responses of 10,000 genes to DNA damage induced by IR and UV radiation. In this study, we downloaded GSE1977 and identified feature genes that distinguish mock-, UV- and IR-treated samples in order to explain the responses in PBL. In addition, cluster analysis and functional enrichment analysis were performed, and protein-protein interaction (PPI) networks were constructed, to identify key genes in these responses. We hope this study provides a systematic perspective for understanding the mechanisms of, and discovering potential genes in, the responses to UV and IR radiation.
Data and methods
Affymetrix microarray analysis
The array data of GSE1977, deposited by Rieger KE et al. [15], were downloaded from the Gene Expression Omnibus (GEO) database. Peripheral blood lymphocyte (PBL) samples were collected from 15 healthy individuals aged 21-36 years. Lymphoblastoid cells were divided into three aliquots for mock (Mock), ultraviolet radiation (UV) and ionizing radiation (IR) treatment. The raw CEL files and annotation files for the GPL8300 platform (Affymetrix Human Genome U95 Version 2 Array) were downloaded for further analysis.
Data preprocessing
The probe IDs were converted into the corresponding gene names based on the platform annotation. If multiple probes corresponded to the same gene, their average expression value was used as the expression value of that gene. Missing values were imputed by k-nearest neighbor averaging [16] with the impute package [17] in R. The raw microarray data were quantile normalized with the preprocessCore package [18] in R, and the results were presented as box plots.
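The imputation and normalization steps can be sketched in plain Python on a genes x samples matrix. These are simplified, hypothetical stand-ins rather than the algorithms of the R packages: `knn_impute` uses only complete genes as neighbours and a root-mean-square distance over the samples a gene has observed (impute.knn has further rules, for example for heavily missing genes), and `quantile_normalize` breaks ties by position.

```python
from math import sqrt


def knn_impute(matrix, k=10):
    """Fill None entries of a genes x samples matrix.

    Each missing value becomes the mean of that sample's values in the k complete
    genes closest to the gene (RMS distance over the samples the gene does have).
    """
    complete = [row for row in matrix if None not in row]
    filled = []
    for row in matrix:
        observed = [j for j, v in enumerate(row) if v is not None]
        if len(observed) == len(row):
            filled.append(list(row))
            continue
        if not observed or not complete:
            raise ValueError("cannot impute a fully missing row or without complete rows")
        ranked = sorted(
            (sqrt(sum((row[j] - c[j]) ** 2 for j in observed) / len(observed)), n)
            for n, c in enumerate(complete)
        )
        neighbours = [complete[n] for _, n in ranked[:k]]
        filled.append([
            v if v is not None else sum(nb[j] for nb in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return filled


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution: the mean of the sorted columns.
    Ties are broken by position here; implementations differ in how they treat ties."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        ranking = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(ranking):
            out[g][j] = rank_means[rank]
    return out


# Hypothetical 4-gene x 2-array matrix with one missing value
raw = [[5.0, 4.0], [2.0, 1.0], [3.0, None], [3.1, 6.0]]
normalized = quantile_normalize(knn_impute(raw, k=2))
print(normalized)
```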
Feature genes analysis
To identify feature genes, several filtering methods were applied in sequence. First, interquartile range (IQR) [19] filtering was used to remove genes based on the distribution of their expression levels; all genes whose variability was close to 0 were eliminated. Then, the mean values among groups were compared by repeated-measures analysis of variance (ANOVA) using the genefilter package [20] in R, and genes with significant inter-group differences (p-value < 0.01) were selected. Finally, the random forest method was applied with the randomForest package [21] in R to weigh the contribution of each gene to sample classification. Hierarchical cluster analysis was performed on the top 100 genes and visualized as a heat map using the gplots package [22] in R.
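The first two filters can be sketched as follows. IQR filtering uses linear-interpolation quartiles, and the repeated-measures ANOVA treats each individual as a block with Mock, UV and IR as the three treatments; this is our reading of the design, not the authors' genefilter code. Because there are exactly three treatments, the p-value follows from a closed form of the F(2, df) survival function without a statistics library. The random forest ranking is not shown, and the threshold `min_iqr` is a hypothetical stand-in for "variability close to 0".

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range with linear interpolation (R's default quantile type 7)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def iqr_filter(expr, min_iqr=1e-6):
    """Drop genes whose expression barely varies across arrays (min_iqr is hypothetical)."""
    return {gene: v for gene, v in expr.items() if iqr(v) > min_iqr}


def rm_anova_3(data):
    """One-way repeated-measures ANOVA for n individuals x 3 treatments.

    data: one [mock, uv, ir] row per individual. Returns (F, p).
    With 3 treatments the numerator has 2 degrees of freedom, and the F survival
    function then has the closed form (1 + 2F/df_err) ** (-df_err / 2).
    """
    n, k = len(data), len(data[0])
    if k != 3:
        raise ValueError("the closed-form p-value assumes exactly 3 treatments")
    grand = sum(map(sum, data)) / (n * k)
    treat_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)  # removed as a block effect
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_treat - ss_subject
    df_treat, df_error = k - 1, (k - 1) * (n - 1)
    if ss_error <= 0:  # perfectly consistent responses: F infinite, or undefined if no effect
        return (float("inf"), 0.0) if ss_treat > 0 else (0.0, 1.0)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    p = (1 + 2 * f / df_error) ** (-df_error / 2)
    return f, p


# Hypothetical data for one gene: rows are individuals, columns Mock, UV, IR
gene_rows = [[1.0, 5.0, 9.1], [2.0, 6.1, 10.0], [0.5, 4.4, 8.6], [1.5, 5.6, 9.4]]
f, p = rm_anova_3(gene_rows)
print(f"F = {f:.1f}, p = {p:.2e}, kept: {p < 0.01}")
```

Treating individuals as blocks removes between-person baseline differences from the error term, which is the point of using a repeated-measures design when the same samples are split into three aliquots.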
Functional enrichment analysis
The Gene Ontology (GO) database [23] is a collection of a large number of gene annotation terms. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database [24] is used to identify functional and metabolic pathways. The Database for Annotation, Visualization and Integrated Discovery (DAVID) [25,26] provides a comprehensive set of functional annotation tools for large gene lists. GO and KEGG pathway enrichment analyses of the feature genes were conducted with DAVID, with p-value < 0.05 as the cutoff criterion.
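DAVID's enrichment p-values derive from a one-sided Fisher exact (hypergeometric) test; by default DAVID reports a more conservative modification of it, the EASE score, which is not reproduced here. The sketch below computes the plain hypergeometric upper tail for a single term; the function name, argument names and counts are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric (Fisher exact) p-value P(X >= k).

    k: list genes annotated to the term     n: list genes in total
    K: background genes annotated to term   N: background genes in total
    """
    if k <= 0:
        return 1.0
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 8 of 60 feature genes fall in a term that covers
# 100 of 10000 background genes (about 0.6 would be expected by chance)
p = enrichment_p(8, 60, 100, 10000)
print(p < 0.05)
```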
Protein-protein interaction (PPI) network construction
The Search Tool for the Retrieval of Interacting Genes (STRING) [27] is an online database that collects comprehensive information on known and predicted protein-protein interactions. Each protein pair in STRING is assigned a combined score. The genes in each gene cluster were mapped onto PPIs based on the STRING database, and protein pairs with a combined score > 0.4 were considered significant. PPI networks were visualized with Cytoscape software [28].
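A minimal sketch of the network step: keep STRING pairs above the score cutoff that involve at least one cluster gene, then rank nodes by degree to find hubs. It assumes the combined score has been rescaled to 0-1 (STRING's downloadable files store it as an integer out of 1000, so 0.4 corresponds to 400) and that hubs are the highest-degree nodes, which the text does not define explicitly. The function names, genes and scores are made up for illustration.

```python
from collections import defaultdict


def build_ppi(interactions, cluster_genes, min_score=0.4):
    """Undirected adjacency sets from (protein_a, protein_b, combined_score) triples.

    Keeps pairs scoring above min_score (0-1 scale) in which at least one partner
    is a cluster gene, so interacting partners outside the cluster are included.
    """
    cluster_genes = set(cluster_genes)
    adj = defaultdict(set)
    for a, b, score in interactions:
        if a != b and score > min_score and (a in cluster_genes or b in cluster_genes):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def hub_nodes(adj, top=2):
    """Highest-degree nodes; ties are broken alphabetically."""
    return sorted(adj, key=lambda node: (-len(adj[node]), node))[:top]


# Made-up genes and scores, for illustration only
pairs = [("g1", "g2", 0.90), ("g1", "g3", 0.95), ("g1", "g4", 0.80),
         ("g2", "g4", 0.70), ("g2", "g3", 0.30), ("x", "y", 0.99)]
net = build_ppi(pairs, {"g1", "g2"})
n_edges = sum(len(nbrs) for nbrs in net.values()) // 2
print(len(net), n_edges, hub_nodes(net))
```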
Results
Data preprocessing
After preprocessing, a total of 9012 genes were obtained from the 45 arrays. The normalization result is shown in Figure 1.
Feature genes analysis
Figure 2A shows the distribution of IQR values in the raw data before filtering. After ANOVA filtering, 826 genes were retained using the criterion p-value < 0.01 (Figure 2B). The top 100 feature genes were then identified by random forest based on their contributions to classification.
Heat map and cluster analysis
The heat map of the top 100 feature genes is shown in Figure 3A. Cluster analysis classified the feature genes into 4 clusters (Figure 3B). Feature genes in cluster 1 were up-regulated in UV and IR but down-regulated in Mock. In cluster 2, feature genes were up-regulated in IR and Mock but down-regulated in UV. Cluster 3 contained feature genes up-regulated in UV and down-regulated in Mock and IR. Cluster 4 included feature genes up-regulated in Mock and down-regulated in IR, with inconsistent changes in UV. Feature genes in cluster 4 were excluded from further analysis because of their weak discriminating capacity.
Functional enrichment analysis
The GO enrichment results are shown in Table 1. Feature genes in cluster 1 were significantly enriched in GO terms related to response to DNA damage stimulus and cell death; feature genes in cluster 2 were mainly enriched in tRNA and ncRNA metabolic processes; feature genes in cluster 3 were enriched in regulation of cell proliferation.
A total of 4 pathways were obtained in the KEGG enrichment analysis (Table 2). Feature genes in cluster 1 were enriched in “p53 signaling pathway”, “nucleotide excision repair” and “apoptosis”; feature genes in cluster 2 were enriched in “aminoacyl-tRNA biosynthesis”.
Protein-protein interaction (PPI) network construction
PPI networks of feature genes and their interacting genes were constructed only for cluster 1 (Figure 4A) and cluster 2 (Figure 4B).
As shown in Figure 4A, the PPI network of cluster 1 included 17 nodes and 25 edges, with cyclin-dependent kinase inhibitor 1A (p21) (CDKN1A) and growth arrest and DNA-damage-inducible α (GADD45A) as hub nodes. The PPI network of cluster 2 comprised 25 nodes and 29 edges, with cyclin-dependent kinase 4 (CDK4) and serine hydroxymethyltransferase 2 (mitochondrial) (SHMT2) as hub nodes. In addition, glycyl-tRNA synthetase (GARS), alanyl-tRNA synthetase (AARS), tyrosyl-tRNA synthetase (YARS) and methionyl-tRNA synthetase (MARS) formed a small module.
Discussion
In this study, we aimed to identify feature genes of particular importance in the responses to UV and IR. To identify feature genes from the publicly available microarray dataset GSE1977, we used ANOVA and random forest analysis. A total of 826 genes were identified using the criterion p-value < 0.01, and the top 100 feature genes were classified into 4 clusters. The genes in cluster 1 were enriched in the p53 signaling pathway, and the genes in cluster 2 were enriched in the aminoacyl-tRNA biosynthesis pathway. Furthermore, the PPI network results showed that CDKN1A and GADD45A were hub nodes in cluster 1, and CDK4 and SHMT2 were hub nodes in cluster 2.
p53 is a protein that in humans is encoded by the tumor protein p53 gene [29]. It regulates the cell cycle and acts as a tumor suppressor that prevents cancer [30]. The p53 signaling pathway is stimulated by DNA damage [31]. In this study, CDKN1A and GADD45A, which are enriched in the p53 signaling pathway, were up-regulated in cluster 1 with UV and IR treatment. CDKN1A and GADD45A were also hub nodes in the PPI network and were significantly enriched in the GO term response to DNA damage stimulus, consistent with previous studies. CDKN1A encodes p21, a potent cyclin-dependent kinase inhibitor. Waldman et al. reported that p21/CDKN1A was necessary for p53-mediated G1 arrest in human cancer cells [32]. It is involved in the cellular response to DNA damage, mediated through transcriptional activation by p53 [33]. Moreover, GADD45A, a member of the GADD45 gene family, is a p53-regulated stress protein. GADD45A has been characterized as an important participant in the cellular response to a variety of DNA-damaging agents [34]. Therefore, we infer that CDKN1A and GADD45A may regulate DNA damage induced by UV and IR treatment through the p53 pathway.
Whereas genes in cluster 1 were significantly enriched in the p53 signaling pathway, genes in cluster 2 were enriched in the aminoacyl-tRNA biosynthesis pathway. Aminoacyl-tRNAs are the substrates for translation, and the function of aminoacyl-tRNA biosynthesis is to precisely match amino acids with tRNAs containing the corresponding anticodons [35]. Mihail et al. reported that UV-induced damage inhibited translation by impairing aminoacyl-tRNA binding [36]. In this study, GARS, AARS, YARS and MARS, all of which are aminoacyl-tRNA synthetases, were enriched in this pathway. Yao et al. reported that UV irradiation induced phosphorylation of MARS by general control nonderepressible 2 (GCN2) [37], and this phosphorylation inhibited the attachment of methionine to its tRNA. In addition, these four genes formed a small module in the PPI network, suggesting that they interact with each other in the aminoacyl-tRNA biosynthesis pathway.
CDK4 is a member of the cyclin-dependent kinase family. In this study, cluster analysis of feature genes revealed that CDK4 was down-regulated by UV treatment, which is consistent with previous studies: Kim et al. reported that UV-induced cell cycle arrest was mediated by CDK4 down-regulation [38]. In addition, SHMT2 in cluster 2 was also down-regulated in UV-exposed samples. SHMT2 encodes the mitochondrial form of a pyridoxal phosphate-dependent enzyme. Fox et al. reported that the SHMT2 transcript was highly susceptible to UV-induced DNA damage [39]. Thus, CDK4 and SHMT2 are closely associated with UV-induced responses and may be unique characteristic genes for UV treatment.
In conclusion, this study provides a comprehensive bioinformatics analysis of feature genes that may be involved in UV- and IR-induced responses in PBL. CDKN1A and GADD45A may regulate DNA damage induced by UV and IR treatment through the p53 signaling pathway. GARS, AARS, YARS and MARS may be associated with the UV-induced response via the aminoacyl-tRNA biosynthesis pathway. CDK4 and SHMT2 may be potential detection genes for UV treatment in PBL. However, further experiments are needed to confirm these results.
