
Identification of significant genes associated with prognosis of gastric cancer by bioinformatics analysis



Gastric cancer (GC) ranks second in mortality among all malignant diseases worldwide. However, the causes and molecular mechanisms underlying gastric cancer remain unclear. Here, we used integrated bioinformatics analysis to identify possible key genes and to investigate the pathogenesis and prognosis of gastric cancer.


The gene expression profiles GSE118916, GSE79973, and GSE29272 were obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between GC and normal gastric tissues were screened using R software and Venn diagram software. GO and KEGG pathway enrichment analyses of the DEGs were performed using the DAVID database. A protein-protein interaction (PPI) network was established with STRING and visualized using Cytoscape software. Finally, the expression of the hub genes and their association with survival were assessed using the TCGA database.


A total of 83 DEGs were common to the three datasets, including 41 upregulated and 42 downregulated genes. These DEGs were mainly enriched in extracellular matrix organization and cell adhesion, and the KEGG pathways enriched among them included extracellular matrix (ECM)-receptor interaction and focal adhesion. A PPI network of the DEGs was analyzed using the Molecular Complex Detection (MCODE) app of Cytoscape. Four genes, COL5A1, FBN1, SPARC, and LUM, were identified as hub genes. Among them, only LUM was significantly associated with poor prognosis in the TCGA database.


We screened DEGs associated with GC by integrated bioinformatics analysis and found one potential biomarker that may be involved in the progression of GC. This hub gene may serve as a guide for further molecular biological experiments.


Gastric cancer (GC) is the sixth most commonly diagnosed cancer, and its mortality rate ranks second among malignant tumors worldwide [1]. The 5-year overall survival rate of patients with early-stage disease can reach 95% [2], but for patients with advanced disease it has remained at about 50% even after comprehensive surgery-based treatment [3, 4]. The low survival rate is mainly due to tumor recurrence and metastasis. Therefore, it is important to study the molecular mechanisms underlying the malignant biological behavior of GC cells and to find effective early diagnostic techniques and reliable molecular markers for monitoring recurrence and evaluating prognosis. Despite major advances in understanding the molecular mechanisms of GC and in emerging targeted therapeutic options, not all patients benefit from existing targeted therapies [5, 6].

In recent years, microarray and RNA-sequencing technologies have provided efficient tools for identifying promising biomarkers for cancer diagnosis, treatment, and prognosis [7, 8]. Large amounts of data have been deposited in public databases such as the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA), which can be used to study molecular mechanisms further. Although many studies have examined gene expression profiles in GC, its molecular mechanism is far from fully understood [9], and more effective therapeutic strategies are still needed.

To better understand the role of DEGs in the molecular pathogenesis of GC, we downloaded three gene expression profiles from the GEO database and screened them for DEGs. We then performed gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs. Finally, key genes affecting the prognosis of GC patients were identified using PPI network and survival analyses.


Microarray data and identification of DEGs

Three microarray datasets, GSE118916, GSE79973, and GSE29272, were downloaded from the Gene Expression Omnibus (GEO) database. Only paired GC tissues and their matched adjacent tissues were included. When multiple probes corresponded to the same gene, their average expression level was taken as the gene's final expression value. The original microarray data of each series were processed using R software (version 3.6.1), and the data were log2 transformed. |Log2 fold change (FC)| > 1 and adjusted P < 0.01 were used as the cutoff criteria for DEG screening. A Venn diagram was created using Venny (version 2.1), and the DEGs common to all three datasets were selected for further study.
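The screening steps above (averaging probes per gene, log2 transformation, the |log2 FC| > 1 and adjusted P < 0.01 cutoffs, and intersecting the three DEG lists) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R code: the statistical test that produced the raw P values is not specified in the text, so raw P values are taken as input; the Benjamini–Hochberg procedure is assumed as the adjustment method; and up- and downregulated genes are assumed to be intersected separately so that common DEGs change in the same direction in every dataset. All function names are hypothetical.

```python
import math
from collections import defaultdict


def collapse_probes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene, sample by sample."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # probes without a gene annotation are dropped
            grouped[gene].append(values)
    return {gene: [sum(col) / len(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}


def log2_transform(expr):
    """log2-transform intensities (assumes all values are positive)."""
    return {gene: [math.log2(v) for v in values] for gene, values in expr.items()}


def paired_log2fc(tumor, normal):
    """Mean paired tumor-minus-normal difference on the log2 scale (log2 FC)."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    return sum(diffs) / len(diffs)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order (assumed method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(log2fc, raw_p, fc_cut=1.0, p_cut=0.01):
    """Split genes with |log2 FC| > fc_cut and adjusted P < p_cut into up/down sets."""
    genes = list(log2fc)
    adj = benjamini_hochberg([raw_p[g] for g in genes])
    up, down = set(), set()
    for gene, q in zip(genes, adj):
        if q < p_cut and abs(log2fc[gene]) > fc_cut:
            (up if log2fc[gene] > 0 else down).add(gene)
    return up, down


def common_degs(per_dataset):
    """Intersect up- and downregulated sets separately across all datasets."""
    ups = [up for up, _ in per_dataset]
    downs = [down for _, down in per_dataset]
    return set.intersection(*ups), set.intersection(*downs)
```

For example, `select_degs({"A": 2.0, "B": -1.5}, {"A": 0.001, "B": 0.002})` places A in the upregulated set and B in the downregulated set; `common_degs` then keeps only genes that pass in every dataset, mirroring the Venn-diagram step.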

GO and KEGG pathway enrichment analysis

GO is a common method for annotating large numbers of genes [10]. KEGG is an integrated database resource for the biological interpretation of genome sequences and other high-throughput data [11]. GO and KEGG pathway enrichment analyses were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID; version 6.8), an online tool that provides a comprehensive set of functional annotation tools to help investigators understand the biological meaning behind large gene lists [12]. P < 0.05 was considered statistically significant.
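DAVID reports enrichment as an EASE score, a modified one-sided Fisher's exact test. The sketch below illustrates the idea with Python's standard library: a one-sided Fisher's exact (hypergeometric upper-tail) P value for a 2 × 2 table, and an EASE-style variant that removes one gene from the overlap between the gene list and the term, which makes the test more conservative for terms supported by only a few genes. The table layout (gene list versus background) and all function names are assumptions for illustration; this is not DAVID's source code.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P value (enrichment) for the 2 x 2 table

                    in term   not in term
        gene list      a           b
        background     c           d

    computed as the hypergeometric upper tail P(X >= a).
    """
    n_list = a + b            # genes drawn (the gene list)
    n_term = a + c            # genes annotated to the term
    total = a + b + c + d
    numerator = sum(comb(n_term, i) * comb(total - n_term, n_list - i)
                    for i in range(a, min(n_list, n_term) + 1))
    return numerator / comb(total, n_list)


def ease_score(list_hits, list_size, bg_hits, bg_size):
    """EASE-style score: Fisher's exact test after removing one gene from the
    list-term overlap (assumed table layout: gene list vs. background)."""
    if list_hits < 1:
        return 1.0  # a term with no hits in the list cannot be enriched
    return fisher_greater(list_hits - 1, list_size - list_hits,
                          bg_hits, bg_size - bg_hits)


def significant_terms(term_counts, list_size, bg_size, alpha=0.05):
    """Keep terms (term -> (list_hits, bg_hits)) whose score is below alpha."""
    scores = {term: ease_score(k, list_size, m, bg_size)
              for term, (k, m) in term_counts.items()}
    return {term: p for term, p in scores.items() if p < alpha}
```

A term hit by only one gene in the list always receives a score of 1.0 under this penalty, which is why the EASE approach favors terms supported by several DEGs.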

PPI network construction and hub gene identification

The Search Tool for the Retrieval of Interacting Genes (STRING; version 11.0) was used to explore the protein-protein interaction (PPI) information of the DEGs, with an interaction score > 0.4 as the cutoff criterion. Cytoscape software (version 3.6.0) was used to visualize and analyze the integrated PPI network. The Molecular Complex Detection (MCODE) app in Cytoscape, with default parameters, was used to identify modules within the entire network. The cytoHubba app of Cytoscape was used to select important hub genes among these DEGs, using its density of maximum neighborhood component (DMNC) and maximal clique centrality (MCC) methods. Genes identified by both methods were selected as hub genes.
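The score cutoff and the two cytoHubba rankings can be illustrated with a small Python sketch. It assumes the definitions from the cytoHubba publication as we understand them: MCC(v) sums (|C| − 1)! over the maximal cliques C that contain v, and DMNC(v) = E / N^ε with ε = 1.7, where N and E are the node and edge counts of the largest connected component of the subgraph induced by v's neighbors. Maximal cliques are enumerated with the Bron–Kerbosch algorithm with pivoting. The function names and the alphabetical tie-breaking rule are hypothetical; this is not the cytoHubba implementation.

```python
from collections import defaultdict
from math import factorial

DMNC_EPSILON = 1.7  # DMNC exponent (assumed from the cytoHubba definition)


def build_graph(scored_edges, min_score=0.4):
    """Undirected adjacency sets from (gene_a, gene_b, score) with score > min_score."""
    graph = defaultdict(set)
    for a, b, score in scored_edges:
        if score > min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            if r:
                yield r
            return
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(graph), set())


def mcc_scores(graph):
    """MCC(v): sum of (|C| - 1)! over maximal cliques C that contain v."""
    scores = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


def dmnc_scores(graph):
    """DMNC(v) = E / N**epsilon for the largest component among v's neighbors."""
    scores = {}
    for v, nbrs in graph.items():
        best, seen = set(), set()
        for start in nbrs:
            if start in seen:
                continue
            comp, stack = set(), [start]
            while stack:  # depth-first search restricted to v's neighbors
                u = stack.pop()
                if u not in comp:
                    comp.add(u)
                    stack.extend((graph[u] & nbrs) - comp)
            seen |= comp
            if len(comp) > len(best):
                best = comp
        n = len(best)
        e = sum(len(graph[u] & best) for u in best) // 2
        scores[v] = e / n ** DMNC_EPSILON if n else 0.0
    return scores


def top_k(scores, k=10):
    """Top-k genes by score; ties broken alphabetically (an assumption)."""
    return set(sorted(scores, key=lambda g: (-scores[g], g))[:k])


def hub_genes(graph, k=10):
    """Genes ranked in the top k by both MCC and DMNC."""
    return top_k(mcc_scores(graph), k) & top_k(dmnc_scores(graph), k)
```

On a toy network made of a triangle A–B–C plus an edge C–D, MCC ranks C highest (it belongs to two maximal cliques), whereas DMNC gives D a score of 0 because D's only neighbor forms a component with no edges; taking genes common to both rankings filters out nodes that only one measure favors.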

Validation and survival analysis based on TCGA database

To validate the hub genes, box plots of their expression in GC samples from The Cancer Genome Atlas (TCGA) database were used to compare expression patterns between tumor and normal samples. Survival and stage analyses of the hub genes were also performed with the Gene Expression Profiling Interactive Analysis (GEPIA) online database.
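GEPIA performs the survival analysis internally; the sketch below only illustrates the underlying two-group log-rank test in Python. It assumes that patients are split at the median expression of the gene, with ties at the median placed in the low-expression group (an assumption, not a documented GEPIA rule). The P value uses the fact that, for a chi-square statistic with one degree of freedom, P(X > x) = erfc(√(x/2)). Function names are hypothetical, and this is not GEPIA's code.

```python
import math
from statistics import median


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (death) or 0 (censored).
    Returns (chi-square statistic, P value with 1 degree of freedom)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
            [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):  # distinct event times
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of deaths in group A at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value


def median_split_logrank(expression, times, events):
    """Compare high (> median) vs. low (<= median) expression groups."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return logrank_test([times[i] for i in high], [events[i] for i in high],
                        [times[i] for i in low], [events[i] for i in low])
```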


Microarray data information and identification of DEGs

Three gene expression profiles (GSE118916, GSE79973, and GSE29272) were acquired from the GEO database; their details are shown in Table 1. There were 318 samples in total, comprising 159 tumor tissues and 159 matched adjacent tissues. In GSE118916, 1295 DEGs were identified, including 651 upregulated and 644 downregulated genes. A total of 376 DEGs were screened from the GSE79973 data set, including 132 upregulated and 244 downregulated genes, and 330 DEGs were selected from the GSE29272 data set, including 165 upregulated and 165 downregulated genes. The volcano plots of the DEGs in each data set are shown in Fig. 1a–c. A total of 83 DEGs were common to all three datasets and were selected for further analysis (Fig. 1d); of these, 41 were upregulated and 42 were downregulated in GC tissues compared with adjacent tissues (Table 2).

Table 1 Information for GEO gastric cancer data
Fig. 1

Identification of DEGs in each GEO data set. a–c Volcano plots of the distribution of DEGs in each data set. d Identification of 83 common DEGs in the three datasets (GSE118916, GSE79973, and GSE29272) using Venn diagram software.

Table 2 Detected DEGs in gastric cancer by integrated microarray

GO and KEGG pathway enrichment analysis of DEGs

GO and KEGG pathway enrichment of all 83 DEGs was analyzed using the DAVID online tool. The GO enrichment results were divided into three functional categories: biological process (BP), cellular component (CC), and molecular function (MF). In the BP category, the DEGs were significantly enriched in extracellular matrix organization, collagen catabolic process, and cell adhesion. In the CC category, they were significantly enriched in extracellular exosome and extracellular region. In the MF category, they were significantly enriched in calcium ion binding and identical protein binding. The details are shown in Table 3. The KEGG pathways of the DEGs were mainly enriched in extracellular matrix (ECM)-receptor interaction, protein digestion and absorption, focal adhesion, and the PI3K-Akt signaling pathway (Table 4).

Table 3 GO analysis of DEGs associated with gastric cancer
Table 4 KEGG pathway analysis of DEGs associated with gastric cancer

PPI network construction and selection of hub genes

To further explore the interactions among these 83 DEGs, the STRING database was used to construct a PPI network, which was visualized using Cytoscape (Fig. 2a). Using MCODE, two key modules were identified from the whole network (Fig. 2b and c): module 1 contained 21 nodes and 177 edges, and module 2 contained seven nodes and 19 edges. To identify hub genes, two algorithms (DMNC and MCC) of the cytoHubba app in Cytoscape were used. The top 10 genes ranked by each method were screened, and four hub genes were common to both methods: COL5A1, FBN1, SPARC, and LUM.
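The module sizes reported above can be expressed as network densities and MCODE-style scores. The worked equations below assume MCODE's density definition for a network without self-loops and a module score defined as density multiplied by the number of nodes; the scores themselves are not reported in the text, so these values are derived only from the stated node and edge counts.

```latex
\begin{aligned}
D &= \frac{2|E|}{|V|\,(|V|-1)}, \qquad S = D \times |V| = \frac{2|E|}{|V|-1},\\
\text{Module 1: } D &= \frac{2 \times 177}{21 \times 20} \approx 0.843, \qquad S = \frac{354}{20} = 17.7,\\
\text{Module 2: } D &= \frac{2 \times 19}{7 \times 6} \approx 0.905, \qquad S = \frac{38}{6} \approx 6.33.
\end{aligned}
```

Both modules are therefore close to fully connected (a density of 1 would mean every pair of genes interacts), which is consistent with their identification as densely connected regions of the network.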

Fig. 2

Establishment of PPI network and modules analysis. a Entire PPI network. b PPI network of module 1. c PPI network of module 2

Validation and survival analysis based on TCGA database

To validate the results above, the expression profiles of the four hub genes in the TCGA database were examined, using GEPIA to visualize and analyze the TCGA data. All four hub genes were significantly differentially expressed between tumor and normal samples (P < 0.01), consistent with the results from the GEO data sets (Fig. 3). The hub genes were also differentially expressed across various stages of GC (Fig. 4). Only LUM was significantly associated with the overall survival of GC patients (log-rank P = 0.041; Fig. 5).

Fig. 3

Box plots of the expression of the four hub genes in the TCGA database

Fig. 4

Expression of the four hub genes in different stages of GC

Fig. 5

Kaplan–Meier survival analysis of LUM


In this study, we integrated three microarray expression profiles from GEO and identified 83 DEGs between GC and normal gastric tissues, including 41 upregulated and 42 downregulated genes. GO and KEGG pathway analyses showed that the DEGs were primarily enriched in ECM organization, ECM-receptor interaction, and cell adhesion. These results suggest that these DEGs may play important roles in the progression of GC.

ECM organization and ECM-receptor interaction have been shown to play important roles in tumorigenesis and tumor development [16]. Genes encoding proteins that mediate ECM remodeling were upregulated in patients with prostate, lung, and gastric cancers [17]. Collagens, the most abundant ECM components, regulate the physical and biochemical properties of the tumor microenvironment and thereby modulate cancer cell polarity, migration, and signaling [18, 19]. Cell adhesion is a key mediator of cancer progression and facilitates metastatic dissemination. Many cell adhesion molecules within the tumor microenvironment are altered, and these alterations change the ability of tumor cells to interact with other cells and with ECM proteins [20].

By constructing a PPI network with the STRING database and performing module analysis, we also identified four hub genes: COL5A1, FBN1, SPARC, and LUM. Subsequent survival analysis revealed that one of these four upregulated genes, LUM, was closely associated with poor prognosis in GC patients.

COL5A1 encodes the α1 chain of type V collagen, one of the low-abundance fibrillar collagens. In ovarian cancer, COL5A1 is part of a gene signature associated with poor outcome, and collagen remodeling may be a common biological process contributing to poor overall survival [21]. Studies have suggested that COL5A1 is highly expressed at both the mRNA and protein levels in breast cancer and that patients with breast cancer with high COL5A1 expression have a worse prognosis [22]. In GC, the collagen (COL) family is a promising prognostic marker [23]. Fibrillin 1 (FBN1) is overexpressed in testicular germ cell tumors relative to nonneoplastic testicular tissue and may be involved in the development of germ cell neoplasia in situ [24]. In GC cells, silencing FBN1 inhibited proliferation, migration, and invasion, whereas upregulating FBN1 had the opposite effect [25]. Secreted protein acidic and rich in cysteine (SPARC) is a matricellular protein that modulates cell-matrix interactions and has been found to be upregulated in colorectal tumor stroma. High SPARC expression was associated with better outcomes in stage 2 but not stage 3 colorectal cancer, suggesting that SPARC may play different roles at different stages of colorectal cancer [26]. In gastric cancer, by contrast, SPARC is upregulated relative to normal gastric tissue, and high SPARC expression is associated with worse outcomes than negative or low expression, making SPARC a potential marker of poor prognosis [27].

The LUM gene encodes lumican, a member of the small leucine-rich proteoglycan (SLRP) family, which also includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin [28]. In recent years, increasing experimental evidence has shown that LUM is expressed in many kinds of tumors, including colorectal, prostate, lung, and pancreatic cancer [29,30,31,32], and its role varies with tumor type. LUM is highly expressed in bladder cancer tissues and cell lines, and increased LUM expression is associated with the histological grade and T/N stage of bladder tumors. In vitro and in vivo data further indicate that knockdown of LUM inhibits the growth and migration of bladder cancer cells by inactivating MAPK signaling [33]. In node-negative invasive breast cancer, however, low lumican expression is associated with worse survival [34].

We identified potential molecular biomarkers for the therapy and prognosis of GC through integrated bioinformatics analysis, including GO and KEGG pathway enrichment, PPI network and module analysis, and validation in the TCGA database; in particular, two algorithms were used to identify hub genes. However, our study has several limitations. First, although we used the TCGA database to validate the GEO results, molecular experiments are still needed to verify these findings, and although we integrated three microarray datasets, a larger sample size is needed to confirm the results. Second, we compared paired GC tissues with their matched adjacent tissues without accounting for details such as histological type, tumor grade, and the distance between the adjacent and cancerous tissue, all of which may affect the expression of DEGs. Finally, to reduce the number of false-positive DEGs, we retained only the DEGs common to all three datasets, so some important genes may have been missed.


We screened DEGs associated with GC by integrated bioinformatics analysis and found one potential biomarker that may be involved in the progression of GC. This hub gene may serve as a guide for further molecular biological experiments.

Availability of data and materials

The data that support the findings of this study are available from the Gene Expression Omnibus (GEO) and the Gene Expression Profiling Interactive Analysis (GEPIA) database.



Abbreviations

GC: Gastric cancer
GEO: Gene Expression Omnibus
GO: Gene ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
FC: Fold change
DAVID: Database for annotation, visualization, and integrated discovery
DEGs: Differentially expressed genes
PPI: Protein-protein interaction
TCGA: The Cancer Genome Atlas
ECM: Extracellular matrix
MCODE: Molecular Complex Detection
STRING: Search Tool for the Retrieval of Interacting Genes
DMNC: Density of maximum neighborhood component
MCC: Maximal clique centrality
GEPIA: Gene Expression Profiling Interactive Analysis
BP: Biological process
CC: Cellular component
MF: Molecular function


  1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.

  2. Yaguchi Y, Tsujimoto H, Hiraki S, Ito N, Nomura S, Horiguchi H, et al. Long-term outcome following sentinel node navigation surgery for cT1 gastric cancer. Mol Clin Oncol. 2019;10(6):615–8.

  3. Inokuchi M, Nakagawa M, Tanioka T, Okuno K, Gokita K, Kojima K. Long- and short-term outcomes of laparoscopic gastrectomy versus open gastrectomy in patients with clinically and pathological locally advanced gastric cancer: a propensity-score matching analysis. Surg Endosc. 2018;32(2):735–42.

  4. Shi Y, Xu X, Zhao Y, Qian F, Tang B, Hao Y, et al. Long-term oncologic outcomes of a randomized controlled trial comparing laparoscopic versus open gastrectomy with D2 lymph node dissection for advanced gastric cancer. Surgery. 2019;165(6):1211–6.

  5. Liu W, Zhong S, Chen J, Yu Y. HER-2/neu overexpression is an independent prognostic factor for intestinal-type and early-stage gastric cancer patients. J Clin Gastroenterol. 2012;46(4):e31–7.

  6. Liu D, Wang N, Sun Y, Guo T, Zhu X, Guo J. Expression of VEGF with tumor incidence, metastasis and prognosis in human gastric carcinoma. Cancer Biomark. 2018;22(4):693–700.

  7. Hu Y, Gaedcke J, Emons G, Beissbarth T, Grade M, Jo P, et al. Colorectal cancer susceptibility loci as predictive markers of rectal cancer prognosis after surgery. Genes Chromosomes Cancer. 2018;57(3):140–9.

  8. Saijo S, Kuwano Y, Tange S, Rokutan K, Nishida K. A novel long non-coding RNA from the HOXA6-HOXA5 locus facilitates colon cancer cell growth. BMC Cancer. 2019;19(1):532.

  9. Rajgopal S, Fredrick SJ, Parvathi VD. CircRNAs: insights into gastric cancer. Gastrointest Tumors. 2021;8(4):159–68.

  10. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.

  11. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–62.

  12. Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.

  13. Li L, Zuan Z, Yanchao Z, Zhang Q, Xiaoting W, Miao B, et al. FN1, SPARC, and SERPINE1 are highly expressed and significantly related to a poor prognosis of gastric adenocarcinoma revealed by microarray and bioinformatics. Sci Rep. 2019;9(1):7827.

  14. He J, Jin Y, Chen Y, Yao HB, Xia YJ, Ma YY, et al. Downregulation of ALDOB is associated with poor prognosis of patients with gastric cancer. Onco Targets Ther. 2016;9:6099–109.

  15. Wang G, Hu N, Yang HH, Wang L, Su H, Wang C, et al. Comparison of global gene expression of gastric cardia and noncardia cancers from a high-risk population in China. PLoS One. 2013;8(5):e63826.

  16. Malik R, Lelkes PI, Cukierman E. Biomechanical and biochemical remodeling of stromal extracellular matrix in cancer. Trends Biotechnol. 2015;33(4):230–6.

  17. Chang HY, Sneddon JB, Alizadeh AA, Sood R, West RB, Montgomery K, et al. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2004;2(2):E7.

  18. Fraley SI, Feng Y, Giri A, Longmore GD, Wirtz D. Dimensional and temporal controls of three-dimensional cell migration by zyxin and binding partners. Nat Commun. 2012;3:719.

  19. Levental KR, Yu H, Kass L, Lakins JN, Egeblad M, Erler JT, et al. Matrix crosslinking forces tumor progression by enhancing integrin signaling. Cell. 2009;139(5):891–906.

  20. Laubli H, Borsig L. Altered cell adhesion and glycosylation promote cancer immune suppression and metastasis. Front Immunol. 2019;10:2120.

  21. Cheon DJ, Tong Y, Sim MS, Dering J, Berel D, Cui X, et al. A collagen-remodeling gene signature regulated by TGF-beta signaling is associated with metastasis and poor survival in serous ovarian cancer. Clin Cancer Res. 2014;20(3):711–23.

  22. Wu M, Sun Q, Mo CH, Pang JS, Hou JY, Pang LL, et al. Prospective molecular mechanism of COL5A1 in breast cancer based on a microarray, RNA sequencing and immunohistochemistry. Oncol Rep. 2019;42(1):151–75.

  23. Sun H. Identification of key genes associated with gastric cancer based on DNA microarray data. Oncol Lett. 2016;11(1):525–30.

  24. Cierna Z, Mego M, Jurisica I, Machalekova K, Chovanec M, Miskovska V, et al. Fibrillin-1 (FBN-1) a new marker of germ cell neoplasia in situ. BMC Cancer. 2016;16:597.

  25. Yang D, Zhao D, Chen X. MiR-133b inhibits proliferation and invasion of gastric cancer cells by up-regulating FBN1 expression. Cancer Biomark. 2017;19(4):425–36.

  26. Chew A, Salama P, Robbshaw A, Klopcic B, Zeps N, Platell C, et al. SPARC, FOXP3, CD8 and CD45 correlation with disease recurrence and long-term disease-free survival in colorectal cancer. PLoS One. 2011;6(7):e22047.

  27. Wang Z, Hao B, Yang Y, Wang R, Li Y, Wu Q. Prognostic role of SPARC expression in gastric cancer: a meta-analysis. Arch Med Sci. 2014;10(5):863–9.

  28. Iozzo RV. Matrix proteoglycans: from molecular design to cellular function. Annu Rev Biochem. 1998;67:609–52.

  29. de Wit M, Carvalho B, Delis-van Diemen PM, van Alphen C, Belien JAM, Meijer GA, et al. Lumican and versican protein expression are associated with colorectal adenoma-to-carcinoma progression. PLoS One. 2017;12(5):e0174768.

  30. Coulson-Thomas VJ, Coulson-Thomas YM, Gesteira TF, Andrade de Paula CA, Carneiro CR, Ortiz V, et al. Lumican expression, localization and antitumor activity in prostate cancer. Exp Cell Res. 2013;319(7):967–81.

  31. Yang CT, Li JM, Chu WK, Chow SE. Downregulation of lumican accelerates lung cancer cell invasion through p120 catenin. Cell Death Dis. 2018;9(4):414.

  32. Yang ZX, Lu CY, Yang YL, Dou KF, Tao KS. Lumican expression in pancreatic ductal adenocarcinoma. Hepatogastroenterology. 2013;60(122):349–53.

  33. Mao W, Luo M, Huang X, Wang Q, Fan J, Gao L, et al. Knockdown of lumican inhibits proliferation and migration of bladder cancer. Transl Oncol. 2019;12(8):1072–8.

  34. Troup S, Njue C, Kliewer EV, Parisien M, Roskelley C, Chakravarti S, et al. Reduced expression of the small leucine-rich proteoglycans, lumican, and decorin is associated with poor outcome in node-negative invasive breast cancer. Clin Cancer Res. 2003;9(1):207–14.



We thank LetPub for its linguistic assistance during the preparation of this manuscript.


The present study was supported by the University Natural Science Research Project of Anhui province (CN) (grant number KJ2020A0584).

Author information




SW conceived the study, extracted and analyzed the data, and drafted the manuscript. ST collected the data and helped to draft the manuscript. YL extracted the data. YS analyzed the data. ML participated in the study design. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Shuanhu Wang.

Ethics declarations

Ethics approval and consent to participate

This study was conducted in accordance with the World Medical Association’s Declaration of Helsinki and approved by the Ethical Committee of Bengbu Medical College (2021-204). Because the study did not include any personally identifiable information, the requirement for informed consent was waived.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Wang, S., Tao, S., Liu, Y. et al. Identification of significant genes associated with prognosis of gastric cancer by bioinformatics analysis. J Egypt Natl Canc Inst 34, 55 (2022).
