In this study, we performed a bioinformatic analysis of microarray data to identify BC-related biomarkers with the potential to discriminate between the protein patterns of BC and normal tissue. Four datasets were used to categorize DEGs and validate the results. Because the number of common probes was excessively high, we opted to select the first 100 upregulated and downregulated probes in tumor samples. With this approach, the high number of significant probesets (2888), which is an indication that the probability of false positives may be low, was narrowed to 68 common significant probesets. The fold-change values of the shared probesets were comparable and behaved in the same manner in both datasets; that is, the common proteins were upregulated or downregulated consistently in both datasets. This indicates that the effect of these common probesets on tumorigenesis can be detected independently in different experimental settings, and that they therefore possess strong differentiating properties between cancer and healthy tissue.
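The selection step described above can be illustrated with a minimal sketch. It assumes each dataset is available as a mapping from probe ID to log fold change (tumor versus normal) and that the cutoff is applied separately to each direction; the function names, the toy values, and the cutoff of 2 used in the example are hypothetical and are not taken from the analysis itself.

```python
def top_probes(fold_changes, n=100):
    """Return (up, down): the n probes with the highest and lowest fold change."""
    ranked = sorted(fold_changes, key=fold_changes.get)  # ascending by fold change
    return set(ranked[-n:]), set(ranked[:n])


def common_concordant(fc_a, fc_b, n=100):
    """Keep probes selected in both datasets with the same direction of change."""
    up_a, down_a = top_probes(fc_a, n)
    up_b, down_b = top_probes(fc_b, n)
    return {"up": up_a & up_b, "down": down_a & down_b}


# Toy log fold changes for two hypothetical datasets (illustrative only).
dataset_a = {"p1": 3.1, "p2": 2.4, "p3": 0.1, "p4": -1.9, "p5": -2.7}
dataset_b = {"p1": 2.8, "p2": 0.2, "p3": 2.2, "p4": -2.5, "p5": -2.0}

shared = common_concordant(dataset_a, dataset_b, n=2)
print(sorted(shared["up"]), sorted(shared["down"]))
```

Intersecting the upregulated and downregulated sets separately is what enforces the direction check: a probe that is upregulated in one dataset and downregulated in the other is dropped.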
GEO and STRING analysis
The GO enrichment analysis conducted with the 68 common probes demonstrates that most of these genes have functions related to cell division and mitosis; additionally, most of them encode proteins found in “cell cycle-related pathways.” These included “regulation of cell cycle checkpoints” and “separation of sister chromatids,” which are key pathways of cellular and nuclear division and are proposed to lead to cancer development [20]. The lack of commonly downregulated genes in BC samples indicates that these genes may represent false-positive findings. Nevertheless, GEO data and STRING analysis results suggest that BC development may be triggered by the acceleration of biological mechanisms and pathways rather than by their inhibition. This could inform the selection of drugs and treatment approaches that suppress the accelerated pathways.
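For readers unfamiliar with how enrichment of a term such as a cell cycle pathway is typically scored, the following is a minimal sketch of a one-sided hypergeometric (over-representation) test. The enrichment tool and statistical settings used for the analysis are not specified here, so this is an assumed, generic formulation; the function name and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(background, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `background` genes,
    of which `annotated` carry the term of interest (hypergeometric tail)."""
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, selected)


# Hypothetical counts: 20000 background genes, 600 annotated to the term,
# 68 genes in the list, 20 of which carry the annotation.
p_value = enrichment_p_value(20000, 600, 68, 20)
print(p_value)
```

A small p-value means that the overlap is larger than expected by chance; in practice, such p-values are corrected for the number of terms tested.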
Specifically, the interaction between eight DEGs (CCNB1, CCNB2, CDC20, CDCA5, CENPF, KNTC1, BUB1B, and AURKB) was particularly apparent. CCNB1 has previously been reported in several studies on BC [21,22,23]. It encodes a key regulatory protein involved in the replication of nuclear matter by forming a complex with CDC2 (cell division cycle 2), also known as p34. Together, they control mitosis at the G2/M checkpoint [24]. Although CDC2 was not detected in this study, other regulatory genes responsible for the cell division cycle were enriched in the GEO and STRING analysis. This likely reflects an imbalance in mitotic activity in BC samples. The CCNB1 gene is located on chromosome 5q13.2 and is composed of 9 exons; it encodes a 2029 bp mRNA, and its longest transcript encodes a protein of 433 aa. Some SNPs in this gene have previously been reported in relation to cancer development; however, these SNPs have not yet been assessed in BC.
The spindle checkpoint kinase BUB1B (BUB1 mitotic checkpoint serine/threonine kinase beta) is involved in the mitotic checkpoint and has recently been reported as a hub candidate gene for BC [24]. It is thought to localize to the kinetochore and to delay the anaphase-promoting complex/cyclosome, enabling chromosomes to segregate properly [25]. Located on chromosome 15q15.1, it encodes an mRNA of 3669 bp and a protein of 1050 aa, and its impaired activity has been implicated in the formation of breast cancer [26].
Furthermore, similar oncogenes, including CCNB1, CDC20, and AURKA from the aurora kinase group, were recently reported as hub genes by Zhang et al. [27]. Our study also supports their GO and KEGG enrichment analyses of DEGs, which imply the importance of mitotic checkpoints.
Validation studies
Two validation studies were conducted. The first showed clear differentiation between tumor and normal samples. Almost none of the genes upregulated in tumor samples had high expression in normal samples. However, several genes upregulated in normal samples were also upregulated in some of the BC samples. Thus, genes with increased expression in BC showed greater discriminatory power. This initial validation clustering yielded four distinct groups. Two BC samples (ST1) clustered with healthy tissues (N) and did not express the genes upregulated in BC, but rather expressed genes that were downregulated in tumor samples. The ST2 cluster, containing 4 tumor samples, showed increased expression of the genes that were upregulated in tumors, behaving as expected of tumor samples. The final cluster (ST3) included 3 BC samples and expressed the genes upregulated in tumor samples at slightly lower levels. This indicates that, based on the expression of the 57 common genes, BC samples fall into subgroups that are either comparable to or distinct from healthy tissue samples. Because the datasets lacked information on tumor subtypes, a clustering of probes based on BC subtypes could not be implemented in this study. Nevertheless, provided that a dataset includes BC subtype information, this can be accomplished and might offer alternative approaches and specific genes for personalized treatments.
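The grouping of samples by expression profile can be illustrated with a minimal sketch of agglomerative clustering. The distance measure and linkage method used for the validation are not stated here, so Euclidean distance and average linkage are assumptions; the sample names and expression values are hypothetical toy data in which the first two genes stand for genes upregulated in tumors and the last two for genes downregulated in tumors.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[s], profiles[t]) for s in c1 for t in c2]
    return sum(dists) / len(dists)


def cluster_samples(profiles, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy standardized expression values (illustrative only).
profiles = {
    "N1": [-1.0, -0.9, 1.0, 1.1],
    "N2": [-1.1, -1.0, 0.9, 1.0],
    "T_a": [-0.8, -1.0, 0.8, 0.9],   # tumor sample with a normal-like profile
    "T_b": [1.2, 1.0, -1.0, -0.9],
    "T_c": [1.0, 1.1, -1.1, -1.0],
    "T_d": [0.3, 0.4, -0.2, -0.3],   # tumor sample with an intermediate profile
}

groups = cluster_samples(profiles, k=3)
print(groups)
```

In this toy example the normal-like tumor sample joins the normal samples, which mirrors the behaviour described for the ST1 samples, while the intermediate sample remains on its own.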
The second validation dataset, although acquired from a different platform, was also capable of distinguishing between subtypes based on the 52 genes it shared with the 68, but to a lesser extent. This dataset was composed of 86 samples with 4 normal samples and separated the BC samples into 3 subtypes. The distinction was not as apparent as in the first validation because multiple probesets corresponded to the same gene. Moreover, some probesets made no contribution to the separation; nevertheless, a separation between BC subtypes was still noticeable.
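One common way to handle several probesets that map to the same gene is to collapse them to a single per-gene profile before clustering. This step was not part of the analysis reported here; the sketch below is only an illustration of the idea, and the averaging rule, function name, probeset IDs, and values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probesets(expression, probe_to_gene):
    """Average the profiles of all probesets that map to the same gene.

    expression: probeset ID -> list of expression values (one per sample)
    probe_to_gene: probeset ID -> gene symbol (unmapped probesets are skipped)
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    # zip(*rows) iterates sample by sample across the probesets of one gene.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


# Toy data: two probesets for one gene, one probeset for another.
expression = {"ps_1": [2.0, 4.0], "ps_2": [4.0, 6.0], "ps_3": [1.0, 3.0]}
probe_to_gene = {"ps_1": "CCNB1", "ps_2": "CCNB1", "ps_3": "AURKB"}

per_gene = collapse_probesets(expression, probe_to_gene)
print(per_gene)
```

Other collapsing rules, such as keeping the probeset with the highest variance, are equally plausible; averaging is used here only because it is the simplest to show.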
Thus, based on the validation studies, the 68 featured proteins were reproduced on two different platforms and distinguished not only BC from normal tissue but also subtypes of BC from one another.
Protein database
Protein database investigations highlighted four of these 68 proteins (CKAP2L, AURKB, APIP, LGALS3), whose protein levels were found to be consistent with the statistical results presented herein. CKAP2L (cytoskeleton-associated protein 2-like) is involved in spindle organization and cell cycle progression from prometaphase to telophase [28]. AURKB (aurora kinase B), a member of the aurora kinase family, is thought to play a role in the control of chromosomal alignment and segregation during mitosis by interacting with microtubules. A group of small-molecule inhibitors of AURKB, with ongoing or completed Phase I and II trials, has recently been proposed as potential drugs for cancer treatment [29]. Given that the expression levels of CKAP2L and AURKB were significantly increased in BC samples, this could be a promising approach to investigate. APIP (APAF-1 interacting protein) is found mostly in the cytosol and interacts with Apaf-1 (apoptotic protease activating factor-1), which plays a central role in initiating the formation of the apoptosome complex and the downstream intrinsic apoptosis pathway [30]. APIP is thought to block the intrinsic mitochondrial apoptosis pathway via two routes, one Apaf-1-dependent [31] and the other Apaf-1-independent [32]. The downregulation of APIP expression in BC tissue samples could, however, be indicative of the absence of a regulatory protein. Similarly, mRNA and protein expression of APIP was reported to be downregulated in non-small cell lung carcinoma [33]. LGALS3 (galectin 3), a member of the galectin family of carbohydrate-binding proteins, has previously been reported to induce apoptosis in human breast cancer cell lines through TRAIL signals that depended on increased PTEN activation and a decreased PI3K/AKT survival pathway [34]. Downregulation of LGALS3 in BC tissue most likely interferes with TRAIL-induced apoptotic pathways. However, these findings contradict Oka et al. [35], who reported that overexpression of LGALS3 protects J82 human bladder cancer cells against TRAIL-induced apoptosis. This contradiction can most likely be explained by the diverse apoptotic molecular pathways in different cell lines [34, 35].
BC is one of the most studied cancer types in bioinformatic analyses owing to its prevalence [27, 36, 37]. However, because different datasets and constantly improving statistical approaches are used, diverse biomarkers are continually being predicted for BC [27, 36, 37], which underlines the multifactorial nature of BC and increases the value of bioinformatic studies.