Using the Precision Lasso for gene selection in diffuse large B cell lymphoma cancer

Pourhamidi, Rashed; Moslemi, Azam

doi:10.1186/s43046-023-00172-5

Research
Open access
Published: 26 June 2023


Journal of the Egyptian National Cancer Institute volume 35, Article number: 19 (2023)


Abstract

Background

Gene selection from gene expression profiles is an important tool for diagnosing and predicting cancers. The aim of this study is to fit a Precision Lasso regression model to gene expression data from patients with diffuse large B cell lymphoma (DLBCL) and to identify marker genes related to DLBCL.

Methods

In the present case–control study, the dataset included the expression levels of 180 genes in 14 healthy individuals and 17 DLBCL patients. Marker genes were selected by fitting Ridge, Lasso, Elastic Net, and Precision Lasso regression models.

Results

Based on our findings, the Precision Lasso, Ridge, Elastic Net, and Lasso models, in that order, selected the most marker genes. In addition, the top 20 genes selected by each model were compared with the results of clinical studies. The Precision Lasso and Ridge models, in that order, selected the most genes in common with the clinical results.

Conclusions

The performance of the Precision Lasso model in selecting related genes could be considered more acceptable than that of the other models.

Introduction

Lymphomas are a group of malignant tumors that involve lymphocytes and the immune system. These diseases often originate in the lymph nodes but may first be diagnosed in extranodal tissues [1]. Lymphoma is divided into two types: Hodgkin’s and non-Hodgkin’s. Non-Hodgkin’s lymphoma (NHL) is a group of lymphoid-derived malignancies classified according to their clinical and biological characteristics. NHL is one of the most common blood cancers: it is the eighth most common cancer in men and the eleventh most common in women [2]. NHL has several subgroups, including diffuse large B cell lymphoma (DLBCL), Burkitt lymphoma (BL), mantle cell lymphoma (MCL), gastric mucosa-associated lymphoid tissue (MALT) lymphoma, follicular lymphoma (FL), and others [3].

Diffuse large B cell lymphoma is the most common subtype of NHL, accounting for 30% to 40% of all newly diagnosed cases [4]. NHL is the seventh most common cancer in the USA, with 19.6 new cases per 100,000 people between 2012 and 2016. The 5-year relative survival rate is 63% for DLBCL and 88% for FL. In recent years, many studies have shown that genetic factors are closely related to DLBCL [5, 6].

Microarray technology has advanced rapidly in biotechnology. In DNA microarrays, molecular hybridization tests that rely on optical (light-based) detection are now feasible at the nanotechnology scale. The two main uses of DNA chips are transcriptome studies and the detection of genetic mutations. In humans, transcriptome analysis is used to study differences in gene expression levels between normal cells and tumor cells [7].

Although diagnostic and therapeutic technologies have advanced, DLBCL is not yet predictable. Research has shown that microarray technology has the potential to diagnose and predict cancer. In addition, microarray expression profiles can differentiate cancers based on cellular nature and growth stage. Microarrays therefore play an important role in the discovery of cancer-related genomic abnormalities [3].

Technologies that measure gene expression levels produce high-dimensional data. Because of the large number of variables, classical hypothesis tests are not appropriate, since they test each variable independently. Linear regression models, which consider all variables simultaneously, could therefore be used for microarray data. However, when the number of variables exceeds the number of observations, the parameters of an ordinary linear regression model cannot be estimated, so special methods are needed that either reduce the number of variables or modify the minimization of the sum of squared errors [8].
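A minimal NumPy sketch of why ordinary least squares breaks down when the number of variables exceeds the number of observations. It uses simulated data with the study's dimensions (31 individuals, 180 genes); the values are random and are not the GSE117063 data.

```python
import numpy as np

# Simulated design with the study's dimensions; random values, not the real data.
n_samples, n_genes = 31, 180
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_genes))

# The OLS estimate (X^T X)^{-1} X^T y needs this 180 x 180 Gram matrix to be invertible.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)

# rank(X^T X) = rank(X) <= min(n, p) = 31, so the Gram matrix is singular.
print(f"X^T X is {gram.shape[0]} x {gram.shape[1]} but has rank {rank}")
```

Since at most 31 of the 180 directions are identified by the data, infinitely many coefficient vectors fit equally well, which is the gap the penalty methods below are designed to close.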

In 1970, Hoerl and Kennard introduced the Ridge regression model by adding a “penalty” term to the ordinary least squares criterion. The penalty constrains the parameters of the regression model by bounding their sum of squares while the sum of squared errors is minimized. As a result, the Ridge estimator can be computed even for high-dimensional data [9]. In 1996, Tibshirani introduced the Lasso regression model, which combines variable selection (dimension reduction) with minimization of the sum of squared errors to estimate the parameters. In this model, the number of parameters is controlled by a “penalty” on the sum of the absolute values of the regression coefficients. Although it solves the problem of parameter estimation in multiple regression, the Lasso does not give good results under the following two conditions:

(1) Two explanatory variables are highly correlated, so that they have very similar effects on the response variable.
(2) The explanatory variables are collinear.

Under these conditions, the Lasso selects one of the correlated variables essentially at random, which can lead to wrong results [8]. In 2005, Zou and Hastie proposed the Elastic Net regression model.

The Elastic Net model combines the Lasso and Ridge penalties by adding a second-degree (squared) penalty term to the Lasso penalty. This model therefore performs both variable selection and Ridge-type shrinkage [10]. In the following years, many methods were introduced to solve these two problems; a method that solves both of them was proposed by Wang et al. in 2018 under the name Precision Lasso regression model [8].

The present study uses gene expression data from DLBCL patients obtained with microarray technology. In this type of high-dimensional data, high correlation between variables is also a problem. This study aims to apply the Precision Lasso model to microarray data from DLBCL patients to find gene markers related to DLBCL, and to compare the Precision Lasso with other penalty models. Better diagnosis and prediction of DLBCL could, in turn, give patients access to more effective treatment.

Methods

The methods used in this research were carried out in accordance with the relevant guidelines. The steps of this research are presented in Fig. 1. Overall, the method comprises dataset collection, gene selection by regression models, and model evaluation, which are described in the following sections.

Dataset collection

In the present case–control study, DLBCL data comprising the expression levels of 180 genes in 31 individuals were used. The data are available at https://www.ncbi.nlm.nih.gov/. The dataset includes blood samples from 31 donors: 14 healthy individuals and 17 DLBCL patients. A notable feature of the dataset is that, at the time of blood donation, the donors had no symptoms of the disease and were healthy enough to donate blood. According to Jørgensen et al., this is the first microarray expression profiling study of samples taken from apparently healthy individuals several years before the diagnosis of DLBCL [11].

Gene selection

Given the characteristics of the study dataset, the most appropriate regression models were fitted to these data: the Ridge, the Lasso, the Elastic Net, and the Precision Lasso.

Shrinkage regression models

When the number of variables p is much greater than the number of observations n (p ≫ n), the ordinary least squares method cannot be used to estimate linear regression coefficients. Another issue is determining how many independent variables should be included in the model: too many variables lead to over-fitting, whereas too few may lead to under-fitting.

To solve the problem of estimating parameters in high-dimensional data, many methods have been proposed over the last two decades based on dimension reduction and on modified least squares estimators. Four penalty methods are described below, together with their advantages and disadvantages.

Ridge regression model

The ordinary least squares method estimates the regression parameters by minimizing the sum of squared errors and gives unbiased estimators. However, these estimators are not guaranteed to have small variance; when explanatory variables are highly correlated, their variance can be very large. Ridge regression is useful in such situations. The Ridge estimator is biased, but it has a smaller variance than the ordinary least squares estimator. Ridge regression adds the constraint ∥β∥² ≤ C² to the ordinary least squares problem, so that the sum of squared errors is minimized while the sum of the squared regression coefficients is bounded.

One feature of the Ridge regression model is that its penalty shrinks the coefficients toward zero but does not set any of them exactly to zero; the coefficients approach zero only as the tuning parameter λ becomes very large. This feature makes a model with a large number of variables difficult to interpret [9].
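The NumPy sketch below illustrates both points on simulated data (not the study data). It uses the penalized (Lagrangian) form of the constraint, minimizing ∥y − xβ∥² + λ∥β∥², whose solution (xᵀx + λI)⁻¹xᵀy exists for any λ > 0 even when p > n. The function name ridge_coefficients is illustrative, and the intercept is omitted by assuming centred data.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y; assumes centred X and y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(31, 180))            # simulated p > n design
beta_true = np.zeros(180)
beta_true[:5] = 2.0                        # only 5 simulated genes truly matter
y = X @ beta_true + rng.normal(scale=0.5, size=31)
X -= X.mean(axis=0)                        # centre so the intercept can be dropped
y -= y.mean()

for lam in (0.1, 10.0, 1000.0):
    b = ridge_coefficients(X, y, lam)
    # All 180 coefficients stay nonzero; only their overall size shrinks with lambda.
    print(f"lambda={lam:>7}: nonzero={np.count_nonzero(b)}, ||beta||={np.linalg.norm(b):.3f}")
```

Because xᵀx + λI is positive definite for λ > 0, the linear system always has a unique solution, which is what makes Ridge usable where ordinary least squares is not.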

Lasso regression model

The Lasso regression model provides a suitable method for modeling the response variable with the smallest appropriate number of explanatory variables. By separating the more relevant variables from the rest, it yields a simpler model. The name Lasso is an acronym for “least absolute shrinkage and selection operator”, and it also evokes a lasso, a rope used to snare animals. In 1996, Robert Tibshirani controlled the number of parameters by applying a penalty to the sum of the absolute values of the regression coefficients. The Lasso estimates then minimize the following penalized sum of squared errors:

$$\sum\nolimits_{i=1}^{N} \left( y_{i} - \beta_{0} - \sum\nolimits^{p}_{j=1} \beta_{j} x_{ij}\right)^{2} + \lambda \sum\nolimits_{j} \left| \beta_{j} \right|$$

(1)

λ is a tuning parameter: if its value is zero, the model reduces to ordinary linear regression and all variables are included. As its value increases, the number of explanatory variables in the model decreases. One of the main goals of the Lasso is to improve the interpretability of the model by identifying a smaller subset of explanatory variables with the strongest effects [7].
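To make the role of λ concrete, here is a minimal NumPy sketch that minimizes Eq. (1) by cyclic coordinate descent with soft-thresholding, the approach described by Friedman et al. [12]. It assumes centred data so that β₀ can be dropped, uses simulated data, and the names lasso_cd, soft_threshold, and lasso_objective are illustrative rather than a library API. Because Eq. (1) has no factor 1/2 on the squared-error term, the threshold is λ/2.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise sum (y - X beta)^2 + lam * sum |beta_j| by cyclic coordinate descent.
    Assumes the columns of X and y are centred, so beta_0 is omitted."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.copy()                       # residual y - X beta for beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]     # add back gene j's current contribution
            rho = X[:, j] @ resid
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def lasso_objective(X, y, beta, lam):
    return float(np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta)))

rng = np.random.default_rng(2)
X = rng.normal(size=(31, 180))             # simulated p > n design
X -= X.mean(axis=0)
beta_true = np.zeros(180)
beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(scale=0.5, size=31)
y -= y.mean()

# Smallest lambda for which beta = 0 satisfies the optimality conditions.
lam_max = 2.0 * max(abs(X[:, j] @ y) for j in range(X.shape[1]))
for lam in (0.05 * lam_max, 0.3 * lam_max, lam_max):
    b = lasso_cd(X, y, lam)
    print(f"lambda={lam:8.2f}: {np.count_nonzero(b)} of 180 genes selected")
```

Unlike the Ridge update, the soft-threshold step returns exactly zero for weak genes, which is how the Lasso performs variable selection.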

Elastic Net regression model

The Elastic Net regression model was introduced by Zou and Hastie. “Elastic” refers to flexibility. The Elastic Net model combines the Lasso and Ridge models and uses a second-degree (squared) penalty. It is useful when the Lasso would select only one variable from a group of correlated variables and ignore the others, so it is well suited to datasets with high correlation between variables [10].
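A small NumPy sketch of this grouping behaviour, using a made-up design in which one gene's expression is duplicated exactly. The penalty λ₁Σ|βⱼ| + λ₂Σβⱼ² added to the sum of squared errors is an assumption (one common parametrisation of the Elastic Net), and elastic_net_cd is an illustrative name; setting λ₂ = 0 recovers the Lasso of Eq. (1). With duplicated columns the Lasso solution is not unique and coordinate descent puts all the weight on one copy, whereas the strictly convex Elastic Net penalty forces the two copies to share it equally.

```python
import numpy as np

def elastic_net_cd(X, y, lam1, lam2, n_sweeps=500):
    """Minimise sum (y - X b)^2 + lam1 * sum |b_j| + lam2 * sum b_j^2
    by cyclic coordinate descent (centred data, no intercept)."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]
            rho = X[:, j] @ resid
            # Closed-form coordinate minimiser: soft-threshold, then ridge-type shrinkage.
            beta[j] = np.sign(rho) * max(abs(rho) - lam1 / 2.0, 0.0) / (col_sq[j] + lam2)
            resid -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(3)
gene_a = rng.normal(size=40)
other = rng.normal(size=40)
X = np.column_stack([gene_a, gene_a, other])   # columns 0 and 1 are identical
y = 2.0 * gene_a + 0.5 * other + rng.normal(scale=0.1, size=40)
X -= X.mean(axis=0)
y -= y.mean()

lasso_beta = elastic_net_cd(X, y, lam1=1.0, lam2=0.0)
enet_beta = elastic_net_cd(X, y, lam1=1.0, lam2=5.0)
print("Lasso      :", np.round(lasso_beta, 3))   # weight concentrated on one copy
print("Elastic Net:", np.round(enet_beta, 3))    # weight shared between the copies
```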

Precision Lasso regression model

Wang et al., who introduced the regularized regression model called Precision Lasso, demonstrated the instability and inconsistency of the Ridge, Lasso, and Elastic Net models, primarily by using the irrepresentable condition, which is as follows:

$$\left|{\left({x}^{\left(2\right)}\right)}^{T}{x}^{\left(1\right)}{\left({\left({x}^{\left(1\right)}\right)}^{T}{x}^{\left(1\right)}\right)}^{-1}sign\left({\beta }^{\left(1\right)}\right)\right|<1-\eta$$

(2)

In this condition, x⁽¹⁾ is the set of active variables, x⁽²⁾ is the set of inactive variables, β⁽¹⁾ contains the coefficients of the active variables, and η is a vector of positive constants; the inequality holds elementwise.
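The condition can be checked numerically for a given design. The NumPy sketch below, with a hypothetical helper irrepresentable_lhs and a toy three-variable design, computes the left-hand side of Eq. (2); if any entry is 1 or more, the condition fails for every positive η. In the toy example, the inactive variable is correlated 2/3 with each of the two active variables and violates the condition.

```python
import numpy as np

def irrepresentable_lhs(X, active, signs):
    """Elementwise |x2^T x1 (x1^T x1)^{-1} sign(beta1)| for the inactive columns."""
    active = np.asarray(active)
    inactive = np.setdiff1d(np.arange(X.shape[1]), active)
    x1, x2 = X[:, active], X[:, inactive]
    return np.abs(x2.T @ x1 @ np.linalg.solve(x1.T @ x1, np.asarray(signs, dtype=float)))

e1, e2, e3 = np.eye(3)                    # orthonormal basis vectors of R^3
signs = [1.0, 1.0]                        # sign(beta1) for the two active variables

# Inactive variable orthogonal to the active ones: condition holds.
X_ok = np.column_stack([e1, e2, e3])
# Unit-norm inactive variable correlated 2/3 with each active variable: condition fails.
X_bad = np.column_stack([e1, e2, (2 * e1 + 2 * e2 + e3) / 3])

print(irrepresentable_lhs(X_ok, [0, 1], signs))    # [0.]
print(irrepresentable_lhs(X_bad, [0, 1], signs))   # [1.33333333]
```

The second design shows how an irrelevant gene that resembles a combination of relevant genes can mislead the Lasso, which is the situation the Precision Lasso is designed for.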

The instability of the Lasso refers to its inability to detect the effects of correlated explanatory variables. Because correlated explanatory variables cannot be analyzed separately with classical statistics, a simple remedy is to assign similar weights to correlated variables. The Trace Lasso regression model follows this idea by using a data-dependent set of weights in which correlated variables are weighted together. Inconsistency is another disadvantage of the Lasso, and it arises from collinearity between variables.

To address both instability and inconsistency, Wang et al. were the first to propose a tuning parameter γ that combines the two solutions. For example, γ = 1 is used when there is instability, γ = 0 when there is inconsistency, and γ = 1/2 when both are present. The strategy extends easily to other loss functions ℓ. For example, when the response variable is binary, replacing ℓ with the negative log-likelihood turns the Precision Lasso model into a penalized logistic regression model. This formulation, shown below, is applied to case–control data such as those in the present study.

$$\arg\min_{\beta}\ \ell\left(x,y;\beta\right)+\lambda {\left\Vert \left[\gamma {\left({x}^{T}x\right)}^{\frac{1}{2}}+\left(1-\gamma \right){\left({x}^{T}x+\mu I\right)}^{-\frac{1}{2}}\right]\operatorname{diag}\left(\beta \right)\right\Vert }_{*}$$

(3)
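As a hedged sketch of Eq. (3), the NumPy code below evaluates the Precision Lasso objective for a given β. It assumes that the norm in Eq. (3) is the trace (nuclear) norm used by the Trace Lasso, that ℓ is the logistic negative log-likelihood for a 0/1 response, and that μ > 0; the function names are illustrative and this is not the authors' implementation. It only evaluates the objective; fitting β would additionally require an optimizer. Two sanity checks follow from the formula: for an orthonormal design with γ = 1 the penalty reduces to the Lasso's sum of absolute values, and for perfectly duplicated unit-norm columns with γ = 1 it reduces to the Euclidean norm of β, which is how correlated variables receive similar weights.

```python
import numpy as np

def sym_power(A, power):
    """A**power for a symmetric positive semidefinite matrix A (via eigendecomposition)."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)              # drop tiny negative round-off eigenvalues
    return (V * w ** power) @ V.T

def precision_lasso_penalty(X, beta, gamma, mu):
    """Trace norm of [gamma (X^T X)^{1/2} + (1 - gamma)(X^T X + mu I)^{-1/2}] diag(beta)."""
    G = X.T @ X
    p = G.shape[0]
    M = gamma * sym_power(G, 0.5) + (1.0 - gamma) * sym_power(G + mu * np.eye(p), -0.5)
    return np.linalg.norm(M @ np.diag(beta), ord="nuc")

def logistic_nll(X, y, beta):
    """Negative log-likelihood of logistic regression for y in {0, 1}, no intercept."""
    z = X @ beta
    return float(np.sum(np.logaddexp(0.0, z) - y * z))   # log(1 + e^z) - y z

def precision_lasso_objective(X, y, beta, lam, gamma, mu):
    return logistic_nll(X, y, beta) + lam * precision_lasso_penalty(X, beta, gamma, mu)

# Check 1: orthonormal design, gamma = 1 gives sum |beta_j| (about 4.0 here).
print(precision_lasso_penalty(np.eye(4), np.array([1.5, -2.0, 0.0, 0.5]), gamma=1.0, mu=0.1))

# Check 2: three identical unit-norm columns, gamma = 1 gives ||beta||_2 (about 3.0 here).
x = np.array([3.0, 4.0]) / 5.0
print(precision_lasso_penalty(np.column_stack([x, x, x]), np.array([1.0, 2.0, 2.0]), gamma=1.0, mu=0.1))
```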

In the present study, because of the high correlation in genetic data, we used the four penalty methods above to find cancer-related gene markers [8].

Model evaluation

We evaluated the shrinkage regression models in two steps. In the first step, genes whose expression is associated with DLBCL were identified from previous studies, and the genes selected by the models were compared with them. In the second step, the holdout method was applied with 10 folds, and the goodness of fit of the regression models was compared using the area under the ROC curve (AUC) and the average precision score (AP-Score) [12].
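For reference, both evaluation metrics can be computed directly from predicted scores. The plain-Python sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), and the AP-Score with the common step-wise definition AP = Σₙ (Rₙ − Rₙ₋₁) Pₙ over decreasing score thresholds, which is the definition used by scikit-learn's average_precision_score. The labels and scores are made up for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise AP = sum_n (R_n - R_{n-1}) * P_n over decreasing score thresholds."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(y_true)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # All samples with the same score enter together at one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Hypothetical labels (1 = DLBCL, 0 = healthy) and model scores.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(roc_auc(y_true, scores))            # 0.6666666666666666
print(average_precision(y_true, scores))  # about 0.8056 (= 29/36)
```

The AUC depends only on how cases and controls are ranked against each other, while the AP-Score emphasizes how precisely the cases are ranked at the top, so the two can order models differently.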

Analysis of gene expression data was performed using R 3.6.2 and Python 2.7 software.

Results

This study applied four penalty regression models (Ridge, Lasso, Elastic Net, and Precision Lasso) to select the best genetic markers from the DLBCL gene expression dataset. The dataset consists of the expression levels of 180 genes in 31 individuals: 17 DLBCL patients and 14 healthy people. It poses two challenges: the very high ratio of the number of variables to the number of individuals, and high correlation between genes. Selecting the most informative genes should therefore improve the prediction of DLBCL. Each of the four models was fitted to the gene expression dataset; for each model, up to 20 genes with the largest coefficients were selected and compared with DLBCL-related genes reported in clinical studies.
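The selection step described above can be sketched in a few lines of Python. The sketch assumes that "largest coefficients" means largest absolute values and that genes with a zero coefficient are never counted, which is why a sparse model such as the Lasso may yield fewer than 20 genes; the coefficient values, gene names, and clinical list are hypothetical.

```python
import numpy as np

def top_genes(coef, gene_names, k=20):
    """Up to k gene names ranked by |coefficient|, skipping zero coefficients."""
    coef = np.asarray(coef, dtype=float)
    order = np.argsort(-np.abs(coef), kind="stable")
    return [gene_names[i] for i in order[:k] if coef[i] != 0.0]

def overlap_with_clinical(selected, clinical_genes):
    """Selected genes that also appear in a list reported by clinical studies."""
    clinical = set(clinical_genes)
    return [g for g in selected if g in clinical]

# Hypothetical fitted coefficients for five genes.
names = ["gene_A", "gene_B", "gene_C", "gene_D", "gene_E"]
coef = [0.0, 3.1, -5.2, 0.4, 0.0]
selected = top_genes(coef, names, k=4)
print(selected)                                               # ['gene_C', 'gene_B', 'gene_D']
print(overlap_with_clinical(selected, ["gene_B", "gene_X"]))  # ['gene_B']
```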

Table 1 shows the genes selected by the regression models that are highly expressed in DLBCL according to clinical studies.

Table 1 Genes selected by the regression models that are highly expressed in DLBCL according to clinical studies


Table 2 shows the genes selected by the regression models that have low expression in DLBCL according to clinical studies.

Table 2 Genes selected by the regression models that have low expression in DLBCL according to clinical studies


Based on the results in Tables 1 and 2, the Precision Lasso selected the largest share of DLBCL-related genes, followed by the Ridge, Elastic Net, and Lasso models.

Figure 2 shows the ROC curves of each model for the binary (logistic) outcome. The Ridge model had the lowest AUC, whereas the Precision Lasso, Elastic Net, and Lasso models had high AUC values.

Table 3 shows the goodness-of-fit indices, AUC and AP-Score, for the regression models under study based on the holdout method. The Precision Lasso model had the highest AP-Score, and the Lasso, Elastic Net, and Precision Lasso models had high AUC values.

Table 3 The goodness of fit test for regression models


Finally, the relationships between different types of cancer and the (up to) 20 genes with the largest coefficients in each of the four regression models were investigated. According to Table 4, the Precision Lasso regression model selected the most DLBCL-related genes.

Table 4 Relationship among the top 20 selected genes based on regression models and different types of cancer


Discussion

This study used a gene expression dataset from DLBCL patients. Four penalty regression models were applied: the Ridge, the Lasso, the Elastic Net, and the Precision Lasso.

These regression models are particularly suitable for datasets such as this one, in which the number of explanatory variables exceeds the number of observations and the variables are highly correlated. The models selected genes related to DLBCL, and the results were assessed by both statistical and clinical comparisons. Ranked by the number of selected genetic markers (with high and low expression levels) associated with DLBCL, the models were Precision Lasso, Ridge, Elastic Net, and Lasso. The top 20 genes selected by each model were also compared with the results of clinical studies. In this comparison, the Precision Lasso and Ridge models performed best, in that order, while the Elastic Net and Lasso models selected the fewest genetic markers associated with DLBCL.

Next, the AUC and AP-Score were used to compare the goodness of fit of the models, and ROC curves were plotted. The Ridge model had the lowest area under the ROC curve, and the Precision Lasso, Elastic Net, and Lasso models had the highest AUC values. The AP-Score was also lowest for the Ridge model and highest for the Precision Lasso. In terms of goodness of fit, the Precision Lasso, Lasso, and Elastic Net models were highly accurate.

The increasing importance of variable selection for high-dimensional data in various sciences has led to the introduction of new methods, and shrinkage methods have recently received much attention. In 2016, Padthe showed that, among penalty regression models, the Elastic Net regression model performed best [45]. In 2019, Farhadi et al. compared the Ridge, Lasso, and Elastic Net regression models on simulated data; the Ridge regression model had the worst performance and the Elastic Net regression model the best [46]. In 2018, Wang et al. compared different regression models on breast cancer gene expression data and showed that the Precision Lasso and Trace Lasso regression models were more accurate than other penalty regression models.

Conclusion

According to our results, the performance of the Precision Lasso regression model in selecting gene markers is more acceptable than that of the other models. We suggest that future studies also use other regression models, including the Adaptive Lasso and Trace Lasso. Many data mining methods, such as machine learning, could also be compared with these regression models. High-dimensional data have become so widespread in various sciences that data science has developed as an interdisciplinary field. This study was performed on a DLBCL microarray dataset with a very small sample size; we therefore recommend comparing these regression models on larger microarray datasets.

Availability of data and materials

The datasets analyzed during the current study are available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117063 [11].

Abbreviations

NHL: Non-Hodgkin’s lymphoma
DLBCL: Diffuse large B cell lymphoma
BL: Burkitt lymphoma
MCL: Mantle cell lymphoma
MALT: Gastric mucosa-associated lymphoid tissue
FL: Follicular lymphoma
AUC: Area under the ROC curve
AP-Score: Average precision score

References

Wood NK, Goaz PW. Differential diagnosis of oral and maxillofacial lesions. 5th ed. St. Louis: Mosby; 1997.
Fouladseresht H, et al. The incidence of non-Hodgkin lymphoma in Iran: a systematic review and meta-analysis. World Cancer Res J. 2019;6:e1261.
Shi Y, et al. Reproducibility of quantitative real-time PCR analysis in microRNA expression profiling and comparisons with microarray assays in diffuse large B-cell lymphoma patients. Int J Clin Exp Med. 2019;12(5):5776–84.
Sehn LH, Gascoyne RD. Diffuse large B-cell lymphoma: optimizing outcome in the context of clinical and biologic heterogeneity. J Am Soc Hematol. 2015;125(1):22–32.
Zhuang H, et al. MicroRNA-146a rs2910164 polymorphism and the risk of diffuse large B cell lymphoma in the Chinese Han population. Med Oncol. 2014;31:306.
Pophali PA, et al. Compliance with cancer screening and influenza vaccination guidelines in non-Hodgkin lymphoma survivors. J Cancer Surviv. 2020;14:316–21.
Serre J-L. Techniques and tools in molecular biology used in genetic diagnoses. In: Diagnostic techniques in genetics. 2006. p. 1–59. https://doi.org/10.1002/0470033363.ch1.
Wang H, et al. Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data. Biosci. 2018;35(7):1181–7.
Hoerl AE, Kennard RW. Ridge Regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
Zou H, Hastie T. Regularization and variable selection via the elastic net. J Royal Stat Soc: Series B Stat Methodol. 2005;67(2):301–20.
Jørgensen S, et al. The value of circulating microRNAs for early diagnosis of B-cell lymphoma: A case-control study on historical samples. Sci Rep. 2020;10:9637.
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1–22.
Wang T, et al. Comparison of GLP-1 analogues versus sitagliptin in the management of type 2 diabetes: systematic review and meta-analysis of head-to-head studies. PLoS ONE. 2014;9(8): e103798.
Chen W, et al. MicroRNA-361-3p suppresses tumor cell proliferation and metastasis by directly targeting SH2B1 in NSCLC. J Exp Clin Cancer Res. 2016;35(1):76.
Lawrie C, et al. MicroRNA expression in lymphocyte development and malignancy. Leukemia. 2008;22(7):1440–6.
Chen P, et al. Tumor suppressor microRNA-136-5p regulates the cellular function of renal cell carcinoma. Oncol Lett. 2018;15(4):5995–6002.
Lawrie CH, et al. Expression of microRNAs in diffuse large B cell lymphoma is associated with immunophenotype, survival and transformation from follicular lymphoma. J Cell Mol Med. 2009;13(7):1248–60.
Yu X, Li Z. New insights into MicroRNAs involves in drug resistance in diffuse large B cell lymphoma. Am J Transl Res. 2015;7(12):2536.
Ni H, et al. MicroRNAs in diffuse large B-cell lymphoma. Oncol Lett. 2016;11(2):1271–80.
Ge Y-Z, et al. MicroRNA expression profiles predict clinical phenotypes and prognosis in chromophobe renal cell carcinoma. Sci Rep. 2015;5(1):1–8.
Liu X, et al. Expression of MiR-296-5p in diffuse large B-Cell lymphoma and its influence on biological behavior of tumor cells. Zhongguo shi yan xue ye xue za zhi. 2018;26(2):437–42.
Zheng Y, et al. miR-376a suppresses proliferation and induces apoptosis in hepatocellular carcinoma. FEBS Lett. 2012;586(16):2396–403.
Yan Z, et al. Identification of hsa-miR-335 as a prognostic signature in gastric cancer. PLoS ONE. 2012;7:7.
Abdelfattah N, et al. MiR-584-5p potentiates vincristine and radiation response by inducing spindle defects and DNA damage in medulloblastoma. Nat Commun. 2018;9(1):1–19.
Degli Esposti D, et al. miR-500a-5p regulates oxidative stress response genes in breast cancer and predicts cancer survival. Sci Rep. 2017;7(1):1–10.
Pan J, et al. A two-miRNA signature (miR-33a-5p and miR-128-3p) in whole blood as potential biomarker for early diagnosis of lung cancer. Sci Rep. 2018;8(1):1–12.
Alencar AJ, et al. MicroRNAs are independent predictors of outcome in diffuse large B-cell lymphoma patients treated with R-CHOP. Clin Cancer Res. 2011;17(12):4125–35.
Roehle A, et al. MicroRNA signatures characterize diffuse large B-cell lymphomas and follicular lymphomas. Br J Haematol. 2008;142(5):732–44.
Lin C, et al. Oncogene miR-154-5p regulates cellular function and acts as a molecular marker with poor prognosis in renal cell carcinoma. Life Sci. 2018;209:481–9.
Hosseini SM, et al. Clinically significant dysregulation of hsa-miR-30d-5p and hsa-let-7b expression in patients with surgically resected non-small cell lung cancer. Avicenna J Med Biotechnol. 2018;10(2):98.
Sun C, et al. Hsa-miR-326 targets CCND1 and inhibits non-small cell lung cancer development. Oncotarget. 2016;7(7):8341.
Cui Y, et al. MicroRNA-30e inhibits proliferation and invasion of non-small cell lung cancer via targeting SOX9. Hum Cell. 2019;32(3):326–33.
Khare D, et al. Plasma microRNA profiling: Exploring better biomarkers for lymphoma surveillance. PLoS ONE. 2017;12:11.
Yang W, et al. MiR-652-3p is upregulated in non-small cell lung cancer and promotes proliferation and metastasis by directly targeting Lgl1. Oncotarget. 2016;7(13):16703.
Xue X, et al. miR-342-3p suppresses cell proliferation and migration by targeting AGR2 in non-small cell lung cancer. Cancer Lett. 2018;412:170–8.
Wang S-H, et al. microRNA-148a suppresses human gastric cancer cell metastasis by reversing epithelial-to-mesenchymal transition. Tumor Biology. 2013;34(6):3705–12.
Jørgensen S, et al. Plasma microrna predicts B-cell lymphoma up to 12 months before diagnosis–data from the Danish Blood Donor Study. DC: American Society of Hematology Washington; 2014.
Wang R, Chen X-F, Shu Y-Q. Prediction of non-small cell lung cancer metastasis-associated microRNAs using bioinformatics. Am J Cancer Res. 2015;5(1):32.
Arai T, et al. Regulation of spindle and kinetochore-associated protein 1 by antitumor miR-10a-5p in renal cell carcinoma. Cancer Sci. 2017;108(10):2088–101.
Assal RA, et al. A pleiotropic effect of the single clustered hepatic metastamiRs miR-96-5p and miR-182-5p on insulin-like growth factor II, insulin-like growth factor-1 receptor and insulin-like growth factor-binding protein-3 in hepatocellular carcinoma. Mol Med Rep. 2015;12(1):645–50.
Inomata M, et al. MicroRNA-17-92 down-regulates expression of distinct targets in different B-cell lymphoma subtypes. Blood. 2009;113(2):396–402.
Cheng L, et al. RAB23, regulated by miR-92b, promotes the progression of esophageal squamous cell carcinoma. Gene. 2016;595(1):31–8.
Huang Z, et al. MicroRNA-95 promotes cell proliferation and targets sorting Nexin 1 in human colorectal carcinoma. Cancer Res. 2011;71(7):2582–9.
Huang W-T, et al. Inhibition of ZEB1 by miR-200 characterizes Helicobacter pylori-positive gastric diffuse large B-cell lymphoma with a less aggressive behavior. Mod Pathol. 2014;27(8):1116–25.
Padthe KK. Feature grouping using weighted L1 norm for high-dimensional data. 2016.
Farhadi Z, Belaghi RA, Alma OG. Analysis of penalized regression methods in a simple linear model on the high-dimensional data. Am J Theor Appl Stat. 2019;8(5):185–92.


Acknowledgements

Not applicable.

Funding

The present article was extracted from an MSc thesis and was financially supported by Arak University of Medical Sciences (code IR.ARAKMU.REC.1398.189).

Author information

Authors and Affiliations

Non Communicable Diseases Research Center, Bam University of Medical Sciences, Bam, Iran
Rashed Pourhamidi
Department of Biostatistics, School of Medicine, Arak University of Medical Sciences, Sardasht, Basij Square, Arak, Markazi Province, Iran
Azam Moslemi


Contributions

AM and RP contributed to the research design. RP contributed to data acquisition and analysis. RP and AM contributed to writing the manuscript. AM also contributed to editing the manuscript and supervised the work. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Azam Moslemi.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the ethical committee of Arak University of Medical Sciences coded IR.ARAKMU.REC.1398.189.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Pourhamidi, R., Moslemi, A. Using the Precision Lasso for gene selection in diffuse large B cell lymphoma cancer. J Egypt Natl Canc Inst 35, 19 (2023). https://doi.org/10.1186/s43046-023-00172-5


Received: 13 April 2022
Accepted: 18 April 2023
Published: 26 June 2023
DOI: https://doi.org/10.1186/s43046-023-00172-5
