The Open Biotechnology Journal




ISSN: 1874-0707 ― Volume 14, 2020

Feature Selection Algorithm Based on Mutual Information and Lasso for Microarray Data



Wang Zhongxin1, Sun Gang1, 2, *, Zhang Jing3, Zhao Jia1
1 School of Computer and Information Engineering, Fuyang Teachers College, Fuyang, China
2 School of Computer and Information, Hefei University of Technology, Hefei, China
3 Information & Telecommunication Branch, State Grid Anhui Electric Power Company, Hefei, China

Abstract

With the development of microarray technology, gene expression experiments produce massive microarray data, which provides a new approach to the study of human disease. Because microarray data is high-dimensional, noisy and redundant, it is difficult to mine knowledge from it profoundly and accurately, and this also makes the selection of information genes very hard. Therefore, a new feature selection algorithm for high-dimensional microarray data is proposed in this paper, which involves two steps. In the first step, the mutual information value of every gene is calculated; according to this value, information genes are selected as the candidate genes subset and irrelevant genes are filtered out. In the second step, an improved method based on Lasso selects information genes from the candidate genes subset, aiming to remove the redundant genes. Experimental results show that the proposed algorithm selects fewer genes and has better classification ability, stable performance and strong generalization ability. It is an effective gene feature selection algorithm.

Keywords: Feature selection, Lasso, Microarray data, Mutual information.


Article Information


Identifiers and Pagination:

Year: 2016
Volume: 10
First Page: 278
Last Page: 286
Publisher Id: TOBIOTJ-10-278
DOI: 10.2174/1874070701610010278

Article History:

Received Date: 17/11/2015
Revision Received Date: 30/06/2016
Acceptance Date: 19/07/2016
Electronic publication date: 31/10/2016
Collection year: 2016

© Zhongxin et al.; Licensee Bentham Open

open-access license: This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/legalcode), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.


* Address correspondence to this author at the Hefei University of Technology, Tunxi Rd. No. 193, Baohe, Hefei, Anhui, China; Tel: 0551-62902373; Fax: +0551-62902373; E-mail: ahfysungang@163.com





1. INTRODUCTION

With the rapid development of microarray technology, massive microarray data is produced by experiments, which provides a new approach to studying disease. Microarray data is high-dimensional, but the number of disease-related genes is small. A disease-related gene is called an information gene. Feature selection selects a subset of relevant attributes from high-dimensional data; it has been widely used in pattern recognition, artificial intelligence, data mining, machine learning and other fields [1], and is also an important tool for analyzing high-dimensional microarray data [2].

The sorting method is a simple and common approach among information genes selection methods. It scores all genes, then selects information genes according to the scores. Commonly used scoring criteria at present are the mutual information method [3], the Signal-to-Noise Ratio method [4] and the Relief feature filtering algorithm [5]. However, genes selected by a sorting method are often strongly correlated, which results in redundant genes [6]. Too many redundant genes enlarge the genes subset, increase the computational burden, decrease the discriminative ability, and cause classification errors. In order to remove redundant genes, Tan et al. [7] proposed combining different sorting methods to select the genes subset; Wang Shulin et al. [8] proposed a breadth-first search algorithm; Chuang et al. [9] proposed a particle swarm algorithm combined with a genetic algorithm; Yu et al. [10] proposed a clustering algorithm that dynamically selects the genes subset; Benso et al. [11] proposed spectral clustering; Chen et al. [12] proposed a multiple-kernel support vector machine. These algorithms remove redundant genes to some extent, but they may suffer from over-fitting and poor generalization ability. Therefore, the design of a robust and efficient gene feature selection algorithm is a research focus of microarray data analysis.

The Lasso method is a dimension reduction method based on a linear regression model, which has attracted widespread attention in the field of feature selection because of its efficient performance [13]. If there is a strong correlation between genes, and they are mutually redundant, the Lasso method may mistakenly take them all as information genes. In addition, because the linear regression model constructed by the Lasso method is very strict, over-fitting is possible. Therefore, we propose a new feature selection algorithm, FSMIL (Feature Selection algorithm based on Mutual Information and Lasso), to filter irrelevant genes and remove redundant genes. The algorithm is implemented in two steps: Step 1 uses the mutual information method to filter irrelevant genes, and Step 2 uses an improved Lasso method to remove redundant genes. Experiments on 5 public microarray datasets verify the feasibility and effectiveness of the algorithm.

2. FRAMEWORK OF FSMIL ALGORITHM

In order to select useful information genes from high-dimensional microarray data, filter irrelevant genes and remove redundant genes, we propose FSMIL, a new feature selection algorithm based on mutual information and Lasso; its framework is shown in Fig. (1). The algorithm is implemented in two steps: Step 1 uses the mutual information method to filter irrelevant genes, and Step 2 uses an improved Lasso method to remove redundant genes.

Fig. (1)
Framework of the algorithm.


Assuming that there are n samples, m genes in microarray dataset (X, Y), the detailed steps of the algorithm are as follows:

  1. Compute the mutual information value of all m genes, and select the m' genes with the highest values.
  2. Use an improved Lasso method to select m'' genes from the m' genes.
  3. Establish classification models and evaluate the performance of the algorithm.

Steps (1) and (2) are discussed in detail later. In step (3), we use several different classification models together with the LOOCV method to evaluate the performance of the algorithm. LOOCV (leave-one-out cross-validation) is a commonly used special case of cross-validation: each time one sample is held out as the test sample, and the remaining samples are used to establish the classification model.
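As a sketch, the LOOCV evaluation in step (3) could look like the following. This assumes scikit-learn; the SVM settings mirror Section 5.1, and `loocv_accuracy` is an illustrative name of ours, not the authors' code:

```python
# Leave-one-out cross-validation: each sample in turn serves as the test
# set while all remaining samples train the classifier.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def loocv_accuracy(X, y):
    """Return the LOOCV classification accuracy on dataset (X, y)."""
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="poly")  # polynomial kernel, as in the experiments
        clf.fit(X[train_idx], y[train_idx])
        correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)
```

With n samples this fits the model n times, which is affordable here because microarray datasets have few samples (62-181 in Table 1).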

3. MUTUAL INFORMATION METHOD

In general, the dimensionality of microarray data is very high, but the number of disease-related genes is small. Therefore, microarray data contains a large number of irrelevant genes, which contribute little to diagnosing the disease and sometimes interfere with it.

Mutual information is an index measuring the correlation between two variables, and it is one of the most effective filtering methods because it is simple and efficient. In the gene selection process, the mutual information value measures the correlation between a gene and the classification category. A larger mutual information value means the gene carries more classification information, so its importance is higher. The mutual information value is calculated from entropy and conditional entropy. Assume that X and Y represent the gene and the category, and x and y represent values of X and Y respectively. The mutual information MI(X, Y) is calculated as follows:

(1)   MI(X, Y) = \sum_{x}\sum_{y} p(xy) \log \frac{p(xy)}{p(x)\,p(y)}

where p(x) and p(y) are the marginal probabilities of X and Y respectively, and p(xy) is the joint probability of X and Y.

Information entropy H(X) is:

(2)   H(X) = -\sum_{x} p(x) \log p(x)

Conditional entropy H(X|Y) is:

(3)   H(X|Y) = -\sum_{y} p(y) \sum_{x} p(x|y) \log p(x|y)

where p(x|y) is the conditional probability of X given Y. These quantities are related by MI(X, Y) = H(X) - H(X|Y).

For continuous attribute data, we first discretize the continuous values and then calculate the mutual information. The detailed steps for filtering irrelevant genes are: first, the mutual information value of the m genes is calculated according to formula (1); then all genes are sorted by mutual information value in descending order, and the first m' genes are selected as the candidate genes subset, where usually m' << m. However, the m' genes are usually strongly correlated, which results in redundancy. If there are too many redundant genes, the candidate genes subset becomes larger, which increases the computational burden and decreases the discriminative ability. Therefore, we use an improved Lasso method to remove redundant genes, obtaining m'' information genes from the m' genes.
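As an illustrative sketch (not the authors' code), the filtering step can be implemented by binning each gene's continuous expression values and evaluating formula (1) directly; `mi_discrete` and `mi_filter` are names we introduce here:

```python
import numpy as np

def mi_discrete(x, y, bins=10):
    """MI(X, Y) per formula (1), after discretizing continuous gene values x.
    y is an integer class-label vector."""
    # Bin the continuous expression values into `bins` intervals.
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    xd = np.digitize(x, edges)
    # Estimate the joint distribution p(xy) by counting.
    joint = np.zeros((xd.max() + 1, y.max() + 1))
    for a, b in zip(xd, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = joint > 0                          # skip zero cells to avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def mi_filter(X, y, m_prime):
    """Indices of the m' genes (columns of X) with the highest MI values."""
    scores = np.array([mi_discrete(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:m_prime]  # descending MI order
```

The bin count is an assumption on our part; the paper does not state its discretization scheme.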

4. IMPROVED LASSO METHOD

The Lasso method was originally proposed by Tibshirani [14] to describe an optimization problem with constraints, and is widely used in microarray data analysis because of its simplicity and efficiency. The basic idea is to minimize the residual sum of squares subject to the sum of the absolute values of the regression coefficients being less than a threshold, which forces some regression coefficients to become exactly 0. Assume the data (X, Y) contains n samples with m features, where X = (x1, ..., xj, ..., xm), xj = (x1j, x2j, ..., xnj)T are the independent variables, and Y = (y1, ..., yi, ..., yn)T contains the response variables yi, with i = 1, 2, ..., n and j = 1, 2, ..., m. Under the condition that the 1-norm of the regression coefficients β = (β1, β2, ..., βm) is less than the threshold t, the data (X, Y) is analyzed by linear regression.

Standardize the xj and center the yi:

(4)   \frac{1}{n}\sum_{i=1}^{n} x_{ij} = 0, \qquad \frac{1}{n}\sum_{i=1}^{n} x_{ij}^{2} = 1, \qquad \frac{1}{n}\sum_{i=1}^{n} y_{i} = 0

Minimize the residual sum of squares:

(5)   \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_{i} - \sum_{j=1}^{m} \beta_{j} x_{ij} \Bigr)^{2} \quad \text{subject to} \quad \sum_{j=1}^{m} |\beta_{j}| \le t

In formula (5), the threshold t is adjustable. When t is set relatively small, the regression coefficients of weakly correlated variables are shrunk to 0; removing the variables whose regression coefficient is 0 implements feature selection. When t is set relatively large, the constraint no longer binds, and all attributes are selected as features.
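A minimal sketch of plain Lasso selection, assuming scikit-learn. Note that scikit-learn's `Lasso` solves the equivalent penalized form with a weight `alpha` on the 1-norm rather than the explicit constraint t of formula (5); a smaller t corresponds to a larger `alpha`. The function name is ours:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def lasso_select(X, y, alpha=0.1):
    """Indices of features whose Lasso regression coefficient is non-zero."""
    Xs = StandardScaler().fit_transform(X)  # standardize the x_j, as in (4)
    yc = y - y.mean()                       # center the response
    coef = Lasso(alpha=alpha).fit(Xs, yc).coef_
    return np.flatnonzero(coef)             # non-zero coefficients survive
```

The selected index set shrinks as `alpha` grows, mirroring the behavior described above for small t.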

The genes in the candidate subset selected by the mutual information method are generally strongly correlated. If these genes are also mutually redundant and features are selected directly by the Lasso method, they will be mistaken for information genes. Therefore, to remove redundant genes effectively and overcome this shortcoming of the Lasso method, we propose an improved Lasso method. The detailed procedure is as follows: (X, Y) is the training set after filtering irrelevant genes. The genes in Glist are sorted by mutual information value in ascending order and sequentially divided into K genes subsets, denoted Glist(i). The information genes subset Sbest is initialized as empty; then the current Sbest is merged with Glist(i) to form a new genes subset. The improved Lasso method selects features from the data subset Xi corresponding to this new subset, removing redundant genes and retaining information genes. After K iterations, the optimal information genes subset Sbest is obtained.

Assume that Gi and Gj are information genes strongly correlated with the classification category, and that each is a redundant gene of the other. If features are selected directly by the Lasso method, Gi and Gj are both likely to be retained as information genes. With the improved Lasso method, Gi and Gj are selected as information genes while filtering irrelevant genes in Step 1, but when features are selected again in Step 2, the iterative strategy ensures that, regardless of whether Gi and Gj were originally distributed into the same genes subset, they will be re-examined in later iterations. After many iterations, redundant genes can be effectively removed. Therefore, the improved Lasso method proposed in this paper can remove redundant genes to some extent. Its detailed implementation steps are shown in Fig. (2).
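The iterative procedure above can be sketched as follows, under the assumption that the inner selection keeps the non-zero-coefficient genes as in plain Lasso; all names here are ours, not the paper's:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def improved_lasso(X, y, mi_scores, K=10, alpha=0.1):
    """Iteratively select information genes from the candidate set.

    X: samples x candidate genes; mi_scores: mutual information per gene.
    """
    order = np.argsort(mi_scores)            # ascending MI, as in the text
    chunks = np.array_split(order, K)        # the K subsets Glist(i)
    s_best = np.array([], dtype=int)         # Sbest starts empty
    for chunk in chunks:
        cand = np.concatenate([s_best, chunk])        # Sbest + Glist(i)
        Xs = StandardScaler().fit_transform(X[:, cand])
        coef = Lasso(alpha=alpha).fit(Xs, y - y.mean()).coef_
        s_best = cand[np.flatnonzero(coef)]  # keep non-zero-coefficient genes
    return s_best
```

Because previously retained genes re-enter every later fit alongside new candidates, a gene redundant with one added later can still be shrunk to zero and dropped, which is the mechanism the paragraph above describes.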

Fig. (2)
Improved Lasso algorithm.



5. EXPERIMENTAL RESULTS AND ANALYSIS

5.1. Experimental Datasets and Environment Configuration

Experiments use 5 public microarray datasets to evaluate the performance of the algorithm proposed in this paper. The detailed description of the datasets is shown in Table 1.

Table 1
The experimental datasets.


  1. Colon dataset: The dataset contains 62 samples, including 40 colon cancer samples and 22 normal samples; each sample is composed of 2000 genes.
  2. Prostate dataset: The dataset contains 102 samples, including 52 prostate cancer samples and 50 normal samples; each sample is composed of 12,600 genes.
  3. Lymphoma dataset: The dataset contains 77 samples, including 58 diffuse large B-cell lymphoma samples and 19 follicular lymphoma samples; each sample is composed of 7129 genes.
  4. Leukemia dataset: The dataset contains 72 samples, including 25 acute myeloid leukemia samples and 47 acute lymphoblastic leukemia samples; each sample is composed of 7129 genes.
  5. Lung dataset: The dataset contains 181 samples, including 31 malignant pleural mesothelioma samples and 150 lung adenocarcinoma samples; each sample is composed of 12,533 genes.

In this paper, the experimental environment is configured as follows: a PC with an Intel CPU at 3.40 GHz and 4 GB memory, the Windows 7 operating system, and a Weka + Matlab development environment. We use 4 classification models, namely the support vector machine (SVM), K-nearest neighbors (KNN), the C4.5 decision tree and the random forest. The kernel function of the SVM is a polynomial kernel, the number of neighbors for KNN is 10, the confidence factor of C4.5 for pruning is 0.25, and the number of trees in the random forest is 10.
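For readers working in Python rather than Weka, the four models and hyperparameters above could be approximated in scikit-learn as below. The C4.5 stand-in is our substitution: scikit-learn has no exact C4.5, so an entropy-based decision tree is used, and Weka's 0.25 confidence factor has no direct equivalent:

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    "SVM": SVC(kernel="poly"),                            # polynomial kernel
    "KNN": KNeighborsClassifier(n_neighbors=10),          # 10 neighbors
    "C4.5": DecisionTreeClassifier(criterion="entropy"),  # C4.5 stand-in
    "RF": RandomForestClassifier(n_estimators=10),        # 10 trees
}
```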

5.2. Analysis of Experimental Results

The algorithm proposed in this paper first uses the mutual information method to select m' genes from the m genes in order to filter irrelevant genes, then uses an improved Lasso method to select m'' genes from the m' genes in order to remove redundant genes. The experiments therefore study the following 3 aspects.

5.2.1. Experimental Analysis of Mutual Information Method

The number of genes selected by the mutual information method generally lies in the range of the top 50-200; this paper selects the top 100 genes as the candidate genes subset. To test the performance of this candidate subset, the experiment first uses the mutual information method to select 100 information genes on the 5 microarray datasets, then classifies with 4 different classification models, taking the average classification accuracy as the final classification accuracy Acc. The experiment also applies the 4 classification models directly to the original datasets. The experimental results for the original datasets and the datasets selected by the mutual information method are shown in Table 2, where Genes is the number of genes and Acc is the average classification accuracy.

Table 2
The experimental results of mutual information method.


As can be seen from Table 2, the classification accuracy of the candidate genes subset selected by the mutual information method is no less than that of the original dataset. This shows that most of the genes filtered out by the mutual information method are irrelevant genes and the retained genes are information genes. The results show that the Top 100 genes achieve classification as accurate as the original gene dataset, i.e., they contain the complete classification information of the original gene dataset.

5.2.2. Experimental Analysis of Improved Lasso Method

To verify how well the improved Lasso method removes redundant genes, we experimentally compare the Lasso method with the improved Lasso method.

Because the improved Lasso method divides the gene dataset into K parts, the value of the parameter K must be determined before the experiment. Experiments with different values of K are performed on the 5 public microarray datasets, where the classification accuracy Acc is the average obtained by the 4 classification models. The results are shown in Fig. (3). As can be seen from Fig. (3), the classification accuracy of the improved Lasso changes little as K varies, which shows that the improved Lasso method is not sensitive to the value of K. To select a relatively good value, we analyze K further. K cannot be too large: a large K means more parts, so each part contains relatively few genes. K cannot be too small: a small K means fewer parts, so each part contains relatively many genes. Both cases are unfavorable for selecting information genes.

Fig. (3)
Experimental results of the improved Lasso with different K values.


Here we verify the performance of the improved Lasso method in removing redundant genes. In the experiment, the candidate genes subset obtained by the mutual information method is classified using the Lasso method and the improved Lasso method; the optimal solution of the Lasso method is obtained by the least angle regression algorithm. The candidate genes subset contains the Top 100 genes, and K is set to 10, which performed well in the experiment above. The number of genes (Genes) and the average classification accuracy (Acc) are shown in Table 3.

Table 3
Performance comparison between Lasso and improved Lasso.


As can be seen from Table 3, on all 5 public microarray datasets the improved Lasso method is superior to the Lasso method both in the number of information genes and in classification accuracy. For example, on the Prostate dataset the Lasso method selects 62 information genes, whereas the improved Lasso method selects only 16, which greatly reduces the dimensionality. Meanwhile, the classification accuracy of the improved Lasso method is also no less than that of the Lasso method. Therefore, the improved Lasso method can effectively remove redundant genes.

5.2.3. Experimental Analysis of FSMIL Algorithm

Finally, to verify the performance of the FSMIL algorithm, the 5 public microarray datasets are classified by FSMIL and compared with the classic mutual information method and the Lasso method. First, FSMIL uses the mutual information method to compute the mutual information value of all m genes and selects the Top 100 (m' = 100) genes as the candidate genes subset in descending order of mutual information value; it then uses the improved Lasso method to select information genes from the candidate subset. The parameter K is set to 10, which performed well in the experiment above. For comparison with FSMIL, the number of genes selected by the mutual information method alone is also set to 100. SVM is adopted first as the classification model. The classification results of SVM on the information genes subsets selected by the 3 feature selection methods and on the original dataset are shown in Table 4.

Table 4
Performance comparison of SVM with different gene selection methods on datasets(%).


Table 4 shows the classification performance of SVM on the 5 public microarray datasets. The "Original" column gives the classification accuracy of SVM applied directly to the original microarray dataset, without any feature selection algorithm. The "MI", "Lasso" and "FSMIL" columns give the classification accuracy of SVM on the information genes subsets selected by the 3 feature selection algorithms. As can be seen from Table 4, the classification accuracy of the proposed algorithm on the 5 public microarray datasets is no less than that of the other algorithms, which shows that FSMIL retains the useful genes of the original dataset, filters irrelevant genes and removes redundant genes.

To confirm whether the proposed FSMIL algorithm still performs well with other classification models, the KNN, C4.5 and random forest classification models are also tested on the 5 public microarray datasets. The experimental results are shown in Tables 5-7. As can be seen from Table 5, the classification accuracy of the FSMIL algorithm on the 5 microarray datasets is no less than that of the KNN model on the original datasets, and is also better than that of the KNN model on the information genes subsets selected by the other feature selection algorithms. As can be seen from Table 6, the classification accuracy of FSMIL on the Colon dataset is slightly lower than that of the C4.5 model on the original dataset, but the absolute difference is only 1.61%; the average accuracy of FSMIL over the 5 datasets is still the highest. The number of information genes selected by FSMIL is small (only 4 dimensions here), which is not helpful for establishing the decision tree model, so its classification accuracy is lower than in the "Original" column and the "MI" column, where the number of information genes is 100. As can be seen from Table 7, the classification accuracy of FSMIL on the 5 datasets is no less than that of the random forest model on the original datasets, and is also better than that of the random forest model on the information genes subsets selected by the other feature selection algorithms.

Table 5
Performance comparison of KNN with different gene selection methods on datasets (%).


From the experimental analysis of the above 3 aspects, the number of information genes selected by the FSMIL algorithm is small, never more than 16 dimensions and as few as 9 on some datasets, while its overall classification accuracy is no less than that of the other algorithms. This shows that FSMIL can select information genes, filter irrelevant genes and remove redundant genes to some extent. Therefore, the FSMIL algorithm selects fewer information genes and has better classification ability, stable performance and strong generalization ability. It is an effective gene feature selection algorithm for solving the high-dimensionality and high-redundancy problems of microarray data.

Table 6
Performance comparison of C4.5 with different gene selection methods on datasets (%).



Table 7
Performance comparison of Random Forest with different gene selection methods on datasets (%).


CONCLUSION

Microarray data plays a very important role in the diagnosis of disease; however, its high dimensionality and high redundancy bring great difficulties for mining knowledge from it profoundly and accurately, and the selection of information genes is a critical task. In this paper, a new feature selection algorithm, FSMIL, is proposed, taking microarray datasets as the specific study object. The algorithm is implemented in 2 steps: Step 1 uses the mutual information method to compute the mutual information value of all genes and selects information genes in descending order of that value, filtering out irrelevant genes; Step 2 uses the improved Lasso method on the candidate genes subset to remove redundant genes. Experimental results show that the algorithm selects fewer genes and has better classification ability, stable performance and strong generalization ability for solving the high-dimensionality and high-redundancy problems of microarray data. It is an effective information genes feature selection algorithm.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflict of interest.

ACKNOWLEDGEMENTS

The work was jointly supported by the National Natural Science Foundation of China and Shenhua Group Co., Ltd. (51174257/E0422), by the Fundamental Research Funds for the Central Universities of Hefei University of Technology (2013bhzx0040), by the Natural Science Foundation of the Anhui Higher Education Institutions of China (KJ2016A549), and by the Natural Science Foundation of Fuyang Teachers College (2016FSKJ17).

REFERENCES

[1] Kim YS, Street WN, Menczer F. Data Mining: Opportunities and Challenges. Hershey: Idea Group Publishing 2003.
[2] Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007; 23(19): 2507-17.
[http://dx.doi.org/10.1093/bioinformatics/btm344]
[3] Wang YH, Makedon FS, Ford JC, Pearlman J. HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data. Bioinformatics 2005; 21(8): 1530-7.
[http://dx.doi.org/10.1093/bioinformatics/bti192] [PMID: 15585531]
[4] Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999; 286(5439): 531-7.
[http://dx.doi.org/10.1126/science.286.5439.531] [PMID: 10521349]
[5] Robnik-Šikonja M, Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach Learn 2003; 53(1): 23-69.
[6] Hanczar B, Courtine M, Benis A, Hennegar C, Clément K, Zucker JD. ‘Improving classification of microarray data using prototype-based feature selection’. ACM SIGKDD Explor Newsl 2003; 5(2): 23-30.
[http://dx.doi.org/10.1145/980972.980977]
[7] Tan F, Fu XZ, Wang H, Zhang YQ, Bourgeois A. A hybrid feature selection approach for microarray gene expression data. In: Proceedings of the 6th International Conference on Computational Science. Berlin Heidelberg: Springer-Verlag 2006; pp. 678-85.
[http://dx.doi.org/10.1007/11758525_92]
[8] Wang S-L, Wang J, Chen H-W, et al. Heuristic breadth-first search algorithm for informative gene selection based on gene expression profiles. Chin J Comput 2008; 31(4): 636-49.
[9] Chuang LY, Yang CH, Li JC, Yang CH. A hybrid BPSO-CGA approach for gene selection and classification of microarray data. J Comput Biol 2012; 19(1): 68-82.
[http://dx.doi.org/10.1089/cmb.2010.0064] [PMID: 21210743]
[10] Zhiwen Y, Le L, Jane Y, et al. SC3: Triple spectral clustering based consensus clustering framework for class discovery from cancer gene expression profiles. IEEE/ACM Trans Comput Biol Bioinf 2012; 9(6): 1751-65.
[http://dx.doi.org/10.1109/TCBB.2012.108]
[11] Benso A, Carlo S D, Politano G. A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory. Computational Biology and Bioinformatics, IEEE/ACM Trans 2011; 8(3): 577-91.
[http://dx.doi.org/10.1109/TCBB.2010.90]
[12] Chen Z, Li J, Wei L. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue. Artif Intell Med 2007; 41(2): 161-75.
[http://dx.doi.org/10.1016/j.artmed.2007.07.008] [PMID: 17851055]
[13] Zheng SF, Liu WX. An experimental comparison of gene selection by Lasso and Dantzig selector for cancer classification. Comput Biol Med 2011; 41(11): 1033-40.
[http://dx.doi.org/10.1016/j.compbiomed.2011.08.011] [PMID: 21955335]
[14] Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc B 1996; 58(1): 267-88.
