Ensemble pruning via Weighted Accuracy and Diversity


Samuel Zeng
Mr. Samuel Zeng is currently a research assistant at the Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory (NLP2CT) in Macau, with a particular emphasis on machine learning methodologies for natural language processing and machine translation. He received a B.S. with honors in E-Commerce Technology from Beijing Normal University, Zhuhai, China, in 2009, and an M.Sc. in Computer and Information Science from the University of Macau in 2012.


  • 16:00, Wednesday, July 11th, 2012
  • Room 336


  • Samuel Zeng, University of Macau


Ensemble pruning (EP) reduces the size of a classifier ensemble before its members' predictions are combined. The aim is to lower memory requirements and speed up classification while preserving or improving the ensemble's prediction accuracy. A persistent problem with typical ensemble pruning algorithms is that they evaluate ensemble quality using only one of two crucial criteria, accuracy or diversity; none of them considers both simultaneously, nor the interaction between them. We claim that accuracy and diversity constrain each other: assembling only highly accurate classifiers can reduce the complementarity (diversity) and robustness of the ensemble, whereas assembling classifiers purely for diversity can seriously decrease classification accuracy. We therefore propose Weighted Accuracy and Diversity (WAD), a novel measure for evaluating the quality of a classifier ensemble in support of the ensemble pruning task. The method balances accuracy and diversity through weighting to produce a single score representing ensemble quality. In our research, the proposed method has been validated in several natural language processing applications, including part-of-speech tagging, shallow parsing and sentence boundary detection.
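
To make the idea of a weighted score concrete, here is a minimal Python sketch. The abstract does not give the WAD formula, so everything below is an assumption for illustration only: accuracy is taken as the accuracy of the plurality vote, diversity as the mean pairwise disagreement rate, and the two are combined linearly with a hypothetical weight `alpha`. The function names are likewise invented here and are not the speaker's implementation.

```python
from collections import Counter
from itertools import combinations


def majority_vote_accuracy(preds, labels):
    """Accuracy of the plurality vote over all ensemble members.

    preds: list of per-classifier prediction lists, all the same length as labels.
    """
    correct = 0
    for i, gold in enumerate(labels):
        votes = Counter(p[i] for p in preds)
        # most_common(1) returns the label with the highest vote count
        if votes.most_common(1)[0][0] == gold:
            correct += 1
    return correct / len(labels)


def pairwise_disagreement(preds):
    """Mean fraction of instances on which two classifiers differ (assumed diversity measure)."""
    if len(preds) < 2:
        return 0.0
    n = len(preds[0])
    rates = [
        sum(a != b for a, b in zip(p, q)) / n
        for p, q in combinations(preds, 2)
    ]
    return sum(rates) / len(rates)


def weighted_score(preds, labels, alpha=0.5):
    """Hypothetical linear combination: alpha * accuracy + (1 - alpha) * diversity."""
    acc = majority_vote_accuracy(preds, labels)
    div = pairwise_disagreement(preds)
    return alpha * acc + (1 - alpha) * div


def best_subset(preds, labels, k, alpha=0.5):
    """Exhaustively pick the k classifiers (by index) with the highest weighted score."""
    return max(
        combinations(range(len(preds)), k),
        key=lambda idx: weighted_score([preds[i] for i in idx], labels, alpha),
    )
```

With `alpha=1.0` the score reduces to accuracy alone, and with `alpha=0.0` to diversity alone, which are the two single-criterion extremes the abstract argues against. The exhaustive search in `best_subset` is only practical for small ensembles; a greedy or heuristic search would be needed at scale.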

Note: This seminar will be held in English.