Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2016 Vol.17 No.10 P.973-981

Max-margin based Bayesian classifier

Author(s): Tao-cheng Hu, Jin-hui Yu
Affiliation(s): 1. State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
Corresponding email(s): hutaocheng@gmail.com, jhyu@cad.zju.edu.cn
Key Words: Multi-class learning, Max-margin learning, Online algorithm


Tao-cheng Hu, Jin-hui Yu. Max-margin based Bayesian classifier[J]. Frontiers of Information Technology & Electronic Engineering, 2016, 17(10): 973-981.

@article{title="Max-margin based Bayesian classifier",
author="Tao-cheng Hu, Jin-hui Yu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="17",
number="10",
pages="973-981",
year="2016",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601078"
}

%0 Journal Article
%T Max-margin based Bayesian classifier
%A Tao-cheng Hu
%A Jin-hui Yu
%J Frontiers of Information Technology & Electronic Engineering
%V 17
%N 10
%P 973-981
%@ 2095-9184
%D 2016
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1601078

TY - JOUR
T1 - Max-margin based Bayesian classifier
A1 - Tao-cheng Hu
A1 - Jin-hui Yu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 17
IS - 10
SP - 973
EP - 981
SN - 2095-9184
Y1 - 2016
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1601078
ER -


Abstract: There is a tradeoff between generalization capability and computational overhead in multi-class learning. We propose a generative probabilistic multi-class classifier, considering both the generalization capability and the learning/prediction rate. We show that the classifier has a max-margin property. Thus, prediction on future unseen data can nearly achieve the same performance as in the training stage. In addition, local variables are eliminated, which greatly simplifies the optimization problem. By convex and probabilistic analysis, an efficient online learning algorithm is developed. The algorithm aggregates rather than averages dualities, which is different from the classical situations. Empirical results indicate that our method has a good generalization capability and convergence rate.

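Max-margin based Bayesian classifier

Summary: Multi-class learning often requires a tradeoff between generalization performance and computational overhead. This paper proposes a generative probabilistic multi-class classifier that takes both generalization and the learning/prediction rate into account. We first prove that our classifier has a max-margin property, which means that prediction accuracy on future data is almost as high as in the training stage. In addition, we eliminate a large number of local variables from the objective function, which greatly simplifies the optimization problem. Through convex analysis and probabilistic semantic analysis, we design an efficient online algorithm; its main difference from the classical setting is that it aggregates rather than averages gradients. Experiments show that our algorithm has good generalization performance and convergence speed.

Key words: Multi-class learning; Max-margin learning; Online algorithm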


