-
Amari, S. (1985).
- Differential-Geometrical Methods in Statistics, volume 28 of
Lecture Notes in Statistics.
Springer-Verlag.
-
Amari, S. (1995).
- Information geometry of the EM and em algorithms for neural
networks.
Neural Networks, 8(9):1379-1408.
-
Amari, S. (1998).
- Natural gradient works efficiently in learning.
Neural Computation, 10(2):251-276.
-
Anderson, B. and Moore, J. (1979).
- Optimal Filtering.
Prentice-Hall, Englewood Cliffs, NJ.
-
Barber, D. and Bishop, C. (1998).
- Ensemble learning for multi-layer networks.
In Jordan, M., Kearns, M., and Solla, S., editors, Advances in
Neural Information Processing Systems 10, pages 395-401. The MIT Press,
Cambridge, MA, USA.
-
Edelman, A., Arias, T. A., and Smith, S. T. (1998).
- The geometry of algorithms with orthogonality constraints.
SIAM Journal on Matrix Analysis and Applications,
20(2):303-353.
-
Ghahramani, Z. and Beal, M. (2001).
- Propagation algorithms for variational Bayesian learning.
In Leen, T., Dietterich, T., and Tresp, V., editors, Advances in
Neural Information Processing Systems 13, pages 507-513. The MIT Press,
Cambridge, MA, USA.
-
González, A. and Dorronsoro, J. R. (2006).
- A note on conjugate natural gradient training of multilayer
perceptrons.
In Proc. Int. Joint Conf. on Neural Networks (IJCNN'06),
pages 887-891, Vancouver, BC, Canada.
-
Honkela, A. and Valpola, H. (2005).
- Unsupervised variational Bayesian learning of nonlinear models.
In Saul, L., Weiss, Y., and Bottou, L., editors, Advances in
Neural Information Processing Systems 17, pages 593-600. The MIT Press,
Cambridge, MA, USA.
-
Honkela, A., Valpola, H., and Karhunen, J. (2003).
- Accelerating cyclic update algorithms for parameter estimation by
pattern searches.
Neural Processing Letters, 17(2):191-203.
-
Lappalainen, H. and Honkela, A. (2000).
- Bayesian nonlinear independent component analysis by multi-layer
perceptrons.
In Girolami, M., editor, Advances in Independent Component
Analysis, pages 93-121. Springer-Verlag, Berlin.
-
McLachlan, G. J. and Krishnan, T. (1996).
- The EM Algorithm and Extensions.
Wiley.
-
Murray, M. K. and Rice, J. W. (1993).
- Differential Geometry and Statistics.
Chapman & Hall.
-
Nocedal, J. (1991).
- Theory of algorithms for unconstrained optimization.
Acta Numerica, 1:199-242.
-
Salakhutdinov, R. and Roweis, S. T. (2003).
- Adaptive overrelaxed bound optimization methods.
In Proc. 20th International Conference on Machine Learning (ICML
2003), pages 664-671.
-
Salakhutdinov, R., Roweis, S. T., and Ghahramani, Z. (2003).
- Optimization with EM and expectation-conjugate-gradient.
In Proc. 20th International Conference on Machine Learning (ICML
2003), pages 672-679.
-
Sato, M. (2001).
- Online model selection based on the variational Bayes.
Neural Computation, 13(7):1649-1681.
-
Seeger, M. (2000).
- Bayesian model selection for support vector machines, Gaussian
processes and other kernel classifiers.
In Solla, S., Leen, T., and Müller, K.-R., editors, Advances
in Neural Information Processing Systems 12, pages 603-609. The MIT Press,
Cambridge, MA, USA.
-
Smith, S. T. (1993).
- Geometric Optimization Methods for Adaptive Filtering.
PhD thesis, Harvard University, Cambridge, Massachusetts.
-
Tanaka, T. (2001).
- Information geometry of mean-field approximation.
In Opper, M. and Saad, D., editors, Advanced Mean Field Methods:
Theory and Practice, pages 259-273. The MIT Press, Cambridge, MA, USA.
-
Valpola, H. (2000).
- Bayesian Ensemble Learning for Nonlinear Factor Analysis.
PhD thesis, Helsinki University of Technology, Espoo, Finland.
Published in Acta Polytechnica Scandinavica, Mathematics and
Computing Series No. 108.
-
Valpola, H., Harva, M., and Karhunen, J. (2004).
- Hierarchical models of variance sources.
Signal Processing, 84(2):267-282.
-
Valpola, H. and Karhunen, J. (2002).
- An unsupervised ensemble learning method for nonlinear dynamic
state-space models.
Neural Computation, 14(11):2647-2692.
-
Winn, J. and Bishop, C. M. (2005).
- Variational message passing.
Journal of Machine Learning Research, 6:661-694.
-
Yang, H. H. and Amari, S. (1997).
- Adaptive online learning algorithms for blind separation: Maximum
entropy and minimum mutual information.
Neural Computation, 9(7):1457-1482.
Tapani Raiko
2007-04-18