
Bibliography

Amari, S. (1985).
Differential-Geometrical Methods in Statistics, volume 28 of Lecture Notes in Statistics.
Springer-Verlag.

Amari, S. (1995).
Information geometry of the EM and em algorithms for neural networks.
Neural Networks, 8(9):1379-1408.

Amari, S. (1998).
Natural gradient works efficiently in learning.
Neural Computation, 10(2):251-276.

Anderson, B. and Moore, J. (1979).
Optimal Filtering.
Prentice-Hall, Englewood Cliffs, NJ.

Barber, D. and Bishop, C. (1998).
Ensemble learning for multi-layer networks.
In Jordan, M., Kearns, M., and Solla, S., editors, Advances in Neural Information Processing Systems 10, pages 395-401. The MIT Press, Cambridge, MA, USA.

Edelman, A., Arias, T. A., and Smith, S. T. (1998).
The geometry of algorithms with orthogonality constraints.
SIAM Journal on Matrix Analysis and Applications, 20(2):303-353.

Ghahramani, Z. and Beal, M. (2001).
Propagation algorithms for variational Bayesian learning.
In Leen, T., Dietterich, T., and Tresp, V., editors, Advances in Neural Information Processing Systems 13, pages 507-513. The MIT Press, Cambridge, MA, USA.

González, A. and Dorronsoro, J. R. (2006).
A note on conjugate natural gradient training of multilayer perceptrons.
In Proc. Int. Joint Conf. on Neural Networks (IJCNN'06), pages 887-891, Vancouver, BC, Canada.

Honkela, A. and Valpola, H. (2005).
Unsupervised variational Bayesian learning of nonlinear models.
In Saul, L., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 593-600. The MIT Press, Cambridge, MA, USA.

Honkela, A., Valpola, H., and Karhunen, J. (2003).
Accelerating cyclic update algorithms for parameter estimation by pattern searches.
Neural Processing Letters, 17(2):191-203.

Lappalainen, H. and Honkela, A. (2000).
Bayesian nonlinear independent component analysis by multi-layer perceptrons.
In Girolami, M., editor, Advances in Independent Component Analysis, pages 93-121. Springer-Verlag, Berlin.

McLachlan, G. J. and Krishnan, T. (1996).
The EM Algorithm and Extensions.
Wiley.

Murray, M. K. and Rice, J. W. (1993).
Differential Geometry and Statistics.
Chapman & Hall.

Nocedal, J. (1991).
Theory of algorithms for unconstrained optimization.
Acta Numerica, 1:199-242.

Salakhutdinov, R. and Roweis, S. T. (2003).
Adaptive overrelaxed bound optimization methods.
In Proc. 20th International Conference on Machine Learning (ICML 2003), pages 664-671.

Salakhutdinov, R., Roweis, S. T., and Ghahramani, Z. (2003).
Optimization with EM and expectation-conjugate-gradient.
In Proc. 20th International Conference on Machine Learning (ICML 2003), pages 672-679.

Sato, M. (2001).
Online model selection based on the variational Bayes.
Neural Computation, 13(7):1649-1681.

Seeger, M. (2000).
Bayesian model selection for support vector machines, Gaussian processes and other kernel classifiers.
In Solla, S., Leen, T., and Müller, K.-R., editors, Advances in Neural Information Processing Systems 12, pages 603-609. The MIT Press, Cambridge, MA, USA.

Smith, S. T. (1993).
Geometric Optimization Methods for Adaptive Filtering.
PhD thesis, Harvard University, Cambridge, Massachusetts.

Tanaka, T. (2001).
Information geometry of mean-field approximation.
In Opper, M. and Saad, D., editors, Advanced Mean Field Methods: Theory and Practice, pages 259-273. The MIT Press, Cambridge, MA, USA.

Valpola, H. (2000).
Bayesian Ensemble Learning for Nonlinear Factor Analysis.
PhD thesis, Helsinki University of Technology, Espoo, Finland.
Published in Acta Polytechnica Scandinavica, Mathematics and Computing Series No. 108.

Valpola, H., Harva, M., and Karhunen, J. (2004).
Hierarchical models of variance sources.
Signal Processing, 84(2):267-282.

Valpola, H. and Karhunen, J. (2002).
An unsupervised ensemble learning method for nonlinear dynamic state-space models.
Neural Computation, 14(11):2647-2692.

Winn, J. and Bishop, C. M. (2005).
Variational message passing.
Journal of Machine Learning Research, 6:661-694.

Yang, H. H. and Amari, S. (1997).
Adaptive online learning algorithms for blind separation: Maximum entropy and minimum mutual information.
Neural Computation, 9(7):1457-1482.



Tapani Raiko 2007-04-18