References

  1. H. Attias.
    Hierarchical ICA belief networks.
    In M. J. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, 1999. In press.
  2. H. Attias.
    Independent factor analysis.
    Neural Computation, 11(4):803-851, 1999.
  3. D. Barber and C. M. Bishop.
    Ensemble learning for multi-layer networks.
    In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, NIPS 10, pages 395-401, 1998. The MIT Press.
  4. D. Barber and B. Schottky.
    Radial basis functions: a Bayesian treatment.
    In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, NIPS 10, pages 402-408, 1998. The MIT Press.
  5. C. M. Bishop, N. Lawrence, T. Jaakkola, and M. I. Jordan.
    Approximating posterior distributions in belief networks using mixtures.
    In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, NIPS 10, pages 416-422, 1998. The MIT Press.
  6. Z. Ghahramani and G. E. Hinton.
    Hierarchical non-linear factor analysis and topographic maps.
    In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, NIPS 10, pages 486-492, 1998. The MIT Press.
  7. G. E. Hinton and D. van Camp.
    Keeping neural networks simple by minimizing the description length of the weights.
    In Proceedings of COLT'93, pages 5-13, Santa Cruz, California, 1993.
  8. G. E. Hinton and Z. Ghahramani.
    Generative models for discovering sparse distributed representations.
    Philosophical Transactions of the Royal Society B, 352:1177-1190, 1997.
  9. G. E. Hinton and R. S. Zemel.
    Autoencoders, minimum description length and Helmholtz free energy.
    In J. D. Cowan, G. Tesauro, and J. Alspector, editors, NIPS 6, pages 3-10, San Francisco, 1994. Morgan Kaufmann.
  10. S. Hochreiter and J. Schmidhuber.
    Flat minima.
    Neural Computation, 9(1):1-42, January 1997.
  11. S. Hochreiter and J. Schmidhuber.
    LOCOCODE performs nonlinear ICA without knowing the number of sources.
    In Proceedings of ICA'99, pages 149-154, Aussois, France, 1999.
  12. H. Lappalainen.
    Using an MDL-based cost function with neural networks.
    In Proceedings of IJCNN'98, pages 2384-2389, Anchorage, Alaska, 1998.
  13. H. Lappalainen.
    Ensemble learning for independent component analysis.
    In Proceedings of ICA'99, pages 7-12, Aussois, France, 1999.
  14. H. Lappalainen and X. Giannakopoulos.
    Multi-layer perceptrons as nonlinear generative models for unsupervised learning: a Bayesian treatment.
    In Proceedings of ICANN'99. Accepted.
  15. D. J. C. MacKay.
    Bayesian interpolation.
    Neural Computation, 4:415-447, 1992.
  16. D. J. C. MacKay.
    A practical Bayesian framework for backpropagation networks.
    Neural Computation, 4:448-472, 1992.
  17. D. J. C. MacKay.
    The evidence framework applied to classification networks.
    Neural Computation, 4:698-714, 1992.
  18. D. J. C. MacKay.
    Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks.
    Network, 6(3):469-505, 1995.
  19. D. J. C. MacKay.
    Ensemble learning and evidence maximization.
  20. D. J. C. MacKay.
    Developments in probabilistic modelling with neural networks - ensemble learning.
    In Neural Networks: Artificial Intelligence and Industrial Applications. Proceedings of the 3rd Annual Symposium on Neural Networks, Nijmegen, Netherlands, 14-15 September 1995, pages 191-198, Berlin, 1995. Springer.
  21. D. J. C. MacKay.
    Ensemble learning for hidden Markov models.
    Available from http://wol.ra.phy.cam.ac.uk/, 1997.
  22. D. J. C. MacKay.
    Comparison of approximate methods for handling hyperparameters.
    Neural Computation. Submitted.
  23. É. Moulines, J.-F. Cardoso, and E. Gassiat.
    Maximum likelihood for blind separation and deconvolution of noisy signals using mixture models.
    In Proceedings of ICASSP'97, pages 3617-3620, Munich, Germany, 1997.
  24. R. M. Neal.
    Learning stochastic feedforward networks.
    Technical Report CRG-TR-90-7, Dept. of Computer Science, University of Toronto, 1990.
  25. J.-H. Oh and H. S. Seung.
    Learning generative models with the up-propagation algorithm.
    In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, NIPS 10, pages 605-611, 1998. The MIT Press.
  26. B. Pfahringer.
    Compression-based feature subset selection.
    In P. Turney, editor, IJCAI-95 Workshop on Data Engineering for Inductive Learning. IJCAI-95 Workshop Program Working Notes, Montreal, Canada, 1995.
  27. J. Rissanen.
    Modeling by shortest data description.
    Automatica, 14:465-471, 1978.
  28. J. Rissanen.
    A universal prior for integers and estimation by minimum description length.
    Annals of Statistics, 11(2):416-431, 1983.
  29. J. Rissanen.
    Stochastic complexity.
    Journal of the Royal Statistical Society (Series B), 49(3):223-239 and 252-265, 1987.
  30. J. Rissanen.
    Fisher information and stochastic complexity.
    IEEE Transactions on Information Theory, 42(1):40-47, January 1996.
  31. J. Rissanen and G. G. Langdon, Jr.
    Universal modeling and coding.
    IEEE Transactions on Information Theory, 27:12-23, 1981.
  32. L. K. Saul, T. Jaakkola, and M. I. Jordan.
    Mean field theory for sigmoid belief networks.
    Journal of Artificial Intelligence Research, 4:61-76, 1996.
  33. M. J. Schervish.
    Theory of Statistics.
    Springer-Verlag, New York, 1995.
  34. C. E. Shannon.
    A mathematical theory of communication.
    Bell System Technical Journal, 27:379-423, July 1948.
  35. C. S. Wallace and D. M. Boulton.
    An information measure for classification.
    Computer Journal, 11(2):185-194, 1968.
  36. C. S. Wallace and P. R. Freeman.
    Estimation and inference by compact coding.
    Journal of the Royal Statistical Society (Series B), 49(3):240-265, 1987.
  37. R. S. Zemel.
    A minimum description length framework for unsupervised learning.
    PhD thesis, University of Toronto, Canada, 1993.
  38. R. S. Zemel and G. E. Hinton.
    Developing population codes by minimizing description length.
    In J. D. Cowan, G. Tesauro, and J. Alspector, editors, NIPS 6, pages 11-18, San Francisco, 1994. Morgan Kaufmann.


Harri Lappalainen
<Harri.Lappalainen@hut.fi>