
Bibliography

Anderberg, M. (1973).
Cluster Analysis for Applications.
Academic Press, New York, NY.

Anderson, B. and Moore, J. (1979).
Optimal Filtering.
Prentice-Hall, Englewood Cliffs, NJ.

Anderson, C., Domingos, P., and Weld, D. (2002).
Relational Markov models and their application to adaptive web navigation.
In Hand, D., Keim, D., Zaïane, O., and Goebel, R., editors, Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining (KDD-02), pages 143-152, Edmonton, Canada. ACM Press.

Attias, H. (1999).
Independent factor analysis.
Neural Computation, 11(4):803-851.

Attias, H. (2001).
ICA, graphical models and variational methods.
In Roberts, S. and Everson, R., editors, Independent Component Analysis: Principles and Practice, pages 95-112. Cambridge University Press.

Attias, H. (2003).
Planning by probabilistic inference.
In Bishop, C. M. and Frey, B. J., editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003), Key West, Florida.

Bar-Shalom, Y. (1981).
Stochastic dynamic programming: Caution and probing.
IEEE Transactions on Automatic Control, 26(5):1184-1195.

Barber, D. and Bishop, C. M. (1998).
Ensemble learning in Bayesian neural networks.
In Bishop, C. M., editor, Neural Networks and Machine Learning, pages 215-237. Springer, Berlin.

Bayes, T. (1763/1958).
Studies in the history of probability and statistics: IX. Thomas Bayes's essay towards solving a problem in the doctrine of chances.
Biometrika, 45:296-315.

Beal, M. and Ghahramani, Z. (2003).
The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures.
In Bayesian Statistics 7, pages 453-464. Oxford University Press.

Bernardo, J. M. and Smith, A. F. M. (2000).
Bayesian Theory.
J. Wiley.

Bishop, C. M. (1995).
Neural Networks for Pattern Recognition.
Clarendon Press.

Bishop, C. M. (1999).
Latent variable models.
In Jordan, M., editor, Learning in Graphical Models, pages 371-403. The MIT Press, Cambridge, MA, USA.

Bishop, C. M. (2006).
Pattern Recognition and Machine Learning.
Springer.

Bromberg, F., Margaritis, D., and Honavar, V. (2006).
Efficient Markov network structure discovery from independence tests.
In SIAM Data Mining 2006 (SDM06).
To appear.

Charniak, E. (1993).
Statistical Language Learning.
MIT Press, Cambridge, Massachusetts.

Chen, C., editor (1990).
Neural Networks For Pattern Recognition And Their Applications.
World Scientific Publishing, Singapore.

Chen, C. (1999).
Linear System Theory and Design.
Oxford University Press, Oxford.
3rd Edition.

Choudrey, R., Penny, W., and Roberts, S. (2000).
An ensemble learning approach to independent component analysis.
In Proc. of the IEEE Workshop on Neural Networks for Signal Processing, Sydney, Australia, December 2000, pages 435-444. IEEE Press.

Chui, C. and Chen, G. (1991).
Kalman Filtering: With Real-Time Applications.
Springer.

Codd, E. (1970).
A relational model of data for large shared data banks.
Communications of the ACM, 13(6):377-387.

Comon, P. (1994).
Independent component analysis - a new concept?
Signal Processing, 36:287-314.

Cowell, R. G., Dawid, A. P., Lauritzen, S. L., and Spiegelhalter, D. J. (1999).
Probabilistic Networks and Expert Systems.
Springer-Verlag, New York.

Cox, R. T. (1946).
Probability, frequency and reasonable expectation.
American Journal of Physics, 14(1):1-13.

Davison, B. and Hirsh, H. (1998).
Predicting sequences of user actions.
In Predicting the Future: AI Approaches to Time-Series Analysis, pages 5-12. AAAI Press.
Proceedings of AAAI-98/ICML-98 Workshop, published as Technical Report WS-98-07.

De Raedt, L., editor (1996).
Advances in Inductive Logic Programming.
IOS Press.

De Raedt, L. (2005).
From Inductive Logic Programming to Multi-Relational Data Mining.
Cognitive Technologies. Springer-Verlag.

De Raedt, L. and Kersting, K. (2003).
Probabilistic Logic Learning.
ACM-SIGKDD Explorations: Special issue on Multi-Relational Data Mining, 5(1):31-48.

Dean, T. L. and Wellman, M. P. (1991).
Planning and Control.
Morgan Kaufmann.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977).
Maximum likelihood from incomplete data via the EM algorithm.
J. of the Royal Statistical Society, Series B (Methodological), 39(1):1-38.

Diez, F. (1993).
Parameter adjustment in Bayes networks: The generalized noisy OR-gate.
In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence (UAI '93), pages 99-105, San Francisco, CA. Morgan Kaufmann.

Doucet, A., de Freitas, N., and Gordon, N. J. (2001).
Sequential Monte Carlo Methods in Practice.
Springer Verlag.

Doyle, J. C., Francis, B. A., and Tannenbaum, A. R. (1992).
Feedback control theory.
MacMillan, New York.

Dubois, D. and Prade, H. (1993).
Fuzzy sets and probability: misunderstandings, bridges and gaps.
In Proceedings of the Second IEEE Conference on Fuzzy Systems, pages 1059-1068.

Dzeroski, S. and Lavrac, N. (2001).
Introduction to inductive logic programming.
In Dzeroski, S. and Lavrac, N., editors, Relational Data Mining, pages 48-73. Springer-Verlag.

Camacho, E. F. and Bordons, C. (2004).
Model Predictive Control.
Springer.

Engle, R. F. and Watson, M. W. (1987).
The Kalman filter: applications to forecasting and rational-expectations models.
In Bewley, T. F., editor, Advances in Econometrics Fifth World Congress. Cambridge University Press.

Fischer, I. and Meinl, T. (2004).
Graph based molecular data mining--an overview.
In Thissen, W., Wieringa, P., Pantic, M., and Ludema, M., editors, IEEE SMC 2004 Conference Proceedings, pages 4578-4582, Den Haag, The Netherlands.

Frasconi, P., Soda, G., and Vullo, A. (2002).
Hidden Markov models for text categorization in multi-page documents.
Journal of Intelligent Information Systems, 18(2/3):195-217.

Frey, B. J. and Hinton, G. E. (1999).
Variational learning in nonlinear Gaussian belief networks.
Neural Computation, 11(1):193-214.

Friedman, N. (1997).
Learning belief networks in the presence of missing values and hidden variables.
In Fisher, D., editor, Proceedings of the Fourteenth International Conference on Machine Learning (ICML-1997), pages 125-133, Nashville, Tennessee, USA. Morgan Kaufmann.

Friedman, N. (1998).
The Bayesian structural EM algorithm.
In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI'98), pages 129-138.

Furukawa, K., Michie, D., and Muggleton, S. (1999).
Machine Intelligence 15: Machine intelligence and inductive learning.
Oxford University Press.

Gelman, A., Carlin, J., Stern, H., and Rubin, D. (1995).
Bayesian Data Analysis.
Chapman & Hall/CRC Press, Boca Raton, Florida.

Geman, S. and Geman, D. (1984).
Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741.

Getoor, L. (2001).
Learning Statistical Models from Relational Data.
PhD thesis, Stanford University.

Getoor, L., Friedman, N., Koller, D., and Pfeffer, A. (2001).
Learning probabilistic relational models.
In Dzeroski, S. and Lavrac, N., editors, Relational Data Mining, pages 307-333. Springer-Verlag.

Getoor, L., Friedman, N., Koller, D., and Taskar, B. (2002).
Learning probabilistic models of link structure.
Journal of Machine Learning Research, 3:679-707.

Ghahramani, Z. (1998).
Learning dynamic Bayesian networks.
In Giles, C. and Gori, M., editors, Adaptive Processing of Sequences and Data Structures, Lecture Notes in Computer Science, pages 168-197. Springer-Verlag, Berlin.

Ghahramani, Z. and Beal, M. (2001).
Propagation algorithms for variational Bayesian learning.
In Leen, T., Dietterich, T., and Tresp, V., editors, Advances in Neural Information Processing Systems 13, pages 507-513. The MIT Press, Cambridge, MA, USA.

Ghahramani, Z. and Jordan, M. (1997).
Factorial hidden Markov models.
Machine Learning, 29:245-273.

Giarratano, J. and Riley, G. (1994).
Expert Systems, Principles and Programming.
PWS Publishing Company, Boston.

Gödel, K. (1929).
Über die Vollständigkeit des Logikkalküls.
PhD thesis, University Of Vienna.

Green, P., Barker, J., Cooke, M., and Josifovski, L. (2001).
Handling missing and unreliable information in speech recognition.
In Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics (AISTATS 2001), pages 49-56, Key West, Florida, USA.

Hanson, C. W. and Marshall, B. (2001).
Artificial intelligence applications in the intensive care unit.
Critical Care Medicine, 29(2):427-435.

Harman, H. (1967).
Modern Factor Analysis.
University of Chicago Press, 2nd edition.

Harva, M. and Kabán, A. (2005).
A variational Bayesian method for rectified factor analysis.
In Proc. Int. Joint Conf. on Neural Networks (IJCNN'05), pages 185-190, Montreal, Canada.

Harva, M., Raiko, T., Honkela, A., Valpola, H., and Karhunen, J. (2005).
Bayes Blocks: An implementation of the variational Bayesian building blocks framework.
In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005), pages 259-266, Edinburgh, Scotland.

Haykin, S. (1999).
Neural Networks: A Comprehensive Foundation.
Prentice-Hall, 2nd edition.

Helma, C., Gottmann, E., and Kramer, S. (2000).
Knowledge discovery and data mining in toxicology.
Statistical Methods in Medical Research, 9:329-358.
Special issue on Data Mining in Medicine.

Hinton, G. E. and van Camp, D. (1993).
Keeping neural networks simple by minimizing the description length of the weights.
In Proc. of the 6th Ann. ACM Conf. on Computational Learning Theory, pages 5-13, Santa Cruz, CA, USA.

Hofmann, R. and Tresp, V. (1996).
Discovering structure in continuous variables using Bayesian networks.
In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems, volume 8, pages 500-506. The MIT Press.

Hofmann, R. and Tresp, V. (1998).
Nonlinear Markov networks for continuous variables.
In Jordan, M. I., Kearns, M. J., and Solla, S. A., editors, Advances in Neural Information Processing Systems, volume 10, pages 521-529. The MIT Press.

Honkela, A., Harmeling, S., Lundqvist, L., and Valpola, H. (2004).
Using kernel PCA for initialisation of variational Bayesian nonlinear blind source separation method.
In Puntonet, C. G. and Prieto, A., editors, Proc. of the Fifth International Conference on Independent Component Analysis and Blind Signal Separation (ICA 2004), volume 3195 of Lecture Notes in Computer Science, pages 790-797, Granada, Spain. Springer-Verlag, Berlin.

Honkela, A., Östman, T., and Vigário, R. (2005).
Empirical evidence of the linear nature of magnetoencephalograms.
In Proc. 13th European Symposium on Artificial Neural Networks (ESANN 2005), pages 285-290, Bruges, Belgium.

Honkela, A. and Valpola, H. (2004).
Variational learning and bits-back coding: an information-theoretic view to Bayesian learning.
IEEE Transactions on Neural Networks, 15(4):800-810.

Honkela, A. and Valpola, H. (2005).
Unsupervised variational Bayesian learning of nonlinear models.
In Saul, L., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 593-600. MIT Press, Cambridge, MA, USA.

Honkela, A., Valpola, H., and Karhunen, J. (2003).
Accelerating cyclic update algorithms for parameter estimation by pattern searches.
Neural Processing Letters, 17(2):191-203.

Hornik, K., Stinchcombe, M., and White, H. (1989).
Multilayer feedforward networks are universal approximators.
Neural Networks, 2(5):359-366.

Horváth, T., Wrobel, S., and Bohnebeck, U. (2001).
Relational instance-based learning with lists and terms.
Machine Learning, 43:53-80.

Hyvärinen, A., Karhunen, J., and Oja, E. (2001).
Independent Component Analysis.
J. Wiley.

Ilin, A. and Honkela, A. (2004).
Postnonlinear independent component analysis by variational Bayesian learning.
In Puntonet, C. G. and Prieto, A., editors, Proc. of the Fifth International Conference on Independent Component Analysis and Blind Signal Separation (ICA 2004), volume 3195 of Lecture Notes in Computer Science, pages 766-773, Granada, Spain. Springer-Verlag, Berlin.

Ilin, A. and Valpola, H. (2005).
On the effect of the form of the posterior approximation in variational learning of ICA models.
Neural Processing Letters, 22(2):183-204.

Ilin, A., Valpola, H., and Oja, E. (2004).
Nonlinear dynamical factor analysis for state change detection.
IEEE Transactions on Neural Networks, 15(3):559-575.

Jacobs, N. and Blockeel, H. (2001).
The learning shell: Automated macro construction.
In User Modeling 2001, pages 34-43.

Jaynes, E. T. (2003).
Probability Theory: The Logic of Science.
Cambridge University Press, Cambridge, UK.

Jensen, F., Lauritzen, S. L., and Olesen, K. G. (1990).
Bayesian updating in causal probabilistic networks by local computations.
Computational Statistics Quarterly, 4:269-282.

Jolliffe, I. T. (1986).
Principal Component Analysis.
Springer-Verlag.

Jordan, M., editor (1999).
Learning in Graphical Models.
The MIT Press, Cambridge, MA, USA.

Jordan, M., Ghahramani, Z., Jaakkola, T., and Saul, L. (1999).
An introduction to variational methods for graphical models.
In Jordan, M., editor, Learning in Graphical Models, pages 105-161. The MIT Press, Cambridge, MA, USA.

Julier, S. and Uhlmann, J. (1997).
A new extension of the Kalman filter to nonlinear systems.
In Int. Symp. Aerospace/Defense Sensing, Simul. and Controls.

Jutten, C. and Karhunen, J. (2004).
Advances in blind source separation (BSS) and independent component analysis (ICA) for nonlinear mixtures.
International Journal of Neural Systems, 14(5):267-292.

Kalman, R. E. (1960).
A new approach to linear filtering and prediction problems.
Transactions of the ASME-Journal of Basic Engineering, 82(Series D):35-45.

Kendall, M. (1975).
Multivariate Analysis.
Charles Griffin & Co.

Kersting, K. and De Raedt, L. (2001).
Bayesian logic programs.
Technical Report 151, Institute for Computer Science, University of Freiburg, Germany.

Kersting, K. and De Raedt, L. (2006).
Bayesian Logic Programming: Theory and tool.
In Getoor, L. and Taskar, B., editors, An Introduction to Statistical Relational Learning. MIT Press.
To appear.

Kersting, K. and Landwehr, N. (2004).
Scaled conjugate gradients for maximum likelihood: An empirical comparison with the EM algorithm.
In Gámez, J. A., Moral, S., and Salmerón, A., editors, Advances in Bayesian Networks, volume 146 of Studies in Fuzziness and Soft Computing, pages 235-254. Springer.

Kirk, D. E. (2004).
Optimal Control Theory.
Courier Dover Publications.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983).
Optimization by simulated annealing.
Science, 220(4598):671-680.

Klir, G. and Yuan, B. (1995).
Fuzzy Sets and Fuzzy Logic: Theory and Applications.
Prentice-Hall Inc.

Kohonen, T. (2001).
Self-Organizing Maps.
Springer, 3rd, extended edition.

Koller, D. (1999).
Probabilistic relational models.
In Dzeroski, S. and Flach, P., editors, Proceedings of Ninth International Workshop on Inductive Logic Programming (ILP-99), volume 1634 of LNAI, pages 3-13, Bled, Slovenia. Springer.

Korvemaker, B. and Greiner, R. (2000).
Predicting UNIX command files: Adjusting to user patterns.
In Adaptive User Interfaces: Papers from the 2000 AAAI Spring Symposium, pages 59-64.

Koski, T. (2001).
Hidden Markov Models for Bioinformatics.
Kluwer Academic Publishers.

Landwehr, N., Kersting, K., and De Raedt, L. (2005).
nFOIL: Integrating Naïve Bayes and FOIL.
In Veloso, M. and Kambhampati, S., editors, Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 275-282, Pittsburgh, Pennsylvania, USA. AAAI Press.

Landwehr, N., Mielikäinen, T., Eronen, L., Toivonen, H., and Mannila, H. (2006).
Constrained hidden Markov models for population-based haplotyping.
In Rousu, J., Kaski, S., and Ukkonen, E., editors, Proceedings of the Workshop on Probabilistic Modeling and Machine Learning in Structural and Systems Biology (PMSB), Tuusula, Finland.

Lane, T. (1999).
Hidden Markov models for human/computer interface modeling.
In Rudström, Å., editor, Proceedings of the IJCAI-99 Workshop on Learning about Users, pages 35-44, Stockholm, Sweden.

Lappalainen, H. and Honkela, A. (2000).
Bayesian nonlinear independent component analysis by multi-layer perceptrons.
In Girolami, M., editor, Advances in Independent Component Analysis, pages 93-121. Springer-Verlag, Berlin.

Lappalainen, H. and Miskin, J. (2000).
Ensemble learning.
In Girolami, M., editor, Advances in Independent Component Analysis, pages 75-92. Springer-Verlag, Berlin.

Lasserre, J., Bishop, C. M., and Minka, T. (2006).
Principled hybrids of generative and discriminative models.
In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, New York.

Lavrac, N. and Dzeroski, S. (1994).
Inductive Logic Programming: Techniques and Applications.
Ellis Horwood, New York.

Little, R. J. A. and Rubin, D. B. (1987).
Statistical Analysis with Missing Data.
J. Wiley & Sons.

Lloyd, J. (2003).
Logic for Learning: Learning Comprehensible Theories from Structured Data.
Springer-Verlag.

MacKay, D. J. C. (1995a).
Developments in probabilistic modelling with neural networks - ensemble learning.
In Neural Networks: Artificial Intelligence and Industrial Applications. Proc. of the 3rd Annual Symposium on Neural Networks, pages 191-198.

MacKay, D. J. C. (1995b).
Probable networks and plausible predictions--a review of practical Bayesian methods for supervised neural networks.
Network: Computation in Neural Systems, 6:469-505.

MacKay, D. J. C. (2003).
Information Theory, Inference, and Learning Algorithms.
Cambridge University Press.

Maybeck, P. S. (1979).
Stochastic models, estimation, and control, volume 141 of Mathematics in Science and Engineering.
Academic Press.

Meila, M. and Jordan, M. I. (1996).
Learning fine motion by Markov mixtures of experts.
In Touretzky, D., Mozer, M. C., and Hasselmo, M., editors, Advances in Neural Information Processing Systems 8. MIT Press.

Meng, X. L. and van Dyk, D. A. (1995).
Augmenting data wisely to speed up the EM algorithm.
In Proceedings of the Statistical Computing Section of the American Statistical Association, pages 160-165.

Minka, T. (2001).
Expectation propagation for approximate Bayesian inference.
In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, UAI 2001, pages 362-369.

Miskin, J. and MacKay, D. J. C. (2001).
Ensemble learning for blind source separation.
In Roberts, S. and Everson, R., editors, Independent Component Analysis: Principles and Practice, pages 209-233. Cambridge University Press.

Morari, M. and Lee, J. (1999).
Model predictive control: Past, present and future.
Computers and Chemical Engineering, pages 667-682.

Muggleton, S. (1995).
Inverse entailment and Progol.
New Generation Computing Journal, 13:245-286.

Muggleton, S. and De Raedt, L. (1994).
Inductive logic programming: Theory and methods.
Journal of Logic Programming, 19/20:629-679.

Muggleton, S. and Feng, C. (1992).
Efficient induction in logic programs.
In Muggleton, S., editor, Inductive Logic Programming, pages 281-298. Academic Press.

Murphy, K. P. (2001).
An introduction to graphical models.
Technical report, Intel Research.

Murphy, K. P., Weiss, Y., and Jordan, M. I. (1999).
Loopy belief propagation for approximate inference: An empirical study.
In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI'99), pages 467-475.

Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995).
SCOP: a structural classification of proteins database for the investigation of sequences and structures.
Journal of Molecular Biology, 247:536-540.

Neal, R. M. (1992).
Connectionist learning of belief networks.
Artificial Intelligence, 56:71-113.

Neal, R. M. (2001).
Annealed importance sampling.
Statistics and Computing, 11(2):125-139.

Neal, R. M. and Hinton, G. E. (1999).
A view of the EM algorithm that justifies incremental, sparse, and other variants.
In Jordan, M. I., editor, Learning in Graphical Models, pages 355-368. The MIT Press, Cambridge, MA, USA.

Neapolitan, R. E. (2004).
Learning Bayesian Networks.
Pearson Prentice Hall, Upper Saddle River, NJ.

Nolan, L., Harva, M., Kabán, A., and Raychaudhury, S. (2006).
A data-driven Bayesian approach for finding young stellar populations in early-type galaxies from their UV-optical spectra.
Monthly Notices of the Royal Astronomical Society, 366(1):321-338.

Palomäki, K. J., Brown, G. J., and Barker, J. (2004).
Techniques for handling convolutional distortion with "missing data" automatic speech recognition.
Speech Communication, 43:123-142.

Pearl, J. (1988).
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
Morgan Kaufman, San Francisco.

Petersen, K. B., Winther, O., and Hansen, L. K. (2005).
On the slow convergence of EM and VBEM in low noise linear mixtures.
Neural Computation, 17(9):1921-1926.

Pietraszek, T. and Tanner, A. (2005).
Data mining and machine learning--towards reducing false positives in intrusion detection.
Information Security Technical Report Journal, 10(3):169-183.

Popescul, A., Ungar, L., Lawrence, S., and Pennock, D. (2003).
Statistical relational learning for document mining.
In Proceedings of the IEEE International Conference on Data Mining (ICDM-03), pages 275-282.

Psiaki, M. (2005).
Backward-smoothing extended Kalman filter.
Journal of Guidance, Control, and Dynamics, 28(5).

Quinlan, J. (1990).
Learning logical definitions from relations.
Machine Learning, 5(3):239-266.

Rabiner, L. R. and Juang, B. H. (1986).
An introduction to hidden Markov models.
IEEE Acoustics, Speech, and Signal Processing Magazine, 3(1):4-15.

Raiko, T. (2001).
Hierarchical nonlinear factor analysis.
Master's thesis, Helsinki University of Technology, Espoo.

Raiko, T., Kersting, K., Karhunen, J., and De Raedt, L. (2002).
Bayesian learning of logical hidden Markov models.
In Proceedings of the Finnish Artificial Intelligence Conference (STeP 2002), pages 64-71, Oulu, Finland.

Raju, K., Ristaniemi, T., Karhunen, J., and Oja, E. (2006).
Jammer suppression in DS-CDMA arrays using independent component analysis.
IEEE Trans. on Wireless Communications, 5(1):77-82.

Reiter, R. (1978).
On closed world data bases.
In Logic and Data Bases, pages 119-140. Plenum Publ. Co., New York.

Resnik, M. (1987).
Choices: An Introduction to Decision Theory.
University of Minnesota Press, Minneapolis, Minnesota.

Ristaniemi, T. (2000).
Synchronization and Blind Signal Processing in CDMA Systems.
PhD thesis, University of Jyväskylä, Jyväskylä, Finland.

Ristic, B., Arulampalam, S., and Gordon, N. (2004).
Beyond the Kalman Filter.
Artech House.

Roberts, S. and Everson, R. (2001).
Introduction.
In Roberts, S. and Everson, R., editors, Independent Component Analysis: Principles and Practice, pages 1-70. Cambridge University Press.

Rosenqvist, F. and Karlström, A. (2005).
Realisation and estimation of piecewise-linear output-error models.
Automatica, 41(3):545-551.

Russell, S. and Norvig, P. (1995).
Artificial Intelligence: A Modern Approach.
Prentice-Hall, New Jersey.

Salakhutdinov, R., Roweis, S. T., and Ghahramani, Z. (2003).
Optimization with EM and expectation-conjugate-gradient.
In Proceedings of the International Conference on Machine Learning (ICML-2003), pages 672-679.

Särkkä, S., Vehtari, A., and Lampinen, J. (2006).
Rao-Blackwellized particle filter for multiple target tracking.
Information Fusion.
To appear.

Schwarz, G. (1978).
Estimating the dimension of a model.
The Annals of Statistics, 6(2):461-464.

Segal, E., Taskar, B., Gasch, A., Friedman, N., and Koller, D. (2001).
Rich probabilistic models for gene expression.
Bioinformatics, 17:243-252.

Seltzer, M., Raj, B., and Stern, R. (2004).
A Bayesian framework for spectrographic mask estimation for missing feature speech recognition.
Speech Communication, 43(4):379-393.

Spiegelhalter, D., Thomas, A., Best, N., and Gilks, W. (1995).
BUGS: Bayesian inference using Gibbs sampling, version 0.50.

Srinivasan, A. (2005).
The Aleph manual.
Available at http://web.comlab.ox.ac.uk/oucl/work/ashwin.srinivasan/.

Sterling, L. and Shapiro, E. (1994).
The Art of Prolog.
The MIT Press, second edition.

Stinchcombe, M. and White, H. (1989).
Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions.
In Proceedings of the International Joint Conference on Neural Networks (IJCNN '89), pages I-613-617.

Stone, M. (1974).
Cross-validatory choice and assessment of statistical predictions.
Journal of the Royal Statistical Society, Series B, 36:111-147.

Taskar, B., Abbeel, P., and Koller, D. (2002).
Discriminative probabilistic models for relational data.
In Proc. Conference on Uncertainty in Artificial Intelligence (UAI02), pages 485-492, Edmonton.

Thrun, S. (1992).
The role of exploration in learning control.
In White, D. and Sofge, D., editors, Handbook for Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, Florence, Kentucky 41022.

Tornio, M. and Raiko, T. (2006).
Variational Bayesian approach for nonlinear identification and control.
In Proceedings of the IFAC Workshop on Nonlinear Model Predictive Control for Fast Systems (NMPC FS06), Grenoble, France.
To appear.

Valpola, H., Harva, M., and Karhunen, J. (2004).
Hierarchical models of variance sources.
Signal Processing, 84(2):267-282.

Valpola, H., Honkela, A., Harva, M., Ilin, A., Raiko, T., and Östman, T. (2003a).
Bayes blocks software library.
Available at http://www.cis.hut.fi/projects/bayes/software/.

Valpola, H. and Karhunen, J. (2002).
An unsupervised ensemble learning method for nonlinear dynamic state-space models.
Neural Computation, 14(11):2647-2692.

Valpola, H., Östman, T., and Karhunen, J. (2003b).
Nonlinear independent factor analysis by hierarchical models.
In Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), pages 257-262, Nara, Japan.

Valpola, H., Raiko, T., and Karhunen, J. (2001).
Building blocks for hierarchical latent variable models.
In Proc. 3rd Int. Conf. on Independent Component Analysis and Signal Separation (ICA2001), pages 710-715, San Diego, USA.

Vigário, R., Jousmäki, V., Hämäläinen, M., Hari, R., and Oja, E. (1998).
Independent component analysis for identification of artifacts in magnetoencephalographic recordings.
In Advances in Neural Information Processing System 10 (Proc. NIPS 97), pages 229-235. MIT Press.

Wallace, C. S. (1990).
Classification by minimum-message-length inference.
In Aki, S. G., Fiala, F., and Koczkodaj, W. W., editors, Advances in Computing and Information - ICCI '90, volume 468 of Lecture Notes in Computer Science, pages 72-81. Springer, Berlin.

Winn, J. and Bishop, C. M. (2005).
Variational message passing.
Journal of Machine Learning Research, 6:661-694.

Winn, J. and Jojic, N. (2005).
LOCUS: Learning object classes with unsupervised segmentation.
In Proc. IEEE Intl. Conf. on Computer Vision (ICCV), pages 756-763, Beijing.

Won, K.-J., Prugel-Bennett, A., and Krogh, A. (2006).
Evolving the structure of hidden Markov models.
IEEE Transactions on Evolutionary Computation, 10(1):39-49.



Tapani Raiko 2006-11-21