
Second Course ``Machine Learning and Neural Networks''

As mentioned before, up to the teaching year 2006-2007 our laboratory had two courses on neural networks, ``Principles of Neural Computing'' and ``Advanced Course in Neural Computing'', both worth 5 ECTS credit points. They were established in the late 1990s by Profs. Juha Karhunen and Samuel Kaski. The textbook used in both courses was Prof. Haykin's well-known book [Haykin, 1998].

As the importance of neural networks in the field of machine learning has decreased in recent years in favor of probabilistic graphical modeling and other methods, we decided to compress our two earlier neural networks courses into a single one in the new Macadamia master's programme. The new course, ``Machine Learning and Neural Networks'', was lectured for the first time from November to mid-December 2007, with four hours of lectures and exercises per week. The course was designed and lectured by Prof. Juha Karhunen, who had developed and lectured both of our earlier neural networks courses in previous years.

Even though Haykin's book [Haykin, 1998] probably covers the topics discussed in the new course best, and a new edition of it is being finalized, we chose Ham and Kostanic's book [Ham & Kostanic, 2001] as the main textbook for the new course. The main reason for this choice was that Haykin's roughly 850-page book [Haykin, 1998] discusses matters too extensively and thoroughly for the needs of a single course worth 5 ECTS credit points.

In Ham and Kostanic's book, the theory of neural networks is covered much more concisely in the first five chapters, about 240 pages. Some of this theory concerns matters that are no longer so relevant, such as Adaline and Hopfield networks, and we skipped such out-of-date topics almost completely in our course. The remainder of the book deals with various applications of neural networks, but these were not covered in our course either. This was because several of the application chapters address linear problems, while a major justification for using neural networks is their ability to tackle difficult problems using distributed nonlinear processing. Linear problems based on second-order statistics can usually be handled much more efficiently with standard numerical and signal processing methods than with slowly converging and inaccurate stochastic gradient type neural algorithms.
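To make this point concrete, the following is a small illustrative sketch (not taken from either textbook) that solves the same linear regression problem once in closed form and once with LMS-style stochastic gradient updates; the toy data, learning rate and iteration counts are arbitrary assumptions, and NumPy is used only for convenience.

  # Illustrative comparison: direct least-squares solution vs. LMS-type
  # stochastic gradient descent on the same linear problem.
  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(500, 5))               # inputs
  w_true = rng.normal(size=5)                 # "unknown" linear system
  y = X @ w_true + 0.1 * rng.normal(size=500)

  # Standard numerical approach: solve the second-order-statistics
  # (least-squares) problem in one step.
  w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

  # LMS: stochastic gradient updates, one sample at a time.
  w_lms = np.zeros(5)
  eta = 0.01                                  # learning rate
  for epoch in range(20):
      for x_i, y_i in zip(X, y):
          err = y_i - x_i @ w_lms
          w_lms += eta * err * x_i

  print("direct solve error:", np.linalg.norm(w_ls - w_true))
  print("LMS error:         ", np.linalg.norm(w_lms - w_true))

The direct solution is exact up to the noise level after a single linear-algebra call, whereas the LMS estimate approaches it only gradually, which is the motivation stated above for leaving such linear applications out of the course.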

The new course ``Machine Learning and Neural Networks'' consists of the following 12 lectures, each covering one topic area discussed in our course:

  1. Introduction to neural networks, examples of their applications.
  2. Models of neuron, activation functions, network architectures.
  3. Single neuron models and learning rules: least-mean squares (LMS) algorithm, basic perceptron, their weaknesses.
  4. Hebbian learning and principal component analysis (PCA), preprocessing of data.
  5. Feedforward multilayer perceptron (MLP) networks, backpropagation learning algorithms, their properties and some improvements (illustrated by the small sketch after this list).
  6. Advanced optimization algorithms for multilayer perceptron networks: conjugate gradient algorithm, Levenberg-Marquardt algorithm.
  7. Model assessment and selection: generalization, overlearning, regularization, bias-variance decomposition, validation and cross-validation.
  8. Radial-basis function (RBF) neural networks and their learning algorithms.
  9. Support vector machines for classification and nonlinear regression.
  10. Independent component analysis (ICA): basic principles, criteria, learning algorithms, and some applications.
  11. Self-organizing maps (SOM) and learning vector quantization (LVQ).
  12. Processing of temporal information in feedforward networks, simple recurrent network.
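To give a concrete flavor of the central lecture topic (lecture 5), the following is a minimal sketch of a one-hidden-layer MLP trained with plain backpropagation on a toy regression task. It is only an illustration, not course code; the layer size, tanh activation, learning rate and data are arbitrary choices.

  # Minimal one-hidden-layer MLP with plain backpropagation
  # (illustrative sketch; sizes and learning rate are arbitrary assumptions).
  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.uniform(-1, 1, size=(200, 1))
  y = np.sin(np.pi * X)                        # toy nonlinear regression target

  n_hidden, eta = 10, 0.05
  W1 = rng.normal(scale=0.5, size=(1, n_hidden))   # input-to-hidden weights
  b1 = np.zeros(n_hidden)
  W2 = rng.normal(scale=0.5, size=(n_hidden, 1))   # hidden-to-output weights
  b2 = np.zeros(1)

  for epoch in range(2000):
      # Forward pass
      h = np.tanh(X @ W1 + b1)                 # hidden activations
      y_hat = h @ W2 + b2                      # linear output layer
      err = y_hat - y
      # Backward pass (gradients of the mean squared error)
      grad_W2 = h.T @ err / len(X)
      grad_b2 = err.mean(axis=0)
      delta_h = (err @ W2.T) * (1 - h ** 2)    # tanh derivative
      grad_W1 = X.T @ delta_h / len(X)
      grad_b1 = delta_h.mean(axis=0)
      # Gradient descent step
      W2 -= eta * grad_W2; b2 -= eta * grad_b2
      W1 -= eta * grad_W1; b1 -= eta * grad_b1

  print("final training MSE:", float(np.mean(err ** 2)))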
Each of these ``lectures'' covers one subject entity. The number of slides in them varies greatly, and presenting one of them in practice often took either less or more time than a single oral lecture of $2 \times 45$ minutes. Most of the lectures were based on Ham and Kostanic's book [Ham & Kostanic, 2001], but it does not cover all the lecture topics, or covers them poorly. The lecture on model assessment and selection was compiled from several sources, in particular from Bishop's book [Bishop, 2006], because these important matters are discussed very little in [Ham & Kostanic, 2001]. The lecture on support vector machines was based on Chapter 6 of Haykin's book [Haykin, 1998], and the last lecture on processing of temporal information was likewise based on Haykin's book. Finally, independent component analysis was presented following the treatment in Hyvärinen and Oja's tutorial article [Hyvärinen & Oja, 2000], which is still a good and highly readable introduction to basic independent component analysis.
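For readers unfamiliar with the method, the following is a minimal sketch of the one-unit fixed-point iteration of the FastICA type described in [Hyvärinen & Oja, 2000]. It assumes the data have already been centered and whitened; the tanh nonlinearity, convergence test and toy mixing setup are arbitrary illustrative choices rather than the course's actual demo code.

  # Illustrative one-unit FastICA-type fixed-point iteration
  # (assumes centered and whitened data; samples are the rows of Z).
  import numpy as np

  def fastica_one_unit(Z, n_iter=100, tol=1e-6, seed=0):
      rng = np.random.default_rng(seed)
      w = rng.normal(size=Z.shape[1])
      w /= np.linalg.norm(w)
      for _ in range(n_iter):
          s = Z @ w                                   # current source estimate
          g, g_prime = np.tanh(s), 1.0 - np.tanh(s) ** 2
          w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
          w_new /= np.linalg.norm(w_new)
          converged = np.abs(np.abs(w_new @ w) - 1.0) < tol   # up to sign
          w = w_new
          if converged:
              break
      return w

  # Toy demo: mix two non-Gaussian sources, whiten, recover one direction.
  rng = np.random.default_rng(0)
  S = rng.laplace(size=(5000, 2))                     # non-Gaussian sources
  A = np.array([[1.0, 0.5], [0.3, 1.0]])              # mixing matrix
  X = S @ A.T
  X -= X.mean(axis=0)
  eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
  Z = X @ eigvec / np.sqrt(eigval)                    # whitened data
  print("estimated unmixing direction:", fastica_one_unit(Z))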

There is a little overlap with our first machine learning course ``Machine Learning: Basic Principles'' on model assessment and selection as well as on principal component analysis, but this was not considered harmful, because these matters are important and are discussed from different viewpoints in the two courses. Designing the course and writing the lecture slides and the solutions to the exercise problems required a lot of work. In the coming years the course will most probably remain largely as it is, but the promising new extreme learning machine paradigm [Huang et al., 2006] will be included in the discussion of multilayer perceptron networks.
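Since the extreme learning machine is only mentioned by name here, a minimal sketch of its basic idea as presented in [Huang et al., 2006] may be helpful: the hidden-layer weights are drawn at random and left untrained, and only the linear output weights are fitted by least squares. The network size, sigmoid activation and toy data below are arbitrary assumptions for illustration only.

  # Minimal extreme learning machine (ELM) sketch: random fixed hidden layer,
  # output weights solved in closed form by least squares.
  import numpy as np

  rng = np.random.default_rng(2)
  X = rng.uniform(-1, 1, size=(300, 2))
  y = np.sin(X[:, :1]) + X[:, 1:] ** 2           # toy regression target

  n_hidden = 50
  W = rng.normal(size=(2, n_hidden))             # random input weights (never trained)
  b = rng.normal(size=n_hidden)
  H = 1.0 / (1.0 + np.exp(-(X @ W + b)))         # hidden-layer outputs (sigmoid)

  # Only the output weights are learned, via ordinary least squares.
  beta, *_ = np.linalg.lstsq(H, y, rcond=None)
  y_hat = H @ beta
  print("training MSE:", float(np.mean((y_hat - y) ** 2)))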

We used the exercise problems of our two earlier neural networks courses, but also added new problems. Both Haykin's book [Haykin, 1998] and Ham and Kostanic's book [Ham & Kostanic, 2001] have accompanying solutions manuals, which made it easier to select instructive problems of a suitable difficulty level. Computer demonstrations were also presented in connection with the exercises to give the students an idea of how the methods discussed perform in practice. Another reason for presenting demos is that it is difficult to design suitable exercise problems, for example on highly nonlinear multilayer perceptron networks.

The course also contains a computer assignment, which is selected randomly for each student from five possible assignments. Two of them are on multilayer perceptron networks, two on self-organizing maps, and one on independent component analysis. All the lecture slides, exercise problems and their solutions, prepared using LaTeX, are available in PDF form on the course home page http://www.cis.hut.fi/Opinnot/T-61.5130/. The original LaTeX source files can be requested from the lecturer of the course, Prof. Juha Karhunen (email: Juha.Karhunen@tkk.fi).

