The Program for the Hatutus Fall Seminar 2007

November 20th, 2007

Helsinki University of Technology
Computer Science Building
Hall T5 & Sauna


13:00	Opening

13:05	Facial landmark detection and gender classification
	Erno Mäkinen and Yulia Gizatdinova, University of Tampere
13:30	Bayesian learning of novel object classes from image sequences
	Miika Toivanen, Helsinki University of Technology
13:55	Detecting and localising objects by their local parts
	Joni Kämäräinen, Lappeenranta University of Technology
14:20	Global and local feature representations in image content analysis
	Ville Viitaniemi, Helsinki University of Technology

14:45	Coffee break

15:15	NTF vs. PCA features for searching in a spectral image database
	Alexey Andriyashin, University of Joensuu
15:40	Content-based search and browsing in semantic multimedia retrieval
	Mika Rautiainen, University of Oulu
16:05	Lossy compression of scanned map images
	Alexey Podlasov, University of Joensuu
16:30	Reconstructing reality or seeing it?
	Perttu Laurinen, University of Oulu

16:55	Closing of Seminar

17:00	General Meeting

18:00	Christmas Party

Abstracts

Facial landmark detection and gender classification

Human faces are a rich source of information with potential applications in security and human-computer interaction. Automatic face analysis provides a wide range of methods for analyzing faces in static images and in video sequences. The problem has challenged computer scientists for several decades and still needs further investigation. The difficulty comes from the fact that facial appearance varies considerably with changes in environmental conditions (illumination, head pose, orientation, and occlusions), ethnicity, gender, and facial expressions. A representation of the face that remains robust across this variety of appearances helps to solve the problem.

In this talk, we first outline the area of automatic face analysis and define its main subareas. We then present research on automatic face analysis that has been carried out at the University of Tampere. First, we introduce a method for automatic facial landmark detection in static facial images showing facial expressions. Next, we present our research on automatic gender classification. We conclude with a short discussion of issues that remain unresolved in these areas and outline some ideas for future work.

Bayesian learning of novel object classes from image sequences

We have used a Bayesian framework to find a representation of an unknown object class appearing in images that are presented to the system sequentially. The likelihood in a novel image is formed from the Gabor responses at a random grid of feature points placed in the first image. The prior assumes that the shape of the grid is the same in each image, up to Gaussian deviations. As some of the nodes lie on the background, the model includes an occlusion term to infer the probability with which each node should be associated with the foreground object. Sequential Monte Carlo is used to match the grid in a novel image. After some number of images, a representation of the object class can be formed by including only the nodes with high association probability, i.e. the foreground nodes.

Detecting and localising objects by their local parts

State-of-the-art approaches perform object detection and localisation by incorporating local descriptors and their spatial configuration into a generative probability model. The presented method follows the same approach, with the difference that it does not use interest point detectors as the recent semi-supervised methods do. Instead, it applies a supervised approach in which local image features (landmarks) are annotated in a training set, so that their appearance and spatial variation can be learnt. This enables probabilistic learning of both the landmarks (local features) and their spatial constellation. The approach works in purely probabilistic search spaces and provides a MAP estimate of object location; in contrast to the recent methods, no background class needs to be formed. Using the training set, pdfs for both the spatial constellation and the local feature appearance can be estimated. By applying an inference bias that the largest pdf mode has probability one, we are able to combine prior information (spatial configuration of the features) and observations (image feature appearance) into a posterior distribution that can be generatively sampled, e.g. using MCMC techniques. MCMC methods are sensitive to initialisation, so we also propose a very efficient and accurate RANSAC-based method for finding good initial hypotheses of object poses. The complete method can robustly and accurately detect and localise objects under any homography.
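
To illustrate the general RANSAC idea mentioned in the abstract above, here is a minimal sketch. It is not the speakers' method: for brevity it assumes the object pose is a pure 2-D translation rather than a homography, and the function name, threshold and data are hypothetical.

```python
import random

def ransac_translation(pairs, threshold=1.0, iterations=50, seed=0):
    """Estimate a 2-D translation from point pairs while ignoring outliers.

    pairs: non-empty list of ((x, y), (u, v)) correspondences (hypothetical input format).
    Returns ((dx, dy), number_of_inliers).
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single correspondence fixes a translation hypothesis.
        (x, y), (u, v) = rng.choice(pairs)
        dx, dy = u - x, v - y
        # Consensus set: correspondences that agree with the hypothesis.
        inliers = [((a, b), (c, d)) for (a, b), (c, d) in pairs
                   if (c - a - dx) ** 2 + (d - b - dy) ** 2 <= threshold ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the best consensus set by averaging its offsets.
    n = len(best_inliers)
    dx = sum(c - a for (a, _), (c, _) in best_inliers) / n
    dy = sum(d - b for (_, b), (_, d) in best_inliers) / n
    return (dx, dy), n

# Eight correspondences shifted by (3, -2) plus three gross outliers.
pairs = [((i, 2 * i), (i + 3, 2 * i - 2)) for i in range(8)]
pairs += [((0, 0), (10, 10)), ((1, 1), (-5, 7)), ((2, 2), (9, -9))]
result = ransac_translation(pairs)
print(result)
```

A hypothesis found this way could serve as the starting point for a sampler that is sensitive to initialisation, which is the role the abstract assigns to its RANSAC-based step.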

Global and local feature representations in image content analysis

In this talk I will take a look at global and local image feature representations. Some image features are truly global. In contrast, some properties are localised to certain parts of the images. Image analysis tasks can likewise be global or local in nature. If we recast the image content analysis task as a pairwise similarity assessment task, the question is whether we want to evaluate the similarity of the images as a whole or the similarity of their parts. I will show examples of both types of tasks. I will also explore various techniques for synthesising global image features out of local ones.

NTF vs. PCA features for searching in a spectral image database

A technique for searching in a spectral image database is proposed in this study. It is based on a similarity measure between spectral image features. New spectral image features are introduced and compared. Non-negative tensor factorization (NTF) and principal component analysis (PCA) were applied in the spectral image domain to characterize the colors of a spectral image. A new NTF scheme with a multiresolution approach was used to speed up the extraction of the features. The proposed method was implemented and tested with a spectral image database. Different similarity measures were applied to the different spectral image features. The results are presented as images ordered according to their similarity, followed by discussion and conclusions.
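
The search step described above, ordering database images by their similarity to a query, can be sketched as follows. This is only an illustration: the abstract does not say which similarity measures were used, so cosine similarity is an assumption, and the feature vectors and image names are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, database):
    """Return database keys ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query, feat), name) for name, feat in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

# Hypothetical feature vectors standing in for NTF or PCA features.
database = {
    "image_a": [0.9, 0.1, 0.0],
    "image_b": [0.1, 0.8, 0.1],
    "image_c": [0.7, 0.3, 0.0],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], database)
print(ranking)
```

Swapping `cosine_similarity` for another measure changes only the scoring line, which is one way to compare different measures on the same features.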

Content-based search and browsing in semantic multimedia retrieval

Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without the laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content more efficiently, the semantic gap between information need and digital content data has to be overcome. This talk addresses the challenge of improving semantic search performance of video retrieval systems using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The presentation introduces both methodologies and application prototypes.

Lossy compression of scanned map images

An algorithm for lossy compression of scanned map images is proposed. The algorithm is based on color quantization, efficient statistical context tree modeling and arithmetic coding. The rate-distortion performance is evaluated on a set of scanned topographic maps and compared to the JPEG2000 lossy compression algorithm as well as to ECW, a commercially available solution for compression of satellite and aerial photos. The proposed algorithm outperforms these competitors in the rate-distortion sense over the whole operational rate-distortion curve. Compared to JPEG2000, the advantage varies from 40% to 60% in file size at a comparable quality level. A dithering removal filter is also proposed, which improves the visual appearance of the image and yields a gain of about 5-9% in compression efficiency.
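
The two building blocks named above, color quantization and context-based statistical modeling, can be illustrated with a small sketch. It is a simplification under stated assumptions, not the proposed algorithm: it uses a fixed-order context over a 1-D symbol sequence instead of a context tree over 2-D pixel neighbourhoods, and it reports the ideal code length that an arithmetic coder would approach rather than producing a bitstream. All names and data are hypothetical.

```python
import math
from collections import defaultdict

def quantize(pixels, palette):
    """Map each RGB pixel to the index of the nearest palette colour."""
    def nearest(p):
        return min(range(len(palette)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(p, palette[i])))
    return [nearest(p) for p in pixels]

def estimated_bits(symbols, alphabet_size, order=1):
    """Ideal adaptive code length in bits, predicting each symbol
    from the previous `order` symbols."""
    # One table of Laplace-smoothed counts per context.
    counts = defaultdict(lambda: [1] * alphabet_size)
    total_bits = 0.0
    for i, s in enumerate(symbols):
        context = tuple(symbols[max(0, i - order):i])
        c = counts[context]
        total_bits += -math.log2(c[s] / sum(c))  # cost of coding s in this context
        c[s] += 1                                # adapt the model after coding
    return total_bits

palette = [(0, 0, 0), (255, 255, 255)]
indices = quantize([(250, 250, 250), (10, 0, 5)], palette)
flat_region_bits = estimated_bits([0] * 16, alphabet_size=2)
print(indices, flat_region_bits)
```

The flat 16-symbol run costs well under the 16 bits a fixed one-bit-per-pixel code would need, which is the kind of saving context modeling aims for on large uniform map regions.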

Reconstructing reality or seeing it?

In pattern recognition, researchers often try to understand or explain the nature of the studied phenomena using digitally stored measurement values. In a way, this can be seen as a process of trying to reconstruct what has happened and, based on that numerical reconstruction (i.e. a model or classifier), to improve the level of knowledge about the studied phenomena. However, constructing the model forces us to make compromises, because understanding complex phenomena thoroughly based solely on numerical values is very challenging for human beings.

In this presentation, a simple yet effective method is proposed that removes much of the need for compromise or guesswork during the reconstruction process. The technique is based on visually mapping hard-to-understand numerical measurements to the actual events they represent. Understanding is achieved by presenting and observing the measurement data synchronously with a simultaneously recorded data stream that humans understand. A case is shown where the method is applied to a particularly suitable application area, wearable electronics. The experiment shows how measurement values from human movements captured with a 3D accelerometer, 3D magnetometer or 3D gyroscope can be better understood by synchronizing them to a video stream. The implementation of the method is demonstrated with a versatile tool, called SignalPlayer, developed for this purpose.
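
As a rough illustration of synchronizing sensor measurements to a video stream, the sketch below pairs each video frame with the sensor sample recorded closest in time. It does not describe how SignalPlayer works; the shared clock starting at zero, the sampling rates and all names are assumptions made for the example.

```python
from bisect import bisect_left

def nearest_sample_index(sample_times, t):
    """Index of the sample whose timestamp is closest to t.

    sample_times must be non-empty and sorted in ascending order.
    """
    i = bisect_left(sample_times, t)
    if i == 0:
        return 0
    if i == len(sample_times):
        return len(sample_times) - 1
    # Choose the closer of the two neighbouring samples.
    return i if sample_times[i] - t < t - sample_times[i - 1] else i - 1

def align_to_frames(sample_times, samples, frame_rate, n_frames):
    """For each video frame, return the sensor sample recorded closest in time."""
    return [samples[nearest_sample_index(sample_times, k / frame_rate)]
            for k in range(n_frames)]

# Hypothetical data: a 100 Hz sensor and 25 fps video on a shared clock.
sample_times = [i / 100 for i in range(100)]
samples = list(range(100))  # stand-in for (x, y, z) readings
aligned = align_to_frames(sample_times, samples, frame_rate=25, n_frames=5)
print(aligned)
```

With the aligned list in hand, a viewer can display the reading next to the frame it belongs to, which is the kind of side-by-side inspection the abstract describes.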

In conclusion, the presented approach can hopefully make the process of designing pattern recognition methods easier, faster and more comfortable in many application areas. The technique is still in its early stages, and much remains to be researched and thus also to be presented at future meetings of HATUTUS.