November 20th, 2007
Helsinki University of Technology
Computer Science Building
Hall T5 & Sauna
| Time | Talk | Speaker |
|------|------|---------|
| 13:05 | Facial landmark detection and gender classification | Erno Mäkinen and Yulia Gizatdinova, University of Tampere |
| 13:30 | Bayesian learning of novel object classes from image sequences | Miika Toivanen, Helsinki University of Technology |
| 13:55 | Detecting and localising objects by their local parts | Joni Kämäräinen, Lappeenranta University of Technology |
| 14:20 | Global and local feature representations in image content analysis | Ville Viitaniemi, Helsinki University of Technology |
| 15:15 | NTF vs. PCA features for searching in a spectral image database | Alexey Andriyashin, University of Joensuu |
| 15:40 | Content-based search and browsing in semantic multimedia retrieval | Mika Rautiainen, University of Oulu |
| 16:05 | Lossy compression of scanned map images | Alexey Podlasov, University of Joensuu |
| 16:30 | Reconstructing reality or seeing it? | Perttu Laurinen, University of Oulu |
| 16:55 | Closing of Seminar | |
Human faces are a rich source of information, with potential applications in security and human-computer interaction. Automatic face analysis is a field that provides a wide range of methods for analyzing faces in static images and video sequences. The problem of automatic face analysis has challenged computer scientists for several decades and still needs further investigation. The difficulty arises because facial appearance varies considerably with environmental conditions (illumination, head pose, orientation, and occlusions), ethnicity, gender, and facial expression. A face representation that remains robust across this variety of facial appearances helps to solve the problem.
In this talk, we first outline the area of automatic face analysis and define its main subareas. We then present research on automatic face analysis carried out at the University of Tampere. First, we introduce a method for automatic facial landmark detection in static images showing facial expressions. Next, we present our research on automatic gender classification. We conclude with a short discussion of issues that remain unresolved in these areas and outline some ideas for future work.
We have used a Bayesian framework to find a representation of an unknown object class appearing in images that are presented to the system sequentially. The likelihood in a novel image is formed from Gabor filter responses at a random grid of feature points placed in the first image. The prior assumes that the grid has the same shape in every image, up to Gaussian deviations. Because some of the nodes lie on the background, the model includes an occlusion term to infer the probability with which each node is associated with the foreground object. Sequential Monte Carlo is used to match the grid in a novel image. After a number of images, a representation of the object class can be formed by including only the nodes with high association probability, i.e. the foreground nodes.
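Two ingredients of the model above can be sketched in isolation: the Gaussian shape prior on the grid and the per-node foreground (occlusion) probability that accumulates over the image sequence. This is a minimal sketch under stated assumptions, not the author's implementation. The per-node likelihoods `lik_fg`/`lik_bg` are hypothetical inputs standing in for Gabor-response likelihoods under the object and background models. The sequential update assumes each node is consistently on the object or on the background. Global translation is factored out of the prior, and the 0.5 threshold is an arbitrary choice. The Sequential Monte Carlo matching step is not shown.

```python
import numpy as np


def shape_log_prior(ref_grid, grid, sigma):
    """Log density of the grid shape under isotropic Gaussian node deviations.

    ref_grid, grid: (n_nodes, 2) node coordinates. The global translation is
    removed first (an assumption of this sketch), so only shape change is penalised.
    """
    d = grid - ref_grid
    d = d - d.mean(axis=0)
    return -0.5 * np.sum(d ** 2) / sigma ** 2 - d.size * np.log(np.sqrt(2 * np.pi) * sigma)


def update_foreground_prob(p_fg, lik_fg, lik_bg):
    """One sequential Bayes update of each node's foreground probability.

    lik_fg / lik_bg: hypothetical per-node likelihoods of the current image under
    the object model and a background model. Assumes images are conditionally
    independent and a node stays on the object (or background) in every image.
    """
    num = p_fg * lik_fg
    return num / (num + (1.0 - p_fg) * lik_bg)


def foreground_nodes(p_fg, threshold=0.5):
    """Indices of nodes kept in the final object-class representation."""
    return np.flatnonzero(p_fg > threshold)


# Usage: nodes 0 and 1 match the object well, node 2 looks like background.
p = np.full(3, 0.5)
for _ in range(5):
    p = update_foreground_prob(p, np.array([0.8, 0.7, 0.2]), np.array([0.3, 0.3, 0.6]))
print(foreground_nodes(p))  # [0 1]
```

Each update multiplies a node's odds by the likelihood ratio `lik_fg / lik_bg`. After a few images, nodes that consistently fit the object approach probability one, and background nodes approach zero.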
State-of-the-art approaches perform object detection and localisation by incorporating local descriptors and their spatial configuration into a generative probability model. The presented method follows the same principle. However, it does not rely on interest point detectors as recent semi-supervised methods do. Instead, it applies a supervised approach in which local image features (landmarks) are annotated in a training set, so that their appearance and spatial variation can be learnt. This enables probabilistic learning of both the landmarks (local features) and their spatial constellation. The approach works in purely probabilistic search spaces, providing a MAP estimate of object location, and, in contrast to recent methods, no background class needs to be formed. From the training set, probability density functions (pdfs) can be estimated for both the spatial constellation and the local feature appearance. By imposing the inference bias that the largest pdf mode has probability one, we can combine prior information (the spatial configuration of the features) and observations (image feature appearance) into a posterior distribution that can be sampled generatively, e.g. using MCMC techniques. MCMC methods are sensitive to initialisation. As a solution, we also propose a very efficient and accurate RANSAC-based method for finding good initial hypotheses of object poses. The complete method can robustly and accurately detect and localise objects under any homography.
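The paragraph names RANSAC as the source of initial pose hypotheses under a homography. The sketch below shows generic RANSAC homography estimation from landmark correspondences. It assumes detected landmark positions `dst` have already been paired with model landmark positions `src`, and it is not the speaker's specific method. The homography is fitted with the standard direct linear transform (DLT). The iteration count and the 2-pixel inlier tolerance are illustrative defaults.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform (DLT): 3x3 homography mapping src -> dst.

    src, dst: (n, 2) arrays with n >= 4. No coordinate normalisation is done,
    which is acceptable for a sketch but less stable than Hartley's variant.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def project(h, pts):
    """Apply homography h to (n, 2) points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return ph[:, :2] / ph[:, 2:3]


def ransac_homography(src, dst, n_iter=500, tol=2.0, seed=None):
    """Fit on random 4-point samples, keep the largest consensus set, then
    refit on all of its inliers. Returns (h, inlier_mask); h is None on failure."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)
        with np.errstate(divide="ignore", invalid="ignore"):
            h = fit_homography(src[idx], dst[idx])
            if not np.all(np.isfinite(h)):
                continue  # degenerate sample produced a non-finite h
            err = np.linalg.norm(project(h, src) - dst, axis=1)
            inliers = err < tol  # NaN/inf errors count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        return None, best
    return fit_homography(src[best], dst[best]), best
```

In the talk's setting, the returned homography would serve only as an initial pose for the MCMC sampler rather than as the final estimate.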
In this talk I will take a look at global and local image feature representations. Some image features are truly global, whereas other properties are localised to certain parts of the image. Image analysis tasks, too, can be global or local in nature. If we recast image content analysis as a pairwise similarity assessment task, the question is whether we want to evaluate the similarity of the images as a whole or the similarity of their parts. I will show examples of both types of tasks. I will also explore various techniques for synthesising global image features out of local ones.
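One common way to synthesise a global feature from local ones is the bag-of-visual-words histogram. Each local descriptor is assigned to its nearest codebook entry, and the assignments are counted into a single histogram per image. The sketch below assumes a codebook has already been learnt (for example, by k-means over training descriptors). It is offered as one illustrative technique, not necessarily one covered in the talk. Histogram intersection is then used to compare the two images as wholes.

```python
import numpy as np


def bag_of_words(descriptors, codebook):
    """Pool local descriptors (n x d) into one global feature: an L1-normalised
    histogram of nearest codebook entries (codebook: k x d)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # hard assignment of each local descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def histogram_intersection(h1, h2):
    """Whole-image similarity in [0, 1] for L1-normalised histograms."""
    return float(np.minimum(h1, h2).sum())


# Hypothetical 2-word codebook and two images' local descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
img_a = np.array([[0.1, 0.0], [9.0, 10.0], [10.0, 11.0], [0.0, -0.2]])
img_b = np.array([[0.2, 0.1], [0.1, 0.3], [0.0, 0.1], [9.5, 9.9]])
ha, hb = bag_of_words(img_a, codebook), bag_of_words(img_b, codebook)
print(ha, hb, histogram_intersection(ha, hb))
```

The histogram discards where each descriptor occurred. That is precisely the trade-off between global and local similarity raised above. Spatial pyramids or max pooling are alternative ways to retain some locality.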
A technique for searching a spectral image database is proposed in this study. It is based on a similarity measure between spectral image features. New spectral image features were introduced and compared. Non-negative tensor factorization (NTF) and principal component analysis (PCA) were applied in the spectral image domain to characterize the colors of a spectral image. A new multiresolution approach to NTF was used to reduce the computation time of feature extraction. The proposed method was implemented and tested with a spectral image database, applying different similarity measures to the different spectral image features. The results were presented as orderings of the images according to their similarity, and they were discussed and conclusions drawn.
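The PCA branch of such a search can be sketched as follows. A PCA basis is learnt from pixel spectra across the database, each image is reduced to a feature vector in that basis, and the database is ranked by distance to the query feature. The specific feature used here (the mean PCA coefficient vector, i.e. the projection of the image's mean spectrum) and the Euclidean distance are hypothetical choices for illustration. The study's actual features, its similarity measures, and the NTF variant are not reproduced.

```python
import numpy as np


def pca_basis(spectra, k):
    """Mean spectrum and top-k principal directions of pixel spectra (n_pixels x n_bands)."""
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:k]


def image_feature(img, mean, basis):
    """Hypothetical global feature: mean PCA coefficients of an (h, w, n_bands)
    image, i.e. the projection of its mean spectrum onto the basis."""
    spectra = img.reshape(-1, img.shape[-1])
    return ((spectra - mean) @ basis.T).mean(axis=0)


def rank_by_similarity(query_feat, db_feats):
    """Database indices ordered from most to least similar (Euclidean distance)."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(dists)


# Synthetic 8-band database: rising, falling and flat spectra.
rng = np.random.default_rng(0)
bands = 8
base = [np.linspace(0, 1, bands), np.linspace(1, 0, bands), np.full(bands, 0.5)]
db = [b + 0.01 * rng.standard_normal((4, 4, bands)) for b in base]
mean, basis = pca_basis(np.concatenate([im.reshape(-1, bands) for im in db]), k=2)
feats = np.array([image_feature(im, mean, basis) for im in db])
query = base[1] + 0.01 * rng.standard_normal((4, 4, bands))
order = rank_by_similarity(image_feature(query, mean, basis), feats)
print(order)
```

An NTF-based feature would replace the PCA basis with non-negative spectral components. The ranking step stays the same.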
Growth in storage capacity has led to large digital video repositories and has made it difficult to find specific information without laborious manual annotation of the data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content more efficiently, the semantic gap between the user's information need and the digital content data has to be overcome. This talk addresses the challenge of improving the semantic search performance of video retrieval systems by combining pattern recognition, data abstraction and clustering techniques with human interaction through manually created queries and visual browsing. The presentation introduces both the methodologies and application prototypes.
An algorithm for lossy compression of scanned map images is proposed. The algorithm is based on color quantization, efficient statistical context tree modeling and arithmetic coding. Its rate-distortion performance is evaluated on a set of scanned topographic maps and compared with the JPEG2000 lossy compression algorithm and with ECW, a commercially available solution for compressing satellite and aerial photos. The proposed algorithm outperforms both competitors in the rate-distortion sense over the whole operational rate-distortion curve. The advantage over JPEG2000 ranges from 40% to 60% in file size at a comparable quality level. A dithering removal filter is also proposed. It improves the visual appearance of the image and improves compression efficiency by about 5-9%.
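The first two stages of such a pipeline can be sketched as follows: quantizing scanned colours to a small palette, then measuring how many bits an adaptive context-modelling arithmetic coder would need. This is a simplified sketch, not the proposed algorithm. A fixed three-neighbour context stands in for the context tree. The ideal code length, the sum of -log2 p, stands in for an actual arithmetic coder. The palette is a hypothetical two-colour example.

```python
import numpy as np
from collections import defaultdict


def quantize(img, palette):
    """Map each pixel of an (h, w, 3) image to the index of the nearest palette colour."""
    d2 = ((img[..., None, :].astype(float) - palette[None, None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=-1)


def adaptive_code_length(indices, n_colors):
    """Ideal code length in bits of an adaptive arithmetic coder over a colour-index image.

    Context = (west, north, north-west) neighbour indices, -1 outside the image;
    this fixed-order context is a simplified stand-in for context tree modelling.
    Symbol counts start at 1 (Laplace estimator) and are updated after coding.
    """
    counts = defaultdict(lambda: np.ones(n_colors))
    h, w = indices.shape
    bits = 0.0
    for y in range(h):
        for x in range(w):
            ctx = (int(indices[y, x - 1]) if x else -1,
                   int(indices[y - 1, x]) if y else -1,
                   int(indices[y - 1, x - 1]) if x and y else -1)
            c = counts[ctx]
            s = indices[y, x]
            bits -= np.log2(c[s] / c.sum())  # cost of coding s with current model
            c[s] += 1  # adapt the model after coding
    return bits


# Hypothetical scan: two slightly off-white/off-black regions.
palette = np.array([[255, 255, 255], [0, 0, 0]], dtype=float)
scan = np.full((32, 32, 3), 250.0)
scan[:, 16:] = 10.0
idx = quantize(scan, palette)
noise = np.random.default_rng(0).integers(0, 2, size=(32, 32))
print(adaptive_code_length(idx, 2), adaptive_code_length(noise, 2))
```

Quantization removes scanning noise, so large uniform map areas become highly predictable from their neighbours and cost very few bits. An unstructured index image costs about one bit per pixel.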
In pattern recognition, researchers often try to understand or explain the nature of the studied phenomena using digitally stored measurement values. In a sense, this is a process of reconstructing what has happened and using that numerical reconstruction (a model or classifier) to improve our knowledge of the studied phenomena. However, constructing the model forces us to make compromises, because thoroughly understanding complex phenomena solely from numerical values is very challenging for human beings.
This presentation proposes a simple yet effective method that removes much of the need for compromise or guesswork during the reconstruction process. The technique is based on visually mapping hard-to-interpret numerical measurements to the actual events they represent. Understanding is achieved by presenting and observing the measurement data synchronously with a simultaneously recorded data stream that humans can readily understand. The method is demonstrated in a particularly suitable application area, wearable electronics. The experiment shows how measurements of human movement captured with a 3D accelerometer, 3D magnetometer or 3D gyroscope can be better understood by synchronizing them with a video stream. The implementation of the method is demonstrated with SignalPlayer, a versatile tool developed for this purpose.
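The core of the synchronized playback described above is a mapping from each video frame to the sensor sample recorded at the same moment. The sketch below finds the nearest sensor sample for each frame. It assumes the two clocks differ only by a constant `offset`, for example one estimated from a shared sync event. This is a hypothetical illustration and does not reflect SignalPlayer's actual implementation or interface.

```python
import numpy as np


def frame_to_sample(frame_idx, fps, sensor_times, offset=0.0):
    """Index of the sensor sample nearest in time to each video frame.

    sensor_times: sorted sample timestamps (s) on the sensor clock.
    offset: hypothetical constant clock offset (s), sensor_time = video_time + offset.
    """
    t = np.asarray(frame_idx, dtype=float) / fps + offset
    i = np.searchsorted(sensor_times, t)
    i = np.clip(i, 1, len(sensor_times) - 1)
    left, right = sensor_times[i - 1], sensor_times[i]
    # Choose whichever neighbouring sample is closer in time.
    return np.where(t - left <= right - t, i - 1, i)


# 100 Hz accelerometer recorded for 10 s alongside 25 fps video.
sensor_times = np.arange(1000) / 100.0
accel = np.sin(2 * np.pi * sensor_times)  # stand-in for one accelerometer axis
idx = frame_to_sample(np.arange(10), fps=25, sensor_times=sensor_times)
print(idx)         # sample shown alongside each of the first 10 frames
print(accel[idx])  # values a player would display next to those frames
```

A viewer built on this mapping can display a window of samples around `idx` while each frame is shown. The numbers are then read against the movement visible in the video.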
In conclusion, the presented approach can hopefully make the process of designing pattern recognition methods easier, faster and more comfortable in many application areas. The technique is still at an early stage, and much remains to be researched and presented at future HATUTUS meetings.
Page maintained by email@example.com, last updated Friday, 16-Nov-2007 13:53:05 EET