HEIKKI MANNILA








o Short biography


o Research interests

My primary research interests are in algorithms, data mining, and data analysis. I look at basic research questions in computer science in areas where there are applications in sight. The application areas I am interested in include computational biology, paleontology, and linguistics.


o List of publications (not updated regularly)


o What's new?

  • Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamaki, Heikki Mannila: Analyzing Word Frequencies in Large Text Corpora Using Inter-arrival Times and Bootstrapping. ECML/PKDD (2) 2011: 341-357

  • Panagiotis Papapetrou, Aristides Gionis, Heikki Mannila: A Shapley Value Approach for Influence Attribution. ECML/PKDD (2) 2011: 549-564

  • Aleksi Kallio, Niko Vuokko, Markus Ojala, Niina Haiminen, Heikki Mannila: Randomization techniques for assessing the significance of gene periodicity results. BMC Bioinformatics 12: 330 (2011)

  • Gemma C. Garriga, Esa Junttila, Heikki Mannila: Banded structure in binary matrices. Knowl. Inf. Syst. 28(1): 197-226 (2011)

  • T. Nevalainen, H. Raumolin-Brunberg and H. Mannila: The diffusion of language change in real time: Progressive and conservative individuals and the time depth of change. Language Variation and Change 23, 1, 1-43, 2011.

  • J. Saarinen, E. Oikarinen, M. Fortelius and H. Mannila: The living and the fossilized: how well do unevenly distributed points capture the faunal information in a grid. Evolutionary Ecology Research, 12: 363–376, 2010.

  • Theodoros Lappas, Evimaria Terzi, Dimitrios Gunopulos, Heikki Mannila: Finding effectors in social networks. KDD 2010: 1059-1068

  • Panu Luosto, Jyrki Kivinen, Heikki Mannila: Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle. Discovery Science 2010: 251-265

  • T. Elomaa, H. Mannila, P. Orponen (eds.): Algorithms and Applications, Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday. ISBN 978-3-642-12475-4, Springer 2010.

  • T. Vesala, S. Launiainen, P. Kolari, J. Pumpanen, S. Sevanto, P. Hari, E. Nikinmaa, P. Kaski, H. Mannila, E. Ukkonen, S. Piao and P. Ciais: Autumn temperature and carbon balance of a boreal Scots pine forest in Southern Finland. Biogeosciences 7, 163-176, 2010.

  • M. Ojala, G. Garriga, A. Gionis, H. Mannila: Evaluating Query Result Significance in Databases via Randomizations. SDM'10: Proceedings of the 2010 SIAM International Conference on Data Mining, p. 906-917.

  • J. Wessman, T. Paunio, A. Tuulio-Henriksson, M. Koivisto, T. Partonen, J. Suvisaari, JA. Turunen, J. Wedenoja, W. Hennah, O. Pietilainen, J. Lonnqvist, H. Mannila, L. Peltonen: Mixture model clustering of phenotype features reveals evidence for association of DTNBP1 to a specific subtype of schizophrenia. Biological Psychiatry, Volume 66, Issue 11, Pages 990-996, 2009.

  • M. Miah, G. Das, V. Hristidis, H. Mannila: Determining Attributes to Maximize Visibility of Objects. IEEE Transactions on Knowledge and Data Engineering 21, 7 (2009), 959-973.

  • H. Hakkoymaz, G. Chatzimilioudis, D. Gunopulos, H. Mannila: Applying Electromagnetic Field Theory Concepts to Clustering with Constraints. ECML/PKDD (1) 2009: 485-500.

  • T. Feder, H. Mannila, E. Terzi: Approximating the Minimum Chain Completion problem. Information Processing Letters, 109, 17, 2009, 980-985.

  • S. Hanhijärvi, M. Ojala, N. Vuokko, K. Puolamäki, N. Tatti, and H. Mannila: Tell me something I don't know: Randomization strategies for iterative data mining. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), to appear.

  • L.H. Liow, M. Fortelius, K. Lintulaakso, H. Mannila, N.Chr. Stenseth: Lower Extinction Risk in Sleep-or-Hide Mammals. American Naturalist 2009. Vol. 173, pp. 264-272.

  • A. Ukkonen, K. Puolamaki, A. Gionis, H. Mannila: A Randomized Approximation Algorithm for Computing Bucket Orders. Information Processing Letters 109 (2009), 356-359.

  • H. Mannila: Finding Total and Partial Orders from Data for Seriation, Discovery Science 2008 p. 16-25.

  • G. Garriga, A. Ukkonen, H. Mannila: Feature Selection in Taxonomies with Applications to Paleontology, Discovery Science 2008 p. 112-123. [Correction.]

  • N. Haiminen, H. Mannila, E. Terzi: Determining significance of pairwise co-occurrences of events in bursty sequences. BMC Bioinformatics 9(336), 2008. [online, open access]

  • P. Miettinen, T. Mielikainen, A. Gionis, G. Das, H. Mannila: The Discrete Basis Problem. To appear in IEEE Transactions on Knowledge and Data Engineering, 20(10), October 2008. [PrePrint from IEEE] (An expanded version of P. Miettinen, T. Mielikainen, A. Gionis, G. Das, H. Mannila: The Discrete Basis Problem. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) 2006, p. 335-346. PKDD Best Paper.)
  • N. Haiminen, H. Mannila: Evaluation of BIC and cross validation for model selection on sequence segmentations. International Journal of Data Mining and Bioinformatics (IJDMB) (in press).

  • L.H. Liow, M. Fortelius, E. Bingham, K. Lintulaakso, H. Mannila, L. Flynn, and N.Chr. Stenseth: Higher origination and extinction rates in larger mammals. Proc Natl Acad Sci 105(16), pp. 6097-6102, 2008.

  • P.E. Lundmark, U. Liljedahl, D.I. Boomsma, H. Mannila, N.G. Martin, A. Palotie, L. Peltonen, M. Perola, T.D. Spector and A.-C. Syvänen: Evaluation of HapMap data in six populations of European descent. European Journal of Human Genetics 2008, 1-9.

  • P. Rastas, M. Koivisto, H. Mannila, and E. Ukkonen: Phasing genotypes using a hidden Markov model. In: Bioinformatics Algorithms: Techniques and Applications, I. Mandoiu and A. Zelikovsky (eds.), p. 373-391, Wiley 2008.

  • P. Miettinen, A. Gallo, H. Mannila: Finding duplicate descriptors: algorithms for redescription mining. SIAM Data Mining Conference 2008, p. 334-345.

  • M. Ojala, N. Vuokko, A. Kallio, N. Haiminen, H. Mannila: Randomization of real-valued matrices for assessing the significance of data mining results. SIAM Data Mining Conference 2008, p. 494-505.

  • B. Goethals, W. Le Page, and Heikki Mannila, Mining Association Rules of Simple Conjunctive Queries, SIAM Data Mining Conference 2008, p. 96-107.

  • M. Miah, V. Hristidis, G. Das, H. Mannila: Standing Out in a Crowd: Selecting Attributes for Maximum Visibility. International Conference on Data Engineering (ICDE 2008), p. 356-365.

  • Robert Gwadera and Aristides Gionis and Heikki Mannila: Optimal segmentation using tree models, Knowledge and Information Systems 15, 3 (2008).

  • A. Gionis, H. Mannila, T. Mielikainen, and P. Tsaparas, Assessing Data Mining Results via Swap Randomization, ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 1 , Issue 3 (December 2007) Article No. 14.
    We consider a simple randomization technique for producing random datasets that have the same row and column margins with the given dataset. Then one can test the significance of a data mining result by computing the results of interest on the randomized instances and comparing them against the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and rankings. To generate random datasets with given margins, we use variations of a Markov chain approach, which is based on a simple swap operation. We give theoretical results on the efficiency of different randomization methods, and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is a random artifact, while for other datasets the discovered structure conveys meaningful information.

    The code is available.
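
    The swap operation mentioned in the abstract above can be sketched as follows. This is a minimal illustration in Python, not the released code: the function name swap_randomize, the attempts parameter, and the choice of picking two 1-cells uniformly at each step are assumptions made for this example.

```python
import random


def swap_randomize(matrix, attempts, seed=0):
    """Return a randomized copy of a 0-1 matrix with the same row and column sums.

    Illustrative sketch only; names and sampling scheme are assumptions.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy
    # Positions of all 1-cells; kept in sync with m as swaps are applied.
    ones = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v == 1]
    if len(ones) < 2:
        return m
    for _ in range(attempts):
        a = rng.randrange(len(ones))
        b = rng.randrange(len(ones))
        (i, j), (k, l) = ones[a], ones[b]
        # A swap is possible when m[i][j] = m[k][l] = 1 and m[i][l] = m[k][j] = 0.
        # (If i == k or j == l, one of the two "zero" cells is a 1, so the test fails.)
        if m[i][l] == 0 and m[k][j] == 0:
            m[i][j] = m[k][l] = 0
            m[i][l] = m[k][j] = 1
            ones[a] = (i, l)
            ones[b] = (k, j)
    return m


data = [[1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 1, 0, 0]]
randomized = swap_randomize(data, attempts=200, seed=1)
```

    Each accepted swap moves two 1s within the same pair of rows and the same pair of columns, so every row sum and column sum is unchanged. Running a data mining algorithm on many such randomized matrices and comparing against its result on the original data gives the kind of significance check the abstract describes.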

  • H. Mannila: The role of information technology for systems biology. In Systems Biology: A Grand Challenge for Europe, ESF 2007, p. 21-23.

  • A. Ukkonen and H. Mannila: Finding Outlying Items in Sets of Partial Rankings. In: Knowledge Discovery in Databases: PKDD 2007, p. 265-276.

  • S. Hyvonen, A. Gionis, and H. Mannila: Recurrent predictive models for sequence segmentation. Advances in Intelligent Data Analysis VII (IDA 2007), p. 195-206.

  • H. Mannila and E. Terzi: Nestedness and segmented nestedness. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2007), p. 480-489.

  • H. Heikinheimo, E. Hinkkanen, H. Mannila, T. Mielikäinen, and J. Seppänen, Finding low-entropy sets and trees from binary data. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2007), p. 350-359.

  • Niina Haiminen, Heikki Mannila, Evimaria Terzi: Comparing segmentations by applying randomization techniques. BMC Bioinformatics 2007, 8:171 (23 May 2007).

  • N. Haiminen, H. Mannila: Discovering isochores by least-squares optimal segmentation. Gene 394 (Issues 1-2), 2007, pp. 53-60 (1 June 2007). [online via ScienceDirect]

  • N. Landwehr, T. Mielikäinen, L. Eronen, H. Toivonen and H. Mannila, Constrained hidden Markov models for population-based haplotyping, BMC Bioinformatics 2007, 8(Suppl 2):S9.

  • A. Dasgupta, G. Das, and H. Mannila: A Random Walk Approach to Sampling Hidden Databases. Proceedings of the 2007 ACM SIGMOD international conference on Management of Data (SIGMOD 2007), p. 629-640.

  • A. Hinneburg, H. Mannila, S. Kaislaniemi, T. Nevalainen and H. Raumolin-Brunberg: How to Handle Small Samples: Bootstrap and Bayesian Methods in the Analysis of Linguistic Change, Literary and Linguistic Computing 22, 2 (June 2007) 137-150; doi: 10.1093/llc/fqm006

  • H. Heikinheimo, M. Fortelius, J. Eronen and H. Mannila: Biogeography of European land mammals shows environmentally distinct and spatially coherent clusters. Journal of Biogeography 34, 6, 1053-1064 (2007). doi:10.1111/j.1365-2699.2006.01664.x

  • A. Gionis, H. Mannila, P. Tsaparas: Clustering Aggregation (long version). ACM Transactions on Knowledge Discovery from Data, 1, 1 (2007).
    The code is available.

  • R. Gwadera, A. Gionis, and H. Mannila, Optimal Segmentation using Tree Models. 2006 IEEE International Conference on Data Mining, p. 244-253, 2006

  • N. Tatti, T. Mielikainen, A. Gionis, and H. Mannila, What is the dimension of your binary data? 2006 IEEE International Conference on Data Mining, p. 603-612, 2006.

  • P. Miettinen, T. Mielikäinen, A. Gionis, G. Das, H. Mannila: The Discrete Basis Problem. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) 2006, p. 335-346. PKDD Best Paper.
  • H. Heikinheimo, H. Mannila, J. Seppänen: Finding Trees from Unordered 0-1 Data. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) 2006, p. 175-186.
  • A. Gionis, H. Mannila, K. Puolamaki, and A. Ukkonen, Algorithms for Discovering Bucket Orders from Data, 12th International Conference on Knowledge Discovery and Data Mining (KDD) 2006, p. 561-566.
    We consider bucket orders, i.e., total orders with ties. They can be used to capture the essential order information without overfitting the data: they form a useful concept class between total orders and arbitrary partial orders. We address the question of finding a bucket order for a set of items, given pairwise precedence information between the items. We also discuss methods for computing the pairwise precedence data. We describe simple and efficient algorithms for finding good bucket orders. Several of the algorithms have a provable approximation guarantee, and they scale well to large datasets. We provide experimental results on artificial and real data that show the usefulness of bucket orders and demonstrate the accuracy and efficiency of the algorithms.
  • N. Landwehr, T. Mielikainen, L. Eronen, H. Toivonen, and H. Mannila: Constrained Hidden Markov Models for Population-based Haplotyping, PMSB 2006, to appear.

  • K. Puolamäki, M. Fortelius, H. Mannila: Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods. PLoS Comput Biol 2(2): e6
    This paper looks at the seriation problem in paleontology. Given a collection of fossil sites, a set of taxa, and the presence/absence information for all taxa, find a good ordering for the sites. We describe a probabilistic model for the seriation problem, and show how MCMC techniques can be used to obtain estimates for the ordering of the sites, taxon lifetimes, etc. Compared to the spectral method described in another paper, the MCMC method gives better estimates of the uncertainty in the results, but is much slower. The code for the methods is available.

  • Jean-Francois Boulicaut, Luc de Raedt, Heikki Mannila (eds.): Constraint-based mining and inductive databases. Springer-Verlag LNCS Volume 3848, ISBN: 3-540-31331-1, Springer 2005.
    A collection of papers on constraints in pattern discovery and on the related concept of inductive databases.

  • J. Seppanen, H. Mannila: Boolean formulas and frequent sets. In Jean-Francois Boulicaut, Luc de Raedt, Heikki Mannila (eds.): Constraint-based mining and inductive databases, Springer-Verlag LNCS Volume 3848, ISBN: 3-540-31331-1, Springer 2005, p. 348-361.
    We consider the problem of approximating the frequency of a query, given a collection of frequent itemsets. We study the algorithm that truncates the inclusion-exclusion sum to include only the frequencies of known itemsets, give a bound for its performance on disjunctions of attributes that is smaller than the previously known bound, and show that this bound is in fact achievable. We also show how to generalize the algorithm to approximate arbitrary Boolean queries.

  • E. Bingham, A. Gionis, N. Haiminen, H. Hiisila, H. Mannila, E. Terzi: Segmentation and Dimensionality Reduction, SIAM Data Mining Conference (SDM) 2006.
    Sequence segmentation and dimensionality reduction have been used as methods for studying high-dimensional sequences: they both reduce the complexity of the representation of the original data. In this paper we study the interplay of these two techniques. We formulate the problem of segmenting a sequence while modeling it with a basis of small size, thus essentially reducing the dimension of the input sequence. We give three different algorithms for this problem: all combine existing methods for sequence segmentation and dimensionality reduction. For two of the proposed algorithms we prove guarantees for the quality of the solutions obtained. We describe experimental results on synthetic and real datasets, including data on exchange rates and genomic sequences. Our experiments show that the algorithms indeed discover underlying structure in the data, including both segmental structure and interdependencies between the dimensions. The code for the methods is available.

  • Polish translation of D. Hand, H. Mannila and P. Smyth: Principles of Data Mining available: "Eksploracja danych", Wydawnictwa Naukowo-Techniczne, ISBN 83-204-3053-4, 2005.

  • F. Afrati, G. Das, A. Gionis, H. Mannila, T. Mielikäinen, P. Tsaparas: Mining chains of relations. ICDM 2005, the Fifth IEEE International Conference on Data Mining, p. 553-556.

  • S. Papadimitriou, A. Gionis, P. Tsaparas, R.A. Vaisanen, H. Mannila, C. Faloutsos: Parameter-Free Spatial Data Mining Using MDL. ICDM 2005, the Fifth IEEE International Conference on Data Mining, p. 346-353.

  • M. Fortelius, A. Gionis, J. Jernvall, H. Mannila, Spectral Ordering and Biochronology of European Fossil Mammals, Paleobiology 32, 2, 206-214.
    This paper looks at the seriation problem in paleontology. Given a collection of fossil sites, a set of taxa, and the presence/absence information for all taxa, find a good ordering for the sites. The biological background knowledge that is used is that the species become extant, live for a certain period, and then become extinct; i.e., in error-free data the correct ordering is characterized as the ordering giving the consecutive ones property for the matrix. Real data, however, has lots of noise, and finding the optimal ordering is a hard problem. We show that spectral methods give very good results. Basically, one constructs a similarity matrix for the sites, computes the Laplacian, and uses one of the eigenvectors as the ordering criterion. The code is available.
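
    The spectral recipe outlined in the abstract above (similarity matrix, Laplacian, eigenvector) can be sketched as follows. This is a hedged illustration using NumPy, not the released code: the function name spectral_order, the use of shared-taxon counts as the similarity, and the choice of the eigenvector of the second-smallest eigenvalue (the Fiedler vector) are assumptions made for this example; the abstract only says that one of the eigenvectors is used.

```python
import numpy as np


def spectral_order(presence):
    """Order the rows (sites) of a 0-1 site-by-taxon matrix.

    Illustrative sketch only; the similarity measure and the choice of
    eigenvector are assumptions, not taken from the paper.
    """
    x = np.asarray(presence, dtype=float)
    # Similarity between two sites: number of taxa present at both.
    similarity = x @ x.T
    np.fill_diagonal(similarity, 0.0)  # self-similarity cancels in the Laplacian
    # Graph Laplacian: degree matrix minus similarity matrix.
    laplacian = np.diag(similarity.sum(axis=1)) - similarity
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vectors = np.linalg.eigh(laplacian)
    fiedler = vectors[:, 1]
    # Sorting sites by their Fiedler-vector entry gives the ordering
    # (defined only up to reversal, since the eigenvector's sign is arbitrary).
    return np.argsort(fiedler).tolist()


# Five sites in a known order, each taxon present in a run of consecutive sites.
ordered_sites = [[1, 1, 0, 0, 0],
                 [1, 1, 1, 0, 0],
                 [0, 1, 1, 1, 0],
                 [0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 1]]
shuffle = [3, 0, 4, 1, 2]
shuffled_sites = [ordered_sites[p] for p in shuffle]
order = spectral_order(shuffled_sites)
recovered = [shuffle[i] for i in order]  # original positions, in spectral order
```

    On this small noise-free example the recovered sequence is the original site order or its reverse; which of the two comes out depends on the sign of the eigenvector, so the direction of time has to be fixed by other means.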

  • P. Rastas, M. Koivisto, H. Mannila, and E. Ukkonen: A hidden Markov technique for haplotype reconstruction. In: R. Casadio and G. Myers (eds.), Algorithms in Bioinformatics: 5th International Workshop, WABI 2005, Lecture Notes in Computer Science, 3692, pp. 140-151, Springer, 2005.

  • S. Hyvönen, H. Junninen, L. Laakso, M. Dal Maso, T. Grönholm, B. Bonn, P. Keronen, P. Aalto, V. Hiltunen, T. Pohja, S. Launiainen, P. Hari, H. Mannila, M. Kulmala: A look at aerosol formation using data mining techniques, Atmos. Chem. Phys., 5, 3345-3356, 2005.

  • A. Ukkonen, M. Fortelius, H. Mannila: Finding partial orders from unordered 0-1 data. In R. Grossman, R. Bayardo, K. P. Bennett (Eds.): Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 285-293.

  • A. Gionis, H. Mannila, P.Tsaparas, Clustering aggregation, In 21st International Conference on Data Engineering (ICDE) 2005. p. 341-352.
    The code is available.
  • M. Salmenkivi, H. Mannila: Piecewise Constant Modeling of Sequential Data Using Reversible Jump Markov Chain Monte Carlo. In J. Wang, M. Zaki, H. Toivonen, D. Shasha (Eds.): Data Mining in Bioinformatics. Springer 2005, p. 85-103

  • M. Salmenkivi, H. Mannila: Using Markov chain Monte Carlo and dynamic programming for event sequence data. Knowl. Inf. Syst. 7(3): 267-288 (2005)

  • A. Patrikainen, H. Mannila: Subspace clustering of high-dimensional binary data - A probabilistic approach. Workshop on Clustering High-Dimensional Data and Its Applications, SIAM International Conference on Data Mining 2004, pp. 57-65.

  • Mikko Koivisto, Teemu Kivioja, Pasi Rastas, Heikki Mannila, and Esko Ukkonen: Hidden Markov modelling techniques for haplotype analysis. In: S. Ben-David, J. Case, and A. Maruoka (eds.), Algorithmic Learning Theory: 15th International Conference, ALT 2004, Lecture Notes in Computer Science, 3244, pp. 37-52, Springer, 2004.

  • F. Geerts, H. Mannila, E. Terzi: Relational link-based ranking. The 30th International Conference on Very Large Data Bases (VLDB'04), 2004, p. 552-563.

  • J. Seppänen, H. Mannila, Dense itemsets. In W. Kim, R. Kohavi, J. Gehrke, W. DuMouchel (Eds.): Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), p. 683-688.

  • A. Gionis, H. Mannila, E. Terzi, Clustered segmentations, 3rd Workshop on Mining Temporal and Sequential Data (TDM) 2004
  • A. Gionis, H. Mannila, J. Seppänen, Geometric and combinatorial tiles in 0-1 data, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) 2004, p. 173-184.
  • F. Afrati, A. Gionis, H. Mannila, Approximating a collection of frequent sets, 10th International Conference on Knowledge Discovery and Data Mining (KDD 2004), p. 12-19.
o Somewhat older papers

  • Dmitry Pavlov, H. Mannila, P. Smyth: Beyond independence: probabilistic methods for query approximation on binary transaction data. IEEE Trans. Knowl. Data Eng. 15(6): 1409-1421 (2003)

  • Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, and Ram Sewak Sharma. Discovering all most specific sentences. ACM Transactions on Database Systems 28 (2): 140 - 174, June 2003. (DOI: http://doi.acm.org/10.1145/777943.777945)
  • Slides of ICDM 2003 invited talk: Global structure from sequences

  • A. Gionis, T. Kujala and H. Mannila: Fragments of order. ACM SIGKDD 2003, p. 129-136.

  • A. Leino, H. Mannila and R.-L. Pitkanen: Rule discovery and probabilistic modeling for onomastic data. PKDD 2003, p. 291-302.

  • T. Mielikainen and H. Mannila: The Pattern Ordering Problem. PKDD 2003, p. 327-338.

  • J. Seppanen, E. Bingham and H. Mannila: A simple algorithm for topic identification in 0-1 data. PKDD 2003, p. 423-434.

  • A. Gionis and H. Mannila: Finding recurrent sources in sequences. ACM RECOMB 2003, p. 123-130.
    The code is available.

  • Y. Zhu, J. Hollmen, R. Raty, Y. Aalto, B. Nagy, E. Elonen, J. Kere, H. Mannila, K. Franssila, S. Knuutila: Investigatory and analytical approaches to differential gene expression profiling in mantle cell lymphoma. Br J Haematol. 2002 Dec;119(4):905-15.

  • T. Niini, K. Vettenranta, J. Hollmen, M.L. Larramendy, Y. Aalto, H. Wikman, B. Nagy, J.K. Seppanen, A.F. Salvador, H. Mannila, U.M. Saarinen-Pihkala, S. Knuutila: Expression of myeloid-specific genes in childhood acute lymphoblastic leukemia -- a cDNA array study. Leukemia 16, 2213-2221, 2002.

  • Luc de Raedt, Manfred Jaeger, Sau Dan Lee, Heikki Mannila: A theory of inductive query answering. Proceedings of the 2nd IEEE International Conference on Data Mining, Vipin Kumar, Shusaku Tsumoto, Ning Zhong, Philip S. Yu, Xindong Wu (Eds.), pp. 123-130, 2002.

  • J. Han, R.B. Altman, V. Kumar, H. Mannila, D. Pregibon: Emerging Scientific Applications in Data Mining. Communications of the ACM 45, 8 (August 2002), 54-58.

  • M. Salmenkivi, J. Kere, H. Mannila: Genome Segmentation using Piecewise Constant Intensity Models and Reversible Jump MCMC. (European Computational Biology Conference 2002.) Bioinformatics 18, Supplement 2, S211-S218.

  • P. Onkamo, V. Ollikainen, P. Sevon, HTT. Toivonen, H. Mannila, and J. Kere: Association analysis for quantitative traits by data mining: QHPM. The Annals of Human Genetics 66 (2002), 419-429.

  • Machine Learning: ECML 2002 - 12th European Conference on Machine Learning, LNCS 2430, T. Elomaa, H. Mannila, H. Toivonen (Eds.). Springer 2002.

  • Principles of Data Mining and Knowledge Discovery - 6th European Conference, PKDD 2002, LNCS 2431, T. Elomaa, H. Mannila, H. Toivonen (Eds.). Springer 2002.

  • E. Bingham, H. Mannila and J. Seppänen: Topics in 0-1 data. To appear in KDD 2002.

  • H. Mannila: Global and local methods in data mining: basic techniques and open problems. ICALP 2002, 29th International Colloquium on Automata, Languages, and Programming, Malaga, Spain, July 2002; (c) Springer-Verlag

  • C.K. Leung, R. Ng, and H. Mannila: OSSM: A Segmentation Approach to Optimize Frequency Counting. ICDE 2002.

  • H. Mannila, A. Patrikainen, J. Seppänen, and J. Kere: Long-range control of expression in yeast. Bioinformatics 18, 3 (2002), 482-483.

  • B. Bollobas, G. Das, D. Gunopulos and H. Mannila: Time-Series Similarity Problems and Well-Separated Geometric Sets. Nordic Journal on Computing, 2001. Shorter version in 13th Annual ACM Symposium on Computational Geometry, 1997, p. 454-456.

  • Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth, MIT Press, August 2001.

    o New links to older papers

    Here are links to some papers that previously were unlinked in the full list of publications.

  • E. Bingham and H. Mannila: Random projection in dimensionality reduction: applications to image and text data. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2001), F. Provost and R. Srikant (eds.), p. 245-250.

  • H. Mannila and C. Meek: Global partial orders from sequential data. Sixth Annual Conference on Knowledge Discovery and Data Mining (KDD-2000), p. 161-168.

  • G. Das and H. Mannila: Context-based similarity methods for categorical attributes. Principles of Data Mining and Knowledge Discovery, 4th European Conference (PKDD 2000) D.A. Zighed et al. (eds.), p. 201-211.

  • H. Mannila and D. Rusakov: Decomposing event sequences into independent components. First SIAM Conference on Data Mining, 2001.

  • H. Mannila and J. Seppänen: Recognizing similar situations from event sequences. First SIAM Conference on Data Mining, 2001.