Jefrey Lijffijt, D. Sc. (Tech)
I am a Research Associate at the University of Bristol, working on the projects FORSIED and DS4DEMS. My new page is
but since UoB does not allow me to design my own page, I am still maintaining
this page.
Contact info:
- Office:
- 83 Woodland Road, BS8 1US, Bristol, UK
- Postal Address:
- MVB Woodland Road, BS8 1UB, Bristol, UK
- Telephone:
- +44 7908 222 196
- Email:
You can find me on Twitter, here.
Research Topics
I am interested in the theory and practice of statistical modeling and pattern
mining in various kinds of data. Currently, I am working mostly on a novel framework for
pattern mining based on subjective interestingness in data analysis.
As a side project, I am active in the analysis of natural language corpora and
corpus linguistics. I have also been working on (interactive visual) graph
mining and social network analysis on web data and citation networks. More
generally, I am interested in mining interesting/surprising patterns in
transactional, sequential, and relational data and in graphs, as well as in text
mining, natural language processing, statistical significance testing, and
maximum entropy modeling.
Brief Bio
I am a Research Associate (PostDoc) in Data Science at the University of
Bristol, working with Prof. Tijl De Bie.
Previously, I worked as a Postdoctoral Researcher at Aalto University with Prof. Aristides Gionis and Prof. Samuel Kaski. I defended my
doctoral dissertation at Aalto University in December 2013; my advisor was Prof. Heikki Mannila. I graduated
with distinction and received an award from Aalto University School of
Science for the best doctoral dissertation of 2013. Before my doctoral
studies, I worked as a consultant in predictive analytics at Crystalloids,
Amsterdam, and as a research intern at Philips Research, Eindhoven.
Exploratory data mining, interactive data analysis, pattern mining,
hypothesis testing, statistical significance, maximum entropy modeling, subjective interestingness, graph mining, natural language corpora.
Recent Activity
- Jefrey Lijffijt, Eirini Spyropoulou, Bo Kang, Tijl De Bie. P-N-RMiner: A
Generic Framework for Mining Interesting Structured Relational Patterns.
To appear at IEEE International Conference on Data Science and Advanced
Analytics (DSAA) 2015. (Preprint, Presentation)
- Jefrey Lijffijt, Tanja Säily. Adjusting p-values for heterogeneity
in collocation analysis. D2E, 19 - 22 October, Helsinki, Finland, 2015. (Presentation)
- Jefrey Lijffijt, Eirini Spyropoulou, Tijl De Bie. Making Sense of
Relational Data. Tutorial at ECML-PKDD 2015. (Website
with slides)
- Jefrey Lijffijt. "So what?" -- On acing your PhD. Invited talk at
the doctoral consortium of ECML-PKDD 2015. (Annotated presentation)
- Matt McVicar, Cédric Mesnage, Jefrey Lijffijt, Tijl De Bie.
Interactively exploring supply and demand in the UK independent music
scene. In Proceedings of the European Conference of Machine Learning and
Principles and Practices of Knowledge Discovery in Databases (ECML-PKDD) - Part
III, pages 289-292. Springer-Verlag, Berlin-Heidelberg, 2015. (Webpage
of demo, Preprint, Original)
- Stefan Evert, Gerold Schneider, Vaclav Brezina, Stefan Th. Gries, Jefrey
Lijffijt, Paul Rayson, Sean Wallis, Andrew Hardie. Corpus statistics: key
issues and controversies. Panel discussion at Corpus Linguistics 2015,
Lancaster, UK. (Slides)
- Matt McVicar, Cédric Mesnage, Jefrey Lijffijt, Eirini Spyropoulou,
Tijl De Bie. Supply and demand of independent UK music artists on the
web. In Proceedings of the 2015 ACM Conference on Web Science, 2015.
- Karmen Dykstra, Jefrey Lijffijt, Aristides Gionis. Covering the egonet:
A crowdsourcing approach to social circle discovery on Twitter. In
Proceedings of the 9th International AAAI Conference on Web and Social
Media, 2015. (Preprint, Original,
- Best Doctoral Dissertation of 2013, Aalto University School of
Research Grants
- Advisor in Academy of Finland Project Reassessing language change: the
challenge of real time (2014-2018, PI Prof. Terttu Nevalainen).
- Funded PhD Position 2012-2013, Finnish Doctoral Programme in Computational
Tutorials, Panel and Invited Talks
- Jefrey Lijffijt, Eirini Spyropoulou, Tijl De Bie. Making Sense of
Relational Data. Tutorial at ECML-PKDD 2015. (Website
with slides)
- Jefrey Lijffijt. "So what?" -- On acing your PhD. Invited talk at
the doctoral consortium of ECML-PKDD 2015. (Annotated presentation)
- Stefan Evert, Gerold Schneider, Vaclav Brezina, Stefan Th. Gries, Jefrey
Lijffijt, Paul Rayson, Sean Wallis, Andrew Hardie. Corpus statistics: key
issues and controversies. Panel discussion at Corpus Linguistics 2015,
Lancaster, UK. (Slides)
- Jefrey Lijffijt. Are you talking Bernoulli to me? Significance testing and burstiness of words in text corpora. Department of Mathematics and Statistics, University of Jyväskylä, 11 November 2011, Jyväskylä, Finland. (Presentation)
Community Services
Organisation of Conferences, Workshops, Panels
- Stefan Evert, Gerold Schneider, Vaclav Brezina, Stefan Th. Gries, Jefrey
Lijffijt, Paul Rayson, Sean Wallis, Andrew Hardie. Corpus statistics: key
issues and controversies. Panel discussion at Corpus Linguistics 2015,
Lancaster, UK. (Abstract)
Reviewer for Journals
- Data Mining and Knowledge Discovery (DAMI).
- IEEE Transactions on Knowledge and Data Engineering (TKDE).
- Machine Learning (MLJ).
Program Committee Member for Conferences and Workshops
- ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD),
- European Conference of Machine Learning and Principles and Practices of
Knowledge Discovery in Databases (ECML-PKDD), 2012, 2013, 2014.
- Interactive Data Exploration and Analytics Workshop (IDEA), 2014.
- Practical Theories of Data Mining Workshop (PTDM), 2012.
Supervision of Theses
- Bo Kang, PhD Thesis, June 2015 - now.
- Matthew Bastow, MSc Thesis patternVis: an interactive visualisation tool
for RMiner, Feb - Sep 2015.
- Guest Lecture Text Mining & NLP (Part of Introduction to AI), University of Bristol, Spring 2015. (Slides)
- Assistant to Algorithmic Methods of Data Mining, Aalto University, Fall 2012.
- Assistant to Algorithmic Methods of Data Mining, Aalto University, Fall 2011.
- Assistant to Statistical Significance Testing in Data Mining, Aalto University, Spring 2010.
Refereed Publications
Journal Articles
- Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamäki. Size
matters: Choosing the most informative set of window lengths for mining patterns
in event sequences. Data Mining and Knowledge Discovery, online
ahead of print. (Preprint, Original, Code)
- Jefrey Lijffijt, Terttu Nevalainen, Tanja Säily, Panagiotis
Papapetrou, Kai Puolamäki, Heikki Mannila. Significance testing of word
frequencies in corpora. Digital Scholarship in the Humanities, online
ahead of print. (Preprint, Original
[free access])
- Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamäki. A
statistical significance testing approach to mining the most informative set of
patterns. Data Mining and Knowledge Discovery, 28(1): 238-263, 2014.
(Original, Itemset
mining implementation)
Conference Articles
- Jefrey Lijffijt, Eirini Spyropoulou, Bo Kang, Tijl De Bie. P-N-RMiner: A
generic framework for mining interesting structured relational patterns.
To appear at IEEE International Conference on Data Science and Advanced
Analytics (DSAA) 2015. (Preprint, Presentation)
- Matt McVicar, Cédric Mesnage, Jefrey Lijffijt, Tijl De Bie.
Interactively exploring supply and demand in the UK independent music
scene. In Proceedings of the European Conference of Machine Learning and
Principles and Practices of Knowledge Discovery in Databases (ECML-PKDD) - Part
III, pages 289-292. Springer-Verlag, Berlin-Heidelberg, 2015. (Webpage
of demo, Preprint, Original)
- Karmen Dykstra, Jefrey Lijffijt, Aristides Gionis. Covering the egonet:
A crowdsourcing approach to social circle discovery on Twitter. In
Proceedings of the 9th International AAAI Conference on Web and Social
Media, 2015. (Preprint, Original,
- Matt McVicar, Cédric Mesnage, Jefrey Lijffijt, Eirini Spyropoulou,
Tijl De Bie. Supply and demand of independent UK music artists on the
web. In Proceedings of the 2015 ACM Conference on Web Science, 2015.
- Jefrey Lijffijt. A fast and simple method for mining subsequences with surprising event counts. In Proceedings of the European Conference of Machine Learning and Principles and Practices of Knowledge Discovery in Databases (ECML-PKDD 2013) - Part I, pages 385-400. Springer-Verlag, Berlin-Heidelberg, 2013. (Preprint, Original, Presentation)
- Jefrey Lijffijt, Tanja Säily, Terttu Nevalainen. CEECing the baseline: Lexical stability and significant change in a historical corpus. In Outposts of Historical Corpus Linguistics: From the Helsinki Corpus to a Proliferation of Resources (Studies in Variation, Contacts and Change in English 10). Research Unit for Variation, Contacts and Change in English, Helsinki, 2012. (Original)
- Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamäki. Size matters: Finding the most informative set of window lengths. In Proceedings of the European Conference of Machine Learning and Principles and Practices of Knowledge Discovery in Databases (ECML-PKDD 2012) - Part II, pages 451-466. Springer-Verlag, Berlin-Heidelberg, 2012. (Preprint, Original, Presentation, Poster)
- Turo Vartiainen, Jefrey Lijffijt. Premodifying -ing participles in the parsed BNC. In Corpus Linguistics and Variation in English: Theory and Description, pages 247-258. Rodopi, Amsterdam, 2012. (Preprint, Original, Presentation)
- Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamäki, Heikki Mannila. Analyzing word frequencies in large text corpora using inter-arrival times and bootstrapping. In Proceedings of the European Conference of Machine Learning and Principles and Practices of Knowledge Discovery in Databases (ECML-PKDD 2011) - Part II, pages 341-357. Springer-Verlag, Berlin-Heidelberg, 2011. (Preprint, Original, Presentation, Poster)
Workshop Articles
- Kai Puolamäki, Panagiotis Papapetrou, Jefrey Lijffijt. Visually controllable data mining methods. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, pages 409-417. IEEE Computer Society, Washington, DC, USA, 2010. (Preprint, Original)
- Jefrey Lijffijt, Panagiotis Papapetrou, Jaakko Hollmén, Vassilis Athitsos. Benchmarking dynamic time warping for music retrieval. In Proceedings of the 3rd International Conference on Pervasive Technologies Related to Assistive Environments (PETRA), article 59. ACM New York, NY, USA, 2010. (Preprint, Original, Data, Code, Results)
- Jefrey Lijffijt, Panagiotis Papapetrou, Jaakko Hollmén. Tracking your steps on the track: Body sensor recordings of a controlled walking experiment. In Proceedings of the 3rd International Conference on Pervasive Technologies Related to Assistive Environments (PETRA), article 58. ACM New York, NY, USA, 2010. (Preprint, Original, Data)
Non-refereed Publications
Letters to Journals
- Jefrey Lijffijt, Stefan Th. Gries. Correction to Stefan Th. Gries' "Dispersions and adjusted frequencies in corpora". International Journal of Corpus Linguistics, 17 (1), 147-149, 2012. (Preprint, Original)
Technical Reports
- Jefrey Lijffijt, Panagiotis Papapetrou, Niko Vuokko, Kai Puolamäki. The smallest set of constraints that explains the data: a randomization approach. TKK-ICS-R31, TKK Reports in Information and Computer Science, Espoo, May 2010. (Original)
- Jefrey Lijffijt, Ingrid C. M. Flinsenberg. Compression-based activity classification and motif discovery in time series of acceleration data. TN-2008-00521, Koninklijke Philips Electronics N.V., Eindhoven, September 2008.
Doctoral Thesis
- Jefrey Lijffijt. Computational methods for comparison and exploration of event sequences. Doctoral dissertation, Aalto University School of Science, Dec. 2013. (Original)
Master's Thesis
- Jefrey Lijffijt. Compression-based activity classification and motif discovery in time series of acceleration data. Master's thesis, Utrecht University, Sep. 2008. (Original)
Other presentations not directly related to papers
Conference Presentations and Posters
- Jefrey Lijffijt, Tanja Säily. Adjusting p-values for heterogeneity
in collocation analysis. D2E, 19 - 22 October, Helsinki, Finland, 2015. (Presentation)
- Tanja Säily, Terttu Nevalainen, Jefrey Lijffijt. Tracing significant change in 17th-century English lexis: the civil war effect. HPSCG 15, 23 - 25 August, Helsinki, Finland, 2012. (Poster, Handouts)
- Jefrey Lijffijt, Tanja Säily, Terttu Nevalainen. Chi-square test considered harmful: Better methods for testing the significance of word frequencies. ICAME 33, 30 May - 3 June, Leuven, Belgium, 2012. (Presentation, Implementation)
- Panagiotis Papapetrou, Jefrey Lijffijt, Tanja Säily, Kai Puolamäki, Terttu Nevalainen, Heikki Mannila. Are you talking Bernoulli to me? Comparing methods of assessing word frequencies. Helsinki Corpus Festival, 28 Sep - 2 Oct, Helsinki, Finland, 2011. (Presentation)
- Turo Vartiainen, Jefrey Lijffijt. Can articles predict the word class of the premodifier? A study of the -ing participle. ICAME 32, 1 - 5 June, Oslo, Norway, 2011.
- Turo Vartiainen, Jefrey Lijffijt. Premodifying -ing participles in the parsed BNC. ICAME 31, 26 - 30 May, Giessen, Germany, 2010. (Presentation)
- Jefrey Lijffijt, Harri Siirtola, Tanja Säily, Turo Vartiainen, Terttu Nevalainen, Heikki Mannila. Towards interactive visual analysis of corpora. ICAME 31, 26 - 30 May, Giessen, Germany, 2010. (Poster)
- Jefrey Lijffijt. Local and global lexicon: a novel approach to quantifying persistence. XXXVII Kielitieteen päivät Helsingin Yliopistossa, 20 - 22 May, Helsinki, Finland, 2010. (Presentation)
Other Presentations and Posters
- Jefrey Lijffijt. Analysis of linguistic variation. Poster: Spring Workshop on Mining and Learning (SML), Bad Neuenahr, Germany, 2012. (Poster)
- Jefrey Lijffijt. Data mining tools for analysis of linguistic variation. Poster: Lorentz Workshop on Mining Patterns and Subgroups, Leiden, The Netherlands, 2010. (Poster)