Proceedings of STeP'96. Jarmo Alander, Timo Honkela and Matti Jakobsson (eds.),
Publications of the Finnish Artificial Intelligence Society, pp. 35-47.
Data Mining Accounting Numbers
Using Self-Organizing Maps
Barbro Back
Turku School of Economics and Business Administration
Kaisa Sere
University of Kuopio
Hannu Vanharanta
University of Joensuu
Abstract
The amount of financial information in today's large, sophisticated databases is huge and makes
comparisons between company performance, especially over time, difficult or at least very time
consuming. The aim of this paper is to investigate whether neural networks in the form of self-organizing maps can be used to mine accounting numbers in large databases over several time
periods. By using self-organizing maps, we overcome the problems of finding the
appropriate underlying distribution and functional form of the data, problems that are
often encountered in structuring tasks, for example, when using cluster analysis. The method
chosen also offers a way of visualizing the results. The database in this study consists of annual
reports of 130 forest companies worldwide, with data from a five-year period.
Introduction
Competitive benchmarking is an important company-internal process, in which the functions and
performance of one company are compared with those of other companies. Financial competitive
benchmarking uses financial information -- most often in the form of ratios -- to perform these
comparisons. Financial competitive benchmarking is utilized, among other things, as a
communication tool in strategic management, for example in situations where company
management must gain approval, from internal and external interest groups alike, for new
functional objectives for the company.
Multivariate statistical methods have been used as a tool of analysis for company
performance, bankruptcy predictions, stock market predictions etc., although mostly in research
contexts. However, many problems have been reported concerning these methods. The two most
important problems are the assumption of normality in the underlying distributions and difficulties
in finding an appropriate functional form for the distributions. Moreover, results of analyses are
difficult to visualize when there are several explanatory variables [Vermeulen et al., 1994].
Vanharanta [1995] has used modern computer technology and built a hyperknowledge-based
system for financial benchmarking. The system contains a database with financial data on more
than 130 pulp and paper companies worldwide. The amount of financial information in this
system is, however, so large that it makes comparisons between companies difficult, or at least
very time consuming.
In a previous study [Back et al., 1995] we investigated the potential of self-organizing maps
for pre-processing the vast financial data available on companies and for presenting an
approximated position of one company's financial performance compared to that of other
companies. The results were very promising. By using self-organizing maps we have overcome
the problems associated with finding the appropriate underlying distribution and the functional
form of the financial indicators. Furthermore, the visualization capabilities of self-organizing maps
provide a good way of presenting and analyzing the results.
Neural networks have previously been suggested by Trigueiros [1995] for use with
computerized accounting reports databases, and by Chen et al. [1995] to define cluster structures
in large databases. Martin-del-Brio and Serrano-Cinca [1995] used self-organizing maps for
analyzing the financial state of Spanish companies.
In this paper, we use the self-organizing maps to structure Vanharanta's database into
clusters based on the underlying weight maps. Each cluster is then named according to the
financial characteristics of the cluster. The database contains financial data for a five-year period.
We analyse the financial performance of the Finnish forest companies in these maps over the years
1985-89. Even though we take a closer look only at these companies, any individual company or
group of companies can be the focus of interest.
We anticipate that neural networks can be used in future for benchmarking purposes to help
executives find company characteristics that will lead to sustainable excellence of a company, in
other words to help answer the question: Which are the characteristics that lead a company
towards long-lasting good performance? Some company characteristics seem to produce and
maintain good overall company performance, sustainable profitability, increasing productivity and
continuous growth.
The rest of the paper is organized as follows: Section 2 describes the methodology we have
used, the database, the list of companies in the study and the criteria for and the choice of financial
ratios. Section 3 describes the training and testing of the network and Section 4
presents the empirical results. The conclusions of our study are presented in Section 5.
Methodology
Benchmarking
Competitive benchmarking is a company-internal process in
which the activities of a given company are measured against the best practices of other, best-in-class companies [Geber, 1990]. In the process of competitive benchmarking, internal
functions are analyzed and measured using financial (i.e. quantitative) and/or non-financial (i.e.
qualitative) yardsticks. Functions measured from one company are compared with similar
functions measured from leading competitors, or they are compared with the best practices in
other industries. The differences between compared functions are measured. The overall
management goal of competitive benchmarking within a given company is to close the measured
"gap" by changing the company's characteristics in ways that will improve company performance.
The financial information needed for financial benchmarking work is, however, invariably
available only from large commercial databases or from specialized reports and publications, from
where it must be gleaned with difficulty. Such information is thus far removed from its active
users. If the needed financial information is to be brought closer to the active users, it must first be
pre-processed, i.e. refined and classified. The overall objective of the present study is to pre-process, with the help of neural networks, the data and information needed for financial
benchmarking purposes. Thus pre-processed, the information can be used in computerized
benchmarking systems and executive support systems, making the task of competitive financial
benchmarking easier and more effective.
Self-organizing maps
Since companies [in the database] do not have
predefined labels describing their financial status, a network intended for pre-processing their data
can have no pre-desired outputs. For this reason, we utilize an unsupervised learning method. A
Kohonen network [Kohonen, 1995], being the most common network model based on
unsupervised learning, is used in this study.
Database and selection of companies
The Green Gold Financial Reports
database [Salonen and Vanharanta, 1990a, 1990b, 1991] is used as the experimental financial
knowledge base for the neural network tests. It consists of standardized income statements,
balance sheets and cash flow statements of 130 companies in the international pulp and paper
industry. The database also contains specific financial ratios, calculated using information from
the standardized reports as well as general company information concerning products and
production volumes. There are 47 different key ratios for each company. The companies are all
based in one of three regions: North America, Northern Europe or Central Europe. The financial
data covers a period of five years from 1985 to 1989. The companies are listed in Table 1 (with
some companies omitted that did not have enough data available).
For our experiment we used some 120 pulp and paper companies from the database. We have
also included the averages of Finland, Norway and Sweden as three additional "companies".
Table 1 contains companies in
14 countries.
Choice of ratios
The population consists of the 47 financial ratios in the
benchmarking system, which organizes them into six groups under the headings:
- Profitability
- Indebtedness
- Capital Structure
- Liquidity
- Working capital
- Cash flow ratios
The choice of ratios in this study was based on an empirical study conducted using ten
financial analysts from a large Finnish bank who participated in a validation test of the
benchmarking system [Vanharanta et al., 1995]. If a ratio in that study was used by at least five
analysts it was selected as a variable for the network in this study. The following nine ratios were
selected. The numbers in parentheses indicate the ratio group, numbered in the order listed above.
- Operating profit (% of sales) (1)
- Profit after financial items (% of sales) (1)
- Return on total assets (ROTA) (1)
- Return on equity (ROE) (1)
- Total liabilities (% of sales) (2)
- Solidity (3)
- Current ratio (4)
- Funds from operations (% of sales) (6)
- Investments (% of sales) (6)
We note that there are four profitability measures, one indebtedness measure, one capital
structure measure, one liquidity measure, no working capital measures and two cash flow
measures. It seems reasonable that the emphasis is on profitability in a benchmarking situation.
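The selection rule described above (keep a ratio if at least five of the ten analysts used it) can be sketched as a simple vote count. The analyst choices below are hypothetical illustration data, not the answers collected in the validation test, and the function name `select_ratios` is our own.

```python
from collections import Counter


def select_ratios(analyst_choices, min_votes=5):
    """Return, in alphabetical order, the ratios used by at least `min_votes` analysts."""
    votes = Counter()
    for choices in analyst_choices:
        # Each analyst counts at most once per ratio.
        votes.update(set(choices))
    return sorted(ratio for ratio, count in votes.items() if count >= min_votes)


# Hypothetical choices of six analysts (illustration only).
example_choices = [
    ["ROE", "Current ratio"],
    ["ROE", "Quick ratio"],
    ["ROE", "Current ratio"],
    ["ROE", "Current ratio"],
    ["ROE", "Current ratio", "Quick ratio"],
    ["Current ratio"],
]

print(select_ratios(example_choices))
```

In this toy input "ROE" and "Current ratio" each receive five votes and are kept, while "Quick ratio" receives two and is dropped.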
Training and testing the network
This section describes the process followed in constructing the self-organizing maps.
The actual construction work was performed using The Self-Organizing Map
Program Package version 3.1 prepared by the SOM Programming Team of the Helsinki
University of Technology.
We started by standardizing the ratios in the database using histogram equalization
[Klimasauskas, 1991] in order to ease the SOM's learning process and to improve its
performance. Histogram equalization is a way of mapping rare figures to a small part of the target
range and spreading out frequent figures so that it becomes easier for the neural network to
discriminate among frequent figures.
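One common way to realize histogram equalization is to replace each value by its empirical cumulative frequency. The sketch below assumes this variant; the exact procedure of [Klimasauskas, 1991] as applied in the study is not spelled out here, so the function and the sample figures are illustrative only.

```python
from bisect import bisect_right


def histogram_equalize(values):
    """Map each value to the fraction of observations less than or equal to it.

    The result lies in (0, 1]. Densely populated value ranges are spread out,
    while isolated extreme values are pulled next to their neighbours.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


# Hypothetical ratio values with one extreme figure (illustration only).
raw = [1.0, 2.0, 2.1, 2.2, 50.0]
print(histogram_equalize(raw))
```

The three close values 2.0, 2.1 and 2.2 end up evenly spaced, and the extreme value 50.0 no longer dominates the range.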
All the maps were trained in two phases. The purpose of the first training phase was to order
the randomly initialized reference vectors of the maps to "approximately correct" values. During
the second phase the maps were "fine-tuned," i.e. the final ordering of the reference vectors took place.
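The two-phase scheme can be illustrated with a small pure-Python sketch. It is not the code of the program package used in the study: the linear decay of learning rate and radius, the "bubble" neighbourhood, the lattice coordinates and all function names are assumptions made for illustration.

```python
import math
import random


def hex_positions(cols, rows):
    """Planar coordinates of the units on a hexagonal lattice (odd rows shifted)."""
    return [(c + 0.5 * (r % 2), r * math.sqrt(0.75))
            for r in range(rows) for c in range(cols)]


def best_matching_unit(weights, x):
    """Index of the reference vector closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_phase(weights, positions, data, steps, alpha0, radius0, rng):
    """One phase: learning rate decays linearly to 0, radius linearly to 1 (assumed)."""
    for t in range(steps):
        frac = 1.0 - t / steps
        alpha = alpha0 * frac
        radius = 1.0 + (radius0 - 1.0) * frac
        x = rng.choice(data)
        bx, by = positions[best_matching_unit(weights, x)]
        for i, (px, py) in enumerate(positions):
            # "Bubble" neighbourhood: every unit within the radius is updated.
            if math.hypot(px - bx, py - by) <= radius + 1e-9:
                weights[i] = [w + alpha * (v - w) for w, v in zip(weights[i], x)]


def train_som(data, cols, rows, phases, seed=0):
    """Randomly initialize the map, then run the phases (ordering, fine-tuning)."""
    rng = random.Random(seed)
    positions = hex_positions(cols, rows)
    weights = [[rng.random() for _ in data[0]] for _ in positions]
    for steps, alpha0, radius0 in phases:
        train_phase(weights, positions, data, steps, alpha0, radius0, rng)
    return weights, positions


# Toy data and a small map so that the example runs quickly (illustration only).
toy_data = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]
toy_weights, toy_positions = train_som(
    toy_data, cols=4, rows=3,
    phases=[(200, 0.05, 3), (1000, 0.02, 2)])  # (length, learning rate, radius)
print(len(toy_weights), "units trained")
```

Each phase is a (training length, learning rate, neighbourhood width) triple: a short first phase with a wide neighbourhood orders the map, and a long second phase with a narrow neighbourhood fine-tunes it.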
We constructed maps separately for each of the years 1985, 1986, 1987, 1988, and 1989. The
network topology chosen was hexagonal with 15 * 10 neurons in each map. This is the same
network structure as in our previous study. The parameters of the best maps with respect to the
average quantization error are given in Table 2:
Year  Phase  Training  Learning  Neighbourhood  Quantization
             length    rate      width          error
1985  1          1000  0.05      10
      2         95000  0.02       3             0.247267
1986  1          1000  0.08      10
      2        115000  0.02       3             0.261194
1987  1          1000  0.07      10
      2         95000  0.03       3             0.274494
1988  1          1000  0.06      11
      2        120000  0.02       3             0.257365
1989  1          1000  0.06      12
      2        100000  0.03       3             0.253538
Table 2: Network parameters
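The average quantization error used to rank the maps can be sketched as the mean distance between each input vector and its closest reference vector. We assume plain Euclidean distance here; the helper names are our own and the numbers are toy values, not the data behind Table 2.

```python
import math


def average_quantization_error(weights, data):
    """Mean Euclidean distance from each sample to its closest reference vector."""
    total = 0.0
    for x in data:
        total += min(math.dist(w, x) for w in weights)
    return total / len(data)


def pick_best_map(candidate_maps, data):
    """Return the candidate map with the smallest average quantization error."""
    return min(candidate_maps, key=lambda w: average_quantization_error(w, data))


# Two hypothetical candidate maps with two units each (illustration only).
samples = [[0.0, 0.0], [1.0, 0.0]]
map_a = [[0.0, 0.0], [1.0, 1.0]]
map_b = [[0.0, 0.0], [1.0, 0.0]]

print(average_quantization_error(map_a, samples))
print(pick_best_map([map_a, map_b], samples) is map_b)
```

Since many maps are trained from different random initializations, a criterion of this kind gives a simple, repeatable way of choosing which map to inspect further.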
Results
In the construction process hundreds of maps were initialized and trained. The best ones, with
respect to average quantization error (shown above in Table 2), were more carefully inspected,
i.e. the locations of the companies and the values of weights (corresponding to financial ratios)
were visualized.
The groups, or clusters, A to H on the maps in Figure 1 to Figure 5 (in the Appendix) were
identified by analyzing the weight distributions of the maps for the years 1985-89 in the form of
so-called U-matrices and weight maps as produced by the tool we used.
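The idea behind a U-matrix can be sketched as follows: for every unit, compute the average distance between its reference vector and those of its immediate neighbours on the hexagonal lattice. High values mark borders between clusters, low values mark the interior of a cluster. This is a simplified per-unit variant under our own assumptions (lattice coordinates, function names), not the exact output of the tool used in the study.

```python
import math


def hex_positions(cols, rows):
    """Planar coordinates of the units on a hexagonal lattice (odd rows shifted)."""
    return [(c + 0.5 * (r % 2), r * math.sqrt(0.75))
            for r in range(rows) for c in range(cols)]


def hex_neighbours(positions, i):
    """Indices of the units at lattice distance 1 from unit i (at most six)."""
    return [j for j, q in enumerate(positions)
            if j != i and math.dist(positions[i], q) < 1.01]


def u_heights(weights, positions):
    """Average distance from each unit's reference vector to its neighbours'.

    Assumes a map with at least two units, so every unit has a neighbour.
    """
    heights = []
    for i in range(len(positions)):
        neigh = hex_neighbours(positions, i)
        heights.append(
            sum(math.dist(weights[i], weights[j]) for j in neigh) / len(neigh))
    return heights


# A 3 x 1 toy map whose last unit differs from the other two (illustration only).
print(u_heights([[0.0], [0.0], [1.0]], hex_positions(3, 1)))
```

In the toy map the height rises towards the third unit, which is where a cluster border would be drawn.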
Financial performance within the groups
Our interpretation of the defined
groups based on weight maps for year 1985 is as follows:
- Group A is separated into subgroups A1 and A2. A1 can be considered as an
"average" group. The group is doing rather well regardless of which ratio is used as an
indicator. The A1 group consists solely of US companies except for one European company.
The A2 group is somewhat below average but very close to the A1 group in every respect.
- Group B is separated into subgroups B1 and B2. Characteristic of group B are high
total liabilities and investments combined with low profitability and, naturally, low solidity. The
difference between the subgroups is that B1 has slightly higher values in liabilities and
investments than does B2. B1 consists mainly of Finnish and Canadian companies and B2
also includes US companies.
- Group C is best defined as "slightly better than average". It consists mainly of
North-American and European companies.
- Group D represents the best companies in terms of high profitability, solidity and
cash flow. On the other hand it is a group of low investment companies. It consists mainly of
North-American companies and one Swedish company.
- Group E represents companies with high investments and relatively low solidity
but, surprisingly at the same time, the highest liquidity. Profitability is above average. It
consists of two North-American, two Swedish and one Finnish company.
- Group F has a slightly lower profitability and liquidity than group E, but on the
other hand better solidity. If it were not for the extremely high current ratio of group E, these
two groups would probably have been defined as one. It consists of Finnish and US
companies.
- Group G is almost as good as group D. It would probably have been justified to
define groups D and G as subgroups as well, like B1 and B2. It consists of two Swedish, two
North-American and two European companies.
- Group H is undoubtedly the most solid group. It is a group of low investments,
cash flow and profitability, but high solidity and liquidity. It consists mainly of North-American companies.
Because the groups were identified with data from year 1985, the companies in these groups
for the other years are not always identical, though the groups clearly exist. Furthermore, we
do not have data from the same companies for every year, so some companies are missing in
some maps and appear in others. Related to this, we can notice that in the years 1988-89 a new
group of companies, Group X, starts to emerge, showing another side of the dynamics of the
system.
Financial performance over time
In the following we focus only on the
financial performance of the Finnish companies over time. As noted above, most of the
Finnish companies are located in group B for year 1985. Only two companies, 25 (Rauma-Repola) and 29 (Yhtyneet), are outside this group. The same pattern continues during the years
1986-89. Most of the Finnish companies are investing heavily and carry large liabilities. They have
low solidity, weak liquidity and poor profitability based on the ratios chosen for this study.
In 1986 company 25 joins group B and stays within this group for the
rest of the years in this study. Company 29 stays outside group B until 1989, the last year of
this study, when it too joins the group.
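Following a company over time in this way amounts to finding, for each year, the map unit closest to the company's ratio vector and reading off the cluster assigned to that unit. A minimal sketch, assuming per-year maps and per-unit cluster labels are available as plain dictionaries; all names and numbers below are hypothetical.

```python
import math


def best_matching_unit(weights, x):
    """Index of the reference vector closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def cluster_path(company_vectors, maps, unit_labels):
    """Map each year to the cluster label of the company's best matching unit."""
    path = {}
    for year in sorted(company_vectors):
        bmu = best_matching_unit(maps[year], company_vectors[year])
        path[year] = unit_labels[year][bmu]
    return path


# Hypothetical two-unit maps and labels for two years (illustration only).
toy_maps = {1985: [[0.2, 0.2], [0.8, 0.8]], 1986: [[0.2, 0.2], [0.8, 0.8]]}
toy_labels = {1985: ["B", "D"], 1986: ["B", "D"]}
toy_company = {1985: [0.7, 0.9], 1986: [0.3, 0.1]}

print(cluster_path(toy_company, toy_maps, toy_labels))
```

The returned dictionary gives one cluster label per year, so a move from one group to another shows up as a change of label between consecutive years.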
Conclusions and future research
The objective of this study was to investigate the potential of
self-organizing maps to pre-process
the vast amount of financial data available on companies and to use these maps as data mining tools.
Our workbench consisted of a hyperknowledge-based system for financial benchmarking. The
benchmarking system contained financial data on 130 pulp and paper companies worldwide.
Using nine different ratios as variables -- four measuring profitability, one indebtedness, one
capital structure, one liquidity and two cash flow -- we constructed different maps for each of the
years 1985, 1986, 1987, 1988, and 1989. Our main interest in this investigation was to show
how to analyse the financial performance of individual companies (in this case Finnish forest
companies) over time on a worldwide scale.
Acknowledgements
We would like to thank Mikko Irjala for carrying out the practical work of training the networks. The
work reported here was carried out within the AnNet-project. The authors wish to thank the
Foundation for Economic Education for providing financial support for this project.
References
Back, B. - Irjala, M. - Sere, K. - Vanharanta, H. (1995) Competitive Financial
Benchmarking Using Self-Organizing Maps. Abo Akademi, Reports on Computer
Science and Mathematics, Ser. A, No 169, 1995.
Chen, S. K. - Mangiameli, P. - West, D. (1995) The Comparative Ability of Self-organizing
Neural Networks to Define Cluster Structure. Omega, International Journal of
Management Science. Vol. 23, No. 3, pp. 271-279, 1995.
Geber, B. (1990) Benchmarking: Measuring Yourself Against the Best. Training, 27 (11),
pp. 36-44.
Klimasauskas, C.C. (1991) Applying Neural Networks, Part IV: Improving Performance.
PC/AI Magazine. Vol. 5, No. 4. 1991.
Kohonen, T. (1995) Self-Organizing Maps. Springer-Verlag.
Martin-del-Brio, B. - Serrano-Cinca, C. (1995) Self Organizing Neural Networks: The
Financial State of Spanish Companies. In Neural Networks in the Capital Markets,
edited by Refenes, John Wiley & Sons.
Salonen, H. - Vanharanta, H. (1990a) Financial Analysis World Pulp and Paper Companies
1985-1989, Nordic Countries. Green Gold Financial Reports. Vol. 1. Ekono Oy,
Espoo, Finland.
Salonen, H. - Vanharanta, H. (1990b) Financial Analysis World Pulp and Paper Companies
1985-1989, North America. Green Gold Financial Reports. Vol. 2. Ekono Oy,
Espoo, Finland.
Salonen, H. - Vanharanta, H. (1991) Financial Analysis World Pulp and Paper Companies
1985-1989, Europe. Green Gold Financial Reports. Vol. 3. Ekono Oy, Espoo,
Finland.
Trigueiros, D. (1995) Accounting Identities and the Distribution of Ratios. British
Accounting Review. Vol. 27, pp. 109-126.
Vanharanta, H. (1995) Hyperknowledge and Continuous Strategy in Executive Support
Systems. Acta Academiae Aboensis. Ser. B, Vol. 55, No. 1. Turku, Finland.
Vanharanta, H. - Käkölä, T. - Back, B. (1995) Validity and Utility of a Hyperknowledge-Based
Financial Benchmarking System. Proceedings of the Twenty-Eighth Annual
Hawaii International Conference on System Sciences. IEEE Computer Society
Press, Vol. 3, pp. 221-230.
Appendix: Maps for the years 1985-1989