Proceedings of STeP'96. Jarmo Alander, Timo Honkela and Matti Jakobsson (eds.),
Publications of the Finnish Artificial Intelligence Society, pp. 35-47.

Data Mining Accounting Numbers
Using Self-Organizing Maps

Barbro Back
Turku School of Economics and Business Administration

Kaisa Sere
University of Kuopio

Hannu Vanharanta
University of Joensuu

Abstract

The amount of financial information in today's large, sophisticated databases is huge, which makes comparing company performance - especially over time - difficult or at least very time consuming. The aim of this paper is to investigate whether neural networks in the form of self-organizing maps can be used to data mine accounting numbers in large databases over several time periods. By using self-organizing maps, we overcome the problems of finding the appropriate underlying distribution and functional form of the data, problems often encountered in structuring tasks when using, for example, cluster analysis. The method chosen also offers a way of visualizing the results. The database in this study consists of annual reports of 130 forest companies worldwide, with data covering a five-year period.

Introduction

Competitive benchmarking is an important company-internal process, in which the functions and performance of one company are compared with those of other companies. Financial competitive benchmarking uses financial information -- most often in the form of ratios -- to perform these comparisons. Financial competitive benchmarking is utilized, among other things, as a communication tool in strategic management, for example in situations where company management must gain approval, from internal and external interest groups alike, for new functional objectives for the company.

Multivariate statistical methods have been used as a tool of analysis for company performance, bankruptcy predictions, stock market predictions etc., although mostly in research contexts. However, many problems have been reported concerning these methods. The two most important problems are the assumption of normality in the underlying distributions and difficulties in finding an appropriate functional form for the distributions. Moreover, results of analyses are difficult to visualize when there are several explanatory variables [Vermeulen et al., 1994].

Vanharanta [1995] has used modern computer technology and built a hyperknowledge-based system for financial benchmarking. The system contains a database with financial data on more than 130 pulp and paper companies worldwide. The amount of financial information in this system is, however, so large that it makes comparisons between companies difficult - or at least very time consuming.

In a previous study [Back et al., 1995] we investigated the potential of self-organizing maps for pre-processing the vast financial data available on companies and for presenting an approximated position of one company's financial performance compared to that of other companies. The results were very promising. By using self-organizing maps we have overcome the problems associated with finding the appropriate underlying distribution and the functional form of the financial indicators. Furthermore, the visualization capabilities of self-organizing maps provide a good way of presenting and analyzing the results.

Neural networks have previously been suggested by Trigueiros [1995] for use with computerized accounting reports databases, and by Chen et al. [1995] to define cluster structures in large databases. Martin-del-Brio and Serrano-Cinca [1995] used self-organizing maps for analyzing the financial state of Spanish companies.

In this paper, we use self-organizing maps to structure Vanharanta's database into clusters based on the underlying weight maps. Each cluster is then named according to the financial characteristics of the cluster. The database contains financial data for a five-year period. We analyse the financial performance of the Finnish forest companies in these maps over the years 1985-89. Even though we take a closer look only at these companies, any individual company or group of companies can be the focus of interest.

We anticipate that neural networks can be used in future for benchmarking purposes to help executives find company characteristics that will lead to sustainable excellence of a company, in other words to help answer the question: Which are the characteristics that lead a company towards long-lasting good performance? Some company characteristics seem to produce and maintain good overall company performance, sustainable profitability, increasing productivity and continuous growth.

The rest of the paper is organized as follows: Section 2 describes the methodology we have used, the database, the list of companies in the study and the criteria for and the choice of financial ratios. Section 3 presents the results of applying neural networks to the problem and section 4 presents the empirical results. The conclusions of our study are presented in Section 5.

Methodology

Benchmarking

Competitive benchmarking is a company-internal process in which the activities of a given company are measured against the best practices of other, best-in-class companies [Geber, 1990]. In the process of competitive benchmarking, internal functions are analyzed and measured using financial (i.e. quantitative) and/or non-financial (i.e. qualitative) yardsticks. Functions measured from one company are compared with similar functions measured from leading competitors, or they are compared with the best practices in other industries. The differences between compared functions are measured. The overall management goal of competitive benchmarking within a given company is to close the measured "gap" by changing the company's characteristics in ways that will improve company performance.

The financial information needed for financial benchmarking work is, however, invariably available only from large commercial databases or from specialized reports and publications, from where it must be gleaned with difficulty. Such information is thus far removed from its active users. If the needed financial information is to be brought closer to the active users, it must first be pre-processed, i.e. refined and classified. The overall objective of the present study is to pre-process, with the help of neural networks, the data and information needed for financial benchmarking purposes. Thus pre-processed, the information can be used in computerized benchmarking systems and executive support systems, making the task of competitive financial benchmarking easier and more effective.

Self-organizing maps

Since companies [in the database] do not have predefined labels describing their financial status, a network intended for pre-processing their data can have no pre-desired outputs. For this reason, we utilize an unsupervised learning method. A Kohonen network [Kohonen, 1995], being the most common network model based on unsupervised learning, is used in this study.

Database and selection of companies

The Green Gold Financial Reports database [Salonen and Vanharanta, 1990a, 1990b, 1991] is used as the experimental financial knowledge base for the neural network tests. It consists of standardized income statements, balance sheets and cash flow statements of 130 companies in the international pulp and paper industry. The database also contains specific financial ratios, calculated from the standardized reports, as well as general company information concerning products and production volumes. There are 47 different key ratios for each company. The companies are all based in one of three regions: North America, Northern Europe or Central Europe. The financial data covers a period of five years from 1985 to 1989. The companies are listed in Table 1 (with some companies omitted that did not have enough data available).

For our experiment we used some 120 pulp and paper companies from the database. We have also included the averages of Finland, Norway and Sweden as three additional "companies".

Table 1 contains companies in 14 countries.

Choice of ratios

The population consists of the 47 financial ratios in the benchmarking system, which are organized into six groups under the headings:

1. Profitability
2. Indebtedness
3. Capital Structure
4. Liquidity
5. Working capital
6. Cash flow ratios

The choice of ratios in this study was based on an empirical study conducted using ten financial analysts from a large Finnish bank who participated in a validation test of the benchmarking system [Vanharanta et al., 1995]. If a ratio in that study was used by at least five analysts it was selected as a variable for the network in this study. The following nine ratios were selected. The numbers in parentheses indicate the appropriate ratio group number shown above.

Operating profit (% of sales) (1)
Profit after financial items (% of sales) (1)
Return on total assets (ROTA) (1)
Return on equity (ROE) (1)
Total liabilities (% of sales) (2)
Solidity (3)
Current ratio (4)
Funds from operations (% of sales) (6)
Investments (% of sales) (6)

We note that there are four profitability measures, one indebtedness measure, one capital structure measure, one liquidity measure, no working capital measures and two cash flow measures. It seems reasonable that the emphasis is on profitability in a benchmarking situation.

Training and testing the network

In this section we give a description of the construction process followed in developing the self-organizing maps. The actual construction work was performed using The Self-Organizing Map Program Package version 3.1 prepared by the SOM Programming Team of the Helsinki University of Technology.

We started by standardizing the ratios in the database using histogram equalization [Klimasauskas, 1991] in order to ease the SOM's learning process and to improve its performance. Histogram equalization is a way of mapping rare figures to a small part of the target range and spreading out frequent figures so that it becomes easier for the neural network to discriminate among frequent figures.
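The effect of this pre-processing step can be illustrated with a minimal Python sketch. It assumes a rank-based form of histogram equalization, in which each value is replaced by its empirical cumulative frequency; the exact procedure described in [Klimasauskas, 1991] may differ. The function name and the sample values are hypothetical and are not taken from the database.

```python
from bisect import bisect_right

def histogram_equalize(values):
    """Map each value to its empirical cumulative frequency in (0, 1]."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many observations are <= v.
    return [bisect_right(ordered, v) / n for v in values]

# Hypothetical ratio values: four lie close together, one is a rare extreme figure.
sample = [4.0, 5.0, 5.5, 6.0, 60.0]
equalized = histogram_equalize(sample)
# equalized == [0.2, 0.4, 0.6, 0.8, 1.0]
```

In the raw sample, the extreme figure 60.0 would occupy most of the range and compress the four frequent figures into a narrow band. After equalization the frequent figures are spread evenly over the target range, and the rare figure takes up only one step of it.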

All the maps were trained in two phases. The purpose of the first training phase was to order the randomly initialized reference vectors of the maps to "approximately correct" values. During the second phase the maps were "fine-tuned," i.e. the final ordering of the reference vectors took place.
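The two-phase scheme can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the program package used in the study: it uses a rectangular grid instead of a hexagonal one, a Gaussian neighbourhood kernel, linearly decreasing learning rate and radius, and scaled-down training lengths. All function names and the placeholder input data are hypothetical.

```python
import math
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_phase(weights, data, steps, rate0, radius0, rng):
    """Run one phase; the rate falls linearly towards 0, the radius towards 1."""
    for t in range(steps):
        remaining = 1.0 - t / steps
        rate = rate0 * remaining
        radius = 1.0 + (radius0 - 1.0) * remaining
        x = rng.choice(data)
        # Best-matching unit: grid position whose reference vector is nearest to x.
        bmu = min(weights, key=lambda pos: squared_distance(weights[pos], x))
        for pos, w in weights.items():
            # Gaussian neighbourhood measured on the grid (an assumed kernel).
            h = math.exp(-squared_distance(pos, bmu) / (2.0 * radius ** 2))
            for i in range(len(w)):
                w[i] += rate * h * (x[i] - w[i])

def train_som(data, cols, rows, phases, seed=0):
    """Randomly initialize a cols x rows map, then run the given phases in order."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(c, r): [rng.random() for _ in range(dim)]
               for c in range(cols) for r in range(rows)}
    for steps, rate0, radius0 in phases:
        train_phase(weights, data, steps, rate0, radius0, rng)
    return weights

# Placeholder input: 40 random 9-dimensional vectors standing in for equalized ratios.
_rng = random.Random(1)
data = [[_rng.random() for _ in range(9)] for _ in range(40)]
# Phase 1: short, wide neighbourhood (ordering). Phase 2: longer, narrow (fine-tuning).
som = train_som(data, cols=5, rows=4, phases=[(200, 0.05, 3.0), (1000, 0.02, 1.0)])
```

Each phase is given as a tuple of training length, initial learning rate and initial neighbourhood width, so the ordering and fine-tuning phases differ only in their parameters.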

We constructed maps separately for each of the years 1985, 1986, 1987, 1988, and 1989. The network topology chosen was hexagonal with 15 * 10 neurons in each map. This is the same network structure as in our previous study. The parameters of the best maps with respect to the average quantization error are given in Table 2:

Year   Phase   Training   Learning   Neighbourhood   Quantization
               length     rate       width           error

1985   1         1000     0.05       10
       2        95000     0.02        3              0.247267
1986   1         1000     0.08       10
       2       115000     0.02        3              0.261194
1987   1         1000     0.07       10
       2        95000     0.03        3              0.274494
1988   1         1000     0.06       11
       2       120000     0.02        3              0.257365
1989   1         1000     0.06       12
       2       100000     0.03        3              0.253538

Table 2: Network parameters
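
The selection criterion behind Table 2 can be sketched in Python. The sketch assumes that the average quantization error is the mean Euclidean distance between each input vector and its nearest reference vector; the function names and the toy vectors are hypothetical and serve only to make the arithmetic easy to follow.

```python
import math

def average_quantization_error(reference_vectors, data):
    """Mean Euclidean distance from each input vector to its nearest reference vector."""
    total = 0.0
    for x in data:
        total += min(math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))
                     for w in reference_vectors)
    return total / len(data)

def select_best_map(candidate_maps, data):
    """Return the candidate map with the smallest average quantization error."""
    return min(candidate_maps, key=lambda m: average_quantization_error(m, data))

# Toy example with two input vectors and two candidate "maps" of two units each.
data = [[0.0, 0.0], [3.0, 4.0]]
map_a = [[0.0, 0.0], [3.0, 4.0]]   # matches both inputs exactly: error 0.0
map_b = [[0.0, 0.0], [0.0, 0.0]]   # second input is 5.0 away: error (0 + 5) / 2 = 2.5
best = select_best_map([map_b, map_a], data)
```

Here `best` is `map_a`, the candidate with the lower error.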

Results

In the construction process hundreds of maps were initialized and trained. The best ones, in respect of average quantization error (shown above in Table 2), were more carefully inspected, i.e. the locations of the companies and the values of weights (corresponding to financial ratios) were visualized. The groups, or clusters, A to H on the maps in Figure 1 to Figure 5 (in the Appendix) were identified by analyzing the weight distributions of the maps for the years 1985-89 in the form of so-called U-matrices and weight maps as produced by the tool we used.
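
The idea behind a U-matrix can be sketched in Python: each map unit receives the mean distance between its reference vector and those of its grid neighbours, so that large values mark borders between clusters. The sketch assumes a rectangular grid with four neighbours per unit, whereas the maps in this study are hexagonal, and it does not reproduce the output format of the tool we used. The function name and the toy map are hypothetical.

```python
import math

def u_matrix(weights, cols, rows):
    """weights maps (col, row) -> reference vector; returns rows x cols mean neighbour distances."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    result = []
    for r in range(rows):
        line = []
        for c in range(cols):
            neighbours = [(c + dc, r + dr)
                          for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if (c + dc, r + dr) in weights]
            distances = [dist(weights[(c, r)], weights[n]) for n in neighbours]
            # A map with a single unit has no neighbours; report 0.0 in that case.
            line.append(sum(distances) / len(distances) if distances else 0.0)
        result.append(line)
    return result

# Toy 3 x 1 map: the first two units are identical, the third differs.
toy = {(0, 0): [0.0], (1, 0): [0.0], (2, 0): [1.0]}
umat = u_matrix(toy, cols=3, rows=1)
# umat == [[0.0, 0.5, 1.0]]
```

The rising values towards the third unit indicate a border between the first two units and the last one.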

Financial performance within the groups

Our interpretation of the defined groups based on weight maps for year 1985 is as follows:

Group A is separated into subgroups A1 and A2. A1 can be considered as an "average" group. The group is doing rather well regardless of which ratio is used as an indicator. The A1 group consists solely of US companies except for one European company. The A2 group is somewhat below average but very close to the A1 group in every respect.
Group B is separated into subgroups B1 and B2. Characteristic of group B are high total liabilities and investments combined with low profitability and, naturally, low solidity. The difference between the subgroups is that B1 has slightly higher values in liabilities and investments than does B2. B1 consists mainly of Finnish and Canadian companies, and B2 also includes US companies.
Group C is best defined as "slightly better than average". It consists mainly of North-American and European companies.
Group D represents the best companies in terms of high profitability, solidity and cash flow. On the other hand it is a group of low investment companies. It consists of mainly North-American companies and one Swedish company.
Group E represents companies with high investments and relatively low solidity but, surprisingly at the same time, the highest liquidity. Profitability is above average. It consists of two North-American, two Swedish and one Finnish company.
Group F has a slightly lower profitability and liquidity than group E, but on the other hand better solidity. If it were not for the extremely high current ratio of group E, these two groups would probably have been defined as one. It consists of Finnish and US companies.
Group G is almost as good as group D. It probably would have been justified to define also groups D and G as subgroups like B1 and B2. It consists of two Swedish, two North-American and two European companies.
Group H is undoubtedly the most solid group. It is a group of low investments, cash flow and profitability, but high solidity and liquidity. It consists of mainly North-American companies.

Because the groups were identified with data from year 1985, the companies in these groups for the other years are not always identical, though the groups clearly exist. Furthermore, we do not have data from the same companies for every year, so some companies are missing from some maps and appear in others. Related to this, we can notice that in the years 1988-89 a new group of companies, Group X, starts to emerge, showing another side of the dynamics of the system.

Financial performance over time

In the following we focus only on the financial performance of the Finnish companies over time. As was stated previously, most of the Finnish companies are located in group B for year 1985. Only two companies, 25 (Rauma-Repola) and 29 (Yhtyneet), are outside this group. The same pattern continues during the years 1986-89. Most of the Finnish companies are investing heavily with huge liabilities. They have low solidity, weak liquidity and poor profitability based on the ratios chosen for this study.

In 1986 company 25 joins group B and stays within this group for the remaining years of the study. Company 29 stays outside group B until 1989, the last year of the study, when it joins the group.

Conclusions and future research

The objective of this study was to investigate the potential of self-organizing maps to pre-process the vast amount of financial data available on companies and to use these maps as data mining tools. Our work bench consisted of a hyperknowledge-based system for financial benchmarking. The benchmarking system contained financial data on 130 pulp and paper companies worldwide. Using nine different ratios as variables -- four measuring profitability, one indebtedness, one capital structure, one liquidity and two cash flow -- we constructed different maps for each of the years 1985, 1986, 1987, 1988, and 1989. Our main interest in this investigation was to show how to analyse the financial performance of individual companies (in this case Finnish forest companies) over time on a worldwide scale.

Acknowledgements

We would like to thank Mikko Irjala for carrying out the practical work of training the networks. The work reported here was carried out within the AnNet-project. The authors wish to thank the Foundation for Economic Education for providing financial support for this project.

References

Back, B. - Irjala, M. - Sere, K. - Vanharanta, H. (1995) Competitive Financial Benchmarking Using Self-Organizing Maps. Åbo Akademi, Reports on Computer Science and Mathematics, Ser. A, No 169.

Chen, S. K. - Mangiameli, P. - West, D. (1995) The Comparative Ability of Self-organizing Neural Networks to Define Cluster Structure. Omega, International Journal of Management Science. Vol. 23, No. 3, pp. 271-279.

Geber, B. (1990) Benchmarking: Measuring Yourself Against the Best. Training, 27 (11), pp. 36-44.

Klimasauskas, C.C. (1991) Applying Neural Networks, Part IV: Improving Performance. PC/AI Magazine. Vol. 5, No. 4. 1991.

Kohonen, T. (1995) Self-Organizing Maps. Springer-Verlag.

Martin-del-Brio, B. - Serrano-Cinca, C. (1995) Self Organizing Neural Networks: The Financial State of Spanish Companies. In Neural Networks in the Capital Markets, edited by Refenes, John Wiley & Sons.

Salonen, H. - Vanharanta, H. (1990a) Financial Analysis World Pulp and Paper Companies 1985-1989, Nordic Countries. Green Gold Financial Reports. Vol. 1. Ekono Oy, Espoo, Finland.

Salonen, H. - Vanharanta, H. (1990b) Financial Analysis World Pulp and Paper Companies 1985-1989, North America. Green Gold Financial Reports. Vol. 2. Ekono Oy, Espoo, Finland.

Salonen, H. - Vanharanta, H. (1991) Financial Analysis World Pulp and Paper Companies 1985-1989, Europe. Green Gold Financial Reports. Vol. 3. Ekono Oy, Espoo, Finland.

Trigueiros, D. (1995) Accounting Identities and the Distribution of Ratios. British Accounting Review. Vol. 27, pp. 109-126.

Vanharanta, H. (1995) Hyperknowledge and Continuous Strategy in Executive Support Systems. Acta Academiae Aboensis. Ser. B, Vol. 55, No. 1. Turku, Finland.

Vanharanta, H. - Käkölä, T. - Back, B. (1995) Validity and Utility of a Hyperknowledge-Based Financial Benchmarking System. Proceedings of the Twenty-Eighth Annual Hawaii International Conference on Systems Science. IEEE Computer Society Press, Vol. 3, pp. 221-230.
