An important consideration in electronic publishing is how the data to be published is viewed and presented at different stages of the process. The important stages are the points where the data must be communicated from one process or person to another. The presentations at these stages include the one used by the writer of an article, the one given to the pagination process, and the one used to communicate the pagination results to visualization tools. Below, some of the relevant views of the data at the different stages are identified for text, articles and page layout. The quality of a data representation scheme may be evaluated by assessing how well it can represent the different needs described below.
Thus, a language for expressing the different representations is needed. The representations before and after pagination are of particular interest in this work. One alternative is to represent layouts with a document formatting language such as PostScript, LaTeX or TeX. However, more general approaches are emerging with the advent of SGML (Standard Generalized Markup Language) and HTML (HyperText Markup Language), a language defined with SGML especially for hypertext documents. Adobe's PDF format offers even better means for describing static page layouts but, unlike HTML, does not include any hypermedia capabilities. A new standard, MHEG, has been proposed for the description of multimedia documents. It remains to be seen what effect it will have on electronic publishing and whether design tools supporting it will be implemented. Therefore, the currently most interesting standards (SGML, HTML and DSSSL) are discussed below.
Standard Generalized Markup Language is a metalanguage for describing markup languages. It enables one to describe the structure of a document separately from the data and its visualization [Sperberg-McQueen and Burnard, 1993; Goldfarb, 1990]. From the point of view of electronic publishing, SGML may be used to structure the document during writing and before it is stored in the article database. With SGML the writer does not need to worry about the visual appearance, only about the logical roles of the parts of the article.
In editorial publishing, SGML offers a simple way for the writer to describe several views of the same document. The editor may specify different levels of depth for the material by tagging some parts as optional or alternative. Material tagged as background knowledge may be further categorized as far as is useful, for example according to interests or scientific fields. Thus, there is no need to write every version of the article separately. It is enough to write production rules in SGML and to read them later with an SGML parser, once the specific reading instance (time, reader information etc.) is known.
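The idea of producing several views from one tagged article can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the text: XML syntax stands in for SGML so that the standard library parser can read it, and the element names (`para`, `background`), the attributes (`depth`, `field`) and the function `select_view` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only.
# XML syntax is used here as a stand-in for SGML.
ARTICLE = """
<article>
  <headline>Harbour bridge opens</headline>
  <para>The new bridge opened to traffic on Monday.</para>
  <para depth="optional">Construction took four years.</para>
  <background field="engineering">The deck is a steel box girder.</background>
  <background field="economics">Tolls will fund maintenance.</background>
</article>
"""


def select_view(source, include_optional, interests):
    """Return the text of the parts that belong to one reading instance."""
    parts = []
    for element in ET.fromstring(source):
        if element.get("depth") == "optional" and not include_optional:
            continue  # leave out deeper material in a brief view
        if element.tag == "background" and element.get("field") not in interests:
            continue  # leave out background outside the reader's interests
        parts.append(element.text.strip())
    return parts


# Two reading instances derived from the same stored article.
brief = select_view(ARTICLE, include_optional=False, interests=set())
full = select_view(ARTICLE, include_optional=True, interests={"engineering"})
```

Here `brief` contains only the headline and the first paragraph, while `full` also contains the optional paragraph and the engineering background. The article itself is written and stored only once.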
HTML, HyperText Markup Language, is a language described in SGML and
designed for the description of hypermedia documents. The WWW (World Wide
Web) of the Internet is a network of a huge number of documents (perhaps
) written in HTML (or one of its extensions) and
linked to each other with HTML's hyperlink mechanism. These documents
reside on the information provider's WWW server and may
be accessed and read by any of the numerous WWW browser clients,
which usually run on the reader's system.
With HTML, it is possible to write documents while specifying only semantic roles (logical styles) for the different parts of the document. In this case WWW browsers are free to visualize the given logical styles as they see fit. Examples of logical styles are level two heading, author, address and hyperlink. HTML also makes it possible to provide alternative information: for example, pictures may have brief alternative textual descriptions, which are presented to users who have no graphics viewing capabilities.
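How a browser might exercise this freedom can be sketched in Python with the standard library's `html.parser` module. The class name `TextOnlyRenderer` and the rendering choices (upper case for level two headings, a `--` prefix for addresses) are assumptions made for illustration. They do not describe any actual browser.

```python
from html.parser import HTMLParser


class TextOnlyRenderer(HTMLParser):
    """Render a few logical styles the way a text-only browser might."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.current_tag = None

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # No graphics capability: fall back to the alternative text.
            alt = dict(attrs).get("alt") or ""
            self.lines.append(f"[{alt}]")
        else:
            self.current_tag = tag

    def handle_endtag(self, tag):
        self.current_tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.current_tag == "h2":
            # This renderer's own choice for a level two heading.
            self.lines.append(text.upper())
        elif self.current_tag == "address":
            self.lines.append(f"-- {text}")
        else:
            self.lines.append(text)


DOCUMENT = """
<h2>Harbour bridge opens</h2>
<p>The new bridge opened on Monday.</p>
<img src="bridge.gif" alt="Photo of the bridge">
<address>City desk</address>
"""

renderer = TextOnlyRenderer()
renderer.feed(DOCUMENT)
renderer.close()
rendered = renderer.lines
```

The document states only that a part is a heading, an address or a picture with a description. A graphical browser could render the same markup with larger fonts and the image itself, without any change to the document.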
In addition, some WWW browsers support non-standard extensions to HTML that enable the author to describe in more detail the physical appearance of the document on the reader's screen. These details include spacing between items, font sizes, colours etc. On the negative side, using these visual enhancements requires a lot of human design work on top of providing the contents of the document.
Document Style Semantics and Specification Language (DSSSL) is described in Draft International Standard ISO/IEC 10179 [ISO, 1995]. The draft proposes a generalized method for transforming between different types of document structures and also enables semantic processing of documents.
DSSSL may be used with SGML but is not restricted to it. Unlike SGML, it will offer a way to describe the visual document layout produced by the pagination process, i.e. the placement of articles, sizes, fonts etc. This description may in turn be interpreted by future DSSSL-conforming visualization tools or printing systems.
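The kind of information such a layout description carries can be sketched as a small Python data structure. This is not DSSSL syntax. The class `Placement`, its fields, the function `overlaps` and the sample values are all hypothetical and only illustrate what a pagination result might need to record.

```python
from dataclasses import dataclass, asdict


@dataclass
class Placement:
    """Where one article sits on a page (all fields are hypothetical)."""
    article_id: str
    first_column: int    # leftmost column occupied, counted from 0
    columns: int         # width in columns
    top_mm: float        # distance from the top of the page
    height_mm: float
    font: str
    font_size_pt: float


def overlaps(a, b):
    """Return True if two placements share any page area."""
    share_columns = (a.first_column < b.first_column + b.columns
                     and b.first_column < a.first_column + a.columns)
    share_rows = (a.top_mm < b.top_mm + b.height_mm
                  and b.top_mm < a.top_mm + a.height_mm)
    return share_columns and share_rows


# A pagination result for one four-column page (sample values only).
page = [
    Placement("lead", 0, 3, 20.0, 120.0, "Times", 9.5),
    Placement("sidebar", 3, 1, 20.0, 120.0, "Helvetica", 8.0),
    Placement("brief", 0, 2, 140.0, 60.0, "Times", 9.0),
]

# A consumer of the description can check it before visualizing it.
layout_is_valid = not any(
    overlaps(a, b) for i, a in enumerate(page) for b in page[i + 1:]
)

# Plain dictionaries are easy to hand over to a visualization tool.
description = [asdict(p) for p in page]
```

Keeping the description independent of any particular tool is the point of the sketch: the pagination process writes it once, and visualization or printing systems interpret it later.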
There exist several tools to support manual design of pages, such as PageMaker, FrameMaker and the PDF format of Adobe. In addition, there are tools and methods for publishing on the Internet, which require a lot of interactive design: the hypertext document description language HTML used as the basis of World Wide Web (WWW) documents (see section 2.3.1 for further details of HTML and WWW), several HTML-capable document browsers such as Netscape, Mosaic, Lynx and HotJava, editors supporting the writing of HTML documents, etc.
There are not many tools for automatic pagination. However, in recent years some results have been obtained in this area at the Technical Research Centre of Finland (VTT). The different versions of VIP (VTT's Intelligent Pagination) suit different kinds of pagination tasks: pagination of newspaper advertisement sections (see Ylä-Jääski and Ahonen, 1990), yellow page phone directories, and prototype pagination of editorial material for newspapers delivered by fax (VIP, see Veijonen et al., 1994 and section 5.3). Of these, the yellow page application is in production use at Pindar Set (Great Britain) and Turun Sanomat (Finland), and the newspaper advertisement application is used at Dagens Nyheter (Sweden) and Nederlandse Dagbladunie (Netherlands).
For automatic design of article layout, a prototype and a method have been designed by [Kukkonen, 1992] and integrated into Grafimedia's Marita publishing system. According to [Kukkonen, 1992], given the width of the article as a number of columns, the prototype is able to design an article layout that fulfils the usual design criteria of newspapers and is at least as good as the layouts produced by professionals. The ability to design an article layout for a given textual content and article width forms part of the basis for the automatic design of whole page layouts of editorial material.