The SWIG Redevelopment Effort

David Beazley
Department of Computer Science
University of Chicago
Chicago, IL 60637
beazley@cs.uchicago.edu

$Header: /cvs/projects/SWIG/Doc/Devel/Attic/whitepaper.html,v 1.1.2.2 2001/09/02 12:45:13 beazley Exp $

1. An Introduction

One of the biggest problems faced by people writing software is the problem of how to make software easier to use, more interactive, and more modular. Typically, the computer science community has approached these problems by focusing on formal design methodology and highly specified frameworks built around notions of software components, object-oriented programming, and anything labeled as "best practice" (whatever that means). Although this type of approach is perhaps appropriate for very large software projects involving hundreds of programmers, software engineers, and managers, I've never met a sane programmer who really enjoys writing software in such an environment. Furthermore, a large number of software projects are undertaken by small groups of people who would not classify themselves as professional software developers or software engineers. Typical examples might include scientific computing software, specialized systems for engineering applications, or just about any kind of experimental research and development project. These are the types of programming projects "in the small" that are my primary interest.

First, programming projects in the small should not be confused with the toy programs one might write as part of a class project or when solving exceedingly trivial problems. More often than not, a software package written by only a few people may have been developed over a period of several years and may contain hundreds of thousands of lines of source code. Furthermore, due to limited manpower, these projects are likely to rely on a variety of third-party packages and programming libraries to accomplish certain tasks. Finally, it is not uncommon for such software to have been developed in a relatively piecemeal fashion with little if any formal design. The developers may also be burdened with the task of supporting a large base of legacy code that is critical to the application, but which is too complicated to simply rewrite from scratch. As a result, the software developed in such an environment may be a tangled web of code that gets the job done, but which is less than ideal in terms of its usability and overall design.

Of course, one does not need to look very far to see examples of this kind of development. For instance, I would claim that just about every successful project within the Open Source community has been developed in this way. As a more specific example, SWIG itself was developed in a relatively ad hoc manner over a period of two years. Although it was my intent to have a relatively clean design at the start, the system has since evolved into a very tangled mess of monolithic C++ code. It's not that I wanted to end up in this situation--rather, the experience gained by SWIG's early users pushed the system in an unanticipated direction that the original design failed to address. In many ways, it is ironic that SWIG should end up in this particular state given that this is exactly the type of situation that SWIG was built to address!

Naturally, this brings us to the overall motivation behind SWIG itself. In a nutshell, SWIG is a software development tool that aims to make it easier to do the following:

I also want to emphasize that the target users of SWIG are not professional software engineers. Rather, the system is designed to be very easy to use for more ordinary people who just happen to be working on programming projects as part of their work or for fun (physicists, engineers, hackers, etc.). It is also designed to provide a certain element of "instant gratification" if you will. I believe that the following quotes from a SWIG user survey put things in the right perspective:

2. Problems with SWIG

Despite the early success of SWIG, the system suffers from a number of serious limitations. Furthermore, these problems are not easily fixed within the current design. Of course, the real trick is how one goes about solving these issues without making SWIG excessively complicated--from the standpoint of both development and use.

3. SWIG Redevelopment: Modules

Simply stated, the primary goal of SWIG redevelopment is to redesign the SWIG compiler as an extensible set of loosely coupled modules (Note: it is not my intent to radically change the way in which an end-user uses SWIG). In this context, my intent is to allow a module to be virtually anything that might be part of a compiler or which would interact with a compiler in some manner. For example:

Unfortunately, as programs go, compilers tend to be extremely complicated. Therefore, to make any sort of module system work, the mechanism by which modules interact and exchange data needs to be extremely powerful and extremely simple.

To address these problems, SWIG redevelopment is based on a few fundamental ideas:

  1. All data will be internally represented using an XML-like scheme in which every piece of data is identified by a unique element "tag" and a set of associated attributes. Manipulation of the data in turn will involve nothing more than making an appropriate association of the "tags" with some sort of "action" to be performed. Unlike an approach in which objects are placed into a rigid C++ class hierarchy, the XML-based approach allows a virtually unlimited number of different object types and attributes to be created and manipulated without ever having to recompile anything. As a result, this would allow modules to easily extend the system in novel ways. It should also be added that this data representation greatly simplifies the underlying core of the system because an XML-like representation can be built entirely using nothing more than a hash-table object and a few fundamental datatypes such as strings and lists.

  2. All underlying data structures will be built using a dynamic type handling mechanism and a small collection of fundamental datatypes including strings, lists, and hash tables. There are several advantages to this approach. First, dynamic typing generally results in substantially less code if done correctly. For instance, in my own experience using Objective-C vs. C++, I found that my dynamically typed Objective-C programs were up to five times smaller than their C++ counterparts. Furthermore, dynamic typing is also one of the reasons why scripting languages are so powerful.

  3. Modules will interact with each other and exchange data using the XML-like scheme previously described. Due to the flexibility of this approach, this allows modules to be written in a relatively stand-alone manner. Furthermore, the use of XML may simplify the development of external tools that do not share any commonality with the SWIG executable or its internal data structures.

  4. Dynamic loading. Closely associated with loose coupling, the SWIG module system should optionally support dynamic loading of compiler modules. This might be accomplished in two ways. First, I believe that SWIG itself should provide a scripting interface that allows its modules to be dynamically loaded into a variety of scripting languages. Second, SWIG should probably implement some sort of module loading system that allows modules to be used without the optional scripting interface.
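
As a rough illustration of ideas 1 and 2 above, the following is a minimal sketch in C of a tagged node with string attributes, together with a table that associates tags with actions. It is not taken from SWIG. Every name in it (Node, Attr, Handler, node_init, node_set, node_get, emit_function, emit_variable, default_handlers, dispatch) and the tags "function" and "variable" are hypothetical, chosen only for this example. A small fixed-size array stands in for the hash table that the text describes, purely to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ATTRS 8   /* fixed capacity keeps the sketch short */

/* One name/value attribute. A real core would keep these in a hash table. */
typedef struct {
    const char *name;
    const char *value;
} Attr;

/* Hypothetical node: an element "tag" plus its attributes. */
typedef struct {
    const char *tag;
    Attr attrs[MAX_ATTRS];
    int nattrs;
} Node;

void node_init(Node *n, const char *tag) {
    assert(n != NULL && tag != NULL);
    n->tag = tag;
    n->nattrs = 0;
}

/* Set or overwrite an attribute. Returns 1 on success, 0 if the node is full. */
int node_set(Node *n, const char *name, const char *value) {
    int i;
    for (i = 0; i < n->nattrs; i++) {
        if (strcmp(n->attrs[i].name, name) == 0) {
            n->attrs[i].value = value;
            return 1;
        }
    }
    if (n->nattrs >= MAX_ATTRS) return 0;
    n->attrs[n->nattrs].name = name;
    n->attrs[n->nattrs].value = value;
    n->nattrs++;
    return 1;
}

/* Look up an attribute by name. Returns NULL if it is absent. */
const char *node_get(const Node *n, const char *name) {
    int i;
    for (i = 0; i < n->nattrs; i++) {
        if (strcmp(n->attrs[i].name, name) == 0) return n->attrs[i].value;
    }
    return NULL;
}

/* An action writes text for a node into out. Returns 1 if it handled the node. */
typedef int (*Action)(const Node *n, char *out, size_t outlen);

typedef struct {
    const char *tag;
    Action action;
} Handler;

/* Shared helper: format "wrap <kind> <type> <name>" if it fits in out. */
static int emit_decl(const char *kind, const Node *n, char *out, size_t outlen) {
    const char *type = node_get(n, "type");
    const char *name = node_get(n, "name");
    if (type == NULL || name == NULL) return 0;
    /* 5 for "wrap ", 2 for the separating spaces, 1 for the terminating NUL */
    if (strlen(kind) + strlen(type) + strlen(name) + 8 > outlen) return 0;
    sprintf(out, "wrap %s %s %s", kind, type, name);
    return 1;
}

int emit_function(const Node *n, char *out, size_t outlen) {
    return emit_decl("function", n, out, outlen);
}

int emit_variable(const Node *n, char *out, size_t outlen) {
    return emit_decl("variable", n, out, outlen);
}

/* The association of tags with actions is plain data, so it can be extended
   without touching the node type itself. */
const Handler default_handlers[] = {
    { "function", emit_function },
    { "variable", emit_variable }
};
const int default_handler_count =
    (int)(sizeof default_handlers / sizeof default_handlers[0]);

/* Run the action registered for the node's tag. Unknown tags return 0. */
int dispatch(const Handler *table, int count, const Node *n,
             char *out, size_t outlen) {
    int i;
    assert(table != NULL && n != NULL && out != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(table[i].tag, n->tag) == 0) {
            return table[i].action(n, out, outlen);
        }
    }
    return 0;
}
```

A caller would create a node with node_init, attach attributes such as "name" and "type" with node_set, and then hand the node to dispatch along with default_handlers. Supporting a new kind of node in this sketch means adding one more entry to the handler table. The Node structure does not change, which is the property the tag-and-attribute scheme is meant to provide.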

Finally, it should be noted that the implementation language of choice for the SWIG redevelopment effort is ANSI C. There are several reasons for this:

4. The Initial Module Set

The following list describes the proposed modules that will be part of the new system: