TIP 34: Modernize TEA Build System

Login
Author:         Mo DeJong <[email protected]>
Author:         Andreas Kupries <[email protected]>
State:          Withdrawn
Type:           Project
Vote:           Done
Created:        03-May-2001
Post-History:   
Tcl-Version:    8.5

Abstract

A number of things in the original TEA specification and documentation have fallen out of date. Numerous complaints about the difficulty of creating a TEA compliant package have appeared on news:comp.lang.tcl. Other complaints about the ease of building Tcl and Tk using the autoconf based build system have also surfaced. Addressing these concerns is made even more difficult by the fact that two independent build systems currently exist, one for UNIX, and one for Windows. Maintaining multiple build systems is a frustratingly slow process that wastes time better spent on other issues. In addition, the Tcl build scripts do not support cross compilation which makes the maintenance process even slower since one can't test simple build system changes for a given platform without access to that platform. This document describes how these concerns can be addressed.

Documentation

As new software is released, existing documentation becomes obsolete. Some of the existing TEA documentation is now so badly out of date that suggested software releases are no longer available. For example, the TEA based build for Windows requires Cygwin, yet there is a lack of clear instructions that describe how to install Cygwin. The solution to this problem is simple, the TEA documentation and implementation must be updated.

Platform Detection

Most open source packages make use of a pair of scripts called config.guess and config.sub to detect a platform specific configuration string. For example, running config.guess on an x86 Linux box might print:

% ./config.guess
i686-pc-linux-gnu

This value can be accessed from the configure.in script by adding a call to the AC_CANONICAL_HOST() macro. One would then detect a specific platform by doing a switch on the value of the $host variable. Tcl currently detects the build platform by examining the results of running `uname -s`-`uname -r`. Tcl should use the config.guess and config.sub method of platform detection. This may not seem like a big change at first but it has quite a few ramifications.

This change cannot be made incrementally, so there is a real danger of breaking the build for configurations that currently work. For this reason, it would be a good idea to wait until a 8.4 release branch has been created before adding such a change to the CVS HEAD. The cross compiling section will describe some of the other maintenance benefits that can be realized by using config.sub and config.guess.

Cross Compiling

Tcl's build system has never supported cross compiling. The use of uname instead of config.guess during platform detection is largely to blame for this. A configure script that makes use of the AC_CANONICAL_HOST() macro keeps track of two separate configuration variables, build and host. The build variable holds the configuration string for the platform running the ./configure script. The host variable holds the configuration string for the platform the generated binaries will be run on. For example, if one were to build a Windows binary under Linux the build variable could be set to i686-pc-linux-gnu and the host variable could be set to i386-pc-mingw32msvc.

Using the host variable instead of the output of uname has another important benefit. The build system maintainer can often test out changes to the build system logic without having access to the system in question. This is important since a surprising number of "broken builds" are the result of stupid logic or syntax errors in the sh code for a particular platform. For example, one could test out the Windows build using a cross compiler prefixed by i386-mingw32msvc like so:

% ./configure --host=i386-mingw32msvc
% make

The above commands will run the Windows configure script in bash and then generate Win32 .dll and .exe files. One can even create cross compilers for multiple systems. Binaries for Solaris, IRIX, or others can be created under Linux or even Windows. The attentive reader will note that even compiling a Win32 application under Cygwin is technically a cross compile since the generated executable would be run with the Win32 runtime instead of the Cygwin runtime. To properly support cross compiling and make it easy for the end user, an upgrade to autoconf is also required.

Autoconf 2.50 Update

The autoconf 2.13 release is now several years old. The most recent stable release of autoconf is 2.50. A huge number of problems have been fixed in the autoconf 2.50 release, the most important of which is cross compiling support. Cross compiling with autoconf 2.13 is possible but far harder than it needs to be. The long and sordid history of this particular feature in autoconf will not be addressed in this document. It is safe to say that the autoconf 2.50 release is the first release to make cross compiling easy for the end user. The existing build system works with the autoconf 2.50 release on a number of systems, but it is possible that the build on some system would be broken by this upgrade. For this reason, the autoconf 2.50 upgrade should happen after an 8.4 branch has been created.

Use a Config Header

Autoconf supports two ways to set platform specific flags set via the AC_DEFINE() macro. Tcl currently makes use of the default option. Each call to AC_DEFINE() adds a -DVAR=1 value that is passed into the compiler. These -D flags are also included in the generated tclConfig.sh file so that extensions will use the same set of defines. The second option is to generate a .h file that will be #included into any needed header or source file. This file could be called tclconfig.h. Instead of passing -DVAR=1 on the command line, the generated .h file might look like:

# define VAR 1

There is no functional difference between these two options. The benefit of using tclconfig.h will likely only be realized by the maintainers or people hacking on the core and extensions. By taking the -D flags out of the Makefile, we make it easier to hack around in the generated .h file without having to worry about also changing the flags in the generated tclConfig.sh file. It can be quite a pain to change an option in Tcl's Makefile, then again in the tclConfig.sh file, and then rerun the configure script in each extension. This pain can be avoided by adding a call to AM_CONFIG_HEADER(tclconfig.h) to the top of Tcl's configure.in script. One possible ramification of this change may be the need to install tclconfig.h in the event items in tclInt.h depend on #defines in tclconfig.h.

Misplaced AC_SUBST Calls

The configure.in file for both Tcl and Tk invokes the AC_SUBST() macro for each variable that is to be substituted into the Makefile. This makes sense for variables defined in the configure.in file, but it makes no sense for those variables defined in tcl.m4. Invoking AC_SUBST() in configure.in for a variable defined in tcl.m4 means that the maintainer will need to keep track of the variable in Tcl's configure.in as well as the configure.in of any extension that uses the given variable. This simply makes no sense. Each macro defined in tcl.m4 should call AC_SUBST() for any variables it wishes to substitute into generated files.

One Rule to Build Them All

Perhaps the most frustrating thing about maintaining Tcl's build system is the fact that each and every modification needs to be made and tested in at least 4 configurations. This is because Tcl and Tk have two separate build systems, one for Unix and one for Windows. Each and every change needs to be tested on Unix for both Tcl and Tk and then again on Windows for both Tcl and Tk. It is an incredible waste of time.

A single build system with properly abstracted macros can be used to build both a Unix and Windows version of Tcl. Changes would still need to be tested on each platform, but dealing with only one source base would save a significant amount of the maintainer's time.

A related problem is experienced by extension authors. It can be quite difficult to write a build system for an extension that works with both the Unix and Windows version of Tcl. Here is a telling quote from Todd Helfter, the maintainer of the Oratcl extension.

"TEA specifies that there should be only one set of configure files. Why should extension writers have to comply with a standard that Tcl nor Tk does not?"

For starters, some of the variables defined in the Unix version of tclConfig.sh do not exist in the Windows version. Most of the obvious problems in this area have already been corrected in the 8.4 release, but some difficult ones remain. For example, how would an extension author figure out how to name a generated library file in a cross platform way? One would assume that a couple of variables could be concatenated together, but in practice this is not so easy. A quick look at the configure.in scripts will turn up examples like this:

(Unix version)

if test "${SHARED_BUILD}" = "1" ; then
  eval "TCL_LIB_FILE=libtcl${TCL_SHARED_LIB_SUFFIX}"
else
  eval "TCL_LIB_FILE=libtcl${TCL_UNSHARED_LIB_SUFFIX}"
fi

(Windows version)

eval "TCL_LIB_FILE=${LIBPREFIX}tcl$VER${LIBSUFFIX}"

The attentive reader will note that these are completely different! To understand the Unix version, one has to go exploring to find out how TCL_[UN]SHARED_LIB_SUFFIX gets set. To understand the Windows version, one needs to look into the origins of LIBPREFIX and LIBSUFFIX. It is really quite a bother and it gets even worse once you introduce additional compilers (like mingw) into the mix. The casual coder would poke around for a bit then give up.

The solution to this problem is to merge the two build systems and provide some properly abstracted macros that work on multiple platforms. These macros will need to be available to Tcl/Tk as well as extension authors. Here is a short example of a couple of macros that were developed while porting Tcl/Tk and Itcl to the Cygnus environment. These macros can be found in the current CVS HEAD of gdb/Insight.

TCL_TOOL_STATIC_LIB_LONGNAME(VAR, LIBNAME, SUFFIX)
TCL_TOOL_SHARED_LIB_LONGNAME(VAR, LIBNAME, SUFFIX)

Using these macros, one could code the Unix and Windows versions to use the same logic.

if test "${SHARED_BUILD}" = "1" ; then
  TCL_TOOL_SHARED_LIB_LONGNAME(TCL_LIB_FILE, tcl, ${TCL_SHARED_LIB_SUFFIX})
else
  TCL_TOOL_STATIC_LIB_LONGNAME(TCL_LIB_FILE, tcl, ${TCL_UNSHARED_LIB_SUFFIX})
fi

Behind the scenes, the unix version might set TCL_LIB_FILE=libtcl83.so while the Windows version might set TCL_LIB_FILE=tcl83.dll. Once both build systems use the same code, we can abstract them out into a new tcl.m4 file that is shared between the Unix and Windows versions. An extension that has only one build system can also make use of these macros. The concept is not a difficult one, the problem is that the implementation takes a very long time to get right.

VC++ vs. GCC Library Names

The previous section touched on issues related to library naming and how they differ between Unix and Windows. This section will focus only on Windows and discuss some of the differences that make it difficult to support both the VC++ and gcc compilers. Two variables that are quite difficult to support properly are TCL_LIB_SPEC and TCL_BUILD_LIB_SPEC. The gcc compiler supports command line arguments like -L${dir} -l${lib} but VC++ does not. VC++ users must pass the fully qualified name of a .lib file or set an environment variable. To deal with this situation, the following macros were developed.

TCL_TOOL_LIB_SHORTNAME(VAR, LIBNAME, VERSION)
TCL_TOOL_LIB_SPEC(VAR, DIR, LIBARG)

A single configure.in file for Tcl or an extension could make use of these macros as follows:

TCL_TOOL_LIB_SHORTNAME(TCL_LIB_FLAG, tcl, ${TCL_VERSION})
TCL_TOOL_LIB_SPEC(TCL_BUILD_LIB_SPEC, `pwd`, ${TCL_LIB_FLAG})
TCL_TOOL_LIB_SPEC(TCL_LIB_SPEC, ${exec_prefix}/lib, ${TCL_LIB_FLAG})

When configured with VC++, the macros would set:

TCL_BUILD_LIB_SPEC="/build/tcl83.lib"
TCL_LIB_SPEC="/install/tcl83.lib"

When configured with gcc, the macros would set:

TCL_BUILD_LIB_SPEC="-L/build -ltcl83"
TCL_LIB_SPEC="-L/install -ltcl83"

These macros are a good first step. They provide abstraction for library naming issues and work in both shared and static builds. This set of macros has been tested with Tcl, Tk, and Itcl and should be ready for incorporation into other extensions. One only needs to look at Tcl bug 219330 to find an example of a core extension that only builds with VC++ under Windows. An extension writer should not need to solve compiler problems such as this. The Tcl core needs to provide abstracted macros that can be used in extensions.

Cygwin vs. Mingw

The Windows version of Tcl traditionally supported building with VC++ only. During the Tcl 8.3 development process, gcc support was added. Trouble is, the default version of gcc delivered with Cygwin had and continues to have some problems compiling the Windows Tcl code.

Cygwin is a Unix/POSIX compatibility layer built on top of the Win32 API. The Cygwin version of gcc is designed to help people compile C programs that make use of POSIX APIs. The Cygwin version of gcc was not designed to support compiling Win32 native applications. Some support for Win32 applications was added later via the -mno-cygwin command line switch, but it is far from perfect.

The Mingw project was created to produce a version of gcc that supports building native Win32 applications. This version of gcc is a native Windows application, it does not depend on the Cygwin dll and it does not link generated applications to the Cygwin dll. In fact, it is simply not possible to create an executable that accidently requires the Cygwin dll when compiling with the Mingw version of gcc. This can be a problem with the Cygwin version of gcc, especially when C++ is involved.

Tcl currently builds with the Mingw version of gcc. Tcl does not build with the Cygwin version of gcc even though the -mno-cygwin is used. Tcl also does not build using the Cygwin version of gcc in POSIX mode without the -mno-cygwin flag. In addition, the individual working on Cygwin compatibility for Tcl will not be pursuing it in the future. For all of these reasons, support for Cygwin gcc should be dropped in favor of Mingw gcc. Documentation should be updated to instruct people to install the Mingw version of gcc instead of the Cygwin version. It is even possible to have both Cygwin gcc and Mingw gcc installed on the same system, the user just needs to make sure the correct one is on the PATH before compiling Tcl. A check should also be placed in the configure.in script to keep people from accidently compiling with the Cygwin version of gcc since it does not work and will only lead to useless bug reports.

A New tclconfig Module

A number of Tcl extensions copy Tcl's tcl.m4 file into the extension's CVS module. For example, Itcl distributes a locally modified version of tcl.m4 that supports building on Windows and Unix with a single configure.in script. It is very difficult to keep up maintenance when this sort of approach is used. For one thing, the maintainer has to constantly copy the tcl.m4 into extensions. Each and every commit to a tcl.m4 file in the tcl CVS module needs to be followed by a commit to the same file in the tk module. If an extension has made local modifications to the tcl.m4 file, a new one can't just me copied over. The extension maintainer would need to merge the changes in my hand or make use of some fancy CVS maintainer branch features that are not commonly used. The result of all this trouble is a predictable lag in keeping an extension's build system up to date.

Perhaps the best solution to this problem is to create a new module in the Tcl CVS named tclconfig. This module would contain all the macro files and supporting scripts that were available to Tcl and any extensions. Anyone who has worked on tclpro will notice that this is equivalent to the config module that currently exists in the tclpro CVS. The idea is the same, but the new module needs to live in the tcl CVS repo not the tclpro CVS repo. When a user checks the tcl module out of the CVS a tclconfig directory will also be created. The tcl/unix/aclocal.m4 script would then be changed from:

builtin(include,tcl.m4)

To:

builtin(include,../../tclconfig/tcl.m4)

This approach will provide a single destination for all configure related changes. It will end the need to copy tcl.m4 files into each extension and will help extension authors resist the urge to make local modifications to the tcl.m4 script. If an extension author wants to change tcl.m4 they will need to submit a patch via the normal channels. Nothing will force extension authors to use this new approach, but it will be available to them when they are ready to upgrade. This change should not be integrated until Tcl 8.5.

Extension Defaults

A Tcl extension should default to the same configuration options that Tcl was compiled with. For example, if --prefix=/usr/local/tcl84 is passed to Tcl's configure script, it should not also need to be passed to Tk's configure script [Tcl bug 428627]. Tk should use the compiler that Tcl was configured with by default. In fact, each of the following options could fit into this category:

Each one of these options will need to be saved in tclConfig.sh.

Build vs. Install Configuration

The tclConfig.sh script should provide information that allows an extension to be built with either an installed or uninstalled version of Tcl. The current tclConfig.sh largely fails to provide this functionality to extensions. Both build variables and install variables are included in the tclConfig.sh file, but there is nothing to indicate to the script that loads tclConfig.sh whether or not a given tclConfig.sh has actually been installed.

An extension author has a couple of choices about how to deal with this situation. One could simply recreate needed flags like TCL_LIB_SPEC and TCL_STUB_LIB_SPEC and ignore the definitions in the tclConfig.sh file. Tk uses this approach.

One could also just pick from the build or install variables. If the extension author chose to use TCL_LIB_SPEC instead of TCL_BUILD_LIB_SPEC then the extension would only build with a version of Tcl that was already installed. Itcl uses this approach.

Both of these approaches are seriously flawed. Tcl needs to provide a means to load a tclConfig.sh file without having to worry about build vs. install flags. The SC_LOAD_TCLCONFIG macro can be modified in such a way as to define TCL_LIB_SPEC=$TCL_BUILD_LIB_SPEC when loading tclConfig.sh from the build directory. When tclConfig.sh is loaded from the install directory one could safely set TCL_BUILD_LIB_SPEC=$TCL_LIB_SPEC.

The above change would address the most obvious problem in tclConfig.sh, but others remain. Tcl bugs 219260 and 421835 provide some examples of variables in tclConfig.sh that may work at build time but not install time and vice versa. One possible long term solution could be to create a tclBuildConfig.sh as well as a tclConfig.sh script. The tclBuildConfig.sh could be loaded by the SC_LOAD_TCLCONFIG macro when --with-tcl indicated a build directory. The tclConfig.sh script would not contain any build variables and would be the only configuration file to get installed.

Chicken or the Egg

A number of targets in the current Tcl build process depend on having a working version of Tcl on the users PATH. It is not entirely clear if this was intentional. Either way, it creates real problems for users. Tcl bugs 420501 and 464874 demonstrates how a user might get a build error indicating that tclsh cannot be found before tclsh is even built. This problem should be straightforward to fix. Tcl patch 465874 describes on possible approach. The fact that this problem exists in the first place is indicative of a larger issue.

The Tcl build process should minimize external dependencies. This seems like a simple thing to seek agreement on, but people are constantly advocating changes that fly in the face of this goal. Tcl bug 219259 describes one such misguided effort. Older releases of Itcl mistakenly assumed a version of tclsh would exist on the system and tried to install files using a Tcl script named installFile.tcl. [59] seeks to embed Tcl build information into tclsh so that Tcl extensions could build themselves using Tcl scripts. Each of these approaches make different assumptions about how Tcl will bootstrap itself.

The current bootstrap approach depends on tclsh in some circumstances and not in others. This tends to work most of the time. Trouble is, if the user does something to change the time-stamps of certain files, it can leave the tree in an unbuildable state. Another approach would be to remove all uses of tclsh in the Makefile. This removes the possibility of accidently depending on tclsh during the build process but it can make the maintainer's life much more difficult. The most drastic solution would be to bootstrap a version of tclsh that could be used to rebuild Tcl as well as extensions. This is doable, but it will get very tricky in the case of a cross compile since the bootstrap version of tclsh would need to run on the build system while the generated version of tclsh would need to run on the host system. If the normal build process is always going to depend on tclsh, a private build version of tclsh will need to be bootstrapped. Realistically, this may be too much work to implement. It seems the only workable approach is to make sure none of the targets in the normal build process depend on an external tclsh.

Death to TCL_DBGX

The most thoroughly evil part of the entire Tcl build process is the TCL_DBGX variable used in configure.in, Makefile.in, and tclConfig.sh. The following quote from Brent Welch say it all:

"The TCL_DBGX manifestation is one of the worst /bin/sh quoting hell situations I have encountered."

Presumably, the TCL_DBGX variable was introduced to support building of debug vs. non-debug versions of Tcl and installing them into the same directory. That may have been a noble goal, but in practice this indirection makes many parts of the build system extremely difficult to modify. This scheme also fails to deal with other identifying suffixes like 's' for a static build or 't' for a threaded build. A number of queries as to the usefulness of TCL_DBGX have appeared on news:comp.lang.tcl but none produced satisfactory results. The TCL_DBGX variable should be removed and never spoken of again.

Configure Flags

The --enable-symbols should be renamed to --enable-debug. Quite a few other software projects use the --enable-debug flag. A Google search turned up no other projects that make use of the --enable-symbols flag. This change would have documentation impact, but it would be minimal. The old flag could be retained for compatibility if a significant number of folks were concerned.

The Tcl 8.4 release removes the --enable-gcc flag. This flag introduced a number of problems that are much better solved by simply setting the CC environment variable before running configure. This change has already been incorporated into the CVS and is only mentioned here for the sake of completeness.

Risks

Implementing this TIP is by no means an easy task. Build system changes are by far the most dangerous since a broken configuration will not be noticed until someone actually tries to build on the given system. One can only ask for forgiveness before hand. Large scale build system changes will no doubt break something. For this reason, many of the suggested changes should be incorporated into the Tcl 8.5 release.

Alternatives

One alternative is to continue to use the existing system. While things would get no worse, they would also get no better. Another alternative is to replace the existing TEA based build system with a build system written in Tcl. This TIP does explore some of the build issues a Tcl based system will face, but tries to focus on improving the existing system instead of replacing TEA.

See Also

SourceForge bugs: 219259, 219260, 219330, 420501, 421835, 428627, 464874

SourceForge patch: 465874

Copyright

This document has been placed in the public domain.