Web2C is an integrated collection of TeX-related programs: TeX itself, METAFONT, MetaPost, BIBTeX, etc. It is the heart of TeX Live.
A bit of history: The original implementation was by Tomas Rokicki who, in 1987, developed a first TeX-to-C system adapting change files under Unix, which were primarily the work of Howard Trickey and Pavel Curtis. Tim Morgan became the maintainer of the system, and during this period the name changed to Web-to-C. In 1990, Karl Berry took over the work, assisted by dozens of additional contributors, and in 1997 he handed the baton to Olaf Weber.
The Web2C system runs on Unix, 32-bit Windows systems, Mac OS X, and other operating systems. It uses Knuth’s original sources for TeX and other basic programs written in WEB and translates them into C source code. The core TeX programs are:
The precise functions and syntax of these programs are described in the documentation of the individual packages and of Web2C itself. However, knowing a few principles governing the whole family of programs will help you take advantage of your Web2C installation.
All programs honor these standard GNU options:
For locating files the Web2C programs use the path searching library Kpathsea. This library uses a combination of environment variables and configuration files to optimize searching the (huge) collection of TeX files. Web2C can look at more than one directory tree simultaneously, which is useful in maintaining TeX’s standard distribution and local extensions in two distinct trees. To speed up file searches, the root of each tree has a file ls-R, containing an entry showing the name and relative pathname for all files under that root.
Let us first describe the generic path searching mechanism of the Kpathsea library.
We call a search path a colon- or semicolon-separated list of path elements, which are basically directory names. A search path can come from (a combination of) many sources. To look up a file ‘my-file’ along a path ‘.:/dir’, Kpathsea checks each element of the path in turn: first ./my-file, then /dir/my-file, returning the first match (or possibly all matches).
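The lookup just described can be sketched in a few lines of Python. This is only an illustration of the idea, not Kpathsea's implementation; the function name `lookup` and the throwaway directories are made up for the example.

```python
import os
import tempfile

def lookup(filename, search_path, find_all=False):
    """Check each path element in turn and collect files that exist."""
    matches = []
    for element in search_path.split(os.pathsep):  # ':' on Unix, ';' on Windows
        candidate = os.path.join(element, filename)
        if os.path.isfile(candidate):
            matches.append(candidate)
            if not find_all:
                break  # the first match wins
    return matches

if __name__ == "__main__":
    # Build two throwaway directories that both contain 'my-file'.
    with tempfile.TemporaryDirectory() as root:
        first, second = os.path.join(root, "first"), os.path.join(root, "dir")
        for d in (first, second):
            os.mkdir(d)
            open(os.path.join(d, "my-file"), "w").close()
        path = os.pathsep.join([first, second])
        print(lookup("my-file", path))                 # only the copy in 'first'
        print(lookup("my-file", path, find_all=True))  # both copies, in path order
```

With `find_all=False` the function stops at the first element that contains the file, which mirrors the "first match" behaviour described above.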
In order to adapt optimally to all operating systems’ conventions, on non-Unix systems Kpathsea can use filename separators different from colon (‘:’) and slash (‘/’).
To check a particular path element p, Kpathsea first checks if a prebuilt database (see “Filename database” on page 72) applies to p, i.e., if the database is in a directory that is a prefix of p. If so, the path specification is matched against the contents of the database.
If the database does not exist, or does not apply to this path element, or contains no matches, the filesystem is searched (if this was not forbidden by a specification starting with ‘!!’ and if the file being searched for must exist). Kpathsea constructs the list of directories that correspond to this path element, and then checks in each for the file being sought.
The “file must exist” condition comes into play with ‘.vf’ files and input files read by TeX’s \openin command. Such files may not exist (e.g., cmr10.vf), and so it would be wrong to search the disk for them. Therefore, if you fail to update ls-R when you install a new ‘.vf’ file, it will never be found. Each path element is checked in turn: first the database, then the disk. If a match is found, the search stops and the result is returned.
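The order of checks for a single path element (database first, then disk, subject to ‘!!’ and the “file must exist” condition) can be summarized in a short Python sketch. Everything here is a simplification for illustration: the database is modelled as a plain dictionary, `check_element` and its parameters are hypothetical names, and only exact directory matches are considered.

```python
import os
import posixpath

def check_element(element, filename, db_root, db, must_exist, on_disk=os.path.isfile):
    """Return the path found for one path element, or None.

    db maps a filename to the directories listing it (a hypothetical in-memory
    stand-in for the prebuilt database); db_root is the directory holding it.
    """
    disk_forbidden = element.startswith("!!")
    directory = element[2:] if disk_forbidden else element

    # 1. The database applies only if its directory is a prefix of the element.
    if db is not None and directory.startswith(db_root):
        if directory in db.get(filename, []):
            return posixpath.join(directory, filename)

    # 2. Fall back to the disk unless '!!' forbids it or the file need not exist.
    if not disk_forbidden and must_exist:
        candidate = posixpath.join(directory, filename)
        if on_disk(candidate):
            return candidate
    return None
```

In this model a newly installed ‘.vf’ file that is missing from the database is never found, because such lookups are made with `must_exist` false and the disk is therefore not consulted.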
Although the simplest and most common path element is a directory name, Kpathsea supports additional features in search paths: layered default values, environment variable names, config file values, users’ home directories, and recursive subdirectory searching. Thus, we say that Kpathsea expands a path element, meaning it transforms all the specifications into a basic directory name or names. This is described in the following sections in the same order as it takes place.
Note that if the filename being searched for is absolute or explicitly relative, i.e., starts with ‘/’ or ‘./’ or ‘../’, Kpathsea simply checks if that file exists.
A search path can come from many sources. In the order in which Kpathsea uses them:
You can see each of these values for a given search path by using the debugging options (see “Debugging actions” on page 80).
Kpathsea reads runtime configuration files named texmf.cnf for search path and other definitions. The search path used to look for these files is named TEXMFCNF (by default such a file lives in the texmf/web2c subdirectory). All texmf.cnf files in the search path will be read and definitions in earlier files override those in later files. Thus, with a search path of .:$TEXMF, values from ./texmf.cnf override those from $TEXMF/texmf.cnf.
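The rule that definitions in earlier files override those in later files can be illustrated with a small Python sketch. It assumes a deliberately simplified file syntax (one `NAME = value` definition per line, `%` starting a comment), and the function name `merge_cnf` and the sample values are invented for the example.

```python
def merge_cnf(files):
    """Merge texmf.cnf contents given in search-path order.

    The first definition of a variable wins, so earlier files
    override later ones.
    """
    settings = {}
    for text in files:
        for line in text.splitlines():
            line = line.split("%", 1)[0].strip()  # drop comments (simplified)
            if not line or "=" not in line:
                continue
            name, value = line.split("=", 1)
            # setdefault keeps an existing (earlier) definition untouched
            settings.setdefault(name.strip(), value.strip())
    return settings

if __name__ == "__main__":
    local = "TEXMF = ./local   % hypothetical value from ./texmf.cnf\n"
    system = "TEXMF = /usr/local/texmf\nTEXINPUTS = .:$TEXMF/tex//\n"
    print(merge_cnf([local, system]))
```

Here the value of `TEXMF` from the first file is kept, while `TEXINPUTS`, defined only in the second file, is taken from there.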
A configuration file fragment illustrating most of these points is shown below:
Kpathsea recognizes certain special characters and constructions in search paths, similar to those available in Unix shells. As a general example, the complex path, ~$USER/{foo,bar}//baz, expands to all subdirectories under directories foo and bar in $USER’s home directory that contain a directory or file baz. These expansions are explained in the sections below.
If the highest-priority search path (see “Path sources” on page 67) contains an extra colon (i.e., leading, trailing, or doubled), Kpathsea inserts at that point the next-highest-priority search path that is defined. If that inserted path has an extra colon, the same happens with the next highest. For example, given an environment variable setting
Since it would be useless to insert the default value in more than one place, Kpathsea changes only one extra ‘:’ and leaves any others in place: it checks first for a leading ‘:’, then a trailing ‘:’, then a doubled ‘:’.
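The substitution order (leading, then trailing, then doubled colon) is easy to get wrong, so here is a minimal Python sketch of a single level of default expansion. The function name `insert_default` and the sample paths are hypothetical; the real library additionally chains through all lower-priority sources.

```python
def insert_default(path, default):
    """Insert `default` at the first extra colon found in `path`.

    Only one extra colon is replaced: a leading one is checked first,
    then a trailing one, then a doubled one. Any others stay in place.
    """
    if path.startswith(":"):
        return default + path
    if path.endswith(":"):
        return path + default
    if "::" in path:
        head, tail = path.split("::", 1)
        return head + ":" + default + ":" + tail
    return path  # no extra colon: the lower-priority path is not used

if __name__ == "__main__":
    default = ".:/usr/local/texmf/tex"              # hypothetical lower-priority value
    print(insert_default("/home/user/tex:", default))
    print(insert_default(":/home/user/tex:", default))  # only the leading colon is used
```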
A useful feature is brace expansion, which means that, for instance, v{a,b}w expands to vaw:vbw. Nesting is allowed. This can be used to implement multiple TeX hierarchies, by assigning a brace list to $TEXMF. For example, in texmf.cnf, you find the following definition:
Using this you can then write something like
which means that, after looking in the current directory, only the $HOMETEXMF/tex, $TEXMFLOCAL/tex, $VARTEXMF/tex and $TEXMFMAIN/tex trees will be searched (the last two using ls-R database files). It is a convenient way of running two parallel TeX structures, one “frozen” (on a CD, for instance) and the other being continuously updated with new versions as they become available. By using the $TEXMF variable in all definitions, one is sure to always search the up-to-date tree first.
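Brace expansion, including nesting, can be sketched recursively in Python. This is an illustration of the technique rather than Kpathsea's own code: `expand_braces` is a made-up name, and the order in which the library emits the alternatives is not something this sketch claims to reproduce.

```python
def expand_braces(text):
    """Expand the first brace group in `text`, then recurse on the results."""
    start = text.find("{")
    if start == -1:
        return [text]

    # Locate the brace that closes the group opened at `start`.
    depth, end = 0, -1
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                end = i
                break
    if end == -1:
        return [text]  # unbalanced braces: leave the text untouched

    # Split the group on commas that are not inside a nested group.
    parts, level, current = [], 0, ""
    for ch in text[start + 1:end]:
        if ch == "," and level == 0:
            parts.append(current)
            current = ""
        else:
            if ch == "{":
                level += 1
            elif ch == "}":
                level -= 1
            current += ch
    parts.append(current)

    results = []
    for part in parts:
        results.extend(expand_braces(text[:start] + part + text[end + 1:]))
    return results

if __name__ == "__main__":
    print(":".join(expand_braces("v{a,b}w")))          # vaw:vbw
    print(expand_braces("{$HOMETEXMF,$TEXMFLOCAL}/tex"))  # illustrative brace list
```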
Two or more consecutive slashes in a path element following a directory d are replaced by all subdirectories of d: first those subdirectories directly under d, then the subsubdirectories under those, and so on. At each level, the order in which the directories are searched is unspecified.
If you specify any filename components after the ‘//’, only subdirectories with matching components are included. For example, ‘/a//b’ expands into directories /a/1/b, /a/2/b, /a/1/1/b, and so on, but not /a/b/c or /a/1.
Multiple ‘//’ constructs in a path are possible, but ‘//’ at the beginning of a path is ignored.
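A breadth-first traversal captures the “directly under d first, then the next level” order described above. The Python sketch below follows that description literally for a single ‘//’ in a path element; it is an assumption-laden illustration (the names are invented, d itself is not included, entries are sorted only to make the output repeatable, and a leading ‘//’ or multiple ‘//’ constructs are not handled).

```python
import os
import tempfile

def subdirs_breadth_first(root):
    """List all subdirectories of root, one level at a time."""
    result, level = [], [root]
    while level:
        next_level = []
        for d in level:
            try:
                entries = sorted(e.path for e in os.scandir(d) if e.is_dir())
            except OSError:
                entries = []  # unreadable or missing directory
            next_level.extend(entries)
        result.extend(next_level)
        level = next_level
    return result

def expand_double_slash(element):
    """Expand one 'd//tail' element into the matching directories."""
    head, sep, tail = element.partition("//")
    if not sep:
        return [element]
    tail = tail.strip("/")
    dirs = subdirs_breadth_first(head)
    if not tail:
        return dirs
    # Keep only subdirectories that contain the trailing components.
    candidates = (os.path.join(d, tail) for d in dirs)
    return [p for p in candidates if os.path.isdir(p)]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "a")
        for rel in ("1/b", "2/b", "1/1/b", "b/c"):
            os.makedirs(os.path.join(a, *rel.split("/")))
        for found in expand_double_slash(a + "//b"):
            print(os.path.relpath(found, root))
```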
The following list summarizes the special characters in Kpathsea configuration files.
Kpathsea goes to some lengths to minimize disk accesses for searches. Nevertheless, at installations with enough directories, searching each possible directory for a given file can take an excessively long time (this is especially true if many hundreds of font directories have to be traversed). Therefore, Kpathsea can use an externally-built plain text “database” file named ls-R that maps files to directories, thus avoiding the need to exhaustively search the disk.
A second database file, aliases, allows you to give additional names to the files listed in ls-R. This can be helpful to conform to DOS 8.3 filename conventions in source files.
As explained above, the name of the main filename database must be ls-R. You can put one at the root of each TeX hierarchy in your installation that you wish to be searched ($TEXMF by default); most sites have only one hierarchy. Kpathsea looks for ls-R files along the TEXMFDBS path.
The recommended way to create and maintain ‘ls-R’ is to run the mktexlsr script included with the distribution. It is invoked by the various ‘mktex…’ scripts. In principle, this script just runs the command
If a file is not found in the database, by default Kpathsea goes ahead and searches the disk. If a particular path element begins with ‘!!’, however, only the database will be searched for that element, never the disk.
The kpsewhich program exercises path searching independent of any particular application. This can be useful as a sort of find program to locate files in TeX hierarchies (this is used heavily in the distributed ‘mktex…’ scripts).
Kpathsea looks up each non-option argument on the command line as a filename, and returns the first file found. There is no option to return all the files with a particular name (you can run the Unix ‘find’ utility for that).
The more important options are described next.
The last two entries in Table 3 are special cases, where the paths and environment variables depend on the name of the program: the variable name is constructed by converting the program name to upper case, and then appending INPUTS.
The environment variables are set by default in the configuration file texmf.cnf. It is only when you want to override one or more of the values specified in that file that you might want to set them explicitly in your execution environment.
The ‘--format’ and ‘--path’ options are mutually exclusive.
Let us now have a look at Kpathsea in action. Here’s a straightforward search:
The latter is a BIBTeX bibliography database for TUGboat articles.
Next we turn our attention to dvips’s header and configuration files. We first look at one of the commonly used files, the general prolog tex.pro for TeX support, before turning our attention to the generic configuration file (config.ps) and the PostScript font map psfonts.map. As the ‘.ps’ suffix is ambiguous we have to specify explicitly which type we are considering (dvips config) for the file config.ps.
We now take a closer look at the URW Times PostScript support files. The prefix for these in the standard font naming scheme is ‘utm’. The first file we look at is the configuration file, which contains the name of the map file:
It should be evident from these few examples how you can easily locate the whereabouts of a given file. This is especially important if you suspect that the wrong version of a file is picked up somehow, since kpsewhich will show you the first file encountered.
Sometimes it is necessary to investigate how a program resolves file references. To make this practical, Kpathsea offers various levels of debugging output:
A value of -1 will set all the above options; in practice, this is usually the most convenient value to use.
Similarly, with the dvips program, by setting a combination of debug switches, one can follow in detail where files are being picked up from. Alternatively, when a file is not found, the debug trace shows in which directories the program looks for the given file, so that one can get an indication of what the problem is.
Generally speaking, as most programs call the Kpathsea library internally, one can select a debug option by using the KPATHSEA_DEBUG environment variable, and setting it to (a combination of) values as described in the above list.
(Note for Windows users: it is not easy to redirect all messages to a file in this system. For diagnostic purposes you can temporarily SET KPATHSEA_DEBUG_OUTPUT=err.log).
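Because the debugging values are combined bitwise, a small Python sketch may help when composing a KPATHSEA_DEBUG setting. The bit assignments below are the ones given in the Kpathsea documentation as far as we are aware (1 for stat calls, 2 for hash table references, 4 for file open and close operations, 8 for general path information, 16 for the directory list of each path element, 32 for file searches, 64 for variable values); the symbolic names are our own labels, not identifiers defined by the library.

```python
from enum import IntFlag

class KpseDebug(IntFlag):
    """Illustrative labels for the KPATHSEA_DEBUG bits (names are assumptions)."""
    STAT = 1      # stat calls (disk lookups)
    HASH = 2      # references to hash tables
    FOPEN = 4     # file open and close operations
    PATHS = 8     # general path information
    EXPAND = 16   # directory list for each path element
    SEARCH = 32   # file searches
    VARS = 64     # variable values

def debug_value(*flags):
    """Combine the selected flags into the integer to put in KPATHSEA_DEBUG."""
    value = 0
    for flag in flags:
        value |= flag
    return int(value)

if __name__ == "__main__":
    # File searches plus disk lookups only.
    print(debug_value(KpseDebug.SEARCH, KpseDebug.STAT))
    # Every flag at once; -1 has all of these bits set as well.
    print(debug_value(*KpseDebug))
```

In practice the resulting number is simply exported in the environment before running the program under investigation.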
Let us consider, as an example, a small LaTeX source file, hello-world.tex, which contains the following input.
This little file only uses the font cmr10, so let us look at how dvips prepares the PostScript file (we want to use the Type 1 version of the Computer Modern fonts, hence the option -Pcms).
kdebug:start search(file=texmf.cnf, must_exist=1, find_all=1,
  path=.:/usr/local/bin/texlive:/usr/local/bin:
       /usr/local/bin/texmf/web2c:/usr/local:
       /usr/local/texmf/web2c:/.:/./teTeX/TeX/texmf/web2c:).
kdebug:start search(file=ls-R, must_exist=1, find_all=1,
  path=~/tex:/usr/local/texmf).
kdebug:search(ls-R) =>/usr/local/texmf/ls-R
kdebug:start search(file=aliases, must_exist=1, find_all=1,
  path=~/tex:/usr/local/texmf).
kdebug:search(aliases) => /usr/local/texmf/aliases
kdebug:start search(file=config.ps, must_exist=0, find_all=0,
  path=.:~/tex:!!/usr/local/texmf/dvips//).
kdebug:search(config.ps) => /usr/local/texmf/dvips/config/config.ps
kdebug:start search(file=/root/.dvipsrc, must_exist=0, find_all=0,
  path=.:~/tex:!!/usr/local/texmf/dvips//).
search(file=/home/goossens/.dvipsrc, must_exist=1, find_all=0,
  path=.:~/tex/dvips//:!!/usr/local/texmf/dvips//).
kdebug:search($HOME/.dvipsrc) =>
kdebug:start search(file=config.cms, must_exist=0, find_all=0,
  path=.:~/tex/dvips//:!!/usr/local/texmf/dvips//).
kdebug:search(config.cms) =>/usr/local/texmf/dvips/cms/config.cms
kdebug:start search(file=texc.pro, must_exist=0, find_all=0,
  path=.:~/tex/dvips//:!!/usr/local/texmf/dvips//:
       ~/tex/fonts/type1//:!!/usr/local/texmf/fonts/type1//).
kdebug:search(texc.pro) => /usr/local/texmf/dvips/base/texc.pro
kdebug:start search(file=cmr10.tfm, must_exist=1, find_all=0,
  path=.:~/tex/fonts/tfm//:!!/usr/local/texmf/fonts/tfm//:
       /var/tex/fonts/tfm//).
kdebug:search(cmr10.tfm) => /usr/local/texmf/fonts/tfm/public/cm/cmr10.tfm
kdebug:start search(file=texps.pro, must_exist=0, find_all=0,
...
<texps.pro>
kdebug:start search(file=cmr10.pfb, must_exist=0, find_all=0,
  path=.:~/tex/dvips//:!!/usr/local/texmf/dvips//:
       ~/tex/fonts/type1//:!!/usr/local/texmf/fonts/type1//).
kdebug:search(cmr10.pfb) => /usr/local/texmf/fonts/type1/public/cm/cmr10.pfb
<cmr10.pfb>[1]
dvips starts by locating its working files. First, texmf.cnf is found, which gives the definitions of the search paths for the other files, then the file database ls-R (to optimize file searching) and the file aliases, which makes it possible to declare several names (e.g., a short DOS-like 8.3 and a more natural longer version) for the same file. Then dvips goes on to find the generic configuration file config.ps before looking for the customization file .dvipsrc (which, in this case, is not found). Finally, dvips locates the config file for the Computer Modern PostScript fonts, config.cms (this was initiated with the -Pcms option on the dvips command). This file contains the list of the map files which define the relation between the TeX, PostScript and file system names of the fonts.
At this point dvips identifies itself to the user:
After having found the file in question, dvips outputs date and time, and informs us that it will generate the file hello-world.ps, then that it needs the font file cmr10, and that the latter is declared as “resident” (no bitmaps needed):
Another useful feature of Web2C is the ability to control a number of memory parameters (in particular, array sizes) via the runtime file texmf.cnf read by Kpathsea. The memory settings can be found in Part 3 of that file in the TeX Live distribution. The more important are:
Of course, this facility is no substitute for truly dynamic arrays and memory allocation, but since this is extremely difficult to implement in the present TeX source, these runtime parameters provide a practical compromise allowing some flexibility.