RCS--A System for Version Control
Walter F. Tichy
Department of Computer Sciences
Purdue University
West Lafayette, Indiana 47907
ABSTRACT
An important problem in program development
and maintenance is version control, i.e., the task
of keeping a software system consisting of many
versions and configurations well organized. The
Revision Control System (RCS) is a software tool
that assists with that task. RCS manages revi
sions of text documents, in particular source pro
grams, documentation, and test data. It automates
the storing, retrieval, logging and identification
of revisions, and it provides selection mechanisms
for composing configurations. This paper intro
duces basic version control concepts and discusses
the practice of version control using RCS. For
conserving space, RCS stores deltas, i.e., differ
ences between successive revisions. Several delta
storage methods are discussed. Usage statistics
show that RCS's delta storage method is space and
time efficient. The paper concludes with a
detailed survey of version control tools.
Keywords: configuration management, history man
agement, version control, revisions, deltas.
1995/06/01
RCS--A System for Version Control
Walter F. Tichy
Department of Computer Sciences
Purdue University
West Lafayette, Indiana 47907
1. Introduction
Version control is the task of keeping software systems
consisting of many versions and configurations well orga
nized. The Revision Control System (RCS) is a set of UNIX
commands that assist with that task.
RCS' primary function is to manage revision groups. A
revision group is a set of text documents, called revisions,
that evolved from each other. A new revision is created by
manually editing an existing one. RCS organizes the revi
sions into an ancestral tree. The initial revision is the
root of the tree, and the tree edges indicate from which
revision a given one evolved. Besides managing individual
revision groups, RCS provides flexible selection functions
for composing configurations. RCS may be combined with
MAKE1, resulting in a powerful package for version control.
RCS also offers facilities for merging updates with
customer modifications, for distributed software develop
ment, and for automatic identification. Identification is
the `stamping' of revisions and configurations with unique
markers. These markers are akin to serial numbers, telling
software maintainers unambiguously which configuration is
before them.
RCS is designed for both production and experimental
environments. In production environments, access controls
detect update conflicts and prevent overlapping changes. In
experimental environments, where strong controls are coun
terproductive, it is possible to loosen the controls.
Although RCS was originally intended for programs, it
is useful for any text that is revised frequently and whose
previous revisions must be preserved. RCS has been applied
successfully to store the source text for drawings, VLSI
layouts, documentation, specifications, test data, form let
ters and articles.
This paper discusses the practice of version control
using RCS. It also introduces basic version control con
cepts, useful for clarifying current practice and designing
similar systems. Revision groups of individual components
are treated in the next three sections, and the extensions
to configurations follow. Because of its size, a survey of
version control tools appears at the end of the paper.
2. Getting started with RCS
Suppose a text file f.c is to be placed under control
of RCS. Invoking the check-in command
ci f.c
creates a new revision group with the contents of f.c as the
initial revision (numbered 1.1) and stores the group into
the file f.c,v. Unless told otherwise, the command deletes
f.c. It also asks for a description of the group. The
description should state the common purpose of all revisions
in the group, and becomes part of the group's documentation.
All later check-in commands will ask for a log entry, which
should summarize the changes made. (The first revision is
assigned a default log message, which just records the fact
that it is the initial revision.)
Files ending in ,v are called RCS files (v stands for
versions); the others are called working files. To get back
the working file f.c in the previous example, execute the
check-out command:
co f.c
This command extracts the latest revision from the revision
group f.c,v and writes it into f.c. The file f.c can now be
edited and, when finished, checked back in with ci:
ci f.c
Ci assigns number 1.2 to the new revision. If ci complains
with the message
ci error: no lock set by <login>
then the system administrator has decided to configure RCS
for a production environment by enabling the `strict locking
feature'. If this feature is enabled, all RCS files are
initialized such that check-in operations require a lock on
the previous revision (the one from which the current one
evolved). Locking prevents overlapping modifications if
several people work on the same file. If locking is
required, the revision should have been locked during the
check-out by using the option -l:
co -l f.c
Of course it is too late now for the check-out with locking,
because f.c has already been changed; checking out the file
again would overwrite the modifications. (To prevent acci
dental overwrites, co senses the presence of a working file
and asks whether the user really intended to overwrite it.
The overwriting check-out is sometimes useful for backing up
to the previous revision.) To be able to proceed with the
check-in in the present case, first execute
rcs -l f.c
This command retroactively locks the latest revision, unless
someone else locked it in the meantime. In this case, the
two programmers involved have to negotiate whose modifica
tions should take precedence.
If an RCS file is private, i.e., if only the owner of
the file is expected to deposit revisions into it, the
strict locking feature is unnecessary and may be disabled.
If strict locking is disabled, the owner of the RCS file
need not have a lock for check-in. For safety reasons, all
others still do. Turning strict locking off and on is done
with the commands:
rcs -U f.c and rcs -L f.c
These commands enable or disable the strict locking feature
for each RCS file individually. The system administrator
only decides whether strict locking is enabled initially.
To reduce the clutter in a working directory, all RCS
files can be moved to a subdirectory with the name RCS. RCS
commands look first into that directory for RCS files. All
the commands presented above work with the RCS subdirectory
without change.
It may be undesirable that ci deletes the working file.
For instance, sometimes one would like to save the current
revision, but continue editing. Invoking
ci -l f.c
checks in f.c as usual, but performs an additional check-out
with locking afterwards. Thus, the working file does not
disappear after the check-in. Similarly, the option -u does
a check-in followed by a check-out without locking. This
option is useful if the file is needed for compilation after
the check-in. Both options update the identification mark
ers in the working file (see below).
Besides the operations ci and co, RCS provides the fol
lowing commands:
ident extract identification markers
rcs change RCS file attributes
rcsclean remove unchanged working files (optional)
rcsdiff compare revisions
rcsfreeze record a configuration (optional)
rcsmerge merge revisions
rlog read log messages and other information in RCS files
A synopsis of these commands appears in the Appendix.
2.1. Automatic Identification
RCS can stamp source and object code with special iden
tification strings, similar to product and serial numbers.
To obtain such identification, place the marker
$Id$
into the text of a revision, for instance inside a comment.
The check-out operation will replace this marker with a
string of the form
$Id: filename revisionnumber date time author state locker $
This string need never be touched, because co keeps it up to
date automatically. To propagate the marker into object
code, simply put it into a literal character string. In C,
this is done as follows:
static char rcsid[] = "$Id$";
The command ident extracts such markers from any file, in
particular from object code. Ident helps to find out which
revisions of which modules were used in a given program. It
returns a complete and unambiguous component list, from
which a copy of the program can be reconstructed. This
facility is invaluable for program maintenance.
There are several additional identification markers,
one for each component of $Id$. The marker
$Log$
has a similar function. It accumulates the log messages
that are requested during check-in. Thus, one can maintain
the complete history of a revision directly inside it, by
enclosing it in a comment. Figure 1 is an edited version of
a log contained in revision 4.1 of the file ci.c. The log
appears at the beginning of the file, and makes it easy to
determine what the recent modifications were.
/*
* $Log: ci.c,v $
* Revision 4.1 1983/05/10 17:03:06 wft
* Added option -d and -w, and updated assignment of date, etc. to new delta.
* Added handling of default branches.
*
* Revision 3.9 1983/02/15 15:25:44 wft
* Added call to fastcopy() to copy remainder of RCS file.
*
* Revision 3.8 1983/01/14 15:34:05 wft
* Added ignoring of interrupts while new RCS file is renamed;
* avoids deletion of RCS files by interrupts.
*
* Revision 3.7 1982/12/10 16:09:20 wft
* Corrected checking of return code from diff.
* An RCS file now inherits its mode during the first ci from the working file,
* except that write permission is removed.
*/
Figure 1. Log entries produced by the marker $Log$.
Since revisions are stored in the form of differences, each
log message is physically stored once, independent of the
number of revisions present. Thus, the $Log$ marker incurs
negligible space overhead.
3. The RCS Revision Tree
RCS arranges revisions in an ancestral tree. The ci
command builds this tree; the auxiliary command rcs prunes
it. The tree has a root revision, normally numbered 1.1,
and successive revisions are numbered 1.2, 1.3, etc. The
first field of a revision number is called the release num
ber and the second one the level number. Unless given
explicitly, the ci command assigns a new revision number by
incrementing the level number of the previous revision. The
release number must be incremented explicitly, using the -r
option of ci. Assuming there are revisions 1.1, 1.2, and
1.3 in the RCS file f.c,v, the command
ci -r2.1 f.c or ci -r2 f.c
assigns the number 2.1 to the new revision. Later check-ins
without the -r option will assign the numbers 2.2, 2.3, and
so on. The release number should be incremented only at
major transition points in the development, for instance
when a new release of a software product has been completed.
3.1. When are branches needed?
A young revision tree is slender: It consists of only
one branch, called the trunk. As the tree ages, side
branches may form. Branches are needed in the following 4
situations.
Temporary fixes
Suppose a tree has 5 revisions grouped in 2 releases,
as illustrated in Figure 2. Revision 1.3, the last one
of release 1, is in operation at customer sites, while
release 2 is in active development.
1.1 ----1.2 ----1.3 ----2.1 ----2.2 ++++
Figure 2. A slender revision tree.
Now imagine a customer requesting a fix of a problem in
revision 1.3, although actual development has moved on
to release 2. RCS does not permit an extra revision to
be spliced in between 1.3 and 2.1, since that would not
reflect the actual development history. Instead, cre
ate a branch at revision 1.3, and check in the fix on
that branch. The first branch starting at 1.3 has num
ber 1.3.1, and the revisions on that branch are num
bered 1.3.1.1, 1.3.1.2, etc. The double numbering is
needed to allow for another branch at 1.3, say 1.3.2.
Revisions on the second branch would be numbered
1.3.2.1, 1.3.2.2, and so on. The following steps cre
ate branch 1.3.1 and add revision 1.3.1.1:
co -r1.3 f.c -- check out revision 1.3
edit f.c -- change it
ci -r1.3.1 f.c -- check it in on branch 1.3.1
This sequence of commands transforms the tree of Figure
2 into the one in Figure 3. Note that it may be neces
sary to incorporate the differences between 1.3 and
1.3.1.1 into a revision at level 2. The operation
rcsmerge automates this process (see the Appendix).
1.1 ----1.2 ----1.3 ----2.1 ----2.2 ++++
1.3.1.+1+++
Figure 3. A revision tree with one side branch
Distributed development and customer modifications
Assume a situation as in Figure 2, where revision 1.3
is in operation at several customer sites, while
release 2 is in development. Customer sites should use
RCS to store the distributed software. However, cus
tomer modifications should not be placed on the same
branch as the distributed source; instead, they should
be placed on a side branch. When the next software
distribution arrives, it should be appended to the
trunk of the customer's RCS file, and the customer can
then merge the local modifications back into the new
release. In the above example, a customer's RCS file
would contain the following tree, assuming that the
customer has received revision 1.3, added his local
modifications as revision 1.3.1.1, then received revi
sion 2.4, and merged 2.4 and 1.3.1.1, resulting in
2.4.1.1.
1.3 --------------------2.4
1.3.1.1 2.4.1.1
Figure 4. A customer's revision tree with local modifications.
This approach is actually practiced in the CSNET pro
ject, where several universities and a company cooper
ate in developing a national computer network.
Parallel development
Sometimes it is desirable to explore an alternate
design or a different implementation technique in par
allel with the main line development. Such development
should be carried out on a side branch. The experimen
tal changes may later be moved into the main line, or
abandoned.
Conflicting updates
A common occurrence is that one programmer has checked
out a revision, but cannot complete the assignment for
some reason. In the meantime, another person must per
form another modification immediately. In that case,
the second person should check-out the same revision,
modify it, and check it in on a side branch, for later
merging.
Every node in a revision tree consists of the following
attributes: a revision number, a check-in date and time, the
author's identification, a log entry, a state and the actual
text. All these attributes are determined at the time the
revision is checked in. The state attribute indicates the
status of a revision. It is set automatically to `experi
mental' during check-in. A revision can later be promoted
to a higher status, for example `stable' or `released'. The
set of states is user-defined.
3.2. Revisions are represented as deltas
For conserving space, RCS stores revisions in the form
of deltas, i.e., as differences between revisions. The user
interface completely hides this fact.
A delta is a sequence of edit commands that transforms
one string into another. The deltas employed by RCS are
line-based, which means that the only edit commands allowed
are insertion and deletion of lines. If a single character
in a line is changed, the edit scripts consider the entire
line changed. The program diff2 produces a small, line-
based delta between pairs of text files. A character-based
edit script would take much longer to compute, and would not
be significantly shorter.
Using deltas is a classical space-time tradeoff: deltas
reduce the space consumed, but increase access time. How
ever, a version control tool should impose as little delay
as possible on programmers. Excessive delays discourage the
use of version controls, or induce programmers to take
shortcuts that compromise system integrity. To gain reason
ably fast access time for both editing and compiling, RCS
arranges deltas in the following way. The most recent revi
sion on the trunk is stored intact. All other revisions on
the trunk are stored as reverse deltas. A reverse delta
describes how to go backward in the development history: it
produces the desired revision if applied to the successor of
that revision. This implementation has the advantage that
extraction of the latest revision is a simple and fast copy
operation. Adding a new revision to the trunk is also fast:
ci simply adds the new revision intact, replaces the previ
ous revision with a reverse delta, and keeps the rest of the
old deltas. Thus, ci requires the computation of only one
new delta.
Branches need special treatment. The naive solution
would be to store complete copies for the tips of all
branches. Clearly, this approach would cost too much space.
Instead, RCS uses forward deltas for branches. Regenerating
a revision on a side branch proceeds as follows. First,
extract the latest revision on the trunk; secondly, apply
reverse deltas until the fork revision for the branch is
obtained; thirdly, apply forward deltas until the desired
branch revision is reached. Figure 5 illustrates a tree
with one side branch. Triangles pointing to the left and
right represent reverse and forward deltas, respectively.
| | | |
+---- +---- +---- +----
1.1| 1.2 | 1.3| 2.1| 2.2
| | | |
| |
1.|3.1.-1---1-.|3.1.2
| |
| |
Figure 5. A revision tree with reverse and forward deltas.
Although implementing fast check-out for the latest
trunk revision, this arrangement has the disadvantage that
generation of other revisions takes time proportional to the
number of deltas applied. For example, regenerating the
branch tip in Figure 5 requires application of five deltas
(including the initial one). Since usage statistics show
that the latest trunk revision is the one that is retrieved
in 95 per cent of all cases (see the section on usage
statistics), biasing check-out time in favor of that revi
sion results in significant savings. However, careful
implementation of the delta application process is necessary
to provide low retrieval overhead for other revisions, in
particular for branch tips.
There are several techniques for delta application.
The naive one is to pass each delta to a general-purpose
text editor. A prototype of RCS invoked the UNIX editor ed
both for applying deltas and for expanding the identifica
tion markers. Although easy to implement, performance was
poor, owing to the high start-up costs and excess generality
of ed. An intermediate version of RCS used a special-pur
pose, stream-oriented editor. This technique reduced the
cost of applying a delta to the cost of checking out the
latest trunk revision. The reason for this behavior is that
each delta application involves a complete pass over the
preceding revision.
However, there is a much better algorithm. Note that
the deltas are line oriented and that most of the work of a
stream editor involves copying unchanged lines from one
revision to the next. A faster algorithm avoids unnecessary
copying of character strings by using a piece table. A
piece table is a one-dimensional array, specifying how a
given revision is `pieced together' from lines in the RCS
file. Suppose piece table PTr represents revision r. Then
PTr[i] contains the starting position of line i of revision
r. Application of the next delta transforms piece table PTr
into PTr+1. For instance, a delete command removes a series
of entries from the piece table. An insertion command
inserts new entries, moving the entries following the inser
tion point further down the array. The inserted entries
point to the text lines in the delta. Thus, no I/O is
involved except for reading the delta itself. When all
deltas have been applied to the piece table, a sequential
pass through the table looks up each line in the RCS file
and copies it to the output file, updating identification
markers at the same time. Of course, the RCS file must per
mit random access, since the copied lines are scattered
throughout that file. Figure 6 illustrates an RCS file with
two revisions and the corresponding piece tables.
Figure 6 is not available.
Figure 6. An RCS file and its piece tables
The piece table approach has the property that the time
for applying a single delta is roughly determined by the
size of the delta, and not by the size of the revision. For
example, if a delta is 10 per cent of the size of a revi
sion, then applying it takes only 10 per cent of the time to
generate the latest trunk revision. (The stream editor
would take 100 per cent.)
There is an important alternative for representing
deltas that affects performance. SCCS3, a precursor of RCS,
uses interleaved deltas. A file containing interleaved
deltas is partitioned into blocks of lines. Each block has
a header that specifies to which revision(s) the block
belongs. The blocks are sorted out in such a way that a
single pass over the file can pick up all the lines belong
ing to a given revision. Thus, the regeneration time for
all revisions is the same: all headers must be inspected,
and the associated blocks either copied or skipped. As the
number of revisions increases, the cost of retrieving any
revision is much higher than the cost of checking out the
latest trunk revision with reverse deltas. A detailed com
parison of SCCS's interleaved deltas and RCS's reverse
deltas can be found in Reference 4. This reference consid
ers the version of RCS with the stream editor only. The
piece table method improves performance further, so that RCS
is always faster than SCCS, except if 10 or more deltas are
applied.
Additional speed-up for both delta methods can be
obtained by caching the most recently generated revision, as
has been implemented in DSEE.5 With caching, access time to
frequently used revisions can approach normal file access
time, at the cost of some additional space.
4. Locking: A Controversial Issue
The locking mechanism for RCS was difficult to design.
The problem and its solution are first presented in their
`pure' form, followed by a discussion of the complications
caused by `real-world' considerations.
RCS must prevent two or more persons from depositing
competing changes of the same revision. Suppose two pro
grammers check out revision 2.4 and modify it. Programmer A
checks in a revision before programmer B. Unfortunately,
programmer B has not seen A's changes, so the effect is that
A's changes are covered up by B's deposit. A's changes are
not lost since all revisions are saved, but they are con
fined to a single revision.
This conflict is prevented in RCS by locking. Whenever
someone intends to edit a revision (as opposed to reading or
compiling it), the revision should be checked out and
locked, using the -l option on co. On subsequent check-in,
ci tests the lock and then removes it. At most one program
mer at a time may lock a particular revision, and only this
programmer may check in the succeeding revision. Thus,
while a revision is locked, it is the exclusive responsibil
ity of the locker.
An important maxim for software tools like RCS is that
they must not stand in the way of making progress with a
project. This consideration leads to several weakenings of
the locking mechanism. First of all, even if a revision is
locked, it can still be checked out. This is necessary if
other people wish to compile or inspect the locked revision
while the next one is in preparation. The only operations
they cannot do are to lock the revision or to check in the
succeeding one. Secondly, check-in operations on other
branches in the RCS file are still possible; the locking of
one revision does not affect any other revision. Thirdly,
revisions are occasionally locked for a long period of time
because a programmer is absent or otherwise unable to com
plete the assignment. If another programmer has to make a
pressing change, there are the following three alternatives
for making progress: a) find out who is holding the lock and
ask that person to release it; b) check out the locked revi
sion, modify it, check it in on a branch, and merge the
changes later; c) break the lock. Breaking a lock leaves a
highly visible trace, namely an electronic mail message that
is sent automatically to the holder of the lock, recording
the breaker and a commentary requested from him. Thus,
breaking locks is tolerated under certain circumstances, but
will not go unnoticed. Experience has shown that the auto
matic mail message attaches a high enough stigma to lock
breaking, such that programmers break locks only in real
emergencies, or when a co-worker resigns and leaves locked
revisions behind.
If an RCS file is private, i.e., when a programmer owns
an RCS file and does not expect anyone else to perform
check-in operations, locking is an unnecessary nuisance. In
this case, the `strict locking feature' discussed earlier
may be disabled, provided that file protection is set such
that only the owner may write the RCS file. This has the
effect that only the owner can check-in revisions, and that
no lock is needed for doing so.
As added protection, each RCS file contains an access
list that specifies the users who may execute update opera
tions. If an access list is empty, only normal UNIX file
protection applies. Thus, the access list is useful for
restricting the set of people who would otherwise have
update permission. Just as with locking, the access list
has no effect on read-only operations such as co. This
approach is consistent with the UNIX philosophy of openness,
which contributes to a productive software development envi
ronment.
5. Configuration Management
The preceding sections described how RCS deals with
revisions of individual components; this section discusses
how to handle configurations. A configuration is a set of
revisions, where each revision comes from a different revi
sion group, and the revisions are selected according to a
certain criterion. For example, in order to build a func
tioning compiler, the `right' revisions from the scanner,
the parser, the optimizer and the code generator must be
combined. RCS, in conjunction with MAKE, provides a number
of facilities to effect a smooth selection.
5.1. RCS Selection Functions
Default selection
During development, the usual selection criterion is to
choose the latest revision of all components. The co
command makes this selection by default. For example,
the command
co *,v
retrieves the latest revision on the default branch of
each RCS file in the current directory. The default
branch is usually the trunk, but may be set to be a
side branch. Side branches as defaults are needed in
distributed software development, as discussed in the
section on the RCS revision tree.
Release based selection
Specifying a release or branch number selects the lat
est revision in that release or branch. For instance,
co -r2 *,v
retrieves the latest revision with release number 2
from each RCS file. This selection is convenient if a
release has been completed and development has moved on
to the next release.
State and author based selection
If the highest level number within a given release num
ber is not the desired one, the state attribute can
help. For example,
co -r2 -sReleased *,v
retrieves the latest revision with release number 2
whose state attribute is `Released'. Of course, the
state attribute has to be set appropriately, using the
ci or rcs commands. Another alternative is to select a
revision by its author, using the -w option.
Date based selection
Revisions may also be selected by date. Suppose a
release of an entire system was completed and current
on March 4, at 1:00 p.m. local time. Then the command
co -d'March 4, 1:00 pm LT' *,v
checks out all the components of that release, indepen
dent of the numbering. The -d option specifies a `cut
off date', i.e., the revision selected has a check-in
date that is closest to, but not after the date given.
Name based selection
The most powerful selection function is based on
assigning symbolic names to revisions and branches. In
large systems, a single release number or date is not
sufficient to collect the appropriate revisions from
all groups. For example, suppose one wishes to combine
release 2 of one subsystem and release 15 of another.
Most likely, the creation dates of those releases dif
fer also. Thus, a single revision number or date
passed to the co command will not suffice to select the
right revisions. Symbolic revision numbers solve this
problem. Each RCS file may contain a set of symbolic
names that are mapped to numeric revision numbers. For
example, assume the symbol V3 is bound to release num
ber 2 in file s,v, and to revision number 15.9 in t,v.
Then the single command
co -rV3 s,v t,v
retrieves the latest revision of release 2 from s,v,
and revision 15.9 from t,v. In a large system with
many modules, checking out all revisions with one com
mand greatly simplifies configuration management.
Judicious use of symbolic revision numbers helps with
organizing large configurations. A special command, rcs
freeze, assigns a symbolic revision number to a selected
revision in every RCS file. Rcsfreeze effectively freezes a
configuration. The assigned symbolic revision number
selects all components of the configuration. If necessary,
symbolic numbers may even be intermixed with numeric ones.
Thus, V3.5 in the above example would select revision 2.5 in
s,v and branch 15.9.5 in t,v.
The options -r, -s, -w and -d may be combined. If a
branch is given, the latest revision on that branch satisfy
ing all conditions is retrieved; otherwise, the default
branch is used.
5.2. Combining MAKE and RCS
MAKE1 is a program that processes configurations. It
is driven by configuration specifications recorded in a spe
cial file, called a `Makefile'. MAKE avoids redundant pro
cessing steps by comparing creation dates of source and pro
cessed objects. For example, when instructed to compile all
modules of a given system, it only recompiles those source
modules that were changed since they were processed last.
MAKE has been extended with an auto-checkout feature
for RCS.* When a certain file to be processed is not pre
sent, MAKE attempts a check-out operation. If successful,
MAKE performs the required processing, and then deletes the
checked out file to conserve space. The selection parame
ters discussed above can be passed to MAKE either as parame
ters, or directly embedded in the Makefile. MAKE has also
been extended to search the subdirectory named RCS for
needed files, rather than just the current working direc
tory. However, if a working file is present, MAKE totally
ignores the corresponding RCS file and uses the working
file. (In newer versions of MAKE distributed by AT&T and
others, auto-checkout can be achieved with the rule DEFAULT,
instead of a special extension of MAKE. However, a file
checked out by the rule DEFAULT will not be deleted after
processing. Rcsclean can be used for that purpose.)
With auto-checkout, RCS/MAKE can effect a selection
rule especially tuned for multi-person software development
and maintenance. In these situations, programmers should
obtain configurations that consist of the revisions they
have personally checked out plus the latest checked in revi
sion of all other revision groups. This schema can be set
up as follows.
Each programmer chooses a working directory and places
into it a symbolic link, named RCS, to the directory con
taining the relevant RCS files. The symbolic link makes
sure that co and ci operations need only specify the working
files, and that the Makefile need not be changed. The pro
grammer then checks out the needed files and modifies them.
If MAKE is invoked, it composes configurations by selecting
those revisions that are checked out, and the rest from the
subdirectory RCS. The latter selection may be controlled by
a symbolic revision number or any of the other selection
criteria. If there are several programmers editing in sepa
rate working directories, they are insulated from each
other's changes until checking in their modifications.
Similarly, a maintainer can recreate an older configu
ration by starting to work in an empty working directory.
During the initial MAKE invocation, all revisions are
selected from RCS files. As the maintainer checks out files
and modifies them, a new configuration is gradually built
up. Every time MAKE is invoked, it substitutes the modified
revisions into the configuration being manipulated.
A final application of RCS is to use it for storing
Makefiles. Revision groups of Makefiles represent multiple
versions of configurations. Whenever a configuration is
baselined or distributed, the best approach is to unambigu
ously fix the configuration with a symbolic revision number
by calling rcsfreeze, to embed that symbol into the Make
file, and to check in the Makefile (using the same symbolic
revision number). With this approach, old configurations
can be regenerated easily and reliably.
6. Usage Statistics
The following usage statistics were collected on two
DEC VAX-11/780 computers of the Purdue Computer Science
Department. Both machines are mainly used for research pur
poses. Thus, the data reflect an environment in which the
majority of projects involve prototyping and advanced soft
ware development, but relatively little long-term mainte
nance.
For the first experiment, the ci and co operations were
instrumented to log the number of backward and forward
deltas applied. The data were collected during a 13 month
period from Dec. 1982 to Dec. 1983. Table I summarizes the
results.
+----------+-------------+--------------+-------------+---------------+------------+
|Operation | Total | Total deltas | Mean deltas | Operations | Branch |
| | operations | applied | applied | with >1 delta | operations |
+----------+-------------+--------------+-------------+---------------+------------+
|co | 7867 | 9320 | 1.18 | 509 (6%) | 203 (3%) |
|ci | 3468 | 2207 | 0.64 | 85 (2%) | 75 (2%) |
|ci & co | 11335 | 11527 | 1.02 | 594 (5%) | 278 (2%) |
+----------+-------------+--------------+-------------+---------------+------------+
Table I. Statistics for co and ci operations.
The first two lines show statistics for check-out and
check-in; the third line shows the combination. Recall that
ci performs an implicit check-out to obtain a revision for
computing the delta. In all measures presented, the most
recent revision (stored intact) counts as one delta. The
number of deltas applied represents the number of passes
necessary, where the first `pass' is a copying step.
Note that the check-out operation is executed more than
twice as frequently as the check-in operation. The fourth
column gives the mean number of deltas applied in all three
cases. For ci, the mean number of deltas applied is less
than one. The reasons are that the initial check-in
requires no delta at all, and that the only time ci requires
more than one delta is for branches. Column 5 shows the
actual number of operations that applied more than one
delta. The last column indicates that branches were not
used often.
The last three columns demonstrate that the most recent
trunk revision is by far the most frequently accessed. For
RCS, check-out of this revision is a simple copy operation,
which is the absolute minimum given the copy-semantics of
co. Access to older revisions and branches is more common
in non-academic environments, yet even if access to older
deltas were an order of magnitude more frequent, the com
bined average number of deltas applied would still be below
1.2. Since RCS is faster than SCCS until up to 10 delta
applications, reverse deltas are clearly the method of
choice.
The second experiment, conducted in March of 1984,
involved surveying the existing RCS files on our two
machines. The goal was to determine the mean number of
revisions per RCS file, as well as the space consumed by
them. Table II shows the results. (Tables I and II were
produced at different times and are unrelated.)
+------------+-----------+-----------+-----------+--------------+--------------+----------+
| | Total RCS | Total | Mean | Mean size of | Mean size of | Overhead |
| | files | revisions | revisions | RCS files | revisions | |
+------------+-----------+-----------+-----------+--------------+--------------+----------+
|All files | 8033 | 11133 | 1.39 | 6156 | 5585 | 1.10 |
|Files with | 1477 | 4578 | 3.10 | 8074 | 6041 | 1.34 |
|>= 2 deltas | | | | | | |
+------------+-----------+-----------+-----------+--------------+--------------+----------+
Table II. Statistics for RCS files.
The mean number of revisions per RCS file is 1.39.
Columns 5 and 6 show the mean sizes (in bytes) of an RCS
file and of the latest revision of each RCS file, respec
tively. The `overhead' column contains the ratio of the
mean sizes. Assuming that all revisions in an RCS file are
approximately the same size, this ratio gives a measure of
the space consumed by the extra revisions.
In our sample, over 80 per cent of the RCS files con
tained only a single revision. The reason is that our sys
tems programmers routinely check in all source files on the
distribution tapes, even though they may never touch them
again. To get a better indication of how much space savings
are possible with deltas, all measures with those files that
contained 2 or more revisions were recomputed. Only for
those files is RCS necessary. As shown in the second line,
the average number of revisions for those files is 3.10,
with an overhead of 1.34. This means that the extra 2.10
deltas require 34 per cent extra space, or 16 per cent per
extra revision. Rochkind3 measured the space consumed by
SCCS, and reported an average of 5 revisions per group and
an overhead of 1.37 (or about 9 per cent per extra revi
sion). In a later paper, Glasser6 observed an average of 7
revisions per group in a single, large project, but provided
no overhead figure. In his paper on DSEE5, Leblang reported
that delta storage combined with blank compression results
in an overhead of a mere 1-2 per cent per revision. Since
leading blanks accounted for about 20 per cent of the sur
veyed Pascal programs, a revision group with 5-10 members
was smaller than a single cleartext copy.
The above observations demonstrate clearly that the
space needed for extra revisions is small. With delta stor
age, the luxury of keeping multiple revisions online is cer
tainly affordable. In fact, introducing a system with delta
storage may reduce storage requirements, because programmers
often save back-up copies anyway. Since back-up copies are
stored much more efficiently with deltas, introducing a sys
tem such as RCS may actually free a considerable amount of
space.
7. Survey of Version Control Tools
The need to keep back-up copies of software arose when
programs and data were no longer stored on paper media, but
were entered from terminals and stored on disk. Back-up
copies are desirable for reliability, and many modern edi
tors automatically save a back-up copy for every file
touched. This strategy is valuable for short-term back-ups,
but not suitable for long-term version control, since an
existing back-up copy is overwritten whenever the corre
sponding file is edited.
Tape archives are suitable for long-term, offline stor
age. If all changed files are dumped on a back-up tape once
per day, old revisions remain accessible. However, tape
archives are unsatisfactory for version control in several
ways. First, backing up the file system every 24 hours does
not capture intermediate revisions. Secondly, the old revi
sions are not online, and accessing them is tedious and
time-consuming. In particular, it is impractical to compare
several old revisions of a group, because that may require
mounting and searching several tapes. Tape archives are
important fail-safe tools in the event of catastrophic disk
failures or accidental deletions, but they are ill-suited
for version control. Conversely, version control tools do
not obviate the need for tape archives.
A natural technique for keeping several old revisions
online is to never delete a file. Editing a file simply
creates a new file with the same name, but with a different
sequence number. This technique, available as an option in
DEC's VMS operating system, turns out to be inadequate for
version control. First, it is prohibitively expensive in
terms of storage costs, especially since no data compression
techniques are employed. Secondly, indiscriminately storing
every change produces too many revisions, and programmers
have difficulties distinguishing them. The proliferation of
revisions forces programmers to spend much time on finding
and deleting useless files. Thirdly, most of the support
functions like locking, logging, revision selection, and
identification described in this paper are not available.
An alternative approach is to separate editing from
revision control. The user may repeatedly edit a given
revision, until freezing it with an explicit command. Once
a revision is frozen, it is stored permanently and can no
longer be modified. (In RCS, freezing a revisions is done
with ci.) Editing a frozen revision implicitly creates a
new one, which can again be changed repeatedly until it is
frozen itself. This approach saves exactly those revisions
that the user considers important, and keeps the number of
revisions manageable. IBM's CLEAR/CASTER7, AT&T's SCCS3,
CMU's SDC8 and DEC's CMS9, are examples of version control
systems using this approach. CLEAR/CASTER maintains a data
base of programs, specifications, documentation and mes
sages, using deltas. Its goal is to provide control over
the development process from a management viewpoint. SCCS
stores multiple revisions of source text in an ancestral
tree, records a log entry for each revision, provides access
control, and has facilities for uniquely identifying each
revision. An efficient delta technique reduces the space
consumed by each revision group. SDC is much simpler than
SCCS because it stores not more than two revisions. How
ever, it maintains a complete log for all old revisions,
some of which may be on back-up tape. CMS, like SCCS, man
ages tree-structured revision groups, but offers no identi
fication mechanism.
Tools for dealing with configurations are still in a
state of flux. SCCS, SDC and CMS can be combined with MAKE
or MAKE-like programs. Since flexible selection rules are
missing from all these tools, it is sometimes difficult to
specify precisely which revision of each group should be
passed to MAKE for building a desired configuration. The
Xerox Cedar system10 provides a `System Modeller' that can
rebuild a configuration from an arbitrary set of module
revisions. The revisions of a module are only distinguished
by creation time, and there is no tool for managing groups.
Since the selection rules are primitive, the System Modeller
appears to be somewhat tedious to use. Apollo's DSEE5 is a
sophisticated software engineering environment. It manages
revision groups in a way similar to SCCS and CMS. Configu
rations are built using `configuration threads'. A configu
ration thread states which revision of each group named in a
configuration should be chosen. A configuration thread may
contain dynamic specifiers (e.g., `choose the revisions I am
currently working on, and the most recent revisions other
wise'), which are bound automatically at build time. It
also provides a notification mechanism for alerting main
tainers about the need to rebuild a system after a change.
RCS is based on a general model for describing multi-
version/multi-configuration systems11. The model describes
systems using AND/OR graphs, where AND nodes represent con
figurations, and OR nodes represent version groups. The
model gives rise to a suit of selection rules for composing
configurations, almost all of which are implemented in RCS.
The revisions selected by RCS are passed to MAKE for config
uration building. Revision group management is modelled
after SCCS. RCS retains SCCS's best features, but offers a
significantly simpler user interface, flexible selection
rules, adequate integration with MAKE and improved identifi
cation. A detailed comparison of RCS and SCCS appears in
Reference 4.
An important component of all revision control systems
is a program for computing deltas. SCCS and RCS use the
program diff2, which first computes the longest common sub
string of two revisions, and then produces the delta from
that substring. The delta is simply an edit script consist
ing of deletion and insertion commands that generate one
revision from the other.
A delta based on a longest common substring is not nec
essarily minimal, because it does not take advantage of
crossing block moves. Crossing block moves arise if two or
more blocks of lines (e.g., procedures) appear in a differ
ent order in two revisions. An edit script derived from a
longest common substring first deletes the shorter of the
two blocks, and then reinserts it. Heckel12 proposed an
algorithm for detecting block moves, but since the algorithm
is based on heuristics, there are conditions under which the
generated delta is far from minimal. DSEE uses this algo
rithm combined with blank compression, apparently with sat
isfactory overall results. A new algorithm that is guaran
teed to produce a minimal delta based on block moves appears
in Reference 13. A future release of RCS will use this
algorithm.
Acknowledgements: Many people have helped make RCS a
success by contributed criticisms, suggestions, corrections,
and even whole new commands (including manual pages). The
list of people is too long to be reproduced here, but my
sincere thanks for their help and goodwill goes to all of
them.
Appendix: Synopsis of RCS Operations
ci - check in revisions
Ci stores the contents of a working file into the cor
responding RCS file as a new revision. If the RCS file
doesn't exist, ci creates it. Ci removes the working
file, unless one of the options -u or -l is present.
For each check-in, ci asks for a commentary describing
the changes relative to the previous revision.
Ci assigns the revision number given by the -r option;
if that option is missing, it derives the number from
the lock held by the user; if there is no lock and
locking is not strict, ci increments the number of the
latest revision on the trunk. A side branch can only
be started by explicitly specifying its number with the
-r option during check-in.
Ci also determines whether the revision to be checked
in is different from the previous one, and asks whether
to proceed if not. This facility simplifies check-in
operations for large systems, because one need not
remember which files were changed.
The option -k searches the checked in file for identi
fication markers containing the attributes revision
number, check-in date, author and state, and assigns
these to the new revision rather than computing them.
This option is useful for software distribution: Recip
ients of distributed software using RCS should check in
updates with the -k option. This convention guarantees
that revision numbers, check-in dates, etc., are the
same at all sites.
co - check out revisions
Co retrieves revisions according to revision number,
date, author and state attributes. It either places
the revision into the working file, or prints it on the
standard output. Co always expands the identification
markers.
ident - extract identification markers
Ident extracts the identification markers expanded by
co from any file and prints them.
rcs - change RCS file attributes
Rcs is an administrative operation that changes access
lists, locks, unlocks, breaks locks, toggles the
strict-locking feature, sets state attributes and sym
bolic revision numbers, changes the description, and
deletes revisions. A revision can only be deleted if
it is not the fork of a side branch.
rcsclean - clean working directory
Rcsclean removes working files that were checked out
but never changed.*
rcsdiff - compare revisions
Rcsdiff compares two revisions and prints their differ
ence, using the UNIX tool diff. One of the revisions
compared may be checked out. This command is useful
for finding out about changes.
rcsfreeze - freeze a configuration
Rcsfreeze assigns the same symbolic revision number to
a given revision in all RCS files. This command is
useful for accurately recording a configuration.*
rcsmerge - merge revisions
Rcsmerge merges two revisions, rev1 and rev2, with
respect to a common ancestor. A 3-way file comparison
determines the segments of lines that are (a) the same
in all three revisions, or (b) the same in 2 revisions,
or (c) different in all three. For all segments of
type (b) where rev1 is the differing revision, the seg
ment in rev1 replaces the corresponding segment of
rev2. Type (c) indicates an overlapping change, is
flagged as an error, and requires user intervention to
select the correct alternative.
rlog - read log messages
Rlog prints the log messages and other information in
an RCS file.
_________________________
An earlier version of this paper was published in
Software--Practice & Experience 15, 7 (July 1985),
637-654.
Pairs of RCS and working files can actually be spec
ified in 3 ways: a) both are given, b) only the working
file is given, c) only the RCS file is given. If a
pair is given, both files may have arbitrary path pre
fixes; RCS commands pair them up intelligently.
Note that this problem is entirely different from
the atomicity problem. Atomicity means that concurrent
update operations on the same RCS file cannot be per
mitted, because that may result in inconsistent data.
Atomic updates are essential (and implemented in RCS),
but do not solve the conflict discussed here.
* This auto-checkout extension is available only in
some versions of MAKE, e.g. GNU MAKE.
* The rcsclean and rcsfreeze commands are optional
and are not always installed.
- 2 -
References
1. Feldman, Stuart I., "Make--A Program for Maintaining
Computer Programs," Software--Practice & Experience,
vol. 9, no. 3, pp. 255-265, March 1979.
2. Hunt, James W. and McIlroy, M. D., "An Algorithm for
Differential File Comparison," 41, Computing Science
Technical Report, Bell Laboratories, June 1976.
3. Rochkind, Marc J., "The Source Code Control System,"
IEEE Transactions on Software Engineering, vol. SE-1,
no. 4, pp. 364-370, Dec. 1975.
4. Tichy, Walter F., "Design, Implementation, and Evalua
tion of a Revision Control System," in Proceedings of
the 6th International Conference on Software Engineer
ing, pp. 58-67, ACM, IEEE, IPS, NBS, September 1982.
5. Leblang, David B. and Chase, Robert P., "Computer-Aided
Software Engineering in a Distributed Workstation Envi
ronment," SIGPLAN Notices, vol. 19, no. 5, pp. 104-112,
May 1984. Proceedings of the ACM SIGSOFT/SIGPLAN Soft
ware Engineering Symposium on Practical Software Devel
opment Environments.
6. Glasser, Alan L., "The Evolution of a Source Code Con
trol System," Software Engineering Notes, vol. 3, no.
5, pp. 122-125, Nov. 1978. Proceedings of the Software
Quality and Assurance Workshop.
7. Brown, H.B., "The Clear/Caster System," Nato Conference
on Software Engineering, Rome, 1970.
8. Habermann, A. Nico, A Software Development Control Sys
tem, Technical Report, Carnegie-Mellon University,
Department of Computer Science, Jan. 1979.
9. DEC, Code Management System, Digital Equipment Corpora
tion, 1982. Document No. EA-23134-82
10. Lampson, Butler W. and Schmidt, Eric E., "Practical Use
of a Polymorphic Applicative Language," in Proceedings
of the 10th Symposium on Principles of Programming Lan
guages, pp. 237-255, ACM, January 1983.
11. Tichy, Walter F., "A Data Model for Programming Support
Environments and its Application," in Automated Tools
for Information System Design and Development, ed.
Hans-Jochen Schneider and Anthony I. Wasserman, North-
Holland Publishing Company, Amsterdam, 1982.
12. Heckel, Paul, "A Technique for Isolating Differences
Between Files," Communications of the ACM, vol. 21, no.
4, pp. 264-268, April 1978.
13. Tichy, Walter F., "The String-to-String Correction
Problem with Block Moves," ACM Transactions on Computer
Systems, vol. 2, no. 4, pp. 309-321, Nov. 1984.