Re: defining unique symbols

Pieter Schoenmakers <>
16 Jul 1997 23:00:42 -0400

          From comp.compilers


From: Pieter Schoenmakers <>
Newsgroups: comp.compilers
Date: 16 Jul 1997 23:00:42 -0400
Organization: Compilers Central
References: 97-07-052
Keywords: OOP, linker

This problem is identical to that of uniquifying Objective-C selectors
(message identifiers). Several considerations are noteworthy:

  - If you depend on the linker to unique the descriptors, you also depend
      on the dynamic linker/loader to do the same, unless you know with
      absolute certainty that programs written in the language will never
      want to employ dynamic code loading (which you can't know).

      Extending the linker could be easy, for instance starting with the GNU
      linker, since that is fairly portable. Dynamic loading, however, is
      rather hairy: very machine dependent and highly unportable.

      An example of this approach is used by the Objective-C runtime library
      as implemented by NeXT, for their nextstep/openstep (and now rhapsody)
      operating environments. The linker is a modified Mach-O linker which
      has a notion of an __OBJC segment, with various special section types.
      One of these is used to store the selector strings, which are uniqued
      by the linker. The dynamic linker provides similar functionality. Net
      result is that selectors are equal if and only if they have the same
      address, even in the context of dynamic loading.

  - An approach used by the GNU Objective-C runtime library is not to
      unique the selectors themselves, but an identifying number inside each
      one: the compiler uniques the selectors per module, i.e. object file.
      comes with a (compiler-generated) constructor (thanks C++ for getting
      this time consumer into all linkers); this constructor registers the
      module with the runtime library. This runtime assigns each unique
      selector a unique number, and selectors are now equal if

          sel_a->unique_id == sel_b->unique_id

      When loading dynamically, the constructors of the new modules are invoked
      and the modules/selectors are added to the runtime information just
      like with the main-program selector information.

      This system is only two memory references slower than address
      equivalence testing. An advantage (in the context of selectors) is
      that the selectors are numbered contiguously: 0, 1, ...

  - If the language will end up having a considerable standard library, you
      will want to make that a shared library or shared object. Unless a
      machine's shared library implementation is braindead, you'll have to
      take new versions of these shared libraries into account. This problem
      can be similar to that of dynamic loading, or simpler.

  - You do not want to have to do too much in the linking phase.

      Even the tiniest program suffers from long linking times if many (or
      large) libraries have to be scrutinized. This undesirably lengthens
      the edit-compile-link-run-crash cycle.

  - You do not want to have to do too much at run time.

      For (very) small programs (`ls' is my favourite example when I think of
      small programs: it must just list files; it must not take any time) the
      startup time of a (GNU Objective-C) runtime library is significant, and
      startup times in the range of centiseconds (or beyond) are unacceptable
      for small programs.
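
The registration scheme from the second point above can be sketched in C
roughly as follows. This is only an illustration under stated assumptions,
not the GNU runtime's actual code: the names register_module, intern_name
and MAX_SELECTORS are hypothetical, and a real runtime would presumably use
a hash table rather than the linear search shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SELECTORS 256          /* arbitrary limit for this sketch */

/* A selector as a compiler might emit it per object file: the name is
   known at compile time, the id is filled in at registration time. */
struct selector {
    const char *name;
    unsigned unique_id;
};

static const char *known_names[MAX_SELECTORS];
static unsigned known_count;

/* Return the id for name, handing out the next free number if the
   name has not been seen before. */
static unsigned intern_name(const char *name)
{
    for (unsigned i = 0; i < known_count; i++)
        if (strcmp(known_names[i], name) == 0)
            return i;
    assert(known_count < MAX_SELECTORS);
    known_names[known_count] = name;
    return known_count++;
}

/* Hypothetical entry point that each module's compiler-generated
   constructor would call, at startup or after dynamic loading. */
void register_module(struct selector *sels, size_t n)
{
    for (size_t i = 0; i < n; i++)
        sels[i].unique_id = intern_name(sels[i].name);
}

/* Selectors from different modules are equal iff their ids match. */
int selectors_equal(const struct selector *a, const struct selector *b)
{
    return a->unique_id == b->unique_id;
}
```

Two modules that both mention the same selector name end up with distinct
struct selector objects (different addresses) but the same unique_id, so
the comparison costs one extra load per selector compared to comparing
addresses.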

With TOM, an OO language I am developing, a program called the
resolver is run before the linker. It creates (and uniques, where
applicable) the selectors, argument and return type descriptions,
string constants, class descriptions, class hierarchy and method
dispatch structures, etc. The input to the resolver consists of the
`information' files generated by the compiler for each object file.
When run for static resolution, the resolver does everything necessary
for the program to run, i.e. all descriptions and structures needed by
the runtime library are generated by the resolver. When run for
dynamic resolution (this includes dynamic loading) the resolver only
generates the descriptions; the actual structures are built (partly
lazily) by the runtime library, at run time.
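
To make the "uniquing before link time" idea concrete, here is a small
sketch in C of the kind of work a static resolution step could do: collect
the selector names from all modules, sort them, drop duplicates, and let
every reference use the index into the resulting table. This is not TOM's
resolver; the function names unique_names and selector_index are
hypothetical, and the real input (the `information' files) is replaced by
a plain array of strings.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort/bsearch over an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort names in place and squeeze out duplicates.
   Returns the number of unique names left at the front of the array. */
size_t unique_names(const char **names, size_t n)
{
    if (n == 0)
        return 0;
    qsort(names, n, sizeof names[0], cmp_names);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (strcmp(names[i], names[out - 1]) != 0)
            names[out++] = names[i];
    return out;
}

/* Index of name in the uniqued (sorted) table, or -1 if absent. */
long selector_index(const char **table, size_t n, const char *name)
{
    const char **hit = bsearch(&name, table, n, sizeof table[0], cmp_names);
    return hit ? (long)(hit - table) : -1;
}
```

Because all of this happens before the program runs, the generated table
can be emitted as constant data, which is why the run-time overhead of the
static variant can be so small.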

When resolving statically, the cpu time used by the resolver (and the
C compiler which must compile the resolver's output of several hundred
kB (or more)) is considerable, which is undesirable during program
development. On the other hand, the overhead of the runtime library
is negligible, making this setup suitable for small programs in a
production environment.

When resolving dynamically, the run time of the resolver is decreased
considerably (down to a few seconds, depending on the size of the
program and the libraries used). Since the overhead at run time is
only in the order of a few centiseconds, dynamic resolution is very
useful during program development. For large programs (i.e. programs
running more than a second), this run time overhead is of course
negligible.

When TOM starts using shared libraries, dynamic resolution will
probably become mandatory, making it unsuitable for writing an ls
replacement. But then again, it isn't meant to be used for that (it
is meant as an Objective-C replacement really).

More on TOM at --Tiggr
