Cyclomatic Complexity using GCC

Soren Lundsgaard <lundsgas@ssd.comm.mot.com>
27 Dec 1998 01:06:44 -0500

          From comp.compilers

Related articles
Cyclomatic Complexity using GCC lundsgas@ssd.comm.mot.com (Soren Lundsgaard) (1998-12-27)
| List of all articles for this month |

From: Soren Lundsgaard <lundsgas@ssd.comm.mot.com>
Newsgroups: comp.compilers
Date: 27 Dec 1998 01:06:44 -0500
Organization: Compilers Central
Keywords: optimize, GCC

To: comp.compilers moderated newsgroup and egcs mailing list.


The following is a letter I sent to Jim Willson at Cygnus about some
Cyclomatic Complexity work I did here at Motorola.


I have included some notes from earlier correspondence with Jim Wilson
to shed more light on this information. These are preceeded with '> '.


Could someone in the egcs mailing list work to implement this
computation in gcov? Are there other metrics that could be gleaned
from GCC using similar mechanisms?
-------------
Dear Jim Wilson,


I did a bit of work on my employer's time to investigate using the gcc
version 2.8.1 .bb and .bbg basic branch graph files to possibly
compute cyclomatic complexity of modules. I have very positive
results to report, computations of cyclomatic complexity are not only
possible, but quite easy to do using these files, with some changes to
gcov.c


I have sent out a query on our company wide newsgroup about turning over
the changes to the egcs project. I received no positive responses except
one person who mentioned that there were probably groups in Motorola who
contributed the port of gcc to one of our processors.


Due to the perceived difficulty of getting our legal department to sign
off on releasing the code changes to gcov, I would like to describe my
research and the resulting algorithm, which I do not believe could be
construed to be proprietary information, as it is derived from information
emanating from outside the corporation, together with obvious conclusions
based on that information.


As you recall, I queried you about the flags in the .bbg file, and you
described to me their different meaning. I had developed software (C
programs and an (n)awk script) to emit the graph based on the .bb and .bbg
files in a format suitable for processing by a program which is adept at
displaying directed graphs (dot, part of graphviz from AT&T). When I
examined the resulting graph without edges that included the "fake" flag,
it became obvious that this graph represented the "classic" graph of flow
control through a function.


> gcov creates a flow graph containing all of the arcs. Some of these
> arcs represent branches, some of them represent places where code
> joins at a label. If we have a basic block that does not end with a
> branch, then it will have an arc marked fall_through that connects
> it with the following basic block. So an arc marked fall_through
> means that there is no branch instruction associated with the arc.
> These arcs are very easy to instrument. These arcs do not have
> branch probabilities associated with them.


> Function calls to setjmp may return more times than called. In
> order to account for this, I add a fake arc from the function
> entrance to the basic block immediately after the call. Any extra
> returns get counted as transitions on this fake arc.


I modified gcov.c to walk through the graph data and count the number
of edges that were not "fake" and the number of vertices which were
connected to at least one non-"fake" edge, either coming into or
flowing out of the vertex. I then also counted the number of vertices
which were "terminal", meaning that all of the non-"fake" edges that
connected to the vertex either all flowed into the the vertex or
flowed out of the vertex.


The classic computation of cyclomatic complexity assumes that the
graph is strongly connected, and the computation then becomes:


#edges - #vertices + 1


The normal case is that there is one vertex (a) which has no
predecessor vertices, and one vertex (z) which has no successor
vertices, so the computation (taking into account the "virtual" edge
a->z not included in the number #edges below which makes the graph
strongly connected) becomes:


#edges - #vertices + 2


In the case that there are two vertices (a, b) which have no
predecessor vertices and one vertex (z) with no successor edges, the
cyclomatic complexity increases by one, for the one extra vertex (v)
and three additional edges (z->v, v->a, v->b) which need to be added
to make the graph strongly connected:


#edges - #vertices + 3


As you can see, the cyclomatic complexity computation
then becomes:


#edges - #vertices + #terminal_vertices


Gcc emits graphs which, after deletion of fake edges and unconnected
vertices, have no connection from their starting point to their end
points. Graphs can have multiple vertices with no successor edges,
typically due to a call to exit(). In addition, I saw one example in
a function which was compiled for a 68k machine where a vertex had no
predecessor edges.


I would like to suggest that a modification be made to gcov.c to make
this computation of cyclomatic complexity available. The structure
bb_info_list can keep track of function names via some simple addtions
in gcov, and the computation can be completed if the appropriate flag
is added to the command line. Unfortunately, at this time, because
of my contract with my employer, I am not able to release my changes
to gcc/gcov.c


Thank you for your help in this project.


Finally, I would like to post this letter with quotations from your
e-mail to me to the comp.compilers newsgroup. Please let me know if
it would be permissible to include these quotations.


> Yes. That is OK.


> I'd also suggest sending it to egcs@cygnus.com. That is where most
> egcs discussion takes place, and is the best place for finding a
> volunteer to finish the work you started.


skl.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.