JVM verifier vs. naive code generator

Soeren Sandmann <sandmann@cs.au.dk>
Sun, 20 Mar 2011 02:57:00 +0100

          From comp.compilers

Related articles
JVM verifier vs. naive code generator sandmann@cs.au.dk (Soeren Sandmann) (2011-03-20)
Re: JVM verifier vs. naive code generator gah@ugcs.caltech.edu (glen herrmannsfeldt) (2011-03-21)
Re: JVM verifier vs. naive code generator Pidgeot18@verizon.invalid (Joshua Cranmer) (2011-03-21)
| List of all articles for this month |

From: Soeren Sandmann <sandmann@cs.au.dk>
Newsgroups: comp.compilers
Date: Sun, 20 Mar 2011 02:57:00 +0100
Organization: SunSITE.dk - Supporting Open source
Keywords: Java, code, analysis, question
Posted-Date: 20 Mar 2011 18:50:46 EDT

Consider this fragment of Java code:


                  int i, j;


                  i = 100;
                  for (i >= 0 && (j = 10) == 10)
                          i = j;


This is correct; j is not used without being initialized, and the Java
compiler accepts it without complaints.


However, a (very) naive code generator might generate the following byte
code, which is correct correct in the sense that it is impossible for j
(local variable 1 in the assembly) to be used without initialization.
However, the JVM verifier rejects it, and rightly so according to the
data flow algorithm described in chapter 4 of the JVM spec. I have
annotated the assembly with the state of the verifier. [i] stands for
"integer"; '-' stands for undefined.


.class public examples/Foo
.super java/lang/Object


.method public static main([Ljava/lang/String;)V


.limit locals 2
.limit stack 2
                                                                ; i j
                                                                ; stack 0 1
                                                                ; ---------------------------------
        bipush 100 ; [i] - -
        istore 0 ; [i] -
        iload 0 ; [i] [i] -
        ifgt greaterthan ; [i] -
        bipush 0 ; [i] [i] -
        goto done1 ; [i] [i] -


greaterthan:
        bipush 1 ; [i] [i] -


done1: ; [i] [i] -
        dup ; [i][i] [i] -
        ifeq done_and ; [i] [i] -
        pop ; [i] -


        bipush 10 ; [i] [i] -
        dup ; [i] [i] [i] -
        istore 1 ; [i] [i] [i]


        bipush 10 ; [i] [i] [i] [i]
        if_icmpeq equal ; [i] [i]
        bipush 0 ; [i]
        goto done2 ; [i] [i] [i]


equal: ; [i] [i]
        bipush 1 ; [i] [i] [i]
done2: ; [i] [i] [i]


done_and: ; [i] [i] -
        ifne if_part ; [i] -
        goto done ; [i] -


if_part: ; [i] [i] -
        iload 1 ; [1] <= Error (var 1 may be uninitialized)
        istore 0 ;


done:


        return
.end method




This code can obviously be improved, and some fairly trivial
optimizations will slim down the control flow graph enough to make it
understandable to the verifier. However, it bugs me that I have to rely
on optimizations in order to get working code, because, among other
things, there is no way to be sure that it's sufficient in all cases.


Is there a standard way to get around this problem? Ie., a systematic
way to ensure that the code generated will not only be correct, but also
verifiable?




Thanks,
Soren



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.