Re: ASTs and Type Check Analysis

Hans-Peter Diettrich <DrDiettrich1@aol.com>
Tue, 26 Aug 2008 00:24:12 +0200

          From comp.compilers

Related articles
ASTs and Type Check Analysis crancran@gmail.com (CranCran77) (2008-08-20)
Re: ASTs and Type Check Analysis DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-08-24)
Re: ASTs and Type Check Analysis ArarghMail808@Arargh.com (2008-08-24)
Re: ASTs and Type Check Analysis crancran@gmail.com (CranCran77) (2008-08-25)
Re: ASTs and Type Check Analysis bc@freeuk.com (Bartc) (2008-08-25)
Re: ASTs and Type Check Analysis DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-08-26)
| List of all articles for this month |

From: Hans-Peter Diettrich <DrDiettrich1@aol.com>
Newsgroups: comp.compilers
Date: Tue, 26 Aug 2008 00:24:12 +0200
Organization: Compilers Central
References: 08-08-044 08-08-054 08-08-064
Keywords: types, analysis
Posted-Date: 26 Aug 2008 23:25:05 EDT

CranCran77 schrieb:


> The goal presently is to understand how I should handle type
> assignments to nodes within the AST tree.
>
> As you see in my example, I had several nodes that were
> CASTIntegerLiteral, but I question if that is the right approach or
> not. In my opinion, the parser which is creating the AST tree should
> not be responsible for determining the number type of Integer, Real,
> or Long. A simple CASTNumberLiteral node seems more appropriate to me
> and allow the type analysis check phase when traversing the tree
> determine whether its Integer, Real, or Long right?


I agree in so far, as the compiler can know better about the most
appropriate representation of a numerical constant, in preparation of
the compiled code for expressions. Even with known types (of variables),
type conversions may be required in the evaluation of any expression.
Then the "cost" of the conversion of an literal is zero, in contrast to
the conversion of some predefined value. Likewise the cost of operations
with all constant arguments is zero, because it can be done at compile
time. Here a reordering of a whole (sub-)expression may result in
smaller and faster code. Also have a look at Common Subexpression
Elimination (CSE).


But I'd not simply neglect the exact type of a literal, because
sometimes a real constant of e.g. 2.0 can be a hint to the compiler, to
use real arithmetic, where integer arithmetic could result in an
unwanted overflow in some intermediate result. The same for explicit
type casts, i.e. Long(2) should really result in Long arithmetic, even
if Byte arithmetic would fit together with the other operands.




> If my thought process is correct, then when my type check visitor
> traverses the nodes of the tree, certain AST nodes need to be
> decorated with type information, such as Integer, Real, or Long. When
> examining expressions such as binary ones, both sides to the operation
> need to be examined to make sure they both conform to the same type,
> and if not handle that appropriately.


Remember that every operator node will have an assigned result type, as
determined from the involved operands, and that required dynamic type
conversions may have to be inserted into the AST. I had to implement
similar tree transformations in my C decompiler, in order to throw out
type consersions and temporary variables, before emitting the
"optimized" source code.




> For example, "PRINT 2.5*3" would throw an exception with type mismatch
> because 3 would be evaluated to integer where-as 2.5 would be viewed
> as a real. A different method could be that it would turn the
> statement into "PRINT 2.5*3.0" so that both sides are viewed as real
> values for the computation to happen correctly.


I'd not insist in too many type casts in source code, when the intention
is clear. Only when e.g. the result of a division can differ, between an
integer and a real DIV, or when an overflow shall be prevented by
extending an value to a certain byte count, the according type casts
should be required and respected in the source code.






> I suppose what I see is that I could use a class object called
> CASTType with derived classes for Integer, Long, Real, and String.
> Each derived class would contain information about the size of the
> type for storage purposes and/or reference pointer for strings in the
> constant pool.


I'd use as few node classes as possible, with properties indicating e.g.
the kind of an unary or binary operator node, and the result type. How
else would you transform an node "operator+" into "IntADD" or "RealADD"
later? IMO all nodes with computational values should have a common base
class, with possible subclasses for literals, variables, operators and
functions.


DoDi


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.