Re: optimizing compilers for low power design

glen herrmannsfeldt <gah@ugcs.caltech.edu>
Fri, 20 Jun 2014 18:13:52 +0000 (UTC)


From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Newsgroups: comp.compilers
Date: Fri, 20 Jun 2014 18:13:52 +0000 (UTC)
Organization: Aioe.org NNTP Server
References: 14-06-003 14-06-004 14-06-008 14-06-011
Keywords: code, architecture
Posted-Date: 20 Jun 2014 21:16:45 EDT

George Neuner <gneuner2@comcast.net> wrote:
> On Mon, 16 Jun 2014 12:11:32 +0100, "Derek M. Jones"


(snip)
>>A surprising percentage of power is consumed when a signal changes from
>>0 to 1 or from 1 to 0. So the idea is to arrange instruction order to
>>minimise the number of transitions at each bit position in an
>>instruction sequence.


>>[CMOS only uses power when it switches, so I'd think approximately all
>>of the power would be consumed when a signal changes. The idea of
>>Gray coded instruction streams is weirdly appealing. -John]


And off-chip lines require much more current than on-chip, so you
really only need to consider the external address and data bus.
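
To make that concrete, here is a rough sketch (mine, nothing from the
thread) of the kind of cost model that implies, in C. The weights are
invented purely for illustration; the point is only that the off-chip
term swamps the on-chip one.

    /* Hypothetical switching-cost model: the weights are made up, but
       driving a pad plus a board trace takes far more charge than an
       internal node, so the off-chip term dominates and in practice
       only the external address/data bus toggles need counting. */
    #include <stdint.h>

    #define OFFCHIP_WEIGHT 50u   /* hypothetical pad + trace capacitance   */
    #define ONCHIP_WEIGHT   1u   /* hypothetical internal node capacitance */

    static unsigned popcount32(uint32_t x)
    {
        unsigned n = 0;
        while (x) { n += x & 1u; x >>= 1; }
        return n;
    }

    /* Cost of one clock: how many lines toggle, weighted by where they are. */
    unsigned switching_cost(uint32_t bus_prev,  uint32_t bus_next,
                            uint32_t core_prev, uint32_t core_next)
    {
        return OFFCHIP_WEIGHT * popcount32(bus_prev ^ bus_next)
             + ONCHIP_WEIGHT  * popcount32(core_prev ^ core_next);
    }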


If you really want to do this, you want a Gray coded program counter,
and instructions appropriately placed in memory.
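
Just as a sketch (assuming a 16-bit address space, not any real part),
the binary-to-Gray conversion and its inverse are cheap; the inverse is
what a linker or loader would use to find where each sequentially
numbered word actually has to be placed.

    /* Sketch of a Gray-coded program counter: g = b ^ (b >> 1) maps the
       binary count to Gray code, so consecutive fetch addresses differ
       in exactly one bit and straight-line code toggles only one
       external address line per fetch. */
    #include <stdint.h>

    uint16_t to_gray(uint16_t b)
    {
        return b ^ (b >> 1);
    }

    /* Inverse mapping: where a word must be placed so that the Gray-coded
       counter walks through the code in the intended order. */
    uint16_t from_gray(uint16_t g)
    {
        g ^= g >> 8;
        g ^= g >> 4;
        g ^= g >> 2;
        g ^= g >> 1;
        return g;
    }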


> I'm mostly a software person, so this may be way off base.
> However ...


> For Gray coding instructions to be effective, most instructions would
> have to offer multiple choices for their encoding. Certainly for the
> most commonly used instructions and probably there would need to be at
> least several choices of encoding.


If I remember right, in the early 8080 days, people figured out some
bit patterns that were decodable as instructions but were not listed
officially. That is, "don't care" cases for the instruction decode
logic. Some of those might be appropriate bit patterns.


> ISTM then that the decoder becomes n-ways wider at each decision step,
> and somewhat deeper [though maybe not twice] due to requiring
> additional combining/filtering steps. So fetch to execute latency
> would suffer and materially affect [already problematic] branching
> performance.


That is what I thought at first, but the current for off-chip drivers
is so much higher that you might as well count only them.


> Moreover, I would think that to make this work best you'd also need to
> use fixed sized instructions externally so that there are no additional
> pre-decode penalties for locating instruction boundaries, aligning
> bits, needing extra fetches for instructions that span fetch lines,
> etc. You want to be able to just shift and drop each instruction
> directly into the decoder. But fixed size instructions would put
> additional fetch pressures on the memory system.


That, or you also have to count the other words of each instruction.
If you had PDP-11-like instructions, with a 16-bit opcode and then
optional immediate data words, you would include those data words in
the count. Words addressing memory locations could be optimized the
same way. It is a little harder for data.
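
For what it's worth, the metric itself is trivial. Assuming a 16-bit
external bus and sequential fetch, something like this (purely
illustrative) counts the toggles over the whole stream, immediates
and all:

    /* Count external-bus toggles over a 16-bit instruction stream as it
       would be fetched sequentially: opcode words and the immediate or
       address words that follow them are all included. */
    #include <stddef.h>
    #include <stdint.h>

    unsigned stream_transitions(const uint16_t *words, size_t n)
    {
        unsigned total = 0;
        for (size_t i = 1; i < n; i++) {
            uint16_t diff = words[i - 1] ^ words[i];
            while (diff) {
                total += diff & 1u;   /* one more bus line toggled */
                diff >>= 1;
            }
        }
        return total;
    }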


> On the software side, compilation becomes combinatorially more
> expensive. You might save some CPU power at execution, but ISTM that
> any savings from the CPU would be minimal because memory and cache
> power are unaffected: I don't see any reasonable possibility to
> minimize bit transitions from fetch line to fetch line, or farther
> across cache address strides (which I think would help only with a
> direct mapped cache anyway).


You either don't have an off-chip cache, or you treat the cache as the
primary source of bits. As above, only off-chip bits need to be
considered.


> comp.compilers probably is not the right forum, but perhaps someone
> who has CPU design experience can *briefly* speak to the hardware
> aspect of this idea. Or we can take it to comp.arch.


> [The Gray coding was mostly a joke. You're right, the circumstances
> in which it would be useful are unrealistically limited. -John]


It would only make sense for a processor that was going to be
executing the same code for a very long time, such as one in space.


Only last week, in a discussion on another newsgroup, I was
remembering the ICT, Integer Cosine Transform.


The ICT was invented for the Galileo spacecraft to minimize the
time needed to compress image data. The CDP1802 (or the radiation-
hardened version of it) has no multiply instruction, so they designed
a transform whose coefficients are selected to minimize the number of
bits that are '1'. That speeds up the shift-and-add multiplication.


Speeding up the encoding can be done at the expense of slowing down
decoding, which is done on earth.
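
To show why the number of '1' bits matters, here is the usual
shift-and-add constant multiply in C (a sketch, not the Galileo code):
each '1' bit in the coefficient costs one shift-and-add step, which is
why sparse coefficients make the transform cheap on a multiplier-less
CPU.

    /* Shift-and-add multiplication by a constant coefficient, as on a
       CPU with no multiply instruction.  The work is one add (plus the
       shifting) per '1' bit in the coefficient, so choosing
       coefficients with few '1' bits directly cuts the cycle count. */
    #include <stdint.h>

    uint32_t mul_by_coeff(uint16_t sample, uint16_t coeff)
    {
        uint32_t acc = 0;
        for (unsigned bit = 0; coeff != 0; bit++, coeff >>= 1) {
            if (coeff & 1u)
                acc += (uint32_t)sample << bit;  /* add sample, shifted into place */
        }
        return acc;
    }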


http://tmo.jpl.nasa.gov/progress_report/42-115/115j.pdf


-- glen

