> Computers > Programming
Various Topics Home | Disclaimer | Report Adult Posts

Various Topics on Programming



Programming - "Application Profiler" in Computers


Old 06-15-2007   #1
..tri..
 
Default Application Profiler

Hi

I wonder if there is a tool out there that I could use to spot
bottlenecks in a C-program except gprof?
***ume that I have a media type of application and I wanna build an
Application Specific
Processor for that application. So I wanna find out how to speed up
these programs
by SIMD, VLIW or operation fusion techniques. Can anyone suggest me
some useful
tools for this task?

Cheers,
Patrick

 
Old 06-15-2007   #2
..er9230..
 
Default Re: Application Profiler

On Jun 14, 3:54 pm, Patrick <Patrick.Batema...@gmx.at> wrote:
> Hi
>
> I wonder if there is a tool out there that I could use to spot
> bottlenecks in a C-program except gprof?


Sure there are thousands of them:
http://www.google.com/search?hl=en&q=C+profiler

> ***ume that I have a media type of application and I wanna build an
> Application Specific
> Processor for that application. So I wanna find out how to speed up
> these programs
> by SIMD, VLIW or operation fusion techniques. Can anyone suggest me
> some useful
> tools for this task?


Improving the algorithm is a lot more important than cheesy little
tricks. After you have profiled to find the slow spots, improve the
algorithm or use a better one instead. That is the right way to get a
large performance gain.

 
Old 06-15-2007   #3
..m.. ..cha.. ..arksto..
 
Default Re: Application Profiler

user923005 wrote:
> Improving the algorithm is a lot more important than cheesy little
> tricks. After you have profiled to find the slow spots, improve the
> algorithm or use a better one instead. That is the right way to get a
> large performance gain.


Just changing the algorithm isn't the only solution. Changing data
structures sometimes helps more, e.g. from an unsorted array to a
tree-based set. If you use some datatype abstraction in the form of
accessor functions, then if they turn out to be major bottlenecks you
either they have slow algorithms (improving them may require changing
the data structure), or the data is being accessed too much. (E.g. The
best heap implementation in the world won't save you if you are
malloc()ing and free()ing in the middle of your tightest loop when you
could do so a few loops further out).

The other note is to *measure* your optimisations. Run them several
times in a loop and time it, if nothing else. If your program 10% goes
faster but the code has turned to spaghetti then it is probably not a
worthwhile optimisation.

Finally, do not insult your compiler and library writers. They
(usually) know far more about the task their code does than you do.
Example insult for almost any compiler you will use:
y = (x << 4) + (x << 2); /* faster than y = x * 10 */

--
Simon Richard Clarkstone:
s.r.cl?rkst?n?@durham.ac.uk/s?m?n.cl?rkst?n?@hotmail.com
"August 9 - I just made my signature file. Its only 6 pages long.
I will have to work on it some more." -- _Diary of an AOL User_
 
Old 06-15-2007   #4
..Falcon..
 
Default Re: Application Profiler

Simon Richard Clarkstone wrote:
>

.... snip ...
>
> Finally, do not insult your compiler and library writers. They
> (usually) know far more about the task their code does than you
> do. Example insult for almost any compiler you will use:
> y = (x << 4) + (x << 2); /* faster than y = x * 10 */


And, if intended to implement y = x * 10; wrong. :-)

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>
cbfalconer at maineline dot net



--
Posted via a free Usenet account from http://www.teranews.com

 
Old 06-15-2007   #5
..bertwesse.. ..hoo.c..
 
Default Re: Application Profiler

On Jun 14, 5:54 pm, Patrick <Patrick.Batema...@gmx.at> wrote:
> Hi
>
> I wonder if there is a tool out there that I could use to spot
> bottlenecks in a C-program except gprof?
> ***ume that I have a media type of application and I wanna build an
> Application Specific
> Processor for that application. So I wanna find out how to speed up
> these programs
> by SIMD, VLIW or operation fusion techniques. Can anyone suggest me
> some useful
> tools for this task?



If you're looking for low-level insight into your codes performance,
Intel's V-Tune is hard to beat, although it doesn't much like AMD
processors (it works, but only in basic mode), and it's a bit pricy.

You might want to look at Intel's Integrated Performance Primitives
library, which has very fast implementations of many basic media
functions.

 
Old 06-15-2007   #6
..er9230..
 
Default Re: Application Profiler

On Jun 14, 6:12 pm, Simon Richard Clarkstone
<s.r.clarkst...@durham.ac.uk> wrote:
> user923005 wrote:
> > Improving the algorithm is a lot more important than cheesy little
> > tricks. After you have profiled to find the slow spots, improve the
> > algorithm or use a better one instead. That is the right way to get a
> > large performance gain.

>
> Just changing the algorithm isn't the only solution. Changing data
> structures sometimes helps more, e.g. from an unsorted array to a
> tree-based set. If you use some datatype abstraction in the form of
> accessor functions, then if they turn out to be major bottlenecks you
> either they have slow algorithms (improving them may require changing
> the data structure), or the data is being accessed too much. (E.g. The
> best heap implementation in the world won't save you if you are
> malloc()ing and free()ing in the middle of your tightest loop when you
> could do so a few loops further out).


I should have been more specific. I meant combination of new data
structures and /or new algorithms.
E.g. changing from btree to hash table.

> The other note is to *measure* your optimisations. Run them several
> times in a loop and time it, if nothing else. If your program 10% goes
> faster but the code has turned to spaghetti then it is probably not a
> worthwhile optimisation.
>
> Finally, do not insult your compiler and library writers. They
> (usually) know far more about the task their code does than you do.
> Example insult for almost any compiler you will use:
> y = (x << 4) + (x << 2); /* faster than y = x * 10 */


My compiler emits:

; 10 : return (x << 3) + (x << 1); /* faster than y = x * 10 */

mov eax, DWORD PTR _x$[esp-4]
lea eax, DWORD PTR [eax+eax*4]
add eax, eax

....

; 15 : return x*10; /* y = x * 10 */

mov eax, DWORD PTR _x$[esp-4]
lea eax, DWORD PTR [eax+eax*4]
add eax, eax

 
Old 06-15-2007   #7
..tri..
 
Default Re: Application Profiler

Thanks for the comments guys. But perhaps I should mention once
more that I dont wanna optimize the code on a algorithmic level.
I wanna speed up some established algorithms with dedicated hardware
support, e.g. SIMD or VLIW or combined operations. So the tool should
in some way help me where operations could be executed in parallel
or that it would make sense to add a custom instruction to the
hardware
architecture.

Cheers,
Patrick

 
Old 06-15-2007   #8
..m.. ..cha.. ..arksto..
 
Default Re: Application Profiler

user923005 wrote:

> On Jun 14, 6:12 pm, Simon Richard Clarkstone
> <s.r.clarkst...@durham.ac.uk> wrote:
>
>>Finally, do not insult your compiler and library writers. They
>>(usually) know far more about the task their code does than you do.


What I meant by this, in case it was unclear, is that the compiler will
usually be far better at micro-optimisations than you are, because the
micro-optimiser was written by experts on the processor. Silly tricks
will make your code less clear, and may actually make the compiler less
able to optimise code, by adding extra ***umptions or being unidiomatic.

[snip]
> My compiler emits:
>
> ; 10 : return (x << 3) + (x << 1); /* faster than y = x * 10 */
>
> mov eax, DWORD PTR _x$[esp-4]
> lea eax, DWORD PTR [eax+eax*4]
> add eax, eax
>
> ; 15 : return x*10; /* y = x * 10 */
>
> mov eax, DWORD PTR _x$[esp-4]
> lea eax, DWORD PTR [eax+eax*4]
> add eax, eax


Thank you for this excellent example of compiler intelligence; I will
file it away for future reference. AFAICT, they are roughly like this C
(***uming that sizeof(long) == 4):

register int t = x;
t = (int)(t + (long*)t); /* pointer arithmetic does: t = 4*t + t */
t = t + t;
return t;

The compiler knows best indeed.

--
Simon Richard Clarkstone:
s.r.cl?rkst?n?@durham.ac.uk/s?m?n.cl?rkst?n?@hotmail.com
"August 9 - I just made my signature file. Its only 6 pages long.
I will have to work on it some more." -- _Diary of an AOL User_
 
Old 06-15-2007   #9
..er9230..
 
Default Re: Application Profiler

On Jun 15, 9:12 am, Simon Richard Clarkstone
<s.r.clarkst...@durham.ac.uk> wrote:
> user923005 wrote:
> > On Jun 14, 6:12 pm, Simon Richard Clarkstone
> > <s.r.clarkst...@durham.ac.uk> wrote:

>
> >>Finally, do not insult your compiler and library writers. They
> >>(usually) know far more about the task their code does than you do.

>
> What I meant by this, in case it was unclear, is that the compiler will
> usually be far better at micro-optimisations than you are, because the
> micro-optimiser was written by experts on the processor. Silly tricks
> will make your code less clear, and may actually make the compiler less
> able to optimise code, by adding extra ***umptions or being unidiomatic.


Right, that's why I posted the example.

> [snip]
>
> > My compiler emits:

>
> > ; 10 : return (x << 3) + (x << 1); /* faster than y = x * 10 */

>
> > mov eax, DWORD PTR _x$[esp-4]
> > lea eax, DWORD PTR [eax+eax*4]
> > add eax, eax

>
> > ; 15 : return x*10; /* y = x * 10 */

>
> > mov eax, DWORD PTR _x$[esp-4]
> > lea eax, DWORD PTR [eax+eax*4]
> > add eax, eax

>
> Thank you for this excellent example of compiler intelligence; I will
> file it away for future reference. AFAICT, they are roughly like this C
> (***uming that sizeof(long) == 4):
>
> register int t = x;
> t = (int)(t + (long*)t); /* pointer arithmetic does: t = 4*t + t */
> t = t + t;
> return t;
>
> The compiler knows best indeed.


I agree. I went to an ***embly optimization site and they had a long
discussion about the best possible way to multiply by 10 across a
broad range of processors.
The final solution was exactly what was emitted by MS VC++.

> --
> Simon Richard Clarkstone:
> s.r.cl?rkst...@durham.ac.uk/s?m?n.cl?rkst...@hotmail.com
> "August 9 - I just made my signature file. Its only 6 pages long.
> I will have to work on it some more." -- _Diary of an AOL User_



 
Old 06-15-2007   #10
..er9230..
 
Default Re: Application Profiler

On Jun 15, 2:47 am, Patrick <Patrick.Batema...@gmx.at> wrote:
> Thanks for the comments guys. But perhaps I should mention once
> more that I dont wanna optimize the code on a algorithmic level.
> I wanna speed up some established algorithms with dedicated hardware
> support, e.g. SIMD or VLIW or combined operations. So the tool should
> in some way help me where operations could be executed in parallel
> or that it would make sense to add a custom instruction to the
> hardware
> architecture.


My compiler has a little dropdown that says:
"Enable Enhanced Instruction Set":
'Not Set'
'Streaming SIMD
Extensions (/arch:SSE)'
'Streaming SIMD
Extensions 2 (/arch:SSE2)'

The outcome of coosing one of those options besides 'Not Set' is a
program that will run 5% faster on chips that support the extension,
and that fail completely on those that don't.
I think you are barking up the wrong tree if you want to improve your
program.

Now, that is not to say that there is no benefit from inline ***embly
using SIMD or VLIW instructions. Just that these cases are very rare
and will be uncovered by profiling.
Since you are not interested in improving the algorithm, you have no
hope at all for big gains. Any sort of gain you get will be very
modest (by a small, linear factor) and come at a price of portabilty.

IMO-YMMV.

 

Thread Tools
Display Modes





Powered by vBulletin®
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.3.0