PPC405 Performance Monitoring

Started by Anthony Mahar April 13, 2005
Hello,

Is there a way to do performance monitoring on the PPC405 in the Virtex 
II Pro?  I am specifically interested in cache hits.

I have wedged my own device between the CPU's instruction and data PLB 
interfaces and can currently get cache misses.  But I need to find a way 
to determine cache hits of an application running under an operating 
system.

If the application were running standalone, I could derive that information 
from the number of load and store instructions, but this is an operating 
system with context switches, interrupt handlers, etc.

Is there a way to gather this information?  There did not seem to be any 
performance monitoring registers as seen with newer PowerPC and x86 
systems.  Can the trace port be used to passively monitor execution for 
load/store instructions?

Thank you,
Tony
Unfortunately, I have few answers to your questions. However, I know of a research group at Georgia Tech that has designed (or is designing) a memory access monitor that sounds similar to yours. You may want to correspond with them to compare notes. I learned of their monitor at the HPCA 2005 FPGA workshop: http://cag.csail.mit.edu/warfp2005/. The workshop presentations are at http://cag.csail.mit.edu/warfp2005/program.html. Their presentation was titled "Evaluating System wide Monitoring Capsule Design using Xilinx Virtex II Pro FPGA". Their paper has their contact information.

As for the trace port, I have used it with an IBM/Agilent RISCWatch (RW) box, which collects a dynamic trace of the instructions over 8 million CPU cycles. The main limitation is that it only works for standalone apps. When virtual memory is enabled (while running Linux, for instance), RW uses the TLB to perform the virtual-to-physical address translations. This works well for regular code. However, when an interrupt is detected, the CPU switches to physical addresses for the interrupt handler. RW keeps using the TLB, so it tries to translate physical addresses for which no translations exist, and it cannot resolve the interrupt handler instructions. From that point on, the trace is corrupted. In any case, if you are interested in learning more about RW, you can refer to this appnote: http://direct.xilinx.com/bvdocs/appnotes/xapp545.pdf. It has links to all the manuals for the RW box and its tools.

Lastly, for my own curiosity, how difficult was it to design and debug your monitor? The guy I spoke to from Georgia Tech at the workshop said they used ChipScope to learn the protocol (along with IBM's PLB spec). He said it was a painstaking process.

NN
I looked into doing this a while back.

From the sounds of it, you have already created a data-side cache miss
collection engine; now you need the total number of loads and stores.
As you surmised, this info can be collected from the debug interface (note
that the debug interface is different from the trace interface:
http://www.xilinx.com/ise/embedded/ppc405block_ref_guide.pdf) and counted
in much the same way as you currently count the misses.  The difference
is that here you need to tell the ld/st apart from the other instructions,
but the decode is pretty straightforward.
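
As a rough illustration of the ld/st decode mentioned above, here is a minimal Python sketch that classifies a 32-bit PowerPC instruction word by its primary opcode (the top 6 bits) and, for primary opcode 31, by its 10-bit extended opcode. It assumes the raw instruction words of executed instructions are available to the counter, which this thread does not confirm. The function names are made up for the example, and the opcode tables cover only the common integer loads and stores, so treat it as a starting point rather than a complete decoder.

```python
# Sketch only: classify a 32-bit PowerPC instruction word as load, store, or other.
# Covers D-form integer loads/stores and a subset of X-form (primary opcode 31)
# encodings. It is not exhaustive (no cache ops, no string or FP load/stores).

# Primary opcodes: lwz, lwzu, lbz, lbzu, lhz, lhzu, lha, lhau, lmw
D_FORM_LOADS = {32, 33, 34, 35, 40, 41, 42, 43, 46}
# Primary opcodes: stw, stwu, stb, stbu, sth, sthu, stmw
D_FORM_STORES = {36, 37, 38, 39, 44, 45, 47}

# Extended opcodes under primary opcode 31:
# lwarx, lwzx, lwzux, lbzx, lbzux, lhzx, lhzux, lhax, lhaux, lwbrx, lhbrx
X_FORM_LOADS = {20, 23, 55, 87, 119, 279, 311, 343, 375, 534, 790}
# stwcx., stwx, stwux, stbx, stbux, sthx, sthux, stwbrx, sthbrx
X_FORM_STORES = {150, 151, 183, 215, 247, 407, 439, 662, 918}


def classify(insn: int) -> str:
    """Return 'load', 'store', or 'other' for one instruction word."""
    primary = (insn >> 26) & 0x3F          # top 6 bits
    if primary in D_FORM_LOADS:
        return "load"
    if primary in D_FORM_STORES:
        return "store"
    if primary == 31:
        extended = (insn >> 1) & 0x3FF     # 10-bit extended opcode
        if extended in X_FORM_LOADS:
            return "load"
        if extended in X_FORM_STORES:
            return "store"
    return "other"


def count_loads_stores(words):
    """Tally a sequence of instruction words by class."""
    counts = {"load": 0, "store": 0, "other": 0}
    for word in words:
        counts[classify(word)] += 1
    return counts


if __name__ == "__main__":
    # lwz r3,0(r4); stw r3,0(r4); addi r3,r4,1; lwzx r3,r4,r5
    print(count_loads_stores([0x80640000, 0x90640000, 0x38640001, 0x7C64282E]))
```

Note that multi-word instructions such as lmw and stmw perform several memory accesses each, so a count of instructions is not necessarily a count of cache accesses.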

For CPI and instruction cache miss rate measurements, the same general
technique can be used.  
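
To make the arithmetic explicit, here is a small Python sketch of how the counters could be combined, assuming you have a total access count (loads plus stores), a miss count from the bus-side monitor, and cycle and instruction counts. The function names are hypothetical, and the hit count is only as good as the assumption that each counted miss corresponds to exactly one counted access.

```python
# Sketch only: derive hits, hit rate, and CPI from raw counter values.

def cache_stats(accesses: int, misses: int):
    """Return (hits, hit_rate), taking hits = accesses - misses."""
    if accesses <= 0:
        raise ValueError("need at least one access")
    if not 0 <= misses <= accesses:
        raise ValueError("misses must be between 0 and accesses")
    hits = accesses - misses
    return hits, hits / accesses


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction over the measured interval."""
    if instructions <= 0:
        raise ValueError("need at least one instruction")
    return cycles / instructions


if __name__ == "__main__":
    hits, rate = cache_stats(accesses=1000, misses=50)
    print(hits, rate, cpi(cycles=3000, instructions=2000))
```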

You should check out Nju's xapp545 appnote for another method of
collecting the trace data.  You can learn a lot about what the code is
actually doing by looking at 8-million-cycle dumps of instruction
execution.  

The issue of OS context switches and interrupts is really orthogonal. 
You don't mention your OS, but Oprofile
(http://oprofile.sourceforge.net/) for Linux handles this by adding code
to every event that causes a context switch.  That code collects the
values of the counters (in this case the ones you've inserted between the
PPC405 and the PLB bus) and assigns them to the currently running code.  
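
The accounting idea described above can be sketched in a few lines of Python. This is not Oprofile's actual code: the class name, the read_counters callback, and the 32-bit counter width are all assumptions made for the example. The point is only that the counters are snapshotted at each switch and the delta since the previous snapshot is charged to whatever was running.

```python
# Sketch only: charge hardware counter deltas to the task that was running.
from collections import defaultdict

COUNTER_BITS = 32  # assumed width of the free-running hardware counters


class SwitchAccounting:
    def __init__(self, read_counters, first_task):
        # read_counters is a hypothetical callback returning {name: raw_value}
        self._read = read_counters
        self._current = first_task
        self._last = read_counters()
        self.totals = defaultdict(lambda: defaultdict(int))

    def on_context_switch(self, next_task):
        """Call from every event that changes the running task."""
        now = self._read()
        for name, value in now.items():
            # Modular subtraction tolerates a single counter wraparound.
            delta = (value - self._last[name]) % (1 << COUNTER_BITS)
            self.totals[self._current][name] += delta
        self._last = now
        self._current = next_task


if __name__ == "__main__":
    snapshots = iter([
        {"dmiss": 0xFFFFFFF0, "ldst": 0},
        {"dmiss": 0x00000010, "ldst": 100},   # dmiss wrapped around
        {"dmiss": 0x00000015, "ldst": 400},
    ])
    acct = SwitchAccounting(lambda: next(snapshots), "app")
    acct.on_context_switch("irq_handler")
    acct.on_context_switch("app")
    print({task: dict(c) for task, c in acct.totals.items()})
```

In a real kernel the hook would have to run on interrupt entry and exit as well as on scheduler switches, and the per-task totals would live in kernel data structures rather than a dictionary.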

A similar approach is valid for other OSs but leveraging Oprofile is a
good starting point since they've already figured out the relevant hooks
into the kernel.  

Paul 

Thank you Nju,

I am going to dig into those docs right now.

My design was not intended to be a monitor, but an active bus transaction modifier. On certain transactions, I have to perform certain operations on the data going to the PPC405. This means I selectively pass data through, or perform some higher-latency operations.

Since I am currently interested in cache-miss performance, I only count the number of transaction requests from the L1 cache. Because a single word caused the instruction miss, all other words retrieved in the transaction are, of course, not counted as misses. This makes it extremely easy to monitor the number of transaction requests.

While the module is an active component between the CPU and PLB, it is very easy to add a passive monitor once you have a way to have the EDK inject the monitor in the middle. For me, it took some time to understand the EDK .mpd format and effectively create a PLB-PLB bridge (no logic, pure pass-through), and there may be better ways with the "transparent" bus format that I haven't had time to look into. But at the time it was also my first EDK peripheral.

As for 'learning' the PLB system, I found the IBM CoreConnect Bus Functional Model (BFM) for the PLB, together with the PLB doc, to be instrumental in observing every kind of transaction I had to handle. I think the BFM would be far easier than using ChipScope and the docs alone. The BFM allows the generation of almost any kind of cycle-accurate PLB transaction a master and slave can use.

One other model I would like to begin using is the Xilinx-provided PPC405 swift model, which will allow the same code used by the real processor to run in simulation. This will cause PLB transactions to occur in the same way they will on the real system, i.e. cache line fills based on the PPC405 MMU's state, etc.

Regards,
Tony
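
The counting rule in the post above (one miss per transaction request, regardless of how many words the line fill returns) can be shown with a short Python sketch. The transaction record format and the "line_fill" and "single" labels are hypothetical, and the 8-word line assumes the PPC405's 32-byte L1 cache lines.

```python
# Sketch only: count one miss per line-fill request, not one per word transferred.

WORDS_PER_LINE = 8  # assuming 32-byte L1 lines, i.e. eight 32-bit words


def count_misses(transactions):
    """transactions: iterable of (kind, words_transferred) tuples.

    Returns (misses, total_words). Only 'line_fill' requests count as misses.
    """
    misses = 0
    total_words = 0
    for kind, words in transactions:
        if kind == "line_fill":
            misses += 1
        total_words += words
    return misses, total_words


if __name__ == "__main__":
    observed = [
        ("line_fill", WORDS_PER_LINE),
        ("line_fill", WORDS_PER_LINE),
        ("single", 1),                  # e.g. a non-cacheable access
        ("line_fill", WORDS_PER_LINE),
    ]
    print(count_misses(observed))
```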
Interesting question for the "Monitoring Capsule Design" paper... they 
state they monitor behavior "between the CPU and L1 Dcache."  Did they 
explain how they were able to do this, since the PPC405 and L1 are part 
of the same hard core?

There would be interesting (positive) implications for my research if I 
could also inject myself between CPU and L1, instead of only between L1 
and some instantiated L2 cache or memory bus.

Thank you,
Anthony


You are right--the CPU and the L1 cache are in the same hard core, so we don't have access to the interface between the CPU core and the cache. As I described in my previous post, they placed their monitor at the L1 cache ports that are usually connected to the PLB. Thus, instead of connecting their CPU to the PLB bus, they connected the PPC core to their monitor, which is then connected to the PLB.

NN
If I understand correctly, you are saying that your transaction modifier acts as a PLB-to-PLB bridge. So, in the EDK project, you connected the CPU to a PLB bus, then connected your module to that PLB bus, and then connected another PLB bus on the other side of your pcore? I assume you also used the Create/Import IPIF wizard, right?

CPU <-> PLB Bus -> your pcore <-> PLB Bus <-> Memory (Cache/BRAM)

If my understanding is correct, you in essence designed a PLB-PLB bridge, like the PLB-OPB bridge, right? In our research, we also designed a PLB-to-PLB bridge. Our pcore was initially a pass-through between the two buses; we put in our real module once we had the pass-through running.

The guys from Georgia Tech, however, interfaced their monitor module directly with the PPC's PLB ports, so they couldn't use EDK's abstraction of the bus protocol through the PLB IPIF module. In fact, they had to synthesize their project in ISE since EDK wouldn't support what they were trying to do. That's why they had to use ChipScope to really see what the processor does.
In designing our pass-through, we used the swift models. I definitely recommend learning how to use them. The swift models allow you to conduct full-system simulations.

As for the BFMs, we weren't able to use them for our pcore since the EDK 6.3i Create/Import IPIF wizard didn't support Verilog modules (7.1 now supports this). We could have hacked around this by using a netlist, but you cannot pass parameters/generics into a netlist, which is a feature our pcore requires. I have used the BFMs for a VHDL module I worked on in the past, and I agree that they too were helpful.

NN
As they state in their paper 
http://cag.csail.mit.edu/warfp2005/submissions/29-suh.pdf

"In our initial study, we deploy a monitoring capsule in Dcaches to monitor 
the memory behavior between a CPU and L1 Dcache."

It is not possible to monitor signals between the CPU and L1 cache (I or 
D).  Was the monitoring of CPU/L1 inferred by the cache misses seen 
coming from L1?  Even so, a lot of memory behavior is missed when only 
observing cache misses.

Regards,
Tony

They had two versions of their monitor: one for the MicroBlaze core and one for the PPC. For the PPC, they inferred the CPU/L1 behavior from the cache misses seen coming out of the L1. With the MicroBlaze, since they have access to the L1 cache signals, they could wedge their monitor in there.