Exploiting the parallelism offered by multicore CPUs is hard, but profiling and tracing are well-established techniques for understanding, debugging, engineering, and optimizing codes. While many tools are available to capture profiles and traces, they are often difficult to use in industrial contexts. Many tools were originally developed with sequential applications in mind and moved to parallelism only later, resulting in inadequate feature sets and usability. Parallel tools, in contrast, are often targeted at HPC, with a strong focus on MPI and OpenMP. Because this makes them less suitable for codes using alternative threading models (POSIX, Qt, and ACE), this talk presents extensions to the open-source profiling and tracing tools Score-P and Scalasca. Score-P captures detailed program execution data, allowing Scalasca to perform a sophisticated performance and wait-state analysis.


Our talk targets profiling and tracing infrastructures and the associated trace-analysis mechanisms that provide insight into parallel programs. We describe extensions to the established tools Score-P and Scalasca and their usage in real-life and industrial application scenarios. The talk is thus aimed at developers of profiling and tracing tools as well as end users. For tool developers, we address multicore and distributed systems and cover user-space trace collection and extraction, filtering, aggregation, formats, automated trace analysis, tracing large clusters and distributed systems, HW counters, visualization, analysis of large trace datasets, and the integration between trace tools. For end users, we present real-world application experiences along with typical target scenarios and needs derived from an industrial viewpoint.


Daniel Becker received his Ph.D. degree from RWTH Aachen University in 2009. He completed his Ph.D. project at the Jülich Supercomputing Centre in the area of scalable performance analysis tools. His career path also includes research stays at academic and industrial organizations, including Porsche (Germany), Nokia (Germany), the University of Tennessee (USA), the IBM T.J. Watson Research Center (USA), and the German Research School for Simulation Sciences (Germany). Today, he works within Siemens’ Corporate Technology division, where he focuses on migration strategies from sequential to parallel software architectures and the concurrency tools that support them.

Christian Rössel has been active in the field of computer simulation and performance analysis since his time as a student at Cologne University (Germany), where he received his Diploma in Physics in 2004. After a year working as a consultant and software engineer for the Münster University of Applied Sciences, he moved to PTV AG in Karlsruhe, where he joined the traffic simulation development team. Since 2009, he has been a scientific staff member at the Jülich Supercomputing Centre (JSC), reinforcing JSC’s cross-sectional team “Performance Analysis”. His work focuses primarily on projects developing Score-P, the next-generation measurement infrastructure for the performance analysis tools Scalasca, Vampir, TAU, and Periscope.

After receiving his Ph.D. in computer science from the University of Koblenz-Landau (Germany) in 2005, Markus Geimer joined the Jülich Supercomputing Centre (JSC) as a research scientist at the beginning of 2006. Today, he is the deputy head of the cross-sectional team “Performance Analysis” of JSC and the lead developer of the parallel trace-analysis component of the Scalasca performance analysis toolset. Moreover, he is involved in the design and development of the Score-P instrumentation and measurement infrastructure, as well as in many training activities for both Score-P and Scalasca.