Power and Performance

Software Analysis and Optimization

  • 1st Edition - April 2, 2015
  • Latest edition
  • Author: Jim Kukunas
  • Language: English

Description

Power and Performance: Software Analysis and Optimization is a guide to solving performance problems in modern Linux systems. Power-efficient chips are no help if the software running on them is inefficient. Starting with the necessary architectural background as a foundation, the book demonstrates how to use performance analysis tools to pinpoint the cause of performance problems, and includes best practices for handling the common performance issues those tools identify.

Key features

  • Provides expert perspective from a key member of Intel’s optimization team on how processors and memory systems influence performance
  • Presents ideas to improve architectures running mobile, desktop, or enterprise platforms
  • Demonstrates best practices for designing experiments and benchmarking throughout the software lifecycle
  • Explains the importance of profiling and measurement to determine the source of performance issues

Readership

Software engineers seeking to measure and improve the performance and power efficiency of their applications

Table of contents

  • Dedication
  • Introduction
    • Performance Apologetic
    • A Word on Premature Optimization
    • The Roadmap
  • Part 1: Background Knowledge
    • Chapter 1: Early Intel® Architecture
      • Abstract
      • 1.1 Intel® 8086
      • 1.2 Intel® 8087
      • 1.3 Intel® 80286 and 80287
      • 1.4 Intel® 80386 and 80387
    • Chapter 2: Intel® Pentium® Processors
      • Abstract
      • 2.1 Intel® Pentium®
      • 2.2 Intel® Pentium® Pro
      • 2.3 Intel® Pentium® 4
    • Chapter 3: Intel® Core™ Processors
      • Abstract
      • 3.1 Intel® Pentium® M
      • 3.2 Second Generation Intel® Core™ Processor Family
    • Chapter 4: Performance Workflow
      • Abstract
      • 4.1 Step 0: Defining the Problem
      • 4.2 Step 1: Determine the Source of the Problem
      • 4.3 Step 2: Determine Whether the Bottleneck Can Be Avoided
      • 4.4 Step 3: Design a Reproducible Experiment
      • 4.5 Step 4: Check Upstream
      • 4.6 Step 5: Algorithmic Improvement
      • 4.7 Step 6: Architectural Tuning
      • 4.8 Step 7: Testing
      • 4.9 Step 8: Performance Regression Testing
    • Chapter 5: Designing Experiments
      • Abstract
      • 5.1 Choosing a Metric
      • 5.2 Dealing with External Variables
      • 5.3 Timing
      • 5.4 Phoronix Test Suite
  • Part 2: Monitors
    • Chapter 6: Introduction to Profiling
      • Abstract
      • 6.1 PMU
      • 6.2 Top-Down Hierarchical Analysis
    • Chapter 7: Intel® VTune™ Amplifier XE
      • Abstract
      • 7.1 Installation and Configuration
      • 7.2 Data Collection and Reporting
    • Chapter 8: Perf
      • Abstract
      • 8.1 Event Infrastructure
      • 8.2 Perf Tool
    • Chapter 9: Ftrace
      • Abstract
      • 9.1 DebugFS
      • 9.2 Kernel Shark
    • Chapter 10: GPU Profiling Tools
      • Abstract
      • 10.1 Traditional Graphics Stack
      • 10.2 buGLe
      • 10.3 Apitrace
    • Chapter 11: Other Helpful Tools
      • Abstract
      • 11.1 GNU Profiler
      • 11.2 Gcov
      • 11.3 PowerTOP
      • 11.4 LatencyTOP
      • 11.5 Sysprof
  • Part 3: Optimization Techniques
    • Chapter 12: Toolchain Primer
      • Abstract
      • 12.1 Compiler Flags
      • 12.2 ELF and the x86/x86_64 ABIs
      • 12.3 CPU Dispatch
      • 12.4 Coding Style
      • 12.5 x86 Unleashed
    • Chapter 13: Branching
      • Abstract
      • 13.1 Avoiding Branches
      • 13.2 Improving Prediction
    • Chapter 14: Optimizing Cache Usage
      • Abstract
      • 14.1 Processor Cache Organization
      • 14.2 Querying Cache Topology
      • 14.3 Prefetch
      • 14.4 Improving Locality
    • Chapter 15: Exploiting Parallelism
      • Abstract
      • 15.1 SIMD
    • Chapter 16: Special Instructions
      • Abstract
      • 16.1 Intel® Advanced Encryption Standard New Instructions (AES-NI)
      • 16.2 PCLMUL - Packed Carry-Less Multiplication
      • 16.3 CRC32
      • 16.4 SSE4.2 String Functions
  • Index

Review quotes

"...covers the intended topics with enough clarity and depth to serve both as a potential textbook and as a reference for practitioners… This is one of the best technical books I have read in a while." - Computing Reviews

Product details

  • Edition: 1
  • Latest edition
  • Published: April 2, 2015
  • Language: English

About the author

Jim Kukunas

Jim Kukunas began programming at a young age, teaching himself C and x86 assembly. He is an alumnus of Allegheny College with a degree in Computer Science. Today, he is a software engineer in Intel's Open Source Technology Center. As a performance optimization engineer on the core Linux kernel team, much of his work focuses on kernel space and user space performance optimizations. His efforts have enhanced many projects, including the Linux kernel, Zlib, the Enlightenment Foundation Libraries, MeeGo, and Android.

Affiliations and expertise

Software engineer, Intel’s Open Source Technology Center, Hillsboro, OR, USA