Some metrics for ARM Mali performance analysis in DS-5 Streamline or MGD.
DS-5 Streamline
Mali-470 (Utgard)
GPU Bandwidth = (Words read, master + Words written, master) * Bus Width
Overdraw = (Fragments Passed Z/stencil count * Number of Cores) / (Resolution * FPS)
GPU Bandwidth = (38342943 + 38661456) * (128 / 8) = 1232070384 bytes/s = 1174.99 MB/s
Overdraw = (79194586 + 79152584 + 79112550 + 79117609) / (1920 * 1080) = 152.67 fragments/pixel/s (divide by the measured FPS to get the per-frame overdraw)
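The two Mali-470 figures above can be reproduced with a short script. This is only a sketch: the function names are hypothetical and not part of Streamline, and because the capture does not state a frame rate, the 60 FPS used for the per-frame overdraw is purely an assumption.

```python
# Sketch: reproduce the Mali-470 bandwidth and overdraw arithmetic.
# Function names are illustrative; they are not a Streamline API.

def gpu_bandwidth_bytes_per_s(words_read, words_written, bus_width_bits=128):
    """Bus words per second times bytes per bus word."""
    return (words_read + words_written) * (bus_width_bits // 8)


def overdraw(fragments_per_core, width, height, fps):
    """Fragments shaded per pixel per frame, summed over all cores."""
    return sum(fragments_per_core) / (width * height * fps)


bandwidth = gpu_bandwidth_bytes_per_s(38342943, 38661456)
print(f"{bandwidth} bytes/s = {bandwidth / 2**20:.2f} MB/s")

fragments = [79194586, 79152584, 79112550, 79117609]  # one counter per core
per_second = overdraw(fragments, 1920, 1080, fps=1)   # fragments/pixel/s
print(f"{per_second:.2f} fragments/pixel/s")

ASSUMED_FPS = 60  # assumption: the source capture does not state the frame rate
print(f"{overdraw(fragments, 1920, 1080, ASSUMED_FPS):.2f} fragments/pixel/frame")
```

Passing `fps=1` yields the per-second value quoted above; passing the real frame rate gives the overdraw factor per frame.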
Mali-T820 (Midgard)
Job Slots
- JS0: fragment shading.
- JS1: vertex, geometry, compute, and tiling.
- JS2: vertex, geometry, and compute.
Fragment Percentage = (JS0 Active / GPU frequency) * 100
Vertex Percentage = (JS1 Active / GPU frequency) * 100
Load Store CPI = Full Pipeline issues / Load Store Instruction Words Completed
GPU Bandwidth = (External read beats + External write beats) * Bus Width
Overdraw = (Fragment Threads Started * Number of Cores) / (Resolution * FPS)
GPU Bandwidth = (35942654+29615172) * (128/8) = 1048925216 bytes/s = 1000.33 MB/s
Fragment Percentage = (10347901 / 10351030) * 100 = 99.97%
Vertex Percentage = (1635340 / 10351030) * 100 = 15.80%
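As a cross-check of the Mali-T820 figures, the job-slot percentages and the Load Store CPI can be computed from raw counter values. The following is a minimal sketch with hypothetical function names; the source gives no Load Store counter values, so that helper is shown without sample data.

```python
# Sketch: job-slot utilization and Load Store CPI from Midgard counters.
# Function names are illustrative; they are not a Streamline API.

def slot_utilization_percent(slot_active_cycles, gpu_cycles):
    """Share of GPU cycles during which a job slot was active."""
    if gpu_cycles <= 0:
        raise ValueError("gpu_cycles must be positive")
    return slot_active_cycles / gpu_cycles * 100


def load_store_cpi(full_pipeline_issues, instruction_words_completed):
    """Load/store pipeline issues per completed instruction word."""
    if instruction_words_completed <= 0:
        raise ValueError("instruction_words_completed must be positive")
    return full_pipeline_issues / instruction_words_completed


GPU_CYCLES = 10351030  # denominator used in the example above
print(f"Fragment (JS0): {slot_utilization_percent(10347901, GPU_CYCLES):.2f}%")
print(f"Vertex (JS1):   {slot_utilization_percent(1635340, GPU_CYCLES):.2f}%")
```

The two percentages share one denominator and can add up to more than 100%, because the job slots run in parallel.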
Mali Graphics Debugger (MGD)
GPU Budget
Total vertex cycle count/frame/pixel = vertex cycles / frame buffer pixels
Total fragment cycle count/frame/pixel = fragment cycles / frame buffer pixels
Total cycle count/frame/pixel = total fragment cycle count/frame/pixel + total vertex cycle count/frame/pixel
Mali-470 (Utgard)
Mali-470 MP4 at 750 MHz, 1920x1080 display resolution, application rendering to a 512x512 framebuffer.
Total vertex cycle count/frame/pixel = 1224001 / (512 * 512) = 4.67 cycles/frame/pixel
Total fragment cycle count/frame/pixel = 6429459 / (512 * 512) = 24.53 cycles/frame/pixel
Total cycle count/frame/pixel = 24.53 + 4.67 = 29.20 cycles/frame/pixel
vertexCycleBudget = (750M cycles/sec) / (60 frames/sec * 108926) = 114.76 cycles/frame/vertex
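The per-pixel cycle counts in this example can be recomputed from the raw MGD totals. This is a rough sketch under the assumption that the cycle totals are per frame and that the application framebuffer is the 512x512 target mentioned above; `cycles_per_pixel` is a hypothetical helper, not an MGD function.

```python
# Sketch: per-pixel cycle cost from per-frame cycle totals.
# cycles_per_pixel is an illustrative helper, not part of MGD.

def cycles_per_pixel(cycles_per_frame, fb_width, fb_height):
    """Cycles spent per framebuffer pixel in one frame."""
    return cycles_per_frame / (fb_width * fb_height)


FB_WIDTH, FB_HEIGHT = 512, 512  # application framebuffer, not the display

vertex = cycles_per_pixel(1224001, FB_WIDTH, FB_HEIGHT)
fragment = cycles_per_pixel(6429459, FB_WIDTH, FB_HEIGHT)
total = vertex + fragment

print(f"vertex:   {vertex:.2f} cycles/frame/pixel")
print(f"fragment: {fragment:.2f} cycles/frame/pixel")
print(f"total:    {total:.2f} cycles/frame/pixel")
```

Summing the unrounded values before formatting avoids the small drift that appears when already rounded figures are added.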
Mali-T820 (Midgard)
Mali-T820 MP3 at 620 MHz, 1920x1080 display resolution, application rendering to a 512x512 framebuffer.
Total vertex cycle count/frame/pixel = 919273 / (512 * 512) = 3.51 cycles/frame/pixel
Total fragment cycle count/frame/pixel = 6713456 / (512 * 512) = 25.61 cycles/frame/pixel
Total cycle count/frame/pixel = 25.61 + 3.51 = 29.12 cycles/frame/pixel
vertexCycleBudget = (620M * 3 cycles/sec) / (60 frames/sec * 92141) = 336.44 cycles/frame/vertex
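The vertex cycle budget for both GPUs can be evaluated with one helper. This sketch follows the expressions exactly as written: the T820 expression multiplies the clock by its 3 cores, while the Mali-470 expression uses the clock alone. The function name and the `cores` parameter are hypothetical.

```python
# Sketch: cycles available per vertex at a target frame rate.
# vertex_cycle_budget and its parameters are illustrative names.

def vertex_cycle_budget(clock_hz, fps, vertices_per_frame, cores=1):
    """Available GPU cycles per second divided by vertices per second."""
    if fps <= 0 or vertices_per_frame <= 0:
        raise ValueError("fps and vertices_per_frame must be positive")
    return clock_hz * cores / (fps * vertices_per_frame)


# Mali-470 at 750 MHz, 108926 vertices per frame, 60 FPS target
mali470 = vertex_cycle_budget(750e6, 60, 108926)

# Mali-T820 MP3 at 620 MHz, 92141 vertices per frame, 60 FPS target
mali_t820 = vertex_cycle_budget(620e6, 60, 92141, cores=3)

print(f"Mali-470:  {mali470:.2f} cycles/frame/vertex")
print(f"Mali-T820: {mali_t820:.2f} cycles/frame/vertex")
```

Halving the target frame rate doubles the budget, so the `fps` value should match the frame rate the application is actually aiming for.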