go pprof
[TOC]
pprof
pprof为go自带性能分析工具,
pprof有两个包用来分析程序一个是net/http/pprof另一个是runtime/pprof,
net/http/pprof是对runtime/pprof包进行封装并用http暴露出来。runtime/pprof可以用来产生 dump 文件,再使用 Go Tool PProf 来分析这运行日志。
使用net/http/pprof分析web服务
pprof分析web项目,只需要导入包即可, import _ "net/http/pprof"
.
服务端代码:
import (
"encoding/json"
"fmt"
"net/http"
_ "net/http/pprof"
"reflect"
"time"
)
// 字符串拼接
func StrJ(w http.ResponseWriter, r *http.Request) {
s := ""
for i := 0; i < 1000; i++ {
s += "In this moment"
}
}
func reflectT(w http.ResponseWriter, r *http.Request) {
type D struct {
A int8
B string
C int64
}
d := D{
1, "2", 10000,
}
for i := 0; i < 10000; i++ {
fmt.Println(reflect.ValueOf(d).FieldByName("A").Int())
fmt.Println(reflect.ValueOf(d).FieldByName("B").String())
fmt.Println(reflect.ValueOf(d).FieldByName("C").Int())
}
}
//一个稍微复杂点,吃内存的程序
type AppInfo struct {
Action string `json:"action"`
AppName string `json:"app_name"`
PackageName string `json:"package_name"`
Time int64 `json:"time"`
VersionCode uint8 `json:"version_code"`
VersionName string `json:"version_name"`
}
func AppList(w http.ResponseWriter, r *http.Request) {
var appList []AppInfo
for i := 1; i < 100000; i++ {
action := fmt.Sprintf("%08d", i)
item := AppInfo{
Action: action,
AppName: "硬件管家",
PackageName: "com.woyou.hardwarekeeper",
Time: time.Now().UnixNano(),
VersionCode: 210,
VersionName: "2.3.32",
}
appList = append(appList, item)
}
rs, _ := json.Marshal(appList)
w.Write(rs)
}
func Server() {
http.HandleFunc("/str", StrJ)
http.HandleFunc("/reflect", reflectT)
http.HandleFunc("/appList", AppList)
http.ListenAndServe(":8080", nil)
}
客户端代码
import (
"fmt"
"log"
"net/http"
"time"
)
func Request() {
i := 0
for {
go func() {
if _, err := http.Get("http://192.168.33.224:8080/reflect"); err != nil {
log.Fatalln(err)
}
}()
go func() {
if _, err := http.Get("http://192.168.33.224:8080/str"); err != nil {
log.Fatalln(err)
}
}()
go func() {
if _, err := http.Get("http://192.168.33.224:8080/appList"); err != nil {
log.Fatalln(err)
}
}()
time.Sleep(time.Second)
i++
fmt.Println("req : ", i)
}
}
通过 http://192.168.33.12:8080/debug/pprof/
访问profile信息:
debug/pprof/
Types of profiles available:
Count Profile
7 allocs
0 block
0 cmdline
4 goroutine
7 heap
0 mutex
0 profile
7 threadcreate
0 trace
full goroutine stack dump
- /debug/pprof/allocs: 内存分配的抽样
- /debug/pprof/block:Goroutine阻塞事件的记录。默认每发生一次阻塞事件时取样一次。
-
/debug/pprof/profile:访问这个链接会自动进行 CPU profiling,持续 30s,并生成一个文件供下载
- /debug/pprof/goroutines:活跃Goroutine的信息的记录。仅在获取时取样一次。
- /debug/pprof/heap: 堆内存分配情况的记录。默认每分配512K字节时取样一次。
- /debug/pprof/mutex: 查看争用互斥锁的持有者。
- /debug/pprof/threadcreate: 系统线程创建情况的记录。 仅在获取时取样一次。
交互式终端
- 查看heap profile:
go tool pprof http://192.168.33.12:8080/debug/pprof/heap
- 查看goroutine blocking profile:
go tool pprof http://192.168.33.12:8080/debug/pprof/block
如CPU profile : go tool pprof http://192.168.33.12:8080/debug/pprof/profile?seconds=30
通过seconds设置采集等待时间
[root@localhost webapi-dataservice]# go tool pprof http://192.168.33.12:8080/debug/pprof/profile?seconds=30
Fetching profile over HTTP from http://192.168.33.12:8080/debug/pprof/profile?seconds=30
Saved profile in /root/pprof/pprof.main.samples.cpu.001.pb.gz
File: main
Type: cpu
Time: Mar 17, 2020 at 3:35pm (CST)
Duration: 30.16s, Total samples = 18.84s (62.47%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 15.72s, 83.44% of 18.84s total
Dropped 136 nodes (cum <= 0.09s)
Showing top 10 nodes out of 72
flat flat% sum% cum cum%
7.06s 37.47% 37.47% 7.79s 41.35% syscall.Syscall
5.98s 31.74% 69.21% 5.99s 31.79% runtime.nanotime (inline)
1.10s 5.84% 75.05% 1.10s 5.84% runtime.usleep
0.34s 1.80% 76.86% 0.94s 4.99% runtime.mallocgc
0.29s 1.54% 78.40% 0.29s 1.54% runtime.futex
0.25s 1.33% 79.72% 0.27s 1.43% runtime.casgstatus
0.21s 1.11% 80.84% 0.21s 1.11% runtime.nextFreeFast (inline)
0.19s 1.01% 81.85% 0.19s 1.01% runtime.epollwait
0.15s 0.8% 82.64% 0.15s 0.8% runtime.duffcopy
0.15s 0.8% 83.44% 0.41s 2.18% runtime.newobject
交互模式数据可以写入具体文件 top > 001.txt
Go tool pprof 分析工具
需安装 graphviz yum install -y graphviz
Go Tool PProf 工具可以对 prof 文件进行更详细的交互式分析,可以生成调用关系图和火焰图。
Go tool pprof常用基本调试基本命令(默认30s采集时间,可通过–seconds),生成profile压缩gz文件(在/root/pprof目录下的.gz),进入交互,执行命令如top(执行的过程,访问web应用来产生得到相应的监测数据),如果想查看pprof其他命令,可输入help。
生成pdf:
进入交互模式, ` go tool pprof http://192.168.33.12:8080/debug/pprof/profile?seconds=30
输入
pdf`
会在当前目录下生成
其他命令:
(pprof) help
Commands:
callgrind Outputs a graph in callgrind format
comments Output all profile comments
disasm Output assembly listings annotated with samples
dot Outputs a graph in DOT format
eog Visualize graph through eog
evince Visualize graph through evince
gif Outputs a graph image in GIF format
gv Visualize graph through gv
kcachegrind Visualize report in KCachegrind
list Output annotated source for functions matching regexp
pdf Outputs a graph in PDF format
peek Output callers/callees of functions matching regexp
png Outputs a graph image in PNG format
proto Outputs the profile in compressed protobuf format
ps Outputs a graph in PS format
raw Outputs a text representation of the raw profile
svg Outputs a graph in SVG format
tags Outputs all tags in the profile
text Outputs top entries in text form
top Outputs top entries in text form
topproto Outputs top entries in compressed protobuf format
traces Outputs all profile samples in text form
tree Outputs a text rendering of call graph
web Visualize graph through web browser
weblist Display annotated source in a web browser
o/options List options and their current values
quit/exit/^D Exit pprof
Options:
call_tree Create a context-sensitive call tree
compact_labels Show minimal headers
divide_by Ratio to divide all samples before visualization
drop_negative Ignore negative differences
edgefraction Hide edges below <f>*total
focus Restricts to samples going through a node matching regexp
hide Skips nodes matching regexp
ignore Skips paths going through any nodes matching regexp
mean Average sample value over first value (count)
nodecount Max number of nodes to show
nodefraction Hide nodes below <f>*total
noinlines Ignore inlines.
normalize Scales profile based on the base profile.
output Output filename for file-based outputs
prune_from Drops any functions below the matched frame.
relative_percentages Show percentages relative to focused subgraph
sample_index Sample value to report (0-based index or name)
show Only show nodes matching regexp
show_from Drops functions above the highest matched frame.
source_path Search path for source files
tagfocus Restricts to samples with tags in range or matched by regexp
taghide Skip tags matching this regexp
tagignore Discard samples with tags in range or matched by regexp
tagshow Only consider tags matching this regexp
trim Honor nodefraction/edgefraction/nodecount defaults
trim_path Path to trim from source paths before search
unit Measurement units to display
Option groups (only set one per group):
cumulative
cum Sort entries based on cumulative weight
flat Sort entries based on own weight
granularity
addresses Aggregate at the address level.
filefunctions Aggregate at the function level.
files Aggregate at the file level.
functions Aggregate at the function level.
lines Aggregate at the source code line level.
: Clear focus/ignore/hide/tagfocus/tagignore
type "help <cmd|option>" for more information
参数解读
flat flat% sum% cum cum% calls calls%
flat
代表的是该函数自身代码的执行时长,不包括他调用的其他函数时长cum
代表的是该函数自身代码+所有调用的函数的执行时长flat%和cum%
指的就是flat耗时和cum耗时占总耗时(也就是6.21秒)的百分比sum
列表前几行所占百分比总和, 从上到下累加,便于查看
在echo中使用pprof
import (
"github.com/labstack/echo"
"net/http/pprof"
)
func Router(e *echo.Echo) {
e.GET("/debug/pprof/allocs", prof)
e.GET("/debug/pprof/heap", prof)
e.GET("/debug/pprof/goroutine", prof)
e.GET("/debug/pprof/profile", cpu)
}
func prof(c echo.Context) error {
pprof.Index(c.Response(), c.Request())
return nil
}
func cpu(c echo.Context) error {
pprof.Profile(c.Response(), c.Request())
return nil
}
runtime/pprof
runtime/pprof 包含了一系列用于调试信息的方法,可以很方便的对堆栈进行调试。
StartCPUProfile
:开始监控cpuStopCPUProfile
:停止监控cpu,使用StartCPUProfile后一定要调用该函数停止监控。WriteHeapProfile
:把堆中的内存分配信息写入分析文件中。
使用:
import (
"encoding/json"
"fmt"
"log"
"os"
"runtime"
"runtime/pprof"
"time"
)
func RunCliProf() {
f, err := os.Create("cpu.prof")
if err != nil {
log.Fatalln("could not create CPU profile: ", err)
}
//开启CPU监控
if err := pprof.StartCPUProfile(f); err != nil {
log.Fatal("could not start CPU profile: ", err)
}
defer pprof.StopCPUProfile() //停止监控cpu
//逻辑代码
for i := 0; i < 20; i++ {
appList()
}
fd, err := os.Create("mem.prof")
defer fd.Close()
if err != nil {
log.Fatalln("could not create memory profile: ", err)
}
runtime.GC() // GC,获取最新的数据信息
if err := pprof.WriteHeapProfile(fd); err != nil { // 写入内存信息
log.Fatal("could not write memory profile: ", err)
}
}
func appList () {
var appList []AppInfo
for i := 1; i < 100000; i++ {
action := fmt.Sprintf("%08d", i)
item := AppInfo{
Action: action,
AppName: "硬件管家",
PackageName: "com.woyou.hardwarekeeper",
Time: time.Now().UnixNano(),
VersionCode: 210,
VersionName: "2.3.32",
}
appList = append(appList, item)
}
_, err := json.Marshal(appList)
if err != nil {
log.Fatalln(err)
}
}
分析文件 go tool pprof cpu.prof
go-torch
# 安装FlameGraph
git clone https://github.com/brendangregg/FlameGraph.git
将FlameGraph/flamegraph.pl 加入环境变量$PATH
flamegraph.pl -h 测试
# 安装go-torch
go get -v github.com/uber/go-torch
生成火焰图: 从Go 1.11开始,火焰图形可视化go tool pprof直接可用,
内存分析
go-torch -u http://192.168.33.224:8080/debug/pprof/allocs
或
go tool pprof -raw -seconds 30 http://192.168.33.224:8080/debug/pprof/profile
CPU
go-torch -u http://192.168.33.224:8080/debug/pprof/profile?seconds=30
go-torch cpuprofile.out 生成火焰图
性能基准测试
- 性能测试函数必须以Benchmark开头,代码文件必须以_test.go结尾。
- go test 默认不会执行性能测试函数,需要加上参数
-bench
, 通过不断调整B.N的值,反复执行测试函数,知道获取测试结果。 - 运行
go test -bench .
- 通过
run=none
可以忽略单元测试 - 设定CPU参数,设定并发限制
go test -bench . -cpu 1,2,3
- 有时耗时操作,循环测试次数过少,不够准确,可以benchtime设定最小测试时间增加循环次数。
go test -bench . -benchtime 5s
- 运行指定的方法
go test -bench Add1
- 查看内存分配
go test -bench . -benchmem
//go 性能测试基准测试
//基准测试的代码文件必须以_test.go结尾
//基准测试的函数必须以Benchmark开头,必须是可导出的
//基准测试函数必须接受一个指向Benchmark类型的指针作为唯一参数
//基准测试函数不能有返回值
//最后的for循环很重要,被测试的代码要放到循环里
//b.N是基准测试框架提供的,表示循环的次数,因为需要反复调用测试的代码,才可以评估性能
import (
"sync"
"testing"
)
var m sync.Mutex
func call() {
m.Lock()
m.Unlock()
}
func deferCall() {
m.Lock()
defer m.Unlock()
}
func BenchmarkCall(b *testing.B) {
for i := 0; i < b.N; i++ {
call()
}
}
func BenchmarkCallDefer(b *testing.B) {
for i := 0; i < b.N; i++ {
deferCall()
}
}
//运行 go test -bench=. -run=none
//输出
//BenchmarkCall-8 92373063 13.6 ns/op
//BenchmarkCallDefer-8 32450937 37.4 ns/op 每次操作需要话费37.4纳秒
//go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile profile.out
- 常用flag,通过
go help testflag
可查看 ``` -bench regexp:运行性能测试,支持表达式对测试函数进行筛选。-bench .则是对所有的benchmark函数测试 -run regexp:运行单元测试函数, 比如-run ABC只测试函数名中包含ABC的测试函数 -benchmem:性能测试的时候显示测试函数的内存分配的统计信息 -benchtime t : 性能测试运行的时间,默认是1s -count n:运行测试和性能多少此,默认一次 -timeout t:测试时间如果超过t, panic,默认10分钟 -v:显示测试的详细信息,也会把Log、Logf方法的日志显示出来 -cpuprofile cpu.out : 是否输出cpu性能分析文件 -memprofile mem.out : 是否输出内存性能分析文件 -blockprofile block.out : 是否输出内部goroutine阻塞的性能分析文件
-test.blockprofile block.out : 是否输出内部goroutine阻塞的性能分析文件 -test.blockprofilerate n: 基本同上,控制的是goroutine阻塞时候打点的纳秒数。默认不设置就相当于-test.blockprofilerate=1,每一纳秒都打点记录一下 -test.parallel n : 性能测试的程序并行cpu数,默认等于GOMAXPROCS。 -test.cpu 1,2,4 : 程序运行在哪些CPU上面,使用二进制的1所在位代表,和nginx的nginx_worker_cpu_affinity是一个道理 -test.short : 将那些运行时间较长的测试用例运行时间缩短
```go
//内存分配
func heap() []byte {
return make([]byte, 1024*10)
}
//go test -bench Heap -benchmem
func BenchmarkHeap(b *testing.B) {
b.ReportAllocs() //输出内存分配信息,无论是否使用-benchmem
for i := 0; i < b.N; i++ {
_ = heap()
}
}
代码覆盖率
go test -cover
分析
-
引发性能问题的如执行时间过长,内存占用过多,意外的阻塞。
-
交互模式查看内存采样数据
- flat 仅当前函数,不包括他调用的其他函数
- sum 列表前几行所占百分比总和
- cum 当前函数调用堆栈累积
$ go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile profile.out
$ go tool pprof memprofile.out
输入top10
(pprof) top10
Showing nodes accounting for 5844.77MB, 99.95% of 5847.79MB total
Dropped 23 nodes (cum <= 29.24MB)
Showing top 10 nodes out of 22
flat flat% sum% cum cum%
5184.02MB 88.65% 88.65% 5184.02MB 88.65% go-sword-202001/practice.strJoin2
242.50MB 4.15% 92.80% 242.50MB 4.15% go-sword-202001/practice.rset1
132.13MB 2.26% 95.06% 132.13MB 2.26% bytes.makeSlice
119.12MB 2.04% 97.09% 119.12MB 2.04% strings.(*Builder).grow
106.50MB 1.82% 98.91% 106.50MB 1.82% reflect.(*structType).Field
60.50MB 1.03% 99.95% 167MB 2.86% go-sword-202001/practice.rset
0 0% 99.95% 132.13MB 2.26% bytes.(*Buffer).Grow
0 0% 99.95% 132.13MB 2.26% bytes.(*Buffer).grow
0 0% 99.95% 132.13MB 2.26% go-sword-202001/practice.BenchmarkBufferWrite
0 0% 99.95% 167MB 2.86% go-sword-202001/practice.BenchmarkRSet
peek malg
列出调用来源list malg
列出源码行统计样式,一遍直观定位web
,``web malg` 输出svg图形,需要安装Graphviz