go pprof

[TOC]

pprof

pprof为go自带性能分析工具，

pprof有两个包用来分析程序一个是net/http/pprof另一个是runtime/pprof，

net/http/pprof是对runtime/pprof包进行封装并用http暴露出来。runtime/pprof可以用来产生 dump 文件，再使用 Go Tool PProf 来分析这运行日志。

使用net/http/pprof分析web服务

pprof分析web项目，只需要导入包即可, import _ "net/http/pprof".

服务端代码：

import (
	"encoding/json"
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"reflect"
	"time"
)

// 字符串拼接
func StrJ(w http.ResponseWriter, r *http.Request) {
	s := ""
	for i := 0; i < 1000; i++ {
		s += "In this moment"
	}
}

func reflectT(w http.ResponseWriter, r *http.Request) {
	type D struct {
		A int8
		B string
		C int64
	}

	d := D{
		1, "2", 10000,
	}
	for i := 0; i < 10000; i++ {
		fmt.Println(reflect.ValueOf(d).FieldByName("A").Int())
		fmt.Println(reflect.ValueOf(d).FieldByName("B").String())
		fmt.Println(reflect.ValueOf(d).FieldByName("C").Int())
	}
}

//一个稍微复杂点，吃内存的程序
type AppInfo struct {
	Action 		string	`json:"action"`
	AppName 	string	`json:"app_name"`
	PackageName	string	`json:"package_name"`
	Time		int64	`json:"time"`
	VersionCode uint8	`json:"version_code"`
	VersionName string	`json:"version_name"`
}

func AppList(w http.ResponseWriter, r *http.Request) {
	var appList []AppInfo
	for i := 1; i < 100000; i++ {
		action := fmt.Sprintf("%08d", i)
		item := AppInfo{
			Action:			action,
			AppName:		"硬件管家",
			PackageName:	"com.woyou.hardwarekeeper",
			Time:			time.Now().UnixNano(),
			VersionCode:	210,
			VersionName:	"2.3.32",
		}
		appList = append(appList, item)
	}
	rs, _ := json.Marshal(appList)
	w.Write(rs)
}



func Server() {
	http.HandleFunc("/str", StrJ)
	http.HandleFunc("/reflect", reflectT)
	http.HandleFunc("/appList", AppList)
	http.ListenAndServe(":8080", nil)
}

客户端代码

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func Request() {
	i := 0
	for {
		go func() {
			if _, err := http.Get("http://192.168.33.224:8080/reflect"); err != nil {
				log.Fatalln(err)
			}
		}()
		go func() {
			if _, err := http.Get("http://192.168.33.224:8080/str"); err != nil {
				log.Fatalln(err)
			}
		}()
		go func() {
			if _, err := http.Get("http://192.168.33.224:8080/appList"); err != nil {
				log.Fatalln(err)
			}
		}()

		time.Sleep(time.Second)
		i++
		fmt.Println("req : ", i)
	}
}

通过 http://192.168.33.12:8080/debug/pprof/ 访问profile信息:

debug/pprof/
Types of profiles available:
Count	Profile
7	allocs
0	block
0	cmdline
4	goroutine
7	heap
0	mutex
0	profile
7	threadcreate
0	trace
full goroutine stack dump 

/debug/pprof/allocs: 内存分配的抽样
/debug/pprof/block：Goroutine阻塞事件的记录。默认每发生一次阻塞事件时取样一次。
/debug/pprof/profile：访问这个链接会自动进行 CPU profiling，持续 30s，并生成一个文件供下载
/debug/pprof/goroutines：活跃Goroutine的信息的记录。仅在获取时取样一次。
/debug/pprof/heap：堆内存分配情况的记录。默认每分配512K字节时取样一次。
/debug/pprof/mutex: 查看争用互斥锁的持有者。
/debug/pprof/threadcreate: 系统线程创建情况的记录。仅在获取时取样一次。

交互式终端

查看heap profile：go tool pprof http://192.168.33.12:8080/debug/pprof/heap
查看goroutine blocking profile：go tool pprof http://192.168.33.12:8080/debug/pprof/block

如CPU profile : go tool pprof http://192.168.33.12:8080/debug/pprof/profile?seconds=30 通过seconds设置采集等待时间

[root@localhost webapi-dataservice]# go tool pprof http://192.168.33.12:8080/debug/pprof/profile?seconds=30
Fetching profile over HTTP from http://192.168.33.12:8080/debug/pprof/profile?seconds=30
Saved profile in /root/pprof/pprof.main.samples.cpu.001.pb.gz
File: main
Type: cpu
Time: Mar 17, 2020 at 3:35pm (CST)
Duration: 30.16s, Total samples = 18.84s (62.47%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 15.72s, 83.44% of 18.84s total
Dropped 136 nodes (cum <= 0.09s)
Showing top 10 nodes out of 72
      flat  flat%   sum%        cum   cum%
     7.06s 37.47% 37.47%      7.79s 41.35%  syscall.Syscall
     5.98s 31.74% 69.21%      5.99s 31.79%  runtime.nanotime (inline)
     1.10s  5.84% 75.05%      1.10s  5.84%  runtime.usleep
     0.34s  1.80% 76.86%      0.94s  4.99%  runtime.mallocgc
     0.29s  1.54% 78.40%      0.29s  1.54%  runtime.futex
     0.25s  1.33% 79.72%      0.27s  1.43%  runtime.casgstatus
     0.21s  1.11% 80.84%      0.21s  1.11%  runtime.nextFreeFast (inline)
     0.19s  1.01% 81.85%      0.19s  1.01%  runtime.epollwait
     0.15s   0.8% 82.64%      0.15s   0.8%  runtime.duffcopy
     0.15s   0.8% 83.44%      0.41s  2.18%  runtime.newobject

交互模式数据可以写入具体文件 top > 001.txt

Go tool pprof 分析工具

需安装 graphviz yum install -y graphviz

Go Tool PProf 工具可以对 prof 文件进行更详细的交互式分析，可以生成调用关系图和火焰图。

Go tool pprof常用基本调试基本命令(默认30s采集时间，可通过–seconds)，生成profile压缩gz文件(在/root/pprof目录下的.gz)，进入交互，执行命令如top（执行的过程，访问web应用来产生得到相应的监测数据），如果想查看pprof其他命令，可输入help。

生成pdf： 进入交互模式， ` go tool pprof http://192.168.33.12:8080/debug/pprof/profile?seconds=30输入pdf` 会在当前目录下生成

其他命令：

(pprof) help
  Commands:
    callgrind        Outputs a graph in callgrind format
    comments         Output all profile comments
    disasm           Output assembly listings annotated with samples
    dot              Outputs a graph in DOT format
    eog              Visualize graph through eog
    evince           Visualize graph through evince
    gif              Outputs a graph image in GIF format
    gv               Visualize graph through gv
    kcachegrind      Visualize report in KCachegrind
    list             Output annotated source for functions matching regexp
    pdf              Outputs a graph in PDF format
    peek             Output callers/callees of functions matching regexp
    png              Outputs a graph image in PNG format
    proto            Outputs the profile in compressed protobuf format
    ps               Outputs a graph in PS format
    raw              Outputs a text representation of the raw profile
    svg              Outputs a graph in SVG format
    tags             Outputs all tags in the profile
    text             Outputs top entries in text form
    top              Outputs top entries in text form
    topproto         Outputs top entries in compressed protobuf format
    traces           Outputs all profile samples in text form
    tree             Outputs a text rendering of call graph
    web              Visualize graph through web browser
    weblist          Display annotated source in a web browser
    o/options        List options and their current values
    quit/exit/^D     Exit pprof

  Options:
    call_tree        Create a context-sensitive call tree
    compact_labels   Show minimal headers
    divide_by        Ratio to divide all samples before visualization
    drop_negative    Ignore negative differences
    edgefraction     Hide edges below <f>*total
    focus            Restricts to samples going through a node matching regexp
    hide             Skips nodes matching regexp
    ignore           Skips paths going through any nodes matching regexp
    mean             Average sample value over first value (count)
    nodecount        Max number of nodes to show
    nodefraction     Hide nodes below <f>*total
    noinlines        Ignore inlines.
    normalize        Scales profile based on the base profile.
    output           Output filename for file-based outputs
    prune_from       Drops any functions below the matched frame.
    relative_percentages Show percentages relative to focused subgraph
    sample_index     Sample value to report (0-based index or name)
    show             Only show nodes matching regexp
    show_from        Drops functions above the highest matched frame.
    source_path      Search path for source files
    tagfocus         Restricts to samples with tags in range or matched by regexp
    taghide          Skip tags matching this regexp
    tagignore        Discard samples with tags in range or matched by regexp
    tagshow          Only consider tags matching this regexp
    trim             Honor nodefraction/edgefraction/nodecount defaults
    trim_path        Path to trim from source paths before search
    unit             Measurement units to display

  Option groups (only set one per group):
    cumulative       
      cum              Sort entries based on cumulative weight
      flat             Sort entries based on own weight
    granularity      
      addresses        Aggregate at the address level.
      filefunctions    Aggregate at the function level.
      files            Aggregate at the file level.
      functions        Aggregate at the function level.
      lines            Aggregate at the source code line level.
  :   Clear focus/ignore/hide/tagfocus/tagignore

  type "help <cmd|option>" for more information

参数解读

flat flat% sum% cum cum% calls calls%

flat代表的是该函数自身代码的执行时长，不包括他调用的其他函数时长
cum 代表的是该函数自身代码+所有调用的函数的执行时长
flat%和cum%指的就是flat耗时和cum耗时占总耗时（也就是6.21秒）的百分比
sum 列表前几行所占百分比总和, 从上到下累加，便于查看

在echo中使用pprof

import (
	"github.com/labstack/echo"
	"net/http/pprof"
)

func Router(e *echo.Echo) {
	e.GET("/debug/pprof/allocs", prof)
	e.GET("/debug/pprof/heap", prof)
	e.GET("/debug/pprof/goroutine", prof)
	e.GET("/debug/pprof/profile", cpu)
}

func prof(c echo.Context) error {
	pprof.Index(c.Response(), c.Request())
	return nil
}

func cpu(c echo.Context) error {
	pprof.Profile(c.Response(), c.Request())
	return nil
}

runtime/pprof

runtime/pprof 包含了一系列用于调试信息的方法，可以很方便的对堆栈进行调试。

StartCPUProfile：开始监控cpu
StopCPUProfile：停止监控cpu，使用StartCPUProfile后一定要调用该函数停止监控。
WriteHeapProfile：把堆中的内存分配信息写入分析文件中。

使用：

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)


func RunCliProf() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatalln("could not create CPU profile: ", err)
	}
	//开启CPU监控
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal("could not start CPU profile: ", err)
	}
	defer pprof.StopCPUProfile()	//停止监控cpu

	//逻辑代码
	for i := 0; i < 20; i++ {
		appList()
	}

	fd, err := os.Create("mem.prof")
	defer fd.Close()
	if err != nil {
		log.Fatalln("could not create memory profile: ", err)
	}
	runtime.GC() // GC，获取最新的数据信息
	if err := pprof.WriteHeapProfile(fd); err != nil { // 写入内存信息
		log.Fatal("could not write memory profile: ", err)
	}
}

func appList () {
	var appList []AppInfo
	for i := 1; i < 100000; i++ {
		action := fmt.Sprintf("%08d", i)
		item := AppInfo{
			Action:			action,
			AppName:		"硬件管家",
			PackageName:	"com.woyou.hardwarekeeper",
			Time:			time.Now().UnixNano(),
			VersionCode:	210,
			VersionName:	"2.3.32",
		}
		appList = append(appList, item)
	}
	_, err := json.Marshal(appList)
	if err != nil {
		log.Fatalln(err)
	}
}

分析文件 go tool pprof cpu.prof

go-torch

# 安装FlameGraph
git clone https://github.com/brendangregg/FlameGraph.git
将FlameGraph/flamegraph.pl 加入环境变量$PATH
flamegraph.pl -h 测试

# 安装go-torch
go get -v github.com/uber/go-torch

生成火焰图： 从Go 1.11开始，火焰图形可视化go tool pprof直接可用，

内存分析

go-torch -u http://192.168.33.224:8080/debug/pprof/allocs
或
go tool pprof -raw -seconds 30 http://192.168.33.224:8080/debug/pprof/profile

CPU

go-torch -u http://192.168.33.224:8080/debug/pprof/profile?seconds=30

go-torch cpuprofile.out 生成火焰图

性能基准测试

性能测试函数必须以Benchmark开头，代码文件必须以_test.go结尾。
go test 默认不会执行性能测试函数，需要加上参数-bench, 通过不断调整B.N的值，反复执行测试函数，知道获取测试结果。
运行go test -bench .
通过run=none 可以忽略单元测试
设定CPU参数，设定并发限制go test -bench . -cpu 1,2,3
有时耗时操作，循环测试次数过少，不够准确，可以benchtime设定最小测试时间增加循环次数。go test -bench . -benchtime 5s
运行指定的方法go test -bench Add1
查看内存分配go test -bench . -benchmem

//go 性能测试基准测试
//基准测试的代码文件必须以_test.go结尾
//基准测试的函数必须以Benchmark开头，必须是可导出的
//基准测试函数必须接受一个指向Benchmark类型的指针作为唯一参数
//基准测试函数不能有返回值
//最后的for循环很重要，被测试的代码要放到循环里
//b.N是基准测试框架提供的，表示循环的次数，因为需要反复调用测试的代码，才可以评估性能
import (
	"sync"
	"testing"
)
var m sync.Mutex

func call() {
	m.Lock()
	m.Unlock()
}

func deferCall() {
	m.Lock()
	defer m.Unlock()
}

func BenchmarkCall(b *testing.B) {
	for i := 0; i < b.N; i++ {
		call()
	}
}

func BenchmarkCallDefer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		deferCall()
	}
}

//运行 go test -bench=. -run=none
//输出
//BenchmarkCall-8         92373063                13.6 ns/op
//BenchmarkCallDefer-8    32450937                37.4 ns/op 每次操作需要话费37.4纳秒

//go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile profile.out

常用flag，通过go help testflag可查看 ``` -bench regexp:运行性能测试，支持表达式对测试函数进行筛选。-bench .则是对所有的benchmark函数测试 -run regexp:运行单元测试函数，比如-run ABC只测试函数名中包含ABC的测试函数 -benchmem:性能测试的时候显示测试函数的内存分配的统计信息 -benchtime t : 性能测试运行的时间，默认是1s -count n:运行测试和性能多少此，默认一次 -timeout t:测试时间如果超过t, panic,默认10分钟 -v:显示测试的详细信息，也会把Log、Logf方法的日志显示出来 -cpuprofile cpu.out : 是否输出cpu性能分析文件 -memprofile mem.out : 是否输出内存性能分析文件 -blockprofile block.out : 是否输出内部goroutine阻塞的性能分析文件

-test.blockprofile block.out : 是否输出内部goroutine阻塞的性能分析文件 -test.blockprofilerate n: 基本同上，控制的是goroutine阻塞时候打点的纳秒数。默认不设置就相当于-test.blockprofilerate=1，每一纳秒都打点记录一下 -test.parallel n : 性能测试的程序并行cpu数，默认等于GOMAXPROCS。 -test.cpu 1,2,4 : 程序运行在哪些CPU上面，使用二进制的1所在位代表，和nginx的nginx_worker_cpu_affinity是一个道理 -test.short : 将那些运行时间较长的测试用例运行时间缩短

```go
//内存分配
func heap() []byte {
	return make([]byte, 1024*10)
}

//go test -bench Heap -benchmem
func BenchmarkHeap(b *testing.B) {
	b.ReportAllocs() //输出内存分配信息，无论是否使用-benchmem
	for i := 0; i < b.N; i++ {
		_ = heap()
	}
}

代码覆盖率

go test -cover

分析

引发性能问题的如执行时间过长，内存占用过多，意外的阻塞。
交互模式查看内存采样数据
- flat 仅当前函数，不包括他调用的其他函数
- sum 列表前几行所占百分比总和
- cum 当前函数调用堆栈累积

$ go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile profile.out
$ go tool pprof memprofile.out
输入top10
(pprof) top10
Showing nodes accounting for 5844.77MB, 99.95% of 5847.79MB total
Dropped 23 nodes (cum <= 29.24MB)
Showing top 10 nodes out of 22
      flat  flat%   sum%        cum   cum%
 5184.02MB 88.65% 88.65%  5184.02MB 88.65%  go-sword-202001/practice.strJoin2
  242.50MB  4.15% 92.80%   242.50MB  4.15%  go-sword-202001/practice.rset1
  132.13MB  2.26% 95.06%   132.13MB  2.26%  bytes.makeSlice
  119.12MB  2.04% 97.09%   119.12MB  2.04%  strings.(*Builder).grow
  106.50MB  1.82% 98.91%   106.50MB  1.82%  reflect.(*structType).Field
   60.50MB  1.03% 99.95%      167MB  2.86%  go-sword-202001/practice.rset
         0     0% 99.95%   132.13MB  2.26%  bytes.(*Buffer).Grow
         0     0% 99.95%   132.13MB  2.26%  bytes.(*Buffer).grow
         0     0% 99.95%   132.13MB  2.26%  go-sword-202001/practice.BenchmarkBufferWrite
         0     0% 99.95%      167MB  2.86%  go-sword-202001/practice.BenchmarkRSet

peek malg 列出调用来源
list malg 列出源码行统计样式，一遍直观定位
web，``web malg` 输出svg图形，需要安装Graphviz