25 调试

[TOC]

调试

Go提供的调试工具有测试用例、特征分析、事件追踪。
测试用例 testing 是Go标准库提供的自动化测试框架，基本测试用例、性能测试用例等。
特征分析 pprof 是Go标准库提供的样本采集和样本分析工具，采集分析一段时间内的CPU占用、内存分配、协程栈等信息。
事件追踪 trace 是Go标准库提供的事件追踪工具，可以追踪协程的创建开始和结束、协程的堵塞（系统调用、通道、锁）、网络I/O、系统调用、垃圾回收等。

测试用例 go test

testing 是Go标准库提供的自动化测试框架。通过 go test 命令，能够自动执行4种形式的函数：
TestXxxx(t *testing.T) 基本测试用例。
BenchmarkXxxx(b *testing.B) 性能测试用例。
Example_Xxx() 控制台输出的测试用例。
TestMain(m *testing.M) 测试用例的 Main 函数，可以自定义一些实用的功能。
最常用的是基本测试用例 testing.T 和性能测试用例 testing.B。
除了标准库外还有官方模拟框架mock 【https://github.com/golang/mock】，可以与 testing 集成。go mock 可以模拟依赖项，简化测试。

运行测试命令
go test / 用来运行某个 package 内的所有测试用例。
运行当前 package 内的用例：go test packageName 或 go test .。
递归测试当前目录下的所有的 package：go test ./...。
go test 命令默认不运行 benchmark 用例，运行 benchmark 用例要加上 -bench 参数。

testing 的变量
test.short : 一个快速测试的标记，在测试用例中可以使用 testing.Short() 来绕开一些测试
test.outputdir : 输出目录
test.coverprofile : 测试覆盖率参数，指定输出文件
test.run : 指定正则来运行某个或某些测试用例
test.memprofile : 内存分析参数，指定输出文件
test.memprofilerate : 内存分析参数，内存分析的抽样率
test.cpuprofile : cpu 分析输出参数，为空则不做 cpu 分析
test.blockprofile : 阻塞事件的分析参数，指定输出文件
test.blockprofilerate : 阻塞事件的分析参数，指定抽样频率
test.timeout : 超时时间
test.cpu : 指定 cpu 数量
test.parallel : 指定运行测试用例的并行数

testing 的结构体
M : main 测试使用的结构体
PB : Parallel benchmarks 并行测试使用的结构体
T : 普通测试用例
TB : 测试用例的接口
B : 压力测试
BenchmarkResult : 压力测试结果
Cover : 代码覆盖率相关结构体
CoverBlock : 代码覆盖率相关结构体
InternalBenchmark : 内部使用的结构体
InternalExample : 内部使用的结构体
InternalTest : 内部使用的结构体

common 常用方法

common 结构体绑定了一些通用方法
testing.T（基本测试用例）和 testing.B（性能测试用例）都内嵌了common，因此拥有了common的方法

type common struct {
	mu          sync.RWMutex         // guards this group of fields
	output      []byte               // Output generated by test or benchmark.
    ...
}
type B struct {
	common
	importPath       string // import path of the package containing the 
	...
}
type T struct {
	common
	isParallel bool
	context    *testContext // For running tests and subtests.
}

打印信息

    t.Log() // 输出信息
	t.Logf(format)    // 输出有 format 的信息

断言错误时，判断这个测试用例失败

    t.Fail()    // case 失败，测试用例继续
	t.FailNow() // case 失败，测试用例中断

跳过一个错误，但是不标示测试用例失败

    t.SkipNow() // case 跳过，测试用例不继续

跳过用例，并打印信息

	t.Skip()    // Log + SkipNow
	t.Skipf(format)   // Skipf : Logf + SkipNow

断言失败时，测试用例失败，打印必要的信息，但测试用例继续

	t.Error()   //  Logf + Fail
	t.Errorf(format)    //  Logf + Fail

断言失败时，测试用例失败，打印出必要的信息，测试用例中断

	t.Fatal()   // Log + FailNow
	t.Fatalf(format)    // Logf + FailNow

基本测试用例 testing.T

测试文件
文件名以_test.go结尾的，执行go test的时候才会执行到相应的代码。
import testing 这个包。
测试用例函数以Test(大写字母)开头。
测试用例会按照源代码中写的顺序依次执行。
测试函数TestX()的参数是testing.T，使用该类型来记录错误或者是测试状态。
函数中通过调用testing.T的Error、Errorf、FailNow、Fatal、 FatalIf方法，说明测试不通过，调用Log方法用来记录测试的信息。

被测试的函数

package util

import (
	"crypto/md5"
	"fmt"
	"io"
	"log"
)

//	MD5加密
//	plaintext string	明文
//	string	密文，32个字符16进制string
//	error 不为nil加密失败
func MD5(plaintext string) (string,error){
	m := md5.New()
	_, err := io.WriteString(m, plaintext)
	if err != nil {
		log.Println(err)
		return "",err
	}
	arr := m.Sum(nil)
	return fmt.Sprintf("%x", arr),nil
}

//	两次MD5加密
//	plaintext string	明文
//	string	密文，32个字符16进制string
//	error 不为nil加密失败
func MD5Double(plaintext string) (string,error){
	return MD5(plaintext)
}

测试文件 md5util_test.go

package test

import (
	"demo2/util"
	"testing"
)

func TestMD5(t *testing.T){
	pwd,err := util.MD5("password")
	if nil != err {
		t.Error(err)
		return
	}
	t.Log(pwd)
}

func TestMD5Double(t *testing.T){
	pwd,err := util.MD5Double("password")
	if nil != err {
		t.Error(err)
		return
	}
	t.Log(pwd)
}

测试命令 go test
运行目录下所有测试文件 go test -v

E:\c\go\goworkgit\demo2\test>go test -v
=== RUN   TestMD5
    md5util_test.go:14: 5f4dcc3b5aa765d61d8327deb882cf99
--- PASS: TestMD5 (0.00s)
=== RUN   TestMD5Double
    md5util_test.go:23: 5f4dcc3b5aa765d61d8327deb882cf99
--- PASS: TestMD5Double (0.00s)
PASS
ok      demo2/test      0.229s

性能测试用例 testing.B

性能测试 testing.B 结构的主要成员

    common： 内嵌的testing.common。
    N：每个测试中用户代码执行次数。
    benchFunc：测试函数。
    benchTime：性能测试最少执行时间，默认为1s，可以通过能数-benchtime 2s指定。
    bytes：每次迭代处理的字节数。
    timerOn：计时启动标志，默认为false，启动计时为true。
    startAllocs：测试启动时记录堆中分配的对象数。
    startBytes：测试启动时记录堆中分配的字节数。
    netAllocs：测试结束后记录堆中新增加的对象数，公式：结束时堆中分配的对象数。
    netBytes：测试对事后记录堆中新增加的字节数。

性能测试 testing.B 结构的主要方法

func (b *B) StartTimer()    // 启动计时器，一般由testing自动调用，也可以用于跳过某些不需要计入测试结果的过程StopTimer后手动启动及时。
func (b *B) StopTimer()     // 停止计时器，一般由testing自动调用，也可以用于跳过某些不计入测试结果的过程。
func (b *B) ResetTimer()    // 重置计时器并且清空分配对象数和内存，测试前工作完成后可调用重置
func (b *B) ReportAllocs() // 打印内存信息

testing.B.N 自动调整
性能测试会至少执行b.benchTime长的秒数，默认为1s。
先执行一遍测试执行一次花费的时长，如果时间较短，那么b.N值要足够大才可以测得更精确，如果时间较长，b.N值相应的会减少，否则会影响测试效率。
最终的b.N会被定格在某个10的指数级，是为了方便阅读测试报告。
通过 go test -bench=. -benchtime=5s 增加 -benchtime 参数增加测试时长
通过 go test -bench=. -count=100 增加 -count 参数增加测试次数（整体次数）

benchmarkExample.go 被测试函数

package test2

func Prime(max int) []int {
	list := *new([]int)
	for i:=2;i<=max;i++ {
		flag := true
		for j:=2;j<i;j++ {
			if i%j == 0{
				flag = false
				break
			}
		}
		if flag {
			list = append(list,i)
		}
	}
	return list
}

func Prime2(max int) []int {
	list := *new([]int)
	for i:=2;i<=max;i++ {
		flag := true
		for j:=2;j<i;j++ {
			if i%j == 0{
				flag = false
				//break	注释了break会更加耗时
			}
		}
		if flag {
			list = append(list,i)
		}
	}
	return list
}

测试函数 benchmarkExample_test.go

package test2

import (
	"testing"
)

func Benchmark_Prime(b *testing.B){
	b.ReportAllocs()	// 报告内存信息
	for i:=1;i<=b.N;i++{
		Prime(i * 100)
	}
}

func Benchmark_Prime2(b *testing.B){
	for i:=1;i<=b.N;i++{
		Prime2(i * 100)
	}
}

go test -bench=. 测试结果

E:\c\go\goworkgit\demo2\test2>go test -bench=.
goos: windows
goarch: amd64
pkg: demo2/test2
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
Benchmark_Prime-16           225          23621860 ns/op           37590 B/op         12 allocs/op
Benchmark_Prime2-16          100          46308958 ns/op
PASS
ok      demo2/test2     10.682s

测试函数

运行次数

平均每次耗时(纳秒)

每次分配的字节数

每个操作分配

Benchmark_Prime-16

216

22351284 ns/op

37590 B/op

12 allocs/op

Benchmark_Prime2-16

100

44229132 ns/op

testing.M 定义 before 和 after 函数

测试用例运行之前、之后的相同逻辑可以封装到不同函数中使用TestMain来组织调用顺序。

package test4

import (
	"fmt"
	"os"
	"testing"
)


var mp map[string]int

//	before 在测试用例之前执行
func before(){
	fmt.Println("before 初始化数据")
	mp = make(map[string]int)
}

//	after 在测试用例之后执行
func after(){
	fmt.Println("after 释放资源")
	mp = nil
}

func TestA(t *testing.T){
	t.Log("执行 TestA")
	mp["a"] = 9999
}

func TestB(t *testing.T){
	t.Logf("执行 TestB:%d",mp["a"])
}

func TestMain(m *testing.M) {
	before()
	code := m.Run()	// 执行测试用例
	after()
	os.Exit(code)
}

E:\c\go\goworkgit\demo2\test4>go test -v
before 初始化数据
=== RUN   TestA
    MainExample_test.go:25: 执行 TestA
--- PASS: TestA (0.00s)
=== RUN   TestB
    MainExample_test.go:30: 执行 TestB:9999
--- PASS: TestB (0.00s)
PASS
after 释放资源
ok      demo2/test4     0.211s

子测试

testing.T 和 testing.B 在测试函数中可以通过 .Run() 函数运行子测试，只是并不常用。

package test5

import "testing"

func TestSubtests(t *testing.T){
	t.Run("sub1", func(t *testing.T) {
		t.Log("子测试1")
	})
	t.Run("sub2",func(t *testing.T){
		t.Log("子测试2")
	})
}

特征分析 pprof

pprof 分为采样和分析两个阶段，采样是获取一段时间内的样本数据，获取样本后分析数据样本。
通过pprof对指标或特征进行分析，可以通过分析查找程序中内存泄露、协程泄露等错误，也能查看程序运行时资源的利用率。
Go标准库提供了 net/http/pprof 和 runtime/pprof 获取样本特征数据。

pprof 分析
pprof 样本数据都会以 Protocol Buffers 格式序列化数据并通过gzip写入文件。
通过 go tool pprof 工具对样本进行分析，先对样本文件解码还原为 Protocol Buffers 格式。
Profile 是一系列样本集合，包含样本类型、样本数组、函数、行号、文件名等。

获取样本 net/http/pprof

服务型应用 net/http/pprof 获取样本
通过 net/http/pprof 获取样本要引入【_ "net/http/pprof"】，net/http/pprof 包的 init 函数会注册路由。

net/http/pprof 源码，init()函数中注册了获取样本的路由

package pprof

...

func init() {
	http.HandleFunc("/debug/pprof/", Index)
	http.HandleFunc("/debug/pprof/cmdline", Cmdline)
	http.HandleFunc("/debug/pprof/profile", Profile)
	http.HandleFunc("/debug/pprof/symbol", Symbol)
	http.HandleFunc("/debug/pprof/trace", Trace)
}

web服务器导入【_ "net/http/pprof"】通过http获取样本
test1() 和 test2() 函数轮流打印输出，test1() 还会创建一个切片并对元素进行排序，增加样本的数据

package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"net/http"
	_ "net/http/pprof"
	"time"
)

func main() {
	chTest1 := make(chan int,1)
	chTest2 := make(chan int,1)
	go test1(chTest1,chTest2)
	go test2(chTest1,chTest2)
	chTest1 <- 1
	http.ListenAndServe(":8081",nil)
}

func test1(ch1 <- chan int,ch2 chan <- int){
	for {
		<- ch1
		fmt.Println("test1 begin")
		sortSList(generateList())
		fmt.Println("test1 end")
		ch2 <- 1
	}
}

func test2(ch1 chan <- int,ch2 <- chan int){
	for {
		<- ch2
		time.Sleep(1 * time.Second)
		fmt.Println("test2 .......")
		ch1 <- 1
	}
}

func generateList() []int64{
	list := *new([]int64)
	for i:=0;i<100000;i++{
		result,_ := rand.Int(rand.Reader, big.NewInt(1000))
		list = append(list,result.Int64())
	}
	return list
}

func sortSList(list []int64) []int64{
	for i:=0;i<len(list);i++ {
		for j:=0;j<len(list);j++ {
			if list[i] < list[j] {
				list[i],list[j] = list[j],list[i]
			}
		}
	}
	return list
}

访问服务器获得分析样本
http://localhost:8081/debug/pprof/
allocs：所有过去内存分配的样本。
block：导致同步原语阻塞的堆栈跟踪。
cmdline：当前程序的命令行调用。
goroutine：所有当前 goroutine 的堆栈跟踪，堆：活动对象的内存分配样本，增加 gc 参数可以在获取堆样本之前运行 GC；互斥锁：竞争互斥锁持有者的堆栈跟踪；配置文件：CPU 配置文件，增加 seconds 参数中指定持续时间。
threadcreate：操作系统线程的堆栈跟踪
trace：当前程序执行的轨迹。增加 seconds 参数指定持续时间。
mutex：主要用于堵塞分析，分析互斥锁带来的休眠时间等。

获取样本

获取程序在60秒内占用CPU的样本
http://localhost:8081/debug/pprof/profile?seconds=60

获取程序所有协程堆栈的样本
http://localhost:8081/debug/pprof/goroutine

获取程序堆内存的样本
http://localhost:8081/debug/pprof/heap

获取样本 runtime/pprof

工具型应用使用 runtime/pprof 应用退出的时候把 profiling 的样本保存到文件

package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"os"
	"runtime/pprof"
)

func main() {
	//	输出 cpu样本文件
	cpuFile, _ := os.OpenFile("cpu.pprof", os.O_CREATE|os.O_RDWR, 0644)
	//	输出 内存样本文件
	heapFile, _ := os.OpenFile("heap.pprof", os.O_CREATE|os.O_RDWR, 0644)
	defer func(){
		pprof.StopCPUProfile()
		cpuFile.Close()
		heapFile.Close()
	}()
	pprof.StartCPUProfile(cpuFile)
	pprof.WriteHeapProfile(heapFile)
	// 生成 200000 个元素切片排序
	list := *new([]int64)
	for i:=0;i<200000;i++{
		result,_ := rand.Int(rand.Reader, big.NewInt(1000))
		list = append(list,result.Int64())
	}

	for i:=0;i<len(list);i++ {
		for j:=0;j<len(list);j++ {
			if list[i] < list[j] {
				list[i],list[j] = list[j],list[i]
			}
		}
	}
	fmt.Println(list)
}

命令行分析数据 go tool pprof

分析数据-命令交互查看

go tool pprof 常用命令，命令调用形式 go tool pprof -命令

命令

说明

help

查看支持的命令

top

输出函数调用条目，-cum更改为以cum排序(默认flat排序)，加数字来更改显示条数。

list

输出正则匹配指定名称的函数源码，带注释。

peek

展示匹配正则名称的函数调用和被调用的信息。

traces

输出采样中所有函数的调用顺序。

web

浏览器输出可视化的调用关系，可通过正则匹配或正则忽略关键字。

tree

与peek类似，但是输出全部的信息。

pdf

生成pdf文件报告

    C:\Users\laoniqiu>go tool pprof http://localhost:8081/debug/pprof/profile?seconds=60
    Fetching profile over HTTP from http://localhost:8081/debug/pprof/profile?seconds=60
    Saved profile in C:\Users\laoniqiu\pprof\pprof.samples.cpu.001.pb.gz
    Type: cpu
    Time: Feb 6, 2022 at 9:16pm (CST)
    Duration: 1mins, Total samples = 230ms ( 0.38%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof)
    
    
    C:\Users\laoniqiu>go tool pprof -top E:\c\go\goworkgit\demo2\cpu.pprof
    Type: cpu
    Time: Feb 7, 2022 at 11:46pm (CST)
    Duration: 25.76s, Total samples = 17.66s (68.57%)
    Showing nodes accounting for 17.16s, 97.17% of 17.66s total
    Dropped 54 nodes (cum <= 0.09s)
      flat  flat%   sum%        cum   cum%
    16.51s 93.49% 93.49%     16.57s 93.83%  main.main
     0.29s  1.64% 95.13%      0.29s  1.64%  runtime.stdcall1
     0.19s  1.08% 96.21%      0.19s  1.08%  runtime.stdcall3
     0.08s  0.45% 96.66%      0.43s  2.43%  runtime.findrunnable
     0.03s  0.17% 96.83%      0.14s  0.79%  runtime.checkTimers
     0.03s  0.17% 97.00%      0.67s  3.79%  runtime.schedule
     0.02s  0.11% 97.11%      0.23s  1.30%  runtime.wakep
     0.01s 0.057% 97.17%      0.12s  0.68%  runtime.modtimer
         0     0% 97.17%      0.21s  1.19%  runtime.(*pageAlloc).scavenge

图表分析和火焰图分析 go tool pprof

分析数据-图形化界面
下载安装 graphviz 可视化工具

    下载 http://www.graphviz.org/download/
    安装 graphviz

通过浏览器访问网页查看分析的数据。

    go tool pprof -http=:端口 【文件地址】
    使用
    go tool pprof -http=:27780 C:\Users\laoniqiu\Downloads\goroutine
    或者
    go tool pprof -http=:27780 http://localhost:8081/debug/pprof/goroutine
    查看分析访问 
    http://localhost:27780/ui/

图表分析 Graph
访问样本，由于 seconds=30 需要等待30秒的样本
go tool pprof -http=:27780 http://localhost:8081/debug/pprof/profile?seconds=30

样本 Graph

节点字体大小
大字体当前值较大。
小字体当前值较小。

边框颜色
红色：值较大。
黄色：值较小。
灰色：值接近0。

箭头大小
消耗资源越多箭头越粗。
消耗资源越小箭头越细。

箭头线型
虚线箭头：两个节点之间的某些节点已被忽略，为间接调用。
实线箭头：两个节点之间为直接调用。

火焰图分析 Flame Graph
go tool pprof -http=:27780 http://localhost:8081/debug/pprof/profile?seconds=30
Go 1.11 后 pprof分析工具中内置了火焰图，主要用于分析堆内存和CPU的使用情况。
火焰图形状和颜色像火焰，主要用于分析软件的特征和性能。火焰图可以快速识别出最频繁使用的代码路径，分析出程序的瓶颈所在。

样本 Flame Graph

root 是整个程序的开始，其它的框都是一个函数。
火焰图每一层的函数都是平级的，下层函数是其对应的上层函数的子函数。
函数调用栈越长火焰图就越高。
框越长颜色越深，占用CPU时间越长。
在 main.sortSList 那行左边的小格打开是 main.generateList，main.generateList 的 CPU 占用时间（0.32%,0.06s）极短所以占了很小的格子。

main.generateList

内存分析

allocs 分配内存样本，heap 堆内存样本
go tool pprof -http=:27780 http://localhost:8081/debug/pprof/allocs
go tool pprof -http=:27780 http://localhost:8081/debug/pprof/heap

内存分析四种类型参数 http://localhost:27780/ui/?si=参数【如：alloc_objects】
alloc_objects 已经被分配的对象，不考虑对象释放情况。
alloc_space 分配的空间
inuse_objects 正在使用的对象数量【heap 没有这项数据】
inuse_space 在使用的空间【heap 没有这项数据】

http://localhost:27780/ui/?si=inuse_space 堆内存已使用空间样本，main.generateList 使用了869.29kB

堆内存 inuse_space Graph

内存采样
对堆内存采样不是记录每次 mallocgc 分配堆内存，只有多次分配内存积累到指标 MemProfileRate 以上时才会记录。
记录下来的样本都是一个bucket，bucket会存储到全局mbuckets链表中，mbuckets链表中的对象加入了span中的special序列不会被GC扫描。
借助栈追踪，bucket 中保留了当前分配的内存大小、触发内存分配的函数和函数调用链。栈追踪后发现当前样本有相同的调用链就不会重复记录，只会更新bucket记录的内存大小。

CPU 占用分析

pprof分析 CPU 占用情况可以在不破坏原始程序的情况下估计出函数执行时间。
pprof 的 CPU 样本采集是通过调用操作系统借助程序中断的功能实现的，因此不需要修改原始程序依然能采集到样本。
go tool pprof -top http://localhost:8081/debug/pprof/profile?seconds=30

类型

描述

flat

函数占用CPU的耗时

flat%

函数占用CPU的耗时的百分比

sum%

top命令前一位的函数以及本函数flat%之和

cum

加上该函数调用之前的累计CPU耗时

cum%

加上该函数调用之前的累计CPU耗时的百分比

分析30秒的样本

C:\Users\laoniqiu>go tool pprof -top http://localhost:8081/debug/pprof/profile?seconds=30
Fetching profile over HTTP from http://localhost:8081/debug/pprof/profile?seconds=30
Saved profile in C:\Users\laoniqiu\pprof\pprof.samples.cpu.010.pb.gz
Type: cpu
Time: Feb 9, 2022 at 6:46pm (CST)
Duration: 30.01s, Total samples = 20.32s (67.70%)
Showing nodes accounting for 19.25s, 94.73% of 20.32s total
Dropped 91 nodes (cum <= 0.10s)
      flat  flat%   sum%        cum   cum%
    16.05s 78.99% 78.99%     16.10s 79.23%  main.sortSList (inline)
     1.17s  5.76% 84.74%      1.17s  5.76%  runtime.stdcall3
     0.57s  2.81% 87.55%      0.57s  2.81%  runtime.stdcall1
     0.31s  1.53% 89.07%      1.84s  9.06%  runtime.findrunnable
     0.21s  1.03% 90.11%      0.21s  1.03%  runtime.stdcall4
     0.15s  0.74% 90.85%      0.76s  3.74%  runtime.checkTimers
     0.14s  0.69% 91.54%      0.14s  0.69%  runtime.stdcall6
     0.12s  0.59% 92.13%      0.12s  0.59%  runtime.stdcall2
     0.11s  0.54% 92.67%      0.11s  0.54%  runtime.(*randomEnum).next (inline)
     0.09s  0.44% 93.11%      0.24s  1.18%  runtime.lock2
     0.07s  0.34% 93.45%      2.27s 11.17%  runtime.schedule
     0.04s   0.2% 93.65%      0.49s  2.41%  runtime.injectglist
     0.04s   0.2% 93.85%      0.71s  3.49%  runtime.startm
     0.03s  0.15% 94.00%      0.55s  2.71%  runtime.bgscavenge.func1
     0.03s  0.15% 94.14%      0.31s  1.53%  runtime.scavengeSleep
     0.03s  0.15% 94.29%      0.35s  1.72%  runtime.wakep
     0.01s 0.049% 94.34%      1.22s  6.00%  runtime.(*pageAlloc).scavengeRangeLocked
     0.01s 0.049% 94.39%      0.40s  1.97%  runtime.injectglist.func1 (inline)
     0.01s 0.049% 94.44%      0.15s  0.74%  runtime.netpoll
     0.01s 0.049% 94.49%      0.22s  1.08%  runtime.netpollBreak
     0.01s 0.049% 94.54%      2.30s 11.32%  runtime.park_m
     0.01s 0.049% 94.59%      0.59s  2.90%  runtime.runOneTimer
     0.01s 0.049% 94.64%      0.58s  2.85%  runtime.semawakeup
     0.01s 0.049% 94.69%      0.18s  0.89%  runtime.stopm
     0.01s 0.049% 94.73%      1.18s  5.81%  runtime.sysUnused
         0     0% 94.73%     16.22s 79.82%  main.test1
         0     0% 94.73%      1.32s  6.50%  runtime.(*pageAlloc).scavenge
         0     0% 94.73%      1.24s  6.10%  runtime.(*pageAlloc).scavengeOne
         0     0% 94.73%      0.48s  2.36%  runtime.bgscavenge
         0     0% 94.73%      1.34s  6.59%  runtime.bgscavenge.func2
         0     0% 94.73%      0.24s  1.18%  runtime.lock (inline)
         0     0% 94.73%      0.24s  1.18%  runtime.lockWithRank (inline)
         0     0% 94.73%      2.30s 11.32%  runtime.mcall
         0     0% 94.73%      0.23s  1.13%  runtime.modtimer
         0     0% 94.73%      1.17s  5.76%  runtime.mstart
         0     0% 94.73%      0.58s  2.85%  runtime.notewakeup
         0     0% 94.73%      0.34s  1.67%  runtime.resetspinning
         0     0% 94.73%      0.23s  1.13%  runtime.resettimer (inline)
         0     0% 94.73%      0.59s  2.90%  runtime.runtimer
         0     0% 94.73%      0.12s  0.59%  runtime.semasleep
         0     0% 94.73%      1.35s  6.64%  runtime.systemstack
         0     0% 94.73%      0.23s  1.13%  runtime.wakeNetPoller
         0     0% 94.73%      0.52s  2.56%  runtime.wakeScavenger
         0     0% 94.73%      0.21s  1.03%  time.resetTimer (inline)

协程栈分析

协程栈分析主要是查看线程的数量、检查协程是否泄露、查看协程在执行哪些函数，依据这些数据判断协程是否正常。
图形化分析 go tool pprof -http=:27780 http://localhost:8081/debug/pprof/goroutine
或者使用命令行分析 go tool pprof -tree http://localhost:8081/debug/pprof/goroutine
runtime.chanrecv 是协程在等待接收通道数据进入堵塞。
runtime.asyncPreempt 异步抢占正在执行的协程。
runtime.gopark 是协程的休眠函数，执行 test2() 函数的协程在获取样本时有可能是在执行time.Sleep(1 * time.Second) 进入休眠状态。

常用命令
go tool pprof -tree 查找函数调用链
go tool pprof -top 查看协程按状态数量排序

C:\Users\laoniqiu>go tool pprof -top http://localhost:8081/debug/pprof/goroutine
Fetching profile over HTTP from http://localhost:8081/debug/pprof/goroutine
Saved profile in C:\Users\laoniqiu\pprof\pprof.goroutine.010.pb.gz
Type: goroutine
Time: Feb 9, 2022 at 6:25pm (CST)
Showing nodes accounting for 8, 100% of 8 total
      flat  flat%   sum%        cum   cum%
         7 87.50% 87.50%          7 87.50%  runtime.gopark
         1 12.50%   100%          1 12.50%  runtime/pprof.runtime_goroutineProfileWithLabels
         0     0%   100%          3 37.50%  bufio.(*Reader).ReadLine
         0     0%   100%          3 37.50%  bufio.(*Reader).ReadSlice
         0     0%   100%          3 37.50%  bufio.(*Reader).fill
         0     0%   100%          1 12.50%  internal/poll.(*FD).Accept
         0     0%   100%          4 50.00%  internal/poll.(*FD).Read
         0     0%   100%          1 12.50%  internal/poll.(*FD).acceptOne
         0     0%   100%          5 62.50%  internal/poll.(*pollDesc).wait
         0     0%   100%          5 62.50%  internal/poll.execIO
         0     0%   100%          5 62.50%  internal/poll.runtime_pollWait
         0     0%   100%          1 12.50%  main.main
         0     0%   100%          1 12.50%  main.test1
         0     0%   100%          1 12.50%  main.test2
         0     0%   100%          1 12.50%  net.(*TCPListener).Accept
         0     0%   100%          1 12.50%  net.(*TCPListener).accept
         0     0%   100%          4 50.00%  net.(*conn).Read
         0     0%   100%          4 50.00%  net.(*netFD).Read
         0     0%   100%          1 12.50%  net.(*netFD).accept
         0     0%   100%          1 12.50%  net/http.(*ServeMux).ServeHTTP
         0     0%   100%          1 12.50%  net/http.(*Server).ListenAndServe
         0     0%   100%          1 12.50%  net/http.(*Server).Serve
         0     0%   100%          3 37.50%  net/http.(*conn).readRequest
         0     0%   100%          4 50.00%  net/http.(*conn).serve
         0     0%   100%          3 37.50%  net/http.(*connReader).Read
         0     0%   100%          1 12.50%  net/http.(*connReader).backgroundRead
         0     0%   100%          1 12.50%  net/http.HandlerFunc.ServeHTTP
         0     0%   100%          1 12.50%  net/http.ListenAndServe (inline)
         0     0%   100%          3 37.50%  net/http.readRequest
         0     0%   100%          1 12.50%  net/http.serverHandler.ServeHTTP
         0     0%   100%          1 12.50%  net/http/pprof.Index
         0     0%   100%          1 12.50%  net/http/pprof.handler.ServeHTTP
         0     0%   100%          3 37.50%  net/textproto.(*Reader).ReadLine (inline)
         0     0%   100%          3 37.50%  net/textproto.(*Reader).readLineSlice
         0     0%   100%          1 12.50%  runtime.chanrecv
         0     0%   100%          1 12.50%  runtime.chanrecv1
         0     0%   100%          1 12.50%  runtime.main
         0     0%   100%          5 62.50%  runtime.netpollblock
         0     0%   100%          1 12.50%  runtime/pprof.(*Profile).WriteTo
         0     0%   100%          1 12.50%  runtime/pprof.writeGoroutine
         0     0%   100%          1 12.50%  runtime/pprof.writeRuntimeProfile
         0     0%   100%          1 12.50%  time.Sleep

base 基准分析
-base 对比两个协程样本文件中的协程数量，排查协程是否泄露。
go tool pprof -base 样本文件基准文件

协程栈样本采集
协程栈样本数据关注的是当前协程数量，以及大部分协程在执行的函数，因此每次获取协程栈样本会启动STW获取当前所有协程的快照。

pprof协程栈样本收集流程

事件追踪 trace

trace 提供了追踪运行时的完整事件和宏观视野，但是查看协程内部函数占用的CPU时间、内存分配等信息还是要结合pprof使用。
pprof 实现样本采集内部是调用 trace 实现的，trace 在初始阶段需要 STW 然后获取协程的快照、状态、栈帧信息，接着设置trace.enable=true 开启 GC ，最后重新启动所有协程。
Go 在编译时源码中重要的事件都加入trace.enable 的判断是否追踪事件，开启trace时会触发traceEvent进行追踪记录事件。追踪记录的事件会使用LEB128编码缓存到每个逻辑处理器(p.tracebuf)中，当逻辑处理器缓存被存到上限就会把缓存的数据转移到全局trace缓存链表中，trace 工具会有一个专门用于读取全局trace缓存链表的数据，访问全局trace缓存链表是有互斥锁的，在转移逻辑处理器的缓存数据到全局trace链表时读取协程会进入堵塞，全局trace的数据会被序列化写到文件中，文件中存的数据是JSON格式的。
trace 追踪时限到后会结束 trace 任务，程序会再次进入STW状态，刷新逻辑处理器上的tracebuf缓存，设置trace.enable=false,完成了整个 trace 样本收集周期。

trace 追踪指定时间内程序发生的事件的完整信息
协程创建、开始和结束。
协程堵塞（系统调用、通道、锁）。
网络I/O事件。
系统调用事件。
垃圾回收。

trace 样本采集

trace 样本收集有 runtime/trace 和 net/http/pprof 两种方式

runtime/trace 采集
trace.Start(traceFile)
defer trace.Stop()

package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/trace"
	"strconv"
	"strings"
)

func main() {
	traceFile, _ := os.OpenFile("traceFile", os.O_CREATE|os.O_RDWR, 0644)
	trace.Start(traceFile)
	defer trace.Stop()
	ch := make(chan int,1)
	for i:=0;i<10;i++{
		go testA(ch)
	}
	for i:=0;i<3000;i++ {
		ch <- 1
	}
	close(ch)
}

func testA(ch <- chan int){
	id,_ := GetGoID()
	fmt.Printf("当前协程编号%d\n",id)
	for {
		if nil == ch {
			return
		}
		_,ok := <- ch
		if !ok {
			ch = nil
			return
		}
		buf := make([]byte,1024)
		buf[0] = 1
	}
}

//	GetGoID 读取当前协程编号
//	int 协程编号
//	error	不为nil则获取协程编号失败
func GetGoID() (int,error){
	var (
		buf [64]byte
		n = runtime.Stack(buf[:],false)
		stk = strings.TrimPrefix(string(buf[:n]),"goroutine")
	)
	idField := strings.Fields(stk)[0]
	id,err := strconv.Atoi(idField)
	if nil != err {
		return 0,err
	}
	return id,nil
}

获取到的协程编号，样本文件名称 traceFile。

当前协程编号11
当前协程编号9
当前协程编号8
当前协程编号7
当前协程编号10

net/http/pprof 采集
要引入 _ "net/http/pprof" 执行 init() 函数注册路由。
采集30秒数据 http://localhost:8081/debug/pprof/trace?seconds=30

trace 分析

分析工具 go tool trace 文件
go tool trace C:\Users\laoniqiu\Downloads\traceFile
会使用浏览器打开HTML页面

页面超链接
View trace ：查看跟踪
Goroutine analysis ： Goroutine 分析
Network blocking profile (⬇) ：网络阻塞配置文件( ⬇ )
Synchronization blocking profile (⬇) ：同步阻塞配置文件( ⬇ )
Syscall blocking profile (⬇) ： Syscall 阻塞配置文件( ⬇ )
Scheduler latency profile (⬇) ：调度程序延迟配置文件( ⬇ )
User-defined tasks ：用户定义的任务
User-defined regions ：用户定义的区域
Minimum mutator utilization ：最小 mutator 利用率

View trace ：查看跟踪；详解
第1行：时间线。
第2行 Goroutines：显示每个时间点正在运行的协程数量及等待调度的协程数量。存在大量等待调度的协程表明调度器繁忙。
第3行 Heap：执行期间的内存分配情况，检查内存泄露和每次运行时GC释放的内存非常有用。
第4行 Threads：正在使用的操作系统线程数和被系统阻止的线程数。
第5部分 Proc：逻辑处理器。
第6部分：对应第5部分的逻辑处理器下协程的信息，协程开始、结束的详细数据。

第6部分参数
Title：协程的名字，如 G11 main.testA 。
Start：协程开始时间。
Wall Duration：协程持续时间。
Start Stack Trace：协程开始时的栈追踪。
End Stack Trace：协程结束时的栈追踪。
Event：协程产生的事件信息，点击事件的 Link 能够得到更详细的图表。

View trace

trace 分析使用场景

协程延迟执行分析
程序中有重要协程长时间无法运行就会带来延迟问题，系统调用堵塞、通道堵塞、互斥锁堵塞、被GC堵塞或者调度器没有按照预期的频率运行等都可能造成协程执行延迟。
通过 trace 查看逻辑处理器的时间线查找重要协程长时间被阻塞的时间段，查看这个时间段内发生的事件，在大部分情况下都可以找到延迟执行的原因。

诊断不良并行
程序正常情况保持适当的协程数量和并行率，如果预期会使用 90% CPU 的运行速度比预期的慢，就有可能是因为程序没有按照预期并行，可以查找程序中的关键路径是否有并发，如果没有并发则查看是否可以让这些关键路径并发提高效率。

Previous24 垃圾回收 Next26 程序资源监控

Last updated 3 years ago

hashtag调试

hashtag测试用例 go test

hashtag特征分析 pprof

hashtag事件追踪 trace

调试

测试用例 go test

特征分析 pprof

事件追踪 trace