nginx和php响应时间过长502排查

说明:
web页面响应时间长,要怎么排查?
9
因为是lnmp系统,可以通过设置nginx日志,记录nginx处理请求的时间、开启php慢执行来排查

解决:
一、修改nginx.conf日志格式,记录nginx响应时间
# vim /usr/local/nginx/conf/nginx.conf //添加$request_time $upstream_response_time参数

log_format  access  ‘$remote_addr – $remote_user [$time_local] “$request” ‘
‘$request_time $upstream_response_time ‘
‘$status $body_bytes_sent “$http_referer” ‘
‘”$http_user_agent” $http_x_forwarded_for’;

$request_time: request processing time in seconds with a milliseconds resolution;time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client$request_time。nginx处理请求的时间,指的就是从接受用户请求数据到发送完回复数据的时间。

$upstream_response_time: keeps servers response times in seconds with a milliseconds resolution. Several responses are also separated by commas and colons. $upstream_response_timephp-cgi的响应时间,说的有点模糊,它指的是从Nginx向后端建立连接开始,到接受完数据然后关闭连接为止的时间。因为会有重试,它可能有多个时间段。一般来说,$upstream_response_time 会比$request_time时间短。(其实也可以加上upstream_status的状态返回值)

截取部份日志,可以看到客户端获取test.php。nginx共花费了5.308s(这个时间包括了php后端处理的时间),php后端处理也花费了5.308s。这说明响应慢很有可能是因为php程序的原因。接下就就是排查php
$request_time时间比$upstream_response_time长,这有可能是因为web页面通过post上传较大的数据,nginx一直在接收数据。

8.8.8.8 – – [17/Jan/2014:16:20:51 +0800] “GET /index.php HTTP/1.1” 5.308 5.308 200 6364 “http://coolnull.com/test.php” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)” –

二、查看php-fpm慢查询日志。php-fpm提供了慢执行日志,可以将执行比较慢的脚本的调用过程dump到日志中。
配置比较简单,PHP 5.3.3 之前设置如下:

The timeout (in seconds) for serving of single request after which a php backtrace will be dumped to slow.log file
‘0s’ means ‘off’
<value name=”request_slowlog_timeout”>1s</value>

The log file for slow requests
<value name=”slowlog”>logs/slow.logs</value>

PHP 5.3.3 之后设置以下如下:

; The log file for slow requests
; Default Value: not set
; Note: slowlog is mandatory if request_slowlog_timeout is set
;slowlog = log/$pool.log.slow
slowlog = /data/logs/php/php-fpm.log.slow

; The timeout for serving a single request after which a PHP backtrace will be
; dumped to the ‘slowlog’ file. A value of ‘0s’ means ‘off’.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
;request_slowlog_timeout = 0
request_slowlog_timeout = 3s

还可以将执行时间太长的进程直接终止,设置下执行超时时间即可。

PHP 5.3.3 之前版本:
The timeout (in seconds) for serving a single request after which the worker process will be terminated
Should be used when ‘max_execution_time’ ini option does not stop script execution for some reason
‘0s’ means ‘off’
<value name=”request_terminate_timeout”>10s</value>

PHP 5.3.3 之后设置以下如下:

;The timeout for serving a single request after which the worker process will ; be killed.This option should be used when the ‘max_execution_time’ ini option ; does not stop script execution for some reason. A value of ‘0’ means ‘off’.;Available units: s(econds)(default), m(inutes), h(ours),or d(ays);DefaultValue:0 request_terminate_timeout =10s

加上慢执行日志后,基本可以从慢执行日志中看出问题所在,比如:

Feb 07 19:00:30.378095 pid 7012 (pool default)
script_filename = /www/test.php
[0x000000000115ea08] flock() /www/logs.php:26
[0x0000000001159810] lock_stats() /www/test.php:12

Feb 07 19:00:31.033073 pid 7038 (pool default)
script_filename = /www/test.php
[0x00000000012686e8] flock() /www/logs.php:26
[0x00000000012634f0] lock_stats() /www/test.php:12

很明显是程序中产生了死锁,导致各个 PHP-CGI 进程互相等待资源而锁死。据此,再进行进一步的程序分析,就更具方向性了。有时候php-fpm慢执行日志只会给出执行长的php程序,具体还得程序那边配合排查。

此条目发表在开源代码分类目录,贴了, 标签。将固定链接加入收藏夹。

发表回复