Our database here runs a nightly job starting at 19:30 UTC: data older than 7 days is moved from the MHAPP tablespace into the MHHISTORY tablespace, data older than 90 days is deleted from MHHISTORY, and yesterday's data is exported to a local directory.
Every evening starting at 19:30, lock escalation occurs in the database. The lock-related database parameters are all defaults, as follows:
$ db2 get db cfg|grep 'LOCK'
Max storage for lock list (4KB) (LOCKLIST) = AUTOMATIC(777376)
Percent. of lock lists per application (MAXLOCKS) = AUTOMATIC(66)
Lock timeout (sec) (LOCKTIMEOUT) = -1
Block non logged operations (BLOCKNONLOGGED) = NO
Lock timeout events (MON_LOCKTIMEOUT) = NONE
Deadlock events (MON_DEADLOCK) = WITHOUT_HIST
Lock wait events (MON_LOCKWAIT) = NONE
$ db2 get db cfg|grep MAXLOCKS
Percent. of lock lists per application (MAXLOCKS) = AUTOMATIC(66)
$ db2 get db cfg|grep MAX
Percent. of lock lists per application (MAXLOCKS) = AUTOMATIC(66)
Max number of active applications (MAXAPPLS) = AUTOMATIC(323)
Max DB files open per application (MAXFILOP) = 61440
Percent max primary log space by transaction (MAX_LOG) = 0
After lock escalation starts, the number of locks held and the target number of locks always stay at a 50% ratio, and the number of escalated locks is always 6 fewer than the number of locks held. What is the reason for this?
2018-07-27-19.33.13.266796 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:1 Database:ZBBB
ADM5501I DB2 is performing lock escalation. The affected application is named
"db2jcc_application", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "172.20.2.2.34731.180815082849" at
member "0". The total number of locks currently held is "10098408", and the
target number of locks to hold is "5049204". The current statement being
executed is "delete from (select reftime,row_number() over(order by reftime) rn
from MHHISTORY.BULL01_ALL) where reftime<'2018-4-28 00:00:00 ' and
rn<=20000". Reason code: "1"
^^
2018-07-27-19.33.13.279160 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:3 Database:ZBBB
ADM5502W The escalation of "10098402" locks on table "MHHISTORY.BULL01_ALL" to
lock intent "X" was successful.
^^
2018-07-27-19.33.57.632442 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:1 Database:ZBBB
ADM5501I DB2 is performing lock escalation. The affected application is named
"db2jcc_application", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "172.20.2.2.34731.180815082849" at
member "0". The total number of locks currently held is "9893767", and the
target number of locks to hold is "4946883". The current statement being
executed is "delete from (select reftime,row_number() over(order by reftime) rn
from MHHISTORY.BULL01_ALL) where reftime<'2018-4-28 00:00:00 ' and
rn<=20000". Reason code: "1"
^^
2018-07-27-19.33.57.632997 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:3 Database:ZBBB
ADM5502W The escalation of "9893761" locks on table "MHHISTORY.BULL01_ALL" to
lock intent "X" was successful.
^^
2018-07-27-19.37.46.083092 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:30034 Appid:172.20.2.2.35793.180815084544
data management sqldEscalateLocks Probe:1 Database:ZBBB
First, the principle of lock escalation in DB2. The locks acquired first are normally row locks, one per row, and each lock consumes a certain amount of memory (about 128 bytes). When a table has many rows, the memory used by these locks keeps growing; once a threshold is reached, because of the memory limit, DB2 converts the row locks into table locks. Only one table lock (or a few, depending on how many tables the application touches) is needed, and table locks take up far less lock-list memory than the row locks they replace. That is the concept of lock escalation.
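To put rough numbers on this, a sketch only, assuming the 128-byte figure above and the LOCKLIST value from the db cfg output shown earlier (actual DB2 lock entries are 128 or 256 bytes depending on the lock):

```python
# Back-of-the-envelope arithmetic for the lock list in this database.
# Assumptions: 128 bytes per lock (from the explanation above) and
# LOCKLIST = AUTOMATIC(777376) 4 KB pages (from the db cfg output).

LOCKLIST_PAGES = 777376          # 4 KB pages in the lock list
LOCK_BYTES = 128                 # approximate memory per lock entry

locklist_bytes = LOCKLIST_PAGES * 4096
capacity = locklist_bytes // LOCK_BYTES   # roughly 24.9 million lock entries

# When escalation fires, the ADM5501I message reports a target of about
# half the locks currently held:
held = 10098408                  # "total number of locks currently held"
target = held // 2               # matches the "target number of locks" in the log
```

The halving is exactly what the two ADM5501I messages show: 10098408 held with a target of 5049204, then 9893767 held with a target of 4946883. Escalation brings the application's lock count down to roughly 50% of what it was holding.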
Now to your situation. From the log, the DELETE statement needs so many locks that it triggers escalation. Escalation itself has some performance cost, but it is small and not worth much attention. The real problem is the loss of concurrency afterwards: once row locks are escalated to a table lock, other applications easily end up in lock waits, and if a wait exceeds the time defined by LOCKTIMEOUT, a lock timeout occurs (note that with LOCKTIMEOUT = -1, as configured here, applications instead wait indefinitely). For a highly concurrent database this is quite deadly.
A few suggestions to solve this problem:
1. Split the DELETE by condition; that is, do not delete a large amount of data in a single statement. Split it by time range into 5 or 10 runs, committing in between, so that each unit of work holds far fewer locks and escalation is much less likely.
2. Use an appropriate isolation level in the statements; UR (uncommitted read) is generally recommended.
3. Use the LOCK TABLE command to acquire the table lock up front, then run the DELETE. Since the table lock is already held, no lock escalation occurs. Be aware, however, that concurrency becomes very poor: any other connection that needs the table must wait until the connection holding the table lock finishes and commits, releasing the lock. So use this method with caution.
The recommendation here is therefore to combine methods 1 and 2.
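A minimal sketch of suggestion 1, the batched delete with intermediate commits. It uses Python's sqlite3 as a stand-in so it runs anywhere; against DB2 you would issue the same pattern (for example the `rn <= N` window form from the log, or a fetch-first form) through the CLP or JDBC, committing after each batch. The table and column names follow the log; the batch size, cutoff, and row counts are made up for the demo:

```python
import sqlite3

# Demo of deleting old rows in small batches, committing between batches so
# each transaction holds only a small number of row locks. sqlite3 is a
# stand-in for DB2 here; the pattern is what matters, not the engine.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bull01_all (reftime TEXT, payload TEXT)")
# 30 days of data, 100 rows per day
conn.executemany(
    "INSERT INTO bull01_all VALUES (?, ?)",
    [(f"2018-04-{d:02d} 00:00:00", "x") for d in range(1, 31) for _ in range(100)],
)
conn.commit()

CUTOFF = "2018-04-28 00:00:00"   # delete rows older than this
BATCH = 500                       # keep each transaction's lock count small

deleted = 0
while True:
    cur = conn.execute(
        "DELETE FROM bull01_all WHERE rowid IN "
        "(SELECT rowid FROM bull01_all WHERE reftime < ? LIMIT ?)",
        (CUTOFF, BATCH),
    )
    conn.commit()                 # releases the locks held by this batch
    deleted += cur.rowcount
    if cur.rowcount < BATCH:      # last, partial batch: nothing more to delete
        break
```

Because each transaction touches at most BATCH rows, its lock count stays far below the escalation threshold, and all locks are released at every commit, so concurrent access to the table is blocked only briefly.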