最近在看 SQL SERVER 2008 查询性能优化
,书中说当一个表创建了聚集索引,那么表中的行会按照主键索引的顺序物理排列,这里有一个关键词叫:物理排列
,如果不了解底层原理,真的会被忽悠过去,其实仔细想一想不可能实现严格的 物理排列
,那对性能是非常大的损害,本篇我们就从底层出发聊一聊到底是怎么回事。
如果用 C# 代码来演示严格的物理排列,大概是这样的。
static void Main(string[] args) { List<int> list = new List<int>() {1,2,4,5 }; list.Insert(2, 3); Console.WriteLine(string.Join(",", list)); }
从代码看我用 Insert
将 3 插入到了 list 集合中形成了物理有序,但不要忘了 Insert
的复杂度是 O(N),而且还要将 3 后面的数据整体挪动,可以参考源码中的 Array.Copy
方法。
public void Insert(int index, T item) { if (_size == _items.Length) { EnsureCapacity(_size + 1); } if (index < _size) { Array.Copy(_items, index, _items, index + 1, _size - index); } _items[index] = item; _size++; _version++; }
现在你可以想一想,如果我们每次在 Insert 的时候 SQLSERVER 都要将数据页上的数据往后挪,那这个性能有多差?
为了方便讲述,先创建一个测试表,插入 4 条记录,再创建一个聚集索引,sql 代码如下:
IF OBJECT_ID('t') IS NOT NULL DROP TABLE t; CREATE TABLE t (a CHAR(5), b INT) INSERT INTO t(a,b) VALUES('aaaaa',1); INSERT INTO t(a,b) VALUES('ddddd',4); INSERT INTO t(a,b) VALUES('ccccc',3); INSERT INTO t(a,b) VALUES('eeeee',5); CREATE CLUSTERED INDEX idx_a ON t(a);
数据果然是有序的,严格的按照 a , c, d , e
排序,接下来用 dbcc 观察下在底层数据页上这几条记录是不是物理有序的? 查询 SQL 如下:
DBCC TRACEON(3604) DBCC IND(MyTestDB,t,-1) DBCC PAGE(MyTestDB,1,472,2)
Page数据页的输出结果如下:
PAGE: (1:472) PAGE HEADER: Page @0x000002C6E75D0000 m_pageId = (1:472) m_headerVersion = 1 m_type = 1 m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x4 m_objId (AllocUnitId.idObj) = 269 m_indexId (AllocUnitId.idInd) = 256 Metadata: AllocUnitId = 72057594055557120 Metadata: PartitionId = 72057594048348160 Metadata: IndexId = 1 Metadata: ObjectId = 850102069 m_prevPage = (0:0) m_nextPage = (0:0) pminlen = 13 m_slotCnt = 4 m_freeCnt = 8024 m_freeData = 160 m_reservedCnt = 0 m_lsn = (49:1616:23) m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0 m_tornBits = 0 DB Frag ID = 1 Allocation Status GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED PFS (1:1) = 0x40 ALLOCATED 0_PCT_FULL DIFF (1:6) = CHANGED ML (1:7) = NOT MIN_LOGGED DATA: Memory Dump @0x000000DF137F8000 000000DF137F8000: 01010000 04000001 00000000 00000d00 00000000 .................... 000000DF137F8014: 00000400 0d010000 581fa000 d8010000 01000000 ........X........... 000000DF137F8028: 31000000 50060000 17000000 00000000 00000000 1...P............... 000000DF137F803C: 00000000 01000000 00000000 00000000 00000000 .................... 000000DF137F8050: 00000000 00000000 00000000 00000000 10000d00 .................... 000000DF137F8064: 61616161 61010000 00030000 10000d00 63636363 aaaaa...........cccc 000000DF137F8078: 63030000 00030000 10000d00 64646464 64040000 c...........ddddd... 000000DF137F808C: 00030000 10000d00 65656565 65050000 00030000 ........eeeee....... 000000DF137F80A0: 00002121 21212121 21212121 21212121 21212121 ..!!!!!!!!!!!!!!!!!! ...
从 Memory Dump
区节的内存地址看,这四条记录果然是有序的,
接下来就是关键了,到底是不是物理有序,我们再插入一条 bbbbb
记录,看下会不会将 ccccc
所在的内存地址上的内容整体往后挪?测试的 sql 语句如下:
INSERT INTO t(a,b) VALUES('bbbbb',2); SELECT * FROM t;
貌似真的给塞进去了,那到底是不是这样呢? 带着好奇心再次观察下底层的索引数据页
。
PAGE: (1:472) PAGE HEADER: Page @0x000002C6D76C4000 m_pageId = (1:472) m_headerVersion = 1 m_type = 1 m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0 m_objId (AllocUnitId.idObj) = 269 m_indexId (AllocUnitId.idInd) = 256 Metadata: AllocUnitId = 72057594055557120 Metadata: PartitionId = 72057594048348160 Metadata: IndexId = 1 Metadata: ObjectId = 850102069 m_prevPage = (0:0) m_nextPage = (0:0) pminlen = 13 m_slotCnt = 5 m_freeCnt = 8006 m_freeData = 176 m_reservedCnt = 0 m_lsn = (49:1640:2) m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0 m_tornBits = 487522741 DB Frag ID = 1 Allocation Status GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED PFS (1:1) = 0x40 ALLOCATED 0_PCT_FULL DIFF (1:6) = CHANGED ML (1:7) = NOT MIN_LOGGED DATA: Memory Dump @0x000000DF0FDF8000 000000DF0FDF8000: 01010000 00000001 00000000 00000d00 00000000 .................... 000000DF0FDF8014: 00000500 0d010000 461fb000 d8010000 01000000 ........F........... 000000DF0FDF8028: 31000000 68060000 02000000 00000000 00000000 1...h............... 000000DF0FDF803C: b5010f1d 01000000 00000000 00000000 00000000 .................... 000000DF0FDF8050: 00000000 00000000 00000000 00000000 10000d00 .................... 000000DF0FDF8064: 61616161 61010000 00030000 10000d00 63636363 aaaaa...........cccc 000000DF0FDF8078: 63030000 00030000 10000d00 64646464 64040000 c...........ddddd... 000000DF0FDF808C: 00030000 10000d00 65656565 65050000 00030000 ........eeeee....... 000000DF0FDF80A0: 10000d00 62626262 62020000 00030000 00002121 ....bbbbb.........!! 000000DF0FDF80B4: 21212121 21212121 21212121 21212121 21212121 !!!!!!!!!!!!!!!!!!!! ... 000000DF0FDF9FF4: 21219000 80007000 a0006000 !!....p...`. OFFSET TABLE: Row - Offset 4 (0x4) - 144 (0x90) 3 (0x3) - 128 (0x80) 2 (0x2) - 112 (0x70) 1 (0x1) - 160 (0xa0) 0 (0x0) - 96 (0x60)
从 Memory Dump
节的内存地址看,bbbbb
并没有插入到 aaaaa 和 cccccc 之间,而是写入到页面尾部的空闲空间中,接下来就有一个问题了,为什么 sql 输出中是有序的呢?怎么做到的? 如果你了解 Page 的 Slot 布局,你会发现 Slot1
指向的就是 bbbbb
这条记录的首地址,画一张图就是这样。
从图中我们就明白了最终的原理,当 Insert 时,SQLSERVER 并没有对表记录重排,而只是将指向的 Slot 槽位进行了重排,将物理无序做成了一种逻辑有序。
其实大家只要往高性能上想,肯定不会实现物理有序的,太伤性能了,在 物理无序
上抽象出一层 逻辑有序
不失为一种好办法。