Introduction
The
NEWSEQUENTIALID
system function is an addition to SQL Server 2005. It seeks to bring together, what used to be, conflicting requirements in SQL Server 2000; namely identity-level insert performance, and globally unique values.
Background
GUIDs (Globally Unique Identifiers)
are pseudo-random 16-byte (128-bit) numbers whose generation algorithm provides a sufficiently high degree of probability that the same GUID will never be generated twice on any computer at any time. Although a GUID is a binary array, it is often represented in its hexadecimal form using the following format, for example: dbbc2827-edf8-4a2d-92ad-c1e0059304d7. This makes them suitable for the 'distributed uniqueness requirement' of our Orders table.
The problem would be pretty much solved there unless we have some non-trivial performance requirements, especially around insert volumes and speed. As GUIDs are effectively pseudo-random (the degree of randomness / predictability was changed by Microsoft in response to privacy concerns), they do not possess the same orderly insert benefits as identity columns when used as the clustering key. The insert IO pattern changes from sequential to random-access when using GUIDs instead of an
IDENTITY
column. Not only do GUIDs require far more IO to locate the insert location in a clustered index (the index has to be traversed from the root for every insert), fragmentation at the leaf level leads to less dense pages, which in turn requires defragmentation / reindexing.
Using the Code
NEWSEQUENTIALID()
Enter
NEWSEQUENTIALID()
, stage left. This is a new system function included in SQL Server 2005 that combines the best of both worlds.
NEWSEQUENTIALID()
will generate a GUID that will be greater in value than the previously generated one. It is easy to see how this function works by looking at the output the following TSQL script generates:
create
table
TestTable (
id
uniqueidentifier
default
newsequentialid()
not
null
primary
key
clustered
,
sequence
int
);
declare
@count
int
;
set
@count
=
0
;
while
@count
<
100
begin
insert
TestTable (
sequence
)
values
(
@count
);
set
@count
=
@count
+
1
;
end
;
select
*
from
TestTable
order
by
id;
Output:
id sequence
------------------------------------ -----------
FA780E3B-03C2-DB11-BD9F-0011D82F3F23 0
FB780E3B-03C2-DB11-BD9F-0011D82F3F23 1
FC780E3B-03C2-DB11-BD9F-0011D82F3F23 2
FD780E3B-03C2-DB11-BD9F-0011D82F3F23 3
FE780E3B-03C2-DB11-BD9F-0011D82F3F23 4
5C790E3B-03C2-DB11-BD9F-0011D82F3F23 98
5D790E3B-03C2-DB11-BD9F-0011D82F3F23 99
You'll notice that there is a clear bit-shifting pattern that SQL Server is employing to keep the GUIDs increasing in relative value. These are the restrictions in
NEWSEQUENTIALID
's use:
NEWSEQUENTIALID
cannot be used in arbitrary TSQL statements. It can only be specified as the
DEFAULT
value for an unique identifier column. More than one column in a table can use
NEWSEQUENTIALID
. It cannot be combined with other scalar functions, e.g.,
REVERSE(NEWSEQUENTIALID())
is not allowed. The network card's MAC address is easily identifiable in the generated GUID. My laptop's MAC address is 00-01-4A-28-64-8B. This is one of the
NEWSEQUENTIALID
GUIDs that was generated:
4EAC7708-30C3-DB11-B902-00014A28648B
. Hence the privacy warnings in BOL: Correlating the position of the MAC address in the GUID with the bits that get incremented / shifted for each successive GUID sheds light on why the GUIDs remain globally unique. The MAC address portion of the GUID does not change.
Performance Comparison
The following test demonstrates the benefits of
NEWSEQUENTIALID
. The insert performance of the various clustering key variations was measured and compared. For each test, the same
TestTable
was recreated with a different Id column definition:
IDENTITY()
NEWID()
NEWSEQUENTIALID()
The following TSQL contains the DDL used to construct the tables for the three clustered keys:
create
table
TestTable (
id
int
identity
(
1
,
1
)
not
null
primary
key
clustered
,
sequence
int
not
null
,
data
char
(
250
)
not
null
default
'
'
);
create
table
TestTable (
id
uniqueidentifier
default
newid()
not
null
primary
key
clustered
,
sequence
int
not
null
,
data
char
(
250
)
not
null
default
'
'
);
create
table
TestTable (
id
uniqueidentifier
default
newsequentialid()
not
null
primary
key
clustered
,
sequence
int
not
null
,
data
char
(
250
)
not
null
default
'
'
);
For each test, 50,000 rows were inserted into the TestTable table using the following TSQL script:
declare
@count
int
;
set
@count
=
0
;
while
@count
<
50000
begin
insert
TestTable (
sequence
)
values
(
@count
);
set
@count
=
@count
+
1
;
end
;
The following metrics were gathered after each run:
select
reads, writes
from
sys.dm_exec_sessions
where
session_id =
@@spid
;
select
index_type_desc, index_depth, page_count,
avg_page_space_used_in_percent, avg_fragmentation_in_percent, record_count
from
sys.dm_db_index_physical_stats(db_id(),
object_id(
'
TestTable'
),
null
,
null
,
'
detailed'
)
where
index_level =
0
;
The results are quite compelling:
|
Reads
|
Writes
|
Leaf Pages
|
Avg Page Used
|
Avg Fragmentation
|
Record Count
|
IDENTITY()
|
0
|
1,683
|
1,667
|
98.9%
|
0.7%
|
50,000
|
NEWID()
|
0
|
5,386
|
2,486
|
69.3%
|
99.2%
|
50,000
|
NEWSEQUENTIALID()
|
0
|
1,746
|
1,725
|
99.9%
|
1.0%
|
50,000
|
Conclusion
The most striking is the number of writes required by the
NEWID
system function. This, coupled with the average page density of
69%
, is evidence of the page splitting caused by the random distribution of inserts at the leaf level. As soon as a page fills up, it needs to be split into two pages of 50% each for the insert to complete. Not only has page splitting resulted in poor page density, it has fragmented the data pages quite badly (there is a
99%
probability that the next data page is not next to the current one). In our tests, the most likely place for a free page required for the page split is at the end of the table irrespective of where the row is being inserted. Therefore, to read the rows in order, the scan needs to keep jumping back and forth between widely distributed split pages, hence the appalling fragmentation. The minor difference in page count between
IDENTITY
and
NEWSEQUENTIALID
metrics is due to the difference in size between
IDENTITY
's
INT
(4 bytes) and
NEWSEQUENTIALID
's
UNQUEIDENTIFIER
(16 bytes). However, note that any non-clustered index on the
UNIQUEIDENTIFIER
table will be four times larger than the same index on the
IDENTITY
table. Therefore,
IDENTITY
is still preferred over
NEWSEQUENTIALID
when choosing between the two. The
NEWSEQUENTIALID
system function has clearly lived up to its claim of providing GUID-like uniqueness coupled with identity-level insert performance. The number of writes, fragmentation, and page density are all inline with identity-level metrics. These benefits make
NEWSEQUENTIALID
a compelling feature.
I wrote this article in Portuguese in 2005 (www.agileware.com.br), translate it into English in 2007, the same year the creation of this company that ripped me off, thank you!
Sign in
·
View Thread
Basically the same as Zippy said. The comparison isn't quite adequate since both identity and newsequentalid are constantly growing identifications. There's not so much need for page splitting when they are used in your test. Since newid instead is more random. I'd say that's the main reason for your results (fragmentation and writes).
Could you add results using nonclustered indexes. That would give idea about the effectivenes in different situations.
Also I think you should take the b-tree height into consideration since I guess it may act differently. After all this could also have a performance impact.
The need to optimize rises from a bad design.
My articles
[
^
]
Sign in
·
View Thread
In the next revision of the article, comparing nonclustered indexes.
Before SQL Server 2005 I had major problems with the use of GUID failed to create clustered indexes for him.
Sign in
·
View Thread
What about nonclustered indexes, or the clustering being on another table?
zippy1981
16-Jan-09 8:52
zippy1981
16-Jan-09 8:52
One other option is to have a GUID Primary key on a nonclustered index. Not all tables need clustered indexes.
For a fair comparison, the same tests should be tried with nonclustered primary key in all three cases. Also, some read tests should be performed as well.
Sign in
·
View Thread
Re: What about nonclustered indexes, or the clustering being on another table?
Alfredo_Fernandez
16-Jan-09 9:30
Alfredo_Fernandez
16-Jan-09 9:30
My comparison is aimed to show that the new type of data, NEWSEQUENTIALID () revolve the problem that had to clusted a guid.
Tank's...
Sign in
·
View Thread
Web01
2.8:2024-06-13:1