在Aurora DSQL中使用标识列和序列
Source: AWS - Databases
Amazon Aurora DSQL now supports PostgreSQL-compatible identity columns and sequence objects, so developers can generate unique integer identifiers with configurable performance characteristics optimized for distributed workloads.
In distributed database environments, generating unique, sequential identifiers is a fundamental challenge: coordinating across multiple nodes creates performance bottlenecks, especially under high concurrency workloads. In this post, we show you how to create and manage identity columns for auto-incrementing IDs, selecting between identity columns and standalone sequence objects, and improving cache settings while choosing between UUIDs and integer sequences for your workload requirements.
Solution overview
Aurora DSQL implements identity columns and sequence objects as distributed coordination primitives that generate unique integer values across the cluster. Identity columns provide streamlined syntax for the common auto-incrementing ID use case, while standalone sequences offer advanced control for complex scenarios.
Prerequisites
You must have the following prerequisites to follow along with this post.
AWS account requirements:
- Active AWS account with Aurora DSQL cluster provisioned
- Aurora DSQL cluster running
- Aurora DSQL requires all connections to use Transport Layer Security (TLS) encryption. To establish secure connections, your client system must trust the Amazon Root Certificate Authority (Amazon Root CA 1). This certificate is pre-installed on many operating systems.
Required permissions:
CREATEpermission on schema for creating sequences and tablesUSAGEorUPDATEpermission on sequences for generating valuesALTERpermission for modifying existing sequences or identity columns
Tools and software needed:
- PostgreSQL-compatible client (psql, DBeaver, or application database driver)
- SQL client configured to connect to Aurora DSQL endpoint
Create a table with an identity column
Selecting an appropriate cache value is an important part of using sequences and identity columns effectively. The cache setting determines how identifier allocation behaves under load, influencing both system throughput and how closely values reflect allocation order. Higher cache values allow for higher throughput by serving sequence numbers from locally pre-allocated ranges, but this efficiency comes with a trade-off: any reserved values that go unused are lost, making gaps and out-of-order allocation more apparent.
To maximize achievable throughput under concurrency, use a larger cache size, for example:
CREATE TABLE orders (
order_id BIGINT GENERATED ALWAYS AS IDENTITY
(CACHE 70000),
customer_id BIGINT NOT NULL,
order_date DATE DEFAULT CURRENT_DATE,
total_amount DECIMAL(10,2)
);
To minimize gaps in the sequence ordering, use a cache size of 1, for example:
CREATE TABLE invoice_numbers (
invoice_id BIGINT GENERATED ALWAYS AS IDENTITY
(CACHE 1),
invoice_date DATE NOT NULL,
customer_name TEXT
);
Key configuration decisions:
- GENERATED ALWAYS: System controls all values; users can’t override
- GENERATED BY DEFAULT: Users can provide explicit value when needed
- CACHE 70000: Optimized for high-volume concurrent inserts
- CACHE 1: Closer to sequential ordering, lower throughput
Inserting data using identity columns
In this section, we show some examples of how identity columns work during INSERT statements.
Automatic value generation example
INSERT INTO orders (customer_id, total_amount)
VALUES (12345, 299.99);
Multiple inserts example
INSERT INTO orders (customer_id, total_amount)
VALUES
(12346, 149.50),
(12347, 599.00),
(12348, 89.99);
Using the DEFAULT keyword explicitly example
Alternatively, the keyword DEFAULT can be specified in place of a value to explicitly request the sequence-generated value.
INSERT INTO orders
(order_id, customer_id, total_amount)
VALUES (DEFAULT, 12349, 449.99);
Overriding GENERATED ALWAYS (when necessary) example
Only works with OVERRIDING SYSTEM VALUE clause.
INSERT INTO orders
(order_id, customer_id, total_amount)
OVERRIDING SYSTEM VALUE
VALUES (999999, 12350, 1299.99);
Create and use standalone SEQUENCE objects
While identity columns provide a convenient built-in solution for auto-incrementing values, there are situations where creating standalone sequences offers greater flexibility and control. Standalone sequences are particularly useful in three scenarios:
- When you need to share a single sequence across multiple tables to maintain a unified numbering system
- When you require ID generation outside of INSERT operations (such as pre-generating identifiers for batch processing or external systems)
- When you have advanced sequence manipulation requirements that go beyond auto-increment functionality
Create a sequence with a large cache for high concurrency example
CREATE TABLE customers (
customer_id BIGINT GENERATED ALWAYS AS IDENTITY
(CACHE 65536) PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE SEQUENCE customer_seq
START WITH 1000
INCREMENT BY 1
CACHE 100000;
Use Sequence in INSERT example
INSERT INTO customers
(customer_id, customer_name, email)
OVERRIDING SYSTEM VALUE
VALUES (
nextval('customer_seq'),
'Jane Smith',
'jane@example.com'
);
Generate value in application logic example
SELECT nextval('customer_seq');
Query sequence and identity column state
DSQL provides several system views and functions to monitor sequence behavior and inspect their current state. You can check sequence parameters, view current values, and examine identity column configurations to understand how your auto-increment mechanisms are functioning.
View current sequence values, as an example
Check sequence parameters and current state.
SELECT * FROM pg_sequences WHERE sequencename = 'customer_seq';
schemaname | sequencename | sequenceowner | data_type | start_value | min_value | max_value | increment_by | cycle | cache_size | last_value
------------+--------------+---------------+-----------+-------------+-----------+---------------------+--------------+-------+------------+------------
public | customer_seq | admin | bigint | 1 | 1 | 9223372036854775807 | 10 | f | 200000 | 2009990
Get current value in this session
First get the nextval and then get currval.
SELECT nextval('customer_seq');
nextval
---------
2010000
(1 row)
SELECT currval('customer_seq');
currval
---------
2010000
(1 row)
View all sequences.
SELECT schemaname, sequencename,
last_value, cache_size
FROM pg_sequences;
schemaname | sequencename | last_value | cache_size
------------+------------------------------------+------------+------------
public | customers_customer_id_seq | | 65536
public | orders_order_id_seq | 419999 | 150000
public | customer_seq | 4009990 | 200000
public | event_log_event_id_seq | | 100000
public | customer_accounts_account_number_seq | | 1
public | order_display_seq | | 70000
public | customer_new_seq | | 100000
Inspect identity column properties
View identity column configuration
SELECT table_name, column_name,
is_identity, identity_generation
FROM information_schema.columns
WHERE is_identity = 'YES';
Modify identity columns and sequences
After creating sequences or identity columns, you might need to adjust their behavior to meet changing requirements. With Aurora DSQL, you can modify various sequence properties including generation behavior, starting values, cache sizes, and increment steps.
You can use the ALTER TABLE/ALTER COLUMN commands to modify identity columns or ALTER SEQUENCE for standalone sequences.
Change identity column behavior example:
Switch from ALWAYS to BY DEFAULT to allow manual ID insertion during data migration or bulk imports.
ALTER TABLE orders
ALTER COLUMN order_id SET GENERATED BY DEFAULT;
Restart identity sequence at a specific value example
Reset the sequence after data cleanup or to reserve ID ranges for different purposes (for example, starting production IDs at 50000 after test data).
ALTER TABLE orders
ALTER COLUMN order_id RESTART WITH 50000;
Modify cache size of underlying sequence example
Increase cache size to improve INSERT performance in high-throughput applications by reducing sequence-related I/O operations.
ALTER TABLE orders ALTER COLUMN order_id SET CACHE 150000;
Modify standalone sequences examples
Change cache size to optimize performance for frequently accessed sequences in write-heavy workloads.
Change cache size
ALTER SEQUENCE customer_seq CACHE 200000;
Restart sequence
Restart sequence after data archival or to align numbering with business requirements (for example, starting the new fiscal year at 10000).
ALTER SEQUENCE customer_seq RESTART WITH 10000;
Change increment
Change increment to create gaps between IDs for organizational purposes (for example, reserving ranges for different regions or departments).
ALTER SEQUENCE customer_seq INCREMENT BY 10;
Validation and testing
After configuring identity columns and sequences, it’s important to verify that they’re generating values as expected. You can test the automatic ID generation by inserting records and using the RETURNING clause to see the generated values immediately.
Verify identity column behavior
Insert test records and observe generated IDs to confirm the sequence is working correctly and starting from the expected value.
INSERT INTO orders (customer_id, total_amount)
VALUES (99999, 100.00)
RETURNING order_id;
Verify sequential generation within session to ensure IDs increment properly with each subsequent insert.
INSERT INTO orders (customer_id, total_amount)
VALUES (99999, 200.00)
RETURNING order_id;
Test concurrent behavior (requires multiple sessions) examples
When multiple database sessions insert records simultaneously, sequence caching can affect the order in which IDs are generated. This example simulates concurrent inserts from two different sessions to demonstrate how cache settings impact ID generation patterns. Understanding this behavior is crucial for applications that require predictable ID ordering or need to optimize for high-concurrency workloads.
Session 1:
BEGIN;
INSERT INTO orders (customer_id, total_amount)
VALUES (88888, 150.00)
RETURNING order_id;
Take note of the returned ID.
order_id
----------
420000
Session 2 (simultaneously):
BEGIN;
INSERT INTO orders (customer_id, total_amount)
VALUES (77777, 250.00)
RETURNING order_id;
Compare with Session 1 ID.
order_id
----------
570000
Expected behavior (with CACHE >= 65536):
Sequence values are unique but might not be generated in strictly increasing order across sessions, and gaps can occur when cached values go unconsumed. Consistent with PostgreSQL semantics for cached sequences under concurrent use, where both systems ensure distinct values without providing sequential ordering.
Expected behavior (with CACHE =1):
Sequence values follow allocation order more closely over time.
Consistent with PostgreSQL’s non-cached sequence behavior. Each session fetches one value at a time, so cross-session ordering is more predictable. To preserve strictly sequential ID vending you must set CACHE = 1 and concurrent requests must be avoided.
Implement workload-specific patterns
The recommended cache size for sequences depends on your specific application requirements. High-throughput systems benefit from larger caches to reduce database round-trips, while applications requiring strict sequential ordering need minimal or no caching. This section demonstrates three common patterns with different cache strategies.
High-volume event logging (CACHE >= 65536) example
Event logging systems prioritize write performance over sequential ordering because events are typically queried by timestamp rather than ID. A large cache size minimizes sequence-related overhead during burst traffic periods.
CREATE TABLE event_log (
event_id BIGINT GENERATED ALWAYS AS IDENTITY
(CACHE 100000),
event_type TEXT NOT NULL,
event_timestamp TIMESTAMPTZ
DEFAULT CURRENT_TIMESTAMP,
event_data TEXT
);
Account number generation (CACHE = 1) example
Financial or customer-facing account numbers often require predictable, gap-free sequential numbering for auditing, compliance, or customer service purposes. Setting CACHE = 1 allocates IDs in order, though at the cost of performance.
CREATE TABLE customer_accounts (
account_number BIGINT GENERATED ALWAYS AS IDENTITY
(CACHE 1),
account_type TEXT NOT NULL,
created_date DATE DEFAULT CURRENT_DATE
);
Hybrid approach (UUID primary key + sequence display ID) example:
This pattern combines the benefits of both approaches: UUIDs provide globally unique, distributed, system-friendly primary keys, while a cached sequence generates user-friendly display numbers for customer-facing interfaces like order tracking.
CREATE SEQUENCE order_display_seq CACHE 70000;
CREATE TABLE orders_hybrid (
order_uuid UUID PRIMARY KEY
DEFAULT gen_random_uuid(),
order_number BIGINT
DEFAULT nextval('order_display_seq'),
customer_id BIGINT NOT NULL,
order_date DATE DEFAULT CURRENT_DATE
);
Best practices
Choosing CACHE size:
- Use
CACHE >= 65536when:- Identifiers generated at high frequency (thousands per second)
- Many concurrent sessions performing inserts
- Workload tolerates gaps in identifier values. The database always vends IDs in sequential, ascending order—apparent out-of-order allocation is a client-side observation, not a database behavior (this is also true of standard PostgreSQL, where client submission order does not determine ID assignment)
- Examples: IoT telemetry, event logs, job run IDs
- Use
CACHE = 1when:- Allocation rates are moderate (hundreds per second or less)
- Closer sequential ordering is important
- Minimizing gaps is a priority
- Examples: invoice numbers, account IDs, reference numbers
Identity columns compared to standalone sequences:
- Prefer
IDENTITYcolumns for: Single-table auto-incrementing IDs (most common use case) - Use standalone sequences for: Sharing sequences across tables, generating IDs outside
INSERToperations
UUIDs compared to integer sequences:
- Default to UUIDs for: Primary keys in high-scale distributed workloads
- Use integer sequences for: Human-readable display IDs, reporting, external integrations
- Consider a hybrid approach: UUID primary key + sequence-based display number
Managing gaps:
- Gaps are inevitable due to transaction rollbacks, cached value loss, and concurrent allocation
- Applications should be designed to handle non-contiguous ID values. The database always vends IDs in sequential, ascending order so gaps arise from external factors (rollbacks, cache loss, concurrency) rather than out-of-order vending
- Never rely on sequences for gapless numbering
Performance improvement:
- Larger cache sizes reduce coordination overhead but increase visible gaps
- Use explicit transactions when ordering matters within a single session
- Monitor sequence allocation patterns under production load
Troubleshooting examples
Issue: “Permission denied for sequence”
Grant necessary permissions.
GRANT USAGE ON SEQUENCE customer_seq TO application_user;
GRANT UPDATE ON SEQUENCE customer_seq TO application_user;
Issue: “Cannot insert explicit value for identity column”
Use OVERRIDING SYSTEM VALUE for GENERATED ALWAYS columns.
INSERT INTO orders (order_id, customer_id, total_amount)
OVERRIDING SYSTEM VALUE
VALUES (12345, 99999, 500.00);
Or change to GENERATED BY DEFAULT.
ALTER TABLE orders
ALTER COLUMN order_id SET GENERATED BY DEFAULT;
Issue: Large gaps in sequence values.
- Expected behavior with
CACHE >= 65536 - Occurs when cached values aren’t fully consumed before the session ends
- Consider reducing cache size if gaps are problematic for your use case
Issue: Sequence values not strictly increasing across sessions
- Expected behavior with
CACHE >= 65536in distributed systems - Use
CACHE = 1if strict ordering is required - Use explicit transactions for more predictable ordering within sessions
Clean up
If you created resources to follow along with identity column tutorials or need to clean up your database schema, this guide walks you through the process of removing identity column behavior from your database objects. Whether you need to drop identity properties while preserving your data, remove sequences, or clean up entire tables, you’ll find the SQL commands and syntax you need here. We’ll also cover important cost implications and system limits to help you manage your database resources efficiently.
- Remove identity column behavior:
- Drop identity property (keeps column and data)
ALTER TABLE orders ALTER COLUMN order_id DROP IDENTITY;
- Drop identity property (keeps column and data)
- Drop sequences:
- Drop standalone sequence
DROP SEQUENCE customer_seq; - Drop sequence and dependent objects
DROP SEQUENCE customer_seq CASCADE;
- Drop standalone sequence
- Drop tables with identity columns
- Automatically drops associated sequence
DROP TABLE orders;
- Automatically drops associated sequence
Cost implications:
- No additional charges for sequences or identity columns
- Dropping unused sequences reduces metadata overhead (5,000 sequence limit per cluster)
Conclusion
In this post, we explored how identity columns provide PostgreSQL-compatible auto-incrementing IDs improved for distributed workloads. We demonstrated that standalone sequences offer advanced control for complex scenarios, while UUIDs remain the recommended choice for primary keys, with integer sequences better suited for display IDs and human-readable identifiers. Finally, we discussed how gaps and ordering effects are inherent characteristics of distributed sequence allocation that developers should account for in their application design.
Ready to experiment with identity columns and sequences? Visit the Aurora DSQL Playground to run these examples in a live environment. You can test different cache settings, observe concurrent behavior, and explore how sequences work in real-time without setting up your own database.