Accelerate your applications with Amazon DocumentDB 8.0
Source: AWS - Databases
Amazon DocumentDB (with MongoDB compatibility) announced the general availability of Amazon DocumentDB 8.0, which delivers substantial performance improvements for your applications. With up to 7x faster aggregation pipeline latency and up to 5x better storage compression, you can build faster applications while reducing storage costs.
Amazon DocumentDB 8.0 brings in support for MongoDB 8.0 API driver compatibility while maintaining support for applications built using MongoDB API versions 6.0 and 7.0. This post explores the new features in Amazon DocumentDB 8.0 and demonstrates how they improve performance and cost efficiency.
Amazon DocumentDB is a serverless, fully managed, MongoDB API-compatible document database service that cost-effectively runs critical document workloads at virtually any scale without managing infrastructure. Amazon DocumentDB serves tens of thousands of customers globally across all industries. You can enhance your applications with gen AI and machine learning (ML) capabilities using vector search for Amazon DocumentDB and integration with Amazon SageMaker Canvas.
New features in Amazon DocumentDB 8.0:
- MongoDB 8.0 API Driver Compatibility: Amazon DocumentDB 8.0 adds compatibility with MongoDB 8.0 API drivers while maintaining support for MongoDB API versions 6.0 and 7.0.
- Query Planner Version 3: Delivers up to 2x overall performance improvement over Planner v2, with up to 7x faster query latency for aggregation pipelines. Planner v3 adds optimizations including match stage pull-up, $lookup and $unwind coalescing, and distinct scan optimization for low-cardinality indexes.
- Zstandard (Zstd) Dictionary-Based Compression: Amazon DocumentDB 8.0 introduces Zstd compression as an alternative to LZ4, achieving up to 5x better compression ratios for smaller documents.
- Text Index v2: Improved text search with better parsing for complex strings, including support for URLs, email addresses, and special characters.
- Collation Support: Perform language-specific string comparisons with support for locale-aware sorting and filtering.
- Views: Views function as virtual read-only collections that present data from an underlying collection based on a specified aggregation pipeline.
- Vector Search Enhancements: Parallel vector index builds are now up to 30x faster, enabling you to build AI-powered applications with significantly reduced index creation time.
- New Aggregation Stages: Support for six aggregation stages: $merge, $bucket, $replaceWith, $vectorSearch, $set, and $unset. For a full list of supported aggregation stages, see Aggregation pipeline operators.
- New Operators: Support for five new operators: $pow, $rand, $dateTrunc, $dateToParts, and $dateFromParts. For a full list of supported operators, see Query and projection operators.
For full release notes, see Amazon DocumentDB release notes.
Let's dive deeper into the enhancements that make Amazon DocumentDB 8.0 more performant for your applications.
Smarter, faster queries: New Query Planner Version 3
New Query Planner V3 (NQP V3) builds on Planner v2, which achieved up to 10x performance improvements for find and update queries. Planner v3 extends these gains to distinct and aggregation operations through additional optimizations.
What makes Planner v3 faster?
NQP V3 employs three key optimizations that dramatically reduce query execution time:
- Distinct Scan Optimization: Uses index-only scans for low-cardinality fields
- Match Stage Pull-Up: Moves filter operations earlier in aggregation pipelines
- $lookup and $unwind Coalescing: Combines join and unwind operations to remove intermediate processing
Test data and cluster configuration:
Test data: Queries used in this section are executed against the dataset generated with py-tpcc benchmarking suite using the following configuration.
python3 tpcc.py --config <config> --clients 20 --warehouses 100 --no-execute mongodb
This configuration produces 3 million documents each in the CUSTOMER, HISTORY, and ORDERS collections and 10 million in the STOCK collection.
Cluster and instance configuration: Query execution times were measured on an r6g.2xlarge instance with default configuration settings, on Amazon DocumentDB 5.0 and 8.0 clusters respectively.
Distinct scan optimization:
For distinct operations on indexed fields with low cardinality, Planner v3 introduces a new Distinct_Scan execution stage that uses index-only scans (IXONLYSCAN), improving performance for these operations.
With the Distinct_Scan optimization, the query to find distinct warehouse IDs returns results in single-digit milliseconds on version 8.0, whereas the same query on version 5.0 takes multiple seconds.
var startTime = new Date();
var result = db.CUSTOMER.distinct('C_W_ID');
var endTime = new Date();
var timeDiff = endTime - startTime;
var count = result.length;
print("Execution time: " + timeDiff + " ms");
print("Distinct count: " + count);
| Version | Execution Time |
|---|---|
| Amazon DocumentDB 5.0 | 5,285 ms |
| Amazon DocumentDB 8.0 | 6 ms |
Amazon DocumentDB 8.0 returns the result in 6 ms compared to 5,285 ms in Amazon DocumentDB 5.0, roughly 880 times faster in this test.
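The timing pattern used for these measurements (record a start time, run the operation, subtract) can be wrapped in a small helper. The following is a minimal sketch: `timeIt` is a hypothetical name rather than a shell built-in, and the stand-in workload takes the place of a real query so that the snippet runs without a cluster.

```javascript
// Hypothetical helper (not part of the shell or any driver): runs fn once and
// reports wall-clock time in milliseconds, like the snippets in this post.
function timeIt(label, fn) {
  const start = Date.now();
  const result = fn();
  const elapsedMs = Date.now() - start;
  console.log(label + ": " + elapsedMs + " ms");
  return { result, elapsedMs };
}

// Stand-in workload so the sketch runs anywhere. In the shell you might pass
// () => db.CUSTOMER.distinct('C_W_ID') instead.
const timed = timeIt("Distinct (stand-in)", () => {
  const seen = new Set();
  for (let i = 0; i < 100000; i++) seen.add(i % 100);
  return Array.from(seen);
});
console.log("Distinct count: " + timed.result.length);
```

A single wall-clock measurement like this includes network round trips and cache effects, so repeating the run a few times gives a more representative number.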
To see the query plan for your distinct query, run explain():
db.CUSTOMER.explain().distinct('C_W_ID');
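The post does not describe how Distinct_Scan works internally. As an illustration of why an index helps with low-cardinality distinct queries, here is a sketch of one generic technique, a skip scan over sorted index keys, using a plain array as a stand-in for the index. The function names and the simulated data are assumptions made for this example.

```javascript
// Illustration only: a generic "skip scan" over sorted keys, not the actual
// Distinct_Scan implementation.

// Returns the first position at or after `from` whose key is greater than `value`.
function nextGreater(sortedKeys, value, from) {
  let lo = from;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collects distinct values by jumping from one key to the next larger key,
// instead of visiting every entry.
function distinctFromIndex(sortedKeys) {
  const values = [];
  let jumps = 0;
  let pos = 0;
  while (pos < sortedKeys.length) {
    values.push(sortedKeys[pos]);
    pos = nextGreater(sortedKeys, sortedKeys[pos], pos + 1);
    jumps++;
  }
  return { values, jumps };
}

// Simulated index on a low-cardinality field: 100 warehouse IDs, 1,000 entries each.
const indexKeys = [];
for (let w = 1; w <= 100; w++) {
  for (let n = 0; n < 1000; n++) indexKeys.push(w);
}

const scan = distinctFromIndex(indexKeys);
console.log("Index entries: " + indexKeys.length);
console.log("Distinct values: " + scan.values.length + ", jumps: " + scan.jumps);
```

The number of jumps grows with the number of distinct values rather than with the number of documents, which is why this kind of approach pays off most when cardinality is low.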
Match stage pull-up: Smarter pipeline execution
NQP v3 automatically reorders your aggregation pipelines, moving filter operations ($match) earlier when possible. This reduces the number of documents processed by expensive transformation stages, without requiring you to rewrite your queries.
For example, here is a query with the match stage later in the pipeline to find the top 10 highest-value customers.
var startTime = new Date();
db.CUSTOMER.aggregate([
{ $project: { C_W_ID: 1, C_D_ID: 1, C_BALANCE: 1, C_YTD_PAYMENT: 1 } },
{ $match: { C_W_ID: { $lte: 5 }, C_D_ID: { $lte: 5 } } },
{ $addFields: { totalValue: { $add: ["$C_BALANCE", "$C_YTD_PAYMENT"] } } },
{ $sort: { totalValue: -1 } },
{ $limit: 10 }
]);
var endTime = new Date();
print("Top 10 Customers: " + (endTime - startTime) + " ms");
This query returns results in under a second on version 8.0, compared to over 14 seconds on version 5.0:
| Version | Execution Time |
|---|---|
| Amazon DocumentDB 5.0 | 14,113 ms |
| Amazon DocumentDB 8.0 | 619 ms |
Note: When the query is rewritten to place the match stage first in the pipeline, version 5.0 returns the result in under a second.
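To make the reordering concrete, the sketch below applies a toy rewrite rule to the pipeline from this section. It is not the planner's actual algorithm: `pullUpMatch` is a hypothetical function that only handles the simple case of a $match directly after an inclusion-style $project that keeps every field the filter tests.

```javascript
// Toy rewrite rule, not the planner's actual algorithm: swap a $match with the
// inclusion-style $project right before it when every field the $match tests
// is kept by that projection, so filtering happens first.
function pullUpMatch(pipeline) {
  const stages = pipeline.slice(); // leave the caller's array untouched
  for (let i = 1; i < stages.length; i++) {
    const match = stages[i].$match;
    const project = stages[i - 1].$project;
    if (!match || !project) continue;
    const safe = Object.keys(match).every((field) => project[field] === 1);
    if (safe) {
      stages[i] = stages[i - 1];
      stages[i - 1] = { $match: match };
    }
  }
  return stages;
}

// The pipeline from this section, with $match in second position.
const original = [
  { $project: { C_W_ID: 1, C_D_ID: 1, C_BALANCE: 1, C_YTD_PAYMENT: 1 } },
  { $match: { C_W_ID: { $lte: 5 }, C_D_ID: { $lte: 5 } } },
  { $addFields: { totalValue: { $add: ["$C_BALANCE", "$C_YTD_PAYMENT"] } } },
  { $sort: { totalValue: -1 } },
  { $limit: 10 }
];

const rewritten = pullUpMatch(original);
console.log(rewritten.map((stage) => Object.keys(stage)[0]).join(" -> "));
// prints "$match -> $project -> $addFields -> $sort -> $limit"
```

On a version without this optimization, passing the rewritten array to db.CUSTOMER.aggregate() corresponds to the manual rewrite described in the note above.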
$lookup and $unwind Coalescing:
When joining collections with $lookup followed by $unwind (a common pattern), NQP V3 automatically combines these operations. This removes intermediate data structures and reduces memory overhead.
Here is a query that builds a payment transaction report from the HISTORY and CUSTOMER collections using $lookup and $unwind. It returns payments of $10 or more made at warehouse 1 by customers with IDs up to 500, along with the payment date and each customer's first and last name.
var startTime = new Date();
db.HISTORY.aggregate([
{ $match: { H_W_ID: 1, H_C_ID: { $lte: 500 }, H_AMOUNT: { $gte: 10 } }},
{ $limit: 100 },
{ $lookup: { from: "CUSTOMER", localField: "H_C_ID", foreignField: "C_ID", as: "customer" }},
{ $unwind: "$customer" },
{ $project: { H_AMOUNT: 1, H_DATE: 1, "customer.C_FIRST": 1, "customer.C_LAST": 1 }}
], { allowDiskUse: true });
var endTime = new Date();
print("Payment Transactions (Fast): " + (endTime - startTime) + " ms");
This query returns results in milliseconds on version 8.0, compared to over 19 seconds on version 5.0:
| Version | Execution Time |
|---|---|
| Amazon DocumentDB 5.0 | 19,161 ms |
| Amazon DocumentDB 8.0 | 604 ms |
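The idea behind coalescing can be sketched with in-memory arrays. This is a conceptual illustration only, not how Amazon DocumentDB implements it; the function names and the sample rows are made up for the example.

```javascript
// Conceptual sketch with in-memory arrays; not DocumentDB's implementation.

// Two-step form: $lookup builds an array per input document, then $unwind
// expands each array into one output document per element.
function lookupThenUnwind(history, customers) {
  const joined = history.map((h) => ({
    ...h,
    customer: customers.filter((c) => c.C_ID === h.H_C_ID)
  }));
  const out = [];
  for (const doc of joined) {
    for (const c of doc.customer) out.push({ ...doc, customer: c });
  }
  return out;
}

// Coalesced form: emit one output document per match directly, so the
// intermediate per-document arrays are never materialized.
function lookupUnwindCoalesced(history, customers) {
  const out = [];
  for (const h of history) {
    for (const c of customers) {
      if (c.C_ID === h.H_C_ID) out.push({ ...h, customer: c });
    }
  }
  return out;
}

// Made-up sample rows.
const history = [
  { H_C_ID: 1, H_AMOUNT: 25 },
  { H_C_ID: 2, H_AMOUNT: 40 },
  { H_C_ID: 9, H_AMOUNT: 15 } // no matching customer, so $unwind drops it
];
const customers = [
  { C_ID: 1, C_FIRST: "Ana", C_LAST: "Silva" },
  { C_ID: 2, C_FIRST: "Diego", C_LAST: "Ramirez" }
];

const twoStep = lookupThenUnwind(history, customers);
const coalesced = lookupUnwindCoalesced(history, customers);
console.log("Same output: " + (JSON.stringify(twoStep) === JSON.stringify(coalesced)));
console.log("Rows: " + coalesced.length);
```

Both forms produce the same documents. The coalesced form simply skips building the intermediate arrays, which is where the memory savings come from.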
Store more data, pay less: dictionary-based compression
Amazon DocumentDB 8.0 introduces Zstandard (Zstd) dictionary-based compression, a new compression algorithm that delivers faster inserts and up to 5x compression ratios, which can substantially reduce your storage costs. Because storage expenses scale linearly with data size, this enhancement lets you store significantly more data without proportionally increasing your storage footprint.
Compression enabled by default
Amazon DocumentDB 8.0 automatically enables compression with Zstd as the default algorithm. Any document larger than 128 bytes benefits from compression. While these defaults work for most workloads, you can adjust the settings, for example by changing the compression algorithm to LZ4, if your use case demands it.
Storage savings
Testing with the TPC-C benchmark dataset demonstrates the substantial storage savings possible with Zstd compression. The ORDERS collection achieved a 5.32x compression ratio, meaning you store over 5 times more data in the same space. Here's how different collections performed:

How Zstd dictionaries work
Zstd dictionaries learn from your actual data. Amazon DocumentDB samples your documents to build a custom 2KB compression dictionary tailored to your specific collection. This dictionary captures patterns in your field names and values, achieving significantly higher compression ratios, especially for collections with consistent schemas or repeated field names.
Key characteristics:
- Each collection maintains its own dedicated 2KB dictionary
- Requires a minimum of 100 documents to train the dictionary
- Documents larger than 128 bytes are automatically compressed after the first 100 documents are inserted
- The dictionary adapts to your data model, not a generic pattern.
Track compression performance using Collection stats commands:
db.collection.stats()
// Or use the collStats command
db.runCommand({collStats: "collection_name"})
Example output:
// Sample output
db.runCommand({collStats:"ORDERS", scale:1024*1024})
{
ns: 'tpcc.ORDERS',
MVCCIdStats: { MVCCIdScale: 0.03, storageSegmentBase: { MVCCIdScale: 0.03 } },
gcRuntimeStats: { numRuns: 0 },
documentFragmentStats: {
combinedStatsAvailability: 'N/A',
storageSegmentBase: { statsAvailability: 'N/A' }
},
collScans: 0,
count: 3000000,
size: 6340.02685546875,
avgObjSize: 2216.5204,
storageSize: 1192.3671875,
storageSizeStats: { storageSegmentBase: 1192.3671875 },
capped: false,
nindexes: 3,
totalIndexSize: 809.8046875,
indexSizes: {
'O_C_ID_1_O_D_ID_1_O_W_ID_1_O_ID_-1_O_CARRIER_ID_1_O_ENTRY_ID_1': 472.8671875,
O_W_ID_1_O_D_ID_1_O_ID_1_O_C_ID_1: 220.734375,
_id_: 116.203125
},
unusedStorageSize: {
unusedBytes: 10231808,
unusedPercent: 0.82,
storageSegmentBase: { unusedBytes: 10231808, unusedPercent: 0 }
},
compression: {
enable: true,
threshold: 128,
algorithm: 'zstd',
usingDictionary: true
},
cacheStats: {
collBlksHit: 0,
collBlksRead: 0,
collHitRatio: 0,
idxBlksHit: 0,
idxBlksRead: 0,
idxHitRatio: 0
},
idxScans: 0,
opCounter: { numDocsIns: 0, numDocsUpd: 0, numDocsDel: 0 },
lastReset: '2026-02-20 12:39:14.156813+00',
ok: 1,
operationTime: Timestamp({ t: 1771591154, i: 1 })
}
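The 5.32x figure quoted for ORDERS appears to correspond to size divided by storageSize in this output. The sketch below does that arithmetic, assuming that size reports the logical (uncompressed) data size and storageSize the space used on disk; `compressionRatio` is a hypothetical helper, not a built-in command.

```javascript
// Values copied from the sample collStats output (scale = 1024 * 1024, so MB).
const stats = { count: 3000000, size: 6340.02685546875, storageSize: 1192.3671875 };

// Assumption: size is the logical (uncompressed) data size and storageSize is
// the space used on disk, so their quotient approximates the compression ratio.
function compressionRatio(collStats) {
  return collStats.size / collStats.storageSize;
}

const ratio = compressionRatio(stats);
console.log("Compression ratio: " + ratio.toFixed(2) + "x"); // prints "Compression ratio: 5.32x"
console.log("Storage saved: " + ((1 - 1 / ratio) * 100).toFixed(1) + "%");
```

In the shell, you could pass the document returned by db.runCommand({collStats: "ORDERS"}) to the same function.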
For more information, see dictionary-based compression in Amazon DocumentDB.
Enhanced search capabilities: text index v2 and collation
Amazon DocumentDB 8.0 enhances search capabilities with two new features: Text Index V2, which brings intelligent parsing for complex string formats, and Collation support, which enables language-aware sorting and filtering for global applications.
Text index v2: better parsing for complex strings
Building on the text search capabilities introduced in Amazon DocumentDB 5.0, Text Index v2 delivers advanced tokenization that makes structured text formats fully searchable. When you index content containing URLs, email addresses, file paths, or scientific notation, Amazon DocumentDB automatically breaks them into searchable components.
Enhanced format support: Text Index v2 intelligently parses and indexes the following formats:
- Email addresses: Search "janedoe@company.com" using "janedoe", "company", or "com"
- URLs: Find "https://company.com/path/resource" with "company", "path", or "resource"
- File paths: Search "/home/user/thesis.pdf" by "home", "user", "thesis", or "pdf"
- Scientific notation: Parse "6.022e023" or "1e5" as searchable components
- Decimal numbers: Index "3.14" or "0.001" with proper numeric handling
- Signed integers: Support for "+1" and "-1" formats
- XML content: Parse "Hello &" while handling entities correctly
Underscore (_) handling: Text Index v2 treats underscores as part of the word, preserving identifiers like "user_profile" or "api_key" as single searchable tokens, which is essential for technical content and variable names.
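To get a feel for the splitting behavior described above, here is a rough approximation in plain JavaScript. It is not the real Text Index v2 tokenizer: `approximateTokens` is a hypothetical function that only covers the email, URL, file path, and underscore cases, and it does not attempt the numeric or XML handling.

```javascript
// Rough approximation, not the real Text Index v2 tokenizer: lowercase the
// input and split on anything that is not a letter, digit, or underscore.
function approximateTokens(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((token) => token.length > 0);
}

console.log(approximateTokens("janedoe@company.com"));
// [ 'janedoe', 'company', 'com' ]
console.log(approximateTokens("https://company.com/path/resource"));
// [ 'https', 'company', 'com', 'path', 'resource' ]
console.log(approximateTokens("/home/user/thesis.pdf"));
// [ 'home', 'user', 'thesis', 'pdf' ]
console.log(approximateTokens("user_profile api_key"));
// [ 'user_profile', 'api_key' ]
```

Because underscores are kept inside tokens, a search for "user" would not match "user_profile" under this approximation, which mirrors the single-token behavior described above.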
Let's look at a few example queries. These queries don't return any results with Text Index v1, but they do return results with Text Index v2.
Sample Data:
db.user_references.insertMany([
{ "_id": 1, "type": "email", "value": "jane@example.com", "domain": "example.com", "username": "janedoe" },
{ "_id": 2, "type": "email", "value": "janeroe@example.com", "domain": "example.com", "username": "janeroe" },
{ "_id": 3, "type": "file_path", "value": "/home/user/example/thesis.pdf", "directory": "/home/user/example", "filename": "thesis.pdf", "extension": "pdf" },
{ "_id": 4, "type": "file_path", "value": "/home/user/path/jane.pdf", "directory": "/home/user/path", "filename": "jane.pdf", "extension": "pdf" },
{ "_id": 5, "type": "url", "value": "http://www.example.com/path", "protocol": "http", "domain": "www.example.com", "path": "/path" },
{ "_id": 6, "type": "url", "value": "https://example.com/path/../home", "protocol": "https", "domain": "example.com", "path": "/path/../home" }
])
Create text index:
db.user_references.createIndex({ "value": "text" });
Queries:
// Search for: jane
rs0:PRIMARY> db.user_references.find({ $text: { $search: "jane" } });
{ "_id" : 1, "type" : "email", "value" : "jane@example.com", "domain" : "example.com", "username" : "janedoe" }
{ "_id" : 4, "type" : "file_path", "value" : "/home/user/path/jane.pdf", "directory" : "/home/user/path", "filename" : "jane.pdf", "extension" : "pdf" }
// Search for: janeroe
rs0:PRIMARY> db.user_references.find({ $text: { $search: "janeroe" } });
{ "_id" : 2, "type" : "email", "value" : "janeroe@example.com", "domain" : "example.com", "username" : "janeroe" }
// Search for: example
rs0:PRIMARY> db.user_references.find({ $text: { $search: "example" } });
{ "_id" : 1, "type" : "email", "value" : "jane@example.com", "domain" : "example.com", "username" : "janedoe" }
{ "_id" : 2, "type" : "email", "value" : "janeroe@example.com", "domain" : "example.com", "username" : "janeroe" }
{ "_id" : 3, "type" : "file_path", "value" : "/home/user/example/thesis.pdf", "directory" : "/home/user/example", "filename" : "thesis.pdf", "extension" : "pdf" }
{ "_id" : 5, "type" : "url", "value" : "http://www.example.com/path", "protocol" : "http", "domain" : "www.example.com", "path" : "/path" }
{ "_id" : 6, "type" : "url", "value" : "https://example.com/path/../home", "protocol" : "https", "domain" : "example.com", "path" : "/path/../home" }
// Search for: pdf
rs0:PRIMARY> db.user_references.find({ $text: { $search: "pdf" } });
{ "_id" : 3, "type" : "file_path", "value" : "/home/user/example/thesis.pdf", "directory" : "/home/user/example", "filename" : "thesis.pdf", "extension" : "pdf" }
{ "_id" : 4, "type" : "file_path", "value" : "/home/user/path/jane.pdf", "directory" : "/home/user/path", "filename" : "jane.pdf", "extension" : "pdf" }
// Search for: path
rs0:PRIMARY> db.user_references.find({ $text: { $search: "path" } });
{ "_id" : 4, "type" : "file_path", "value" : "/home/user/path/jane.pdf", "directory" : "/home/user/path", "filename" : "jane.pdf", "extension" : "pdf" }
{ "_id" : 5, "type" : "url", "value" : "http://www.example.com/path", "protocol" : "http", "domain" : "www.example.com", "path" : "/path" }
{ "_id" : 6, "type" : "url", "value" : "https://example.com/path/../home", "protocol" : "https", "domain" : "example.com", "path" : "/path/../home" }
Collation: language-specific string comparison
Collation brings language-specific string comparison and sorting to Amazon DocumentDB. This is essential for applications serving international users where alphabetical ordering varies by language and locale. You can now perform case-insensitive searches and apply locale-specific rules for accurate text matching.
Collation settings can be applied at the collection level or index level, and are supported exclusively on Query Planner v3 (enabled by default in version 8.0).
Case-insensitive search
Let's see an example that performs a case-insensitive search on customer data.
// Create the collection with collation settings
db.createCollection("customers", { collation: { locale: "en_US", strength: 2 } });
// Insert some sample data
db.customers.insertMany(
[
{_id:1, email: "alejandro@example.com", firstName: "Alejandro", lastName: "Rosalez", department: "Engineering", status: "ACTIVE"},
{_id:2, email: "ANA@EXAMPLE.COM", firstName: "Ana", lastName: "Carolina Silva", department: "Marketing", status: "active"},
{_id:3, email: "Arnav@Example.com", firstName: "Arnav", lastName: "Desai", department: "Engineering", status: "Active" },
{_id:4, email: "carlos@example.com", firstName: "Carlos", lastName: "Salazar", department: "Sales", status: "PENDING" },
{_id:5, email: "DIEGO@EXAMPLE.COM", firstName: "Diego", lastName: "Ramirez", department: "Engineering", status: "active" },
{_id:6, email: "jorge@example.com", firstName: "Jorge", lastName: "Souza", department: "Operations", status: "active" }
]);
Query emails using a search term that differs in case from the value stored in the database:
db.customers.findOne({ email: "arNav@example.com"});
{
"_id" : 3, "email" : "Arnav@Example.com", "firstName" : "Arnav", "lastName" : "Desai", "department" : "Engineering", "status" : "Active"
}
db.customers.findOne({ email: "Diego@example.com"});
{
"_id": 5, "email": "DIEGO@EXAMPLE.COM", "firstName": "Diego", "lastName": "Ramirez", "department": "Engineering", "status": "active"
}
Locale-aware searching (French):
Let's see an example that performs a locale-aware, accent-insensitive search on customer data.
db.createCollection("customers_fr", { collation: { locale: "fr", strength: 1 } });
db.customers_fr.insertMany([
{_id:1, email: "maría@example.com", firstName: "María", lastName: "García", ville: "Paris", statut: "ACTIF" },
{_id:2, email: "MARCIA@EXAMPLE.COM", firstName: "Márcia", lastName: "Oliveira", ville: "Lyon", statut: "actif" },
{_id:3, email: "sofia@example.com", firstName: "Sofía", lastName: "Martínez", ville: "Marseille", statut: "Actif" },
{_id:4, email: "wang@example.com", firstName: "Wang", lastName: "Xiulan", ville: "Bordeaux", statut: "ACTIF" },
{_id:5, email: "ZHANG@EXAMPLE.COM", firstName: "Zhang", lastName: "Wei", ville: "Toulouse", statut: "en attente" },
{_id:6, email: "saanvi@example.com", firstName: "Saanvi", lastName: "Sarkar", ville: "Nice", statut: "actif" }
]);
Searching for lastName "Garcia" returns the record because the collation settings treat Garcia and García as equal:
db.customers_fr.find({lastName: "Garcia"})
{
"_id": 1, "email": "maría@example.com", "firstName": "María", "lastName": "García", "ville": "Paris", "statut": "ACTIF"
}
db.customers_fr.find({lastName: "García"})
{
"_id": 1, "email": "maría@example.com", "firstName": "María", "lastName": "García", "ville": "Paris", "statut": "ACTIF"
}
Case-insensitive search with the locale setting:
db.customers_fr.find({email:"marcia@EXAMPLE.COM"})
{
"_id" : 2, "email" : "MARCIA@EXAMPLE.COM", "firstName" : "Márcia", "lastName" : "Oliveira", "ville" : "Lyon","statut" : "actif"
}
}
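The comparisons in these examples can be reproduced outside the database with the standard JavaScript Intl.Collator API, which is also based on ICU collation rules. The sketch below assumes that collation strength 2 corresponds to the "accent" sensitivity (case ignored, accents respected) and strength 1 to the "base" sensitivity (both ignored). It only illustrates the comparison rules and does not show how Amazon DocumentDB evaluates queries; `equalUnder` is a hypothetical helper.

```javascript
// Assumption: ICU strength 2 corresponds to Intl sensitivity "accent"
// (ignore case, respect accents) and strength 1 to "base" (ignore both).
const caseInsensitive = new Intl.Collator("en-US", { sensitivity: "accent" });
const accentInsensitive = new Intl.Collator("fr", { sensitivity: "base" });

// Two strings are equal under a collation when compare() returns 0.
function equalUnder(collator, a, b) {
  return collator.compare(a, b) === 0;
}

// Strength 2: case differences are ignored, accent differences are not.
console.log(equalUnder(caseInsensitive, "arNav@example.com", "Arnav@Example.com")); // true
console.log(equalUnder(caseInsensitive, "Garcia", "García")); // false

// Strength 1: both case and accent differences are ignored.
console.log(equalUnder(accentInsensitive, "Garcia", "García")); // true
console.log(equalUnder(accentInsensitive, "marcia@EXAMPLE.COM", "MARCIA@EXAMPLE.COM")); // true
```

This is why the strength 2 collection matches emails regardless of case, while the strength 1 French collection additionally treats accented and unaccented letters as the same.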
To view index-level collation settings, use db.collection.getIndexes(), which returns detailed index information including collation configuration. By default, indexes inherit the collection's collation settings.
db.customers_fr.getIndexes()
To view the collation setting on an existing collection, use the db.getCollectionInfos() function.
db.getCollectionInfos({name: "customers_fr"})
Reduce complexity, enhance access control with Views
Amazon DocumentDB 8.0 introduces support for views with Query Planner v3. A view is a read-only, virtual collection defined by an aggregation pipeline that presents transformed or calculated results from one or more source collections, without duplicating or modifying the underlying data.
Using views, you can streamline application maintenance. Instead of writing complex aggregation pipelines in multiple places, you can encapsulate query logic once in a view definition stored in the database. You can also use views to implement granular security that restricts data access at the row level through role-based access control. You can create views that filter sensitive data and expose only specific fields, then grant users access exclusively to their designated views, not to the underlying collections.
Here's an example of creating a view on the customers collection that filters data and limits fields, along with creating a user with access to the specific view.
// 1. Create the view (records with status "active", excluding the email field)
db.createView(
"active_customers",
"customers",
[
{ $match: { "status": "active" } },
{ $project: { "email": 0 } }
]
)
// 2. Create a user with access only to the view
db.createUser({
user: "<View only user>",
pwd: "<your password>",
roles: [
{
role: "read",
db: "<Database Name>",
collection: "active_customers" // View created in the previous step
}
]
})
This view-only user can query active_customers but cannot access the underlying customers collection, see customer records with a status other than active, or see customer email addresses.
To see the definition of a view, use the db.getCollectionInfos() function.
db.getCollectionInfos({name: "active_customers"})
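To show what the view's two pipeline stages expose, the following sketch applies the same filter and projection to a few of the sample customer documents in memory. It is only a stand-in for the view: `activeCustomersView` is a hypothetical function, and it assumes plain case-sensitive matching on status, ignoring any collation configured on the source collection.

```javascript
// In-memory stand-in for the view definition; it mimics only these two stages.
// Assumption: plain case-sensitive matching on status (collation effects ignored).
function activeCustomersView(customers) {
  return customers
    .filter((doc) => doc.status === "active") // { $match: { status: "active" } }
    .map(({ email, ...rest }) => rest);       // { $project: { email: 0 } }
}

// A few of the sample documents from the collation example.
const customers = [
  { _id: 1, email: "alejandro@example.com", firstName: "Alejandro", department: "Engineering", status: "ACTIVE" },
  { _id: 2, email: "ANA@EXAMPLE.COM", firstName: "Ana", department: "Marketing", status: "active" },
  { _id: 4, email: "carlos@example.com", firstName: "Carlos", department: "Sales", status: "PENDING" },
  { _id: 6, email: "jorge@example.com", firstName: "Jorge", department: "Operations", status: "active" }
];

const visible = activeCustomersView(customers);
console.log(visible);
```

Under these assumptions, only the documents with _id 2 and 6 are returned, and neither includes the email field. The source documents themselves are left unchanged.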
Upgrading to Amazon DocumentDB 8.0:
You can upgrade your Amazon DocumentDB 5.0 instance-based clusters to version 8.0 using AWS Database Migration Service (AWS DMS). For more information, see upgrading your Amazon DocumentDB cluster.
Getting started with Amazon DocumentDB 8.0
Amazon DocumentDB 8.0 is available now in all AWS Regions where Amazon DocumentDB operates.
Using AWS Console
To get started with Amazon DocumentDB 8.0, create a new cluster using the AWS Console and choose version 8.0.
- Navigate to the Amazon DocumentDB console and choose Create.

- Select instance-based clusters and, for Engine version, choose 8.0.0.

Continue to configure your cluster settings according to your requirements. For more information, see creating a new Amazon DocumentDB cluster.
Using AWS CLI
When using the AWS CLI to create an Amazon DocumentDB cluster, specify --engine-version as 8.0.0 to create an Amazon DocumentDB 8.0 cluster.
aws docdb create-db-cluster \
--db-cluster-identifier <cluster identifier> \
--engine docdb \
--engine-version 8.0.0 \
--master-username <username> \
--master-user-password <Password>
After cluster creation, add instances using create-db-instance.
Conclusion
Amazon DocumentDB 8.0 provides MongoDB 8.0 API compatibility with six new aggregation stages and five new operators. It delivers significant performance improvements through Query Planner v3, up to 5x storage savings through Zstd compression, and enhanced search capabilities with Text Index v2 and collation support. Views simplify architecture and access control, and up to 30x faster vector index builds accelerate development of generative AI-powered use cases.
For more information about recent launches and blog posts, see Amazon DocumentDB resources.