DynamoDB is a serverless NoSQL DB that avoids joins, favors denormalization, and fits predictable access. A guide for SQL devs to learn its core concepts.

A SQL Developer’s Guide to DynamoDB

Amazon DynamoDB is a serverless, fully managed NoSQL database designed from the ground up with a specific set of problems and workloads in mind, rather than as a general-purpose solution for all database needs. Because items have no fixed schema and AWS manages the infrastructure, there are no table-altering migrations or patching to plan, and indexes and backups can be added without taking the table offline. In a live, large-scale relational database, these tasks can be slow and risky.

DynamoDB offers a flexible data model that lets you store complex, evolving data structures without a predefined schema. Its key strengths are single-digit millisecond performance, automatic scaling, high availability, and automatic replication across Availability Zones, while AWS manages the underlying hardware and security. This makes it an ideal choice for applications with large amounts of data and strict latency requirements, such as financial services, gaming, or streaming.

The first thing that caught my attention about DynamoDB/NoSQL is that it forces you to forget (almost) everything you know about how data is stored in a database, and this isn’t a minor suggestion: it’s the core principle for success. For years, I worked with relational databases like PostgreSQL, MySQL or MSSQL, where I was trained to think in terms of normalization: breaking data into its smallest logical components across multiple tables to ensure integrity and avoid duplication. I learned to rely on JOIN operations to reassemble this data on the fly to answer complex questions. DynamoDB throws this rulebook out the window.

With no JOINs or foreign key constraints, DynamoDB schema design is driven by a different mindset. You must know your application’s data access patterns first, then design your tables to answer those specific queries. This often means denormalizing data – duplicating it to ensure a single request can retrieve all necessary information. This approach is key to achieving DynamoDB’s consistent, single-digit millisecond performance at any scale.

Key Concepts

Tables, items and attributes

As in a relational database, tables, items and attributes are the core components of DynamoDB. A table is a collection of items, and each item is a collection of attributes. You can create multiple tables to store your data, but for workloads with well-understood access patterns a single-table design is often recommended: it optimizes for those access patterns, improves performance, and reduces costs by removing the need to maintain multiple tables and the relationships between them. Apart from the primary key, which is mandatory, neither attributes nor their data types need to be defined in advance.

| User Id (primary key) | Email | First Name | Last Name |
|---|---|---|---|
| 1 | jason@gmail.com | Jason | Statham |
| 2 | peter@email.com | Peter | Griffin |
| 3 | harry@email.com | Harry | Potter |
user-table

Primary Keys

Every item in DynamoDB is identified by a primary key, which can be simple (a partition key, or PK) or composite (a partition key plus a sort key, or SK). Each primary key attribute must be a single string, number, or binary value. The partition key is essential for storage: DynamoDB passes its value through an internal hash function, and the result determines which partition (physical storage internal to DynamoDB) the item is stored in. Partitions are automatically replicated across multiple Availability Zones for high availability.
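Here is a toy sketch in plain Java that makes the hashing step concrete. DynamoDB's actual hash function and partition count are internal and not exposed. This sketch therefore assumes MD5 and a fixed number of partitions purely for illustration. The `PartitionRouter` class is hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Toy model only: DynamoDB's real hash function and partition layout are internal.
// Assumes MD5 and a fixed partition count to show that the same partition key
// value always maps to the same partition.
class PartitionRouter {
    private final int partitionCount;

    PartitionRouter(int partitionCount) {
        if (partitionCount <= 0) throw new IllegalArgumentException("partitionCount must be positive");
        this.partitionCount = partitionCount;
    }

    int partitionFor(String partitionKeyValue) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKeyValue.getBytes(StandardCharsets.UTF_8));
            // Treat the digest as a non-negative integer and map it onto a partition index
            return new BigInteger(1, digest).mod(BigInteger.valueOf(partitionCount)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The mapping is deterministic, so every item with the same partition key value is routed to the same place. This is what lets DynamoDB locate an item directly from its key instead of searching for it.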

A composite key is useful when a single partition key value identifies a collection of items. DynamoDB stores items that share the same partition key together and keeps them sorted by the value of the sort key (SK). To read a single item with GetItem, you must supply both the partition key and the sort key; to read the whole collection, a Query needs only the partition key.

The following example extends the user table with a simplified visualization of how DynamoDB uses partitions.

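Partition 1 (user-table)
| PK (User Id) | SK (Company) | Email | First Name | Last Name |
|---|---|---|---|---|
| 1 | Company 1 | jason@gmail.com | Jason | Statham |
| 1 | Company 2 | jason@outlook.com | Jason | Statham |

Partition 2 (user-table)
| PK (User Id) | SK (Company) | Email | First Name | Last Name |
|---|---|---|---|---|
| 2 | Company 3 | peter@email.com | Peter | Griffin |

Partition 3 (user-table)
| PK (User Id) | SK (Company) | Email | First Name | Last Name |
|---|---|---|---|---|
| 3 | Company 4 | harry@email.com | Harry | Potter |
| 3 | Company 5 | harry@email.com | Harry | Potter |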
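The grouping and ordering described for composite keys can be modeled in a few lines of plain Java. This is an in-memory sketch using a hypothetical `CompositeKeyTable` class, not how DynamoDB actually stores data. It only shows why GetItem needs both keys while a Query by partition key returns the whole collection in sort-key order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of composite primary keys (not DynamoDB's implementation):
// items that share a partition key are grouped together and kept sorted by sort key.
class CompositeKeyTable {
    // partition key -> (sort key -> item attributes)
    private final Map<String, TreeMap<String, Map<String, String>>> partitions = new HashMap<>();

    void put(String pk, String sk, Map<String, String> attributes) {
        partitions.computeIfAbsent(pk, k -> new TreeMap<>()).put(sk, attributes);
    }

    // Like GetItem: a single item needs the full key (partition key AND sort key)
    Map<String, String> getItem(String pk, String sk) {
        TreeMap<String, Map<String, String>> collection = partitions.get(pk);
        return collection == null ? null : collection.get(sk);
    }

    // Like Query: the partition key alone returns the whole collection, ordered by sort key
    List<String> querySortKeys(String pk) {
        TreeMap<String, Map<String, String>> collection = partitions.get(pk);
        if (collection == null) return new ArrayList<>();
        return new ArrayList<>(collection.keySet());
    }
}
```

Inserting Company 2 before Company 1 still yields Company 1, Company 2 from `querySortKeys("1")`. This mirrors how DynamoDB returns an item collection ordered by sort key.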

Data Operations

DynamoDB supports four basic operations for CRUD functionality:

  • GetItem – retrieves a single item by its primary key.
  • PutItem – used primarily to create a new item. If an item with the same primary key already exists, it is replaced entirely, including all of its attributes.
  • UpdateItem – used primarily to modify the specified attributes of an existing item. If no item with that primary key exists, a new one is created.
  • DeleteItem – deletes a single item by its primary key.
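The difference between PutItem and UpdateItem is easy to miss. The following sketch is plain Java with no AWS SDK, and the `ItemStore` class is hypothetical. It models only which attributes survive each call, and it assumes UpdateItem is used to set a few attributes, as with a SET-style update expression.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of PutItem vs UpdateItem semantics on a simple-key table.
// Not the SDK: it only models which attributes survive each call.
class ItemStore {
    private final Map<String, Map<String, String>> items = new HashMap<>();

    // PutItem: the new item replaces any existing item with the same key entirely
    void putItem(String key, Map<String, String> attributes) {
        items.put(key, new HashMap<>(attributes));
    }

    // UpdateItem: only the given attributes change; a missing item is created (upsert)
    void updateItem(String key, Map<String, String> changes) {
        items.computeIfAbsent(key, k -> new HashMap<>()).putAll(changes);
    }

    Map<String, String> getItem(String key) {
        return items.get(key);
    }

    void deleteItem(String key) {
        items.remove(key);
    }
}
```

The Enhanced Client's `updateItem` works from a whole Java object. By default, attributes that are null on that object are removed from the stored item unless `ignoreNulls` is enabled on the update request. This is worth checking before relying on partial updates.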

The DynamoDB Enhanced Client in the AWS SDK for Java 2.x lets you map Java classes to DynamoDB tables and perform these CRUD operations with little boilerplate.

Java
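import software.amazon.awssdk.enhanced.dynamodb.DynamoDbTable;
import software.amazon.awssdk.enhanced.dynamodb.Key;

// ShoppingCart is assumed to be a class annotated with @DynamoDbBean;
// the repository class name is illustrative.
public class ShoppingCartRepository {

    private final DynamoDbTable<ShoppingCart> dynamoDbTable;

    public ShoppingCartRepository(DynamoDbTable<ShoppingCart> dynamoDbTable) {
        this.dynamoDbTable = dynamoDbTable;
    }

    // GetItem: returns null if no item has this partition key
    public ShoppingCart read(Long partitionKey) {
        var key = Key.builder().partitionValue(partitionKey).build();
        return dynamoDbTable.getItem(key);
    }

    // PutItem: creates the item, or replaces it entirely if the key already exists
    public void create(ShoppingCart shoppingCart) {
        dynamoDbTable.putItem(shoppingCart);
    }

    // UpdateItem: updates the item with this key, creating it if it does not exist
    public void update(ShoppingCart shoppingCart) {
        dynamoDbTable.updateItem(shoppingCart);
    }

    // DeleteItem: removes the item with this partition key
    public void delete(Long partitionKey) {
        var key = Key.builder().partitionValue(partitionKey).build();
        dynamoDbTable.deleteItem(key);
    }
}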

Each of these operations requires the full primary key: the partition key, plus the sort key if the table has one. In addition to the four basic CRUD operations, DynamoDB also provides BatchGetItem and BatchWriteItem, which reduce the number of network round trips between your service and DynamoDB. A single BatchGetItem call can read up to 100 items, and a single BatchWriteItem call can put or delete up to 25 items.
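Because of these limits, larger workloads have to be split into batches. Below is a minimal sketch in plain Java. It assumes you chunk keys or items yourself before passing each chunk to the SDK's batch call. The `BatchChunker` helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a list of keys or items into chunks that respect DynamoDB's batch limits:
// up to 100 items per BatchGetItem and 25 per BatchWriteItem.
// Sending each chunk (and retrying unprocessed items) is left to the actual SDK call.
class BatchChunker {
    static final int MAX_BATCH_GET = 100;
    static final int MAX_BATCH_WRITE = 25;

    static <T> List<List<T>> chunk(List<T> items, int maxPerBatch) {
        if (maxPerBatch <= 0) throw new IllegalArgumentException("maxPerBatch must be positive");
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

For example, 60 items become three write batches of 25, 25 and 10. Under load, a batch call can also return `UnprocessedItems` (writes) or `UnprocessedKeys` (reads), which should be retried, ideally with backoff.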

Queries

Apart from the GetItem basic operation, there are two main ways to read data from DynamoDB: Query and Scan. Query finds items by primary key: you must supply the partition key value, and DynamoDB hashes it, locates the partition, and returns all items stored under that partition key. Optionally, you can add a condition on the sort key (for example equality, a range, or begins_with) to narrow the results. Scan, on the other hand, reads every item in the table, and each Scan request returns at most 1 MB of data, so reading a large table takes several paginated requests. Because it has to read all partitions, Scan is the least efficient way to read data in DynamoDB; for faster response times, design your tables so that your applications can use Query.

Here is how to perform a Query by partition key with the DynamoDB Enhanced Client. It returns every item that shares the given partition key, ordered by sort key.

Query
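import software.amazon.awssdk.enhanced.dynamodb.DynamoDbTable;
import software.amazon.awssdk.enhanced.dynamodb.model.PageIterable;
import software.amazon.awssdk.enhanced.dynamodb.model.QueryConditional;
import software.amazon.awssdk.enhanced.dynamodb.model.QueryEnhancedRequest;

private final DynamoDbTable<ShoppingCart> dynamoDbTable;

// Returns all items whose partition key equals partitionKey, ordered by sort key.
// Results are paginated; iterating the PageIterable fetches further pages as needed.
public PageIterable<ShoppingCart> query(Long partitionKey) {
    var condition = QueryConditional.keyEqualTo(b -> b.partitionValue(partitionKey));
    var query = QueryEnhancedRequest.builder().queryConditional(condition).build();
    return dynamoDbTable.query(query);
}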

If you need to read all records from the table, you can use Scan.

Scan
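import software.amazon.awssdk.enhanced.dynamodb.DynamoDbTable;
import software.amazon.awssdk.enhanced.dynamodb.model.PageIterable;

private final DynamoDbTable<ShoppingCart> dynamoDbTable;

// Reads every item in the table. Each underlying request returns at most 1 MB;
// iterating the PageIterable fetches the remaining pages.
public PageIterable<ShoppingCart> scan() {
    return dynamoDbTable.scan();
}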
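Each Scan (and Query) request returns at most 1 MB, so reading a full table is a loop over pages. Each response carries a `LastEvaluatedKey`, and you pass it back as `ExclusiveStartKey` until it is absent. The toy model below uses a hypothetical `PagedScan` class to mimic that loop in memory. It makes two simplifying assumptions: an item count per page stands in for the 1 MB limit, and a list index stands in for the key.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of paginated Scan: each request returns one page plus a "last evaluated"
// position, and the caller keeps requesting until no continuation is returned.
// Assumption: pages are capped by item count instead of DynamoDB's 1 MB of data.
class PagedScan {
    static class Page {
        final List<String> items;
        final Integer lastEvaluatedIndex; // null means the scan is complete

        Page(List<String> items, Integer lastEvaluatedIndex) {
            this.items = items;
            this.lastEvaluatedIndex = lastEvaluatedIndex;
        }
    }

    // One request: read up to pageSize items starting after exclusiveStartIndex
    static Page scanPage(List<String> table, Integer exclusiveStartIndex, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        int start = exclusiveStartIndex == null ? 0 : exclusiveStartIndex + 1;
        int end = Math.min(start + pageSize, table.size());
        List<String> items = new ArrayList<>(table.subList(start, end));
        Integer last = end < table.size() ? end - 1 : null;
        return new Page(items, last);
    }

    // Full scan: keep requesting pages until no continuation position is returned
    static List<String> scanAll(List<String> table, int pageSize) {
        List<String> all = new ArrayList<>();
        Integer startAfter = null;
        do {
            Page page = scanPage(table, startAfter, pageSize);
            all.addAll(page.items);
            startAfter = page.lastEvaluatedIndex;
        } while (startAfter != null);
        return all;
    }
}
```

With the Enhanced Client, the `PageIterable` returned by `scan()` and `query()` runs this loop for you as you iterate over it. With the low-level API, you write the loop yourself.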

Secondary Indexes

Secondary indexes in DynamoDB are essential for supporting diverse query patterns. They create a separate, alternate copy of your data, allowing you to query the table using a different key. For example, if your main table is organized by userId, you can create a secondary index to quickly look up users by their email without a slow and expensive full table scan. This new data structure copies the attributes you choose and organizes them by the new key.

There are two types of secondary indexes: global secondary indexes (GSIs) and local secondary indexes (LSIs). A GSI defines a new partition key and, optionally, a sort key, both of which can differ from those of the table. An LSI keeps the table's partition key but defines a different sort key, giving you an alternative ordering of the items within each partition key; LSIs must be created together with the table. You can have up to 5 LSIs and, by default, up to 20 GSIs per table.

The following simplified visualization shows how the data from user-table could look in a GSI that uses Email as its partition key.

| Email (GSI partition key) | User Id | First Name | Last Name |
|---|---|---|---|
| jason@gmail.com | 1 | Jason | Statham |
| peter@email.com | 2 | Peter | Griffin |
| harry@email.com | 3 | Harry | Potter |
user-table
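The idea of an index as a second copy of the data, keyed differently, can be sketched in memory. The hypothetical `UserTableWithEmailIndex` class below keeps a main map keyed by User Id and an index map keyed by Email, holding only the projected attributes. Two simplifications apply. DynamoDB propagates changes to a GSI asynchronously, whereas this sketch updates both maps at once. A real GSI key also does not have to be unique, whereas this sketch assumes one user per email.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of a global secondary index: a second copy of selected attributes,
// keyed by Email, so a lookup by email needs no full scan of the main table.
// Simplifications: synchronous index maintenance and one user per email.
class UserTableWithEmailIndex {
    private final Map<String, Map<String, String>> usersById = new HashMap<>();
    private final Map<String, Map<String, String>> emailIndex = new HashMap<>();

    void putUser(String userId, String email, String firstName, String lastName) {
        Map<String, String> old = usersById.get(userId);
        if (old != null) {
            emailIndex.remove(old.get("Email")); // drop the stale index entry
        }
        Map<String, String> item = new HashMap<>();
        item.put("UserId", userId);
        item.put("Email", email);
        item.put("FirstName", firstName);
        item.put("LastName", lastName);
        usersById.put(userId, item);
        // Project the attributes we want to read through the index
        emailIndex.put(email, Map.of("UserId", userId, "FirstName", firstName, "LastName", lastName));
    }

    // Like a Query on the GSI with Email as the partition key
    Map<String, String> findByEmail(String email) {
        return emailIndex.get(email);
    }
}
```

The cost of this design is visible in `putUser`. Every write updates two copies of the data, which is the same trade-off DynamoDB makes when it maintains a GSI for you.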

Read Consistency

Relational databases are built around ACID transactions, and we take for granted that a SELECT query returns only data that was committed before the query began, including the most recent committed writes. In PostgreSQL, for example, this is the behavior of the default READ COMMITTED isolation level. DynamoDB calls this kind of read strongly consistent.

DynamoDB, in contrast, offers two levels of read consistency:

  1. Eventually Consistent Reads (default)
    By default, all read operations in DynamoDB, whether on a table or an index, use eventual consistency. This means that the result of a read might not reflect the most recent write if it was performed shortly before the read. However, if you repeat the same read after a brief period, the updated data will eventually become visible.
    Eventually consistent reads apply to Tables, Local Secondary Indexes (LSIs) and Global Secondary Indexes (GSIs).
  2. Strongly Consistent Reads
    DynamoDB also supports strongly consistent reads, which guarantee that the response reflects all successful writes made before the read. To enable this behavior, you must explicitly set the ConsistentRead parameter to true on operations that support it, such as GetItem, Query, or Scan. Strongly consistent reads are not available on global secondary indexes, and they consume twice the read capacity of eventually consistent reads.
    This ensures your application retrieves the most up-to-date data, similar to what you would expect from a traditional SQL database.
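The difference between the two read modes can be illustrated with a toy model. In the hypothetical `ConsistencyDemo` class, a write lands on a primary copy first and reaches a replica only when `replicate()` runs, which stands in for background propagation. DynamoDB's real replication protocol is internal. This sketch only mimics what a reader can observe.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of eventually vs strongly consistent reads.
// Writes go to the primary; replicate() stands in for background propagation.
class ConsistencyDemo {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();

    void write(String key, String value) {
        primary.put(key, value);
    }

    // Simulates the replica catching up after a short delay
    void replicate() {
        replica.putAll(primary);
    }

    // consistentRead = false (the default): may return stale data from the replica
    // consistentRead = true: reflects every write acknowledged before the read
    String read(String key, boolean consistentRead) {
        return consistentRead ? primary.get(key) : replica.get(key);
    }
}
```

The Enhanced Client exposes this choice on the request builders. For example, `GetItemEnhancedRequest.builder().key(key).consistentRead(true).build()` can be passed to `dynamoDbTable.getItem(...)`, and `QueryEnhancedRequest.builder()` has the same `consistentRead` option.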

The SQL Mindset vs. The DynamoDB Approach

In relational databases, SQL (Structured Query Language) is used to query and retrieve data. Due to the normalized nature of relational data models, these queries often rely heavily on JOIN operations to combine information from multiple tables. While this provides a powerful and flexible way to access data, it can become increasingly resource-intensive as the size of your dataset grows.

In contrast, NoSQL databases like DynamoDB eliminate the need for JOINs, which is a fundamental principle behind their design. By removing the overhead of JOINs, DynamoDB is able to offer predictable performance at any scale.

Multiple Tables in DynamoDB: A Familiar but Costly Pattern

When transitioning from SQL to DynamoDB, you might be tempted to create one table per entity, mirroring traditional relational modeling practices. While this approach is familiar to most SQL developers, and offers some benefits like clear separation of data and easier backups, it also introduces several trade-offs in the context of DynamoDB.

Each table is still organized by a primary key, and its performance is optimized accordingly. However, if you need to retrieve data from multiple tables, you’ll need to issue separate requests and perform any necessary joins in your application code. This results in:

  • Increased I/O overhead, as each table read is a separate network call
  • Higher latency, especially when multiple entities must be fetched together
  • Greater costs, due to more read operations and the maintenance of multiple tables

Embracing Single Table Design

An alternative, more DynamoDB-native approach is single-table design. This strategy involves storing multiple entity types in a single table, organized using carefully structured primary keys. Bear in mind that this means a single table per service; across your AWS account, you will most likely still have multiple DynamoDB tables.

Key benefits of this approach include:

  • Improved performance: Related data can be fetched in a single query based on shared partition keys
  • Reduced costs: Fewer read operations and simplified table management
  • Optimized for access patterns: Design is shaped around how your application queries data

This is possible because DynamoDB stores all items with the same partition key together. Within a partition key, each item is uniquely identified by its sort key, which allows multiple entity types to coexist in the same table.

However, this design also introduces trade-offs:

  • Backups and restores apply to the entire table, not individual entity types
  • The data model may be less intuitive to developers accustomed to relational schemas

Let’s revisit our earlier user-table example. Imagine we run an online store and want to use single-table design for multiple entities.

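| PK | SK | Attributes |
|---|---|---|
| USER#123 | ACCOUNT | Email: peter@email.com, First Name: Peter, Last Name: Griffin |
| USER#123 | ORDER#123 | (Item ID: 456, Item Name: Laptop, Count: 1), (Item ID: 789, Item Name: Monitor, Count: 2) |
| USER#123 | SHOPPING_CART | (Item ID: 147, Item Name: Keyboard, Count: 1) |
online-store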

This layout lets us query all data related to a specific user with a single request (PK = USER#123), with no costly JOIN operations. The hash symbol (#) is a common convention for separating the levels of a hierarchy within a key, but how you design your primary keys is up to you.
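The key conventions in this layout can be captured in small helper methods. The sketch below uses hypothetical names (`OnlineStoreTable`, `userPk`, `orderSk`) and an in-memory sorted map in place of a real table. It shows that PK = USER#<id> returns the account, orders and cart together. It also shows how a begins_with-style condition on the sort key narrows the result to orders only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the single-table key conventions from the online-store example.
// Helper names are hypothetical; a TreeMap stands in for DynamoDB's sorted item collections.
class OnlineStoreTable {
    static final String ACCOUNT_SK = "ACCOUNT";
    static final String CART_SK = "SHOPPING_CART";

    static String userPk(String userId) { return "USER#" + userId; }
    static String orderSk(String orderId) { return "ORDER#" + orderId; }

    // partition key -> (sort key -> entity type), each collection sorted by sort key
    private final Map<String, TreeMap<String, String>> table = new TreeMap<>();

    void put(String pk, String sk, String entityType) {
        table.computeIfAbsent(pk, k -> new TreeMap<>()).put(sk, entityType);
    }

    // Like Query with PK = USER#<id>: account, orders and cart in one request
    List<String> queryUser(String userId) {
        TreeMap<String, String> items = table.getOrDefault(userPk(userId), new TreeMap<>());
        return new ArrayList<>(items.values());
    }

    // Like Query with PK = USER#<id> AND begins_with(SK, "ORDER#"): only the orders
    List<String> queryOrders(String userId) {
        TreeMap<String, String> items = table.getOrDefault(userPk(userId), new TreeMap<>());
        List<String> orders = new ArrayList<>();
        for (Map.Entry<String, String> e : items.tailMap("ORDER#").entrySet()) {
            if (!e.getKey().startsWith("ORDER#")) break; // sorted keys: prefix range has ended
            orders.add(e.getKey());
        }
        return orders;
    }
}
```

Sort keys are kept in order, so all ORDER# keys sit next to each other. A prefix condition on the sort key can therefore read a contiguous range instead of filtering the whole collection.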

Denormalization

In relational databases, we normalize data to reduce redundancy and enforce data integrity through relationships and constraints. In DynamoDB, however, performance and scalability come first. This often requires denormalizing data, so that it is stored in a way that is optimized for access and speed.

Denormalization in DynamoDB means structuring data so that everything your application needs for a given operation can be retrieved with a single query: no joins, no secondary lookups, no cross-table relationships. Each request then reads only a small section of the table, which is what lets this design scale to millions of requests.

There are two main ways to denormalize:

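  • Embedding – store 1:1 related data directly in the item, for example user settings, a profile or an address. Avoid embedding large 1:N relations and arrays, as they may exceed DynamoDB’s 400 KB item size limit.
  • Duplication – copy the same data into several items. Prefer duplicating immutable values, since every copy has to be updated whenever a duplicated value changes.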
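When embedding data, it helps to estimate item size against the 400 KB limit. According to the DynamoDB documentation, a string attribute contributes the UTF-8 byte length of its name plus the UTF-8 byte length of its value. The sketch below uses a hypothetical `ItemSizeCheck` class and applies only that rule. It therefore assumes string-only attributes and treats 400 KB as 400 × 1,024 bytes. Numbers, lists and maps follow other sizing rules and are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Rough item-size estimate for items whose attributes are all strings:
// each attribute counts its name's UTF-8 bytes plus its value's UTF-8 bytes.
// Other attribute types (numbers, lists, maps) are not handled in this sketch.
class ItemSizeCheck {
    static final int MAX_ITEM_BYTES = 400 * 1024; // 400 KB, assuming 1 KB = 1,024 bytes

    static int estimateSize(Map<String, String> stringAttributes) {
        int total = 0;
        for (Map.Entry<String, String> attr : stringAttributes.entrySet()) {
            total += attr.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += attr.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    static boolean fitsInOneItem(Map<String, String> stringAttributes) {
        return estimateSize(stringAttributes) <= MAX_ITEM_BYTES;
    }
}
```

For example, the embedded Theme attribute with the value Dark counts as 5 + 4 = 9 bytes. Small 1:1 settings like this are cheap to embed. A large, growing array is what pushes an item toward the limit.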

A denormalized table could look like the following.

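| PK | SK | Attributes |
|---|---|---|
| USER#123 | ACCOUNT | Email: peter@email.com, First Name: Peter, Last Name: Griffin, Theme: Dark, Currency: EUR |
| USER#123 | ORDER#123 | (Item ID: 456, Item Name: Laptop, Count: 1), (Item ID: 789, Item Name: Monitor, Count: 2) |
| USER#123 | SHOPPING_CART | (Item ID: 147, Item Name: Keyboard, Count: 1) |
| USER#234 | ACCOUNT | Email: harry@email.com, First Name: Harry, Last Name: Potter, Theme: Light, Currency: US |
| USER#234 | ORDER#234 | (Item ID: 147, Item Name: Keyboard, Count: 3) |
| USER#234 | SHOPPING_CART | (Item ID: 369, Item Name: Headset, Count: 2), (Item ID: 987, Item Name: Monitor, Count: 1) |
online-store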

User settings such as Currency and Theme are embedded directly in the user's ACCOUNT item; in a relational database, reading them together with the user would require a JOIN. The Keyboard (Item ID 147) appears both in user 123's SHOPPING_CART and in user 234's ORDER, with its name stored in each, showing how information is duplicated to eliminate the need for a JOIN.

Conclusion

Using DynamoDB requires a shift in mindset – from traditional, normalized relational models to access-pattern-first, denormalized design. It’s not just a different database, it’s a different way of thinking.

By modeling your data around how your application reads and writes, you can take full advantage of DynamoDB’s strengths: low-latency performance, seamless scalability, and high availability with minimal operational effort.

It may feel unfamiliar at first, especially if you come from a SQL background, but once you stop trying to make DynamoDB behave like a relational database, you’ll realize it excels at solving a different class of problems – fast, scalable, and cloud-native.

