Everything you need to know about AWS DynamoDB

DynamoDB is a NoSQL database from AWS. When you are starting out with a serverless stack, learning all the DynamoDB concepts and caveats can be quite overwhelming. We have collected the most important questions and answers in a single article.

How does DynamoDB work?

DynamoDB is a managed NoSQL database. The main concepts in DynamoDB are tables, items and attributes. In DynamoDB, you cannot create a database, only tables. The database is abstracted away. A table is a collection of items, and each item consists of a collection of attributes. You can store items as JSON-like documents in DynamoDB. The main feature of DynamoDB is that it is designed to scale horizontally, with no practical limit on table size. You can run very small applications on it for almost no cost, or run an app with millions of users.

What is the Partition Key?

You can compare the partition key in DynamoDB with a primary key in a traditional relational database. In a table without a sort key, each item has to have a unique partition key. DynamoDB hashes the partition key of each item to decide which partition stores it. This is what makes DynamoDB scalable: it spreads your items over a set of partitions and can scale those partitions individually.

When choosing your partition key, pick an attribute that has many distinct values and whose values are spread about evenly over your dataset and your requests. If your partition key has little variation, DynamoDB cannot distribute your items effectively, and scalability will suffer because of it. In that case you might encounter throttling, because most requests end up on the same few partitions.
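To make the effect of key variation visible, here is a small simulation. It is only a sketch: DynamoDB's real hash function and partition count are internal, so the SHA-256 hash, the eight partitions and the sample keys below are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, partition_count):
    """Map a key to a partition index by hashing it (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def items_per_partition(keys, partition_count):
    """Count how many items land on each partition."""
    return Counter(partition_for(key, partition_count) for key in keys)


# High-variation key: one distinct value per item.
user_keys = [f"user-{i}" for i in range(10_000)]
# Low-variation key: only two distinct values for all items.
status_keys = ["active" if i % 2 == 0 else "inactive" for i in range(10_000)]

spread = items_per_partition(user_keys, partition_count=8)
skewed = items_per_partition(status_keys, partition_count=8)

print("user id as key:", sorted(spread.values()))
print("status as key: ", sorted(skewed.values()))
```

With the user ID as key, every partition receives a share of the items. With the status as key, all items pile up on at most two partitions, however many partitions exist.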

You can also build a composite key via string concatenation to achieve the desired variation. For example, combine an email address with a product ID, and optionally a country code (for example: myemail@test#12345).
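A concatenated key like this can be built with a small helper. The function name and the choice of `#` as separator are assumptions for this sketch. The check on the separator is there because a value that itself contains `#` would make the key ambiguous.

```python
def composite_key(*parts, separator="#"):
    """Join attribute values into one key string."""
    for part in parts:
        if separator in part:
            raise ValueError(f"{part!r} contains the separator {separator!r}")
    return separator.join(parts)


key = composite_key("myemail@test", "12345")
print(key)  # myemail@test#12345
```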

What is the Sort Key in DynamoDB?

The sort key in DynamoDB is used to sort the items in your table. It is sometimes also referred to as the range attribute. When you set a sort key, it will be part of the primary key. All items with the same partition key will be stored together and sorted based on their sort key. With a sort key, you can have multiple items with the same partition key, but a differing sort key, since the sort key is now part of the primary key. You'll want to select a partition and sort key based on how you access your data.

Is the Sort Key part of the Primary Key in DynamoDB?

When you set a sort key in DynamoDB for your table, the sort key becomes part of the primary key of your item, which means you are now using a composite primary key. To get an item by its primary key in DynamoDB, you will need to provide both the partition key and the sort key.

For the sort key, you can also consider using multiple attributes via string concatenation. To find items by sort key, you can use operators such as equals, less than and greater than, or provide a range of values. For sort keys of type string, you can also use the begins_with operator by providing a prefix.
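Because items with the same partition key are kept in sort key order, range and prefix conditions boil down to picking one contiguous slice. The sketch below models that with a sorted Python list. The `ORDER#<date>#<id>` layout is a hypothetical concatenated sort key, and Python's string ordering only matches DynamoDB's byte ordering for plain ASCII values like these.

```python
from bisect import bisect_left, bisect_right

# Sort keys of the items that share one partition key, kept in sorted order.
sort_keys = sorted([
    "ORDER#2023-01-15#0001",
    "ORDER#2023-02-03#0002",
    "ORDER#2023-02-20#0003",
    "ORDER#2023-03-01#0004",
    "PROFILE",
])


def between(keys, low, high):
    """Sort keys k with low <= k <= high."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


def begins_with(keys, prefix):
    """Sort keys starting with prefix; they are contiguous in sorted order."""
    start = bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]


february = begins_with(sort_keys, "ORDER#2023-02")
same_range = between(sort_keys, "ORDER#2023-02-01", "ORDER#2023-02-28")
print(february)
print(same_range)
```

Both calls select the two February orders. This is why putting the most significant part first in a concatenated sort key pays off: a prefix then maps to one slice of the sorted items.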

Why should I never do a table scan in DynamoDB?

A scan operation on DynamoDB means that every item in the table or index is read to find the items you are looking for. This is very inefficient, because you consume read capacity for every item that is read, not only for the items that are returned. Accessing data in DynamoDB should mainly be done via queries or a call to GetItem. You can achieve this by analyzing which data you store and how you access it, and then creating the right indexes for it.
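The difference is easy to see in a toy model. The dictionary below stands in for a table with 1,000 items; the attribute names and data are made up, and the two functions only count how many items each access pattern has to touch.

```python
# A toy table: primary key -> item.
table = {
    f"user-{i}": {"id": f"user-{i}", "country": "NL" if i % 100 == 0 else "US"}
    for i in range(1_000)
}


def get_item(table, key):
    """Key lookup: touches exactly one item."""
    return table.get(key), 1


def scan(table, predicate):
    """Scan: reads every item, then keeps the ones matching the predicate."""
    examined = 0
    matches = []
    for item in table.values():
        examined += 1
        if predicate(item):
            matches.append(item)
    return matches, examined


item, reads_for_get = get_item(table, "user-42")
dutch_users, reads_for_scan = scan(table, lambda item: item["country"] == "NL")
print(reads_for_get, reads_for_scan, len(dutch_users))
```

The scan reads all 1,000 items to return 10 of them, and that ratio gets worse as the table grows.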

How to query for data in DynamoDB?

You can query for data in DynamoDB by providing the name of the partition key and a value for it. If you didn't set a sort key, you will get at most one item (since the partition key is the primary key). If you did set a sort key, you will get all the items with that partition key, sorted by the sort key. You can also add a condition on the sort key, as mentioned earlier (with operations such as begins_with, a range, or greater than/less than). On top of that, you can add a filter on other attributes, as long as you have provided the partition key. Keep in mind that such a filter is applied after the items have been read, so it does not reduce the read capacity the query consumes.

That's it. That's all you can query. This simplicity is what makes DynamoDB so scalable, but is also what makes it difficult to use for beginners. You cannot filter on a part of the partition key or query other attributes in DynamoDB without knowing the partition key. The only way to achieve this is by creating more indexes.
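As a sketch of what such a query looks like in practice, the helper below builds the parameters in the shape of the low-level Query API. The helper itself, the table name and the attribute names are hypothetical, and it only handles string keys. It makes no network call, so you can run it without an AWS account.

```python
def build_query(table_name, pk_name, pk_value, sk_name=None, sk_prefix=None):
    """Build keyword arguments in the shape of the low-level Query API."""
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}  # "S" marks a string value
    condition = "#pk = :pk"  # the partition key always needs an equality test
    if sk_name is not None and sk_prefix is not None:
        names["#sk"] = sk_name
        values[":prefix"] = {"S": sk_prefix}
        condition += " AND begins_with(#sk, :prefix)"
    return {
        "TableName": table_name,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


params = build_query("Orders", "customerId", "customer-1", "orderKey", "ORDER#2023-02")
print(params["KeyConditionExpression"])
```

If you use boto3, a dictionary like this could be passed along as `boto3.client("dynamodb").query(**params)`.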

What is a Global Secondary Index in DynamoDB?

If you want to read your data based on some attribute other than the partition key, and you don't want to do a table scan, you can create a Global Secondary Index (GSI). A GSI allows you to take your table and use some other attribute or attributes as the primary key. For example, suppose you have a comments table with a unique ID as its primary key. Now we want to get only the comments created by a given user. We can create a new index with the createdBy attribute of the comment as partition key and the created timestamp as sort key. We can then easily get all the comments for a given user, sorted by when they were created. You create queries as normal, but by providing the GSI name you query on the partition and sort key of the index instead.

A Global Secondary Index spans the whole table. You can add GSIs after creating a DynamoDB table, but by default each table can have a maximum of 20 GSIs.
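Conceptually, a GSI is a second copy of (part of) your items, regrouped under a different key. The sketch below models the comments example in memory. The sample data and the `build_index` helper are made up for illustration and say nothing about how DynamoDB maintains its indexes internally.

```python
from collections import defaultdict

# Base table items; the primary key is the unique "id".
comments = [
    {"id": "c1", "createdBy": "alice", "createdAt": "2023-02-01T10:00:00Z", "text": "First"},
    {"id": "c2", "createdBy": "bob", "createdAt": "2023-02-01T11:00:00Z", "text": "Hi"},
    {"id": "c3", "createdBy": "alice", "createdAt": "2023-01-15T09:30:00Z", "text": "Earlier"},
]


def build_index(items, partition_attr, sort_attr):
    """Regroup items under another partition key, ordered by another sort key."""
    index = defaultdict(list)
    for item in items:
        # Items missing an index key attribute do not appear in the index.
        if partition_attr in item and sort_attr in item:
            index[item[partition_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda item: item[sort_attr])
    return index


by_author = build_index(comments, "createdBy", "createdAt")
print([c["id"] for c in by_author["alice"]])  # ['c3', 'c1']
```

In a real query you would select the index by passing its name (the `IndexName` parameter) and use `createdBy` in the key condition.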

What is a Local Secondary Index in DynamoDB?

A Local Secondary Index, or LSI, allows your table to have multiple sort keys. An LSI uses the same partition key as the table, so it only reorders items that share a partition key. Each DynamoDB table can have a maximum of five LSIs, and a table with LSIs only works as long as all the items that share one partition key, including their index entries, total at most 10 GB. You cannot add an LSI after you have created a table; it needs to be defined at table creation.
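Since an LSI has to be declared when the table is created, it ends up in the table definition itself. The dictionary below follows the shape of the CreateTable API; the table, attribute and index names are hypothetical. The small check at the end encodes the rule that every LSI shares the partition key of the table.

```python
create_table_params = {
    "TableName": "Comments",
    "BillingMode": "PAY_PER_REQUEST",  # on-demand mode
    "AttributeDefinitions": [
        {"AttributeName": "postId", "AttributeType": "S"},
        {"AttributeName": "createdAt", "AttributeType": "S"},
        {"AttributeName": "rating", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "postId", "KeyType": "HASH"},      # partition key
        {"AttributeName": "createdAt", "KeyType": "RANGE"},  # sort key
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "byRating",
            "KeySchema": [
                {"AttributeName": "postId", "KeyType": "HASH"},   # same partition key
                {"AttributeName": "rating", "KeyType": "RANGE"},  # alternative sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
}


def partition_key_of(key_schema):
    """Return the name of the HASH (partition) key in a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def lsis_share_partition_key(params):
    """True if every LSI uses the same partition key as the table."""
    table_pk = partition_key_of(params["KeySchema"])
    return all(
        partition_key_of(index["KeySchema"]) == table_pk
        for index in params.get("LocalSecondaryIndexes", [])
    )


print(lsis_share_partition_key(create_table_params))  # True
```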

What about consistency for DynamoDB?

DynamoDB supports both eventually consistent and strongly consistent reads. Global Secondary Indexes only support eventually consistent reads, so you cannot use them when you need strong consistency. Also, you will have to pay more, since a strongly consistent read consumes twice the read capacity of an eventually consistent one.
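The cost difference can be worked out with a bit of arithmetic. The sketch below assumes the documented accounting for standard (non-transactional) reads: reads are counted in blocks of 4 KB, a strongly consistent read costs one read capacity unit per block and an eventually consistent read costs half of that. The function name is made up.

```python
import math


def read_capacity_units(item_size_bytes, strongly_consistent):
    """Read capacity units consumed by reading one item."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # reads are counted per 4 KB
    return blocks * (1.0 if strongly_consistent else 0.5)


print(read_capacity_units(6 * 1024, strongly_consistent=True))   # 2.0
print(read_capacity_units(6 * 1024, strongly_consistent=False))  # 1.0
```

A 6 KB item spans two 4 KB blocks, so it costs 2 units when read with strong consistency and 1 unit otherwise.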

Should I use on-demand or provisioned capacity for DynamoDB?

You can use DynamoDB in two modes, either on-demand or by provisioning your read and write capacity beforehand. When using DynamoDB in on-demand mode, you will only pay for what you use and DynamoDB will increase or decrease the capacity based on the usage of the table. On-demand is useful for when you don't know the load of your application yet, as well as for small MVPs. In a later stage you can optimize your application and switch to the provisioned mode of DynamoDB.

In on-demand mode, there are some default quotas in place, but they are mostly there to protect your wallet, not because DynamoDB cannot handle the load. By default that quota is 40,000 read request units and 40,000 write request units per table. The on-demand free tier offers only 25 GB of storage, with no free reads or writes.

When using provisioned mode, you are also free tier eligible for 25 read units and 25 write units of capacity, plus 25 GB of storage per month. So when you want to stay in the free tier, we recommend using the provisioned mode within the given limits.

For which programming languages are DynamoDB SDKs available?

Amazon provides DynamoDB SDKs for .NET, Java, JavaScript, C++, Go, Kotlin, PHP, Python, Ruby and Rust. Note that not all SDKs have stable versions released yet.

For JavaScript, AWS offers a low-level DynamoDBClient, which works the same way as the clients for other languages, but also a higher-level DynamoDBDocumentClient that is easier to work with. The DynamoDBDocumentClient comes from @aws-sdk/lib-dynamodb, while the default DynamoDB SDK is in @aws-sdk/client-dynamodb. When using the document client, it is easy to get confused and mix up imports between both libraries, because they share some class names that have different implementations. When reading documentation or articles online, always check which client they are using, to prevent errors or incorrect information. You can also consider using an ORM; for JavaScript, a popular solution is Dynamoose.
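The main thing the document client takes off your hands is the conversion between plain values and DynamoDB's typed attribute values, which the low-level client expects. To show what that format looks like, here is a minimal sketch of the conversion. It is written in Python for illustration only, it is not the SDK's implementation, and it leaves out sets and binary values.

```python
def marshal(value):
    """Convert a native Python value into DynamoDB's attribute-value format."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before numbers: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [marshal(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: marshal(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


comment = {"id": "c1", "likes": 3, "tags": ["aws"], "draft": False}
print(marshal(comment)["M"])
```

With the low-level client you write the typed form yourself; with the document client you pass the plain object and get plain objects back.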

Can I run DynamoDB locally?

Yes, you can run DynamoDB locally for development purposes. Amazon provides a runnable JAR and offers Docker images. Amazon also provides NoSQL Workbench, which you can use to connect to your table, design your table and indexes, and run queries.

Is DynamoDB a relational database?

No, DynamoDB is a NoSQL key-value and document database.