Load a CSV file into AWS DynamoDB

This post describes one of the many ways to load a CSV data file into an AWS DynamoDB table using boto3, the Python library provided by AWS. Amazon DynamoDB is a fully managed NoSQL database service that offers fast performance and scales to store and retrieve any amount of data.
Click here if you would like to set up a local DynamoDB instance on your computer, or if you want to try the script outside your AWS DynamoDB environment.

Let’s get started! First, copy the data below, paste it into an Excel sheet, then save it as “courses.csv”.

Sample records (in courses.csv)
id,firstname,lastname,course
1,jean,joseph,sql server
2,garellard,daniel,postgresql
3,daniel,garellard,mysql
4,daniel,jean,sql server
5,joseph,jean,sql server
6,joseph,garellard,mysql
7,garellard,jean,postgresql
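
If you would rather skip Excel, the same file can be produced with Python's csv module. This is only a minimal sketch: the `write_courses_csv` helper and its default relative path are assumptions made for illustration, not part of the original steps.

```python
import csv

# Sample records from this post: a header row followed by seven data rows.
ROWS = [
    ["id", "firstname", "lastname", "course"],
    ["1", "jean", "joseph", "sql server"],
    ["2", "garellard", "daniel", "postgresql"],
    ["3", "daniel", "garellard", "mysql"],
    ["4", "daniel", "jean", "sql server"],
    ["5", "joseph", "jean", "sql server"],
    ["6", "joseph", "garellard", "mysql"],
    ["7", "garellard", "jean", "postgresql"],
]


def write_courses_csv(path="courses.csv"):
    """Write the sample records to `path` and return the number of data rows."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(ROWS)
    return len(ROWS) - 1  # exclude the header row
```

Calling `write_courses_csv(r'C:\courses.csv')` would write the file to the path that the load script later in this post reads from.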

Now, execute the script below to create a new table.

import csv
import os

import boto3

# Connect using the 'dev' profile. endpoint_url points at a local DynamoDB
# instance; remove it to target the AWS-hosted service instead.
dynamodb = boto3.session.Session(profile_name='dev').resource(
    'dynamodb', endpoint_url='http://localhost:8000', region_name='us-east-1'
)
table_name = 'courses'
# Composite primary key: 'id' is the partition (HASH) key and
# 'course' is the sort (RANGE) key.
params = {
    'TableName': table_name,
    'KeySchema': [
        {'AttributeName': 'id', 'KeyType': 'HASH'},
        {'AttributeName': 'course', 'KeyType': 'RANGE'}
    ],
    'AttributeDefinitions': [
        {'AttributeName': 'id', 'AttributeType': 'N'},
        {'AttributeName': 'course', 'AttributeType': 'S'}
    ],
    'ProvisionedThroughput': {
        'ReadCapacityUnits': 10,
        'WriteCapacityUnits': 10
    }
}
table = dynamodb.create_table(**params)
print(f"Creating {table_name}...")
print("Table status:", table.table_status)
# Block until DynamoDB reports that the table exists.
table.wait_until_exists()
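
DynamoDB expects every attribute named in KeySchema to be declared in AttributeDefinitions, and it rejects definitions for attributes that no key uses. A small pre-flight check along the following lines can catch a mismatch before the request is sent. The `check_key_attributes` helper is hypothetical, and it assumes the table has no secondary indexes.

```python
def check_key_attributes(params):
    """Return (missing, unused) attribute names for a create_table params dict.

    missing: names in KeySchema that have no AttributeDefinitions entry.
    unused:  names in AttributeDefinitions that KeySchema does not use.
    Assumption: the table defines no secondary indexes.
    """
    key_names = {key["AttributeName"] for key in params["KeySchema"]}
    defined = {attr["AttributeName"] for attr in params["AttributeDefinitions"]}
    return sorted(key_names - defined), sorted(defined - key_names)


# The same key layout as the courses table in this post.
params = {
    "TableName": "courses",
    "KeySchema": [
        {"AttributeName": "id", "KeyType": "HASH"},
        {"AttributeName": "course", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "id", "AttributeType": "N"},
        {"AttributeName": "course", "AttributeType": "S"},
    ],
}

print(check_key_attributes(params))  # prints ([], [])
```

Two empty lists mean the key schema and the attribute definitions agree. Non-key columns such as firstname and lastname do not belong in AttributeDefinitions; they are simply stored with each item.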

The courses table was created successfully, as the screenshot below shows.

Execute the script below to load the CSV data file (courses.csv) into the DynamoDB courses table you just created.

FILE_PATH = r'C:\courses.csv'
table = dynamodb.Table(table_name)
if os.path.exists(FILE_PATH):
    with open(FILE_PATH, 'r', newline='') as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        next(reader)  # skip the header row
        for row in reader:
            # Column order: id, firstname, lastname, course
            table.put_item(
                Item={
                    'id': int(row[0]),  # 'id' is a Number (N) key, so convert it
                    'firstname': row[1],
                    'lastname': row[2],
                    'course': row[3],
                }
            )
else:
    print(f"File not found: {FILE_PATH}")
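
For larger files, one request per row becomes slow. boto3's Table resource offers a `batch_writer()` context manager that buffers the puts, sends them in batches, and resends any unprocessed items. The sketch below is one possible variant of the loader, not the method used above; the `load_csv` function name is an assumption, and it takes the table object as a parameter so it can be tried against any object that exposes `batch_writer()`.

```python
import csv


def load_csv(table, file_path):
    """Write every row of the CSV at `file_path` to `table`; return the row count.

    `table` is expected to be a boto3 DynamoDB Table resource (or any object
    with a compatible batch_writer() method).
    """
    count = 0
    with open(file_path, "r", newline="") as csv_file:
        # DictReader uses the header row as keys, so no column indexes are needed.
        reader = csv.DictReader(csv_file)
        # batch_writer buffers the puts and flushes them in batches on exit.
        with table.batch_writer() as batch:
            for row in reader:
                row["id"] = int(row["id"])  # 'id' is defined as a Number (N) key
                batch.put_item(Item=row)
                count += 1
    return count
```

With the objects defined earlier in this post, a call might look like `load_csv(dynamodb.Table(table_name), FILE_PATH)`.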

Let’s retrieve the data you just inserted into the DynamoDB courses table.

print(table.scan()['Items'])
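
A single `scan()` call is enough for the seven sample rows, but each call returns at most 1 MB of data. When more remains, the response includes a `LastEvaluatedKey` that can be passed back as `ExclusiveStartKey` to fetch the next page. The following is a minimal sketch of that loop; the `scan_all` helper is a hypothetical name, and it accepts the table object as a parameter.

```python
def scan_all(table):
    """Return every item in `table`, following LastEvaluatedKey across pages."""
    items = []
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means this was the final page.
            break
        # Resume the scan right after the last item of the previous page.
        kwargs["ExclusiveStartKey"] = last_key
    return items
```

With the table from this post, `print(scan_all(table))` would list all items regardless of how many pages the scan needs.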

Conclusion:
With the help of boto3, the AWS library for Python, it is easy to load CSV data into a DynamoDB table.

Feel free to comment, share and like this post.

Published by Jean Joseph

Jean Joseph is a Database, Big Data, Data Warehouse Platform, Data Pipeline, Database Architecture Solutions Provider as well as a Data Engineer enthusiast among other disciplines.
