3 ways to test S3 in Python


At some point, every engineer has to decide whether to write tests for something or just ship the feature and move on.

Under a time crunch, I’ll often write tests for the easy things (e.g. pure functions) or write the tests that provide the biggest bang for their buck (e.g. an end-to-end integration test for the service).

Testing code that interacts with external systems, like a database or S3, requires a bit more effort. However, important business logic often lives in this code, and recently I’ve become more interested in testing it.

In this post, I’ll explore three ways of testing S3 in Python.

Setup

Let’s consider a simple CRUD app for recipes, backed by S3.

from dataclasses import dataclass
import json

import boto3

S3_BUCKET = "recipes"


def get_s3():
    return boto3.client("s3")


@dataclass
class Recipe:
    name: str
    instructions: str

    @classmethod
    def get_by_name(cls, name: str):
        """Looks up a Recipe by name
        
        Args:
            name (str): Recipe name
        
        Returns:
            Recipe: The stored Recipe
        """
        response = get_s3().get_object(Bucket=S3_BUCKET, Key=name)
        response = json.loads(response["Body"].read())
        return cls(response["name"], response["instructions"])

    @classmethod
    def update_instructions(cls, name: str, new_instructions: str):
        """Updates the instructions for a recipe
        
        Args:
            name (str): Name of the recipe to update
            new_instructions (str): New instructions

        Returns:
            Recipe: The updated Recipe (not persisted until save() is called)
        """
        recipe = cls.get_by_name(name)
        recipe.instructions = new_instructions
        return recipe

    @classmethod
    def delete(cls, name: str):
        """Deletes a recipe
        
        Args:
            name (str): Name of the recipe to delete
        """
        get_s3().delete_object(Bucket=S3_BUCKET, Key=name)

    def to_json(self):
        """Serialize the recipe to json
        
        Returns:
            str: JSON representation of the Recipe
        """
        return json.dumps({"name": self.name, "instructions": self.instructions})

    def save(self):
        """Persists a recipe to S3
        """
        serialized_recipe = self.to_json().encode("utf-8")
        get_s3().put_object(Bucket=S3_BUCKET, Key=self.name, Body=serialized_recipe)
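
Note that get_s3() constructs a new client on every call. That small indirection matters: two of the three options below work by patching this one function.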

All tests below use pytest. All code is runnable and available on GitHub.

Option 1: moto

Moto is a Python library that makes it easy to mock out AWS services in tests. Let’s use it to test our app.

First, create a pytest fixture that creates our S3 bucket. All S3 interactions within the mock_s3 context manager will be directed at moto’s virtual AWS account.

import boto3
from moto import mock_s3
import pytest


from recipe import Recipe, S3_BUCKET


@pytest.fixture
def s3():
    """Pytest fixture that creates the recipes bucket in 
    the fake moto AWS account
    
    Yields a fake boto3 s3 client
    """
    with mock_s3():
        s3 = boto3.client("s3")
        s3.create_bucket(Bucket=S3_BUCKET)
        yield s3

Next, we can test creating a new Recipe and fetching it.

def test_create_and_get(s3):
    Recipe(name="nachos", instructions="Melt cheese on chips").save()

    recipe = Recipe.get_by_name("nachos")
    assert recipe.name == "nachos"
    assert recipe.instructions == "Melt cheese on chips"

If we try to fetch a Recipe that doesn’t exist, an exception should be raised. This test covers that scenario.

def test_get_does_not_exist(s3):
    with pytest.raises(s3.exceptions.NoSuchKey):
        Recipe.get_by_name("sandwich")

We can also update a Recipe. This test confirms that the data is updated after save() is called.

def test_update(s3):
    old_instructions = "Melt cheese on chips"
    new_instructions = "Microwave a plate full of tortilla chips and cheese"

    Recipe(name="nachos", instructions=old_instructions).save()

    new_recipe = Recipe.update_instructions(
        name="nachos", new_instructions=new_instructions
    )

    # Nothing changes until you call save()
    recipe = Recipe.get_by_name("nachos")
    assert recipe.instructions == old_instructions

    new_recipe.save()

    # Recipe updates after saving
    recipe = Recipe.get_by_name("nachos")
    assert recipe.instructions == new_instructions

Finally, we can delete a recipe and confirm that the data in S3 disappears.

def test_delete(s3):
    Recipe(name="nachos", instructions="Melt cheese on chips").save()

    response = s3.list_objects_v2(Bucket=S3_BUCKET)
    assert len(response["Contents"]) == 1
    assert response["Contents"][0]["Key"] == "nachos"

    Recipe.delete("nachos")

    # Data in S3 is gone after deleting the recipe
    response = s3.list_objects_v2(Bucket=S3_BUCKET)
    assert "Contents" not in response.keys()

Overall, moto does a great job of implementing the S3 API. It’s easy to install, feels just like the real S3, and doesn’t require any code changes.
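
As an aside, mock_s3 also works as a decorator, which can be handy for one-off tests that don’t need a shared fixture. A minimal sketch (the bucket still has to be created inside the mocked context):

import boto3
from moto import mock_s3

from recipe import Recipe, S3_BUCKET


@mock_s3
def test_create_with_decorator():
    # The mock is active for the duration of the decorated function
    boto3.client("s3").create_bucket(Bucket=S3_BUCKET)
    Recipe(name="nachos", instructions="Melt cheese on chips").save()
    assert Recipe.get_by_name("nachos").instructions == "Melt cheese on chips"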

Option 2: Botocore stubs

Botocore stubs allow you to mock out S3 requests with fake responses.

Below is a pytest fixture that creates an S3 stub. Since other S3 clients won’t use this stub, we also need to patch get_s3 and replace its return value with the stubbed client, forcing every S3 call in the Recipe class through our stub.

import datetime
import json
from io import BytesIO
from unittest.mock import patch

import boto3
from botocore.response import StreamingBody
from botocore.stub import Stubber, ANY
from dateutil.tz import tzutc
import pytest

from recipe import Recipe, S3_BUCKET


@pytest.fixture
def s3_stub():
    """Pytest fixture that mocks the get_s3 function with a S3 client stub
    
    Yields a Stubber for the S3 client
    """
    s3 = boto3.client("s3")
    stubber = Stubber(s3)

    with patch("recipe.get_s3", return_value=s3):
        yield stubber

Then, we can stub out responses for the put_object and get_object S3 APIs. With those stubs in place, we can run the test that creates and subsequently fetches a Recipe.

def test_create_and_get(s3_stub):
    # Stub out the put_object response
    # Note: These stubs are incomplete - I omitted things such as
    # HTTP headers for brevity
    put_object_response = {
        "ResponseMetadata": {
            "RequestId": "5994D680BF127CE3",
            "HTTPStatusCode": 200,
            "RetryAttempts": 1,
        },
        "ETag": '"6299528715bad0e3510d1e4c4952ee7e"',
    }
    put_object_expected_params = {"Bucket": ANY, "Key": ANY, "Body": ANY}
    s3_stub.add_response("put_object", put_object_response, put_object_expected_params)

    # Create the StreamingBody that will be returned by get_object
    encoded_message = json.dumps(
        {"name": "nachos", "instructions": "Melt cheese on chips"}
    ).encode("utf-8")
    raw_stream = StreamingBody(BytesIO(encoded_message), len(encoded_message))

    # Stub out the get_object response
    get_object_response = {
        "ResponseMetadata": {
            "RequestId": "6BFC00970E62BC8F",
            "HTTPStatusCode": 200,
            "RetryAttempts": 1,
        },
        "LastModified": datetime.datetime(2020, 4, 6, 5, 39, 29, tzinfo=tzutc()),
        "ContentLength": 58,
        "ETag": '"6299528715bad0e3510d1e4c4952ee7e"',
        "ContentType": "binary/octet-stream",
        "Metadata": {},
        "Body": raw_stream,
    }
    get_object_expected_params = {"Bucket": ANY, "Key": ANY}
    s3_stub.add_response("get_object", get_object_response, get_object_expected_params)

    # Activate the stubber
    with s3_stub:
        recipe = Recipe(name="nachos", instructions="Melt cheese on chips")
        recipe.save()

        recipe = Recipe.get_by_name("nachos")
        assert recipe.name == "nachos"
        assert recipe.instructions == "Melt cheese on chips"
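
Stubber can also queue error responses via add_client_error, which is how the does-not-exist scenario from Option 1 would be reproduced here. A minimal sketch; I assert the generic ClientError rather than the modeled NoSuchKey exception:

from botocore.exceptions import ClientError


def test_get_does_not_exist(s3_stub):
    # Queue an error response instead of a success response
    s3_stub.add_client_error("get_object", service_error_code="NoSuchKey")

    with s3_stub:
        with pytest.raises(ClientError):
            Recipe.get_by_name("sandwich")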

While botocore stubs are functional, I don’t like working with them for several reasons:

  1. They require a lot more prep. Creating stubs is time-consuming. Even if you run the real code interactively and copy the response, some things need to be replaced - such as the StreamingBody above.

  2. They’re fragile and fake. Responses are returned first in, first out, so if you call the S3 APIs in a different order than you added the responses, the stubber throws an error. And if you have a bug in how you call the API, it might not be caught (pinning expected_params, sketched after this list, mitigates this somewhat).

  3. To make the stubs look somewhat realistic, you have to mock many fields that your code doesn’t care about and bloat your tests with fake responses.

  4. They leak implementation details from the module being tested. For example, if a module switched from using s3.list_objects to s3.list_objects_v2, the test would fail because it depends on a specific API being called. This creates an unnecessary dependency on the private API of the module, instead of testing the public API.
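
On point 2: the fragility can be reduced somewhat by pinning expected_params to concrete values instead of ANY, since the stubber raises an assertion error whenever the actual call parameters don’t match. A sketch of the put_object stub from the test above with pinned parameters:

put_object_expected_params = {
    "Bucket": S3_BUCKET,
    "Key": "nachos",
    "Body": ANY,  # the exact serialized body is awkward to pin down
}
s3_stub.add_response("put_object", put_object_response, put_object_expected_params)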

Option 3: Localstack

A third option is localstack, which allows you to bring up an entire AWS cloud stack locally.

First, we need to bring up localstack. I chose to do this with docker-compose.

version: "3.7"
services:
  tests:
    image: s3_testing:latest
    networks:
      - app
    entrypoint:
      - /app/wait-for-it.sh
      - -t
      - "30"
      - localstack:4572
      - --
      - pytest
      - test/
    environment:
      - AWS_ACCESS_KEY_ID=fake
      - AWS_DEFAULT_REGION=fake
      - AWS_SECRET_ACCESS_KEY=fake
  localstack:
    image: localstack/localstack
    ports:
      - "4566-4599:4566-4599"
    networks:
      - app
    environment:
      - SERVICES=s3
networks:
  app:
    driver: bridge
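
The entrypoint wraps pytest in the wait-for-it.sh script, so the tests container blocks, for up to 30 seconds, until localstack’s S3 port accepts connections before the tests run.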

Next, we patch get_s3 again, this time replacing its return value with an S3 client connected to localstack.

from unittest.mock import patch

import boto3
import pytest

from recipe import Recipe, S3_BUCKET


@pytest.fixture
def s3_localstack():
    """Pytest fixture that mocks the get_s3 function with a client
    connected to localstack

    Yields a boto3 S3 client backed by localstack
    """
    s3 = boto3.client("s3", endpoint_url="http://localstack:4572")
    s3.create_bucket(Bucket=S3_BUCKET)
    with patch("recipe.get_s3", return_value=s3):
        yield s3
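
Note that this wiring targets localstack’s legacy per-service port for S3 (4572). Newer localstack releases route every service through a single edge port instead, in which case both the wait-for-it target and the endpoint would change:

s3 = boto3.client("s3", endpoint_url="http://localstack:4566")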

With this mock in place, we can run the same tests that we ran with moto.

def test_create_and_get(s3_localstack):
    Recipe(name="nachos", instructions="Melt cheese on chips").save()

    recipe = Recipe.get_by_name("nachos")
    assert recipe.name == "nachos"
    assert recipe.instructions == "Melt cheese on chips"


def test_get_does_not_exist(s3_localstack):
    with pytest.raises(s3_localstack.exceptions.NoSuchKey):
        Recipe.get_by_name("sandwich")

Localstack is extremely easy to use, but it takes almost 30 seconds to spin up on my machine.

Recommendation

Both moto and localstack are very powerful and easy to work with. Both solutions do a good job of implementing the S3 API, and they also support other AWS services including EC2, RDS, Lambda, and more. They can both be used to test code in other languages in addition to Python.

Localstack is probably the closest thing to actually connecting to AWS, but for my simple use case presented above, I can’t justify the extra overhead and time required to spin up the stack. Therefore, I recommend moto as it’s the most lightweight solution that properly implements the S3 API.

For more complicated projects, or ones that test performance against something closer to real S3, localstack could be a good choice. Botocore stubs don’t make the cut.

What did I miss?

Thanks for reading! If you have any feedback, I’d love to hear from you - follow me on Twitter or message me on LinkedIn.