At some point, every engineer has to decide whether to write tests for something or just ship the feature and move on.
Under a time crunch, I’ll often write tests for the easy things (e.g. pure functions) or write the tests that provide the biggest bang for their buck (e.g. an end-to-end integration test for the service).
Testing code that interacts with external systems, like a database or S3, requires a bit more effort. However, important business logic often lives in this code, and recently I’ve become more interested in testing it.
In this post, I’ll explore three ways of testing S3 in Python.
Setup
Let’s consider a simple CRUD app for recipes, backed by S3.
```python
import json
from dataclasses import dataclass

import boto3

S3_BUCKET = "recipes"


def get_s3():
    return boto3.client("s3")


@dataclass
class Recipe:
    name: str
    instructions: str

    @classmethod
    def get_by_name(cls, name: str):
        """Looks up a Recipe by name

        Args:
            name (str): Recipe name

        Returns:
            Recipe: the recipe stored under that key
        """
        response = get_s3().get_object(Bucket=S3_BUCKET, Key=name)
        data = json.loads(response["Body"].read())
        return cls(data["name"], data["instructions"])

    @classmethod
    def update_instructions(cls, name: str, new_instructions: str):
        """Updates the instructions for a recipe

        Args:
            name (str): Name of the recipe to update
            new_instructions (str): New instructions

        Returns:
            Recipe: the updated recipe; call save() to persist it
        """
        recipe = cls.get_by_name(name)
        recipe.instructions = new_instructions
        return recipe

    @classmethod
    def delete(cls, name: str):
        """Deletes a recipe

        Args:
            name (str): Name of the recipe to delete
        """
        get_s3().delete_object(Bucket=S3_BUCKET, Key=name)

    def to_json(self):
        """Serialize the recipe to JSON

        Returns:
            str: JSON representation of the Recipe
        """
        return json.dumps({"name": self.name, "instructions": self.instructions})

    def save(self):
        """Persists a recipe to S3"""
        serialized_recipe = self.to_json().encode("utf-8")
        get_s3().put_object(Bucket=S3_BUCKET, Key=self.name, Body=serialized_recipe)
```
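For reference, each recipe is stored as one S3 object whose key is the recipe name and whose body is UTF-8 JSON. The sketch below reproduces that round trip with only the standard library; `io.BytesIO` stands in for the `Body` stream that `get_object` returns (an assumption for illustration, since both expose `read()`), so no S3 is involved.

```python
import io
import json

# Same bytes that save() sends: Recipe.to_json() encoded as UTF-8
body = json.dumps(
    {"name": "nachos", "instructions": "Melt cheese on chips"}
).encode("utf-8")

# get_by_name() calls response["Body"].read(); BytesIO offers the same read()
stream = io.BytesIO(body)
data = json.loads(stream.read())
```

The encoded body for this recipe is 58 bytes long, which is worth knowing when hand-writing fake responses later on.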
All tests below use pytest. All code is runnable and available on GitHub.
Option 1: moto
Moto is a Python library that makes it easy to mock out AWS services in tests. Let’s use it to test our app.
First, create a pytest fixture that creates our S3 bucket. All S3 interactions within the mock_s3 context manager will be directed at moto’s virtual AWS account.
```python
import boto3
from moto import mock_s3  # moto 5.x replaced mock_s3 with mock_aws
import pytest

from recipe import Recipe, S3_BUCKET


@pytest.fixture
def s3():
    """Pytest fixture that creates the recipes bucket in
    the fake moto AWS account

    Yields a fake boto3 S3 client
    """
    with mock_s3():
        s3 = boto3.client("s3")
        s3.create_bucket(Bucket=S3_BUCKET)
        yield s3
```
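One optional hardening step: moto’s documentation recommends setting dummy AWS credentials for tests, so a test that accidentally escapes the mock can never reach a real account. Here is a minimal standard-library sketch, assuming the variable set moto’s docs suggest; the helper name `fake_aws_credentials` is hypothetical.

```python
import os
from contextlib import contextmanager
from unittest.mock import patch

# Dummy values: they only need to exist, never to be valid
FAKE_AWS_ENV = {
    "AWS_ACCESS_KEY_ID": "testing",
    "AWS_SECRET_ACCESS_KEY": "testing",
    "AWS_SECURITY_TOKEN": "testing",
    "AWS_SESSION_TOKEN": "testing",
    "AWS_DEFAULT_REGION": "us-east-1",
}


@contextmanager
def fake_aws_credentials():
    """Temporarily point boto3 at dummy credentials.

    patch.dict restores the original environment on exit, so any real
    credentials on the developer's machine come back afterwards.
    """
    with patch.dict(os.environ, FAKE_AWS_ENV):
        yield
```

In the fixture above, it could be entered alongside the mock, e.g. `with fake_aws_credentials(), mock_s3():`.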
Next, we can test creating a new Recipe and fetching it.
```python
def test_create_and_get(s3):
    Recipe(name="nachos", instructions="Melt cheese on chips").save()
    recipe = Recipe.get_by_name("nachos")
    assert recipe.name == "nachos"
    assert recipe.instructions == "Melt cheese on chips"
```
If we try to fetch a Recipe that doesn’t exist, an exception should be raised. This test covers that scenario.
```python
def test_get_does_not_exist(s3):
    with pytest.raises(s3.exceptions.NoSuchKey):
        Recipe.get_by_name("sandwich")
```
We can also update a Recipe. This test confirms that the data in S3 changes only after save() is called.
```python
def test_update(s3):
    old_instructions = "Melt cheese on chips"
    new_instructions = "Microwave a plate full of tortilla chips and cheese"
    Recipe(name="nachos", instructions=old_instructions).save()
    new_recipe = Recipe.update_instructions(
        name="nachos", new_instructions=new_instructions
    )

    # Nothing changes until you call save()
    recipe = Recipe.get_by_name("nachos")
    assert recipe.instructions == old_instructions

    new_recipe.save()

    # Recipe updates after saving
    recipe = Recipe.get_by_name("nachos")
    assert recipe.instructions == new_instructions
```
Finally, we can delete a recipe and confirm that the data in S3 disappears.
```python
def test_delete(s3):
    Recipe(name="nachos", instructions="Melt cheese on chips").save()
    response = s3.list_objects_v2(Bucket=S3_BUCKET)
    assert len(response["Contents"]) == 1
    assert response["Contents"][0]["Key"] == "nachos"

    Recipe.delete("nachos")

    # Data in S3 is gone after deleting the recipe
    response = s3.list_objects_v2(Bucket=S3_BUCKET)
    assert "Contents" not in response
```
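The delete test relies on the fact that a list_objects_v2 response simply omits the `Contents` field when no keys match. If several tests inspect bucket contents, a small helper that tolerates the missing field can keep the assertions readable. This is a sketch; the name `object_keys` is hypothetical and it only reads the plain response dict.

```python
from typing import Any, Dict, List


def object_keys(list_response: Dict[str, Any]) -> List[str]:
    """Return the object keys from a list_objects_v2 response.

    S3 leaves out "Contents" entirely when nothing matches, so fall back
    to an empty list instead of raising KeyError.
    """
    return [obj["Key"] for obj in list_response.get("Contents", [])]
```

With it, the two checks in test_delete become `object_keys(s3.list_objects_v2(Bucket=S3_BUCKET)) == ["nachos"]` before the delete and `== []` after it.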
Overall, moto does a great job of implementing the S3 API. It’s easy to install, feels just like the real S3, and doesn’t require any changes to the application code.
Option 2: Botocore stubs
Botocore stubs allow you to mock out S3 requests with fake responses.
Below is a pytest fixture that creates an S3 client and wraps it in a Stubber. A Stubber only affects the client it wraps, and Recipe creates a fresh client on every call, so we also need to patch get_s3 to return the stubbed client. That forces every S3 call in the Recipe class to go through our stub.
```python
import datetime
import json
from io import BytesIO
from unittest.mock import patch

import boto3
from botocore.response import StreamingBody
from botocore.stub import ANY, Stubber
from dateutil.tz import tzutc
import pytest

from recipe import Recipe, S3_BUCKET


@pytest.fixture
def s3_stub():
    """Pytest fixture that mocks the get_s3 function with an S3 client stub

    Yields a Stubber for the S3 client
    """
    s3 = boto3.client("s3")
    stubber = Stubber(s3)
    with patch("recipe.get_s3", return_value=s3):
        yield stubber
```
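Why does patching recipe.get_s3 reach every S3 call? `patch` swaps the attribute on the module object, and Recipe looks get_s3 up in the recipe module’s globals each time it runs, so every call inside the patch context gets the same (stubbed) client. The self-contained sketch below demonstrates this with a throwaway module; `recipe_demo` and its functions are hypothetical stand-ins for recipe.py, and a plain `object()` stands in for the Stubber-wrapped client.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for recipe.py: code that calls a module-level
# get_s3() at runtime, just like Recipe.save() does.
SOURCE = '''
def get_s3():
    return "real client"

def save():
    return get_s3()  # looked up in this module's globals on each call
'''
recipe_demo = types.ModuleType("recipe_demo")
exec(SOURCE, recipe_demo.__dict__)
sys.modules["recipe_demo"] = recipe_demo

stub_client = object()  # stands in for the stubbed boto3 client

with patch("recipe_demo.get_s3", return_value=stub_client):
    patched_results = [recipe_demo.save(), recipe_demo.save()]

unpatched_result = recipe_demo.save()
```

Note the flip side of the usual unittest.mock rule of patching where a name is looked up: if another module did `from recipe import get_s3`, it would hold its own reference, and you would have to patch that module’s name instead.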
Then, we can stub out responses for the put_object and get_object S3 APIs. With those stubs in place, we can run the test that creates and subsequently fetches a Recipe.
```python
def test_create_and_get(s3_stub):
    # Stub out the put_object response.
    # Note: these stubs are incomplete; I omitted things such as
    # HTTP headers for brevity.
    put_object_response = {
        "ResponseMetadata": {
            "RequestId": "5994D680BF127CE3",
            "HTTPStatusCode": 200,
            "RetryAttempts": 1,
        },
        "ETag": '"6299528715bad0e3510d1e4c4952ee7e"',
    }
    put_object_expected_params = {"Bucket": ANY, "Key": ANY, "Body": ANY}
    s3_stub.add_response("put_object", put_object_response, put_object_expected_params)

    # Create the StreamingBody that will be returned by get_object
    encoded_message = json.dumps(
        {"name": "nachos", "instructions": "Melt cheese on chips"}
    ).encode("utf-8")
    raw_stream = StreamingBody(BytesIO(encoded_message), len(encoded_message))

    # Stub out the get_object response
    get_object_response = {
        "ResponseMetadata": {
            "RequestId": "6BFC00970E62BC8F",
            "HTTPStatusCode": 200,
            "RetryAttempts": 1,
        },
        "LastModified": datetime.datetime(2020, 4, 6, 5, 39, 29, tzinfo=tzutc()),
        "ContentLength": len(encoded_message),  # 58 bytes for this payload
        "ETag": '"6299528715bad0e3510d1e4c4952ee7e"',
        "ContentType": "binary/octet-stream",
        "Metadata": {},
        "Body": raw_stream,
    }
    get_object_expected_params = {"Bucket": ANY, "Key": ANY}
    s3_stub.add_response("get_object", get_object_response, get_object_expected_params)

    # Activate the stubber
    with s3_stub:
        recipe = Recipe(name="nachos", instructions="Melt cheese on chips")
        recipe.save()
        recipe = Recipe.get_by_name("nachos")
        # Every queued response should have been consumed
        s3_stub.assert_no_pending_responses()

    assert recipe.name == "nachos"
    assert recipe.instructions == "Melt cheese on chips"
```
While botocore stubs are functional, I don’t like working with them for several reasons:
- They require a lot more prep. Creating stubs is time-consuming. Even if you run the real code interactively and copy the response, some things need to be replaced, such as the StreamingBody above.
- They’re fragile and fake. Responses are returned first in, first out, so if you call the S3 APIs in a different order than you added the responses, the stubber will throw an error. And if you have a bug in how you call the API, it might not be caught, for example when the expected parameters are ANY.
- To make the stubs look somewhat realistic, you have to mock many fields that your code doesn’t care about and bloat your tests with fake responses.
- They leak implementation details from the module being tested. For example, if a module switched from using s3.list_objects to s3.list_objects_v2, the test would fail because it depends on a specific API being called. This creates an unnecessary dependency on the module’s private implementation, instead of testing its public API.
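To make the ordering complaint concrete, here is a toy model of a first-in, first-out response queue. It is explicitly not botocore’s Stubber (the class `ToyFifoStub` is hypothetical and the real Stubber raises its own error types); it only mimics the rule that responses must be consumed in the order they were queued.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class ToyFifoStub:
    """Toy model of a FIFO stub: NOT botocore's Stubber, only its ordering rule."""

    def __init__(self) -> None:
        self._queue: Deque[Tuple[str, Dict[str, Any]]] = deque()

    def add_response(self, operation: str, response: Dict[str, Any]) -> None:
        # Responses are queued in the order the test adds them
        self._queue.append((operation, response))

    def call(self, operation: str) -> Dict[str, Any]:
        if not self._queue:
            raise RuntimeError(f"Unexpected call to {operation}: no responses queued")
        expected, response = self._queue[0]
        if operation != expected:
            # Calling APIs in a different order than queued fails the test
            raise RuntimeError(f"Expected {expected}, got {operation}")
        self._queue.popleft()
        return response
```

The same mechanism explains why Stubber.assert_no_pending_responses() is useful: leftover entries in the queue mean the code made fewer calls than the test expected.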
Option 3: Localstack
A third option is localstack, which allows you to bring up an entire AWS cloud stack locally.
First, we need to bring up localstack. I choose to do this with docker-compose.
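```yaml
version: "3.7"
services:
  tests:
    image: s3_testing:latest
    networks:
      - app
    # Block until localstack accepts connections, then run pytest.
    # 4566 is localstack's edge port; older releases also served S3 on
    # 4572, but current releases route every service through 4566.
    entrypoint:
      - /app/wait-for-it.sh
      - -t
      - "30"
      - localstack:4566
      - --
      - pytest
      - test/
    environment:
      - AWS_ACCESS_KEY_ID=fake
      # Use a real region name; newer localstack releases reject unknown regions
      - AWS_DEFAULT_REGION=us-east-1
      - AWS_SECRET_ACCESS_KEY=fake
  localstack:
    image: localstack/localstack
    ports:
      - "4566-4599:4566-4599"
    networks:
      - app
    environment:
      - SERVICES=s3
networks:
  app:
    driver: bridge
```
Next, we patch get_s3 again, this time so that it returns an S3 client connected to localstack.
```python
from unittest.mock import patch

import boto3
import pytest

from recipe import Recipe, S3_BUCKET


@pytest.fixture
def s3_localstack():
    """Pytest fixture that points get_s3 at localstack's S3 endpoint

    Yields a boto3 S3 client connected to localstack
    """
    s3 = boto3.client("s3", endpoint_url="http://localstack:4566")
    s3.create_bucket(Bucket=S3_BUCKET)
    with patch("recipe.get_s3", return_value=s3):
        yield s3
```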
With this mock in place, we can run the same tests that we ran with moto.
```python
def test_create_and_get(s3_localstack):
    Recipe(name="nachos", instructions="Melt cheese on chips").save()
    recipe = Recipe.get_by_name("nachos")
    assert recipe.name == "nachos"
    assert recipe.instructions == "Melt cheese on chips"


def test_get_does_not_exist(s3_localstack):
    with pytest.raises(s3_localstack.exceptions.NoSuchKey):
        Recipe.get_by_name("sandwich")
```
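One difference from moto is worth handling. Each `with mock_s3()` block starts from an empty fake account, but localstack keeps its state for as long as the container runs, so objects written by one test are still there for the next. A teardown helper along these lines could run after the `yield` in the s3_localstack fixture. This is a sketch: `empty_and_delete_bucket` is a hypothetical name, and it assumes an unversioned bucket and a boto3 S3 client.

```python
from typing import Any


def empty_and_delete_bucket(s3: Any, bucket: str) -> None:
    """Delete every object in `bucket`, then the bucket itself.

    `s3` is assumed to be a boto3 S3 client, e.g. the one yielded by the
    s3_localstack fixture. Paginating keeps this correct past the
    1,000-key page limit of list_objects_v2.
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # Empty pages have no "Contents" key at all
        for obj in page.get("Contents", []):
            s3.delete_object(Bucket=bucket, Key=obj["Key"])
    s3.delete_bucket(Bucket=bucket)
```

In the fixture, that would mean calling `empty_and_delete_bucket(s3, S3_BUCKET)` after `yield s3`, still inside the `with patch(...)` block or just after it.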
Localstack is extremely easy to use, but it takes almost 30 seconds to spin up on my machine.
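That startup time is why the compose file gates pytest behind wait-for-it.sh. If you run pytest from the host against a localstack container instead, a rough Python equivalent is to poll the published port until something accepts a TCP connection. This is a sketch; `wait_for_port` is a hypothetical helper, and an open port only proves localstack is listening, not that every service inside it has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True as soon as something is listening, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Refused or timed out: localstack is not up yet, try again
            time.sleep(0.5)
    return False
```

A session-scoped fixture could call something like `assert wait_for_port("localhost", 4566)` once before any test touches S3, since 4566 is among the ports the compose file publishes.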
Recommendation
Both moto and localstack are very powerful and easy to work with. Both solutions do a good job of implementing the S3 API, and they also support other AWS services including EC2, RDS, Lambda, and more. They can both be used to test code in other languages in addition to Python.
Localstack is probably the closest thing to actually connecting to AWS, but for my simple use case presented above, I can’t justify the extra overhead and time required to spin up the stack. Therefore, I recommend moto as it’s the most lightweight solution that properly implements the S3 API.
For more complicated projects that are testing S3 performance, localstack could be a good choice. Botocore stubs don’t make the cut.
What did I miss?
Thanks for reading! If you have any feedback, I’d love to hear from you: follow me on Twitter or message me on LinkedIn.