Database Tests Are Lying to You — Fix It With Testcontainers

When building the data layer of one of my project, I tried following standard Python testing advice: wire the test suite to an in-memory SQLite database for maximum speed. I immediately hit an architectural wall.

My User and Book models rely heavily on PostgreSQL-specific features—specifically ARRAY columns to store arrays of bookmarked IDs and categories, and JSONB for heavily nested document content. SQLite does not natively support PostgreSQL arrays. The moment my test suite attempted to initialize the schema and execute an array-append operation on a user's favourite_books, the tests either crashed or silently corrupted the data type.

I faced a stark choice: compromise my production schema and rewrite my queries to appease a local SQLite limitation, or find a way to run real infrastructure inside my test suite. If a test database cannot even parse your production schema, any passing test it yields is a lie.

2. The Common Mistake — SQLite for Postgres, mongomock for MongoDB

Developers inevitably reach for SQLite and mongomock because the setup is frictionless. Changing a SQLAlchemy connection string to sqlite:///:memory: or patching PyMongo with mongomock.MongoClient takes seconds. They run completely in RAM, require no Docker daemons, and execute in fractions of a millisecond. In a vacuum, they appear to be the perfect isolated testing tools.

The reality is that they implement an entirely different set of database constraints, query planners, and execution rules than your production infrastructure.

When testing a PostgreSQL-backed service, SQLite fundamentally chokes on Postgres-native data types before your tests even execute. In the project the User model utilizes PostgreSQL's ARRAY type to store favourite_books as a list of integers, and the Book model uses JSONB for heavily nested content.

from sqlmodel import SQLModel, Field, Column
from sqlalchemy.dialects.postgresql import ARRAY, JSONB
from sqlalchemy import Integer

class Book(SQLModel, table=True):
    id: int = Field(default=None, primary_key=True)
    # SQLite natively lacks JSONB binary storage and constraint validation
    content: dict = Field(default={}, sa_column=Column(JSONB))

class User(SQLModel, table=True):
    id: str = Field(primary_key=True)
    # SQLite cannot compile this PostgreSQL-specific schema natively
    favourite_books: list[int] = Field(default=[], sa_column=Column(ARRAY(Integer)))

If you attempt to wire your test suite to an in-memory sqlite:///:memory: database, SQLModel.metadata.create_all(engine) may silently downgrade JSONB to plain JSON and treat ARRAY columns as text, allowing schema creation to succeed while corrupting your type guarantees from the start,.

To force the local tests to run, developers often implement SQLAlchemy compilation hacks—writing custom decorators to intercept PostgreSQL types and force SQLite to parse them as plain text or generic JSON strings. I initially used this exact workaround. While it successfully prevents the suite from crashing on startup, it creates a deeply unreliable testing environment. You lose PostgreSQL's strict binary serialization and native array intersection operators. Your local tests end up validating loose string manipulation, while your production application is executing strict data operations.

The exact same illusion occurs when utilizing mongomock to simulate a MongoDB cluster. In one of my project, I built a deep search feature that allows users to scan their conversational history to find specific sessions and exact messages. This requires passing a complex aggregation pipeline to MongoDB, heavily utilizing $filter and $regexMatch within a $project stage to dynamically sift through nested chat arrays directly at the database level.

# repository/chat_repository.py

from typing import List, Dict

class ChatRepository: 
   def __init__(self, db): 
     self.chat_collection = db['chats']

   def aggregate_user_chats_by_search_term(self, user_id: str, search_term: str) -> List[Dict]:
    pipeline = [
        {"$match": {"user_id": user_id}},
        {"$project": {
            "_id": 1,
            "title": 1,
            "timestamp": 1,
            "matching_messages": {
                "$filter": {
                    "input": "$queries",
                    "as": "msg",
                    "cond": {
                        "$regexMatch": {
                            "input": {"$ifNull": ["$$msg.content", ""]}, 
                            "regex": search_term,
                            "options": "i" 
                        }
                    }
                }
            }
        }},
        {"$match": {
            "matching_messages": {"$ne": []}
        }},
        {"$sort": {"timestamp": -1}}
    ]
    
    return list(self.chat_collection.aggregate(pipeline))

Because mongomock reimplements MongoDB's aggregation engine in pure Python, it has significant blind spots. It chokes on advanced array operators and string evaluations . Tests utilizing mongomock will often crash with a NotImplementedError or silently fail to filter the nested arrays correctly. You are left with a terrible choice: either your tests lie about your production behavior, or you are forced to cripple your highly optimized database query into a slow, application-level Python loop just to make the mock happy.

3. The Other Mistake -Mocking the ORM Directly

Recognizing the flaws in SQLite and mongomock, developers often swing to the opposite extreme: avoiding the database entirely by mocking the ORM directly. Using Python's unittest.mock.MagicMock, they intercept the SQLAlchemy session or the PyMongo collection methods at the function level.

from unittest.mock import MagicMock
from repositories.bookmark_repository import BookmarkRepository

def test_get_bookmarked_books_orm_mock():
    mock_session = MagicMock()
    mock_user = MagicMock(favourite_books=[1, 42])
    
    # Brittle: Tying the test to the exact ORM implementation
    mock_session.get.return_value = mock_user
    
    repo = BookmarkRepository(mock_session)
    result = repo.get_bookmarked_books("user_123")
    
    assert len(result) == 2
    mock_session.get.assert_called_once_with(User, "user_123")

This pattern guarantees test fragility. You are testing whether you called the SQLAlchemy get method correctly, not whether the database state actually mutated or retrieved the correct records. If you refactor the internal repository logic to use an explicit select().where() statement instead of .get() for performance optimization, this test immediately fails. The business behavior remained perfectly identical, but the test shattered because it was coupled to the internal implementation details of the ORM rather than the data contract.

4. The Fix — Testcontainers

The only way to guarantee that your repository queries work in production is to execute them against the exact database engine running in production. testcontainers-python provides an API to programmatically spin up disposable Docker containers for your test suite, allowing you to run fast, isolated tests against the real infrastructure. Yes, spinning up Docker containers adds a few seconds of startup time to your test suite. But remember: your suite runs once per CI push, not per developer keystroke. That overhead is negligible compared to a production outage.

4a. PostgreSQL with testcontainers

Dropping and recreating PostgreSQL tables between hundreds of tests takes too long. The professional pattern is to spin up one global PostgreSQL container for the entire test session, create the schema once, and use SQLAlchemy's transactional savepoints to isolate individual tests. This ensures that when a test finishes, its data is rolled back, leaving a pristine database for the next test.

import pytest
from sqlmodel import Session, SQLModel, create_engine
from testcontainers.postgres import PostgresContainer
from repositories.bookmark_repository import BookmarkRepository
from core.models import User

# Note: This uses the synchronous Session for simplicity.
# For async FastAPI projects, replace with AsyncSession and 
# async fixtures using pytest-asyncio and an async driver like asyncpg.

@pytest.fixture(scope="session")
def postgres_engine():
    with PostgresContainer("postgres:15-alpine") as postgres:
        engine = create_engine(postgres.get_connection_url())
        SQLModel.metadata.create_all(engine)
        yield engine

@pytest.fixture
def db_session(postgres_engine):
    with postgres_engine.connect() as connection:
        transaction = connection.begin()
        with Session(bind=connection, join_transaction_mode="create_savepoint") as session:
            yield session
        transaction.rollback()

def test_repository_persists_array_state(db_session: Session):
    test_user = User(id="user_123", favourite_books=[])
    db_session.add(test_user)
    db_session.commit()

    repo = BookmarkRepository(db_session)
    repo.toggle_book("user_123", 99)

    # This asserts against the real PostgreSQL array enforcement
    fetched = db_session.get(User, "user_123")
    assert fetched.favourite_books == [99]

4b. MongoDB with testcontainers

Testing complex MongoDB aggregations requires the exact same approach. testcontainers programmatically boots a real MongoDB instance via Docker. You inject the authenticated client into your repository tests, insert the specific BSON document structures representing your raw data, and execute the complex aggregation pipelines.

import pytest
from pymongo import MongoClient
from testcontainers.mongodb import MongoDbContainer
from repository.chat_repository import ChatRepository

@pytest.fixture(scope="session")
def mongo_client():
    # Boots a real, ephemeral MongoDB cluster for the test session
    with MongoDbContainer("mongo:7.0") as mongo:
        client = MongoClient(mongo.get_connection_url())
        yield client

@pytest.fixture
def mongo_db(mongo_client):
    db = mongo_client.get_database("test_db")
    yield db
    # Drop collections after each test to maintain strict isolation
    for collection in db.list_collection_names():
        db.drop_collection(collection)

def test_chat_search_aggregation_filters_messages_correctly(mongo_db):
    # 1. ARRANGE: Insert complex nested documents mimicking production data
    mongo_db.chats.insert_many([
        {
            "user_id": "user_123",
            "title": "Quantum Physics Chat",
            "timestamp": "2026-06-06T10:00:00Z",
            "queries": [
                {"content": "What is a qubit?"},
                {"content": "Explain quantum entanglement to me."},
                {"content": "How does superposition work?"}
            ]
        },
        {
            "user_id": "user_123",
            "title": "Dinner Recipes",
            "timestamp": "2026-06-05T10:00:00Z",
            "queries": [{"content": "How do I bake a chocolate cake?"}]
        }
    ])

    repo = ChatRepository(mongo_db)

    # 2. ACT: Execute the complex pipeline that mongomock fails on
    # We are searching for "entanglement" (testing case-insensitivity too)
    results = repo.aggregate_user_chats_by_search_term("user_123", "EnTaNgLeMeNt")

    # 3. ASSERT: Verifies real MongoDB \(filter and \)regexMatch engine behavior
    assert len(results) == 1
    assert results[0]["title"] == "Quantum Physics Chat"
    
    # Crucially, verify the database successfully filtered the nested array
    # down to ONLY the specific message that matched the regex
    assert len(results[0]["matching_messages"]) == 1
    assert results[0]["matching_messages"][0]["content"] == "Explain quantum entanglement to me."

What changed here is not just the tooling — it is that your tests now make the exact same guarantees your production database makes. Because you are testing against the actual C++ MongoDB engine rather than a Python approximation, you can confidently write highly optimized, database-level data transformations without fear of your test suite holding you back.

5. Quick Tips — Other Deps

When testing cache layers, fakeredis is a safe exception to the real infrastructure rule. Because Redis data structures are simple, it reliably mimics common TTL expirations and atomic pipeline operations in memory without risking divergent production behavior.

For AWS S3 object storage, moto is the industry standard. It automatically intercepts internal boto3 requests at the transport layer, providing real bucket creation, presigned URL generation, and multipart upload behavior without requiring a localized MinIO container.

For integrations where your application acts as the client, pytest-httpserver spins up a real, local HTTP server. This allows you to assert against actual outgoing payloads and serve deterministic JSON responses without patching Python HTTP libraries.

When executing external HTTP calls to third-party services like a Groq LLM endpoint, mock at the transport layer using respx or responses. This ensures your error handling logic correctly processes real network timeouts and HTTP 500 status codes rather than artificially bypassing them.

Time-dependent logic and test data generation deserve deeper treatment — a follow-up post will cover freezegun for deterministic clock control and factory_boy for replacing brittle hardcoded fixtures with reusable SQLAlchemy-integrated model factories.

6. Closing

The pattern across all of these is the same: the closer your test dependency is to the real thing, the fewer production surprises you get. Testcontainers costs you a Docker daemon and a few seconds of startup. What it buys is tests that make the exact same guarantees your production stack makes. That trade is always worth it.

Your Database Tests Are Lying to You — Here's How to Fix That

2. The Common Mistake — SQLite for Postgres, mongomock for MongoDB

3. The Other Mistake -Mocking the ORM Directly

4. The Fix — Testcontainers

4a. PostgreSQL with testcontainers

4b. MongoDB with testcontainers

5. Quick Tips — Other Deps

6. Closing

Comments

More from this blog

One Pipeline, Four Test Frameworks — Designing CI/CD That Doesn't Slow You Down

What Your E2E Tests Don't Tell You About Session Security

Fat Service Layer, Brittle Tests — The Repository Pattern Is the Fix

Command Palette

2. The Common Mistake — SQLite for Postgres, mongomock for MongoDB

3. The Other Mistake -Mocking the ORM Directly

4. The Fix — Testcontainers

4a. PostgreSQL with testcontainers

4b. MongoDB with testcontainers

5. Quick Tips — Other Deps

6. Closing

Comments

More from this blog