Dear Testing, I was wrong about you

The Testing Pyramid

Dear Testing,

I’m sorry I was so wrong about you. When we first met in my Software Design class, I didn’t know what to think of you. Honestly, I think you were misconstrued. Your purpose and design were skipped over, and we were tossed into assert statements without regard. For our final project in that class, there was a requirement that everyone had to write tests for the code they worked on.

I worked on the database and business logic, tasks that you are made for. Some of my teammates, however, who only worked on the front-end, struggled to figure out how they could write any tests. Before we divided the work, we foolishly never stopped to consider that testing front-ends programmatically isn’t really a thing.

I remember falling asleep at a table in another classroom at 1AM while they were still trying to figure it out. When I woke up 2 hours later, still nothing had come of it. In the end, I don’t even remember what we did; I think we set up assert statements for the colors of buttons or something. It’s okay, you can laugh.

Since that day, I doubted you for all the wrong reasons. Like some idiot, I tested my code manually, without regard for potential edge cases, and without validating before writing more code. At least I never put all my code in a single big try/except block.

I’m sorry testing. Please have me back.

Arez

For those who have worked in a production code environment, you are probably thinking “no duh”. You have likely experienced testing all the time, you likely have a love-hate relationship with it, and you have some strong opinions. To the uninitiated, however, testing really isn’t a topic that is covered well enough. Either you take a class that briefly focuses on the topic, or you learn on the job. To understand the methodology behind testing from the internet, you have to watch various videos and piece it all together.

To be fair, my neglect of testing was to some extent out of laziness. Whenever I wrote code for a project, even for my senior capstone, which was a large codebase, I never once tested. Instead, I just knew the entire codebase so well that I was able to debug most issues that came up.

In reality, the enlightenment was right in front of my eyes. I don’t recall the specific event, but in reaction to a software oversight, one of my favorite professors told me, “You can’t call yourself an ‘engineer’ if you don’t test everything the best you can.” She was right. Testing is a skill that is incredibly valuable even for smaller-scale personal projects, and developing with tests in mind helps us write better code.

It is not enough to just learn HOW to test; it’s important to know WHEN we should test, WHAT we should test, and WHY we should test. Anyone could understand how to write a basic test after watching a video. Yet if you do not understand the methodological reasoning behind testing, knowing how to write a test is meaningless. It is that methodology that made testing feel cumbersome to me, and understanding the connections between the when, what, why, and how enabled me to overcome that block.

I certainly am still developing my intuition when it comes to writing and managing tests. However, we all start somewhere, and I am writing this article to demonstrate some testing fundamentals using a popular Python testing module known as pytest. A lot of these concepts can be applied to other languages as well. In what follows, I am going to take a function that needs a makeover and use it to demonstrate fundamental testing concepts.

Testing == Better Code

Let’s use the following code sample to show what I mean. Note that the code samples I am going to use are intentionally simple, since I want to communicate the overarching ideas behind testing rather than have you untangle some complicated functionality.

Let’s say I have the following function. In short, this function reads a file for a keyword, “scrambles” the first appearance of that keyword in the file, then writes the result to a new file.

def process_text_file(keyword, shift_amount=3):

    input_file = "src/input.txt"
    output_file = "src/output.txt"
    
    try:
        with open(input_file, 'r') as f:
            text = f.read()
    except FileNotFoundError:
        return "Error: File not found"
    
    keyword_index = text.find(keyword)
    
    if keyword_index == -1:
        result = text
    else:
        before_keyword = text[:keyword_index]
        after_keyword = text[keyword_index + len(keyword):]
        
        shifted_keyword = ""
        
        for char in keyword:
            if char.isalpha():
                shifted_keyword += chr(ord(char) + shift_amount)
            else:
                shifted_keyword += char
        
        result = before_keyword + shifted_keyword + after_keyword
    
    with open(output_file, 'w') as f:
        f.write(result)
    
    return result

If we want to break it down, the function:

  • Reads text from a file at src/input.txt
  • Searches for a keyword inside of the text
    • If the keyword is not found, the text is left unchanged
  • If the keyword is found, uses its index to split the text into the parts before and after the keyword
  • Shifts the ordinal value of each alphabetic character in the keyword by the shift amount
  • Combines the newly shifted keyword with the text before and after it
  • Writes the result to the file src/output.txt

There’s nothing fundamentally wrong with this function. Oftentimes we write code like this first, and then make it look better and more extensible through refactoring. I have found that thinking about making the code testable (even if I don’t test it) has helped me in my refactoring process.

Now, let’s say that this function belongs to a larger module, one that maybe generates our text, writes it to the file, and provides the keyword to search for. Whatever it may be. Let’s say we run the module with the keyword “hello” and a shift amount of 3, expecting the output file to contain “khoor” (each letter shifted by 3 ordinally); instead we get a completely different result. Maybe the keyword wasn’t shifted at all, maybe the file is empty, or maybe we get an error.

Debugging-wise, it becomes difficult to tell where we went wrong. Was it the file reading or writing, the keyword search logic, the character shifting? Was it something completely outside the scope of the function? You might be thinking, “Okay, well, we can just verify all of those work by looking at the code.” To which I say: you can probably do it for this simple example, but you can’t do this for everything you code. If you feel like you can, well, congratulations, you have just solved the Halting Problem, and Turing and Gödel are rolling in their graves.

By creating tests, we can verify which components are working as expected and which aren’t. Let’s go ahead and create a test for this function using pytest.

How to write a test

Before we start writing tests, let’s go over some of the basics of testing in pytest. Originally I intended to go into more detail about pytest when writing this article; however, I am going to focus on testing fundamentals today.

Foremost, the most important thing in testing (regardless of language) is the assert statement. We use an assert whenever we want to compare what we expected against what we actually obtained.

Say I had this simple adding function: 

def add(a, b):
    return a + b

and I wanted to verify it was working. I could create the following test in pytest:

from add import add  # assuming add() is defined in add.py

def test_add():
    result = add(2, 3)
    assert result == 5

Then I could run it using pytest test_filename.py in the terminal, or simply pytest if I want to run all tests in the current directory. This seems pretty simple, but imagine all of the outputs that you can verify with assert! You can verify data, return values, and much more.
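Asserts aren’t limited to equality checks, either. Here is a minimal sketch (with a made-up divide function, purely for illustration) showing how pytest can also verify that the right exception is raised:

import pytest

# Hypothetical function used only for illustration
def divide(a, b):
    return a / b

def test_divide():
    assert divide(10, 2) == 5

def test_divide_by_zero():
    # pytest.raises asserts that the code inside the block
    # raises the expected exception
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)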

Another important concept in testing is the idea of mocking. Mocking is the practice of replacing real dependencies or external systems with fake, controlled versions during testing. This allows you to test your code in isolation without relying on things like actual files, databases, network calls, or other external resources. For example, instead of reading from a real file on disk, you could mock the file reading operation to return predetermined test data.

For instance, let’s say we are calling an API, but we don’t actually want to call the API itself for whatever reason (rate-limiting, no connection, etc.). What we can do instead is mock it:

import requests
from unittest.mock import patch

def get_user_data(user_id):
    """Fetches user data from an external API."""
    response = requests.get(f"https://api.example.com/users/{user_id}")
    data = response.json()

    user_info = {
        "user_id": data["id"],
        "full_name": data["name"],
        "contact_email": data["email"],
        "is_active": data.get("active", True)
    }

    return user_info

# In pytest, this replaces any calls to requests.get with the mock
@patch('requests.get')
def test_get_user_data(mock_get):
    # Hardcode the return value of response.json() to these values
    mock_get.return_value.json.return_value = {
        "id": 123,
        "name": "John Doe",
        "email": "john@example.com",
        "active": True
    }

    # Call the function; the requests.get call inside it is mocked
    result = get_user_data(123)

    assert result["full_name"] == "John Doe"
    assert result["contact_email"] == "john@example.com"
    assert result["is_active"] is True

    # Verify the mocked requests.get was called with the expected URL
    mock_get.assert_called_once_with("https://api.example.com/users/123")

In this example, we never actually make a real HTTP request to the API. Instead, we mock requests.get to return fake data. This doesn’t seem all that bad, but imagine doing this for all sorts of things. Further, ask yourself: are we really testing anything of value here? What we are really testing is how we manipulate the data, not that we can get it.

In fact, let’s take a look at a test for our original process_text_file() to see how mocks can get horrific.

from unittest.mock import mock_open, patch

def test_process_text_file_with_mocks():
    """Test process_text_file using mocks to avoid actual file I/O."""

    mock_file_content = "Hello world, this is a test"

    with patch('builtins.open', mock_open(read_data=mock_file_content)) as mock_file:
        result = process_text_file("world", shift_amount=3)

        assert result == "Hello zruog, this is a test"

        # Verify both files were opened and the shifted text was written out
        mock_file.assert_any_call("src/input.txt", 'r')
        mock_file.assert_any_call("src/output.txt", 'w')
        mock_file().write.assert_called_once_with("Hello zruog, this is a test")

Don’t you see how this is disgusting? I have to read through a bunch of mock calls to even understand what is going on, and what I am even testing. We need to write multiple extra lines of code just to imitate reading and writing a file and to verify that the read and write calls happened. Imagine if we had a longer file, or recall our previous API call. This is only one test case; could you imagine doing this multiple times? As our code changes, our tests change too, and with all these mocks, it becomes a bigger headache to change our tests in the future.

How can we fix this? Well, how about we take the logic that actually modifies the data and separate it from the file reading and writing? So, looking at our original function, let’s go ahead and do the following:

def process_text_file(keyword):

    input_file = "src/input.txt"
    output_file = "src/output.txt"

    try:
        with open(input_file, 'r') as f:
            text = f.read()
    except FileNotFoundError:
        return "Error: File not found"

    # Replace the shifting logic with a call to the new function
    result = shift_word(text, keyword, 3)

    with open(output_file, 'w') as f:
        f.write(result)

    return result

def shift_word(text, keyword, shift_amount):

    keyword_index = text.find(keyword)

    if keyword_index == -1:
        return text
    else:
        before_keyword = text[:keyword_index]
        after_keyword = text[keyword_index + len(keyword):]

        shifted_keyword = ""

        for char in keyword:
            if char.isalpha():
                shifted_keyword += chr(ord(char) + shift_amount)
            else:
                shifted_keyword += char

        result = before_keyword + shifted_keyword + after_keyword
        return result

First off, this looks cleaner, but second, this looks easier to test! Now, if I want to test whether my shifting logic works (which, let’s be real, is where things are likely to break), we can write the following:

def test_shift_word():
    
    text = "Hello world, this is a test"
    result = shift_word(text, "world", shift_amount=3)
    
    # "world" with shift of 3 becomes "zruog"
    assert result == "Hello zruog, this is a test"


Isn’t that beautiful? Now I can actually read the test and understand what the function shift_word should do super quickly. This is what is known as abstraction. Now, like anything, abstraction can get messy, so we don’t want to abstract away everything. For instance, I probably wouldn’t want to extract the file read and write here, but if you were doing more complicated reads or writes from a database, then you would likely put those in a separate function. Those should also be tested.

The test we just created for the shift_word function is known as a unit test. A unit test is a test that focuses on testing a single, isolated piece of functionality: in this case, just the word-shifting logic. Unit tests are fast, don’t depend on external resources like files or databases, and make it easy to pinpoint exactly where bugs occur, since they test one specific function or component at a time. Ideally you would write multiple unit tests for a function, one for each different input (case) into the function; or, if you’d like, you could combine multiple cases in a single test. The choice is up to you.
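pytest also offers a nice middle ground between those two options: the @pytest.mark.parametrize decorator runs one test body once per case. Here is a minimal sketch against our refactored shift_word (the hyphenated keyword case is made up purely to exercise the non-alphabetic branch):

import pytest

# Each tuple is one test case; pytest runs the test once per case
@pytest.mark.parametrize("text, keyword, expected", [
    ("Hello world, this is a test", "world", "Hello zruog, this is a test"),
    ("Hello world", "missing", "Hello world"),  # keyword not found: text unchanged
    ("say a-b now", "a-b", "say d-e now"),      # non-alphabetic characters stay put
])
def test_shift_word_cases(text, keyword, expected):
    assert shift_word(text, keyword, 3) == expected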

Integration Test

Now let’s say, for whatever reason, I actually want to access some real files I have and exercise the real file reading and writing. A test like this, which uses real files on disk and tests how multiple components work together (file reading, word shifting, and file writing), is known as an integration test. Integration tests verify how different parts of your system work together as a whole. Unlike unit tests, integration tests interact with real external resources and dependencies, testing the actual integration points between components. They’re slower and more complex to set up than unit tests, but they catch issues that only appear when components interact with each other.

So, if we want to do an integration test on our code, we are going to have to write a test for process_text_file, as that includes the file reading and writing. In reality, I don’t often see the value in doing this type of testing, because if we fail to write to or read from a file, it likely has nothing to do with our code. If it does, it should pop up when debugging and be solved easily. Still, I think there is value in tests like this at times. For instance, when testing the C2 framework I am building, I test the agent sending responses back to my server and whether the server handles them properly using Redis. To me that is valuable because I can validate that every step in the chain of my C2 communication is functioning.

So, integration tests are the one place where mocking sort of has a place. But I still think you can often just use the actual service you would be mocking. If I am calling an API, I’d rather just call the actual API itself. If I am using some sort of database or other storage, we can also opt to pass in a “fake” module, which does a lot of the overhead of mocking for us by letting us call the same functions on the fake that we would call on the real service. For instance, fakeredis is a great tool that I have found for running my tests. If there isn’t a fake option, we could run a separate database server just for testing purposes, or use a single SQLite file.
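To sketch that last idea: because SQLite can run entirely in memory, a test can stand up a throwaway database in a couple of lines. The save_user function here is hypothetical, just to illustrate the pattern of handing the test double to the code under test:

import sqlite3

# Hypothetical data-access function; the connection is passed in,
# so a test can hand it a throwaway in-memory database
def save_user(conn, name):
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()

def test_save_user():
    conn = sqlite3.connect(":memory:")  # lives only for this test
    conn.execute("CREATE TABLE users (name TEXT)")

    save_user(conn, "Arez")

    rows = conn.execute("SELECT name FROM users").fetchall()
    assert rows == [("Arez",)]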

Let’s try to apply an integration test to our current process_text_file function. Taking a look at the function, you can see a big issue with how this would work for testing.

def process_text_file(keyword):

    input_file = "src/input.txt"
    output_file = "src/output.txt"

    try:
        with open(input_file, 'r') as f:
            text = f.read()
    except FileNotFoundError:
        return "Error: File not found"

    result = shift_word(text, keyword, 3)

    with open(output_file, 'w') as f:
        f.write(result)

    return result

Foremost, our input and output files are hardcoded in! This is a bad idea for many reasons, chief among them that it doesn’t give us the flexibility to point the function at test versions of the files when we test. So we rewrite the function like so:

def process_text_file(keyword, input_file, output_file):

    try:
        with open(input_file, 'r') as f:
            text = f.read()
    except FileNotFoundError:
        return "Error: File not found"

    result = shift_word(text, keyword, 3)

    with open(output_file, 'w') as f:
        f.write(result)

    return result

Pretty simple, and frankly you would be surprised at how often I have written or seen code that doesn’t do this step. This is what is known as dependency injection. Dependency injection is the practice of passing dependencies (like file paths, database connections, API clients, etc.) into a function or class as parameters. Instead of having your function create or reference its own dependencies internally, you “inject” them from the outside. This makes your code more flexible, reusable, and testable, because you can easily swap out real dependencies for mocks or fakes during testing, or change configurations without modifying the function’s internal code. For example, when I mentioned fakeredis earlier: in my project, I would just pass in a client for my fakeredis instance instead of one for my real Redis server. The only thing I have to change in testing is the parameter that is passed in. No more writing mocks for each potential case.
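A minimal sketch of that idea, with a hypothetical record_login function standing in for my real code; the test injects a fakeredis client, while production code would pass in a real redis.Redis client:

import fakeredis

# Hypothetical function; the Redis client is injected as a parameter
def record_login(client, user_id):
    # INCR returns the new counter value as an int
    return client.incr(f"logins:{user_id}")

def test_record_login():
    fake_client = fakeredis.FakeRedis()  # drop-in stand-in for redis.Redis()
    assert record_login(fake_client, "arez") == 1
    assert record_login(fake_client, "arez") == 2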

Dependency injection is super cool, it makes testing easy, and it is enabled by a lot of fundamental design patterns. I highly recommend watching this video by CodeAesthetic to get started.

Returning to our code, I can now have two test files that store the input and the output. Writing the test then becomes super easy. The version below creates its own input file so the test is self-contained, then cleans up after itself.

import os

def test_process_text_file():

    input_file = "test_input.txt"
    output_file = "test_output.txt"

    # Create the input file so the test is self-contained
    with open(input_file, 'w') as f:
        f.write("Hello world, this is a test")

    try:
        result = process_text_file("world", input_file, output_file)

        assert result == "Hello zruog, this is a test"

        with open(output_file, 'r') as f:
            output_content = f.read()
        assert output_content == "Hello zruog, this is a test"

    finally:
        # Clean up the test files; if we wanted, we could have pytest
        # handle this for us using the tmp_path functionality
        for path in (input_file, output_file):
            if os.path.exists(path):
                os.remove(path)
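As that last comment suggests, here is a sketch of the tmp_path version: tmp_path is a built-in pytest fixture that hands the test a unique temporary directory (a pathlib.Path) and cleans it up automatically, so the try/finally disappears entirely:

def test_process_text_file_tmp_path(tmp_path):
    # tmp_path points at a per-test temporary directory managed by pytest
    input_file = tmp_path / "input.txt"
    output_file = tmp_path / "output.txt"
    input_file.write_text("Hello world, this is a test")

    result = process_text_file("world", str(input_file), str(output_file))

    assert result == "Hello zruog, this is a test"
    assert output_file.read_text() == "Hello zruog, this is a test"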

Now, if the unit test we wrote for the shifting logic passes but this integration test fails, we can assume that the bug likely has something to do with our file reading and writing (or we messed up the test), rather than the shifting logic. This makes debugging much easier! In fact, this approach has actually helped me find bugs in external services like fakeredis.

That covers the second level of tests: integration tests. The final level is End-to-End tests, or E2E tests. E2E tests verify your entire application workflow from start to finish, simulating real user interactions and behavior. Unlike unit tests that focus on individual functions, or integration tests that check how components work together, E2E tests validate that your complete system works correctly as a user would experience it. This also makes them the hardest to maintain, and they should be the smallest in quantity of the tests you write.

Now, I am not going to go out and pretend I know all about E2E testing, because frankly I have done very little of it. However, it is a critical type of testing that deserves its own attention once you get to that stage in development.

When to test and what to test

So, when should we start writing tests, and what should we test? There is no right answer to this; the when and what of testing are quite nuanced. In short, these decisions rely on a variety of factors, such as the scope of the project and what you are writing, your timeline, project requirements, and much, much more. Those factors also set the limits of the other topics I have covered: there is such a thing as too much testing, too much dependency injection, and too much abstraction.

As a general rule of thumb, I prefer to write the code first in all of its ugly glory; then, once I understand my code, I begin refactoring and making it testable. Otherwise, you can end up abstracting and complicating code that you might not even use, or that doesn’t need it in the end.

First we write, then we refine.

This blog has already covered a lot of ground, but there is still so much that could be touched on, and so much that I have yet to learn. For instance, I didn’t touch much on E2E tests, or even the specific details of testing in pytest and how I use it. I will save that for another day.

Lastly, I tried my best to keep my general opinions out of the way, but you can see the type of testing approach I have found works for me: I don’t enjoy writing excessive unit tests, I try to avoid mocking, and I like abstraction and dependency injection. Is my approach always going to be this way? No. Do I have an insane amount of experience in this? No. Do I always follow these principles or implement them correctly? Definitely not.

If you’d like to hear opinions besides my own, here are some starter videos and blog posts that could be helpful. Until next time 🙂