Building an XML file convertor project for DMARC records using Django

by Anna Makarudze


June 24, 2022, 4:54 a.m.



A couple of months ago, we received a bounty email warning us that our Django Girls domain lacked a DMARC policy. This meant that anyone could send phishing emails that appeared to come from our domain, which is not good. We didn’t react as quickly as we should have and only took the matter seriously when we started receiving phishing emails from hello@djangogirls.org in our own @djangogirls.org mailboxes!

We quickly published a DMARC policy and started receiving many DMARC reports in XML format. XML is not really human-readable, so we could not easily read or interpret the reports being sent to us. Searching the internet led me to an online service for analyzing DMARC reports, but it involved signing up for an account, a cost we had not planned for as Django Girls. We still needed a way to read the DMARC files, and I was also looking for fun projects to work on for my GitHub profile, so I decided: why not kill two birds with one stone? The source code for this project is available here.

Architecture

In my spare time, I have been reading Architecture Patterns with Python by Harry J.W. Percival and Bob Gregory, which has made me aware of the Domain-Driven Design (DDD) methodology and also pushed me to practise Test-Driven Development (TDD) on all my new projects. Writing unit tests for my code before (and sometimes after) I have written it has shown me how well I understand the code I am writing.

The app was developed using:

* Python 3.9+
* Django 4.0+
* Postgres 12
* xmltodict for parsing XML

Apps

The project is as simple as a Django project can be as it has only one app, xmlreader. It uses the Django ORM and Django’s MVC pattern. The app has been added to the settings.py file in the INSTALLED_APPS section as shown below.

Snippets of settings.py

INSTALLED_APPS += [
    "xmlreader",
]

I also changed the DATABASES setting to use the Postgres engine and added separate database settings for GitHub Actions CI, as shown below.

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DATABASE_NAME"),
        "USER": os.environ.get("DATABASE_USER"),
        "PASSWORD": os.environ.get("DATABASE_PASSWORD"),
        "HOST": os.environ.get("DATABASE_HOST"),
        "PORT": os.environ.get("DATABASE_PORT"),
    }
}

if os.environ.get("GITHUB_WORKFLOW"):
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "github_actions",
            "USER": "postgres",
            "PASSWORD": "postgres",
            "HOST": "127.0.0.1",
            "PORT": "5432",
        }
    }

I prefer to keep my templates in one place, so I changed the TEMPLATES setting as well.

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [os.path.join(BASE_DIR, "templates")],
        "APP_DIRS": True,
        "OPTIONS": {
            "context_processors": [
                "django.template.context_processors.debug",
                "django.template.context_processors.request",
                "django.contrib.auth.context_processors.auth",
                "django.contrib.messages.context_processors.messages",
            ],
        },
    },
]

As an example of how to configure the other environment variables, here is the format for DEBUG mode that I adopted from working with the Django Girls repo.

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = os.environ.get("SECRET_KEY")

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = os.environ.get("DEBUG") != "FALSE"

if DEBUG:
    SECRET_KEY = "hello"

ALLOWED_HOSTS = []

if not DEBUG:
    ALLOWED_HOSTS += [os.environ.get("ALLOWED_HOSTS")]
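
The ALLOWED_HOSTS line above adds the raw environment string as a single host, and adds None when the variable is unset. The sketch below shows one possible alternative, assuming the variable holds a comma-separated list of hosts. The helper names env_bool and env_list are hypothetical and are not part of the project.

```python
import os


def env_bool(name, default=True):
    """Return False when the variable is set to "false" in any casing.

    Hypothetical helper: unset variables fall back to `default`, and any
    other value counts as True.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().upper() != "FALSE"


def env_list(name):
    """Split a comma-separated variable into a list, dropping blank items.

    Hypothetical helper: an unset or empty variable yields an empty list.
    """
    raw = os.environ.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    # Illustrative values only; real values would come from the environment.
    os.environ["DEBUG"] = "FALSE"
    os.environ["ALLOWED_HOSTS"] = "example.org, www.example.org"
    print(env_bool("DEBUG"))
    print(env_list("ALLOWED_HOSTS"))
```

In settings.py this could be used as DEBUG = env_bool("DEBUG") and ALLOWED_HOSTS = env_list("ALLOWED_HOSTS"), which avoids a None entry when the variable is missing.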

Last but not least, I added some static files settings, and that’s all I needed for this to work.

STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")
STATICFILES_DIRS = [os.path.join(BASE_DIR, "static")]

Models

The app has only one model, Upload, for storing the XML files. The model definition is shown below:

models.py

from django.db import models


class Upload(models.Model):
    file = models.FileField(upload_to="xml_files")
    date_uploaded = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"{self.file}"

Forms

The app has one form, which extends Django’s ModelForm and has a single field for uploading the XML file. The form definition is shown below:

forms.py

from django import forms
from django.utils.translation import gettext_lazy as _

from .models import Upload


class UploadFileForm(forms.ModelForm):
    class Meta:
        model = Upload
        fields = ["file"]

    file = forms.FileField(label=_("Upload File"), allow_empty_file=True)

Views

The project has three views: an index view, which renders the homepage, lets users upload files, and lists all uploaded files; a static about view, which renders the page describing the app; and a results view, which reads the XML file, converts it to a dict, and renders the contents of the dict.

The code for these views is shown below:

views.py

import xmltodict

from django.shortcuts import get_object_or_404, redirect, render
from django.urls import reverse

from .forms import UploadFileForm
from .models import Upload


def index(request):
    title = "Home"
    files = Upload.objects.all()
    if request.method == "POST":
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            file = form.save()
            return redirect(reverse("xmlreader:results", args=(file.pk,)))
    else:
        form = UploadFileForm()
    return render(
        request, "xmlreader/index.html", {"title": title, "form": form, "files": files}
    )


def about(request):
    title = "About"
    return render(request, "xmlreader/about.html", {"title": title})


def results(request, id):
    title = "Decoded File"
    file = get_object_or_404(Upload, id=id)

    with file.file.open(mode="rb") as xml_file:
        data_dict = xmltodict.parse(xml_file.read())

    return render(
        request, "xmlreader/results.html", {"title": title, "data_dict": data_dict}
    )

Templates

The templates/xmlreader folder has four templates: base.html, about.html, index.html and results.html. I will not discuss the code for about.html and base.html since they are both static; you can view them in the repo. I will, however, talk a bit about index.html and results.html.

The index.html contains a form that has only one field, the upload file field. Because the form uploads the file contents as well as the file name, I have to specify enctype="multipart/form-data" on the form element for the upload to work. Below the form, I also list the files that have already been uploaded, with a link for each in case one wants to view them again.

The results.html presents the results of the decoded file. I had fun trying to get through the nested dictionaries.
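
One way to make nested dictionaries like these easier to render is to flatten them into path/value pairs before they reach the template. The sketch below is only an illustration and is not how results.html does it, since that template is not reproduced here. The flatten function and the sample data are hypothetical.

```python
def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Join dict keys with dots: "feedback.record"
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            # Mark list positions with brackets: "record[0]"
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


# Hand-written sample shaped like a parsed report; the values are made up.
sample = {
    "feedback": {
        "report_metadata": {"org_name": "example.com"},
        "record": [
            {"row": {"source_ip": "192.0.2.1", "count": "2"}},
            {"row": {"source_ip": "192.0.2.2", "count": "1"}},
        ],
    }
}

if __name__ == "__main__":
    for path, value in flatten(sample):
        print(f"{path}: {value}")
```

A view could pass list(flatten(data_dict)) to the template, which then needs only a single loop over the pairs instead of nested loops per level.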

Controllers/URLs

The xmlreader/urls.py has only three URLs, and it is included in xml_reader/urls.py using django.urls.include. That’s all I needed for the app to work.

Managing development environment

Pip-tools

I used pip-tools to manage the packages required by the project in the virtualenv. The requirements.in file contains a list of required packages, which are compiled into the requirements.txt file by running the command pip-compile. Installing the packages is done by running the standard pip install -r requirements.txt. I was introduced to pip-tools by one of our Django Girls contributors, Mark Walker, when we upgraded the Django Girls website from Django 2.0 to 2.2 and finally 3.2, and I have been hooked on it ever since.

Python-dotenv

It is best practice to keep project secrets out of settings.py. For this, I use python-dotenv to manage environment variables for my development and testing environments. pytest-dotenv makes it possible to specify the env files when running tests locally. This is another package I started using after working with Mark Walker on upgrading the Django Girls website.

An example of a .env file is shown below:

export DJANGO_SETTINGS_MODULE="xml_reader.settings"
export DEBUG="FALSE"
export SECRET_KEY=""
export ALLOWED_HOSTS=[]
export COVERAGE="TRUE"
export DATABASE_NAME="dmarcs"
export DATABASE_USER=""
export DATABASE_PASSWORD=""
export DATABASE_HOST="127.0.0.1"
export DATABASE_PORT="5432"

Code formatting

I used black for code formatting instead of Flake8 since Django 4.0+ comes with all Python code already formatted by black. The other reason is that, having used Flake8 on the Django Girls website, I noticed that you can configure your repo to give you Flake8 warnings on commit, but you still have to fix the Flake8 issues manually. Black, however, formats all your Python code automatically when you run the command below in your local repo.

black .

Testing

I have been using pytest a lot in my projects since I became familiar with it while working on the Django Girls website, so it's no surprise I used pytest for unit testing this project. I coupled it with pytest-django for easy access to the test database, and pytest-dotenv for setting test environment variables. I also added coverage for coverage reports.

To run tests, I use the following command:

coverage run -m pytest

Continuous Integration

For the CI, I used GitHub Actions and set up three YAML files: one for Django, which runs tests; one for black, which checks code formatting; and one for coverage, which runs tests and uploads the results to Codecov.

Final Outcome

I uploaded the XML files to the project on my localhost and managed to print PDF files of the DMARC reports we received. From the converted reports, it was clear that there were three domains that were failing DKIM and therefore would be rejected. These were gappssmtp.com (for Google, which we use for our email), mailchimp.com (our newsletter signups), and sendgrid.net (which we use to send emails from our website).
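
The same check can be sketched in code. The project itself uses xmltodict and a template; the example below swaps in the standard library's xml.etree.ElementTree so that it has no dependencies. It assumes the usual DMARC aggregate report element names (record, row, count, auth_results, dkim, domain, result) and a report without an XML namespace. The function name dkim_failures and the sample report are hypothetical, with made-up values.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample in the aggregate report layout; all values are made up.
SAMPLE_REPORT = """<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>3</count>
    </row>
    <identifiers><header_from>example.org</header_from></identifiers>
    <auth_results>
      <dkim><domain>mailer.example.net</domain><result>fail</result></dkim>
      <spf><domain>example.org</domain><result>pass</result></spf>
    </auth_results>
  </record>
  <record>
    <row>
      <source_ip>192.0.2.20</source_ip>
      <count>5</count>
    </row>
    <identifiers><header_from>example.org</header_from></identifiers>
    <auth_results>
      <dkim><domain>example.org</domain><result>pass</result></dkim>
      <spf><domain>example.org</domain><result>pass</result></spf>
    </auth_results>
  </record>
</feedback>
"""


def dkim_failures(xml_text):
    """Count messages per signing domain whose DKIM result is not "pass"."""
    root = ET.fromstring(xml_text)
    failures = Counter()
    for record in root.iter("record"):
        # Each record covers `count` messages from one source IP.
        count = int(record.findtext("row/count", default="0"))
        for dkim in record.findall("auth_results/dkim"):
            if dkim.findtext("result") != "pass":
                failures[dkim.findtext("domain")] += count
    return failures


if __name__ == "__main__":
    for domain, total in dkim_failures(SAMPLE_REPORT).items():
        print(f"{domain}: {total} message(s) failed DKIM")
```

Reports that declare an XML namespace on the root element would need namespace-aware paths, which this sketch does not handle.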

I had never configured DKIM and SPF records before, so Claire reached out to her social network and we managed to get three awesome people to help us with our DMARC issues. We followed their recommendations and managed to get our automated emails working again securely, and we haven’t received any more phishing emails from the hello@djangogirls.org email address!

Feel free to try and replicate this project yourself or clone my repo and play around with it.
