Generating DNS noise

DNS is one of those things most people never think about. It sits in the background, quietly does its job, and no one pays it any mind. Which is why people are surprised to discover that if their DNS traffic can be logged, a very informative picture of them can be built from it.

To combat this kind of user fingerprinting and tracking, enhancements to DNS like DNS over HTTPS and DNS over TLS came into existence. And while I’m not knocking those solutions, I would like to point out that they only protect the lookup in transit: the resolver at the other end still knows what lookups you performed. Unfortunately, there isn’t currently any way to ask a server ‘hey, what’s the IP address for google.com’ and not have the other end know that you asked for Google’s website. It’s literally the nature of the task for the other end to know every name you asked it to resolve.
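To make that concrete, here’s a rough sketch in Python of the question a client sends. The helper name build_query and the fixed query ID are my own inventions for illustration, and the code only builds the bytes; it never touches the network. Encrypted transports wrap these bytes on the wire, but the resolver still unwraps this same message on its end.

```python
import struct

def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query (A record, class IN) for hostname."""
    # Header: ID, flags (only "recursion desired" set), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name travels as length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet)
    return header + qname + struct.pack("!HH", 1, 1)

packet = build_query("google.com")
print(packet)
```

The name you asked about sits right there in the question section, which is exactly what the other end has to read in order to answer.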

So what can you do to make it harder for the other end to create a useful picture of your surfing habits? Well, data is only as useful as it is clean. If you have a list of 1,000 DNS requests from a user, then you can be fairly certain where that user was surfing (more technically, where the source IP was surfing, but let’s not get pedantic). However, if you had a list of 1,000,000 DNS requests, of which 1,000 were the original legitimate DNS requests and the remaining 999,000 were randomly generated by a process, then how easily could you determine where the user was actually surfing? Essentially, you’ve pushed the signal-to-noise ratio so low that the signal is ‘lost’.
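The arithmetic behind that is tiny, but here it is as a quick Python sketch. The function name signal_fraction is mine, and it assumes the observer has no other way (timing, for instance) to tell the generated lookups from the real ones.

```python
def signal_fraction(real, noise):
    """Fraction of all logged lookups that are genuine."""
    return real / (real + noise)

# Numbers from the paragraph above
print(f"{signal_fraction(1000, 0):.1%}")       # → 100.0%
print(f"{signal_fraction(1000, 999000):.1%}")  # → 0.1%
```

In other words, anyone reading the log would have to work out which one lookup in every thousand was actually mine.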

With this goal in mind, I set out to introduce some noise into my DNS requests as this weekend’s project. It turned out to be easier than I expected thanks to some existing work by others (always stand on others’ shoulders when you can). The first thing I did was clone a copy of noisy. After playing with it a bit, I felt like it was a decent base, but I didn’t care for the default “root_urls” (The Pirate Bay? Really? I don’t need my ISP sending me cease-and-desist letters, thank you), I didn’t think there were enough “root_urls”, and I didn’t like that the “root_urls” were never updated. Thankfully, Cisco has a project called Umbrella that publishes a regularly updated list of its top one million domains, which seems like a good fit for all three concerns.

At this point, it was just a matter of gluing all these pieces together. A sudo later, and it was time to start working. The first thing I did was create /etc/noisy to hold my modified config and to provide a working directory for the daemon. Daemon, you say? Yup, for the second step, we create a systemd service file to run our noise process. The /etc/systemd/system/noisy.service file ends up looking like:

[Unit]
Description=Simple random DNS, HTTP/S internet traffic noise generator

[Service]
WorkingDirectory=/etc/noisy
ExecStart=/usr/bin/python /srv/repos/noisy/noisy.py --config /etc/noisy/config.json
SyslogIdentifier=noisy

[Install]
WantedBy=multi-user.target

With this incredibly simple setup in place, systemd will launch the Python script in our copy of the repo (/srv/repos/noisy on my system), use the config file I write in /etc/noisy, and will log to journald while identifying itself as noisy in the journal.

Now we just need to write our config file to use the hosts from Umbrella. Thankfully, noisy uses JSON for its config file, so rewriting it is trivial. I created a Bash script to handle this for me:

#!/bin/bash

mv /etc/noisy/top-1m.csv.zip /etc/noisy/top-1m.csv.zip.old

wget -q -O /etc/noisy/top-1m.csv.zip http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip

unzip -q -d /etc/noisy -o /etc/noisy/top-1m.csv.zip

cat << EOF > /etc/noisy/config.json
{
        "max_depth": 25,
        "min_sleep": 3,
        "max_sleep": 6,
        "timeout": false,
        "root_urls": [
EOF

# Field 2 of each CSV row is the domain; -r stops read from mangling backslashes
cut -d, -f2 /etc/noisy/top-1m.csv | while read -r url; do
        echo -e "\t\t\"http://${url}\"," >> /etc/noisy/config.json
done

echo -e "\t\t\"https://medium.com\"" >> /etc/noisy/config.json

cat << EOF >> /etc/noisy/config.json
        ],
        "blacklisted_urls": [
                "https://t.co",
                "t.umblr.com",
                "messenger.com",
                "itunes.apple.com",
                "l.facebook.com",
                "bit.ly",
                "mediawiki",
                ".css",
                ".ico",
                ".xml",
                "intent/tweet",
                "twitter.com/share",
                "dialog/feed?",
                ".json",
                "zendesk",
                "clickserve",
                ".png",
                ".iso"
        ],
        "user_agents": [
EOF

# jq prints the user_agents array; tail drops its opening "[" since we already wrote one
jq .user_agents /srv/repos/noisy/config.json | tail -n +2 >> /etc/noisy/config.json

echo '}' >> /etc/noisy/config.json

# Strip the carriage returns left by the CSV's DOS line endings (\r needs GNU sed)
sed -i 's/\r//' /etc/noisy/config.json

systemctl restart noisy

This script moves the previously downloaded Umbrella file out of the way, downloads the latest version from S3, and unzips it. This gives us a CSV file that we’ll use in a moment. The script then starts writing the config file by setting some defaults and opening the ‘root_urls’ key. We then loop over every line in the CSV file and pull the domain from the second field. We prepend ‘http://’ to the domain and write the result out to our config file. To close out the ‘root_urls’ section, we write medium.com to it (just to always have a known quantity as the ‘end marker’ if I need to debug anything; it also means the last entry has no trailing comma, which keeps the JSON valid). Up next, we copy over the ‘blacklisted_urls’ entries from the default config, and finally we use jq to pull all the ‘user_agents’ into our config. Because Cisco’s CSV uses DOS line endings, every domain picks up a trailing carriage return, so we run sed on the resulting file to strip them, and we’re good to go. I’m sure there’s a “better” way to do all this, but it wasn’t worth optimizing for now, imho.
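For the curious, one possible “better” way is to let a JSON library do the quoting and comma bookkeeping. The following is only a sketch, not what I actually run: build_config, write_config, and the limit parameter are names I made up, and it assumes the default config in the repo already carries the ‘blacklisted_urls’, ‘user_agents’, and timing settings you want to keep.

```python
import csv
import json

def build_config(csv_lines, base_config, limit=None):
    """Return a noisy-style config whose root_urls come from 'rank,domain' CSV rows."""
    config = dict(base_config)  # keep every other key from the default config
    root_urls = []
    for row in csv.reader(csv_lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # strip() guards against any stray carriage return on the domain
        root_urls.append("http://" + row[1].strip())
        if limit is not None and len(root_urls) >= limit:
            break
    config["root_urls"] = root_urls
    return config

def write_config(csv_path, base_path, out_path, limit=None):
    """Read the default config and the CSV, then write the merged config."""
    with open(base_path) as f:
        base_config = json.load(f)
    # newline="" lets the csv module handle the line endings itself
    with open(csv_path, newline="") as f:
        config = build_config(f, base_config, limit)
    with open(out_path, "w") as f:
        json.dump(config, f, indent=8)
```

On my layout that would be called as write_config("/etc/noisy/top-1m.csv", "/srv/repos/noisy/config.json", "/etc/noisy/config.json"), followed by the same systemctl restart noisy. Since the JSON library writes the list, there’s no need for the medium.com entry or the sed pass.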

Now that we’ve got our file in shape, we tell systemd to restart the daemon and we’re off to the races, as journalctl shows:

Aug 14 17:09:02 nuc noisy[240416]: INFO:root:Visiting https://www.potpourrigift.com
Aug 14 17:09:07 nuc noisy[240416]: INFO:root:Visiting https://www.potpourrigift.com/ShopCategory.aspx?ID=320,362&ITEMS=RE9032%7CR0D067%7CR82102%7CRD9008&HPLoc=MB18
Aug 14 17:09:13 nuc noisy[240416]: INFO:root:Visiting https://www.potpourrigift.com/CustomerService.aspx?page=Free+Catalog
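If you want to check that the noise is spread across plenty of hosts, a few lines of Python can tally the hostnames in that output. This is a sketch that assumes the log format shown above; count_hosts and VISIT_RE are names I picked, not anything noisy provides.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# \s+ also spans a line break, in case a long URL wrapped onto the next line
VISIT_RE = re.compile(r"INFO:root:Visiting\s+(\S+)")

def count_hosts(log_text):
    """Count how many visits each hostname received in the journal output."""
    urls = VISIT_RE.findall(log_text)
    return Counter(urlsplit(url).hostname for url in urls)
```

Feed it the text from journalctl -u noisy and use most_common() on the result to see which hosts dominate.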

And now I’m sure my ISP hates me ;)