DNS is one of those things most people never think about. It sits in the background, quietly does its job, and nobody pays it any mind. Which is why it surprises people to discover that if their DNS traffic can be logged, a very informative picture of them can be built from it.
To combat this kind of user fingerprinting and tracking, enhancements to DNS like DNS over HTTPS and DNS over TLS have come into existence. And while I'm not knocking those solutions, I would like to point out that the endpoint still knows what lookups you performed. Unfortunately, there isn't currently any way to ask a server "hey, what's the IP address for google.com?" without the other end knowing that you asked for Google's website. It's literally the nature of the task that the other end knows every website you asked it to resolve.
So what can a person do to make it harder for the other end to build a useful picture of your surfing habits? Well, data is only as useful as it is clean. If you have a list of 1,000 DNS requests from a user, you can be fairly certain where that user was surfing (more technically, where the source IP was surfing, but let's not get pedantic). However, if you had a list of 1,000,000 DNS requests, of which 1,000 were the original legitimate requests and the remaining 999,000 were randomly generated by a process, how easily could you determine where the user was actually surfing? Only 0.1% of the log would be real traffic; essentially, you've driven the signal-to-noise ratio so low that the signal is "lost".
With this goal in mind, I set out to introduce some noise into my DNS requests as this weekend's project. It turned out to be easier than I expected thanks to some existing work by others (always stand on others' shoulders when you can). The first thing I did was clone down a copy of noisy. After playing with it a bit, I felt it was a decent base, but I didn't care for the default "root_urls" (The Pirate Bay? Really? I don't need my ISP sending me cease-and-desist letters, thank you), I didn't think there were enough "root_urls", and I didn't like that the "root_urls" were never updated. Thankfully, Cisco has a project called Umbrella whose top-1-million list addresses all three concerns: it's curated, it's huge, and it's regenerated regularly.
At this point, it was just a matter of gluing all these pieces together.
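If you want to follow along, grabbing the code is a single clone (I'm assuming the 1tayH/noisy project on GitHub as the upstream; adjust the checkout path to taste):
sudo git clone https://github.com/1tayH/noisy.git /srv/repos/noisy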
A sudo later, and it was time to start working. The first thing I did was create /etc/noisy to hold my modified config and to provide a working directory for the daemon. Daemon, you say? Yup, for the second step, we create a systemd service file to run our noise process. The /etc/systemd/system/noisy.service file ends up looking like:
[Unit]
Description=Simple random DNS, HTTP/S internet traffic noise generator
[Service]
WorkingDirectory=/etc/noisy
ExecStart=/usr/bin/python /srv/repos/noisy/noisy.py --config /etc/noisy/config.json
SyslogIdentifier=noisy
[Install]
WantedBy=multi-user.target
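With the unit file saved, wiring it into systemd is the usual routine (run as root or via sudo):
systemctl daemon-reload
systemctl enable --now noisy.service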
With this incredibly simple setup in place, systemd will launch the Python script in our copy of the repo (/srv/repos/noisy on my system), use the config file I write in /etc/noisy, and log to journald while identifying itself as noisy in the journal.
Now we just need to write our config file to use the hosts from Umbrella. Thankfully, noisy uses JSON for its config file, so rewriting it is trivial. I created a Bash script to handle this for me:
#!/bin/bash
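# Refresh the Umbrella top-1m list and rebuild /etc/noisy/config.json from it.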
[ -f /etc/noisy/top-1m.csv.zip ] && mv /etc/noisy/top-1m.csv.zip /etc/noisy/top-1m.csv.zip.old
wget -q -O /etc/noisy/top-1m.csv.zip http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip
unzip -q -d /etc/noisy -o /etc/noisy/top-1m.csv.zip
cat << EOF > /etc/noisy/config.json
{
"max_depth": 25,
"min_sleep": 3,
"max_sleep": 6,
"timeout": false,
"root_urls": [
EOF
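# Append one root URL per line, built from the domain in the CSV's second field.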
cut -d, -f2 /etc/noisy/top-1m.csv | while read -r url; do
echo -e "\t\t\"http://${url}\"," >> /etc/noisy/config.json
done
echo -e "\t\t\"https://medium.com\"" >> /etc/noisy/config.json
cat << EOF >> /etc/noisy/config.json
],
"blacklisted_urls": [
"https://t.co",
"t.umblr.com",
"messenger.com",
"itunes.apple.com",
"l.facebook.com",
"bit.ly",
"mediawiki",
".css",
".ico",
".xml",
"intent/tweet",
"twitter.com/share",
"dialog/feed?",
".json",
"zendesk",
"clickserve",
".png",
".iso"
],
"user_agents": [
EOF
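# Reuse the stock user_agents from the repo's default config; tail skips jq's opening bracket.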
jq '.user_agents' /srv/repos/noisy/config.json | tail -n +2 >> /etc/noisy/config.json
echo '}' >> /etc/noisy/config.json
sed -i 's/\r//g' /etc/noisy/config.json
systemctl restart noisy
This script moves the previously downloaded Umbrella file out of the way, downloads the latest version from S3, and unzips it. This gives us a CSV file that we'll use in a moment. The script then starts writing the config file by setting some defaults and opening the 'root_urls' key. We then loop over every line in the CSV file and pull the domain from the second field. We prepend 'http://' to it and write the result out to our config file. To close out the 'root_urls' section, we write medium.com to it (just to always have a known quantity as the 'end marker' if I need to debug anything). Up next, we copy over the 'blacklisted_urls' keys from the default config, and finally we use jq to pull all the 'user_agents' into our config. Because Cisco uses DOS line endings, we run a sed on the resulting file to strip the carriage returns and we're good to go. I'm sure there's a "better" way to do all this, but it wasn't worth optimizing for now, imho.
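Hand-assembled JSON is easy to get wrong, though, so a quick sanity check with jq doesn't hurt:
jq . /etc/noisy/config.json > /dev/null && echo "config OK"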
Now that we've got our file in shape, we tell systemd to restart the daemon and we're off to the races, as journalctl -u noisy shows:
Aug 14 17:09:02 nuc noisy[240416]: INFO:root:Visiting https://www.potpourrigift.com
Aug 14 17:09:07 nuc noisy[240416]: INFO:root:Visiting https://www.potpourrigift.com/ShopCategory.aspx?ID=320,362&ITEMS=RE9032%7CR0D067%7CR82102%7CRD9008&HPLoc=MB18
Aug 14 17:09:13 nuc noisy[240416]: INFO:root:Visiting https://www.potpourrigift.com/CustomerService.aspx?page=Free+Catalog
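Since Umbrella regenerates its list regularly and the script already restarts the daemon when it finishes, I also re-run it on a schedule. A cron drop-in like this works (the /usr/local/bin/update-noisy.sh path is hypothetical; point it at wherever you saved the script):
# /etc/cron.d/noisy: rebuild the config from the latest Umbrella list every Sunday at 04:00
0 4 * * 0 root /usr/local/bin/update-noisy.sh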
And now I’m sure my ISP hates me ;)