Tuesday 9 May 2023

Let's Encrypt on OPNSense, using a local Bind server because I'm too cheap for Namecheap API

I've recently been migrating my home network to use a ProxMox + OPNSense based router. I used to use a fairly high end consumer grade tri-band router/AP flashed with dd-wrt, but I've long been frustrated with the fact that it basically could not be updated - whenever I tried a newer version of dd-wrt it always ended in major stability issues forcing me to downgrade, and even if that wasn't an issue, dd-wrt recommends erasing the nvram when applying an update, which effectively means wiping all the settings and having to configure it again from scratch. This means that even if those stability issues have been resolved I'm still not really able to afford trying to update to find out, and as such I'm effectively running firmware that is almost a decade old and who knows what kind of security vulnerabilities it might be susceptible to as a result.

I've been pondering what to do about this for years, but a few recent factors have finally pushed me to upgrade:

  • We have a smart home now, and the number of devices trying to connect to the 2.4GHz WiFi simultaneously was overwhelming our consumer grade WiFi devices and we'd often find a device unable to connect ("Kettle isn't responding", or we'd see one of the esphome fallback hotspots show up). Our TpLink router provided by our old ISP has a hard limit of 30 devices, and I don't think my other consumer grade APs were doing much better. When every light switch/bulb is a device on your network, this becomes an issue very quickly.
  • We recently upgraded to NBN Fibre to the Premises with gigabit down, and our old WiFi devices were nowhere near this fast. Even the brand new TpLink WiFi 6 router provided by our new ISP cannot actually handle this speed - on WiFi with the largest channel width it supports (80MHz) it maxes out just shy of 700Mbps even at point blank range.
  • We had a recent incident where our dd-wrt access point/router mysteriously locked up for several hours paralyzing our home network and smart home, and nothing I could do would make it responsive. WiFi was down, the switch was down, I couldn't even get to the admin page to find out what in the blazes was going on, and no amount of rebooting would help - actually, it seemed like every time it was about to bring up the WiFi the fault light illuminated and it rebooted itself. After a few hours it mysteriously started working again, and since dd-wrt doesn't save logs I have no idea what happened, but given how old the firmware was it wouldn't surprise me at all if it was the victim of a wireless Denial of Service attack. Unfortunately I didn't have any other devices that supported monitor mode ready to run Kismet or similar to prove this.

So, given that consumer grade WiFi+router combo devices tend to be poor at both tasks we've now separated them - our WiFi is now on a Ubiquiti WiFi 6 Pro access point, which is capable of doing around 1.5Gbps on the 5GHz network (to nearby devices on a 160MHz channel, but even the 80MHz channel can do over 900Mbps, whipping the ISP provided TpLink) and claims to be able to support 300+ simultaneous devices, which should hopefully sort out our smart home connectivity issues for the foreseeable future (though we might still need a second for devices with poor signal strength on the other side of the house - still using a consumer grade AP for those...).

As for the router component - that's now an OPNSense software router running in a virtual machine under ProxMox on one of these mini routers from AliExpress.

As for choosing OPNSense over PFSense - for the moment that choice is made for me as PFSense doesn't yet support the 2.5Gbps network ports on this device. When that changes I may consider it as I do generally value stability over bleeding edge, and OPNSense has not exactly been bug free so far (though the development team have responded near instantly to the bug reports I've filed so far, so that's a huge plus). The nice thing about running these under ProxMox is that I'll be able to shut down the OPNSense VM and boot up a PFSense VM in its place when it's ready to try out and I can easily switch back if need be.

Since installing the new router I've been slowly migrating services over to it from my previous router and old HP Microserver - Dynamic DNS, regular DNS and DHCP are now on OPNSense (not exactly without incident - but a DHCP bug report was filed and the OPNSense dev team had fixed the issue in under 2 hours. I do miss being able to just edit a dnsmasq config file directly as we could do in dd-wrt, but realistically the web forms work fine in OPNSense). The unifi controller is now in one ProxMox container and frigate is in another. I've still got a few other services to move like Home Assistant and Plex, but there are a few others I want to set up that will need signed SSL certificates, so today's task was figuring out how to get Let's Encrypt working in OPNSense... and oh gawd this turned out to be not such an easy task. This was very much a one thing after another after another after another... And this is why I'm writing this blog post now, while it's still fresh in my mind and so next time I go through this I can refer back to it.

Previously I've had this all working on Debian on my HP Microserver, where it basically places a challenge file on the web server to prove to Let's Encrypt that I own the web server that the domain name points to, and I remember it taking me a while to figure out how to make that work, but I remember that it wasn't too difficult in the end - at least I didn't deem that experience worthy of a blog post! OPNSense's os-acme-client plugin supports essentially this same method so my first thought was to use that... but there were a couple of problems that meant I ultimately did not attempt using this:

  • The introduction page in the OPNSense ACME plugin says these challenge types are "not recommended" and "Other challenge types should be preferred".
  • This method requires that the acme plugin temporarily takes over port 80 / 443 on the router, leading to some brief downtime when this happens. My current setup under Debian is not subject to this as the plugin is able to use the running apache web server so can complete the challenge with no downtime. In reality this probably isn't much concern for a home network, as the downtime would be infrequent and brief, and home internet doesn't exactly have the best uptime anyway... but it is still not desirable.
  • They have three settings "IP Auto-Discovery", "Interface" and "IP Address" that all state "NOTE:This will ONLY work if the official IP addresses are LOCALLY configured on your OPNsense firewall", which is not currently the case for me as I still have the ISP provided router between my OPNSense router and the Internet (so my OPNSense router has a private IP on its WAN interface), as it is needed to provide a VoIP service (why this ISP doesn't use one of the UNI-V ports on the NBN NTD Box like my previous ISP I don't know).
  • Even if I bypassed my ISP router so that the OPNSense router would have a public IP, if the "IP Address" field is mandatory (which is unclear, possibly one or both of the other settings would suffice in its place), my IP address is not static (ISP charges extra for that), and I do not want to have to edit anything if my IP changes (this will be a recurring theme throughout the rest of this post).

Ok, that leaves... DNS-01 as the only option... that or forgoing setting this up on OPNSense altogether, but I also want to play with using OpenVPN under OPNSense at a later date, and as I understand it that needs a signed SSL certificate so I have multiple reasons to push on (Edit: DO NOT use Let's Encrypt for OpenVPN, there are serious security concerns with doing so. Always use your own personal CA for OpenVPN)...

My darkstarsword.net domain is registered through Namecheap, and Namecheap is supported by the acme.sh/Let's Encrypt script, and it looks very simple to use - only needing a user and API key filled out. I already have an API key that I use for dynamic DNS and I don't even need to fill out my IP address - perfect!!! Or at least that's what I would be saying if I hadn't read the acme script's documentation on Namecheap first or noted some bug reports warning of dynamic DNS entries being wiped out after running the script. The API key they want is not the one used for dynamic DNS - it's a business / dev tools API key that is only available if your account has more than $50 credit (the fact that I've already paid 10 years in advance doesn't count apparently) or meets some other requirements. And you DO need to fill in your IP address on Namecheap's side - and as noted earlier, I don't want to go and edit anything when my IP changes.

So, that's out.

What are my options? Migrate to a different DNS provider that doesn't have such arduous requirements? Self hosting a name server for the whole domain doesn't seem viable given that, again, my IP address is not static and I want darkstarsword.net to be stable as many of the subdomains I've added point to various cloud servers that should be available even if my home internet is down - like for instance, this blog. The acme.sh documentation does talk about a DNS Alias mode, but that suggests it needs a second domain and then I'd need to register that at another name provider which doesn't seem much better than just migrating my existing domain... but wait, why does it need a separate domain? It's just setting up a CNAME record pointing at the other domain - couldn't that point to a subdomain of my existing domain instead? Could that subdomain have its nameserver be self hosted on my own equipment and then have OPNSense update that? Yes, yes it can.

To try to clarify things I'm going to substitute some of the fun hostnames I'm using for more descriptive ones. In namecheap (or whatever other DNS provider you are using) you want entries similar to the following:

  • Type="A+ Dynamic DNS Record" Host="dyndns" - This will be dynamically updated to point to your home IP.
  • Type="NS Record" Host="home_subdomain" Value="dyndns.example.net." - This creates a subdomain managed by a nameserver running on your home IP.
  • Type="CNAME Record" Host="_acme-challenge.dyndns" Value="_acme-challenge.home_subdomain.example.net." - This tells the Let's Encrypt acme.sh challenge script to look for the challenge TXT record in your home_subdomain when creating an SSL certificate for "dyndns.example.net".

The A+ Dynamic DNS record type is specific to namecheap I think, other providers might work differently. On OPNSense this is updated via the os-ddclient plugin - install via System -> Firmware -> Plugins and configure under Services -> Dynamic DNS. This was reasonably straightforward to set up and I didn't encounter any issues here. Make sure that the name is resolving to your home IP before proceeding.

You can add additional CNAME records for additional hosts that you want certificates for, just substituting the "dyndns" part of the Host field ("_acme-challenge.dyndns") with the other host name, or if you want to create a wildcard certificate just use Host="_acme-challenge" instead.
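To make the naming rule concrete, here is a small sketch of how I read it. The function name and arguments are my own invention for illustration (nothing here comes from namecheap or acme.sh), and it assumes every CNAME points at the same "_acme-challenge" record in the home subdomain:

```python
def challenge_cname(hostname, domain, alias_zone):
    """Return (Host, Value) for the CNAME that delegates the DNS-01 challenge.

    hostname   -- name the certificate is for, e.g. "dyndns.example.net",
                  or "*.example.net" for a wildcard
    domain     -- the zone hosted at the DNS provider, e.g. "example.net"
    alias_zone -- the self hosted zone, e.g. "home_subdomain.example.net"
    """
    name = hostname.rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # a wildcard is validated on the name below the "*."
    if name == domain:
        host = "_acme-challenge"
    elif name.endswith("." + domain):
        # Host is relative to the domain, so strip ".example.net" off the end
        host = "_acme-challenge." + name[: -len(domain) - 1]
    else:
        raise ValueError(f"{hostname} is not inside {domain}")
    # Trailing dot: the Value is a fully qualified name
    return host, f"_acme-challenge.{alias_zone}."


print(challenge_cname("dyndns.example.net", "example.net",
                      "home_subdomain.example.net"))
```

The Host is what goes in the provider's form (relative to your domain), and the Value is the same for every host because the challenge record always lives in the one self hosted zone.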

Next step is to install a DNS server on OPNSense... well, it already has Unbound and/or dnsmasq for your internal DNS, but AFAIK neither of those will work and so we need another one, and of course we can't just replace them because there's a bunch of features in OPNSense that only work with one or both of those, so... we'll be running two DNS servers on different ports. Some people elect to have one of these forward requests to the other, but I'm not going to do that as my internal network has no need of BIND, and the Internet has no need of my internal DNS, so at least for now I'll keep them independent of each other.

Head over to System -> Firmware -> Plugins and install os-bind. Start setting it up under Services -> BIND -> Configuration.

In the ACLs tab, create a new ACL, call it "anywhere" and set networks to "0.0.0.0/0" (maybe we can lock this down to just Let's Encrypt IPs + localhost/LAN?).

Back in the General tab, enable the plugin, change "Listen IPs" from "0.0.0.0" to "any" (this will be unnecessary soon - I spotted they fixed this in github earlier today), change "Allow Query" to the "anywhere" ACL you just created and save. At this point you might want to verify that you can connect to BIND from your LAN - I was stuck here for some time until I worked out the issue with Listen IPs:

dig @192.168.1.1 -p 53530 example.com +short
93.184.216.34

Now, head over to the Primary Zones tab (I guess this used to be called Master Zones?) and create a zone for your home subdomain. Following the naming examples above and substituting with your own, set "Zone Name" to "home_subdomain.example.net", "Allow Query" to the "anywhere" ACL, "Mail Admin" to your email, and "DNS Server" to "dyndns.example.net".

Now create an NS record in this zone - without this BIND will refuse to load the zone. Leave the "Name" field blank, set "Type" to "NS" and set "Value" to "dyndns.example.net." - note, the trailing . is important here to indicate this is a fully qualified domain name, otherwise it would point to a sub-sub-sub...sub?-domain and BIND would complain about that too. Note that just because you need the trailing . here doesn't mean you need it elsewhere, and there are probably a few places that would break if you add it (and some where it won't matter or gets automatically added if it's missing, like on namecheap).
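If the trailing dot business seems arbitrary, this little sketch mimics the usual zone file rule: a name without a trailing dot is relative to the zone it sits in, a name with one is absolute. The qualify() helper is hypothetical (it is not part of BIND or the OPNSense plugin), and treating a blank name as the zone itself is my assumption about what the blank "Name" field means:

```python
def qualify(name, origin):
    """Expand a record name or value the way a zone file is interpreted."""
    origin = origin if origin.endswith(".") else origin + "."
    if name in ("", "@"):
        return origin      # assumption: blank (or "@") means the zone itself
    if name.endswith("."):
        return name        # already fully qualified, left untouched
    return f"{name}.{origin}"  # relative: the zone name gets appended


zone = "home_subdomain.example.net"
print(qualify("dyndns.example.net.", zone))  # with the trailing dot
print(qualify("dyndns.example.net", zone))   # without it
```

The second call produces "dyndns.example.net.home_subdomain.example.net.", which is the sub-sub-sub-domain that BIND complains about.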

Now go and look at the Log Files section for BIND, and make sure you see "zone home_subdomain.example.net/IN: loaded serial ..." and not some error.

Next head on over to Firewall -> NAT -> Port Forward and add a new entry. Interface should be "WAN" (probably already set), Protocol needs to be changed to "TCP/UDP" (important, DNS needs both), Destination should be "WAN Address", "Destination Port Range" should have both From and To set to "DNS", "Redirect Target IP" should be "127.0.0.1" and "Redirect Target Port" should be "(other)" 53530. Put something meaningful in the Description field, such as "External DNS -> BIND (for ACME LetsEncrypt)", and save, then apply changes to the firewall when prompted.
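Since the port forward has to work for both protocols, here is a rough sketch of a probe that sends the same minimal DNS query over UDP and over TCP using only the Python standard library. The function names are mine, the server address is a placeholder you would swap for your own WAN IP, and it only checks that something answers, not what the answer says:

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags (only RD set, as dig does by default), 1 question
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack(">HH", qtype, 1)


def probe_udp(server, name, port=53, timeout=3.0):
    """Return True if a reply with a matching ID comes back over UDP."""
    query = build_query(name)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(4096)
    except OSError:  # covers timeouts and refused connections
        return False
    return reply[:2] == query[:2]


def probe_tcp(server, name, port=53, timeout=3.0):
    """Return True if the server starts answering over TCP."""
    query = build_query(name)
    try:
        with socket.create_connection((server, port), timeout=timeout) as sock:
            # DNS over TCP prefixes each message with its 2 byte length
            sock.sendall(struct.pack(">H", len(query)) + query)
            return len(sock.recv(2)) == 2
    except OSError:
        return False


if __name__ == "__main__":
    server = "203.0.113.1"  # placeholder: replace with your own WAN IP
    name = "home_subdomain.example.net"
    print("UDP:", probe_udp(server, name))
    print("TCP:", probe_tcp(server, name))
```

Run it from somewhere outside your network. If UDP answers but TCP does not (or the other way around), the Protocol setting on the port forward is the first thing I would check.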

At this point you might want to test whether this is working - I added a "test" A record to my zone in BIND to a recognisable IP address and was able to confirm that "test.home_subdomain.example.net" successfully resolved to that IP, and I didn't have to explicitly point dig to my name server - it was able to find it through the breadcrumb trail through namecheap, to my BIND server then find the record. I did this test from an external server, but since we didn't set up any forwarding between Unbound and BIND testing from your LAN should be nearly equivalent.
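If you would rather script that check than eyeball dig output, a minimal sketch could look like this. It goes through whatever resolver the machine is configured with, so it follows the same breadcrumb trail. The resolves_to() name and the 192.0.2.123 address are placeholders of mine, not anything from OPNSense:

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname resolves to expected_ip via the system resolver."""
    try:
        return resolver(hostname) == expected_ip
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        return False


if __name__ == "__main__":
    # Placeholder values: use your own test record and the IP you gave it
    print(resolves_to("test.home_subdomain.example.net", "192.0.2.123"))
```

The resolver argument is only there so the function can be exercised without a network; normally you would leave it at the default.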

Alright, home stretch - all that's left is setting up the ACME Plugin to use Let's Encrypt and start issuing certificates. Unfortunately this part went anything but smoothly for me, but given how quickly OPNSense devs move, the issues I encountered will likely already be fixed for you by the time you read this - they're already in github while I'm writing this.

Over in System -> Firmware -> Plugins install os-acme-client. Then head on over to Services -> ACME Client to configure it. Under Settings enable the plugin and apply. Under Accounts create two new accounts, one with the ACME CA set to "Let's Encrypt" and the second set to "Let's Encrypt Test CA" - the former is the real one, the latter we use to make sure things work without worrying about being rate limited if something goes wrong. Give them distinct names so you can tell them apart at a glance and fill out your email. You can ignore the EAB fields.

Take a detour over to System -> Access -> Users and edit the root user. Find "API Keys" near the bottom and click the plus to add a new one. This will give you an apikey.txt file that you should open as you will need it in a moment.

Head back over to Services -> ACME Client -> Challenge Types and add a new entry. I named mine "OPNSense Bind Plugin" and set the type to "DNS-01" and "DNS Service" to "OPNSense BIND Plugin". I left "OPNSense Server (FQDN)" set to "localhost" (this is for the dns update script running on OPNSense to find the OPNSense API, it's not used by Let's Encrypt so I don't see any reason to use anything other than localhost here) and "OPNSense Server Port" on 443 - you may need to change this if you are using that port for another service like nginx and have relocated the OPNSense web interface to another port (in my case 443 is still being port forwarded to my old server, though this will likely change soon). "User API key" and "User API token" should be filled out with the "key=....." and "secret=....." (without the literal "key=" and "secret=" part) values from the apikey.txt file you obtained in the previous step. Save.
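For what it's worth, pulling the two values out of apikey.txt can be sketched like this, assuming the file is just "key=..." and "secret=..." on separate lines as described above. The parse_apikey() helper is my own, not something OPNSense ships:

```python
def parse_apikey(text):
    """Parse the contents of a downloaded apikey.txt into a dict.

    Assumes one name=value pair per line. Only the first "=" splits the
    line, so a value that itself contains "=" is kept whole.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:
            values[name] = value
    return values


# Made-up example contents, not real credentials
creds = parse_apikey("key=abc123\nsecret=def456==\n")
print(creds["key"])     # goes in "User API key"
print(creds["secret"])  # goes in "User API token"
```

Splitting on only the first "=" matters if the values ever contain one themselves, which is easy to truncate when copying by hand.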

Almost done - under Certificates create a new certificate. Set the "Common Name" to "dyndns.example.net" (substituting for your own host and domain, obviously). If you are going to create a test certificate first (recommended), write something like "test" in the Description field and set the account to the "Let's Encrypt Test CA" from earlier. "Challenge Type" should be "OPNSense Bind Plugin" and "DNS Alias Mode" should be "Challenge Alias Mode" (meaning the CNAME record you added in Namecheap a few pages ago is pointing to a record in your home subdomain named "_acme-challenge" - you can use the other option here if you decided you were too cool for that name. Automatic might work too - I haven't tried it), and "Challenge Alias" should be "home_subdomain.example.net".

Save. Make sure your certificate is enabled and click the "Issue/Renew All Certificates" button (or the one next to the certificate if you want to do it individually). Check the logs (both system + ACME), see if it worked. For me it didn't - I got an "Invalid domain" error that cost me a few hours of debugging to find it was fallout from the global movement to strike the potentially insensitive terms "master" and "slave" from general use, but that's fixed now (in github at the time of writing, hopefully live by the time anyone reads this).

If that worked, then duplicate the certificate, change the description and account to the real live "Let's Encrypt" CA, save, disable the test certificate and issue the real one. Also maybe delete the test certificate from System -> Trust -> Certificates.

That's as far as I've got for now - I haven't actually started using the certificate for anything yet (hopefully that part will be a bit easier), but I think this is enough for one blog post. Before I go though, some food for thought - while setting this up I have been wondering if there might be any security concerns with this setup and potentially there could be - if an attacker was using the same ISP as you they could potentially try to take your IP - say they went to your house and shut off your power at your breaker box, then started rapidly connecting and disconnecting their own internet hoping to be randomly assigned the IP address that you were using and your dynamic DNS entry still points to until you get back online to refresh it. If they succeed they would potentially be able to issue certificates for your domains that they could then use to masquerade as your servers in future MITM attacks - maybe it's a good idea not to set your wildcard _acme-challenge so they are limited to hijacking names you intended for your home service which are probably not going to be of much use to them anyway - sure, they could theoretically MITM you while you're in a coffee shop WiFi connecting back to your home servers, but if they are capable of that you have much bigger problems on your hands. I don't think most people should be overly concerned about this, and if you are, consider asking your ISP for a static IP address - after all, if this is of legitimate concern in your threat model it's worth remembering that there are a host of other similar issues possible with using a dynamic IP.
