The real cost of SaaS
Every SaaS subscription is a recurring tax that compounds quietly. Google Workspace, LucidLink, managed email, cloud AI APIs — individually each one seems reasonable. Aggregated across a year, they represent a budget that could fund a VPS fleet and the engineering time to run it.
The Prachyam Infrastructure Deep Dive project made this concrete: six RackNerd VPS running Mailcow, plus Nextcloud and the development mesh, against a managed-email alternative I estimate at roughly ₹2.78cr (~$300k, medium confidence — a projected enterprise quote, not a realized bill) over the campaign. That number didn't become visible until I did the accounting. The SaaS alternative felt cheap per line item. The self-hosted alternative felt like overhead until I added it up.
DNS is everything
If your DNS is wrong, nothing works. SPF, DKIM, and DMARC alignment errors are the ones that surface slowly: a message passes SPF or DKIM on its own terms but still fails DMARC because the authenticated domain (the DKIM d= domain or the SPF envelope sender) doesn't align with the domain in the From header. My setup leans on three pieces: private DNS with dnsmasq for internal services, Caddy for wildcard TLS on *.dev.dharmic.cloud via DNS-01 challenge, and PTR records on every outbound IP. I've debugged more DNS issues than I care to admit, and each one taught something the managed-service abstraction would have hidden.
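The alignment failure described above is easier to see in code. This is a minimal sketch, not a full DMARC evaluator: the function names are hypothetical, the domains are reserved example domains, and the organizational-domain logic is a naive "last two labels" assumption where a real evaluator would consult the Public Suffix List.

```python
from email.utils import parseaddr


def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    Assumption for illustration only. Real DMARC evaluators use the
    Public Suffix List, so this is wrong for suffixes like co.uk.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(from_domain: str, auth_domain: str, strict: bool = False) -> bool:
    """Strict alignment needs an exact match; relaxed needs the same org domain."""
    a = from_domain.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if strict:
        return a == b
    return org_domain(a) == org_domain(b)


def dmarc_pass(from_header: str, dkim_pass: bool, dkim_d: str,
               spf_pass: bool, mail_from_domain: str) -> bool:
    """DMARC passes if at least one mechanism both passes and is aligned."""
    from_domain = parseaddr(from_header)[1].rpartition("@")[2]
    dkim_ok = dkim_pass and aligned(from_domain, dkim_d)
    spf_ok = spf_pass and aligned(from_domain, mail_from_domain)
    return dkim_ok or spf_ok
```

With these definitions, a message whose From header is `ops@example.org` but which is signed with `d=mailer.example.net` fails even though DKIM and SPF both verified, which is the slow-surfacing case: every individual check looks green.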
Monitor before you scale, instrument before you optimise
Don't scale until you know what's slow. The instinct when something feels slow is to add resources. The right instinct is to add instrumentation first. The queue starvation incident at Prachyam — transactional emails sitting behind bulk sends for over an hour — looked like a resource problem. It was a configuration problem. More servers wouldn't have fixed it.
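One cheap way to tell a configuration problem from a resource problem is to look at per-message delay before touching capacity. The sketch below assumes standard Postfix delivery log lines (the ones carrying `to=<...>`, `delay=` and `status=`); the function name and the 300-second threshold are hypothetical choices, not the tooling used at Prachyam.

```python
import re

# Matches Postfix delivery lines such as:
#   ... postfix/smtp[1234]: 4F1A2B3C4D: to=<u@example.org>, relay=...,
#       delay=3720, delays=..., dsn=2.0.0, status=sent (...)
DELIVERY = re.compile(
    r"postfix/\w+\[\d+\]: (?P<qid>[0-9A-Za-z]+): to=<(?P<rcpt>[^>]*)>,"
    r".*?\bdelay=(?P<delay>[\d.]+),.*?\bstatus=(?P<status>\w+)"
)


def slow_deliveries(lines, threshold_seconds=300.0):
    """Return (queue_id, recipient, delay_seconds) for slow deliveries.

    `delay` is the total time the message spent in Postfix, so a long
    delay on a small transactional message points at queueing, not at
    the remote server alone.
    """
    slow = []
    for line in lines:
        match = DELIVERY.search(line)
        if match is None:
            continue  # not a delivery line
        delay = float(match.group("delay"))
        if delay >= threshold_seconds:
            slow.append((match.group("qid"), match.group("rcpt"), delay))
    return slow
```

Feeding this a tail of the mail log and grouping the results by sender or transport would show whether only one class of mail is waiting, which is the signature of starvation rather than of an overloaded host.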
Instrument first, optimise second. Grafana dashboards reading from Prometheus, DMARC aggregate reports parsed into SQLite, Postfix log tailing — these are the surfaces that tell you what's actually happening before you decide what to change. See Karmpath Architecture Decisions for the same principle applied to the application layer.
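As an illustration of the "DMARC aggregate reports parsed into SQLite" surface, here is a minimal sketch using only the standard library. It assumes the report has already been decompressed and follows the aggregate report layout from RFC 7489 without an XML namespace; the table name, column names, and function name are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE IF NOT EXISTS dmarc_rows (
    org_name TEXT, report_id TEXT, source_ip TEXT, msg_count INTEGER,
    disposition TEXT, dkim TEXT, spf TEXT, header_from TEXT
)
"""


def load_report(conn, xml_data):
    """Insert one row per <record> of an aggregate report; return the row count."""
    root = ET.fromstring(xml_data)
    org = root.findtext("report_metadata/org_name", default="")
    report_id = root.findtext("report_metadata/report_id", default="")
    rows = []
    for record in root.iter("record"):
        rows.append((
            org,
            report_id,
            record.findtext("row/source_ip", default=""),
            int(record.findtext("row/count", default="0")),
            # policy_evaluated holds the DMARC-aligned results, which is
            # what matters for spotting alignment failures.
            record.findtext("row/policy_evaluated/disposition", default=""),
            record.findtext("row/policy_evaluated/dkim", default=""),
            record.findtext("row/policy_evaluated/spf", default=""),
            record.findtext("identifiers/header_from", default=""),
        ))
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO dmarc_rows VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Once the rows are in a table, a query such as `SELECT source_ip, SUM(msg_count) FROM dmarc_rows WHERE dkim = 'fail' AND spf = 'fail' GROUP BY source_ip` lists the senders that fail DMARC outright, which is the kind of question a dashboard panel can then answer continuously.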
Mesh networking changed how I think about infrastructure
Tailscale turned underpowered machines into a distributed build cluster without any VPN configuration complexity. The Prachyam dev mesh: three machines on a Tailscale mesh, dnsmasq resolving *.dev.prachyam.local to node IPs, Caddy for HTTPS termination. The iMac runs the monorepo build; the other two nodes run the Docker service stack. Each machine does what it's good at. The mesh makes them look like one system. (The personal evolution of this pattern uses *.dev.dharmic.cloud on a single machine.)
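To make the name-to-node mapping concrete, the sketch below generates dnsmasq `address=/name/ip` lines from a service map. The service names and IPs in the usage note are hypothetical placeholders, not the real Prachyam layout; the only fixed fact it relies on is that Tailscale hands out IPv4 addresses from the 100.64.0.0/10 range, which the function uses as a sanity check.

```python
import ipaddress

# Tailscale assigns node IPv4 addresses from the CGNAT block.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def dnsmasq_lines(zone, service_to_ip):
    """Build dnsmasq address= lines mapping <service>.<zone> to a mesh IP.

    Raises ValueError if an IP is malformed or outside the Tailscale
    IPv4 range, so a LAN address cannot slip into the mesh zone.
    """
    lines = []
    for service, ip in sorted(service_to_ip.items()):
        addr = ipaddress.ip_address(ip)
        if addr not in TAILSCALE_RANGE:
            raise ValueError(f"{service}: {ip} is not a Tailscale IPv4 address")
        lines.append(f"address=/{service}.{zone}/{addr}")
    return lines
```

Calling `dnsmasq_lines("dev.prachyam.local", {"build": "100.64.0.1", "api": "100.64.0.2"})` yields one line per service, ready to drop into a dnsmasq config file. Keeping the map in one place means moving a service to a different node is a one-line change.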
The lesson that generalised: the constraint isn't usually the hardware. It's whether the hardware is networked correctly and whether the services are distributed across it sensibly.
Backups are not optional, and untested backups don't exist
If you're self-hosting, backups are your responsibility. Automated nightly backups saved Prachyam twice — once from a misconfigured container volume that lost a mail queue, once from an accidental database drop during a migration test. Both times the backup existed and the restore worked. Neither backup was tested before the incident.
Testing restores is the boring discipline that matters. An untested backup is a wishful claim. Add a quarterly restore test to whatever runs your backups. Find out what the restore process actually takes before you need to know.
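A restore test does not need to be elaborate to be useful. The sketch below assumes, purely for illustration, that the backup is a single SQLite database file; the function name and the checks are hypothetical, and a real stack would substitute its own restore command and its own definition of "looks healthy".

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def verify_restore(backup_file, required_tables):
    """Restore a SQLite backup into a scratch directory and sanity-check it.

    Returns a list of problems; an empty list means the restore passed.
    """
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored.db"
        shutil.copy2(backup_file, restored)  # the "restore" step
        conn = sqlite3.connect(restored)
        try:
            status = conn.execute("PRAGMA integrity_check").fetchone()[0]
            if status != "ok":
                problems.append(f"integrity_check: {status}")
            existing = {
                row[0] for row in conn.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'table'")
            }
            for table in required_tables:
                if table not in existing:
                    problems.append(f"missing table: {table}")
                elif conn.execute(
                        f'SELECT COUNT(*) FROM "{table}"').fetchone()[0] == 0:
                    problems.append(f"empty table: {table}")
        except sqlite3.DatabaseError as exc:
            # Raised when the file is corrupt or not a database at all.
            problems.append(f"unreadable backup: {exc}")
        finally:
            conn.close()
    return problems
```

Run on a schedule against the most recent backup, a non-empty result becomes an alert. Timing the call also answers the other question: how long a restore actually takes.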
The 80/20 of security
Four things cover the vast majority of the attack surface on a self-hosted stack: HTTPS everywhere via real TLS certificates, SSH key authentication only, fail2ban to rate-limit brute-force attempts, and regular dependency and OS updates.
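"SSH key authentication only" is easy to state and easy to get wrong, so it is worth checking mechanically. This is a rough sketch of such a check over the text of an `sshd_config` file. The function name is hypothetical, and it deliberately ignores `Include` files and anything inside `Match` blocks, so treat a clean result as a hint rather than proof.

```python
# Directives that should be explicitly disabled for key-only login.
REQUIRED = {
    "passwordauthentication": "no",
    "kbdinteractiveauthentication": "no",
}


def audit_sshd_config(text):
    """Return a list of findings for password-style logins left enabled."""
    seen = {}
    for raw in text.splitlines():
        parts = raw.strip().split(None, 1)
        if len(parts) != 2 or parts[0].startswith("#"):
            continue  # blank line, comment, or keyword without a value
        key, value = parts[0].lower(), parts[1].strip().lower()
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        seen.setdefault(key, value)  # sshd keeps the first value it reads
    findings = []
    for key, expected in REQUIRED.items():
        actual = seen.get(key)
        if actual is None:
            findings.append(f"{key}: not set explicitly")
        elif actual != expected:
            findings.append(f"{key}: {actual} (expected {expected})")
    if seen.get("pubkeyauthentication") == "no":
        findings.append("pubkeyauthentication: no (expected yes)")
    return findings
```

Requiring the directives to be set explicitly, rather than trusting defaults, is a design choice: defaults vary between distributions and versions, while an explicit `no` reads the same everywhere.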
The other 20% is threat modelling for your specific case: who has network access to which services, what's public-facing vs Tailscale-only, what the blast radius is if a specific service is compromised. The 80% is table stakes. The 20% is the reasoning you do when the stakes are real.