In an effort to make my website privacy-friendly, I moved away from Google Analytics. This was the next step after self-hosting fonts for the website. I have been self-hosting Plausible Analytics for the last 3 months. I will cover why I did it, how I’m hosting it on my VPS, the email setup, and more.
Before we get into further details, why do I even want analytics in the first place? It’s great to have your own place on the internet. If it is your own website, you would probably want to know how many people are visiting it, and which articles they are reading.
These basic details make me feel good, because people are reading what I have written. Readers might even be finding my posts interesting. These stats motivate me to write more (although they are not the only factor; see “Motivation is Overvalued. Environment Often Matters More.” by James Clear).
Like many people, I had been using Google Analytics since the day I started blogging. It’s very simple to set it up on a website. If something is very easy to adopt, then the chances of using it are obviously high.
The major issue with Google Analytics is that it collects too much data. It is not at all privacy-friendly, so I’m giving a bad experience to my visitors. The data is owned by Google, and includes a lot of metrics. As I have said before, it is better to stay away from Google as much as possible.
The other problem is the complexity. For use cases like mine, it becomes very complex due to the number of metrics. In the last 4 years, I have only looked at the number of visitors and per-page visits. I realized this while reading about Plausible.
Read “What makes Plausible a great Google Analytics alternative” by Plausible to understand this in more detail.
Plausible Analytics is a free (as in freedom) alternative to Google Analytics. It is an AGPL v3+ licensed tool.
Plausible Analytics is an open-source project dedicated to making web analytics more privacy-friendly. Our mission is to reduce corporate surveillance by providing an alternative web analytics tool which doesn’t come from the AdTech world.
— Plausible Analytics website
I came across Plausible Analytics while I was reading someone’s blog. It was exactly what I wanted: privacy-friendly, simple, and open source. I cannot afford the hosted version, so I decided to self-host Plausible.
Listen to the Changelog podcast episode “De-Google-ing your website analytics” to understand the motivation behind the project.
Here are other alternatives I came across.
The plausible/hosting repository contains files required for self-hosting. It uses Docker Compose to run Plausible, ClickHouse and Postgres together. It also comes with a mail server for sending account activation mails, notifications etc.
I use Podman instead of Docker on my VPS. So I used podman-compose to run Plausible.
The official documentation for self-hosting covers all the details needed to get it running. I will talk more about the modifications and additions I made.
A comparison between the self-hosted and managed offerings is available in the README.
Here are a few of the changes made to the Compose file.
Apart from the mail container, none of the containers had a restart policy. So I added the always policy to the other containers. This makes sure that the containers always restart if they fail or exit.
A snippet from the docker-compose.yml:
plausible:
  restart: always
You can read more about restart on the podman run man page and in the Compose spec.
The original Compose file uses data volumes to persist the ClickHouse and Postgres data. These data volumes are not very straightforward to back up and restore, so I decided to bind mount host paths as volumes inside the containers. I also had to add the :z flag, as I have SELinux enabled.
It looks something like this in the docker-compose.yml:
plausible_events_db:
  volumes:
    - ${HOME}/data/event-data:/var/lib/clickhouse:z
Find more details about volumes on podman run man page.
With this, I can simply stop the containers with podman-compose stop, and take a backup of the host directories.
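The stop-then-copy routine described above could be sketched roughly as follows. This is only an illustration: the function name `backup_plausible_data` and the backup directory are made up, and it assumes both bind-mounted directories live under `${HOME}/data` (the source only shows the ClickHouse one). With rootless Podman the files may be owned by container user IDs, so the `tar` step might need to run as root or under `podman unshare`.

```shell
#!/bin/sh
# Sketch: archive the bind-mounted data directory while the containers are stopped.
# backup_plausible_data is a hypothetical helper, not part of plausible/hosting.

backup_plausible_data() {
    data_dir="$1"      # assumed layout, e.g. "${HOME}/data"
    backup_dir="$2"    # assumed destination, e.g. "${HOME}/backups"
    stamp="$(date +%Y%m%d-%H%M%S)"
    archive="${backup_dir}/plausible-data-${stamp}.tar.gz"

    mkdir -p "$backup_dir"
    # -C makes the paths inside the archive relative to the parent of data_dir
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
    # Print the archive path so callers can copy it elsewhere
    echo "$archive"
}

# Typical use, from the directory that contains docker-compose.yml:
#   podman-compose stop
#   backup_plausible_data "${HOME}/data" "${HOME}/backups"
#   podman-compose up -d
```

Stopping the containers first matters because copying database files while ClickHouse or Postgres is writing to them can produce an inconsistent backup.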
Port mappings under the ports key take the form host_port:container_port. With this form, host_port gets bound on all interfaces. This means the container gets exposed on every IP address assigned to the machine, including the public one. This was not the behavior I wanted, and it is risky. I prefer to have a reverse proxy as the only thing exposed publicly. There is a pending PR which makes this change.
Snippet from docker-compose.yml:
plausible:
  ports:
    - 127.0.0.1:8000:8000
This change makes sure that the Plausible container is accessible only at localhost:8000 on the host.
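One way to double-check the binding is to look at the listening sockets on the host. The helper below is a sketch, not something from the Plausible docs: `non_loopback_listeners` is a made-up name, and it assumes the column layout printed by `ss -tln` (state first, local address and port in the fourth column).

```shell
#!/bin/sh
# Sketch: print listeners on a given port that are NOT bound to loopback.
# Reads `ss -tln` style output on stdin; the function name is hypothetical.

non_loopback_listeners() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            laddr = $4                      # "Local Address:Port" column
            suffix = ":" port
            start = length(laddr) - length(suffix) + 1
            if (start > 1 && substr(laddr, start) == suffix) {
                host = substr(laddr, 1, start - 1)
                # 127.0.0.0/8 and [::1] are loopback; report everything else
                if (host !~ /^127\./ && host != "[::1]") print laddr
            }
        }'
}

# On the VPS:
#   ss -tln | non_loopback_listeners 8000
# No output means port 8000 is listening on loopback only.
```

If the command prints something like `0.0.0.0:8000`, the port mapping is still bound on all interfaces.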
Now that I had Plausible running on my server, it was time to expose it publicly. This was a new VPS, so I didn’t have any web server on it. While reading the Plausible docs, I came across the Caddy server.
I always felt that Nginx is good, but it has too many knobs to tweak. Those knobs are crucial when it comes to following security best practices. Caddy was exactly what I wanted. It is much easier to set up, and takes care of SSL provisioning with Let’s Encrypt.
Caddy 2 is a powerful, enterprise-ready, open source web server with automatic HTTPS written in Go.
— Caddy Server website
I installed it from the @caddy/caddy Copr repository.
I decided to expose only a few paths of Plausible. This is mainly because only two people use this Plausible instance, and both of us have SSH access to the VPS. We use SSH port forwarding to access the web interface.
The paths /api and /js are enough to collect the analytics events.
analyse.geeksocket.in {
    reverse_proxy /api/* localhost:8000
    reverse_proxy /js/* localhost:8000
}
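Since the web interface is not proxied by Caddy, it is reached through an SSH tunnel. A minimal sketch of such a tunnel is below. The function name `plausible_tunnel`, the `DRY_RUN` switch, and the destination `user@vps.example.org` are placeholders for illustration; port 8000 matches the port mapping shown earlier.

```shell
#!/bin/sh
# Sketch: forward a local port to Plausible listening on localhost:8000 on the VPS.
# plausible_tunnel and DRY_RUN are hypothetical names used only in this example.

plausible_tunnel() {
    dest="$1"             # SSH destination, e.g. user@vps.example.org (placeholder)
    lport="${2:-8000}"    # local port to listen on, defaults to 8000

    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it
        echo "ssh -N -L ${lport}:localhost:8000 ${dest}"
    else
        # -N: do not run a remote command, -L: local port forwarding
        ssh -N -L "${lport}:localhost:8000" "$dest"
    fi
}

# Usage:
#   plausible_tunnel user@vps.example.org
# then open http://localhost:8000 in a browser on the local machine.
```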
The Plausible self-hosting setup comes with an Exim mail server. This server is used to send emails related to account management, as well as weekly and monthly reports.
Most mail service providers validate SPF and DKIM to prevent mail forgery. If you are not implementing at least one of them, it is quite likely that your email will get rejected, or land in the spam folder.
I had to set the following variable in plausible-conf.env:
MAILER_EMAIL=hello@analyse.geeksocket.in
And the following for the mail container in docker-compose.yml:
mail:
  environment:
    - MAILNAME=analyse.geeksocket.in
The above configuration ensures that the Exim server uses the correct hostname when sending emails.
Setting the SPF (TXT) DNS record ensured that the emails get delivered to the inbox. You can check what that record looks like by running dig TXT analyse.geeksocket.in.
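The same lookup can be turned into a quick yes/no check. The sketch below relies on two things: an SPF policy is a TXT record whose value starts with `v=spf1`, and `dig +short` prints each TXT value wrapped in double quotes. The function name `has_spf_record` is made up for this example.

```shell
#!/bin/sh
# Sketch: succeed if the TXT records on stdin include an SPF policy.
# Expects `dig +short TXT <domain>` output; has_spf_record is a hypothetical helper.

has_spf_record() {
    # Match a quoted value that begins with the SPF version tag
    grep -q '^"v=spf1[ "]'
}

# Usage:
#   dig +short TXT analyse.geeksocket.in | has_spf_record && echo "SPF record found"
```

This only confirms that a record exists. Whether the policy actually authorizes the VPS to send mail depends on the mechanisms listed in the record.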
I’m just backing up host paths which are mounted inside the Postgres and ClickHouse containers. I will update this section when I start doing database level exports.
Here are a few links which you might find useful when self-hosting Plausible.
I ran both Google Analytics and Plausible Analytics side by side for a few days. You can see the results below. The difference is because Google Analytics is blocked by a lot of visitors.
This has been a great experience so far. I’m free from the guilt of compromising the privacy of my readers. I get the required stats in one glance. Updates are painless, as the maintainers make sure the releases for self-hosted versions are well tested on their managed offering. I encourage you to try it out.