GeekSocket Plug in and be Geekified

Self-hosting Plausible Analytics

In an effort to make my website more privacy-friendly, I moved away from Google Analytics. This was the next step after self-hosting fonts for the website. I have been self-hosting Plausible Analytics for the last 3 months. I will cover why I did it, how I’m hosting it on my VPS, the email setup, and more.

Why do I need analytics?

Before we get into any further details, why do I even want analytics in the first place? It’s great to have your own place on the internet. If it is your own website, you would probably want to know how many people are visiting it, and which articles they are reading.

These basic details make me feel good, because people are reading what I have written. Readers might even be finding my posts interesting. These stats motivate me to write more (although they are not the only factor). As James Clear puts it: “Motivation is Overvalued. Environment Often Matters More.”

The Google Analytics trap

Like many people, I had been using Google Analytics since the day I started blogging. It’s very simple to set it up on a website. If something is very easy to adopt, then the chances of using it are obviously high.

The major issue with Google Analytics is that it collects too much data. It is not at all privacy-friendly, so I’m giving a bad experience to my visitors. The data is owned by Google, and includes a lot of metrics. As I have said before, it is better to stay away from Google’s services as much as possible.

The other problem is the complexity. For use cases like mine, the sheer number of metrics makes it very complex. In the last 4 years, I have only looked at the number of visitors and per-page visits. I realized this while reading about Plausible.

Read What makes Plausible a great Google Analytics alternative by Plausible to understand this in more detail.

Enter Plausible Analytics

Plausible Analytics is a free (as in freedom) alternative to Google Analytics. It is an AGPL v3+ licensed tool.

Plausible Analytics is an open-source project dedicated to making web analytics more privacy-friendly. Our mission is to reduce corporate surveillance by providing an alternative web analytics tool which doesn’t come from the AdTech world.
— Plausible Analytics website

I came across Plausible Analytics while I was reading someone’s blog. It was exactly what I wanted: privacy-friendly, simple, and open source. I cannot afford the hosted version, so I decided to self-host Plausible.

Listen to this Changelog podcast episode De-Google-ing your website analytics to understand the motivation behind the project.

Other alternatives

Here are other alternatives I came across.

  • Matomo (formerly known as Piwik)
    I came across Matomo when it was still called Piwik. I felt that it has far more features than I need (similar to Google Analytics).
  • GoatCounter
    This is something I found after I started using Plausible. The UI is very basic and, to be honest, I did not like it. It also has a different license, which I don’t know much about.
  • GoAccess
    This is something I might try someday if I start hosting my website with Caddy rather than Netlify. I found it interesting.

Self-hosting Plausible Analytics

The plausible/hosting repository contains files required for self-hosting. It uses Docker Compose to run Plausible, ClickHouse and Postgres together. It also comes with a mail server for sending account activation mails, notifications etc.

I use Podman instead of Docker on my VPS, so I used podman-compose to run Plausible.

The official documentation for self-hosting covers all the details needed to get it running. I will talk more about the modifications and additions I made.

A comparison between the self-hosted and managed offerings is available in the README.

Compose file changes

Here are a few of the changes made to the Compose file.

Always restarting the containers

Apart from the mail container, none of the containers had a restart policy. So I added the always policy to the others. This makes sure that the containers are restarted whenever they fail or exit.

A snippet from the docker-compose.yml:

  plausible:
    restart: always

You can read more about restart in the podman run man page and the Compose spec.

Bind mounting paths from host as volumes

The original Compose file uses data volumes to persist the ClickHouse and Postgres data. These data volumes are not very straightforward to back up and restore. So I decided to bind mount host paths as volumes inside the containers. I also had to add the :z flag, as I have SELinux enabled.

It looks something like this in the docker-compose.yml:

  plausible_events_db:
    volumes:
      - ${HOME}/data/event-data:/var/lib/clickhouse:z

Find more details about volumes in the podman run man page.

With this, I can simply stop the containers with podman-compose stop, and back up the host directories.
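A minimal sketch of that backup step, assuming the bind-mounted directories live under ${HOME}/data as in the snippet above. The backup_dir helper and the ${HOME}/backups destination are hypothetical names used only for illustration, not part of the Plausible setup.

```shell
#!/usr/bin/env bash
# backup_dir SRC DEST_DIR
# Archives one host directory into a dated tarball under DEST_DIR
# and prints the path of the archive it created.
# (Hypothetical helper; the names and layout are assumptions.)
backup_dir() {
  local src="$1" dest_dir="$2"
  local name stamp archive
  name="$(basename "$src")"
  stamp="$(date +%Y-%m-%d)"
  archive="${dest_dir}/${name}-${stamp}.tar.gz"
  mkdir -p "$dest_dir"
  # -C keeps the paths inside the archive relative to the parent directory
  tar -czf "$archive" -C "$(dirname "$src")" "$name"
  echo "$archive"
}

# Example flow, run from the directory containing docker-compose.yml:
#   podman-compose stop
#   backup_dir "$HOME/data/event-data" "$HOME/backups"
#   podman-compose up -d
```

Depending on how Podman maps the container users, the files may be owned by a sub-UID on the host. In that case tar might need to run through sudo or podman unshare to be able to read them.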

Listening on localhost

The ports key takes port mappings in the form host_port:container_port. By default, the host_port gets bound on all the interfaces. This means the container gets exposed on all the IP addresses which are assigned to the machine, including the public one. This is not what I wanted, and it is risky. I prefer to have a reverse proxy, which gets exposed publicly. There is a pending PR which makes this change.

Snippet from docker-compose.yml:

  plausible:
    ports:
      - 127.0.0.1:8000:8000

This change makes sure that the Plausible container is accessible only at localhost:8000 on the host.
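One way to confirm the binding after restarting the containers is to look at the listening sockets on the host. The sketch below filters the output of ss -tln for a given port. listeners_for_port is a hypothetical helper, and it assumes the usual ss column layout where the fourth column is the local address and port.

```shell
#!/usr/bin/env bash
# listeners_for_port PORT
# Reads `ss -tln` output on stdin and prints every local address
# that is listening on PORT. (Hypothetical helper for illustration.)
listeners_for_port() {
  local port="$1"
  # Skip the header line, then compare the part after the last ":" with PORT
  awk -v port="$port" 'NR > 1 {
    n = split($4, parts, ":")
    if (parts[n] == port) print $4
  }'
}

# Usage on the VPS:
#   ss -tln | listeners_for_port 8000
```

If the mapping took effect, the only address listed should be on 127.0.0.1. An entry such as 0.0.0.0:8000 or [::]:8000 would mean the port is still reachable on every interface.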

Setting up Caddy server

Now that I had Plausible running on my server, it was time to expose it publicly. This was a new VPS, so I didn’t have any web server on it. While reading the Plausible docs, I came across Caddy server.

I always felt that Nginx is good, but it has too many knobs to tweak. Those knobs are crucial when it comes to following security best practices. Caddy was exactly what I wanted. It is much easier to set up, and takes care of SSL certificate provisioning with Let’s Encrypt.

Caddy 2 is a powerful, enterprise-ready, open source web server with automatic HTTPS written in Go.
— Caddy Server website

I installed it from the @caddy/caddy Copr repository.

Exposing selective paths of Plausible

I decided to expose only a few paths of Plausible. This is mainly because only two people use this Plausible instance, and both of us have SSH access to the VPS. We use SSH port forwarding to access the web interface.

The paths /api and /js are enough to collect the analytics events.

analyse.geeksocket.in {
  reverse_proxy /api/* localhost:8000
  reverse_proxy /js/* localhost:8000
}
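For the web interface itself, the SSH port forwarding mentioned above can be sketched as a small wrapper around ssh -L. The plausible_tunnel function, the SSH_BIN override, and the user@vps.example target are all hypothetical. The only detail taken from the setup above is that Plausible listens on localhost:8000 on the VPS.

```shell
#!/usr/bin/env bash
# plausible_tunnel TARGET [LOCAL_PORT]
# Forwards LOCAL_PORT on this machine to localhost:8000 on the VPS.
# SSH_BIN is a hypothetical override, handy for dry runs (SSH_BIN=echo).
plausible_tunnel() {
  local target="$1"
  local local_port="${2:-8000}"
  # -N: do not run a remote command, only forward the port
  # -L: local_port:host:port, where host is resolved on the VPS side
  "${SSH_BIN:-ssh}" -N -L "${local_port}:localhost:8000" "$target"
}

# Usage:
#   plausible_tunnel user@vps.example
#   then open http://localhost:8000 in a browser on the local machine
```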

Mail server setup

The Plausible self-hosting setup comes with an Exim mail server. This server is used to send emails related to account management, as well as weekly and monthly reports.

Most mail service providers validate SPF and DKIM to prevent mail forgery. If you are not implementing at least one of them, it is very likely that your email will get rejected, or that it will land in the spam folder.

I had to set the following variable in plausible-conf.env:

MAILER_EMAIL=hello@analyse.geeksocket.in

And the following for the mail container in docker-compose.yml:

  mail:
    environment:
      - MAILNAME=analyse.geeksocket.in

The above configuration ensures that the Exim server uses the correct hostname when sending emails.

Setting the SPF (TXT) DNS record ensured that the emails get delivered to the inbox. You can check how that record looks by running dig TXT analyse.geeksocket.in.
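A domain can carry several TXT records, so it can help to pick out just the SPF one. The sketch below filters the output of dig +short for records that start with v=spf1. spf_records is a hypothetical helper, and the record shown in the comment is a made-up example using a documentation IP address, not the actual record for this domain.

```shell
#!/usr/bin/env bash
# spf_records
# Reads TXT records (one per line, as printed by `dig +short TXT`)
# on stdin and prints only the SPF ones. (Hypothetical helper.)
spf_records() {
  # dig +short wraps TXT values in double quotes, hence the optional quote
  grep -E '^"?v=spf1( |"|$)'
}

# Usage:
#   dig +short TXT analyse.geeksocket.in | spf_records
#
# An SPF record has this general shape (made-up example):
#   "v=spf1 ip4:203.0.113.10 -all"
```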

Backing up the data

I’m just backing up the host paths which are mounted inside the Postgres and ClickHouse containers. I will update this section when I start doing database-level exports.

Here are a few links which you might find useful when self-hosting Plausible.

Final results

I kept both Google Analytics and Plausible Analytics running for a few days. You can see the results below. The difference arises because a lot of visitors block Google Analytics.

Data for February — Google Analytics

Data for February — Plausible Analytics

This has been a great experience so far. I’m free from the guilt of compromising the privacy of my readers. I get the required stats at a glance. Updates are painless, as the maintainers make sure the releases for the self-hosted version are well tested on their managed offering. I encourage you to try it out.

