Valuing user privacy — PostHog Analytics

What is PostHog?

Tracking your users… the good way.

PostHog is a free, open-source product analytics platform, similar in function to Google Analytics. It launched back in February and has shipped a lot of exciting new features throughout this year. Unlike many other analytics tools, PostHog is built with developers in mind, to help you get feedback on your products directly. It is available as a self-hosted solution, so your data stays private and in your own hands.

If you’re someone who cares about privacy (or you read the blog title), you may initially think tracking users is a wholly negative thing, but gathering data about how our services are used is how we improve them. When software is your product and it’s used anonymously all around the world, with no direct line of communication to its users, analytics data becomes a big part of your customer feedback. PostHog lets you gather and analyse that data while keeping it out of the hands of third parties.

Google Analytics is used on over 70% of the top 10k sites worldwide, and over 50% of the top 1m. Google offer most of Analytics’ features for free, and they use the data collected about your users to drive their advertising network. As I’ve written about before, this is quite concerning to the privacy-focused among us. In the wake of Equifax and Cambridge Analytica, it’s a concern for legislators these days too, with GDPR causing a huge shift in the way data is handled globally. Other regions are following suit with their own laws, such as California’s CCPA, India’s PDPB and Brazil’s LGPD, and countries like Australia and Canada are among many calling for digital privacy law reform.

If these laws don’t make the case for keeping as much personal data as possible out of the hands of advertising giants, there is a financial incentive too: complying with all of them is tricky, and getting it wrong could be extremely expensive.

Hands up if you use an ad blocker…

This blog is aimed at other developers, so there’s a good chance most of you use an ad blocker, as do over 25% of other web users. Most good ad blockers already block Google Analytics scripts by default, which makes the data in your Google Analytics dashboard pretty unreliable.

What’s so great about PostHog?

PostHog is open-source! And not just the software: as a company, PostHog is joining the likes of GitLab and Sentry in being totally open-source.

Self-hosting is the only way to ensure your data is kept 100% private end to end. While PostHog do offer a hosting service, the core focus of PostHog is as a free self-hosted solution. PostHog provides many well-documented deployment options, the easiest of which is a 1-click Heroku deployment, and of course there’s a Docker image available too. There’s also a full walkthrough for deploying from source if you want a custom setup. Since you’re in control of your data with PostHog, you don’t have to pay anything to export it (as you would with some other services), and you can query it externally however you like. With no third parties involved, you can worry slightly less about legal compliance too (only slightly!).

Because the software is open-source and MIT licensed you are free to modify it however you like. That means you can customise the way your data is stored to make sense for your product, or improve the interface with custom features in order to work faster. You can also contribute or raise an issue directly on the GitHub repository, and maybe you’ll get a fix in record time! Beat that support, Google!

Of course, the most important feature of PostHog is the analytics. How does it compare to Google Analytics? Here’s a list of what I find to be the most important analytics features of PostHog.

  • Active user metrics ☑️
  • Audience patterns (cohorts in PostHog) ☑️
  • Explore users (You can also connect your back-end to enhance this data) ☑️
  • Session data - Visited Paths, Timings, User device details, Referrers, IP ☑️
  • Realtime reporting ☑️
  • Create and visualise funnels ☑️
  • User retention stats ☑️
  • Heatmaps (no extra setup needed) ☑️
  • Session recording (Currently in beta) ☑️
  • Isn’t blocked by ad blockers ☑️

By default, PostHog takes a straightforward approach to collecting interaction data: track as much as possible and filter it into Actions later. The autocapture feature can capture every DOM interaction as an Event, such as clicks on divs and links (it will not gather sensitive data from form inputs by default). This lets you decide how to group your data after the fact and analyse trends however you want, without having to set up Actions before the interactions happen. If autocapture doesn’t fit your needs, you can call posthog.capture() to log an event manually.
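To make the “capture everything, define Actions later” idea concrete, here’s a toy model in Python. It is not PostHog’s implementation or data model: the event fields (`event`, `tag`, `text`, `url`) and the `define_action` helper are hypothetical, chosen only to show that an Action behaves like a filter you can apply retroactively to events that were already stored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    """Simplified stand-in for an autocaptured DOM interaction (hypothetical fields)."""
    event: str  # e.g. "click"
    tag: str    # the DOM element that was interacted with
    text: str   # visible text of the element
    url: str    # page the interaction happened on


# Events are stored as they happen, before anyone has decided what matters.
EVENT_LOG: List[Event] = [
    Event("click", "a", "Sign up", "/"),
    Event("click", "div", "Pricing card", "/pricing"),
    Event("click", "button", "Sign up", "/pricing"),
    Event("click", "a", "Docs", "/"),
]

Predicate = Callable[[Event], bool]


def define_action(predicate: Predicate) -> Callable[[List[Event]], List[Event]]:
    """Return a function that filters any list of stored events with the predicate."""
    def run(events: List[Event]) -> List[Event]:
        return [e for e in events if predicate(e)]
    return run


# Defined after the events were captured, yet it still matches them.
signup_clicked = define_action(lambda e: e.event == "click" and e.text == "Sign up")

matches = signup_clicked(EVENT_LOG)
print(len(matches))  # 2
```

Because nothing was thrown away at capture time, changing the predicate and re-running it over the log gives you a new view of old data, which is the flexibility autocapture is buying you.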

There is also support for firing events from your back-end, to supplement your data with more information about how your product is being used. There are libraries for all the languages you’d expect, and an API for working with anything else. You could use this feature to link an anonymous user with their signed-in identity after they log in, or to provide structured data for an Event so you can create an Action and filter on it later.
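Here’s a rough sketch of what a back-end event looks like on the wire, using only Python’s standard library. It assumes the plain HTTP `/capture/` endpoint with `api_key`, `event`, `distinct_id` and `properties` fields, as described in PostHog’s API docs at the time of writing; check the docs for your version before relying on it. The host, API key, user ID, event name and properties below are all hypothetical placeholders. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request

# Hypothetical values: replace with your own deployment URL and project API key.
POSTHOG_HOST = "http://localhost:8001"
PROJECT_API_KEY = "phc_example_key"


def build_capture_request(distinct_id: str, event: str, properties: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the assumed /capture/ endpoint."""
    payload = {
        "api_key": PROJECT_API_KEY,
        "event": event,
        "distinct_id": distinct_id,  # ties the event to a specific user
        "properties": properties,    # structured data you can filter Actions on later
    }
    return urllib.request.Request(
        url=f"{POSTHOG_HOST}/capture/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_capture_request("user-42", "subscription_upgraded", {"plan": "pro"})
# To actually send it: urllib.request.urlopen(req)
```

In a real app you’d more likely use one of PostHog’s official libraries, which wrap this same API for you.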

So what are the downsides?

PostHog is still very new and under active development. If that puts you off because you prefer tried-and-tested tools, or you’re wary of breaking changes, you can relax a little: development has progressed quickly, potentially unstable features are labelled as beta or hidden behind feature flags, and bugs get addressed quickly.

A Bad Workman Blames His Tools

I don’t really think this one counts as a downside, but you get out what you put in. Making good use of your analytics data is tricky in any analytics platform. PostHog gives you all the tools you need to develop key insights into your products, but it’s up to you to choose the right filters to extract that data in a meaningful way. Some data is presented to you by default, such as Sessions, Retention, and Paths; this is great for getting started, but you’ll want to invest a little time in building Events, Actions, and Cohorts to get data that is useful to you. Fortunately there are some in-depth tutorials for those of us who aren’t great at knowing how best to use analytics data.

How do I use it?

Integrating PostHog with your website or app’s front-end is very simple: you just add a small JavaScript snippet to your HTML. To get the snippet, you’ll first have to set up your PostHog deployment.

Deployment with Heroku

Use the 1-click installer to create a full deployment of PostHog. Yes, it’s actually that easy! You can use Heroku’s free tier, but it’s not recommended for anything other than testing because of its 10k database row limit. PostHog saves every interaction with your website (each click and its associated data) to the database, which will hit the 10k limit in just a few days, even on very small websites. However, if you just want a quick demo of how PostHog works for your product, the free tier is an excellent choice.
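To get a feel for how fast that limit arrives, here’s the back-of-the-envelope arithmetic. The traffic figures are hypothetical, and it assumes roughly one database row per captured event, which is a simplification on my part rather than something taken from PostHog’s docs.

```python
# Hypothetical traffic figures for a very small site; substitute your own.
ROW_LIMIT = 10_000        # Heroku free-tier Postgres row limit
visitors_per_day = 50
events_per_visitor = 40   # page views, clicks and other autocaptured interactions

# Assumes one row per captured event (a simplification).
rows_per_day = visitors_per_day * events_per_visitor
days_until_full = ROW_LIMIT / rows_per_day
print(f"{rows_per_day} rows/day -> limit reached in {days_until_full:.1f} days")
# 2000 rows/day -> limit reached in 5.0 days
```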

Deployment with Docker Compose

This is how I deploy PostHog on starthubs.uk. I was already using Docker Compose for deployment, so it was up and running with just a few extra lines. The PostHog Docker image requires Postgres and Redis; if you don’t already use those in your deployment, I recommend simply following PostHog’s example docker-compose.yml file.

If you do already use Postgres or Redis containers in your Docker setup, there are a few extra steps you will need to take if you want to share these services between your existing app and PostHog.

This example takes a Django + Postgres + Redis + Celery project and adds PostHog. Since PostHog uses all of these technologies itself, this example is something of a worst case for complexity and possible conflicts. To keep things simple, it doesn’t use a reverse proxy, and I’ve only included the essential Docker attributes.

# docker-compose.yml
version: "3.7"

services:
  db:
    image: postgres:alpine
    networks:
      public:
    environment:
      POSTGRES_DB: myappdb  # The name of your postgres database
      POSTGRES_USER: postgresuser
      POSTGRES_PASSWORD: postgrespass
    volumes:
      # posthog-db-setup.sql is responsible for creating the posthog db
      - ./posthog-db-setup.sql:/docker-entrypoint-initdb.d/posthog-db-setup.sql

  redis:
    image: redis:alpine
    networks:
      public:

  # Our django web app
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000  # Change in production
    environment:
      POSTGRES_HOST: db  # The name of your postgres service
      POSTGRES_DB: myappdb
      POSTGRES_USER: postgresuser
      POSTGRES_PASSWORD: postgrespass
      REDIS_URL: redis://redis:6379/
    ports:
      - 8000:8000
    networks:
      public:
    depends_on:
      - db
      - redis

  posthog:
    image: posthog/posthog:latest
    container_name: posthog_web
    restart: always
    ports:
      - 8001:8000  # Our django app is already using 8000, so use 8001
    networks:
      public:
    depends_on:
      - db
      - redis
    environment:
      IS_DOCKER: "true"
      DEBUG: 'true'  # Delete this line in production!
      DATABASE_URL: postgres://postgresuser:postgrespass@db:5432/posthog
      REDIS_URL: redis://redis:6379/
      SECRET_KEY: < your secret key >

networks:
  public:

On line 15 (services > db > volumes) we tell the Postgres service to run an initialisation script called posthog-db-setup.sql by mounting it into the /docker-entrypoint-initdb.d directory. This script creates the posthog database and sits in the same directory as our compose file. It contains only one line:

CREATE DATABASE posthog;

Postgres only runs the scripts in that directory when it initialises an empty data directory, which in practice usually means the first time the container is created. If you want to create the posthog database in an existing container, you can run a command like this while the db container is running:

docker-compose exec db psql -U postgresuser -d postgres -c "CREATE DATABASE posthog;"

In the compose file you’ll notice that the posthog service is mapped to port 8001 on the host, because our Django app is already using port 8000. The DATABASE_URL variable in the posthog service contains the username and password for the Postgres service, and db is the hostname of the database service, which Docker Compose resolves on the shared public network.
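The DATABASE_URL follows the usual `scheme://user:password@host:port/database` layout. If you want to sanity-check which part is which, Python’s standard library can split it for you; this is purely illustrative and not something PostHog needs you to run.

```python
from urllib.parse import urlsplit

# The connection string the posthog service uses, built from the db service's settings.
DATABASE_URL = "postgres://postgresuser:postgrespass@db:5432/posthog"

parts = urlsplit(DATABASE_URL)
print(parts.username)          # postgresuser  (POSTGRES_USER on the db service)
print(parts.password)          # postgrespass  (POSTGRES_PASSWORD on the db service)
print(parts.hostname)          # db            (the Compose service name)
print(parts.port)              # 5432          (Postgres' default port inside the container)
print(parts.path.lstrip("/"))  # posthog       (the database created by posthog-db-setup.sql)
```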

That’s all we need for the Docker Compose file, but we’ll want to configure some settings in our Django app to make sure we avoid conflicts with PostHog since our app also uses Celery and Redis caching.

By default, Celery workers consume tasks from a queue named celery. Because our web app and PostHog share the same broker (Redis), if both apps stick with that default their workers will pull from the same queue and can pick up each other’s tasks. We need to tell our Celery workers to listen only to their own queue so they don’t try to handle PostHog’s tasks. We can do this by defining a new task queue in our Django app:

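# settings.py
import os

from kombu import Queue

CELERY_BROKER_URL = os.environ['REDIS_URL']
...
# Send this app's tasks to its own queue instead of Celery's default "celery" queue
CELERY_TASK_DEFAULT_QUEUE = 'myappqueue'
# Workers consume only from the queues listed here, so PostHog's tasks are left alone
CELERY_TASK_QUEUES = (
    Queue('myappqueue', routing_key='myapp.#'),
)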
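To picture why the separate queue matters, here’s a toy model of one shared broker holding several named queues. It’s a simplification, not how Celery or Redis work internally, and it assumes PostHog’s tasks land on Celery’s default `celery` queue; both task names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

# Toy in-memory "broker": one shared instance holding several named queues.
broker: Dict[str, Deque[str]] = defaultdict(deque)


def send_task(name: str, queue: str) -> None:
    """Publish a task name onto the named queue."""
    broker[queue].append(name)


def drain(queues: List[str]) -> List[str]:
    """Consume every task waiting on the given queues, like a worker bound to them."""
    consumed = []
    for q in queues:
        while broker[q]:
            consumed.append(broker[q].popleft())
    return consumed


# Assumed: PostHog uses the default queue; our app now uses its own.
send_task("posthog.some_task", queue="celery")
send_task("myapp.send_welcome_email", queue="myappqueue")

my_worker_tasks = drain(["myappqueue"])
print(my_worker_tasks)         # ['myapp.send_welcome_email']
print(list(broker["celery"]))  # ['posthog.some_task'], left for PostHog's own worker
```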

If you’re using Redis as a cache backend for Django you’ll probably want to make sure KEY_PREFIX is set. PostHog also uses Redis for caching and sets their KEY_PREFIX to posthog, so choose something different and you can avoid potential cache conflicts.

# settings.py

CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': os.environ['REDIS_URL'],
        'OPTIONS': {
            ...
        },
        'KEY_PREFIX': 'myapp'  # Must differ from PostHog's "posthog" prefix
    },
}
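The reason the prefix helps is visible in how Django builds cache keys: its default key function joins the prefix, a version number and your key with colons. The function below re-implements that format for illustration only (Django does this for you), and the `dashboard_stats` key is hypothetical.

```python
def make_cache_key(key: str, key_prefix: str = "", version: int = 1) -> str:
    """Mirror Django's default cache key format: "<prefix>:<version>:<key>"."""
    return f"{key_prefix}:{version}:{key}"


# Without prefixes, both apps would read and write the same Redis key.
print(make_cache_key("dashboard_stats"))                        # :1:dashboard_stats
print(make_cache_key("dashboard_stats", key_prefix="myapp"))    # myapp:1:dashboard_stats
print(make_cache_key("dashboard_stats", key_prefix="posthog"))  # posthog:1:dashboard_stats
```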

You should now be able to run docker-compose up to see your app live. Visiting localhost:8000 should show your Django app, and localhost:8001 will take you to your PostHog deployment.

Deployment done!

Add your JavaScript snippet to your front-end and you’re all set! When you visit pages containing your snippet you should start seeing events on your PostHog dashboard. I recommend checking out the PostHog back-end integrations to start sending events from your back-end too.

Got a comment, complaint, correction, or contribution for this post? You can get in touch on GitHub!