Motivation
Browser bookmarks can get messy fast. For example, when browsing #rstats on Bluesky, I often come across great tutorials, packages, and discussions I want to save for later. But managing bookmarks manually — creating folders, assigning links, and searching later — can be tedious.
Apps like Mozilla's Pocket or Nextcloud Bookmarks can help, but they still require manual sorting. On top of that, link rot (broken links) and the hassle of extracting text or code from images make managing bookmarks even more of a chore.
That’s when I found Hoarder in the top self-hosted apps list on selfh.st. Its features instantly sold me. It’s everything you could want from a bookmark service:
Save links, store images and PDFs, and add simple notes
Use OCR to extract text from images
Organize with lists and automatic tagging via LLM
Auto-save links from RSS feeds
Archive full pages and download videos
Full text search for all content
[Planned:] Download content for offline reading
The interface is clean and user-friendly, too. Try out the demo!
Installation
Prerequisites:
Server with Linux OS (e.g., Ubuntu 22.04) and Docker installed
A domain name
A reverse proxy for SSL and gateway management (e.g., Nginx Proxy Manager)
Firefox or Chrome browser (plugin only available for these)
The official installation guide for Docker Compose is straightforward (other installation methods are also available), but seeing an applied example that works is always nice.
Step 1: Register a Subdomain
You can host Hoarder locally on your machine, but you probably want to use it on multiple devices. Visit your domain registrar and add an A record for a subdomain (e.g., bookmarks.yourdomain.com) that points to your server's public IP address. For help setting this up, refer to this guide.
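Once the record exists, you can quickly check that it resolves to your server before moving on; a minimal sketch, assuming the example subdomain above and that dig (or nslookup) is installed:
dig +short bookmarks.yourdomain.com    # should print your server's public IP
nslookup bookmarks.yourdomain.com      # alternative if dig is not available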
Step 2: Create a New Directory
On your Linux server, create a directory for Hoarder:
mkdir hoarder
cd hoarder
Step 3: Download the Docker Compose File
wget https://raw.githubusercontent.com/hoarder-app/hoarder/main/docker/docker-compose.yml
Understanding the Structure of Hoarder Through the .yml File
The content of the .yml file looks something like this:
version: "3.8"
services:
web:
image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
restart: unless-stopped
volumes:
- data:/data
ports:
- 3000:3000
env_file:
- .env
environment:
MEILI_ADDR: http://meilisearch:7700
BROWSER_WEB_URL: http://chrome:9222
# OPENAI_API_KEY: ...
DATA_DIR: /data
chrome:
image: gcr.io/zenika-hub/alpine-chrome:123
restart: unless-stopped
command:
- --no-sandbox
- --disable-gpu
- --disable-dev-shm-usage
- --remote-debugging-address=0.0.0.0
- --remote-debugging-port=9222
- --hide-scrollbars
meilisearch:
image: getmeili/meilisearch:v1.11.1
restart: unless-stopped
env_file:
- .env
environment:
MEILI_NO_ANALYTICS: "true"
volumes:
- meilisearch:/meili_data
volumes:
meilisearch:
data:
The file defines three services: web, chrome, and meilisearch.
The web service is the main Hoarder application. It runs on port 3000 and uses the data volume mounted at /data for persistent storage. This is where bookmarks and related metadata are stored. Configuration settings, such as the addresses of the chrome and meilisearch services, are loaded from a .env file.
The chrome service provides browser rendering capabilities by running an Alpine-based Chrome browser in headless mode. It is responsible for tasks such as generating link previews, fetching images, and extracting descriptions for bookmarks. The web service communicates with chrome over port 9222 using the Chrome DevTools Protocol. The debugging address is set to allow internal communication within the Docker network.
The meilisearch service acts as the search backend, enabling fast and efficient natural language queries. For example, users can search for bookmarks using phrases like “R programming tutorial about beautiful color palettes,” and Meilisearch will return relevant matches. It maintains its own data directory, stored in the meilisearch volume, where it creates optimized indexes for querying. The web service feeds bookmark data to Meilisearch over port 7700.
In summary, the web service manages bookmarks and user interactions, chrome enhances the functionality with automated rendering and preview generation, and meilisearch ensures that searches are fast and accurate.
Note: To change the port exposed on the host (default 3000), update the mapping in the .yml file to something like 3005:3000. Keep the second number (3000) unchanged, as it is the internal port the app listens on.
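Once the stack is up and running (after Step 4), a few Docker Compose commands are enough to confirm that all three services started; a minimal sketch using the service names from the compose file above:
docker compose ps                          # web, chrome, and meilisearch should all be running
docker compose logs --tail=20 web          # the Hoarder app listening on port 3000
docker compose logs --tail=20 meilisearch  # the search backend reporting its address on :7700
docker compose logs --tail=20 chrome       # headless Chrome listening on the DevTools port 9222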
Step 4: Configure Settings
You could adjust settings directly in the compose file, but those changes would be overwritten on updates. That is why a separate .env file is recommended. Create one with:
sudo nano .env
Now populate it. Here are the most notable settings at the time of writing; have a look at the annotated list of variables below the block.
HOARDER_VERSION=release
NEXTAUTH_SECRET=super_random_string
NEXTAUTH_URL=http://bookmarks.yourdomain.com:3000
MEILI_ADDR=http://meilisearch:7700
MEILI_MASTER_KEY=another_random_string
OPENAI_API_KEY=your_api_key
INFERENCE_TEXT_MODEL=gpt-4o-mini
INFERENCE_IMAGE_MODEL=gpt-4o-mini
INFERENCE_CONTEXT_LENGTH=2048
INFERENCE_LANG=english
MAX_ASSET_SIZE_MB=50
DISABLE_SIGNUPS=false
OCR_LANGS=eng
CRAWLER_FULL_PAGE_SCREENSHOT=false
CRAWLER_FULL_PAGE_ARCHIVE=false
CRAWLER_NAVIGATE_TIMEOUT_SEC=60
CRAWLER_VIDEO_DOWNLOAD=false
CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE=250
1. HOARDER_VERSION: Pulls the latest stable release. Alternatively, pin a specific version (e.g., 0.10.0).
2. NEXTAUTH_SECRET: Hoarder uses NextAuth for authenticated login. Enter a strong random string here; you can generate one by running "openssl rand -base64 36" (see the snippet after this list).
3. NEXTAUTH_URL: Your subdomain URL followed by Hoarder's default port. If you use another port, don't forget to use it here.
4. MEILI_ADDR: Optionally, enable Meilisearch. It is a lightweight, open-source search engine that lets you smart-search your bookmarks.
5. MEILI_MASTER_KEY: Set the master key for Meilisearch.
6. OPENAI_API_KEY: Optional. Hoarder can use LLMs for automatic tagging of saved resources, either via the OpenAI API or a self-hosted Ollama model (see the documentation for how to set up Ollama). I use the OpenAI API because my low-powered VPS does not have the resources to run LLMs smoothly.
7. INFERENCE_TEXT_MODEL / INFERENCE_IMAGE_MODEL: Use any OpenAI model you like. GPT-4o mini produces good bookmark tags and doesn't break the bank (roughly 3,000 requests per US dollar).
8. INFERENCE_CONTEXT_LENGTH: Maximum context length of a query. 2048 is probably enough for tagging; setting it higher consumes more money or resources.
9. INFERENCE_LANG: Language of the generated tags.
10. MAX_ASSET_SIZE_MB: Maximum size of uploaded resources. Default is 4 MB.
11. DISABLE_SIGNUPS: Enable or disable user signups for your instance. Keep this "false" for now so you can sign up yourself.
12. OCR_LANGS: Language(s) for Tesseract, which extracts text from images.
13. CRAWLER_FULL_PAGE_SCREENSHOT: If you have the disk space, store a full screenshot of each crawled page.
14. CRAWLER_FULL_PAGE_ARCHIVE: If you have the disk space, automatically archive full pages.
15. CRAWLER_NAVIGATE_TIMEOUT_SEC: Give the crawler 60 seconds to navigate to a page. The default is 30; I increased it in case of a slow connection.
16. CRAWLER_VIDEO_DOWNLOAD: Enable automatic video downloads.
17. CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE: Maximum size of downloaded videos, which effectively determines their quality. The default is 50; -1 removes the limit.
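For annotations 2 and 5, any sufficiently long random string works; one way to generate them, as suggested above:
openssl rand -base64 36   # run once, paste the output as NEXTAUTH_SECRET
openssl rand -base64 36   # run again, paste the output as MEILI_MASTER_KEY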
Once configured, start your instance with:
docker compose up -d
Step 5: Configure Nginx Proxy Manager
Open your Nginx Proxy Manager (NPM) interface and add a new proxy host for your subdomain. Point it to 172.17.0.1 (the Docker bridge gateway) or localhost, together with Hoarder's port (3000, or the custom port you specified). Enable Websockets Support, Block Common Exploits, and SSL.
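After saving the proxy host, a quick request from any machine confirms that DNS, the proxy, and the certificate line up (assuming the example subdomain from Step 1):
curl -I https://bookmarks.yourdomain.com   # expect a success status or a redirect to the login page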
Step 6: Sign Up
Navigate to the subdomain you set up, and you should be greeted by Hoarder’s login interface. Create an account to get started.
And voilà! You’re now inside your Hoarder instance. Before making any changes, I went back to the .env file and set DISABLE_SIGNUPS=true. Since I’m the only user for now, this step ensures that bots can’t create accounts. After updating the file, I ran docker compose up -d again to apply the changes.
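If you prefer to do this from the shell, here is a small sketch of that same change (editing the .env file by hand works just as well):
sed -i 's/^DISABLE_SIGNUPS=false/DISABLE_SIGNUPS=true/' .env   # turn off public signups
docker compose up -d                                           # recreate the container with the new environment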
To test everything, including the AI tagging feature using the OpenAI API, I added four R-related links I wanted to revisit later.
The Chrome scraper worked perfectly, fetching both preview images and descriptions for the links. And AI-generated tags were added automatically! These tags are generally accurate enough for efficient searching later. For example, when I added a Posit blog post about brand.yml (a tool for maintaining consistent branding across Quarto publications), the generated tags were relevant.
However, not all tags were spot on. A GitHub repository showcasing a Quarto close-read example generated overly generic tags, so I simply added two more myself. The good news? The four API calls to OpenAI cost a fraction of a cent in total; a good trade-off for the time saved.
And that’s it! You can now access your Hoarder instance from your subdomain and start organizing your online discoveries with ease.
Step 7: Install Browser Plugin and Mobile App
To quickly save bookmarks into Hoarder, install the browser plugin and mobile app. Check the official documentation for installation links and setup instructions.
Giving Back
There is no doubt that Hoarder will save you time and lead to a better online experience. If you want to motivate the devs around creator Mohamed Bassem to keep working on the project, or just say thanks, you can sponsor the project on GitHub or buy Mohamed a coffee. Hoarder is an open-source project, so engaging in discussions, reporting bugs, suggesting features, and contributing code is also a great way to keep it thriving.