Why and how to run your own FreeBSD package cache.

Motivation:

The official FreeBSD package repositories, or more specifically the CDN delivering these packages as a service to the public, can be slow depending on where in the world you are. Also, the more bandwidth and requests per second you add to their load (e.g. with “clever” parallel pkg-fetch(8) scripts), the less there is for everyone else.

Problem analysis:

The official pkg(7) repository databases are signed with FreeBSD project keys (copies of the public halves used for validation can be found in /usr/share/keys/pkg/). The repository databases in turn contain strong cryptographic hashes of all contained packages. This means that while FreeBSD packages are fetched via HTTPS by default, the transport encryption is not required for integrity.
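
For reference, the stock pkg(7) repository configuration in /etc/pkg/FreeBSD.conf already ties the official repository to those fingerprints. It looks roughly like this (the branch may be quarterly or latest depending on your FreeBSD version):

FreeBSD: {
	url: "pkg+https://pkg.FreeBSD.org/${ABI}/latest",
	mirror_type: "srv",
	signature_type: "fingerprints",
	fingerprints: "/usr/share/keys/pkg",
	enabled: yes
}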

The FreeBSD package CDN mirrors provide a valid ETag, but are otherwise configured to be “hostile” to third-party caching (Cache-Control: max-age=0, private).
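
You can confirm this yourself with a HEAD request against the CDN, e.g. using curl (the path below is only an example; substitute the ABI of your system). The response headers should include an ETag and the Cache-Control value quoted above:

curl -sI https://pkg.freebsd.org/FreeBSD:14:amd64/latest/meta.conf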

The pkg(7) code assumes that repositories (database + packages) are in sync. To avoid user frustration our cache must not return stale cache hits.

A possible solution:

Use Varnish as an HTTP cache and validate every cache hit with a HEAD request against the FreeBSD package CDN, comparing the latest ETag with the cached response. Since Varnish does not support HTTPS directly, use stunnel to maintain transport encryption from the cache to the FreeBSD package CDN.
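
Put together, a request from a pkg(8) client flows through the cache like this (ports as configured below):

pkg(8) client ──HTTP──▶ Varnish (:80) ──HTTP──▶ stunnel (127.0.0.1:8000, 8001, ...) ──HTTPS──▶ FreeBSD package CDN (:443)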

Find the fastest upstream servers for your cache.

The different FreeBSD package CDN servers will offer vastly different bandwidth and latency depending on where in the world your cache is. To find the fastest servers to use as upstreams for your cache, install the fastest_pkg command (pkg install ports-mgmt/fastest_pkg). Run fastest_pkg on your intended caching server. Save the output for later (and ignore its recommendation to change your configuration).
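
For example (piping through tee is just one way to keep the results around):

pkg install ports-mgmt/fastest_pkg
fastest_pkg | tee ~/fastest_pkg.txt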

Set up the TLS proxy.

Run pkg install security/stunnel to install stunnel (or pkg install --yes -- security/stunnel to install without asking for confirmation).

Reduce the /usr/local/etc/stunnel/stunnel.conf configuration from the lengthy example to just this:

; ****************************************
; * Global options                       *
; ****************************************

; (Useful for troubleshooting)
;	foreground = yes
;	debug      = info
;	output     = /tmp/stunnel.log

; ****************************************
; * Include configuration file fragments *
; ****************************************

include = /usr/local/etc/stunnel/conf.d

Configure stunnel to run as a daemon by placing the following fragment in /usr/local/etc/stunnel/conf.d/00-daemon.conf:

pid    = /var/run/stunnel/stunnel.pid
setuid = stunnel
setgid = stunnel
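
Depending on how the stunnel package set things up, the conf.d directory and the PID directory referenced above may not exist yet. If so, create them (the ownership assumes the stunnel user and group used in the fragment above):

mkdir -p /usr/local/etc/stunnel/conf.d
install -d -o stunnel -g stunnel /var/run/stunnel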

Now it's time to list those mirror servers you consider fast enough to be useful upstreams in /usr/local/etc/stunnel/conf.d/pkg.conf (sorted in descending order by measured bandwidth), e.g.:

[pkg0.fra.freebsd.org]
client      = yes
accept      = 127.0.0.1:8000
connect     = pkg0.fra.freebsd.org:443
verifyChain = yes
CApath      = /etc/ssl/certs
checkHost   = pkg.freebsd.org
OCSPaia     = yes

[pkg0.sjb.freebsd.org]
client      = yes
accept      = 127.0.0.1:8001
connect     = pkg0.sjb.freebsd.org:443
verifyChain = yes
CApath      = /etc/ssl/certs
checkHost   = pkg.freebsd.org
OCSPaia     = yes

[pkg0.nyi.freebsd.org]
client      = yes
accept      = 127.0.0.1:8002
connect     = pkg0.nyi.freebsd.org:443
verifyChain = yes
CApath      = /etc/ssl/certs
checkHost   = pkg.freebsd.org
OCSPaia     = yes

; ... Continue with more servers. ...

Now enable and start the stunnel service (service stunnel enable followed by service stunnel start).

You can manually test your HTTP to HTTPS proxy with fetch -vv -o /dev/null http://localhost:8000 (increment the port number for each server). If you're already familiar with a different tool (e.g. curl, wget) you can use it instead.
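
For example, with the three mirrors configured above:

fetch -vv -o /dev/null http://localhost:8000
fetch -vv -o /dev/null http://localhost:8001
fetch -vv -o /dev/null http://localhost:8002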

Set up the HTTP cache.

Unless configured otherwise Varnish will consume as much main memory as possible. Assuming this package cache is supposed to be just one service among many on your server, let's define a new login class for Varnish and restrict it to 1 GiB of resident memory (if there is memory pressure).

Append the following lines to /etc/login.conf to define a memory limited login class named varnish based on the daemon class:

varnish:\
	:memoryuse=1024M:\
	:tc=daemon:

Any time /etc/login.conf is modified the read-only database /etc/login.conf.db has to be regenerated using cap_mkdb(1) like this: cap_mkdb /etc/login.conf.
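
For example, rebuilding the database and then sanity-checking the new class with limits(1), which prints the resource limits the class would impose:

cap_mkdb /etc/login.conf
limits -C varnish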

Now install Varnish by running pkg install www/varnish7.

Use sysrc(8) to configure the varnishd rc.d service (login class, listen address and port, configuration file to load, storage to use for the cache content):

sysrc \
  varnishd_login_class="varnish" \
  varnishd_listen=":80" \
  varnishd_config="/usr/local/etc/varnish/pkg.vcl" \
  varnishd_storage="file,/var/cache/varnish/varnish.cache,50G,2M,sequential"
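
The storage path given above has to exist before varnishd can create its cache file in it:

mkdir -p /var/cache/varnish
# If your varnishd drops privileges to a dedicated varnish user (an assumption here), hand the directory over to it:
chown varnish:varnish /var/cache/varnish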

Place the following configuration into /usr/local/etc/varnish/pkg.vcl. Change the backends according to your fastest_pkg output (and your cut-off point for the lowest acceptable bandwidth):

vcl 4.0;

# This configuration uses the workaround described in [1] to validate cache hits using an HTTP HEAD request
# with the cached ETag to work around the "Cache-Control: max-age=0, private" returned by FreeBSD package mirrors.
#
# [1] : https://info.varnish-software.com/blog/systematic-content-validation-with-varnish
#       Archived at: https://web.archive.org/web/20240225012414/https://info.varnish-software.com/blog/systematic-content-validation-with-varnish

import directors;

# Define a backend for each FreeBSD package mirror.
# Use slow health checks to reduce the load on the project infrastructure.
# The backend definitions are sorted by measured bandwidth.

backend fra { # 16.2 MB/s
	.host = "127.0.0.1";
	.port = "8000";
	.probe = {
		.url       = "/";
		.timeout   = 5s;
		.interval  = 69s;
		.window    = 23;
		.threshold = 5;
	}
}

backend sjb { # 10.3 MB/s
	.host = "127.0.0.1";
	.port = "8001";
	.probe = {
		.url       = "/";
		.timeout   = 5s;
		.interval  = 69s;
		.window    = 23;
		.threshold = 5;
	}
}

backend nyi { # 4.8 MB/s
	.host = "127.0.0.1";
	.port = "8002";
	.probe = {
		.url       = "/";
		.timeout   = 5s;
		.interval  = 69s;
		.window    = 23;
		.threshold = 5;
	}
}
# Add more servers as needed...

# On load create a fallback type director and populate it
# with the known FreeBSD package mirrors in order of bandwidth
# measured by fastest_pkg (from ports-mgmt/fastest_pkg).
sub vcl_init {
	new pkg = directors.fallback();
	pkg.add_backend(fra ); # 16.2 MB/s
	pkg.add_backend(sjb ); # 10.3 MB/s
	pkg.add_backend(nyi ); # 4.8 MB/s
}

# Use restarts to probe the cache validity by ETag.
# Possible states are:
#   - init (req.restarts == 0)
#   - "cache_check"
#   - "backend_check"
#   - "valid"
#
# On misses no restarts are performed. On hits
# the following state machine runs multiple steps:
#
# ┌────────────────┐
# │                ▼
# │            ┌───────┐
# │     ┌──────┤ recv  ├─────┐
# │     │      └───────┘     │
# │     ▼                    ▼
# │ ┌───────┐            ┌───────┐   ┌──────────────────┐
# │ │ hash  │            │ pass  ├──▶│ backend_fetch    │
# │ └───┬───┘            └───────┘   └─────────┬────────┘
# │     ▼                                      ▼
# │ ┌───────┐  ┌───────┐             ┌──────────────────┐
# ├─┤ hit   │  │ miss  │             │ backend_response │
# │ └───────┘  └───────┘             └─────────┬────────┘
# │                                            │
# │ ┌─────────┐                                │
# └─┤ deliver │◀───────────────────────────────┘
#   └─────────┘
#
# - First start:
#   * Save the ETag.
#   * Restart, because we need to go to the backend.
# - 1st restart:
#   * Pass, because we don't necessarily want to put the object in cache.
#   * Use a HEAD request to fetch only the headers (including the ETag).
#   * If the backend returns a different ETag evict the conflicting cache entry.
#   * Restart (again).
# - 2nd (and last) restart
#   * Just act normal this time.

# Set up the state machine and begin recording the cache hit/miss.
sub vcl_recv {
	# The first time (not yet restarted).
	if (req.restarts == 0) {
		# Use the failover director of FreeBSD package mirrors.
		set req.backend_hint = pkg.backend();

		# Clear the cache hit/miss header.
		unset req.http.X-Cache;

		# Set the internal state to "cache_check".
		set req.http.X-State = "cache_check";
		return (hash);
	# The second time (first restart).
	} else if (req.http.X-State == "backend_check") {
		return (pass);
	# The third (and last) time.
	} else {
		return (hash);
	}
}

# Hash only the URL, not the Host header or IP address, allowing clients to
# share the cache no matter which hostname or IP address they use to reach it.
sub vcl_hash {
	hash_data(req.url);
	return (lookup);
}

# Depending on the X-State...
sub vcl_hit {
	# Save the ETag of the cached object for the upcoming backend check.
	if (req.http.X-State == "cache_check") {
		set req.http.X-State = "backend_check";
		set req.http.etag = obj.http.etag;
		return (restart);
	# Record the cache hit
	} else {
		if (obj.ttl <= 0s && obj.grace > 0s) {
			set req.http.X-Cache = "hit graced";
		} else {
			set req.http.X-Cache = "hit";
		}
		return (deliver);
	}
}

# Record the cache miss.
sub vcl_miss {
	set req.http.X-Cache = "miss";
}

# Record the cache pass.
sub vcl_pass {
	set req.http.X-Cache = "pass";
}

# Record pipelined uncacheable requests.
sub vcl_pipe {
	set req.http.X-Cache = "pipe uncacheable";
}

# Record synthetic responses
sub vcl_synth {
	set req.http.X-Cache = "synth synth";

	# Show the information in the response
	set resp.http.X-Cache = req.http.X-Cache;
}

# Change the HTTP method to HEAD when probing the backend
# FreeBSD package mirrors for the latest ETag.
sub vcl_backend_fetch {
	if ( bereq.http.X-State == "backend_check" ) {
		set bereq.method = "HEAD";
		set bereq.http.method = "HEAD";
	}
}

# Evict invalidated cache entries.
sub vcl_backend_response {
	# Is this the response to the HTTP HEAD probing request?
	if ( bereq.http.X-State == "backend_check" ) {
		# Evict objects that failed ETag validation.
		if (bereq.http.etag != beresp.http.etag) {
			ban("obj.http.etag == " + bereq.http.etag);
		}
	# Otherwise cache successful responses.
	} else if ( beresp.status == 200 ) {
		# The FreeBSD package mirrors are configured with "Cache-Control: max-age=0, private" which would prevent caching.
		# Drop that header and force a TTL of 7 days so the response gets cached at all.
		unset beresp.http.cache-control;
		set beresp.ttl = 7d;

		# Keep the object for another 7 days past its TTL (for revalidation) if the response has validating headers.
		if (beresp.http.ETag || beresp.http.Last-Modified) {
			set beresp.keep = 7d;
		}
		return(deliver);
	}
}

# Make sure to only deliver real responses.
sub vcl_deliver {
	# The client wants the real response not the response to the probe
	# for the latest ETag so restart (again).
	if (req.http.X-State == "backend_check") {
		set req.http.X-State = "valid";
		return (restart);
	}

	# Append the cacheability to the X-Cache header.
	if (obj.uncacheable) {
		set req.http.X-Cache = req.http.X-Cache + " uncacheable";
	} else {
		set req.http.X-Cache = req.http.X-Cache + " cached";
	}

	# Show the information in the response
	set resp.http.X-Cache = req.http.X-Cache;
}
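
Before enabling the service it can be worth letting varnishd compile the new VCL once; varnishd -C prints the generated C code and exits, so syntax errors show up immediately:

varnishd -C -f /usr/local/etc/varnish/pkg.vcl > /dev/null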

The Varnish package installs two services: varnishd and varnishlog. The latter consumes the logs from an in-memory buffer and writes them to the file system. Enable both services with service varnishd enable; service varnishlog enable and start them with service varnishd start; service varnishlog start.
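
In other words:

service varnishd enable
service varnishlog enable
service varnishd start
service varnishlog start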

Put this in /usr/local/etc/newsyslog.conf.d/varnish.conf to enable log rotation via newsyslog(8):

/var/log/varnish.log varnishlog:varnish 640 7 * @T00 B /var/run/varnishlog.pid
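
A dry run shows whether newsyslog(8) accepts the new entry without actually rotating anything:

newsyslog -nv /var/log/varnish.log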

To enjoy your new cache, put this in /usr/local/etc/pkg/repos/FreeBSD.conf:

FreeBSD {
	url         = "http://localhost/${ABI}/latest"
	mirror_type = "NONE"
}

Replace localhost with your cache's resolvable hostname or IP address as needed.
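
To verify the whole chain, force a repository update through the cache and then check the X-Cache header set in vcl_deliver; repeated requests for the same file should report a hit (the ABI in the URL below is only an example, adjust it to your system and replace localhost as above):

pkg update -f
curl -sI http://localhost/FreeBSD:14:amd64/latest/meta.conf | grep -i x-cache
curl -sI http://localhost/FreeBSD:14:amd64/latest/meta.conf | grep -i x-cache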

Mastodon: https://bsd.network/web/@crest