2025-07-24
- A few days ago I deployed nginx caching for the DSpace Angular frontend
- It works on collection, community, and item pages
- I modified the nginx proxy cache key to:
$request_uri$mapped_language$mapped_encoding
- We need to use an nginx map on the language and encoding to normalize and prefer some values to maximize cache placement
- For example, “en_US”, “en_GB”, and “en” are all English so we normalize to “en”
- Similarly, for
Accept-Encoding
we prefer gzip instead of br or zstd since ALL clients support that
- I configured the cache to allow bypass when the request had a DSpace session or query strings, but after investigating cache bypasses I found a bunch of query strings being used:
?utm_source=chatgpt.com
?fbclid=IwZXh0bg...
?trk=public_post_comment-text
?zarsrc=30&utm_source=zalo
?utm_source=perplexity
- Item pages don’t use query strings, but communities and collections do!
/collections/464056ca-9a74-4ecf-b8df-c7e63db30cc5?cp.page=5
- And considering that collections are dynamic because they change every time an item is submitted, we might actually want to skip caching those…
- Well this is an interesting cache miss:
54.174.62.182 - - [24/Jul/2025:20:11:29 +0200] "HEAD /items/6112e853-2ea0-4b69-ba3e-582b2f5b7766?hsenc=123&_hsmi=123&hsCtaTracking=123&utm_medium=123&utm_source=123&utm_campaign=1231111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 HTTP/1.1" 200 0 "-" "HubSpot-Link-Resolver" MISS
- Found another new user agent, and also possible replacement for FeedBurner:
Inoreader/1.0 (+http://www.inoreader.com/feed-fetcher; 1 subscribers; )