CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

July, 2025

2025-07-24

  • A few days ago I deployed nginx caching for the DSpace Angular frontend
    • It works on collection, community, and item pages
    • I modified the nginx proxy cache key to: $request_uri$mapped_language$mapped_encoding
  • We need to use an nginx map on the language and encoding to normalize and prefer some values to maximize cache placement
    • For example, “en_US”, “en_GB”, and “en” are all English so we normalize to “en”
    • Similarly, for Accept-Encoding we prefer gzip instead of br or zstd since ALL clients support that
    • I configured the cache to allow bypass when the request had a DSpace session or query strings, but after investigating cache bypasses I found a bunch of query strings being used:
?utm_source=chatgpt.com
?fbclid=IwZXh0bg...
?trk=public_post_comment-text
?zarsrc=30&utm_source=zalo
?utm_source=perplexity
  • Item pages don’t use query strings, but communities and collections do!
/collections/464056ca-9a74-4ecf-b8df-c7e63db30cc5?cp.page=5
  • And considering that collections are dynamic because they change every time an item is submitted, we might actually want to skip caching those…
  • Well this is an interesting cache miss:
54.174.62.182 - - [24/Jul/2025:20:11:29 +0200] "HEAD /items/6112e853-2ea0-4b69-ba3e-582b2f5b7766?hsenc=123&_hsmi=123&hsCtaTracking=123&utm_medium=123&utm_source=123&utm_campaign=1231111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 HTTP/1.1" 200 0 "-" "HubSpot-Link-Resolver" MISS
  • Found another new user agent, and also possible replacement for FeedBurner:
Inoreader/1.0 (+http://www.inoreader.com/feed-fetcher; 1 subscribers; )