Note: this page details work I did to improve the thumbnails in DSpace versions 7.5 and 7.6. See the summary of upstream progress.


To read about my ongoing work to improve DSpace thumbnails see: Evaluating JPEG, WebP, and AVIF.

Improving DSpace PDF Thumbnails

DSpace has supported using ImageMagick to generate thumbnails for PDFs since version 5.0. It is possible to increase the thumbnail resolution¹, but these thumbnails are generally still a bit blurry due to a side effect of the resize operation.

In my experience the thumbnail quality can be improved by using a "supersampling" technique and by preferring the PDF CropBox over the MediaBox where possible.

I propose adding the -density 144 and -define pdf:use-cropbox=true parameters to the DSpace ImageMagick PDF thumbnail operation in DSpace 6.x and 7.x. Read below for more background and examples.
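For illustration, here is a minimal command-line sketch of the proposed parameters. This is not DSpace's actual filter code: the function name `make_thumbnail`, the `IM_CONVERT` override, the 800-pixel default, and the use of `-thumbnail` and `-flatten` are my own assumptions for the example. Both proposed settings affect how the PDF is read, so they are placed before the input file.

```shell
# Hypothetical helper: render the first page of a PDF as a thumbnail using
# the proposed read settings. Assumes ImageMagick (with Ghostscript) is
# installed; set IM_CONVERT to use a different binary, e.g. "magick".
make_thumbnail() {
    pdf=$1
    out=$2
    size=${3:-800}   # longest side in pixels (illustrative default)

    # -density and -define are read settings, so they precede the input.
    # "[0]" selects the first page; -thumbnail fits the image in a box
    # while keeping the aspect ratio; -flatten fills any transparency.
    "${IM_CONVERT:-convert}" \
        -density 144 \
        -define pdf:use-cropbox=true \
        "${pdf}[0]" \
        -thumbnail "${size}x${size}" \
        -flatten \
        "$out"
}
```

Calling `make_thumbnail input.pdf thumbnail.jpg` would then write an 800-pixel thumbnail of the first page.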

― Alan Orth (@alanorth)

Supersampling

ImageMagick's default resolution is 72 DPI. If we read the input file at a higher density and then generate a thumbnail from that, the resulting image more accurately resembles what the user would see if they opened the PDF on a screen. This is especially noticeable if the PDF contains text, gradients, or curved lines.
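To make the arithmetic concrete, here is a small sketch that estimates the raster size of a page at a given density. PDF dimensions are in points (1 point = 1/72 inch), so a page edge rasterizes to roughly `points * density / 72` pixels. The helper name `raster_px` and the A4 example are assumptions for illustration, and the real renderer may round slightly differently.

```shell
# Approximate pixel length of a page edge given in PDF points when it is
# rasterized at a given density (DPI). Integer arithmetic, rounds down.
raster_px() {
    echo $(( $1 * $2 / 72 ))
}

# An A4 page is roughly 595 x 842 points.
echo "72 DPI:  $(raster_px 595 72) x $(raster_px 842 72)"
echo "144 DPI: $(raster_px 595 144) x $(raster_px 842 144)"
# 72 DPI:  595 x 842
# 144 DPI: 1190 x 1684
```

At 144 DPI the intermediate image has twice the pixels along each edge (four times in total), so an 800-pixel thumbnail is produced by downsampling a more detailed rendering of the page.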

Based on my experience with the handful of PDFs here, the performance impact of supersampling with a "2x" density of 144 is:

Read more about the "supersampling" technique in ImageMagick.

Preferring the CropBox

ImageMagick uses the PDF MediaBox to generate the thumbnail by default, but this can produce unexpected side effects in certain PDFs because the MediaBox is generally used for print. In most cases it is better to use the CropBox because this defines the area the user sees when opening the PDF on a screen.
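One way to check whether a given PDF is affected is to compare its page boxes. The sketch below assumes the `pdfinfo` tool from poppler-utils is available and that `pdfinfo -box` prints `MediaBox:` and `CropBox:` lines ending in four coordinates. The helper name `cropbox_differs` is hypothetical, and only the first occurrence of each box is considered.

```shell
# Read `pdfinfo -box` output on stdin and report whether the first CropBox
# differs from the first MediaBox. The last four fields of each matching
# line are assumed to be the box coordinates.
cropbox_differs() {
    awk '
        /MediaBox:/ && media == "" { media = $(NF-3) " " $(NF-2) " " $(NF-1) " " $NF }
        /CropBox:/  && crop  == "" { crop  = $(NF-3) " " $(NF-2) " " $(NF-1) " " $NF }
        END {
            if (media == "" || crop == "") print "unknown"
            else if (media != crop)        print "differs"
            else                           print "same"
        }
    '
}
```

For example, `pdfinfo -box input.pdf | cropbox_differs` should print `differs` for a PDF whose visible area is smaller than its MediaBox, which is the case where `-define pdf:use-cropbox=true` changes the thumbnail.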

Read more about the CropBox and PDF page boxes in general.

The thumbnails below are 800 pixels on their longest side (usually the height) and are rendered at 400 pixels in CSS for crispness and ease of comparison. Use the slider overlay on each image to compare the thumbnail before and after applying the proposed ImageMagick parameters.

10568/103447

Notes:
The text is less bold and more accurate.

10568/116598

Notes:
The source PDF uses CMYK.
Without -define pdf:use-cropbox=true, ImageMagick incorrectly creates a two-page thumbnail here!

10568/3149

Notes:
The text is less bold and more accurate.

10568/51999

Notes:
The source PDF uses CMYK.
The text is less bold and more accurate.

10568/53155

Notes:
The text is less jagged and more accurate.

10568/68624

Notes:
The source PDF uses CMYK.
The farmer's shirt and hat are less distorted.
The text is more accurate.

10568/68680

10568/71249

Notes:
The source PDF uses CMYK.
The text is more accurate.

10568/72646

Notes:
The text is noticeably sharper.

10568/75477

Notes:
Both the text and image are more accurate.

10568/76976

Notes:
Both the text and image are more accurate (notice the curved line in the blue sky and the woman's straw hat).

10568/77628

Notes:
The source PDF uses CMYK.
Both the text and image are more accurate (notice the man's shirt and watering can).

10568/97925

Notes:
The text is more accurate.

10568/108972

Notes:
Both the text and image are more accurate.

10568/3030

Notes:
The source PDF uses CMYK.
Both the text and image are more accurate.

More Information

The thumbnails in this gallery were generated by the src/create-thumbnails.sh script using PDF bitstreams from the CGSpace repository. CGSpace was running DSpace 6.3 at the time of writing.

Upstream Progress

Future Work

Future work may include:

Changelog