CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

April, 2023

2023-04-02

  • Run all system updates on CGSpace and reboot it
  • I exported CGSpace to CSV to check for any missing Initiative collection mappings
    • I also did a check for missing country/region mappings with csv-metadata-quality
  • Start a harvest on AReS
  • I’m starting to get annoyed at my shell script for doing ImageMagick tests and looking to re-write it in something object oriented like Python
    • There doesn’t seem to be an official ImageMagick Python binding on pypi.org, perhaps I can use Wand?
  • Testing Wand in Python:
from wand.image import Image

with Image(filename='data/10568-103447.pdf[0]', resolution=144) as first_page:
    print(first_page.height)
  • I spent more time re-working my thumbnail scripts to compare the resized images and other minor changes
    • I am realizing that doing the thumbnails directly from the source improves the ssimulacra2 score by 1-3% points compared to DSpace’s method of creating a lossy supersample followed by a lossy resized thumbnail

2023-04-03

  • The harvest on AReS that I started yesterday never finished, and actually seems to have died…
    • Also, Fabio and Patrizio from Alliance emailed me to ask if there is something wrong with the REST API because they are having problems
    • I stopped the harvest and started the plugins to get the remaining items via the sitemap…

2023-04-04

  • Presentation about CGSpace metadata, controlled vocabularies, and curation to Pooja’s communications and development team at UNEP
  • Someone from the system organization contacted me to ask how to download a few thousand PDFs from a spreadsheet with DOIs and Handles
$ csvcut -c Handle ~/Downloads/2023-04-04-Donald.csv \
    | sed \
        -e 1d \
        -e 's_https://hdl.handle.net/__' \
        -e 's_https://cgspace.cgiar.org/handle/__' \
        -e 's_http://hdl.handle.net/__' \
    | sort -u > /tmp/handles.txt
  • Then I used the get_dspace_pdfs.py script to download them

2023-04-05

  • After some cleanup on Donald’s DOIs I started the get_scihub_pdfs.py script

2023-04-06

  • I did some more work to cleanup and streamline my next generation of DSpace thumbnail testing scripts
    • I think I found a bug in ImageMagick 7.1.1.5 where CMYK to sRGB conversion fails if we use image operations like -density or -define before reading the input file
    • I started a discussion on the ImageMagick GitHub to ask
  • Yesterday I started downloading the rest of the PDFs from Donald, those that had DOIs
    • As a measure of caution, I extracted the list of DOIs and used my crossref_doi_lookup.py script to get their licenses from Crossref:
$ ./ilri/crossref_doi_lookup.py -e xxxx@i.org -i /tmp/dois.txt -o /tmp/donald-crossref-dois.csv -d
  • Then I did some CSV manipulation to extract the DOIs that were Creative Commons licensed, excluding any that were “No Derivatives”, and re-formatting the DOIs:
$ csvcut -c doi,license /tmp/donald-crossref-dois.csv \
  | csvgrep -c license -m 'creativecommons' \
  | csvgrep -c license -i -r 'by-(nd|nc-nd)' \
  | sed -e 's_^10_https://doi.org/10_' \
    -e 's/\(am\|tdm\|unspecified\|vor\): //' \
  | tee /tmp/donald-open-dois.csv \
  | wc -l
4268
  • From those I filtered for the DOIs for which I had downloaded PDFs, in the filename column of the Sci-Hub script and copied them to a separate directory:
$ for file in $(csvjoin -c doi /tmp/donald-doi-pdfs.csv /tmp/donald-open-dois.csv | csvgrep -c filename -i -r '^$' | csvcut -c filename | sed 1d); do cp --reflink=always "$file" "creative-commons-licensed/$file"; done
  • I used BTRFS copy-on-write via reflinks to make sure I didn’t duplicate the files :-D
  • I ran out of time and had to stop the process around 3,127 PDFs
    • I zipped them up and sent them to the others, along with a CSV of the DOIs, PDF filenames, and licenses

2023-04-17

  • Abenet noticed a weird issue with this item
    • The item has metadata, but the page is blank
    • When I try to edit the item’s authorization policies in XMLUI I get a nullPointerException:
Java stacktrace: java.lang.NullPointerException
	at org.dspace.app.xmlui.aspect.administrative.authorization.EditItemPolicies.addBody(EditItemPolicies.java:166)
	at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:234)
	at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy201.startElement(Unknown Source)
	at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
	at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
	at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
	at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy203.startElement(Unknown Source)
	at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
	at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
	at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy203.startElement(Unknown Source)
	at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
	at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
	at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
	at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:251)
	at sun.reflect.GeneratedMethodAccessor347.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy203.startElement(Unknown Source)
	at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
	at org.apache.cocoon.components.sax.XMLTeePipe.startElement(XMLTeePipe.java:87)
	at org.apache.cocoon.components.sax.AbstractXMLByteStreamInterpreter.parse(AbstractXMLByteStreamInterpreter.java:117)
	at org.apache.cocoon.components.sax.XMLByteStreamInterpreter.deserialize(XMLByteStreamInterpreter.java:44)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:324)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:326)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:750)
	at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.source.impl.SitemapSource.toSAX(SitemapSource.java:362)
	at org.apache.cocoon.components.source.util.SourceUtil.toSAX(SourceUtil.java:111)
	at org.apache.cocoon.components.source.util.SourceUtil.parse(SourceUtil.java:294)
	at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:136)
	at sun.reflect.GeneratedMethodAccessor436.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy198.generate(Unknown Source)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.processXMLPipeline(AbstractProcessingPipeline.java:544)
	at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:273)
	at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:439)
	at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
	at com.sun.proxy.$Proxy191.process(Unknown Source)
	at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:147)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
	at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
	at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
	at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
	at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
	at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
	at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
	at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
	at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
	at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
	at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
	at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
	at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
	at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
	at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
	at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
	at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
	at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
	at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
	at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
	at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
	at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
	at com.sun.proxy.$Proxy186.service(Unknown Source)
	at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
	at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1216)
	at org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1001)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:945)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:113)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:160)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
	at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:235)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:451)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:750)
  • I don’t see anything on the DSpace issue tracker or mailing list so I asked about it on the DSpace Slack…
  • Peter said CGSpace was slow and I see a lot of locks from the XMLUI
    • I looked and found many locks that were many hours and days old so I killed some:
$ psql < locks-age.sql | grep -E "[[:digit:]] days" | awk -F\| '{print $10}' | sort -u
 1050672
 1053773
 1054602
 1054702
 1056782
 1057629
 1057630
$ psql < locks-age.sql | grep -E "[[:digit:]] days" | awk -F\| '{print $10}' | sort -u | xargs kill
  • I’m also running a dspace cleanup -v, but it doesn’t seem to be finishing
    • I recall something like there being errors in the logs rather than on the command line in DSpace 6…
    • I found it in the DSpace log:
2023-04-17 21:09:46,004 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
  Detail: Key (uuid)=(a7ddf477-1c04-4de0-9c7a-4d3c84a875bc) is still referenced from table "bundle".
  • If I mark the primary bitstream as null manually the cleanup script continues until it finds a few more
    • I ended up with a long list of UUIDs to fix before the script would complete:
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('a7ddf477-1c04-4de0-9c7a-4d3c84a875bc', '9582b661-9c2d-4c86-be22-c3b0942b646a', '210a4d5d-3af9-46f0-84cc-682dd1431762', '51115f07-0a60-4988-8536-b9ebd2a5e15e', '0fc5021d-3264-413a-b2e2-74bda38a394e', '4704fa62-b8ab-4dfe-b7aa-0e4905f8412a')"
  • This process ended up taking a few days because each iteration ran for over four hours before failing on the next UUID, sighhhhh

2023-04-18

  • Regarding the item Abenet noticed yesterday that has a blank page and a nullPointerException
    • It appears OK on DSpace Test! https://dspacetest.cgiar.org/handle/10568/75611
    • And according to the REST API on CGSpace the item was modified on 2023-04-11, so last week…
    • According to the DSpace logs it was Francesca who edited the item last week, so I asked her for more information before I troubleshoot more

2023-04-19

  • I fixed the Bioversity item by deleting the 9781138781276.jpg bitstream via the REST API
    • I think Francesca might have changed the “format” of it?
    • Anyway, this item has a PDF so we have a proper thumbnail and don’t need that other journal cover one
  • I noticed a URL for this Bioversity item redirects incorrectly
    • I had mentioned this to Maria and Francesca a few months ago but it seems to never have been resolved
  • The dspace cleanup -v finally finished after a few days of running and stopping…
  • I decided to update the thumbnails in the Bioversity books collection because I saw a few old ones suffering from the CropBox issue
  • Also, all day there’s been a high load on CGSpace, with lots of locks in PostgreSQL
    • I had been waiting until the bitstream cleanup finished… now I might need to restart PostgreSQL to kill some old locks as something needs to give
    • I restarted PostgreSQL, but DSpace was still hanging on simple XMLUI options so I ended up restarting Tomcat
  • Tag 544 ORCID identifiers with my script
  • I updated my generation-loss.sh and improved-dspace-thumbnails scripts to include thirty-five PDFs from CGSpace (up from twenty-four) to get a larger sample
    • Now starting to get some numbers comparing JPEG, WebP, and AVIF
    • First, out of curiousity, I checked the average ssimulacra2 scores at Q75, Q80, and Q92 for each format:
Q75 Q80 Q92
JPEG 71 74 88
WebP 74 77 82
AVIF 82 83 86
  • Then I checked the quality and file size (bytes) needed to hit an average ssimulacra2 score of 80 with each format:
    • JPEG: Q89, 124923 bytes
    • WebP: Q86, 84662 bytes (33% smaller than JPEG size)
    • AVIF: Q65, 67597 bytes (56% smaller than JPEG size)
  • Google’s original WebP study uses this technique to compare WebP to JPEG too
    • As the quality settings are not comparable between formats, we need to compare the formats at matching perceptual scores (ssimulacra2 in this case)
    • I used a ssimulacra2 score of 80 because that’s the about the highest score I see with WebP using my samples, though JPEG and AVIF do go higher
    • Also, according to current ssimulacra2 (v2.1), a score of 70 is “high quality” and a score of 90 is “very high quality”, so 80 should be reasonably high enough…
  • Here is a plot of the qualities and ssimulacra2 scores:

Quality vs Score

  • Export CGSpace to check for missing Initiatives mappings

2023-04-22

  • Export the Initiatives collection to run it through csv-metadata-quality
    • I wanted to make sure all the Initiatives items had correct regions
    • I had to manually fix a few license identifiers and ISSNs
    • Also, I found a few items submitted by MEL that had dates in DD/MM/YYYY format, so I sent them to Salem for him to investigate
  • Start a harvest on AReS

2023-04-26

  • Begin working on the list of non-AGROVOC CGSpace subjects for FAO
    • The last time I did this was in 2022-06
    • I used the following SQL query to dump values from all subject fields, lower case them, and group by counts:
localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(lower(text_value)) AS "subject", count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (187, 120, 210, 122, 215, 127, 208, 124, 128, 123, 125, 135, 203, 236, 238, 119) GROUP BY "subject" ORDER BY count DESC) to /tmp/2023-04-26-cgspace-subjects.csv WITH CSV HEADER;
COPY 26315
Time: 2761.981 ms (00:02.762)
  • Then I extracted the subjects and looked them up against AGROVOC:
$ csvcut -c subject /tmp/2023-04-26-cgspace-subjects.csv | sed '1d' > /tmp/2023-04-26-cgspace-subjects.txt
$ ./ilri/agrovoc_lookup.py -i /tmp/2023-04-26-cgspace-subjects.txt -o /tmp/2023-04-26-cgspace-subjects-results.csv

2023-04-27

  • The AGROVOC lookup from yesterday finished, so I extracted all terms that did not match and joined them with the original CSV so I can see the counts:
    • (I also note that the agrovoc_lookup.py script didn’t seem to be caching properly, as it had to look up everything again the next time I ran it despite the requests cache being 174MB!)
csvgrep -c 'number of matches' -r '^0$' /tmp/2023-04-26-cgspace-subjects-results.csv \
  | csvcut -c subject \
  | csvjoin -c subject /tmp/2023-04-26-cgspace-subjects.csv - \
  > /tmp/2023-04-26-cgspace-non-agrovoc.csv
  • I filtered for only those terms that had counts larger than fifty
    • I also removed terms like “forages”, “policy”, “pests and diseases” because those exist as singular or separate terms in AGROVOC
    • I also removed ambiguous terms like “cocoa”, “diversity”, “resistance” etc because there are various other preferred terms for those in AGROVOC
    • I also removed spelling mistakes like “modeling” and “savanas” because those exist in their correct form in AGROVOC
    • I also removed internal CGIAR terms like “tac”, “crp”, “internal review” etc (note: these are mostly from CGIAR System Office’s subjects… perhaps I exclude those next time?)
  • I note that many of our terms would match if they were singular, plural, or split up into separate terms, so perhaps we should pair this with an excercise to review our own terms
  • I couldn’t finish the work locally yet so I uploaded my list to Google Docs to continue later

2023-04-28

  • The ImageMagick CMYK issue is bothering me still
    • I am on a plane currently, but I have a Docker image of ImageMagick 7.1.1-3 and I compared the output of all CMYK PDFs using the same command on my local machine
    • The images from the Docker environment are correct with only -colorspace sRGB (no profiles!) as the commenters on GitHub said
    • This leads me to believe something wrong in my own environment, perhaps Ghostscript…?
    • The container has Ghostscript 9.53.3~dfsg-7+deb11u2 from Debian 11, while my Arch Linux system has Ghostscript 10.01.1-1