CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

October, 2022

2022-10-01

2022-10-03

  • Make two pull requests for DSpace 7.x
  • Udana asked me about their RSS feed, which is not showing the latest publications in his email inbox
    • He is using this feed from FeedBurner: https://feeds.feedburner.com/iwmi-cgspace
    • I don’t have access to the FeedBurner configuration, but I looked at the raw feed and see it’s just getting all the items in the IWMI community
    • This OpenSearch query should do the same: https://cgspace.cgiar.org/open-search/discover?scope=10568/16814&query=*&sort_by=3&order=DESC
    • The sort_by=3 corresponds to webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date in dspace.cfg
  • Peter sent me a CSV file a few days ago that he was unable to upload to CGSpace
    • The stacktrace from the error he was getting was:
Java stacktrace: java.lang.ClassCastException: org.apache.cocoon.servlet.multipart.PartInMemory cannot be cast to org.dspace.app.xmlui.cocoon.servlet.multipart.DSpacePartOnDisk
    at org.dspace.app.xmlui.aspect.administrative.FlowMetadataImportUtils.processUploadCSV(FlowMetadataImportUtils.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:155)
    at org.mozilla.javascript.NativeJavaMethod.call(NativeJavaMethod.java:243)
    at org.mozilla.javascript.Interpreter.interpretLoop(Interpreter.java:3237)
    at org.mozilla.javascript.Interpreter.interpret(Interpreter.java:2394)
    at org.mozilla.javascript.InterpretedFunction.call(InterpretedFunction.java:162)
    at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:393)
    at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:2834)
    at org.mozilla.javascript.InterpretedFunction.call(InterpretedFunction.java:160)
    at org.mozilla.javascript.Context.call(Context.java:538)
    at org.mozilla.javascript.ScriptableObject.callMethod(ScriptableObject.java:1833)
    at org.mozilla.javascript.ScriptableObject.callMethod(ScriptableObject.java:1803)
    at org.apache.cocoon.components.flow.javascript.fom.FOM_JavaScriptInterpreter.handleContinuation(FOM_JavaScriptInterpreter.java:698)
    at org.apache.cocoon.components.treeprocessor.sitemap.CallFunctionNode.invoke(CallFunctionNode.java:94)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.SelectNode.invoke(SelectNode.java:82)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.buildPipeline(ConcreteTreeProcessor.java:186)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.buildPipeline(TreeProcessor.java:260)
    at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:107)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.SelectNode.invoke(SelectNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.buildPipeline(ConcreteTreeProcessor.java:186)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.buildPipeline(TreeProcessor.java:260)
    at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:107)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.buildPipeline(ConcreteTreeProcessor.java:186)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.buildPipeline(TreeProcessor.java:260)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:277)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at org.dspace.app.xmlui.cocoon.AspectGenerator.setup(AspectGenerator.java:81)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.prepareInternal(AbstractProcessingPipeline.java:480)
    at sun.reflect.GeneratedMethodAccessor267.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.prepareInternal(Unknown Source)
    at org.apache.cocoon.components.source.impl.SitemapSource.init(SitemapSource.java:292)
    at org.apache.cocoon.components.source.impl.SitemapSource.<init>(SitemapSource.java:148)
    at org.apache.cocoon.components.source.impl.SitemapSourceFactory.getSource(SitemapSourceFactory.java:62)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:153)
    at org.apache.cocoon.components.source.CocoonSourceResolver.resolveURI(CocoonSourceResolver.java:183)
    at org.apache.cocoon.generation.FileGenerator.setup(FileGenerator.java:99)
    at sun.reflect.GeneratedMethodAccessor255.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy190.setup(Unknown Source)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.setupPipeline(AbstractProcessingPipeline.java:343)
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.setupPipeline(AbstractCachingProcessingPipeline.java:710)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.preparePipeline(AbstractProcessingPipeline.java:466)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:411)
    at sun.reflect.GeneratedMethodAccessor331.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
    at com.sun.proxy.$Proxy189.process(Unknown Source)
    at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:147)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
    at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:55)
    at org.apache.cocoon.components.treeprocessor.sitemap.MatchNode.invoke(MatchNode.java:87)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
    at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:117)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:143)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:78)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:81)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:239)
    at org.apache.cocoon.components.treeprocessor.ConcreteTreeProcessor.process(ConcreteTreeProcessor.java:171)
    at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:247)
    at org.apache.cocoon.servlet.RequestProcessor.process(RequestProcessor.java:351)
    at org.apache.cocoon.servlet.RequestProcessor.service(RequestProcessor.java:169)
    at org.apache.cocoon.sitemap.SitemapServlet.service(SitemapServlet.java:84)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:468)
    at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
    at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy186.service(Unknown Source)
    at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
    at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1216)
    at org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1001)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:945)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:853)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:113)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter.doFilter(DSpaceCocoonServletFilter.java:160)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.app.xmlui.cocoon.servlet.multipart.DSpaceMultipartFilter.doFilter(DSpaceMultipartFilter.java:119)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:165)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
    at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:235)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:451)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1201)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:750)
  • So this is a broken side effect from the org.apache.cocoon.uploads.autosave=false change I made a few weeks ago
    • Importing the CSV via the command line works fine

2022-10-04

  • I stumbled across more low-quality thumbnails on CGSpace
    • Some have the description “Generated Thumbnail”, and others are manually uploaded “.jpg.jpg” ones…
    • I want to add some more thumbnail fixer scripts to the cgspace-java-helpers suite:
      • If an item has an IM Thumbnail and a Generated Thumbnail in the THUMBNAIL bundle, remove the Generated Thumbnail
      • If an item has a PDF bitstream and a JPG bitstream with description /thumbnail/ in the ORIGINAL bundle, remove the /thumbnail/ bitstream in the ORIGINAL bundle and try to remove the /thumbnail/.jpg bitstream in the THUMBNAIL bundle

2022-10-05

  • I updated the cgspace-java-helpers to include a new FixLowQualityThumbnails script to detect the low-quality thumbnails I found above
  • Add missing ORCID identifier for an Alliance author
  • I’ve been running the dspace cleanup -v script every few weeks or months on CGSpace and assuming it finished successfully because I didn’t get an error on stdout/stderr, but today I noticed that the script keeps saying it is deleting the same bitstreams
    • I looked in dspace.log and found the error I used to see a lot:
Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
  Detail: Key (uuid)=(99b76ee4-15c6-458c-a940-866148bc7dee) is still referenced from table "bundle".
  • If I mark the primary bitstream as null manually the cleanup script continues until it finds a few more
    • I ended up with a long list of UUIDs to fix before the script would complete:
$ psql -d dspace -c "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ('b76d41c0-0a02-4f53-bfde-a840ccfff903','1981efaa-eadb-46cd-9d7b-12d7a8cff4c4','97a8b1fa-3c12-4122-9c7b-fc2a3eaf570d','99b76ee4-15c6-458c-a940-866148bc7dee','f330fc22-a787-46e2-b8d0-64cc3e166124','592f4a0d-1ed5-4663-be0e-958c0d3e653b','e73b3178-8f29-42bc-bfd1-1a454903343c','e3a5f592-ac23-4934-a2b2-26735fac0c4f','73f4ff6c-6679-44e8-8cbd-9f28a1df6927','11c9a75c-17a6-4966-a4e8-a473010eb34c','155faf93-92c5-4c17-866e-1db50b1f9687','8e073e9e-ab54-4d99-971a-66de073d51e3','76ddd62c-6499-4a8c-beea-3fc8c60200d8','2850fcc9-f450-430a-9317-c42def74e813','8fef3198-2aea-4bd8-aeab-bf5fccb46e42','9e3c3528-e20f-4da3-a0bd-ae9b8515b770')"
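  • A minimal sketch of how those UUIDs could be collected from dspace.log instead of by hand. The function names and the log path are hypothetical, and the generated SQL should be reviewed before running it:

```shell
#!/usr/bin/env bash

# Print the unique bitstream UUIDs that PostgreSQL reports as still being
# referenced from the "bundle" table. Reads log text on stdin.
extract_blocking_uuids() {
    grep -F 'is still referenced from table "bundle"' |
        grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' |
        sort -u
}

# Turn a list of UUIDs (one per line on stdin) into a single UPDATE statement.
# Prints nothing when the input is empty.
build_update_sql() {
    awk -v q="'" '
        { list = list (NR > 1 ? "," : "") q $0 q }
        END {
            if (NR > 0)
                printf "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (%s);\n", list
        }'
}

# Usage (hypothetical log path):
#   extract_blocking_uuids < /path/to/dspace.log | build_update_sql
```

  • The cleanup script stops at the first constraint violation it hits, so the log only ever contains the UUIDs seen so far and this would likely still need a few rounds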

2022-10-06

  • I finished running the cleanup script on CGSpace and the before and after on the number of bitstreams is interesting:
$ find /home/cgspace.cgiar.org/assetstore -type f | wc -l
181094
$ find /home/cgspace.cgiar.org/assetstore -type f | wc -l
178329
  • So that cleaned up 2,765 bitstreams!
  • Interesting, someone on the DSpace Slack mentioned this as being a known issue with discussion, reproducers, and a pull request: https://github.com/DSpace/DSpace/issues/7348
  • I am having an issue with the new FixLowQualityThumbnails script on some communities like 10568/117865 and 10568/97114
    • For some reason it doesn’t descend into the collections
    • Also, my old FixJpgJpgThumbnails doesn’t either… weird
    • I might have to resort to getting a list of collections and doing it that way:
$ psql -h localhost -U postgres -d dspacetest -c 'SELECT ds6_collection2collectionhandle(uuid) FROM collection WHERE uuid in (SELECT uuid FROM collection);' |
    sed 1,2d |
    tac |
    sed 1,3d > /tmp/collections
  • Strange, I don’t think doing it by collections is actually working because it says it’s replacing the bitstreams, but it doesn’t actually do it
    • I don’t have time to figure out what’s happening: I see “update_item” in dspace.log when the script says it’s replacing a bitstream, but the change never takes effect
    • I might just extract a list of items that have .jpg.jpg thumbnails from the database and run the script through item mode
    • There might be a problem with the context commit logic…?
  • I exported a list of items that have .jpg.jpg thumbnails on CGSpace:
$ psql -h localhost -p 5432 -U postgres -d dspacetest -c "SELECT ds6_bitstream2itemhandle(dspace_object_id) FROM metadatavalue WHERE text_value ~ '.*\.(jpg|jpeg|JPG|JPEG)\.(jpg|jpeg|JPG|JPEG)' AND dspace_object_id IS NOT NULL;" |
  sed 1,2d |
  tac |
  sed 1,3d |
  grep -v '␀' |
  sort -u |
  sed 's/ //' > /tmp/jpgjpg-handles.txt
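  • A quick way to sanity check the regular expression from that query outside the database. This is only a sketch: the file names are made up, the function name is hypothetical, and I dropped the leading .* because it is redundant for an unanchored match:

```shell
#!/usr/bin/env bash

# Return success if the name contains a doubled JPEG extension.
# The pattern is unanchored, so it also matches names where the doubled
# extension is not at the very end of the string.
is_double_jpg() {
    printf '%s\n' "$1" | grep -qE '\.(jpg|jpeg|JPG|JPEG)\.(jpg|jpeg|JPG|JPEG)'
}

# Made-up example names: two that should match and two that should not.
for name in cover.jpg.jpg photo.JPEG.jpg report.pdf.jpg map.jpg; do
    if is_double_jpg "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```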
  • I restarted DSpace Test because it had high load since yesterday and I don’t know why
  • Run check-duplicates.py on the 1642 MARLO Innovations to try to include matches from the OICRs we uploaded last month
    • Then I processed those matches like I did with the OICRs themselves last month, and then cleaned them one last time with csv-metadata-quality, created a SAF bundle, and uploaded them to CGSpace
    • BTW this bumps CGSpace over 100,000 items…
    • Then I did the same for the 749 MARLO MELIAs and imported them to CGSpace
  • Meeting about CG Core types with Abenet, Marie-Angelique, Sara, Margarita, and Valentina
  • I made some minor logic changes to the FixJpgJpgThumbnails script in cgspace-java-helpers
    • Now it checks to make sure the bitstream description is not empty or null, and also excludes Maps (in addition to Infographics) since those are likely to be JPEG files in the ORIGINAL bundle on purpose

2022-10-07

  • I did the matching and cleaning on the 512 MARLO Policies and uploaded them to CGSpace
  • I sent a list of the IDs and Handles for all four groups of MARLO items to Jose so he can do the redirects on their server:
$ wc -l /tmp/*mappings.csv
  1643 /tmp/crp-innovation-mappings.csv
   750 /tmp/crp-melia-mappings.csv
   683 /tmp/crp-oicr-mappings.csv
   513 /tmp/crp-policy-mappings.csv
  3589 total
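  • Those line counts are one higher than the item counts, which is consistent with each CSV having a single header row (an assumption on my part, but it matches the 1642 Innovations, 749 MELIAs, and 512 Policies above). A small sketch of the arithmetic, with a hypothetical function name:

```shell
#!/usr/bin/env bash

# Sum the item counts for a set of CSV line counts, subtracting one
# assumed header row per file.
sum_items() {
    total=0
    for lines in "$@"; do
        total=$((total + lines - 1))
    done
    echo "$total"
}

# Line counts copied from the wc -l output above.
sum_items 1643 750 683 513
```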
  • I fixed the mysterious issue with my cgspace-java-helpers scripts not working on communities and collections
    • It was because the code wasn’t committing the context!
    • I ran both FixJpgJpgThumbnails and FixLowQualityThumbnails on a dozen or so large collections on CGSpace and processed about 1,200 low-quality thumbnails
  • I did a complete re-sync of CGSpace to DSpace Test

2022-10-08

  • Start a harvest on AReS
  • Experiment with PDF thumbnails in ImageMagick again; I found an interesting reference on their legacy website saying we can use -unsharp after -thumbnail to make them less blurry
    • There are a few examples for unsharp values (starting from a DSpace default of a flattened JPEG from the PDF, then the thumbnail in a second operation):
$ convert '10568-103447.pdf[0]' -flatten 10568-103447-dspace-step1.pdf.jpg 
$ convert 10568-103447-dspace-step1.pdf.jpg -thumbnail 600x600 -unsharp 0x.5 10568-103447-dspace-step2-600-unsharp.pdf.jpg
$ convert 10568-103447-dspace-step1.pdf.jpg -thumbnail 600x600 -unsharp 2x0.5+0.7+0 10568-103447-dspace-step2-600-unsharp2.pdf.jpg
$ convert 10568-103447-dspace-step1.pdf.jpg -thumbnail 600x600 -unsharp 0x0.75+0.75+0.008 10568-103447-dspace-step2-600-unsharp3.pdf.jpg
$ convert 10568-103447-dspace-step1.pdf.jpg -thumbnail 600x600 -unsharp 1.5x1+0.7+0.02 10568-103447-dspace-step2-600-unsharp4.pdf.jpg
  • I merged all the changes from 6_x-dev to 6_x-prod after having run them on DSpace Test for the last ten days

2022-10-11

2022-10-12

  • I submitted a pull request to DSpace 7 for the -unsharp 0x0.5 change: https://github.com/DSpace/DSpace/pull/8515
  • I did some tests on CGSpace and verified that MEL will indeed need admin permissions on every collection that they want to map to
  • I had a call with Salem and he asked me about redirecting from some CRP duplicates that exist in both MELSpace and CGSpace
    • We decided that the only way is to use an HTTP 301 redirect in the nginx web server, but I said that I’d check with CNRI to see if there was a way to do this within the Handle system

2022-10-13

  • Disable the REST API cache on CGSpace temporarily to see if that fixes a strange problem we are seeing with listing publications on ilri.org
  • Meeting with MEL, MARLO, and CG Core people to continue discussing dcterms.type
  • I added the new MEL account to all the appropriate authorizations for Initiatives that ICARDA is involved in on CGSpace
    • I still have to add the few that WorldFish is involved in

2022-10-14

  • Abenet finalized adding the MEL user to all initiative collections on CGSpace
  • Re-sync CGSpace to DSpace Test to get the new MEL user and authorizations
  • I checked ilri.org and I see more publications for 2021 and earlier
    • The results are still strange though because I only see a few for each year

2022-10-15

  • I’m going to turn the REST API cache on CGSpace back on to see if the ilri.org publications thing gets broken again
  • Start a harvest on AReS

2022-10-16

  • The harvest on AReS finished but somehow there are 10,000 fewer items than the previous indexing… hmmm
    • I don’t see any hits from MELSpace there so I will start another harvest…
    • After starting the harvesting the load on the server went up to 20 and UptimeRobot said CGSpace was down for three hours, sigh
    • I stopped the harvesting and the load went down immediately
    • I am trying to find a pattern with the load on Sundays
  • I see this in the AReS backend logs:
[Nest] 1   - 10/16/2022, 6:42:04 PM   [HarvesterService] Starting Harvest =>0
[Nest] 1   - 10/16/2022, 6:42:07 PM   [HarvesterService] Starting Harvest =>101555
[Nest] 1   - 10/16/2022, 6:42:10 PM   [HarvesterService] Starting Harvest =>4936
  • Which means MELSpace is having some issue
  • I’m not sure what was going on on CGSpace yesterday, but the load was indeed very high according to Munin:

CGSpace CPU load day

  • The pattern is clear on Sundays if you look at the past month:

CGSpace CPU load month

  • I have yet to find an increased nginx request pattern correlating with the increased load, but looking back on the last year it seems something started happening around March, 2022, and also I start seeing CPU steal in July (red coming from the top of the graph):

CGSpace CPU load year

  • The amount of CPU steal is very low if I look at it now, around 1 or 2 percent, but what’s happening now reminds me of the mysterious load problems I had in 2019-03 that were due to CPU steal
  • Salem said there was an issue with the sitemaps on MELSpace so that’s why it wasn’t working in AReS
    • Load on CGSpace is low in the evening so I’ll start a new AReS harvest

2022-10-18

  • Start mapping the Initiative names on CGSpace to the new short names from Enrico’s spreadsheet
  • Then I will update them for existing CGSpace items:
$ ./ilri/fix-metadata-values.py -i 2022-10-18-update-initiatives.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.initiative -m 258 -t correct -d -n
  • And later in the controlled vocabulary
  • Apply some corrections to a few hundred items on CGSpace for Peter
  • Meeting with Abenet, Sara, and Valentina about CG Core types
    • We finished going over our list and agreed to send a message to concerned parties in our organizations for feedback by November 4th
    • Next week we will continue doing the definitions
  • Re-sync CGSpace to DSpace Test to get the latest Initiatives changes
    • I also need to re-create the CIAT/Alliance TIP accounts so they can continue testing
    • I re-created the tip-submit@cgiar.org and tip-approve@cgiar.org accounts on DSpace Test
    • According to my notes:
      • A user must be in the collection admin group in order to deposit via the REST API (not in the collection’s “Submit” group, which is for normal submission)
      • A user must be in the collection’s “Accept/Reject/Edit Metadata” step in order to see and approve the item in the DSpace workflow
    • I created a new “TIP test” collection under Alliance’s community and added the users accordingly
    • I think I’ll be able to just add these two submit/approve users to the Alliance Admins and Alliance Editors groups once we’re ready

2022-10-19

gs thumbnail

  • In other news, I see pdftocairo from the poppler package produces a similar, though slightly prettier version of the thumbnail of that PDF:

pdftocairo thumbnail

  • I used the command:
$ pdftocairo -jpeg -singlefile -f 1 -l 1 -scale-to-x 640 -scale-to-y -1 10568-116598.pdf thumb
  • The Ghostscript developers responded in a few minutes (!) and explained that PDFs can contain many different “boxes”:

PDF files can have multiple different ‘Box’ values; ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required the other boxes are optional, a given PDF page description must contain the MediaBox and may contain any or all of the others.

By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.

The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:

/CropBox[594.375 0.0 1190.55 839.176] /MediaBox[0.0 0.0 1190.55 841.89]

You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox, -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.

  • I confirm that adding -define pdf:use-cropbox=true to the ImageMagick command produces a better thumbnail in this case
    • We can check the boxes in a PDF using pdfinfo from the poppler package:
$ pdfinfo -box data/10568-116598.pdf
Creator:         Adobe InDesign 17.0 (Macintosh)
Producer:        Adobe PDF Library 16.0.3
CreationDate:    Tue Dec  7 12:44:46 2021 EAT
ModDate:         Tue Dec  7 15:37:58 2021 EAT
Custom Metadata: no
Metadata Stream: yes
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           17
Encrypted:       no
Page size:       596.175 x 839.176 pts
Page rot:        0
MediaBox:            0.00     0.00  1190.55   841.89
CropBox:           594.38     0.00  1190.55   839.18
BleedBox:          594.38     0.00  1190.55   839.18
TrimBox:           594.38     0.00  1190.55   839.18
ArtBox:            594.38     0.00  1190.55   839.18
File size:       572600 bytes
Optimized:       no
PDF version:     1.6
  • In this case the MediaBox is a strange size, and we should use the CropBox
    • I wonder if we can check that from DSpace…
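  • Outside of DSpace, a check like this could be scripted around the pdfinfo output. A minimal sketch, assuming the single-page “MediaBox:” / “CropBox:” line format shown above (the function name is hypothetical):

```shell
#!/usr/bin/env bash

# Read `pdfinfo -box` output on stdin and report whether the CropBox
# differs from the MediaBox.
# Exit status: 0 = boxes differ, 1 = boxes are equal, 2 = boxes not found.
cropbox_differs() {
    awk '
        $1 == "MediaBox:" { media = $2 " " $3 " " $4 " " $5 }
        $1 == "CropBox:"  { crop  = $2 " " $3 " " $4 " " $5 }
        END {
            if (media == "" || crop == "") exit 2
            if (media != crop) {
                print "CropBox differs: " crop " vs MediaBox " media
                exit 0
            }
            exit 1
        }'
}

# Usage:
#   pdfinfo -box data/10568-116598.pdf | cropbox_differs
```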
  • Apply some corrections from Peter on CGSpace
  • Meeting with Leroy, Daniel, Francesca, and Maria from Alliance to review their TIP tool and talk about next steps
    • We asked them to do some real submissions (as opposed to “I like coffee” etc) to test the full breadth of the metadata and controlled vocabularies
  • Minor work on the CG Core Types spreadsheet to clear up some of the actions and incorporate some of Peter’s feedback
  • After looking at the request patterns in nginx on CGSpace for the past few weeks I see nothing that would explain the high loads we see several times per week (especially Sundays!)
    • So I suspect there is a noisy neighbor, and actually I do see some non-trivial amount of CPU steal in my Munin graphs and iostat
    • I asked Linode to move the instance elsewhere
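  • For reference, a rough sketch of the kind of per-day request count that can be compared against the load graphs. It assumes nginx’s default “combined” log format, where the timestamp looks like [16/Oct/2022:06:42:04 +0200]; the function name and log path are hypothetical:

```shell
#!/usr/bin/env bash

# Count requests per day in an nginx access log read from stdin.
# Prints "DD/Mon/YYYY count" lines. The sort is lexical, so the order is
# only chronological within a single month.
requests_per_day() {
    awk 'match($0, /\[[0-9]+\/[A-Za-z]+\/[0-9]+:/) {
             # Strip the leading "[" and the trailing ":" from the match
             day = substr($0, RSTART + 1, RLENGTH - 2)
             count[day]++
         }
         END { for (day in count) print day, count[day] }' |
        sort
}

# Usage (hypothetical log path):
#   requests_per_day < /var/log/nginx/access.log
```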

2022-10-22

  • Start a harvest on AReS

2022-10-24

  • Peter sent me some corrections for affiliations:
$ cat 2022-10-24-affiliations.csv 
cg.contributor.affiliation,correct
Wageningen University and Research Centre,Wageningen University & Research
Wageningen University and Research,Wageningen University & Research
Wageningen University,Wageningen University & Research
$ ./ilri/fix-metadata-values.py -i 2022-10-24-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct -d
  • Add ORCID identifier for Claudia Arndt on CGSpace and tag her existing items
  • Linode responded to my request last week and said they don’t think that the culprit here is CPU steal, but that they would move us to another host anyways
    • I still need to check the Munin graphs

2022-10-25

  • Upload some changes to items on CGSpace for Peter
  • Start a full Discovery index on CGSpace:
$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b

real    226m40.463s
user    132m6.511s
sys     3m15.077s

2022-10-26

  • We published the infographic and blog post to mark CGSpace’s 100,000th item
    • I generated a high-quality thumbnail using ImageMagick in order to Tweet it:
$ convert -density 144 10568-125167.pdf\[0\] -thumbnail x1200 /tmp/10568-125167.pdf.png
$ pngquant /tmp/10568-125167.pdf.png
  • Spent some time looking at the MediaBox / CropBox thing in DSpace’s ImageMagickThumbnailFilter.java
    • We need to make sure to put -define pdf:use-cropbox=true before we specify the input file or else it will not have any effect
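  • To make the ordering concrete, here is a small sketch that only prints a command line rather than running it. The function name is hypothetical and the density and thumbnail values are borrowed from the command above, not from DSpace’s actual filter code:

```shell
#!/usr/bin/env bash

# Print an ImageMagick command line for the first page of a PDF, with the
# cropbox define placed before the input file so that it is already set
# when the PDF is read.
thumbnail_command() {
    input=$1
    output=$2
    printf '%s\n' "convert -define pdf:use-cropbox=true -density 144 '${input}[0]' -thumbnail x1200 '${output}'"
}

thumbnail_command 10568-116598.pdf /tmp/10568-116598.pdf.png
```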

2022-10-27

$ pdfcpu box rem -- "crop" in.pdf out.pdf
  • I filed an issue on DSpace for the ImageMagick CropBox problem
    • I decided that this is a bug that should be fixed separately from the “improving thumbnail quality” issue
    • I made a pull request to fix the CropBox issue
  • I did more work on my improved-dspace-thumbnails microsite to complement the DSpace thumbnail pull requests
    • I am updating it to recommend using the PDF cropbox and “supersampling” with a higher density than 72
    • I measured execution time of ImageMagick with time and found that the higher-density mode takes about five times longer on average
    • I measured the maximum heap memory of ImageMagick with Valgrind and Massif:
$ valgrind --tool=massif magick convert ...
  • Then I checked the results for each set of default DSpace thumbnail runs and “improved” thumbnail runs using ms_print (hacky way to get the max heap, I know):
$ for file in memory-dspace/massif.out.49*; do ms_print "$file" | grep -A1 "    MB" | tail -n1 | sed 's/\^.*//'; done
15.87
16.06
21.26
15.88
20.01
15.85
20.06
16.04
15.87
15.87
20.02
15.87
15.86
19.92
10.89
$ for file in memory-improved/massif.out.5*; do ms_print "$file" | grep -A1 "    MB" | tail -n1 | sed 's/\^.*//'; done
245.3
245.5
298.6
245.3
306.8
245.2
306.9
245.5
245.2
245.3
306.8
245.3
244.9
306.3
165.6
  • Ouch, this shows that it takes about fifteen times more memory to do the “4x” density of 288!
    • It seems more reasonable to use a “2x” density of 144:
$ for file in memory-improved-144/*; do ms_print "$file" | grep -A1 "    MB" | tail -n1 | sed 's/\^.*//'; done
61.80
62.00
76.76
61.82
77.43
61.77
77.48
61.98
61.76
61.81
77.44
61.81
61.69
77.16
41.84
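  • A sketch for turning those three lists into mean ratios instead of eyeballing them. The values are copied from the runs above; the function and variable names are hypothetical:

```shell
#!/usr/bin/env bash

# Mean of whitespace-separated numbers on stdin, to two decimal places.
mean() {
    awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } }
         END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Peak heap values (MB) copied from the three sets of runs above.
default="15.87 16.06 21.26 15.88 20.01 15.85 20.06 16.04 15.87 15.87 20.02 15.87 15.86 19.92 10.89"
d288="245.3 245.5 298.6 245.3 306.8 245.2 306.9 245.5 245.2 245.3 306.8 245.3 244.9 306.3 165.6"
d144="61.80 62.00 76.76 61.82 77.43 61.77 77.48 61.98 61.76 61.81 77.44 61.81 61.69 77.16 41.84"

base=$(echo "$default" | mean)
for run in "288:$d288" "144:$d144"; do
    density=${run%%:*}                    # text before the first colon
    avg=$(echo "${run#*:}" | mean)        # numbers after the first colon
    awk -v d="$density" -v a="$avg" -v b="$base" \
        'BEGIN { printf "density %s: mean %.2f MB, %.1fx the default mean of %.2f MB\n", d, a, a / b, b }'
done
```

  • By my arithmetic the means work out to roughly fifteen times the default for a density of 288 and a bit under four times for 144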
  • There’s a really cool visualizer called massif-visualizer, but it isn’t easy to parse

2022-10-28

  • I finalized the code for the ImageMagick density change and made a pull request against DSpace 7.x

2022-10-29

  • Start a harvest on AReS

2022-10-31

  • Tag version 6.1 of cgspace-java-helpers: https://github.com/ilri/cgspace-java-helpers/releases/tag/v6.1
    • I also pushed a more recent 6.1-SNAPSHOT version to Maven Central via OSSRH
    • I should probably push a non-SNAPSHOT release but I don’t have time to figure that out in Maven
  • Add some new items on CGSpace and update others for Peter
  • Email Mishell from CIP about their old theses which are using Creative Commons licenses
    • They said it’s OK so I updated all sixteen items in that collection
  • Move the “MEL submissions” collection on CGSpace from ICARDA’s community to the Initiatives community
  • Meeting with Peter and Abenet about ongoing CGSpace action points
  • I created the authorizations for Alliance’s TIP tool to submit on CGSpace