September, 2017
2017-09-06
- Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
2017-09-07
- Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
Documenting day-to-day work on the CGSpace repository.
dspace.log.2017-08-01, they are all using the same Tomcat session
robots.txt only blocks the top-level /discover and /browse URLs… we will need to find a way to forbid them from accessing these!
X-Robots-Tag "none" HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
dc.description.abstract column, which caused OpenRefine to choke when exporting the CSV
g/^$/d
-x) plus sed to format the output into quasi XML:
cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes
cg.subject.system to CGSpace metadata registry, for subject from the upcoming CGIAR Library migration
cg.identifier.wletheme to 1106 WLE items I can see the field on XMLUI but not in REST!
value.split("page=", "")[1]
value.replace("p. ", "").split("-")[1].toNumber() - value.replace("p. ", "").split("-")[0].toNumber()
cells["dc.page.from"].value.toNumber() + cells["dc.format.pages"].value.toNumber()
value.split(" ")[0].replace(",","").toLowercase() + "-" + sha1(value).get(1,9) + ".pdf__description:" + cells["dc.type"].value
generate-thumbnails.py script to read certain fields and then pass to GhostScript:
value.contains("page=")
or(value.contains("p. "),value.contains(" p."))
$ gs -dNOPAUSE -dBATCH -dFirstPage=14 -dLastPage=27 -sDEVICE=pdfwrite -sOutputFile=beans.pdf -f 12605-1.pdf
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/93843 --source /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &> /tmp/ciat-books.log
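The page arithmetic and the GhostScript call above can be sketched in Python. This is only an illustration, not the actual generate-thumbnails.py: the input format `p. 14-27` is inferred from the `replace("p. ", "")` expression, and the function names `parse_page_range` and `build_gs_command` are hypothetical.

```python
import re


def parse_page_range(citation):
    """Return (first, last) page numbers from text such as 'p. 14-27'.

    The 'p. N-M' shape is an assumption based on the expressions in the notes.
    """
    match = re.search(r"p\.\s*(\d+)\s*-\s*(\d+)", citation)
    if match is None:
        raise ValueError(f"no page range found in {citation!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        raise ValueError(f"page range runs backwards in {citation!r}")
    return first, last


def build_gs_command(source_pdf, output_pdf, first, last):
    """Build the argument list for the gs invocation shown in the notes."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-dFirstPage={first}", f"-dLastPage={last}",
        "-sDEVICE=pdfwrite", f"-sOutputFile={output_pdf}",
        "-f", source_pdf,
    ]


first, last = parse_page_range("p. 14-27")
command = build_gs_command("12605-1.pdf", "beans.pdf", first, last)
print(" ".join(command))
```

Building the command as a list rather than a single string means it could be handed to `subprocess.run` without shell quoting problems for file names that contain spaces.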
cg.identifier.wletheme field is available on REST API for Macaroni Bros
+----------------+----------------------------+---------------------+---------+
| Version | Description | Installed on | State |
+----------------+----------------------------+---------------------+---------+
| 1.1 | Initial DSpace 1.1 databas | | PreInit |
| 1.2 | Upgrade to DSpace 1.2 sche | | PreInit |
| 1.3 | Upgrade to DSpace 1.3 sche | | PreInit |
| 1.3.9 | Drop constraint for DSpace | | PreInit |
| 1.4 | Upgrade to DSpace 1.4 sche | | PreInit |
| 1.5 | Upgrade to DSpace 1.5 sche | | PreInit |
| 1.5.9 | Drop constraint for DSpace | | PreInit |
| 1.6 | Upgrade to DSpace 1.6 sche | | PreInit |
| 1.7 | Upgrade to DSpace 1.7 sche | | PreInit |
| 1.8 | Upgrade to DSpace 1.8 sche | | PreInit |
| 3.0 | Upgrade to DSpace 3.x sche | | PreInit |
| 4.0 | Initializing from DSpace 4 | 2015-11-20 12:42:52 | Success |
| 5.0.2014.08.08 | DS-1945 Helpdesk Request a | 2015-11-20 12:42:53 | Success |
| 5.0.2014.09.25 | DS 1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
| 5.0.2014.09.26 | DS-1582 Metadata For All O | 2015-11-20 12:42:55 | Success |
| 5.0.2015.01.27 | MigrateAtmireExtraMetadata | 2015-11-20 12:43:29 | Success |
| 5.0.2017.04.28 | CUA eperson metadata migra | 2017-06-07 11:07:28 | OutOrde |
| 5.5.2015.12.03 | Atmire CUA 4 migration | 2016-11-27 06:39:05 | OutOrde |
| 5.5.2015.12.03 | Atmire MQM migration | 2016-11-27 06:39:06 | OutOrde |
| 5.6.2016.08.08 | CUA emailreport migration | 2017-01-29 11:18:56 | OutOrde |
+----------------+----------------------------+---------------------+---------+
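To pick out the migrations whose state is neither `Success` nor `PreInit`, the table above can be parsed with a few lines of Python. This is a rough sketch: the function name `parse_migration_table` is hypothetical, and the sample reuses three rows of the table verbatim, including the truncated `OutOrde` state exactly as printed.

```python
def parse_migration_table(text):
    """Parse the pipe-delimited migration table into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "Version":
            continue  # skip the header row and anything malformed
        version, description, installed_on, state = cells
        rows.append({
            "version": version,
            "description": description,
            "installed_on": installed_on,
            "state": state,
        })
    return rows


sample = """\
| 3.0 | Upgrade to DSpace 3.x sche | | PreInit |
| 4.0 | Initializing from DSpace 4 | 2015-11-20 12:42:52 | Success |
| 5.0.2017.04.28 | CUA eperson metadata migra | 2017-06-07 11:07:28 | OutOrde |
"""

rows = parse_migration_table(sample)
unusual = [row["version"] for row in rows if row["state"] not in ("Success", "PreInit")]
print(unusual)
```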
5_x-prod, run system updates, and reboot the server
dc.subject[en_US] just so DSpace would detect changes properly
replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')
value.unescape("html").unescape("xml")
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &> /tmp/ciat-books.log
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books2.map &> /tmp/ciat-books2.log
Regenerating Degraded Landscapes to Restoring Degraded Landscapes
input-forms.xml: #329
dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=237 and text_value like 'Regenerating Degraded Landscapes%';
text_value
------------
(0 rows)
cg.identifier.wletheme
Java stacktrace: java.util.NoSuchElementException: Timeout waiting for idle object
- db.maxconnections 30→70 (the default PostgreSQL config allows 100 connections, so DSpace’s default of 30 is quite low)
- db.maxwait 5000→10000
- db.maxidle 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)
pg_hba.conf settings) when we deploy tsega’s REST API
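The trade-off behind the `db.maxconnections` change can be checked with simple arithmetic. The sketch below assumes that each application connecting to the database keeps its own pool of up to `db.maxconnections` connections; that assumption, the pool counts, and the helper name `remaining_connections` are illustrative and not taken from the notes.

```python
POSTGRES_MAX_CONNECTIONS = 100  # default PostgreSQL limit mentioned in the notes


def remaining_connections(db_maxconnections, pool_count, limit=POSTGRES_MAX_CONNECTIONS):
    """Connections left over if every pool grows to its configured maximum.

    A negative result means the pools could ask for more connections than
    PostgreSQL will accept.
    """
    return limit - db_maxconnections * pool_count


# Hypothetical pool counts: one application, then two sharing the database.
for pools in (1, 2):
    print(pools, remaining_connections(70, pools))
```

With a single pool there is headroom of 30 connections, while two pools of 70 would exceed the default limit by 40, which is worth keeping in mind before another application is pointed at the same database.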

cg.* fields, but not very consistent and even copy some of CGSpace items:
index-discovery -b
cg.identifier.status field
$ [dspace]/bin/dspace curate -t requiredmetadata -i 10568/1 -r - > /tmp/curation.out
dc.type
requiredmetadata curation task stops when it finds a missing metadata field is by design
cg.subject.ccafs), first changed in the submission forms, and then in the database:
$ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
java.lang.OutOfMemoryError: GC overhead limit exceeded, which can be solved by disabling the GC timeout with -XX:-UseGCOverheadLimit
dspace cleanup -v, or else you’ll run out of disk space
-s) to ingest the community object as a single AIP without its children, followed by each of the collections:
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit"
$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
mets.xml in the zip file, so you need to turn that off with -o ignoreHandle=false
-u option suppresses prompts, to allow the process to run without user input
webui.itemlist.sort-option in dspace.cfg
||| in the Discovery facets
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "handle_pkey"
Detail: Key (handle_id)=(80928) already exists.
update-sequences.sql script while Tomcat/DSpace are running
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit"
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2517/10947-2517.zip
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2515/10947-2515.zip
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2516/10947-2516.zip
$ [dspace]/bin/dspace packager -s -t AIP -o ignoreHandle=false -e some@user.com -p 10568/80923 /home/aorth/10947-1/10947-1.zip
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
skipIfParentMissing)
-XX:-UseGCOverheadLimit JVM option helps with some issues in large imports
update-sequences.sql script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
dc.description.abstract field (at least) on the lines where CSV importing was failing
g/^$/d
dc.subject field to try to pull countries and regions out, but there are too many values in there
$ ./fix-metadata-values.py -i /tmp/ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p 'fuuu'
These include:
dc.rights to the input form, including some inline instructions/hints:
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
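Checking many thumbnails by eye is tedious, so the colorspace can be pulled out of an `identify` line like the one above with a small parser. This is a sketch that assumes the default one-line `identify` layout, where the colorspace follows the bit depth; the function name `colorspace_from_identify` is hypothetical.

```python
import re


def colorspace_from_identify(line):
    """Return the token that follows the '<N>-bit' depth field, or None."""
    tokens = line.split()
    for index, token in enumerate(tokens[:-1]):
        if re.fullmatch(r"\d+-bit", token):
            return tokens[index + 1]
    return None


line = (
    "/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 "
    "464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000"
)
print(colorspace_from_identify(line))
```

Anchoring on the bit-depth token instead of a fixed column position keeps the lookup working when the file path itself contains spaces.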
dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
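The duplicate mapping deleted above was found by looking at a single item. The same check can be expressed generally: group rows by `(collection_id, item_id)` and report every pair that occurs more than once. The sketch below works on rows already fetched from `collection2item`; the function name `find_duplicate_mappings` is hypothetical and the sample rows are the three shown in the query output.

```python
from collections import defaultdict


def find_duplicate_mappings(rows):
    """Map each (collection_id, item_id) pair seen more than once to its row ids."""
    seen = defaultdict(list)
    for row_id, collection_id, item_id in rows:
        seen[(collection_id, item_id)].append(row_id)
    return {pair: ids for pair, ids in seen.items() if len(ids) > 1}


rows = [
    (92551, 313, 80278),
    (92550, 313, 80278),
    (90774, 1051, 80278),
]
duplicates = find_duplicate_mappings(rows)
print(duplicates)
```

Only one row id from each reported list should be deleted, as in the `delete` statement above, so that the item stays mapped to the collection.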
cg.identifier.ccafsprojectpii as the field name
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
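To see which kinds of events trigger these warnings, the log lines can be tallied by `eventType` and `SubjectType`. The following is a minimal sketch; the function name `count_skipped_events` is hypothetical and the two sample lines are copied from the log excerpt above.

```python
import re
from collections import Counter

EVENT_PATTERN = re.compile(r"Event\(eventType=(\w+), SubjectType=(\w+),")
WARNING_TEXT = "BatchEditConsumer should not have been given this kind of Subject"


def count_skipped_events(lines):
    """Count BatchEditConsumer warnings per (eventType, SubjectType) pair."""
    counts = Counter()
    for line in lines:
        if WARNING_TEXT not in line:
            continue
        match = EVENT_PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample_lines = [
    '2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ '
    'BatchEditConsumer should not have been given this kind of Subject in an event, skipping: '
    'org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, '
    'ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, '
    'detail=[null], transactionID="TX157907838689377964651674089851855413607")',
    '2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ '
    'BatchEditConsumer should not have been given this kind of Subject in an event, skipping: '
    'org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, '
    'ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, '
    'detail=[null], transactionID="TX157907838689377964651674089851855413607")',
]

counts = count_skipped_events(sample_lines)
for (event_type, subject_type), total in sorted(counts.items()):
    print(event_type, subject_type, total)
```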