CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

February, 2024

2024-02-05

  • Lowercase any values of metadata field 187 on items that contain uppercase characters:
dspace=# BEGIN;
BEGIN
dspace=*# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
UPDATE 180
dspace=*# COMMIT;
COMMIT
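Here is a minimal plain-Python sketch of what the `UPDATE` above selects and changes. It assumes the rows have already been narrowed to field 187 on items, and it does not touch a database. It shows why the reported count covers only values with at least one uppercase character, not every value in the field. PostgreSQL's `[[:upper:]]` class depends on the database locale, while Python's `str.isupper()` is Unicode-aware, so edge cases may differ.

```python
# Sketch only: rows are (uuid, text_value) pairs assumed to be pre-filtered
# to metadata_field_id=187 on items; no database access happens here.


def lowercase_mixed_case(rows):
    """Return (updated_rows, count), mirroring the SQL UPDATE.

    Like the `text_value ~ '[[:upper:]]'` filter, only values containing at
    least one uppercase character are changed and counted.
    """
    updated = []
    count = 0
    for uuid, value in rows:
        if any(ch.isupper() for ch in value):
            updated.append((uuid, value.lower()))
            count += 1
        else:
            updated.append((uuid, value))
    return updated, count


rows = [("a", "Climate Change"), ("b", "maize"), ("c", "AGRICULTURE")]
new_rows, n = lowercase_mixed_case(rows)
print(f"UPDATE {n}")  # prints "UPDATE 2"
```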

2024-02-06

  • Discuss IWMI using the CGSpace REST API for their new website
  • Export the IWMI community to extract their ORCID identifiers:
$ dspace metadata-export -i 10568/16814 -f /tmp/iwmi.csv
$ csvcut -c 'cg.creator.identifier,cg.creator.identifier[en_US]' ~/Downloads/2024-02-06-iwmi.csv \
  | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' \
  | sort -u \
  | tee /tmp/iwmi-orcids.txt \
  | wc -l
353
$ ./ilri/resolve_orcids.py -i /tmp/iwmi-orcids.txt -o /tmp/iwmi-orcids-names.csv -d
  • I noticed some similar-looking names in our list, so I clustered them in OpenRefine and manually checked a dozen or so before updating our list
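The `[A-Z0-9]{4}` pattern in the extraction above is permissive. Here is a minimal Python sketch of a stricter extraction, assuming the CSV text is already loaded into a string. It only accepts digits, with an optional final `X`, and it validates the ISO 7064 MOD 11-2 check character that ORCID uses, so typos in the export are dropped before they reach `resolve_orcids.py`. The function names are hypothetical and not part of the ilri scripts.

```python
import re

# Four groups of four digits; only the last character may be the check "X".
ORCID_RE = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def extract_orcids(text: str) -> list:
    """Return sorted, unique, checksum-valid ORCID iDs found in text."""
    return sorted({m for m in ORCID_RE.findall(text) if orcid_checksum_ok(m)})
```

For example, `extract_orcids("Carberry, Josiah: 0000-0002-1825-0097||0000-0002-1825-0098")` keeps only the first iD, because the second fails the checksum.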
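OpenRefine's default key-collision clustering uses a "fingerprint" keying function. The sketch below approximates it in Python: trim, lowercase, strip accents and punctuation, then sort the unique tokens. This puts names like "Orth, Alan" and "Alan Orth" into the same cluster. It is an approximation under those assumptions, not OpenRefine's exact implementation, and the candidates still need a manual check.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Approximate OpenRefine's fingerprint key for a name."""
    key = name.strip().lower()
    # Decompose accented characters and drop the combining marks (ü -> u).
    key = unicodedata.normalize("NFKD", key)
    key = "".join(ch for ch in key if not unicodedata.combining(ch))
    # Remove punctuation, keeping letters, digits, and whitespace.
    key = re.sub(r"[^\w\s]", "", key)
    # Sort unique tokens so that word order does not matter.
    return " ".join(sorted(set(key.split())))


def clusters(names):
    """Group names sharing a fingerprint; return only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Abbreviated forms such as "Smith, J." and "Smith, John" get different keys, so this only finds the easy cases.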

2024-02-07

  • Maria asked me about the “missing” item from last week again
    • I could see it when I used the Admin search, but not in her workflow
    • It was submitted by TIP, so I checked that user’s workspace and found it there
    • After I deposited it, the item went into the workflow, so Maria should be able to see it now

2024-02-09

  • Minor edits to CGSpace submission form
  • Upload 55 ISNAR book chapters from Peter to CGSpace

2024-02-19

2024-02-20

  • Minor work on OpenRXV to fix a bug in the ng-select drop-downs
  • Minor work on the DSpace 7 nginx configuration to allow requesting robots.txt and sitemaps without hitting rate limits
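nginx's `limit_req` module implements a leaky-bucket limiter. The real change lives in the nginx configuration, which is not shown here. As a conceptual sketch only, the Python below models a per-client leaky bucket in which requests for `/robots.txt` and sitemap paths bypass the limiter entirely. The exempt prefixes, rate, and burst values are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical exempt paths; the real list depends on the nginx config.
EXEMPT_PREFIXES = ("/robots.txt", "/sitemap")


@dataclass
class LeakyBucket:
    rate: float        # requests drained per second
    burst: int         # excess requests tolerated above the steady rate
    level: float = 0.0
    last: float = 0.0

    def allow(self, now: float) -> bool:
        # Drain the bucket for the time elapsed since the last request.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        # Reject if this request would overflow the capacity (burst + 1).
        if self.level + 1 > self.burst + 1:
            return False
        self.level += 1
        return True


class RateLimiter:
    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.buckets = {}  # client IP -> LeakyBucket

    def check(self, client_ip: str, path: str, now: float) -> bool:
        # Crawler housekeeping files never count against the limit.
        if path.startswith(EXEMPT_PREFIXES):
            return True
        bucket = self.buckets.setdefault(client_ip, LeakyBucket(self.rate, self.burst))
        return bucket.allow(now)
```

The timestamps are passed in explicitly to keep the model deterministic. A real service would use `time.monotonic()`.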

2024-02-21

  • Minor updates on OpenRXV, including one bug fix for missing mapped collections
    • Salem had to re-work the harvester for DSpace 7 since the mapped collections and parent collection list are separate!
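In the DSpace 7 REST API, an item's owning collection and its mapped collections are exposed as separate links, so a harvester has to fetch both and merge them. Below is a minimal sketch of the merge step. It assumes the `owningCollection` and `mappedCollections` responses have already been fetched and decoded as JSON, and that mapped collections arrive under `_embedded.mappedCollections`. Both the URL paths and that response shape are assumptions, and pagination of the mapped list is ignored. This is not OpenRXV's actual harvester code.

```python
from typing import Any, Dict, List, Optional, Tuple

Json = Dict[str, Any]


def collection_links(base_url: str, item_uuid: str) -> Tuple[str, str]:
    """Build the two per-item link URLs (paths assumed, not verified here)."""
    item = f"{base_url.rstrip('/')}/server/api/core/items/{item_uuid}"
    return f"{item}/owningCollection", f"{item}/mappedCollections"


def merge_collections(owning: Optional[Json], mapped_page: Json) -> List[str]:
    """Return the owning collection's handle first, then mapped ones, deduplicated."""
    mapped = mapped_page.get("_embedded", {}).get("mappedCollections", [])
    handles: List[str] = []
    seen = set()
    for collection in [owning, *mapped]:
        if not collection:
            continue  # e.g. an item without an owning collection
        if collection["uuid"] not in seen:
            seen.add(collection["uuid"])
            handles.append(collection["handle"])
    return handles
```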

2024-02-22

  • Discuss tagging of datasets and re-work the submission form to encourage use of the DOI field for any item that has a DOI, and the normal URL field otherwise
    • The “cg.identifier.dataurl” field will be used for “related” datasets
    • I still have to check and move some metadata for existing datasets
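The rule "use the DOI field when there is a DOI, and the normal URL field otherwise" can be sketched as a small classifier. This is also roughly the decision involved in moving the URLs on existing datasets. The field names `cg.identifier.doi` and `cg.identifier.url` and the list of DOI resolver hosts are assumptions here. Only `cg.identifier.dataurl` is named in these notes.

```python
import re
from urllib.parse import urlparse

# Assumed field names and resolver hosts, for illustration only.
DOI_FIELD = "cg.identifier.doi"
URL_FIELD = "cg.identifier.url"
DOI_HOSTS = {"doi.org", "dx.doi.org"}
# A DOI is "10." + registrant code + "/" + suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def target_field(url: str) -> str:
    """Pick the field a link belongs in: DOI resolver links go to the DOI field."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in DOI_HOSTS and DOI_RE.match(parsed.path.lstrip("/")):
        return DOI_FIELD
    return URL_FIELD
```

Links that merely mention a DOI, such as a repository landing page with `persistentId=doi:...`, stay in the URL field under this rule.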

2024-02-23

  • This morning Tomcat died due to an OOM kill from the kernel:
kernel: Out of memory: Killed process 698 (java) total-vm:14151300kB, anon-rss:9665812kB, file-rss:320kB, shmem-rss:0kB, UID:997 pgtables:20436kB oom_score_adj:0
  • I don’t see any abnormal pattern in my Grafana graphs for the JVM or system load… very weird
  • I updated the submission form on CGSpace to include the new changes to URLs for datasets
    • I also updated about 80 datasets to move the URLs to the correct field
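To make kernel OOM messages easier to compare, here is a small Python sketch that parses an OOM-killer line and converts the `kB` sizes to GiB. The regular expressions assume the exact format of the kernel lines in these notes, and other kernel versions may log different fields. The sample lines are the two OOM kills logged this month.

```python
import re
from typing import Optional

OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)(.*)")
SIZE_RE = re.compile(r"([\w-]+):(\d+)kB")  # only fields ending in kB


def parse_oom(line: str) -> Optional[dict]:
    """Extract pid, command, and all kB-sized fields from an OOM-killer line."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    pid, command, rest = m.groups()
    sizes = {name: int(value) for name, value in SIZE_RE.findall(rest)}
    return {"pid": int(pid), "command": command, **sizes}


def kib_to_gib(kib: int) -> float:
    return kib / (1024 * 1024)


lines = [
    "kernel: Out of memory: Killed process 698 (java) total-vm:14151300kB, anon-rss:9665812kB, file-rss:320kB, shmem-rss:0kB, UID:997 pgtables:20436kB oom_score_adj:0",
    "kernel: Out of memory: Killed process 720768 (java) total-vm:14079976kB, anon-rss:9301684kB, file-rss:152kB, shmem-rss:0kB, UID:997 pgtables:19488kB oom_score_adj:0",
]
for line in lines:
    event = parse_oom(line)
    print(f"{event['command']} pid {event['pid']}: "
          f"anon-rss {kib_to_gib(event['anon-rss']):.2f} GiB, "
          f"total-vm {kib_to_gib(event['total-vm']):.2f} GiB")
# java pid 698: anon-rss 9.22 GiB, total-vm 13.50 GiB
# java pid 720768: anon-rss 8.87 GiB, total-vm 13.43 GiB
```

Both kills show the Java process holding roughly 9 GiB of anonymous memory when the kernel stepped in.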

2024-02-25

  • This morning Tomcat died while I was doing a CSV export, with an OOM kill from the kernel:
kernel: Out of memory: Killed process 720768 (java) total-vm:14079976kB, anon-rss:9301684kB, file-rss:152kB, shmem-rss:0kB, UID:997 pgtables:19488kB oom_score_adj:0
  • I don’t know why this is happening so often recently…

2024-02-27

  • IFPRI sent me a list of authors to add to our controlled vocabulary for now, until we find a better way of doing it
    • I extracted the existing authors from our controlled vocabulary and combined them with IFPRI’s:
$ xmllint --xpath '//node/isComposedBy/node()' dspace/config/controlled-vocabularies/dc-contributor-author.xml \
  | grep -oE 'label=".*"' \
  | sed -e 's/label="//' -e 's/"$//' > /tmp/authors
$ cat /tmp/authors /tmp/ifpri-authors | sort -u > /tmp/new-authors
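The `grep`/`sed` pipeline above works as long as each node sits on its own line with `label` as its last attribute, but it leaves XML entities such as `&amp;` escaped. Here is a sketch of the same extraction and merge in Python using the standard library's `xml.etree.ElementTree`, which decodes entities. It assumes the vocabulary uses the usual DSpace `node`/`isComposedBy` nesting, and the function names are hypothetical. Python's `sorted()` orders by code point, so the result may be ordered differently from `sort -u` in a UTF-8 locale.

```python
import xml.etree.ElementTree as ET


def vocabulary_labels(source) -> list:
    """Return the label of every node under an isComposedBy element.

    `source` can be a file path or a file object, as accepted by ET.parse.
    iter() includes the root itself, so the top-level <node> is covered.
    """
    root = ET.parse(source).getroot()
    labels = []
    for parent in root.iter("isComposedBy"):
        for child in parent.findall("node"):
            label = child.get("label")
            if label:
                labels.append(label)
    return labels


def merge_authors(existing, new) -> list:
    """Combine both lists, dropping blanks and exact duplicates, then sort."""
    return sorted({a.strip() for a in [*existing, *new] if a.strip()})
```

To reproduce `/tmp/new-authors`, call `vocabulary_labels()` on `dc-contributor-author.xml`, then pass the result together with the lines of `/tmp/ifpri-authors` to `merge_authors()`.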

2024-02-28

  • I figured out a way to add a new Angular component to handle all our relation fields

2024-02-29

  • Clean up a bunch of metadata on CGSpace