
METS Tools

Published 2019-07-13 16:30


The experimental (incomplete) METS export tool writes DSpace items to a filesystem with the metadata held in a more standard format based on METS.

The Export Tool

The METS export tool is invoked from the command line, for example:

    [dspace]/bin/dsrun org.dspace.app.mets.METSExport --help

The tool can export an individual item, the items within a given collection, or everything in the DSpace instance. To export an individual item, use:

    [dspace]/bin/dsrun org.dspace.app.mets.METSExport --item [handle]

To export the items in collection hdl:123.456/789, use:

    [dspace]/bin/dsrun org.dspace.app.mets.METSExport --collection hdl:123.456/789

To export all the items in DSpace, use:

    [dspace]/bin/dsrun org.dspace.app.mets.METSExport --all

With any of the above forms, you can specify the base directory into which the items are exported with --destination [directory]. If this parameter is omitted, the current directory is used.
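If you script exports, it can help to build the command line in one place. The Python sketch below is not part of DSpace: the function names are hypothetical, "[dspace]" is the documentation's placeholder for the install directory, and the rule that exactly one of --item, --collection or --all is given per run is an assumption. Only the flags documented above are used.

```python
import subprocess
from typing import List, Optional

# "[dspace]" is the documentation's placeholder for the DSpace install
# directory; replace it with the real path on your system.
DSRUN = "[dspace]/bin/dsrun"
EXPORT_CLASS = "org.dspace.app.mets.METSExport"


def mets_export_argv(
    item: Optional[str] = None,
    collection: Optional[str] = None,
    export_all: bool = False,
    destination: Optional[str] = None,
    dsrun: str = DSRUN,
) -> List[str]:
    """Build the argument list for one METS export run (hypothetical helper)."""
    # Assumption: exactly one export scope is given per run.
    scopes = [item is not None, collection is not None, export_all]
    if sum(scopes) != 1:
        raise ValueError("give exactly one of item, collection or export_all")
    argv = [dsrun, EXPORT_CLASS]
    if item is not None:
        argv += ["--item", item]
    elif collection is not None:
        argv += ["--collection", collection]
    else:
        argv.append("--all")
    # Without --destination the tool writes into the current directory.
    if destination is not None:
        argv += ["--destination", destination]
    return argv


def run_mets_export(**kwargs) -> None:
    # Passing a list (no shell) means a Handle such as hdl:123.456/789
    # needs no quoting.
    subprocess.run(mets_export_argv(**kwargs), check=True)
```

For example, run_mets_export(collection="hdl:123.456/789", destination="/var/aip") would run the collection export shown above into /var/aip, once DSRUN points at a real installation.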

The AIP Format

Each exported item is written to a separate directory, created under the base directory given on the command line (or under the current directory if --destination is omitted). Each directory is named after the item's Handle, URL-encoded so that the name is legal on the file system. Within each item directory is a mets.xml file containing the METS-encoded metadata for the item. The item's bitstreams are stored in the same directory. Each bitstream file is named after its MD5 checksum. This makes integrity checking easy, and it avoids problems with 'special characters' in original filenames that were legal on the file system the files came from but are illegal on the server's file system. The mets.xml file includes XLink pointers to these bitstream files. An example AIP might look like this:
  • hdl%3A123456789%2F8/
    • mets.xml -- METS metadata
    • 184BE84F293342 -- bitstream
    • 3F9AD0389CB821
    • 135FB82113C32D
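The naming rules above can be sketched in Python as follows. This is an illustration, not the tool's code, and the function names are hypothetical. It assumes that URL-encoding escapes every reserved character, including ':' and '/', which matches the example directory name. It also assumes that the checksum is written as hex. The listing above shows shortened, uppercase names, so the check compares names case-insensitively rather than guessing the tool's exact casing.

```python
import hashlib
import os
from typing import List
from urllib.parse import quote


def item_dir_name(handle: str) -> str:
    """URL-encode a Handle so it can be used as a directory name."""
    # safe="" makes quote() escape ':' and '/' too,
    # e.g. hdl:123456789/8 -> hdl%3A123456789%2F8
    return quote(handle, safe="")


def bitstream_file_name(data: bytes) -> str:
    # Assumption: hex MD5 digest (lowercase here; the tool may use uppercase).
    return hashlib.md5(data).hexdigest()


def verify_item_dir(path: str) -> List[str]:
    """Return the bitstream files whose MD5 does not match their filename."""
    bad = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if name == "mets.xml" or not os.path.isfile(full):
            continue
        digest = hashlib.md5()
        with open(full, "rb") as f:
            # Read in chunks so large bitstreams are not loaded into memory.
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest().lower() != name.lower():
            bad.append(name)
    return bad
```

Because the filename is the checksum, verification needs nothing but the directory itself. No separate manifest has to be kept in sync.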
The contents of the METS in the mets.xml file are as follows:
  • A dmdSec (descriptive metadata section) containing the item's metadata in Metadata Object Description Schema (MODS) XML. The Dublin Core metadata is mapped to MODS because no official qualified Dublin Core XML schema exists yet, and the Library Application Profile of DC that DSpace uses includes some qualifiers that are not part of the DCMI Metadata Terms.
  • An amdSec (administrative metadata section) containing a rights metadata element, which in turn contains the base64-encoded deposit license (the license the submitter granted during the submission process).
  • A fileSec containing a list of the bitstreams in the item. Each bundle becomes a fileGrp. Each bitstream is represented by a file element, which contains an FLocat element with a simple XLink to the bitstream file in the same directory as mets.xml. The file element's attributes carry most of the basic technical metadata for the bitstream. Bitstreams that are thumbnails of, or text extracted from, another bitstream in the item are given the same GROUPID as the bitstream they were derived from, so that clients can tell that the two are related. The OWNERID of each file is the 'persistent' bitstream identifier assigned by the DSpace instance. The ID and GROUPID attributes consist of the item's Handle together with the bitstream's sequence ID, with underscores in place of dots and slashes. For example, a bitstream with sequence ID 24 in the item hdl:123.456/789 will have the ID 123_456_789_24. The substitution is needed because ID and GROUPID attributes must be of type xsd:ID.
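A client reading these files might derive IDs, group derived bitstreams with their originals, and decode the deposit license roughly as in the Python sketch below. The function names and the sample document are hypothetical, and the sample is heavily trimmed. Dropping the "hdl:" prefix follows the documented example. The fileGrp USE values, and the placement of the license in a METS mdWrap/binData element, are assumptions rather than confirmed details of the export tool's output. The two namespace URIs are the standard METS and XLink namespaces.

```python
import base64
import xml.etree.ElementTree as ET
from collections import defaultdict

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def mets_id(handle: str, sequence_id: int) -> str:
    """Derive an ID/GROUPID value from a Handle and a bitstream sequence ID."""
    # Assumption: the "hdl:" prefix is dropped, as in the documented example.
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    return handle.replace(".", "_").replace("/", "_") + "_" + str(sequence_id)


def files_by_group(mets_xml: str) -> dict:
    """Map each GROUPID to the XLink hrefs of the file elements sharing it."""
    root = ET.fromstring(mets_xml)
    groups = defaultdict(list)
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        loc = file_el.find(f"{{{METS_NS}}}FLocat")
        if loc is not None:
            groups[file_el.get("GROUPID")].append(loc.get(f"{{{XLINK_NS}}}href"))
    return dict(groups)


def deposit_license(mets_xml: str) -> str:
    """Decode the base64 deposit license held under the rights metadata."""
    root = ET.fromstring(mets_xml)
    rights = root.find(f".//{{{METS_NS}}}rightsMD")
    # Assumption: the license sits in a METS binData element as UTF-8 text.
    data = rights.find(f".//{{{METS_NS}}}binData") if rights is not None else None
    if data is None or not data.text:
        raise ValueError("no base64-encoded license found")
    return base64.b64decode(data.text.strip()).decode("utf-8")


# Hypothetical, trimmed mets.xml: a PDF plus the text extracted from it,
# which shares the PDF's GROUPID.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <amdSec ID="amd_1">
    <rightsMD ID="rights_1">
      <mdWrap MDTYPE="OTHER"><binData>TGljZW5zZSB0ZXh0</binData></mdWrap>
    </rightsMD>
  </amdSec>
  <fileSec>
    <fileGrp USE="ORIGINAL">
      <file ID="123_456_789_1" GROUPID="123_456_789_1" MIMETYPE="application/pdf">
        <FLocat LOCTYPE="URL" xlink:href="184BE84F293342"/>
      </file>
    </fileGrp>
    <fileGrp USE="TEXT">
      <file ID="123_456_789_2" GROUPID="123_456_789_1" MIMETYPE="text/plain">
        <FLocat LOCTYPE="URL" xlink:href="3F9AD0389CB821"/>
      </file>
    </fileGrp>
  </fileSec>
</mets>"""
```

With this sample, files_by_group(SAMPLE) maps "123_456_789_1" to both bitstream files. This is how a client could recognise the extracted text as belonging to the PDF.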

Limitations

  • No corresponding import tool yet
  • No structmap section
  • Some technical metadata not written, e.g. the primary bitstream in a bundle, original filenames or descriptions.
  • Only the MIME type is stored, not the (finer grained) bitstream format.
  • Dublin Core to MODS mapping is very simple, probably needs verification
