The experimental (incomplete) METS export tool writes DSpace items to a filesystem with the metadata held in a more standard format based on METS.
The Export Tool
The METS export tool is invoked via the command line like this:
[dspace]/bin/dsrun org.dspace.app.mets.METSExport --help
The tool can export an individual item, the items within a given collection, or everything in the DSpace instance. To export an individual item, use:
[dspace]/bin/dsrun org.dspace.app.mets.METSExport --item [handle]
To export the items in collection hdl:123.456/789, use:
[dspace]/bin/dsrun org.dspace.app.mets.METSExport --collection hdl:123.456/789
To export all the items in DSpace, use:
[dspace]/bin/dsrun org.dspace.app.mets.METSExport --all
With any of the above forms, you can specify the base directory into which the items will be exported, using --destination [directory]. If this parameter is omitted, the current directory is used.
The AIP Format
Each exported item is written to a separate directory, created under the base directory specified in the command-line arguments, or in the current directory if --destination is omitted. The name of each directory is the Handle, URL-encoded so that the directory name is 'legal'.
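As a minimal sketch of this naming rule (an illustration in Python, not the export tool's own code), percent-encoding a Handle with no characters left unescaped yields a directory name of the kind shown in the example AIP. The function name item_directory_name is hypothetical, and the exact set of characters the tool escapes is an assumption.

```python
from urllib.parse import quote, unquote


def item_directory_name(handle: str) -> str:
    """Percent-encode a Handle so it can be used as a directory name.

    safe="" forces ':' and '/' to be escaped too. That the export tool
    escapes exactly these characters is an assumption.
    """
    return quote(handle, safe="")


name = item_directory_name("hdl:123456789/8")
print(name)           # hdl%3A123456789%2F8
print(unquote(name))  # hdl:123456789/8
```

Decoding the directory name with unquote recovers the original Handle, which is useful when walking an export directory.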
Within each item directory is a mets.xml file, which contains the METS-encoded metadata for the item. Bitstreams in the item are also stored in the directory. Their filenames are their MD5 checksums, both for easy integrity checking and to avoid problems with 'special characters' in filenames that were legal on the original file system but are illegal on the server's file system. The mets.xml file includes XLink pointers to these bitstream files.
An example AIP might look like this:
hdl%3A123456789%2F8/
    mets.xml          -- METS metadata
    184BE84F293342    -- bitstream
    3F9AD0389CB821    -- bitstream
    135FB82113C32D    -- bitstream
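Because each bitstream file is named after its MD5 checksum, an exported item can be checked by recomputing the digests. The following Python sketch assumes that each filename is the full hexadecimal digest and compares it case-insensitively; the names in the example above are shorter than a full MD5 digest and are presumably abbreviated. Both md5_hex and verify_item_directory are hypothetical helpers, not part of DSpace.

```python
import hashlib
from pathlib import Path


def md5_hex(path: Path, chunk_size: int = 65536) -> str:
    """Return the MD5 digest of a file as lowercase hex, reading in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_item_directory(item_dir: Path) -> dict:
    """Map each bitstream filename to True if it equals the file's MD5.

    mets.xml is skipped because it holds metadata, not a bitstream.
    Assumes the filename is the complete hex digest (any letter case).
    """
    results = {}
    for entry in sorted(item_dir.iterdir()):
        if entry.is_file() and entry.name != "mets.xml":
            results[entry.name] = md5_hex(entry) == entry.name.lower()
    return results
```

Calling verify_item_directory(Path("hdl%3A123456789%2F8")) on an exported item directory would return one entry per bitstream; any False value indicates a file whose content no longer matches its name.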
The contents of the METS in the mets.xml file are as follows:
- A dmdSec (descriptive metadata section) containing the item's metadata in Metadata Object Description Schema (MODS) XML. The Dublin Core descriptive metadata is mapped to MODS because there is not yet an official qualified Dublin Core XML schema, and the Library Application Profile of DC that DSpace uses includes some qualifiers that are not part of the DCMI Metadata Terms.
- An amdSec (administrative metadata section), which contains a rights metadata element, which in turn contains the base64-encoded deposit license (the license the submitter granted as part of the submission process).
- A fileSec containing a list of the bitstreams in the item. Each bundle constitutes a fileGrp. Each bitstream is represented by a file element, which contains an FLocat element with a simple XLink to the bitstream in the same directory as the mets.xml file. The file attributes consist of most of the basic technical metadata for the bitstream. Additionally, bitstreams that are thumbnails or text extracted from another bitstream in the item ('derived' bitstreams) have the same GROUPID as the bitstream they were derived from, so that clients can tell that the two are related.
  The OWNERID of each file is the 'persistent' bitstream identifier assigned by the DSpace instance. The ID and GROUPID attributes consist of the item's Handle together with the bitstream's sequence ID, with underscores used in place of dots and slashes. For example, a bitstream with sequence ID 24 in the item hdl:123.456/789 will have the ID 123_456_789_24. This is because ID and GROUPID attributes must be of type xsd:ID.
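The identifier rule can be sketched as follows. This Python illustration assumes that the hdl: prefix is dropped, as the example above suggests, and that only dots and slashes are replaced; mets_file_id is a hypothetical name, not a DSpace function.

```python
def mets_file_id(handle: str, sequence_id: int) -> str:
    """Build a file ID from an item's Handle and a bitstream sequence ID."""
    # Drop the "hdl:" prefix if present (assumed from the documented example).
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    # Replace dots and slashes with underscores, as the text describes.
    sanitized = handle.replace(".", "_").replace("/", "_")
    return f"{sanitized}_{sequence_id}"


print(mets_file_id("hdl:123.456/789", 24))  # 123_456_789_24
```

A derived bitstream, such as a thumbnail, would carry its own ID built from its own sequence ID, while its GROUPID would be the value computed for the bitstream it was derived from.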
Limitations
- No corresponding import tool yet
- No structMap section
- Some technical metadata not written, e.g. the primary bitstream in a bundle, original filenames or descriptions.
- Only the MIME type is stored, not the (finer grained) bitstream format.
- Dublin Core to MODS mapping is very simple, probably needs verification