1 Rotch Visual Collection Digitization Project Description DRAFT: The purpose of this project is to provide the teaching content required by our faculty in digital format (slide technology is rapidly disappearing). The RVC constitutes a dynamic teaching collection of approximately 350,000 slides. We are following current recommendations for digitizing teaching collections that consist mostly of copywork slides. In addition, RVC has purchased approximately 5,776 digital images from vendors so far. The intent is not to digitize the entire collection but to target specifically those images used for teaching. The end goal is to create a production environment in support of the ongoing digitization of the slide collection. The collection will be served from an instance of DSpace separate from that of "DSpace@MIT". We are coordinating our work to load images into DSpace with Academic Media Production Services (AMPS), who are working on the web classroom presentation tool, Stellar Images. More Stellar functional specifications and project information can be found at: web.mit.edu—image-scope.html
2 Deliverables:
1 Digitize slides in support of course curriculum
2 Create RVC production workflow for image processing and metadata creation
3 Support best-practices for metadata standards for slide image collections (VRA Core)
4 Export records in metadata format suitable for import into DSpace
1 Initially this will be Dublin Core
2 Include metadata to support relationships between the parts of complex, multi-part objects
3 Will move towards creation of METS-based (Metadata Encoding and Transmission Standard) submission packages
5 Develop batch DSpace import scripts for RVC images
6 Academic Media Production Services will develop classroom presentation tools
7 Coordinate with Academic Media Production Services on development of above tools
8 Serve image objects from (RVC) DSpace production instance
3 Legal Considerations
1 Images from the slide collection are restricted to MIT community members
2 Licenses with vendors, such as Saskia, simply call for restricting access to the MIT community through some kind of authentication. Thumbnails can be displayed openly (this comes from a court ruling and is also stated in our vendor licenses); larger files must be restricted to MIT community use.
4 Authentication
1 Access restricted to MIT community members (set at the collection level through the DSpace admin interface)
2 Once set, this will require all users to authenticate using certificates, whether they are on or off campus
3 Stellar will handle authentication for some images [?]
4 Stellar will keep copies of the images used for class presentation.
5 Image Processing
1 The bulk of the images in the RVC teaching collection are copystand work (photographed from books), not high-resolution photographs or scans from the original object (significant variations in quality)
2 Images are scanned at resolutions that support thumbnail retrieval, screen projection display, and a master image; these are:
300 dpi from Boston Photo, and 72 dpi from vendor images [but we are scanning at 3000 pixels on long side for the largest file (some vendor images are *at least* 3000 pixels on long side)?]
1 Image clean-up includes monitor color calibration, basic color correction of scans if necessary, cropping, etc.
2 Need to record precisely what technical specifications are being adhered to with regard to clean-up
3 Image files are renamed to match the IRIS Image number. The original Boston Photo filename is referenced both in the metadata that Photoshop embeds in the image file itself and in the IRIS record. The file names on the Boston Photo DVD remain unchanged and are also cross-referenced on the digitization log sheets. The RVC Photoshop technician renames the image files, and a member of the staff double-checks the new names in order to spot potential errors.
4 Approx 3,800 records have been scanned by Boston Photo to date
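The three image sizes mentioned above (thumbnail, screen projection, master) can be derived from one scan by scaling to a target long side. The following is a minimal sketch only: the 3000-pixel master size comes from the note above, while the "screen" and "thumbnail" values and all function names are hypothetical placeholders, not settled RVC specifications.

```python
def fit_long_side(width, height, target):
    """Scale (width, height) so the longer side equals target pixels.

    Images already at or below the target are returned unchanged,
    so small files are never enlarged.
    """
    long_side = max(width, height)
    if long_side <= target:
        return width, height
    return round(width * target / long_side), round(height * target / long_side)


# Long-side targets in pixels. Only "master" is taken from the project notes;
# "screen" and "thumbnail" are assumed values for illustration.
DERIVATIVES = {"master": 3000, "screen": 1024, "thumbnail": 150}


def derivative_sizes(width, height):
    """Return the pixel dimensions of each derivative for one source image."""
    return {name: fit_long_side(width, height, target)
            for name, target in DERIVATIVES.items()}
```

For example, `derivative_sizes(4500, 3000)` gives a 3000 x 2000 master under these assumptions; the actual resizing would still be done in the image-processing tool.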
6 Metadata - Select/Identify fields for DC export/display
1 Dublin Core field list: title, Title.alternative, creator, subject, Coverage.spatial, Coverage.temporal, Date.created, source, Format.medium, rights, Rights.accessRights, Relation.hasPart, Relation.isVersionOf, Relation.isPartOf, Relation.isReferencedBy, Relation.hasVersion
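DSpace stores each Dublin Core value as an element plus an optional qualifier, so the dotted names in the field list above have to be split before export. A minimal sketch, assuming that unqualified fields use the qualifier "none" and that element names are lowercased; `split_dc_field` is a hypothetical helper name.

```python
# Field names exactly as listed above.
DC_FIELDS = [
    "title", "Title.alternative", "creator", "subject",
    "Coverage.spatial", "Coverage.temporal", "Date.created", "source",
    "Format.medium", "rights", "Rights.accessRights",
    "Relation.hasPart", "Relation.isVersionOf", "Relation.isPartOf",
    "Relation.isReferencedBy", "Relation.hasVersion",
]


def split_dc_field(name):
    """Split 'Coverage.spatial' into ('coverage', 'spatial').

    Unqualified names such as 'title' get the qualifier 'none'.
    The qualifier keeps its original case (e.g. 'accessRights').
    """
    element, _, qualifier = name.partition(".")
    return element.lower(), qualifier or "none"


# Element/qualifier pairs for the whole field list.
DC_PAIRS = [split_dc_field(name) for name in DC_FIELDS]
```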
7 Exporting from Filemaker to DSpace
1 Export IRIS records from FileMaker Pro as XML
2 Collapse all associated fields into "flat" records and export to an .xml file using the FMPXMLRESULT schema
3 Apply an XSLT stylesheet transformation to produce qualified Dublin Core records for batch import into DSpace
4 Load frequency:
1 At least weekly (or more frequently as necessary)
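The export-and-transform steps above can be illustrated in a few lines. This sketch uses Python rather than XSLT, purely to show the mapping from FMPXMLRESULT rows to DSpace dublin_core.xml records; it is not the project's stylesheet. The FileMaker field names ("Title", "Creator"), the sample record, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

FMP_NS = {"fmp": "http://www.filemaker.com/fmpxmlresult"}

# Made-up sample export; real IRIS field names will differ.
SAMPLE = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <METADATA>
    <FIELD NAME="Title"/>
    <FIELD NAME="Creator"/>
  </METADATA>
  <RESULTSET FOUND="1">
    <ROW RECORDID="1">
      <COL><DATA>Villa Savoye</DATA></COL>
      <COL><DATA>Le Corbusier</DATA></COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>"""


def fmp_rows(fmp_xml):
    """Yield one dict per ROW, mapping field name -> list of values.

    FMPXMLRESULT lists field names once in METADATA; the COL elements
    of each ROW follow the same order.
    """
    root = ET.fromstring(fmp_xml)
    names = [f.get("NAME") for f in root.findall("fmp:METADATA/fmp:FIELD", FMP_NS)]
    for row in root.findall("fmp:RESULTSET/fmp:ROW", FMP_NS):
        record = {}
        for name, col in zip(names, row.findall("fmp:COL", FMP_NS)):
            record[name] = [d.text for d in col.findall("fmp:DATA", FMP_NS) if d.text]
        yield record


def to_dublin_core(record, field_map):
    """Build the text of a dublin_core.xml file for one record.

    field_map maps a FileMaker field name to an (element, qualifier) pair.
    """
    dc = ET.Element("dublin_core")
    for fmp_name, (element, qualifier) in field_map.items():
        for value in record.get(fmp_name, []):
            dcvalue = ET.SubElement(dc, "dcvalue", element=element, qualifier=qualifier)
            dcvalue.text = value
    return ET.tostring(dc, encoding="unicode")


# Hypothetical mapping from IRIS field names to Dublin Core.
FIELD_MAP = {"Title": ("title", "none"), "Creator": ("creator", "none")}
```

Calling `to_dublin_core(record, FIELD_MAP)` for each record from `fmp_rows(SAMPLE)` yields one `dublin_core` document per IRIS record; a weekly load would run this over the full export.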
8 Building Submission Packages (DSpace archive format) for batch load
1 Create submission directories by accession number conforming to DSpace archive format (each directory contains dublin_core.xml, contents, <accession_no.jpg> files)
2 Will automate the metadata building and submission package building process
3 Submission packages will be loaded into the RVC Community on the test DSpace server, hedwig.mit.edu
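The package layout described above (one directory per accession number holding dublin_core.xml, a contents file, and the image) could be automated along these lines. This is a sketch under assumptions: `build_package` and its arguments are hypothetical, one JPEG per item is assumed, and the contents file is written as one filename per line.

```python
import shutil
from pathlib import Path


def build_package(batch_dir, accession_no, dublin_core_xml, image_path):
    """Create one item directory in DSpace archive format.

    Layout produced:
        <batch_dir>/<accession_no>/dublin_core.xml
        <batch_dir>/<accession_no>/contents
        <batch_dir>/<accession_no>/<accession_no>.jpg
    """
    item_dir = Path(batch_dir) / accession_no
    item_dir.mkdir(parents=True, exist_ok=True)

    # Descriptive metadata for the item.
    (item_dir / "dublin_core.xml").write_text(dublin_core_xml, encoding="utf-8")

    # Copy the image under its accession-number filename.
    image_name = f"{accession_no}.jpg"
    shutil.copyfile(image_path, item_dir / image_name)

    # The contents file lists the item's bitstream files, one per line.
    (item_dir / "contents").write_text(image_name + "\n", encoding="utf-8")
    return item_dir
```

The resulting batch directory is what the DSpace batch importer would be pointed at on the test server.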
9 Workflow Management /Tracking
1 RVC uses a work form to keep track of what's been scanned and loaded so far
2 Quality Control - need automated tool for checking the accuracy of renamed files; keep log files for metadata export, submission package contents, etc.
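One possible shape for the automated rename check is sketched below. It assumes the digitization log is available as a CSV file with a column named `iris_number` (a hypothetical layout) and that renamed files are named `<IRIS number>.jpg`. It only compares names; it does not inspect image content.

```python
import csv
from collections import Counter
from pathlib import Path


def check_renamed_files(log_csv, image_dir):
    """Compare the digitization log against the renamed image files.

    Returns a dict with three sorted lists:
      missing    - files the log expects but the directory lacks
      unexpected - files in the directory that the log does not mention
      duplicates - IRIS numbers that appear more than once in the log
    """
    with open(log_csv, newline="", encoding="utf-8") as f:
        numbers = [row["iris_number"].strip() for row in csv.DictReader(f)]

    expected = {f"{n}.jpg" for n in numbers}
    present = {p.name for p in Path(image_dir).glob("*.jpg")}
    counts = Counter(numbers)

    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
    }
```

Writing the returned report to a dated log file after each batch would also cover the log-keeping part of the requirement.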
10 Production
1 Install and configure separate DSpace instance for RVC images
2 Define RVC staff procedures
3 Support
11 Next steps:
1 Export to DSpace using qualified Dublin Core
2 Decide how extensively we can use VRA in DSpace and what changes this implies for AMPS
3 Install second instance of DSpace on test server
4 Build VRA Core 4 XML records from IRIS records
5 Create VRA to MODS crosswalk
6 Create METS record for export to DSpace
1 MODS (descriptive)
2 Technical Metadata
3 Structmap
4 Filesec
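The four parts of the planned METS record listed above map onto the METS sections dmdSec, amdSec/techMD, fileSec, and structMap. The skeleton below is an illustrative sketch only: the IDs, the single-file layout, the "master" file group, and the empty metadata wrappers are assumptions, and the output has not been validated against the METS schema or any DSpace ingest profile.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(tag):
    """Return tag qualified with the METS namespace."""
    return f"{{{METS}}}{tag}"


def build_mets(accession_no, image_name):
    """Build a bare METS skeleton for one image; returns the XML as text."""
    mets = ET.Element(q("mets"), OBJID=accession_no)

    # 1. Descriptive metadata: the MODS record would go inside xmlData.
    dmd = ET.SubElement(mets, q("dmdSec"), ID="dmd1")
    dmd_wrap = ET.SubElement(dmd, q("mdWrap"), MDTYPE="MODS")
    ET.SubElement(dmd_wrap, q("xmlData"))

    # 2. Technical metadata: the scheme is undecided, so MDTYPE is OTHER here.
    amd = ET.SubElement(mets, q("amdSec"), ID="amd1")
    tech = ET.SubElement(amd, q("techMD"), ID="tech1")
    tech_wrap = ET.SubElement(tech, q("mdWrap"), MDTYPE="OTHER")
    ET.SubElement(tech_wrap, q("xmlData"))

    # 3. File section: one image file, linked to its technical metadata.
    file_sec = ET.SubElement(mets, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"), USE="master")
    file_el = ET.SubElement(group, q("file"), ID="file1",
                            MIMETYPE="image/jpeg", ADMID="tech1")
    ET.SubElement(file_el, q("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK}}}href": image_name})

    # 4. Structural map: ties the descriptive record to the file.
    struct_map = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), DMDID="dmd1")
    ET.SubElement(div, q("fptr"), FILEID="file1")

    return ET.tostring(mets, encoding="unicode")
```

Multi-part objects would need additional `file` entries and nested `div` elements in the structMap; that is left out of this sketch.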
12 Links
1 VRA - www.vraweb.org—vracore3.htm
2 Dublin Core - www.dublincore.org
3 METS - www.loc.gov—mets
4 PREMIS - www.loc.gov—index.html
5 DSpace - www.dspace.org