QA Report

Modified on 2011/02/14 16:22 by lrobinson Categorized as Batch QA

About

QA_Report was written to compare WSL's newspaper input data (i.e. data created for NDNP during reel evaluation) to the NDNP output after data conversion (i.e. the batch, mets issue, and mets reel files). You may be able to adapt the script for your own use, but be aware that these instructions are written for data from an MS Access 2007 database on a Windows operating system.

Download

QA_Report.tar v1 or QA_Report.zip v1

Feel free to download, but please help us improve it by letting us know if you find bugs.

Install

Extract QA_Report and save it on your server or external hard drive. The script works best (and runs faster) if you save the folder on the same drive as the batch you are testing (e.g. C:\NDNP\OUT\batch_wa_2009_sample and C:\NDNP\OUT\QA\QA_Report).
save to same folder as the batch

FYI: the pictured batch is one reel with three titles

How to Run

  1. To run QA_Report you must first save your input data as xml in the QA_Report folder. Access tables are easy to convert to xml (see sample_database.mdb)
    1. Our input data is delivered in MS Access in three tables (titleTbl, reelTbl, issueTbl)
    2. From Access save each table as xml into the QA_Report folder (e.g. Export > More > XML File > C:\NDNP\OUT\QA\QA_Report\reelTbl.xml)
      1. Click OK
      2. Export XML window will open, save 'Data (XML)' only (uncheck 'Schema of the data (XSD)' - this isn't needed)
      3. Click OK
      4. Save Export Steps if you like, click Close
      5. You may close the database
  2. QA_Report should now have three xml files (titleTbl.xml, reelTbl.xml, issueTbl.xml)
    NOTE: You may get errors if your data isn't structured in the same way as the sample data. If you know xsl, you're welcome to tweak the code to meet your needs. For example, if your field names are different, simply change the variable names in the titleTbl.xsl, reelTbl.xsl, and issueTbl.xsl files in QA_Report.
    save data to xml files

  3. Now run the QA_Report using the BATCH.xml file
    1. Open your computer's MS-DOS command prompt
    2. Start > Run > type 'cmd' > click 'OK'
    3. Navigate to QA_Report (cd C:\NDNP\OUT\QA\QA_Report)
    4. At the prompt, run run_report.bat with the path to the BATCH.xml file as a parameter. Use single quotes as shown here (e.g. run_report '..\batch_wa_2009_sample\BATCH.xml')
      run the script from the command line

    5. The script may take a few minutes, depending on the size of your dataset. When the comparison is done, a browser window will open to 'Test: titleTbl'
      when the script finishes the first test table will open in a browser
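
Before running the report, it can help to confirm that the three exports from step 1 are in place and contain the fields you expect. The following is a minimal Python sketch, not part of QA_Report. It assumes the usual Access 'Data (XML)' layout (a root element containing one element per row, named after the table, with one child element per field); the function name summarize_exports is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Table names come from the instructions above; everything else is illustrative.
TABLES = ("titleTbl", "reelTbl", "issueTbl")


def summarize_exports(folder):
    """Return {table: (row_count, sorted field names)} for each exported table.

    Assumption: each file has a root element whose children are row
    elements named after the table, with one child element per field.
    Raises FileNotFoundError if an export is missing.
    """
    summary = {}
    for table in TABLES:
        path = Path(folder) / f"{table}.xml"
        if not path.is_file():
            raise FileNotFoundError(f"missing export: {path}")
        root = ET.parse(path).getroot()
        # Keep only the row elements for this table.
        rows = [el for el in root if el.tag == table]
        # Collect every field name that appears in at least one row.
        fields = sorted({child.tag for row in rows for child in row})
        summary[table] = (len(rows), fields)
    return summary
```

For example, summarize_exports(r"C:\NDNP\OUT\QA\QA_Report") would list the row count and field names per table, which you can compare against the field names used in sample_database.mdb.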



Understanding the Results

Links in all the tables open the corresponding mets file (issue or reel) or the BATCH.xml file in the browser, so you can investigate any differences found in the output data.

Test: titleTbl

This table is a re-creation of titleTbl in the database.
  1. Titles in red are missing from the BATCH.xml file
  2. The links at the bottom of the page will take you to the next test results

Test: reelTbl

This table is a re-creation of reelTbl in the database.
  1. Reels in red are those whose reelNumber is missing from the BATCH.xml file
  2. The data in this table is compared to the mets reel files throughout the batch (crawled via the BATCH.xml file using ndnp:reel)
  3. Cells in yellow are where there is a mismatch in the output
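
The "red" check above can be approximated outside QA_Report. The Python sketch below is an illustration only, not the script's actual xsl logic. It assumes BATCH.xml uses the namespace http://www.loc.gov/ndnp with a reelNumber attribute on each reel element, and that reelTbl.xml has a field named reelNumber; adjust both to match your data.

```python
import xml.etree.ElementTree as ET

# Assumption: BATCH.xml elements live in this namespace.
NDNP_NS = "{http://www.loc.gov/ndnp}"


def reels_missing_from_batch(reel_tbl_xml, batch_xml, field="reelNumber"):
    """Return reel numbers listed in reelTbl.xml but absent from BATCH.xml.

    `field` is the (assumed) name of the reel number column in reelTbl.
    """
    batch_root = ET.parse(batch_xml).getroot()
    # Reel numbers declared in the batch file.
    in_batch = {
        el.get("reelNumber", "").strip()
        for el in batch_root.iter(NDNP_NS + "reel")
    }
    missing = []
    # Each reelTbl element is one exported database row.
    for row in ET.parse(reel_tbl_xml).getroot().iter("reelTbl"):
        value = (row.findtext(field) or "").strip()
        if value and value not in in_batch:
            missing.append(value)
    return missing
```

Any reel number returned here corresponds to a row that would be expected to show in red in the reelTbl test.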

Test: issueTbl

This table is a re-creation of issueTbl in the database.
  1. Issues in red are those whose lccn is missing from the BATCH.xml file
  2. The data in this table is compared to the mets issue files throughout the batch (crawled via the BATCH.xml file using ndnp:issue)
  3. Cells in yellow are where there is a mismatch in the output
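
The "yellow" cells in the reel and issue tests mark fields where the database value and the mets value disagree. A minimal Python sketch of that kind of field-by-field comparison is shown below. It is an illustration under assumptions, not QA_Report's own code: the function name mismatched_fields and the example field names are hypothetical, and whitespace trimming is a choice made here.

```python
def mismatched_fields(expected, actual):
    """Return sorted names of fields whose values differ.

    `expected` holds one database row, `actual` the values read from the
    output file. Values are compared as strings after trimming
    whitespace; a field missing from `actual` counts as a mismatch.
    """
    differing = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None or str(want).strip() != str(got).strip():
            differing.append(name)
    return sorted(differing)


# Hypothetical field names and values, for illustration only.
row = {"issueDate": "1909-01-01", "editionOrder": "01"}
output = {"issueDate": "1909-01-02", "editionOrder": " 01 "}
print(mismatched_fields(row, output))
```

Here only issueDate is reported, because the editionOrder values match once whitespace is trimmed.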

Known issues

  1. Need to fix the test for ndnp:startDate and ndnp:endDate in the reel mets test (an error is created when a reel is split between two titles).
  2. Need to fix the test for density readings.
  3. Need to build a way to test duplicate pages, where MODS:detail@type="page number"/MODS:number should not equal MODS:detail@type="pages"/MODS:start (since WSL delivers duplicates).
