Improve crawl item metadata
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Archivecollections | Fix Committed | High | siznax | |
Bug Description
From Brewster:
hank, can you also look at the books metadata and see what we can gen up for web that is related (I know the conflict between keeping the same tag, but not quite right vs making a new "right" tag that does not integrate)... I will opt more towards overloading existing tags.
example current web metadata:
http://
example book:
http://
<collection>
<collection>
* can we have in the first collection's item the description of the crawl parameters and what version of the crawler was used?
* I think we should have a scanner field that would be the hostname, e.g. ia3706.
* What would go in the creator field? is that the organization such as "Internet Archive" and "Alexa Internet"?
* scandate would be helpful as well. This will then feed into reports better.
* should we have date and year, like books do? <date>1853</date>
<year>1853</year> I don't know why we have 2, so maybe we should just have <date>...
* I think we should have sponsor, and that would be "Internet Archive" or "Alexa Internet" <sponsor>University of Pittsburgh Library System</sponsor>
* I think we should have scanning center. This would be "San Francisco"
<scanningcenter
* <operator><email address hidden></operator>
would be good. <email address hidden> or <email address hidden>
* <imagecount>
* <identifier-access>
http://
</identifier-
would be good as well.
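As a rough illustration of the fields proposed above, the following sketch builds a metadata record for a crawl item, reusing the book tag names (scanner, sponsor, scanningcenter, scandate, date). It is only a sketch under assumptions: the `<metadata>` root element, the `build_crawl_metadata` helper, and the scandate, date, and description values are hypothetical. Only "ia3706", "Internet Archive", and "San Francisco" come from the suggestions in this report.

```python
import xml.etree.ElementTree as ET


def build_crawl_metadata(fields):
    """Build a <metadata> element with one child per (tag, text) pair.

    The <metadata> root is an assumption, not a confirmed schema.
    """
    root = ET.Element("metadata")
    for tag, text in fields.items():
        child = ET.SubElement(root, tag)
        child.text = text
    return root


# Illustrative values only; scandate, date, and description are made up.
example_fields = {
    "creator": "Internet Archive",
    "sponsor": "Internet Archive",
    "scanner": "ia3706",                # hostname of the crawl machine
    "scanningcenter": "San Francisco",
    "scandate": "20080101000000",       # hypothetical timestamp
    "date": "2008",                     # hypothetical; a single date, no separate year
    "description": "crawl parameters and crawler version would go here",
}

metadata = build_crawl_metadata(example_fields)
print(ET.tostring(metadata, encoding="unicode"))
```

Overloading the existing book tags this way keeps web items compatible with reports that already read scanner and scandate, at the cost of tag names that do not quite fit crawls.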
then can we have a meeting with alexis, hank, me, and anyone else that wants to.
metadata ho!
-brewster
Changed in archivecollections:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → siznax (steve-archive)
I believe this has been handled. Please re-open if not.