In the BBC English Regions web CMS I've been involved in building, we have six controlled vocabularies (CVs) for descriptive metadata (location, name, subject, audience, BBC brand and time period), and we manage about 85,000 terms across those CVs. Some of the CVs we sourced from external suppliers and then modified; others we built from scratch. We also have a team that maintains the existing terms and lets our journalists suggest new ones, which the metadata specialists then vet and finish off.
A few of us have some ideas about publishing this metadata; we just need to run it past the search team to check which format we should use. I'm hoping we can just do
<meta name="location" content="BBC/C/Devon,BBC/C/London">
etc. That would be very useful: people could pick it up and do amazing things with it. Maybe we'll do some of that ourselves, but even if we don't, the data is there, ready for someone else to play with.
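As a rough illustration of someone picking the data up, here is a minimal Python sketch of a consumer that harvests term IDs from a page. It assumes the terms are published as a comma-separated list in the `content` attribute (the attribute HTML `<meta>` elements actually use for their value). The class and function names are hypothetical, and the format itself is still undecided.

```python
from html.parser import HTMLParser


class MetaTermParser(HTMLParser):
    """Collect comma-separated CV term IDs from <meta name="..." content="..."> tags."""

    def __init__(self, names):
        super().__init__()
        self.names = set(names)  # which meta names to harvest, e.g. {"location"}
        self.terms = {}          # meta name -> list of term IDs found on the page

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; <meta> is a void element, so only
        # the start tag ever arrives here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name in self.names and content:
            # Split the comma-separated list and drop empty entries.
            ids = [t.strip() for t in content.split(",") if t.strip()]
            self.terms.setdefault(name, []).extend(ids)


def extract_terms(html, names=("location",)):
    """Return {meta name: [term IDs]} for the requested meta names (hypothetical helper)."""
    parser = MetaTermParser(names)
    parser.feed(html)
    parser.close()
    return parser.terms


page = '<html><head><meta name="location" content="BBC/C/Devon,BBC/C/London"></head></html>'
print(extract_terms(page))  # {'location': ['BBC/C/Devon', 'BBC/C/London']}
```

Splitting on commas keeps the consumer simple, but it relies on term IDs never containing a comma themselves.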
As well as the text-based names, we have lat/long data for those locations in an XML file somewhere, so I might be able to get that extracted and put into the HTML as well. We aren't allowed to publish the complete location CV because of licensing restrictions, but we can extract parts of it for specific purposes.
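A minimal sketch of the extraction step, assuming a hypothetical XML layout with one `<location>` element per term carrying `id`, `lat` and `long` attributes (the real file's structure isn't described here). The coordinates are illustrative rather than taken from the BBC data, and the `location.geo` meta name with its `id;lat;long` value format is made up for the example. Only terms present in the extracted dictionary get a tag, which fits publishing pieces of the CV rather than the whole thing.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical layout for the lat/long file; the real file's structure is unknown.
# Illustrative coordinates, not taken from the BBC data.
LOCATIONS_XML = """
<locations>
  <location id="BBC/C/London" lat="51.5074" long="-0.1278"/>
</locations>
"""


def load_coordinates(xml_text):
    """Map each location term ID to a (lat, long) pair of floats."""
    root = ET.fromstring(xml_text.strip())
    coords = {}
    for loc in root.iter("location"):
        coords[loc.get("id")] = (float(loc.get("lat")), float(loc.get("long")))
    return coords


def geo_meta_tags(term_ids, coords):
    """Emit one <meta> tag per tagged location that has known coordinates.

    The name "location.geo" and the "id;lat;long" value are hypothetical.
    """
    tags = []
    for term_id in term_ids:
        if term_id not in coords:
            continue  # no coordinates extracted for this term, so publish nothing
        lat, lon = coords[term_id]
        value = f"{term_id};{lat};{lon}"
        tags.append(f'<meta name="location.geo" content="{html.escape(value)}">')
    return tags


coords = load_coordinates(LOCATIONS_XML)
for tag in geo_meta_tags(["BBC/C/Devon", "BBC/C/London"], coords):
    print(tag)  # <meta name="location.geo" content="BBC/C/London;51.5074;-0.1278">
```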
We haven't really decided on the format yet. It might be worth using RDF or something similar from the start, although I'm not sure how useful that would be unless all our content is well-formed XML. We're pretty close, but I don't think we've sorted out all the issues yet, such as bare ampersands (&) in URLs, which XML requires to be written as &amp;.
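The ampersand problem is easy to demonstrate: in XML a bare `&` starts an entity reference, so a URL query string such as `?q=devon&scope=local` makes the whole fragment fail to parse until the `&` is escaped as `&amp;`. Here is a minimal Python check, using a made-up example.com URL rather than a real BBC one:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical URL with a query string; the bare "&" is what breaks XML parsing.
url = "http://www.example.com/search?q=devon&scope=local"

raw = f'<a href="{url}">Devon</a>'
# quoteattr escapes &, < and > (and quotes as needed) and wraps the value in quotes.
fixed = f"<a href={quoteattr(url)}>Devon</a>"

print(is_well_formed(raw))    # False
print(is_well_formed(fixed))  # True
```

Escaping at the point where URLs are written into the page is usually less fragile than trying to clean up finished markup afterwards.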