Because the information available online is unreliable and scattered across different sites. Although Google Scholar provides a feature for filtering papers from a specific venue, it returns many records that do not belong to that venue and misses others. IEEE Xplore and the ACM Digital Library are more accurate, but some proceedings are still missing (e.g., VISSOFT 2003), they do not seem to provide an API for extracting data programmatically, and scraping the data directly from their sites seems error-prone. Our approach is therefore to extract the data directly from a corpus of PDF files.
For example, the text extracted from the header of one paper:
Explora: A Visualisation Tool for Metric Analysis of Software Corpora
Leonel Merino
Software Composition Group
University of Bern
Bern, Switzerland
merino@iam.unibe.ch
Mircea Lungu
Software Composition Group
University of Bern
ParsHed, the header parser of ParsCit, turns such a header into XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<algorithm name="ParsHed" version="110505">
  <variant no="0" confidence="0.810110">
    <title confidence="0.9953645">Explora: A Visualisation Tool for Metric Analysis of Software Corpora</title>
    <author confidence="0.991777">Leonel Merino</author>
    <affiliation confidence="0.9998805">Software Composition Group University of Bern</affiliation>
    <address confidence="0.978505">Bern, Switzerland</address>
    <email confidence="0.976031">merino@iam.unibe.ch</email>
    <author confidence="0.985589">Mircea Lungu</author>
    <affiliation confidence="0.9998005">Software Composition Group University of Bern</affiliation>
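To read this XML in Pharo, we first load the XMLSupport package:

"Load the Metacello configuration for XMLSupport from SqueakSource, then load its default version."
Gofer new
	squeaksource: 'XMLSupport';
	package: 'ConfigurationOfXMLSupport';
	load.
(Smalltalk at: #ConfigurationOfXMLSupport) perform: #loadDefault.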
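The title and the authors can then be read from the ParsHed output file:

| file xml title authors |
file := 'output.head'.
xml := XMLDOMParser parseFileNamed: file.
"The text of the first <title> element is the paper title."
title := ((xml allElementsNamed: #title) collect: [:e | e nodes first]) first asString.
"Each <author> element holds one author name; convert the text nodes to strings."
authors := ((xml allElementsNamed: #author) collect: [:e | e nodes first asString]) asOrderedCollection.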
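The same calls can be applied to the whole corpus rather than to a single file. The following is a minimal sketch, not the actual extractor. It assumes that ParsHed produced one .head file per paper in the image's default directory and that FileDirectory>>fileNamesMatching: is available (as in Squeak and Pharo 1.x). It also uses a hypothetical 0.9 cut-off on the confidence attribute to drop uncertain author entries; ParsCit does not prescribe that value.

```smalltalk
"Sketch: collect title and authors from every ParsHed *.head file in the default directory.
 The 0.9 confidence threshold is a hypothetical choice, not a ParsCit default."
| minConfidence results |
minConfidence := 0.9.
results := Dictionary new.
(FileDirectory default fileNamesMatching: '*.head') do: [:fileName |
	| xml titles authors |
	xml := XMLDOMParser parseFileNamed: fileName.
	"Text of every <title> element; only the first one is kept below."
	titles := (xml allElementsNamed: #title) collect: [:e | e nodes first asString].
	"Keep only the authors whose confidence attribute reaches the threshold."
	authors := ((xml allElementsNamed: #author)
		select: [:e | (e attributeAt: 'confidence') asNumber >= minConfidence])
		collect: [:e | e nodes first asString].
	"Skip files in which ParsHed found no title."
	titles isEmpty ifFalse: [
		results at: fileName put: titles first -> authors asOrderedCollection]].
results
```

Each entry maps a file name to an association from the title to its authors, which keeps the per-paper grouping without introducing a new class.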
Finally, the visualisation is launched with:

VCExtractor new visualise