civic_scraper.platforms.civic_plus
Submodules
civic_scraper.platforms.civic_plus.parser
- exception civic_scraper.platforms.civic_plus.parser.ParsingError
Bases:
Exception
civic_scraper.platforms.civic_plus.site
- class civic_scraper.platforms.civic_plus.site.Site(base_url, cache=None, parser_kls=<class 'civic_scraper.platforms.civic_plus.parser.Parser'>, place_name=None)
Bases:
Site
- property place
- scrape(start_date=None, end_date=None, cache=False, download=False, file_size=None, asset_list=None, timeout=None)
Scrape a government website for metadata and/or docs.
- Parameters:
start_date (str) – YYYY-MM-DD (default: current day)
end_date (str) – YYYY-MM-DD (default: current day)
cache (bool) – Cache source HTML containing file metadata (default: False)
download (bool) – Download file assets such as PDFs (default: False)
file_size (float) – Maximum size, in megabytes, of file assets to download
asset_list (list) – Optional list of SUPPORTED_ASSET_TYPES to limit the items scraped (e.g. agenda, minutes). (default: [])
- Returns:
A sequence of Asset instances
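The parameters documented above can be prepared and checked before calling scrape(). The following is a minimal sketch, not part of civic_scraper: build_scrape_kwargs is a hypothetical helper, and the base URL in the trailing comment is a placeholder. It relies only on the documented date format (YYYY-MM-DD, defaulting to the current day) and the documented parameter names.

```python
from datetime import date, datetime


def build_scrape_kwargs(start_date=None, end_date=None, download=False,
                        file_size=None, asset_list=None):
    """Hypothetical helper: check inputs and return keyword arguments
    matching the documented Site.scrape() parameters."""
    today = date.today().isoformat()
    # The documentation states both dates default to the current day.
    start = start_date or today
    end = end_date or today
    # strptime raises ValueError if a string is not a valid year-month-day date.
    start_d = datetime.strptime(start, "%Y-%m-%d").date()
    end_d = datetime.strptime(end, "%Y-%m-%d").date()
    if start_d > end_d:
        raise ValueError("start_date must not be after end_date")
    # Re-emit the dates in zero-padded YYYY-MM-DD form.
    kwargs = {
        "start_date": start_d.isoformat(),
        "end_date": end_d.isoformat(),
        "download": download,
    }
    if file_size is not None:
        kwargs["file_size"] = float(file_size)  # documented as a float, in megabytes
    if asset_list:
        kwargs["asset_list"] = list(asset_list)  # e.g. ["agenda", "minutes"]
    return kwargs


# Usage with the library (requires civic_scraper and network access;
# the URL below is a placeholder, not a real site):
#
#   from civic_scraper.platforms.civic_plus.site import Site
#   site = Site("https://example.civicplus.com/AgendaCenter")
#   assets = site.scrape(**build_scrape_kwargs("2023-01-01", "2023-01-31"))
```

Validating the date range up front keeps a malformed or reversed range from reaching the scraper; whether scrape() performs its own validation is not stated in this reference.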
- Return type: