civic_scraper.platforms.civic_plus

Submodules

civic_scraper.platforms.civic_plus.parser

class civic_scraper.platforms.civic_plus.parser.Parser(html)

Bases: object

parse()

exception civic_scraper.platforms.civic_plus.parser.ParsingError

Bases: Exception

civic_scraper.platforms.civic_plus.site

class civic_scraper.platforms.civic_plus.site.Site(base_url, cache=None, parser_kls=<class 'civic_scraper.platforms.civic_plus.parser.Parser'>, place_name=None)

Bases: Site

property place

scrape(start_date=None, end_date=None, cache=False, download=False, file_size=None, asset_list=None, timeout=None)

Scrape a government website for metadata and/or docs.

Parameters:
  • start_date (str) – YYYY-MM-DD (default: current day)

  • end_date (str) – YYYY-MM-DD (default: current day)

  • cache (bool) – Cache source HTML containing file metadata (default: False)

  • download (bool) – Download file assets such as PDFs (default: False)

  • file_size (float) – Maximum size, in megabytes, of file assets to download

  • asset_list (list) – Optional list of SUPPORTED_ASSET_TYPES to limit the items to be scraped (e.g. agenda, minutes). (default: [])

Returns:

A sequence of Asset instances

Return type:

AssetCollection
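
A minimal sketch of how the scrape() parameters above might be assembled and passed to a Site. The helpers build_scrape_kwargs and scrape_recent are hypothetical and not part of civic_scraper. The asset type strings are assumed to match the "agenda" and "minutes" examples given above, and the base URL is whatever CivicPlus Agenda Center address the caller supplies.

```python
from datetime import date, timedelta


def build_scrape_kwargs(days_back=7, today=None, asset_list=None,
                        download=False, file_size=None):
    """Build keyword arguments for Site.scrape() covering the last `days_back` days.

    Hypothetical helper, not part of civic_scraper.
    """
    end = today or date.today()
    start = end - timedelta(days=days_back)
    kwargs = {
        # scrape() documents both dates as YYYY-MM-DD strings,
        # which is what date.isoformat() produces.
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "download": download,
    }
    if asset_list is not None:
        # Assumed to be values from SUPPORTED_ASSET_TYPES, e.g. "agenda".
        kwargs["asset_list"] = list(asset_list)
    if file_size is not None:
        # Documented as a float, in megabytes.
        kwargs["file_size"] = float(file_size)
    return kwargs


def scrape_recent(base_url, **options):
    """Scrape a CivicPlus site for recent assets (hypothetical wrapper)."""
    # Imported here so the helper above can be used without the library installed.
    from civic_scraper.platforms.civic_plus.site import Site

    site = Site(base_url)
    # Per the documentation above, this returns an AssetCollection.
    return site.scrape(**build_scrape_kwargs(**options))
```

Calling scrape_recent(url, days_back=30, asset_list=["agenda", "minutes"]) would then request metadata for the last 30 days without downloading files, since download defaults to False.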