civic_scraper
Subpackages
Submodules
civic_scraper.runner
- class civic_scraper.runner.Runner(cache_path=None)
Bases:
object
Facade class to simplify invocation and usage of scrapers.
Arguments:
cache_path – Path to the cache location for scraped file artifacts
- scrape(start_date, end_date, site_urls=None, cache=False, download=False, timeout=None, platform=None)
Scrape file metadata and assets for a list of agency sites.
For a given scraper, scrapes file artifact metadata and downloads file artifacts. Automatically generates a metadata CSV of file assets.
If requested, caches intermediate file artifacts such as HTML from scraped pages and downloads file assets such as agendas and minutes. Caching and downloading are both optional and off by default.
- Parameters:
start_date (str) – Start date of scrape (YYYY-MM-DD)
end_date (str) – End date of scrape (YYYY-MM-DD)
site_urls (list) – List of site URLs as strings, or dicts with a url key and an optional platform key to override auto-detection per entry.
cache (bool) – Optionally cache intermediate file artifacts such as HTML (default: False)
download (bool) – Optionally download file assets such as agendas (default: False)
timeout (int) – Timeout in seconds for HTTP requests (default: None)
platform (str) – Force a specific platform for all URLs instead of auto-detecting from the URL. Overrides any per-entry platform value in site_urls. Must be a key in PLATFORMS.
- Outputs:
Metadata CSV listing file assets for given sites and params.
- Returns:
AssetCollection instance
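The following is a minimal sketch of how a call to scrape might be assembled from the signature documented above. It is not taken from the library's own examples. The helper check_date_range, the SITE_URLS entries (placeholder example.com addresses), and the platform value "some_platform" are all hypothetical; a real platform value must be a key in PLATFORMS. The check that start_date is not later than end_date is an assumption of this sketch, not documented library behavior.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # the YYYY-MM-DD format documented for start_date/end_date


def check_date_range(start_date, end_date):
    """Hypothetical helper: validate two YYYY-MM-DD strings before scraping."""
    # strptime raises ValueError if a string does not match DATE_FORMAT.
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    # Assumption of this sketch: an inverted range is treated as a caller error.
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    return start_date, end_date


# Placeholder entries for illustration only. Per the parameter docs, each entry
# is either a URL string or a dict with a "url" key and an optional "platform" key.
SITE_URLS = [
    "https://example.com/agency-one",
    {"url": "https://example.com/agency-two", "platform": "some_platform"},
]


def run_scrape(cache_path=None):
    """Call Runner.scrape with the documented arguments and return its result."""
    # Imported lazily so the helper above can be used without civic_scraper installed.
    from civic_scraper.runner import Runner

    start_date, end_date = check_date_range("2023-01-01", "2023-01-31")
    runner = Runner(cache_path=cache_path)
    # cache and download both default to False; caching is enabled here as an example.
    return runner.scrape(
        start_date,
        end_date,
        site_urls=SITE_URLS,
        cache=True,
        download=False,
        timeout=30,
    )
```

Calling run_scrape() would, per the documentation above, return an AssetCollection instance and write the metadata CSV. Passing platform=... to scrape would instead force one platform for every entry, overriding the per-entry platform key.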
- exception civic_scraper.runner.ScraperError
Bases:
Exception
civic_scraper.utils
- civic_scraper.utils.default_user_home()
- civic_scraper.utils.dtz_to_dt(dtz)
- civic_scraper.utils.mb_to_bytes(size_mb)
- civic_scraper.utils.parse_date(date_str, format='%Y-%m-%d')
- civic_scraper.utils.today_local_str()