civic_scraper

Subpackages

Submodules

civic_scraper.runner

class civic_scraper.runner.Runner(cache_path=None)

Bases: object

Facade class to simplify invocation and usage of scrapers.

Arguments:

  • cache_path – Path to the cache location for scraped file artifacts

scrape(start_date, end_date, site_urls=[], cache=False, download=False)

Scrape file metadata and assets for a list of agency sites.

For a given scraper, scrapes file artifact metadata and downloads file artifacts. Automatically generates a metadata CSV of file assets.

If requested, caches intermediate file artifacts such as HTML from scraped pages and downloads file assets such as agendas and minutes. Caching and downloading are both optional and are off by default.

Parameters:
  • start_date (str) – Start date of scrape (YYYY-MM-DD)

  • end_date (str) – End date of scrape (YYYY-MM-DD)

  • site_urls (list) – List of site URLs

  • cache (bool) – Optionally cache intermediate file artifacts such as HTML (default: False)

  • download (bool) – Optionally download file assets such as agendas (default: False)

Outputs:

Metadata CSV listing file assets for given sites and params.

Returns:

AssetCollection instance
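
Example:

A minimal sketch of calling scrape() with the documented signature. The wrapper scrape_date_range, the cache path, and the site URL are hypothetical and not part of civic_scraper; the wrapper accepts any object that exposes a compatible scrape() method, so it can be exercised without network access.

```python
from datetime import datetime


def scrape_date_range(runner, start_date, end_date, site_urls):
    """Hypothetical helper: check the date strings, then delegate to runner.scrape()."""
    # Dates are documented as YYYY-MM-DD strings; fail early if they do not parse.
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    # cache and download are passed explicitly to mirror the documented defaults.
    return runner.scrape(
        start_date,
        end_date,
        site_urls=list(site_urls),
        cache=False,
        download=False,
    )


# With the package installed, a real runner would be built from the documented class:
#     from civic_scraper.runner import Runner
#     runner = Runner(cache_path="/tmp/civic-scraper-cache")  # hypothetical path
#     assets = scrape_date_range(
#         runner, "2020-05-01", "2020-05-31", ["https://example.com/agency"]  # hypothetical URL
#     )
```

Passing a fresh list for site_urls on every call avoids sharing one list object between calls.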

exception civic_scraper.runner.ScraperError

Bases: Exception

civic_scraper.utils

civic_scraper.utils.default_user_home()
civic_scraper.utils.dtz_to_dt(dtz)
civic_scraper.utils.mb_to_bytes(size_mb)
civic_scraper.utils.parse_date(date_str, format='%Y-%m-%d')
civic_scraper.utils.today_local_str()
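
Example:

The utility functions above are listed without descriptions. Judging only from its signature, parse_date plausibly wraps datetime.strptime; the stand-in below, parse_date_sketch, is a hypothetical illustration of that assumption and is not the library's implementation.

```python
from datetime import datetime


def parse_date_sketch(date_str, format="%Y-%m-%d"):
    """Hypothetical stand-in: parse a date string into a datetime using the given format."""
    # The default format matches the YYYY-MM-DD convention used by Runner.scrape().
    return datetime.strptime(date_str, format)


parsed = parse_date_sketch("2020-05-01")
print(parsed.year, parsed.month, parsed.day)
```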