civic_scraper.base

Submodules

civic_scraper.base.asset

class civic_scraper.base.asset.Asset(url: str, asset_name: Optional[str] = None, committee_name: Optional[str] = None, place: Optional[str] = None, place_name: Optional[str] = None, state_or_province: Optional[str] = None, asset_type: Optional[str] = None, meeting_date: Optional[datetime] = None, meeting_time: Optional[time] = None, meeting_id: Optional[str] = None, scraped_by: Optional[str] = None, content_type: Optional[str] = None, content_length: Optional[str] = None)

Bases: object

Parameters:
  • url (str) – URL to download an asset.

  • asset_name (str) – Title of an asset. Ex: City Council Regular Meeting

  • committee_name (str) – Name of committee that generated the asset. Ex: City Council

  • place (str) – Name of place associated with the asset. Lowercase with spaces and punctuation removed. Ex: menlopark

  • place_name (str) – Human-readable place name. Ex: Menlo Park

  • state_or_province (str) – Two-letter abbreviation for state or province associated with an asset. Ex: ca

  • asset_type (str) – One of SUPPORTED_ASSET_TYPES. Ex: agenda

  • meeting_date (datetime.datetime) – Date of meeting or None if no date given

  • meeting_time (datetime.time) – Time of meeting or None

  • meeting_id (str) – Unique meeting ID, for example a combination of scraper type, subdomain and numeric ID or date. Ex: civicplus-nc-nashcounty-05052020-382

  • scraped_by (str) – civic_scraper.__version__

  • content_type (str) – File type of the asset as given by HTTP headers. Ex: application/pdf

  • content_length (str) – Asset size in bytes
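The place and meeting_id conventions described above can be illustrated with a small sketch. The helper names normalize_place and build_meeting_id are hypothetical and are not part of civic_scraper. The MMDDYYYY date layout is an assumption inferred from the single example ID, which is also consistent with DDMMYYYY.

```python
import re
from datetime import date


def normalize_place(place_name: str) -> str:
    """Lowercase a place name and drop spaces and punctuation (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]", "", place_name.lower())


def build_meeting_id(scraper: str, state: str, place: str,
                     meeting_date: date, numeric_id: int) -> str:
    """Join the parts with hyphens; the date is rendered as MMDDYYYY (assumed)."""
    parts = [scraper, state, place, meeting_date.strftime("%m%d%Y"), str(numeric_id)]
    return "-".join(parts)


place = normalize_place("Menlo Park")
meeting_id = build_meeting_id(
    "civicplus", "nc", normalize_place("Nash County"), date(2020, 5, 5), 382
)
print(place)       # menlopark
print(meeting_id)  # civicplus-nc-nashcounty-05052020-382
```

Note that this regular expression also strips non-ASCII letters, which may or may not match how the library treats such place names.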

Public methods:

download: downloads an asset to a given target_dir

download(target_dir, session=None)

Downloads an asset to a target directory.

Parameters:

target_dir (str) – target directory name

Returns:

Full path to downloaded file
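As a rough illustration of the download contract (fetch a URL, write it into a directory, return the full path), the following sketch uses stand-in classes so it runs without network access. The function download_to_dir, the FakeSession and FakeResponse classes, and the choice to name the file after the last URL path segment are all assumptions for illustration, not the library's implementation.

```python
import os
import tempfile
from urllib.parse import urlparse


class FakeResponse:
    """Stand-in for an HTTP response; only exposes the raw bytes."""

    def __init__(self, content: bytes):
        self.content = content


class FakeSession:
    """Stand-in for an HTTP session so the sketch needs no network."""

    def get(self, url: str) -> FakeResponse:
        return FakeResponse(b"%PDF-1.4 placeholder")


def download_to_dir(url: str, target_dir: str, session) -> str:
    """Fetch url via session and write the body into target_dir.

    Assumption: the file is named after the last URL path segment.
    """
    os.makedirs(target_dir, exist_ok=True)
    file_name = os.path.basename(urlparse(url).path) or "asset"
    full_path = os.path.join(target_dir, file_name)
    with open(full_path, "wb") as fh:
        fh.write(session.get(url).content)
    return full_path


with tempfile.TemporaryDirectory() as tmp:
    saved = download_to_dir("https://example.com/files/agenda.pdf", tmp, FakeSession())
    saved_name = os.path.basename(saved)
    saved_exists = os.path.isfile(saved)
```

Passing the session in as an argument keeps the function easy to exercise with a fake, which is one plausible reason for an optional session parameter.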

class civic_scraper.base.asset.AssetCollection(iterable=(), /)

Bases: list

to_csv(target_dir)

Write metadata about the asset list to a CSV file.

Parameters:

target_dir (str) – Path to directory where metadata file should be written.

Output: CSV file with metadata

Returns:

Path to file written.
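The pattern of a list subclass that writes its items' metadata to a CSV file and returns the path can be sketched as follows. MetadataList is a hypothetical stand-in that holds plain dicts, and the output file name metadata.csv is an assumption; the real AssetCollection may name and structure its output differently.

```python
import csv
import os
import tempfile


class MetadataList(list):
    """Hypothetical list of dicts that can dump itself to a CSV file."""

    def to_csv(self, target_dir: str) -> str:
        os.makedirs(target_dir, exist_ok=True)
        path = os.path.join(target_dir, "metadata.csv")  # file name is an assumption
        # Union of keys across all rows, preserving first-seen order.
        fieldnames = list(dict.fromkeys(key for row in self for key in row))
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self)
        return path


rows = MetadataList([
    {"url": "https://example.com/agenda.pdf", "asset_type": "agenda"},
    {"url": "https://example.com/minutes.pdf", "asset_type": "minutes"},
])
with tempfile.TemporaryDirectory() as tmp:
    csv_path = rows.to_csv(tmp)
    csv_name = os.path.basename(csv_path)
    with open(csv_path, newline="", encoding="utf-8") as fh:
        written = list(csv.DictReader(fh))
```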

civic_scraper.base.cache

class civic_scraper.base.cache.Cache(path=None)

Bases: object

property artifacts_path

Path for HTML and other intermediate artifacts from scraping

property assets_path

Path for agendas, minutes and other gov file assets

property metadata_files_path

Path for metadata files related to file artifacts

write(name, content)
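A minimal sketch of a cache with the same shape (a root path, three derived path properties, and a write method) might look like the following. SimpleCache is a hypothetical stand-in. The subdirectory names, the text-mode write, and the returned path are assumptions; only the ".civic-scraper" default in the user's home directory comes from the Site documentation.

```python
import tempfile
from pathlib import Path


class SimpleCache:
    """Hypothetical cache rooted at one directory; not the library's Cache."""

    def __init__(self, path=None):
        # Default to a hidden directory in the user's home directory.
        self.path = Path(path) if path else Path.home() / ".civic-scraper"

    @property
    def artifacts_path(self) -> Path:
        return self.path / "artifacts"  # subdirectory name is an assumption

    @property
    def assets_path(self) -> Path:
        return self.path / "assets"  # subdirectory name is an assumption

    @property
    def metadata_files_path(self) -> Path:
        return self.path / "metadata"  # subdirectory name is an assumption

    def write(self, name: str, content: str) -> Path:
        """Write text content to a path relative to the cache root."""
        out = self.path / name
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(content, encoding="utf-8")
        return out


with tempfile.TemporaryDirectory() as tmp:
    cache = SimpleCache(tmp)
    written_path = cache.write("artifacts/index.html", "<html></html>")
    inside_artifacts = written_path.parent == cache.artifacts_path
    round_trip = written_path.read_text(encoding="utf-8")
```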

civic_scraper.base.constants

civic_scraper.base.site

class civic_scraper.base.site.Site(base_url, cache=<civic_scraper.base.cache.Cache object>, parser_kls=None)

Bases: object

Base class for all Site scrapers.

Parameters:
  • base_url (str) – URL to a government agency site

  • cache (Cache instance) – Optional Cache instance (default: “.civic-scraper” in user home dir)

  • parser_kls (class) – Optional parser class to extract data from government agency websites.

scrape(*args, **kwargs) -> AssetCollection

Scrape the site and return an AssetCollection instance.
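The base-class pattern described here, where each concrete scraper overrides scrape() and returns a list-like collection, can be sketched with stand-ins. BaseSite, StaticSite and AssetList are hypothetical names used only for illustration; the real Site and AssetCollection carry more behavior, and real collections hold Asset objects rather than URL strings.

```python
class AssetList(list):
    """Stand-in for AssetCollection: a plain list subclass."""


class BaseSite:
    """Hypothetical base class mirroring the documented constructor shape."""

    def __init__(self, base_url, cache=None, parser_kls=None):
        self.base_url = base_url
        self.cache = cache
        self.parser_kls = parser_kls

    def scrape(self, *args, **kwargs) -> AssetList:
        raise NotImplementedError("subclasses implement scrape()")


class StaticSite(BaseSite):
    """Toy scraper that 'finds' a fixed set of document links."""

    def scrape(self, *args, **kwargs) -> AssetList:
        links = ["agenda.pdf", "minutes.pdf"]
        base = self.base_url.rstrip("/")
        return AssetList(f"{base}/{name}" for name in links)


site = StaticSite("https://example.com/meetings/")
assets = site.scrape()
```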