civic_scraper.base
Submodules
civic_scraper.base.asset
- class civic_scraper.base.asset.Asset(url: str, asset_name: Optional[str] = None, committee_name: Optional[str] = None, place: Optional[str] = None, place_name: Optional[str] = None, state_or_province: Optional[str] = None, asset_type: Optional[str] = None, meeting_date: Optional[datetime] = None, meeting_time: Optional[time] = None, meeting_id: Optional[str] = None, scraped_by: Optional[str] = None, content_type: Optional[str] = None, content_length: Optional[str] = None)
Bases: object
- Parameters:
url (str) – URL to download an asset.
asset_name (str) – Title of an asset. Ex: City Council Regular Meeting
committee_name (str) – Name of committee that generated the asset. Ex: City Council
place (str) – Name of place associated with the asset. Lowercase with spaces and punctuation removed. Ex: menlopark
place_name (str) – Human-readable place name. Ex: Menlo Park
state_or_province (str) – Two-letter abbreviation for state or province associated with an asset. Ex: ca
asset_type (str) – One of SUPPORTED_ASSET_TYPES. Ex: agenda
meeting_date (datetime.datetime) – Date of meeting or None if no date given
meeting_time (datetime.time) – Time of meeting or None
meeting_id (str) – Unique meeting ID. For example, a combination of scraper type, subdomain, and numeric ID or date. Ex: civicplus-nc-nashcounty-05052020-382
scraped_by (str) – Version of civic_scraper that scraped the asset, taken from civic_scraper.__version__
content_type (str) – File type of the asset as given by HTTP headers. Ex: application/pdf
content_length (str) – Asset size in bytes
- Public methods:
download: downloads an asset to a given target_dir
- download(target_dir, session=None)
Downloads an asset to a target directory.
- Parameters:
target_dir (str) – target directory name
- Returns:
Full path to downloaded file
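The naming conventions described for place and meeting_id above can be sketched in a few lines. This is an illustration only, not code from civic_scraper: normalize_place and build_meeting_id are hypothetical helpers, and the MMDDYYYY date layout and the hyphen-joined ordering are assumptions inferred from the documented example civicplus-nc-nashcounty-05052020-382.

```python
import re
from datetime import date


def normalize_place(place_name: str) -> str:
    """Lowercase a place name and drop spaces and punctuation (ASCII only)."""
    return re.sub(r"[^a-z0-9]", "", place_name.lower())


def build_meeting_id(scraper_type: str, state: str, place: str,
                     meeting_date: date, numeric_id: int) -> str:
    """Join the parts with hyphens; the date is written as MMDDYYYY (assumed)."""
    parts = [
        scraper_type,
        state,
        place,
        meeting_date.strftime("%m%d%Y"),
        str(numeric_id),
    ]
    return "-".join(parts)


# "Menlo Park" is the documented place_name example.
place = normalize_place("Menlo Park")

# Rebuild the documented meeting_id example from its apparent parts.
meeting_id = build_meeting_id(
    "civicplus", "nc", normalize_place("Nash County"), date(2020, 5, 5), 382
)

print(place)       # menlopark
print(meeting_id)  # civicplus-nc-nashcounty-05052020-382
```

Values built this way would be passed as the place and meeting_id arguments when constructing an Asset. Individual scrapers may compose their IDs differently.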
civic_scraper.base.cache
- class civic_scraper.base.cache.Cache(path=None)
Bases: object
- property artifacts_path
Path for HTML and other intermediate artifacts from scraping
- property assets_path
Path for agendas, minutes and other gov file assets
- property metadata_files_path
Path for metadata files related to file artifacts
- write(name, content)
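To make the relationship between the cache root, the three path properties, and write concrete, here is a minimal stand-in. It is a sketch under assumptions, not the library's implementation: SketchCache is a hypothetical class, the subdirectory names (artifacts, assets, metadata) are guesses, and the behavior of write (treat name as a path relative to the cache root, create parent directories, return the written path) is assumed because the reference above does not describe it. Only the ".civic-scraper" home-directory default comes from the documentation.

```python
import tempfile
from pathlib import Path


class SketchCache:
    """Hypothetical stand-in for Cache; not the civic_scraper implementation."""

    def __init__(self, path=None):
        # Documented default: ".civic-scraper" in the user's home directory.
        self.path = Path(path) if path else Path.home() / ".civic-scraper"

    @property
    def artifacts_path(self):
        # HTML and other intermediate artifacts (subdirectory name assumed).
        return str(self.path / "artifacts")

    @property
    def assets_path(self):
        # Agendas, minutes and other file assets (subdirectory name assumed).
        return str(self.path / "assets")

    @property
    def metadata_files_path(self):
        # Metadata files related to file artifacts (subdirectory name assumed).
        return str(self.path / "metadata")

    def write(self, name, content):
        # Assumed behavior: write text relative to the cache root.
        out = self.path / name
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(content)
        return str(out)


# Use a temporary directory so the sketch does not touch the home directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = SketchCache(tmp)
    written = cache.write("artifacts/index.html", "<html></html>")
    round_trip = Path(written).read_text()
    under_artifacts = written.startswith(cache.artifacts_path)
```

Passing an explicit path, as here, keeps scraped files out of the default location, which is convenient for tests.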
civic_scraper.base.constants
civic_scraper.base.site
- class civic_scraper.base.site.Site(base_url, cache=<civic_scraper.base.cache.Cache object>, parser_kls=None)
Bases: object
Base class for all Site scrapers.
- Parameters:
base_url (str) – URL to a government agency site
cache (Cache instance) – Optional Cache instance (default: “.civic-scraper” in user home dir)
parser_kls (class) – Optional parser class to extract data from government agency websites.
- scrape(*args, **kwargs) → AssetCollection
Scrape the site and return an AssetCollection instance.
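The subclassing pattern implied by the base class can be sketched as follows. Everything here is a stand-in so the example runs without civic_scraper installed: SketchSite mirrors only the documented constructor arguments, ExampleSite and LinkParser are hypothetical, the parser interface (constructed with HTML, exposing parse) is an assumption, and a plain list of dicts stands in for AssetCollection. The HTML is a fixed string rather than a fetched page, and https://example.com is a placeholder URL.

```python
import re


class SketchSite:
    """Hypothetical stand-in mirroring the documented Site constructor."""

    def __init__(self, base_url, cache=None, parser_kls=None):
        self.base_url = base_url
        self.cache = cache
        self.parser_kls = parser_kls

    def scrape(self, *args, **kwargs):
        # Assumed: concrete scrapers override this method.
        raise NotImplementedError


class LinkParser:
    """Hypothetical parser that pulls href values out of an HTML string."""

    def __init__(self, html):
        self.html = html

    def parse(self, base_url):
        hrefs = re.findall(r'href="([^"]+)"', self.html)
        return [{"url": base_url.rstrip("/") + href} for href in hrefs]


class ExampleSite(SketchSite):
    def scrape(self, *args, **kwargs):
        # A real scraper would fetch pages from self.base_url; a fixed
        # string keeps this sketch offline.
        html = '<a href="/agenda.pdf">Agenda</a>'
        parser = self.parser_kls(html)
        return parser.parse(self.base_url)


site = ExampleSite("https://example.com", parser_kls=LinkParser)
assets = site.scrape()
print(assets)  # [{'url': 'https://example.com/agenda.pdf'}]
```

Keeping the parser as a constructor argument lets the same site class be tested with different parsers, which is one plausible reason for the parser_kls parameter.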