A package of search providers for Media Cloud, wrapping interfaces for different social media platforms.
Install with pip (pip install .) and the install.sh script.
Some interfaces require environment variables to be set in order to work correctly.
Releases after 4.0.0 will be shared exclusively on GitHub, and this project will be delisted from PyPI before April 2025.
- Bump the version number in pyproject.toml
- Add a note about changes to the version history below
- Commit the changes and tag the commit with a semantic version number
- A GitHub Action will build and push the repository when a tagged version is committed
- v4.0.1 - WaybackMachineProvider.words now raises an error instead of waiting for timeout from IA
- v4.0.0 - Remove unused providers and old news-search-api machinery, update mediacloud provider to limit subindexes used in search, update some internal fields
- v3.1.3 - Accept "Seconds" argument to cache decorator. Update deployment action to track Minor releases
- v3.1.2 - Fix random sampling behavior in ES provider to be genuinely random, bugfix related to marginal sorting error, additional counters for fine-grained visibility
- v3.1.1 - Fix ES Provider to send None as last page pagination token
- v3.1.0 - Add new ProviderException classes to pass more meaningful errors to consumer processes
- v3.0.1 - Fix ES Provider to accept sort_{order,field} paging arguments like NSA-based Provider
- v3.0.0 - New "OnlineNewsMediaCloudProvider" using Elasticsearch DSL for direct access to the ES cluster. Retain old provider as "OnlineNewsMediaCloudOldProvider" for now.
- v2.2.0 - Added an optional argument to providers to toggle caching behavior, added more specific error on 504
- v2.1.1 - Bugfix
- v2.1.0 - Mediacloud news client code incorporated into this package
- v2.0.5 - Build-system in pyproject.toml
- v2.0.4 - reintroduce stopwords
- v2.0.3 - version bump for automatic releases
- v2.0.2 - respect domain filters on Media Cloud searches
- v2.0.1 - more work on caching strategies
- v2.0.0 - change CachingManager interface to support online news providers better
- v1.0.1 - fix default timeout option that applies across all providers
- v1.0.0 - Remove legacy Media Cloud, add timeout option to provider_for
- v0.5.3 - Temporary fix to onlinenews-mediacloud search handling
- v0.5.3 - Tweaks to onlinenews-mediacloud for compatibility with new database pattern
- v0.5.2 - Fix to allow override of chunking in MC client
- v0.5.1 - Fix use of media cloud to respect domains clause on story list paging
- v0.5.0 - Integrate new mediacloud-news-client into onlinenews-mediacloud
- v0.4.0 - Specify custom base URLs via new string param to provider_by_name and provider_for
- v0.3.0 - Add support for paging through stories directly, and including text in paged results for speed
- v0.2.6 - Fixed querying by domain on new mediacloud system
- v0.2.5 - Alignment with new mediacloud system. Old onlinenews provider is now "onlinenews-mclegacy", "onlinenews-mediacloud" now queries the new index.
- v0.2.4 - Added support for api keys via "provider_by_name"
- v0.2.3 - removed support for API keys in environment variables; now expected as an argument in providers.provider_for
- v0.2.2 - transition to use the dedicated mediacloud-api-legacy package to avoid version conflicts
- v0.2.1 - add in a date hack to resolve a lower-level bug in the Media Cloud legacy count-over-time results
- v0.2.0 - add in support for Media Cloud legacy database
- v0.1.7 - corrected support for a "filters" kwarg in online_news
- v0.1.6 - Added support for a "filters" kwarg in online_news
- v0.1.5 - Added politeness wait to all chunked queries in twitter provider
- v0.1.4 - Added Query Chunking for large collections in the Twitter provider
- v0.1.3 - Added Query Chunking for large queries in the onlinenews provider
- v0.1.2 - Test Completeness
- v0.1.1 - Parity with web-search module, and language model
- v0.1.0 - Initial pypi upload