Add sort, use_pagination and closest (#158)

* add sort param support in CDX API class

see https://nla.github.io/outbackcdx/api.html#operation/query

sort takes a string input, which must be one of the following:
- default
- closest
- reverse

This commit shall help in closing issue at https://github.com/akamhy/waybackpy/issues/155
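
A minimal sketch of how the three accepted sort values listed above could be validated before a query is sent. The names `ALLOWED_SORTS` and `validate_sort` are hypothetical and used only for illustration; this is not necessarily how waybackpy implements the check.

```python
# Hypothetical helper, not taken from the waybackpy source.
ALLOWED_SORTS = ("default", "closest", "reverse")


def validate_sort(sort: str) -> str:
    """Return the normalized sort value, or raise ValueError if it is not allowed."""
    normalized = sort.strip().lower()
    if normalized not in ALLOWED_SORTS:
        raise ValueError(
            f"sort must be one of {', '.join(ALLOWED_SORTS)}; got {sort!r}"
        )
    return normalized
```

Rejecting an unknown value early gives the caller a clear error instead of an opaque response from the CDX server.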

* add BlockedSiteError for cases when archiving is blocked by site's robots.txt

* create check_for_blocked_site for handling the BlockedSiteError for sites that are blocking wayback machine by their robots.txt policy

* add attrs use_pagination and closest, which can be used to enable the pagination API and to look up the archive closest to a timestamp, respectively. Also, to escape the infinite blank-pages loop while using the CDX server API, now check for two successive blank pages rather than two blank pages in total.
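
The difference between counting successive blank pages and counting blank pages in total can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: `iter_pages`, `fetch_page` and `max_consecutive_blank` are hypothetical names, and `fetch_page` stands in for whatever function requests one page of CDX results.

```python
from typing import Callable, Iterator


def iter_pages(
    fetch_page: Callable[[int], str],
    max_consecutive_blank: int = 2,
) -> Iterator[str]:
    """Yield non-blank pages until max_consecutive_blank blank pages occur in a row."""
    consecutive_blank = 0
    page = 0
    while consecutive_blank < max_consecutive_blank:
        text = fetch_page(page)
        page += 1
        if not text.strip():
            # Blank page: count it, but keep going in case data follows.
            consecutive_blank += 1
            continue
        # A non-blank page resets the counter, so isolated blanks do not end the loop.
        consecutive_blank = 0
        yield text
```

With a total count, two isolated blank pages would end the iteration early even if more data followed; resetting the counter on every non-blank page avoids that.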

* added cli support for sort, use-pagination and closest

* added tests

* fix codeql warnings, nothing to worry about here.

* fix save test for archive_url
Akash Mahanty
2022-02-18 00:24:14 +05:30
committed by GitHub
parent 3a44a710d3
commit f990b93f8a
7 changed files with 164 additions and 44 deletions

@@ -16,6 +16,13 @@ class WaybackError(Exception):
"""
class BlockedSiteError(WaybackError):
"""
Raised when the archives for website/URLs that was excluded from Wayback
Machine are requested via the CDX server API.
"""
class TooManyRequestsError(WaybackError):
"""
Raised when you make more than 15 requests per