Commit Graph

24 Commits

Author SHA1 Message Date
Akash Mahanty
f990b93f8a
Add sort, use_pagination and closest (#158)
* add sort param support in CDX API class

see https://nla.github.io/outbackcdx/api.html#operation/query

sort takes string input which must be one of the follwoing:
- default
- closest
- reverse

This commit shall help in closing issue at https://github.com/akamhy/waybackpy/issues/155

* add BlockedSiteError for cases when archiving is blocked by site's robots.txt

* create check_for_blocked_site for handling the BlockedSiteError for sites that are blocking wayback machine by their robots.txt policy

* add attrs use_pagination and closest, which are can be used to use the pagination API and lookup archive close to a timestamp respectively. And now to get out of infinte blank pages loop just check for two succesive black and not total two blank pages while using the CDX server API.

* added cli support for sort, use-pagination and closest

* added tests

* fix codeql warnings, nothing to worry about here.

* fix save test for archive_url
2022-02-18 00:24:14 +05:30
Akash Mahanty
118dc6c523 add test for wrapper module 2022-02-08 20:08:44 +05:30
Akash Mahanty
4e68cd5743 Create separate module for the 3 different APIs also CDX is now CLI supported. 2022-01-02 14:14:45 +05:30
Akash Mahanty
db8f902cff
Add doc strings (#90)
* Added some docstrings in utils.py

* renamed some func/meth to better names and added doc strings + lint

* added more docstrings

* more docstrings

* improve docstrings

* docstrings

* added more docstrings, lint

* fix import error
2021-01-26 11:56:03 +05:30
Akash Mahanty
36b936820b
known urls now yileds, more reliable. And save the file in chucks wrt to response. --file arg can be used to create output file, if --file not used no output will be saved in any file. (#88) 2021-01-24 16:11:39 +05:30
Akash Mahanty
a3bc6aad2b too much API usage by duplicate tests was causing too much tests failure 2021-01-23 21:08:21 +05:30
Akash Mahanty
712471176b better error messages(str), check latest version before asking for an upgrade and rm alive checking 2021-01-15 16:47:26 +05:30
Akash Mahanty
76205d9cf6 backoff_factor=2 for save, incr success by 25% 2021-01-13 10:13:16 +05:30
Akash Mahanty
259a024eb1
joke? they changed their robots.txt 2021-01-11 23:17:01 +05:30
Akash Mahanty
a03813315f full cdx api support 2021-01-10 02:23:53 +05:30
Akash Mahanty
a2550f17d7 retries support for get requests 2021-01-06 01:58:38 +05:30
Akash Mahanty
bd079978bf inc coverage 2021-01-04 00:44:55 +05:30
Akash Mahanty
5dec4927cd refactoring, try to code complexity 2021-01-04 00:14:38 +05:30
Akash Mahanty
db5737a857 JSON is now available for near and other other methods that call it 2021-01-02 18:52:46 +05:30
Akash Mahanty
d3e68d0e70
code formated with black (#47) 2020-12-14 01:18:04 +05:30
Akash Mahanty
0280fca189 remove unused import (urllib) 2020-12-13 15:13:51 +05:30
Akash Mahanty
60ee8b95a8 now using requests lib as it handles errors nicely 2020-12-13 15:05:57 +05:30
Akash Mahanty
5088305a58 removed python2 compatibility code 2020-11-21 17:00:11 +05:30
Akash Mahanty
7f927ec7be
added tests for json and archive_url, updated broken tests (#34)
* added tests for json and archive_url, updated broken tests

* drop 2.7 support
2020-10-16 19:25:45 +05:30
Akash Mahanty
6b3b2e2a7d tests for newly added known_urls feature 2020-10-03 09:33:50 +05:30
Akash Mahanty
c304f58ea2 update tests 2020-10-02 21:35:39 +05:30
Akash
c5de2232ba
Update test_wrapper.py 2020-08-09 10:53:00 +05:30
Akash
56116551ac
Coverge improvements (#22)
* Update cli.py

* improved tests

* chnages for proper testing

* Type check using isinstance

* Replace elifs with if when used after return

* twitter.com --> www.ibm.com

* fix typo

* test archive urll parser and dunders

* Update test_wrapper.py
2020-07-24 15:31:21 +05:30
Akash
eb037a0284
Rename test_1.py to test_wrapper.py 2020-07-22 20:19:59 +05:30