
82 Commits

Author SHA1 Message Date
Akash Mahanty
4b218d35cb
CDX-based oldest, newest and near (#159)
* implement oldest, newest and near methods in the CDX interface class; the CLI now uses the CDX methods instead of the availability API methods.

* handle the closest parameter derivative methods more efficiently and also handle exceptions gracefully.

* update test code
2022-02-18 13:17:40 +05:30
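The oldest/newest/near trio amounts to picking one snapshot timestamp from a set of CDX results. The sketch below does that selection client-side over timestamps that are assumed to be already fetched as 14-digit `YYYYMMDDhhmmss` strings. The function names mirror the commit, but the code is hypothetical and not waybackpy's implementation. The commit indicates the real methods go through the CDX interface and its closest parameter rather than sorting locally.

```python
from datetime import datetime
from typing import List

# CDX timestamps are 14-digit strings: YYYYMMDDhhmmss
CDX_TS_FORMAT = "%Y%m%d%H%M%S"


def oldest(timestamps: List[str]) -> str:
    # Fixed-width digit strings sort chronologically, so min() is the oldest.
    return min(timestamps)


def newest(timestamps: List[str]) -> str:
    return max(timestamps)


def near(timestamps: List[str], target: str) -> str:
    """Return the timestamp with the smallest absolute distance to target."""
    t = datetime.strptime(target, CDX_TS_FORMAT)
    return min(
        timestamps,
        key=lambda ts: abs(datetime.strptime(ts, CDX_TS_FORMAT) - t),
    )
```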
Akash Mahanty
f990b93f8a
Add sort, use_pagination and closest (#158)
* add sort param support in CDX API class

see https://nla.github.io/outbackcdx/api.html#operation/query

sort takes a string input, which must be one of the following:
- default
- closest
- reverse

This commit should help close the issue at https://github.com/akamhy/waybackpy/issues/155

* add BlockedSiteError for cases when archiving is blocked by the site's robots.txt

* create check_for_blocked_site for handling the BlockedSiteError for sites that block the Wayback Machine through their robots.txt policy

* add attrs use_pagination and closest, which can be used to use the pagination API and to look up an archive close to a timestamp, respectively. To get out of the infinite blank-pages loop, the CDX server API code now checks for two successive blank pages rather than a total of two blank pages.

* added cli support for sort, use-pagination and closest

* added tests

* fix codeql warnings, nothing to worry about here.

* fix save test for archive_url
2022-02-18 00:24:14 +05:30
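The stop condition described in #158 (two successive blank pages, not two blank pages in total) can be sketched as a paginated iterator. This is a minimal illustration, assuming a hypothetical `fetch_page(page)` callable that returns the lines of one CDX result page; the `max_pages` cap is also an assumption added as a safety net, not something the commit mentions.

```python
from typing import Callable, Iterator, List


def iter_pages(
    fetch_page: Callable[[int], List[str]], max_pages: int = 10_000
) -> Iterator[List[str]]:
    """Yield non-empty pages, stopping after two successive blank pages.

    The counter resets on every non-empty page, so an isolated blank page
    between non-empty ones does not end the iteration early.
    """
    consecutive_blank = 0
    for page in range(max_pages):  # hard cap as a last-resort safety net
        lines = fetch_page(page)
        if not lines:
            consecutive_blank += 1
            if consecutive_blank == 2:
                break
            continue
        consecutive_blank = 0
        yield lines
```

With pages `a, blank, b, blank, c, blank, blank`, this yields `a`, `b` and `c`; a check on the total number of blank pages would already stop before `c`.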
Akash Mahanty
3a44a710d3
add sort param support in CDX API class (#156)
see https://nla.github.io/outbackcdx/api.html#operation/query

sort takes a string input, which must be one of the following:
- default
- closest
- reverse

This commit should help close the issue at https://github.com/akamhy/waybackpy/issues/155
2022-02-17 12:17:23 +05:30
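Validating `sort` before a request is sent is one way to enforce the "must be one of" rule from #156. The sketch below assumes a hypothetical helper, `build_cdx_params`, that only assembles query parameters; the parameter name `sort` follows the linked outbackcdx query docs, and the allowed values are the three listed in the commit. It is an illustration, not waybackpy's code.

```python
from typing import Dict, Optional

# Values listed in the commit message.
ALLOWED_SORT_VALUES = ("default", "closest", "reverse")


def build_cdx_params(url: str, sort: Optional[str] = None) -> Dict[str, str]:
    """Build CDX query parameters, rejecting an invalid sort value up front."""
    params = {"url": url}
    if sort is not None:
        if sort not in ALLOWED_SORT_VALUES:
            raise ValueError(
                f"sort must be one of {', '.join(ALLOWED_SORT_VALUES)}; got {sort!r}"
            )
        params["sort"] = sort
    return params
```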
Akash Mahanty
cd5c3c61a5 fix imports with isort 2022-02-09 16:18:25 +05:30
Akash Mahanty
87fb5ecd58 remove the latest version funcs from utils, as they were unused. 2022-02-09 16:12:30 +05:30
Akash Mahanty
6d233f24fc apply isort 2022-02-09 11:20:59 +05:30
Akash Mahanty
81162eebd0
issues with HN 2022-02-08 21:28:25 +05:30
Akash Mahanty
118dc6c523 add test for wrapper module 2022-02-08 20:08:44 +05:30
Akash Mahanty
f8bf9c16f9
Add tests (#149)
* enable codecov

* fix save_urls_on_file

* increase the CDX limit from 5000 to 25000, a 5X increase.

* added test for the CLI module

* make flake8 happy

* make mypy happy
2022-02-08 17:46:59 +05:30
deepsource-autofix[bot]
e0dfbe0b7d
Fix comparison constant position (#145)
* Fix comparison constant position

* format with black

Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Akash Mahanty <akamhy@yahoo.com>
2022-02-08 10:06:23 +05:30
eggplants
0b631592ea
Improve pylint score (#142)
* fix: errors to improve pylint scores

* fix: test

* fix

* add: flake ignore rule to pep8speaks conf

* fix

* add: test patterns to deepsource conf
2022-02-08 06:42:20 +09:00
Akash Mahanty
97f8b96411
added docstrings, added some static type hints and also lint. (#141)
* added docstrings, added some static type hints and also lint.

* added doc strings and changed some internal variable names for more clarity.

* make flake8 happy

* add descriptive docstrings and type hints in waybackpy/cdx_snapshot.py

* remove useless code and add docstrings and also lint using pylint.

* remove unwarranted test

* added docstrings, lint using pylint and add a raise on HTTP status code 509

* added docstrings and lint with pylint

* lint

* add doc strings and lint

* add docstrings and lint
2022-02-07 19:40:37 +05:30
eggplants
d8cabdfdb5
Typing (#128)
* fix: CI yml name

* add: mypy configuration

* add: type annotation to waybackpy modules

* add: type annotation to test modules

* fix: mypy command

* add: types-requests to dev deps

* fix: disable max-line-length

* fix: move pytest.ini into setup.cfg

* add: urllib3 to deps

* fix: Retry (ref: https://github.com/python/typeshed/issues/6893)

* fix: f-string

* fix: shorten long lines

* add: staticmethod decorator to no-self-use methods

* fix: str(headers)->headers_str

* fix: error message

* fix: revert "str(headers)->headers_str" and ignore assignment CaseInsensitiveDict with str

* fix: mypy error
2022-02-05 03:23:36 +09:00
eggplants
e61447effd
Format and lint codes and fix packaging (#125)
* add: configure files (setup.py->setup.py+setup.cfg+pyproject.toml)

* add: __download_url__

* format with black and isort

* fix: flake8 section in setup.cfg

* add: E501 to flake ignore

* fix: metadata.name does not accept attr

* fix: merge __version__.py into __init__.py

* fix: flake8 errors in tests/

* fix: datetime.datetime -> datetime

* fix: banner

* fix: ignore W605 for banner

* fix: way to install deps in CI

* add: versem to setuptools

* fix: drop python<=3.6 (#126) from package and CI
2022-02-03 19:13:39 +05:30
Akash Mahanty
3be6ac01fc created tests/test_cdx_api.py: added tests for cdx_api.py 2022-01-30 20:03:40 +05:30
Akash Mahanty
b8b9bc098f tests/test_utils.py: test latest_version_pypi and latest_version_github of waybackpy.utils 2022-01-30 20:02:17 +05:30
Akash Mahanty
8b7603e241 the test is faulty: it fails when the version in the dunder version file is incremented but the code has not yet been uploaded to PyPI. 2022-01-26 01:51:24 +05:30
Akash Mahanty
d6783d5525 added tests for cdx_utils.py 2022-01-24 23:05:47 +05:30
Akash Mahanty
d1a1cf2546 added tests for utils.py at tests/test_utils.py; also changed a keyword argument of latest_version in utils.py from headers to user_agent, with the usual default value. 2022-01-24 17:50:36 +05:30
Akash Mahanty
cd8a32ed1f added tests for cdx_snapshot.py at tests/test_cdx_snapshot.py 2022-01-24 16:29:44 +05:30
Akash Mahanty
57512c65ff change the oldest method test from google.com to example.com; the oldest snapshot of google.com is, for some unknown reason, not very stable. 2022-01-24 16:27:35 +05:30
Akash Mahanty
2bea92b348 fix bug with the third matching case of the archive_url_parser, caught while writing more tests for the save API interface. 2022-01-24 13:31:30 +05:30
Akash Mahanty
d506685f68 added some tests for save_api interface 2022-01-23 18:35:54 +05:30
Akash Mahanty
c0252edff2 updated tests for availability_api.py and also added max_tries (default value is 3) with a delay (sleep) between successive API calls. The delay actually improves the performance of the availability_api interface. 2022-01-23 15:05:10 +05:30
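The max_tries-with-delay pattern from this commit can be sketched as a generic retry wrapper. This is a hypothetical helper, assuming the API call is passed in as a zero-argument callable; the `retry_on` parameter and the default one-second delay are illustrative choices, not values taken from waybackpy.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_tries: int = 3,
    delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func up to max_tries times, sleeping delay seconds between attempts."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_tries:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # pause before the next API call
    raise AssertionError("unreachable")
```

Narrowing `retry_on` to network-level errors keeps programming mistakes from being silently retried.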
Akash Mahanty
e7488f3a3e added test badge, renamed the test from ubuntu to Tests and fixed (or tried to fix) the Incomplete URL substring sanitization 2022-01-23 02:26:53 +05:30
Akash Mahanty
aed75ad1db Make modules importable as part of a Python package, waybackpy, by creating an __init__.py file in tests 2022-01-23 02:14:38 +05:30
Akash Mahanty
a8acc4c4d8 Fix Incomplete URL substring sanitization in the last commit. 2022-01-23 01:42:48 +05:30
Akash Mahanty
1bacd73002 created pytest.ini, added tests for waybackpy/availability_api.py, added new exceptions (all of which inherit from the main WaybackError) and created requirements-dev.txt 2022-01-23 01:29:07 +05:30
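Having every exception inherit from one base class lets callers catch all library errors with a single `except`. The sketch below reuses exception names that appear in this log, but it is a hypothetical re-creation: whether RedirectSaveError and BlockedSiteError subclass WaybackError directly in waybackpy is an assumption here.

```python
class WaybackError(Exception):
    """Base class: catching this also catches every subclass below."""


class RedirectSaveError(WaybackError):
    """Save failed because the URL is a permanent redirect (per #93)."""


class BlockedSiteError(WaybackError):
    """Archiving is blocked by the site's robots.txt (per #158)."""
```

Code that only needs to know that something went wrong with the Wayback Machine can write `except WaybackError:` and still let unrelated exceptions propagate.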
Akash Mahanty
4e68cd5743 Create separate modules for the 3 different APIs; CDX is now also supported by the CLI. 2022-01-02 14:14:45 +05:30
Akash Mahanty
dd1917c77e
added RedirectSaveError - for failed saves if the URL is a permanent … (#93)
* added RedirectSaveError - for failed saves if the URL is a permanent redirect.

* check whether the URL is a redirect before raising exceptions; res.url is the redirect URL if the request was redirected at all

* update tests and cli errors
2021-04-02 10:38:17 +05:30
Akash Mahanty
db8f902cff
Add doc strings (#90)
* Added some docstrings in utils.py

* renamed some func/meth to better names and added doc strings + lint

* added more docstrings

* more docstrings

* improve docstrings

* docstrings

* added more docstrings, lint

* fix import error
2021-01-26 11:56:03 +05:30
Akash Mahanty
36b936820b
known urls now yields results, which is more reliable, and saves the file in chunks as the response is read. The --file arg can be used to create an output file; if --file is not used, no output is saved to any file. (#88) 2021-01-24 16:11:39 +05:30
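Yielding URLs one at a time and writing each one as it is read keeps memory flat even for large result sets. A minimal sketch, assuming the response body arrives as an iterable of lines (with requests, a `stream=True` response's `iter_lines(decode_unicode=True)` could supply that); `known_urls` here is a hypothetical stand-in, not waybackpy's function, and `out=None` plays the role of omitting `--file`.

```python
from typing import Iterable, Iterator, Optional, TextIO


def known_urls(lines: Iterable[str], out: Optional[TextIO] = None) -> Iterator[str]:
    """Yield URLs one at a time from a streamed response body.

    If out is given, each URL is written as soon as it is read, so nothing
    accumulates in memory; if out is None, nothing is written anywhere.
    """
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines in the response body
        if out is not None:
            out.write(url + "\n")
        yield url
```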
Akash Mahanty
a3bc6aad2b too much API usage by duplicate tests was causing too many test failures 2021-01-23 21:08:21 +05:30
Akash Mahanty
712471176b better error messages (str), check the latest version before asking for an upgrade and remove alive checking 2021-01-15 16:47:26 +05:30
Akash Mahanty
76205d9cf6 backoff_factor=2 for save, incr success by 25% 2021-01-13 10:13:16 +05:30
Akash Mahanty
259a024eb1
joke? they changed their robots.txt 2021-01-11 23:17:01 +05:30
Akash Mahanty
4693dbf9c1 change str repr of cdxsnapshot to cdx line 2021-01-11 09:34:37 +05:30
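Making a snapshot's string form equal to its raw CDX line gives a lossless round trip between parsing and printing. The sketch below assumes the default Wayback CDX field order (urlkey, timestamp, original, mimetype, statuscode, digest, length); the class name is hypothetical and not waybackpy's CDXSnapshot.

```python
from dataclasses import astuple, dataclass


@dataclass
class CDXSnapshotSketch:
    # Assumed default field order of a Wayback CDX server line.
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str

    @classmethod
    def from_line(cls, line: str) -> "CDXSnapshotSketch":
        # Whitespace-separated fields; split() also drops a trailing newline.
        return cls(*line.split())

    def __str__(self) -> str:
        # The string form is the CDX line itself, fields joined by spaces.
        return " ".join(astuple(self))
```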
Akash Mahanty
a6470b1036 not passing dict to cdxsnapshot 2021-01-10 10:40:32 +05:30
Akash Mahanty
04cda4558e fix test 2021-01-10 03:18:09 +05:30
Akash Mahanty
a03813315f full cdx api support 2021-01-10 02:23:53 +05:30
Akash Mahanty
a2550f17d7 retries support for get requests 2021-01-06 01:58:38 +05:30
Akash Mahanty
0c6107e675 increase coverage 2021-01-04 01:54:40 +05:30
Akash Mahanty
bd079978bf inc coverage 2021-01-04 00:44:55 +05:30
Akash Mahanty
5dec4927cd refactoring, try to reduce code complexity 2021-01-04 00:14:38 +05:30
Akash Mahanty
db5737a857 JSON is now available for near and other methods that call it 2021-01-02 18:52:46 +05:30
Akash Mahanty
d3e68d0e70
code formatted with black (#47) 2020-12-14 01:18:04 +05:30
Akash Mahanty
0280fca189 remove unused import (urllib) 2020-12-13 15:13:51 +05:30
Akash Mahanty
60ee8b95a8 now using requests lib as it handles errors nicely 2020-12-13 15:05:57 +05:30
Akash Mahanty
ca51c14332
deleted .travis.yml, link with flake (#41)
close #38
2020-11-26 13:06:50 +05:30
Akash Mahanty
5088305a58 removed python2 compatibility code 2020-11-21 17:00:11 +05:30