Commit Graph

60 Commits

Author SHA1 Message Date
Akash Mahanty
407d95cc24 implement oldest, newest and near methods in the cdx interface class; the cli now uses the cdx methods instead of the availability api methods. 2022-02-18 11:38:58 +05:30
Akash Mahanty
f990b93f8a Add sort, use_pagination and closest (#158)
* add sort param support in CDX API class

see https://nla.github.io/outbackcdx/api.html#operation/query

sort takes string input which must be one of the following:
- default
- closest
- reverse

This commit shall help in closing issue at https://github.com/akamhy/waybackpy/issues/155

* add BlockedSiteError for cases when archiving is blocked by site's robots.txt

* create check_for_blocked_site for handling the BlockedSiteError for sites that are blocking wayback machine by their robots.txt policy

* add attrs use_pagination and closest, which can be used to use the pagination API and to look up the archive closest to a timestamp, respectively. To get out of the infinite blank-pages loop while using the CDX server API, now check for two successive blank pages rather than two blank pages in total.

* added cli support for sort, use-pagination and closest

* added tests

* fix codeql warnings, nothing to worry about here.

* fix save test for archive_url
2022-02-18 00:24:14 +05:30
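
The two behaviours described in the commit above (restricting sort to three values, and stopping pagination after two successive blank pages) can be sketched roughly as follows. This is an illustrative sketch only, not the waybackpy implementation: the names `ALLOWED_SORTS`, `check_sort`, `iter_pages` and `fetch_page` are hypothetical, and the page source is a plain callable so the example runs without network access.

```python
from typing import Callable, Iterator

# Values listed in the commit message; the constant name is hypothetical.
ALLOWED_SORTS = ("default", "closest", "reverse")


def check_sort(sort: str) -> str:
    """Return the normalized sort value, or raise ValueError if unsupported."""
    normalized = sort.strip().lower()
    if normalized not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {ALLOWED_SORTS}, got {sort!r}")
    return normalized


def iter_pages(
    fetch_page: Callable[[int], str], max_blank_in_a_row: int = 2
) -> Iterator[str]:
    """Yield non-blank pages; stop after max_blank_in_a_row blank pages in a row.

    fetch_page is a hypothetical stand-in for one paginated request.
    """
    blank_streak = 0
    page = 0
    while blank_streak < max_blank_in_a_row:
        text = fetch_page(page)
        page += 1
        if text.strip():
            # A non-blank page resets the streak, so isolated blanks do not end the loop.
            blank_streak = 0
            yield text
        else:
            blank_streak += 1


# Usage with canned pages instead of real responses.
pages = ["a", "", "b", "", "", "c"]
result = list(iter_pages(lambda i: pages[i] if i < len(pages) else ""))
```

Counting a streak rather than a running total means a single empty page in the middle of the results does not cut the listing short, which is the distinction the commit message draws.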
Akash Mahanty
cd5c3c61a5 fix imports with isort 2022-02-09 16:18:25 +05:30
Akash Mahanty
edaa1d5d54 update value to the new limit. 2022-02-09 15:40:38 +05:30
Akash Mahanty
25eb709ade improve doc strings and comments and remove useless exceptions. 2022-02-09 14:32:15 +05:30
Akash Mahanty
ec341fa8b3 refactor code in cli module 2022-02-09 11:20:10 +05:30
Akash Mahanty
27f2727049 add cli alias for --start-timestamp(--from) and --end-timestamp(--to) to conform with the CDX API docs. 2022-02-08 20:12:19 +05:30
Akash Mahanty
1216ffbc70 lint and refactor cli module 2022-02-08 20:06:17 +05:30
Akash Mahanty
f8bf9c16f9 Add tests (#149)
* enable codecov

* fix save_urls_on_file

* increase the limit of CDX to 25000 from 5000. 5X increase.

* added test for the CLI module

* make flake 8 happy

* make mypy happy
2022-02-08 17:46:59 +05:30
eggplants
0b631592ea Improve pylint score (#142)
* fix: errors to improve pylint scores

* fix: test

* fix

* add: flake ignore rule to pep8speaks conf

* fix

* add: test patterns to deepsource conf
2022-02-08 06:42:20 +09:00
Akash Mahanty
97f8b96411 added docstrings, added some static type hints and also lint. (#141)
* added docstrings, added some static type hints and also lint.

* added doc strings and changed some internal variable names for more clarity.

* make flake8 happy

* add descriptive docstrings and type hints in waybackpy/cdx_snapshot.py

* remove useless code and add docstrings and also lint using pylint.

* remove unwarranted test

* added docstrings, lint using pylint and add a raise on a 509 status code

* added docstrings and lint with pylint

* lint

* add doc strings and lint

* add docstrings and lint
2022-02-07 19:40:37 +05:30
eggplants
d2a3946425 fix: escape banner 2022-02-05 10:12:27 +09:00
eggplants
fcab19a40a fix: cli
print error message to stderr and specify defaults of url
2022-02-05 05:55:04 +09:00
eggplants
5f3cd28046 Fix Pylint errors pointed out by codacy (#133)
* fix: pylint errors pointed out by codacy

* fix: line length

* fix: help text

* fix: revert

https://stackoverflow.com/a/64477857 makes cli unusable

* fix: cli error and refactor codes
2022-02-05 05:25:40 +09:00
Akash Mahanty
b69e4dff37 rename params of main in cli.py to avoid using built-ins (#132)
* rename params of main in cli.py to avoid using built-ins

* Fix Line 32:80: E501 line too long (102 > 79 characters)
2022-02-05 00:30:35 +05:30
eggplants
d8cabdfdb5 Typing (#128)
* fix: CI yml name

* add: mypy configuration

* add: type annotation to waybackpy modules

* add: type annotation to test modules

* fix: mypy command

* add: types-requests to dev deps

* fix: disable max-line-length

* fix: move pytest.ini into setup.cfg

* add: urllib3 to deps

* fix: Retry (ref: https://github.com/python/typeshed/issues/6893)

* fix: f-string

* fix: shorten long lines

* add: staticmethod decorator to no-self-use methods

* fix: str(headers)->headers_str

* fix: error message

* fix: revert "str(headers)->headers_str" and ignore assignment CaseInsensitiveDict with str

* fix: mypy error
2022-02-05 03:23:36 +09:00
eggplants
e61447effd Format and lint codes and fix packaging (#125)
* add: configure files (setup.py->setup.py+setup.cfg+pyproject.toml)

* add: __download_url__

* format with black and isort

* fix: flake8 section in setup.cfg

* add: E501 to flake ignore

* fix: metadata.name does not accept attr

* fix: merge __version__.py into __init__.py

* fix: flake8 errors in tests/

* fix: datetime.datetime -> datetime

* fix: banner

* fix: ignore W605 for banner

* fix: way to install deps in CI

* add: versem to setuptools

* fix: drop python<=3.6 (#126) from package and CI
2022-02-03 19:13:39 +05:30
Akash Mahanty
5cbdfc040b waybackpy/cli.py : remove duplicate original_string from output_string in cdx 2022-01-30 21:02:25 +05:30
Akash Mahanty
946c28eddf waybackpy/cli.py: Added help text, fix bug in the cdx_print parameter and lots of other stuff
parameter --filters is now --filter

parameter --collapses is now --collapse

added a new --license flag for fetching the license from GitHub repo and printing it.
2022-01-30 20:00:50 +05:30
Akash Mahanty
f03b2cb6cb fix formatting of ASCII art 2022-01-26 18:24:24 +05:30
Akash Mahanty
5e0ea023e6 update CLI help text 2022-01-26 16:23:24 +05:30
Akash Mahanty
5ea1d3ba4f Replace NON-ASCII character figlet with ASCII character figlet. 2022-01-26 01:46:42 +05:30
Akash Mahanty
16b9bdd7f9 output the file name if known_url and file flag are passed. 2022-01-18 20:14:44 +05:30
Akash Mahanty
7adc01bff2 implement known_urls for cli from the newer interface. Use of CDX is recommended, but backward compatibility matters. 2022-01-18 20:07:12 +05:30
Akash Mahanty
4e68cd5743 Create a separate module for each of the 3 different APIs; CDX is now also supported by the CLI. 2022-01-02 14:14:45 +05:30
Jens Finkhaeuser
5a7fc7d568 Fix typo (#95) 2021-04-13 16:58:34 +05:30
Akash Mahanty
dd1917c77e added RedirectSaveError - for failed saves if the URL is a permanent … (#93)
* added RedirectSaveError - for failed saves if the URL is a permanent redirect.

* check if url is redirect before throwing exceptions, res.url is the redirect url if redirected at all

* update tests and cli errors
2021-04-02 10:38:17 +05:30
Akash Mahanty
db8f902cff Add doc strings (#90)
* Added some docstrings in utils.py

* renamed some func/meth to better names and added doc strings + lint

* added more docstrings

* more docstrings

* improve docstrings

* docstrings

* added more docstrings, lint

* fix import error
2021-01-26 11:56:03 +05:30
Akash Mahanty
36b936820b known urls now yields, which is more reliable, and saves the file in chunks with respect to the response. The --file arg can be used to create an output file; if --file is not used, no output is saved to any file. (#88) 2021-01-24 16:11:39 +05:30
Akash Mahanty
edc2f63d93 Output valid JSON, dumps python dict. Make JSON valid. 2021-01-23 20:43:52 +05:30
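
As a rough illustration of why dumping the dict matters (a generic sketch, not code from this repository; the `record` contents are made up), printing a Python dict directly produces its repr, which uses single quotes and `True`/`None` and is therefore not valid JSON, whereas `json.dumps` produces text that a JSON parser accepts:

```python
import json

# Hypothetical record; the keys and values are illustrative only.
record = {"url": "https://example.com", "archived": True, "note": None}

as_repr = str(record)         # Python repr: single quotes, True, None
as_json = json.dumps(record)  # JSON text: double quotes, true, null

# Round-tripping the JSON text gives back an equal dict.
round_tripped = json.loads(as_json)

# The repr is rejected by a JSON parser.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False
```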
Akash Mahanty
40233eb115 improve code quality, remove unused imports, use system randomness etc 2021-01-16 11:35:13 +05:30
Akash Mahanty
712471176b better error messages(str), check latest version before asking for an upgrade and rm alive checking 2021-01-15 16:47:26 +05:30
Akash Mahanty
dcd7b03302 getting rid of c style str formatting, now using .format 2021-01-14 19:30:07 +05:30
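
For readers unfamiliar with the two styles, the change described above amounts to something like the following. The message text and variable names here are invented for illustration and are not taken from the project:

```python
url = "https://example.com"
timestamp = "20210114193007"

# C-style (printf-like) formatting with the % operator.
old_style = "%s was archived at %s" % (url, timestamp)

# The same message built with str.format.
new_style = "{} was archived at {}".format(url, timestamp)

# Named fields are one readability benefit of str.format.
named_style = "{url} was archived at {ts}".format(url=url, ts=timestamp)
```

All three expressions build the same string; the difference is purely stylistic.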
Akash Mahanty
a03813315f full cdx api support 2021-01-10 02:23:53 +05:30
Akash Mahanty
0c6107e675 increase coverage 2021-01-04 01:54:40 +05:30
Akash Mahanty
5dec4927cd refactoring, try to reduce code complexity 2021-01-04 00:14:38 +05:30
Akash Mahanty
62e5217b9e reduce code complexity: refactoring, less flow breaking structures 2021-01-03 19:38:25 +05:30
Akash Mahanty
bb4dbc7d3c rm url = obj.url 2021-01-02 11:19:09 +05:30
Akash Mahanty
7c7fd75376 No need to fetch archive_url and timestamp from availability API on init (#55)
* No need to fetch archive_url and timestamp from availability API on init. 

Not useful if all I want is to archive a page

* Update test_wrapper.py

* Update wrapper.py

* Update test_wrapper.py

* Update wrapper.py

* Update cli.py

* Update wrapper.py

* Update __version__.py

* Update __version__.py

* Update __version__.py

* Update __version__.py

* Update setup.py

* Update README.md
2021-01-02 11:10:23 +05:30
Akash Mahanty
da390ee8a3 improve maintainability and reduce code cognitive complexity (#49) 2020-12-15 10:24:13 +05:30
Akash Mahanty
d3e68d0e70 code formatted with black (#47) 2020-12-14 01:18:04 +05:30
Akash Mahanty
ca51c14332 deleted .travis.yml, link with flake (#41)
close #38
2020-11-26 13:06:50 +05:30
Akash Mahanty
58cd9c28e7 Threading enabled checking for URLs 2020-11-26 06:15:42 +05:30
Akash Mahanty
5088305a58 removed python2 compatibility code 2020-11-21 17:00:11 +05:30
Akash Mahanty
9de6393cd5 Add support for JSON and archive_url (#33)
CLI support for JSON and archive_url attributes
2020-10-16 15:16:18 +05:30
Akash Mahanty
1a81eb97fb lint 2020-10-03 16:58:11 +05:30
Akash Mahanty
23f7222cb5 tweak 2020-10-02 21:01:32 +05:30
Akash Mahanty
ce7294d990 Implemented new feature, known urls for domain. 2020-10-02 20:27:28 +05:30
Akash
18cbd2fd30 Update cli.py 2020-07-24 16:10:29 +05:30
Akash
a2812fb56f patch for cli 2020-07-24 16:09:47 +05:30