Commit Graph

99 Commits

Author SHA1 Message Date
Akash Mahanty
d13dd4db1a added notice on waybackpy/wrapper.py that the Url class will cease to exist after 2024-01-01 and also removed unused imports. 2022-01-21 23:14:20 +05:30
Akash Mahanty
2e487e88d3 define __len__ on Url objects, if any method not used prior to len op then default to len of oldest archive. 2022-01-16 21:29:43 +05:30
Akash Mahanty
c8d0ad493a defined __str__ for Url objects, print func should print the url. 2022-01-16 21:22:43 +05:30
Akash Mahanty
4e68cd5743 Create separate module for the 3 different APIs also CDX is now CLI supported. 2022-01-02 14:14:45 +05:30
Jonáš Jančařík
6dc6124dc4 Raise error on a 509 response (too many sessions) (#99)
* Raise error on a 509 response (too many sessions)

When the response code is 509, raise an error with an explanation (based on the actual error message contained in the response HTML).

* Raise error on a 509 response (too many sessions) - linting
2021-09-03 08:04:36 +05:30
Akash Mahanty
dd1917c77e added RedirectSaveError - for failed saves if the URL is a permanent … (#93)
* added RedirectSaveError - for failed saves if the URL is a permanent redirect.

* check if url is redirect before throwing exceptions, res.url is the redirect url if redirected at all

* update tests and cli errors
2021-04-02 10:38:17 +05:30
Akash Mahanty
db8f902cff Add doc strings (#90)
* Added some docstrings in utils.py

* renamed some func/meth to better names and added doc strings + lint

* added more docstrings

* more docstrings

* improve docstrings

* docstrings

* added more docstrings, lint

* fix import error
2021-01-26 11:56:03 +05:30
Akash Mahanty
09290f88d1 fix one more error 2021-01-24 16:58:53 +05:30
Akash Mahanty
e5835091c9 import re 2021-01-24 16:56:59 +05:30
Akash Mahanty
7312ed1f4f set cached_save to True if archive older than 3 mins. 2021-01-24 16:53:36 +05:30
Akash Mahanty
36b936820b known urls now yileds, more reliable. And save the file in chucks wrt to response. --file arg can be used to create output file, if --file not used no output will be saved in any file. (#88) 2021-01-24 16:11:39 +05:30
Akash Mahanty
ffe0810b12 flag to check if the archive saved is 30 mins older or not 2021-01-16 12:06:08 +05:30
Akash Mahanty
40233eb115 improve code quality, remove unused imports, use system randomness etc 2021-01-16 11:35:13 +05:30
Akash Mahanty
d549d31421 improve save method, now we know that 302 errors indicates that wayback machine is archiving the URL and hasn't yet archived. We construct an artifical archive with the current UTC time and check for HTTP status code 20* or 30*. If we verify the archival, we return the artifical archive. The artificial archive will automatically point to the new archive or in best case will be the new archive after some time. 2021-01-16 10:47:43 +05:30
Akash Mahanty
712471176b better error messages(str), check latest version before asking for an upgrade and rm alive checking 2021-01-15 16:47:26 +05:30
Akash Mahanty
dcd7b03302 getting rid of c style str formatting, now using .format 2021-01-14 19:30:07 +05:30
Akash Mahanty
76205d9cf6 backoff_factor=2 for save, incr success by 25% 2021-01-13 10:13:16 +05:30
Akash Mahanty
6142e0b353 get should retrive the last fetched archive by default 2021-01-12 10:07:14 +05:30
Akash Mahanty
eabf4dc046 don't fetch more pages if >=2 pages are empty 2021-01-11 22:43:14 +05:30
Akash Mahanty
5a7bd73565 support unix ts as an arg in near 2021-01-11 19:53:37 +05:30
Akash Mahanty
4693dbf9c1 change str repr of cdxsnapshot to cdx line 2021-01-11 09:34:37 +05:30
Akash Mahanty
d6b7df6837 no need to de-duplicate as we are collapsing the results by urlkey
Same urls aren't recieved
2021-01-10 11:36:46 +05:30
Akash Mahanty
dafba5d0cb collapses=["urlkey"] for known urls 2021-01-10 11:34:06 +05:30
Akash Mahanty
6c71dfbe41 use cdx matchtype for domain and host 2021-01-10 11:10:49 +05:30
Akash Mahanty
a03813315f full cdx api support 2021-01-10 02:23:53 +05:30
Akash Mahanty
a2550f17d7 retries support for get requests 2021-01-06 01:58:38 +05:30
Akash Mahanty
15ef5816db Always cast url to string, avoid passing waybackpy objects to _get_response 2021-01-05 19:46:17 +05:30
Akash Mahanty
93b52bd0fe FIX : don't use self.user_agent if user_agent passed in get() 2021-01-05 19:31:27 +05:30
Akash Mahanty
e0a4b007d5 improve docs 2021-01-05 01:46:12 +05:30
Akash Mahanty
1882862992 now using cdx Pagination API 2021-01-04 20:46:54 +05:30
Akash Mahanty
0c6107e675 increase coverage 2021-01-04 01:54:40 +05:30
Akash Mahanty
5dec4927cd refactoring, try to code complexity 2021-01-04 00:14:38 +05:30
Akash Mahanty
9823c809e9 Added doc strings in wrapper.py, documenting code and improving docs. 2021-01-03 17:11:32 +05:30
Akash Mahanty
db5737a857 JSON is now available for near and other other methods that call it 2021-01-02 18:52:46 +05:30
Akash Mahanty
7c7fd75376 No need to fetch archive_url and timestamp from availability API on init (#55)
* No need to fetch archive_url and timestamp from availability API on init. 

Not useful if all I want is to archive a page

* Update test_wrapper.py

* Update wrapper.py

* Update test_wrapper.py

* Update wrapper.py

* Update cli.py

* Update wrapper.py

* Update __version__.py

* Update __version__.py

* Update __version__.py

* Update __version__.py

* Update setup.py

* Update README.md
2021-01-02 11:10:23 +05:30
Akash Mahanty
1b499a7594 removed JSON from init, this was resulting in too much unnecessary taffic. Some users who are thousands of URLs were blocked by IA (#53)
closes #52
2021-01-01 16:38:57 +05:30
Akash Mahanty
d3e68d0e70 code formated with black (#47) 2020-12-14 01:18:04 +05:30
Akash Mahanty
fd163f3d36 Update wrapper.py 2020-12-13 17:12:32 +05:30
Akash Mahanty
a0a918cf0d . 2020-12-13 17:10:28 +05:30
Akash Mahanty
4943cf6873 remove print stmnt, update ci 2020-12-13 16:37:35 +05:30
Akash Mahanty
bc3efc7d63 now using requests lib as it handles errors nicely (#42)
* now using requests lib as it handles errors nicely

* remove unused import (urllib)

* FIX : replaced full_url with endpoint (not using urlib)

* LINT :  Found in waybackpy\wrapper.py:88  Unnecessary else after return
2020-12-13 15:44:37 +05:30
Akash Mahanty
f89368f16d LINT : Found in waybackpy\wrapper.py:88 Unnecessary else after return 2020-12-13 15:39:23 +05:30
Akash Mahanty
c919a6a605 FIX : replaced full_url with endpoint (not using urlib) 2020-12-13 15:22:56 +05:30
Akash Mahanty
60ee8b95a8 now using requests lib as it handles errors nicely 2020-12-13 15:05:57 +05:30
Akash Mahanty
ca51c14332 deleted .travis.yml, link with flake (#41)
close #38
2020-11-26 13:06:50 +05:30
Akash Mahanty
58cd9c28e7 Threading enabled checking for URLs 2020-11-26 06:15:42 +05:30
Akash Mahanty
5088305a58 removed python2 compatibility code 2020-11-21 17:00:11 +05:30
Akash Mahanty
7f927ec7be added tests for json and archive_url, updated broken tests (#34)
* added tests for json and archive_url, updated broken tests

* drop 2.7 support
2020-10-16 19:25:45 +05:30
danvalen1
91e7f65617 Fixing len() bug (#32)
* added class functionality

* Update wrapper.py

* style edits

* fixed bug with len() of url()

* fixing len() bug

* fixing len() bug

* squashing bug

* removed test notebook
2020-10-16 10:04:13 +05:30
danvalen1
d465454019 Adding attributes to Url class (#28)
* added class functionality

* Update wrapper.py

* style edits
2020-10-15 22:10:32 +05:30