Commit Graph

391 Commits

Author SHA1 Message Date
Akash Mahanty
e5835091c9 import re 2021-01-24 16:56:59 +05:30
Akash Mahanty
7312ed1f4f set cached_save to True if archive older than 3 mins. 2021-01-24 16:53:36 +05:30
Akash Mahanty
6ae8f843d3
add --file to --known_urls 2021-01-24 16:15:11 +05:30
Akash Mahanty
36b936820b
known urls now yields, which is more reliable, and saves the file in chunks with respect to the response. The --file arg can be used to create an output file; if --file is not used, no output is saved to any file. (#88) 2021-01-24 16:11:39 +05:30
Akash Mahanty
a3bc6aad2b too much API usage by duplicate tests was causing too many test failures 2021-01-23 21:08:21 +05:30
Akash Mahanty
edc2f63d93 Output valid JSON, dumps python dict. Make JSON valid. 2021-01-23 20:43:52 +05:30
Akash Mahanty
ffe0810b12 flag to check whether the saved archive is older than 30 mins 2021-01-16 12:06:08 +05:30
Akash Mahanty
40233eb115 improve code quality, remove unused imports, use system randomness etc 2021-01-16 11:35:13 +05:30
Akash Mahanty
d549d31421 improve save method: we now know that 302 errors indicate that the Wayback Machine is archiving the URL and hasn't yet archived it. We construct an artificial archive with the current UTC time and check for HTTP status code 20* or 30*. If we verify the archival, we return the artificial archive. The artificial archive will automatically point to the new archive or, in the best case, will be the new archive after some time. 2021-01-16 10:47:43 +05:30
Akash Mahanty
0725163af8 minify the logo, remove ugly old logos 2021-01-15 18:14:48 +05:30
Akash Mahanty
712471176b better error messages (str), check latest version before asking for an upgrade and rm alive checking 2021-01-15 16:47:26 +05:30
Akash Mahanty
dcd7b03302 getting rid of C-style str formatting, now using .format 2021-01-14 19:30:07 +05:30
Akash Mahanty
76205d9cf6 backoff_factor=2 for save, incr success by 25% 2021-01-13 10:13:16 +05:30
Akash Mahanty
ec0a0d04cc
+ dequeued0
dequeued0 (https://github.com/dequeued0) for reporting bugs and useful feature requests.
2021-01-12 10:52:41 +05:30
Akash Mahanty
7bb01df846 v2.4.1 2021-01-12 10:18:09 +05:30
Akash Mahanty
6142e0b353 get should retrieve the last fetched archive by default 2021-01-12 10:07:14 +05:30
Akash Mahanty
a65990aee3 don't use pagination API if total pages <= 2 2021-01-12 09:46:07 +05:30
Akash Mahanty
259a024eb1
joke? they changed their robots.txt 2021-01-11 23:17:01 +05:30
Akash Mahanty
91402792e6
+ Supported Features
tell what the package can do, many users probably do not read the full usage.
2021-01-11 23:01:18 +05:30
Akash Mahanty
eabf4dc046 don't fetch more pages if >=2 pages are empty 2021-01-11 22:43:14 +05:30
Akash Mahanty
5a7bd73565 support unix ts as an arg in near 2021-01-11 19:53:37 +05:30
Akash Mahanty
4693dbf9c1 change str repr of cdxsnapshot to cdx line 2021-01-11 09:34:37 +05:30
Akash Mahanty
f4f2e51315
V2.4.0 (#62)
* v 2.4.0

* v 2.4.0
2021-01-10 11:53:45 +05:30
Akash Mahanty
d6b7df6837
no need to de-duplicate as we are collapsing the results by urlkey
Same urls aren't received
2021-01-10 11:36:46 +05:30
Akash Mahanty
dafba5d0cb
collapses=["urlkey"] for known urls 2021-01-10 11:34:06 +05:30
Akash Mahanty
6c71dfbe41 use cdx matchtype for domain and host 2021-01-10 11:10:49 +05:30
Akash Mahanty
a6470b1036 not passing dict to cdxsnapshot 2021-01-10 10:40:32 +05:30
Akash Mahanty
04cda4558e fix test 2021-01-10 03:18:09 +05:30
Akash Mahanty
625ed63482 remove assert statements 2021-01-10 03:05:48 +05:30
Akash Mahanty
a03813315f full cdx api support 2021-01-10 02:23:53 +05:30
Akash Mahanty
a2550f17d7 retries support for get requests 2021-01-06 01:58:38 +05:30
Akash Mahanty
15ef5816db
Always cast url to string, avoid passing waybackpy objects to _get_response 2021-01-05 19:46:17 +05:30
Akash Mahanty
93b52bd0fe
FIX : don't use self.user_agent if user_agent passed in get() 2021-01-05 19:31:27 +05:30
Akash Mahanty
28ff877081
Update README.md 2021-01-05 19:08:35 +05:30
Akash Mahanty
3e3ecff9df
l2 heading and lint 2021-01-05 01:59:29 +05:30
Akash Mahanty
ce64135ba8
ce 2021-01-05 01:52:35 +05:30
Akash Mahanty
2af6580ffb
docs link 2021-01-05 01:51:53 +05:30
Akash Mahanty
8a3c515176
v2.3.3 2021-01-05 01:49:26 +05:30
Akash Mahanty
d98c4f32ad
v2.3.3 2021-01-05 01:48:54 +05:30
Akash Mahanty
e0a4b007d5 improve docs 2021-01-05 01:46:12 +05:30
Akash Mahanty
6fb6b2deee
Update readme + new file CONTRIBUTORS.md (#59)
* remove some badges

* remove made with python button, obvious

* - maintained badge, we already have latest commit badge

- [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/akamhy/waybackpy/graphs/commit-activity)

* rearranged order of badges

* a bit more reordering

* - release badge

* - license section

* center h1

* try once more

* removed the TOC

* move the hr

* Update README.md

* + hr

* h1 --> h2

* remove tests and packaging info from here to docs/wiki

* Update README.md

* example inspired by psf/requests

* CLI tool example gist

* Update README.md

* Update README.md

* + license

* Update README.md

* authors list

* Update CONTRIBUTORS.md

* fix code

* Update README.md

* Update README.md

* center the button
2021-01-05 00:30:07 +05:30
Akash Mahanty
1882862992 now using cdx Pagination API 2021-01-04 20:46:54 +05:30
Akash Mahanty
0c6107e675 increase coverage 2021-01-04 01:54:40 +05:30
Akash Mahanty
bd079978bf inc coverage 2021-01-04 00:44:55 +05:30
Akash Mahanty
5dec4927cd refactoring, try to reduce code complexity 2021-01-04 00:14:38 +05:30
Akash Mahanty
62e5217b9e reduce code complexity: refactoring, less flow breaking structures 2021-01-03 19:38:25 +05:30
Akash Mahanty
9823c809e9 Added doc strings in wrapper.py, documenting code and improving docs. 2021-01-03 17:11:32 +05:30
Akash Mahanty
db5737a857 JSON is now available for near and other methods that call it 2021-01-02 18:52:46 +05:30
Akash Mahanty
ca0821a466
Wiki docs (#58)
* move docs to wiki

* Update README.md

* Update setup.py
2021-01-02 12:20:43 +05:30
Akash Mahanty
bb4dbc7d3c
rm url = obj.url 2021-01-02 11:19:09 +05:30