Commit Graph

  • cd8a32ed1f added tests for cdx_snapshot.py at tests/test_cdx_snapshot.py Akash Mahanty 2022-01-24 16:29:44 +05:30
  • 57512c65ff change test oldest method from google.com to example.com; the oldest archive of google.com is, for some unknown reason, not very stable. Akash Mahanty 2022-01-24 16:27:35 +05:30
  • d9ea26e11c added code style black badge Akash Mahanty 2022-01-24 13:46:31 +05:30
  • 2bea92b348 fix bug with the third matching case of the archive_url_parser, caught while writing more tests for the save API interface. Akash Mahanty 2022-01-24 13:31:30 +05:30
  • d506685f68 added some tests for save_api interface Akash Mahanty 2022-01-23 18:35:54 +05:30
  • 7844d15d99 close the session in save api interface Akash Mahanty 2022-01-23 18:34:06 +05:30
  • c0252edff2 updated tests for availability_api.py and also added max_tries (default value is 3) with a delay (sleep) between successive API calls. The delay actually improves the performance of the availability_api interface. Akash Mahanty 2022-01-23 15:05:10 +05:30
  • e7488f3a3e added test badge, rename test to Tests from ubuntu and fix the Incomplete URL substring sanitization (or trying to) Akash Mahanty 2022-01-23 02:26:53 +05:30
  • aed75ad1db Make modules importable as part of a Python package, waybackpy, by creating __init__.py file in tests Akash Mahanty 2022-01-23 02:14:38 +05:30
  • d740959c34 more dev reqs Akash Mahanty 2022-01-23 02:10:12 +05:30
  • 2d83043ef7 + flake8 in requirements-dev.txt Akash Mahanty 2022-01-23 02:05:08 +05:30
  • 31b1056217 fix typo in CI Akash Mahanty 2022-01-23 02:03:30 +05:30
  • 97712b2c1e add CI unit_test.yml Akash Mahanty 2022-01-23 02:00:15 +05:30
  • a8acc4c4d8 Fix Incomplete URL substring sanitization in the last commit. Akash Mahanty 2022-01-23 01:42:48 +05:30
  • 1bacd73002 created pytest.ini, added test for waybackpy/availability_api.py, new exceptions all of which inherit from the main WaybackError and created requirements-dev.txt Akash Mahanty 2022-01-23 01:29:07 +05:30
  • 79901ba968 updated README.md Akash Mahanty 2022-01-22 03:08:26 +05:30
  • df64e839d7 added trove classifiers for python 3.10 Akash Mahanty 2022-01-22 00:57:10 +05:30
  • 405e9a2a79 waybackpy/save_api.py : Added doc strings and also lint with black. Akash Mahanty 2022-01-22 00:41:10 +05:30
  • db551abbf6 lint waybackpy/cdx_api.py and added some doc strings Akash Mahanty 2022-01-22 00:11:35 +05:30
  • d13dd4db1a added notice on waybackpy/wrapper.py that the Url class will cease to exist after 2024-01-01 and also removed unused imports. Akash Mahanty 2022-01-21 23:14:20 +05:30
  • d3bb8337a1 make setup.py smarter, now no need to update the URL again and also added more keywords. And in __version__.py updated the __author__ Akash Mahanty 2022-01-21 23:01:09 +05:30
  • fd5e85420c waybackpy/availability_api.py : removed unused imports, added doc strings, removed redundant function. Akash Mahanty 2022-01-21 22:47:44 +05:30
  • 5c685ef5d7 upload logo and make p path not text Akash Mahanty 2022-01-21 21:11:42 +05:30
  • 6a3d96b453 Logo (#113) Akash Mahanty 2022-01-21 21:02:38 +05:30
  • afe1b15a5f Add files via upload Akash Mahanty 2022-01-21 20:58:53 +05:30
  • 4fd9d142e7 Merge pull request #112 from akamhy/fix Akash Mahanty 2022-01-21 19:52:55 +05:30
  • 5e9fdb40ce escape '.' before 'archive.org' Akash Mahanty 2022-01-21 19:51:08 +05:30
  • fa72098270 _get_response is not used anymore Akash Mahanty 2022-01-21 19:43:35 +05:30
  • d18f955044 date year range 2020-2022 Akash Mahanty 2022-01-21 11:55:42 +05:30
  • 9c340d6967 Create codeql-analysis.yml Akash Mahanty 2022-01-21 11:12:59 +05:30
  • 78d0e0c126 Update README.md Akash Mahanty 2022-01-21 09:54:04 +05:30
  • 564101e6f5 🐳 for docker image Akash Mahanty 2022-01-21 01:23:05 +05:30
  • de5a3e1561 improve usage code 3.0.0 Akash Mahanty 2022-01-18 21:18:17 +05:30
  • 52e46fecc2 more usage example Akash Mahanty 2022-01-18 20:58:39 +05:30
  • 3b6415abc7 updating examples Akash Mahanty 2022-01-18 20:44:47 +05:30
  • 66e16d6d89 define __repr__ for the Availability API class Akash Mahanty 2022-01-18 20:34:21 +05:30
  • 16b9bdd7f9 output the file name if known_url and file flag are passed. Akash Mahanty 2022-01-18 20:14:44 +05:30
  • 7adc01bff2 implement known_urls for cli from the newer interface. Although use of CDX is recommended, backward compatibility matters. Akash Mahanty 2022-01-18 20:07:12 +05:30
  • 9bbd056268 Update README.md Akash Mahanty 2022-01-17 02:15:38 +05:30
  • 2ab44391cf close #107, added link to SecSI/Docker image Akash Mahanty 2022-01-16 23:01:31 +05:30
  • cc3628ae18 define __str__ for objects of WaybackMachineAvailabilityAPI class; the check for self.JSON ensures that the API was at least called. Akash Mahanty 2022-01-16 22:28:12 +05:30
  • 1d751b942b invoke json; removing it in the earlier commit was a bad idea, as the end user should not have to call it Akash Mahanty 2022-01-16 22:15:25 +05:30
  • 261a867a21 near() method of WaybackMachineAvailabilityAPI returns self to preserve past behaviour Akash Mahanty 2022-01-16 21:53:54 +05:30
  • 2e487e88d3 define __len__ on Url objects, if any method not used prior to len op then default to len of oldest archive. Akash Mahanty 2022-01-16 21:29:43 +05:30
  • c8d0ad493a defined __str__ for Url objects, print func should print the url. Akash Mahanty 2022-01-16 21:22:43 +05:30
  • ce869177fd Merge pull request #103 from akamhy/whitesource/configure Akash Mahanty 2022-01-02 16:04:15 +05:30
  • 58616fb986 Add .whitesource configuration file whitesource-bolt-for-github[bot] 2022-01-02 08:45:07 +00:00
  • 4e68cd5743 Create separate module for the 3 different APIs also CDX is now CLI supported. Akash Mahanty 2022-01-02 14:14:45 +05:30
  • a7b805292d changes made for v2.4.4 (update download_url) (#100) 2.4.4 akamhy 2021-09-03 11:28:26 +05:30
  • 6dc6124dc4 Raise error on a 509 response (too many sessions) (#99) Jonáš Jančařík 2021-09-03 04:34:36 +02:00
  • 5a7fc7d568 Fix typo (#95) Jens Finkhaeuser 2021-04-13 13:28:34 +02:00
  • 5a9c861cad v2.4.3 (#94) 2.4.3 Akash Mahanty 2021-04-02 10:41:59 +05:30
  • dd1917c77e added RedirectSaveError - for failed saves if the URL is a permanent … (#93) Akash Mahanty 2021-04-02 10:38:17 +05:30
  • db8f902cff Add doc strings (#90) Akash Mahanty 2021-01-26 11:56:03 +05:30
  • 88cda94c0b v2.4.2 (#89) 2.4.2 Akash Mahanty 2021-01-24 17:03:35 +05:30
  • 09290f88d1 fix one more error Akash Mahanty 2021-01-24 16:58:53 +05:30
  • e5835091c9 import re Akash Mahanty 2021-01-24 16:56:59 +05:30
  • 7312ed1f4f set cached_save to True if archive older than 3 mins. Akash Mahanty 2021-01-24 16:53:36 +05:30
  • 6ae8f843d3 add --file to --known_urls Akash Mahanty 2021-01-24 16:15:11 +05:30
  • 36b936820b known urls now yields, more reliable. And save the file in chunks wrt the response. The --file arg can be used to create an output file; if --file is not used, no output will be saved to any file. (#88) Akash Mahanty 2021-01-24 16:11:39 +05:30
  • a3bc6aad2b too much API usage by duplicate tests was causing too many test failures Akash Mahanty 2021-01-23 21:08:21 +05:30
  • edc2f63d93 Output valid JSON, dumps python dict. Make JSON valid. Akash Mahanty 2021-01-23 20:43:52 +05:30
  • ffe0810b12 flag to check if the archive saved is 30 mins older or not Akash Mahanty 2021-01-16 12:06:08 +05:30
  • 40233eb115 improve code quality, remove unused imports, use system randomness etc Akash Mahanty 2021-01-16 11:35:13 +05:30
  • d549d31421 improve save method, now we know that 302 errors indicate that wayback machine is archiving the URL and hasn't yet archived it. We construct an artificial archive with the current UTC time and check for HTTP status code 20* or 30*. If we verify the archival, we return the artificial archive. The artificial archive will automatically point to the new archive or, in the best case, will be the new archive after some time. Akash Mahanty 2021-01-16 10:47:43 +05:30
  • 0725163af8 minify the logo, remove ugly old logos Akash Mahanty 2021-01-15 18:14:48 +05:30
  • 712471176b better error messages(str), check latest version before asking for an upgrade and rm alive checking Akash Mahanty 2021-01-15 16:47:26 +05:30
  • dcd7b03302 getting rid of c style str formatting, now using .format Akash Mahanty 2021-01-14 19:30:07 +05:30
  • 76205d9cf6 backoff_factor=2 for save, incr success by 25% Akash Mahanty 2021-01-13 10:13:16 +05:30
  • ec0a0d04cc + dequeued0 Akash Mahanty 2021-01-12 10:52:41 +05:30
  • 7bb01df846 v2.4.1 2.4.1 Akash Mahanty 2021-01-12 10:18:09 +05:30
  • 6142e0b353 get should retrieve the last fetched archive by default Akash Mahanty 2021-01-12 10:07:14 +05:30
  • a65990aee3 don't use pagination API if total pages <= 2 Akash Mahanty 2021-01-12 09:46:07 +05:30
  • 259a024eb1 joke? they changed their robots.txt Akash Mahanty 2021-01-11 23:17:01 +05:30
  • 91402792e6 + Supported Features Akash Mahanty 2021-01-11 23:01:18 +05:30
  • eabf4dc046 don't fetch more pages if >=2 pages are empty Akash Mahanty 2021-01-11 22:43:14 +05:30
  • 5a7bd73565 support unix ts as an arg in near Akash Mahanty 2021-01-11 19:53:37 +05:30
  • 4693dbf9c1 change str repr of cdxsnapshot to cdx line Akash Mahanty 2021-01-11 09:34:37 +05:30
  • f4f2e51315 V2.4.0 (#62) 2.4.0 Akash Mahanty 2021-01-10 11:53:45 +05:30
  • d6b7df6837 no need to de-duplicate as we are collapsing the results by urlkey Akash Mahanty 2021-01-10 11:36:46 +05:30
  • dafba5d0cb collapses=["urlkey"] for known urls Akash Mahanty 2021-01-10 11:34:06 +05:30
  • 6c71dfbe41 use cdx matchtype for domain and host Akash Mahanty 2021-01-10 11:10:49 +05:30
  • a6470b1036 not passing dict to cdxsnapshot Akash Mahanty 2021-01-10 10:40:32 +05:30
  • 04cda4558e fix test Akash Mahanty 2021-01-10 03:18:09 +05:30
  • 625ed63482 remove assert statements Akash Mahanty 2021-01-10 03:05:48 +05:30
  • a03813315f full cdx api support Akash Mahanty 2021-01-10 02:23:53 +05:30
  • a2550f17d7 retries support for get requests Akash Mahanty 2021-01-06 01:58:38 +05:30
  • 15ef5816db Always cast url to string, avoid passing waybackpy objects to _get_response Akash Mahanty 2021-01-05 19:46:17 +05:30
  • 93b52bd0fe FIX : don't use self.user_agent if user_agent passed in get() Akash Mahanty 2021-01-05 19:31:27 +05:30
  • 28ff877081 Update README.md Akash Mahanty 2021-01-05 19:08:35 +05:30
  • 3e3ecff9df l2 heading and lint 2.3.3 Akash Mahanty 2021-01-05 01:59:29 +05:30
  • ce64135ba8 ce Akash Mahanty 2021-01-05 01:52:35 +05:30
  • 2af6580ffb docs link Akash Mahanty 2021-01-05 01:51:53 +05:30
  • 8a3c515176 v2.3.3 Akash Mahanty 2021-01-05 01:49:26 +05:30
  • d98c4f32ad v2.3.3 Akash Mahanty 2021-01-05 01:48:54 +05:30
  • e0a4b007d5 improve docs Akash Mahanty 2021-01-05 01:46:12 +05:30
  • 6fb6b2deee Update readme + new file CONTRIBUTORS.md (#59) Akash Mahanty 2021-01-05 00:30:07 +05:30
  • 1882862992 now using cdx Pagination API Akash Mahanty 2021-01-04 20:46:54 +05:30
  • 0c6107e675 increase coverage Akash Mahanty 2021-01-04 01:54:40 +05:30
  • bd079978bf inc coverage Akash Mahanty 2021-01-04 00:44:55 +05:30