Fork of https://github.com/akamhy/waybackpy Wayback Machine API interface & a command-line tool
Go to file
Akash Mahanty 9dbe3b3bf4 In waybackpy/wrapper.py set self.timestamp to None on init.
In older interface(2.x.x) we had timestamp set to none in the constructer, so maybe it should be best to set it to None in the backwards compatiblliy module.)
2022-01-29 22:12:02 +05:30
.github/workflows .github/workflows/build_test.yml : change python versions from '3.4', '3.8', '3.10' to '3.6', '3.10' as 3.4 not found by GitHub. 2022-01-25 13:56:49 +05:30
assets upload logo and make p path not text 2022-01-21 21:11:42 +05:30
tests the test is faulty as it fails when we increment the version on dunder version file but did not upstreamed the code to PyPi. 2022-01-26 01:51:24 +05:30
waybackpy In waybackpy/wrapper.py set self.timestamp to None on init. 2022-01-29 22:12:02 +05:30
_config.yml now using requests lib as it handles errors nicely 2020-12-13 15:05:57 +05:30
.gitignore removed JSON from init, this was resulting in too much unnecessary taffic. Some users who are thousands of URLs were blocked by IA (#53) 2021-01-01 16:38:57 +05:30
.whitesource Add .whitesource configuration file 2022-01-02 08:45:07 +00:00
CONTRIBUTORS.md _get_response is not used anymore 2022-01-21 19:43:35 +05:30
LICENSE date year range 2020-2022 2022-01-21 11:55:42 +05:30
pytest.ini created pytest.ini, added test for waybackpy/availability_api.py, new exceptions all of which inherit from the main WaybackError and created requirements-dev.txt 2022-01-23 01:29:07 +05:30
README.md update install command for conda and replace the link to conda-forge.org with https://anaconda.org/conda-forge/waybackpy 2022-01-27 00:17:36 +05:30
requirements-dev.txt more dev reqs 2022-01-23 02:10:12 +05:30
requirements.txt Create separate module for the 3 different APIs also CDX is now CLI supported. 2022-01-02 14:14:45 +05:30
setup.cfg Code style improvements (#20) 2020-07-22 10:09:14 +05:30
setup.py Fix syntax for opening the README.md and __version__.py 2022-01-25 13:05:01 +05:30
snapcraft.yaml add snapcraft.yaml 2022-01-25 20:54:09 +05:30


A Python package & CLI tool that interfaces with the Wayback Machine API

Unit Tests pypi Downloads GitHub lastest commit PyPI - Python Version Code style: black


Introduction

Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine API.

Wayback Machine has 3 client side APIs.

These three APIs can be accessed via the waybackpy either by importing it in a script or from the CLI.

🏗 Installation

Using pip, from PyPI (recommended):

pip install waybackpy

Using conda, from conda-forge (recommended):

See also waybackpy feedstock, maintainers are @rafaelrdealmeida, @labriunesp and @akamhy.

conda install -c conda-forge waybackpy

Install directly from this git repository (NOT recommended):

pip install git+https://github.com/akamhy/waybackpy.git

🐳 Docker Image

Docker Hub : https://hub.docker.com/r/secsi/waybackpy

Docker image is automatically updated on every release by Regulary and Automatically Updated Docker Images (RAUDI).

RAUDI is a tool by SecSI (https://secsi.io), an Italian cybersecurity startup.

🚀 Usage

As a Python package

Save API aka SavePageNow
>>> from waybackpy import WaybackMachineSaveAPI
>>> url = "https://github.com"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>>
>>> save_api = WaybackMachineSaveAPI(url, user_agent)
>>> save_api.save()
https://web.archive.org/web/20220118125249/https://github.com/
>>> save_api.cached_save
False
>>> save_api.timestamp()
datetime.datetime(2022, 1, 18, 12, 52, 49)
Availability API
>>> from waybackpy import WaybackMachineAvailabilityAPI
>>>
>>> url = "https://google.com"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>>
>>> availability_api = WaybackMachineAvailabilityAPI(url, user_agent)
>>>
>>> availability_api.oldest()
https://web.archive.org/web/19981111184551/http://google.com:80/
>>>
>>> availability_api.newest()
https://web.archive.org/web/20220118150444/https://www.google.com/
>>>
>>> availability_api.near(year=2010, month=10, day=10, hour=10)
https://web.archive.org/web/20101010101708/http://www.google.com/
CDX API aka CDXServerAPI
>>> from waybackpy import WaybackMachineCDXServerAPI
>>> url = "https://pypi.org"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>> cdx = WaybackMachineCDXServerAPI(url, user_agent, start_timestamp=2016, end_timestamp=2017)
>>> for item in cdx.snapshots():
...     print(item.archive_url)
...
https://web.archive.org/web/20160110011047/http://pypi.org/
https://web.archive.org/web/20160305104847/http://pypi.org/
.
. # URLS REDACTED FOR READABILITY
.
https://web.archive.org/web/20171127171549/https://pypi.org/
https://web.archive.org/web/20171206002737/http://pypi.org:80/

Documentation is at https://github.com/akamhy/waybackpy/wiki/Python-package-docs.

As a CLI tool

Demo video on asciinema.org, you can copy the text from video:

asciicast

CLI documentation is at https://github.com/akamhy/waybackpy/wiki/CLI-docs.

🛡 License

License: MIT

Copyright (c) 2020-2022 Akash Mahanty Et al.

Released under the MIT License. See license for details.