Before and After methods (#175)

* Added before and after functions
* add tests
* formatting

Add Python 3.11 to setup.cfg classifiers list (#179)
2022-11-17 07:58:46 +05:30 · 2022-11-17 07:56:19 +05:30 · 2022-03-29 11:30:43 +05:30 · 2022-03-29 10:42:57 +05:30 · 2022-03-29 10:24:55 +05:30 · 2022-03-29 03:39:50 +05:30
20 changed files with 779 additions and 232 deletions
--- a/CITATION.cff
+++ b/CITATION.cff
@ -0,0 +1,25 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+title: waybackpy
+abstract: "Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily."
+version: '3.0.6'
+doi: 10.5281/ZENODO.3977276
+date-released: 2022-03-15
+type: software
+authors:
+  - given-names: Akash
+    family-names: Mahanty
+    email: akamhy@yahoo.com
+    orcid: https://orcid.org/0000-0003-2482-8227
+keywords:
+    - Archive Website
+    - Wayback Machine
+    - Internet Archive
+    - Wayback Machine CLI
+    - Wayback Machine Python
+    - Internet Archiving
+    - Availability API
+    - CDX API
+    - savepagenow
+license: MIT
+repository-code: "https://github.com/akamhy/waybackpy"
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -0,0 +1,54 @@
+# Welcome to waybackpy contributing guide
+
+
+## Getting started
+
+Read our [Code of Conduct](./CODE_OF_CONDUCT.md).
+
+## Creating an issue
+
+It's a good idea to open an issue to discuss suspected bugs and new feature ideas with the maintainers first. Somebody might already be working on your bug or idea, and discussing it early avoids wasting your time. This is only a recommendation: you may skip the issue and open a pull request directly.
+
+## Fork this repository
+
+Fork this repository. See '[Fork a repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo)' for help forking this repository on GitHub.
+
+## Make changes to the forked copy
+
+Make the required changes to your forked copy of waybackpy. Please don't forget to add or update comments and docstrings.
+
+## Add tests for your changes
+
+Once you have made the required changes to the codebase, add tests for newly written methods/functions and update the tests of any code that you changed.
+
+## Testing and Linting
+
+You must run the tests and linter on your changes before opening a pull request.
+
+### pytest
+
+Runs all tests in the tests directory. pytest is a mature, full-featured Python testing tool.
+```bash
+pytest
+```
+
+### mypy
+
+Mypy is a static type checker for Python. Type checkers help ensure that you're using variables and functions in your code correctly.
+```bash
+mypy -p waybackpy -p tests
+```
+
+### black
+
+After testing with pytest and type checking with mypy, run black on the codebase. The project uses the 'black' code style.
+
+```bash
+black .
+```
+
+## Create a pull request
+
+Read [Creating a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request).
+
+Try to make sure that all automated tests pass. If some of them fail, don't worry: tests are meant to catch bugs, and a failing test is better than a bug introduced into the master branch.
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@ -1,16 +0,0 @@
-# CONTRIBUTORS
-
-## AUTHORS
-
- akamhy (<https://github.com/akamhy>)
- eggplants (<https://github.com/eggplants>)
- danvalen1 (<https://github.com/danvalen1>)
- AntiCompositeNumber (<https://github.com/AntiCompositeNumber>)
- rafaelrdealmeida (<https://github.com/rafaelrdealmeida>)
- jonasjancarik (<https://github.com/jonasjancarik>)
- jfinkhaeuser (<https://github.com/jfinkhaeuser>)
-
-## ACKNOWLEDGEMENTS
-
- mhmdiaa (<https://github.com/mhmdiaa>) for <https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050>. known_urls is based on this gist.
- dequeued0 (<https://github.com/dequeued0>) for reporting bugs and useful feature requests.
--- a/README.md
+++ b/README.md
@ -3,7 +3,7 @@

 <img src="https://raw.githubusercontent.com/akamhy/waybackpy/master/assets/waybackpy_logo.svg"><br>

-<h3>A Python package & CLI tool that interfaces with the Wayback Machine API</h3>
+<h3>Python package & CLI tool that interfaces with the Wayback Machine APIs</h3>

 </div>

@ -22,22 +22,22 @@

 # <img src="https://github.githubassets.com/images/icons/emoji/unicode/2b50.png" width="30"></img> Introduction

-Waybackpy is a [Python package](https://www.udacity.com/blog/2021/01/what-is-a-python-package.html) and a [CLI](https://www.w3schools.com/whatis/whatis_cli.asp) tool that interfaces with the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine) API.
+Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine APIs.

- Wayback Machine has 3 client side [API](https://www.redhat.com/en/topics/api/what-are-application-programming-interfaces)s.
+Internet Archive's Wayback Machine has 3 useful public APIs.

- [Save API](https://github.com/akamhy/waybackpy/wiki/Wayback-Machine-APIs#save-api)
- [Availability API](https://github.com/akamhy/waybackpy/wiki/Wayback-Machine-APIs#availability-api)
- [CDX API](https://github.com/akamhy/waybackpy/wiki/Wayback-Machine-APIs#cdx-api)
+- SavePageNow or Save API
+- CDX Server API
+- Availability API

-These three APIs can be accessed via the waybackpy either by importing it in a script or from the CLI.
+These three APIs can be accessed via waybackpy, either by importing it in a Python file/module or from the command-line interface.

 ## <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f3d7.png" width="20"></img> Installation

 **Using [pip](https://en.wikipedia.org/wiki/Pip_(package_manager)), from [PyPI](https://pypi.org/) (recommended)**:

 ```bash
-pip install waybackpy
+pip install waybackpy -U
 ```

 **Using [conda](https://en.wikipedia.org/wiki/Conda_(package_manager)), from [conda-forge](https://anaconda.org/conda-forge/waybackpy) (recommended)**:
@ -58,11 +58,11 @@ pip install git+https://github.com/akamhy/waybackpy.git

 ## <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f433.png" width="20"></img> Docker Image

-Docker Hub : <https://hub.docker.com/r/secsi/waybackpy>
+Docker Hub: [hub.docker.com/r/secsi/waybackpy](https://hub.docker.com/r/secsi/waybackpy)

-[Docker image](https://searchitoperations.techtarget.com/definition/Docker-image) is automatically updated on every release by [Regulary and Automatically Updated Docker Images](https://github.com/cybersecsi/RAUDI) (RAUDI).
+The Docker image is automatically updated on every release by [Regularly and Automatically Updated Docker Images](https://github.com/cybersecsi/RAUDI) (RAUDI).

-RAUDI is a tool by SecSI (<https://secsi.io>), an Italian cybersecurity startup.
+RAUDI is a tool by [SecSI](https://secsi.io), an Italian cybersecurity startup.

 ## <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f680.png" width="20"></img> Usage

@ -84,28 +84,68 @@ False
 datetime.datetime(2022, 1, 18, 12, 52, 49)
 ```

-#### Availability API
-
-```python
->>> from waybackpy import WaybackMachineAvailabilityAPI
->>>
->>> url = "https://google.com"
->>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
->>>
->>> availability_api = WaybackMachineAvailabilityAPI(url, user_agent)
->>>
->>> availability_api.oldest()
-https://web.archive.org/web/19981111184551/http://google.com:80/
->>>
->>> availability_api.newest()
-https://web.archive.org/web/20220118150444/https://www.google.com/
->>>
->>> availability_api.near(year=2010, month=10, day=10, hour=10)
-https://web.archive.org/web/20101010101708/http://www.google.com/
-```
-
 #### CDX API aka CDXServerAPI

+```python
+>>> from waybackpy import WaybackMachineCDXServerAPI
+>>> url = "https://google.com"
+>>> user_agent = "my new app's user agent"
+>>> cdx_api = WaybackMachineCDXServerAPI(url, user_agent)
+```
+##### oldest
+```python
+>>> cdx_api.oldest()
+com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
+>>> oldest = cdx_api.oldest()
+>>> oldest
+com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
+>>> oldest.archive_url
+'https://web.archive.org/web/19981111184551/http://google.com:80/'
+>>> oldest.original
+'http://google.com:80/'
+>>> oldest.urlkey
+'com,google)/'
+>>> oldest.timestamp
+'19981111184551'
+>>> oldest.datetime_timestamp
+datetime.datetime(1998, 11, 11, 18, 45, 51)
+>>> oldest.statuscode
+'200'
+>>> oldest.mimetype
+'text/html'
+```
+##### newest
+```python
+>>> newest = cdx_api.newest()
+>>> newest
+com,google)/ 20220217234427 http://@google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 563
+>>> newest.archive_url
+'https://web.archive.org/web/20220217234427/http://@google.com/'
+>>> newest.timestamp
+'20220217234427'
+```
+##### near
+```python
+>>> near = cdx_api.near(year=2010, month=10, day=10, hour=10, minute=10)
+>>> near.archive_url
+'https://web.archive.org/web/20101010101435/http://google.com/'
+>>> near
+com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391
+>>> near.timestamp
+'20101010101435'
+>>> near = cdx_api.near(wayback_machine_timestamp=2008080808)
+>>> near.archive_url
+'https://web.archive.org/web/20080808051143/http://google.com/'
+>>> near = cdx_api.near(unix_timestamp=1286705410)
+>>> near
+com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391
+>>> near.archive_url
+'https://web.archive.org/web/20101010101435/http://google.com/'
+>>>
+```
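Each CDX record printed above is one line of space-separated fields in the CDX server's default order: urlkey, timestamp, original URL, MIME type, status code, digest and length. The sketch below shows how such a line maps onto the attributes used in these examples. `CDXRecord` and `parse_cdx_line` are hypothetical names for illustration only; waybackpy's own class for this is `CDXSnapshot`, whose internals are not shown here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CDXRecord:
    """Hypothetical stand-in for waybackpy's CDXSnapshot."""

    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str

    @property
    def archive_url(self) -> str:
        # Playback URL: the capture timestamp followed by the original URL.
        return f"https://web.archive.org/web/{self.timestamp}/{self.original}"

    @property
    def datetime_timestamp(self) -> datetime:
        # Wayback Machine timestamps use the yyyyMMddhhmmss format.
        return datetime.strptime(self.timestamp, "%Y%m%d%H%M%S")


def parse_cdx_line(line: str) -> CDXRecord:
    # Default CDX output: seven whitespace-separated fields per line.
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 CDX fields, got {len(fields)}: {line!r}")
    return CDXRecord(*fields)


record = parse_cdx_line(
    "com,google)/ 19981111184551 http://google.com:80/ text/html 200 "
    "HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381"
)
print(record.archive_url)
print(record.datetime_timestamp)
```

The derived values match the `oldest` example above: the archive URL is just the timestamp and original URL joined under `https://web.archive.org/web/`.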
+##### snapshots
 ```python
 >>> from waybackpy import WaybackMachineCDXServerAPI
 >>> url = "https://pypi.org"
@ -123,20 +163,59 @@ https://web.archive.org/web/20171127171549/https://pypi.org/
 https://web.archive.org/web/20171206002737/http://pypi.org:80/
 ```

+#### Availability API
+
+We recommend not using the Availability API because of its performance issues. All the methods of the Availability API interface class, `WaybackMachineAvailabilityAPI`, are also implemented in the CDX server API interface class, `WaybackMachineCDXServerAPI`. Also note
+that the archive returned by the `newest()` method of `WaybackMachineAvailabilityAPI` can be more recent than the one returned by the `newest()` method of `WaybackMachineCDXServerAPI`.
+
+```python
+>>> from waybackpy import WaybackMachineAvailabilityAPI
+>>>
+>>> url = "https://google.com"
+>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
+>>>
+>>> availability_api = WaybackMachineAvailabilityAPI(url, user_agent)
+```
+##### oldest
+```python
+>>> availability_api.oldest()
+https://web.archive.org/web/19981111184551/http://google.com:80/
+```
+##### newest
+```python
+>>> availability_api.newest()
+https://web.archive.org/web/20220118150444/https://www.google.com/
+```
+##### near
+```python
+>>> availability_api.near(year=2010, month=10, day=10, hour=10)
+https://web.archive.org/web/20101010101708/http://www.google.com/
+```
+
 > Documentation is at <https://github.com/akamhy/waybackpy/wiki/Python-package-docs>.

 ### As a CLI tool

-Demo video on [asciinema.org](https://asciinema.org), you can copy the text from video:
+Demo video on [asciinema.org](https://asciinema.org/a/469890); you can copy the text from the video:

-[![asciicast](https://asciinema.org/a/464367.svg)](https://asciinema.org/a/464367)
+[![asciicast](https://asciinema.org/a/469890.svg)](https://asciinema.org/a/469890)

 > CLI documentation is at <https://github.com/akamhy/waybackpy/wiki/CLI-docs>.

-## <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f6e1.png" width="20"></img> License

-[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
+## CONTRIBUTORS

-Copyright (c) 2020-2022 Akash Mahanty Et al.
+### AUTHORS

-Released under the MIT License. See [license](https://github.com/akamhy/waybackpy/blob/master/LICENSE) for details.
+- akamhy (<https://github.com/akamhy>)
+- eggplants (<https://github.com/eggplants>)
+- danvalen1 (<https://github.com/danvalen1>)
+- AntiCompositeNumber (<https://github.com/AntiCompositeNumber>)
+- rafaelrdealmeida (<https://github.com/rafaelrdealmeida>)
+- jonasjancarik (<https://github.com/jonasjancarik>)
+- jfinkhaeuser (<https://github.com/jfinkhaeuser>)
+
+### ACKNOWLEDGEMENTS
+
+- mhmdiaa (<https://github.com/mhmdiaa>): `--known-urls` is based on [this gist](https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050).
+- dequeued0 (<https://github.com/dequeued0>) for reporting bugs and useful feature requests.
--- a/setup.cfg
+++ b/setup.cfg
@ -1,14 +1,14 @@
 [metadata]
 name = waybackpy
 version = attr: waybackpy.__version__
-description = attr: waybackpy.__description__
+description = Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily.
 long_description = file: README.md
 long_description_content_type = text/markdown
-license = attr: waybackpy.__license__
-author = attr: waybackpy.__author__
-author_email = attr: waybackpy.__author_email__
-url = attr: waybackpy.__url__
-download_url = attr: waybackpy.__download_url__
+license = MIT
+author = Akash Mahanty
+author_email = akamhy@yahoo.com
+url = https://akamhy.github.io/waybackpy/
+download_url = https://github.com/akamhy/waybackpy/releases
 project_urls =
    Documentation = https://github.com/akamhy/waybackpy/wiki
    Source = https://github.com/akamhy/waybackpy
@ -32,20 +32,26 @@ classifiers =
    License :: OSI Approved :: MIT License
    Programming Language :: Python
    Programming Language :: Python :: 3
+    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10
+    Programming Language :: Python :: 3.11
    Programming Language :: Python :: Implementation :: CPython

 [options]
 packages = find:
-python_requires = >= 3.7
+include-package-data = True
+python_requires = >= 3.6
 install_requires =
    click
    requests
    urllib3

+[options.package_data]
+waybackpy = py.typed
+
 [options.extras_require]
 dev =
    black
--- a/tests/test_cdx_api.py
+++ b/tests/test_cdx_api.py
@ -1,4 +1,16 @@
+import random
+import string
+
+import pytest
+
 from waybackpy.cdx_api import WaybackMachineCDXServerAPI
+from waybackpy.exceptions import NoCDXRecordFound
+
+
+def rndstr(n: int) -> str:
+    return "".join(
+        random.choice(string.ascii_uppercase + string.digits) for _ in range(n)
+    )


 def test_a() -> None:
@ -32,7 +44,11 @@ def test_b() -> None:
    url = "https://www.google.com"

    wayback = WaybackMachineCDXServerAPI(
-        url=url, user_agent=user_agent, start_timestamp="202101", end_timestamp="202112"
+        url=url,
+        user_agent=user_agent,
+        start_timestamp="202101",
+        end_timestamp="202112",
+        collapses=["urlkey"],
    )
    #  timeframe bound prefix matching enabled along with active urlkey based collapsing

@ -40,3 +56,159 @@ def test_b() -> None:

    for snapshot in snapshots:
        assert snapshot.timestamp.startswith("2021")
+
+
+def test_c() -> None:
+    user_agent = (
+        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
+        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
+    )
+    url = "https://www.google.com"
+
+    cdx = WaybackMachineCDXServerAPI(
+        url=url,
+        user_agent=user_agent,
+        closest="201010101010",
+        sort="closest",
+        limit="1",
+    )
+    snapshots = cdx.snapshots()
+    for snapshot in snapshots:
+        archive_url = snapshot.archive_url
+        timestamp = snapshot.timestamp
+        break
+
+    assert str(archive_url).find("google.com") != -1
+    assert "20101010" in timestamp
+
+
+def test_d() -> None:
+    user_agent = (
+        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
+        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
+    )
+
+    cdx = WaybackMachineCDXServerAPI(
+        url="akamhy.github.io",
+        user_agent=user_agent,
+        match_type="prefix",
+        use_pagination=True,
+        filters=["statuscode:200"],
+    )
+    snapshots = cdx.snapshots()
+
+    count = 0
+    for snapshot in snapshots:
+        count += 1
+        assert str(snapshot.archive_url).find("akamhy.github.io") != -1
+    assert count > 50
+
+
+def test_oldest() -> None:
+    user_agent = (
+        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
+        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
+    )
+
+    cdx = WaybackMachineCDXServerAPI(
+        url="google.com",
+        user_agent=user_agent,
+        filters=["statuscode:200"],
+    )
+    oldest = cdx.oldest()
+    assert "1998" in oldest.timestamp
+    assert "google" in oldest.urlkey
+    assert oldest.original.find("google.com") != -1
+    assert oldest.archive_url.find("google.com") != -1
+
+
+def test_newest() -> None:
+    user_agent = (
+        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
+        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
+    )
+
+    cdx = WaybackMachineCDXServerAPI(
+        url="google.com",
+        user_agent=user_agent,
+        filters=["statuscode:200"],
+    )
+    newest = cdx.newest()
+    assert "google" in newest.urlkey
+    assert newest.original.find("google.com") != -1
+    assert newest.archive_url.find("google.com") != -1
+
+
+def test_near() -> None:
+    user_agent = (
+        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
+        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
+    )
+
+    cdx = WaybackMachineCDXServerAPI(
+        url="google.com",
+        user_agent=user_agent,
+        filters=["statuscode:200"],
+    )
+    near = cdx.near(year=2010, month=10, day=10, hour=10, minute=10)
+    assert "2010101010" in near.timestamp
+    assert "google" in near.urlkey
+    assert near.original.find("google.com") != -1
+    assert near.archive_url.find("google.com") != -1
+
+    near = cdx.near(wayback_machine_timestamp="201010101010")
+    assert "2010101010" in near.timestamp
+    assert "google" in near.urlkey
+    assert near.original.find("google.com") != -1
+    assert near.archive_url.find("google.com") != -1
+
+    near = cdx.near(unix_timestamp=1286705410)
+    assert "2010101010" in near.timestamp
+    assert "google" in near.urlkey
+    assert near.original.find("google.com") != -1
+    assert near.archive_url.find("google.com") != -1
+
+    with pytest.raises(NoCDXRecordFound):
+        dne_url = f"https://{rndstr(30)}.in"
+        cdx = WaybackMachineCDXServerAPI(
+            url=dne_url,
+            user_agent=user_agent,
+            filters=["statuscode:200"],
+        )
+        cdx.near(unix_timestamp=1286705410)
+
+
+def test_before() -> None:
+    user_agent = (
+        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
+        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
+    )
+
+    cdx = WaybackMachineCDXServerAPI(
+        url="http://www.google.com/",
+        user_agent=user_agent,
+        filters=["statuscode:200"],
+    )
+    before = cdx.before(wayback_machine_timestamp=20160731235949)
+    assert "20160731233347" in before.timestamp
+    assert "google" in before.urlkey
+    assert before.original.find("google.com") != -1
+    assert before.archive_url.find("google.com") != -1
+
+
+def test_after() -> None:
+    user_agent = (
+        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
+        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
+    )
+
+    cdx = WaybackMachineCDXServerAPI(
+        url="http://www.google.com/",
+        user_agent=user_agent,
+        filters=["statuscode:200"],
+    )
+    after = cdx.after(wayback_machine_timestamp=20160731235949)
+    assert "20160801000917" in after.timestamp, after.timestamp
+    assert "google" in after.urlkey
+    assert after.original.find("google.com") != -1
+    assert after.archive_url.find("google.com") != -1
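`test_before` and `test_after` pin down the intended semantics: for the target `20160731235949`, `before()` should return the latest capture earlier than it (`20160731233347`) and `after()` the earliest capture later than it (`20160801000917`). How `WaybackMachineCDXServerAPI.before`/`after` query the server is not fully shown in this diff, so the following is only a local illustration of those semantics over an already-fetched, ascending list of 14-digit timestamps. `nearest_before` and `nearest_after` are hypothetical helpers, the other two sample timestamps are made up, and treating an exact match as neither before nor after is an assumption.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Union


def nearest_before(timestamps: List[str], target: Union[int, str]) -> Optional[str]:
    """Latest timestamp strictly earlier than target, or None."""
    # Equal-length digit strings sort the same way as the numbers they encode.
    i = bisect_left(timestamps, str(target))
    return timestamps[i - 1] if i > 0 else None


def nearest_after(timestamps: List[str], target: Union[int, str]) -> Optional[str]:
    """Earliest timestamp strictly later than target, or None."""
    i = bisect_right(timestamps, str(target))
    return timestamps[i] if i < len(timestamps) else None


# Ascending capture timestamps; only the middle two come from the tests above.
captures = ["20160731220000", "20160731233347", "20160801000917", "20160801120000"]
print(nearest_before(captures, 20160731235949))
print(nearest_after(captures, 20160731235949))
```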
--- a/tests/test_cdx_snapshot.py
+++ b/tests/test_cdx_snapshot.py
@ -41,3 +41,4 @@ def test_CDXSnapshot() -> None:
    )
    assert archive_url == snapshot.archive_url
    assert sample_input == str(snapshot)
+    assert sample_input == repr(snapshot)
--- a/tests/test_cdx_utils.py
+++ b/tests/test_cdx_utils.py
@ -6,6 +6,7 @@ from waybackpy.cdx_utils import (
    check_collapses,
    check_filters,
    check_match_type,
+    check_sort,
    full_url,
    get_response,
    get_total_pages,
@ -101,3 +102,12 @@ def test_check_match_type() -> None:

    with pytest.raises(WaybackError):
        check_match_type("not a valid type", "url")
+
+
+def test_check_sort() -> None:
+    assert check_sort("default")
+    assert check_sort("closest")
+    assert check_sort("reverse")
+
+    with pytest.raises(WaybackError):
+        assert check_sort("random crap")
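The body of the new `check_sort` helper is not part of this diff. Based only on the test above (which accepts `default`, `closest` and `reverse` and expects `WaybackError` for anything else) and on `__init__` calling `check_sort(self.sort)` even when no sort was given, a minimal standalone sketch could look like this. `WaybackError` is redefined locally and `ALLOWED_SORTS` is a hypothetical name; the real helper may differ.

```python
from typing import Optional


class WaybackError(Exception):
    """Local stand-in for waybackpy.exceptions.WaybackError."""


ALLOWED_SORTS = ("default", "closest", "reverse")


def check_sort(sort: Optional[str]) -> bool:
    """Return True for an allowed sort value, raise WaybackError otherwise."""
    if sort is None:
        # __init__ calls check_sort(None) when no sort is requested.
        return True
    if sort not in ALLOWED_SORTS:
        raise WaybackError(
            f"{sort!r} is not an allowed sort value, use one of {ALLOWED_SORTS}"
        )
    return True
```

In `test_c`, `sort="closest"` is used together with the new `closest` parameter and `limit="1"` to fetch the single capture nearest to `201010101010`.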
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@ -42,39 +42,6 @@ def test_near() -> None:
    )


-def test_json() -> None:
-    runner = CliRunner()
-    result = runner.invoke(
-        main,
-        [
-            "--url",
-            " https://apple.com ",
-            "--near",
-            "--year",
-            "2010",
-            "--month",
-            "2",
-            "--day",
-            "8",
-            "--hour",
-            "12",
-            "--json",
-        ],
-    )
-    assert result.exit_code == 0
-    assert (
-        result.output.find(
-            """Archive URL:\nhttps://web.archive.org/web/2010020812\
-5854/http://www.apple.com/\nJSON respons\
-e:\n{"url": "https://apple.com", "archived_snapshots": {"close\
-st": {"status": "200", "available": true, "url": "http://web.ar\
-chive.org/web/20100208125854/http://www.apple.com/", "timest\
-amp": "20100208125854"}}, "timestamp":"""
-        )
-        != -1
-    )
-
-
 def test_newest() -> None:
    runner = CliRunner()
    result = runner.invoke(main, ["--url", " https://microsoft.com ", "--newest"])
@ -145,7 +112,7 @@ def test_only_url() -> None:
    assert result.exit_code == 0
    assert (
        result.output
-        == "Only URL passed, but did not specify what to do with the URL. Use \
+        == "NoCommandFound: Only URL passed, but did not specify what to do with the URL. Use \
 --help flag for help using waybackpy.\n"
    )

--- a/tests/test_save_api.py
+++ b/tests/test_save_api.py
@ -219,4 +219,5 @@ def test_archive_url() -> None:
    save_api.saved_archive = (
        "https://web.archive.org/web/20220124063056/https://example.com/"
    )
+    save_api._archive_url = save_api.saved_archive
    assert save_api.archive_url == save_api.saved_archive
--- a/tests/test_wrapper.py
+++ b/tests/test_wrapper.py
@ -35,4 +35,11 @@ def test_total_archives() -> None:

 def test_known_urls() -> None:
    wayback = Url("akamhy.github.io")
-    assert len(list(wayback.known_urls())) > 40
+    assert len(list(wayback.known_urls(subdomain=True))) > 40
+
+
+def test_Save() -> None:
+    wayback = Url("https://en.wikipedia.org/wiki/Asymptotic_equipartition_property")
+    wayback.save()
+    archive_url = str(wayback.archive_url)
+    assert archive_url.find("Asymptotic_equipartition_property") != -1
--- a/waybackpy/__init__.py
+++ b/waybackpy/__init__.py
@ -1,17 +1,6 @@
 """Module initializer and provider of static information."""

-__title__ = "waybackpy"
-__description__ = (
-    "Python package that interfaces with the Internet Archive's Wayback Machine APIs. "
-    "Archive pages and retrieve archived pages easily."
-)
-__url__ = "https://akamhy.github.io/waybackpy/"
-__version__ = "3.0.3"
-__download_url__ = f"https://github.com/akamhy/waybackpy/archive/{__version__}.tar.gz"
-__author__ = "Akash Mahanty"
-__author_email__ = "akamhy@yahoo.com"
-__license__ = "MIT"
-__copyright__ = "Copyright 2020-2022 Akash Mahanty et al."
+__version__ = "3.0.6"

 from .availability_api import WaybackMachineAvailabilityAPI
 from .cdx_api import WaybackMachineCDXServerAPI
@ -19,14 +8,6 @@ from .save_api import WaybackMachineSaveAPI
 from .wrapper import Url

 __all__ = [
-    "__author__",
-    "__author_email__",
-    "__copyright__",
-    "__description__",
-    "__license__",
-    "__title__",
-    "__url__",
-    "__download_url__",
    "__version__",
    "WaybackMachineAvailabilityAPI",
    "WaybackMachineCDXServerAPI",
--- a/waybackpy/availability_api.py
+++ b/waybackpy/availability_api.py
@ -32,7 +32,11 @@ from .exceptions import (
    ArchiveNotInAvailabilityAPIResponse,
    InvalidJSONInAvailabilityAPIResponse,
 )
-from .utils import DEFAULT_USER_AGENT
+from .utils import (
+    DEFAULT_USER_AGENT,
+    unix_timestamp_to_wayback_timestamp,
+    wayback_timestamp,
+)

 ResponseJSON = Dict[str, Any]

@ -58,14 +62,6 @@ class WaybackMachineAvailabilityAPI:
        self.json: Optional[ResponseJSON] = None
        self.response: Optional[Response] = None

-    @staticmethod
-    def unix_timestamp_to_wayback_timestamp(unix_timestamp: int) -> str:
-        """
-        Converts Unix time to Wayback Machine timestamp, Wayback Machine
-        timestamp format is yyyyMMddhhmmss.
-        """
-        return datetime.utcfromtimestamp(int(unix_timestamp)).strftime("%Y%m%d%H%M%S")
-
    def __repr__(self) -> str:
        """
        Same as string representation, just return the archive URL as a string.
@ -194,17 +190,6 @@ class WaybackMachineAvailabilityAPI:
            )
        return archive_url

-    @staticmethod
-    def wayback_timestamp(**kwargs: int) -> str:
-        """
-        Prepends zero before the year, month, day, hour and minute so that they
-        are conformable with the YYYYMMDDhhmmss Wayback Machine timestamp format.
-        """
-        return "".join(
-            str(kwargs[key]).zfill(2)
-            for key in ["year", "month", "day", "hour", "minute"]
-        )
-
    def oldest(self) -> "WaybackMachineAvailabilityAPI":
        """
        Passes the date 1994-01-01 to near which should return the oldest archive
@ -245,10 +230,10 @@ class WaybackMachineAvailabilityAPI:
        finally returns the instance.
        """
        if unix_timestamp:
-            timestamp = self.unix_timestamp_to_wayback_timestamp(unix_timestamp)
+            timestamp = unix_timestamp_to_wayback_timestamp(unix_timestamp)
        else:
            now = datetime.utcnow().timetuple()
-            timestamp = self.wayback_timestamp(
+            timestamp = wayback_timestamp(
                year=now.tm_year if year is None else year,
                month=now.tm_mon if month is None else month,
                day=now.tm_mday if day is None else day,
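This commit moves `unix_timestamp_to_wayback_timestamp` and `wayback_timestamp` out of `WaybackMachineAvailabilityAPI` into `waybackpy.utils`, so that `cdx_api.py` can import them too. A standalone sketch of what the removed methods did follows; it swaps the naive `datetime.utcfromtimestamp` for the timezone-aware `datetime.fromtimestamp(..., tz=timezone.utc)`, which yields the same UTC fields. The current `utils` versions may differ in detail.

```python
from datetime import datetime, timezone


def unix_timestamp_to_wayback_timestamp(unix_timestamp: int) -> str:
    """Convert Unix time to a yyyyMMddhhmmss Wayback Machine timestamp (UTC)."""
    return datetime.fromtimestamp(int(unix_timestamp), tz=timezone.utc).strftime(
        "%Y%m%d%H%M%S"
    )


def wayback_timestamp(**kwargs: int) -> str:
    """Zero-pad year, month, day, hour and minute into a yyyyMMddhhmm string."""
    return "".join(
        str(kwargs[key]).zfill(2) for key in ["year", "month", "day", "hour", "minute"]
    )


print(unix_timestamp_to_wayback_timestamp(1286705410))
print(wayback_timestamp(year=2010, month=10, day=10, hour=10, minute=10))
```

Note that `wayback_timestamp` produces only a 12-digit, minute-precision prefix (no seconds). The Unix time 1286705410 is 2010-10-10 10:10:10 UTC, the same instant `test_near` passes as `unix_timestamp`, which is why that test expects a timestamp containing `2010101010`.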
--- a/waybackpy/cdx_api.py
+++ b/waybackpy/cdx_api.py
@ -9,19 +9,26 @@ the snapshots are yielded as instances of the CDXSnapshot class.
 """


-from typing import Dict, Generator, List, Optional, cast
+import time
+from datetime import datetime
+from typing import Dict, Generator, List, Optional, Union, cast

 from .cdx_snapshot import CDXSnapshot
 from .cdx_utils import (
    check_collapses,
    check_filters,
    check_match_type,
+    check_sort,
    full_url,
    get_response,
    get_total_pages,
 )
-from .exceptions import WaybackError
-from .utils import DEFAULT_USER_AGENT
+from .exceptions import NoCDXRecordFound, WaybackError
+from .utils import (
+    DEFAULT_USER_AGENT,
+    unix_timestamp_to_wayback_timestamp,
+    wayback_timestamp,
+)


 class WaybackMachineCDXServerAPI:
@ -44,10 +51,13 @@ class WaybackMachineCDXServerAPI:
        end_timestamp: Optional[str] = None,
        filters: Optional[List[str]] = None,
        match_type: Optional[str] = None,
+        sort: Optional[str] = None,
        gzip: Optional[str] = None,
        collapses: Optional[List[str]] = None,
        limit: Optional[str] = None,
        max_tries: int = 3,
+        use_pagination: bool = False,
+        closest: Optional[str] = None,
    ) -> None:
        self.url = str(url).strip().replace(" ", "%20")
        self.user_agent = user_agent
@ -57,65 +67,65 @@ class WaybackMachineCDXServerAPI:
        check_filters(self.filters)
        self.match_type = None if match_type is None else str(match_type).strip()
        check_match_type(self.match_type, self.url)
+        self.sort = None if sort is None else str(sort).strip()
+        check_sort(self.sort)
        self.gzip = gzip
        self.collapses = [] if collapses is None else collapses
        check_collapses(self.collapses)
        self.limit = 25000 if limit is None else limit
        self.max_tries = max_tries
+        self.use_pagination = use_pagination
+        self.closest = None if closest is None else str(closest)
        self.last_api_request_url: Optional[str] = None
-        self.use_page = False
        self.endpoint = "https://web.archive.org/cdx/search/cdx"

    def cdx_api_manager(
-        self, payload: Dict[str, str], headers: Dict[str, str], use_page: bool = False
+        self, payload: Dict[str, str], headers: Dict[str, str]
    ) -> Generator[str, None, None]:
        """
-        Manages the API calls for the instance, it automatically selects the best
-        parameters by looking as the query of the end-user. For bigger queries
-        automatically use the CDX pagination API and for smaller queries use the
-        normal API.
-
-        CDX Server API is a complex API and to make it easy for the end user to
-        consume it the CDX manager(this method) handles the selection of the
-        API output, whether to use the pagination API or not.
-
-        For doing large/bulk queries, the use of the Pagination API is
-        recommended by the Wayback Machine authors. And it determines if the
-        query would be large or not by using the showNumPages=true parameter,
-        this tells the number of pages of CDX DATA that the pagination API
-        will return.
-
-        If the number of page is less than 2 we use the normal non-pagination
-        API as the pagination API is known to lag and for big queries it should
-        not matter but for queries where the number of pages are less this
-        method chooses accuracy over the pagination API.
+        Uses the pagination API of the CDX server if the use_pagination
+        attribute is True; otherwise queries the standard CDX server API
+        with showResumeKey=true.
        """
-        # number of pages that will returned by the pagination API.
-        # get_total_pages adds the showNumPages=true param to pagination API
-        # requests.
-        # This is a special query that will return a single number indicating
-        # the number of pages.
-        total_pages = get_total_pages(self.url, self.user_agent)

-        if use_page is True and total_pages >= 2:
-            blank_pages = 0
+        # When using the pagination API of the CDX server.
+        if self.use_pagination is True:
+
+            total_pages = get_total_pages(self.url, self.user_agent)
+            successive_blank_pages = 0
+
            for i in range(total_pages):
                payload["page"] = str(i)

                url = full_url(self.endpoint, params=payload)
                res = get_response(url, headers=headers)
+
                if isinstance(res, Exception):
                    raise res

                self.last_api_request_url = url
                text = res.text
-                if len(text) == 0:
-                    blank_pages += 1

-                if blank_pages >= 2:
+                # Reset the counter if the last page was blank
+                # but the current page is not.
+                if successive_blank_pages == 1:
+                    if len(text) != 0:
+                        successive_blank_pages = 0
+
+                # Increase the successive blank page counter on
+                # encountering a blank page.
+                if len(text) == 0:
+                    successive_blank_pages += 1
+
+                # If two successive pages are blank, there are
+                # no more pages left to iterate.
+                if successive_blank_pages >= 2:
                    break

                yield text
+
+        # When not using the pagination API of the CDX server
        else:
            payload["showResumeKey"] = "true"
            payload["limit"] = str(self.limit)
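The two-successive-blank-pages stop condition used with the pagination API can be checked in isolation. Below is a minimal sketch, assuming each page's response body has already been fetched as a string; `iter_until_two_blank_pages` is a hypothetical helper, not part of waybackpy. Because the counter can only be 0 or 1 when a non-blank page arrives, resetting it on every non-blank page behaves the same as the check in the diff.

```python
from typing import Iterable, Iterator


def iter_until_two_blank_pages(pages: Iterable[str]) -> Iterator[str]:
    """Yield page texts until two blank pages arrive in a row (hypothetical helper)."""
    successive_blank_pages = 0
    for text in pages:
        if text:
            # A non-blank page breaks any run of blank pages.
            successive_blank_pages = 0
        else:
            successive_blank_pages += 1

        # Two blank pages in a row: assume there are no more pages.
        if successive_blank_pages >= 2:
            break

        # A single blank page is still yielded, as in the loop above.
        yield text


print(list(iter_until_two_blank_pages(["a", "", "b", "", "", "c"])))  # ['a', '', 'b', '']
```

Note that the trailing `"c"` is never reached: the stop condition assumes the server does not return content after two blank pages.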
@@ -162,9 +172,15 @@ class WaybackMachineCDXServerAPI:
        if self.gzip is None:
            payload["gzip"] = "false"

+        if self.closest:
+            payload["closest"] = self.closest
+
        if self.match_type:
            payload["matchType"] = self.match_type

+        if self.sort:
+            payload["sort"] = self.sort
+
        if self.filters and len(self.filters) > 0:
            for i, _filter in enumerate(self.filters):
                payload["filter" + str(i)] = _filter
@@ -175,6 +191,151 @@ class WaybackMachineCDXServerAPI:

        payload["url"] = self.url
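The payload keys set in `add_payload()` (`closest`, `matchType`, `sort` and the numbered filters) end up as query-string parameters. A minimal sketch of the resulting request URL, assuming the public endpoint `https://web.archive.org/cdx/search/cdx`; `build_cdx_query` is hypothetical, and it passes repeated `filter` parameters as a list of pairs to `urllib.parse.urlencode` rather than using waybackpy's own `full_url()` helper.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(
    url: str,
    closest: Optional[str] = None,
    sort: Optional[str] = None,
    match_type: Optional[str] = None,
    filters: Optional[List[str]] = None,
) -> str:
    """Build a CDX server query URL (hypothetical helper)."""
    params: List[Tuple[str, str]] = []
    if closest:
        params.append(("closest", closest))
    if match_type:
        params.append(("matchType", match_type))
    if sort:
        params.append(("sort", sort))
    # The CDX server accepts the filter parameter more than once,
    # so each filter becomes its own key/value pair.
    for _filter in filters or []:
        params.append(("filter", _filter))
    params.append(("url", url))
    return CDX_ENDPOINT + "?" + urlencode(params)


print(build_cdx_query("example.com", closest="20200101", sort="closest"))
# https://web.archive.org/cdx/search/cdx?closest=20200101&sort=closest&url=example.com
```

Using a list of pairs instead of a dict is a design choice here: it keeps insertion order and allows duplicate keys without renaming them.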

+    def before(
+        self,
+        year: Optional[int] = None,
+        month: Optional[int] = None,
+        day: Optional[int] = None,
+        hour: Optional[int] = None,
+        minute: Optional[int] = None,
+        unix_timestamp: Optional[int] = None,
+        wayback_machine_timestamp: Optional[Union[int, str]] = None,
+    ) -> CDXSnapshot:
+        """
+        Gets the nearest archive before the given datetime.
+        """
+        if unix_timestamp:
+            timestamp = unix_timestamp_to_wayback_timestamp(unix_timestamp)
+        elif wayback_machine_timestamp:
+            timestamp = str(wayback_machine_timestamp)
+        else:
+            now = datetime.utcnow().timetuple()
+            timestamp = wayback_timestamp(
+                year=now.tm_year if year is None else year,
+                month=now.tm_mon if month is None else month,
+                day=now.tm_mday if day is None else day,
+                hour=now.tm_hour if hour is None else hour,
+                minute=now.tm_min if minute is None else minute,
+            )
+        self.closest = timestamp
+        self.sort = "closest"
+        self.limit = 25000
+        for snapshot in self.snapshots():
+            if snapshot.timestamp < timestamp:
+                return snapshot
+
+        # If a snapshot isn't returned, then none were found.
+        raise NoCDXRecordFound(
+            "No records were found before the given date for the query. "
+            + "Either there are no archives before the given date,"
+            + " the URL may not have any archives, or the URL may have been"
+            + " recently archived and is still not available on the CDX server."
+        )
+
+    def after(
+        self,
+        year: Optional[int] = None,
+        month: Optional[int] = None,
+        day: Optional[int] = None,
+        hour: Optional[int] = None,
+        minute: Optional[int] = None,
+        unix_timestamp: Optional[int] = None,
+        wayback_machine_timestamp: Optional[Union[int, str]] = None,
+    ) -> CDXSnapshot:
+        """
+        Gets the nearest archive after the given datetime.
+        """
+        if unix_timestamp:
+            timestamp = unix_timestamp_to_wayback_timestamp(unix_timestamp)
+        elif wayback_machine_timestamp:
+            timestamp = str(wayback_machine_timestamp)
+        else:
+            now = datetime.utcnow().timetuple()
+            timestamp = wayback_timestamp(
+                year=now.tm_year if year is None else year,
+                month=now.tm_mon if month is None else month,
+                day=now.tm_mday if day is None else day,
+                hour=now.tm_hour if hour is None else hour,
+                minute=now.tm_min if minute is None else minute,
+            )
+        self.closest = timestamp
+        self.sort = "closest"
+        self.limit = 25000
+        for snapshot in self.snapshots():
+            if snapshot.timestamp > timestamp:
+                return snapshot
+
+        # If a snapshot isn't returned, then none were found.
+        raise NoCDXRecordFound(
+            "No records were found after the given date for the query. "
+            + "Either there are no archives after the given date,"
+            + " the URL may not have any archives, or the URL may have been"
+            + " recently archived and is still not available on the CDX server."
+        )
+
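`before()` and `after()` request up to 25000 results sorted by closeness to the target timestamp and return the first snapshot whose timestamp is strictly earlier or later. The selection step alone can be sketched as follows, assuming the timestamps are already sorted by closeness; `first_before` and `first_after` are hypothetical. The comparison is a plain string comparison, as in the diff, which orders the digit strings chronologically because they share the yyyyMMddhhmmss field order.

```python
from typing import Iterable, Optional


def first_before(timestamps: Iterable[str], target: str) -> Optional[str]:
    """Return the first timestamp strictly earlier than target, or None."""
    for timestamp in timestamps:
        if timestamp < target:
            return timestamp
    return None


def first_after(timestamps: Iterable[str], target: str) -> Optional[str]:
    """Return the first timestamp strictly later than target, or None."""
    for timestamp in timestamps:
        if timestamp > target:
            return timestamp
    return None


target = "20200601000000"
# Hypothetical CDX timestamps, already sorted by closeness to target.
by_closeness = ["20200602000000", "20200530000000", "20200701000000"]
print(first_before(by_closeness, target))  # 20200530000000
print(first_after(by_closeness, target))  # 20200602000000
```

Returning `None` keeps the sketch small; the methods in the diff raise `NoCDXRecordFound` instead.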
+    def near(
+        self,
+        year: Optional[int] = None,
+        month: Optional[int] = None,
+        day: Optional[int] = None,
+        hour: Optional[int] = None,
+        minute: Optional[int] = None,
+        unix_timestamp: Optional[int] = None,
+        wayback_machine_timestamp: Optional[Union[int, str]] = None,
+    ) -> CDXSnapshot:
+        """
+        Fetch the archive closest to a datetime. It can only return
+        a single snapshot; to get more than one, do not use this method
+        and iterate over snapshots() instead.
+        """
+        if unix_timestamp:
+            timestamp = unix_timestamp_to_wayback_timestamp(unix_timestamp)
+        elif wayback_machine_timestamp:
+            timestamp = str(wayback_machine_timestamp)
+        else:
+            now = datetime.utcnow().timetuple()
+            timestamp = wayback_timestamp(
+                year=now.tm_year if year is None else year,
+                month=now.tm_mon if month is None else month,
+                day=now.tm_mday if day is None else day,
+                hour=now.tm_hour if hour is None else hour,
+                minute=now.tm_min if minute is None else minute,
+            )
+        self.closest = timestamp
+        self.sort = "closest"
+        self.limit = 1
+        first_snapshot = None
+        for snapshot in self.snapshots():
+            first_snapshot = snapshot
+            break
+
+        if not first_snapshot:
+            raise NoCDXRecordFound(
+                "Wayback Machine's CDX server did not return any records "
+                + "for the query. The URL may not have any archives "
+                + "on the Wayback Machine or the URL may have been recently "
+                + "archived and is still not available on the CDX server."
+            )
+
+        return first_snapshot
+
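`near()` only needs the first snapshot from the generator, so the loop-and-break pattern can also be written with the built-in `next()`. A sketch under that assumption, with the hypothetical `first_or_raise` and `LookupError` standing in for `NoCDXRecordFound`:

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def first_or_raise(items: Iterable[T], message: str) -> T:
    """Return the first item of items, or raise LookupError if there is none."""
    try:
        return next(iter(items))
    except StopIteration:
        # Hide the StopIteration context; the caller only needs the message.
        raise LookupError(message) from None


print(first_or_raise(iter([3, 4]), "no records"))  # 3
```

Because `next()` consumes a single item, a lazy generator is not iterated any further. Unlike `if not first_snapshot`, this version also accepts a first item that happens to be falsy.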
+    def newest(self) -> CDXSnapshot:
+        """
+        Passes the current UNIX time to near() to retrieve the newest archive
+        from the CDX server API.
+
+        UNIX time is UTC-based, and so are Wayback Machine timestamps.
+        """
+        return self.near(unix_timestamp=int(time.time()))
+
+    def oldest(self) -> CDXSnapshot:
+        """
+        Passes the date 1994-01-01 to near(), which should return the oldest
+        archive: the Wayback Machine was started in May 1996, so it is assumed
+        that there are no archives older than January 1, 1994.
+        """
+        return self.near(year=1994, month=1, day=1)
+
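`before()`, `after()` and `near()` share the same prologue for turning their arguments into a timestamp: a Unix timestamp takes priority, then a raw Wayback Machine timestamp, and otherwise any unset field falls back to the current UTC time. The sketch below restates that logic as a standalone function, assuming an injectable `now` for testing; `resolve_timestamp` is hypothetical. It also makes one consequence visible: `oldest()` passes only year, month and day, so the hour and minute come from the current time.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def resolve_timestamp(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    hour: Optional[int] = None,
    minute: Optional[int] = None,
    unix_timestamp: Optional[int] = None,
    wayback_machine_timestamp: Optional[Union[int, str]] = None,
    now: Optional[datetime] = None,
) -> str:
    """Mirror the timestamp prologue of before(), after() and near() (hypothetical)."""
    # Mirrors the diff's truthiness checks, so unix_timestamp=0 falls through.
    if unix_timestamp:
        moment = datetime.fromtimestamp(int(unix_timestamp), tz=timezone.utc)
        return moment.strftime("%Y%m%d%H%M%S")
    if wayback_machine_timestamp:
        return str(wayback_machine_timestamp)
    now = now or datetime.now(timezone.utc)
    fields = [
        now.year if year is None else year,
        now.month if month is None else month,
        now.day if day is None else day,
        now.hour if hour is None else hour,
        now.minute if minute is None else minute,
    ]
    # Zero-pad each field to two digits, as wayback_timestamp() does.
    return "".join(str(field).zfill(2) for field in fields)


fixed_now = datetime(2022, 3, 15, 7, 5, tzinfo=timezone.utc)
print(resolve_timestamp(year=1994, month=1, day=1, now=fixed_now))  # 199401010705
```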
    def snapshots(self) -> Generator[CDXSnapshot, None, None]:
        """
        This function yields the CDX data lines as snapshots.
@@ -199,13 +360,7 @@ class WaybackMachineCDXServerAPI:

        self.add_payload(payload)

-        if not self.start_timestamp or self.end_timestamp:
-            self.use_page = True
-
-        if self.collapses != []:
-            self.use_page = False
-
-        entries = self.cdx_api_manager(payload, headers, use_page=self.use_page)
+        entries = self.cdx_api_manager(payload, headers)

        for entry in entries:

--- a/waybackpy/cdx_snapshot.py
+++ b/waybackpy/cdx_snapshot.py
@@ -73,6 +73,12 @@ class CDXSnapshot:
            f"https://web.archive.org/web/{self.timestamp}/{self.original}"
        )

+    def __repr__(self) -> str:
+        """
+        Same as __str__()
+        """
+        return str(self)
+
    def __str__(self) -> str:
        """
        The string representation is same as the line returned by the
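Defining `__repr__` in terms of `__str__` means a snapshot shows the same text whether it is printed directly or appears inside a container such as a list, where Python uses `repr()` for each element. A minimal sketch with a hypothetical two-field class standing in for `CDXSnapshot`:

```python
class Snapshot:
    """Hypothetical stand-in for CDXSnapshot with only two fields."""

    def __init__(self, timestamp: str, original: str) -> None:
        self.timestamp = timestamp
        self.original = original

    def __repr__(self) -> str:
        # Same as __str__(), so lists of snapshots print readably.
        return str(self)

    def __str__(self) -> str:
        return f"{self.timestamp} {self.original}"


print([Snapshot("20200101000000", "https://example.com/")])
# [20200101000000 https://example.com/]
```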
--- a/waybackpy/cdx_utils.py
+++ b/waybackpy/cdx_utils.py
@@ -13,7 +13,7 @@ import requests
 from requests.adapters import HTTPAdapter
 from urllib3.util.retry import Retry

-from .exceptions import WaybackError
+from .exceptions import BlockedSiteError, WaybackError
 from .utils import DEFAULT_USER_AGENT


@@ -28,12 +28,38 @@ def get_total_pages(url: str, user_agent: str = DEFAULT_USER_AGENT) -> int:
    headers = {"User-Agent": user_agent}
    request_url = full_url(endpoint, params=payload)
    response = get_response(request_url, headers=headers)
-
+    check_for_blocked_site(response, url)
    if isinstance(response, requests.Response):
        return int(response.text.strip())
    raise response


+def check_for_blocked_site(
+    response: Union[requests.Response, Exception], url: Optional[str] = None
+) -> None:
+    """
+    Checks whether the Wayback Machine is allowed to serve archives of the URL.
+    A site's robots.txt policy may exclude the site from the Wayback Machine;
+    in that case BlockedSiteError is raised.
+    """
+    # see https://github.com/akamhy/waybackpy/issues/157
+
+    # the following if block is to make mypy happy.
+    if isinstance(response, Exception):
+        raise response
+
+    if not url:
+        url = "The requested content"
+    if (
+        "org.archive.util.io.RuntimeIOException: "
+        + "org.archive.wayback.exception.AdministrativeAccessControlException: "
+        + "Blocked Site Error"
+        in response.text.strip()
+    ):
+        raise BlockedSiteError(
+            f"{url} is excluded from Wayback Machine by the site's robots.txt policy."
+        )
+
+
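`check_for_blocked_site()` looks for the `AdministrativeAccessControlException` marker in the response body. The same check on a plain string, as a sketch: `is_blocked_site_response` is hypothetical and takes the response text rather than a `requests.Response`, so it can run without network access.

```python
BLOCKED_SITE_MARKER = (
    "org.archive.util.io.RuntimeIOException: "
    "org.archive.wayback.exception.AdministrativeAccessControlException: "
    "Blocked Site Error"
)


def is_blocked_site_response(text: str) -> bool:
    """Return True if a CDX response body reports a robots.txt exclusion."""
    return BLOCKED_SITE_MARKER in text.strip()
```

Returning a bool keeps detection separate from the decision to raise `BlockedSiteError`, which makes the check easy to unit test.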
 def full_url(endpoint: str, params: Dict[str, Any]) -> str:
    """
    As the function's name already implies that it returns
@@ -76,6 +102,7 @@ def get_response(
    session.mount("https://", HTTPAdapter(max_retries=retries_))
    response = session.get(url, headers=headers)
    session.close()
+    check_for_blocked_site(response)
    return response


@@ -151,3 +178,24 @@ def check_match_type(match_type: Optional[str], url: str) -> bool:
        raise WaybackError(exc_message)

    return True
+
+
+def check_sort(sort: Optional[str]) -> bool:
+    """
+    Check that the sort argument passed by the end-user is valid.
+    If not valid then raise WaybackError.
+    """
+
+    legal_sort = ["default", "closest", "reverse"]
+
+    if not sort:
+        return True
+
+    if sort not in legal_sort:
+        exc_message = (
+            f"{sort} is not an allowed argument for sort.\n"
+            "Use one of 'default', 'closest' or 'reverse'"
+        )
+        raise WaybackError(exc_message)
+
+    return True
--- a/waybackpy/cli.py
+++ b/waybackpy/cli.py
@@ -6,47 +6,48 @@ import os
 import random
 import re
 import string
-from json import dumps
-from typing import Any, Generator, List, Optional
+from typing import Any, Dict, Generator, List, Optional

 import click
 import requests

 from . import __version__
-from .availability_api import WaybackMachineAvailabilityAPI
 from .cdx_api import WaybackMachineCDXServerAPI
-from .exceptions import ArchiveNotInAvailabilityAPIResponse
+from .exceptions import BlockedSiteError, NoCDXRecordFound
 from .save_api import WaybackMachineSaveAPI
 from .utils import DEFAULT_USER_AGENT
 from .wrapper import Url


-def echo_availability_api(
-    availability_api_instance: WaybackMachineAvailabilityAPI, json: bool
+def handle_cdx_closest_derivative_methods(
+    cdx_api: "WaybackMachineCDXServerAPI",
+    oldest: bool,
+    near: bool,
+    newest: bool,
+    near_args: Optional[Dict[str, int]] = None,
 ) -> None:
    """
-    Output for method that use the availability API.
-    Near, oldest and newest output via this function.
+    Handles the methods derived from the closest parameter.
+
+    near(), newest() and oldest() all pass the closest parameter with
+    closest-based sorting enabled.
    """
    try:
-        if availability_api_instance.archive_url:
-            archive_url = availability_api_instance.archive_url
-    except ArchiveNotInAvailabilityAPIResponse as error:
-        message = (
-            "NO ARCHIVE FOUND - The requested URL is probably "
-            + "not yet archived or if the URL was recently archived then it is "
-            + "not yet available via the Wayback Machine's availability API "
-            + "because of database lag and should be available after some time."
-        )
-
-        click.echo(message + "\nJSON response:\n" + str(error), err=True)
-        return
-
-    click.echo("Archive URL:")
-    click.echo(archive_url)
-    if json:
-        click.echo("JSON response:")
-        click.echo(dumps(availability_api_instance.json))
+        if near:
+            if near_args:
+                archive_url = cdx_api.near(**near_args).archive_url
+            else:
+                archive_url = cdx_api.near().archive_url
+        elif newest:
+            archive_url = cdx_api.newest().archive_url
+        elif oldest:
+            archive_url = cdx_api.oldest().archive_url
+        click.echo("Archive URL:")
+        click.echo(archive_url)
+    except NoCDXRecordFound as exc:
+        click.echo(click.style("NoCDXRecordFound: ", fg="red") + str(exc), err=True)
+    except BlockedSiteError as exc:
+        click.echo(click.style("BlockedSiteError: ", fg="red") + str(exc), err=True)


 def handle_cdx(data: List[Any]) -> None:
@@ -63,6 +64,9 @@ def handle_cdx(data: List[Any]) -> None:
    limit = data[7]
    gzip = data[8]
    match_type = data[9]
+    sort = data[10]
+    use_pagination = data[11]
+    closest = data[12]

    filters = list(cdx_filter)
    collapses = list(collapse)
@@ -73,8 +77,11 @@ def handle_cdx(data: List[Any]) -> None:
        user_agent=user_agent,
        start_timestamp=start_timestamp,
        end_timestamp=end_timestamp,
+        closest=closest,
        filters=filters,
        match_type=match_type,
+        sort=sort,
+        use_pagination=use_pagination,
        gzip=gzip,
        collapses=collapses,
        limit=limit,
@@ -139,7 +146,8 @@ def save_urls_on_file(url_gen: Generator[str, None, None]) -> None:
            file_name = f"{domain}-urls-{uid}.txt"
            file_path = os.path.join(os.getcwd(), file_name)
            if not os.path.isfile(file_path):
-                open(file_path, "w+", encoding="utf-8").close()
+                with open(file_path, "w+", encoding="utf-8"):
+                    pass

        with open(file_path, "a", encoding="utf-8") as file:
            file.write(f"{url}\n")
@@ -193,13 +201,6 @@ def save_urls_on_file(url_gen: Generator[str, None, None]) -> None:
    is_flag=True,
    help="Retrieve the oldest archive of URL.",
 )
-@click.option(
-    "-j",
-    "--json",
-    default=False,
-    is_flag=True,
-    help="JSON data returned by the availability API.",
-)
@click.option(
    "-N",
    "--near",
@@ -249,7 +250,6 @@ def save_urls_on_file(url_gen: Generator[str, None, None]) -> None:
    help="Use with '--known_urls' to save the URLs in file at current directory.",
 )
@click.option(
-    "-c",
    "--cdx",
    default=False,
    is_flag=True,
@@ -269,6 +269,12 @@ def save_urls_on_file(url_gen: Generator[str, None, None]) -> None:
    "--to",
    help="End timestamp for CDX API in yyyyMMddhhmmss format.",
 )
+@click.option(
+    "-C",
+    "--closest",
+    help="Archives that are closest to the timestamp passed as the argument to "
+    + "this parameter.",
+)
@click.option(
    "-f",
    "--cdx-filter",
@@ -285,6 +291,20 @@ def save_urls_on_file(url_gen: Generator[str, None, None]) -> None:
    + "However, the CDX server can also return results matching a certain prefix, "
    + "a certain host, or all sub-hosts by using the match_type",
 )
+@click.option(
+    "-st",
+    "--sort",
+    help="Choose one of default, closest or reverse. Sorts the CDX entries "
+    + "in the response accordingly.",
+)
+@click.option(
+    "-up",
+    "--use-pagination",
+    "--use_pagination",
+    default=False,
+    is_flag=True,
+    help="Use the pagination API of the CDX server instead of the default one.",
+)
@click.option(
    "-gz",
    "--gzip",
@@ -318,7 +338,6 @@ def main(  # pylint: disable=no-value-for-parameter
    show_license: bool,
    newest: bool,
    oldest: bool,
-    json: bool,
    near: bool,
    save: bool,
    headers: bool,
@@ -326,6 +345,7 @@ def main(  # pylint: disable=no-value-for-parameter
    subdomain: bool,
    file: bool,
    cdx: bool,
+    use_pagination: bool,
    cdx_filter: List[str],
    collapse: List[str],
    cdx_print: List[str],
@@ -337,7 +357,9 @@ def main(  # pylint: disable=no-value-for-parameter
    minute: Optional[int] = None,
    start_timestamp: Optional[str] = None,
    end_timestamp: Optional[str] = None,
+    closest: Optional[str] = None,
    match_type: Optional[str] = None,
+    sort: Optional[str] = None,
    gzip: Optional[str] = None,
    limit: Optional[str] = None,
 ) -> None:
@@ -357,7 +379,7 @@ def main(  # pylint: disable=no-value-for-parameter

    Documentation: https://github.com/akamhy/waybackpy/wiki/CLI-docs

-    waybackpy - CLI usage(Demo video): https://asciinema.org/a/464367
+    waybackpy - CLI usage(Demo video): https://asciinema.org/a/469890

    Released under the MIT License. Use the flag --license for license.

@@ -372,28 +394,32 @@ def main(  # pylint: disable=no-value-for-parameter
            ).text
        )
    elif url is None:
-        click.echo("No URL detected. Please provide an URL.", err=True)
+        click.echo(
+            click.style("NoURLDetected: ", fg="red")
+            + "No URL detected. "
+            + "Please provide a URL.",
+            err=True,
+        )

    elif oldest:
-        availability_api = WaybackMachineAvailabilityAPI(url, user_agent=user_agent)
-        availability_api.oldest()
-        echo_availability_api(availability_api, json)
+        cdx_api = WaybackMachineCDXServerAPI(url, user_agent=user_agent)
+        handle_cdx_closest_derivative_methods(cdx_api, oldest, near, newest)

    elif newest:
-        availability_api = WaybackMachineAvailabilityAPI(url, user_agent=user_agent)
-        availability_api.newest()
-        echo_availability_api(availability_api, json)
+        cdx_api = WaybackMachineCDXServerAPI(url, user_agent=user_agent)
+        handle_cdx_closest_derivative_methods(cdx_api, oldest, near, newest)

    elif near:
-        availability_api = WaybackMachineAvailabilityAPI(url, user_agent=user_agent)
+        cdx_api = WaybackMachineCDXServerAPI(url, user_agent=user_agent)
        near_args = {}
        keys = ["year", "month", "day", "hour", "minute"]
        args_arr = [year, month, day, hour, minute]
        for key, arg in zip(keys, args_arr):
            if arg:
                near_args[key] = arg
-        availability_api.near(**near_args)
-        echo_availability_api(availability_api, json)
+        handle_cdx_closest_derivative_methods(
+            cdx_api, oldest, near, newest, near_args=near_args
+        )
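The `--near` branch forwards only the date fields the user actually set. A sketch of that filtering as a dict comprehension, assuming the CLI parses these options as integers; `build_near_args` is hypothetical, and it tests `is not None` rather than truthiness, a deliberate deviation from the loop in the diff: with `if arg:`, a value of 0 (for example an hour of 0) is dropped and `near()` substitutes the current hour.

```python
from typing import Dict, Optional


def build_near_args(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    hour: Optional[int] = None,
    minute: Optional[int] = None,
) -> Dict[str, int]:
    """Collect only the date fields that were supplied (hypothetical helper)."""
    values = {"year": year, "month": month, "day": day, "hour": hour, "minute": minute}
    # Unlike `if arg:`, this keeps legitimate zero values such as hour=0.
    return {key: value for key, value in values.items() if value is not None}


print(build_near_args(year=2020, hour=0))  # {'year': 2020, 'hour': 0}
```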

    elif save:
        save_api = WaybackMachineSaveAPI(url, user_agent=user_agent)
@@ -428,16 +454,21 @@ def main(  # pylint: disable=no-value-for-parameter
            limit,
            gzip,
            match_type,
+            sort,
+            use_pagination,
+            closest,
        ]
        handle_cdx(data)

    else:
+
        click.echo(
-            "Only URL passed, but did not specify what to do with the URL. "
-            "Use --help flag for help using waybackpy.",
+            click.style("NoCommandFound: ", fg="red")
+            + "A URL was passed, but no option specifying what to do with it. "
+            + "Use the --help flag for help using waybackpy.",
            err=True,
        )


 if __name__ == "__main__":
-    main()  # pylint: disable=no-value-for-parameter
+    main()  # type: ignore # pylint: disable=no-value-for-parameter
--- a/waybackpy/exceptions.py
+++ b/waybackpy/exceptions.py
@@ -16,6 +16,21 @@ class WaybackError(Exception):
    """


+class NoCDXRecordFound(WaybackError):
+    """
+    No records returned by the CDX server for a query.
+    Raised when the user invokes the near(), before(), after(), newest() or
+    oldest() methods and there are no matching archives.
+    """
+
+
+class BlockedSiteError(WaybackError):
+    """
+    Raised when archives of a website or URL that was excluded from the
+    Wayback Machine are requested via the CDX server API.
+    """
+
+
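Because `NoCDXRecordFound` and `BlockedSiteError` both subclass `WaybackError`, callers can catch them individually, as the CLI does, or together through the base class. A self-contained sketch that redefines the three classes locally, mirroring the names in the diff, to show how the `except` clauses resolve:

```python
class WaybackError(Exception):
    """Local stand-in for waybackpy's base exception."""


class NoCDXRecordFound(WaybackError):
    """Local stand-in: no CDX records for the query."""


class BlockedSiteError(WaybackError):
    """Local stand-in: the site is excluded from the Wayback Machine."""


def describe(exc: Exception) -> str:
    try:
        raise exc
    except NoCDXRecordFound:
        return "no records"
    except WaybackError as error:
        # BlockedSiteError and any other subclass land here.
        return f"{type(error).__name__}: {error}"


print(describe(NoCDXRecordFound("empty")))  # no records
print(describe(BlockedSiteError("robots")))  # BlockedSiteError: robots
```

The more specific `except` clause must come first; otherwise the base-class clause would swallow `NoCDXRecordFound` too.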
 class TooManyRequestsError(WaybackError):
    """
    Raised when you make more than 15 requests per
--- a/waybackpy/py.typed
+++ b/waybackpy/py.typed
--- a/waybackpy/utils.py
+++ b/waybackpy/utils.py
@@ -2,8 +2,28 @@
 Utility functions and shared variables like DEFAULT_USER_AGENT are here.
 """

+from datetime import datetime
+
 from . import __version__

 DEFAULT_USER_AGENT: str = (
    f"waybackpy {__version__} - https://github.com/akamhy/waybackpy"
 )
+
+
+def unix_timestamp_to_wayback_timestamp(unix_timestamp: int) -> str:
+    """
+    Converts Unix time to a Wayback Machine timestamp in the
+    yyyyMMddhhmmss format.
+    """
+    return datetime.utcfromtimestamp(int(unix_timestamp)).strftime("%Y%m%d%H%M%S")
+
+
+def wayback_timestamp(**kwargs: int) -> str:
+    """
+    Zero-pads the year, month, day, hour and minute to at least two digits and
+    joins them into a yyyyMMddhhmm string: the Wayback Machine's yyyyMMddhhmmss
+    timestamp format without the seconds.
+    """
+    return "".join(
+        str(kwargs[key]).zfill(2) for key in ["year", "month", "day", "hour", "minute"]
+    )
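`datetime.utcfromtimestamp()` and `datetime.utcnow()`, both used in this diff, are deprecated since Python 3.12 in favor of timezone-aware calls. A sketch of timezone-aware equivalents; the names ending in `_aware` and `utc_now` are hypothetical, chosen so they are not confused with the library's own functions.

```python
from datetime import datetime, timezone


def unix_to_wayback_timestamp_aware(unix_timestamp: int) -> str:
    """Convert Unix time to yyyyMMddhhmmss using a timezone-aware datetime."""
    moment = datetime.fromtimestamp(int(unix_timestamp), tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


def utc_now() -> datetime:
    """Timezone-aware replacement for datetime.utcnow()."""
    return datetime.now(timezone.utc)


print(unix_to_wayback_timestamp_aware(1647302400))  # 20220315000000
```

The aware datetime still exposes `.timetuple()`, so callers that read `tm_year`, `tm_mon` and so on would keep working.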
Author	SHA1	Message	Date
ArztKlein	3b3e78d901	Before and After methods (#175 ) * Added before and after functions * add tests * formatting	2022-11-17 07:58:46 +05:30
Rishav Kundu	0202efd39d	Add Python 3.11 to setup.cfg classifiers list (#179 )	2022-11-17 07:56:19 +05:30
Akash Mahanty	25c0adacb0	create CONTRIBUTING.md	2022-03-29 11:30:43 +05:30
Akash Mahanty	5bd16a42e7	lint	2022-03-29 10:42:57 +05:30
Akash Mahanty	57f4be53d5	ignore line 474 beacause 'error: <nothing> not callable'.	2022-03-29 10:24:55 +05:30
Akash Mahanty	64a4ce88af	Minor copyediting and also deleted CONTRIBUTORS.md moved content to README.md	2022-03-29 03:39:50 +05:30
Akash Mahanty	5407681c34	v3.0.6 (#170 ) * remove the license section from readme This does not mean that I'm waving the copyrights rather just formatting the README * remove useless external links form the README lead and also added a line about the recentness of the newest method between the availability and CDX server API. * incr version to 3.0.6 and change date to todays da -te that is 15th of March, 2022. * update secsi and DI section * v3.0.5 --> v3.0.6	2022-03-15 20:33:51 +05:30
Akash Mahanty	cfd977135d	Update CITATION.cff (#169 )	2022-03-04 11:48:49 +05:30
eggplants	7a5e0bfdaf	fix: cff format (#168 )	2022-03-04 03:10:27 +05:30
eggplants	48dcda8020	add: typed marker (PEP561) (#167 )	2022-03-03 19:05:43 +05:30
eggplants	3ed2170a32	add: CITATION.cff (#166 )	2022-03-03 19:05:23 +05:30
Akash Mahanty	d6ef55020c	undo drop python3.6, see #162 (#163 )	2022-02-18 21:54:33 +05:30
Akash Mahanty	2650943f9d	v3.0.4 (#160 ) * Update README.md * Update README.md * update asciinema link * v3.0.4 * update video link	2022-02-18 16:05:58 +05:30
Akash Mahanty	4b218d35cb	Cdx based oldest newest and near (#159 ) * implement oldest newest and near methods in the cdx interface class, now cli uses the cdx methods instead of availablity api methods. * handle the closest parameter derivative methods more efficiently and also handle exceptions gracefully. * update test code	2022-02-18 13:17:40 +05:30
Akash Mahanty	f990b93f8a	Add sort, use_pagination and closest (#158 ) * add sort param support in CDX API class see https://nla.github.io/outbackcdx/api.html#operation/query sort takes string input which must be one of the follwoing: - default - closest - reverse This commit shall help in closing issue at https://github.com/akamhy/waybackpy/issues/155 * add BlockedSiteError for cases when archiving is blocked by site's robots.txt * create check_for_blocked_site for handling the BlockedSiteError for sites that are blocking wayback machine by their robots.txt policy * add attrs use_pagination and closest, which are can be used to use the pagination API and lookup archive close to a timestamp respectively. And now to get out of infinte blank pages loop just check for two succesive black and not total two blank pages while using the CDX server API. * added cli support for sort, use-pagination and closest * added tests * fix codeql warnings, nothing to worry about here. * fix save test for archive_url	2022-02-18 00:24:14 +05:30
Akash Mahanty	3a44a710d3	add sort param support in CDX API class (#156 ) see https://nla.github.io/outbackcdx/api.html#operation/query sort takes string input which must be one of the follwoing: - default - closest - reverse This commit shall help in closing issue at https://github.com/akamhy/waybackpy/issues/155	2022-02-17 12:17:23 +05:30
Akash Mahanty	f63c6adf79	Trigger Build	2022-02-09 17:29:19 +05:30
eggplants	b4d3393ef1	fix: move metadata from __init__.py into setup.cfg (#153 )	2022-02-09 17:20:23 +05:30