Changes made for v2.4.4 (update download_url) (#100)

* v2.4.4 (update download_url)
* v2.4.4 (update __version__)
* +1 add jonasjancarik

Raise error on a 509 response (too many sessions) (#99)
2021-09-03 11:28:26 +05:30 · 2021-09-03 08:04:36 +05:30 · 2021-04-13 16:58:34 +05:30 · 2021-04-02 10:41:59 +05:30 · 2021-04-02 10:38:17 +05:30 · 2021-01-26 11:56:03 +05:30
28 changed files with 2553 additions and 1145 deletions
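The library change behind #99 isn't part of this excerpt, so what follows is only a standard-library sketch of what its title describes: treat an HTTP 509 ("too many sessions") reply as a distinct, catchable error instead of a generic failure. `TooManySessionsError`, `raise_for_509` and `fetch` are hypothetical names, not waybackpy's API.

```python
import urllib.error
import urllib.request


class TooManySessionsError(Exception):
    """Hypothetical error for a 509 "too many sessions" reply."""


def raise_for_509(status: int, url: str) -> None:
    """Raise TooManySessionsError when the server answered with HTTP 509."""
    if status == 509:
        raise TooManySessionsError(
            f"{url} returned HTTP 509 (too many sessions); wait before retrying"
        )


def fetch(url: str, user_agent: str) -> bytes:
    """GET url, turning a 509 reply into TooManySessionsError."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx replies; surface 509 distinctly
        # and re-raise every other HTTP error unchanged.
        raise_for_509(err.code, url)
        raise
```

A caller can then catch `TooManySessionsError` and back off, rather than retrying the save endpoint immediately.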
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,42 @@
+# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
+# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
+
+name: CI
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.8']
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install flake8 pytest codecov pytest-cov
+        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+    - name: Lint with flake8
+      run: |
+        # stop the build if there are Python syntax errors or undefined names
+        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+    - name: Test with pytest
+      run: |
+        pytest --cov=waybackpy tests/
+    - name: Upload coverage to Codecov
+      run: |
+        bash <(curl -s https://codecov.io/bash) -t ${{ secrets.CODECOV_TOKEN }}
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,6 @@
+# Files generated while testing
+*-urls-*.txt
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
--- a/.pep8speaks.yml
+++ b/.pep8speaks.yml
@@ -0,0 +1,4 @@
+# File : .pep8speaks.yml
+
+scanner:
+    diff_only: True  # If True, errors caused by only the patch are shown
--- a/.pyup.yml
+++ b/.pyup.yml
@@ -0,0 +1,5 @@
+# autogenerated pyup.io config file 
+# see https://pyup.io/docs/configuration/ for all available options
+
+schedule: ''
+update: false
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,19 +0,0 @@
-language: python
-os: linux
-dist: xenial
-cache: pip
-python:
-  - 2.7
-  - 3.6
-  - 3.8
-before_install:
-  - python --version
-  - pip install -U pip
-  - pip install -U pytest
-  - pip install codecov
-  - pip install pytest pytest-cov
-script:
-  - cd tests
-  - pytest --cov=../waybackpy
-after_success:
-  - if [[ $TRAVIS_PYTHON_VERSION == 3.8 ]]; then python -m codecov; fi
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,58 @@
+# Contributing to waybackpy
+
+We love your input! We want to make contributing to this project as easy and transparent as possible, whether it's:
+
+- Reporting a bug
+- Discussing the current state of the code
+- Submitting a fix
+- Proposing new features
+- Becoming a maintainer
+
+## We Develop with GitHub
+
+We use GitHub to host code, to track issues and feature requests, and to accept pull requests.
+
+## We Use [GitHub Flow](https://guides.github.com/introduction/flow/index.html), So All Code Changes Happen Through Pull Requests
+
+Pull requests are the best way to propose changes to the codebase (we use [GitHub Flow](https://guides.github.com/introduction/flow/index.html)). We actively welcome your pull requests:
+
+1. Fork the repo and create your branch from `master`.
+2. If you've added code that should be tested, add tests.
+3. If you've changed APIs, update the documentation.
+4. Ensure the test suite passes.
+5. Make sure your code lints.
+6. Issue that pull request!
+
+## Any contributions you make will be under the MIT Software License
+
+In short, when you submit code changes, your submissions are understood to be under the same [MIT License](https://github.com/akamhy/waybackpy/blob/master/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
+
+## Report bugs using GitHub's [issues](https://github.com/akamhy/waybackpy/issues)
+
+We use GitHub issues to track public bugs. Report a bug by [opening a new issue](https://github.com/akamhy/waybackpy/issues/new); it's that easy!
+
+## Write bug reports with detail, background, and sample code
+
+**Great Bug Reports** tend to have:
+
+- A quick summary and/or background
+- Steps to reproduce
+  - Be specific!
+  - Give sample code if you can.
+- What you expected would happen
+- What actually happens
+- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)
+
+People *love* thorough bug reports. I'm not even kidding.
+
+## Use a Consistent Coding Style
+
+- Run `flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics` to check your code style. This is the same command the CI workflow runs as its non-blocking lint pass.
+
+## License
+
+By contributing, you agree that your contributions will be licensed under the project's [MIT License](https://github.com/akamhy/waybackpy/blob/master/LICENSE).
+
+## References
+
+This document is forked from [this gist](https://gist.github.com/briandk/3d2e8b3ec8daf5a27a62) by [briandk](https://github.com/briandk), which was itself adapted from the open-source contribution guidelines for [Facebook's Draft](https://github.com/facebook/draft-js/blob/a9316a723f9e918afde44dea68b5f9f39b7d9b00/CONTRIBUTING.md).
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -0,0 +1,10 @@
+## AUTHORS
+  - akamhy (<https://github.com/akamhy>)
+  - danvalen1 (<https://github.com/danvalen1>)
+  - AntiCompositeNumber (<https://github.com/AntiCompositeNumber>)
+  - jonasjancarik (<https://github.com/jonasjancarik>)
+
+## ACKNOWLEDGEMENTS
+  - mhmdiaa (<https://github.com/mhmdiaa>) for <https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050>. `known_urls` is based on this gist.
+  - datashaman (<https://stackoverflow.com/users/401467/datashaman>) for <https://stackoverflow.com/a/35504626>. `_get_response` is based on this amazing answer.
+  - dequeued0 (<https://github.com/dequeued0>) for reporting bugs and useful feature requests.
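CONTRIBUTORS.md credits mhmdiaa's gist as the basis for `known_urls`, which the README's `--known_urls` example exercises. The sketch below is neither that gist nor waybackpy's code. It assumes the public Wayback CDX server endpoint and its `matchType`, `output`, `fl` and `collapse` parameters, and the helper names `cdx_query_url` and `original_urls` are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url_prefix: str) -> str:
    """Build a CDX query listing one row per distinct URL under url_prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",  # every capture whose URL starts with url_prefix
        "output": "json",       # JSON rows; the first row holds the field names
        "fl": "original",       # only return the originally captured URL
        "collapse": "urlkey",   # drop adjacent captures of the same URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def original_urls(cdx_json: str) -> List[str]:
    """Return the distinct 'original' URLs from a CDX JSON body, in order."""
    # Tolerate an empty body as well as an empty JSON list.
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    if not rows:
        return []
    header, *records = rows
    column = header.index("original")
    seen = set()
    urls = []
    for record in records:
        url = record[column]
        if url not in seen:  # collapse only merges adjacent rows; dedupe fully
            seen.add(url)
            urls.append(url)
    return urls
```

Fetching `cdx_query_url("akamhy.github.io")` and passing the response body to `original_urls` would yield a URL list shaped like the one the CLI prints, though the real command may query the API differently.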
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2020 akamhy
+Copyright (c) 2020 waybackpy contributors ( https://github.com/akamhy/waybackpy/graphs/contributors )

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/README.md
+++ b/README.md
@@ -1,288 +1,111 @@
-# waybackpy
+<div align="center">

-[![Build Status](https://img.shields.io/travis/akamhy/waybackpy.svg?label=Travis%20CI&logo=travis&style=flat-square)](https://travis-ci.org/akamhy/waybackpy)
-[![Downloads](https://img.shields.io/pypi/dm/waybackpy.svg)](https://pypistats.org/packages/waybackpy)
-[![Release](https://img.shields.io/github/v/release/akamhy/waybackpy.svg)](https://github.com/akamhy/waybackpy/releases)
-[![Codacy Badge](https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65)](https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=akamhy/waybackpy&amp;utm_campaign=Badge_Grade)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
-[![Maintainability](https://api.codeclimate.com/v1/badges/942f13d8177a56c1c906/maintainability)](https://codeclimate.com/github/akamhy/waybackpy/maintainability)
-[![CodeFactor](https://www.codefactor.io/repository/github/akamhy/waybackpy/badge)](https://www.codefactor.io/repository/github/akamhy/waybackpy)
-[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
-![pypi](https://img.shields.io/pypi/v/waybackpy.svg)
-![PyPI - Python Version](https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square)
-[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/akamhy/waybackpy/graphs/commit-activity)
-[![codecov](https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg)](https://codecov.io/gh/akamhy/waybackpy)
-![](https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square)
-![contributions welcome](https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square)
+<img src="https://raw.githubusercontent.com/akamhy/waybackpy/master/assets/waybackpy_logo.svg"><br>

+<h2>Python package & CLI tool that interfaces with the Wayback Machine API</h2>

-![Internet Archive](https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Internet_Archive_logo_and_wordmark.svg/84px-Internet_Archive_logo_and_wordmark.svg.png)
-![Wayback Machine](https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Wayback_Machine_logo_2010.svg/284px-Wayback_Machine_logo_2010.svg.png)
+</div>

-Waybackpy is a Python library that interfaces with the [Internet Archive](https://en.wikipedia.org/wiki/Internet_Archive)'s [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine) API. Archive pages and retrieve archived pages easily.
+<p align="center">
+<a href="https://pypi.org/project/waybackpy/"><img alt="pypi" src="https://img.shields.io/pypi/v/waybackpy.svg"></a>
+<a href="https://github.com/akamhy/waybackpy/actions?query=workflow%3ACI"><img alt="Build Status" src="https://github.com/akamhy/waybackpy/workflows/CI/badge.svg"></a>
+<a href="https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=akamhy/waybackpy&amp;utm_campaign=Badge_Grade"><img alt="Codacy Badge" src="https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65"></a>
+<a href="https://codecov.io/gh/akamhy/waybackpy"><img alt="codecov" src="https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg"></a>
+<a href="https://github.com/akamhy/waybackpy/blob/master/CONTRIBUTING.md"><img alt="Contributions Welcome" src="https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square"></a>
+<a href="https://pepy.tech/project/waybackpy?versions=2*&versions=1*&versions=3*"><img alt="Downloads" src="https://pepy.tech/badge/waybackpy/month"></a>
+<a href="https://github.com/akamhy/waybackpy/commits/master"><img alt="GitHub latest commit" src="https://img.shields.io/github/last-commit/akamhy/waybackpy?color=blue&style=flat-square"></a>
+<a href="#"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square"></a>
+</p>

-Table of contents
-=================
-<!--ts-->
+-----------------------------------------------------------------------------------------------------------------------------------------------

-* [Installation](#installation)
+### Installation

-* [Usage](#usage)
-  * [As a python package](#as-a-python-package)
-    * [Saving an url using save()](#capturing-aka-saving-an-url-using-save)
-    * [Receiving the oldest archive for an URL Using oldest()](#receiving-the-oldest-archive-for-an-url-using-oldest)
-    * [Receiving the recent most/newest archive for an URL using newest()](#receiving-the-newest-archive-for-an-url-using-newest)
-    * [Receiving archive close to a specified year, month, day, hour, and minute using near()](#receiving-archive-close-to-a-specified-year-month-day-hour-and-minute-using-near)
-    * [Get the content of webpage using get()](#get-the-content-of-webpage-using-get)
-    * [Count total archives for an URL using total_archives()](#count-total-archives-for-an-url-using-total_archives)
-  * [With CLI](#with-the-cli)
-    * [Save](#save)
-    * [Oldest archive](#oldest-archive)
-    * [Newest archive](#newest-archive)
-    * [Total archives](#total-number-of-archives)
-    * [Archive near a time](#archive-near-time)
-    * [Get the source code](#get-the-source-code)
-
-* [Tests](#tests)
-
-* [Dependency](#dependency)
-
-* [License](#license)
-
-<!--te-->
-
-## Installation
 Using [pip](https://en.wikipedia.org/wiki/Pip_(package_manager)):
+
 ```bash
 pip install waybackpy
 ```
-or direct from this repository using git.
+
+Install directly from GitHub:
+
 ```bash
 pip install git+https://github.com/akamhy/waybackpy.git
 ```

-## Usage
+### Supported Features

-### As a python package
+  - Archive a webpage
+  - Retrieve all archives of a webpage/domain
+  - Retrieve the archive closest to a date or timestamp
+  - Retrieve all archives that have a particular prefix
+  - Get the source code of an archive
+  - CDX API support

-#### Capturing aka Saving an url using save()
+
+### Usage
+
+#### As a Python package
 ```python
-import waybackpy
+>>> import waybackpy

-new_archive_url = waybackpy.Url(
+>>> url = "https://en.wikipedia.org/wiki/Multivariable_calculus"
+>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"

-    url = "https://en.wikipedia.org/wiki/Multivariable_calculus",
-    user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
-    
-).save()
+>>> wayback = waybackpy.Url(url, user_agent)

-print(new_archive_url)
+>>> archive = wayback.save()
+>>> archive.archive_url
+'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
+
+>>> archive.timestamp
+datetime.datetime(2021, 1, 4, 17, 35, 12, 691741)
+
+>>> oldest_archive = wayback.oldest()
+>>> oldest_archive.archive_url
+'https://web.archive.org/web/20050422130129/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
+
+>>> archive_close_to_2010_feb = wayback.near(year=2010, month=2)
+>>> archive_close_to_2010_feb.archive_url
+'https://web.archive.org/web/20100215001541/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
+
+>>> wayback.newest().archive_url
+'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
 ```
+> Full Python package documentation can be found at <https://github.com/akamhy/waybackpy/wiki/Python-package-docs>.
+
+
+
+#### As a CLI tool
 ```bash
-https://web.archive.org/web/20200504141153/https://github.com/akamhy/waybackpy
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPySaveExample></sub>
-
-
-
-#### Receiving the oldest archive for an URL using oldest()
-```python
-import waybackpy
-
-oldest_archive_url = waybackpy.Url(
-
-    "https://www.google.com/",
-    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:40.0) Gecko/20100101 Firefox/40.0"
-    
-).oldest()
-
-print(oldest_archive_url)
-```
-```bash
-http://web.archive.org/web/19981111184551/http://google.com:80/
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyOldestExample></sub>
-
-
-
-#### Receiving the newest archive for an URL using newest()
-```python
-import waybackpy
-
-newest_archive_url = waybackpy.Url(
-
-    "https://www.facebook.com/",
-    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:39.0) Gecko/20100101 Firefox/39.0"
-    
-).newest()
-
-print(newest_archive_url)
-```
-```bash
-https://web.archive.org/web/20200714013225/https://www.facebook.com/
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyNewestExample></sub>
-
-
-
-#### Receiving archive close to a specified year, month, day, hour, and minute using near()
-```python
-from waybackpy import Url
-
-user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Firefox/38.0"
-github_url = "https://github.com/"
-
-
-github_wayback_obj = Url(github_url, user_agent)
-
-# Do not pad (don't use zeros in the month, year, day, minute, and hour arguments). e.g. For January, set month = 1 and not month = 01.
-```
-```python
-github_archive_near_2010 = github_wayback_obj.near(year=2010)
-print(github_archive_near_2010)
-```
-```bash
-https://web.archive.org/web/20100719134402/http://github.com/
-```
-```python
-github_archive_near_2011_may = github_wayback_obj.near(year=2011, month=5)
-print(github_archive_near_2011_may)
-```
-```bash
-https://web.archive.org/web/20110519185447/https://github.com/
-```
-```python
-github_archive_near_2015_january_26 = github_wayback_obj.near(
-    year=2015, month=1, day=26
-)
-print(github_archive_near_2015_january_26)
-```
-```bash
-https://web.archive.org/web/20150127031159/https://github.com
-```
-```python
-github_archive_near_2018_4_july_9_2_am = github_wayback_obj.near(
-    year=2018, month=7, day=4, hour = 9, minute = 2
-)
-print(github_archive_near_2018_4_july_9_2_am)
-```
-```bash
-https://web.archive.org/web/20180704090245/https://github.com/
-
-```
-
-<sub>The library doesn't supports seconds yet. You are encourged to create a PR ;)</sub>
-
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyNearExample></sub>
-
-
-
-#### Get the content of webpage using get()
-```python
-import waybackpy
-
-google_url = "https://www.google.com/"
-
-User_Agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
-
-waybackpy_url_object = waybackpy.Url(google_url, User_Agent)
-
-
-# If no argument is passed in get(), it gets the source of the Url used to create the object.
-current_google_url_source = waybackpy_url_object.get()
-print(current_google_url_source)
-
-
-# The following chunk of code will force a new archive of google.com and get the source of the archived page.
-# waybackpy_url_object.save() type is string.
-google_newest_archive_source = waybackpy_url_object.get(
-    waybackpy_url_object.save()
-)
-print(google_newest_archive_source)
-
-
-# waybackpy_url_object.oldest() type is str, it's oldest archive of google.com
-google_oldest_archive_source = waybackpy_url_object.get(
-    waybackpy_url_object.oldest()
-)
-print(google_oldest_archive_source)
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyGetExample#main.py></sub>
-
-
-#### Count total archives for an URL using total_archives()
-```python
-import waybackpy
-
-URL = "https://en.wikipedia.org/wiki/Python (programming language)"
-
-UA = "Mozilla/5.0 (iPad; CPU OS 8_1_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B435 Safari/600.1.4"
-
-archive_count = waybackpy.Url(
-    url=URL,
-    user_agent=UA
-).total_archives()
-
-print(archive_count) # total_archives() returns an int
-```
-```bash
-2440
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyTotalArchivesExample></sub>
-
-### With the CLI
-
-#### Save
-```bash
-$ waybackpy --url "https://en.wikipedia.org/wiki/Social_media" --user_agent "my-unique-user-agent" --save
+$ waybackpy --save --url "https://en.wikipedia.org/wiki/Social_media" --user_agent "my-unique-user-agent"
 https://web.archive.org/web/20200719062108/https://en.wikipedia.org/wiki/Social_media
+
+$ waybackpy --oldest --url "https://en.wikipedia.org/wiki/Humanoid" --user_agent "my-unique-user-agent"
+https://web.archive.org/web/20040415020811/http://en.wikipedia.org:80/wiki/Humanoid
+
+$ waybackpy --newest --url "https://en.wikipedia.org/wiki/Remote_sensing" --user_agent "my-unique-user-agent"
+https://web.archive.org/web/20201221130522/https://en.wikipedia.org/wiki/Remote_sensing
+
+$ waybackpy --total --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent"
+1904
+
+$ waybackpy --known_urls --url akamhy.github.io --user_agent "my-unique-user-agent" --file
+https://akamhy.github.io
+https://akamhy.github.io/assets/js/scale.fix.js
+https://akamhy.github.io/favicon.ico
+https://akamhy.github.io/robots.txt
+https://akamhy.github.io/waybackpy/
+
+'akamhy.github.io-urls-iftor2.txt' saved in current working directory
 ```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyBashSave></sub>
-
-#### Oldest archive
-```bash
-$ waybackpy --url "https://en.wikipedia.org/wiki/SpaceX" --user_agent "my-unique-user-agent" --oldest
-https://web.archive.org/web/20040803000845/http://en.wikipedia.org:80/wiki/SpaceX
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyBashOldest></sub>
-
-#### Newest archive
-```bash
-$ waybackpy --url "https://en.wikipedia.org/wiki/YouTube" --user_agent "my-unique-user-agent" --newest
-https://web.archive.org/web/20200606044708/https://en.wikipedia.org/wiki/YouTube
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyBashNewest></sub>
-
-#### Total number of archives
-```bash
-$ waybackpy --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent" --total
-853
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyBashTotal></sub>
-
-#### Archive near time
-```bash
-$ waybackpy --url facebook.com --user_agent "my-unique-user-agent" --near --year 2012 --month 5 --day 12
-https://web.archive.org/web/20120512142515/https://www.facebook.com/
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyBashNear></sub>
-
-#### Get the source code
-```bash
-$ waybackpy --url google.com --user_agent "my-unique-user-agent" --get url # Prints the source code of the url
-$ waybackpy --url google.com --user_agent "my-unique-user-agent" --get oldest # Prints the source code of the oldest archive
-$ waybackpy --url google.com --user_agent "my-unique-user-agent" --get newest # Prints the source code of the newest archive
-$ waybackpy --url google.com --user_agent "my-unique-user-agent" --get save # Save a new archive on wayback machine then print the source code of this archive.
-```
-<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyBashGet></sub>
-
-## Tests
-* [Here](https://github.com/akamhy/waybackpy/tree/master/tests)
-
-
-## Dependency
-* None, just python standard libraries (re, json, urllib, argparse and datetime). Both python 2 and 3 are supported :)
-
+> Full CLI documentation can be found at <https://github.com/akamhy/waybackpy/wiki/CLI-docs>.

 ## License
-[MIT License](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
+
+Released under the MIT License. See
+[license](https://github.com/akamhy/waybackpy/blob/master/LICENSE) for details.
+
+
+-----------------------------------------------------------------------------------------------------------------------------------------------
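Every snapshot URL in the README examples follows the layout `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`. The stdlib sketch below reads that 14-digit capture time back out of a URL, and builds the kind of partial timestamp that a call such as `wayback.near(year=2010, month=2)` narrows down to. The Wayback Availability API documents its `timestamp` parameter as 1 to 14 digits of `YYYYMMDDhhmmss`, so a shorter string is still a valid query. `snapshot_datetime` and `partial_timestamp` are hypothetical helpers, not how waybackpy implements `timestamp` or `near()`.

```python
import re
from datetime import datetime
from typing import Optional

# Snapshot URLs carry a 14-digit capture time (YYYYMMDDhhmmss) after "/web/".
_SNAPSHOT_TS = re.compile(r"/web/(\d{14})/")


def snapshot_datetime(archive_url: str) -> datetime:
    """Return the capture time encoded in a Wayback snapshot URL."""
    match = _SNAPSHOT_TS.search(archive_url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp in {archive_url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def partial_timestamp(year: int, month: Optional[int] = None,
                      day: Optional[int] = None, hour: Optional[int] = None,
                      minute: Optional[int] = None) -> str:
    """Join the given fields into a zero-padded YYYY[MM[DD[hh[mm]]]] string.

    Fields are consumed left to right and stop at the first one left unset.
    """
    digits = f"{year:04d}"
    for value in (month, day, hour, minute):
        if value is None:
            break
        digits += f"{value:02d}"
    return digits
```

For example, `snapshot_datetime` applied to the `oldest()` URL from the README returns `datetime(2005, 4, 22, 13, 1, 29)`, and `partial_timestamp(2010, 2)` returns `"201002"`.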
--- a/_config.yml
+++ b/_config.yml
@@ -1 +1 @@
-theme: jekyll-theme-cayman
+theme: jekyll-theme-cayman
--- a/assets/waybackpy_logo.svg
+++ b/assets/waybackpy_logo.svg
@@ -0,0 +1 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 176.612 41.908" height="158.392" width="667.51"  xmlns:v="https://github.com/akamhy/waybackpy"><text transform="matrix(.862888 0 0 1.158899 -.748 -98.312)" y="110.937" x="0.931" xml:space="preserve" font-weight="bold" font-size="28.149" font-family="sans-serif" letter-spacing="0" word-spacing="0" writing-mode="lr-tb" fill="#003dff"><tspan y="110.937" x="0.931"><tspan y="110.937" x="0.931" letter-spacing="3.568" writing-mode="lr-tb">waybackpy</tspan></tspan></text><path d="M.749 0h153.787v4.864H.749zm22.076 37.418h153.787v4.49H22.825z" fill="navy"/><path d="M0 37.418h22.825v4.49H0zM154.536 0h21.702v4.864h-21.702z" fill="#f0f"/></svg>
--- a/index.rst
+++ b/index.rst
@@ -1,385 +0,0 @@
-waybackpy
-=========
-
-|Build Status| |Downloads| |Release| |Codacy Badge| |License: MIT|
-|Maintainability| |CodeFactor| |made-with-python| |pypi| |PyPI - Python
-Version| |Maintenance| |codecov| |image12| |contributions welcome|
-
-|Internet Archive| |Wayback Machine|
-
-Waybackpy is a Python library that interfaces with the `Internet
-Archive <https://en.wikipedia.org/wiki/Internet_Archive>`__'s `Wayback
-Machine <https://en.wikipedia.org/wiki/Wayback_Machine>`__ API. Archive
-pages and retrieve archived pages easily.
-
-Table of contents
-=================
-
-.. raw:: html
-
-   <!--ts-->
-
-  `Installation <#installation>`__
-
-  `Usage <#usage>`__
-  `As a python package <#as-a-python-package>`__
-
-   -  `Saving an url using
-      save() <#capturing-aka-saving-an-url-using-save>`__
-   -  `Receiving the oldest archive for an URL Using
-      oldest() <#receiving-the-oldest-archive-for-an-url-using-oldest>`__
-   -  `Receiving the recent most/newest archive for an URL using
-      newest() <#receiving-the-newest-archive-for-an-url-using-newest>`__
-   -  `Receiving archive close to a specified year, month, day, hour,
-      and minute using
-      near() <#receiving-archive-close-to-a-specified-year-month-day-hour-and-minute-using-near>`__
-   -  `Get the content of webpage using
-      get() <#get-the-content-of-webpage-using-get>`__
-   -  `Count total archives for an URL using
-      total\_archives() <#count-total-archives-for-an-url-using-total_archives>`__
-
-  `With CLI <#with-the-cli>`__
-
-   -  `Save <#save>`__
-   -  `Oldest archive <#oldest-archive>`__
-   -  `Newest archive <#newest-archive>`__
-   -  `Total archives <#total-number-of-archives>`__
-   -  `Archive near a time <#archive-near-time>`__
-   -  `Get the source code <#get-the-source-code>`__
-
-  `Tests <#tests>`__
-
-  `Dependency <#dependency>`__
-
-  `License <#license>`__
-
-.. raw:: html
-
-   <!--te-->
-
-Installation
------------
-
-Using `pip <https://en.wikipedia.org/wiki/Pip_(package_manager)>`__:
-
-.. code:: bash
-
-    pip install waybackpy
-
-or direct from this repository using git.
-
-.. code:: bash
-
-    pip install git+https://github.com/akamhy/waybackpy.git
-
-Usage
-----
-
-As a python package
-~~~~~~~~~~~~~~~~~~~
-
-Capturing aka Saving an url using save()
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code:: python
-
-    import waybackpy
-
-    new_archive_url = waybackpy.Url(
-
-        url = "https://en.wikipedia.org/wiki/Multivariable_calculus",
-        user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
-        
-    ).save()
-
-    print(new_archive_url)
-
-.. code:: bash
-
-    https://web.archive.org/web/20200504141153/https://github.com/akamhy/waybackpy
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPySaveExample\ 
-
-Receiving the oldest archive for an URL using oldest()
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code:: python
-
-    import waybackpy
-
-    oldest_archive_url = waybackpy.Url(
-
-        "https://www.google.com/",
-        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:40.0) Gecko/20100101 Firefox/40.0"
-        
-    ).oldest()
-
-    print(oldest_archive_url)
-
-.. code:: bash
-
-    http://web.archive.org/web/19981111184551/http://google.com:80/
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyOldestExample\ 
-
-Receiving the newest archive for an URL using newest()
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code:: python
-
-    import waybackpy
-
-    newest_archive_url = waybackpy.Url(
-
-        "https://www.facebook.com/",
-        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:39.0) Gecko/20100101 Firefox/39.0"
-        
-    ).newest()
-
-    print(newest_archive_url)
-
-.. code:: bash
-
-    https://web.archive.org/web/20200714013225/https://www.facebook.com/
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyNewestExample\ 
-
-Receiving archive close to a specified year, month, day, hour, and minute using near()
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code:: python
-
-    from waybackpy import Url
-
-    user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Firefox/38.0"
-    github_url = "https://github.com/"
-
-
-    github_wayback_obj = Url(github_url, user_agent)
-
-    # Do not pad (don't use zeros in the month, year, day, minute, and hour arguments). e.g. For January, set month = 1 and not month = 01.
-
-.. code:: python
-
-    github_archive_near_2010 = github_wayback_obj.near(year=2010)
-    print(github_archive_near_2010)
-
-.. code:: bash
-
-    https://web.archive.org/web/20100719134402/http://github.com/
-
-.. code:: python
-
-    github_archive_near_2011_may = github_wayback_obj.near(year=2011, month=5)
-    print(github_archive_near_2011_may)
-
-.. code:: bash
-
-    https://web.archive.org/web/20110519185447/https://github.com/
-
-.. code:: python
-
-    github_archive_near_2015_january_26 = github_wayback_obj.near(
-        year=2015, month=1, day=26
-    )
-    print(github_archive_near_2015_january_26)
-
-.. code:: bash
-
-    https://web.archive.org/web/20150127031159/https://github.com
-
-.. code:: python
-
-    github_archive_near_2018_4_july_9_2_am = github_wayback_obj.near(
-        year=2018, month=7, day=4, hour = 9, minute = 2
-    )
-    print(github_archive_near_2018_4_july_9_2_am)
-
-.. code:: bash
-
-    https://web.archive.org/web/20180704090245/https://github.com/
-
-The library doesn't supports seconds yet. You are encourged to create a
-PR ;)
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyNearExample\ 
-
-Get the content of webpage using get()
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code:: python
-
-    import waybackpy
-
-    google_url = "https://www.google.com/"
-
-    User_Agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
-
-    waybackpy_url_object = waybackpy.Url(google_url, User_Agent)
-
-
-    # If no argument is passed in get(), it gets the source of the Url used to create the object.
-    current_google_url_source = waybackpy_url_object.get()
-    print(current_google_url_source)
-
-
-    # The following chunk of code will force a new archive of google.com and get the source of the archived page.
-    # waybackpy_url_object.save() type is string.
-    google_newest_archive_source = waybackpy_url_object.get(
-        waybackpy_url_object.save()
-    )
-    print(google_newest_archive_source)
-
-
-    # waybackpy_url_object.oldest() type is str, it's oldest archive of google.com
-    google_oldest_archive_source = waybackpy_url_object.get(
-        waybackpy_url_object.oldest()
-    )
-    print(google_oldest_archive_source)
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyGetExample#main.py\ 
-
-Count total archives for an URL using total\_archives()
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code:: python
-
-    import waybackpy
-
-    URL = "https://en.wikipedia.org/wiki/Python (programming language)"
-
-    UA = "Mozilla/5.0 (iPad; CPU OS 8_1_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B435 Safari/600.1.4"
-
-    archive_count = waybackpy.Url(
-        url=URL,
-        user_agent=UA
-    ).total_archives()
-
-    print(archive_count) # total_archives() returns an int
-
-.. code:: bash
-
-    2440
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyTotalArchivesExample\ 
-
-With the CLI
-~~~~~~~~~~~~
-
-Save
-^^^^
-
-.. code:: bash
-
-    $ waybackpy --url "https://en.wikipedia.org/wiki/Social_media" --user_agent "my-unique-user-agent" --save
-    https://web.archive.org/web/20200719062108/https://en.wikipedia.org/wiki/Social_media
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyBashSave\ 
-
-Oldest archive
-^^^^^^^^^^^^^^
-
-.. code:: bash
-
-    $ waybackpy --url "https://en.wikipedia.org/wiki/SpaceX" --user_agent "my-unique-user-agent" --oldest
-    https://web.archive.org/web/20040803000845/http://en.wikipedia.org:80/wiki/SpaceX
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyBashOldest\ 
-
-Newest archive
-^^^^^^^^^^^^^^
-
-.. code:: bash
-
-    $ waybackpy --url "https://en.wikipedia.org/wiki/YouTube" --user_agent "my-unique-user-agent" --newest
-    https://web.archive.org/web/20200606044708/https://en.wikipedia.org/wiki/YouTube
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyBashNewest\ 
-
-Total number of archives
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. code:: bash
-
-    $ waybackpy --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent" --total
-    853
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyBashTotal\ 
-
-Archive near time
-^^^^^^^^^^^^^^^^^
-
-.. code:: bash
-
-    $ waybackpy --url facebook.com --user_agent "my-unique-user-agent" --near --year 2012 --month 5 --day 12
-    https://web.archive.org/web/20120512142515/https://www.facebook.com/
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyBashNear\ 
-
-Get the source code
-^^^^^^^^^^^^^^^^^^^
-
-.. code:: bash
-
-    $ waybackpy --url google.com --user_agent "my-unique-user-agent" --get url # Prints the source code of the url
-    $ waybackpy --url google.com --user_agent "my-unique-user-agent" --get oldest # Prints the source code of the oldest archive
-    $ waybackpy --url google.com --user_agent "my-unique-user-agent" --get newest # Prints the source code of the newest archive
-    $ waybackpy --url google.com --user_agent "my-unique-user-agent" --get save # Save a new archive on wayback machine then print the source code of this archive.
-
-Try this out in your browser @
-https://repl.it/@akamhy/WaybackPyBashGet\ 
-
-Tests
-----
-
-  `Here <https://github.com/akamhy/waybackpy/tree/master/tests>`__
-
-Dependency
----------
-
-  None, just python standard libraries (re, json, urllib, argparse and datetime).
-   Both python 2 and 3 are supported :)
-
-License
-------
-
-`MIT
-License <https://github.com/akamhy/waybackpy/blob/master/LICENSE>`__
-
-.. |Build Status| image:: https://img.shields.io/travis/akamhy/waybackpy.svg?label=Travis%20CI&logo=travis&style=flat-square
-   :target: https://travis-ci.org/akamhy/waybackpy
-.. |Downloads| image:: https://img.shields.io/pypi/dm/waybackpy.svg
-   :target: https://pypistats.org/packages/waybackpy
-.. |Release| image:: https://img.shields.io/github/v/release/akamhy/waybackpy.svg
-   :target: https://github.com/akamhy/waybackpy/releases
-.. |Codacy Badge| image:: https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65
-   :target: https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&utm_medium=referral&utm_content=akamhy/waybackpy&utm_campaign=Badge_Grade
-.. |License: MIT| image:: https://img.shields.io/badge/License-MIT-yellow.svg
-   :target: https://github.com/akamhy/waybackpy/blob/master/LICENSE
-.. |Maintainability| image:: https://api.codeclimate.com/v1/badges/942f13d8177a56c1c906/maintainability
-   :target: https://codeclimate.com/github/akamhy/waybackpy/maintainability
-.. |CodeFactor| image:: https://www.codefactor.io/repository/github/akamhy/waybackpy/badge
-   :target: https://www.codefactor.io/repository/github/akamhy/waybackpy
-.. |made-with-python| image:: https://img.shields.io/badge/Made%20with-Python-1f425f.svg
-   :target: https://www.python.org/
-.. |pypi| image:: https://img.shields.io/pypi/v/waybackpy.svg
-.. |PyPI - Python Version| image:: https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square
-.. |Maintenance| image:: https://img.shields.io/badge/Maintained%3F-yes-green.svg
-   :target: https://github.com/akamhy/waybackpy/graphs/commit-activity
-.. |codecov| image:: https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg
-   :target: https://codecov.io/gh/akamhy/waybackpy
-.. |image12| image:: https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square
-.. |contributions welcome| image:: https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square
-.. |Internet Archive| image:: https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Internet_Archive_logo_and_wordmark.svg/84px-Internet_Archive_logo_and_wordmark.svg.png
-.. |Wayback Machine| image:: https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Wayback_Machine_logo_2010.svg/284px-Wayback_Machine_logo_2010.svg.png
--- a/requirements.txt
+++ b/requirements.txt
@ -0,0 +1 @@
+requests>=2.24.0
--- a/setup.py
+++ b/setup.py
@ -1,54 +1,54 @@
 import os.path
 from setuptools import setup

-with open(os.path.join(os.path.dirname(__file__), 'README.md')) as f:
+with open(os.path.join(os.path.dirname(__file__), "README.md")) as f:
    long_description = f.read()

 about = {}
-with open(os.path.join(os.path.dirname(__file__), 'waybackpy', '__version__.py')) as f:
+with open(os.path.join(os.path.dirname(__file__), "waybackpy", "__version__.py")) as f:
    exec(f.read(), about)
-    
+
 setup(
-    name = about['__title__'],
-    packages = ['waybackpy'],
-    version = about['__version__'],
-    description = about['__description__'],
+    name=about["__title__"],
+    packages=["waybackpy"],
+    version=about["__version__"],
+    description=about["__description__"],
    long_description=long_description,
-    long_description_content_type='text/markdown',
-    license= about['__license__'],
-    author = about['__author__'],
-    author_email = about['__author_email__'],
-    url = about['__url__'],
-    download_url = 'https://github.com/akamhy/waybackpy/archive/2.1.6.tar.gz',
-    keywords = ['wayback', 'archive', 'archive website', 'wayback machine', 'Internet Archive'],
-    install_requires=[],
-    python_requires= ">=2.7",
+    long_description_content_type="text/markdown",
+    license=about["__license__"],
+    author=about["__author__"],
+    author_email=about["__author_email__"],
+    url=about["__url__"],
+    download_url="https://github.com/akamhy/waybackpy/archive/2.4.4.tar.gz",
+    keywords=[
+        "Archive It",
+        "Archive Website",
+        "Wayback Machine",
+        "waybackurls",
+        "Internet Archive",
+    ],
+    install_requires=["requests"],
+    python_requires=">=3.4",
    classifiers=[
-        'Development Status :: 5 - Production/Stable',
-        'Intended Audience :: Developers',
-        'Natural Language :: English',
-        'Topic :: Software Development :: Build Tools',
-        'License :: OSI Approved :: MIT License',
-        'Programming Language :: Python',
-        'Programming Language :: Python :: 2',
-        'Programming Language :: Python :: 2.7',
-        'Programming Language :: Python :: 3',
-        'Programming Language :: Python :: 3.2',
-        'Programming Language :: Python :: 3.3',
-        'Programming Language :: Python :: 3.4',
-        'Programming Language :: Python :: 3.5',
-        'Programming Language :: Python :: 3.6',
-        'Programming Language :: Python :: 3.7',
-        'Programming Language :: Python :: 3.8',
-        'Programming Language :: Python :: Implementation :: CPython',
-        ],
-    entry_points={
-        'console_scripts': [
-            'waybackpy = waybackpy.cli:main'
-        ]
-    },
+        "Development Status :: 5 - Production/Stable",
+        "Intended Audience :: Developers",
+        "Natural Language :: English",
+        "Topic :: Software Development :: Build Tools",
+        "License :: OSI Approved :: MIT License",
+        "Programming Language :: Python",
+        "Programming Language :: Python :: 3",
+        "Programming Language :: Python :: 3.4",
+        "Programming Language :: Python :: 3.5",
+        "Programming Language :: Python :: 3.6",
+        "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
+        "Programming Language :: Python :: Implementation :: CPython",
+    ],
+    entry_points={"console_scripts": ["waybackpy = waybackpy.cli:main"]},
    project_urls={
-        'Documentation': 'https://waybackpy.readthedocs.io',
-        'Source': 'https://github.com/akamhy/waybackpy',
+        "Documentation": "https://github.com/akamhy/waybackpy/wiki",
+        "Source": "https://github.com/akamhy/waybackpy",
+        "Tracker": "https://github.com/akamhy/waybackpy/issues",
    },
 )
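setup.py keeps the version string in one place by exec-ing `waybackpy/__version__.py` into a plain dict and reading keys such as `__title__` and `__version__` from it. The sketch below shows that pattern with the module source inlined as a string. The dunder names mirror the keys setup.py reads, but the values are placeholders, not waybackpy's real metadata, and `load_about` is a hypothetical helper name.

```python
# Sketch of the single-source-version pattern used by setup.py:
# exec() a small metadata module into an empty dict, then read keys from it.
# The values below are placeholders, not waybackpy's actual metadata.
VERSION_MODULE_SOURCE = '''
__title__ = "example-package"
__version__ = "0.0.1"
__license__ = "MIT"
'''


def load_about(source):
    """Execute metadata source and return its module-level names as a dict."""
    about = {}
    exec(source, about)  # fills `about` with the assignments in `source`
    # exec() also injects a __builtins__ entry; drop it so only metadata remains.
    about.pop("__builtins__", None)
    return about


about = load_about(VERSION_MODULE_SOURCE)
print(about["__title__"], about["__version__"])
```

In setup.py itself the source is read from the file on disk instead of being inlined, and the dict is not cleaned, since only the named keys are ever looked up.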
--- a/tests/init.py
+++ b/tests/init.py
--- a/tests/test_cdx.py
+++ b/tests/test_cdx.py
@ -0,0 +1,93 @@
+import pytest
+from waybackpy.cdx import Cdx
+from waybackpy.exceptions import WaybackError
+
+
+def test_all_cdx():
+    url = "akamhy.github.io"
+    user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, \
+    like Gecko) Chrome/45.0.2454.85 Safari/537.36"
+    cdx = Cdx(
+        url=url,
+        user_agent=user_agent,
+        start_timestamp=2017,
+        end_timestamp=2020,
+        filters=[
+            "statuscode:200",
+            "mimetype:text/html",
+            "timestamp:20201002182319",
+            "original:https://akamhy.github.io/",
+        ],
+        gzip=False,
+        collapses=["timestamp:10", "digest"],
+        limit=50,
+        match_type="prefix",
+    )
+    snapshots = cdx.snapshots()
+    for snapshot in snapshots:
+        ans = snapshot.archive_url
+    assert "https://web.archive.org/web/20201002182319/https://akamhy.github.io/" == ans
+
+    url = "akahfjgjkmhy.gihthub.ip"
+    cdx = Cdx(
+        url=url,
+        user_agent=user_agent,
+        start_timestamp=None,
+        end_timestamp=None,
+        filters=[],
+        match_type=None,
+        gzip=True,
+        collapses=[],
+        limit=10,
+    )
+
+    snapshots = cdx.snapshots()
+    print(snapshots)
+    i = 0
+    for _ in snapshots:
+        i += 1
+    assert i == 0
+
+    url = "https://github.com/akamhy/waybackpy/*"
+    cdx = Cdx(url=url, user_agent=user_agent, limit=50)
+    snapshots = cdx.snapshots()
+
+    for snapshot in snapshots:
+        print(snapshot.archive_url)
+
+    url = "https://github.com/akamhy/waybackpy"
+    with pytest.raises(WaybackError):
+        cdx = Cdx(url=url, user_agent=user_agent, limit=50, filters=["ghddhfhj"])
+        snapshots = cdx.snapshots()
+
+    with pytest.raises(WaybackError):
+        cdx = Cdx(url=url, user_agent=user_agent, collapses=["timestamp", "ghdd:hfhj"])
+        snapshots = cdx.snapshots()
+
+    url = "https://github.com"
+    cdx = Cdx(url=url, user_agent=user_agent, limit=50)
+    snapshots = cdx.snapshots()
+    c = 0
+    for snapshot in snapshots:
+        c += 1
+        if c > 100:
+            break
+
+    url = "https://github.com/*"
+    cdx = Cdx(url=url, user_agent=user_agent, collapses=["timestamp"])
+    snapshots = cdx.snapshots()
+    c = 0
+    for snapshot in snapshots:
+        c += 1
+        if c > 30529:  # default limit is 10k
+            break
+
+    url = "https://github.com/*"
+    cdx = Cdx(url=url, user_agent=user_agent)
+    c = 0
+    snapshots = cdx.snapshots()
+
+    for snapshot in snapshots:
+        c += 1
+        if c > 100529:
+            break
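The `Cdx` arguments exercised in `test_all_cdx` correspond to query parameters of the Wayback Machine's public CDX server API. The following is a rough sketch of how such a query URL could be assembled using only the standard library. `build_cdx_query` is a hypothetical name and this is not waybackpy's internal code. The parameter names (`url`, `from`, `to`, `filter`, `collapse`, `limit`, `matchType`) follow the CDX server documentation. The `gzip` flag is deliberately left out.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start_timestamp=None, end_timestamp=None,
                    filters=(), collapses=(), limit=None, match_type=None):
    """Return a CDX server query URL (hypothetical helper, not waybackpy's API)."""
    params = [("url", url)]
    if start_timestamp is not None:
        params.append(("from", str(start_timestamp)))
    if end_timestamp is not None:
        params.append(("to", str(end_timestamp)))
    # Each filter / collapse is sent as its own repeated query parameter.
    params.extend(("filter", f) for f in filters)
    params.extend(("collapse", c) for c in collapses)
    if limit is not None:
        params.append(("limit", str(limit)))
    if match_type is not None:
        params.append(("matchType", match_type))
    # urlencode percent-encodes ':' and '/' inside values, e.g. statuscode%3A200.
    return CDX_ENDPOINT + "?" + urlencode(params)


query = build_cdx_query(
    "akamhy.github.io",
    start_timestamp=2017,
    end_timestamp=2020,
    filters=["statuscode:200", "mimetype:text/html"],
    collapses=["timestamp:10", "digest"],
    limit=50,
    match_type="prefix",
)
print(query)
```

Passing a list of pairs to `urlencode`, rather than a dict, is what allows `filter` and `collapse` to repeat while keeping the parameter order stable.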
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@ -1,97 +1,359 @@
-# -*- coding: utf-8 -*-
 import sys
 import os
 import pytest
+import random
+import string
 import argparse

-sys.path.append("..")
-import waybackpy.cli as cli  # noqa: E402
-from waybackpy.wrapper import  Url  # noqa: E402
+import waybackpy.cli as cli
+from waybackpy.wrapper import Url  # noqa: E402
 from waybackpy.__version__ import __version__

-codecov_python = False
-if sys.version_info > (3, 7):
-    codecov_python = True

-# Namespace(day=None, get=None, hour=None, minute=None, month=None, near=False,
-# newest=False, oldest=False, save=False, total=False, url=None, user_agent=None, version=False, year=None)
+def test_save():
+
+    args = argparse.Namespace(
+        user_agent=None,
+        url="https://hfjfjfjfyu6r6rfjvj.fjhgjhfjgvjm",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=True,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
+    reply = cli.args_handler(args)
+    assert "could happen because either your waybackpy" in str(reply) or "cannot be archived by wayback machine as it is a redirect" in str(reply)
+
+
+def test_json():
+    args = argparse.Namespace(
+        user_agent=None,
+        url="https://pypi.org/user/akamhy/",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=True,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
+    reply = cli.args_handler(args)
+    assert "archived_snapshots" in str(reply)
+
+
+def test_archive_url():
+    args = argparse.Namespace(
+        user_agent=None,
+        url="https://pypi.org/user/akamhy/",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=True,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
+    reply = cli.args_handler(args)
+    assert "https://web.archive.org/web/" in str(reply)

-if codecov_python:
-    def test_save():
-        args = argparse.Namespace(user_agent=None, url="https://pypi.org/user/akamhy/", total=False, version=False,
-        oldest=False, save=True, newest=False, near=False, get=None)
-        reply = cli.args_handler(args)
-        assert "pypi.org/user/akamhy" in reply

 def test_oldest():
-    args = argparse.Namespace(user_agent=None, url="https://pypi.org/user/akamhy/", total=False, version=False,
-    oldest=True, save=False, newest=False, near=False, get=None)
+    args = argparse.Namespace(
+        user_agent=None,
+        url="https://pypi.org/user/akamhy/",
+        total=False,
+        version=False,
+        file=False,
+        oldest=True,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
    reply = cli.args_handler(args)
-    assert "pypi.org/user/akamhy" in reply
+    assert "pypi.org/user/akamhy" in str(reply)
+
+    uid = "".join(
+        random.choice(string.ascii_lowercase + string.digits) for _ in range(6)
+    )
+    url = "https://pypi.org/yfvjvycyc667r67ed67r" + uid
+    args = argparse.Namespace(
+        user_agent=None,
+        url=url,
+        total=False,
+        version=False,
+        file=False,
+        oldest=True,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
+    reply = cli.args_handler(args)
+    assert "Can not find archive for" in str(reply)
+

 def test_newest():
-    args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=False, version=False,
-    oldest=False, save=False, newest=True, near=False, get=None)
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://pypi.org/user/akamhy/",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=True,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
    reply = cli.args_handler(args)
-    assert "pypi.org/user/akamhy" in reply
+    assert "pypi.org/user/akamhy" in str(reply)
+
+    uid = "".join(
+        random.choice(string.ascii_lowercase + string.digits) for _ in range(6)
+    )
+    url = "https://pypi.org/yfvjvycyc667r67ed67r" + uid
+    args = argparse.Namespace(
+        user_agent=None,
+        url=url,
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=True,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
+    reply = cli.args_handler(args)
+    assert "Can not find archive for" in str(reply)
+

 def test_total_archives():
-    args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=True, version=False,
-    oldest=False, save=False, newest=False, near=False, get=None)
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://pypi.org/user/akamhy/",
+        total=True,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+    )
    reply = cli.args_handler(args)
    assert isinstance(reply, int)

-def test_near():
-    args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=False, version=False,
-    oldest=False, save=False, newest=False, near=True, get=None, year=2020, month=7, day=15, hour=1, minute=1)
+
+def test_known_urls():
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://www.keybr.com",
+        total=False,
+        version=False,
+        file=True,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=True,
+        get=None,
+    )
    reply = cli.args_handler(args)
-    assert "202007" in reply
+    assert "keybr" in str(reply)
+
+
+def test_near():
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://pypi.org/user/akamhy/",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=True,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+        year=2020,
+        month=7,
+        day=15,
+        hour=1,
+        minute=1,
+    )
+    reply = cli.args_handler(args)
+    assert "202007" in str(reply)
+
+    uid = "".join(
+        random.choice(string.ascii_lowercase + string.digits) for _ in range(6)
+    )
+    url = "https://pypi.org/yfvjvycyc667r67ed67r" + uid
+    args = argparse.Namespace(
+        user_agent=None,
+        url=url,
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=True,
+        subdomain=False,
+        known_urls=False,
+        get=None,
+        year=2020,
+        month=7,
+        day=15,
+        hour=1,
+        minute=1,
+    )
+    reply = cli.args_handler(args)
+    assert "Can not find archive for" in str(reply)


 def test_get():
-    args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=False, version=False,
-    oldest=False, save=False, newest=False, near=False, get="url")
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://github.com/akamhy",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get="url",
+    )
    reply = cli.args_handler(args)
-    assert "waybackpy" in reply
+    assert "waybackpy" in str(reply)

-    args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=False, version=False,
-    oldest=False, save=False, newest=False, near=False, get="oldest")
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://github.com/akamhy/waybackpy",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get="oldest",
+    )
    reply = cli.args_handler(args)
-    assert "waybackpy" in reply
+    assert "waybackpy" in str(reply)

-    args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=False, version=False,
-    oldest=False, save=False, newest=False, near=False, get="newest")
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://akamhy.github.io/waybackpy/",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get="newest",
+    )
    reply = cli.args_handler(args)
-    assert "waybackpy" in reply
+    assert "waybackpy" in str(reply)

-    if codecov_python:
-        args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-        (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=False, version=False,
-        oldest=False, save=False, newest=False, near=False, get="save")
-        reply = cli.args_handler(args)
-        assert "waybackpy" in reply
-
-    args = argparse.Namespace(user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
-    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9", url="https://pypi.org/user/akamhy/", total=False, version=False,
-    oldest=False, save=False, newest=False, near=False, get="BullShit")
+    args = argparse.Namespace(
+        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
+    (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
+        url="https://pypi.org/user/akamhy/",
+        total=False,
+        version=False,
+        file=False,
+        oldest=False,
+        save=False,
+        json=False,
+        archive_url=False,
+        newest=False,
+        near=False,
+        subdomain=False,
+        known_urls=False,
+        get="foobar",
+    )
    reply = cli.args_handler(args)
-    assert "get the source code of the" in reply
+    assert "get the source code of the" in str(reply)
+

 def test_args_handler():
    args = argparse.Namespace(version=True)
    reply = cli.args_handler(args)
-    assert __version__ == reply
+    assert ("waybackpy version %s" % (__version__)) == reply

    args = argparse.Namespace(url=None, version=False)
    reply = cli.args_handler(args)
-    assert "Specify an URL" in reply
+    assert ("waybackpy %s" % (__version__)) in str(reply)
+

 def test_main():
    # This also tests the parse_args method in cli.py
-    cli.main(['temp.py', '--version'])
+    cli.main(["temp.py", "--version"])
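Every test in test_cli.py spells out all fourteen `argparse.Namespace` fields, even though only one or two differ per case. A hypothetical `make_args` helper (not part of the test suite) could hold the defaults these tests share and accept overrides as keyword arguments. Extra keys, such as the `year`/`month`/`day` fields used by `test_near`, would pass through unchanged.

```python
import argparse

# Defaults copied from the Namespace literals in test_cli.py.
CLI_DEFAULTS = {
    "user_agent": None,
    "url": None,
    "total": False,
    "version": False,
    "file": False,
    "oldest": False,
    "save": False,
    "json": False,
    "archive_url": False,
    "newest": False,
    "near": False,
    "subdomain": False,
    "known_urls": False,
    "get": None,
}


def make_args(**overrides):
    """Build an argparse.Namespace from CLI_DEFAULTS plus per-test overrides."""
    values = dict(CLI_DEFAULTS)
    values.update(overrides)  # overrides win; unknown keys are simply added
    return argparse.Namespace(**values)


args = make_args(url="https://pypi.org/user/akamhy/", oldest=True)
print(args.oldest, args.newest, args.get)
```

With such a helper, `test_oldest` could reduce to `cli.args_handler(make_args(url="https://pypi.org/user/akamhy/", oldest=True))`, and a newly added CLI flag would only need one new default entry.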
--- a/tests/test_snapshot.py
+++ b/tests/test_snapshot.py
@ -0,0 +1,40 @@
+import pytest
+
+from waybackpy.snapshot import CdxSnapshot, datetime
+
+
+def test_CdxSnapshot():
+    sample_input = "org,archive)/ 20080126045828 http://github.com text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415"
+    prop_values = sample_input.split(" ")
+    properties = {}
+    (
+        properties["urlkey"],
+        properties["timestamp"],
+        properties["original"],
+        properties["mimetype"],
+        properties["statuscode"],
+        properties["digest"],
+        properties["length"],
+    ) = prop_values
+
+    snapshot = CdxSnapshot(properties)
+
+    assert properties["urlkey"] == snapshot.urlkey
+    assert properties["timestamp"] == snapshot.timestamp
+    assert properties["original"] == snapshot.original
+    assert properties["mimetype"] == snapshot.mimetype
+    assert properties["statuscode"] == snapshot.statuscode
+    assert properties["digest"] == snapshot.digest
+    assert properties["length"] == snapshot.length
+    assert (
+        datetime.strptime(properties["timestamp"], "%Y%m%d%H%M%S")
+        == snapshot.datetime_timestamp
+    )
+    archive_url = (
+        "https://web.archive.org/web/"
+        + properties["timestamp"]
+        + "/"
+        + properties["original"]
+    )
+    assert archive_url == snapshot.archive_url
+    assert sample_input == str(snapshot)
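`test_CdxSnapshot` checks the seven space-separated fields of a CDX line, the parsed datetime, and the derived playback URL. The sketch below parses the same sample line into a namedtuple and reproduces those values. `CdxRecord`, `parse_cdx_line`, and `archive_url` are hypothetical names used for illustration. They are not waybackpy's `CdxSnapshot`.

```python
from collections import namedtuple
from datetime import datetime

# Field order as unpacked in test_CdxSnapshot.
CdxRecord = namedtuple(
    "CdxRecord",
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
)


def parse_cdx_line(line):
    """Split one CDX line into its seven fields (hypothetical helper)."""
    return CdxRecord(*line.split(" "))


def archive_url(record):
    """Build the playback URL from the 14-digit timestamp and the original URL."""
    return "https://web.archive.org/web/" + record.timestamp + "/" + record.original


sample = (
    "org,archive)/ 20080126045828 http://github.com text/html 200 "
    "Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415"
)
record = parse_cdx_line(sample)
when = datetime.strptime(record.timestamp, "%Y%m%d%H%M%S")
print(when.year, archive_url(record))
```

Because a namedtuple is still a tuple, `" ".join(record)` rebuilds the original line. This mirrors the test's `str(snapshot)` round-trip check.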
--- a/tests/test_utils.py
+++ b/tests/test_utils.py
@ -0,0 +1,186 @@
+import pytest
+import json
+
+from waybackpy.utils import (
+    _cleaned_url,
+    _url_check,
+    _full_url,
+    URLError,
+    WaybackError,
+    _get_total_pages,
+    _archive_url_parser,
+    _wayback_timestamp,
+    _get_response,
+    _check_match_type,
+    _check_collapses,
+    _check_filters,
+    _timestamp_manager,
+)
+
+
+def test_timestamp_manager():
+    timestamp = True
+    data = {}
+    assert _timestamp_manager(timestamp, data)
+
+    data = """
+    {"archived_snapshots": {"closest": {"timestamp": "20210109155628", "available": true, "status": "200", "url": "http://web.archive.org/web/20210109155628/https://www.google.com/"}}, "url": "https://www.google.com/"}
+    """
+    data = json.loads(data)
+    assert data["archived_snapshots"]["closest"]["timestamp"] == "20210109155628"
+
+
+def test_check_filters():
+    filters = []
+    _check_filters(filters)
+
+    filters = ["statuscode:200", "timestamp:20215678901234", "original:https://url.com"]
+    _check_filters(filters)
+
+    with pytest.raises(WaybackError):
+        _check_filters("not-list")
+
+
+def test_check_collapses():
+    collapses = []
+    _check_collapses(collapses)
+
+    collapses = ["timestamp:10"]
+    _check_collapses(collapses)
+
+    collapses = ["urlkey"]
+    _check_collapses(collapses)
+
+    collapses = "urlkey"  # NOT LIST
+    with pytest.raises(WaybackError):
+        _check_collapses(collapses)
+
+    collapses = ["also illegal collapse"]
+    with pytest.raises(WaybackError):
+        _check_collapses(collapses)
+
+
+def test_check_match_type():
+    assert _check_match_type(None, "url") is None
+    match_type = "exact"
+    url = "test_url"
+    assert _check_match_type(match_type, url) is None
+
+    url = "has * in it"
+    with pytest.raises(WaybackError):
+        _check_match_type("domain", url)
+
+    with pytest.raises(WaybackError):
+        _check_match_type("not a valid type", "url")
+
+
+def test_cleaned_url():
+    test_url = " https://en.wikipedia.org/wiki/Network security "
+    answer = "https://en.wikipedia.org/wiki/Network%20security"
+    assert answer == _cleaned_url(test_url)
+
+
+def test_url_check():
+    good_url = "https://akamhy.github.io"
+    assert _url_check(good_url) is None
+
+    bad_url = "https://github-com"
+    with pytest.raises(URLError):
+        _url_check(bad_url)
+
+
+def test_full_url():
+    params = {}
+    endpoint = "https://web.archive.org/cdx/search/cdx"
+    assert endpoint == _full_url(endpoint, params)
+
+    params = {"a": "1"}
+    assert "https://web.archive.org/cdx/search/cdx?a=1" == _full_url(endpoint, params)
+    assert "https://web.archive.org/cdx/search/cdx?a=1" == _full_url(
+        endpoint + "?", params
+    )
+
+    params["b"] = 2
+    assert "https://web.archive.org/cdx/search/cdx?a=1&b=2" == _full_url(
+        endpoint + "?", params
+    )
+
+    params["c"] = "foo bar"
+    assert "https://web.archive.org/cdx/search/cdx?a=1&b=2&c=foo%20bar" == _full_url(
+        endpoint + "?", params
+    )
+
+
+def test_get_total_pages():
+    user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
+    url = "github.com*"
+    assert 212890 <= _get_total_pages(url, user_agent)
+
+    url = "https://zenodo.org/record/4416138"
+    assert 2 >= _get_total_pages(url, user_agent)
+
+
+def test_archive_url_parser():
+    perfect_header = """
+    {'Server': 'nginx/1.15.8', 'Date': 'Sat, 02 Jan 2021 09:40:25 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'X-Archive-Orig-Server': 'nginx', 'X-Archive-Orig-Date': 'Sat, 02 Jan 2021 09:40:09 GMT', 'X-Archive-Orig-Transfer-Encoding': 'chunked', 'X-Archive-Orig-Connection': 'keep-alive', 'X-Archive-Orig-Vary': 'Accept-Encoding', 'X-Archive-Orig-Last-Modified': 'Fri, 01 Jan 2021 12:19:00 GMT', 'X-Archive-Orig-Strict-Transport-Security': 'max-age=31536000, max-age=0;', 'X-Archive-Guessed-Content-Type': 'text/html', 'X-Archive-Guessed-Charset': 'utf-8', 'Memento-Datetime': 'Sat, 02 Jan 2021 09:40:09 GMT', 'Link': '<https://www.scribbr.com/citing-sources/et-al/>; rel="original", <https://web.archive.org/web/timemap/link/https://www.scribbr.com/citing-sources/et-al/>; rel="timemap"; type="application/link-format", <https://web.archive.org/web/https://www.scribbr.com/citing-sources/et-al/>; rel="timegate", <https://web.archive.org/web/20200601082911/https://www.scribbr.com/citing-sources/et-al/>; rel="first memento"; datetime="Mon, 01 Jun 2020 08:29:11 GMT", <https://web.archive.org/web/20201126185327/https://www.scribbr.com/citing-sources/et-al/>; rel="prev memento"; datetime="Thu, 26 Nov 2020 18:53:27 GMT", <https://web.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/>; rel="memento"; datetime="Sat, 02 Jan 2021 09:40:09 GMT", <https://web.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/>; rel="last memento"; datetime="Sat, 02 Jan 2021 09:40:09 GMT"', 'Content-Security-Policy': "default-src 'self' 'unsafe-eval' 'unsafe-inline' data: blob: archive.org web.archive.org analytics.archive.org pragma.archivelab.org", 'X-Archive-Src': 'spn2-20210102092956-wwwb-spn20.us.archive.org-8001.warc.gz', 'Server-Timing': 'captures_list;dur=112.646325, exclusion.robots;dur=0.172010, exclusion.robots.policy;dur=0.158205, RedisCDXSource;dur=2.205932, esindex;dur=0.014647, LoadShardBlock;dur=82.205012, PetaboxLoader3.datanode;dur=70.750239, CDXLines.iter;dur=24.306278, load_resource;dur=26.520179', 'X-App-Server': 'wwwb-app200', 'X-ts': '200', 'X-location': 'All', 'X-Cache-Key': 'httpsweb.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/IN', 'X-RL': '0', 'X-Page-Cache': 'MISS', 'X-Archive-Screenname': '0', 'Content-Encoding': 'gzip'}
+    """
+
+    archive = _archive_url_parser(
+        perfect_header, "https://www.scribbr.com/citing-sources/et-al/"
+    )
+    assert "web.archive.org/web/20210102094009" in archive
+
+    header = """
+    vhgvkjv
+    Content-Location: /web/20201126185327/https://www.scribbr.com/citing-sources/et-al
+    ghvjkbjmmcmhj
+    """
+    archive = _archive_url_parser(
+        header, "https://www.scribbr.com/citing-sources/et-al/"
+    )
+    assert "20201126185327" in archive
+
+    header = """
+    hfjkfjfcjhmghmvjm
+    X-Cache-Key: https://web.archive.org/web/20171128185327/https://www.scribbr.com/citing-sources/et-al/US
+    yfu,u,gikgkikik
+    """
+    archive = _archive_url_parser(
+        header, "https://www.scribbr.com/citing-sources/et-al/"
+    )
+    assert "20171128185327" in archive
+
+    # This header contains no archive URL, so parsing it should raise WaybackError
+    no_archive_header = """
+    {'Server': 'nginx/1.15.8', 'Date': 'Sat, 02 Jan 2021 09:42:45 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache', 'X-App-Server': 'wwwb-app52', 'X-ts': '523', 'X-RL': '0', 'X-Page-Cache': 'MISS', 'X-Archive-Screenname': '0'}
+    """
+
+    with pytest.raises(WaybackError):
+        _archive_url_parser(
+            no_archive_header, "https://www.scribbr.com/citing-sources/et-al/"
+        )
+
+
+def test_wayback_timestamp():
+    ts = _wayback_timestamp(year=2020, month=1, day=2, hour=3, minute=4)
+    assert "202001020304" in str(ts)
+
+
+def test_get_response():
+    endpoint = "https://www.google.com"
+    user_agent = (
+        "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
+    )
+    headers = {"User-Agent": user_agent}
+    response = _get_response(endpoint, params=None, headers=headers)
+    assert response.status_code == 200
+
+    endpoint = "http/wwhfhfvhvjhmom"
+    with pytest.raises(WaybackError):
+        _get_response(endpoint, params=None, headers=headers)
+
+    endpoint = "https://akamhy.github.io"
+    url, response = _get_response(
+        endpoint, params=None, headers=headers, return_full_url=True
+    )
+    assert endpoint == url
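The `test_wayback_timestamp` case above expects the date fields to be packed into the 12-digit `YYYYMMDDhhmm` form used in Wayback Machine URLs. A minimal sketch of that formatting, assuming each field is simply zero-padded to two digits; `wayback_timestamp` is a hypothetical stand-in, not waybackpy's own `_wayback_timestamp`:

```python
def wayback_timestamp(year, month, day, hour, minute):
    """Hypothetical stand-in: pack date fields into a YYYYMMDDhhmm string."""
    # zfill(2) pads single-digit fields ("1" -> "01") and leaves the 4-digit year untouched.
    return "".join(str(value).zfill(2) for value in (year, month, day, hour, minute))


print(wayback_timestamp(year=2020, month=1, day=2, hour=3, minute=4))  # 202001020304
```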
--- a/tests/test_wrapper.py
+++ b/tests/test_wrapper.py
@ -1,194 +1,28 @@
-# -*- coding: utf-8 -*-
-import sys
 import pytest
-import random
-import time

-sys.path.append("..")
-import waybackpy.wrapper as waybackpy  # noqa: E402
+from waybackpy.wrapper import Url

-if sys.version_info >= (3, 0):  # If the python ver >= 3
-    from urllib.request import Request, urlopen
-    from urllib.error import URLError
-else:  # For python2.x
-    from urllib2 import Request, urlopen, URLError

 user_agent = "Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/20.0"


-def test_clean_url():
-    test_url = " https://en.wikipedia.org/wiki/Network security "
-    answer = "https://en.wikipedia.org/wiki/Network_security"
-    target = waybackpy.Url(test_url, user_agent)
-    test_result = target._clean_url()
-    assert answer == test_result
-
-def test_dunders():
-    url = "https://en.wikipedia.org/wiki/Network_security"
-    user_agent = "UA"
-    target = waybackpy.Url(url, user_agent)
-    assert "waybackpy.Url(url=%s, user_agent=%s)" % (url, user_agent) == repr(target)
-    assert len(target) == len(url)
-    assert str(target) == url
-
-def test_archive_url_parser():
-    request_url = "https://amazon.com"
-    hdr = {"User-Agent": user_agent}  # nosec
-    req = Request(request_url, headers=hdr)  # nosec
-    header = waybackpy._get_response(req).headers
-    with pytest.raises(Exception):
-        waybackpy._archive_url_parser(header)
-
 def test_url_check():
+    """Offline test: does not call the Wayback Machine API."""
    broken_url = "http://wwwgooglecom/"
    with pytest.raises(Exception):
-        waybackpy.Url(broken_url, user_agent)
-
-
-def test_save():
-    # Test for urls that exist and can be archived.
-    time.sleep(10)
-
-    url_list = [
-        "en.wikipedia.org",
-        "www.wikidata.org",
-        "commons.wikimedia.org",
-        "www.wiktionary.org",
-        "www.w3schools.com",
-        "www.ibm.com",
-    ]
-    x = random.randint(0, len(url_list) - 1)
-    url1 = url_list[x]
-    target = waybackpy.Url(
-        url1,
-        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 "
-        "(KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36",
-    )
-    archived_url1 = target.save()
-    assert url1 in archived_url1
-
-    if sys.version_info > (3, 6):
-
-        # Test for urls that are incorrect.
-        with pytest.raises(Exception):
-            url2 = "ha ha ha ha"
-            waybackpy.Url(url2, user_agent)
-        time.sleep(5)
-        # Test for urls not allowed to archive by robot.txt.
-        with pytest.raises(Exception):
-            url3 = "http://www.archive.is/faq.html"
-            target = waybackpy.Url(
-                url3,
-                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) "
-                "Gecko/20100101 Firefox/25.0",
-            )
-            target.save()
-
-        time.sleep(5)
-        # Non existent urls, test
-        with pytest.raises(Exception):
-            url4 = (
-                "https://githfgdhshajagjstgeths537agajaajgsagudadhuss87623"
-                "46887adsiugujsdgahub.us"
-            )
-            target = waybackpy.Url(
-                url3,
-                "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) "
-                "AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 "
-                "Safari/533.20.27",
-            )
-            target.save()
-
-    else:
-        pass
+        Url(broken_url, user_agent)


 def test_near():
-    time.sleep(10)
-    url = "google.com"
-    target = waybackpy.Url(
-        url,
-        "Mozilla/5.0 (Windows; U; Windows NT 6.0; de-DE) AppleWebKit/533.20.25 "
-        "(KHTML, like Gecko) Version/5.0.3 Safari/533.19.4",
-    )
-    archive_near_year = target.near(year=2010)
-    assert "2010" in archive_near_year
-
-    if sys.version_info > (3, 6):
-        time.sleep(5)
-        archive_near_month_year = target.near(year=2015, month=2)
-        assert (
-            ("201502" in archive_near_month_year)
-            or ("201501" in archive_near_month_year)
-            or ("201503" in archive_near_month_year)
+    with pytest.raises(Exception):
+        NeverArchivedUrl = (
+            "https://ee_3n.wrihkeipef4edia.org/rwti5r_ki/Nertr6w_rork_rse7c_urity"
        )
-
-        target = waybackpy.Url(
-            "www.python.org",
-            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
-            "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246",
-        )
-        archive_near_hour_day_month_year = target.near(
-            year=2008, month=5, day=9, hour=15
-        )
-        assert (
-            ("2008050915" in archive_near_hour_day_month_year)
-            or ("2008050914" in archive_near_hour_day_month_year)
-            or ("2008050913" in archive_near_hour_day_month_year)
-        )
-
-        with pytest.raises(Exception):
-            NeverArchivedUrl = (
-                "https://ee_3n.wrihkeipef4edia.org/rwti5r_ki/Nertr6w_rork_rse7c_urity"
-            )
-            target = waybackpy.Url(NeverArchivedUrl, user_agent)
-            target.near(year=2010)
-    else:
-        pass
+        target = Url(NeverArchivedUrl, user_agent)
+        target.near(year=2010)


-def test_oldest():
+def test_json():
    url = "github.com/akamhy/waybackpy"
-    target = waybackpy.Url(url, user_agent)
-    assert "20200504141153" in target.oldest()
-
-
-def test_newest():
-    url = "github.com/akamhy/waybackpy"
-    target = waybackpy.Url(url, user_agent)
-    assert url in target.newest()
-
-
-def test_get():
-    target = waybackpy.Url("google.com", user_agent)
-    assert "Welcome to Google" in target.get(target.oldest())
-
-
-
-def test_wayback_timestamp():
-    ts = waybackpy._wayback_timestamp(
-        year=2020, month=1, day=2, hour=3, minute=4
-    )
-    assert "202001020304" in str(ts)
-
-
-def test_get_response():
-    hdr = {
-        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) "
-        "Gecko/20100101 Firefox/78.0"
-    }
-    req = Request("https://www.google.com", headers=hdr)  # nosec
-    response = waybackpy._get_response(req)
-    assert response.code == 200
-
-
-def test_total_archives():
-    if sys.version_info > (3, 6):
-        target = waybackpy.Url(" https://google.com ", user_agent)
-        assert target.total_archives() > 500000
-    else:
-        pass
-    target = waybackpy.Url(
-        " https://gaha.e4i3n.m5iai3kip6ied.cima/gahh2718gs/ahkst63t7gad8 ", user_agent
-    )
-    assert target.total_archives() == 0
+    target = Url(url, user_agent)
+    assert "archived_snapshots" in str(target.JSON)
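`test_json` only checks that the availability API response contains an `archived_snapshots` key. As a sketch of how that JSON might be consumed, the function below extracts the closest snapshot URL. It assumes the payload shape `{"archived_snapshots": {"closest": {"available": ..., "url": ...}}}`; `closest_archive_url` is hypothetical and not part of waybackpy.

```python
import json
from typing import Optional


def closest_archive_url(payload: str) -> Optional[str]:
    """Hypothetical helper: return the closest snapshot URL, or None if there is none."""
    data = json.loads(payload)
    # Assumed shape: {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")
```

An empty `archived_snapshots` object (a URL that was never archived) yields `None` rather than raising, leaving the caller to decide whether that is an error.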
--- a/waybackpy/init.py
+++ b/waybackpy/init.py
@ -1,5 +1,3 @@
-# -*- coding: utf-8 -*-
-
 # ┏┓┏┓┏┓━━━━━━━━━━┏━━┓━━━━━━━━━━┏┓━━┏━━━┓━━━━━
 # ┃┃┃┃┃┃━━━━━━━━━━┃┏┓┃━━━━━━━━━━┃┃━━┃┏━┓┃━━━━━
 # ┃┃┃┃┃┃┏━━┓━┏┓━┏┓┃┗┛┗┓┏━━┓━┏━━┓┃┃┏┓┃┗━┛┃┏┓━┏┓
@ -10,24 +8,43 @@
 # ━━━━━━━━━━━┗━━┛━━━━━━━━━━━━━━━━━━━━━━━━┗━━┛━

 """
-Waybackpy is a Python library that interfaces with the Internet Archive's Wayback Machine API.
+Waybackpy is a Python package & command-line program that interfaces with the Internet Archive's Wayback Machine API.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Archive pages and retrieve archived pages easily.
+Archive webpages and retrieve archived URLs easily.

 Usage:
-   >>> import waybackpy
-   >>> target_url = waybackpy.Url('https://www.python.org', 'Your-apps-cool-user-agent')
-   >>> new_archive = target_url.save()
-   >>> print(new_archive)
-   https://web.archive.org/web/20200502170312/https://www.python.org/
+    >>> import waybackpy

-Full documentation @ <https://akamhy.github.io/waybackpy/>.
-:copyright: (c) 2020 by akamhy.
+    >>> url = "https://en.wikipedia.org/wiki/Multivariable_calculus"
+    >>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
+
+    >>> wayback = waybackpy.Url(url, user_agent)
+
+    >>> archive = wayback.save()
+    >>> str(archive)
+    'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
+
+    >>> archive.timestamp
+    datetime.datetime(2021, 1, 4, 17, 35, 12, 691741)
+
+    >>> oldest_archive = wayback.oldest()
+    >>> str(oldest_archive)
+    'https://web.archive.org/web/20050422130129/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
+
+    >>> archive_close_to_2010_feb = wayback.near(year=2010, month=2)
+    >>> str(archive_close_to_2010_feb)
+    'https://web.archive.org/web/20100215001541/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
+
+    >>> str(wayback.newest())
+    'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
+
+Full documentation @ <https://github.com/akamhy/waybackpy/wiki>.
+:copyright: (c) 2020-2021 Akash Mahanty et al.
 :license: MIT
 """

-from .wrapper import Url
+from .wrapper import Url, Cdx
 from .__version__ import (
    __title__,
    __description__,
--- a/waybackpy/version.py
+++ b/waybackpy/version.py
@ -1,10 +1,11 @@
-# -*- coding: utf-8 -*-
-
 __title__ = "waybackpy"
-__description__ = "A Python library that interfaces with the Internet Archive's Wayback Machine API. Archive pages and retrieve archived pages easily."
+__description__ = (
+    "A Python package that interfaces with the Internet Archive's Wayback Machine API. "
+    "Archive pages and retrieve archived pages easily."
+)
 __url__ = "https://akamhy.github.io/waybackpy/"
-__version__ = "2.1.6"
+__version__ = "2.4.4"
 __author__ = "akamhy"
-__author_email__ = "akash3pro@gmail.com"
+__author_email__ = "akamhy@yahoo.com"
 __license__ = "MIT"
-__copyright__ = "Copyright 2020 akamhy"
+__copyright__ = "Copyright 2020-2021 Akash Mahanty et al."
--- a/waybackpy/cdx.py
+++ b/waybackpy/cdx.py
@ -0,0 +1,229 @@
+from .snapshot import CdxSnapshot
+from .exceptions import WaybackError
+from .utils import (
+    _get_total_pages,
+    _get_response,
+    default_user_agent,
+    _check_filters,
+    _check_collapses,
+    _check_match_type,
+    _add_payload,
+)
+
+# TODO : Threading support for the pagination API, which is designed for parallel requests.
+# TODO : Add a get method here for snapshots whose type is valid HTML, SVG, etc., but not "-" or WARC. Test it.
+
+
+class Cdx:
+    def __init__(
+        self,
+        url,
+        user_agent=None,
+        start_timestamp=None,
+        end_timestamp=None,
+        filters=None,
+        match_type=None,
+        gzip=None,
+        collapses=None,
+        limit=None,
+    ):
+        self.url = str(url).strip()
+        self.user_agent = str(user_agent) if user_agent else default_user_agent
+        self.start_timestamp = str(start_timestamp) if start_timestamp else None
+        self.end_timestamp = str(end_timestamp) if end_timestamp else None
+        # None defaults avoid sharing one mutable list across all instances.
+        self.filters = filters if filters else []
+        _check_filters(self.filters)
+        self.match_type = str(match_type).strip() if match_type else None
+        _check_match_type(self.match_type, self.url)
+        # Default to gzip=True, but still honour an explicit gzip=False.
+        self.gzip = True if gzip is None else gzip
+        self.collapses = collapses if collapses else []
+        _check_collapses(self.collapses)
+        self.limit = limit if limit else 5000
+        self.last_api_request_url = None
+        self.use_page = False
+
+    def cdx_api_manager(self, payload, headers, use_page=False):
+        """Switch between the normal (resumption-key) API and the pagination API.
+
+        Parameters
+        ----------
+        self : waybackpy.cdx.Cdx
+            The instance itself
+
+        payload : dict
+            GET request parameters as name-value pairs.
+
+        headers : dict
+            The headers for making the GET request.
+
+        use_page : bool
+            If True use pagination API else use normal resume key based API.
+
+        There are two ways to fetch the snapshots; this method selects
+        between the pagination API and the normal sequential API, which
+        queries CDX data using a resumption key. For very large queries
+        (for example, a domain query) it may be useful to run queries in
+        parallel and to estimate the total size of the query.
+
+        Read more about the pagination API at:
+        https://web.archive.org/web/20201228063237/https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md#pagination-api
+
+        If use_page is True and there are at least two pages of results,
+        the pagination API is used; otherwise the normal sequential query
+        API is used.
+
+        Two mutually exclusive cases are possible:
+
+        1) The pagination API is selected.
+
+            a) Get the total number of pages to read using _get_total_pages().
+
+            b) Loop over the pages and yield each response text, stopping
+               as soon as a second blank page is seen.
+
+        2) The normal sequential query API is selected.
+
+            a) Set showResumeKey=true so that the API appends a query resumption
+               key at the bottom of the response.
+
+            b) If the response has fewer than three lines, yield the text as-is.
+
+            c) If it has at least three lines, check whether the second-to-last
+               line is empty.
+
+            d) If the second-to-last line is empty, assume the last line contains
+               the resumption key; store it as resumeKey and remove it from the text.
+
+            e) If the second-to-last line is not empty, yield the text, as there
+               is no resumption key.
+
+            f) If a resumption key is found, set "more" to True (it is reset to
+               False on every iteration). When "more" is not True, the loop stops
+               and the generator returns.
+        """
+
+        endpoint = "https://web.archive.org/cdx/search/cdx"
+        total_pages = _get_total_pages(self.url, self.user_agent)
+        # With fewer than two pages of archives, prefer accuracy and use the
+        # sequential API: the pagination API can sometimes lag behind.
+        if use_page and total_pages >= 2:
+            blank_pages = 0
+            for i in range(total_pages):
+                payload["page"] = str(i)
+                url, res = _get_response(
+                    endpoint, params=payload, headers=headers, return_full_url=True
+                )
+
+                self.last_api_request_url = url
+                text = res.text
+                if len(text) == 0:
+                    blank_pages += 1
+
+                if blank_pages >= 2:
+                    break
+
+                yield text
+        else:
+
+            payload["showResumeKey"] = "true"
+            payload["limit"] = str(self.limit)
+            resumeKey = None
+
+            more = True
+            while more:
+
+                if resumeKey:
+                    payload["resumeKey"] = resumeKey
+
+                url, res = _get_response(
+                    endpoint, params=payload, headers=headers, return_full_url=True
+                )
+
+                self.last_api_request_url = url
+
+                text = res.text.strip()
+                lines = text.splitlines()
+
+                more = False
+
+                if len(lines) >= 3:
+
+                    second_last_line = lines[-2]
+
+                    if len(second_last_line) == 0:
+
+                        resumeKey = lines[-1].strip()
+                        text = text.replace(resumeKey, "", 1).strip()
+                        more = True
+
+                yield text
+
+    def snapshots(self):
+        """
+        Yield snapshots wrapped in CdxSnapshot objects for easier use.
+
+        All GET request parameters are set when their conditions are met.
+
+        If neither start_timestamp nor end_timestamp is given and no
+        collapses are used, the pagination API is selected, because it
+        returns archives starting from the first (oldest) archive, with
+        the most recent archive on the last page.
+        """
+        payload = {}
+        headers = {"User-Agent": self.user_agent}
+
+        _add_payload(self, payload)
+
+        if not (self.start_timestamp or self.end_timestamp):
+            self.use_page = True
+
+        if self.collapses != []:
+            self.use_page = False
+
+        texts = self.cdx_api_manager(payload, headers, use_page=self.use_page)
+
+        for text in texts:
+
+            if len(text) <= 1 or text.isspace():
+                continue
+
+            snapshot_list = text.split("\n")
+
+            for snapshot in snapshot_list:
+
+                if len(snapshot) < 46:  # 14 + 32 (timestamp+digest)
+                    continue
+
+                properties = {
+                    "urlkey": None,
+                    "timestamp": None,
+                    "original": None,
+                    "mimetype": None,
+                    "statuscode": None,
+                    "digest": None,
+                    "length": None,
+                }
+
+                prop_values = snapshot.split(" ")
+
+                prop_values_len = len(prop_values)
+                properties_len = len(properties)
+
+                if prop_values_len != properties_len:
+                    raise WaybackError(
+                        "Snapshot returned by Cdx API has {prop_values_len} properties instead of expected {properties_len} properties.\nInvolved Snapshot : {snapshot}".format(
+                            prop_values_len=prop_values_len,
+                            properties_len=properties_len,
+                            snapshot=snapshot,
+                        )
+                    )
+
+                (
+                    properties["urlkey"],
+                    properties["timestamp"],
+                    properties["original"],
+                    properties["mimetype"],
+                    properties["statuscode"],
+                    properties["digest"],
+                    properties["length"],
+                ) = prop_values
+
+                yield CdxSnapshot(properties)
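The resumption-key handling described in `cdx_api_manager` and the seven-field split in `snapshots()` can be sketched offline. Both helpers below are hypothetical, not waybackpy functions, and work on plain strings instead of live API responses. Unlike the original, `split_resume_key` rebuilds the body from the remaining lines rather than calling `str.replace` on the key.

```python
from typing import Dict, Optional, Tuple

# Field order of a default CDX line, as unpacked in Cdx.snapshots().
CDX_FIELDS = (
    "urlkey",
    "timestamp",
    "original",
    "mimetype",
    "statuscode",
    "digest",
    "length",
)


def split_resume_key(text: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split a CDX response into (body, resume_key).

    With at least three lines and an empty second-to-last line, the last
    line is taken as the resumption key; otherwise there is no key.
    """
    text = text.strip()
    lines = text.splitlines()
    if len(lines) >= 3 and len(lines[-2]) == 0:
        resume_key = lines[-1].strip()
        # Drop the blank separator line and the key itself.
        body = "\n".join(lines[:-2]).strip()
        return body, resume_key
    return text, None


def parse_cdx_line(line: str) -> Dict[str, str]:
    """Hypothetical helper: map one space-separated CDX line onto CDX_FIELDS."""
    values = line.split(" ")
    if len(values) != len(CDX_FIELDS):
        raise ValueError(
            f"expected {len(CDX_FIELDS)} fields, got {len(values)}: {line!r}"
        )
    return dict(zip(CDX_FIELDS, values))
```

A caller would keep requesting pages, sending the returned key as the `resumeKey` parameter, until `split_resume_key` returns `None` for the key.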
--- a/waybackpy/cli.py
+++ b/waybackpy/cli.py
@ -1,105 +1,334 @@
-# -*- coding: utf-8 -*-
-from __future__ import print_function
+import os
+import re
 import sys
+import json
+import random
+import string
 import argparse
-from waybackpy.wrapper import Url
-from waybackpy.__version__ import __version__
+
+from .wrapper import Url
+from .exceptions import WaybackError
+from .__version__ import __version__
+

 def _save(obj):
-    return (obj.save())
+    try:
+        return obj.save()
+    except Exception as err:
+        e = str(err)
+        m = re.search(r"Header:\n(.*)", e)
+        header = m.group(1) if m else "(not available)"
+        if "No archive URL found in the API response" in e:
+            return (
+                "\n[waybackpy] Cannot save/archive your link.\n[waybackpy] This "
+                "could happen because your waybackpy ({version}) is out of "
+                "date or the Wayback Machine is malfunctioning.\n[waybackpy] Visit "
+                "https://github.com/akamhy/waybackpy for the latest version of "
+                "waybackpy.\n[waybackpy] API response header:\n{header}".format(
+                    version=__version__, header=header
+                )
+            )
+        if "URL cannot be archived by wayback machine as it is a redirect" in e:
+            return "URL cannot be archived by wayback machine as it is a redirect"
+        raise WaybackError(err)
+
+
+def _archive_url(obj):
+    return obj.archive_url
+
+
+def _json(obj):
+    return json.dumps(obj.JSON)
+
+
+def no_archive_handler(e, obj):
+    m = re.search(r"archive\sfor\s\'(.*?)\'\stry", str(e))
+    if m:
+        url = m.group(1)
+        ua = obj.user_agent
+        if "github.com/akamhy/waybackpy" in ua:
+            ua = "YOUR_USER_AGENT_HERE"
+        return (
+            "\n[Waybackpy] Cannot find an archive for '{url}'.\n[Waybackpy] You can"
+            " save the URL using the following command:\n[Waybackpy] waybackpy --"
+            'user_agent "{user_agent}" --url "{url}" --save'.format(
+                url=url, user_agent=ua
+            )
+        )
+    raise WaybackError(e)
+

 def _oldest(obj):
-    return (obj.oldest())
+    try:
+        return obj.oldest()
+    except Exception as e:
+        return no_archive_handler(e, obj)
+

 def _newest(obj):
-    return (obj.newest())
+    try:
+        return obj.newest()
+    except Exception as e:
+        return no_archive_handler(e, obj)
+

 def _total_archives(obj):
-    return (obj.total_archives())
+    return obj.total_archives()
+

 def _near(obj, args):
    _near_args = {}
-    if args.year:
-        _near_args["year"] = args.year
-    if args.month:
-        _near_args["month"] = args.month
-    if args.day:
-        _near_args["day"] = args.day
-    if args.hour:
-        _near_args["hour"] = args.hour
-    if args.minute:
-        _near_args["minute"] = args.minute
-    return (obj.near(**_near_args))
+    args_arr = [args.year, args.month, args.day, args.hour, args.minute]
+    keys = ["year", "month", "day", "hour", "minute"]
+
+    for key, arg in zip(keys, args_arr):
+        if arg:
+            _near_args[key] = arg
+
+    try:
+        return obj.near(**_near_args)
+    except Exception as e:
+        return no_archive_handler(e, obj)
+
+
+def _save_urls_on_file(url_gen):
+    domain = None
+    sys_random = random.SystemRandom()
+    uid = "".join(
+        sys_random.choice(string.ascii_lowercase + string.digits) for _ in range(6)
+    )
+    url_count = 0
+
+    for url in url_gen:
+        url_count += 1
+        if not domain:
+            m = re.search("https?://([A-Za-z_0-9.-]+).*", url)
+
+            domain = "domain-unknown"
+
+            if m:
+                domain = m.group(1)
+
+            file_name = "{domain}-urls-{uid}.txt".format(domain=domain, uid=uid)
+            file_path = os.path.join(os.getcwd(), file_name)
+            if not os.path.isfile(file_path):
+                open(file_path, "w+").close()
+
+        with open(file_path, "a") as f:
+            f.write("{url}\n".format(url=url))
+
+        print(url)
+
+    if url_count > 0:
+        return "\n\n'{file_name}' saved in current working directory".format(
+            file_name=file_name
+        )
+    else:
+        return "No known URLs found. Please try a different input!"
+
+
+def _known_urls(obj, args):
+    """
+    Known urls for a domain.
+    """
+
+    subdomain = bool(args.subdomain)
+
+    url_gen = obj.known_urls(subdomain=subdomain)
+
+    if args.file:
+        return _save_urls_on_file(url_gen)
+    else:
+        for url in url_gen:
+            print(url)
+        return "\n"
+

 def _get(obj, args):
    if args.get.lower() == "url":
-        return (obj.get())
-
+        return obj.get()
+    if args.get.lower() == "archive_url":
+        return obj.get(obj.archive_url)
    if args.get.lower() == "oldest":
-        return (obj.get(obj.oldest()))
-
+        return obj.get(obj.oldest())
    if args.get.lower() == "latest" or args.get.lower() == "newest":
-        return (obj.get(obj.newest()))
-
+        return obj.get(obj.newest())
    if args.get.lower() == "save":
-        return (obj.get(obj.save()))
-
-    return ("Use get as \"--get 'source'\", 'source' can be one of the followings: \
+        return obj.get(obj.save())
+    return "Use get as \"--get 'source'\", 'source' can be one of the following: \
        \n1) url - get the source code of the url specified using --url/-u.\
-        \n2) oldest - get the source code of the oldest archive for the supplied url.\
-        \n3) newest - get the source code of the newest archive for the supplied url.\
-        \n4) save - Create a new archive and get the source code of this new archive for the supplied url.")
+        \n2) archive_url - get the source code of the newest archive for the supplied url, alias of newest.\
+        \n3) oldest - get the source code of the oldest archive for the supplied url.\
+        \n4) newest - get the source code of the newest archive for the supplied url.\
+        \n5) save - Create a new archive and get the source code of this new archive for the supplied url."
+

 def args_handler(args):
    if args.version:
-        return (__version__)
+        return "waybackpy version {version}".format(version=__version__)

    if not args.url:
-        return ("Specify an URL. See --help for help using waybackpy.")
+        return "waybackpy {version} \nSee 'waybackpy --help' for help using this tool.".format(
+            version=__version__
+        )

+    obj = Url(args.url)
    if args.user_agent:
        obj = Url(args.url, args.user_agent)
-    else:
-        obj = Url(args.url)

    if args.save:
-        return _save(obj)
-    if args.oldest:
-        return _oldest(obj)
-    if args.newest:
-        return _newest(obj)
-    if args.total:
-        return _total_archives(obj)
-    if args.near:
+        output = _save(obj)
+    elif args.archive_url:
+        output = _archive_url(obj)
+    elif args.json:
+        output = _json(obj)
+    elif args.oldest:
+        output = _oldest(obj)
+    elif args.newest:
+        output = _newest(obj)
+    elif args.known_urls:
+        output = _known_urls(obj, args)
+    elif args.total:
+        output = _total_archives(obj)
+    elif args.near:
        return _near(obj, args)
-    if args.get:
-        return _get(obj, args)
-    return ("Usage: waybackpy --url [URL] --user_agent [USER AGENT] [OPTIONS]. See --help for help using waybackpy.")
+    elif args.get:
+        output = _get(obj, args)
+    else:
+        output = (
+            "You only specified the URL, but you also need to specify an operation."
+            "\nSee 'waybackpy --help' for help using this tool."
+        )
+    return output
+
+
+def add_requiredArgs(requiredArgs):
+    requiredArgs.add_argument(
+        "--url", "-u", help="URL on which Wayback machine operations would occur"
+    )
+
+
+def add_userAgentArg(userAgentArg):
+    help_text = 'User agent, default user_agent is "waybackpy python package - https://github.com/akamhy/waybackpy"'
+    userAgentArg.add_argument("--user_agent", "-ua", help=help_text)
+
+
+def add_saveArg(saveArg):
+    saveArg.add_argument(
+        "--save", "-s", action="store_true", help="Save the URL on the Wayback machine"
+    )
+
+
+def add_auArg(auArg):
+    auArg.add_argument(
+        "--archive_url",
+        "-au",
+        action="store_true",
+        help="Get the latest archive URL, alias for --newest",
+    )
+
+
+def add_jsonArg(jsonArg):
+    jsonArg.add_argument(
+        "--json",
+        "-j",
+        action="store_true",
+        help="JSON data returned by the availability API",
+    )
+
+
+def add_oldestArg(oldestArg):
+    oldestArg.add_argument(
+        "--oldest",
+        "-o",
+        action="store_true",
+        help="Oldest archive for the specified URL",
+    )
+
+
+def add_newestArg(newestArg):
+    newestArg.add_argument(
+        "--newest",
+        "-n",
+        action="store_true",
+        help="Newest archive for the specified URL",
+    )
+
+
+def add_totalArg(totalArg):
+    totalArg.add_argument(
+        "--total",
+        "-t",
+        action="store_true",
+        help="Total number of archives for the specified URL",
+    )
+
+
+def add_getArg(getArg):
+    getArg.add_argument(
+        "--get",
+        "-g",
+        help="Prints the source code of the supplied url. Use '--get help' for extended usage",
+    )
+
+
+def add_knownUrlArg(knownUrlArg):
+    knownUrlArg.add_argument(
+        "--known_urls", "-ku", action="store_true", help="URLs known for the domain."
+    )
+    help_text = "Use with '--known_urls' to include known URLs for subdomains."
+    knownUrlArg.add_argument("--subdomain", "-sub", action="store_true", help=help_text)
+    knownUrlArg.add_argument(
+        "--file",
+        "-f",
+        action="store_true",
+        help="Save the URLs to a file in the current working directory.",
+    )
+
+
+def add_nearArg(nearArg):
+    nearArg.add_argument(
+        "--near", "-N", action="store_true", help="Archive near specified time"
+    )
+
+
+def add_nearArgs(nearArgs):
+    nearArgs.add_argument("--year", "-Y", type=int, help="Year as an integer")
+    nearArgs.add_argument("--month", "-M", type=int, help="Month as an integer")
+    nearArgs.add_argument("--day", "-D", type=int, help="Day as an integer")
+    nearArgs.add_argument("--hour", "-H", type=int, help="Hour as an integer")
+    nearArgs.add_argument("--minute", "-MIN", type=int, help="Minute as an integer")
+

 def parse_args(argv):
    parser = argparse.ArgumentParser()
-    parser.add_argument("-u", "--url", help="URL on which Wayback machine operations would occur.")
-    parser.add_argument("-ua", "--user_agent", help="User agent, default user_agent is \"waybackpy python package - https://github.com/akamhy/waybackpy\".")
-    parser.add_argument("-s", "--save", action='store_true', help="Save the URL on the Wayback machine.")
-    parser.add_argument("-o", "--oldest", action='store_true', help="Oldest archive for the specified URL.")
-    parser.add_argument("-n", "--newest", action='store_true', help="Newest archive for the specified URL.")
-    parser.add_argument("-t", "--total", action='store_true', help="Total number of archives for the specified URL.")
-    parser.add_argument("-g", "--get", help="Prints the source code of the supplied url. Use '--get help' for extended usage.")
-    parser.add_argument("-v", "--version", action='store_true', help="Prints the waybackpy version.")
-    parser.add_argument("-N", "--near", action='store_true', help="Latest/Newest archive for the specified URL.")
-    parser.add_argument("-Y", "--year", type=int, help="Year in integer. For use with --near.")
-    parser.add_argument("-M", "--month", type=int, help="Month in integer. For use with --near.")
-    parser.add_argument("-D", "--day", type=int, help="Day in integer. For use with --near.")
-    parser.add_argument("-H", "--hour", type=int, help="Hour in integer. For use with --near.")
-    parser.add_argument("-MIN", "--minute", type=int, help="Minute in integer. For use with --near.")
+    add_requiredArgs(parser.add_argument_group("URL argument (required)"))
+    add_userAgentArg(parser.add_argument_group("User Agent"))
+    add_saveArg(parser.add_argument_group("Create new archive/save URL"))
+    add_auArg(parser.add_argument_group("Get the latest Archive"))
+    add_jsonArg(parser.add_argument_group("Get the JSON data"))
+    add_oldestArg(parser.add_argument_group("Oldest archive"))
+    add_newestArg(parser.add_argument_group("Newest archive"))
+    add_totalArg(parser.add_argument_group("Total number of archives"))
+    add_getArg(parser.add_argument_group("Get source code"))
+    add_knownUrlArg(
+        parser.add_argument_group(
+            "URLs known and archived to Wayback Machine for the site."
+        )
+    )
+    add_nearArg(parser.add_argument_group("Archive close to time specified"))
+    add_nearArgs(parser.add_argument_group("Arguments that are used only with --near"))
+    parser.add_argument(
+        "--version", "-v", action="store_true", help="Waybackpy version"
+    )
    return parser.parse_args(argv[1:])

+
 def main(argv=None):
-    if argv is None:
-        argv = sys.argv
-    args = parse_args(argv)
-    output = args_handler(args)
-    print(output)
+    argv = sys.argv if argv is None else argv
+    print(args_handler(parse_args(argv)))
+

 if __name__ == "__main__":
    sys.exit(main(sys.argv))
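The new parse_args() sorts the options into argparse argument groups. Groups only change how --help lays the options out; every option still ends up on the same flat namespace, so args_handler() can keep reading args.year, args.near and so on. A minimal sketch with a reduced, hypothetical subset of the real options (build_parser is a made-up name for illustration):

```python
import argparse


def build_parser():
    # Hypothetical reduced CLI: only a few of the real waybackpy options.
    parser = argparse.ArgumentParser(prog="waybackpy")

    url_group = parser.add_argument_group("URL argument (required)")
    url_group.add_argument("--url", "-u", help="URL on which Wayback Machine operations occur")

    near_group = parser.add_argument_group("Archive close to time specified")
    near_group.add_argument("--near", "-N", action="store_true", help="Archive close to a time")

    near_args = parser.add_argument_group("Arguments that are used only with --near")
    near_args.add_argument("--year", "-Y", type=int, help="Year in integer.")
    near_args.add_argument("--month", "-M", type=int, help="Month in integer.")
    return parser


# Groups affect only the --help layout; all values land on one Namespace.
args = build_parser().parse_args(["-u", "example.com", "--near", "-Y", "2020"])
print(args.url, args.near, args.year, args.month)  # example.com True 2020 None
```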
--- a/waybackpy/exceptions.py
+++ b/waybackpy/exceptions.py
@ -1,6 +1,26 @@
-# -*- coding: utf-8 -*-
+"""
+waybackpy.exceptions
+~~~~~~~~~~~~~~~~~~~~
+This module contains the set of Waybackpy's exceptions.
+"""
+

 class WaybackError(Exception):
    """
-    Raised when API Service error.
+    Raised when Waybackpy cannot return what you asked for, e.g. when:
+     1) the Wayback Machine API service is unreachable or down, or
+     2) illegal arguments were passed.
+    """
+
+
+class RedirectSaveError(WaybackError):
+    """
+    Raised when the original URL is redirected and the
+    redirect URL is archived but not the original URL.
+    """
+
+
+class URLError(Exception):
+    """
+    Raised when malformed URLs are passed as arguments.
    """
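Because RedirectSaveError subclasses WaybackError, one except WaybackError clause also catches it, while URLError derives directly from Exception and has to be caught on its own. The sketch below redefines the three classes locally so that it runs standalone; describe() is a hypothetical helper.

```python
class WaybackError(Exception):
    """Raised when Waybackpy cannot return what you asked for."""


class RedirectSaveError(WaybackError):
    """Raised when only the redirect target of a URL was archived."""


class URLError(Exception):
    """Raised when a malformed URL is passed."""


def describe(exc):
    try:
        raise exc
    except WaybackError as e:  # also catches RedirectSaveError (a subclass)
        return "wayback: " + type(e).__name__
    except URLError as e:  # not a WaybackError subclass, needs its own clause
        return "url: " + type(e).__name__


print(describe(RedirectSaveError("redirect")))  # wayback: RedirectSaveError
print(describe(URLError("bad url")))  # url: URLError
```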
--- a/waybackpy/snapshot.py
+++ b/waybackpy/snapshot.py
@ -0,0 +1,51 @@
+from datetime import datetime
+
+
+class CdxSnapshot:
+    """
+    This class encapsulates the snapshots for greater usability.
+
+    Raw Snapshot data looks like:
+    org,archive)/ 20080126045828 http://github.com text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415
+
+    """
+
+    def __init__(self, properties):
+        """
+        Parameters
+        ----------
+        self : waybackpy.snapshot.CdxSnapshot
+            The instance itself
+
+        properties : dict
+            A dict containing all seven CDX snapshot properties.
+
+        """
+        self.urlkey = properties["urlkey"]
+        self.timestamp = properties["timestamp"]
+        self.datetime_timestamp = datetime.strptime(self.timestamp, "%Y%m%d%H%M%S")
+        self.original = properties["original"]
+        self.mimetype = properties["mimetype"]
+        self.statuscode = properties["statuscode"]
+        self.digest = properties["digest"]
+        self.length = properties["length"]
+        self.archive_url = (
+            "https://web.archive.org/web/" + self.timestamp + "/" + self.original
+        )
+
+    def __str__(self):
+        """Returns the Cdx snapshot line.
+
+        Output format:
+        org,archive)/ 20080126045828 http://github.com text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415
+
+        """
+        return "{urlkey} {timestamp} {original} {mimetype} {statuscode} {digest} {length}".format(
+            urlkey=self.urlkey,
+            timestamp=self.timestamp,
+            original=self.original,
+            mimetype=self.mimetype,
+            statuscode=self.statuscode,
+            digest=self.digest,
+            length=self.length,
+        )
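A raw CDX line like the one in the docstring holds the seven properties separated by single spaces, in the order CdxSnapshot reads them. A minimal sketch, assuming that space-separated layout, of turning such a line into the properties dict and deriving the same datetime and archive URL as __init__ (parse_cdx_line is a hypothetical helper, not part of this module):

```python
from datetime import datetime

# The seven CDX fields, in the order they appear in a raw snapshot line.
FIELDS = ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"]


def parse_cdx_line(line):
    """Split a raw CDX line into the properties dict CdxSnapshot expects."""
    values = line.split(" ")
    if len(values) != len(FIELDS):
        raise ValueError("expected 7 space-separated fields, got %d" % len(values))
    return dict(zip(FIELDS, values))


line = "org,archive)/ 20080126045828 http://github.com text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415"
props = parse_cdx_line(line)
when = datetime.strptime(props["timestamp"], "%Y%m%d%H%M%S")
archive_url = "https://web.archive.org/web/" + props["timestamp"] + "/" + props["original"]
print(when)  # 2008-01-26 04:58:28
print(archive_url)  # https://web.archive.org/web/20080126045828/http://github.com
```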
--- a/waybackpy/utils.py
+++ b/waybackpy/utils.py
@ -0,0 +1,564 @@
+import re
+import time
+import requests
+from datetime import datetime
+
+from .exceptions import WaybackError, URLError, RedirectSaveError
+from .__version__ import __version__
+
+from urllib3.util.retry import Retry
+from requests.adapters import HTTPAdapter
+
+quote = requests.utils.quote
+default_user_agent = "waybackpy python package - https://github.com/akamhy/waybackpy"
+
+
+def _latest_version(package_name, headers):
+    """Returns the latest version of package_name.
+
+    Parameters
+    ----------
+    package_name : str
+        The name of the python package
+
+    headers : dict
+        Headers that will be used while making get requests
+
+    Return type is str
+
+    Uses the PyPI JSON API <https://pypi.org/pypi/> to get the latest
+    version of waybackpy, but it can be used to get the latest version
+    of any package on PyPI.
+    """
+
+    request_url = "https://pypi.org/pypi/" + package_name + "/json"
+    response = _get_response(request_url, headers=headers)
+    data = response.json()
+    return data["info"]["version"]
+
+
+def _unix_timestamp_to_wayback_timestamp(unix_timestamp):
+    """Returns the unix timestamp converted to a Wayback Machine timestamp.
+
+    Parameters
+    ----------
+    unix_timestamp : str, int or float
+        Unix timestamp that needs to be converted to a Wayback Machine timestamp
+
+    Converts unix_timestamp (interpreted as UTC) to a 14-digit
+    'YYYYmmddHHMMSS' string. unix_timestamp may be a str, int or float;
+    fractional seconds are discarded.
+    """
+
+    return datetime.utcfromtimestamp(int(float(unix_timestamp))).strftime("%Y%m%d%H%M%S")
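The conversion is a single strftime call on a UTC datetime. A standalone sketch (unix_to_wayback is a hypothetical name); it uses the timezone-aware datetime.fromtimestamp(..., tz=timezone.utc) because datetime.utcfromtimestamp is deprecated as of Python 3.12, and goes through float() so that strings with fractional seconds are accepted as well:

```python
from datetime import datetime, timezone


def unix_to_wayback(unix_timestamp):
    """Convert a Unix timestamp (str, int or float) to 'YYYYmmddHHMMSS' in UTC."""
    # float() first so strings like "1609459200.5" work; int() drops the fraction.
    seconds = int(float(unix_timestamp))
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m%d%H%M%S")


print(unix_to_wayback(1609459200))  # 20210101000000
print(unix_to_wayback("1609459200.5"))  # 20210101000000
```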
+
+
+def _add_payload(instance, payload):
+    """Adds the CDX query parameters held by instance to payload for GET requests.
+
+    Parameters
+    ----------
+    instance : waybackpy.cdx.Cdx
+        instance of the Cdx class
+
+    payload : dict
+        A dict onto which we need to add keys and values based on instance.
+
+    instance is object of Cdx class and it contains the data required to fill
+    the payload dictionary.
+    """
+
+    if instance.start_timestamp:
+        payload["from"] = instance.start_timestamp
+
+    if instance.end_timestamp:
+        payload["to"] = instance.end_timestamp
+
+    if instance.gzip != True:
+        payload["gzip"] = "false"
+
+    if instance.match_type:
+        payload["matchType"] = instance.match_type
+
+    if instance.filters and len(instance.filters) > 0:
+        for i, f in enumerate(instance.filters):
+            payload["filter" + str(i)] = f
+
+    if instance.collapses and len(instance.collapses) > 0:
+        for i, f in enumerate(instance.collapses):
+            payload["collapse" + str(i)] = f
+
+    # Nothing is returned: payload is a dict and is modified in place.
+    payload["url"] = instance.url
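To see what _add_payload produces, here is a simplified re-implementation driven by a types.SimpleNamespace standing in for a waybackpy.cdx.Cdx instance (the attribute values are made up). Several filters or collapses get numbered keys (filter0, filter1, ...) because a dict cannot hold the same key twice; _full_url later turns them back into repeated filter= and collapse= parameters.

```python
from types import SimpleNamespace


def add_payload(instance, payload):
    """Simplified version of _add_payload: copy the set CDX options into payload."""
    if instance.start_timestamp:
        payload["from"] = instance.start_timestamp
    if instance.end_timestamp:
        payload["to"] = instance.end_timestamp
    if instance.gzip is not True:
        payload["gzip"] = "false"
    if instance.match_type:
        payload["matchType"] = instance.match_type
    # Numbered keys keep several filters/collapses apart inside one dict.
    for i, f in enumerate(instance.filters or []):
        payload["filter" + str(i)] = f
    for i, c in enumerate(instance.collapses or []):
        payload["collapse" + str(i)] = c
    payload["url"] = instance.url  # payload is mutated in place, nothing returned


# Hypothetical stand-in for a waybackpy.cdx.Cdx instance.
cdx = SimpleNamespace(
    url="example.com",
    start_timestamp=2015,
    end_timestamp=None,
    gzip=True,
    match_type="prefix",
    filters=["statuscode:200", "!mimetype:image/png"],
    collapses=["digest"],
)
payload = {}
add_payload(cdx, payload)
print(payload)
```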
+
+
+def _timestamp_manager(timestamp, data):
+    """Returns the timestamp.
+
+    Parameters
+    ----------
+    timestamp : datetime.datetime
+        datetime object
+
+    data : dict
+        The loaded JSON of the availability API.
+
+    Return type:
+        datetime.datetime
+
+    If timestamp is set, it is returned unchanged. Otherwise the timestamp
+    of the closest snapshot in data is returned, or datetime.max if data
+    contains no archived snapshots.
+    """
+
+    if timestamp:
+        return timestamp
+
+    if not data["archived_snapshots"]:
+        return datetime.max
+
+    return datetime.strptime(
+        data["archived_snapshots"]["closest"]["timestamp"], "%Y%m%d%H%M%S"
+    )
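The decision order above can be exercised offline with hand-written availability API data. The dict shape below is assumed from the keys this module reads (archived_snapshots, closest, timestamp, url) and the values are illustrative; timestamp_from is a hypothetical copy of _timestamp_manager.

```python
from datetime import datetime

# Illustrative availability API data (shape assumed from the keys the code reads).
with_snapshot = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20210101000000/https://example.com/",
            "timestamp": "20210101000000",
            "status": "200",
        }
    }
}
without_snapshot = {"archived_snapshots": {}}


def timestamp_from(timestamp, data):
    """Same decision order as _timestamp_manager."""
    if timestamp:
        return timestamp  # an already known timestamp wins
    if not data["archived_snapshots"]:
        return datetime.max  # sentinel: no archive exists
    return datetime.strptime(data["archived_snapshots"]["closest"]["timestamp"], "%Y%m%d%H%M%S")


print(timestamp_from(None, with_snapshot))  # 2021-01-01 00:00:00
print(timestamp_from(None, without_snapshot) == datetime.max)  # True
```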
+
+
+def _check_match_type(match_type, url):
+    """Checks the validity of match_type parameter of the CDX GET requests.
+
+    Parameters
+    ----------
+    match_type : str
+        One of "exact", "prefix", "host" or "domain".
+        See https://github.com/akamhy/waybackpy/wiki/Python-package-docs#url-match-scope
+
+    url : str
+        The URL used to create the waybackpy Url object.
+
+    Raises WaybackError if match_type is not valid or if url contains a wildcard.
+
+    """
+
+    if not match_type:
+        return
+
+    if "*" in url:
+        raise WaybackError("Can not use wildcard with match_type argument")
+
+    legal_match_type = ["exact", "prefix", "host", "domain"]
+
+    if match_type not in legal_match_type:
+        exc_message = "{match_type} is not an allowed match type.\nUse one from 'exact', 'prefix', 'host' or 'domain'".format(
+            match_type=match_type
+        )
+        raise WaybackError(exc_message)
+
+
+def _check_collapses(collapses):
+    """Checks the validity of collapse parameter of the CDX GET request.
+
+    collapses is a list of one or more entries of the form 'field' or
+    'field:N', where field is one of urlkey, timestamp, original, mimetype,
+    statuscode, digest or length, and N means that only the first N
+    characters of the field are compared.
+
+    Parameters
+    ----------
+    collapses : list
+
+    Raises WaybackError if collapses is not valid.
+
+    """
+
+    if not isinstance(collapses, list):
+        raise WaybackError("collapses must be a list.")
+
+    if len(collapses) == 0:
+        return
+
+    for collapse in collapses:
+        # The whole entry must be "field" or "field:N"; the colon is required
+        # when N is given (e.g. "timestamp:10", not "timestamp10").
+        match = re.fullmatch(
+            r"(urlkey|timestamp|original|mimetype|statuscode|digest|length)(:[0-9]{1,99})?",
+            str(collapse),
+        )
+        if not match:
+            exc_message = "collapse argument '{collapse}' is not following the cdx collapse syntax.".format(
+                collapse=collapse
+            )
+            raise WaybackError(exc_message)
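Anchoring the pattern with re.fullmatch is what rejects near misses: the entire string must be a field name, optionally followed by a colon and digits. A standalone sketch of that check (is_valid_collapse is a hypothetical helper):

```python
import re

COLLAPSE_RE = re.compile(
    r"(urlkey|timestamp|original|mimetype|statuscode|digest|length)(:[0-9]{1,99})?"
)


def is_valid_collapse(collapse):
    # fullmatch: the whole string must be "field" or "field:N".
    return COLLAPSE_RE.fullmatch(collapse) is not None


samples = ["digest", "timestamp:10", "timestamp10", "foo", "urlkey:"]
print([is_valid_collapse(c) for c in samples])  # [True, True, False, False, False]
```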
+
+
+def _check_filters(filters):
+    """Checks the validity of filter parameter of the CDX GET request.
+
+    Any number of filters of the form "[!]field:regex" may be specified,
+    e.g. filters=["statuscode:200", "!mimetype:image/png"].
+
+    Parameters
+    ----------
+    filters : list
+
+    Raises WaybackError if filters is not valid.
+
+    """
+
+    if not isinstance(filters, list):
+        raise WaybackError("filters must be a list.")
+
+    # [!]field:regex
+    for _filter in filters:
+        # The whole filter must match; re.search would also accept strings
+        # with leading junk such as "foo statuscode:200".
+        match = re.fullmatch(
+            r"(\!?(?:urlkey|timestamp|original|mimetype|statuscode|digest|length)):(.*)",
+            str(_filter),
+        )
+
+        if not match:
+            exc_message = (
+                "Filter '{_filter}' is not following the cdx filter syntax.".format(
+                    _filter=_filter
+                )
+            )
+            raise WaybackError(exc_message)
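The filter pattern has two groups: the optionally negated field name and the regex after the first colon. A standalone sketch showing how sample filters split (split_filter is a hypothetical helper; the regex part is only captured here, not compiled or checked):

```python
import re

FILTER_RE = re.compile(
    r"(\!?(?:urlkey|timestamp|original|mimetype|statuscode|digest|length)):(.*)"
)


def split_filter(cdx_filter):
    """Return (field, regex) for a '[!]field:regex' filter, or None if invalid."""
    match = FILTER_RE.fullmatch(cdx_filter)
    return match.groups() if match else None


print(split_filter("statuscode:200"))  # ('statuscode', '200')
print(split_filter("!mimetype:image/.*"))  # ('!mimetype', 'image/.*')
print(split_filter("status:200"))  # None
```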
+
+
+def _cleaned_url(url):
+    """Sanitize the URL.
+    Strips leading/trailing whitespace and replaces inner spaces with '%20'.
+    """
+    return str(url).strip().replace(" ", "%20")
+
+
+def _url_check(url):
+    """
+    Check for common URL problems.
+    What we are checking:
+    1) The URL must contain at least one '.'.
+
+    If you know of any other checks, please open a PR on the GitHub repo.
+    """
+
+    if "." not in url:
+        exc_message = "'{url}' is not a valid URL.".format(url=url)
+        raise URLError(exc_message)
+
+
+def _full_url(endpoint, params):
+    """API endpoint + GET parameters = full_url
+
+    Parameters
+    ----------
+    endpoint : str
+        The API endpoint
+
+    params : dict
+        Dictionary that has name-value pairs.
+
+    Return type is str
+
+    """
+
+    if not params:
+        return endpoint
+
+    full_url = endpoint if endpoint.endswith("?") else (endpoint + "?")
+    for key, val in params.items():
+        key = "filter" if key.startswith("filter") else key
+        key = "collapse" if key.startswith("collapse") else key
+        amp = "" if full_url.endswith("?") else "&"
+        full_url = full_url + amp + "{key}={val}".format(key=key, val=quote(str(val)))
+    return full_url
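The key rewriting in _full_url is what lets several filters and collapses reach the CDX API as repeated parameters. A simplified standalone copy; it calls urllib.parse.quote, which requests.utils.quote re-exports in current versions of requests. Note that quote() keeps "/" unescaped by default but escapes ":" and "!".

```python
from urllib.parse import quote


def full_url(endpoint, params):
    """Simplified re-implementation of _full_url."""
    if not params:
        return endpoint
    url = endpoint if endpoint.endswith("?") else endpoint + "?"
    for key, val in params.items():
        # "filter0", "filter1", ... all become a repeated "filter" parameter.
        key = "filter" if key.startswith("filter") else key
        key = "collapse" if key.startswith("collapse") else key
        amp = "" if url.endswith("?") else "&"
        url += amp + "{}={}".format(key, quote(str(val)))
    return url


url = full_url(
    "https://web.archive.org/cdx/search/cdx",
    {
        "url": "example.com",
        "filter0": "statuscode:200",
        "filter1": "!mimetype:image/png",
        "collapse0": "digest",
    },
)
print(url)
```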
+
+
+def _get_total_pages(url, user_agent):
+    """
+    If showNumPages=true is passed to the CDX API, it returns the number
+    of pages of results instead of the results; each page holds many archives.
+
+    This function returns that number of pages (type int).
+    """
+    total_pages_url = (
+        "https://web.archive.org/cdx/search/cdx?url={url}&showNumPages=true".format(
+            url=url
+        )
+    )
+    headers = {"User-Agent": user_agent}
+    return int((_get_response(total_pages_url, headers=headers).text).strip())
+
+
+def _archive_url_parser(
+    header, url, latest_version=__version__, instance=None, response=None
+):
+    """Returns the archive after parsing it from the response header.
+
+    Parameters
+    ----------
+    header : str
+        The response header of WayBack Machine's Save API
+
+    url : str
+        The input url, the one used to create the Url object.
+
+    latest_version : str
+        The latest version of waybackpy (default is __version__)
+
+    instance : waybackpy.wrapper.Url
+        Instance of Url class
+
+    response : requests.Response
+        Response of the Save API request (default is None)
+
+    The wayback machine's save API doesn't
+    return JSON response, we are required
+    to read the header of the API response
+    and find the archive URL.
+
+    This method has some regular expressions
+    that are used to search for the archive url
+    in the response header of Save API.
+
+    Two cases are possible:
+    1) Either we find the archive url in
+       the header.
+
+    2) Or we didn't find the archive url in
+       API header.
+
+    If we found the archive URL we return it.
+
+    Return format:
+    web.archive.org/web/<TIMESTAMP>/<URL>
+
+    And if we couldn't find it, we raise
+    WaybackError with an error message.
+    """
+
+    if "save redirected" in header and instance:
+        time.sleep(60)  # give the Wayback Machine time to finish archiving
+
+        now = datetime.utcnow().timetuple()
+        timestamp = _wayback_timestamp(
+            year=now.tm_year,
+            month=now.tm_mon,
+            day=now.tm_mday,
+            hour=now.tm_hour,
+            minute=now.tm_min,
+        )
+
+        return_str = "web.archive.org/web/{timestamp}/{url}".format(
+            timestamp=timestamp, url=url
+        )
+        archive_url = "https://" + return_str
+
+        headers = {"User-Agent": instance.user_agent}
+
+        res = _get_response(archive_url, headers=headers)
+
+        # Keep `url` as the original URL: it is still used in the error
+        # messages below, and the archive URL is returned without its scheme.
+        if res.status_code < 400:
+            return return_str
+
+    # Regex1
+    m = re.search(r"Content-Location: (/web/[0-9]{14}/.*)", str(header))
+    if m:
+        return "web.archive.org" + m.group(1)
+
+    # Regex2
+    m = re.search(
+        r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", str(header)
+    )
+    if m:
+        return m.group(1)
+
+    # Regex3
+    m = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", str(header))
+    if m:
+        return m.group(1)
+
+    if response:
+        if response.url:
+            if "web.archive.org/web" in response.url:
+                m = re.search(
+                    r"web\.archive\.org/web/(?:[0-9]*?)/(?:.*)$",
+                    str(response.url).strip(),
+                )
+                if m:
+                    return m.group(0)
+
+    if instance:
+        newest_archive = None
+        try:
+            newest_archive = instance.newest()
+        except WaybackError:
+            pass  # We don't care as this is a save request
+
+        if newest_archive:
+            minutes_old = (
+                datetime.utcnow() - newest_archive.timestamp
+            ).total_seconds() / 60.0
+
+            if minutes_old <= 30:
+                archive_url = newest_archive.archive_url
+                m = re.search(r"web\.archive\.org/web/[0-9]{14}/.*", archive_url)
+                if m:
+                    instance.cached_save = True
+                    return m.group(0)
+
+    if __version__ == latest_version:
+        exc_message = (
+            "No archive URL found in the API response. "
+            "If '{url}' can be accessed via your web browser then either "
+            "Wayback Machine is malfunctioning or it refused to archive your URL."
+            "\nHeader:\n{header}".format(url=url, header=header)
+        )
+
+        if "save redirected" == str(header).strip():
+            raise RedirectSaveError(
+                "URL cannot be archived by wayback machine as it is a redirect.\nHeader:\n{header}".format(
+                    header=header
+                )
+            )
+    else:
+        exc_message = (
+            "No archive URL found in the API response. "
+            "If '{url}' can be accessed via your web browser then either "
+            "this version of waybackpy ({version}) is out of date or WayBack "
+            "Machine is malfunctioning. Visit 'https://github.com/akamhy/waybackpy' "
+            "for the latest version of waybackpy.\nHeader:\n{header}".format(
+                url=url, version=__version__, header=header
+            )
+        )
+
+    raise WaybackError(exc_message)
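_archive_url_parser() runs its regular expressions over str(header). When header is a requests header mapping, str() renders it like a Python dict literal ('Content-Location': '/web/...'), which the first pattern, expecting "Content-Location: /web/...", would not match. As an alternative (not what this module does), the header value could be read from the mapping directly. A hypothetical helper, sketched with a plain dict; requests' CaseInsensitiveDict would additionally make the lookup case-insensitive:

```python
import re


def archive_from_headers(headers):
    """Read Content-Location directly instead of regex-searching str(headers)."""
    location = headers.get("Content-Location", "")
    # Expect "/web/<14-digit timestamp>/<original URL>".
    m = re.match(r"/web/[0-9]{14}/.+", location)
    return "web.archive.org" + m.group(0) if m else None


headers = {"Content-Location": "/web/20210903055636/https://example.com/"}
print(archive_from_headers(headers))  # web.archive.org/web/20210903055636/https://example.com/
print(archive_from_headers({}))  # None
```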
+
+
+def _wayback_timestamp(**kwargs):
+    """Returns a valid waybackpy timestamp.
+
+    The standard archive URL format is
+    https://web.archive.org/web/20191214041711/https://www.youtube.com
+
+    If we break it down into three parts:
+    1) The start (https://web.archive.org/web/)
+    2) The timestamp (20191214041711)
+    3) The original URL (https://www.youtube.com)
+
+
+    The near method of Url class in wrapper.py takes year, month, day, hour
+    and minute as arguments, their type is int.
+
+    This method takes those integers, converts them to a Wayback Machine
+    timestamp and returns it.
+
+
+    zfill(2) pads single-digit months, days, hours and minutes with a
+    leading zero. The result has 12 digits (no seconds).
+
+    Return type is string.
+    """
+
+    return "".join(
+        str(kwargs[key]).zfill(2) for key in ["year", "month", "day", "hour", "minute"]
+    )
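For example, 5 January 2020 at 03:07 becomes "202001050307": the year is unchanged and every other field is zero-padded to two digits. A standalone copy (wayback_timestamp is a hypothetical name) that also builds an archive URL from the result:

```python
def wayback_timestamp(**kwargs):
    """Join year..minute into a 12-digit Wayback timestamp (no seconds)."""
    # zfill(2) left-pads single-digit values; a 4-digit year is left as is.
    return "".join(
        str(kwargs[key]).zfill(2) for key in ["year", "month", "day", "hour", "minute"]
    )


ts = wayback_timestamp(year=2020, month=1, day=5, hour=3, minute=7)
print(ts)  # 202001050307
print("https://web.archive.org/web/" + ts + "/https://www.youtube.com")
```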
+
+
+def _get_response(
+    endpoint,
+    params=None,
+    headers=None,
+    return_full_url=False,
+    retries=5,
+    backoff_factor=0.5,
+    no_raise_on_redirects=False,
+):
+    """Makes get requests.
+
+    Parameters
+    ----------
+    endpoint : str
+        The API endpoint.
+
+    params : dict
+        The get request parameters. (default is None)
+
+    headers : dict
+        Headers for the get request. (default is None)
+
+    return_full_url : bool
+        If True, return a (full_url, response) tuple instead of only the
+        response. (default is False)
+
+    retries : int
+        Maximum number of retries for the get request. (default is 5)
+
+    backoff_factor : float
+        Backoff factor that urllib3's Retry uses to compute the sleep time
+        between retries.
+        https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html
+        (default is 0.5)
+
+    no_raise_on_redirects : bool
+        If True, return None instead of raising WaybackError when the
+        request exceeds the redirect limit of requests (30 by default).
+        (default is False)
+
+
+    To handle WaybackError:
+    from waybackpy.exceptions import WaybackError
+
+    try:
+        ...
+    except WaybackError as e:
+        ...  # handle it
+    """
+
+    # From https://stackoverflow.com/a/35504626
+    # By https://stackoverflow.com/users/401467/datashaman
+
+    s = requests.Session()
+
+    retries = Retry(
+        total=retries,
+        backoff_factor=backoff_factor,
+        status_forcelist=[500, 502, 503, 504],
+    )
+
+    s.mount("https://", HTTPAdapter(max_retries=retries))
+
+    # The URL with parameters required for the get request
+    url = _full_url(endpoint, params)
+
+    try:
+
+        if not return_full_url:
+            return s.get(url, headers=headers)
+
+        return (url, s.get(url, headers=headers))
+
+    except Exception as e:
+
+        reason = str(e)
+
+        if no_raise_on_redirects:
+            if "Exceeded 30 redirects" in reason:
+                return
+
+        exc_message = "Error while retrieving {url}.\n{reason}".format(
+            url=url, reason=reason
+        )
+
+        exc = WaybackError(exc_message)
+        exc.__cause__ = e
+        raise exc
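To get a feel for how backoff_factor translates into waiting time, the sketch below computes the delay before each retry. It follows the formula documented for urllib3's Retry.get_backoff_time (no sleep before the first retry, then backoff_factor * 2 ** (n - 1), capped at 120 seconds by default); treat it as an approximation, since details such as jitter differ between urllib3 versions. Also note that only the status codes in status_forcelist (500, 502, 503, 504) and connection errors are retried, so a 509 response is not. backoff_schedule is a hypothetical helper.

```python
def backoff_schedule(retries, backoff_factor, backoff_max=120.0):
    """Approximate sleep (seconds) before each retry, per urllib3's documented formula.

    Assumption: 0 s before the first retry, then backoff_factor * 2 ** (n - 1)
    before the n-th retry, capped at backoff_max.
    """
    delays = []
    for n in range(1, retries + 1):
        delay = 0.0 if n <= 1 else float(backoff_factor) * (2 ** (n - 1))
        delays.append(min(backoff_max, delay))
    return delays


print(backoff_schedule(5, 0.5))  # [0.0, 1.0, 2.0, 4.0, 8.0]  (_get_response defaults)
print(backoff_schedule(5, 2))  # [0.0, 4.0, 8.0, 16.0, 32.0]  (as used by save())
```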
--- a/waybackpy/wrapper.py
+++ b/waybackpy/wrapper.py
@ -1,169 +1,508 @@
-# -*- coding: utf-8 -*-
-
 import re
-import sys
-import json
-from datetime import datetime
-from waybackpy.exceptions import WaybackError
-from waybackpy.__version__ import __version__
+from datetime import datetime, timedelta

-if sys.version_info >= (3, 0):  # If the python ver >= 3
-    from urllib.request import Request, urlopen
-    from urllib.error import URLError
-else:  # For python2.x
-    from urllib2 import Request, urlopen, URLError
+from .exceptions import WaybackError
+from .cdx import Cdx
+from .utils import (
+    _archive_url_parser,
+    _wayback_timestamp,
+    _get_response,
+    default_user_agent,
+    _url_check,
+    _cleaned_url,
+    _timestamp_manager,
+    _unix_timestamp_to_wayback_timestamp,
+    _latest_version,
+)

-default_UA = "waybackpy python package - https://github.com/akamhy/waybackpy"
-
-
-def _archive_url_parser(header):
-    """Parse out the archive from header."""
-    # Regex1
-    arch = re.search(
-        r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", str(header)
-    )
-    if arch:
-        return arch.group(1)
-    # Regex2
-    arch = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", str(header))
-    if arch:
-        return arch.group(1)
-    raise WaybackError(
-        "No archive URL found in the API response. "
-        "This version of waybackpy (%s) is likely out of date. Visit "
-        "https://github.com/akamhy/waybackpy for the latest version "
-        "of waybackpy.\nHeader:\n%s" % (__version__, str(header))
-    )
-
-
-def _wayback_timestamp(**kwargs):
-    """Return a formatted timestamp."""
-    return "".join(
-        str(kwargs[key]).zfill(2) for key in ["year", "month", "day", "hour", "minute"]
-    )
-
-
-def _get_response(req):
-    """Get response for the supplied request."""
-    try:
-        response = urlopen(req)  # nosec
-    except Exception:
-        try:
-            response = urlopen(req)  # nosec
-        except Exception as e:
-            exc = WaybackError("Error while retrieving %s" % req.full_url)
-            exc.__cause__ = e
-            raise exc
-    return response

 class Url:
-    """waybackpy Url object"""
+    """

-    def __init__(self, url, user_agent=default_UA):
+    Attributes
+    ----------
+    url : str
+        The input URL, wayback machine API operations are performed
+        on this URL after sanitizing it.
+
+    user_agent : str
+        The user_agent used while making the GET requests to the
+        Wayback machine APIs
+
+    _archive_url : str
+        Caches the last fetched archive.
+
+    timestamp : datetime.datetime
+        timestamp of the archive URL as datetime object for
+        greater usability
+
+    _JSON : dict
+        Caches the last fetched availability API data
+
+    latest_version : str
+        The latest version of waybackpy on PyPi
+
+    cached_save : bool
+        Flag to check if WayBack machine returned a cached
+        archive instead of creating a new archive. WayBack
+        machine allows only one archive of a URL in
+        30 minutes. If the archive returned by WayBack machine
+        is older than 3 minutes then this flag is set to True
+
+    Properties
+    ----------
+    JSON : dict
+        JSON response of availability API as dictionary / loaded JSON
+
+    archive_url : str
+        The archive URL
+
+    _timestamp : datetime.datetime
+        Sets the value of self.timestamp if still not set
+
+    Methods
+    -------
+    save()
+        Archives the URL on WayBack machine
+
+    get(url="", user_agent="", encoding="")
+        Gets the source of archive url, can also be used to get source
+        of any URL if passed into it.
+
+    near(year=None, month=None, day=None, hour=None, minute=None, unix_timestamp=None)
+        Wayback Machine can have many archives for a URL/webpage; sometimes
+        we want an archive close to a specific time.
+        This method takes year, month, day, hour, minute and unix_timestamp as input.
+
+    oldest(year=1994)
+        The oldest archive of a URL.
+
+    newest()
+        The newest archive of a URL.
+
+    total_archives(start_timestamp=None, end_timestamp=None)
+        Total number of archives of a URL; the timeframe can be confined by
+        start_timestamp and end_timestamp.
+
+    known_urls(subdomain=False, host=False, start_timestamp=None, end_timestamp=None, match_type="prefix")
+        Known URLs for a URL, its subdomains, the URL as a prefix, etc.
+
+    """
+
+    def __init__(self, url, user_agent=default_user_agent):
        self.url = url
-        self.user_agent = user_agent
-        self._url_check()  # checks url validity on init.
+        self.user_agent = str(user_agent)
+        _url_check(self.url)
+        self._archive_url = None
+        self.timestamp = None
+        self._JSON = None
+        self.latest_version = None
+        self.cached_save = False

    def __repr__(self):
-        return "waybackpy.Url(url=%s, user_agent=%s)" % (self.url, self.user_agent)
+        return "waybackpy.Url(url={url}, user_agent={user_agent})".format(
+            url=self.url, user_agent=self.user_agent
+        )

    def __str__(self):
-        return "%s" % self._clean_url()
+        if not self._archive_url:
+            self._archive_url = self.archive_url
+
+        return "{archive_url}".format(archive_url=self._archive_url)

    def __len__(self):
-        return len(self._clean_url())
+        """Number of days between today and the date of archive based on the timestamp

-    def _url_check(self):
-        """Check for common URL problems."""
-        if "." not in self.url:
-            raise URLError("'%s' is not a vaild URL." % self.url)
+        len() of waybackpy.wrapper.Url should return
+        the number of days between today and the
+        archive timestamp.

-    def _clean_url(self):
-        """Fix the URL, if possible."""
-        return str(self.url).strip().replace(" ", "_")
+        It can be applied to the return values of near() and the
+        methods built on it (e.g. oldest()). If applied to a
+        waybackpy.Url on which no method has been called, it reads
+        self._timestamp, which takes the timestamp from the JSON
+        property.
+        """
+        if not self.timestamp:
+            self.timestamp = self._timestamp
+
+        if self.timestamp == datetime.max:
+            # No archive exists; report the largest possible number of days.
+            return timedelta.max.days
+
+        return (datetime.utcnow() - self.timestamp).days
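__len__ therefore reduces to a subtraction of datetimes, with datetime.max acting as the "no archive" sentinel that maps to timedelta.max.days (999999999). A standalone sketch; days_since is hypothetical and takes now as a parameter (instead of calling datetime.utcnow()) so the result is reproducible:

```python
from datetime import datetime, timedelta


def days_since(timestamp, now):
    """Days between now and an archive timestamp, as Url.__len__ computes it."""
    if timestamp == datetime.max:  # sentinel used when no archive exists
        return timedelta.max.days  # 999999999
    # .days counts whole days only; the leftover hours are dropped.
    return (now - timestamp).days


now = datetime(2021, 9, 3, 12, 0)
print(days_since(datetime(2021, 9, 1, 18, 0), now))  # 1
print(days_since(datetime.max, now))  # 999999999
```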
+
+    @property
+    def JSON(self):
+        """Returns JSON response of availability API as dictionary / loaded JSON
+
+        return type : dict
+        """
+
+        # If user used the near method or any method that depends on near, we
+        # are certain that we have a loaded dictionary cached in self._JSON.
+        # Return the loaded JSON data.
+        if self._JSON:
+            return self._JSON
+
+        # If no cached data found, get data and return + cache it.
+        endpoint = "https://archive.org/wayback/available"
+        headers = {"User-Agent": self.user_agent}
+        payload = {"url": "{url}".format(url=_cleaned_url(self.url))}
+        response = _get_response(endpoint, params=payload, headers=headers)
+        self._JSON = response.json()
+        return self._JSON
+
+    @property
+    def archive_url(self):
+        """Return the archive url.
+
+        return type : str
+        """
+
+        if self._archive_url:
+            return self._archive_url
+
+        data = self.JSON
+
+        if not data["archived_snapshots"]:
+            archive_url = None
+        else:
+            archive_url = data["archived_snapshots"]["closest"]["url"]
+            archive_url = archive_url.replace(
+                "http://web.archive.org/web/", "https://web.archive.org/web/", 1
+            )
+        self._archive_url = archive_url
+        return archive_url
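The availability API tends to report the closest snapshot with an http:// archive URL, which archive_url rewrites to https://. Passing count=1 to str.replace limits the rewrite to the first occurrence, so an original URL embedded later in the string is not modified. A standalone sketch with illustrative data (closest_archive_url is a hypothetical helper):

```python
def closest_archive_url(data):
    """Pick the closest snapshot URL from availability API data, forcing https."""
    if not data["archived_snapshots"]:
        return None
    url = data["archived_snapshots"]["closest"]["url"]
    # count=1: only the leading archive prefix is rewritten.
    return url.replace("http://web.archive.org/web/", "https://web.archive.org/web/", 1)


data = {
    "archived_snapshots": {
        "closest": {
            "url": "http://web.archive.org/web/20210101000000/http://example.com/",
            "timestamp": "20210101000000",
        }
    }
}
print(closest_archive_url(data))  # https://web.archive.org/web/20210101000000/http://example.com/
print(closest_archive_url({"archived_snapshots": {}}))  # None
```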
+
+    @property
+    def _timestamp(self):
+        """Sets the value of self.timestamp if still not set.
+
+        Return type : datetime.datetime
+
+        """
+        return _timestamp_manager(self.timestamp, self.JSON)

    def save(self):
-        """Create a new Wayback Machine archive for this URL."""
-        request_url = "https://web.archive.org/save/" + self._clean_url()
-        hdr = {"User-Agent": "%s" % self.user_agent}  # nosec
-        req = Request(request_url, headers=hdr)  # nosec
-        header = _get_response(req).headers
-        return "https://" + _archive_url_parser(header)
+        """Saves (archives) the URL.
+
+        To save a webpage on the WayBack machine we
+        need to send a GET request to https://web.archive.org/save/
+
+        And to get the archive URL we are required to read the
+        header of the API response.
+
+        _get_response() takes care of the get requests.
+
+        _archive_url_parser() parses the archive from the header.
+
+        return type : waybackpy.wrapper.Url
+
+        """
+        request_url = "https://web.archive.org/save/" + _cleaned_url(self.url)
+        headers = {"User-Agent": self.user_agent}
+
+        response = _get_response(
+            request_url,
+            params=None,
+            headers=headers,
+            backoff_factor=2,
+            no_raise_on_redirects=True,
+        )
+
+        # Check for "too many sessions" before parsing: a 509 response carries
+        # no archive URL, so the parser would otherwise fail first.
+        if response is not None and response.status_code == 509:
+            raise WaybackError(
+                "Cannot save '{url}'. You have probably reached the limit of active "
+                "sessions. Try again later.".format(url=_cleaned_url(self.url))
+            )
+
+        if not self.latest_version:
+            self.latest_version = _latest_version("waybackpy", headers=headers)
+        if response:
+            res_headers = response.headers
+        else:
+            res_headers = "save redirected"
+        self._archive_url = "https://" + _archive_url_parser(
+            res_headers,
+            self.url,
+            latest_version=self.latest_version,
+            instance=self,
+            response=response,
+        )
+
+        m = re.search(
+            r"https?://web.archive.org/web/([0-9]{14})/http", self._archive_url
+        )
+        str_ts = m.group(1)
+        ts = datetime.strptime(str_ts, "%Y%m%d%H%M%S")
+        now = datetime.utcnow()
+        total_seconds = int((now - ts).total_seconds())
+
+        if total_seconds > 60 * 3:
+            self.cached_save = True
+
+        self.timestamp = ts
+
+        return self
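The cached_save decision in save() comes down to parsing the 14-digit timestamp out of the archive URL and comparing its age against three minutes. A standalone sketch; is_cached_save is hypothetical, takes now as a parameter for reproducibility, and uses a slightly looser pattern than save() (dots escaped, no trailing "http" required):

```python
import re
from datetime import datetime


def is_cached_save(archive_url, now, threshold_seconds=60 * 3):
    """True if the archive's timestamp is older than threshold_seconds (save() uses 3 minutes)."""
    m = re.search(r"https?://web\.archive\.org/web/([0-9]{14})/", archive_url)
    if not m:
        raise ValueError("no 14-digit timestamp in " + archive_url)
    ts = datetime.strptime(m.group(1), "%Y%m%d%H%M%S")
    return (now - ts).total_seconds() > threshold_seconds


url = "https://web.archive.org/web/20210903055636/https://example.com/"
print(is_cached_save(url, now=datetime(2021, 9, 3, 5, 58, 0)))  # False (84 s old)
print(is_cached_save(url, now=datetime(2021, 9, 3, 6, 30, 0)))  # True
```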

    def get(self, url="", user_agent="", encoding=""):
-        """Return the source code of the supplied URL.
-        If encoding is not supplied, it is auto-detected from the response.
+        """GET the source of archive or any other URL.
+
+        url : str, waybackpy.wrapper.Url
+            The method will return the source code of
+            this URL instead of last fetched archive.
+
+        user_agent : str
+            The user_agent for GET request to API
+
+        encoding : str
+            If user is using any other encoding that
+            can't be detected by response.encoding
+
+        Return the source code of the last fetched
+        archive URL if no URL is passed to this method
+        else it returns the source code of url passed.
+
+        If encoding is not supplied, it is auto-detected
+         from the response itself by requests package.
        """
-        if not url:
-            url = self._clean_url()
+
+        if not url and self._archive_url:
+            url = self._archive_url
+
+        elif not url and not self._archive_url:
+            url = _cleaned_url(self.url)
+
        if not user_agent:
            user_agent = self.user_agent

-        hdr = {"User-Agent": "%s" % user_agent}
-        req = Request(url, headers=hdr)  # nosec
-        response = _get_response(req)
+        headers = {"User-Agent": str(user_agent)}
+        response = _get_response(str(url), params=None, headers=headers)
+
        if not encoding:
            try:
-                encoding = response.headers["content-type"].split("charset=")[-1]
+                encoding = response.encoding or "UTF-8"
            except AttributeError:
                encoding = "UTF-8"
-        return response.read().decode(encoding.replace("text/html", "UTF-8", 1))

-    def near(self, year=None, month=None, day=None, hour=None, minute=None):
-        """ Return the closest Wayback Machine archive to the time supplied.
-            Supported params are year, month, day, hour and minute.
-            Any non-supplied parameters default to the current time.
+        return response.content.decode(encoding.replace("text/html", "UTF-8", 1))

+    def near(
+        self,
+        year=None,
+        month=None,
+        day=None,
+        hour=None,
+        minute=None,
+        unix_timestamp=None,
+    ):
        """
-        now = datetime.utcnow().timetuple()
-        timestamp = _wayback_timestamp(
-            year=year if year else now.tm_year,
-            month=month if month else now.tm_mon,
-            day=day if day else now.tm_mday,
-            hour=hour if hour else now.tm_hour,
-            minute=minute if minute else now.tm_min,
-        )
+        Parameters
+        ----------
+
+        year : int
+            Find an archive close to this year.
+
+        month : int
+            Find an archive close to this month.
+
+        day : int
+            Find an archive close to this day.
+
+        hour : int
+            Find an archive close to this hour.
+
+        minute : int
+            Find an archive close to this minute.
+
+        unix_timestamp : str, float or int
+            Find an archive close to this Unix timestamp. If supplied,
+            the other parameters are ignored.
+
+        The Wayback Machine can hold many archives of a webpage;
+        sometimes we want the one closest to a specific time.
+
+        This method takes year, month, day, hour and minute as input,
+        or a single unix_timestamp. The date and time inputs must be
+        integers. Any parameter not supplied defaults to the current
+        UTC time.
+
+        The input is converted to a Wayback Machine timestamp string by
+        _wayback_timestamp(), or by _unix_timestamp_to_wayback_timestamp()
+        when unix_timestamp is given.
+
+        The Wayback Machine's availability API
+        (https://archive.org/wayback/available)
+        is then queried for the archive closest to that timestamp.
+
+        If an archive is found, self._archive_url is set to its URL and
+        self.timestamp to its timestamp, and self._JSON is set to the
+        availability API response. If no archive is found, WaybackError
+        is raised.
+
+        Returns self.
+        """
+
+        if unix_timestamp:
+            timestamp = _unix_timestamp_to_wayback_timestamp(unix_timestamp)
+        else:
+            now = datetime.utcnow().timetuple()
+            timestamp = _wayback_timestamp(
+                year=year if year else now.tm_year,
+                month=month if month else now.tm_mon,
+                day=day if day else now.tm_mday,
+                hour=hour if hour else now.tm_hour,
+                minute=minute if minute else now.tm_min,
+            )
+
+        endpoint = "https://archive.org/wayback/available"
+        headers = {"User-Agent": self.user_agent}
+        payload = {
+            "url": _cleaned_url(self.url),
+            "timestamp": timestamp,
+        }
+        response = _get_response(endpoint, params=payload, headers=headers)
+        data = response.json()

-        request_url = "https://archive.org/wayback/available?url=%s&timestamp=%s" % (
-            self._clean_url(),
-            timestamp,
-        )
-        hdr = {"User-Agent": "%s" % self.user_agent}
-        req = Request(request_url, headers=hdr)  # nosec
-        response = _get_response(req)
-        data = json.loads(response.read().decode("UTF-8"))
        if not data["archived_snapshots"]:
            raise WaybackError(
-                "'%s' is not yet archived. Use wayback.Url(url, user_agent).save() "
-                "to create a new archive." % self._clean_url()
+                "Cannot find an archive for '{url}'. Try again later or use "
+                "wayback.Url(url, user_agent).save() to create a new archive."
+                "\nAPI response:\n{text}".format(
+                    url=_cleaned_url(self.url), text=response.text
+                )
            )
        archive_url = data["archived_snapshots"]["closest"]["url"]
-        # wayback machine returns http sometimes, idk why? But they support https
        archive_url = archive_url.replace(
            "http://web.archive.org/web/", "https://web.archive.org/web/", 1
        )
-        return archive_url
+
+        self._archive_url = archive_url
+        self.timestamp = datetime.strptime(
+            data["archived_snapshots"]["closest"]["timestamp"], "%Y%m%d%H%M%S"
+        )
+        self._JSON = data
+
+        return self

    def oldest(self, year=1994):
-        """Return the oldest Wayback Machine archive for this URL."""
+        """
+        Returns the earliest/oldest Wayback Machine archive for the webpage.
+
+        The Wayback Machine started archiving the web in the mid-1990s, so
+        no archive predates that; 1994 is used as the default year to look
+        for the oldest archive. Because near() returns the archive closest
+        to the requested time, asking for 1994 yields the oldest one.
+
+        This method simply passes the year to near() and returns its result.
+        """
+
        return self.near(year=year)

    def newest(self):
-        """Return the newest Wayback Machine archive available for this URL.
+        """Return the newest Wayback Machine archive available.
+
+        This returns self.near(), which defaults to the current UTC time.

        Due to Wayback Machine database lag, this may not always be the
        most recent archive.
+
+        return type : waybackpy.wrapper.Url
        """
+
        return self.near()

-    def total_archives(self):
-        """Returns the total number of Wayback Machine archives for this URL."""
-        hdr = {"User-Agent": "%s" % self.user_agent}
-        request_url = (
-            "https://web.archive.org/cdx/search/cdx?url=%s&output=json&fl=statuscode"
-            % self._clean_url()
+    def total_archives(self, start_timestamp=None, end_timestamp=None):
+        """Returns the total number of archives for a URL.
+
+        Parameters
+        ----------
+        start_timestamp : str
+            1 to 14 digit string of numbers; a full 14 digit
+            timestamp is not required.
+
+        end_timestamp : str
+            1 to 14 digit string of numbers; a full 14 digit
+            timestamp is not required.
+
+        return type : int
+
+        A webpage can have multiple archives on the Wayback Machine.
+        This method counts them using the CDX API, optionally limited
+        to the range between start_timestamp and end_timestamp.
+
+        """
+
+        cdx = Cdx(
+            _cleaned_url(self.url),
+            user_agent=self.user_agent,
+            start_timestamp=start_timestamp,
+            end_timestamp=end_timestamp,
        )
-        req = Request(request_url, headers=hdr)  # nosec
-        response = _get_response(req)
-        # Most efficient method to count number of archives (yet)
-        return str(response.read()).count(",")
+
+        # cdx.snapshots() is a generator, not a list, so len() can't be
+        # used; count the snapshots by consuming the generator.
+        return sum(1 for _ in cdx.snapshots())
+
+    def known_urls(
+        self,
+        subdomain=False,
+        host=False,
+        start_timestamp=None,
+        end_timestamp=None,
+        match_type="prefix",
+    ):
+        """Yields known URLs from the CDX API.
+
+        Parameters
+        ----------
+
+        subdomain : bool
+            If True, fetch subdomain URLs along with the host URLs
+            (sets match_type to "domain").
+
+        host : bool
+            If True, fetch only host URLs (sets match_type to "host").
+            Takes precedence over subdomain.
+
+        start_timestamp : str
+            1 to 14 digit string of numbers; a full 14 digit
+            timestamp is not required.
+
+        end_timestamp : str
+            1 to 14 digit string of numbers; a full 14 digit
+            timestamp is not required.
+
+        match_type : str
+            One of exact, prefix, host or domain.
+
+        yield type : str
+
+        Yields the URLs known to exist for the given input, matching
+        the input URL as a prefix by default. Results are collapsed by
+        urlkey, so duplicate URLs are not repeated.
+
+        Based on:
+        https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050
+        By Mohammed Diaa (https://github.com/mhmdiaa)
+        """
+
+        if subdomain:
+            match_type = "domain"
+        if host:
+            match_type = "host"
+
+        cdx = Cdx(
+            _cleaned_url(self.url),
+            user_agent=self.user_agent,
+            start_timestamp=start_timestamp,
+            end_timestamp=end_timestamp,
+            match_type=match_type,
+            collapses=["urlkey"],
+        )
+
+        for snapshot in cdx.snapshots():
+            yield snapshot.original
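The timestamp and response handling in `save()`, `near()` and `total_archives()` above can be exercised offline, without calling the Wayback Machine. The sketch below is a minimal, hypothetical illustration. The helper names (`archive_timestamp`, `is_cached_save`, `closest_snapshot`, `count_items`) are not part of waybackpy, and the sample payload is made up. It follows the availability API's `archived_snapshots` → `closest` → `url`/`timestamp` shape that `near()` reads.

```python
import re
from datetime import datetime

# Hypothetical helpers; waybackpy inlines this logic in Url.save(),
# Url.near() and Url.total_archives().

# Dots are escaped so "." matches only a literal dot.
ARCHIVE_TS_RE = re.compile(r"https?://web\.archive\.org/web/([0-9]{14})/http")
WAYBACK_TS_FORMAT = "%Y%m%d%H%M%S"


def archive_timestamp(archive_url):
    """Parse the 14-digit timestamp embedded in a Wayback archive URL."""
    m = ARCHIVE_TS_RE.search(archive_url)
    if m is None:
        raise ValueError("no 14-digit timestamp in {!r}".format(archive_url))
    return datetime.strptime(m.group(1), WAYBACK_TS_FORMAT)


def is_cached_save(ts, now, max_age_seconds=60 * 3):
    """Same rule as save(): an archive older than 3 minutes is a cached save."""
    return int((now - ts).total_seconds()) > max_age_seconds


def closest_snapshot(data):
    """Return (https_archive_url, timestamp) from an availability-API payload."""
    snapshots = data.get("archived_snapshots")
    if not snapshots:
        raise LookupError("no archive found")
    closest = snapshots["closest"]
    # The API may return http:// links; rewrite the prefix to https://.
    url = closest["url"].replace(
        "http://web.archive.org/web/", "https://web.archive.org/web/", 1
    )
    return url, datetime.strptime(closest["timestamp"], WAYBACK_TS_FORMAT)


def count_items(iterable):
    """Count the items of a generator without building a list."""
    return sum(1 for _ in iterable)


if __name__ == "__main__":
    sample = {  # made-up payload in the availability API's shape
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20130919044612/http://example.com/",
                "timestamp": "20130919044612",
                "status": "200",
            }
        }
    }
    url, ts = closest_snapshot(sample)
    print(url)  # https://web.archive.org/web/20130919044612/http://example.com/
    print(archive_timestamp(url) == ts)  # True
    print(is_cached_save(ts, datetime(2021, 9, 3)))  # True
    print(count_items(x for x in range(7)))  # 7
```

`is_cached_save` truncates to whole seconds before comparing, as `save()` does with `int(...total_seconds())`. An archive exactly 180 seconds old is therefore not treated as cached.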
Author	SHA1	Message	Date
akamhy	a7b805292d	changes made for v2.4.4 (update download_url) (#100 ) * v2.4.4 (update download_url) * v2.4.4 (update __version__) * +1 add jonasjancarik	2021-09-03 11:28:26 +05:30
Jonáš Jančařík	6dc6124dc4	Raise error on a 509 response (too many sessions) (#99 ) * Raise error on a 509 response (too many sessions) When the response code is 509, raise an error with an explanation (based on the actual error message contained in the response HTML). * Raise error on a 509 response (too many sessions) - linting	2021-09-03 08:04:36 +05:30
Jens Finkhaeuser	5a7fc7d568	Fix typo (#95 )	2021-04-13 16:58:34 +05:30
Akash Mahanty	5a9c861cad	v2.4.3 (#94 ) * 2.4.3 * 2.4.3	2021-04-02 10:41:59 +05:30
Akash Mahanty	dd1917c77e	added RedirectSaveError - for failed saves if the URL is a permanent … (#93 ) * added RedirectSaveError - for failed saves if the URL is a permanent redirect. * check if url is redirect before throwing exceptions, res.url is the redirect url if redirected at all * update tests and cli errors	2021-04-02 10:38:17 +05:30
Akash Mahanty	db8f902cff	Add doc strings (#90 ) * Added some docstrings in utils.py * renamed some func/meth to better names and added doc strings + lint * added more docstrings * more docstrings * improve docstrings * docstrings * added more docstrings, lint * fix import error	2021-01-26 11:56:03 +05:30
Akash Mahanty	88cda94c0b	v2.4.2 (#89 ) * v2.4.2 * v2.4.2	2021-01-24 17:03:35 +05:30
Akash Mahanty	09290f88d1	fix one more error	2021-01-24 16:58:53 +05:30
Akash Mahanty	e5835091c9	import re	2021-01-24 16:56:59 +05:30
Akash Mahanty	7312ed1f4f	set cached_save to True if archive older than 3 mins.	2021-01-24 16:53:36 +05:30
Akash Mahanty	6ae8f843d3	add --file to --known_urls	2021-01-24 16:15:11 +05:30
Akash Mahanty	36b936820b	known urls now yileds, more reliable. And save the file in chucks wrt to response. --file arg can be used to create output file, if --file not used no output will be saved in any file. (#88 )	2021-01-24 16:11:39 +05:30
Akash Mahanty	a3bc6aad2b	too much API usage by duplicate tests was causing too much tests failure	2021-01-23 21:08:21 +05:30
Akash Mahanty	edc2f63d93	Output valid JSON, dumps python dict. Make JSON valid.	2021-01-23 20:43:52 +05:30
Akash Mahanty	ffe0810b12	flag to check if the archive saved is 30 mins older or not	2021-01-16 12:06:08 +05:30
Akash Mahanty	40233eb115	improve code quality, remove unused imports, use system randomness etc	2021-01-16 11:35:13 +05:30
Akash Mahanty	d549d31421	improve save method, now we know that 302 errors indicates that wayback machine is archiving the URL and hasn't yet archived. We construct an artifical archive with the current UTC time and check for HTTP status code 20* or 30*. If we verify the archival, we return the artifical archive. The artificial archive will automatically point to the new archive or in best case will be the new archive after some time.	2021-01-16 10:47:43 +05:30
Akash Mahanty	0725163af8	mimify the logo, remove ugly old logos	2021-01-15 18:14:48 +05:30
Akash Mahanty	712471176b	better error messages(str), check latest version before asking for an upgrade and rm alive checking	2021-01-15 16:47:26 +05:30
Akash Mahanty	dcd7b03302	getting rid of c style str formatting, now using .format	2021-01-14 19:30:07 +05:30
Akash Mahanty	76205d9cf6	backoff_factor=2 for save, incr success by 25%	2021-01-13 10:13:16 +05:30
Akash Mahanty	ec0a0d04cc	+ dequeued0 dequeued0 (https://github.com/dequeued0) for reporting bugs and useful feature requests.	2021-01-12 10:52:41 +05:30
Akash Mahanty	7bb01df846	v2.4.1	2021-01-12 10:18:09 +05:30
Akash Mahanty	6142e0b353	get should retrive the last fetched archive by default	2021-01-12 10:07:14 +05:30
Akash Mahanty	a65990aee3	don't use pagination API if total pages <= 2	2021-01-12 09:46:07 +05:30
Akash Mahanty	259a024eb1	joke? they changed their robots.txt	2021-01-11 23:17:01 +05:30
Akash Mahanty	91402792e6	+ Supported Features tell what the package can do, many users probably do not read the full usage.	2021-01-11 23:01:18 +05:30
Akash Mahanty	eabf4dc046	don't fetch more pages if >=2 pages are empty	2021-01-11 22:43:14 +05:30
Akash Mahanty	5a7bd73565	support unix ts as an arg in near	2021-01-11 19:53:37 +05:30
Akash Mahanty	4693dbf9c1	change str repr of cdxsnapshot to cdx line	2021-01-11 09:34:37 +05:30
Akash Mahanty	f4f2e51315	V2.4.0 (#62 ) * v 2.4.0 * v 2.4.0	2021-01-10 11:53:45 +05:30
Akash Mahanty	d6b7df6837	no need to de-duplicate as we are collapsing the results by urlkey Same urls aren't recieved	2021-01-10 11:36:46 +05:30
Akash Mahanty	dafba5d0cb	collapses=["urlkey"] for known urls	2021-01-10 11:34:06 +05:30
Akash Mahanty	6c71dfbe41	use cdx matchtype for domain and host	2021-01-10 11:10:49 +05:30
Akash Mahanty	a6470b1036	not passing dict to cdxsnapshot	2021-01-10 10:40:32 +05:30
Akash Mahanty	04cda4558e	fix test	2021-01-10 03:18:09 +05:30
Akash Mahanty	625ed63482	remove asserts stmnts	2021-01-10 03:05:48 +05:30
Akash Mahanty	a03813315f	full cdx api support	2021-01-10 02:23:53 +05:30
Akash Mahanty	a2550f17d7	retries support for get requests	2021-01-06 01:58:38 +05:30
Akash Mahanty	15ef5816db	Always cast url to string, avoid passing waybackpy objects to _get_response	2021-01-05 19:46:17 +05:30
Akash Mahanty	93b52bd0fe	FIX : don't use self.user_agent if user_agent passed in get()	2021-01-05 19:31:27 +05:30
Akash Mahanty	28ff877081	Update README.md	2021-01-05 19:08:35 +05:30
Akash Mahanty	3e3ecff9df	l2 heading and lint	2021-01-05 01:59:29 +05:30
Akash Mahanty	ce64135ba8	ce	2021-01-05 01:52:35 +05:30
Akash Mahanty	2af6580ffb	docs link	2021-01-05 01:51:53 +05:30
Akash Mahanty	8a3c515176	v2.3.3	2021-01-05 01:49:26 +05:30
Akash Mahanty	d98c4f32ad	v2.3.3	2021-01-05 01:48:54 +05:30
Akash Mahanty	e0a4b007d5	improve docs	2021-01-05 01:46:12 +05:30
Akash Mahanty	6fb6b2deee	Update readme + new file CONTRIBUTORS.md (#59 ) * remove some badges * remove made with python button, obvious * - maintained badge, we already have latest commit badge - [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/akamhy/waybackpy/graphs/commit-activity) * re arranged order of badges * a bit more re odering * - release badge * - license section * center h1 * try once more' * removed the TOC * move the hr * Update README.md * + hr * h1 --> h2 * remove tests and pacakging info from here to docs/wiki * Update README.md * example inspired by psf/requests * CLI tool example gist * Update README.md * Update README.md * + license * Update README.md * authors list * Update CONTRIBUTORS.md * fix code * Update README.md * Update README.md * center the button	2021-01-05 00:30:07 +05:30
Akash Mahanty	1882862992	now using cdx Pagination API	2021-01-04 20:46:54 +05:30
Akash Mahanty	0c6107e675	increase coverage	2021-01-04 01:54:40 +05:30
Akash Mahanty	bd079978bf	inc coverage	2021-01-04 00:44:55 +05:30
Akash Mahanty	5dec4927cd	refactoring, try to code complexity	2021-01-04 00:14:38 +05:30
Akash Mahanty	62e5217b9e	reduce code complexity: refactoring, less flow breaking structures	2021-01-03 19:38:25 +05:30
Akash Mahanty	9823c809e9	Added doc strings in wrapper.py, documenting code and improving docs.	2021-01-03 17:11:32 +05:30
Akash Mahanty	db5737a857	JSON is now available for near and other other methods that call it	2021-01-02 18:52:46 +05:30
Akash Mahanty	ca0821a466	Wiki docs (#58 ) * move docs to wiki * Update README.md * Update setup.py	2021-01-02 12:20:43 +05:30
Akash Mahanty	bb4dbc7d3c	rm url = obj.url	2021-01-02 11:19:09 +05:30
Akash Mahanty	7c7fd75376	No need to fetch archive_url and timestamp from availability API on init (#55 ) * No need to fetch archive_url and timestamp from availability API on init. Not useful if all I want is to archive a page * Update test_wrapper.py * Update wrapper.py * Update test_wrapper.py * Update wrapper.py * Update cli.py * Update wrapper.py * Update __version__.py * Update __version__.py * Update __version__.py * Update __version__.py * Update setup.py * Update README.md	2021-01-02 11:10:23 +05:30
Akash Mahanty	0b71433667	v2.3.1 (#54 ) * 2.3.1 * 2.3.1	2021-01-01 19:15:23 +05:30
Akash Mahanty	1b499a7594	removed JSON from init, this was resulting in too much unnecessary taffic. Some users who are thousands of URLs were blocked by IA (#53 ) closes #52	2021-01-01 16:38:57 +05:30
Akash Mahanty	da390ee8a3	improve maintainability and reduce code cognitive complexity (#49 )	2020-12-15 10:24:13 +05:30
Akash Mahanty	d3e68d0e70	code formated with black (#47 )	2020-12-14 01:18:04 +05:30
Akash Mahanty	fde28d57aa	Update CONTRIBUTING.md	2020-12-14 00:16:29 +05:30
Akash Mahanty	6092e504c8	Update CONTRIBUTING.md	2020-12-14 00:15:51 +05:30
Akash Mahanty	93ef60ecd2	v2.3.0 (#46 ) * v2.3.0 * v2.3.0 * decrease line length	2020-12-14 00:14:54 +05:30
Akash Mahanty	461b3f74c9	UPDATE header image url	2020-12-13 23:09:59 +05:30
Akash Mahanty	3c53b411b0	Improve the appearance of readme (#45 ) * replaced text header wth image * svg * Update README.md * Update README.md * Update README.md * level 2 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Create CONTRIBUTING.md * Update README.md * Add files via upload * Update README.md * Delete waybackpy-colored 284.png * Delete waybackpy colored.png * Update README.md * Update index.rst * Update index.rst * Update index.rst * Update setup.py * Delete index.rst * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md	2020-12-13 23:08:16 +05:30
pyup.io bot	8125526061	create pyup.io config file (#44 )	2020-12-13 22:31:49 +05:30
Akash Mahanty	2dc81569a8	Create .pep8speaks.yml	2020-12-13 17:58:09 +05:30
Akash Mahanty	fd163f3d36	Update wrapper.py	2020-12-13 17:12:32 +05:30
Akash Mahanty	a0a918cf0d	.	2020-12-13 17:10:28 +05:30
Akash Mahanty	4943cf6873	remove print stmnt, update ci	2020-12-13 16:37:35 +05:30
Akash Mahanty	bc3efc7d63	now using requests lib as it handles errors nicely (#42 ) * now using requests lib as it handles errors nicely * remove unused import (urllib) * FIX : replaced full_url with endpoint (not using urlib) * LINT : Found in waybackpy\wrapper.py:88 Unnecessary else after return	2020-12-13 15:44:37 +05:30
Akash Mahanty	f89368f16d	LINT : Found in waybackpy\wrapper.py:88 Unnecessary else after return	2020-12-13 15:39:23 +05:30
Akash Mahanty	c919a6a605	FIX : replaced full_url with endpoint (not using urlib)	2020-12-13 15:22:56 +05:30
Akash Mahanty	0280fca189	remove unused import (urllib)	2020-12-13 15:13:51 +05:30
Akash Mahanty	60ee8b95a8	now using requests lib as it handles errors nicely	2020-12-13 15:05:57 +05:30
Akash Mahanty	ca51c14332	deleted .travis.yml, link with flake (#41 ) close #38	2020-11-26 13:06:50 +05:30
Akash Mahanty	525cf17c6f	Update ci.yml	2020-11-26 12:14:15 +05:30
Akash Mahanty	406e03c52f	Update ci.yml	2020-11-26 12:04:45 +05:30
Akash Mahanty	672b33e83a	Update ci.yml	2020-11-26 10:10:10 +05:30
Akash Mahanty	b19b840628	Update ci.yml	2020-11-26 10:01:55 +05:30
Akash Mahanty	a6df4f899c	Update ci.yml	2020-11-26 09:26:11 +05:30
Akash Mahanty	7686e9c20d	Update README.md (#40 )	2020-11-26 09:18:26 +05:30
Akash Mahanty	3c5932bc39	now using gh actions (#39 )	2020-11-26 09:09:53 +05:30
Akash Mahanty	f9a986f489	Create ci.yml	2020-11-26 08:55:23 +05:30
Akash Mahanty	0d7458ee90	per https://docs.travis-ci.com/user/languages/python/, Python builds are not available on the macOS	2020-11-26 08:08:59 +05:30
Akash Mahanty	ac8b9d6a50	use osx, huge backlog on .org travis for linux builds	2020-11-26 08:03:27 +05:30
Akash Mahanty	58cd9c28e7	Threading enabled checking for URLs	2020-11-26 06:15:42 +05:30
Akash Mahanty	5088305a58	removed python2 compatibility code	2020-11-21 17:00:11 +05:30
Akash Mahanty	9f847a5e55	change pepy.tech download count link, they removed the month page	2020-11-11 10:44:14 +05:30
Akash Mahanty	6c04c2f3d3	+ https://github.com/akamhy/waybackpy/graphs/contributors	2020-11-04 08:09:30 +05:30
Akash Mahanty	925be7b17e	V2.2.0	2020-10-17 17:10:46 +05:30
Akash Mahanty	2b132456ac	updated index.rst and minor docs updated.	2020-10-17 16:56:51 +05:30
Akash Mahanty	50e3154a4e	lint README.md	2020-10-17 12:01:49 +05:30
Akash Mahanty	7aef50428f	add link to the repo	2020-10-17 11:51:56 +05:30
Akash Mahanty	d8ec0f5025	More pythonic code snippets in README (#36 )	2020-10-17 11:49:27 +05:30
Akash Mahanty	0a2f97c034	Update README, drop python 2 support * Drop python 2 support * updated docs * added new docs	2020-10-16 22:37:32 +05:30
Akash Mahanty	3e9cf23578	3.9 archive doesn't not exist yet.	2020-10-16 19:43:06 +05:30
Akash Mahanty	7f927ec7be	added tests for json and archive_url, updated broken tests (#34 ) * added tests for json and archive_url, updated broken tests * drop 2.7 support	2020-10-16 19:25:45 +05:30
Akash Mahanty	9de6393cd5	Add support for JSON and archive_url (#33 ) CLI support for JSON and archive_url attributes	2020-10-16 15:16:18 +05:30
danvalen1	91e7f65617	Fixing len() bug (#32 ) * added class functionality * Update wrapper.py * style edits * fixed bug with len() of url() * fixing len() bug * fixing len() bug * squashing bug * removed test notebook	2020-10-16 10:04:13 +05:30
danvalen1	d465454019	Adding attributes to Url class (#28 ) * added class functionality * Update wrapper.py * style edits	2020-10-15 22:10:32 +05:30
Akash Mahanty	1a81eb97fb	lint	2020-10-03 16:58:11 +05:30
Akash Mahanty	6b3b2e2a7d	tests for newly added known_urls feature	2020-10-03 09:33:50 +05:30
Akash Mahanty	82c65454e6	2.1.9	2020-10-03 01:34:15 +05:30
Akash Mahanty	19710461b6	Update setup.py	2020-10-03 01:33:46 +05:30
Akash Mahanty	a3661d6b85	Update index.rst	2020-10-03 01:33:15 +05:30
Akash Mahanty	58375e4ef4	fix broken links	2020-10-03 01:31:28 +05:30
Akash Mahanty	ea023e98da	update	2020-10-03 01:22:51 +05:30
Akash Mahanty	f1065ed1c8	v2.1.8	2020-10-03 01:18:30 +05:30
Akash Mahanty	315519b21f	2.1.8	2020-10-03 01:18:08 +05:30
Akash Mahanty	07c98661de	add usage for known urls (#26 ) * Update README.md * Update README.md * Update README.md * bash example for known urls * python examples / usage for known urls :) * Update README.md * Update README.md * Update README.md * Update README.md	2020-10-03 01:16:19 +05:30
Akash Mahanty	2cd991a54e	lint markdown	2020-10-02 23:34:06 +05:30
Akash Mahanty	ede251afb3	update tests	2020-10-02 23:10:48 +05:30
Akash Mahanty	a8ce970ca0	fixed yet another issue with tests :(	2020-10-02 23:01:59 +05:30
Akash Mahanty	243af26bf6	update version format in tests	2020-10-02 22:23:58 +05:30
Akash Mahanty	0f1db94884	license & packaging info	2020-10-02 22:10:30 +05:30
Akash Mahanty	c304f58ea2	update tests	2020-10-02 21:35:39 +05:30
Akash Mahanty	23f7222cb5	tweak	2020-10-02 21:01:32 +05:30
Akash Mahanty	ce7294d990	Implemented new feature, known urls for domain.	2020-10-02 20:27:28 +05:30
Akash Mahanty	c9fa114d2e	grammar	2020-10-01 23:50:03 +05:30
Akash	8b6bacb28e	Add files via upload	2020-09-08 09:23:59 +05:30
Akash	32d8ad7780	Update README.md (#24 ) - IA and Wayback machine logo, added new waybackpy logo. + changed pages to webpages in lead	2020-09-08 09:12:48 +05:30
Akash	cbf2f90faa	Add files via upload	2020-09-08 09:06:33 +05:30
Akash	4dde3e3134	Delete a.txt	2020-09-08 09:02:36 +05:30
Akash	1551e8f1c6	Add files via upload	2020-09-08 09:02:19 +05:30
Akash	c84f09e2d2	Create a.txt	2020-09-08 08:59:28 +05:30
Akash	57a32669b5	v2.1.7	2020-08-09 11:06:29 +05:30
Akash	fe017cbcc8	v2.1.7	2020-08-09 11:06:04 +05:30
Akash	5edb03d24b	update docs	2020-08-09 11:05:04 +05:30
Akash	c5de2232ba	Update test_wrapper.py	2020-08-09 10:53:00 +05:30
Akash	ca9186c301	update message, sometimes raised for poor performance by wayback machine even if the url is archived.	2020-08-09 10:43:16 +05:30
Akash	8a4b631c13	new regex to parse archive, IA changed the header again :(	2020-08-09 10:36:25 +05:30
Akash	ec9ce92f48	Update README.md (#23 ) * Update README.md * fix grammar	2020-07-26 10:30:54 +05:30
Akash	e95d35c37f	re arrange the badges, moved contributions welcome to top	2020-07-26 10:24:31 +05:30
				`@ -0,0 +1 @@`
				<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 176.612 41.908" height="158.392" width="667.51" xmlns:v="https://github.com/akamhy/waybackpy"><text transform="matrix(.862888 0 0 1.158899 -.748 -98.312)" y="110.937" x="0.931" xml:space="preserve" font-weight="bold" font-size="28.149" font-family="sans-serif" letter-spacing="0" word-spacing="0" writing-mode="lr-tb" fill="#003dff"><tspan y="110.937" x="0.931"><tspan y="110.937" x="0.931" letter-spacing="3.568" writing-mode="lr-tb">waybackpy</tspan></tspan></text><path d="M.749 0h153.787v4.864H.749zm22.076 37.418h153.787v4.49H22.825z" fill="navy"/><path d="M0 37.418h22.825v4.49H0zM154.536 0h21.702v4.864h-21.702z" fill="#f0f"/></svg>