Compare commits

...

36 Commits
2.3.3 ... 2.4.2

Author SHA1 Message Date
88cda94c0b v2.4.2 (#89)
* v2.4.2

* v2.4.2
2021-01-24 17:03:35 +05:30
09290f88d1 fix one more error 2021-01-24 16:58:53 +05:30
e5835091c9 import re 2021-01-24 16:56:59 +05:30
7312ed1f4f set cached_save to True if archive older than 3 mins. 2021-01-24 16:53:36 +05:30
6ae8f843d3 add --file to --known_urls 2021-01-24 16:15:11 +05:30
36b936820b known urls now yields and is more reliable. Saves the file in chunks as the response streams in. The --file arg can be used to create an output file; if --file is not used, no output is saved to any file. (#88) 2021-01-24 16:11:39 +05:30
a3bc6aad2b too much API usage by duplicate tests was causing too many test failures 2021-01-23 21:08:21 +05:30
edc2f63d93 Output valid JSON: dump the Python dict so the JSON is valid. 2021-01-23 20:43:52 +05:30
ffe0810b12 flag to check whether the saved archive is more than 30 mins old 2021-01-16 12:06:08 +05:30
40233eb115 improve code quality, remove unused imports, use system randomness etc 2021-01-16 11:35:13 +05:30
d549d31421 improve save method, now we know that 302 errors indicate that the Wayback Machine is archiving the URL and hasn't yet archived it. We construct an artificial archive with the current UTC time and check for an HTTP status code of 20* or 30*. If we verify the archival, we return the artificial archive, which will automatically point to the new archive or, in the best case, will be the new archive after some time (see the sketch after the commit list). 2021-01-16 10:47:43 +05:30
0725163af8 minify the logo, remove ugly old logos 2021-01-15 18:14:48 +05:30
712471176b better error messages (str), check the latest version before asking for an upgrade, and remove alive checking 2021-01-15 16:47:26 +05:30
dcd7b03302 getting rid of c style str formatting, now using .format 2021-01-14 19:30:07 +05:30
76205d9cf6 backoff_factor=2 for save, incr success by 25% 2021-01-13 10:13:16 +05:30
ec0a0d04cc + dequeued0
dequeued0 (https://github.com/dequeued0) for reporting bugs and useful feature requests.
2021-01-12 10:52:41 +05:30
7bb01df846 v2.4.1 2021-01-12 10:18:09 +05:30
6142e0b353 get should retrieve the last fetched archive by default 2021-01-12 10:07:14 +05:30
a65990aee3 don't use pagination API if total pages <= 2 2021-01-12 09:46:07 +05:30
259a024eb1 joke? they changed their robots.txt 2021-01-11 23:17:01 +05:30
91402792e6 + Supported Features
tell what the package can do, many users probably do not read the full usage.
2021-01-11 23:01:18 +05:30
eabf4dc046 don't fetch more pages if >=2 pages are empty 2021-01-11 22:43:14 +05:30
5a7bd73565 support unix ts as an arg in near 2021-01-11 19:53:37 +05:30
4693dbf9c1 change str repr of cdxsnapshot to cdx line 2021-01-11 09:34:37 +05:30
f4f2e51315 V2.4.0 (#62)
* v 2.4.0

* v 2.4.0
2021-01-10 11:53:45 +05:30
d6b7df6837 no need to de-duplicate as we are collapsing the results by urlkey
Same URLs aren't received
2021-01-10 11:36:46 +05:30
dafba5d0cb collapses=["urlkey"] for known urls 2021-01-10 11:34:06 +05:30
6c71dfbe41 use cdx matchtype for domain and host 2021-01-10 11:10:49 +05:30
a6470b1036 not passing dict to cdxsnapshot 2021-01-10 10:40:32 +05:30
04cda4558e fix test 2021-01-10 03:18:09 +05:30
625ed63482 remove asserts stmnts 2021-01-10 03:05:48 +05:30
a03813315f full cdx api support 2021-01-10 02:23:53 +05:30
a2550f17d7 retries support for get requests 2021-01-06 01:58:38 +05:30
15ef5816db Always cast url to string, avoid passing waybackpy objects to _get_response 2021-01-05 19:46:17 +05:30
93b52bd0fe FIX : don't use self.user_agent if user_agent passed in get() 2021-01-05 19:31:27 +05:30
28ff877081 Update README.md 2021-01-05 19:08:35 +05:30
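
The save heuristic described in commit d549d31421 above is easiest to see in code. Below is a minimal sketch of the idea, with hypothetical helper names; it is not the actual waybackpy implementation:

```python
from datetime import datetime, timezone

import requests


def construct_artificial_archive(url):
    # Hypothetical helper: stamp an archive URL with the current UTC time.
    # The Wayback Machine resolves such a URL to the nearest capture, so it
    # will point to (or eventually be) the fresh archive.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    return "https://web.archive.org/web/{}/{}".format(timestamp, url)


def verify_archival(url, user_agent="my-unique-user-agent"):
    # A 302 from the save endpoint means the Wayback Machine is still busy
    # archiving the URL. Instead of failing, probe the artificial archive
    # and accept any 20*/30* status as evidence that the capture exists.
    artificial_archive = construct_artificial_archive(url)
    response = requests.get(artificial_archive, headers={"User-Agent": user_agent})
    if str(response.status_code)[:2] in ("20", "30"):
        return artificial_archive
    return None
```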
18 changed files with 1202 additions and 1033 deletions


@ -5,3 +5,5 @@
## ACKNOWLEDGEMENTS
- mhmdiaa (<https://github.com/mhmdiaa>) for <https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050>. known_urls is based on this gist.
- datashaman (<https://stackoverflow.com/users/401467/datashaman>) for <https://stackoverflow.com/a/35504626>. _get_response is based on this amazing answer.
- dequeued0 (<https://github.com/dequeued0>) for reporting bugs and useful feature requests.


@ -11,7 +11,6 @@
<a href="https://github.com/akamhy/waybackpy/actions?query=workflow%3ACI"><img alt="Build Status" src="https://github.com/akamhy/waybackpy/workflows/CI/badge.svg"></a>
<a href="https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=akamhy/waybackpy&amp;utm_campaign=Badge_Grade"><img alt="Codacy Badge" src="https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65"></a>
<a href="https://codecov.io/gh/akamhy/waybackpy"><img alt="codecov" src="https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg"></a>
<a href="https://codeclimate.com/github/akamhy/waybackpy/maintainability"><img alt="Maintainability" src="https://api.codeclimate.com/v1/badges/942f13d8177a56c1c906/maintainability"></a>
<a href="https://github.com/akamhy/waybackpy/blob/master/CONTRIBUTING.md"><img alt="Contributions Welcome" src="https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square"></a>
<a href="https://pepy.tech/project/waybackpy?versions=2*&versions=1*&versions=3*"><img alt="Downloads" src="https://pepy.tech/badge/waybackpy/month"></a>
<a href="https://github.com/akamhy/waybackpy/commits/master"><img alt="GitHub lastest commit" src="https://img.shields.io/github/last-commit/akamhy/waybackpy?color=blue&style=flat-square"></a>
@ -34,9 +33,19 @@ Install directly from GitHub:
pip install git+https://github.com/akamhy/waybackpy.git
```
### Supported Features
- Archive webpage
- Retrieve all archives of a webpage/domain
- Retrieve archive close to a date or timestamp
- Retrieve all archives which have a particular prefix
- Get source code of the archive easily
- CDX API support
### Usage
#### As a python package
#### As a Python package
```python
>>> import waybackpy
@ -46,21 +55,21 @@ pip install git+https://github.com/akamhy/waybackpy.git
>>> wayback = waybackpy.Url(url, user_agent)
>>> archive = wayback.save()
>>> str(archive)
>>> archive.archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
>>> archive.timestamp
datetime.datetime(2021, 1, 4, 17, 35, 12, 691741)
>>> oldest_archive = wayback.oldest()
>>> str(oldest_archive)
>>> oldest_archive.archive_url
'https://web.archive.org/web/20050422130129/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
>>> archive_close_to_2010_feb = wayback.near(year=2010, month=2)
>>> str(archive_close_to_2010_feb)
>>> archive_close_to_2010_feb.archive_url
'https://web.archive.org/web/20100215001541/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
>>> str(wayback.newest())
>>> wayback.newest().archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
```
> Full Python package documentation can be found at <https://github.com/akamhy/waybackpy/wiki/Python-package-docs>.
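
The feature list above now advertises CDX API support, which the usage example does not show. A minimal sketch of driving the new Cdx class, with the import path, parameters, and attributes taken from tests/test_cdx.py later in this diff and `user_agent` as defined in the session above; treat it as illustrative, not canonical documentation:

```python
>>> from waybackpy.cdx import Cdx
>>> cdx = Cdx(url="akamhy.github.io", user_agent=user_agent, limit=50)
>>> for snapshot in cdx.snapshots():
...     print(snapshot.archive_url)
```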
@ -81,14 +90,14 @@ https://web.archive.org/web/20201221130522/https://en.wikipedia.org/wiki/Remote_
$ waybackpy --total --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent"
1904
$ waybackpy --known_urls --url akamhy.github.io --user_agent "my-unique-user-agent"
$ waybackpy --known_urls --url akamhy.github.io --user_agent "my-unique-user-agent" --file
https://akamhy.github.io
https://akamhy.github.io/assets/js/scale.fix.js
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/
'akamhy.github.io-10-urls-m2a24y.txt' saved in current working directory
'akamhy.github.io-urls-iftor2.txt' saved in current working directory
```
> Full CLI documentation can be found at <https://github.com/akamhy/waybackpy/wiki/CLI-docs>.
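
Per the "known urls now yields" commit above, the Python-side lookup appears to be lazy in this release. A hedged sketch of consuming it from Python: the known_urls method exists on Url (see the wrapper tests below); its generator behavior is assumed from the commit message.

```python
>>> wayback = waybackpy.Url("akamhy.github.io", user_agent)
>>> for known_url in wayback.known_urls():  # assumed to yield lazily
...     print(known_url)
```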
@ -100,4 +109,3 @@ Released under the MIT License. See
-----------------------------------------------------------------------------------------------------------------------------------------------


@ -1,268 +0,0 @@
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="629.000000pt" height="103.000000pt" viewBox="0 0 629.000000 103.000000"
preserveAspectRatio="xMidYMid meet">
<g transform="translate(0.000000,103.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M0 515 l0 -515 3145 0 3145 0 0 515 0 515 -3145 0 -3145 0 0 -515z
m5413 439 c31 -6 36 -10 31 -26 -3 -10 0 -26 7 -34 6 -8 10 -17 7 -20 -3 -2
-17 11 -32 31 -15 19 -41 39 -59 44 -38 11 -10 14 46 5z m150 -11 c-7 -2 -21
-2 -30 0 -10 3 -4 5 12 5 17 0 24 -2 18 -5z m-4869 -23 c-6 -6 -21 -6 -39 -1
-30 9 -30 9 10 10 25 1 36 -2 29 -9z m452 -37 c-3 -26 -15 -65 -25 -88 -10
-22 -21 -64 -25 -94 -3 -29 -14 -72 -26 -95 -11 -23 -20 -51 -20 -61 0 -30
-39 -152 -53 -163 -6 -5 -45 -12 -85 -14 -72 -5 -102 4 -102 33 0 6 -9 31 -21
56 -11 25 -26 72 -33 103 -6 31 -17 64 -24 73 -8 9 -22 37 -32 64 l-18 48 -16
-39 c-9 -21 -16 -44 -16 -50 0 -6 -7 -24 -15 -40 -8 -16 -24 -63 -34 -106 -11
-43 -26 -93 -34 -112 -14 -34 -15 -35 -108 -46 -70 -9 -96 -9 -106 0 -21 17
-43 64 -43 92 0 14 -4 27 -9 31 -12 7 -50 120 -66 200 -8 35 -25 81 -40 103
-14 22 -27 52 -28 68 -2 28 0 29 48 31 28 1 82 5 120 9 54 4 73 3 82 -7 11
-15 53 -148 53 -170 0 -7 9 -32 21 -56 20 -41 39 -49 39 -17 0 8 -5 12 -10 9
-6 -3 -13 2 -16 12 -3 10 -10 26 -15 36 -14 26 7 21 29 -8 l20 -26 7 33 c7 35
41 149 56 185 7 19 16 23 56 23 27 0 80 2 120 6 80 6 88 1 97 -71 3 -20 9 -42
14 -48 5 -7 20 -43 32 -82 13 -38 24 -72 26 -74 2 -2 13 4 24 14 13 12 20 31
20 55 0 20 7 56 15 81 7 24 19 63 25 87 12 47 31 60 89 61 l34 1 -7 -47z
m3131 41 c17 -3 34 -12 37 -20 3 -7 1 -48 -4 -91 -4 -43 -7 -80 -4 -82 2 -2
11 2 20 10 9 7 24 18 34 24 9 5 55 40 101 77 79 64 87 68 136 68 28 0 54 -4
58 -10 3 -5 12 -7 20 -3 9 3 15 -1 15 -9 0 -13 -180 -158 -197 -158 -4 0 -14
-9 -20 -20 -11 -17 -7 -27 27 -76 22 -32 40 -63 40 -70 0 -7 6 -19 14 -26 7
-8 37 -48 65 -89 l52 -74 -28 -3 c-51 -5 -74 -12 -68 -22 9 -14 -59 -12 -73 2
-20 20 -13 30 10 14 34 -24 44 -19 17 8 -25 25 -109 140 -109 149 0 7 -60 97
-64 97 -2 0 -11 -10 -22 -22 -18 -21 -18 -21 0 -15 10 4 25 2 32 -4 18 -15 19
-35 2 -22 -7 6 -25 13 -39 17 -34 8 -39 -5 -39 -94 0 -38 -3 -75 -6 -84 -6
-16 -54 -22 -67 -9 -4 3 -40 7 -81 8 -101 2 -110 10 -104 97 3 37 10 73 16 80
6 8 10 77 10 174 0 89 2 166 6 172 6 11 162 15 213 6z m301 -1 c-25 -2 -52
-11 -58 -19 -7 -7 -17 -14 -23 -14 -5 0 -2 9 8 20 14 16 29 20 69 18 l51 -2
-47 -3z m809 -9 c33 -21 65 -89 62 -132 -1 -21 1 -47 5 -59 9 -28 -26 -111
-51 -120 -10 -3 -25 -12 -33 -19 -10 -8 -70 -15 -170 -21 l-155 -8 4 -73 c4
-93 -10 -112 -80 -112 -26 0 -60 5 -74 12 -19 8 -31 8 -51 -1 -45 -20 -55 -1
-55 98 0 47 -1 111 -3 141 -2 30 -5 107 -7 170 l-4 115 65 2 c36 2 103 7 150
11 150 15 372 13 397 -4z m338 -19 c11 -14 46 -54 78 -88 l58 -62 62 65 c34
36 75 73 89 83 28 18 113 24 122 9 3 -5 -32 -51 -77 -102 -147 -167 -134 -143
-139 -253 -3 -54 -10 -103 -16 -109 -8 -8 -8 -17 -1 -30 14 -26 11 -28 -47
-29 -119 -2 -165 3 -174 22 -6 10 -9 69 -8 131 l2 113 -57 75 c-32 41 -80 102
-107 134 -27 33 -47 62 -45 66 3 4 58 6 122 4 113 -3 119 -5 138 -29z m-4233
13 c16 -13 98 -150 98 -164 0 -4 29 -65 65 -135 36 -71 65 -135 65 -143 0 -10
-14 -17 -37 -21 -21 -4 -48 -10 -61 -16 -40 -16 -51 -10 -77 41 -29 57 -35 59
-157 38 -65 -11 -71 -14 -84 -43 -10 -25 -21 -34 -46 -38 -41 -6 -61 8 -48 33
15 28 12 38 -12 42 -18 2 -23 10 -24 36 -1 27 3 35 23 43 13 5 34 9 46 9 23 0
57 47 57 78 0 9 10 33 22 52 14 24 21 52 22 92 1 49 4 58 24 67 13 6 31 11 40
11 9 0 26 7 36 15 24 18 28 18 48 3z m1701 0 c16 -12 97 -143 97 -157 0 -3 32
-69 70 -146 39 -76 67 -142 62 -147 -4 -4 -28 -12 -52 -17 -25 -6 -57 -13 -72
-17 -25 -6 -29 -2 -50 42 -14 30 -31 50 -43 53 -11 2 -57 -2 -103 -9 -79 -12
-83 -13 -96 -45 -10 -24 -22 -34 -46 -38 -43 -9 -53 -1 -45 39 5 30 3 34 -15
34 -17 0 -20 6 -20 39 0 40 13 50 65 51 19 0 55 48 55 72 0 6 8 29 19 52 32
72 41 107 31 127 -8 14 -5 21 12 33 12 9 32 16 43 16 11 0 29 7 39 15 24 18
28 18 49 3z m-3021 -11 c-29 -9 -32 -13 -27 -39 8 -36 -11 -37 -20 -1 -8 32
15 54 54 52 24 -1 23 -2 -7 -12z m3499 4 c-12 -8 -51 -4 -51 5 0 2 15 4 33 4
22 0 28 -3 18 -9z m1081 -67 c2 -42 0 -78 -4 -81 -5 -2 -8 18 -8 45 0 27 -3
64 -6 81 -4 19 -2 31 4 31 6 0 12 -32 14 -76z m-1951 46 c12 -7 19 -21 19 -38
l-1 -27 -15 28 c-8 15 -22 27 -32 27 -9 0 -24 5 -32 10 -21 14 35 13 61 0z
m1004 -3 c73 -19 135 -61 135 -92 0 -15 -8 -29 -21 -36 -18 -9 -30 -6 -69 15
-37 20 -62 26 -109 26 -54 0 -62 -3 -78 -26 -21 -32 -33 -130 -25 -191 9 -58
41 -84 111 -91 38 -3 61 1 97 17 36 17 49 19 60 10 25 -21 15 -48 -28 -76 -38
-24 -54 -28 -148 -31 -114 -4 -170 10 -190 48 -6 11 -16 20 -23 20 -24 0 -59
95 -59 159 0 59 20 122 42 136 6 3 10 13 10 22 0 31 80 82 130 83 19 0 42 5
50 10 21 13 57 12 115 -3z m-1682 -23 c-14 -14 -28 -23 -31 -20 -8 8 29 46 44
46 7 0 2 -11 -13 -26z m159 -2 c-20 -15 -22 -23 -16 -60 4 -28 3 -42 -5 -42
-7 0 -11 19 -11 50 0 36 5 52 18 59 28 17 39 12 14 -7z m1224 -28 c-39 -40
-46 -38 -19 7 15 24 40 41 52 33 2 -2 -13 -20 -33 -40z m-1538 -33 l62 -66 63
68 c56 59 68 67 100 67 19 0 38 -3 40 -7 3 -5 -32 -53 -76 -108 -88 -108 -84
-97 -90 -255 l-2 -55 -87 -3 c-49 -1 -88 -1 -89 0 0 2 -3 50 -5 107 -3 75 -8
109 -19 121 -8 9 -15 20 -15 25 0 4 -18 29 -41 54 -83 94 -89 102 -84 111 3 6
45 9 93 9 l87 -1 63 -67z m786 59 c33 -12 48 -42 52 -107 3 -43 0 -57 -16 -73
l-20 -20 20 -28 c26 -35 35 -89 21 -125 -18 -46 -66 -60 -226 -64 -77 -3 -166
-7 -198 -10 -84 -7 -99 9 -97 102 1 38 -1 125 -4 191 l-5 122 47 5 c26 3 103
4 171 2 69 -2 134 1 145 5 29 12 80 12 110 0z m-1050 -16 c3 -8 2 -12 -4 -9
-6 3 -10 10 -10 16 0 14 7 11 14 -7z m-374 -22 c0 -9 -5 -24 -10 -32 -7 -11
-10 -5 -10 23 0 23 4 36 10 32 6 -3 10 -14 10 -23z m1701 16 c2 -21 -2 -43
-10 -51 -4 -4 -7 9 -8 28 -1 32 15 52 18 23z m2859 -28 c-11 -20 -50 -28 -50
-10 0 6 9 10 19 10 11 0 23 5 26 10 12 19 16 10 5 -10z m-4759 -47 c-8 -15
-10 -15 -11 -2 0 17 10 32 18 25 2 -3 -1 -13 -7 -23z m2599 9 c0 -9 -40 -35
-46 -29 -6 6 25 37 37 37 5 0 9 -3 9 -8z m316 -127 c-4 -19 -12 -37 -18 -41
-8 -5 -9 -1 -5 10 4 10 7 36 7 59 1 35 2 39 11 24 6 -10 8 -34 5 -52z m1942
38 c-15 -16 -30 -45 -33 -65 -4 -21 -12 -38 -17 -38 -19 0 3 74 30 103 14 15
30 27 36 27 5 0 -2 -12 -16 -27z m-3855 -16 c-6 -12 -15 -33 -20 -47 -9 -23
-10 -23 -15 -3 -3 12 3 34 14 52 23 35 37 34 21 -2z m3282 -82 c-23 -18 -81
-35 -115 -34 -17 1 -11 5 21 13 25 7 54 18 65 24 30 18 53 15 29 -3z m-2585
-130 c-7 -8 -19 -15 -27 -15 -10 0 -7 8 9 31 18 24 24 27 26 14 2 -9 -2 -22
-8 -30z m-1775 -5 c-4 -12 -9 -19 -12 -17 -3 3 -2 15 2 27 4 12 9 19 12 17 3
-3 2 -15 -2 -27z m820 -29 c-9 -8 -25 21 -25 44 0 16 3 14 15 -9 9 -16 13 -32
10 -35z m2085 47 c0 -17 -31 -48 -47 -48 -11 0 -8 8 9 29 24 32 38 38 38 19z
m-1655 -47 c-11 -10 -35 11 -35 30 0 21 0 21 19 -2 11 -13 18 -26 16 -28z
m1221 24 c13 -14 21 -25 18 -25 -11 0 -54 33 -54 41 0 15 12 10 36 -16z
m-1428 -7 c-3 -7 -18 -14 -34 -15 -20 -1 -22 0 -6 4 12 2 22 9 22 14 0 5 5 9
11 9 6 0 9 -6 7 -12z m3574 -45 c8 -10 6 -13 -11 -13 -18 0 -21 6 -20 38 0 34
1 35 10 13 5 -14 15 -31 21 -38z m-4097 14 c19 -4 19 -4 2 -12 -18 -7 -46 16
-47 39 0 6 6 3 13 -6 6 -9 21 -18 32 -21z m1700 1 c19 -5 19 -5 2 -13 -18 -7
-46 17 -46 40 0 6 5 3 12 -6 7 -9 21 -19 32 -21z m-1970 12 c-3 -5 -21 -9 -38
-9 l-32 2 35 7 c19 4 36 8 38 9 2 0 0 -3 -3 -9z m350 0 c-27 -12 -35 -12 -35
0 0 6 12 10 28 9 24 0 25 -1 7 -9z m1350 0 c-3 -5 -18 -9 -33 -9 l-27 1 30 8
c17 4 31 8 33 9 2 0 0 -3 -3 -9z m355 0 c-19 -13 -30 -13 -30 0 0 6 10 10 23
10 18 0 19 -2 7 -10z m-2324 -35 c-6 -22 -11 -25 -44 -24 -31 2 -32 3 -9 6 18
3 32 14 39 29 14 30 23 24 14 -11z m2839 16 c-14 -14 -73 -26 -60 -13 6 5 19
12 30 15 34 8 40 8 30 -2z m212 -21 l48 -8 -47 -1 c-56 -1 -78 6 -78 26 0 12
3 13 14 3 8 -6 36 -15 63 -20z m116 -1 c-6 -6 -18 -6 -28 -3 -18 7 -18 8 1 14
23 9 39 1 27 -11z m633 -14 c31 5 35 4 21 -5 -9 -6 -34 -10 -55 -8 -31 3 -37
7 -40 28 l-3 25 19 -23 c16 -20 24 -23 58 -17z m939 15 c16 -7 11 -9 -20 -9
-29 -1 -36 2 -25 9 17 11 19 11 45 0z m-5445 -24 c6 -8 21 -16 33 -18 19 -3
20 -4 5 -10 -12 -5 -27 1 -45 17 -16 13 -23 25 -17 25 6 0 17 -6 24 -14z m150
-76 c0 -11 -4 -20 -10 -20 -14 0 -13 -103 1 -117 21 -21 2 -43 -36 -43 -19 0
-35 5 -35 11 0 8 -5 7 -15 -1 -21 -17 -44 2 -28 22 22 26 20 128 -2 128 -8 0
-15 9 -15 19 0 18 8 20 70 20 63 0 70 -2 70 -19z m1189 -63 c17 -32 31 -62 31
-66 0 -14 -43 -21 -57 -9 -7 6 -29 12 -48 14 -26 2 -35 -1 -40 -16 -4 -12 -12
-17 -21 -13 -8 3 -13 12 -10 19 3 8 1 14 -4 14 -18 0 -10 22 9 27 22 6 43 46
35 67 -3 9 5 20 23 30 34 18 38 14 82 -67z m2146 -8 l34 -67 -25 -6 c-14 -4
-31 -3 -37 2 -7 5 -29 12 -49 16 -31 6 -38 4 -38 -9 0 -8 -7 -15 -15 -15 -8 0
-15 7 -15 15 0 8 -4 15 -10 15 -19 0 -10 21 14 30 16 6 27 20 31 40 4 18 16
41 27 52 26 26 40 14 83 -73z m-3205 51 c8 -10 20 -26 27 -36 10 -17 12 -14
12 19 1 36 2 37 37 37 l37 0 -8 -72 c-3 -40 -11 -76 -17 -79 -20 -13 -43 3
-62 42 -27 56 -34 56 -41 4 -7 -42 -9 -44 -34 -39 -35 9 -34 6 -35 71 -1 41 4
62 14 70 18 15 50 7 70 -17z m280 11 c-5 -11 -15 -21 -21 -23 -13 -4 -14 -101
-3 -120 5 -8 1 -9 -10 -5 -10 4 -29 7 -42 7 -22 0 -24 3 -24 55 0 52 -1 55
-26 55 -19 0 -25 5 -22 18 2 13 17 18 68 23 36 3 71 6 78 7 9 2 10 -3 2 -17z
m178 -3 c3 -15 -4 -18 -32 -18 -25 0 -36 -4 -36 -15 0 -10 11 -15 35 -15 24 0
35 -5 35 -15 0 -11 -11 -15 -41 -15 -55 0 -47 -24 9 -28 29 -2 42 -8 42 -18 0
-16 -25 -17 -108 -7 l-53 6 2 56 c3 92 1 90 77 88 55 -2 67 -5 70 -19z m230
10 c18 -18 14 -56 -7 -77 -17 -17 -18 -21 -5 -40 14 -19 13 -21 -4 -21 -10 0
-28 11 -40 25 -24 27 -52 24 -52 -5 0 -24 -9 -29 -43 -23 -26 5 -27 7 -27 73
0 45 4 70 13 73 26 11 153 7 165 -5z m557 -2 c47 -20 47 -40 0 -32 -53 10 -77
-7 -73 -52 l3 -37 48 1 c26 0 47 -3 47 -6 0 -35 -108 -42 -140 -10 -29 29 -27
94 5 125 28 28 60 31 110 11z m213 -8 c3 -15 -4 -18 -38 -18 -50 0 -51 -22 -1
-30 44 -7 44 -24 -1 -28 -54 -5 -52 -32 2 -32 29 0 40 -4 40 -15 0 -17 -28
-19 -104 -9 l-46 7 0 72 0 72 72 -1 c61 -1 73 -4 76 -18z m312 6 c0 -9 -9 -18
-21 -21 -19 -5 -20 -12 -17 -69 3 -63 3 -63 -22 -58 -49 11 -50 12 -50 64 0
43 -3 50 -20 50 -13 0 -20 7 -20 20 0 17 8 20 68 23 37 2 70 4 75 5 4 1 7 -5
7 -14z m155 6 c65 -15 94 -73 62 -125 -14 -24 -25 -28 -92 -33 -44 -3 -54 0
-78 24 -34 34 -36 82 -4 111 37 34 53 37 112 23z m505 -3 c0 -8 -9 -40 -20
-72 -11 -31 -18 -60 -16 -64 3 -4 -9 -8 -25 -9 -25 -2 -31 3 -51 45 l-22 47
-21 -46 c-17 -38 -25 -47 -51 -50 -24 -3 -30 0 -32 17 -1 12 -8 40 -17 64 -21
59 -20 61 20 61 27 0 35 -4 35 -17 0 -10 4 -24 9 -32 7 -11 13 -6 25 23 14 35
18 37 53 34 32 -2 39 -7 41 -28 6 -43 19 -43 36 -1 15 40 36 55 36 28z m136
-4 c27 -45 64 -115 64 -122 0 -13 -42 -22 -54 -12 -6 5 -28 11 -49 15 -32 6
-38 4 -45 -13 -8 -24 -26 -16 -36 16 -5 16 -2 25 13 32 11 6 25 28 32 48 17
55 53 71 75 36z m840 -4 c22 -18 16 -32 -11 -25 -59 15 -94 -18 -74 -71 8 -21
15 -24 47 -22 40 3 66 -7 57 -21 -3 -5 -12 -7 -20 -3 -8 3 -15 1 -15 -4 0 -17
-111 4 -126 24 -26 34 -13 100 25 131 18 14 96 9 117 -9z m816 -54 l37 -70
-25 -8 c-16 -6 -30 -5 -40 3 -22 19 -81 22 -88 4 -7 -19 -26 -18 -26 1 0 8 -4
15 -10 15 -20 0 -9 21 15 30 24 9 30 24 27 63 -1 10 2 16 7 13 5 -3 12 1 15
10 4 9 15 14 28 12 17 -2 33 -22 60 -73z m183 61 c47 -20 47 -40 0 -32 -46 9
-75 -7 -75 -42 0 -45 13 -56 59 -49 30 4 41 2 41 -8 0 -32 -95 -35 -134 -4
-30 24 -34 64 -11 109 22 43 60 51 120 26z m398 4 c19 0 24 -26 6 -32 -13 -4
-16 -42 -5 -84 l7 -32 -55 -1 c-57 0 -68 7 -41 29 17 14 21 90 5 90 -5 0 -10
10 -10 21 0 19 4 21 38 15 20 -3 45 -6 55 -6z m117 0 c5 0 17 -13 27 -30 9
-16 21 -30 25 -30 4 0 8 14 8 30 0 28 3 30 36 30 l36 0 -5 -71 c-2 -42 -9 -74
-17 -79 -15 -9 -50 -1 -50 12 0 5 -11 25 -24 45 l-24 35 -9 -42 c-4 -23 -11
-41 -15 -41 -5 1 -19 1 -32 1 -23 0 -23 2 -20 67 3 66 15 88 42 78 8 -3 18 -5
22 -5z m317 -3 c21 -15 4 -27 -38 -27 -50 0 -49 -23 1 -30 50 -8 51 -30 1 -30
-30 0 -41 -4 -41 -15 0 -11 12 -15 45 -15 33 0 45 -4 45 -15 0 -17 -24 -19
-108 -8 l-54 6 6 66 c3 36 5 69 6 72 0 11 124 7 137 -4z m-4374 -7 c9 0 17 -4
17 -10 0 -5 -16 -10 -35 -10 -28 0 -35 -4 -35 -19 0 -15 8 -21 35 -23 20 -2
35 -7 35 -13 0 -5 -15 -11 -35 -13 -30 -3 -35 -7 -35 -28 0 -18 -5 -24 -23
-24 -13 0 -28 -5 -33 -10 -7 -7 -11 9 -13 51 -1 35 -6 70 -11 79 -7 13 -2 16
28 18 20 2 39 5 41 8 3 3 15 3 26 0 11 -3 28 -6 38 -6z m1856 -14 c23 -21 38
-20 51 4 6 11 17 20 25 20 16 0 20 -16 6 -24 -17 -11 -50 -94 -44 -114 4 -18
0 -20 -34 -19 l-38 2 3 40 c3 33 -1 45 -22 64 -36 34 -34 53 5 47 17 -2 39
-12 48 -20z m299 -18 c-3 -24 -1 -55 3 -70 6 -24 4 -29 -14 -32 -41 -9 -155
-14 -163 -7 -5 3 -10 36 -12 73 l-2 67 67 4 c38 2 81 4 97 5 27 2 28 1 24 -40z
m512 22 c0 -11 4 -20 9 -20 4 0 20 9 34 20 25 20 57 27 57 12 0 -5 -14 -18
-30 -31 l-30 -22 26 -44 c24 -41 24 -45 7 -45 -10 0 -27 14 -37 31 -21 35 -40
34 -44 -4 -3 -22 -8 -27 -32 -27 -39 0 -43 11 -35 86 l7 64 34 0 c27 0 34 -4
34 -20z m511 12 c0 -4 1 -36 2 -72 l2 -65 -32 -3 c-28 -3 -32 0 -39 30 l-7 33
-14 -33 c-16 -40 -34 -41 -51 -2 -16 35 -35 31 -26 -6 6 -22 3 -24 -30 -24
l-36 0 -1 55 c-1 30 -2 61 -3 68 -1 7 14 13 34 15 33 3 38 -1 59 -39 l24 -42
18 24 c10 13 19 29 19 35 0 5 4 14 10 20 11 11 70 16 71 6z m509 -28 c0 -31 3
-35 23 -32 17 2 23 11 25 36 3 29 6 32 36 32 l34 0 1 -75 1 -75 -29 0 c-23 0
-30 5 -35 26 -5 19 -12 25 -29 22 -17 -2 -22 -10 -22 -30 1 -24 -2 -27 -25
-22 -45 10 -50 13 -50 33 0 11 -6 21 -12 24 -10 4 -10 7 0 18 6 7 12 25 12 39
0 34 7 40 42 40 25 0 28 -3 28 -36z"/>
<path d="M800 860 c30 -24 44 -25 36 -4 -3 9 -6 18 -6 20 0 2 -12 4 -27 4
l-28 0 25 -20z"/>
<path d="M310 850 c0 -5 5 -10 10 -10 6 0 10 5 10 10 0 6 -4 10 -10 10 -5 0
-10 -4 -10 -10z"/>
<path d="M366 851 c-8 -12 21 -34 33 -27 6 4 8 13 4 21 -6 17 -29 20 -37 6z"/>
<path d="M920 586 c0 -9 7 -16 16 -16 9 0 14 5 12 12 -6 18 -28 21 -28 4z"/>
<path d="M965 419 c-4 -6 -5 -13 -2 -16 7 -7 27 6 27 18 0 12 -17 12 -25 -2z"/>
<path d="M362 388 c3 -7 15 -14 29 -16 24 -4 24 -3 4 12 -24 19 -38 20 -33 4z"/>
<path d="M4106 883 c-14 -14 -5 -31 14 -26 11 3 20 9 20 13 0 10 -26 20 -34
13z"/>
<path d="M4590 870 c-14 -10 -22 -22 -18 -25 7 -8 57 25 58 38 0 12 -14 8 -40
-13z"/>
<path d="M4380 655 c7 -8 17 -15 22 -15 6 0 5 7 -2 15 -7 8 -17 15 -22 15 -6
0 -5 -7 2 -15z"/>
<path d="M4082 560 c-6 -11 -12 -28 -12 -37 0 -13 6 -10 20 12 11 17 20 33 20
38 0 14 -15 7 -28 -13z"/>
<path d="M4496 466 c3 -9 11 -16 16 -16 13 0 5 23 -10 28 -7 2 -10 -2 -6 -12z"/>
<path d="M4236 445 c-9 -24 5 -41 16 -20 7 11 7 20 0 27 -6 6 -12 3 -16 -7z"/>
<path d="M4540 400 c0 -5 5 -10 11 -10 5 0 7 5 4 10 -3 6 -8 10 -11 10 -2 0
-4 -4 -4 -10z"/>
<path d="M5330 891 c0 -11 26 -22 34 -14 3 3 3 10 0 14 -7 12 -34 11 -34 0z"/>
<path d="M4805 880 c-8 -13 4 -32 16 -25 12 8 12 35 0 35 -6 0 -13 -4 -16 -10z"/>
<path d="M5070 821 l-35 -6 0 -75 0 -75 40 -3 c22 -2 58 3 80 10 38 12 40 16
47 63 12 88 -16 107 -132 86z m109 -36 c3 -19 2 -19 -15 -4 -11 9 -26 19 -34
22 -8 4 -2 5 15 4 21 -1 31 -8 34 -22z"/>
<path d="M5411 694 c0 -11 3 -14 6 -6 3 7 2 16 -1 19 -3 4 -6 -2 -5 -13z"/>
<path d="M5223 674 c-10 -22 -10 -25 3 -20 9 3 18 6 20 6 2 0 4 9 4 20 0 28
-13 25 -27 -6z"/>
<path d="M5001 422 c-14 -27 -12 -35 8 -23 7 5 11 17 9 27 -4 17 -5 17 -17 -4z"/>
<path d="M5673 883 c9 -9 19 -14 23 -11 10 10 -6 28 -24 28 -15 0 -15 -1 1
-17z"/>
<path d="M5866 717 c-14 -10 -16 -16 -7 -22 15 -9 35 8 30 24 -3 8 -10 7 -23
-2z"/>
<path d="M5700 520 c0 -5 5 -10 10 -10 6 0 10 5 10 10 0 6 -4 10 -10 10 -5 0
-10 -4 -10 -10z"/>
<path d="M5700 451 c0 -23 25 -46 34 -32 4 6 -2 19 -14 31 -19 19 -20 19 -20
1z"/>
<path d="M1375 850 c-3 -5 -1 -10 4 -10 6 0 11 5 11 10 0 6 -2 10 -4 10 -3 0
-8 -4 -11 -10z"/>
<path d="M1391 687 c-5 -12 -7 -35 -6 -50 2 -15 -1 -27 -7 -27 -5 0 -6 9 -3
21 5 15 4 19 -4 15 -6 -4 -11 -18 -11 -30 0 -19 7 -25 33 -29 17 -2 42 1 55 7
l22 12 -27 52 c-29 57 -39 63 -52 29z"/>
<path d="M1240 520 c0 -5 5 -10 10 -10 6 0 10 5 10 10 0 6 -4 10 -10 10 -5 0
-10 -4 -10 -10z"/>
<path d="M1575 490 c4 -14 9 -27 11 -29 7 -7 34 9 34 20 0 7 -3 9 -7 6 -3 -4
-15 1 -26 10 -19 17 -19 17 -12 -7z"/>
<path d="M3094 688 c-4 -13 -7 -35 -6 -50 1 -16 -2 -28 -8 -28 -5 0 -6 7 -3
17 4 11 3 14 -5 9 -16 -10 -15 -49 1 -43 6 2 20 0 29 -4 10 -6 27 -5 41 2 28
13 26 30 -8 86 -24 39 -31 41 -41 11z"/>
<path d="M3270 502 c0 -19 29 -47 39 -37 6 7 1 16 -15 28 -13 10 -24 14 -24 9z"/>
<path d="M3570 812 c-13 -10 -21 -24 -19 -31 3 -7 15 0 34 19 31 33 21 41 -15
12z"/>
<path d="M3855 480 c-3 -5 -1 -10 4 -10 6 0 11 5 11 10 0 6 -2 10 -4 10 -3 0
-8 -4 -11 -10z"/>
<path d="M3585 450 c3 -5 13 -10 21 -10 8 0 12 5 9 10 -3 6 -13 10 -21 10 -8
0 -12 -4 -9 -10z"/>
<path d="M1880 820 c0 -5 7 -10 16 -10 8 0 12 5 9 10 -3 6 -10 10 -16 10 -5 0
-9 -4 -9 -10z"/>
<path d="M2042 668 c-7 -7 -12 -23 -12 -37 1 -24 2 -24 16 8 16 37 14 47 -4
29z"/>
<path d="M2015 560 c4 -6 11 -8 16 -5 14 9 11 15 -7 15 -8 0 -12 -5 -9 -10z"/>
<path d="M1915 470 c4 -6 11 -8 16 -5 14 9 11 15 -7 15 -8 0 -12 -5 -9 -10z"/>
<path d="M2320 795 c0 -14 5 -25 10 -25 6 0 10 11 10 25 0 14 -4 25 -10 25 -5
0 -10 -11 -10 -25z"/>
<path d="M2660 771 c0 -6 5 -13 10 -16 6 -3 10 1 10 9 0 9 -4 16 -10 16 -5 0
-10 -4 -10 -9z"/>
<path d="M2487 763 c-4 -3 -7 -23 -7 -43 0 -36 1 -38 40 -43 68 -9 116 20 102
61 -3 10 -7 10 -18 1 -11 -9 -14 -7 -14 10 0 18 -6 21 -48 21 -27 0 -52 -3
-55 -7z"/>
<path d="M2320 719 c0 -5 5 -7 10 -4 6 3 10 8 10 11 0 2 -4 4 -10 4 -5 0 -10
-5 -10 -11z"/>
<path d="M2480 550 l0 -40 66 1 c58 1 67 4 76 25 18 39 -4 54 -78 54 l-64 0 0
-40z m40 15 c-7 -8 -16 -15 -21 -15 -5 0 -6 7 -3 15 4 8 13 15 21 15 13 0 13
-3 3 -15z"/>
<path d="M2665 527 c-4 -10 -5 -21 -1 -24 10 -10 18 4 13 24 -4 17 -4 17 -12
0z"/>
<path d="M1586 205 c-9 -23 -8 -25 9 -25 17 0 19 9 6 28 -7 11 -10 10 -15 -3z"/>
<path d="M3727 200 c-3 -13 0 -20 9 -20 15 0 19 26 5 34 -5 3 -11 -3 -14 -14z"/>
<path d="M1194 229 c-3 -6 -2 -15 3 -20 13 -13 43 -1 43 17 0 16 -36 19 -46 3z"/>
<path d="M2470 224 c-18 -46 -12 -73 15 -80 37 -9 52 1 59 40 5 26 3 41 -8 51
-23 24 -55 18 -66 -11z"/>
<path d="M3120 196 c0 -9 7 -16 16 -16 17 0 14 22 -4 28 -7 2 -12 -3 -12 -12z"/>
<path d="M4750 201 c0 -12 5 -21 10 -21 6 0 10 6 10 14 0 8 -4 18 -10 21 -5 3
-10 -3 -10 -14z"/>
<path d="M3515 229 c-8 -12 14 -31 30 -26 6 2 10 10 10 18 0 17 -31 24 -40 8z"/>
<path d="M3521 161 c-7 -5 -9 -11 -4 -14 14 -9 54 4 47 14 -7 11 -25 11 -43 0z"/>
</g>
</svg>

Deleted logo assets: the SVG above (18 KiB) and a binary logo image (10 KiB, not shown).


@ -1,85 +1 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
id="svg8"
version="1.1"
viewBox="0 0 176.61171 41.907883"
height="41.907883mm"
width="176.61171mm">
<defs
id="defs2" />
<metadata
id="metadata5">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
transform="translate(-0.74835286,-98.31182)"
id="layer1">
<flowRoot
transform="scale(0.26458333)"
style="font-style:normal;font-weight:normal;font-size:40px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
id="flowRoot4598"
xml:space="preserve"><flowRegion
id="flowRegion4600"><rect
y="415.4129"
x="-38.183765"
height="48.08326"
width="257.38687"
id="rect4602" /></flowRegion><flowPara
id="flowPara4604"></flowPara></flowRoot> <text
transform="scale(0.86288797,1.158899)"
id="text4777"
y="110.93711"
x="0.93061"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:28.14887619px;line-height:4.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Bold';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#003dff;fill-opacity:1;stroke:none;stroke-width:7.51955223;stroke-miterlimit:4;stroke-dasharray:none"
xml:space="preserve"><tspan
style="stroke-width:7.51955223"
id="tspan4775"
y="110.93711"
x="0.93061"><tspan
id="tspan4773"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:28.14887619px;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Bold';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;letter-spacing:3.56786728px;writing-mode:lr-tb;text-anchor:start;fill:#003dff;fill-opacity:1;stroke-width:7.51955223;stroke-miterlimit:4;stroke-dasharray:none"
y="110.93711"
x="0.93061">waybackpy</tspan></tspan></text>
<rect
y="98.311821"
x="1.4967092"
height="4.8643045"
width="153.78688"
id="rect4644"
style="opacity:1;fill:#000080;fill-opacity:1;stroke:#00ff00;stroke-width:0;stroke-miterlimit:4;stroke-dasharray:none" />
<rect
style="opacity:1;fill:#000080;fill-opacity:1;stroke:#00ff00;stroke-width:0;stroke-miterlimit:4;stroke-dasharray:none"
id="rect4648"
width="153.78688"
height="4.490128"
x="23.573174"
y="135.72957" />
<rect
y="135.72957"
x="0.74835336"
height="4.4901319"
width="22.82482"
id="rect4650"
style="opacity:1;fill:#ff00ff;fill-opacity:1;stroke:#00ff00;stroke-width:0;stroke-miterlimit:4;stroke-dasharray:none" />
<rect
style="opacity:1;fill:#ff00ff;fill-opacity:1;stroke:#00ff00;stroke-width:0;stroke-miterlimit:4;stroke-dasharray:none"
id="rect4652"
width="21.702286"
height="4.8643003"
x="155.2836"
y="98.311821" />
</g>
</svg>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 176.612 41.908" height="158.392" width="667.51" xmlns:v="https://github.com/akamhy/waybackpy"><text transform="matrix(.862888 0 0 1.158899 -.748 -98.312)" y="110.937" x="0.931" xml:space="preserve" font-weight="bold" font-size="28.149" font-family="sans-serif" letter-spacing="0" word-spacing="0" writing-mode="lr-tb" fill="#003dff"><tspan y="110.937" x="0.931"><tspan y="110.937" x="0.931" letter-spacing="3.568" writing-mode="lr-tb">waybackpy</tspan></tspan></text><path d="M.749 0h153.787v4.864H.749zm22.076 37.418h153.787v4.49H22.825z" fill="navy"/><path d="M0 37.418h22.825v4.49H0zM154.536 0h21.702v4.864h-21.702z" fill="#f0f"/></svg>

Logo SVG minified: 3.6 KiB before, 694 B after.


@ -19,7 +19,7 @@ setup(
author=about["__author__"],
author_email=about["__author_email__"],
url=about["__url__"],
download_url="https://github.com/akamhy/waybackpy/archive/2.3.3.tar.gz",
download_url="https://github.com/akamhy/waybackpy/archive/2.4.2.tar.gz",
keywords=[
"Archive It",
"Archive Website",

tests/test_cdx.py (new file, 93 lines)

@ -0,0 +1,93 @@
import pytest

from waybackpy.cdx import Cdx
from waybackpy.exceptions import WaybackError


def test_all_cdx():
    url = "akamhy.github.io"
    user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, \
like Gecko) Chrome/45.0.2454.85 Safari/537.36"
    cdx = Cdx(
        url=url,
        user_agent=user_agent,
        start_timestamp=2017,
        end_timestamp=2020,
        filters=[
            "statuscode:200",
            "mimetype:text/html",
            "timestamp:20201002182319",
            "original:https://akamhy.github.io/",
        ],
        gzip=False,
        collapses=["timestamp:10", "digest"],
        limit=50,
        match_type="prefix",
    )
    snapshots = cdx.snapshots()
    for snapshot in snapshots:
        ans = snapshot.archive_url
        assert "https://web.archive.org/web/20201002182319/https://akamhy.github.io/" == ans

    url = "akahfjgjkmhy.gihthub.ip"
    cdx = Cdx(
        url=url,
        user_agent=user_agent,
        start_timestamp=None,
        end_timestamp=None,
        filters=[],
        match_type=None,
        gzip=True,
        collapses=[],
        limit=10,
    )
    snapshots = cdx.snapshots()
    print(snapshots)
    i = 0
    for _ in snapshots:
        i += 1
    assert i == 0

    url = "https://github.com/akamhy/waybackpy/*"
    cdx = Cdx(url=url, user_agent=user_agent, limit=50)
    snapshots = cdx.snapshots()
    for snapshot in snapshots:
        print(snapshot.archive_url)

    url = "https://github.com/akamhy/waybackpy"
    with pytest.raises(WaybackError):
        cdx = Cdx(url=url, user_agent=user_agent, limit=50, filters=["ghddhfhj"])
        snapshots = cdx.snapshots()

    with pytest.raises(WaybackError):
        cdx = Cdx(url=url, user_agent=user_agent, collapses=["timestamp", "ghdd:hfhj"])
        snapshots = cdx.snapshots()

    url = "https://github.com"
    cdx = Cdx(url=url, user_agent=user_agent, limit=50)
    snapshots = cdx.snapshots()
    c = 0
    for snapshot in snapshots:
        c += 1
        if c > 100:
            break

    url = "https://github.com/*"
    cdx = Cdx(url=url, user_agent=user_agent, collapses=["timestamp"])
    snapshots = cdx.snapshots()
    c = 0
    for snapshot in snapshots:
        c += 1
        if c > 30_529:  # default limit is 10k
            break

    url = "https://github.com/*"
    cdx = Cdx(url=url, user_agent=user_agent)
    c = 0
    snapshots = cdx.snapshots()
    for snapshot in snapshots:
        c += 1
        if c > 100_529:
            break


@ -12,37 +12,19 @@ from waybackpy.__version__ import __version__
def test_save():
args = argparse.Namespace(
user_agent=None,
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
oldest=False,
save=True,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
)
reply = cli.args_handler(args)
assert "pypi.org/user/akamhy" in str(reply)
args = argparse.Namespace(
user_agent=None,
url="https://hfjfjfjfyu6r6rfjvj.fjhgjhfjgvjm",
total=False,
version=False,
file=False,
oldest=False,
save=True,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -57,13 +39,13 @@ def test_json():
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=True,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -78,13 +60,13 @@ def test_archive_url():
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=True,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -99,13 +81,13 @@ def test_oldest():
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
file=False,
oldest=True,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -122,13 +104,13 @@ def test_oldest():
url=url,
total=False,
version=False,
file=False,
oldest=True,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -144,13 +126,13 @@ def test_newest():
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=True,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -167,13 +149,13 @@ def test_newest():
url=url,
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=True,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -189,13 +171,13 @@ def test_total_archives():
url="https://pypi.org/user/akamhy/",
total=True,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -208,42 +190,22 @@ def test_known_urls():
args = argparse.Namespace(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
(KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
url="https://akamhy.github.io",
url="https://www.keybr.com",
total=False,
version=False,
file=True,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=True,
subdomain=True,
subdomain=False,
known_urls=True,
get=None,
)
reply = cli.args_handler(args)
assert "github" in str(reply)
args = argparse.Namespace(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
(KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
url="https://akfyfufyjcujfufu6576r76r6amhy.gitd6r67r6u6hub.yfjyfjio",
total=False,
version=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=True,
subdomain=True,
known_urls=True,
get=None,
)
reply = cli.args_handler(args)
assert "No known URLs found" in str(reply)
assert "keybr" in str(reply)
def test_near():
@ -253,13 +215,13 @@ def test_near():
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=True,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -281,13 +243,13 @@ def test_near():
url=url,
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=True,
alive=False,
subdomain=False,
known_urls=False,
get=None,
@ -305,16 +267,16 @@ def test_get():
args = argparse.Namespace(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
(KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
url="https://pypi.org/user/akamhy/",
url="https://github.com/akamhy",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get="url",
@ -325,16 +287,16 @@ def test_get():
args = argparse.Namespace(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
(KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
url="https://pypi.org/user/akamhy/",
url="https://github.com/akamhy/waybackpy",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get="oldest",
@ -345,16 +307,16 @@ def test_get():
args = argparse.Namespace(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
(KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
url="https://pypi.org/user/akamhy/",
url="https://akamhy.github.io/waybackpy/",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get="newest",
@ -368,33 +330,13 @@ def test_get():
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
file=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get="save",
)
reply = cli.args_handler(args)
assert "waybackpy" in str(reply)
args = argparse.Namespace(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 \
(KHTML, like Gecko) Version/8.0.8 Safari/600.8.9",
url="https://pypi.org/user/akamhy/",
total=False,
version=False,
oldest=False,
save=False,
json=False,
archive_url=False,
newest=False,
near=False,
alive=False,
subdomain=False,
known_urls=False,
get="foobar",

tests/test_snapshot.py (new file, 40 lines)

@ -0,0 +1,40 @@
import pytest

from waybackpy.snapshot import CdxSnapshot, datetime


def test_CdxSnapshot():
    sample_input = "org,archive)/ 20080126045828 http://github.com text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415"
    prop_values = sample_input.split(" ")
    properties = {}
    (
        properties["urlkey"],
        properties["timestamp"],
        properties["original"],
        properties["mimetype"],
        properties["statuscode"],
        properties["digest"],
        properties["length"],
    ) = prop_values

    snapshot = CdxSnapshot(properties)

    assert properties["urlkey"] == snapshot.urlkey
    assert properties["timestamp"] == snapshot.timestamp
    assert properties["original"] == snapshot.original
    assert properties["mimetype"] == snapshot.mimetype
    assert properties["statuscode"] == snapshot.statuscode
    assert properties["digest"] == snapshot.digest
    assert properties["length"] == snapshot.length
    assert (
        datetime.strptime(properties["timestamp"], "%Y%m%d%H%M%S")
        == snapshot.datetime_timestamp
    )
    archive_url = (
        "https://web.archive.org/web/"
        + properties["timestamp"]
        + "/"
        + properties["original"]
    )
    assert archive_url == snapshot.archive_url
    assert sample_input == str(snapshot)

tests/test_utils.py (new file, 186 lines)

@ -0,0 +1,186 @@
import pytest
import json

from waybackpy.utils import (
    _cleaned_url,
    _url_check,
    _full_url,
    URLError,
    WaybackError,
    _get_total_pages,
    _archive_url_parser,
    _wayback_timestamp,
    _get_response,
    _check_match_type,
    _check_collapses,
    _check_filters,
    _ts,
)


def test_ts():
    timestamp = True
    data = {}
    assert _ts(timestamp, data)

    data = """
{"archived_snapshots": {"closest": {"timestamp": "20210109155628", "available": true, "status": "200", "url": "http://web.archive.org/web/20210109155628/https://www.google.com/"}}, "url": "https://www.google.com/"}
"""
    data = json.loads(data)
    assert data["archived_snapshots"]["closest"]["timestamp"] == "20210109155628"


def test_check_filters():
    filters = []
    _check_filters(filters)

    filters = ["statuscode:200", "timestamp:20215678901234", "original:https://url.com"]
    _check_filters(filters)

    with pytest.raises(WaybackError):
        _check_filters("not-list")


def test_check_collapses():
    collapses = []
    _check_collapses(collapses)

    collapses = ["timestamp:10"]
    _check_collapses(collapses)

    collapses = ["urlkey"]
    _check_collapses(collapses)

    collapses = "urlkey"  # NOT LIST
    with pytest.raises(WaybackError):
        _check_collapses(collapses)

    collapses = ["also illegal collapse"]
    with pytest.raises(WaybackError):
        _check_collapses(collapses)


def test_check_match_type():
    assert None == _check_match_type(None, "url")
    match_type = "exact"
    url = "test_url"
    assert None == _check_match_type(match_type, url)

    url = "has * in it"
    with pytest.raises(WaybackError):
        _check_match_type("domain", url)

    with pytest.raises(WaybackError):
        _check_match_type("not a valid type", "url")


def test_cleaned_url():
    test_url = " https://en.wikipedia.org/wiki/Network security "
    answer = "https://en.wikipedia.org/wiki/Network%20security"
    assert answer == _cleaned_url(test_url)


def test_url_check():
    good_url = "https://akamhy.github.io"
    assert None == _url_check(good_url)

    bad_url = "https://github-com"
    with pytest.raises(URLError):
        _url_check(bad_url)


def test_full_url():
    params = {}
    endpoint = "https://web.archive.org/cdx/search/cdx"
    assert endpoint == _full_url(endpoint, params)

    params = {"a": "1"}
    assert "https://web.archive.org/cdx/search/cdx?a=1" == _full_url(endpoint, params)
    assert "https://web.archive.org/cdx/search/cdx?a=1" == _full_url(
        endpoint + "?", params
    )

    params["b"] = 2
    assert "https://web.archive.org/cdx/search/cdx?a=1&b=2" == _full_url(
        endpoint + "?", params
    )

    params["c"] = "foo bar"
    assert "https://web.archive.org/cdx/search/cdx?a=1&b=2&c=foo%20bar" == _full_url(
        endpoint + "?", params
    )


def test_get_total_pages():
    user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
    url = "github.com*"
    assert 212890 <= _get_total_pages(url, user_agent)

    url = "https://zenodo.org/record/4416138"
    assert 2 >= _get_total_pages(url, user_agent)


def test_archive_url_parser():
    perfect_header = """
{'Server': 'nginx/1.15.8', 'Date': 'Sat, 02 Jan 2021 09:40:25 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'X-Archive-Orig-Server': 'nginx', 'X-Archive-Orig-Date': 'Sat, 02 Jan 2021 09:40:09 GMT', 'X-Archive-Orig-Transfer-Encoding': 'chunked', 'X-Archive-Orig-Connection': 'keep-alive', 'X-Archive-Orig-Vary': 'Accept-Encoding', 'X-Archive-Orig-Last-Modified': 'Fri, 01 Jan 2021 12:19:00 GMT', 'X-Archive-Orig-Strict-Transport-Security': 'max-age=31536000, max-age=0;', 'X-Archive-Guessed-Content-Type': 'text/html', 'X-Archive-Guessed-Charset': 'utf-8', 'Memento-Datetime': 'Sat, 02 Jan 2021 09:40:09 GMT', 'Link': '<https://www.scribbr.com/citing-sources/et-al/>; rel="original", <https://web.archive.org/web/timemap/link/https://www.scribbr.com/citing-sources/et-al/>; rel="timemap"; type="application/link-format", <https://web.archive.org/web/https://www.scribbr.com/citing-sources/et-al/>; rel="timegate", <https://web.archive.org/web/20200601082911/https://www.scribbr.com/citing-sources/et-al/>; rel="first memento"; datetime="Mon, 01 Jun 2020 08:29:11 GMT", <https://web.archive.org/web/20201126185327/https://www.scribbr.com/citing-sources/et-al/>; rel="prev memento"; datetime="Thu, 26 Nov 2020 18:53:27 GMT", <https://web.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/>; rel="memento"; datetime="Sat, 02 Jan 2021 09:40:09 GMT", <https://web.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/>; rel="last memento"; datetime="Sat, 02 Jan 2021 09:40:09 GMT"', 'Content-Security-Policy': "default-src 'self' 'unsafe-eval' 'unsafe-inline' data: blob: archive.org web.archive.org analytics.archive.org pragma.archivelab.org", 'X-Archive-Src': 'spn2-20210102092956-wwwb-spn20.us.archive.org-8001.warc.gz', 'Server-Timing': 'captures_list;dur=112.646325, exclusion.robots;dur=0.172010, exclusion.robots.policy;dur=0.158205, RedisCDXSource;dur=2.205932, esindex;dur=0.014647, LoadShardBlock;dur=82.205012, PetaboxLoader3.datanode;dur=70.750239, CDXLines.iter;dur=24.306278, load_resource;dur=26.520179', 'X-App-Server': 'wwwb-app200', 'X-ts': '200', 'X-location': 'All', 'X-Cache-Key': 'httpsweb.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/IN', 'X-RL': '0', 'X-Page-Cache': 'MISS', 'X-Archive-Screenname': '0', 'Content-Encoding': 'gzip'}
"""
    archive = _archive_url_parser(
        perfect_header, "https://www.scribbr.com/citing-sources/et-al/"
    )
    assert "web.archive.org/web/20210102094009" in archive

    header = """
vhgvkjv
Content-Location: /web/20201126185327/https://www.scribbr.com/citing-sources/et-al
ghvjkbjmmcmhj
"""
    archive = _archive_url_parser(
        header, "https://www.scribbr.com/citing-sources/et-al/"
    )
    assert "20201126185327" in archive

    header = """
hfjkfjfcjhmghmvjm
X-Cache-Key: https://web.archive.org/web/20171128185327/https://www.scribbr.com/citing-sources/et-al/US
yfu,u,gikgkikik
"""
    archive = _archive_url_parser(
        header, "https://www.scribbr.com/citing-sources/et-al/"
    )
    assert "20171128185327" in archive

    # The below header should result in Exception
    no_archive_header = """
{'Server': 'nginx/1.15.8', 'Date': 'Sat, 02 Jan 2021 09:42:45 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache', 'X-App-Server': 'wwwb-app52', 'X-ts': '523', 'X-RL': '0', 'X-Page-Cache': 'MISS', 'X-Archive-Screenname': '0'}
"""
    with pytest.raises(WaybackError):
        _archive_url_parser(
            no_archive_header, "https://www.scribbr.com/citing-sources/et-al/"
        )


def test_wayback_timestamp():
    ts = _wayback_timestamp(year=2020, month=1, day=2, hour=3, minute=4)
    assert "202001020304" in str(ts)


def test_get_response():
    endpoint = "https://www.google.com"
    user_agent = (
        "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
    )
    headers = {"User-Agent": "%s" % user_agent}
    response = _get_response(endpoint, params=None, headers=headers)
    assert response.status_code == 200

    endpoint = "http/wwhfhfvhvjhmom"
    with pytest.raises(WaybackError):
        _get_response(endpoint, params=None, headers=headers)

    endpoint = "https://akamhy.github.io"
    url, response = _get_response(
        endpoint, params=None, headers=headers, return_full_url=True
    )
    assert endpoint == url


@ -4,226 +4,29 @@ import random
import requests
from datetime import datetime
sys.path.append("..")
import waybackpy.wrapper as waybackpy # noqa: E402
from waybackpy.wrapper import Url
user_agent = "Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/20.0"
def test_cleaned_url():
"""No API use"""
test_url = " https://en.wikipedia.org/wiki/Network security "
answer = "https://en.wikipedia.org/wiki/Network_security"
target = waybackpy.Url(test_url, user_agent)
test_result = target._cleaned_url()
assert answer == test_result
def test_ts():
a = waybackpy.Url("https://google.com", user_agent)
ts = a._timestamp
assert str(datetime.utcnow().year) in str(ts)
def test_dunders():
"""No API use"""
url = "https://en.wikipedia.org/wiki/Network_security"
user_agent = "UA"
target = waybackpy.Url(url, user_agent)
assert "waybackpy.Url(url=%s, user_agent=%s)" % (url, user_agent) == repr(target)
assert "en.wikipedia.org" in str(target)
def test_url_check():
"""No API Use"""
broken_url = "http://wwwgooglecom/"
with pytest.raises(Exception):
waybackpy.Url(broken_url, user_agent)
def test_archive_url_parser():
"""No API Use"""
perfect_header = """
{'Server': 'nginx/1.15.8', 'Date': 'Sat, 02 Jan 2021 09:40:25 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'X-Archive-Orig-Server': 'nginx', 'X-Archive-Orig-Date': 'Sat, 02 Jan 2021 09:40:09 GMT', 'X-Archive-Orig-Transfer-Encoding': 'chunked', 'X-Archive-Orig-Connection': 'keep-alive', 'X-Archive-Orig-Vary': 'Accept-Encoding', 'X-Archive-Orig-Last-Modified': 'Fri, 01 Jan 2021 12:19:00 GMT', 'X-Archive-Orig-Strict-Transport-Security': 'max-age=31536000, max-age=0;', 'X-Archive-Guessed-Content-Type': 'text/html', 'X-Archive-Guessed-Charset': 'utf-8', 'Memento-Datetime': 'Sat, 02 Jan 2021 09:40:09 GMT', 'Link': '<https://www.scribbr.com/citing-sources/et-al/>; rel="original", <https://web.archive.org/web/timemap/link/https://www.scribbr.com/citing-sources/et-al/>; rel="timemap"; type="application/link-format", <https://web.archive.org/web/https://www.scribbr.com/citing-sources/et-al/>; rel="timegate", <https://web.archive.org/web/20200601082911/https://www.scribbr.com/citing-sources/et-al/>; rel="first memento"; datetime="Mon, 01 Jun 2020 08:29:11 GMT", <https://web.archive.org/web/20201126185327/https://www.scribbr.com/citing-sources/et-al/>; rel="prev memento"; datetime="Thu, 26 Nov 2020 18:53:27 GMT", <https://web.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/>; rel="memento"; datetime="Sat, 02 Jan 2021 09:40:09 GMT", <https://web.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/>; rel="last memento"; datetime="Sat, 02 Jan 2021 09:40:09 GMT"', 'Content-Security-Policy': "default-src 'self' 'unsafe-eval' 'unsafe-inline' data: blob: archive.org web.archive.org analytics.archive.org pragma.archivelab.org", 'X-Archive-Src': 'spn2-20210102092956-wwwb-spn20.us.archive.org-8001.warc.gz', 'Server-Timing': 'captures_list;dur=112.646325, exclusion.robots;dur=0.172010, exclusion.robots.policy;dur=0.158205, RedisCDXSource;dur=2.205932, esindex;dur=0.014647, LoadShardBlock;dur=82.205012, PetaboxLoader3.datanode;dur=70.750239, CDXLines.iter;dur=24.306278, load_resource;dur=26.520179', 'X-App-Server': 'wwwb-app200', 'X-ts': '200', 'X-location': 'All', 'X-Cache-Key': 'httpsweb.archive.org/web/20210102094009/https://www.scribbr.com/citing-sources/et-al/IN', 'X-RL': '0', 'X-Page-Cache': 'MISS', 'X-Archive-Screenname': '0', 'Content-Encoding': 'gzip'}
"""
archive = waybackpy._archive_url_parser(
perfect_header, "https://www.scribbr.com/citing-sources/et-al/"
)
assert "web.archive.org/web/20210102094009" in archive
header = """
vhgvkjv
Content-Location: /web/20201126185327/https://www.scribbr.com/citing-sources/et-al
ghvjkbjmmcmhj
"""
archive = waybackpy._archive_url_parser(
header, "https://www.scribbr.com/citing-sources/et-al/"
)
assert "20201126185327" in archive
header = """
hfjkfjfcjhmghmvjm
X-Cache-Key: https://web.archive.org/web/20171128185327/https://www.scribbr.com/citing-sources/et-al/US
yfu,u,gikgkikik
"""
archive = waybackpy._archive_url_parser(
header, "https://www.scribbr.com/citing-sources/et-al/"
)
assert "20171128185327" in archive
# The below header should result in Exception
no_archive_header = """
{'Server': 'nginx/1.15.8', 'Date': 'Sat, 02 Jan 2021 09:42:45 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache', 'X-App-Server': 'wwwb-app52', 'X-ts': '523', 'X-RL': '0', 'X-Page-Cache': 'MISS', 'X-Archive-Screenname': '0'}
"""
with pytest.raises(Exception):
waybackpy._archive_url_parser(
no_archive_header, "https://www.scribbr.com/citing-sources/et-al/"
)
def test_save():
# Test for urls that exist and can be archived.
url_list = [
"en.wikipedia.org",
"www.wikidata.org",
"commons.wikimedia.org",
"www.wiktionary.org",
"www.w3schools.com",
"www.ibm.com",
]
x = random.randint(0, len(url_list) - 1)
url1 = url_list[x]
target = waybackpy.Url(
url1,
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36",
)
archived_url1 = str(target.save())
assert url1 in archived_url1
# Test for urls that are incorrect.
with pytest.raises(Exception):
url2 = "ha ha ha ha"
waybackpy.Url(url2, user_agent)
url3 = "http://www.archive.is/faq.html"
with pytest.raises(Exception):
target = waybackpy.Url(
url3,
"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) "
"AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 "
"Safari/533.20.27",
)
target.save()
Url(broken_url, user_agent)
def test_near():
url = "google.com"
target = waybackpy.Url(
url,
"Mozilla/5.0 (Windows; U; Windows NT 6.0; de-DE) AppleWebKit/533.20.25 "
"(KHTML, like Gecko) Version/5.0.3 Safari/533.19.4",
)
archive_near_year = target.near(year=2010)
assert "2010" in str(archive_near_year.timestamp)
archive_near_month_year = str(target.near(year=2015, month=2).timestamp)
assert (
("2015-02" in archive_near_month_year)
or ("2015-01" in archive_near_month_year)
or ("2015-03" in archive_near_month_year)
)
target = waybackpy.Url(
"www.python.org",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246",
)
archive_near_hour_day_month_year = str(
target.near(year=2008, month=5, day=9, hour=15)
)
assert (
("2008050915" in archive_near_hour_day_month_year)
or ("2008050914" in archive_near_hour_day_month_year)
or ("2008050913" in archive_near_hour_day_month_year)
)
with pytest.raises(Exception):
NeverArchivedUrl = (
"https://ee_3n.wrihkeipef4edia.org/rwti5r_ki/Nertr6w_rork_rse7c_urity"
)
target = waybackpy.Url(NeverArchivedUrl, user_agent)
target = Url(NeverArchivedUrl, user_agent)
target.near(year=2010)
def test_oldest():
url = "github.com/akamhy/waybackpy"
target = waybackpy.Url(url, user_agent)
o = target.oldest()
assert "20200504141153" in str(o)
assert "2020-05-04" in str(o._timestamp)
def test_json():
url = "github.com/akamhy/waybackpy"
target = waybackpy.Url(url, user_agent)
target = Url(url, user_agent)
assert "archived_snapshots" in str(target.JSON)
def test_archive_url():
url = "github.com/akamhy/waybackpy"
target = waybackpy.Url(url, user_agent)
assert "github.com/akamhy" in str(target.archive_url)
def test_newest():
url = "github.com/akamhy/waybackpy"
target = waybackpy.Url(url, user_agent)
assert url in str(target.newest())
def test_get():
target = waybackpy.Url("google.com", user_agent)
assert "Welcome to Google" in target.get(target.oldest())
def test_wayback_timestamp():
ts = waybackpy._wayback_timestamp(year=2020, month=1, day=2, hour=3, minute=4)
assert "202001020304" in str(ts)
def test_get_response():
endpoint = "https://www.google.com"
user_agent = (
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
)
headers = {"User-Agent": "%s" % user_agent}
response = waybackpy._get_response(endpoint, params=None, headers=headers)
assert response.status_code == 200
def test_total_archives():
user_agent = (
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
)
target = waybackpy.Url(" https://outlook.com ", user_agent)
assert target.total_archives() > 80000
target = waybackpy.Url(
" https://gaha.e4i3n.m5iai3kip6ied.cima/gahh2718gs/ahkst63t7gad8 ", user_agent
)
assert target.total_archives() == 0
def test_known_urls():
target = waybackpy.Url("akamhy.github.io", user_agent)
assert len(target.known_urls(alive=True, subdomain=True)) > 2
target = waybackpy.Url("akamhy.github.io", user_agent)
assert len(target.known_urls()) > 3


@ -4,7 +4,7 @@ __description__ = (
"Archive pages and retrieve archived pages easily."
)
__url__ = "https://akamhy.github.io/waybackpy/"
__version__ = "2.3.3"
__version__ = "2.4.2"
__author__ = "akamhy"
__author_email__ = "akamhy@yahoo.com"
__license__ = "MIT"

waybackpy/cdx.py (new file, 214 lines)

@ -0,0 +1,214 @@
from .snapshot import CdxSnapshot
from .exceptions import WaybackError
from .utils import (
    _get_total_pages,
    _get_response,
    default_user_agent,
    _check_filters,
    _check_collapses,
    _check_match_type,
    _add_payload,
)

# TODO : Threading support for pagination API. It's designed for Threading.


class Cdx:
    def __init__(
        self,
        url,
        user_agent=None,
        start_timestamp=None,
        end_timestamp=None,
        filters=[],
        match_type=None,
        gzip=None,
        collapses=[],
        limit=None,
    ):
        self.url = str(url).strip()
        self.user_agent = str(user_agent) if user_agent else default_user_agent
        self.start_timestamp = str(start_timestamp) if start_timestamp else None
        self.end_timestamp = str(end_timestamp) if end_timestamp else None
        self.filters = filters
        _check_filters(self.filters)
        self.match_type = str(match_type).strip() if match_type else None
        _check_match_type(self.match_type, self.url)
        self.gzip = gzip if gzip else True
        self.collapses = collapses
        _check_collapses(self.collapses)
        self.limit = limit if limit else 5000
        self.last_api_request_url = None
        self.use_page = False
def cdx_api_manager(self, payload, headers, use_page=False):
"""
We have two ways to fetch snapshots: the pagination API and the
normal one, which queries CDX data sequentially using a resumption
key. This method selects between them. For very large queries
(for example domain queries), the pagination API can be useful
for running requests in parallel and for estimating the total
size of the query.
Read more about the pagination API at:
https://web.archive.org/web/20201228063237/https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md#pagination-api
If use_page is False we use the normal sequential query API,
otherwise the pagination API.
Two mutually exclusive cases are possible:
1) the pagination API is selected:
a) get the total number of pages to read, using _get_total_pages()
b) loop over all the pages and yield the response text
2) the normal sequential query API is selected:
a) pass showResumeKey=true to ask the API to append a resumption key
at the bottom of the response
b) if the response has fewer than three lines, yield the text as-is
c) if it has at least three lines, check the second-to-last line for zero length
d) if the second-to-last line is empty, the last line holds the
resumption key; store it and strip it from the text
e) if the second-to-last line is non-empty, yield the text, as there is
no resumption key
f) whenever a resumption key is found, set the "more" flag to True
(it is reset to False on every iteration); once no key is found,
"more" stays False, the loop stops and the generator returns.
"""
endpoint = "https://web.archive.org/cdx/search/cdx"
total_pages = _get_total_pages(self.url, self.user_agent)
# If we only have two pages of archives or fewer then we prefer accuracy;
# the pagination API can sometimes lag behind.
if use_page and total_pages >= 2:
blank_pages = 0
for i in range(total_pages):
payload["page"] = str(i)
url, res = _get_response(
endpoint, params=payload, headers=headers, return_full_url=True
)
self.last_api_request_url = url
text = res.text
if len(text) == 0:
blank_pages += 1
if blank_pages >= 2:
break
yield text
else:
payload["showResumeKey"] = "true"
payload["limit"] = str(self.limit)
resumeKey = None
more = True
while more:
if resumeKey:
payload["resumeKey"] = resumeKey
url, res = _get_response(
endpoint, params=payload, headers=headers, return_full_url=True
)
self.last_api_request_url = url
text = res.text.strip()
lines = text.splitlines()
more = False
if len(lines) >= 3:
second_last_line = lines[-2]
if len(second_last_line) == 0:
resumeKey = lines[-1].strip()
text = text.replace(resumeKey, "", 1).strip()
more = True
yield text
def snapshots(self):
"""
This function yields snapshots encapsulated
in CdxSnapshot objects for easier use.
All the GET request parameters are set when their conditions match.
If the input includes neither start_timestamp nor end_timestamp
and uses no collapses, we use the pagination API, as it returns
archives starting from the first one, with the most recent
archive on the last page.
"""
payload = {}
headers = {"User-Agent": self.user_agent}
_add_payload(self, payload)
if not (self.start_timestamp or self.end_timestamp):
self.use_page = True
if self.collapses != []:
self.use_page = False
texts = self.cdx_api_manager(payload, headers, use_page=self.use_page)
for text in texts:
if text.isspace() or len(text) <= 1 or not text:
continue
snapshot_list = text.split("\n")
for snapshot in snapshot_list:
if len(snapshot) < 46: # 14 + 32 (timestamp+digest)
continue
properties = {
"urlkey": None,
"timestamp": None,
"original": None,
"mimetype": None,
"statuscode": None,
"digest": None,
"length": None,
}
prop_values = snapshot.split(" ")
# Making sure that we get the same number of
# property values as the number of properties
prop_values_len = len(prop_values)
properties_len = len(properties)
if prop_values_len != properties_len:
raise WaybackError(
"Snapshot returned by Cdx API has {prop_values_len} properties instead of expected {properties_len} properties.\nInvolved Snapshot : {snapshot}".format(
prop_values_len=prop_values_len,
properties_len=properties_len,
snapshot=snapshot,
)
)
(
properties["urlkey"],
properties["timestamp"],
properties["original"],
properties["mimetype"],
properties["statuscode"],
properties["digest"],
properties["length"],
) = prop_values
yield CdxSnapshot(properties)
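
A minimal usage sketch of the Cdx class above (the user agent value is a placeholder):

from waybackpy.cdx import Cdx

cdx = Cdx(
    "akamhy.github.io",
    user_agent="my-user-agent",  # placeholder; supply a real UA string
    start_timestamp="2019",
    end_timestamp="2021",
)
for snapshot in cdx.snapshots():  # Cdx.snapshots() yields CdxSnapshot objects
    print(snapshot.archive_url)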

waybackpy/cli.py

@@ -1,12 +1,13 @@
import os
import re
import sys
import json
import random
import string
import argparse
from waybackpy.wrapper import Url
from waybackpy.exceptions import WaybackError
from waybackpy.__version__ import __version__
from .wrapper import Url
from .exceptions import WaybackError
from .__version__ import __version__
def _save(obj):
@@ -19,14 +20,15 @@ def _save(obj):
header = m.group(1)
if "No archive URL found in the API response" in e:
return (
"\n[waybackpy] Can not save/archive your link.\n[waybackpy] This\
could happen because either your waybackpy (%s) is likely out of\
date or Wayback Machine is malfunctioning.\n[waybackpy] Visit\
https://github.com/akamhy/waybackpy for the latest version of \
waybackpy.\n[waybackpy] API response Header :\n%s"
% (__version__, header)
"\n[waybackpy] Can not save/archive your link.\n[waybackpy] This "
"could happen because either your waybackpy ({version}) is likely out of "
"date or Wayback Machine is malfunctioning.\n[waybackpy] Visit "
"https://github.com/akamhy/waybackpy for the latest version of "
"waybackpy.\n[waybackpy] API response Header :\n{header}".format(
version=__version__, header=header
)
)
return WaybackError(err)
raise WaybackError(err)
def _archive_url(obj):
@@ -34,7 +36,7 @@ def _archive_url(obj):
def _json(obj):
return obj.JSON
return json.dumps(obj.JSON)
def no_archive_handler(e, obj):
@@ -45,11 +47,13 @@ def no_archive_handler(e, obj):
if "github.com/akamhy/waybackpy" in ua:
ua = "YOUR_USER_AGENT_HERE"
return (
"\n[Waybackpy] Can not find archive for '%s'.\n[Waybackpy] You can"
"\n[Waybackpy] Can not find archive for '{url}'.\n[Waybackpy] You can"
" save the URL using the following command:\n[Waybackpy] waybackpy --"
'user_agent "%s" --url "%s" --save' % (url, ua, url)
'user_agent "{user_agent}" --url "{url}" --save'.format(
url=url, user_agent=ua
)
)
return WaybackError(e)
raise WaybackError(e)
def _oldest(obj):
@@ -85,46 +89,57 @@ def _near(obj, args):
return no_archive_handler(e, obj)
def _save_urls_on_file(input_list, live_url_count):
m = re.search("https?://([A-Za-z_0-9.-]+).*", input_list[0])
domain = "domain-unknown"
if m:
domain = m.group(1)
def _save_urls_on_file(url_gen):
domain = None
sys_random = random.SystemRandom()
uid = "".join(
random.choice(string.ascii_lowercase + string.digits) for _ in range(6)
sys_random.choice(string.ascii_lowercase + string.digits) for _ in range(6)
)
url_count = 0
file_name = "%s-%d-urls-%s.txt" % (domain, live_url_count, uid)
file_content = "\n".join(input_list)
file_path = os.path.join(os.getcwd(), file_name)
with open(file_path, "w+") as f:
f.write(file_content)
return "%s\n\n'%s' saved in current working directory" % (file_content, file_name)
for url in url_gen:
url_count += 1
if not domain:
m = re.search("https?://([A-Za-z_0-9.-]+).*", url)
domain = "domain-unknown"
if m:
domain = m.group(1)
file_name = "{domain}-urls-{uid}.txt".format(domain=domain, uid=uid)
file_path = os.path.join(os.getcwd(), file_name)
if not os.path.isfile(file_path):
open(file_path, "w+").close()
with open(file_path, "a") as f:
f.write("{url}\n".format(url=url))
print(url)
if url_count > 0:
return "\n\n'{file_name}' saved in current working directory".format(
file_name=file_name
)
else:
return "No known URLs found. Please try a diffrent input!"
def _known_urls(obj, args):
"""
Known urls for a domain.
"""
# sd = subdomain
sd = False
if args.subdomain:
sd = True
# al = alive
al = False
if args.alive:
al = True
subdomain = True if args.subdomain else False
url_list = obj.known_urls(alive=al, subdomain=sd)
total_urls = len(url_list)
url_gen = obj.known_urls(subdomain=subdomain)
if total_urls > 0:
return _save_urls_on_file(url_list, total_urls)
return "No known URLs found. Please try a diffrent domain!"
if args.file:
return _save_urls_on_file(url_gen)
else:
for url in url_gen:
print(url)
return "\n"
def _get(obj, args):
@@ -148,12 +163,11 @@ def _get(obj, args):
def args_handler(args):
if args.version:
return "waybackpy version %s" % __version__
return "waybackpy version {version}".format(version=__version__)
if not args.url:
return (
"waybackpy %s \nSee 'waybackpy --help' for help using this tool."
% __version__
return "waybackpy {version} \nSee 'waybackpy --help' for help using this tool.".format(
version=__version__
)
obj = Url(args.url)
@@ -262,8 +276,12 @@ def add_knownUrlArg(knownUrlArg):
)
help_text = "Use with '--known_urls' to include known URLs for subdomains."
knownUrlArg.add_argument("--subdomain", "-sub", action="store_true", help=help_text)
help_text = "Only include live URLs. Will not inlclude dead links."
knownUrlArg.add_argument("--alive", "-a", action="store_true", help=help_text)
knownUrlArg.add_argument(
"--file",
"-f",
action="store_true",
help="Save the URLs in file at current directory.",
)
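# Example invocation combining the new flags (a sketch; the generated
# output file name varies per run):
#   waybackpy --url "akamhy.github.io" --known_urls --subdomain --file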
def add_nearArg(nearArg):

waybackpy/exceptions.py

@@ -1,6 +1,15 @@
"""
waybackpy.exceptions
~~~~~~~~~~~~~~~~~~~~
This module contains the set of Waybackpy's exceptions.
"""
class WaybackError(Exception):
"""
Raised when Wayback Machine API Service is unreachable/down.
Raised when Waybackpy can not return what you asked for.
1) Wayback Machine API Service is unreachable/down.
2) You passed illegal arguments.
"""

36
waybackpy/snapshot.py Normal file
View File

@@ -0,0 +1,36 @@
from datetime import datetime
class CdxSnapshot:
"""
This class encapsulates one CDX snapshot for easy use.
Raw snapshot data looks like:
org,archive)/ 20080126045828 http://github.com text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415
properties is a dict containing all 7 CDX snapshot properties.
"""
def __init__(self, properties):
self.urlkey = properties["urlkey"]
self.timestamp = properties["timestamp"]
self.datetime_timestamp = datetime.strptime(self.timestamp, "%Y%m%d%H%M%S")
self.original = properties["original"]
self.mimetype = properties["mimetype"]
self.statuscode = properties["statuscode"]
self.digest = properties["digest"]
self.length = properties["length"]
self.archive_url = (
"https://web.archive.org/web/" + self.timestamp + "/" + self.original
)
def __str__(self):
return "{urlkey} {timestamp} {original} {mimetype} {statuscode} {digest} {length}".format(
urlkey=self.urlkey,
timestamp=self.timestamp,
original=self.original,
mimetype=self.mimetype,
statuscode=self.statuscode,
digest=self.digest,
length=self.length,
)
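
For illustration, the raw CDX line from the docstring maps onto CdxSnapshot like this (field order: urlkey, timestamp, original, mimetype, statuscode, digest, length):

from waybackpy.snapshot import CdxSnapshot

line = (
    "org,archive)/ 20080126045828 http://github.com "
    "text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415"
)
keys = ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"]
snapshot = CdxSnapshot(dict(zip(keys, line.split(" "))))
print(snapshot.archive_url)  # https://web.archive.org/web/20080126045828/http://github.com
print(str(snapshot) == line)  # True: __str__ reproduces the raw cdx line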

waybackpy/utils.py Normal file

@@ -0,0 +1,389 @@
import re
import time
import requests
from .exceptions import WaybackError, URLError
from datetime import datetime
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter
from .__version__ import __version__
quote = requests.utils.quote
default_user_agent = "waybackpy python package - https://github.com/akamhy/waybackpy"
def _latest_version(package_name, headers):
endpoint = "https://pypi.org/pypi/" + package_name + "/json"
json = _get_response(endpoint, headers=headers).json()
return json["info"]["version"]
def _unix_ts_to_wayback_ts(unix_ts):
return datetime.utcfromtimestamp(int(unix_ts)).strftime("%Y%m%d%H%M%S")
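# Illustration (assuming the well-known round epoch 1600000000,
# i.e. 2020-09-13 12:26:40 UTC):
# _unix_ts_to_wayback_ts(1600000000) returns "20200913122640".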
def _add_payload(instance, payload):
if instance.start_timestamp:
payload["from"] = instance.start_timestamp
if instance.end_timestamp:
payload["to"] = instance.end_timestamp
if instance.gzip != True:
payload["gzip"] = "false"
if instance.match_type:
payload["matchType"] = instance.match_type
if instance.filters and len(instance.filters) > 0:
for i, f in enumerate(instance.filters):
payload["filter" + str(i)] = f
if instance.collapses and len(instance.collapses) > 0:
for i, f in enumerate(instance.collapses):
payload["collapse" + str(i)] = f
payload["url"] = instance.url
def _ts(timestamp, data):
"""
Get the timestamp of the last fetched archive.
If used before fetching any archive, it will
fall back to whatever self.JSON returns.
A timestamp of None implies that
self.JSON may return any archive's JSON
that the Wayback Machine provides.
"""
if timestamp:
return timestamp
if not data["archived_snapshots"]:
return datetime.max
return datetime.strptime(
data["archived_snapshots"]["closest"]["timestamp"], "%Y%m%d%H%M%S"
)
def _check_match_type(match_type, url):
if not match_type:
return
if "*" in url:
raise WaybackError("Can not use wildcard with match_type argument")
legal_match_type = ["exact", "prefix", "host", "domain"]
if match_type not in legal_match_type:
exc_message = "{match_type} is not an allowed match type.\nUse one from 'exact', 'prefix', 'host' or 'domain'".format(
match_type=match_type
)
raise WaybackError(exc_message)
def _check_collapses(collapses):
if not isinstance(collapses, list):
raise WaybackError("collapses must be a list.")
if len(collapses) == 0:
return
for collapse in collapses:
try:
match = re.search(
r"(urlkey|timestamp|original|mimetype|statuscode|digest|length)(:?[0-9]{1,99})?",
collapse,
)
field = match.group(1)
N = None
if 2 == len(match.groups()):
N = match.group(2)
if N:
if not (field + N == collapse):
raise Exception
else:
if not (field == collapse):
raise Exception
except Exception:
exc_message = "collapse argument '{collapse}' is not following the cdx collapse syntax.".format(
collapse=collapse
)
raise WaybackError(exc_message)
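# For example, _check_collapses(["urlkey"]) and
# _check_collapses(["timestamp:10"]) pass silently, while
# _check_collapses(["foo"]) raises WaybackError.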
def _check_filters(filters):
if not isinstance(filters, list):
raise WaybackError("filters must be a list.")
# [!]field:regex
for _filter in filters:
try:
match = re.search(
r"(\!?(?:urlkey|timestamp|original|mimetype|statuscode|digest|length)):(.*)",
_filter,
)
key = match.group(1)
val = match.group(2)
except Exception:
exc_message = (
"Filter '{_filter}' not following the cdx filter syntax.".format(
_filter=_filter
)
)
raise WaybackError(exc_message)
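# For example, _check_filters(["statuscode:200", "!mimetype:image/png"])
# passes, while _check_filters(["status:200"]) raises WaybackError.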
def _cleaned_url(url):
return str(url).strip().replace(" ", "%20")
def _url_check(url):
"""
Check for common URL problems.
What we are checking:
1) the URL must contain a '.'; URLs without one are rejected.
If you know of any other checks worth adding, please create a PR on the github repo.
"""
if "." not in url:
exc_message = "'{url}' is not a valid URL.".format(url=url)
raise URLError(exc_message)
def _full_url(endpoint, params):
full_url = endpoint
if params:
full_url = endpoint if endpoint.endswith("?") else (endpoint + "?")
for key, val in params.items():
key = "filter" if key.startswith("filter") else key
key = "collapse" if key.startswith("collapse") else key
amp = "" if full_url.endswith("?") else "&"
full_url = (
full_url + amp + "{key}={val}".format(key=key, val=quote(str(val)))
)
return full_url
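# For example (note the filterN/collapseN renaming above):
# _full_url("https://web.archive.org/cdx/search/cdx",
#           {"url": "github.com", "filter0": "statuscode:200"})
# returns
# "https://web.archive.org/cdx/search/cdx?url=github.com&filter=statuscode%3A200"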
def _get_total_pages(url, user_agent):
"""
If showNumPages is passed to the CDX API, it returns
the number of archive pages, and each page holds many archives.
This function returns the number of pages of archives (an int).
"""
total_pages_url = (
"https://web.archive.org/cdx/search/cdx?url={url}&showNumPages=true".format(
url=url
)
)
headers = {"User-Agent": user_agent}
return int((_get_response(total_pages_url, headers=headers).text).strip())
def _archive_url_parser(header, url, latest_version=__version__, instance=None):
"""
The Wayback Machine's save API doesn't
return a JSON response; we are required
to read the headers of the API response
and look for the archive URL.
This method applies several regexes
that search for the archive URL in the header.
It is used when you try to
save a webpage on the Wayback Machine.
Two cases are possible:
1) we find the archive URL in
the header, or
2) we do not find it.
If we find the archive URL we return it, in the format:
web.archive.org/web/<TIMESTAMP>/<URL>
If we cannot find it, we raise
WaybackError with an error message.
"""
if "save redirected" in header and instance:
time.sleep(60)  # make up for archiving time
now = datetime.utcnow().timetuple()
timestamp = _wayback_timestamp(
year=now.tm_year,
month=now.tm_mon,
day=now.tm_mday,
hour=now.tm_hour,
minute=now.tm_min,
)
return_str = "web.archive.org/web/{timestamp}/{url}".format(
timestamp=timestamp, url=url
)
url = "https://" + return_str
headers = {"User-Agent": instance.user_agent}
res = _get_response(url, headers=headers)
if res.status_code < 400:
return return_str
# Regex1
m = re.search(r"Content-Location: (/web/[0-9]{14}/.*)", str(header))
if m:
return "web.archive.org" + m.group(1)
# Regex2
m = re.search(
r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", str(header)
)
if m:
return m.group(1)
# Regex3
m = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", str(header))
if m:
return m.group(1)
if instance:
newest_archive = None
try:
newest_archive = instance.newest()
except WaybackError:
pass # We don't care as this is a save request
if newest_archive:
minutes_old = (
datetime.utcnow() - newest_archive.timestamp
).total_seconds() / 60.0
if minutes_old <= 30:
archive_url = newest_archive.archive_url
m = re.search(r"web\.archive\.org/web/[0-9]{14}/.*", archive_url)
if m:
instance.cached_save = True
return m.group(0)
if __version__ == latest_version:
exc_message = (
"No archive URL found in the API response. "
"If '{url}' can be accessed via your web browser then either "
"Wayback Machine is malfunctioning or it refused to archive your URL."
"\nHeader:\n{header}".format(url=url, header=header)
)
else:
exc_message = (
"No archive URL found in the API response. "
"If '{url}' can be accessed via your web browser then either "
"this version of waybackpy ({version}) is out of date or WayBack "
"Machine is malfunctioning. Visit 'https://github.com/akamhy/waybackpy' "
"for the latest version of waybackpy.\nHeader:\n{header}".format(
url=url, version=__version__, header=header
)
)
raise WaybackError(exc_message)
def _wayback_timestamp(**kwargs):
"""
Wayback Machine archive URLs
have a timestamp in them.
The standard archive URL format is
https://web.archive.org/web/20191214041711/https://www.youtube.com
If we break it down in three parts:
1 ) The start (https://web.archive.org/web/)
2 ) timestamp (20191214041711)
3 ) https://www.youtube.com, the original URL
The near method takes year, month, day, hour and minute
as Arguments, their type is int.
This method takes those integers and converts it to
wayback machine timestamp and returns it.
Return format is string.
"""
return "".join(
str(kwargs[key]).zfill(2) for key in ["year", "month", "day", "hour", "minute"]
)
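# For example, _wayback_timestamp(year=2020, month=1, day=2, hour=3, minute=4)
# returns "202001020304" (see test_wayback_timestamp above).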
def _get_response(
endpoint,
params=None,
headers=None,
return_full_url=False,
retries=5,
backoff_factor=0.5,
no_raise_on_redirects=False,
):
"""
This function is used to make GET requests.
We use the requests package to make the
requests.
We retry up to five times; if all attempts fail,
a WaybackError exception is raised.
You can handle WaybackError by importing:
from waybackpy.exceptions import WaybackError
try:
...
except WaybackError as e:
# handle it
"""
# From https://stackoverflow.com/a/35504626
# By https://stackoverflow.com/users/401467/datashaman
s = requests.Session()
retries = Retry(
total=retries,
backoff_factor=backoff_factor,
status_forcelist=[500, 502, 503, 504],
)
s.mount("https://", HTTPAdapter(max_retries=retries))
url = _full_url(endpoint, params)
try:
if not return_full_url:
return s.get(url, headers=headers)
return (url, s.get(url, headers=headers))
except Exception as e:
reason = str(e)
if no_raise_on_redirects:
if "Exceeded 30 redirects" in reason:
return
exc_message = "Error while retrieving {url}.\n{reason}".format(
url=url, reason=reason
)
exc = WaybackError(exc_message)
exc.__cause__ = e
raise exc
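# A minimal call sketch, mirroring test_get_response above:
# headers = {"User-Agent": default_user_agent}
# response = _get_response("https://www.google.com", headers=headers)
# assert response.status_code == 200  # WaybackError is raised if all retries fail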

waybackpy/wrapper.py

@@ -1,153 +1,35 @@
import re
import requests
import concurrent.futures
from datetime import datetime, timedelta
from waybackpy.__version__ import __version__
from waybackpy.exceptions import WaybackError, URLError
default_user_agent = "waybackpy python package - https://github.com/akamhy/waybackpy"
def _get_total_pages(url, user_agent):
"""
If showNumPages is passed to the CDX API, it returns
the number of archive pages, and each page holds many archives.
This function returns the number of pages of archives (an int).
"""
total_pages_url = (
"https://web.archive.org/cdx/search/cdx?url=%s&showNumPages=true" % url
)
headers = {"User-Agent": user_agent}
return int((_get_response(total_pages_url, headers=headers).text).strip())
def _archive_url_parser(header, url):
"""
The Wayback Machine's save API doesn't
return a JSON response; we are required
to read the headers of the API response
and look for the archive URL.
This method applies several regexes
that search for the archive URL in the header.
It is used when you try to
save a webpage on the Wayback Machine.
Two cases are possible:
1) we find the archive URL in
the header, or
2) we do not find it.
If we find the archive URL we return it.
If we cannot find it, we raise
WaybackError with an error message.
"""
# Regex1
m = re.search(r"Content-Location: (/web/[0-9]{14}/.*)", str(header))
if m:
return "web.archive.org" + m.group(1)
# Regex2
m = re.search(
r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", str(header)
)
if m:
return m.group(1)
# Regex3
m = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", str(header))
if m:
return m.group(1)
raise WaybackError(
"No archive URL found in the API response. "
"If '%s' can be accessed via your web browser then either "
"this version of waybackpy (%s) is out of date or WayBack Machine is malfunctioning. Visit "
"'https://github.com/akamhy/waybackpy' for the latest version "
"of waybackpy.\nHeader:\n%s" % (url, __version__, str(header))
)
def _wayback_timestamp(**kwargs):
"""
Wayback Machine archive URLs
have a timestamp in them.
The standard archive URL format is
https://web.archive.org/web/20191214041711/https://www.youtube.com
If we break it down in three parts:
1 ) The start (https://web.archive.org/web/)
2 ) timestamp (20191214041711)
3 ) https://www.youtube.com, the original URL
The near method takes year, month, day, hour and minute
as Arguments, their type is int.
This method takes those integers and converts it to
wayback machine timestamp and returns it.
Return format is string.
"""
return "".join(
str(kwargs[key]).zfill(2) for key in ["year", "month", "day", "hour", "minute"]
)
def _get_response(endpoint, params=None, headers=None):
"""
This function is used to make GET requests.
We use the requests package to make the
requests.
We try twice; if both attempts fail and
raise exceptions, we give up and raise WaybackError.
You can handle WaybackError by importing:
from waybackpy.exceptions import WaybackError
try:
...
except WaybackError as e:
# handle it
"""
try:
return requests.get(endpoint, params=params, headers=headers)
except Exception:
try:
return requests.get(endpoint, params=params, headers=headers)
except Exception as e:
exc = WaybackError("Error while retrieving %s" % endpoint)
exc.__cause__ = e
raise exc
from .exceptions import WaybackError
from .cdx import Cdx
from .utils import (
_archive_url_parser,
_wayback_timestamp,
_get_response,
default_user_agent,
_url_check,
_cleaned_url,
_ts,
_unix_ts_to_wayback_ts,
_latest_version,
)
class Url:
"""
waybackpy Url class, Type : <class 'waybackpy.wrapper.Url'>
"""
def __init__(self, url, user_agent=default_user_agent):
self.url = url
self.user_agent = str(user_agent)
self._url_check()
_url_check(self.url)
self._archive_url = None
self.timestamp = None
self._JSON = None
self._alive_url_list = []
self.latest_version = None
self.cached_save = False
def __repr__(self):
return "waybackpy.Url(url=%s, user_agent=%s)" % (self.url, self.user_agent)
return "waybackpy.Url(url={url}, user_agent={user_agent})".format(
url=self.url, user_agent=self.user_agent
)
def __str__(self):
"""
@@ -164,7 +46,7 @@ class Url:
if not self._archive_url:
self._archive_url = self.archive_url
return "%s" % self._archive_url
return "{archive_url}".format(archive_url=self._archive_url)
def __len__(self):
"""
@@ -192,18 +74,6 @@ class Url:
return (datetime.utcnow() - self.timestamp).days
def _url_check(self):
"""
Check for common URL problems.
What we are checking:
1) the URL must contain a '.'; URLs without one are rejected.
If you know of any other checks worth adding, please create a PR on the github repo.
"""
if "." not in self.url:
raise URLError("'%s' is not a valid URL." % self.url)
@property
def JSON(self):
"""
@@ -220,7 +90,7 @@ class Url:
endpoint = "https://archive.org/wayback/available"
headers = {"User-Agent": self.user_agent}
payload = {"url": "%s" % self._cleaned_url()}
payload = {"url": "{url}".format(url=_cleaned_url(self.url))}
response = _get_response(endpoint, params=payload, headers=headers)
return response.json()
@@ -251,37 +121,8 @@ class Url:
@property
def _timestamp(self):
"""
Get the timestamp of the last fetched archive.
If used before fetching any archive, it will
fall back to whatever self.JSON returns.
self.timestamp being None implies that
self.JSON may return any archive's JSON
that the Wayback Machine provides.
"""
if self.timestamp:
return self.timestamp
data = self.JSON
if not data["archived_snapshots"]:
ts = datetime.max
else:
ts = datetime.strptime(
data["archived_snapshots"]["closest"]["timestamp"], "%Y%m%d%H%M%S"
)
self.timestamp = ts
return ts
def _cleaned_url(self):
"""
Strip leading/trailing whitespace and
replace " " with "_".
"""
return str(self.url).strip().replace(" ", "_")
self.timestamp = _ts(self.timestamp, self.JSON)
return self.timestamp
def save(self):
"""
@@ -291,34 +132,68 @@ class Url:
And to get the archive URL we are required to read the
header of the API response.
_get_response() takes care of the get requests. It uses requests
package.
_get_response() takes care of the get requests.
_archive_url_parser() parses the archive from the header.
"""
request_url = "https://web.archive.org/save/" + self._cleaned_url()
request_url = "https://web.archive.org/save/" + _cleaned_url(self.url)
headers = {"User-Agent": self.user_agent}
response = _get_response(request_url, params=None, headers=headers)
self._archive_url = "https://" + _archive_url_parser(response.headers, self.url)
self.timestamp = datetime.utcnow()
response = _get_response(
request_url,
params=None,
headers=headers,
backoff_factor=2,
no_raise_on_redirects=True,
)
if not self.latest_version:
self.latest_version = _latest_version("waybackpy", headers=headers)
if response:
res_headers = response.headers
else:
res_headers = "save redirected"
self._archive_url = "https://" + _archive_url_parser(
res_headers,
self.url,
latest_version=self.latest_version,
instance=self,
)
m = re.search(r"https?://web.archive.org/web/([0-9]{14})/http", self._archive_url)
str_ts = m.group(1)
ts = datetime.strptime(str_ts, "%Y%m%d%H%M%S")
now = datetime.utcnow()
total_seconds = int((now - ts).total_seconds())
if total_seconds > 60 * 3:
self.cached_save = True
self.timestamp = ts
return self
def get(self, url="", user_agent="", encoding=""):
"""
Return the source code of the supplied URL.
Return the source code of the last archived URL,
if no URL is passed to this method.
If encoding is not supplied, it is auto-detected
from the response itself by requests package.
"""
if not url:
url = self._cleaned_url()
if not url and self._archive_url:
url = self._archive_url
elif not url and not self._archive_url:
url = _cleaned_url(self.url)
if not user_agent:
user_agent = self.user_agent
headers = {"User-Agent": self.user_agent}
response = _get_response(url, params=None, headers=headers)
headers = {"User-Agent": str(user_agent)}
response = _get_response(str(url), params=None, headers=headers)
if not encoding:
try:
@@ -328,7 +203,15 @@ class Url:
return response.content.decode(encoding.replace("text/html", "UTF-8", 1))
def near(self, year=None, month=None, day=None, hour=None, minute=None):
def near(
self,
year=None,
month=None,
day=None,
hour=None,
minute=None,
unix_timestamp=None,
):
"""
The Wayback Machine can have many archives of a webpage;
sometimes we want an archive close to a specific time.
@@ -350,25 +233,34 @@ class Url:
And finally return self.
"""
now = datetime.utcnow().timetuple()
timestamp = _wayback_timestamp(
year=year if year else now.tm_year,
month=month if month else now.tm_mon,
day=day if day else now.tm_mday,
hour=hour if hour else now.tm_hour,
minute=minute if minute else now.tm_min,
)
if unix_timestamp:
timestamp = _unix_ts_to_wayback_ts(unix_timestamp)
else:
now = datetime.utcnow().timetuple()
timestamp = _wayback_timestamp(
year=year if year else now.tm_year,
month=month if month else now.tm_mon,
day=day if day else now.tm_mday,
hour=hour if hour else now.tm_hour,
minute=minute if minute else now.tm_min,
)
endpoint = "https://archive.org/wayback/available"
headers = {"User-Agent": self.user_agent}
payload = {"url": "%s" % self._cleaned_url(), "timestamp": timestamp}
payload = {
"url": "{url}".format(url=_cleaned_url(self.url)),
"timestamp": timestamp,
}
response = _get_response(endpoint, params=payload, headers=headers)
data = response.json()
if not data["archived_snapshots"]:
raise WaybackError(
"Can not find archive for '%s' try later or use wayback.Url(url, user_agent).save() "
"to create a new archive." % self._cleaned_url()
"Can not find archive for '{url}' try later or use wayback.Url(url, user_agent).save() "
"to create a new archive.\nAPI response:\n{text}".format(
url=_cleaned_url(self.url), text=response.text
)
)
archive_url = data["archived_snapshots"]["closest"]["url"]
archive_url = archive_url.replace(
@@ -418,161 +310,50 @@ class Url:
"""
cdx = Cdx(
self._cleaned_url(),
_cleaned_url(self.url),
user_agent=self.user_agent,
start_timestamp=start_timestamp,
end_timestamp=end_timestamp,
)
i = 0
for _ in cdx.snapshots():
i += 1
i = i + 1
return i
def live_urls_picker(self, url):
"""
This method checks whether the supplied URL
returns a status code >= 400 (i.e. is dead).
"""
try:
response_code = requests.get(url).status_code
except Exception:
return # we don't care if Exception
# 200s are OK and 300s are usually redirects, if you don't want redirects replace 400 with 300
if response_code >= 400:
return
self._alive_url_list.append(url)
def known_urls(
self, alive=False, subdomain=False, start_timestamp=None, end_timestamp=None
self,
subdomain=False,
host=False,
start_timestamp=None,
end_timestamp=None,
match_type="prefix",
):
"""
Returns a list of URLs known to exist for the given domain name,
because these URLs were crawled by Wayback Machine bots.
Useful for pen-testers and others.
Idea by Mohammed Diaa (https://github.com/mhmdiaa) from:
https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050
Yields URLs known to exist for the given input.
Defaults to treating the input URL as a prefix.
This method is kept for compatibility, use the Cdx class instead.
This method itself depends on Cdx.
Idea by Mohammed Diaa (https://github.com/mhmdiaa) from:
https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050
"""
url_list = []
if subdomain:
url = "*.%s/*" % self._cleaned_url()
else:
url = "%s/*" % self._cleaned_url()
match_type = "domain"
if host:
match_type = "host"
cdx = Cdx(
url,
_cleaned_url(self.url),
user_agent=self.user_agent,
start_timestamp=start_timestamp,
end_timestamp=end_timestamp,
match_type=match_type,
collapses=["urlkey"],
)
snapshots = cdx.snapshots()
url_list = []
for snapshot in snapshots:
url_list.append(snapshot.original)
url_list = list(set(url_list)) # remove duplicates
# Remove all dead URLs from url_list if alive=True
if alive:
with concurrent.futures.ThreadPoolExecutor() as executor:
executor.map(self.live_urls_picker, url_list)
url_list = self._alive_url_list
return url_list
class CdxSnapshot:
"""
This class helps to handle the Cdx Snapshots easily.
What the raw data looks like:
org,archive)/ 20080126045828 http://github.com text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415
"""
def __init__(
self, urlkey, timestamp, original, mimetype, statuscode, digest, length
):
self.urlkey = urlkey # Useless
self.timestamp = timestamp
self.original = original
self.mimetype = mimetype
self.statuscode = statuscode
self.digest = digest
self.length = length
self.archive_url = "https://web.archive.org/web/%s/%s" % (
self.timestamp,
self.original,
)
def __str__(self):
return self.archive_url
class Cdx:
"""
waybackpy Cdx class, Type : <class 'waybackpy.wrapper.Cdx'>
Cdx keys are :
urlkey
timestamp
original
mimetype
statuscode
digest
length
"""
def __init__(
self,
url,
user_agent=default_user_agent,
start_timestamp=None,
end_timestamp=None,
):
self.url = url
self.user_agent = str(user_agent)
self.start_timestamp = str(start_timestamp) if start_timestamp else None
self.end_timestamp = str(end_timestamp) if end_timestamp else None
def snapshots(self):
"""
This function yields snapshots encapsulated
in CdxSnapshot objects for easier use.
"""
payload = {}
endpoint = "https://web.archive.org/cdx/search/cdx"
total_pages = _get_total_pages(self.url, self.user_agent)
headers = {"User-Agent": self.user_agent}
if self.start_timestamp:
payload["from"] = self.start_timestamp
if self.end_timestamp:
payload["to"] = self.end_timestamp
payload["url"] = self.url
for i in range(total_pages):
payload["page"] = str(i)
res = _get_response(endpoint, params=payload, headers=headers)
text = res.text
if text.isspace() or len(text) <= 1 or not text:
break
snapshot_list = text.split("\n")
for snapshot in snapshot_list:
if len(snapshot) < 15:
continue
(
urlkey,
timestamp,
original,
mimetype,
statuscode,
digest,
length,
) = snapshot.split(" ")
yield CdxSnapshot(
urlkey, timestamp, original, mimetype, statuscode, digest, length
)
yield (snapshot.original)
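
A usage sketch of the reworked generator-based known_urls (the user agent value is a placeholder):

import waybackpy

target = waybackpy.Url("akamhy.github.io", "my-user-agent")
for url in target.known_urls(subdomain=True):  # lazily yields known URLs
    print(url)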