Compare commits
8 commits

57a32669b5
fe017cbcc8
5edb03d24b
c5de2232ba
ca9186c301
8a4b631c13
ec9ce92f48
e95d35c37f
README.md (20 lines changed)
@@ -1,19 +1,19 @@
# waybackpy

[](https://travis-ci.org/akamhy/waybackpy)
[](https://pypistats.org/packages/waybackpy)
[](https://codecov.io/gh/akamhy/waybackpy)
[](https://pepy.tech/project/waybackpy/month)
[](https://github.com/akamhy/waybackpy/releases)
[](https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&utm_medium=referral&utm_content=akamhy/waybackpy&utm_campaign=Badge_Grade)
[](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
[](https://codeclimate.com/github/akamhy/waybackpy/maintainability)
[](https://www.codefactor.io/repository/github/akamhy/waybackpy)
[](https://www.python.org/)

[](https://pypi.org/project/waybackpy/)

[](https://github.com/akamhy/waybackpy/graphs/commit-activity)
[](https://codecov.io/gh/akamhy/waybackpy)

[](https://github.com/akamhy/waybackpy/blob/master/LICENSE)

@@ -28,14 +28,14 @@ Table of contents
* [Installation](#installation)

* [Usage](#usage)
* [As a python package](#as-a-python-package)
* [As a Python package](#as-a-python-package)
* [Saving an url using save()](#capturing-aka-saving-an-url-using-save)
* [Receiving the oldest archive for an URL Using oldest()](#receiving-the-oldest-archive-for-an-url-using-oldest)
* [Receiving the recent most/newest archive for an URL using newest()](#receiving-the-newest-archive-for-an-url-using-newest)
* [Receiving archive close to a specified year, month, day, hour, and minute using near()](#receiving-archive-close-to-a-specified-year-month-day-hour-and-minute-using-near)
* [Get the content of webpage using get()](#get-the-content-of-webpage-using-get)
* [Count total archives for an URL using total_archives()](#count-total-archives-for-an-url-using-total_archives)
* [With CLI](#with-the-cli)
* [With Command-line interface](#with-the-command-line-interface)
* [Save](#save)
* [Oldest archive](#oldest-archive)
* [Newest archive](#newest-archive)
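The TOC links in this hunk only work if each anchor matches the ID GitHub generates for the corresponding README heading. That is why renaming the heading to `With the Command-line interface` also changes the link from `#with-the-cli` to `#with-the-command-line-interface`, while `As a Python package` can keep `#as-a-python-package` (IDs are lowercased). The sketch below approximates that rule so the links can be checked offline; `github_style_anchor` is a hypothetical helper, and GitHub's real implementation may treat edge cases (duplicate headings, emoji, some non-ASCII text) differently.

```python
import re


def github_style_anchor(heading):
    """Approximate the anchor GitHub derives from a Markdown heading.

    Hypothetical helper: lowercase, drop punctuation, turn spaces into hyphens.
    """
    slug = heading.strip().lower()
    # Keep word characters (letters, digits, underscore), hyphens and spaces
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Every remaining space becomes a hyphen
    return slug.replace(" ", "-")


# Compare a few TOC links from this README against their headings
checks = [
    ("As a Python package", "as-a-python-package"),
    ("With the Command-line interface", "with-the-command-line-interface"),
    ("Capturing aka Saving an url using save()", "capturing-aka-saving-an-url-using-save"),
    ("Count total archives for an URL using total_archives()",
     "count-total-archives-for-an-url-using-total_archives"),
]
for heading, anchor in checks:
    print(github_style_anchor(heading) == anchor, anchor)
```

Note that `\w` keeps the underscore, which is why `total_archives` survives intact in the generated anchor.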
@@ -63,7 +63,7 @@ pip install git+https://github.com/akamhy/waybackpy.git

## Usage

### As a python package
### As a Python package

#### Capturing aka Saving an url using save()
```python
@@ -230,7 +230,7 @@ print(archive_count) # total_archives() returns an int
```
<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyTotalArchivesExample></sub>

### With the CLI
### With the Command-line interface

#### Save
```bash
index.rst (37 lines changed)
@@ -1,9 +1,9 @@
waybackpy
=========

|Build Status| |Downloads| |Release| |Codacy Badge| |License: MIT|
|Maintainability| |CodeFactor| |made-with-python| |pypi| |PyPI - Python
Version| |Maintenance| |codecov| |image12| |contributions welcome|
|contributions welcome| |Build Status| |codecov| |Downloads| |Release|
|Codacy Badge| |Maintainability| |CodeFactor| |made-with-python| |pypi|
|PyPI - Python Version| |Maintenance| |Repo size| |License: MIT|

|Internet Archive| |Wayback Machine|

@@ -22,7 +22,7 @@ Table of contents
- `Installation <#installation>`__

- `Usage <#usage>`__
- `As a python package <#as-a-python-package>`__
- `As a Python package <#as-a-python-package>`__

- `Saving an url using
save() <#capturing-aka-saving-an-url-using-save>`__
@@ -38,7 +38,7 @@ Table of contents
- `Count total archives for an URL using
total\_archives() <#count-total-archives-for-an-url-using-total_archives>`__

- `With CLI <#with-the-cli>`__
- `With Command-line interface <#with-the-command-line-interface>`__

- `Save <#save>`__
- `Oldest archive <#oldest-archive>`__
@@ -75,7 +75,7 @@ or direct from this repository using git.
Usage
-----

As a python package
As a Python package
~~~~~~~~~~~~~~~~~~~

Capturing aka Saving an url using save()
@@ -269,8 +269,8 @@ Count total archives for an URL using total\_archives()
Try this out in your browser @
https://repl.it/@akamhy/WaybackPyTotalArchivesExample\

With the CLI
~~~~~~~~~~~~
With the Command-line interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Save
^^^^
@@ -348,8 +348,8 @@ Tests
Dependency
----------

- None, just python standard libraries (re, json, urllib, argparse and datetime).
Both python 2 and 3 are supported :)
- None, just python standard libraries (re, json, urllib, argparse and
datetime). Both python 2 and 3 are supported :)

License
-------
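The Dependency section says waybackpy needs only standard-library modules such as `urllib` and runs on both Python 2 and 3. One common way to get there, shown as an assumption rather than waybackpy's actual import code, is to try the Python 3 module layout and fall back to Python 2's `urllib2`. `build_request` and `fetch` are hypothetical helpers; the User-Agent string is the `default_UA` value visible in the wrapper hunk header of this comparison.

```python
try:  # Python 3 layout
    from urllib.request import Request, urlopen
    from urllib.error import URLError
except ImportError:  # Python 2 fallback
    from urllib2 import Request, urlopen, URLError

DEFAULT_UA = "waybackpy python package - https://github.com/akamhy/waybackpy"


def build_request(url, user_agent=DEFAULT_UA):
    """Return a Request carrying a custom User-Agent; no network call is made."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DEFAULT_UA, timeout=30):
    """Fetch a URL and return the body as bytes, or None on a network error."""
    try:
        response = urlopen(build_request(url, user_agent), timeout=timeout)
        return response.read()
    except URLError:  # HTTPError is a subclass, so HTTP failures land here too
        return None


req = build_request("https://archive.org/wayback/available?url=example.com")
print(req.get_full_url())
```

Only `build_request` runs at import time, so the example stays offline; calling `fetch` is what would actually hit the network.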
@@ -357,16 +357,17 @@ License
`MIT
License <https://github.com/akamhy/waybackpy/blob/master/LICENSE>`__

.. |contributions welcome| image:: https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square
.. |Build Status| image:: https://img.shields.io/travis/akamhy/waybackpy.svg?label=Travis%20CI&logo=travis&style=flat-square
   :target: https://travis-ci.org/akamhy/waybackpy
.. |Downloads| image:: https://img.shields.io/pypi/dm/waybackpy.svg
   :target: https://pypistats.org/packages/waybackpy
.. |codecov| image:: https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/akamhy/waybackpy
.. |Downloads| image:: https://pepy.tech/badge/waybackpy/month
   :target: https://pepy.tech/project/waybackpy/month
.. |Release| image:: https://img.shields.io/github/v/release/akamhy/waybackpy.svg
   :target: https://github.com/akamhy/waybackpy/releases
.. |Codacy Badge| image:: https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65
   :target: https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&utm_medium=referral&utm_content=akamhy/waybackpy&utm_campaign=Badge_Grade
.. |License: MIT| image:: https://img.shields.io/badge/License-MIT-yellow.svg
   :target: https://github.com/akamhy/waybackpy/blob/master/LICENSE
.. |Maintainability| image:: https://api.codeclimate.com/v1/badges/942f13d8177a56c1c906/maintainability
   :target: https://codeclimate.com/github/akamhy/waybackpy/maintainability
.. |CodeFactor| image:: https://www.codefactor.io/repository/github/akamhy/waybackpy/badge
@@ -374,12 +375,12 @@ License <https://github.com/akamhy/waybackpy/blob/master/LICENSE>`__
.. |made-with-python| image:: https://img.shields.io/badge/Made%20with-Python-1f425f.svg
   :target: https://www.python.org/
.. |pypi| image:: https://img.shields.io/pypi/v/waybackpy.svg
   :target: https://pypi.org/project/waybackpy/
.. |PyPI - Python Version| image:: https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square
.. |Maintenance| image:: https://img.shields.io/badge/Maintained%3F-yes-green.svg
   :target: https://github.com/akamhy/waybackpy/graphs/commit-activity
.. |codecov| image:: https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/akamhy/waybackpy
.. |image12| image:: https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square
.. |contributions welcome| image:: https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square
.. |Repo size| image:: https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square
.. |License: MIT| image:: https://img.shields.io/badge/License-MIT-yellow.svg
   :target: https://github.com/akamhy/waybackpy/blob/master/LICENSE
.. |Internet Archive| image:: https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Internet_Archive_logo_and_wordmark.svg/84px-Internet_Archive_logo_and_wordmark.svg.png
.. |Wayback Machine| image:: https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Wayback_Machine_logo_2010.svg/284px-Wayback_Machine_logo_2010.svg.png
setup.py (2 lines changed)
@@ -19,7 +19,7 @@ setup(
    author = about['__author__'],
    author_email = about['__author_email__'],
    url = about['__url__'],
    download_url = 'https://github.com/akamhy/waybackpy/archive/2.1.6.tar.gz',
    download_url = 'https://github.com/akamhy/waybackpy/archive/2.1.7.tar.gz',
    keywords = ['wayback', 'archive', 'archive website', 'wayback machine', 'Internet Archive'],
    install_requires=[],
    python_requires= ">=2.7",
@@ -74,17 +74,16 @@ def test_save():
        url2 = "ha ha ha ha"
        waybackpy.Url(url2, user_agent)
    time.sleep(5)
    # Test for urls not allowed to archive by robot.txt.
    with pytest.raises(Exception):
        url3 = "http://www.archive.is/faq.html"
        target = waybackpy.Url(
            url3,
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) "
            "Gecko/20100101 Firefox/25.0",
        )
        target.save()

    time.sleep(5)
    # Test for urls not allowed to archive by robot.txt. Doesn't works anymore. Find alternatives.
    # with pytest.raises(Exception):
        # url3 = "http://www.archive.is/faq.html"
        # target = waybackpy.Url(
            # url3,
            # "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) "
            # "Gecko/20100101 Firefox/25.0",
        # )
        # target.save()
    # time.sleep(5)
    # Non existent urls, test
    with pytest.raises(Exception):
        url4 = (
@@ -3,7 +3,7 @@
__title__ = "waybackpy"
__description__ = "A Python library that interfaces with the Internet Archive's Wayback Machine API. Archive pages and retrieve archived pages easily."
__url__ = "https://akamhy.github.io/waybackpy/"
__version__ = "2.1.6"
__version__ = "2.1.7"
__author__ = "akamhy"
__author_email__ = "akash3pro@gmail.com"
__license__ = "MIT"
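This comparison bumps the version by hand in two places: the hard-coded `download_url` in setup.py and `__version__` here. Because setup.py already reads its metadata from an `about` dict (`about['__author__']`, `about['__url__']`), one option is to derive the tarball URL from `about['__version__']` so the two values cannot drift apart. A minimal sketch, assuming `about` is populated from this module; `tarball_url` and `REPO` are hypothetical names:

```python
REPO = "https://github.com/akamhy/waybackpy"


def tarball_url(about):
    """Build the GitHub source-archive URL for the version in `about`.

    Uses %-formatting so the expression also works on Python 2.7,
    matching python_requires=">=2.7" in setup.py.
    """
    return "%s/archive/%s.tar.gz" % (REPO, about["__version__"])


# Hypothetical stand-in for the dict setup.py builds from the version module
about = {"__version__": "2.1.7"}
download_url = tarball_url(about)
print(download_url)
```

With this in place, setup.py would pass `download_url = tarball_url(about),` and a release would only need the `__version__` bump.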
@@ -19,12 +19,18 @@ default_UA = "waybackpy python package - https://github.com/akamhy/waybackpy"
def _archive_url_parser(header):
    """Parse out the archive from header."""
    # Regex1
    arch = re.search(
        r"Content-Location: (/web/[0-9]{14}/.*)", str(header)
    )
    if arch:
        return "web.archive.org" + arch.group(1)
    # Regex2
    arch = re.search(
        r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", str(header)
    )
    if arch:
        return arch.group(1)
    # Regex2
    # Regex3
    arch = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", str(header))
    if arch:
        return arch.group(1)
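`_archive_url_parser` tries three patterns in order: the `Content-Location` path, the first Wayback URL that follows a `rel="memento` token, and finally the `X-Cache-Key` value, keeping everything between the literal `https` and the last pair of consecutive uppercase letters on that line. The standalone sketch below reproduces that fallback order so the patterns can be checked offline. `parse_archive_url` is a hypothetical name, the header strings are made-up samples rather than captured Wayback Machine responses, and returning `None` when nothing matches is an assumption, since the hunk does not show what the library does in that case.

```python
import re


def parse_archive_url(header):
    """Return 'web.archive.org/web/<14-digit timestamp>/<url>' or None."""
    text = str(header)
    # 1. Content-Location holds a path such as /web/20200101000000/https://example.com/
    match = re.search(r"Content-Location: (/web/[0-9]{14}/.*)", text)
    if match:
        return "web.archive.org" + match.group(1)
    # 2. First Wayback URL after a rel="memento... token, up to the closing '>'
    match = re.search(
        r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", text
    )
    if match:
        return match.group(1)
    # 3. X-Cache-Key: 'https' + host/path, then two uppercase letters (greedy match)
    match = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", text)
    if match:
        return match.group(1)
    # No pattern matched; this sketch signals that with None
    return None


samples = [
    "Content-Location: /web/20200101000000/https://example.com/\nServer: nginx",
    'Link: rel="memento"; <https://web.archive.org/web/20200101000000/https://example.com/>',
    "X-Cache-Key: httpsweb.archive.org/web/20200101000000/https://example.com/US",
]
for sample in samples:
    print(parse_archive_url(sample))
```

Because `.` does not match a newline, the first pattern stops at the end of the `Content-Location` line even when more headers follow.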
@@ -134,7 +140,7 @@ class Url:
        data = json.loads(response.read().decode("UTF-8"))
        if not data["archived_snapshots"]:
            raise WaybackError(
                "'%s' is not yet archived. Use wayback.Url(url, user_agent).save() "
                "Can not find archive for '%s' try later or use wayback.Url(url, user_agent).save() "
                "to create a new archive." % self._clean_url()
            )
        archive_url = data["archived_snapshots"]["closest"]["url"]
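The lookup in this hunk decodes a JSON body, treats an empty `archived_snapshots` object as "no archive", and otherwise returns `archived_snapshots.closest.url`. A self-contained sketch of that branch, assuming a response shaped like the sample strings (illustrative, not recorded API output). `WaybackError` is redefined locally so the block runs on its own, and `closest_snapshot_url` is a hypothetical helper; unlike the hunk, it uses `.get()` so a missing key is treated the same as an empty one.

```python
import json


class WaybackError(Exception):
    """Local stand-in for waybackpy's WaybackError so the sketch runs on its own."""


def closest_snapshot_url(body, url):
    """Return the closest snapshot URL from a JSON body, or raise WaybackError."""
    data = json.loads(body)
    # An empty dict (or a missing key) means no snapshot was found
    if not data.get("archived_snapshots"):
        raise WaybackError(
            "Can not find archive for '%s' try later or use "
            "wayback.Url(url, user_agent).save() to create a new archive." % url
        )
    return data["archived_snapshots"]["closest"]["url"]


found = ('{"archived_snapshots": {"closest": '
         '{"url": "http://web.archive.org/web/20200101000000/https://example.com/"}}}')
missing = '{"archived_snapshots": {}}'

print(closest_snapshot_url(found, "example.com"))
try:
    closest_snapshot_url(missing, "example.com")
except WaybackError as err:
    print("error: %s" % err)
```

The error text mirrors the new message from the hunk, with the URL substituted for `%s`.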