Compare commits

8 Commits
2.1.6 ... 2.1.7

Author SHA1 Message Date
57a32669b5 v2.1.7 2020-08-09 11:06:29 +05:30
fe017cbcc8 v2.1.7 2020-08-09 11:06:04 +05:30
5edb03d24b update docs 2020-08-09 11:05:04 +05:30
c5de2232ba Update test_wrapper.py 2020-08-09 10:53:00 +05:30
ca9186c301 update message, sometimes raised for poor performance by wayback machine even if the url is archived. 2020-08-09 10:43:16 +05:30
8a4b631c13 new regex to parse archive, IA changed the header again :( 2020-08-09 10:36:25 +05:30
ec9ce92f48 Update README.md (#23) * Update README.md * fix grammar 2020-07-26 10:30:54 +05:30
e95d35c37f re arrange the badges, moved contributions welcome to top 2020-07-26 10:24:31 +05:30
6 changed files with 49 additions and 43 deletions

View File

@ -1,19 +1,19 @@
# waybackpy
![contributions welcome](https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square)
[![Build Status](https://img.shields.io/travis/akamhy/waybackpy.svg?label=Travis%20CI&logo=travis&style=flat-square)](https://travis-ci.org/akamhy/waybackpy)
[![Downloads](https://img.shields.io/pypi/dm/waybackpy.svg)](https://pypistats.org/packages/waybackpy)
[![codecov](https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg)](https://codecov.io/gh/akamhy/waybackpy)
[![Downloads](https://pepy.tech/badge/waybackpy/month)](https://pepy.tech/project/waybackpy/month)
[![Release](https://img.shields.io/github/v/release/akamhy/waybackpy.svg)](https://github.com/akamhy/waybackpy/releases)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65)](https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&utm_medium=referral&utm_content=akamhy/waybackpy&utm_campaign=Badge_Grade)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
[![Maintainability](https://api.codeclimate.com/v1/badges/942f13d8177a56c1c906/maintainability)](https://codeclimate.com/github/akamhy/waybackpy/maintainability)
[![CodeFactor](https://www.codefactor.io/repository/github/akamhy/waybackpy/badge)](https://www.codefactor.io/repository/github/akamhy/waybackpy)
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
![pypi](https://img.shields.io/pypi/v/waybackpy.svg)
[![pypi](https://img.shields.io/pypi/v/waybackpy.svg)](https://pypi.org/project/waybackpy/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/akamhy/waybackpy/graphs/commit-activity)
[![codecov](https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg)](https://codecov.io/gh/akamhy/waybackpy)
![](https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square)
![contributions welcome](https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square)
![Repo size](https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
![Internet Archive](https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Internet_Archive_logo_and_wordmark.svg/84px-Internet_Archive_logo_and_wordmark.svg.png)
@ -28,14 +28,14 @@ Table of contents
* [Installation](#installation)
* [Usage](#usage)
* [As a python package](#as-a-python-package)
* [As a Python package](#as-a-python-package)
* [Saving an url using save()](#capturing-aka-saving-an-url-using-save)
* [Receiving the oldest archive for an URL Using oldest()](#receiving-the-oldest-archive-for-an-url-using-oldest)
* [Receiving the recent most/newest archive for an URL using newest()](#receiving-the-newest-archive-for-an-url-using-newest)
* [Receiving archive close to a specified year, month, day, hour, and minute using near()](#receiving-archive-close-to-a-specified-year-month-day-hour-and-minute-using-near)
* [Get the content of webpage using get()](#get-the-content-of-webpage-using-get)
* [Count total archives for an URL using total_archives()](#count-total-archives-for-an-url-using-total_archives)
* [With CLI](#with-the-cli)
* [With Command-line interface](#with-the-command-line-interface)
* [Save](#save)
* [Oldest archive](#oldest-archive)
* [Newest archive](#newest-archive)
@ -63,7 +63,7 @@ pip install git+https://github.com/akamhy/waybackpy.git
## Usage
### As a python package
### As a Python package
#### Capturing aka Saving an url using save()
```python
@ -230,7 +230,7 @@ print(archive_count) # total_archives() returns an int
```
<sub>Try this out in your browser @ <https://repl.it/@akamhy/WaybackPyTotalArchivesExample></sub>
### With the CLI
### With the Command-line interface
#### Save
```bash

View File

@ -1,9 +1,9 @@
waybackpy
=========
|Build Status| |Downloads| |Release| |Codacy Badge| |License: MIT|
|Maintainability| |CodeFactor| |made-with-python| |pypi| |PyPI - Python
Version| |Maintenance| |codecov| |image12| |contributions welcome|
|contributions welcome| |Build Status| |codecov| |Downloads| |Release|
|Codacy Badge| |Maintainability| |CodeFactor| |made-with-python| |pypi|
|PyPI - Python Version| |Maintenance| |Repo size| |License: MIT|
|Internet Archive| |Wayback Machine|
@ -22,7 +22,7 @@ Table of contents
- `Installation <#installation>`__
- `Usage <#usage>`__
- `As a python package <#as-a-python-package>`__
- `As a Python package <#as-a-python-package>`__
- `Saving an url using
save() <#capturing-aka-saving-an-url-using-save>`__
@ -38,7 +38,7 @@ Table of contents
- `Count total archives for an URL using
total\_archives() <#count-total-archives-for-an-url-using-total_archives>`__
- `With CLI <#with-the-cli>`__
- `With Command-line interface <#with-the-command-line-interface>`__
- `Save <#save>`__
- `Oldest archive <#oldest-archive>`__
@ -75,7 +75,7 @@ or direct from this repository using git.
Usage
-----
As a python package
As a Python package
~~~~~~~~~~~~~~~~~~~
Capturing aka Saving an url using save()
@ -269,8 +269,8 @@ Count total archives for an URL using total\_archives()
Try this out in your browser @
https://repl.it/@akamhy/WaybackPyTotalArchivesExample\
With the CLI
~~~~~~~~~~~~
With the Command-line interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Save
^^^^
@ -348,8 +348,8 @@ Tests
Dependency
----------
- None, just python standard libraries (re, json, urllib, argparse and datetime).
Both python 2 and 3 are supported :)
- None, just python standard libraries (re, json, urllib, argparse and
datetime). Both python 2 and 3 are supported :)
License
-------
@ -357,16 +357,17 @@ License
`MIT
License <https://github.com/akamhy/waybackpy/blob/master/LICENSE>`__
.. |contributions welcome| image:: https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square
.. |Build Status| image:: https://img.shields.io/travis/akamhy/waybackpy.svg?label=Travis%20CI&logo=travis&style=flat-square
:target: https://travis-ci.org/akamhy/waybackpy
.. |Downloads| image:: https://img.shields.io/pypi/dm/waybackpy.svg
:target: https://pypistats.org/packages/waybackpy
.. |codecov| image:: https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg
:target: https://codecov.io/gh/akamhy/waybackpy
.. |Downloads| image:: https://pepy.tech/badge/waybackpy/month
:target: https://pepy.tech/project/waybackpy/month
.. |Release| image:: https://img.shields.io/github/v/release/akamhy/waybackpy.svg
:target: https://github.com/akamhy/waybackpy/releases
.. |Codacy Badge| image:: https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65
:target: https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&utm_medium=referral&utm_content=akamhy/waybackpy&utm_campaign=Badge_Grade
.. |License: MIT| image:: https://img.shields.io/badge/License-MIT-yellow.svg
:target: https://github.com/akamhy/waybackpy/blob/master/LICENSE
.. |Maintainability| image:: https://api.codeclimate.com/v1/badges/942f13d8177a56c1c906/maintainability
:target: https://codeclimate.com/github/akamhy/waybackpy/maintainability
.. |CodeFactor| image:: https://www.codefactor.io/repository/github/akamhy/waybackpy/badge
@ -374,12 +375,12 @@ License <https://github.com/akamhy/waybackpy/blob/master/LICENSE>`__
.. |made-with-python| image:: https://img.shields.io/badge/Made%20with-Python-1f425f.svg
:target: https://www.python.org/
.. |pypi| image:: https://img.shields.io/pypi/v/waybackpy.svg
:target: https://pypi.org/project/waybackpy/
.. |PyPI - Python Version| image:: https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square
.. |Maintenance| image:: https://img.shields.io/badge/Maintained%3F-yes-green.svg
:target: https://github.com/akamhy/waybackpy/graphs/commit-activity
.. |codecov| image:: https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg
:target: https://codecov.io/gh/akamhy/waybackpy
.. |image12| image:: https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square
.. |contributions welcome| image:: https://img.shields.io/static/v1.svg?label=Contributions&message=Welcome&color=0059b3&style=flat-square
.. |Repo size| image:: https://img.shields.io/github/repo-size/akamhy/waybackpy.svg?label=Repo%20size&style=flat-square
.. |License: MIT| image:: https://img.shields.io/badge/License-MIT-yellow.svg
:target: https://github.com/akamhy/waybackpy/blob/master/LICENSE
.. |Internet Archive| image:: https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Internet_Archive_logo_and_wordmark.svg/84px-Internet_Archive_logo_and_wordmark.svg.png
.. |Wayback Machine| image:: https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Wayback_Machine_logo_2010.svg/284px-Wayback_Machine_logo_2010.svg.png

View File

@ -19,7 +19,7 @@ setup(
author = about['__author__'],
author_email = about['__author_email__'],
url = about['__url__'],
download_url = 'https://github.com/akamhy/waybackpy/archive/2.1.6.tar.gz',
download_url = 'https://github.com/akamhy/waybackpy/archive/2.1.7.tar.gz',
keywords = ['wayback', 'archive', 'archive website', 'wayback machine', 'Internet Archive'],
install_requires=[],
python_requires= ">=2.7",
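Review note: the release number is bumped by hand in two places in this comparison (the `download_url` here and `__version__` further down). A minimal sketch of deriving one from the other, assuming the `about` dict that this `setup()` call already reads also carries `__version__`; the `build_download_url` helper is hypothetical and not part of the repository:

```python
# Hypothetical stand-in for the `about` dict that setup.py reads its metadata from.
# Assumption: it exposes "__version__" alongside "__author__" and "__url__".
about = {"__version__": "2.1.7"}


def build_download_url(version):
    """Build the GitHub tarball URL for a given release tag."""
    return "https://github.com/akamhy/waybackpy/archive/%s.tar.gz" % version


# The value that could be passed to setup(download_url=...).
download_url = build_download_url(about["__version__"])
print(download_url)
```

With this shape, a release would only need the version string changed once; whether that fits the project's release process is for the maintainer to judge.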

View File

@ -74,17 +74,16 @@ def test_save():
url2 = "ha ha ha ha"
waybackpy.Url(url2, user_agent)
time.sleep(5)
# Test for urls not allowed to archive by robot.txt.
with pytest.raises(Exception):
url3 = "http://www.archive.is/faq.html"
target = waybackpy.Url(
url3,
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) "
"Gecko/20100101 Firefox/25.0",
)
target.save()
time.sleep(5)
# Test for URLs that robots.txt does not allow to be archived. Doesn't work anymore. Find alternatives.
# with pytest.raises(Exception):
# url3 = "http://www.archive.is/faq.html"
# target = waybackpy.Url(
# url3,
# "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) "
# "Gecko/20100101 Firefox/25.0",
# )
# target.save()
# time.sleep(5)
# Non existent urls, test
with pytest.raises(Exception):
url4 = (

View File

@ -3,7 +3,7 @@
__title__ = "waybackpy"
__description__ = "A Python library that interfaces with the Internet Archive's Wayback Machine API. Archive pages and retrieve archived pages easily."
__url__ = "https://akamhy.github.io/waybackpy/"
__version__ = "2.1.6"
__version__ = "2.1.7"
__author__ = "akamhy"
__author_email__ = "akash3pro@gmail.com"
__license__ = "MIT"

View File

@ -19,12 +19,18 @@ default_UA = "waybackpy python package - https://github.com/akamhy/waybackpy"
def _archive_url_parser(header):
"""Parse out the archive from header."""
# Regex1
arch = re.search(
r"Content-Location: (/web/[0-9]{14}/.*)", str(header)
)
if arch:
return "web.archive.org" + arch.group(1)
# Regex2
arch = re.search(
r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", str(header)
)
if arch:
return arch.group(1)
# Regex2
# Regex3
arch = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", str(header))
if arch:
return arch.group(1)
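Review note: the hunk above tries three patterns in order and returns on the first match. The sketch below restates that fall-through as a standalone function so it can be exercised without a network call. The name `parse_archive_url` and the `return None` at the end are assumptions (the hunk does not show what the real function does when nothing matches), and the sample headers are made up for illustration:

```python
import re


def parse_archive_url(header):
    """Return the archive URL found in a response header, or None."""
    text = str(header)

    # Pattern 1: Content-Location holds a relative /web/<14-digit timestamp>/ path,
    # so the host has to be prepended.
    match = re.search(r"Content-Location: (/web/[0-9]{14}/.*)", text)
    if match:
        return "web.archive.org" + match.group(1)

    # Pattern 2: a rel="memento" link followed by the full archive URL.
    match = re.search(
        r"rel=\"memento.*?(web\.archive\.org/web/[0-9]{14}/.*?)>", text
    )
    if match:
        return match.group(1)

    # Pattern 3: X-Cache-Key, with the "https" prefix and a trailing
    # two-letter uppercase suffix stripped off.
    match = re.search(r"X-Cache-Key:\shttps(.*)[A-Z]{2}", text)
    if match:
        return match.group(1)

    # Assumption: signal "no match" with None; the real code may raise instead.
    return None


# Made-up header, for illustration only.
sample = "Content-Location: /web/20200809053629/https://example.com/"
print(parse_archive_url(sample))
```

Because the patterns are tried in order, a header that happens to carry more than one of these fields resolves to the earliest pattern, which is why the new `Content-Location` check takes precedence once it is placed first.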
@ -134,7 +140,7 @@ class Url:
data = json.loads(response.read().decode("UTF-8"))
if not data["archived_snapshots"]:
raise WaybackError(
"'%s' is not yet archived. Use wayback.Url(url, user_agent).save() "
"Cannot find an archive for '%s'; try again later or use waybackpy.Url(url, user_agent).save() "
"to create a new archive." % self._clean_url()
)
archive_url = data["archived_snapshots"]["closest"]["url"]
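Review note: the check in this hunk treats an empty `archived_snapshots` object as "no archive found". A minimal sketch of that branch in isolation, assuming the JSON shape implied by the lines above (`archived_snapshots` containing `closest`, which contains `url`); the `closest_archive_url` function, the local `WaybackError` class, and the message wording are stand-ins, not the library's own code:

```python
import json


class WaybackError(Exception):
    """Stand-in for the library's exception of the same name."""


def closest_archive_url(raw_json, url):
    """Return the closest snapshot URL from a decoded API response body."""
    data = json.loads(raw_json)

    # An empty "archived_snapshots" object means no snapshot was returned.
    if not data["archived_snapshots"]:
        raise WaybackError(
            "Cannot find an archive for '%s'; try again later or save it first." % url
        )

    return data["archived_snapshots"]["closest"]["url"]


# Made-up response body, for illustration only.
body = '{"archived_snapshots": {"closest": {"url": "http://web.archive.org/web/20200809053629/https://example.com/"}}}'
print(closest_archive_url(body, "https://example.com/"))
```

As the commit message for ca9186c301 notes, an empty result can also come from the Wayback Machine responding poorly, so the reworded error avoids claiming the URL was never archived.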