Compare commits


25 Commits
v1.1 ... v1.3

Author SHA1 Message Date
d3bd5b05b5 Update setup.py 2020-05-05 10:50:09 +05:30
d6598a67b9 Update setup.py 2020-05-05 10:40:23 +05:30
e5a6057249 Update setup.py 2020-05-05 10:39:10 +05:30
2a1b3bc6ee Update setup.py 2020-05-05 10:36:05 +05:30
b4ca98eca2 Update setup.py 2020-05-05 10:32:06 +05:30
36b01754ec Update setup.py 2020-05-05 10:23:38 +05:30
3d8bf4eec6 Update setup.py 2020-05-05 10:22:54 +05:30
e7761b3709 Update README.md 2020-05-05 10:21:08 +05:30
df851dce0c Update setup.py 2020-05-05 10:16:15 +05:30
f5acbcfc95 Update exceptions.py 2020-05-05 10:07:27 +05:30
44156e5e7e Update exceptions.py 2020-05-05 10:05:47 +05:30
a6cb955669 Update wrapper.py 2020-05-05 10:04:40 +05:30
8acb14a243 Update wrapper.py 2020-05-05 10:00:29 +05:30
7d434c3f0f Update wrapper.py 2020-05-05 09:57:39 +05:30
057c61d677 Update wrapper.py 2020-05-05 09:48:39 +05:30
6705c04f38 Update wrapper.py 2020-05-05 09:43:13 +05:30
e631c0aadb Update README.md 2020-05-05 09:37:53 +05:30
423782ea75 Update README.md 2020-05-05 09:36:11 +05:30
7944f0878d Add .whitesource configuration file (#6)
Co-authored-by: whitesource-bolt-for-github[bot] <42819689+whitesource-bolt-for-github[bot]@users.noreply.github.com>
2020-05-05 09:33:50 +05:30
850b055527 Update README.md 2020-05-05 09:31:43 +05:30
32bc765113 Update README.md (#5)
* Update README.md

* Update README.md

* Update README.md
2020-05-05 09:27:02 +05:30
09b4ba2649 Version 1.2 with bug fixes and support for webpage retrieval (#4) 2020-05-05 09:03:16 +05:30
929790feca Update README.md (#1)
Add usage/ documentaion
2020-05-04 21:06:00 +05:30
09a521ae43 Create setup.cfg 2020-05-04 16:23:00 +05:30
a503be5a86 Create setup.py 2020-05-04 16:21:24 +05:30
7 changed files with 223 additions and 16 deletions

.whitesource (new file, 8 lines)

@@ -0,0 +1,8 @@
{
  "checkRunSettings": {
    "vulnerableCheckRunConclusionLevel": "failure"
  },
  "issueSettings": {
    "minSeverityLevel": "LOW"
  }
}

README.md (149 changes)

@@ -1,2 +1,147 @@
-# pywayback
-A python wrapper for Internet Archive's Wayback Machine
# waybackpy
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/255459cede9341e39436ec8866d3fb65)](https://www.codacy.com/manual/akamhy/waybackpy?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=akamhy/waybackpy&amp;utm_campaign=Badge_Grade)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/akamhy/waybackpy/blob/master/LICENSE)
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/akamhy/waybackpy/graphs/commit-activity)
![Internet Archive](https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Internet_Archive_logo_and_wordmark.svg/84px-Internet_Archive_logo_and_wordmark.svg.png)
![Wayback Machine](https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Wayback_Machine_logo_2010.svg/284px-Wayback_Machine_logo_2010.svg.png)
Waybackpy is a Python wrapper for [Internet Archive](https://en.wikipedia.org/wiki/Internet_Archive)'s [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine).
Table of contents
=================
<!--ts-->
* [Installation](#installation)
* [Usage](#usage)
* [Saving an url using save()](#capturing-aka-saving-an-url-using-save)
* [Receiving the oldest archive for an URL Using oldest()](#receiving-the-oldest-archive-for-an-url-using-oldest)
* [Receiving the newest archive for an URL using newest()](#receiving-the-newest-archive-for-an-url-using-newest)
* [Receiving archive close to a specified year, month, day, hour, and minute using near()](#receiving-archive-close-to-a-specified-year-month-day-hour-and-minute-using-near)
* [Get the content of webpage using get()](#get-the-content-of-webpage-using-get)
* [Tests](#tests)
* [Dependency](#dependency)
* [License](#license)
<!--te-->
## Installation
Using [pip](https://en.wikipedia.org/wiki/Pip_(package_manager)):
**pip install waybackpy**
## Usage
#### Capturing aka Saving an url Using save()
```diff
+ waybackpy.save(url, UA=user_agent)
```
> url is mandatory. UA is not, but highly recommended.
```python
import waybackpy
# Capturing a new archive on Wayback machine.
# Default user-agent (UA) is "waybackpy python package", if not specified in the call.
archived_url = waybackpy.save("https://github.com/akamhy/waybackpy", UA = "Any-User-Agent")
print(archived_url)
```
This should print something similar to the following archived URL:
<https://web.archive.org/web/20200504141153/https://github.com/akamhy/waybackpy>
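Under the hood, save() simply issues a GET request to the Wayback Machine's Save Page Now endpoint at `https://web.archive.org/save/<url>` and derives the archived URL from the response. A minimal sketch of how that request is formed with only the standard library (no network call is made here; "Any-User-Agent" is a placeholder as in the examples above):

```python
from urllib.request import Request

# Save Page Now endpoint used by save(); a custom User-Agent is recommended.
base_save_url = "https://web.archive.org/save/"
url = "https://github.com/akamhy/waybackpy"
req = Request(base_save_url + url, headers={"User-Agent": "Any-User-Agent"})
print(req.full_url)  # https://web.archive.org/save/https://github.com/akamhy/waybackpy
```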
#### Receiving the oldest archive for an URL Using oldest()
```diff
+ waybackpy.oldest(url, UA=user_agent)
```
> url is mandatory. UA is not, but highly recommended.
```python
import waybackpy
# retrieving the oldest archive on Wayback machine.
# Default user-agent (UA) is "waybackpy python package", if not specified in the call.
oldest_archive = waybackpy.oldest("https://www.google.com/", UA = "Any-User-Agent")
print(oldest_archive)
```
This prints the oldest available archive for <https://www.google.com/>:
<http://web.archive.org/web/19981111184551/http://google.com:80/>
#### Receiving the newest archive for an URL using newest()
```diff
+ waybackpy.newest(url, UA=user_agent)
```
> url is mandatory. UA is not, but highly recommended.
```python
import waybackpy
# retrieving the newest archive on Wayback machine.
# Default user-agent (UA) is "waybackpy python package", if not specified in the call.
newest_archive = waybackpy.newest("https://www.microsoft.com/en-us", UA = "Any-User-Agent")
print(newest_archive)
```
This returns the newest available archive for <https://www.microsoft.com/en-us>, something like this:
<http://web.archive.org/web/20200429033402/https://www.microsoft.com/en-us/>
#### Receiving archive close to a specified year, month, day, hour, and minute using near()
```diff
+ waybackpy.near(url, year=2020, month=1, day=1, hour=1, minute=1, UA=user_agent)
```
> url is mandatory. year, month, day, hour, and minute are optional arguments. UA is not mandatory, but highly recommended.
```python
import waybackpy
# retrieving the closest archive from a specified year.
# Default user-agent (UA) is "waybackpy python package", if not specified in the call.
# supported arguments are year, month, day, hour, and minute
archive_near_year = waybackpy.near("https://www.facebook.com/", year=2010, UA ="Any-User-Agent")
print(archive_near_year)
```
returns : <http://web.archive.org/web/20100504071154/http://www.facebook.com/>
```waybackpy.near("https://www.facebook.com/", year=2010, month=1, UA ="Any-User-Agent")``` returns: <http://web.archive.org/web/20101111173430/http://www.facebook.com//>
```waybackpy.near("https://www.oracle.com/index.html", year=2019, month=1, day=5, UA ="Any-User-Agent")``` returns: <http://web.archive.org/web/20190105054437/https://www.oracle.com/index.html>
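near() resolves these URLs through the Wayback Machine Availability API, which returns JSON describing the snapshot closest to the requested timestamp. A minimal sketch of parsing such a response (the payload below is a hand-written sample in the documented response shape, not a live API result):

```python
import json

# Hand-written sample shaped like an Availability API response.
sample = """{"archived_snapshots": {"closest": {
    "url": "http://web.archive.org/web/20100504071154/http://www.facebook.com/",
    "available": true, "status": "200"}}}"""

data = json.loads(sample)
if not data["archived_snapshots"]:
    raise ValueError("URL is not yet archived.")
print(data["archived_snapshots"]["closest"]["url"])
```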
> Please note that if you specify only the year, the current month and day are used as defaults for month and day. Passing just the year does not return the archive closest to January; it returns the one closest to the current month. For example, calling ```waybackpy.near("https://www.facebook.com/", year=2011, UA ="Any-User-Agent")``` in July 2018 returns the archive nearest to July 2011, not January 2011. To target January, set month=1 explicitly.
> Do not zero-pad the arguments (year, month, day, hour, minute); e.g. for January, set month=1, not month=01.
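Padding is unnecessary because the library builds the Wayback timestamp (YYYYMMDDhhmm) itself by zero-filling each component. A rough sketch of that construction:

```python
def wayback_timestamp(year, month, day, hour, minute):
    # Zero-fill everything but the year to two digits: (2019, 1, 5, 0, 0) -> "201901050000"
    return str(year) + "".join(str(part).zfill(2) for part in (month, day, hour, minute))

print(wayback_timestamp(2019, 1, 5, 0, 0))  # 201901050000
```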
#### Get the content of webpage using get()
```diff
+ waybackpy.get(url, encoding="UTF-8", UA=user_agent)
```
> url is mandatory. UA is not, but highly recommended. encoding is detected automatically; don't specify it unless necessary.
```python
from waybackpy import get
# retrieving the webpage from any url, including archived urls. No need to import other libraries :)
# Default user-agent (UA) is "waybackpy python package", if not specified in the call.
# supported arguments are url, encoding, and UA
webpage = get("https://example.com/", UA="User-Agent")
print(webpage)
```
> This should print the source code for <https://example.com/>.
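When encoding is omitted, get() tries to read the charset from the response's Content-Type header and falls back to UTF-8. A small sketch of that detection logic (the helper name here is ours, not part of waybackpy's API):

```python
def charset_from_content_type(content_type, default="UTF-8"):
    # "text/html; charset=ISO-8859-1" -> "ISO-8859-1"; no charset -> default
    if content_type and "charset=" in content_type:
        return content_type.split("charset=")[-1].strip()
    return default

print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # ISO-8859-1
print(charset_from_content_type("text/html"))                      # UTF-8
```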
## Dependency
* None; only the Python standard library. Both Python 2 and 3 are supported :)
## License
[MIT License](LICENSE)

setup.cfg (new file, 2 lines)

@@ -0,0 +1,2 @@
[metadata]
description-file = README.md

setup.py (new file, 32 lines)

@@ -0,0 +1,32 @@
import os.path
from setuptools import setup

with open(os.path.join(os.path.dirname(__file__), 'README.md')) as f:
    long_description = f.read()

setup(
    name = 'waybackpy',
    packages = ['waybackpy'],
    version = 'v1.2',
    description = 'A python wrapper for Internet Archives Wayback Machine',
    long_description=long_description,
    long_description_content_type='text/markdown',
    license='MIT',
    author = 'akamhy',
    author_email = 'akash3pro@gmail.com',
    url = 'https://github.com/akamhy/waybackpy',
    download_url = 'https://github.com/akamhy/waybackpy/archive/v1.2.tar.gz',
    keywords = ['wayback', 'archive', 'archive website', 'wayback machine', 'Internet Archive'],
    install_requires=[],
    python_requires='>=3.6',
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Intended Audience :: Developers',
        'Topic :: Software Development :: Build Tools',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
    ],
)

waybackpy/__init__.py

@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
-from .wrapper import save, near, oldest, newest
+from .wrapper import save, near, oldest, newest, get
-__version__ = "1.1"
+__version__ = "v1.2"
 __all__ = ['wrapper', 'exceptions']

waybackpy/exceptions.py

@@ -1,14 +1,14 @@
 # -*- coding: utf-8 -*-
 class TooManyArchivingRequests(Exception):
-    """
-    Error when a single url reqeusted for archiving too many times in a short timespam.
+    """Error when a single url reqeusted for archiving too many times in a short timespam.
+    Wayback machine doesn't supports archivng any url too many times in a short period of time.
     """
 class ArchivingNotAllowed(Exception):
-    """
-    Files like robots.txt are set to deny robot archiving.
+    """Files like robots.txt are set to deny robot archiving.
+    Wayback machine respects these file, will not archive.
     """

waybackpy/wrapper.py

@@ -1,6 +1,7 @@
 # -*- coding: utf-8 -*-
+import json
 from datetime import datetime
-from waybackpy.exceptions import *
+from waybackpy.exceptions import TooManyArchivingRequests, ArchivingNotAllowed, PageNotSaved, ArchiveNotFound, UrlNotFound, BadGateWay, InvalidUrl
 try:
     from urllib.request import Request, urlopen
     from urllib.error import HTTPError
@@ -16,8 +17,8 @@ def clean_url(url):
 def save(url,UA=default_UA):
     base_save_url = "https://web.archive.org/save/"
     request_url = (base_save_url + clean_url(url))
-    hdr = { 'User-Agent' : '%s' % UA }
-    req = Request(request_url, headers=hdr)
+    hdr = { 'User-Agent' : '%s' % UA } #nosec
+    req = Request(request_url, headers=hdr) #nosec
     if "." not in url:
         raise InvalidUrl("'%s' is not a vaild url." % url)
     try:
@@ -39,6 +40,26 @@ def save(url,UA=default_UA):
     archived_url = "https://web.archive.org" + archive_id
     return archived_url
+def get(url,encoding=None,UA=default_UA):
+    hdr = { 'User-Agent' : '%s' % UA }
+    request_url = clean_url(url)
+    req = Request(request_url, headers=hdr) #nosec
+    resp=urlopen(req) #nosec
+    if encoding is None:
+        try:
+            encoding= resp.headers['content-type'].split('charset=')[-1]
+        except AttributeError:
+            encoding = "UTF-8"
+    return resp.read().decode(encoding)
+def wayback_timestamp(year,month,day,hour,minute):
+    year = str(year)
+    month = str(month).zfill(2)
+    day = str(day).zfill(2)
+    hour = str(hour).zfill(2)
+    minute = str(minute).zfill(2)
+    return (year+month+day+hour+minute)
 def near(
     url,
     year=datetime.utcnow().strftime('%Y'),
@@ -48,16 +69,15 @@ def near(
     minute=datetime.utcnow().strftime('%M'),
     UA=default_UA,
 ):
-    timestamp = str(year)+str(month)+str(day)+str(hour)+str(minute)
+    timestamp = wayback_timestamp(year,month,day,hour,minute)
     request_url = "https://archive.org/wayback/available?url=%s&timestamp=%s" % (clean_url(url), str(timestamp))
     hdr = { 'User-Agent' : '%s' % UA }
-    req = Request(request_url, headers=hdr)
+    req = Request(request_url, headers=hdr) # nosec
     response = urlopen(req) #nosec
-    import json
-    data = json.loads(response.read().decode('utf8'))
+    data = json.loads(response.read().decode("UTF-8"))
     if not data["archived_snapshots"]:
         raise ArchiveNotFound("'%s' is not yet archived." % url)
     archive_url = (data["archived_snapshots"]["closest"]["url"])
     return archive_url