Compare commits

...

6 Commits

Author SHA1 Message Date
3b3e78d901 Before and After methods (#175)
* Added before and after functions

* add tests

* formatting
2022-11-17 07:58:46 +05:30
0202efd39d Add Python 3.11 to setup.cfg classifiers list (#179) 2022-11-17 07:56:19 +05:30
25c0adacb0 create CONTRIBUTING.md 2022-03-29 11:30:43 +05:30
5bd16a42e7 lint 2022-03-29 10:42:57 +05:30
57f4be53d5 ignore line 474 beacause 'error: <nothing> not callable'. 2022-03-29 10:24:55 +05:30
64a4ce88af Minor copyediting and also deleted CONTRIBUTORS.md moved content to README.md 2022-03-29 03:39:50 +05:30
7 changed files with 194 additions and 22 deletions

54
CONTRIBUTING.md Normal file

@@ -0,0 +1,54 @@
# Welcome to the waybackpy contributing guide
## Getting started
Read our [Code of Conduct](./CODE_OF_CONDUCT.md).
## Creating an issue
It's a good idea to open an issue first to discuss suspected bugs and new feature ideas with the maintainers. Somebody might already be working on the same bug or idea, and discussing it first can save you time. This is a recommendation, not a requirement: you may skip the issue and open a pull request directly.
## Fork this repository
Fork this repository. See '[Fork a repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo)' for help forking this repository on GitHub.
## Make changes to the forked copy
Make the required changes to your fork of waybackpy. Please don't forget to add or update comments and docstrings.
## Add tests for your changes
Once you have made the required changes to the codebase, add tests for any newly written methods/functions and update the tests of the code you changed.
## Testing and Linting
You must run the tests and linter on your changes before opening a pull request.
### pytest
Runs all tests in the `tests` directory. pytest is a mature, full-featured Python testing tool.
```bash
pytest
```
### mypy
Mypy is a static type checker for Python. Type checkers help ensure that you're using variables and functions in your code correctly.
```bash
mypy -p waybackpy -p tests
```
### black
After testing with pytest and type checking with mypy, run black on the codebase. The project uses the black code style.
```bash
black .
```
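If you want to run the three checks in one go, here is a minimal sketch of a hypothetical helper (it is not part of the waybackpy repository). It runs pytest, mypy and black in the order given above, using the same commands, and stops at the first failing step. The `runner` parameter is an assumption added so the logic can be exercised without invoking the real tools.

```python
import subprocess
from typing import Any, Callable, List

# Commands exactly as listed in this guide, in the recommended order.
CHECKS: List[List[str]] = [
    ["pytest"],
    ["mypy", "-p", "waybackpy", "-p", "tests"],
    ["black", "."],
]


def run_checks(runner: Callable[[List[str]], Any] = subprocess.run) -> int:
    """Run each check in turn; return the first non-zero exit code, else 0."""
    for command in CHECKS:
        print("$", " ".join(command))
        result = runner(command)
        if result.returncode != 0:
            # Stop early: there is little point formatting code whose tests fail.
            return int(result.returncode)
    return 0
```

Saved as a script, it could end with `raise SystemExit(run_checks())` so the shell or CI sees the failing exit code.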
## Create a pull request
Read [Creating a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request).
Try to make sure that all automated tests pass, but don't worry if some of them fail. Tests are meant to catch bugs, and a failing test is better than a bug introduced into the master branch.


@@ -1,16 +0,0 @@
# CONTRIBUTORS
## AUTHORS
- akamhy (<https://github.com/akamhy>)
- eggplants (<https://github.com/eggplants>)
- danvalen1 (<https://github.com/danvalen1>)
- AntiCompositeNumber (<https://github.com/AntiCompositeNumber>)
- rafaelrdealmeida (<https://github.com/rafaelrdealmeida>)
- jonasjancarik (<https://github.com/jonasjancarik>)
- jfinkhaeuser (<https://github.com/jfinkhaeuser>)
## ACKNOWLEDGEMENTS
- mhmdiaa (<https://github.com/mhmdiaa>) for <https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050>. known_urls is based on this gist.
- dequeued0 (<https://github.com/dequeued0>) for reporting bugs and useful feature requests.


@@ -3,7 +3,7 @@
<img src="https://raw.githubusercontent.com/akamhy/waybackpy/master/assets/waybackpy_logo.svg"><br>
<h3>A Python package & CLI tool that interfaces with the Wayback Machine API</h3>
<h3>Python package & CLI tool that interfaces the Wayback Machine APIs</h3>
</div>
@@ -24,7 +24,7 @@
Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine APIs.
Wayback Machine has 3 client side APIs.
Internet Archive's Wayback Machine has 3 useful public APIs.
- SavePageNow or Save API
- CDX Server API
@@ -37,7 +37,7 @@ These three APIs can be accessed via the waybackpy either by importing it from a
**Using [pip](https://en.wikipedia.org/wiki/Pip_(package_manager)), from [PyPI](https://pypi.org/) (recommended)**:
```bash
pip install waybackpy
pip install waybackpy -U
```
**Using [conda](https://en.wikipedia.org/wiki/Conda_(package_manager)), from [conda-forge](https://anaconda.org/conda-forge/waybackpy) (recommended)**:
@@ -143,7 +143,7 @@ com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5W
com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391
>>> near.archive_url
'https://web.archive.org/web/20101010101435/http://google.com/'
>>>
>>>
```
##### snapshots
```python
@@ -165,7 +165,7 @@ https://web.archive.org/web/20171206002737/http://pypi.org:80/
#### Availability API
It is recommended to not use the availability API due to performance issues. All the methods of availability API interface class, `WaybackMachineAvailabilityAPI`, are also implemented in the CDX server API interface class, `WaybackMachineCDXServerAPI`. Also note
It is recommended to not use the availability API due to performance issues. All the methods of availability API interface class, `WaybackMachineAvailabilityAPI`, are also implemented in the CDX server API interface class, `WaybackMachineCDXServerAPI`. Also note
that the `newest()` method of `WaybackMachineAvailabilityAPI` can be more recent than `WaybackMachineCDXServerAPI`'s same method.
```python
@@ -203,4 +203,19 @@ Demo video on [asciinema.org](https://asciinema.org/a/469890), you can copy the
> CLI documentation is at <https://github.com/akamhy/waybackpy/wiki/CLI-docs>.
## CONTRIBUTORS
### AUTHORS
- akamhy (<https://github.com/akamhy>)
- eggplants (<https://github.com/eggplants>)
- danvalen1 (<https://github.com/danvalen1>)
- AntiCompositeNumber (<https://github.com/AntiCompositeNumber>)
- rafaelrdealmeida (<https://github.com/rafaelrdealmeida>)
- jonasjancarik (<https://github.com/jonasjancarik>)
- jfinkhaeuser (<https://github.com/jfinkhaeuser>)
### ACKNOWLEDGEMENTS
- mhmdiaa (<https://github.com/mhmdiaa>): `--known-urls` is based on [this](https://gist.github.com/mhmdiaa/adf6bff70142e5091792841d4b372050) gist.
- dequeued0 (<https://github.com/dequeued0>) for reporting bugs and useful feature requests.


@@ -37,6 +37,7 @@ classifiers =
    Programming Language :: Python :: 3.8
    Programming Language :: Python :: 3.9
    Programming Language :: Python :: 3.10
    Programming Language :: Python :: 3.11
    Programming Language :: Python :: Implementation :: CPython

[options]


@@ -176,3 +176,39 @@ def test_near() -> None:
        filters=["statuscode:200"],
    )
    cdx.near(unix_timestamp=1286705410)


def test_before() -> None:
    user_agent = (
        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
    )

    cdx = WaybackMachineCDXServerAPI(
        url="http://www.google.com/",
        user_agent=user_agent,
        filters=["statuscode:200"],
    )

    before = cdx.before(wayback_machine_timestamp=20160731235949)
    assert "20160731233347" in before.timestamp
    assert "google" in before.urlkey
    assert before.original.find("google.com") != -1
    assert before.archive_url.find("google.com") != -1


def test_after() -> None:
    user_agent = (
        "Mozilla/5.0 (MacBook Air; M1 Mac OS X 11_4) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/604.1"
    )

    cdx = WaybackMachineCDXServerAPI(
        url="http://www.google.com/",
        user_agent=user_agent,
        filters=["statuscode:200"],
    )

    after = cdx.after(wayback_machine_timestamp=20160731235949)
    assert "20160801000917" in after.timestamp, after.timestamp
    assert "google" in after.urlkey
    assert after.original.find("google.com") != -1
    assert after.archive_url.find("google.com") != -1
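The tests pass `wayback_machine_timestamp` as a 14-digit `YYYYMMDDhhmmss` value (e.g. `20160731235949`), while `test_near()` passes a Unix timestamp instead. The sketch below shows how the two representations relate, assuming waybackpy's `unix_timestamp_to_wayback_timestamp` formats the moment in UTC in this layout. `to_wayback_timestamp` and `from_wayback_timestamp` are hypothetical names, not waybackpy functions.

```python
from datetime import datetime, timezone


def to_wayback_timestamp(unix_timestamp: int) -> str:
    """Hypothetical helper: Unix seconds -> 14-digit UTC Wayback timestamp."""
    moment = datetime.fromtimestamp(int(unix_timestamp), tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


def from_wayback_timestamp(timestamp: object) -> int:
    """Hypothetical helper: 14-digit UTC Wayback timestamp -> Unix seconds."""
    moment = datetime.strptime(str(timestamp), "%Y%m%d%H%M%S")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


# The Unix value used by test_near() is 2010-10-10 10:10:10 UTC.
print(to_wayback_timestamp(1286705410))  # 20101010101010
# The value used by test_before()/test_after() round-trips unchanged.
print(to_wayback_timestamp(from_wayback_timestamp(20160731235949)))  # 20160731235949
```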


@@ -191,6 +191,88 @@ class WaybackMachineCDXServerAPI:
        payload["url"] = self.url

    def before(
        self,
        year: Optional[int] = None,
        month: Optional[int] = None,
        day: Optional[int] = None,
        hour: Optional[int] = None,
        minute: Optional[int] = None,
        unix_timestamp: Optional[int] = None,
        wayback_machine_timestamp: Optional[Union[int, str]] = None,
    ) -> CDXSnapshot:
        """
        Gets the nearest archive before the given datetime.
        """
        if unix_timestamp:
            timestamp = unix_timestamp_to_wayback_timestamp(unix_timestamp)
        elif wayback_machine_timestamp:
            timestamp = str(wayback_machine_timestamp)
        else:
            now = datetime.utcnow().timetuple()
            timestamp = wayback_timestamp(
                year=now.tm_year if year is None else year,
                month=now.tm_mon if month is None else month,
                day=now.tm_mday if day is None else day,
                hour=now.tm_hour if hour is None else hour,
                minute=now.tm_min if minute is None else minute,
            )

        # With sort="closest", snapshots are ordered by distance from
        # self.closest, so the first earlier one is the nearest before it.
        self.closest = timestamp
        self.sort = "closest"
        self.limit = 25000
        for snapshot in self.snapshots():
            if snapshot.timestamp < timestamp:
                return snapshot

        # If a snapshot isn't returned, then none were found.
        raise NoCDXRecordFound(
            "No records were found before the given date for the query. "
            + "Either there are no archives before the given date,"
            + " the URL may not have any archives, or the URL may have been"
            + " recently archived and is still not available on the CDX server."
        )

    def after(
        self,
        year: Optional[int] = None,
        month: Optional[int] = None,
        day: Optional[int] = None,
        hour: Optional[int] = None,
        minute: Optional[int] = None,
        unix_timestamp: Optional[int] = None,
        wayback_machine_timestamp: Optional[Union[int, str]] = None,
    ) -> CDXSnapshot:
        """
        Gets the nearest archive after the given datetime.
        """
        if unix_timestamp:
            timestamp = unix_timestamp_to_wayback_timestamp(unix_timestamp)
        elif wayback_machine_timestamp:
            timestamp = str(wayback_machine_timestamp)
        else:
            now = datetime.utcnow().timetuple()
            timestamp = wayback_timestamp(
                year=now.tm_year if year is None else year,
                month=now.tm_mon if month is None else month,
                day=now.tm_mday if day is None else day,
                hour=now.tm_hour if hour is None else hour,
                minute=now.tm_min if minute is None else minute,
            )

        # With sort="closest", snapshots are ordered by distance from
        # self.closest, so the first later one is the nearest after it.
        self.closest = timestamp
        self.sort = "closest"
        self.limit = 25000
        for snapshot in self.snapshots():
            if snapshot.timestamp > timestamp:
                return snapshot

        # If a snapshot isn't returned, then none were found.
        raise NoCDXRecordFound(
            "No records were found after the given date for the query. "
            + "Either there are no archives after the given date,"
            + " the URL may not have any archives, or the URL may have been"
            + " recently archived and is still not available on the CDX server."
        )

    def near(
        self,
        year: Optional[int] = None,


@@ -471,4 +471,4 @@ def main( # pylint: disable=no-value-for-parameter
if __name__ == "__main__":
    main()  # pylint: disable=no-value-for-parameter
    main()  # type: ignore # pylint: disable=no-value-for-parameter