Improve the save method. We now know that a 302 response indicates that the Wayback Machine is still archiving the URL and has not finished yet. We construct an artificial archive URL from the current UTC time and check it for an HTTP status code of 20* or 30*. If that check verifies the archival, we return the artificial archive. The artificial archive will automatically point to the new archive or, in the best case, will itself be the new archive after some time.
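
The verification idea described in this message can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the helper names `artificial_archive_url`, `is_archived` and `verify_archive`, and the injected `fetch_status` callable, are hypothetical. The only outside fact assumed is the usual Wayback Machine URL layout, `https://web.archive.org/web/<14-digit timestamp>/<url>`.

```python
from datetime import datetime, timezone


def artificial_archive_url(url, now=None):
    """Build a Wayback-style URL stamped with the current UTC time."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y%m%d%H%M%S")  # 14-digit YYYYMMDDhhmmss
    return "https://web.archive.org/web/" + timestamp + "/" + url


def is_archived(status_code):
    """Treat any 20* or 30* status as evidence that the archive exists."""
    return 200 <= status_code < 400


def verify_archive(url, fetch_status, now=None):
    """Return the artificial archive URL if it answers with 20*/30*, else None.

    fetch_status is a hypothetical callable (URL -> HTTP status code); in
    practice it would wrap an HTTP GET that does not follow redirects.
    """
    candidate = artificial_archive_url(url, now)
    if is_archived(fetch_status(candidate)):
        return candidate
    return None
```

Passing the status lookup in as a callable keeps the sketch independent of any particular HTTP client and makes the 20*/30* rule easy to exercise without network access.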

Akash Mahanty
2021-01-16 10:47:43 +05:30
parent 0725163af8
commit d549d31421
2 changed files with 71 additions and 4 deletions

@@ -139,13 +139,26 @@ class Url:
"""
         request_url = "https://web.archive.org/save/" + _cleaned_url(self.url)
         headers = {"User-Agent": self.user_agent}
         response = _get_response(
-            request_url, params=None, headers=headers, backoff_factor=2
+            request_url,
+            params=None,
+            headers=headers,
+            backoff_factor=2,
+            no_raise_on_redirects=True,
         )
         if not self.latest_version:
             self.latest_version = _latest_version("waybackpy", headers=headers)
+        if response:
+            res_headers = response.headers
+        else:
+            res_headers = "save redirected"
         self._archive_url = "https://" + _archive_url_parser(
-            response.headers, self.url, self.latest_version
+            res_headers,
+            self.url,
+            latest_version=self.latest_version,
+            instance=self,
         )
         self.timestamp = datetime.utcnow()
         return self