- https://github.com/Benjamin-Loison
- Joined on
2022-10-16
As it's working on a captions file basis it returns two entries which seem the best way to treat this case.
Output for: aquatique le plus haut
- [UCH0XvUpYcxn4V0iZGnZXMnQ.zip](https://crawl…
EXIT_WITH_ERROR
to PRINT
for channels not having an enumerable uploads
playlist
Note that maybe the returned match timestamps aren't as precise as we can (maybe it returns the previous beginning timestamp caption for instance). This should be ideally investigated.
Note that as I'm hosting multiple websites, to guess which website (here the YouTube operational API one) to talk to, I'm using a private sub domain private.sub.domain
. However reaching this…
To verify the correct format of channels.txt
, as I ran dos2unix
on it while the algorithm was running:
verifyChannels.py
:
#!/usr/bin/python3
with open('channels.txt') as f:
…
To verify that the starting set was treated:
isStartingSetTreated.py
:
#!/usr/bin/python3
import os
with open('newChannels.txt') as f:
lines = f.read().splitlines()
for…
Concerning channels/
due to crashes during the unstable process at the time of the process, using:
find -name '*.zip' -exec unzip -t {} \;
Will publish such a release after having treated all the channels I provided it initially.
Also verifying quality by verifying debug/*.err
content:
cat *.err
Closed in favor of #11.
With my 604G free storage, it's enough to already have a nice dataset that would need 604 / 14 = 43 days to be filled.
Note that the main problem might be to have multiple YouTube operational API running on multiple IPs cf [this Git issue comment](https://github.com/Benjamin-Loison/YouTube-operational-API/issues/11…