Prepare the presentation #35
Reference: Benjamin_Loison/YouTube_captions_search_engine#35
See below for the project description PDF summary.
Source: email of 14/02/23.
(repeated in the email)
Source: https://moodle.r2.enst.fr/moodle/pluginfile.php/39334/mod_resource/content/4/project.pdf
Related to #19.
Also verify the algorithm quality by checking the content of debug/*.err. The only video currently listed there, QAlRgdhz6sU, doesn't have captions. Perhaps videos flagged as inappropriate for some users are forced not to have captions.
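Going through debug/*.err by hand gets tedious as the number of files grows. The following is a minimal sketch, assuming the algorithm writes one plain-text *.err file per process into debug/. The function name non_empty_error_logs is hypothetical and not taken from the repository. It only surfaces the logs that actually contain something.

```python
import glob
import os


def non_empty_error_logs(debug_dir="debug"):
    """Return {path: content} for every non-empty *.err file in debug_dir.

    Hypothetical helper: assumes one plain-text *.err file per process in debug_dir.
    """
    logs = {}
    for path in sorted(glob.glob(os.path.join(debug_dir, "*.err"))):
        # errors="replace" so that a partially written or binary log doesn't crash the check
        with open(path, encoding="utf-8", errors="replace") as f:
            content = f.read()
        if content.strip():
            logs[path] = content
    return logs


if __name__ == "__main__":
    for path, content in non_empty_error_logs().items():
        print(f"== {path}")
        print(content)
```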
I also have a warning for video edF4q1DxcEU, as it mentions chapter timestamps that are later than the video duration. Despite this warning, we correctly retrieve the captions of this video.
With verifyStartTime.py, there is no problem for the reported video, as 00eyFxG1kFs has captions and we retrieve them correctly. Note that this seems to be an issue already solved in yt-dlp.
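The content of verifyStartTime.py isn't reproduced in this issue, so what follows isn't it. Purely as an illustration of the chapter-timestamp warning mentioned for edF4q1DxcEU, here is a hypothetical sketch that flags chapters starting after the video ends. It assumes a yt-dlp info dict (for instance a file written with yt-dlp's --write-info-json option) whose duration and chapters[].start_time fields are in seconds. The function name chapters_past_duration is made up for this example.

```python
import json
import sys


def chapters_past_duration(info):
    """Return the chapters whose start time is later than the video duration.

    `info` is assumed to be a yt-dlp info dict with 'duration' and 'chapters'
    (a list of dicts with 'start_time' and 'title'), times in seconds.
    """
    duration = info.get("duration")
    if duration is None:
        return []  # without a known duration there is nothing to compare against
    return [c for c in info.get("chapters") or [] if c.get("start_time", 0) > duration]


if __name__ == "__main__":
    # Usage: python chapters_check.py <video>.info.json
    with open(sys.argv[1], encoding="utf-8") as f:
        info = json.load(f)
    for chapter in chapters_past_duration(info):
        print(chapter.get("start_time"), chapter.get("title"))
```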
I found a video id with captions thanks to getVideoIdHavingCaptions.py.
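The script itself isn't shown here. A minimal sketch of the same idea, under loudly stated assumptions: each treated channel is stored as a .zip in channels/, and caption files inside are named "<videoId>.<lang>.vtt" with YouTube's 11-character video id. Both the layout and the function name find_video_id_having_captions are hypothetical.

```python
import glob
import os
import re
import zipfile

# Hypothetical naming: caption files are "<videoId>.<lang>.vtt", where the
# video id is YouTube's 11-character [A-Za-z0-9_-] identifier.
CAPTION_NAME = re.compile(r"([A-Za-z0-9_-]{11})\.[^/]+\.vtt")


def find_video_id_having_captions(channels_dir="channels"):
    """Return the first video id that has a caption file in a channel archive, or None."""
    for archive in sorted(glob.glob(os.path.join(channels_dir, "*.zip"))):
        with zipfile.ZipFile(archive) as zf:
            for name in zf.namelist():
                base_name = name.rsplit("/", 1)[-1]  # zip member names always use "/"
                match = CAPTION_NAME.fullmatch(base_name)
                if match:
                    return match.group(1)
    return None


if __name__ == "__main__":
    print(find_video_id_having_captions())
```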
I also got another error, but the only video currently suffering from this problem doesn't have captions.
Another error seems to be on yt-dlp's end; I should investigate it. In fact, it is an already known yt-dlp bug that was reported very recently.
However, even with the latest version of yt-dlp built from source, I'm unable to download captions with my command. It's unclear whether this command is supposed to still work with the latest commit. I'm subscribed to the yt-dlp GitHub releases in case a new one comes out.
I should also keep an eye on https://github.com/ytdl-patched/yt-dlp/releases in a day.
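The exact command that fails isn't reproduced here. To reproduce a captions-only download outside the algorithm, here is a minimal sketch using yt-dlp's Python API. The option keys (skip_download, writesubtitles, writeautomaticsub, subtitleslangs, subtitlesformat, outtmpl, quiet) are real YoutubeDL parameters; the helper names, the output template and the defaults are assumptions made for this example, not the project's settings. Automatic captions are off by default because combining them with "all" languages can fetch a large number of auto-translated tracks.

```python
def caption_download_options(output_template="%(id)s.%(ext)s", languages=("all",), automatic=False):
    """Build yt-dlp options that download subtitles only, never the media itself."""
    return {
        "skip_download": True,           # don't download the video
        "writesubtitles": True,          # manually uploaded captions
        "writeautomaticsub": automatic,  # auto-generated captions, off by default
        "subtitleslangs": list(languages),
        "subtitlesformat": "vtt",
        "outtmpl": output_template,
        "quiet": True,
    }


def download_captions(video_id, **kwargs):
    # Imported lazily so the options above can be inspected without yt-dlp installed.
    import yt_dlp

    with yt_dlp.YoutubeDL(caption_download_options(**kwargs)) as ydl:
        # Returns yt-dlp's exit code (0 on success).
        return ydl.download([f"https://www.youtube.com/watch?v={video_id}"])
```

Running download_captions("00eyFxG1kFs") from a scratch directory, once with the packaged yt-dlp and once with a source build, is one way to tell a yt-dlp regression from a mistake in the algorithm's own invocation.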
I may have to restart the algorithm; at the very least, verifying that all treated videos were correctly downloaded seems to make sense.
It may be an error on my end, as another command works fine.
See commit 78b2bf18fa for the patch. However, I won't try to determine on a per-channel basis which channels were correctly treated, as even Cyprien, the second most subscribed French YouTube channel, wasn't treated correctly. I verified the correct download of captions with verifyDownloadedAllCaptions.py.
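The actual script isn't shown. The core comparison can be sketched as a set difference between the caption languages the API reports for each video (for instance via the YouTube Data API captions.list endpoint, snippet.language) and the caption files actually on disk. The "<videoId>.<lang>.vtt" naming and the name missing_captions are assumptions. As the case of LHu-CJbPyCo shows, the API can list captions the YouTube UI doesn't display, so a non-empty result is not necessarily a download failure.

```python
def missing_captions(expected, downloaded_files):
    """Compare caption languages listed by the API with downloaded files.

    expected: {video_id: set of language codes reported by the API}
    downloaded_files: iterable of file names assumed to look like "<video_id>.<lang>.vtt"
    Returns {video_id: sorted list of languages listed by the API but not downloaded}.
    """
    downloaded = {}
    for name in downloaded_files:
        if not name.endswith(".vtt"):
            continue
        # Video ids contain no ".", so the first "." separates the id from the language.
        video_id, _, lang = name[: -len(".vtt")].partition(".")
        downloaded.setdefault(video_id, set()).add(lang)
    missing = {}
    for video_id, langs in expected.items():
        absent = set(langs) - downloaded.get(video_id, set())
        if absent:
            missing[video_id] = sorted(absent)
    return missing
```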
Note that this algorithm can't fully serve its purpose as, for instance, LHu-CJbPyCo doesn't have captions in the YouTube UI but, according to the API, has some. I also got another error.
But the retrieved captions look correct and are written in the correct format, as checked with verifyCaptionsNotDamaged.py.
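What "correct format" means isn't spelled out here. A minimal sketch, assuming the captions are stored as WebVTT: check for the WEBVTT header, that every cue timing line parses, that no cue ends before it starts, and that there is at least one cue. This is a simplified subset of the WebVTT rules, and webvtt_problems is a hypothetical name, not the script's content.

```python
import re

# WebVTT cue timing: "[HH:]MM:SS.mmm --> [HH:]MM:SS.mmm", optionally followed by cue settings.
TIMING = re.compile(
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3}) --> (?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})(?: .*)?"
)


def _seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def webvtt_problems(text):
    """Return a list of human-readable problems found in a WebVTT document."""
    lines = text.lstrip("\ufeff").splitlines()  # an optional BOM may precede the header
    problems = []
    if not lines or not lines[0].startswith("WEBVTT"):
        problems.append("missing WEBVTT header")
    cues = 0
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # header, cue identifier, cue text or blank line
        match = TIMING.fullmatch(line)
        if not match:
            problems.append(f"line {number}: malformed timing")
            continue
        start = _seconds(*match.groups()[:4])
        end = _seconds(*match.groups()[4:])
        if end < start:
            problems.append(f"line {number}: cue ends before it starts")
        cues += 1
    if cues == 0:
        problems.append("no cues")
    return problems
```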
I also got another error, but the caption download of YabFeyjN47Y was completely successful. I also got another error.
But as 0Y680gpug9g doesn't have comments, and we can only know whether it has captions by paying, we won't pay to find out. I also got another error.
The premiere bxulItfpOuI should have started a while ago. I also got another error.
But I have deleted the associated retrieved data, so I added the corresponding channels to my channels.txt to verify them once they are treated. I stopped giving explanations when we successfully retrieve the captions anyway.
Concerning channels/, because of crashes while the process was still unstable, I verified that there isn't any error with the archives.
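The exact command used isn't reproduced here. A minimal Python sketch of the same check, assuming the channel data in channels/ is stored as .zip archives: zipfile.ZipFile.testzip() reads every member, verifies its CRC and returns the first bad member name (or None), while an archive truncated by a crash typically fails to open at all. The name damaged_archives is hypothetical.

```python
import glob
import os
import zipfile
import zlib


def damaged_archives(channels_dir="channels"):
    """Return {archive path: reason} for archives that fail an integrity test.

    Assumes, hypothetically, that each treated channel is stored as a .zip file in channels_dir.
    """
    damaged = {}
    for path in sorted(glob.glob(os.path.join(channels_dir, "*.zip"))):
        try:
            with zipfile.ZipFile(path) as zf:
                # CRC-checks every member; returns the first bad member name, or None.
                bad_member = zf.testzip()
        except (zipfile.BadZipFile, OSError, EOFError, zlib.error) as error:
            damaged[path] = f"unreadable: {error}"
            continue
        if bad_member is not None:
            damaged[path] = f"corrupted member: {bad_member}"
    return damaged


if __name__ == "__main__":
    for path, reason in damaged_archives().items():
        print(path, reason)
```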
To verify that the starting set was treated, I used isStartingSetTreated.py.
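A minimal sketch of such a check, not the script itself. It makes two assumptions: the starting set is a text file with one channel id per line, and a channel counts as treated once channels/<channelId>.zip exists. Both conventions and the function names are hypothetical.

```python
import os


def read_channel_ids(path):
    """Read one channel id per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def untreated_channels(starting_set, channels_dir="channels"):
    """Return the channel ids of starting_set that have no archive in channels_dir.

    Hypothetical convention: a channel is treated once channels_dir/<channelId>.zip exists.
    """
    return [c for c in starting_set if not os.path.isfile(os.path.join(channels_dir, f"{c}.zip"))]
```

For example, untreated_channels(read_channel_ids("channels.txt")) would list the channels still missing an archive, assuming channels.txt holds the starting set.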
To verify the correct format of channels.txt, as I ran dos2unix on it while the algorithm was running, I used verifyChannels.py.
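A minimal sketch of a format check for channels.txt, assuming it holds one YouTube channel id per line. Channel ids are 24 characters: "UC" followed by 22 characters from [A-Za-z0-9_-]. The file is read in binary mode so that a "\r" left over from a CRLF line isn't silently swallowed by Python's newline handling. The function name channels_file_problems is hypothetical.

```python
import re

# YouTube channel ids: "UC" followed by 22 characters from [A-Za-z0-9_-].
CHANNEL_ID = re.compile(r"UC[A-Za-z0-9_-]{22}")


def channels_file_problems(path="channels.txt"):
    """Return a list of problems found in a file expected to contain one channel id per line."""
    problems = []
    with open(path, "rb") as f:  # binary mode keeps any "\r" visible
        for number, raw in enumerate(f, start=1):
            line = raw.rstrip(b"\n")
            if b"\r" in line:
                problems.append(f"line {number}: carriage return")
                line = line.replace(b"\r", b"")
            if not CHANNEL_ID.fullmatch(line.decode("ascii", errors="replace")):
                problems.append(f"line {number}: not a channel id: {line!r}")
    return problems


if __name__ == "__main__":
    for problem in channels_file_problems():
        print(problem)
```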
In fact, it seems that dos2unix was writing to another temporary file, d2utmpfh4L3d.
Use the following to prepare the presentation.
The statistics for the presentation were generated with presentationStats.py.
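Which statistics presentationStats.py computes isn't shown. As a hypothetical sketch only, assuming one .zip per channel in channels/ with captions stored as *.vtt members, the headline numbers could be gathered like this. The layout, the chosen metrics and the name caption_stats are assumptions.

```python
import glob
import os
import zipfile


def caption_stats(channels_dir="channels"):
    """Count channel archives, caption files inside them and the archives' total size on disk.

    Hypothetical layout: one .zip per channel in channels_dir, captions stored as *.vtt members.
    """
    stats = {"channels": 0, "caption_files": 0, "archive_bytes": 0}
    for path in glob.glob(os.path.join(channels_dir, "*.zip")):
        stats["channels"] += 1
        stats["archive_bytes"] += os.path.getsize(path)
        with zipfile.ZipFile(path) as zf:
            stats["caption_files"] += sum(1 for name in zf.namelist() if name.endswith(".vtt"))
    return stats


if __name__ == "__main__":
    for key, value in caption_stats().items():
        print(f"{key}: {value}")
```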