Make a website with a search engine notably based on the captions extracted #31
Blocks: #25 Make a not pre-release release
Benjamin_Loison/YouTube_captions_search_engine
Reference: Benjamin_Loison/YouTube_captions_search_engine#31
No description provided.
Could propose two options:
as an archive with matched files (make sure to remove/censor API keys; currently, doing so disables the ability to download whole archives).
Could also add `channels.txt` (done) and logs retrieval. Make sure that the logs contain nothing secret. Currently they do, cf. at least 1. and 2., but this is not much of a problem because the file is small and we can post-process it. I am waiting for a substantial log file to verify experimentally that only these two occurrences are problematic. Introduce https://crawler.yt.lemnoslife.com for this purpose.
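Here is a minimal sketch of one way to build a downloadable archive of matched files with API keys censored. It assumes the keys have the usual Google API key shape: "AIza" followed by 35 URL-safe characters. The function names and the placeholder text are hypothetical. The same `censor` helper could also post-process the logs before retrieval.

```python
import io
import re
import zipfile

# Assumption: API keys look like Google API keys ("AIza" + 35 URL-safe characters).
API_KEY_PATTERN = re.compile(rb"AIza[0-9A-Za-z_-]{35}")
PLACEHOLDER = b"<censored API key>"  # hypothetical replacement text


def censor(data: bytes) -> bytes:
    """Replace every API-key-looking token in `data` with a placeholder."""
    return API_KEY_PATTERN.sub(PLACEHOLDER, data)


def build_censored_archive(files: dict[str, bytes]) -> bytes:
    """Return an in-memory zip archive of `files` (name -> content) with API keys censored."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, censor(content))
    return buffer.getvalue()
```

Censoring happens while the archive is written, so the files on disk stay untouched. Only the copy offered for download is redacted.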
The plan is to use WebSocket, as it is well suited to this use case and widely supported by browsers.
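With WebSocket, each match could be pushed to the browser as soon as it is found, rather than after the whole search finishes. Here is a minimal sketch under that assumption. The JSON message schema (`type`, `videoId`, `timestamp`, `count`) is hypothetical. The framing follows RFC 6455 for an unmasked server-to-client text frame. A real deployment would normally let a WebSocket library handle the handshake and framing.

```python
import json
import struct
from typing import Iterable, Iterator


def encode_text_frame(text: str) -> bytes:
    """Encode `text` as one unmasked WebSocket text frame (RFC 6455, server -> client)."""
    payload = text.encode("utf-8")
    header = bytearray([0x81])  # FIN bit set + opcode 0x1 (text frame)
    length = len(payload)
    if length < 126:
        header.append(length)  # 7-bit payload length
    elif length < 1 << 16:
        header.append(126)  # 16-bit extended payload length follows
        header += struct.pack("!H", length)
    else:
        header.append(127)  # 64-bit extended payload length follows
        header += struct.pack("!Q", length)
    return bytes(header) + payload


def stream_matches(matches: Iterable[dict]) -> Iterator[bytes]:
    """Yield one frame per match as it is found, then a final 'done' frame with the total."""
    count = 0
    for match in matches:
        count += 1
        yield encode_text_frame(json.dumps({"type": "match", **match}))
    yield encode_text_frame(json.dumps({"type": "done", "count": count}))
```

Because `stream_matches` is a generator, the search can keep running while earlier results are already on their way to the client.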
Benjamin_Loison referenced this issue 2023-02-10 20:21:36 +01:00
Should add `vtt` parsing so that searches are not limited by line wrapping.
Should add an option to search only through captions.
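Here is a minimal sketch of VTT parsing that removes the line-wrapping limitation. It assumes standard WebVTT layout: blank-line-separated blocks, each with a `start --> end` timing line followed by text lines. It also strips inline markup such as `<c>`. Each cue's wrapped lines are joined, and the phrase is searched over the whole transcript. A phrase split across two lines or two cues therefore still matches. `parse_vtt`, `Cue` and `contains_phrase` are hypothetical names. The rolling duplicate lines of YouTube's automatic captions are not deduplicated here.

```python
import re
from dataclasses import dataclass

# Start time of a cue timing line, e.g. "00:01:02.500 --> ..." (hours are optional).
TIMING_PATTERN = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->")
# Inline markup such as <c>, </c> or <00:00:01.234>, which should not be searched.
TAG_PATTERN = re.compile(r"<[^>]*>")


@dataclass
class Cue:
    start: float  # cue start time in seconds
    text: str  # cue text with line wrapping removed


def parse_vtt(content: str) -> list[Cue]:
    """Parse WebVTT cues, joining each cue's wrapped lines with single spaces."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING_PATTERN.search(line)
            if match is None:
                continue  # header, cue identifier or NOTE line
            hours, minutes, seconds, millis = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000
            texts = [TAG_PATTERN.sub("", raw).strip() for raw in lines[index + 1:]]
            cues.append(Cue(start, " ".join(t for t in texts if t)))
            break
    return cues


def contains_phrase(cues: list[Cue], phrase: str) -> bool:
    """Case-insensitive search over the whole transcript, ignoring line and cue boundaries."""
    transcript = " ".join(" ".join(cue.text for cue in cues).split()).lower()
    return " ".join(phrase.split()).lower() in transcript
```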
Should also update `findLatestTreatedCommentsForChannelsBeingTreated.py` with all these features to better evaluate the algorithm's progress.
Working at the PHP level didn't make much sense:
Could later add a link directly to the YouTube video timestamp. However, as we aren't limited to line wrapping, this feature isn't easy to implement efficiently.
We could list all occurrences within a video.
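One way to make timestamp links cheap, sketched here under stated assumptions, is to record each cue's character offset once while joining the transcript. Each match offset can then be mapped back to its cue with a binary search. This takes O(log n) per occurrence. It assumes cues are given as hypothetical `(start_seconds, text)` pairs and that links use the `watch?v=…&t=…s` URL form. The function lists every occurrence in a video.

```python
import bisect
import re


def find_occurrences(video_id: str, cues: list[tuple[float, str]], phrase: str) -> list[tuple[int, str]]:
    """Return (start_seconds, url) for every occurrence of `phrase` in the video's transcript."""
    offsets, parts, position = [], [], 0
    for _, text in cues:
        offsets.append(position)  # offset of this cue's first character in the transcript
        parts.append(text)
        position += len(text) + 1  # +1 for the joining space
    transcript = " ".join(parts)
    # Escape the phrase and let any whitespace run match, so cue boundaries do not matter.
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    occurrences = []
    for match in re.finditer(pattern, transcript, flags=re.IGNORECASE):
        cue_index = bisect.bisect_right(offsets, match.start()) - 1  # cue containing the match start
        start = int(cues[cue_index][0])
        occurrences.append((start, f"https://www.youtube.com/watch?v={video_id}&t={start}s"))
    return occurrences
```

The returned timestamp is the start of the cue in which the match begins. It may therefore precede the matched words by a few seconds.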
Note that the returned match timestamps may not be as precise as they could be (for instance, they may return the beginning timestamp of the previous caption). Ideally, this should be investigated.
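One avenue for this investigation assumes the captions are YouTube's automatically generated ones. Their VTT lines typically embed per-word timing tags, as in `hey<00:00:00.320><c> guys</c>`. Here is a minimal sketch that turns such a line into `(start_seconds, word)` pairs, so a match could be timestamped at its first word instead of at the cue start. The names are hypothetical. Manually uploaded captions usually lack these tags, in which case every word falls back to the cue start time.

```python
import re

# Inline word timestamp such as <00:00:01.234>; re.split keeps its 4 captured groups.
WORD_TIMING = re.compile(r"<(\d+):(\d{2}):(\d{2})\.(\d{3})>")
CLASS_TAG = re.compile(r"</?c[^>]*>")  # <c>, </c>, <c.colorE5E5E5>, ...


def to_seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def word_timings(cue_start: float, line: str) -> list[tuple[float, str]]:
    """Split a caption line with inline timestamp tags into (start_seconds, word) pairs.

    Text before the first tag is assumed to start at the cue start time."""
    # Layout after split: [text0, h, m, s, ms, text1, h, m, s, ms, text2, ...]
    pieces = WORD_TIMING.split(line)
    timings = []
    current = cue_start
    for i in range(0, len(pieces), 5):
        text = CLASS_TAG.sub("", pieces[i]).strip()
        timings.extend((current, word) for word in text.split())
        if i + 4 < len(pieces):
            current = to_seconds(*pieces[i + 1:i + 5])  # time of the next text segment
    return timings
```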