# The algorithm:

To retrieve as many video captions as possible, we need as many YouTube video ids as possible, which in turn requires discovering as many YouTube channels as possible.
So to discover the YouTube channels graph with a breadth-first search, we proceed as follows:

1. Provide a starting set of channels.
2. Given a channel, retrieve other channels from its content by using [YouTube Data API v3](https://developers.google.com/youtube/v3) and [YouTube operational API](https://github.com/Benjamin-Loison/YouTube-operational-API), then repeat this step for each retrieved channel.

A ready-to-use website instance of this project is hosted at: https://crawler.yt.lemnoslife.com

See more details on [the Wiki](https://gitea.lemnoslife.com/Benjamin_Loison/YouTube_captions_search_engine/wiki).
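
The breadth-first discovery described in the two steps above can be sketched as follows. This is a minimal illustration in Python, not the project's actual implementation: `discover_channels` and `get_related_channels` are hypothetical names, and the toy graph with made-up channel ids stands in for the channels that would really be extracted through the YouTube Data API v3 and the YouTube operational API.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def discover_channels(
    starting_channels: Iterable[str],
    get_related_channels: Callable[[str], Iterable[str]],
) -> list[str]:
    """Return channel ids in breadth-first discovery order.

    `get_related_channels` is a hypothetical callback standing in for the
    API requests that extract other channels from a channel's content.
    """
    seen: set[str] = set()
    queue: deque[str] = deque()
    # Step 1: seed the queue with the starting set, skipping duplicates.
    for channel in starting_channels:
        if channel not in seen:
            seen.add(channel)
            queue.append(channel)

    order: list[str] = []
    while queue:
        channel = queue.popleft()  # FIFO order gives a breadth-first traversal
        order.append(channel)
        # Step 2: enqueue every related channel that has not been seen yet.
        for related in get_related_channels(channel):
            if related not in seen:
                seen.add(related)
                queue.append(related)
    return order


# Toy graph with made-up channel ids, used only for illustration.
TOY_GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": [],
}

if __name__ == "__main__":
    print(discover_channels(["A"], lambda c: TOY_GRAPH.get(c, [])))
```

In this sketch the `seen` set ensures each channel is processed only once, even when channels reference each other, as `A` and `B` do in the toy graph.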

# Running the algorithm:

Because of [the current compression mechanism](https://gitea.lemnoslife.com/Benjamin_Loison/YouTube_captions_search_engine/issues/30), Linux is the only OS known to be able to run this algorithm.

```sh
# Install the build and runtime dependencies
sudo apt install nlohmann-json3-dev yt-dlp
# Build the binary
make
# Print the command-line help
./youtubeCaptionsSearchEngine -h
```

If you plan to use the front-end website, also run:

```sh
pip install webvtt-py
```

Unless you pass the argument `--youtube-operational-api-instance-url https://yt.lemnoslife.com`, you have to [host your own instance of the YouTube operational API](https://github.com/Benjamin-Loison/YouTube-operational-API/#install-your-own-instance-of-the-api).

Unless you pass the argument `--no-keys`, you have to provide at least one [YouTube Data API v3 key](https://developers.google.com/youtube/v3/getting-started) in `keys.txt`.

```sh
./youtubeCaptionsSearchEngine
```