YouTube_Audio_library_extra.../media_files_extractor.py

import undetected_chromedriver.v2 as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import json

"""
Downloading media files appears to be protected by an anti-bot mechanism, so we use a Selenium-based approach.
"""

options = Options()
options.add_argument("--user-data-dir=selenium")
browser = uc.Chrome(options=options)
browser.get('https://studio.youtube.com/channel/UC/music')

"""
For the `Music` tab, the YouTube UI returns 3,000 entries while my reverse-engineering approach returns 5,819 entries.
For the `Sound effects` tab, the YouTube UI returns 400 entries while my reverse-engineering approach returns 2,021 entries.

So I assume the YouTube UI pagination does not work properly. To retrieve all media files, the idea is to filter by `Track title` and download all resulting entries, preferably only those whose title is exactly the one we are looking for, as some tracks share the same title.
As for `Sound effects`, even with `Sound effect`, `Duration`, `Category` and `Added` it is ambiguous which file an entry refers to (for instance for `Truck Driving in Parking Structure`, whose files are all different).
"""
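
# A minimal sketch (not part of the original script) of how the title ambiguity
# described above could be detected before driving the browser. It assumes each
# entry of `music.json` is a dict with a `title` key, which is how the loop
# below reads it; `find_duplicate_titles` is a hypothetical helper name.
from collections import Counter


def find_duplicate_titles(tracks):
    """Return a dict mapping each title shared by several tracks to its count."""
    counts = Counter(track['title'] for track in tracks)
    return {title: count for title, count in counts.items() if count > 1}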

with open('music.json') as json_file:
    tracks = json.load(json_file)

for track in tracks:
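    # Type the track title into the filter input.
    browser.find_element(By.ID, 'text-input').send_keys(track['title'])
    # Pick the second entry of the filter menu (presumably the `Track title` filter).
    browser.find_element(By.XPATH, '/html/body/ytcp-text-menu/tp-yt-paper-dialog/tp-yt-paper-listbox/tp-yt-paper-item[2]/ytcp-ve/div/div/yt-formatted-string/span[1]').click()

    # The last token of the pagination description is taken as the number of results.
    number_of_results = int(browser.find_element(By.CSS_SELECTOR, '.page-description').get_attribute('innerHTML').split()[-1])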
    print(number_of_results)

    # Click `DOWNLOAD`; the locator below is a CSS selector, hence `By.CSS_SELECTOR`.
    browser.find_element(By.CSS_SELECTOR, 'div.overflow-actions:nth-child(12) > ytcp-button:nth-child(1) > div:nth-child(2)').click()
    break
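
# A hedged sketch of the result-count parsing done in the loop above, pulled into
# a function so that it fails loudly on unexpected text. It assumes the
# `.page-description` text ends with the total (for example `1-3 of 3`); that
# format is an assumption, and `parse_result_count` is a hypothetical helper name.
def parse_result_count(description):
    """Return the trailing integer of a pagination description string."""
    tokens = description.split()
    if not tokens:
        raise ValueError('empty pagination description')
    # Drop thousands separators such as the comma in `3,000` before converting.
    return int(tokens[-1].replace(',', ''))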

#browser.quit()

# Commit history:
# 2023-02-04 14:18:26 +01:00: Add `media_files_extractor.py`
# 2023-02-04 19:08:02 +01:00: Move from Firefox to Chromium to be able to retrieve the download URL thanks to `chrome://downloads`.