How to extract the part before a subsection? #1

Open
opened 2024-03-29 16:38:58 +01:00 by Benjamin_Loison · 1 comment

Example: the table before the first subsection, `The Purge (2013)`, in the Films section of https://en.wikipedia.org/w/index.php?title=The_Purge&oldid=1215947399#Films
```diff
diff --git a/notifyOnWebpageChange.py b/notifyOnWebpageChange.py
index 2d274e8..2ca1f88 100755
--- a/notifyOnWebpageChange.py
+++ b/notifyOnWebpageChange.py
@@ -143,9 +143,16 @@ webpages = [
         'url': getWikipediaUrl('Futurama_(season_8)', 5),
         'filterText': lambda text : '\n'.join([template.get_arg('OriginalAirDate').string for template in wtp.parse(getWikipediaText(text)).templates[1].templates[30:50:2]]),
     },
+    # Should retrieve changes instead of whole new webpage.
+    # Also tracking a potential new subsection would be nice.
+    # https://gitea.lemnoslife.com/Benjamin_Loison/wikitextparser/issues/1
+    {
+        'url': getWikipediaUrl('The_Purge', 7),
+        'filterText': lambda text : getWikipediaText(text),
+    },
     {
         'url': getWikipediaUrl('The_Purge', 1),
-        'filterText': getWikipediaText,
+        'filterText' : lambda text : '\n'.join([template.string for template in wtp.parse(getWikipediaText(text)).templates[:7]])
     },
     {
         'url': 'https://opus-codec.org/downloads/',
```
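As a rough illustration of what "the part before a subsection" could mean, here is a minimal sketch that uses only the standard library rather than wikitextparser. The helper name `text_before_first_subsection`, the `HEADING` pattern, and the sample wikitext are all hypothetical and not taken from `notifyOnWebpageChange.py`; the sketch assumes headings are written on their own line as `== Title ==`.

```python
import re

# Matches a wikitext heading line such as "== Films ==" or "=== The Purge (2013) ===".
# Assumption: the heading is alone on its line with the same number of "=" on each side.
HEADING = re.compile(r'^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$', re.MULTILINE)


def text_before_first_subsection(wikitext, section_title):
    """Return the text between a section's heading and the next heading, or None."""
    headings = list(HEADING.finditer(wikitext))
    for index, heading in enumerate(headings):
        if heading.group(2) == section_title:
            start = heading.end()
            # Stop at the next heading of any level, or at the end of the page.
            if index + 1 < len(headings):
                end = headings[index + 1].start()
            else:
                end = len(wikitext)
            return wikitext[start:end].strip()
    return None


# Hypothetical sample, loosely shaped like a section with a table and one subsection.
sample = """Lead paragraph.

== Films ==
{| class="wikitable"
! Film !! Release date
|}

=== The Purge (2013) ===
Plot summary.

== Television ==
Series text.
"""

print(text_before_first_subsection(sample, 'Films'))
```

Because the cut is made at the next heading of any level, a section without subsections is returned whole, which may or may not be the desired behavior for change tracking.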
Reference: Benjamin_Loison/wikitextparser#1