⚓ T367684 Add superset query service as pywikibot generator
Article Images
Add superset query service as pywikibot generator
Closed, ResolvedPublicFeature
Add superset query service as pywikibot generator
Closed, ResolvedPublicFeature
Superset is SQL query interface to Wikimedia database replica. It has API which allows to run SQL queries over https and it would be nice to have local SQL query access from pywikibot without need of wikimedia cloud accounts. With superset user needs only give OAUTH permission.
- https://wikitech.wikimedia.org/wiki/Superset
- https://superset.wmcloud.org/
- https://phabricator.wikimedia.org/project/profile/6584/
Howto use
- add mediawiki test user for meta.wikimedia.org to user-config.py
- login to meta.wikimedia.org
- login to superset https://superset.wmcloud.org/login and give OAUTH permissions
- python -m unittest -v tests.superset_tests
mybot.py
import pywikibot from pywikibot import pagegenerators from pywikibot.bot import ExistingPageBot class MyBot(ExistingPageBot): def treat_page(self): """Load the given page, do some changes, and save it.""" print(self) def main(): """Parse command line arguments and invoke bot.""" options = {} gen_factory = pagegenerators.GeneratorFactory() # Option parsing local_args = pywikibot.handle_args() # global options local_args = gen_factory.handle_args(local_args) # generators options MyBot(generator=gen_factory.getCombinedGenerator(), **options).run() if __name__ == '__main__': main()
python pwb.py mybot.py -family:commons -lang:commons -supersetquery:"SELECT page_namespace, page_title FROM page LIMIT 10"
Example SQL queries
- mandatory return columns for creating are: page_id OR page_namespace and page_title
- if page_wikidb is defined then it is used as site. Example values are enwiki or enwiki_p or commonswiki or commonswiki_p.
- Target wiki is resolved from string using pywikibot.site.APISite.fromDBName()
SELECT gil_wiki AS page_wikidb, gil_page AS page_id FROM globalimagelinks GROUP BY gil_wiki LIMIT 10
SELECT page_id FROM page LIMIT 10
SELECT page_namespace, page_title FROM page LIMIT 10
Future work
- T367393 - In future Superset should be able to query ToolsDB public databases also and this needs to be added.
Event Timeline
Xqt triaged this task as Low priority.EditedJun 17 2024, 7:51 AM
Xqt changed the subtype of this task from "Task" to "Feature Request".
Xqt subscribed.
Looks like it could be implemented similar to |sparql or mysql
Xqt closed this task as Resolved.Jun 18 2024, 10:28 PM
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct. · Wikimedia Foundation · Privacy Policy · Code of Conduct · Terms of Use · Disclaimer · CC-BY-SA · GPL · Credits