⚓ T367684 Add superset query service as pywikibot generator


Article Images

Add superset query service as pywikibot generator

Closed, ResolvedPublicFeature

Superset is SQL query interface to Wikimedia database replica. It has API which allows to run SQL queries over https and it would be nice to have local SQL query access from pywikibot without need of wikimedia cloud accounts. With superset user needs only give OAUTH permission.

Howto use

  1. add mediawiki test user for meta.wikimedia.org to user-config.py
  2. login to meta.wikimedia.org
  3. login to superset https://superset.wmcloud.org/login and give OAUTH permissions
  4. python -m unittest -v tests.superset_tests

mybot.py

import pywikibot
from pywikibot import pagegenerators
from pywikibot.bot import ExistingPageBot

class MyBot(ExistingPageBot):
    def treat_page(self):
        """Load the given page, do some changes, and save it."""
        print(self)

def main():
    """Parse command line arguments and invoke bot."""
    options = {}
    gen_factory = pagegenerators.GeneratorFactory()

    # Option parsing
    local_args = pywikibot.handle_args()  # global options
    local_args = gen_factory.handle_args(local_args)  # generators options

    MyBot(generator=gen_factory.getCombinedGenerator(), **options).run()

if __name__ == '__main__':
    main()

python pwb.py mybot.py -family:commons -lang:commons -supersetquery:"SELECT page_namespace, page_title FROM page LIMIT 10"

Example SQL queries

  • mandatory return columns for creating are: page_id OR page_namespace and page_title
  • if page_wikidb is defined then it is used as site. Example values are enwiki or enwiki_p or commonswiki or commonswiki_p.
SELECT gil_wiki AS page_wikidb, gil_page AS page_id FROM globalimagelinks GROUP BY gil_wiki LIMIT 10
SELECT page_id FROM page LIMIT 10
SELECT page_namespace, page_title FROM page LIMIT 10

Future work

  • T367393 - In future Superset should be able to query ToolsDB public databases also and this needs to be added.

Event Timeline

Xqt triaged this task as Low priority.EditedJun 17 2024, 7:51 AM

Xqt changed the subtype of this task from "Task" to "Feature Request".

Xqt subscribed.

Looks like it could be implemented similar to |sparql or mysql

Xqt closed this task as Resolved.Jun 18 2024, 10:28 PM

Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct. · Wikimedia Foundation · Privacy Policy · Code of Conduct · Terms of Use · Disclaimer · CC-BY-SA · GPL · Credits