Wikipedia:Bots/Requests for approval/DASHBot 5


The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: Tim1357

Automatic or Manually assisted: Automatic

Programming language(s): Python (pywikipedia)

Source code available: If ya want it.

Function overview: Remove non-free images from non-mainspaces.

Links to relevant discussions (where appropriate):

Edit period(s): Daily

Estimated number of pages affected: 1000 to start. I have no idea how many after the backlog is done.

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): N

Function details: Replaces BJBot. Generates a list of fair-use images in non-mainspace from Betacommand's tool here. Then it replaces every fair-use image with File:NonFreeImageRemoved.svg. It replaces multiple images in the same edit (if there is more than one on the page). I am considering making a new version of NonFreeImageRemoved.svg, so that when it replaces the image, it does not fill the entire page. Perhaps the new image would be 200px by default, so that is the largest it could be. See what happens otherwise here.
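The replacement step described above might look roughly like the following. This is a minimal sketch, not the bot's actual code: the function name `replace_non_free` and the sample file names are hypothetical, and it only handles images embedded with the `[[File:` or `[[Image:` syntax.

```python
import re

PLACEHOLDER = "NonFreeImageRemoved.svg"

def replace_non_free(wikitext, non_free_files):
    """Swap every listed file embedded via [[File:...]] or [[Image:...]]
    for the placeholder. Returns the new text and the number of swaps."""
    count = 0
    for name in non_free_files:
        # Match the opening of the embed up to the file name; the lookahead
        # requires "|" or "]" next, so image options and captions are kept.
        pattern = re.compile(
            r"\[\[\s*(?:File|Image)\s*:\s*" + re.escape(name) + r"\s*(?=[|\]])",
            re.IGNORECASE,
        )
        wikitext, n = pattern.subn("[[File:" + PLACEHOLDER, wikitext)
        count += n
    return wikitext, count

# Hypothetical page text with two non-free images.
text = "[[File:Logo.png|thumb|A logo]] and [[Image:Cover.jpg]]"
new_text, swaps = replace_non_free(text, ["Logo.png", "Cover.jpg"])
```

Because all listed files are handled before the text is returned, a page with several non-free images would need only one saved edit.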

Per the request of SoWhy, the bot changes all images in the talk namespaces to links. Tim1357 (talk) 23:32, 27 December 2009 (UTC)

UPDATE: Also per the recommendation of SoWhy, the bot will leave a note on the talk page of the user who added the image. Tim1357 (talk) 01:59, 28 December 2009 (UTC)

It's an appropriate task, as non-free images should never be in non-main space. --IP69.226.103.13 (talk) 20:19, 28 December 2009 (UTC)[reply]

Will it pick up non-free images from their inclusion in the non-free media category or by transclusion of a non-free image template? MBisanz talk 10:17, 30 December 2009 (UTC)
I'm not sure how Betacommand generates the list. Let me ask him. Tim1357 (talk) 14:03, 30 December 2009 (UTC)
The list is apparently from Category:All non-free media. Tim1357 (talk) 03:05, 31 December 2009 (UTC)
(recursive of course) Tim1357 (talk) 03:10, 31 December 2009 (UTC)
What do you mean "recursive of course?" --IP69.226.103.13 19:18, 31 December 2009 (UTC)
Oh, I'm sorry. Recursive means that it finds all the files that are in subcategories as well. That means that files in Category:Non-free_musical_artist_logos are included in the list, as Category:Non-free_musical_artist_logos is a subcategory of Category:All non-free media. If there are sub-subcategories (i.e. categories in subcategories), it finds the files in those categories as well. Tim1357 (talk) 19:30, 31 December 2009 (UTC)
Well, recursive has a different specific technical meaning, so it's more useful, imo, in a bot discussion, which is a community discussion, to say it will check subcategories too, unless you're discussing the code in particular. In this case it's about what it does. Thanks. --IP69.226.103.13 19:36, 31 December 2009 (UTC)
You're right. Thanks : ) Tim1357 (talk) 00:28, 1 January 2010 (UTC)
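To illustrate what checking subcategories "recursively" means in the exchange above, here is a small sketch over in-memory dictionaries. It is not how Betacommand's tool works (that is not described here); the function name, the dictionaries, and the file names are all hypothetical.

```python
def files_in_category(category, subcats, files, seen=None):
    """Collect files in a category and in all of its subcategories,
    sub-subcategories, and so on."""
    if seen is None:
        seen = set()
    if category in seen:
        # Guard against category loops (A contains B, B contains A).
        return set()
    seen.add(category)
    found = set(files.get(category, ()))
    for child in subcats.get(category, ()):
        found |= files_in_category(child, subcats, files, seen)
    return found

# Hypothetical category tree and contents.
subcats = {
    "Category:All non-free media": ["Category:Non-free musical artist logos"],
    "Category:Non-free musical artist logos": ["Category:Hypothetical sub-subcategory"],
}
files = {
    "Category:All non-free media": ["File:A.jpg"],
    "Category:Non-free musical artist logos": ["File:B.png"],
    "Category:Hypothetical sub-subcategory": ["File:C.png"],
}
all_files = files_in_category("Category:All non-free media", subcats, files)
```

The `seen` set is a design choice for safety: category graphs can contain cycles, and without it the function could recurse forever.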

I've scaled the default size of File:NonFreeImageRemoved.svg to 200×200px to avoid the problem you noted above. It's an SVG, so this should have no effect when it's used with an explicit size.

I'd also like to suggest that the bot should leave the name of the replaced file visible in some way, e.g. in a <!-- comment --> or, where possible, linked from the caption. People sometimes include non-free images in discussions outside the odd namespaces too (e.g. on the village pumps or the refdesks, and IME quite often at the graphics lab), and it can be annoying to have to dig through the history for the name of the image being discussed. Alternatively, perhaps the bot should treat the Wikipedia: namespace as if it were a talk namespace. —Ilmari Karonen (talk) 10:02, 1 January 2010 (UTC)

I was thinking that too; I like the idea of a comment. The only problem is that sometimes the non-free image is in an infobox, and I'm not sure how to turn those into links, as they often do not have the [[File: or [[Image: markers within the box. Tim1357 (talk) 16:03, 1 January 2010 (UTC)
Ok, seems like a good idea to trial as soon as we can figure out the infobox issue, I think User:ST47 had a solution to that once, so you might try emailing him. MBisanz talk 00:53, 3 January 2010 (UTC)[reply]
You can also try asking one of the editors who does a lot of work with templates (User:ThaddeusB?), or check the infobox discussion pages to find someone. --IP69.226.103.13 | Talk about me. 05:18, 3 January 2010 (UTC)[reply]
For infoboxes, I think simply including the original image name in a <!-- comment --> (as in, say, "|image = NonFreeImageRemoved.svg<!-- Original_image_name.jpg -->") is probably the best solution. The MediaWiki parser strips comments pretty early, so they shouldn't affect the infobox syntax. —Ilmari Karonen (talk) 14:58, 3 January 2010 (UTC)[reply]
Ok, I guess that works ok. Tim1357 (talk) 16:26, 3 January 2010 (UTC)
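The infobox suggestion above (placeholder plus the original name in an HTML comment) could be sketched as follows. This is an illustration only: it assumes the infobox parameter is literally called `image`, and the function name is hypothetical.

```python
import re

def replace_infobox_image(wikitext, name, placeholder="NonFreeImageRemoved.svg"):
    """Replace a bare file name in an '|image =' infobox parameter with the
    placeholder, keeping the original name in an HTML comment."""
    pattern = re.compile(r"(\|\s*image\s*=\s*)" + re.escape(name), re.IGNORECASE)
    # A function is used as the replacement so that backslashes or group
    # references in the file name are never interpreted by re.sub.
    return pattern.sub(
        lambda m: m.group(1) + placeholder + "<!-- " + name + " -->",
        wikitext,
    )

# Hypothetical infobox text.
box = "{{Infobox band\n|image = Original_image_name.jpg\n|caption = On stage\n}}"
fixed = replace_infobox_image(box, "Original_image_name.jpg")
```

Real infoboxes use many different parameter names (logo, cover, and so on), so a working bot would need a broader match than this sketch.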

Will the bot avoid editing the same page a number of times in a row, such as happened with BJBot here? - Kingpin13 (talk) 09:04, 5 January 2010 (UTC)[reply]

I don't know why that page was edited; isn't it in the mainspace? Anyways, the bot will group multiple files into a single edit. Tim1357 (talk) 01:33, 7 January 2010 (UTC)
I can't spot it being mentioned anywhere; will the bot be ignoring pages in Category:Wikipedia non-free content criteria exemptions? - Kingpin13 (talk) 09:52, 8 January 2010 (UTC)[reply]
Not yet, I'll find a way to code that in there. Tim1357 (talk) 22:00, 8 January 2010 (UTC)
 Doing... Tim1357 (talk) 18:36, 10 January 2010 (UTC)
All Done with the code: the bot tries to link the image if it can, and replaces it with File:NonFreeImageRemoved.svg if it can't. In all cases, it leaves an inline comment, which includes the image's name if it was replaced. I'm ready for trial if nobody objects. {{BAGAssistanceNeeded}} Tim1357 (talk) 02:30, 11 January 2010 (UTC)
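The "link it if possible" behaviour described above can be sketched like this. It is a guess at the general shape, not the bot's code: the function name, the wording of the inline comment, and the sample text are hypothetical, and the pattern does not handle captions that themselves contain `[[links]]`.

```python
import re

# Matches [[File:Name|options...]] or [[Image:Name]]; group 1 is the file name.
FILE_EMBED = re.compile(
    r"\[\[\s*(?:File|Image)\s*:\s*([^|\]]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)

def linkify_non_free(wikitext, non_free):
    """Turn embedded non-free images into plain links and leave an inline
    comment; embeds of other files are returned unchanged."""
    def swap(match):
        name = match.group(1)
        if name not in non_free:
            return match.group(0)
        # A leading colon makes MediaWiki render a link instead of the image.
        return "[[:File:%s]]<!-- non-free image converted to a link -->" % name
    return FILE_EMBED.sub(swap, wikitext)

page = "Look: [[File:Logo.png|thumb|200px]] and [[File:Free.svg]]"
linked = linkify_non_free(page, {"Logo.png"})
```

Where no `[[File:` embed can be found (for example in an infobox), the bot as described falls back to the placeholder image with the original name in a comment.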
No objections from me. The sooner we clean these problems up, the better, imo. --IP69.226.103.13 | Talk about me. 22:47, 10 January 2010 (UTC)[reply]
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. MBisanz talk 05:38, 11 January 2010 (UTC)
 Done see /log Tim1357 (talk) 06:07, 15 January 2010 (UTC)
I don't see any user warnings...? - Kingpin13 (talk) 09:53, 15 January 2010 (UTC)

Old Discussion

  • Comments (1) Error: The bot removed File:WorcsCoatArms.jpg from 13 pages, based on the image appearing on Betacommand's list. Good so far. However, another editor changed the license on the image to a free one (sidebar: probably improperly). The bot did not look at the current tagging of the image to see if it is still marked as non-free. This is probably not a huge issue, as Betacommand's report runs once every 24 hours. Very few images are going to be erroneously removed by this shortcoming. (2) About time! Betacommand's list is always hovering around 900-1200 entries. I try hard every day to fight the length of the list down, with no success in reducing the number of violations overall. The number of new violations per day is roughly equivalent to the number of violations removed. This bot will finally overcome that list and keep things in line. Yeah! (3) Discussion on development articles: a discussion that has some bearing on what this bot can do is present at Wikipedia_talk:Non-free_content#Non-free_images_on_sandbox.2Fuserspace_developing_articles. --Hammersoft (talk) 15:19, 15 January 2010 (UTC)
  • Problem: What if someone labels a popular PD image as non-free? Conceivably, such an image (like one used in a popular user box) could be removed from thousands of pages, causing significant, unnecessary problems. Second of all, it appears Hammersoft used this bot to remove said 13 instances of the image, unlike what was claimed above (that the bot removed the image). — BQZip01 — talk 21:27, 15 January 2010 (UTC)
    In relation to this, perhaps a limit could be placed, where if the image appears on more than X pages, it won't be removed without some-kind-of human confirmation, such as whacking a {{Seriously, we're not kidding, this ought to be deleted}} type template onto the image, or just leaving it up to humans altogether. So: Tim1357, what's the count of whatlinkshere for the current crop of candidate images? Is it something like 98% have two or less? Josh Parris 23:09, 15 January 2010 (UTC)[reply]
Ahh comments! I will respond to requests in the order they were received so:
  1. Kingpin, damn, I forgot. Done (will make it exclusion compliant and skip if the user was already notified).
  2. Josh Parris and BQZip01, good idea. If the bot is all of a sudden told to remove more than, say, 7 copies of an image, it will skip it and log it.

(sorry I went a bit over 50 I think)

Tim1357 (talk) 01:45, 16 January 2010 (UTC)

I like the basic idea (pulling back things that don't meet our criteria). What I'm most concerned about is the volume we're talking about here. Imagine an image that people use all over Wikipedia (like a check mark) being labeled as non-free right before DASHBot 5 makes its run. Someone would have to go back through the edit history, and (if estimates of 800-1200 are accurate) sifting through all those edits to undo the mistaken removals could be a serious problem (especially when finding a few dozen edits out of thousands each day). A better solution, IMHO, would be to allow users to use an application that runs this code (instead of an automatic bot) under their names and look at each individual image (allowing some personal interaction and personal determination on each image). As it is, even running on only 7 images max could cause lots of problems, and fixing any errors would be a major pain. It is easy to find images that are not used properly (just look at the image and at what articles it is used in); it is hard to undo such actions, as they are on numerous pages and, once removed, are not linked by anything except one person's edit history. With the problems when uploading images, as it is quite easy to have the wrong tag and the tags available are largely incomplete/missing many options, I'm not sure this bot application is a good idea at this time. — BQZip01 — talk 04:11, 16 January 2010 (UTC)
An idea to fix this and make undo actions significantly easier would be to log such actions on the image page. Something like
DASHBot has removed this image from [page A], [Page B], and [Page C]. Please consider the copyright status of this image and Wikipedia's Non free content criteria before re-adding this image to additional pages.
each link would record the action of the bot, thereby making undo actions much easier and appropriately centralized. These links could be removed after 30 days to de-clutter the page. If this process is added, I would have no problem bumping up the limit to 30 or 40 images at a time. Thoughts? — BQZip01 — talk 04:16, 16 January 2010 (UTC)[reply]
Another idea would be to check each rev in the history of the file, and only remove if the image has always been NFC. - Kingpin13 (talk) 06:30, 16 January 2010 (UTC)[reply]
That's a good alternative too, but that still leaves the problem of images that simply have the wrong tag from uploading. Perhaps a combination of the two? — BQZip01 — talk 15:52, 16 January 2010 (UTC)[reply]
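Kingpin13's suggestion of checking every revision of the file page could be sketched as below. This is purely illustrative: it assumes the revision texts have already been fetched as strings, and that a non-free file page always carries a template whose name starts with `{{Non-free`, which is an assumption rather than something stated in this discussion.

```python
def always_non_free(revisions, marker="{{Non-free"):
    """Return True only if every revision of the file page carries the
    non-free marker. An empty history is treated as 'not confirmed'."""
    needle = marker.lower()
    return bool(revisions) and all(needle in text.lower() for text in revisions)

# Hypothetical revision texts, oldest first.
stable = ["{{Non-free logo}}", "{{Non-free logo}} with rationale"]
retagged = ["{{PD-self}}", "{{Non-free logo}}"]
```

As BQZip01 notes, this would not catch a file that was tagged wrongly from the moment it was uploaded, since such a file has always been marked non-free.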
In theory, any file that all of a sudden needs to be removed from many pages was only recently tagged as non-free. For that reason, the bot logs all images that call for more than ten removals to a special page. It checks another page to see if it has the OK to remove the file. In other words, it waits for human review before it proceeds to remove the files en masse. Tim1357 (talk) 04:55, 17 January 2010 (UTC)
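The hold-for-review rule described above might be sketched as follows. The threshold of ten comes from the comment; the function name, the data structures, and the idea of passing the approvals in as a set are hypothetical stand-ins for the two wiki pages the bot reads and writes.

```python
def plan_removals(usage, approved, limit=10):
    """Split files into those the bot may remove now and those that must
    wait for human review because they appear on more than `limit` pages."""
    to_remove, to_review = {}, {}
    for name, pages in usage.items():
        if len(pages) > limit and name not in approved:
            to_review[name] = pages   # logged for a human to check
        else:
            to_remove[name] = pages   # few uses, or already approved
    return to_remove, to_review

# Hypothetical usage data: file name -> pages it appears on.
usage = {
    "Logo.png": ["User:A", "User:B"],
    "Checkmark.png": ["User:%d" % i for i in range(12)],
}
now, held = plan_removals(usage, approved=set())
later, still_held = plan_removals(usage, approved={"Checkmark.png"})
```

Once a reviewer approves a held file, the next run treats it like any other file.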
Ok, but will the person's name who reviews this be attached to the edit? What about logging such edits on the associated image page? image's talk page? — BQZip01 — talk 08:55, 17 January 2010 (UTC)[reply]
Update: I also created a list of users not to be warned again. Tim1357 (talk) 04:57, 17 January 2010 (UTC)
Well, in my humble opinion, I think having the list in one place is a lot better than pinging talk pages. Many of those pages, mind you, are not watched. This way someone can know the place to go in order to review images. Tim1357 (talk) 15:38, 17 January 2010 (UTC)
My understanding is that BQZip01 was asking if there would be an audit trail in place and obvious. Josh Parris 15:54, 17 January 2010 (UTC)[reply]
Yeppers! However, it seems to me that the image talk page might be an ideal place to annotate such information (I have no problem if that info is removed even a few days after (it will remain in the edit history anyway). — BQZip01 — talk 16:01, 17 January 2010 (UTC)[reply]

Doesn't the page's history serve as an impromptu audit trail? Tim1357 (talk) 16:54, 17 January 2010 (UTC)

Absolutely; however, finding such changes is problematic. If a bot makes 5000 changes in a day, it takes a long time to sift through and find such changes in the edit history. While you can find any page on which an image is used by simply looking on the image page, you cannot look on the image page to find the pages the image used to be on.
Let's use an example where this bot finds an image tagged for fair use with some rationales given. In conjunction, there are 10 violations of WP:NFCC since the image is being used on 10 user pages. The bot removes user page images and continues with its deletions. 2 days later someone looks at their user page and finds the image missing, fixes the erroneous tag (the image was actually PD), and re-adds it to his user page. The other 9 images are never re-added because no one knew that they were removed in the first place (searching through thousands of edits to see if anyone else's pages were affected is a tedious use of time).
If you provide diffs to each of the removals on the talk page of the "offending" image, it would eliminate the searching and easily centralize any corrections. — BQZip01 — talk 18:17, 17 January 2010 (UTC)[reply]
Darn internet makes it hard to communicate. Thanks for clearing that up. Yes, I can provide a log of diffs for when the bot removes files from pages. I'll put a list of diffs, by file, in a table at User:DASHBot/HumanReview. Tim1357 (talk) 18:31, 17 January 2010 (UTC)
Thanks. That's a great idea, but I still think it would be better to annotate it on the image page, image talk page, or at least provide a link to your table in the edit summary. Does that work for you? — BQZip01 — talk 18:53, 17 January 2010 (UTC)[reply]
Sure, I'll leave a link to the page in the edit summary, as I think it is a good idea to keep all of the log in one place. Tim1357 (talk) 18:57, 17 January 2010 (UTC)
Hmmm...I'm thinking there may be a problem with this. It runs into the same problem: finding the changes in the first place. If there is no change to the image, then finding all of these changes is a problem. If such changes are mentioned on the image talk page (or just a diff to the log update), they can easily be found and reverted if needed. — BQZip01 — talk 19:05, 17 January 2010 (UTC)[reply]

If I may step in to clarify here: it appears that BQZip01 is asking that all image removals be noted on the image's talk page, and I would suggest some sort of notice placed on the image's page pointing to the talk page. The intention here is that if images have been removed in error, then on the image's talk page is a list of all the reversions that need to be made / the affected pages. "Here's every article that used to use this image, before the bot had its way with them". Please correct me if I'm wrong.

As an aside, there might even be an easy way "Click here to revert these removals"-style to undo the bot's actions (presumably by having the bot do so). Josh Parris 00:59, 20 January 2010 (UTC)[reply]

Ok here is what I think is a reasonable compromise:
  1. The edit summary looks like this: Robot: Removing N Non-Free files per WP:NFCC#9 (Shutoff | Log | Error?)
  2. The talk pages of images are not edited; I think that is kind of spammy and really serves no purpose.
  3. The user-message has links to the log and the error page.
  4. The log page is sorted by day. Each Day has its own table, with the actions sorted by file. The table will contain links to diffs performed by the bot.
  5. Any time more than 40 images are called to be removed, it waits for human confirmation.
I hope that works well for everyone.

Tim1357 (talk) 02:04, 20 January 2010 (UTC)
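The log layout in point 4 of the compromise above (one table per day, actions sorted by file, each with its diff links) could be built from a flat list of removals roughly like this. The function name, the tuple layout, and the sample diff identifiers are hypothetical; rendering the result as wiki tables is left out.

```python
from collections import defaultdict

def group_log(entries):
    """Group (day, file name, diff link) tuples into a day -> file -> diffs
    mapping, with days and file names in sorted order."""
    log = defaultdict(lambda: defaultdict(list))
    for day, filename, diff in entries:
        log[day][filename].append(diff)
    return {day: dict(sorted(by_file.items())) for day, by_file in sorted(log.items())}

# Hypothetical removals; days use ISO dates so that sorting is chronological.
entries = [
    ("2010-01-21", "Logo.png", "diff-3"),
    ("2010-01-20", "Logo.png", "diff-1"),
    ("2010-01-20", "Cover.jpg", "diff-2"),
    ("2010-01-20", "Logo.png", "diff-4"),
]
grouped = group_log(entries)
```

Grouping by file within each day is what would let a reviewer find every page a given image was removed from without searching the bot's full contribution history.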

Tim, I love what you are trying to do here, but I think you are missing the point that Josh and I are trying to make. Let's say an image with an improper label is used on 39 user pages and is removed by a human, with no link to the image or its talk page, no one knows what was removed unless you were watching the pages upon which it was used. I agree it is "kind of spammy", but that is kind of the point. I have no problem with appending
==DASHBot image removal==
DASHBot removed this image from N pages on [www.google.com 21 January 2009] ~~~~
N would be the number of pages (you're already tracking this) and should be easy to insert.
The link would be to the log entry.
This would make changes much easier.
Also, will the log feature a clickable link to undo a group of actions? or will each individual page require an individual "undo" (don't get me wrong but fixing any problems would be a gold mine for my edit count :-) ) — BQZip01 — talk 02:19, 21 January 2010 (UTC)[reply]
I have multiple reasons why that would not be a good idea:
  1. BJBot once removed a fair-use image from one of my userspace drafts. I was confused for a second, but the edit comment, the inline comment and the userpage all explained it to me. I did not think of checking the talk page.
  2. Included in the user message explanation is a link to report errors (my talk page). Note that the bot logs all removals, so the log will be useful to anyone.
  3. There are hundreds of files to be removed each week. What if the same file is removed multiple times from an unwatched talk page? That could amount to a huge number of posts to a talk page that nobody will ever look at anyways.
  4. I think leaving messages on talk pages is overkill. User talk pages should suffice, along with the edit summaries and the inline comments.
Ok, that's it. And in IRC, Josh_Parris and I discussed this. He had some reservations, but seemed to sign off on this. I'll get him to comment here again if he can. Tim1357 (talk) 02:33, 21 January 2010 (UTC)
I did express reservations, right up to the point where Tim1357 pointed out that if images were removed in error, he'd fix the problem himself. Given the bot will only run while Tim1357 is still around, it's the ultimate fall-back; he can go trawling through the logs to figure out what needs reverting.
Having read BQZip01's comments, I believe his scenario is one where the user in question is not using the image on their user page, but may perhaps be the creator of the image. Suddenly she notices that the image isn't used on 38 pages, it's used on 2. There's nowhere for her to go to discover what happened. (BQZip01, correct me if this scenario is not similar to what you are envisaging.) If this is the scenario BQZip01 is contemplating, you might get around it by simply placing a single note on the talk page: "At various times this image was removed from one or more user pages as required by <insert link>; for details inquire at User Talk:DASHBot or see the bot logs at <insert location>" Josh Parris 12:08, 21 January 2010 (UTC)
That's basically the gist of it. Somehow I am failing to get the point across to Tim (my fault Tim, not yours).
In your scenario, BJBot removed the image and you checked the edit history. Let's say you fixed the tag on the page and decided to add it back into your page. What you don't know is that the same image was removed from 38 other pages (a hypothetical situation here), and you wouldn't have any idea that actually happened unless you were watching one of those pages. You have no way to know what the bot did. While you can see the log, you will have to know what date it was removed on and in which articles it was used to find the appropriate diff(s) to undo. — BQZip01 — talk 17:45, 21 January 2010 (UTC)
Sorry, I missed the comment that you'd fix any errors yourself. By all means press ahead! — BQZip01 — talk 06:20, 25 January 2010 (UTC)

OK, if there are no more problems, maybe I could get started? Tim1357 (talk) 20:23, 31 January 2010 (UTC) {{BAGAssistanceNeeded}}

Appears issues are resolved, approving.  Approved. MBisanz talk 20:06, 2 February 2010 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.