Wikipedia:Articles for deletion/Anybot's algae articles


The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.

The result was delete all. Articles that have been labeled as equivalent to hoaxes by uninvolved editors need to be removed. As pointed out, subsequent edits do not necessarily guarantee that the errors have been addressed. –Juliancolton | Talk 00:08, 29 June 2009 (UTC)[reply]

Anybot's algae articles


AfDs for this article:

Anybot's algae articles - see User:Anybot/AfD for a list

Anybot created 4092 algae articles by scraping information from the AlgaeBase database and formatting it into articles. In doing so, it introduced numerous serious errors into more or less every article. Common errors include:

  • basic taxonomic errors, such as calling a cyanobacterium an alga (a bit like calling a plant an animal, only much, much more wrong);
  • writing articles about extinct taxa as though they were extant;
  • descriptions that don't distinguish between the many different phases of the algal life-cycle, falsely implying a single generation or alternation of generations;
  • misuse of descriptive terminology; for example, 69.226.103.13, who appears to have expertise in this area, writes:
    "Phormidium, a cyanobacterium, is described as having a crustose thallus. The term filamentous used in articles about cyanobacteria should be carefully distinguished as a bacterial colony's sheath. However, since our cyanobacteria articles make them eukaryotes, the reader may not understand this is a bacterial colony, not a multi-cellular organism with undifferentiated tissue (a thallus)."
  • incorrect species counts, partly caused by the false assumption that the number of species names recorded in AlgaeBase equals the actual number of accepted species, but partly inexplicable;
  • incorrect and contradictory taxonomies, partly due to AlgaeBase itself being outdated, but partly inexplicable;
  • creation of articles on names listed in the database as synonyms of other taxa.

The task of checking and fixing these articles has mainly fallen to 69.226.103.13, who has stated "Every one I investigated contained serious misinformation, except for those that had been later edited by other writers... There are so many errors and so many different types of errors that it is impossible to address each one other than by individually editing each article. I don't write science articles without checking sources. It would take me hours to verify each one." As with all of our articles, these articles are coming up at the top of Google searches; we are misinforming people who don't know better, and putting our incompetence on display to those who do. Our reputation is on the line here, folks.

There seems to be consensus at Wikipedia talk:WikiProject Plants#Algae articles AnyBot writing nonsense that these articles are unsalvageable unless we can find a phycologist willing to donate tens of thousands of hours of time; and even then it would be quicker to delete the articles and start again from scratch. The coder of the bot is presently very busy, and his/her response to this has been lukewarm at best. Specific errors that were pointed out a few months ago still have not been fixed.

The full list of articles nominated for deletion is at User:Anybot/AfD. If you are willing and able to rescue any, by all means do so, and then remove them from the list. Those that remain on the list at the end of this AfD should be deleted. I will personally commit to restoring any articles that should not have been deleted because they had already been corrected or verified as correct. Hesperian 00:28, 22 June 2009 (UTC)[reply]

Question I see it mentioned at Wikipedia talk:WikiProject Plants#Algae articles AnyBot writing nonsense that some articles were pre-existing, and were then over-written by the bot. Are all of those articles included on this list too? -RunningOnBrains(talk page) 06:21, 22 June 2009 (UTC)[reply]

No; these are articles that Anybot created (i.e. made the very first edit). Hesperian 06:24, 22 June 2009 (UTC)[reply]
What about redirects? These are not listed on the page of anybot's articles. Some of the redirects do not point to the correct article, because anybot did not distinguish synonyms; sorry, I can't find an example. --69.226.103.13 (talk) 16:53, 22 June 2009 (UTC)[reply]
You want some examples? I noticed that there are quite a few inappropriate redirects to Ulva. See here and look towards the bottom of the page. --Kurt Shaped Box (talk) 20:47, 22 June 2009 (UTC)[reply]
Also, what links here for Palmaria (a disambiguation page). --Kurt Shaped Box (talk) 20:55, 22 June 2009 (UTC)[reply]
No, I'm not asking for examples. Anybot's redirects, and there may be thousands, are not on the list Hesperian created. What will be done with these? Will they be deleted also?
For example, and this may be one of the single worst articles anybot created, Leptophyllus was a redirect anybot created to Abedinium. The taxonomy box listed Abedinium as belonging to a diatom family (Brachysiraceae, an easy-to-recognize diatom group) within an obvious and familiar red macroalgae order (Gigartinales). However, in spite of our highly unusual taxonomy in the wikipedia article, Abedinium is a dinoflagellate in the order Noctilucales. I'm concerned that leaving the redirects will keep pages like this in search engine caches.
PS: I deleted the taxobox, so you have to look in the history to see it. Also, this is why we cannot just keep articles that have been edited by humans; each one has to be checked. Like the IP-edited articles, these were edited by two competent human editors, but not for the most egregious errors, only for wikipedia style matters. --69.226.103.13 (talk) 21:58, 22 June 2009 (UTC)[reply]
Let's let this AfD run its course. When we're done here, I'll produce a list of redirects and we can go around again. Hesperian 23:42, 22 June 2009 (UTC)[reply]
There seem to be a fair few (based on a quick spot check) Anybot-created redirects to Anybot-created articles. If the articles are deleted, the redirects then become speedy-deletable per Wikipedia:CSD#G8. There are a lot of admins who work in this area daily, so this will rapidly become less of an issue if this AfD is closed as delete. --Kurt Shaped Box (talk) 05:21, 23 June 2009 (UTC)[reply]
  • Delete all. Any mass creation of material that brings Wikipedia into disrepute as an inaccurate source of information should be reverted/deleted/removed SatuSuro 06:34, 22 June 2009 (UTC)[reply]
  • Delete all. We need to get these off of Google and the WP mirrors ASAP. As 69.226. states here, some of these are coming up as the top/sole Google hit. During the course of this discussion, someone (I can't seem to find the exact comment now) stated that they were a teacher of some description and had discovered this issue after one (several?) of his/her students had handed in an assignment containing a howling WP-sourced factual error. This shouldn't be happening. I'd also suggest that in future, anybody considering running a bot creating or editing articles in highly-specialized and arcane fields such as this one should endeavour to get an expert onboard to consult with, before unleashing the bot full-throttle. As I understand it, the bot operator in this case is somewhat knowledgeable in the area but missed blatant errors early on that, if spotted, would've avoided this entire situation. --Kurt Shaped Box (talk) 10:57, 22 June 2009 (UTC)[reply]
  • Question for nominator - have you removed from the list any bot-generated article since edited by User:213.214.136.54? As far as I know, these are now correct. It might also be an idea to get someone with a bot to AfD-tag all the affected articles. There may be someone with one or more of these on their watchlist who could help with fixes, if made aware of the problem. --Kurt Shaped Box (talk) 11:05, 22 June 2009 (UTC)[reply]
    • No, I haven't. I'll ask 213.214.136.54 if s/he is willing to vouch for the ones s/he has edited. Hesperian 11:16, 22 June 2009 (UTC)[reply]
      • The IP only edited the higher-level taxonomies in the boxes. If you can generate a list of their edits I can edit the articles; maybe other writers could help. With the Chromalveolates I may have to stubify. I did edit one article, but undid my edit, because it would be a lot of work to edit these articles to a vouchable point, a couple of hours per article at least. --69.226.103.13 (talk) 17:21, 22 June 2009 (UTC)[reply]
        • Okay, here's a list of 213.214.136.54's edits, as generated by ContributionSurveyor - if that's of any use to you... --Kurt Shaped Box (talk) 21:45, 22 June 2009 (UTC)[reply]
          • Okay, this list is useful. It shows some underlying problems with wikipedia algae articles that need to be fixed first. The list could be used to stubify the articles with a bot under some guidelines: pick the diatoms off by division/phylum (or both, in some unfortunate cases, or by class as well); have plant and protist editors pick the current higher-level taxonomy for the Chromalveolata, and for the rest of these organisms (single-celled photosynthetic algae and their closely related non-photosynthetic taxon-mates); then run a bot (preferably an existing bot rather than anybot) to pull the class from the taxobox and rewrite the single sentence to "Thisgenus is a diatom." Leave that sentence, the taxonomy box, and the link to AlgaeBase, and, to make it easier for other editors, categorize by family, order, or class, in that order of preference, as a stub (a sketch of such a pass follows below). Check that a taxonomy box does not contain both a division and a phylum. A problem that was not evident earlier is that older higher-level taxonomies from 2003/04 differ from later taxonomies. It seems a phycologist comes in every two years and uses their own taxonomy. One has to be picked for an encyclopedia. --69.226.103.13 (talk) 22:34, 22 June 2009 (UTC)[reply]
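(To make the suggestion above concrete, here is a minimal sketch of such a stubification pass. It is illustrative only: the taxobox regexes, the CLASS_TO_PHRASE table, and the function name are assumptions, not code from anybot or any approved bot.)

```python
import re

# Illustrative mapping from taxobox class to a safe one-sentence description;
# editors would agree the entries (and the reference taxonomy) beforehand.
CLASS_TO_PHRASE = {
    "Bacillariophyceae": "a diatom",
    "Cyanophyceae": "a cyanobacterium",
}

def stubify(title: str, wikitext: str) -> str:
    """Reduce a suspect bot-created article to a taxobox plus one safe sentence."""
    # Naive taxobox match; assumes no nested templates inside the box.
    box_match = re.search(r"\{\{Taxobox.*?\}\}", wikitext, re.DOTALL)
    if box_match is None:
        return wikitext  # nothing to anchor a stub on; leave for a human
    box = box_match.group(0)
    # Flag boxes that carry both a division and a phylum for manual review.
    if re.search(r"\|\s*divisio\s*=", box) and re.search(r"\|\s*phylum\s*=", box):
        raise ValueError(f"{title}: taxobox lists both a division and a phylum")
    cls = re.search(r"\|\s*classis\s*=\s*\[*([^\]|\n]+)", box)
    phrase = CLASS_TO_PHRASE.get(cls.group(1).strip()) if cls else None
    if phrase is None:
        return wikitext  # unknown class: leave for a human editor
    return (f"{box}\n'''{title}''' is {phrase}.\n\n"
            "== External links ==\n"
            "* [http://www.algaebase.org AlgaeBase]\n")
```

(Categorization by family, order, or class, in that order of preference, is omitted here for brevity, but would follow the same pattern of reading fields out of the taxobox.)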
  • Another question (sorry - last one!) - what should be done with the pages currently stored in Anybot's userspace? --Kurt Shaped Box (talk) 11:14, 22 June 2009 (UTC)[reply]
  • Delete all that contain errors, improve bot code in consultation with 'experts', and run bot again. I had been unaware of the lengthy discussion at WikiProject Algae; first I should note that the original version of the bot contained some errors, which a later version of the bot corrected as soon as they were pointed out. The original version seems to have been run since April, replicating some of the errors, which has inflamed the discussion.

    Now, in my opinion, articles that contain small errors (e.g. the wrong tense) but cite a reliable source are better than no article at all, and if all such pages were deleted from WP the encyclopaedia would probably shrink by a factor of two. As evidenced by the work of some dedicated IP editors, the existence of a skeleton article is often the seed from which a useful and correct article is developed. And as all of the articles use information attributed to a reliable source, it is possible for people to check the data against the facts (nobody should ever use WP as a reliable source in itself). Again, this makes the articles more useful than many other unsourced articles on WP.

    However, I am embarrassed that widespread errors do exist. Systematic errors, such as the use of 'alga' instead of 'cyanobacterium', are very easy to fix automatically (see the sketch after this comment). If I had a list of the errors that have been spotted, so that I could easily understand what is currently said that is wrong, and what should be said instead, I could re-code the bot until it got everything right, and then put it up for retesting (hopefully it is now notorious enough that people will be willing to check its output). At that point it would be possible to run the bot again and create error-free articles. In the meantime, perhaps it is a good idea to delete articles which contain factual errors. (I will never support the deletion of any article which details a notable subject and contains factually correct information attributed to a reliable source.)

    I think that the worst case scenario would be to delete articles willy-nilly and thereby deplete WP. We have the potential to use the Algaebase material to generate useful information - if it's not entirely up to date, then neither are most text books; and if the classification needs systematically updating, the bot can do that as taxonomy is updated. If this is done regularly, WP can keep up to date and become as useful a resource as Algaebase is today. Let's be careful to produce the best quality output we can before the deadline. Martin (Smith609 – Talk) 13:39, 22 June 2009 (UTC)[reply]
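(As a concrete illustration of the "easy to fix automatically" claim above, a systematic correction could look like the sketch below. The taxobox field names and the lead-sentence phrasings are assumptions; this is not the fix-script's actual code, which is linked later in this discussion.)

```python
import re

def fix_cyanobacterium_wording(wikitext: str) -> str:
    """If the taxobox places the organism among the Cyanobacteria, correct a
    lead sentence that wrongly calls it an alga. Sketch only: the exact
    phrasings the bot emitted would need to be enumerated from real pages."""
    in_cyanobacteria = re.search(
        r"\|\s*(regnum|phylum)\s*=\s*\[*Cyanobacteria", wikitext)
    if in_cyanobacteria:
        wikitext = wikitext.replace("is a genus of algae",
                                    "is a genus of cyanobacteria")
        wikitext = wikitext.replace("is a species of algae",
                                    "is a species of cyanobacteria")
    return wikitext
```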

    I responded to this post on this page's discussion page, at length, repeating much of what I said on the WP:Plants page, the bot owner's page, and the bot's error reporting page. --69.226.103.13 (talk) 16:49, 22 June 2009 (UTC)[reply]
  • Delete all and scrap the bot. per nom. Niteshift36 (talk) 13:41, 22 June 2009 (UTC)[reply]
  • Keep the ones where the bot is not the only editor. Revert to the last non-bot edit for those created by humans, and delete or stubify the rest. This "kill-em-all" approach is not appropriate for an academic community. People have spent hundreds of hours creating some of these articles, improving others, and fixing the mistakes of the bot. We can't get rid of all this good, solid content just because it's simpler to delete everything than to be a bit more selective. Owen× 13:42, 22 June 2009 (UTC)[reply]
Those articles are not being nominated, see above. cygnis insignis 13:55, 22 June 2009 (UTC)[reply]
You are assuming that any subsequent edit is a fix. The truth is, people (and other bots) edit articles for all sorts of maintenance and cosmetic reasons. See http://en.wikipedia.org/w/index.php?title=Zygosphaera&action=history. Hesperian 23:51, 22 June 2009 (UTC)[reply]
I noticed that some had been fixed, by the ip and by someone interested in sea grasses, although I suspect that almost all of the subsequent edits were cosmetic. I notice there are language links being added (by bots!); they make the problem even worse. I still think they should be deleted, after a short grace period. cygnis insignis 18:46, 23 June 2009 (UTC)[reply]
  • Comment I noticed this bot created subpages, linked from the articles' talk pages, when it found existing articles. The pages that aren't useful, such as the one announced on Talk:Amphibolis (a plant in this example), are potentially distracting and should be unlinked. I also can't see a reason for maintaining erroneous information in user space; the bot could restore improved versions as easily as it created them, so the community should agree to their deletion too. Anyhow ... delete all those nominated above, for the reasons given above. cygnis insignis 13:55, 22 June 2009 (UTC)[reply]
  • If possible, delete only articles edited by the bot alone. If this task can't be automated, I am willing to offer my admin services at the conclusion of this AfD. Otherwise, delete all un-vouched-for articles on the list. Also, make sure that the bot's over-writing of articles is reverted. -RunningOnBrains(talk page) 14:37, 22 June 2009 (UTC)[reply]
I am willing to edit articles edited by other writers if a list can be made. I may not be able to edit the Chromalveolates and there were some protozoa that I probably cannot touch. --69.226.103.13 (talk) 16:49, 22 June 2009 (UTC)[reply]
@Runningonbrains: It isn't all that difficult to generate a list of articles edited by Anybot alone. But the problem with this proposal is that a great many edits are made to articles for purely cosmetic purposes. Therefore one cannot assume that an article has been fixed and/or verified merely because someone else has edited it. See, for example, http://en.wikipedia.org/w/index.php?title=Zygosphaera&action=history. Would you keep this article? Hesperian 23:47, 22 June 2009 (UTC)[reply]
*sigh* I just don't like the idea of wholesale deletion of articles which no one has checked to confirm that they have a problem. I don't know enough about biology and/or taxonomy to do these kinds of checks. To me, the best way to do this is a deletion of bot-only-edited pages, then case-by-case deletions where improvements have been made. I know it's asking a lot from those who edit articles in this field, but I'd like to see the fewest number of deletions possible. I'm happy to try to coordinate this with you and/or other editors. -RunningOnBrains(talk page) 18:43, 23 June 2009 (UTC)[reply]
The IP edited 1567 articles. However, the articles they edited (KP Botany offers to save them below) are among the most difficult genera taxonomically; in other words, the fewest possible editors are competent to correct them. In addition, the IP only edited the higher-level taxonomy in most of these, not touching the remaining text. --69.226.103.13 (talk) 19:56, 23 June 2009 (UTC)[reply]
  • Gasp! BAG approved the creation of crap? Delete all. I will edit the Chromalveolates that are salvageable (the IP list). If it's decided to delete them, can they be posted to my user space in some way so I do not have to retype the taxoboxes? It's a shame to have an IP do a lot of work correcting wikiGarbage, then have the corrections deleted. I assume the list will be the photosynthetic Heterokonts and dinoflagellates, and I have no problem with editing these articles. No panic, Curtis, I'll use Lee, not Cavalier-Smith. I'll do it over the summer and start in a couple of weeks. I've been ill and had a family emergency that is slowly resolving. Also, the listed problems with the bot were discovered in its trial phase by an editor, who alerted me; I ignored him/her based on an extraneous issue, then never got back around to looking at these articles. However, BAG told me to shut up, and I have been rather busy. My bad; but bots should not be creating this many articles without specific approval and monitoring throughout. This is what comes of self-elected closed user groups: they decided to create these articles. --KP Botany (talk) 01:33, 23 June 2009 (UTC)[reply]
    Re: "It's a shame to have an IP do a lot of work correcting wikiGarbage, then have the corrections deleted." I agree. As soon as the IP editor tells us that they consider the articles they have edited to be fixed, rather than merely fiddled, I'll remove them from the list.
    I suggest you proceed as follows:
    1. identify the articles you want to work on;
    2. remove them from User:Anybot/AfD, so that they are not deleted as a result of this discussion;
    3. if it is not appropriate to leave them where they are whilst you are working on them, move them into your userspace with an edit summary that cites this discussion (it won't take long for someone to detect and delete the cross-namespace redirects that you leave behind).
    Hesperian 01:43, 23 June 2009 (UTC)[reply]
  • Delete all articles that were edited solely by Anybot, or whose edits by other contributors (bots included) are merely cosmetic/courtesy edits. I know there's leeway here and what constitutes a cosmetic edit is still open to discussion, but I believe we can use common sense. Given the large number of articles that could potentially be deleted, I can offer my help deleting some of them if consensus is reached in this AfD. OhanaUnitedTalk page 02:08, 23 June 2009 (UTC)[reply]
    • I agree Ohana. The only problem is, who will spend the time assessing the edit history of 4000 articles? If you're willing to do so, then go for it: I've been saying all along that anyone may remove articles from the list if they are prepared to vouch for their correctness. Hesperian 02:38, 23 June 2009 (UTC)[reply]
      • Since an admin should visit the page before clicking the delete button, it only takes them a few more seconds to click on its history and quickly examine whether someone fixed it, whether it only received cosmetic edits, or whether it's untouched. OhanaUnitedTalk page 16:23, 23 June 2009 (UTC)[reply]
        • But how could an admin without knowledge of phycology tell whether the article has been fixed or merely cosmetically edited? You're an admin; can you look at Kurt Shaped Box's list of the IP's edits and tell me which ones you would keep under the criteria you posted above? Maybe you could post this on the discussion page. We should keep as many as possible, but if we keep articles that keep spreading misinformation we're being irresponsible, particularly when we had the chance to stop the spread and chose not to. --69.226.103.13 (talk) 18:11, 23 June 2009 (UTC)[reply]
  • Delete and block the bot per others. Also checkuser Smith609 (the operator). The Junk Police (reports|works) 03:07, 23 June 2009 (UTC)[reply]
    • Don't be silly. Checkuser is for suspected sockpuppeteers. There's no evidence of that here. Hesperian 03:33, 23 June 2009 (UTC)[reply]
      • On reflection, it actually might be worth checkusering the bot. Smith609 appears to be claiming here that the bot account was compromised somehow. I don't know if the checkusers will agree to do it (it might be considered 'fishing'), but I don't consider it beyond the realm of possibility that a registered user (one of the many banned ones, perhaps), aware of the issue and seeking to make a large mess even bigger, was behind this. --Kurt Shaped Box (talk) 05:15, 23 June 2009 (UTC)[reply]
        • Actually, that's a stupid idea, as I realized almost as soon as I'd posted it. In order to run the bot, <whoever> would have to have access to Smith's computer. Doh! Sorry. <slaps self with trout> --Kurt Shaped Box (talk) 05:27, 23 June 2009 (UTC)[reply]
    • Martin has indef-blocked the bot himself. Guettarda (talk) 05:32, 23 June 2009 (UTC)[reply]
          • Martin runs another bot with a page full of "kooky errors." This is a different type of bot, apparently a user-script. The errors are similar to some of anybot's: "Adding wrong URLs" (wrong information in the taxonomy box), "Incorrect DOI" (wrong information), "Bot is adding dead link tags for links that are not dead" (bad edits), "CitationBot removed a url= link to Google Books for no apparent reason" (removing data it had no business editing, such as replacing articles with redirects), "Replacing origyear with year" (when apparently not supposed to), "replaces author fields with its own" (again, it appears the bot is not being initiated with empty strings), "kooky edits" (lots of those with anybot), "Unnecessary addition," "removes valid ISSNs."
          • Maybe Martin's account has been corrupted along with the bot's, or these bots are not well coded. I have not checked the problems with citation bot; they may be nothing. But there are a lot of them. Should someone investigate this other bot? Is this a corrupted user account (Martin's), or are there systematic programming problems with Martin's other bots? Maybe your "stupid idea" was you wondering how this could happen, Kurt Shaped Box. Code doesn't rewrite itself. --69.226.103.13 (talk) 06:05, 23 June 2009 (UTC)[reply]
  • Delete. I've removed the nine articles I have fixed manually (after spending several hours reading my contributions to find them), but what is the best way to notify other contributors or identify articles they have fixed? Maybe by having a bot put a template (one of the usual AfD ones, I assume) at the top of all 4000 or so pages and giving it a bit of time? But I do think we should proceed with some kind of mass-delete. In addition to the bot problems, AlgaeBase itself has too many errors for this kind of automated process. Two I found today were: (1) a typo in the species epithet of Postgaardi mariagersnsis (should be P. mariagerensis); (2) listing Calkinsia as Euglenaceae (this one is more defensible and could just be out of date, as I don't know how this genus was classified before Cavalier-Smith and other recent work, but it does limit its usefulness nonetheless). For the future, forget about mass creation of taxon articles, as all such mass creations I know of have needed a lot of cleanup afterwards, and Wikipedia is already over the hump of "oh, no, there is no content, we need to seed it or no one will contribute". Kingdon (talk) 03:45, 24 June 2009 (UTC)[reply]
    • FYI, User:Polbot is able to mass-create species articles that are in the IUCN database. See this bot request for approval. From what I know, there are few, if any, complaints about inaccuracies. The potential is there, but it relies heavily on whether the coder makes sound and logical judgements when coding the bot. OhanaUnitedTalk page 04:09, 24 June 2009 (UTC)[reply]
      • You wouldn't be saying that if you had fixed as much Polbot miscategorisation and mislinking as I have: creation of categories for monotypic genera; creation of genus categories as subcategories of family categories that didn't exist; putting species in unrelated categories whose names happen to match the genus; linking to pages whose title is the name of a taxon, but which are not actually about that taxon.... Quadell is a good coder, and Polbot is probably the best content creation bot going around, but it still makes plenty of errors. Hesperian 04:17, 24 June 2009 (UTC)[reply]
  • Template. Create a template (with a link to the discussion page) that reads "This page was created by an automated process that is under review. If you can verify the accuracy of the information on this page, please remove this template. If this template is present, you should assume that this page has not been validated by a human." Add the template to each page in question. --Arcadian (talk) 04:01, 24 June 2009 (UTC)[reply]
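(Sketched in wikitext, such a template might look like the following. This is an illustration built from the wording proposed above, not the actual Template:AnybotAlgae that was later created.)

```wikitext
{{ambox
| type = content
| text = This page was created by an automated process that is under review
([[Wikipedia:Articles for deletion/Anybot's algae articles|discussion]]). If
you can verify the accuracy of the information on this page, please remove
this template. If this template is present, you should assume that this page
has not been validated by a human.
}}<includeonly>[[Category:Anybot algae articles to be validated]]</includeonly>
```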

Running anybot to correct its errors


Whilst this debate rumbles on, the errors made by the unauthorised running of a bug-laden version of the bot script are still on WP. I have already coded a script to fix these errors, which will:
  1. Only edit pages where Anybot is the sole contributor (not counting certain maintenance bots)
  2. Remove many glaring mistakes
  3. Not introduce any new errors (unless there are errors in Algaebase's higher taxonomy; if these are systematic, they can be fixed automatically).
This script will also fix problems with kingdom-level classification, but will not address some of the other issues, because I'll have to write a separate script for those taxa. However, it will make the articles less misleading until their fate is sealed. It may be useful for anyone contributing to this discussion to consider only the corrected articles when forming their opinion of how unfixable the 'mess' is; I have noted several cases above where people refer to errors which only exist because of the, as it was put above, 'corrupted account'. Martin (Smith609 – Talk) 13:41, 24 June 2009 (UTC)[reply]
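(For anyone wondering how point 1 above can be checked mechanically, a sketch follows. It queries the standard MediaWiki API; the whitelist of maintenance bots is an assumption, and this is not the fix-script's actual code.)

```python
import requests

API = "http://en.wikipedia.org/w/api.php"
# Maintenance bots whose cosmetic edits should not count as human review;
# illustrative only, not an agreed list.
WHITELIST = {"Anybot", "Addbot"}

def sole_bot_contributor(title: str) -> bool:
    """True if every revision of `title` was made by Anybot or a whitelisted
    maintenance bot, i.e. no human has touched the page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user",
        "rvlimit": "500",
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return all(rev.get("user") in WHITELIST
               for rev in page.get("revisions", []))
```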
Have you managed to discover how the bot was accessed and run without your knowledge yet? If not, I'd be rather uncomfortable with you setting it to work again. --Kurt Shaped Box (talk) 13:55, 24 June 2009 (UTC)[reply]
I have found the outdated version of the script which must have been run, and disabled it. The script could be run by visiting the URL http://toolserver.org/~verisimilus/Bot/anybot/algae.php - visit it now to verify that this no longer works. I have also included an IP check in the new script so that only I can access it. Martin (Smith609 – Talk) 13:59, 24 June 2009 (UTC)[reply]
It comes up with an error message and tells me to contact you. That's good, I guess. :) If I were you, though, I'd wait to see what the other users who've been involved in the discussions here think about you running the bot again. Then perhaps it might be a good idea to just do ten or twenty edits, just to see how they turn out and run them past the algae guys... --Kurt Shaped Box (talk) 16:26, 24 June 2009 (UTC)[reply]
Suspending the debate pending the outcome of the suggested test edits seems reasonable to me. I am more than willing to change my !vote to Keep if the bot can salvage its own articles in a timely fashion. -RunningOnBrains(talk page) 17:22, 24 June 2009 (UTC)[reply]
I am concerned about the bot editing articles without each being individually checked by a human. A test run of 10-20 articles I don't object to; please pick 10-20 from different times. I checked articles written by the bot in February (before and after all coding corrections mentioned on the anybot errors page), March, and April (before, during, and after the April 18th/19th glitch). If I mention a specific case it is not because I only found errors in that time frame: there are errors that make the articles unusable in all time frames of operation.
I would like to see the algorithm, the bottom-most coding algorithm, if users here ask for the bot to be run to correct its own errors. I think it's a bad idea to use this bot for this purpose. How about asking the other bot (mentioned above) that has created taxon articles?
Keeping these articles on wikipedia to continue being accessed, as bad as they are, is not user-friendly for wikipedia. I disagree with suspending the deletion while anybot is the one fixing the articles. However, as an IP I have no vote in the matter; that doesn't bother me, it's something wikipedia gets right. --69.226.103.13 (talk) 18:07, 24 June 2009 (UTC)[reply]
Now that's where you're wrong. AfD is not a vote; it is a discussion, and constructive opinions are considered no matter what their source :-D -RunningOnBrains(talk page) 20:49, 24 June 2009 (UTC)[reply]

I'm also uncomfortable with the bot correcting its own mistakes when I have already lost my trust in the programmer. Each page should be checked by a human to verify whether it's correct or not, not by the bot that screwed it up. OhanaUnitedTalk page 19:42, 24 June 2009 (UTC)[reply]

Just to clarify, it's not being checked by the bot that 'screwed it up', but by a different bot (which operates from the same account). Martin (Smith609 – Talk) 21:21, 24 June 2009 (UTC)[reply]
I still support deleting these articles and do not support re-approving either bot (the original one or the "fix" one). Kingdon (talk) 02:28, 25 June 2009 (UTC)[reply]
I can't find the bot approval discussion in your edit history. It's hard to understand the bot process, but it appears that bots require approval and "flagging by a bureaucrat." --69.226.103.13 (talk) 07:14, 25 June 2009 (UTC)[reply]

I've been trying to resolve this situation in something of a rush, as I am remarkably busy at the moment. In retrospect this was a dreadful idea; I have apparently introduced new errors, and I have not been able to keep track of all the discussions related to the bot, which seem to span about a dozen different pages. This has led to some editors feeling that I am ignoring them, for which I apologise.

May I propose a solution, which I hope will satisfy everybody?

  • Now: to delete all articles which have only been edited by Anybot (and maintenance bots such as User:Addbot)
  • To retire Anybot in its current implementation
  • If and when I get time to work on this project again (October at the earliest):
    • To discuss whether a bot is capable of creating articles automatically without introducing errors
    • If so, to re-apply for bot approval
      • Discussion about how the bot should operate, whether pages should be tagged with 'this article was created automatically', and whether I am incompetent can be held at that stage
    • To solicit community input into the bot's output
    • To ensure that all errors mentioned here are fixed
    • To make the bot's source code openly available
    • To undertake more rigorous testing processes (as advised)

If it helps, I can offer to automatically tag pages for deletion; I understand that my programming credentials are under fire, so I would be happy to post the code before operating it, or to leave the task to others.

How does that sound? Martin (Smith609 – Talk) 18:50, 26 June 2009 (UTC)[reply]

As I've said a number of times now: This is an esoteric subject. Very few editors know enough about it to correct errors or add content, and those who do have already filtered out the articles that they have corrected. 99% of edits to the remaining articles will be drive-by cosmetic and categorisation changes. The proposal to retain articles edited by a human will only preserve error-riddled articles, and for no apparent gain. Hesperian 04:55, 27 June 2009 (UTC)[reply]

Discussion is also going on at Wikipedia_talk:Bots/Requests_for_approval#Request_for_deflagging_and_blocking_of_Anybot

Hesperian is on top of the situation with this AfD. But, since the bot owner does not appear to listen to anything, whether said once by one person or said for months by 5 or 25, it bears repeating: this does nothing to address the articles created by anybot and subsequently edited by humans.
213.214.136.54 did demanding work on the upper-level taxonomies of over 1000 Chromalveolata articles. These are articles where wikipedia is deficient in even the most common species, covering groups of organisms understood by few authorities; even within the field of phycology the understanding of these organisms, particularly the single-celled ones, is low, and expertise is held by few.
All of 213.214.136.54's editing efforts, excellent work on the higher level taxonomies of the Chromalveolata, will probably have to be deleted. 213.214.136.54 repeatedly attempted to work within wikipedia's guidelines, discussing issues with the bot owner, trying to get things fixed.
But by continuing to ignore this editor's contributions, it seems to me that you, Martin, are insulting them, on top of the forthcoming injury of having all of their hard work deleted.
How does wikipedia expect to keep excellent and dedicated IP editors like this, who are willing to correct the serious gap in expertise found in many areas of wikipedia, when the editor is repeatedly ignored, their contributions devalued, and even their hours of effort to improve wikipedia dismissed as if they do not exist?
--69.226.103.13 (talk) 06:49, 27 June 2009 (UTC)[reply]
I have to admit to feeling slightly insulted myself. My offers to improve the situation have all been scoffed at, and by (a) saying that every single article should be deleted, (b) telling me to fix every article, and (c) ignoring all of my requests for input so I can do a proper job of it, you are implying that my time has absolutely no value. At a time when I am incredibly busy at work and home and barely have time to cook myself meals, this is slightly hurtful. If you intend to insult me then go ahead, but if not then please consider your words more carefully.
You also seem not to value the time of the IP contributors who have worked hard to correct a great percentage of the created articles. To me, if you want to devalue an editor's contributions, and to dismiss their hours of effort to improve Wikipedia as if they did not exist, the best way to do so is to delete all their edits. I have done all I can in a limited portion of time to enable these articles to be preserved, and I will be the first to admit that I haven't had enough time to do a good job of it. But, according to WT:WikiProject Plants, 213.214.136.54 spent three days fixing articles, and a sample of these edited pages was found (by 69.226.103.13) to be error-free.
I think that what we need to establish is whether the rate of errors in the articles which have been edited by human users is significantly greater than the error rate in WP's scientific articles as a whole. It is possible that all are perfectly correct, and it is also possible that all still contain errors and should be deleted (although according to 69.226.103.13's check, this would seem unlikely). If we are considering throwing away three days of work from a knowledgeable and valuable contributor (213.214.136.54), not to mention several days of work from a malicious, nasty and ignorant editor (Smith609, apparently), then I would suggest (1) that we should have rather strong evidence that at least a significant proportion of the edited articles still contain major errors; and (2) that it is worth matching 213.214.136.54's time with a similar amount of time ourselves (if that means that only errant articles are deleted).
Given that 142,310 articles don't even cite their sources, and given that from personal experience somewhere around 50% of the articles in my field (geology) contain glaring errors which I have to fix by hand, I'm going to take some convincing that articles which have been reviewed by editors are any worse than the average WP article. Martin (Smith609 – Talk) 14:58, 27 June 2009 (UTC)[reply]
Per the proposal above (which appeared to be well received, and elicited no objections), I've created Template:AnybotAlgae, which automatically places articles in "Category:Anybot algae articles to be validated". I recommend that a bot be run to append this to all of Anybot's algae articles, whether re-edited or not. Then, on August 1, 2009, delete all the articles that still have the template on it. --Arcadian (talk) 16:53, 27 June 2009 (UTC)[reply]
Yes, that sounds like a very, very good idea, though I do have a couple of thoughts. Firstly, would it be appropriate to amend the template to include the phrase '...and may contain significant factual errors' (i.e. let's tell the plain truth to our readers!)? Also, I don't know if policy will allow this, but how about adding {{noindex}} to every article created or significantly edited by Anybot, in order to stop these from appearing at the top of Google searches? Either way, I fully support getting everything templated as soon as possible. --Kurt Shaped Box (talk) 17:13, 27 June 2009 (UTC)[reply]
Which proposal above received no objections? Not Martin's? It has objections, serious ones. It completely ignores the IP edits, which cover over 1000 bad articles. The proposal to tag the articles? I think that's a good idea, but I don't know which proposal you're discussing. Noindex sounds like it should have been done the first day this discussion arose. If templating delays deletion I don't support it; however, if it can be quickly done on the way to deletion, I'm 100% behind it. The template looks good, but it also ignores the 1000+ bad articles edited only in the upper taxonomic levels that Hesperian mentions in the link I include in this post. --69.226.103.13 (talk) 17:30, 27 June 2009 (UTC)[reply]
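(If the template-and-noindex route is taken, the bot run itself is mechanically simple. The sketch below assumes a plain-text list of titles taken from User:Anybot/AfD and two hypothetical helpers, fetch_page and save_page, standing in for whatever bot framework is actually used; neither is anyone's committed tooling.)

```python
# Prepend the review template and {{noindex}} to every listed article.
TAG = "{{AnybotAlgae}}{{noindex}}\n"

def tag_articles(titles, fetch_page, save_page):
    for title in titles:
        text = fetch_page(title)
        if "{{AnybotAlgae}}" in text:
            continue  # already tagged; keeps the run idempotent
        save_page(
            title,
            TAG + text,
            summary="Tagging Anybot-created algae article for review per "
                    "[[Wikipedia:Articles for deletion/Anybot's algae articles]]",
        )
```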
You seem to have missed every post I've made about articles I've reviewed. This doesn't help the situation.
I'll respond and repeat, again, for other readers of the discussion: I have not found a single algae article created by the bot, and not completely rewritten by meat editors, that is good enough to be left on wikipedia without a couple of hours of hand editing. The articles completely rewritten have been removed from the afd list. If someone has the expertise and over 8000 free hours for the rest, they should alert readers of this afd.
213.214.136.54's articles were only corrected for the higher-level taxonomies. All of them still contain the fundamental errors within the text that anybot created; I assume they do because I found no evidence to the contrary.
As stated elsewhere, I searched through every major taxon of article created by the bot, from all time frames of its operation, and on both sides of all major errors reported on its board, and found huge errors in every type of article that make the articles completely worthless at best. These are not "single glaring errors" in the articles; this bears repeating, as it's not being listened to by parties who should be listening: these are errors which make the articles far worse than useless, and which bring wikipedia into disrepute for publishing them in the first place, and for allowing them to continue for so long in the second place.
Please, reread on this page and in WP:Plants and on anybot's error reporting page for examples of the nature of these errors.
Please, if you're going to run a bot on wikipedia, at least read the error reports. If you don't have time to read the error reports, you never should have operated a bot, not this bot or any other bot. Now you clearly state how little time you have; but instead of devoting it to learning what was wrong, you devoted it to writing an entire new program containing ever more creative errors, then operated this bot clearly without community consent and without community input. Please read the error reports.
Go ahead and search the articles yourself and find a group of articles that can be saved. It seems straightforward that your arguing and proposing solutions without knowledge of the nature of the errors is of no benefit to the discussion. It seems from this post of yours that you meant to consider the IP's edits as leaving good articles, in spite of what has been said about them. The IP's work corrected only higher-level taxonomies.
Also, you can't legitimately cite AlgaeBase as a source for the articles anybot created. I compared the anybot articles to AlgaeBase and did not find organisms with our types of errors.
An article that lists an organism under one kingdom at the family level and another at the order level, with text about an organism that belongs to neither of these kingdoms, and taxonomic information based on a programming error rather than a source, is, most emphatically, not a glaring error; it is simply wrong. The entire article is wrong. And it's not in AlgaeBase.
--69.226.103.13 (talk) 17:27, 27 June 2009 (UTC)[reply]
First of all, I originally only ran the bot while I had time to fix its errors. Secondly, I take issue with your estimate of how long it will take to correct articles. Herpothamnion is a typical article I chose at random from the bot's articles; I don't understand how one could spend two hours correcting it. I could check it against algaebase using the link within the article in 20 seconds. Thirdly, the only information that the bot didn't take directly from Algaebase was originally the higher taxonomy; many algae are listed in Kingdom Plantae on Algaebase, and since I was unaware that bacteria were present in the database, I simply classified everything as algae. Beyond the classification, the bot read everything directly from Algaebase; therefore Algaebase remains the source of the information. Many of the statements you make are based on your personal assumptions and do not reflect the true situation; in the interests of allowing you to form a balanced view of what has happened, I've made the source code of the fix-script (which I ran in May, and modified to use Algaebase's higher taxonomy last week) available at http://toolserver.org/~verisimilus/Bot/anybot/fix-algae.php. You can view the source or download the file to see it in human-readable format. Hope that helps, Martin (Smith609 – Talk) 20:01, 27 June 2009 (UTC)[reply]
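(To make the failure mode Martin describes concrete: defaulting every AlgaeBase record to "alga" is safe only if the database contains nothing but eukaryotic algae. The sketch below is illustrative; the field names are assumptions about the records, not AlgaeBase's actual schema.)

```python
def classify(record: dict) -> str:
    """Pick the word used in an article's lead sentence."""
    # Buggy version: an unconditional default misclassifies the
    # cyanobacteria (prokaryotes) that AlgaeBase also contains:
    #     return "alga"
    # Safer version: consult the record's own higher taxonomy first.
    if record.get("phylum") == "Cyanobacteria":
        return "cyanobacterium"
    return "alga"
```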
Viewing the code now is pointless; the bot won't be run. Your errors are not in the code; well, they are, but they started in the algorithm, and errors of this nature have little to do with the code. Again, feel free to post your bottom-most algorithm, for future code or even for this one, and I will debug it.
How would you fix Herpothamnion based on AlgaeBase? It has no taxonomically valid species, and no description on the linked page. I think it's a synonym. --69.226.103.13 (talk) 20:41, 27 June 2009 (UTC)[reply]
Definitely should delete Herpothamnion unless a better source can be found. AlgaeBase makes it look like an old (not currently used) synonym, but it is hard to be sure. The paper doi:10.1515/BOT.2007.025 probably has a better answer, but is (I think) paywalled. Kingdon (talk) 13:30, 28 June 2009 (UTC)[reply]
As I said about you before, you are one of the editors competent to edit the articles. Schneider and Wynne indicate Herpothamnion is a synonym of Spermothamnion. Their article, based upon the 1950s work of the Swedish phycologist and red algae expert Harald Kylin, could be used for the synonymy of many red algae, and I'm certain Professor Schneider would send anyone a copy.
The Herpothamnion article cannot be fixed in 20 seconds with the link in the article. It could be quickly edited by one of the editors I mentioned who knows enough to find the proper source, the Botanica Marina review. The information for this taxon is not in AlgaeBase. This is the case for many of the articles that anybot produced.
I'm not a phycologist; I am an experienced researcher. I reviewed the articles and spent a lot of time trying to find a way to save most, then any, of the articles. Encyclopedia articles require a specific type of research, far less than academic writing, but they require proper sourcing, and this demands knowledge. Careful work by a knowledgeable editor could save articles; sloppy 20-second editing without the proper sources will leave the articles no better than they are now.
My estimate of the time necessary to correct the articles is based on an estimate of the skills of editors capable of rewriting the articles (from the editors' edit histories), the obscurity of some of the species, and the type of proper sourcing an encyclopedia article should have. This one Botanica Marina article could be used for some species, while most articles would have obscure and less available sources; probably many genera would require research at one of the major research universities with phycology departments (Sweden, California, Australia, East Coast US, Ireland). Understanding what is necessary to repair the articles is part of the work. --69.226.103.13 (talk) 18:16, 28 June 2009 (UTC)[reply]

I agree with the suggestion above about adding a noindex tag, in spite of the additional work of tagging articles slated for deletion in a day or so. I asked at the bot board for someone to do this. I'm sorry if I stepped on anyone's toes. It's so disappointing to see garbage from wikipedia. Many wikipedia articles contain errors, but I've seen few outside of hoaxes with errors this bad. --69.226.103.13 (talk) 18:12, 27 June 2009 (UTC)[reply]

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.