MediaWiki talk:Spam-blacklist - Wikipedia



    MediaWiki:Spam-blacklist is meant to be used by the spam blacklist extension. Unlike the meta spam blacklist, this blacklist affects pages on the English Wikipedia only. Any administrator may edit the spam blacklist. See Wikipedia:Spam blacklist for more information about the spam blacklist.


    Instructions for editors

    There are 4 sections for posting comments below. Please make comments in the appropriate section. These links take you to the appropriate section:

    1. Proposed additions
    2. Proposed removals
    3. Troubleshooting and problems
    4. Discussion

    Each section has a message box with instructions. In addition, please sign your posts with ~~~~ after your comment.

    Completed requests are archived. Additions and removals are logged; reasons for blacklisting can be found there.

    Adding the templates {{Link summary}} (for domains), {{IP summary}} (for IP editors) and {{User summary}} (for users with an account) causes the COIBot reports to be refreshed. See User:COIBot for more information on the reports.


    Instructions for admins

    Any admin unfamiliar with this page should probably read this first, thanks.
    If in doubt, please leave a request and a spam-knowledgeable admin will follow up.

    Please consider using Special:BlockedExternalDomains instead, powered by the AbuseFilter extension. This is faster and more easily searchable, though it only supports whole domains and does not support whitelisting.

    1. Does the site have any validity to the project?
    2. Have links been placed after warnings/blocks? Have other methods of control been exhausted? Would referring this to our anti-spam bot, XLinkBot be a more appropriate step? Is there a WikiProject Spam report? If so, a permanent link would be helpful.
    3. Please ensure all links have been removed from articles and discussion pages before blacklisting. (They do not have to be removed from user or user talk pages.)
    4. Make the entry at the bottom of the list (before the last line). Please do not do this unless you are familiar with regular expressions — the disruption that can be caused is substantial.
    5. Close the request entry on here using either {{done}} or {{not done}} as appropriate. The request should be left open for about a week, as there will often be further related sites or an appeal in that time.
    6. Log the entry. Warning: if you do not log any entry you make on the blacklist, it may well be removed if someone appeals and no valid reasons can be found. To log the entry, you will need this number – 498124607 after you have closed the request. See here for more info on logging.

    Proposed additions

     

    Instructions for proposed additions

    1. Please add new entries to the bottom of this section.
    2. Please only use the basic URL – example.com , not https://www.example.com.
    3. Consider informing editors whose actions are discussed here.
    4. Please use the following templates:
    {{IP summary}} – to report anonymous editors suspected of spamming:
    {{IP summary|127.0.0.1}} -- do not use "subst:" with this template
    {{User summary}} – to report registered users suspected of spamming:
    {{User summary|Jimbo Wales}} -- do not use "subst:" with this template
    {{Link summary}} – to report spam domains:
    {{Link summary|example.com}} -- do not use "subst:" with this template
    Do not include the "http[s]://www." portion of a URL inside this template, nor anything behind the domain name. Including this template will give tools to investigate the domain, and will result in COIBot refreshing the link-report. ('COIBot')
    {{BLRequestRegex}} - to suggest more complex regex filters beyond basic domain URLs
    {{BLRequestLink}} - to suggest specific links to be blacklisted

    Please provide diffs ( e.g. [[Special:Diff/99999999]] ) to show that there has been spamming!
    Completed requests should be marked with {{done}}, {{not done}}, or another appropriate indicator, and then archived.
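The "basic URL" rule in the instructions above can be applied mechanically. The following is a minimal shell sketch, not part of any Wikipedia tool: the helper name `basic_domain` is hypothetical, and it assumes the input is an ordinary http or https URL.

```shell
#!/bin/bash
# basic_domain is a hypothetical helper name used only for illustration.
# It strips an http:// or https:// scheme, a leading "www.", and any
# path, query string, or fragment, leaving just the domain.
basic_domain() {
    printf '%s\n' "$1" | sed -E 's|^https?://||; s|^www\.||; s|[/?#].*$||'
}

# Example: prints example.com
basic_domain 'https://www.example.com/some/page?x=1'
```

The result is the form expected inside {{Link summary}}, e.g. {{Link summary|example.com}}.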
    MER-C 13:00, 3 June 2012 (UTC)
      Added--Hu12 (talk) 00:19, 5 June 2012 (UTC)
    Limited in scope... I'm inclined to wait. Level 3 spam warning given for now. Thanks for the report.  Not done--Hu12 (talk) 00:31, 5 June 2012 (UTC)
    Spammers

    See WikiProject Spam report MER-C 09:24, 6 June 2012 (UTC)Reply

      Done--Hu12 (talk) 01:58, 7 June 2012 (UTC)Reply
    Anyone? --Tito Dutta 21:34, 13 June 2012 (UTC)Reply
    Be patient. We're all volunteers here, and we all have real lives to attend to. Thanks for reporting it, and don't worry, this report isn't going to go away until a decision is made. When you say "we have removed a few", who is "we"? ~Amatulić (talk) 22:24, 13 June 2012 (UTC)Reply
    I did not know this page has a backlog; okay, I will wait! "We're all volunteers here" applies to me too. --Tito Dutta 02:48, 14 June 2012 (UTC)
    Accounts
    Rashalna (talk · contribs)
    Auk.sl28 (talk · contribs)
    117.18.231.30 (talk • contribs)
    I cleaned up the rest. The IP adding this moved the link "up", which is never a sign of good faith. Is there evidence of copyright permission or fair-use disclaimers per WP:COPYRIGHT for these book downloads? --Hu12 (talk) 14:03, 15 June 2012 (UTC)
    Some cleanup still remains; see here. I can help to clean up! Yes, the site allows downloading the latest copyrighted books. They just scan and upload. The major issue is not that they are adding the links as references, but that they are adding these links in the external links section as "Download book of this author here" --Tito Dutta 14:17, 15 June 2012 (UTC)
    Those are the cached search results... Here is the actual hyperlink search for rashal.com; cleanup is complete (unless someone re-inserts them). While simply stating "Download book of this author here" is not a valid reason to blacklist, it is a reason to remove, as it's clearly promotional. There is some evidence of abuse (see accounts above), but it was some time ago and is not reason right now to add the link. If the site is carrying work in violation of the creator's copyright, then that may be sufficient for adding it. In this case there are a couple of concerns: first, the site is not a publisher but an individual; second, the site solicits on its homepage for anyone to send download links to copyrighted books:
    " We always try to help you by adding new books... We do know you have some collection of bangla books, we have not.So friends we need your help... give us mediafire.com download links."
    There is no evidence of copyright permission or fair-use disclaimers, so per WP:COPYRIGHT this is   Added.--Hu12 (talk) 16:27, 15 June 2012 (UTC)

    Spammy sites offering free gift codes for Minecraft in exchange for completing various sketchy ad-related activities. Most, if not all, sites that purport to offer codes like this are either useless, or worse, will actively steal your information. Even if these particular sites are legit, they're of absolutely no encyclopedic value to the project whatsoever.

    These links have been added numerous times by various IPs to Talk:Minecraft, including:

    elektrikSHOOS (talk) 01:15, 10 June 2012 (UTC)Reply

    It seems the talk page spam is text, not hyperlinks; therefore blacklisting would not solve this.   Not done. Revert as it occurs for now. --Hu12 (talk) 02:40, 14 June 2012 (UTC)

    A fansite used mainly in articles related to the Tales (series). Users seem to think it's a site that should be used. A quick search also finds copyrighted audio. Also, I'm not sure if I'm supposed to be reporting this kind of site, so a comment about this would be informative. DragonZero (Talk · Contribs) 03:48, 11 June 2012 (UTC)

    I'm not seeing evidence of abuse here. The only account I found that looks like a spam account is A745 (talk · contribs), from 2007. If there are copyvio issues, that needs to be shown. For now I think a better area for discussing this is to   Defer to WPSPAM--Hu12 (talk) 02:58, 14 June 2012 (UTC)

    adscendmedia.com

    Advertising website (Wikipedia is not a soapbox or means of promotion). Please see this website's report for WikiProject Spam. --Captaincollect1970 (talk) 18:26, 16 June 2012 (UTC)

    It appears there was only one IP back in November adding this:
    84.72.79.163 (talk • contribs)
    Not enough evidence of current and/or ongoing abuse.   Not done--Hu12 (talk) 01:26, 17 June 2012 (UTC)

    Completed Proposed additions

    Proposed removals

     Use this section to request that a URL be unlisted. Please add new entries to the bottom of this section.

    Requests from site owners or anyone with a conflict of interest will be declined. Otherwise, follow these steps to post a properly-formatted request:

    • Familiarize yourself with the reasons why a site was blacklisted. Look at MediaWiki_talk:Spam-blacklist/log to see who blacklisted the link and when, and the reason given for blacklisting.
    • At the beginning of your request, include the domain in a {{link summary}} template (remove the http:// and www from the domain). This provides tools to find more information on the domain. For example, * {{Link summary|example.com}} results in:
    • When previewing your post with an included {{link summary}}, you will find links to a COIBot-report ('COIBot'), linksearches on en ('Linksearch en'), and tracked discussions ('tracked' and 'advanced'). If the log did not provide sufficient information on why a link was blacklisted, these links often yield more information.
    • Explain how the link can be useful on Wikipedia.
    • Explain your reasoning why the blacklisting is not necessary anymore.
      • Note that the bar for blacklisting is whether a site was spammed to Wikipedia, or otherwise abused, not whether the content of the site is 'spammy' or unreliable. Please indicate why you expect that that abuse has stopped.

    Providing this information often speeds up handling of the request.

    Once you have added your request, please check back here from time to time to get the outcome or to answer any additional questions. We will not email you or otherwise notify you about your request, and if no answer is received to a question, the request will be considered abandoned.

    Administrators: Completed requests should be marked with {{done}}, {{not done}}, or another appropriate indicator, then archived.

    mixcloud.com

    The ban relates only to spamming by people connected with the site several years ago: http://en.wikipedia.org/w/index.php?title=Wikipedia_talk:WikiProject_Spam&oldid=317486670#MIXCLOUD_LTD_Spam. I think that the site may now be notable enough to warrant an article on Wikipedia, and some of the mix pages could be valuable as references/external links. memphisto 11:18, 16 May 2012 (UTC)Reply

    I think this makes more sense as a whitelisting request if an article about that company passes WP:WEB or WP:GNG. I don't see how it would be useful for references in general. OhNoitsJamie Talk 18:05, 16 May 2012 (UTC)Reply
    In terms of popularity the Alexa rank for mixcloud.com is currently 4,574 - http://www.alexa.com/siteinfo/mixcloud.com. I also mentioned that it could be valuable as a Wikipedia reference/external link, as it would enable you to reference if a notable DJ had played a particular song or even link to a certain mix if it contributed to an article. memphisto 10:57, 31 May 2012 (UTC)Reply

    I am trying to add a link to the book (Love in the Holy Quran) on this page: http://en.wikipedia.org/wiki/Prince_Ghazi_bin_Muhammad The site is a reference tool containing many reference books in both English and Arabic. Could the site please be added to the whitelist? The site is run on two domains, altafsir.com and altafsir.org. — Preceding unsigned comment added by Shart000 (talk • contribs) 11:06, 17 May 2012

      Additional information needed. What link, specifically, are you interested in using?--Hu12 (talk) 19:43, 17 May 2012 (UTC)

    There are several places on Wikipedia.org that used to link to the site; here is a list of some of the types of links.

    1. The author of "Love in the Holy Quran" lists details of the book here: http:// main.altafsir.com/LoveInQuranIntroEn.asp in English and here: http:// main.altafsir.com/LoveInQuranIntro.asp in Arabic
    2. The site is a reference tool in both Arabic and English, the site owners have spent several million $US transcribing manuscripts of old Arabic Quranic exegesis into digital form. They have several translations. Users were linking to specific pages on altafsir.com as references. Each of the works was authenticated by scholars (most of whom are professors in Universities around the world).

    We would like users to still have the option of using the site as a reference. (There are currently over 100 works transcribed on the site, each work is about 20 volumes.)

    I have noticed that they have 4 domain mirrors of the site; this was a mistake by the site admin. I have asked them to remove the mirrors and to forward the extra domains instead. They are currently working on this. — Preceding unsigned comment added by Shart000 (talk • contribs) 05:44, 28 May 2012 (UTC)

    Thank you. (I tried to add the link that we want but the spam filter is still in effect so we were unable to add it.) — Preceding unsigned comment added by Shart000 (talk • contribs) 05:35, 28 May 2012 (UTC)

    You refer to "we" in your comment above. Who is this "we"? Please know that we do not remove sites from the blacklist at the request of the site owner or anyone associated with the site. It seems that you were trying to add a link to altafsir.com in spite of our guideline Wikipedia:Conflict of interest. We can't permit that. If a trusted, high volume editor feels that the material of your site is worthy of referencing, we would consider a request from such an editor. ~Amatulić (talk) 18:54, 30 May 2012 (UTC)

    Thank you for your response. Why was the site added to the blacklist? Who can we contact about removing the site from the Blacklist? We are a think tank based in Amman, Jordan. We are not directly associated with altafsir.com, but we do help them resolve small issues like this occasionally. — Preceding unsigned comment added by Shart000 (talk • contribs) 05:42, 4 June 2012 (UTC)

      Declined. As I stated above, we don't consider requests for removal from parties with a conflict of interest. If a trusted, high-volume editor determines that links on altafsir.com are useful as references on Wikipedia instead of non-blacklisted alternatives, we would consider requests from such an editor.
    The reasons for blacklisting are given in the links at the top of this section, as well as here. Apparently altafsir.com is also blacklisted globally due to rather massive spamming; see m:User:COIBot/XWiki/altafsir.com for the evidence that led to it. Even if altafsir.com was removed from the local blacklist here, it would still be blacklisted globally with little chance of removal. You can   Defer to Global blacklist to pursue it further but it is likely you will get a similar response. ~Amatulić (talk) 18:24, 6 June 2012 (UTC)Reply

    nepa.com.np

    This site was blocked in March 2008 and no reason has been given. I would like to request that it be delisted. Karrattul (talk) 18:38, 30 May 2012 (UTC)Reply

      Defer to Whitelist to unblock specific pages on that site. I see no reason to de-list it entirely. The reason for blocking is here. ~Amatulić (talk) 18:47, 30 May 2012 (UTC)Reply
    Thanks, the site is intended to be used as a reference for this article http://en.wikipedia.org/wiki/Wikipedia_talk:Articles_for_creation/Sitala_Maju_(song) Karrattul (talk) 17:54, 2 June 2012 (UTC)Reply
    Again,   Defer to Whitelist once the article is accepted into main article space. ~Amatulić (talk) 21:27, 8 June 2012 (UTC)Reply

    mokimobility.com

    MokiMobility is a new mobile device management provider. Prior to a page being created, several links were added to relevant articles linking directly to the home page, both by external users and company users. This action was viewed as spamming and the URL got blacklisted. A new article has been created for the company and added to relevant articles under vendor sections, but because of the blacklist a link to the homepage cannot be included on the page summarizing the company. A link to the homepage on a company listing is a valid and useful feature of the listing. --Bradem1976 (talk) 17:50, 1 June 2012 (UTC)

      Declined pending the outcome of the current proposal to speedy-delete the article.
    Provided the article is kept,   Defer to Whitelist. There is no need to remove mokimobility.com from the blacklist. If you want just one link in the MokiMobility article, then www.mokimobility.com/about/ is the best one to request at the whitelisting page. A link to the home page won't work in this case because www.mokimobility.com is the actual domain and not just a page and therefore can't be whitelisted. Furthermore www.mokimobility.com/index.html doesn't exist and www.mokimobility.com/index.php redirects to www.mokimobility.com, providing an avenue for further link spamming. ~Amatulić (talk) 18:13, 1 June 2012 (UTC)Reply

    goo.gl

    Google's in-house link shortener that is used only for links to Google searches or products, which has worked for months but is suddenly blocked. It does not appear in the log. There was no discussion of this block. This shortening service, which abides by our rules since non-Google domains can't be entered, helps reduce overly clunky and massive POST URLs, and should not be blocked without some community consultation. - ʄɭoʏɗiaɲ τ ¢ 01:09, 8 June 2012 (UTC)

    It appears to not just be for "in-google" domains. Go there and you can create them for everywhere; the search on wp:en shows one "in-google" and one "out-google". tedder (talk) 01:16, 8 June 2012 (UTC)Reply
    BTW it's on the mediawiki blacklist, not ours. tedder (talk) 01:17, 8 June 2012 (UTC)Reply
    Also, it seems to have been blocked at Meta since December 2009. So I'm not sure how it would have worked "for months". Anomie 01:42, 8 June 2012 (UTC)Reply
    I've been using it for Google maps links since they introduced it, and today is the first time it has been blocked. The search tool is broken I believe Tedder; I've got them in at least 40 articles myself. - ʄɭoʏɗiaɲ τ ¢ 10:31, 8 June 2012 (UTC)Reply
    Do you have an example of someplace you used it that worked and is not found by the search tool? Anomie 10:38, 8 June 2012 (UTC)Reply
    Looks like I've made a mistake... I was using g.co, which google maps seems to have suddenly stopped using. <blatant sarcasm ahead> It's a good thing Google is easy to contact about technical issues. - ʄɭoʏɗiaɲ τ ¢ 12:42, 8 June 2012 (UTC)Reply
    Alright, after some investigation, it appears short URLs generated with Google Maps will always use goo.gl/maps/. Is it possible to whitelist those whilst still blocking other goo.gl links? - ʄɭoʏɗiaɲ τ ¢ 21:14, 8 June 2012 (UTC)Reply
    Sure, that's possible.   Defer to Whitelist for such requests. Although you may have to convince the admins there that there is a need to include a URL shortener in the white list when it's perfectly reasonable to include full URLs in articles. ~Amatulić (talk) 21:25, 8 June 2012 (UTC)Reply
    Done. Hopefully that logic isn't applied though - It's perfectly reasonable to write in technical English that nobody outside of a scientific field could comprehend, but that's counterintuitive to an encyclopedia that anyone can edit... Same with a 5 line URL in an edit window; it obfuscates the surrounding content with gibberish. However, I fully understand the abuse potential that is the reasoning behind blocking URL shorteners in the first place. - ʄɭoʏɗiaɲ τ ¢ 16:05, 9 June 2012 (UTC)
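The idea discussed in this thread, allowing goo.gl/maps/ links while still blocking other goo.gl links, can be sketched in shell. This is a simplified model only: the two patterns and the function name `is_blocked` are assumptions made for illustration, not the actual blacklist or whitelist entries, and the real extension's matching may differ in detail.

```shell
#!/bin/bash
# Assumed entries, for illustration only; the real lists live on-wiki.
BLACKLIST='\bgoo\.gl\b'
WHITELIST='\bgoo\.gl/maps/'

# is_blocked URL: succeeds (exit 0) if the URL would be blocked under this
# simplified model, in which a whitelist match overrides a blacklist match.
is_blocked() {
    if printf '%s\n' "$1" | grep -qE "$WHITELIST"; then
        return 1
    fi
    printf '%s\n' "$1" | grep -qE "$BLACKLIST"
}

is_blocked 'http://goo.gl/maps/xyz' || echo 'maps link allowed'
is_blocked 'http://goo.gl/abc12' && echo 'other goo.gl link blocked'
```

Checking the whitelist first keeps the blacklist entry itself unchanged, which matches the advice above to request whitelisting rather than removal.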

    engineers-excel.com

    Provides useful (and free) spreadsheet tools for engineers; a lot of the content is based on Wikipedia. — Preceding unsigned comment added by 124.149.88.232 (talk • contribs) 14:20, 8 June 2012

      Declined. No rationale has been given on how this would be a useful reference in any article. Furthermore, sites with content "based on Wikipedia" are generally inappropriate to include as links in Wikipedia, because such references are essentially circular. ~Amatulić (talk) 14:22, 8 June 2012 (UTC)Reply

    This is the homepage of the Calculate Linux project, and the link is very useful to its article (namely in the External Links section). The article currently has a link to calculate-linux.com, but that domain no longer works.

    It seems the link was added to the list automatically. The log of it can be found at https://meta.wikimedia.org/wiki/User:COIBot/XWiki/calculate-linux.org. I also find automatic spam detection in itself going a bit far. It's called a blacklist so we shouldn't have to whitelist links that some bot detected as malicious based on some algorithm. — Preceding unsigned comment added by 81.71.110.7 (talk) 14:11, 9 June 2012 (UTC)Reply

    It wasn't added to the blacklist by the bot. The bot flagged it for human attention because of suspicious activity, and then a human decided to add it. At any rate, you're in the wrong place.   Defer to Global blacklist to ask for it to be removed, or   Defer to Whitelist to ask for one specific page whitelisted locally. Anomie 19:46, 9 June 2012 (UTC)Reply

    Completed Proposed removals

    Troubleshooting and problems

     This section is to report problems with the blacklist. Old entries are archived

    Logging / COIBot Instr

    Blacklist logging

    Full instructions for admins

    Quick reference

    For Spam reports or requests originating from this page, use template {{/request|0#section_name}}

    • {{/request|213416274#Section_name}}
    • Insert the oldid (213416274), a hash "#", and the Section_name (Underscoring_spaces_where_applicable):
    • Use within the entry log here.

    For Spam reports or requests originating from Wikipedia_talk:WikiProject_Spam use template {{WPSPAM|0#section_name}}

    • {{WPSPAM|182725895#Section_name}}
    • Insert the oldid (182725895), a hash "#", and the Section_name (Underscoring_spaces_where_applicable):
    • Use within the entry log here.
    Note: If you do not log your entries, they may be removed if someone appeals the entry and no valid reasons can be found.
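The log reference described above (template name, oldid, "#", section name with spaces underscored) can be assembled with a small shell helper. This is only a sketch; the function name `log_ref` is hypothetical and the section name in the example is made up.

```shell
#!/bin/bash
# log_ref TEMPLATE OLDID "Section name"
# Hypothetical helper: builds the log reference by joining the oldid and the
# section name with "#", replacing spaces in the section name with underscores.
log_ref() {
    printf '{{%s|%s#%s}}\n' "$1" "$2" "$(printf '%s' "$3" | tr ' ' '_')"
}

# Example: prints {{WPSPAM|182725895#Section_name}}
log_ref WPSPAM 182725895 'Section name'
```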

    Addition to the COIBot reports

    The lower list in the COIBot reports now has, after each link, four numbers in parentheses (e.g. "www.example.com (0, 0, 0, 0)"):

    1. first number: how many links this user added (the same after each link)
    2. second number: how many times this link was added to Wikipedia (as far back as the linkwatcher database goes)
    3. third number: how many times this user added this link
    4. fourth number: to how many different Wikipedias this user added this link

    If the third or fourth number is high relative to the first or second, the user has at least a preference for using that link. Be careful with other statistics drawn from these numbers (e.g. a good user who adds a lot of links). If there are more statistics that would be useful, please notify me, and I will see whether I can get the info out of the database and report it. This data is available in real time on IRC.

    Poking COIBot

    When adding {{LinkSummary}}, {{UserSummary}} and/or {{IPSummary}} templates to WT:WPSPAM, WT:SBL, WT:SWL and User:COIBot/Poke (the latter for privileged editors) COIBot will generate linkreports for the domains, and userreports for users and IPs.

    Discussion

     This section is for other discussions involving the blacklist. Old entries are archived

    There's a question at RSN about a possible malware site. Could someone take a look at Wikipedia:Reliable_sources/Noticeboard#Please_check_the_source? WhatamIdoing (talk) 06:01, 12 February 2011 (UTC)Reply

    Ran the URL through a few malware/threat detectors; it seems OK.
    Here are a few scanner tools that could be useful.
    --Hu12 (talk) 19:53, 12 February 2011 (UTC)Reply
     This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.

    As you can see, the spam blacklist is extremely long and growing towards a size which is hard for a human to keep an overview of and which probably takes some time to apply (which happens on every edit). Because of that, I wrote a script which takes the "easy" regular expressions and parses them back into domain names. Only those that start with either \b or \. are taken into account, because otherwise the exact blacklisted domain name can't be extracted (e.g. foo\.com will also match barfoo.com). Of course it only takes clear cases into account (the domain names can only contain 0-9, a-z or -, while the TLDs mustn't contain anything except letters); furthermore, all dots must be escaped. After it has extracted those domain names, it checks with nslookup whether they still exist; if not, they are removed from the spam blacklist (they are only removed if nslookup returns NXDOMAIN; SERVFAIL etc. are ignored). I've already done that with the global spam blacklist on meta twice (1, 2), and there haven't been any problems and none of the removed domains has been re-added since.

    So now I ran my script for the English Wikipedia; the new spam blacklist can be found here and the removed lines here. It would be great if an administrator could apply the new list, or I can do it myself if there's consensus to do so. Feel free to verify some of the removed lines by hand using your system's nslookup, just your browser, or the various whois sites out there - Hoo man (talk) 17:38, 6 June 2012 (UTC)
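The extraction rules described in this proposal can be illustrated with a short shell sketch. This is not the script that was actually run; the function names `easy_domains` and `is_nxdomain` are hypothetical, and the NXDOMAIN check assumes that nslookup prints the string "NXDOMAIN" for non-existent names, which may vary between implementations.

```shell
#!/bin/bash
# Hypothetical helper, not the script used in the discussion above.
# Reads blacklist regex lines on stdin and prints the plain domain name for
# each "easy" entry: a leading \b or \., labels made of 0-9, a-z or -,
# escaped dots between labels, a letters-only TLD, and an optional trailing \b.
easy_domains() {
    grep -E '^\\(b|\.)[0-9a-z-]+(\\\.[0-9a-z-]+)*\\\.[a-z]+(\\b)?$' \
        | sed -E 's/^\\(b|\.)//; s/\\b$//; s/\\\././g'
}

# Hypothetical helper: succeeds only when nslookup reports NXDOMAIN for the
# given name. Other failures (SERVFAIL, timeouts) count as "still exists".
is_nxdomain() {
    nslookup "$1" 2>&1 | grep -q 'NXDOMAIN'
}

# Example: prints example.com; foo\.com is skipped because it has no
# leading \b or \. and would therefore also match barfoo.com.
printf '%s\n' '\bexample\.com\b' 'foo\.com' | easy_domains
```

Entries that are skipped by `easy_domains` would simply stay on the list untouched, which is the conservative behaviour the proposal describes.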

    I'm all for removing dead or outdated entries, especially since there is successful precedent already on meta. We'd just have to make sure the diff is reflected in the logfile. ~Amatulić (talk) 18:11, 6 June 2012 (UTC)Reply
    That's of course easily possible, as you seem to use the same format for logs as meta does. See the meta log entry for the first removal - Hoo man (talk) 18:17, 6 June 2012 (UTC)Reply
      Done. I admit that I have no idea how this page works, but I trust you when you say it will work correctly ;) — Martin (MSGJ · talk) 10:13, 8 June 2012 (UTC)Reply

    Thanks! Please notice this edit made by me :) I'll log the change in a second - Hoo man (talk) 11:54, 8 June 2012 (UTC)Reply

    Ok, I've just noticed that you use a different log style from what we use on meta (although you link to the meta help page). I've tried to do the "If you remove something from the blacklist, simply remove the relevant entry here." using a small shell script, but that doesn't work well either (especially because cleaning out only the lines which have been removed leaves a lot of leftover trash). Any ideas? - Hoo man (talk) 12:25, 8 June 2012 (UTC)

    As there wasn't any reply, I used my list for logging now (diff). It seems that it didn't create much waste; feel free to revert if you have a better idea or consider logging the removals as we do on meta a better idea - Hoo man (talk) 13:46, 8 June 2012 (UTC)

    The diff looks fine to me.
    We also have a few entries that are redundant because they are already listed on meta. It would be nice to have a tool that locates matching entries between the two lists. ~Amatulić (talk) 14:28, 8 June 2012 (UTC)Reply

    That's easily doable: double lines. I did that with the following short bash script:

    #!/bin/bash
    # Fetch both blacklists as raw wikitext and sort them, since comm needs sorted input.
    wget -q -O - 'http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw' | sort > meta_spam.txt
    wget -q -O - 'http://en.wikipedia.org/w/index.php?title=MediaWiki:Spam-blacklist&action=raw' | sort > en_spam.txt
    # Print lines present in both lists, skipping lines that start with a space or "#".
    comm -12 meta_spam.txt en_spam.txt | grep -vP '^[ #]+'
    rm meta_spam.txt en_spam.txt
    

    - Hoo man (talk) 15:15, 8 June 2012 (UTC)Reply