Wikipedia talk:Suspected copyright violations

This project page was nominated for deletion on 8 November 2017. The result of the discussion was mark as historical.

Archives

Archive 1

Archive 2

General advice

Occasionally check [1] for all pages using the CSB template - sometimes the bot forgets to add reports to SCV
If something is copied from a Wikipedia mirror, it may still be a problematic article. Make sure it isn't a repost of deleted content or fallout from a copy and paste move.
When in doubt, leave a note here and another editor will provide feedback
Common reasons an article gets reported, but is not a copyvio:
- Copied from a public domain source like the Dictionary of American Naval Fighting Ships or any other work of the US Federal government. However, when created this way, the source needs to be noted in the article.
- Created by WP:AFC. These requests sometimes take months to process and get mirrored, creating false positives. Often the external site reported will look like [2], or it will include a lot of other AFC articles. Also, most AFC creators will note what they're doing in the article history.
- Copy of a Wikipedia mirror. Common signs: Artifacts from Wikipedia layout like a "history" tab, "external links", "references" section and a table of contents, URL similar to Wikipedia-style, links to other Wikipedia-copied articles
Note that content released under GFDL only is no longer permissible since we switched to dual licensing on June 16, 2009 (see Wikipedia:Licensing update).

Question from CV newbie

@Moonriddengirl: or others: Today I worked on some articles from this list. Per the instructions at the top of the page, I added SCV templates (such as 'cleaned' and 'deleted') next to the appropriate individual article entries on the list after I determined the article had been deleted or after I removed the copyvio concern from that specific article. However, when I came back later I saw that the SCV templates had been removed and the article listings which they referenced remained. What am I doing wrong? -- Keithbob • Talk • 22:42, 6 January 2014 (UTC)

Nothing at all, User:Keithbob. :) When the pages get behind, it sometimes takes a while for the updates to reflect on the master. If you look at [3], you can see that your changes are all still there. The article deleted template is entirely optional - the redlink tells us it's gone. The "cleaned" template is especially helpful. Thank you! --Moonriddengirl ^(talk) 00:32, 7 January 2014 (UTC)

OK, thanks. Must be an issue with my browser as the other day my posts were gone and the articles remained. Thanks for the feedback, I'll keep working on the list now that I see I'm doing the right thing. Cheers! -- Keithbob • Talk • 16:05, 8 January 2014 (UTC)

Great work!

I've been keeping an eye on WP:SCV and doing little bits here and there. I looked today (even though I'm supposed to be on a wikibreak and vacation) and was glad to see that the backlog had been cleared. Awesome, heroic work! --Tom Morris (talk) 18:10, 26 January 2014 (UTC)

Potential change?

The backlog at WP:SCV is not that bad right now, due in part to the heroic work of a few people and due in part to the frequent breakage of the bot.

I'm wondering if there's a better way to get the community involved in responding to these. I was wondering yesterday whether it would be a good idea to propose that SCV cover articles completely with a template with instructions for next steps - something simpler than the one we use at WP:CP, which can be removed by anyone.

The template would have to acknowledge that there can be many valid reasons for text matches that do not indicate copyright issues, but that due to the identification of a text match, human review is needed to verify the article is one of them. It would acknowledge that there are sometimes false positives. It would request that anyone removing the tag place an explanation at the talk page if the text is not a copyright issue, and provide directions for rewriting if it is an issue but can be fixed. It could also include a caution to article creators that restoring the text as it was posted is not license compliant and may lead to a block of their account. It might even include a note that if the copyright issue was a well-intentioned mistake and they do not wish to repair it, leaving the template in place will take care of it, as somebody will fix or delete it.

The purpose is to help better educate people who create the copyright issues so that they can fix the issues themselves, so these notices are not removed without comment leaving copyright issues that are not noticed by new page reviewers who don't check the history (I wouldn't) and so that page reviewers themselves have better instructions on what to do to help resolve these issues.

What do you guys think? If you like the basic idea, I'd be happy to mock up a template and take it to one of the Village Pumps for feedback.

If you have other basic ideas on what to do to help make this work function better, I'm all ears. :) The one thing I know is that my episodic outreach at WP:AN is of limited use. Not no use; we've picked up people that way. But we need more than one or two.

Some of the people I've thought of who might care (User:MER-C, User:Wizardman, User:James086, User:Justlettersandnumbers, User:Sphilbrick) --Moonriddengirl ^(talk) 13:41, 8 June 2014 (UTC)

I'm thinking this "pinggroup" template didn't work. :) I'd have expected somebody to say something, even if, "Um...." User:MER-C, User:Wizardman, User:James086, User:Justlettersandnumbers, User:Sphilbrick. Hey. :) --Moonriddengirl ^(talk) 12:40, 10 June 2014 (UTC)

There are way too many dumb false positives for this to stick. That said, I welcome blanking anything with a "Plot" or "Synopsis" section. MER-C 12:45, 10 June 2014 (UTC)

Ping Group fail. Now looking at underlying issue.--S Philbrick (Talk) 12:48, 10 June 2014 (UTC)

A couple thoughts, apologies for being all over the lot.

I haven't handled any SCV items in some time. I think the answer is simple, but boring. My watchlist got out of control, I worked on cleaning it up, and SCV may have been a casualty of the purge. While this affects only me, and you are raising institutional questions, I think I am correct that the admin dashboard entry for copyright issues is only those tagged for CSD. I glanced at the link to other backlogs, and did not see SCV on the list, so could part of the problem be "out of sight, out of mind"?

I see some SCV entries in wp:CP. Am I correct that SCV is only bot generated, and only items seven days old or more get transcluded to CP? I'll note in passing that when I glance at CP to see whether I should allocate some time there, the SCV reports are hidden, so my subconscious assessment of the backlog excludes those items. I'm not ready to propose that the decision to hide them be changed, just making an observation.

I once proposed a concept called User:Sphilbrick/Tour of Duty. It has implications beyond copyright issues, and the proposal failed to gain traction, but I still think it has value. My hope would be that editors interested in becoming admins would be encouraged to sign up for a TOD, one of which might be spending quality time at SCV, CP or CCI. I don't want the average newbie tackling these, but someone about ready to go for an RfA might be a good candidate. (I understand that this would be a lot of heavy lifting just to deal with copyright, but I hope that if I decided to try to push for it again the copyright team will be supportive.)

As a shorter term solution, it might help to make sure that SCV shows up, either as an additional backlog, or directly in the admin dashboard.

And after all this, I haven't responded to your original suggestion. I confess that when I work on a CP item with the template, my initial reaction is oh crap, a couple extra steps needed to resolve this one. Not a strong reaction, as I accept that the template might be delivering some value, but it colors my view, and my initial reaction is not all that positive. That said, it would raise some awareness, as well as simultaneously reinforcing our commitment to the issue, so maybe it is a good idea. Sorry to be lukewarm, I wish there were better options, as I know you wish as well.--S Philbrick (Talk) 13:36, 10 June 2014 (UTC)

I'm squeezing this in here, because I want to answer some specific questions for User:Sphilbrick. Yes, SCV is the bot-generated arm of copyright cleanup. When it works, the bot scans all new articles and finds text matches on the internet, excluding most known mirrors. This is our best opportunity to catch copyright problems early, and it's probably an easier page to work on than WP:CP because the matter is usually pretty clear-cut. There's always some "Is this paraphrase too close?" going on, and sometimes we find out that the problem is really unattributed copying from another article that was mirrored somewhere, and there are false positives. It's included on WP:CP just to make sure that the listings aren't completely ignored, with a secondary benefit that sometimes after a day is completed, a page is recreated. I have more than once found that a copyvio G12ed has been replaced and the bot didn't catch it the second time. When the system is functioning as it should, this works really well - by the time an admin or copyright clerk gets to SCV (ideally), the masses of cheerful copyright checkers have gone through the day's listings and either closed everything or left only the difficult ones, which take more time. And, yeah, there are no obvious solutions here to me. --Moonriddengirl ^(talk) 13:08, 12 June 2014 (UTC)

I didn't get the group ping, and was anyway away from home. It goes without saying that I'd welcome any change that might get more people involved, so I support Moonriddengirl's suggestion. My only concern is that the same people who currently remove the bot tag from articles without fixing the problem will blithely continue to do so. Questions:

1. Would it be possible to limit removal of the tag to, say, autopatrolled users?
2. Would it be possible to make pages that have been tagged by the bot show up in a different colour (orange?) at Special:NewPages?
3. Should the wording of the header of that page be changed to read something like "Please be sure to check all new pages for possible copyright violations ..."?
4. Could an additional filter (e.g., "tagged as a potential copyvio") be added to Special:NewPagesFeed?
5. Per MER-C above, whose sentiments I wholly share, is there any point at all in seeking a consensus that Plot, Episode summary and Synopsis sections should, like all other WP content, be supported by independent reliable sources? That would mean, at the very least, a change to WP:FICTIONPLOT.

Apologies for random and probably impracticable thoughts, Justlettersandnumbers (talk) 10:26, 12 June 2014 (UTC)

Nobody got the group ping; it just didn't work. Either I did something wrong, or...it just doesn't work. :D There's a lot of technical stuff in here that I don't know. So I'm going to ping a colleague of mine who knows almost everything to see if she knows about potential filters and colors and whatnot. WhatamIdoing, no pressure - I did say you know "almost" everything. :) Do you know if these kinds of things are possible? Easy? Historically never granted? --Moonriddengirl ^(talk) 12:28, 12 June 2014 (UTC)

Quick answers:

Would it be possible to limit removal of the tag to, say, autopatrolled users?
Yes. You'd use the same logic that handles removal of CSDs by article creators. In fact, using exactly that logic (restricting only the page creator, as the most likely source of a new suspected copyvio) might be more appropriate than restricting it to only a tiny fraction of editors.
Would it be possible to make pages that have been tagged by the bot show up in a different colour (orange?) at Special:NewPages?
Yes, it's possible, but it would require dev support, and it feels like the kind of thing that might be rejected for performance issues. I doubt that they would do it anyway, since I believe that they'd rather that you were using Special:NewPagesFeed, instead of trying to further develop the older one. On the other hand, all right-thinking devs love Maggie and hate copyvios, so if you really believe this would be helpful, then the first step is probably to find a group of people who say that they personally would change their editing (e.g., to do more copyvio removal) in response to this change, and get them to decide exactly what they want.
Should the wording of the header of that page be changed to read something like "Please be sure to check all new pages for possible copyright violations ..."?
This is easily done, if people want to do it. Any admin can change MediaWiki:Newpages-summary.
Could an additional filter (e.g., "tagged as a potential copyvio") be added to Special:NewPagesFeed?
Yes. Steven Walling will know how to do this.
Is there any point at all in seeking a consensus that Plot, Episode summary and Synopsis sections should, like all other WP content, be supported by independent reliable sources?
IMO, and speaking as an editor who has followed WP:V, RS, and NOR for years, this discussion would be a waste of your time. WhatamIdoing (talk) 16:38, 12 June 2014 (UTC)

Requiring sourcing for plot summaries does not discourage copyvios -- in fact the editor who I CCIed and blocked yesterday had sourced plot summaries. It's also the wrong way to write a plot summary. You need to watch the show first. Plot summaries in reliable reviews sometimes regurgitate the show's marketing and never contain spoilers. MER-C 01:30, 16 June 2014 (UTC)

Thanks to all for replies. It looks as if 1, 3 and 4 might be fairly straightforward to implement (5 was always totally unrealistic, of course). Is there any consensus that any of those three is remotely desirable, or likely to be of any use at all (as a complement, not an alternative, to Moonriddengirl's suggestion)? Also, ping Steven Walling in case he has any comment. Justlettersandnumbers (talk) 09:29, 16 June 2014 (UTC)

Huge copyright issue regarding dramas

I just did a brief check and found that almost every single Taiwanese drama created has copyright issues (2000-2010) (2011-present). Many of the Korean dramas (1997 to 2014) and Japanese dramas have copyright issues as well. SmileBlueJay97 talk 05:44, 23 August 2014 (UTC)

Dealing with CWW?

What, if anything, should we do when there is obvious Copying within Wikipedia but it's not clear what the source article is, in those all-too-frequent cases when the contributor does not respond to requests to provide attribution (a single random example is Karosa LC 757)? Day after day here gets hung up on these after all the more serious problems have been dealt with. I've tagged some of these articles with {{Copying within Wikipedia}}, but I happened to look at the documentation and find that it is "highly unrecommended". Why is that, I wonder? Slightly more drastic would be to redirect the page with a note to say "please restore only with proper attribution"; might that be an acceptable approach? Pinging Moonriddengirl, Wizardman, MER-C, Sphilbrick, Crow – who did I forget? – in case any of them has any comment. Justlettersandnumbers (talk) 11:11, 11 February 2015 (UTC)

If the authors aren't properly attributed, then Wikipedia's use of the material is a copyright violation. If the material obviously comes from a Wikipedia article, but it isn't possible to tell which one, then there is no way to provide proper attribution. In that case, the page needs to be deleted, see WP:CSD#G12. --Stefan2 (talk) 14:33, 11 February 2015 (UTC)

@Justlettersandnumbers: In your specific case, I've run into this author in similar circumstances. So far, he seems to be the only substantial author of the various bus articles in question, so nominally attribution is not required per WP:NOATT. To the more general case, I've got a similar one I'm working with where the author doesn't respond to queries. In that case, I did a specific enwiki-only Google search for the infringing content, and spent an hour or two identifying the oldest appearance of that text likely to have been the source the author used. That's not particularly definitive that this is where the author took it from, but at least ensures the original writer of those words is identified. (I then have a related problem where he continues to create articles without attribution; I may need help with that, but not to derail your topic.) Crow^Caw 22:22, 11 February 2015 (UTC)
- If he gets some of his articles deleted per WP:CSD#G12 due to missing attribution, he might be more willing to co-operate in the future. --Stefan2 (talk) 22:27, 11 February 2015 (UTC)

User:Crow, I will block editors who CWW repeatedly without attribution, just as I will block editors who copy content from other websites repeatedly without attribution. I will generally put more time in the former case into attempting to educate, but in the end it comes down to one thing: people who will not stop violating copyright policy need to be stopped. --Moonriddengirl ^(talk) 12:43, 15 February 2015 (UTC)

Thanks MRG. I'll give you the details later, but for now, I'm afraid my side-issue has derailed JLAN's original question, if you could opine on that so he doesn't kill me. ;) Crow^Caw 22:35, 15 February 2015 (UTC)

Justlettersandnumbers, if you can't figure it out, you can always blank and list at WP:CP with a note of explanation on the talk page. The content can be retained if attributed, of course, but until then - it's a copyvio. --Moonriddengirl ^(talk) 22:40, 15 February 2015 (UTC)

(edit conflict) Crow, there's no danger of that (something about golden eggs must be relevant here ...). I am, as often, blown away by the extent of your dedication. I simply don't have the patience or the inclination to spend that hour or two tracking down something that a collaborative editor could tell me in as many seconds; and while I agree in principle with Stefan2, I fear that G12 nominations would in many cases simply be declined by admins. So Moonriddengirl's firm position is music to my ears; if desperate, I may draw the attention of some editors to it. Meanwhile I've been toying with a talk page template for those uncooperative editors, based on {{Uw-copying}}, but adapted to cases where the source page is not known. So far I've got this:

==Copying within Wikipedia==
Thank you for your contributions to Wikipedia. It appears that you copied or moved text from one or more pages into DESTINATION PAGE. While you are welcome to re-use the content of Wikipedia, here or elsewhere, our licensing does require that you provide attribution to the original contributor(s). When copying within Wikipedia, this is supplied at minimum in an edit summary at the page into which you've copied content. It is good practice, especially if copying is extensive, to also place a properly formatted {{copied}} template on the talk pages of the source and destination. Please either provide attribution in this case, or identify the source article so that someone else can do it. You can read more about the procedure and the reasons at Wikipedia:Copying within Wikipedia. Thank you. ~~~~

Now I'm wondering about adding something like "Editors who repeatedly copy content without attribution risk being blocked from editing". Thoughts? Justlettersandnumbers (talk) 23:12, 15 February 2015 (UTC)

I think that is probably good, and not too threatening in tone. Crow^Caw 23:35, 28 February 2015 (UTC)

Expand bot remit to non-article namespaces?

Pinging @Coren:, @Crow:, @Hut 8.5:, @Moonriddengirl:, @MER-C:, @Diannaa: as potentially interested parties. User:CorenSearchBot currently only works for the article and draft namespaces, but I rather frequently see copyright-violating text being posted on user pages (chiefly in user sandboxes); would an expansion of the bot's namespace scope make sense? Jo-Jo Eumerus (talk, contributions) 15:02, 20 January 2016 (UTC)

Not opposed to the idea but there are two things we would have to consider:

How common is it for people to copy and paste articles into user sandboxes to work on, and is it significantly more common than doing the same thing in draft space? Such copy and pastes are likely to show up as potential copyvios of Wikipedia mirrors.
Workload. We're struggling to keep on top of the bot's output at the moment, and this will significantly increase the number of reports to check. The expansion to draft space resulted in a lot more pages being reported.

Hut 8.5 18:16, 20 January 2016 (UTC)

Can't the bot simply be set up to whitelist Wikipedia mirrors? --Stefan2 (talk) 18:22, 20 January 2016 (UTC)

Yes, there is a whitelist here. However, one needs to add mirrors to it, which is not a zero workload process. Jo-Jo Eumerus (talk, contributions) 19:02, 20 January 2016 (UTC)

I was in the middle of an email to MRG which started off with an apology for limited copyright work (abandoned because I found the answer), and now I see this. I support the expansion, though I understand Hut's concern about workload. One caveat – I think it should be to user subpages, not including the main user page or talk pages, which may legitimately have copies of material in articles. (I think that was the intention, but want to make sure it isn't to the entire user subspace.)

My rationale: yes, I am aware that many editors, especially new editors, copy and paste copyrighted material, then make changes. I think we all agree that this is not a supported practice. One argument for extending the bot is that it might help us to nip this bad practice in the bud. A second argument is that an editor who starts with copyrighted material, then does a lot of work to create an article, but still has a close paraphrase issue, may be very discouraged to lose an article after hours of work. Catching it earlier is better. Third, even if the editor completely cures the problem, the copyrighted material may still be in the history, which is a problem and might not now be detected.

One other concern. My own personal practice, when creating an article about a basketball player, is to copy an existing article, not so much to use any of the words, but to remind myself of the upper structure and needed sections. I grant that isn't ideal, and I tend to do the first steps off-wiki, but if others do it onwiki, on the assumption that they will cure any copyright problems before moving to mainspace, we may find we have to educate people about best practices. Could this be done on a test basis, to see if it creates other issues? We don't need to be overwhelmed with false positives.--S Philbrick (Talk) 18:33, 20 January 2016 (UTC)

In general I think it would be a good idea, if practicable. Random thoughts about what's been said above:

- Coren's bot does generally whitelist the wikipedia.org domain, but the speed at which we are now mirrored will make it hard for the bot to determine that. Perhaps some advanced logic to check internet hits against live Wikipedia articles, but I don't want to commit Coren's time.
- The problem with adding mirrors to the whitelist is that some of them aim to be single repositories of knowledge, so they pull content from many places other than WP. So we can't whitelist them, as not all of the content is free. I do try to get the obvious ones as they pop up.
- Another concern is that the bot searches are not free. WMF pays the major engines to search at the rate we do, so expanding that scope will result in a real world charge increase.
- As far as the ever growing backlog (which I'm partly responsible for due to holidays, travel, and other things keeping me away), I would rather have a backlog than a copyvio slip through. At least that way we know that a page matched the listed page at the time of the bot report, so don't have to dig deeply.
- Using another article as a template is surprisingly common. If they would only properly attribute the copy it would greatly speed up the resolution of the SCV entry. Perhaps a way to publicize this? Maybe adding it to the "softer" bot warning: Did you copy this content from another Wikipedia article? If so, that's fine, just follow the steps at WP:CWW or something?
- Make userspace NOINDEX by default (with easy magic-word removal of that, for those who do want indexing)? I'm not sure why that is not currently the case. It would not only limit the exposure of copyvios until we can address them, but also remove much of the motivation for promotional user pages.

Just my thoughts from time in the trenches. Volunteers welcome! Crow^Caw 22:58, 20 January 2016 (UTC)
If memory serves, userspace is already NOINDEXed by default; there was discussion in the Village Pump about this. Jo-Jo Eumerus (talk, contributions) 23:10, 20 January 2016 (UTC)

Indeed it is! Done last July it seems. I guess I'm remembering from my 2014 UP patrolling days. Ok, strike my last point then! Crow^Caw 23:14, 20 January 2016 (UTC)

CorenSearchBot down

Tracked in Phabricator
Task T125459

Due to the discontinuation of the Yahoo BOSS search API, CorenSearchBot has been down since April 2nd. The WMF Community Tech team is currently working with Coren to find and implement a replacement API. Ryan Kaldari (WMF) (talk) 22:31, 12 April 2016 (UTC)

Thank you, Ryan Kaldari (WMF), Coren! Although the break gives us some small chance of dealing with the backlog, the potential damage to the project far outweighs that benefit. It's good to know you are working on it. Thanks, Justlettersandnumbers (talk) 08:24, 13 April 2016 (UTC)

New tool to identify copyright violations

Editors interested in identifying and resolving copyright violations may enjoy CopyPatrol, a new tool from Community Tech, powered by User:EranBot. The tool is essentially a web interface for User:EranBot/Copyright/rc, but works in much the same way CorenSearchBot worked, by combing recent changes and identifying possible copyright violations. Log in with your Wikimedia account to get started. Use "Page fixed" after removing the copyright violation, or "No action needed" if there is no violation or it has already been resolved. You can provide feedback here. Hope you all find this useful! Best, MusikAnimal (WMF) (talk) 16:53, 6 July 2016 (UTC)

Marked historical

The MFD has concluded with marking this historical, but there are some links to clean up and the page to edit. Jo-Jo Eumerus (talk, contributions) 16:41, 17 November 2017 (UTC)

I am minded that all mentions should point to Wikipedia:Copyright problems now. Jo-Jo Eumerus (talk, contributions) 16:45, 17 November 2017 (UTC)
