Wikipedia:Wikipedia Signpost/2020-11-01/Op-Ed
Anti-vandalism with masked IPs: the steps forward
- Johan Jönsson works for the WMF with the technical development of the Wikimedia wikis and Community Relations.
The Wikimedia wikis can be edited by registered and unregistered users alike. When someone isn’t logged in to an account, instead of their user name, the history – and the recent changes feed, your watchlist and so on – will show their IP address. This is mainly for attribution: when you write on the Wikimedia wikis, the copyright still belongs to you. You just give permission for the text to be spread and changed. So we need to attribute authorship to someone: a name, a pseudonym or at least an IP address. But knowing the IP behind an edit is also a tool we use to fight the edits we don’t want to see: vandalism and harassment, spam, and those that push a specific point of view at the cost of neutrality.
Roughly a year ago, a team within the Wikimedia Foundation’s Product department started a process on IP masking – hiding the IP addresses we today show in public. Our goal was roughly to try to address all the problems we knew it was going to bring, and hopefully be able to do it with no more work for vandal fighters than before we started. Recently the Wikimedia Foundation’s Legal department clarified their guidance: for legal reasons – which they can’t explain in detail due to legal privilege, the legal professional rules that control what lawyers can say about their work – this is something we have to do. We’re flexible on the how and the when, but not on the if. Thus that’s the reality we must deal with and the situation we are publicizing to the communities, as soon as we can.
There are other reasons for bringing up the subject, of course. The longer I work on the project, the stranger I personally find it that we publicly publish IPs – which I used to find completely natural, not least since I mainly contributed without being logged in for years in the earlier days of Wikipedia – of people who are trying to help make the wiki better. As a movement, we’ve had occasional debates on whether publishing the IPs really is what we should be doing for about as long as we’ve been doing it. But these are reasons for starting a conversation. Our legal experts telling us that this is something that has to be done is reason to do it.
I think one main communications issue is that we’ve tried to let the Wikimedia contributors in as early as possible and it’s not apparent to everyone where we are in the process. OK, we say, so we have to do this: Please let us know your fears and issues and everything you want us to take into account. This is something we need to solve with the wikis and vandal fighters, so that we can mitigate as much as possible. We try to ask questions as early as possible instead of doing internal planning based on our assumptions. The Wikimedia wikis have very different cultures and needs. They don’t see the same patterns around problems like undisclosed paid editing, harassment and returning vandals. The fact that I’m intimately familiar with this work on one wiki doesn’t mean there aren’t many things we need to learn from the communities, and no single wiki is a good model for all. What works for you or me will not work everywhere else.
We try to take the conversation that normally happens in Phabricator – open, but not easily accessible for most Wikimedians – and put it on the wiki. This means that we’re a couple of steps earlier in the process than people expect us to be. Some see that we plan to mask IPs, try to figure out how this is going to work and come away with the impression: “Oh no, they have no idea what they’re doing. They have no plan.” We do have a plan. It’s just that collecting information from the communities before we plan solutions is part of it. There’s time to work this out together. We’re not throwing the switch next week. Whether we know what we’re doing remains to be seen, of course, and I’m not the one to judge.
How do we plan to mitigate problems? Partly by giving more people access to the information that we’ll now be hiding from the public. We’ve been toying with the idea of a system with three tiers. First, we’d either build a new user right or maybe even just make access to the information opt-in, as long as the user meets certain criteria. The threshold for access to this first user right would be lower than adminship on many wikis, since access still needs to be provided to admins on Wikimedia wikis with less stringent criteria, such as five or so users saying “sure, why not, this new person seems serious and sincere”. Second, others could have access to part of the IP, to be able to see which range it belongs to. Third, the public and those with no interest in the tasks where this information is relevant would see a masked IP. Those who are involved in cross-wiki vandal fighting would need global access. We don’t intend to break the system by putting this on the checkusers and stewards. The details need to be hashed out with the communities.
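To make the three tiers a little more concrete, here is a minimal Python sketch of how one unregistered edit might be shown to different viewers. Everything specific in it is an assumption made for illustration: the tier names, the prefix lengths used for the partial view (/24 for IPv4, /64 for IPv6) and the masked label are hypothetical, and none of them reflects a decided design.

```python
import ipaddress

# Hypothetical tier names; the text only describes three levels of access.
FULL, RANGE_ONLY, PUBLIC = "full", "range", "public"


def display_ip(ip: str, tier: str, masked_label: str) -> str:
    """Return what a viewer in the given tier would see for one edit.

    masked_label is a hypothetical stand-in for whatever identifier
    would replace the IP address in public views.
    """
    addr = ipaddress.ip_address(ip)
    if tier == FULL:
        # First tier: the full address, for users holding the new right.
        return str(addr)
    if tier == RANGE_ONLY:
        # Second tier: only the range the address belongs to.
        # The prefix lengths are assumptions, not part of any plan.
        prefix = 24 if addr.version == 4 else 64
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        return str(network)
    # Third tier: everyone else sees only the masked label.
    return masked_label


if __name__ == "__main__":
    for tier in (FULL, RANGE_ONLY, PUBLIC):
        print(tier, display_ip("198.51.100.37", tier, "Masked-1"))
```

The addresses used here come from ranges reserved for documentation, so they do not point at any real editor.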
Partly we’re aiming to solve it by building new tools. We’re trying to make the checkusers’ and stewards’ lives easier by updating the checkuser tool and working on a tool to find potential undetected sockpuppets. We’re working on surfacing the information about what the IP address means in a way that’ll be accessible to more vandal fighters than used to be the case. We want to hear more needs and suggestions.
So we talk to people, in various places and languages, to figure out how it would affect them. It varies: a significant number of English Wikipedia vandal fighters have expressed concern on Meta, while Swedish Wikipedia editors haven't, even when explicitly asked. The Arabic Wikipedia discussion did not raise the same problems as the Chinese one.
Why do IP masking at all, some ask. Why not disable IP editing instead? We’re investing significant time and resources in trying to solve this because we’re convinced that turning off unregistered editing would severely harm the wikis. Benjamin Mako Hill has collected research on the subject. Another researcher told us that if we turn IP editing off, we’ll doom the wikis to a slow death: not because of the content added by the IP edits, but because of the increased threshold to start editing. We can’t do it without harming long-term recruitment. The role unregistered editing plays also varies a lot from wiki to wiki. Compare English and Japanese Wikipedia, for example. The latter wiki has a far higher percentage of IP edits, yet the revert rate for IP edits is a third of what it is on English Wikipedia: 9.5% compared to 27.4%, defined as reverted within 48 hours. And some smaller wikis might suffer greatly even in the shorter term.
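For readers who want to see what "reverted within 48 hours" means as a calculation, here is a small Python sketch. The input shape (a list of pairs holding the time an edit was saved and the time it was reverted, or None) and the function name are assumptions made for illustration; this is not how the figures above were actually produced.

```python
from datetime import datetime, timedelta

# The 48-hour window comes from the definition quoted in the text.
REVERT_WINDOW = timedelta(hours=48)


def revert_rate(edits):
    """Share of edits reverted within REVERT_WINDOW of being saved.

    edits is a list of (saved_at, reverted_at) pairs, where reverted_at
    is None for edits that were never reverted. Hypothetical data shape.
    """
    if not edits:
        return 0.0
    reverted = sum(
        1
        for saved_at, reverted_at in edits
        if reverted_at is not None and reverted_at - saved_at <= REVERT_WINDOW
    )
    return reverted / len(edits)


if __name__ == "__main__":
    # The two published rates: the ratio is roughly one third.
    print(round(9.5 / 27.4, 2))
```

An edit reverted after 72 hours would not count towards the rate under this definition, which is one reason figures from different studies can be hard to compare.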
And that’s the heart of the problem: There is no available strategy without risk. Legal risk. Risk of vandalism. Risk of hurting long-term editor recruitment. So we hope to be able to work together, listen to suggestions and problems, and build around potential obstacles and mitigate concerns. Give the communities the tools they need.
Discuss this story
How might this work with the current problem of IP leaking the identity of logged in users who are blocked to other users on the same IP? All the best: Rich Farmbrough 19:58, 1 November 2020 (UTC).
So where is the substantial improvement in anti-abuse tools you promised when you announced this unwanted project? Oh wait, you haven't deployed anything. MER-C 20:00, 1 November 2020 (UTC)
I normally support WMF decisions, but a lack of transparency on why this must take place, and insisting "not if, but how" it will take place, is reminiscent of the so heavily opposed renaming efforts, also forced upon the community as something that must happen in some form or another. ɱ (talk) 20:13, 1 November 2020 (UTC)
"we publicly publish IPs [...] of people"
- this framing, which seems to have been the main motivation for initiating the entire effort before the sudden recent discovery of the legal requirements, is questionable to say the least. It casts these "people" as helpless victims whose IP address is forcibly exposed by the decision of others. But "IP editors" are not an immutable protected group. They are just contributors like everyone else, who have made a different choice after hitting the edit button - namely to have their contribution attributed to their IP address rather than an (easily created) account.
Now, I agree that an editor's IP address can be very sensitive (I have long advocated this view myself, e.g. as a main author of the German Wikipedia's checkuser guidelines, which are more restrictive than those of many other projects out of such concerns). But the reality is that many editors rationally decide that this is not the case for them personally.
Also unacknowledged in the rhetoric about this project is that contributing under IP can often even be the more privacy-preserving choice: The information that can be derived from a dynamic IP is frequently much less revealing than what can be concluded from a logged-in user's aggregate edits (I compiled a few examples in this Wikimania talk a good while ago).
Regards, HaeB (talk) 20:16, 1 November 2020 (UTC)
"due to legal privilege"
Question: @Johan (WMF): I understand that there may be reasons to keep things private, but this is a very peculiar assertion. If this is a case of legal privilege, who are the parties? Surely the WMF is the client? Mo Billings (talk) 23:58, 1 November 2020 (UTC)
Question: I also am not reassured that a magical tool will be sufficient to track long-term abuse. Will that 'wand' allow us to distinguish the following known pattern of disparate IP usage? 'Griefer451' has access to computers at home, at work and sometimes at the library. They have a 'fairly' distinct style allied with a grievous resentment towards WP, resulting in both numerous defacements at intervals together with an impression that this 'editor' is somehow familiar even though a number of IPs are used at dissimilar periods of day and also migrating over weeks. How are we ever to shut down this vandal? If we can't notice that the IPs related by vandalism are clustered? That even when (home) IPs change they are actually from the same pool? This is not theoretical, but actual long-term patterns.
Further, how are we to ever notice school kiddy vandalism? Will there be a magic flag added to the tokenized identity that says this is a middle school educational pool so we can apply the dunce cap?
The legal team say they have determined an unassailable legal stance for WP? Have they determined whether it is workable? I would challenge the WMF thusly. Have every member of the legal team spend one or two hours a day following IP edits around WP, fixing the obvious vandalisms and reverting the graffitos, for at least a month. Oh, and track back in time _all_ the edits those IPs have left lying around for months. First, the lawyers will *love* the billables. Second, WMF will gain a new respect for the amount of time that IP inadvertencies soaks up, while rueing the cost of reality-based research. I feel that legal opinions are not information sufficient to proceed, but must be reconciled with our day-to-day realities. Moreover, I feel, anyone not having spent hours and hours fixing IP vandalism is not qualified to appreciate the difficulties already existing. Don't make it impossible. Shenme (talk) 04:49, 2 November 2020 (UTC)
EU Privacy Law
I used to work in the Data Protection area in the EU, so I have a suspicion that I know why this is necessary, and why the WMF might not want to concede that IP data is personal information until they are in a position to stop displaying it. However I'm curious as to what we are going to do with the hundreds of millions of edits that are currently linked to an IP address. Leave them untouched? If you stop displaying the IP address how do you expect people to comply with the attribution part of CC-BY-SA? To me it has long seemed a bit of a nonsense that we require attribution of IP addresses, better in my view to have edits by logged in users as CC-BY-SA and in future to have some legalese to the effect that if you choose not to use an account the SA bit of CC-BY-SA does not apply to you as you have not given a name for reusers to attribute your edits to. The recruitment of new editors is a really important point, but there is an alternative. Currently we are over dependent on the desktop view as the mobile view recruits very few readers to become editors. Making the mobile view more editor friendly for smartphone users is probably too big a software task for the WMF. But if we launched a tablet view, an intermediate in editor friendliness between mobile and desktop, and maybe upgraded everyone on their first edit from Vector to Monobook, we might have sufficient new editors that we could afford to lose IP editing. ϢereSpielChequers 09:55, 2 November 2020 (UTC)
User contributions
I ask this as an editor without much technical understanding of the "masking" process being proposed here: will editors still be able to see the user contributions of IP editors? Help:User contributions points out that "Other users' user contribution pages can also be accessed and are useful for seeing how other users have contributed. They can be used to track down vandalism, serial copyright violations, etc." I routinely use IP editors' user contributions pages to find and revert all of the vandalism a vandal has posted after stumbling across one instance of it in my watchlist. Will this still be possible with the IPs "masked"? If not, it will make spotting and quickly fixing the work of vandalism-only IP editors much more difficult for me. -Bryan Rutherford (talk) 04:26, 3 November 2020 (UTC)
WMF legal & the Community
EU-US Privacy Shield invalidation
Those interested in details about this requirement might want to review the July ruling from the European Court of Justice finding that the EU-US Privacy Shield framework failed to protect Europeans' rights to data privacy.[1] 107.242.121.56 (talk) 21:30, 3 November 2020 (UTC)
About time
I have raised this issue a number of times. I think Wikipedia is today the only website which openly displays users' IP address, which can reveal data about them, and make them potentially vulnerable to hackers. We know that revealing such data can be highly inappropriate, which is why we allow oversighting of edits by unlogged in users. But by default the WMF is revealing information about users without adequately warning them of the consequences. It should be a priority matter to automatically hide people's IP address, and not because the WMF can get sued but because it can put people in harm's way, and nobody should be put in harm's way because of editing Wikipedia, even if they are vandals. The WMF could automatically assign a unique username to each IP address, making it clear this is an unregistered account, but identifying it so it can be monitored, and still allowing checkusers to look at the IP address if appropriate. It should do this for each new IP user, but also convert all existing IP edits into unique usernames, providing functionaries with all the data of the changed IP names. The information the legal team probably wants to conceal is detail on the ways that an IP address can be vulnerable (and thus the rationale for why they want to do this), and it is right that such information is concealed, and that we shouldn't be speculating here on those vulnerabilities. SilkTork (talk) 12:06, 6 November 2020 (UTC)
Yes, it seems quite extraordinary that we allow unregistered and anonymous people to join in without logging in from any fly-by-night Internet Cafe or temporary SIM card phone and have a go at doing whatever they fancy for good or ill with basically no accountability at all. A site "that anybody can edit" should simply mean "that anybody can freely register for" (in a couple of minutes), basta - every other website in the world works that way, and it doesn't seem to stop many of them getting huge numbers of customers. As for masking, well, it seems utterly extraordinary that the legal eagles can't tell us what law we're supposed to be complying with - why the hell not, it's a basic right to know how we're being governed. Masking is an utterly ludicrous solution, both because the IPs should be logging in, and because (as others have said above) it will make the tracking-down of vandalism worse - how are we going to warn somebody when we have no way at all of knowing if they did it before, it makes no sense: doubly ridiculous. Get them to log in and all the technical faffing-about and complexity is sidestepped. Should have been done years ago. Chiswick Chap (talk) 20:42, 14 November 2020 (UTC)
A different IP problem
Since we have IP-related dev attention here, I want to raise a side topic: It's intensely frustrating (as well as a security problem) that our systems are presently blanket-blocking (often on a WMF-global basis) all sorts of IP addresses that WMF assumes are "web host providers or colocation providers", without regard to the obvious facts that a) IP addresses and the servers behind them often serve multiple purposes; b) once a user is logged into an actual account, what IP address they are coming from and what other services are provided by the owner of that IP address are irrelevant; and c) the endpoints of most VPNs anyone would bother subscribing to are very likely to be "web host providers or colocation providers" as most of their bread-and-butter, or they would not have the bandwidth to be useful VPN endpoints in the first place.
You do not have permission to edit this page, for the following reason:
You are currently unable to edit Wikipedia.
You are still able to view pages, but you are not currently able to edit, move, or create them.
Editing from 123.456.789.0/22 has been blocked (disabled) by AdminUserName for the following reason(s):
The IP address that you are currently using has been blocked because it is believed to be a web host provider or colocation provider. To prevent abuse, web hosts and colocation providers may be blocked from editing Wikipedia.
You will not be able to edit Wikipedia using a web host or colocation provider.
Since the web host acts like a proxy or VPN, because it hides your IP address, it has been blocked. To prevent abuse, these IPs may be blocked from editing Wikipedia. If you do not have any other way to edit Wikipedia, you will need to request an IP block exemption.
If you do not believe you are using a web host, you may appeal this block by adding the following text on your talk page: {{unblock|reason=Caught by a colocation web host block but this host or IP is not a web host. My IP address is _______. Place any further information here. ~~~~}}. You must fill in the blank with your IP address for this block to be investigated. Your IP address can be determined using whatismyip.com. Alternatively, if you wish to keep your IP address private you can use the unblock ticket request system. If you are using a Wikipedia account, you will need to request an IP block exemption by either using the unblock template or by submitting an appeal using the unblock ticket request system.
Administrators: The IP block exemption user right should only be applied to allow users to edit using web host in exceptional circumstances, and they should usually be directed to the functionaries team via email. If you intend to give the IPBE user right, a CheckUser needs to take a look at the account. This can be requested most easily at SPI Quick Checkuser Requests. Unblocking an IP or IP range with this template is highly discouraged without at least contacting the blocking administrator.
Using ISP Rangefinder
This block has been set to expire: 14:28, 11 October 2022.
Even if blocked, you will usually still be able to edit your user talk page and email other editors and administrators.
Other useful links: Blocking policy · Username policy · Appealing blocks: policy and guide
If the block notice is unclear, or it does not appear to relate to your actions, please ask for assistance as described at Help:I have been blocked.
I sometimes have to bounce around between 10+ endpoints on my VPN provider's network before I find one from which I can edit, and this is just downright stupid. (And then it changes again a few days later so I can't use that one, meanwhile the unnecessary block on another expires and I can use it again. For a few days. Then I have to try to come in from Panama or Japan or Zimbabwe. Until next week, then maybe Liechtenstein or New Zealand. It's just random, brain-farty, wannabe-security nonsense.)
I've had requests to unblock a specific VPN IP address, for me as a logged-in user, declined simply because it's a technical hassle. It shouldn't be a hassle. It's only a hassle because of how things have been set up on the sysadmin side of things. And sometimes these requests are declined for even more daft reasons, like maybe I'm not really who I say I am, and why am I coming in from IP addresses all over the globe, is maybe my account compromised, or am I "really me" but a bad-actor after all, despite years of service? It's blatant circular reasoning: We're screwing with your ability to edit by carpet-bombing various IP addresses because someone vandalized through them once upon a time; then we're declaring you to be a possible vandal or sockpuppet or system cracker because this idiocy has forced you to try to use other IP addresses to get in. That's called "blaming the victim".
This has to stop. While I don't entirely disagree with the dev's announcement/op-ed thing above expressing concerns that just blockading all anon IP edits would do harm to the projects by erecting a barrier to entry that many potential editors would not climb (though pt.wikipedia is providing direct evidence against that prediction), it's more than just hypothetically harmful to use blunderbuss approaches to "security" (actually just anti-vandalism and anti-socking convenience) that thwart editors like me (with 15+ years of solid experience here, and advanced permissions), and actually reduce real security by convincing various legit, account-registered editors to stop trying to log in through VPNs. Given how many editors are now editing with mobile laptops, phones, and tablets, from locations they do not completely control and which are sometimes actively targeted by persons and organizations trying to eavesdrop on data, this is a real and growing security hole (especially for users with advanced permissions like TemplateEditor, Admin, etc), as well as a totally unnecessary pain in the butt.
Johan Jönsson, I doubt you have anything personally to do with this problem, but you appear to be in a development-insider position to amplify the squeaking of this wheel so that it actually gets some grease.
PS: This firehose approach to IP blocking doesn't even function as intended, anyway. It is often the case that I can edit from one of my VPN's IP addresses for anywhere from several minutes to an hour or longer without incident, only to eventually have it stop working, with that dunderheaded block notice popping up finally. There is a huge lag in the ability of the system that does the IP address analysis to even "get a bead" on what the IP address is and match it to a block list. That's a bit like having a car-door lock that only actually locks the door at some random interval, anywhere from a minute to several hours, after you press the lock button (and probably do it when the actual car owner is trying to get in, not when a thief is). It's certainly doing jack to prevent vandalism or socking, since by far the majority of such unconstructive behavior is going to happen quickly, not after 39 minutes or 2.6 hours of editing around as an anon at that same IP address. The entire approach is just flat-out broken.
— SMcCandlish ☏ ¢ 😼 16:32, 17 November 2020 (UTC)