How to review supplemental material without revealing your identity

The de facto test of whether an article is scientifically valid is whether the publisher put it through a peer review process. The reviewers are people who have proven their expertise in the field by publishing peer-reviewed articles in the area, and who are capable of assessing the scientific value of an article. Because those reviewers can be direct competitors who want to bog down the authors because they compete for funding, or colleagues of the "you scratch my back, I scratch yours" type, the editor doesn’t reveal who the reviewers are. In my opinion the reviewers should be made public, both to the authors and to the readers of the article, but I will leave that for another post.

However, keeping the reviewers anonymous is becoming more difficult these days because many scientific articles include supplemental material that can’t be attached directly to the article, e.g. a huge list of proteins identified in a proteomics experiment. When the authors host that material themselves, this poses a problem for editors and reviewers: by watching which IPs access the machine hosting the data, the original authors can guess who the reviewers are.
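To make the problem concrete, here is a minimal sketch of how little effort it takes to pull visitor IPs out of a web server log. Everything in it is an assumption for illustration: the log lines are made up in the common Apache-style access log layout, the IPs come from ranges reserved for documentation, and the /supplement/ path and the function name visitors_of are hypothetical.

```python
import re
from collections import defaultdict

# Made-up access-log lines (Apache-style layout). The IPs are from
# documentation-only ranges and the paths are hypothetical.
SAMPLE_LOG = """\
192.0.2.10 - - [01/Mar/2012:10:15:32 +0000] "GET /supplement/proteins.xls HTTP/1.1" 200 5120
198.51.100.7 - - [01/Mar/2012:11:02:10 +0000] "GET /index.html HTTP/1.1" 200 1024
192.0.2.10 - - [02/Mar/2012:09:00:01 +0000] "GET /supplement/spectra.zip HTTP/1.1" 200 904321
203.0.113.25 - - [03/Mar/2012:17:45:59 +0000] "GET /supplement/proteins.xls HTTP/1.1" 200 5120
"""

# Capture the client IP (first field) and the requested path.
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)')


def visitors_of(log_text, path_prefix):
    """Map each client IP to the paths it requested under path_prefix."""
    hits = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("path").startswith(path_prefix):
            hits[match.group("ip")].append(match.group("path"))
    return dict(hits)


# Only the IPs that touched the supplemental material are of interest.
suspects = visitors_of(SAMPLE_LOG, "/supplement/")
for ip, paths in sorted(suspects.items()):
    print(ip, len(paths), "request(s)")
```

While a manuscript is under review, few people other than the editor and the reviewers have any reason to request those files, so a short list of IPs like this one is often all an author would need to start guessing.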

I have heard this issue described as a really huge problem for the peer review process, and I have heard it from advocates of third-party hosts with sophisticated technologies that anonymize reviewers. But it turns out that accessing any site on the web anonymously is not as obscure or complicated as it sounds.

A quick way - but maybe not so reliable - of anonymizing your web traffic is googling for ‘browse anonymously.’ You will find many web proxies that claim to hide your identity. Usually you paste in the address of the web site you want to access anonymously and you are then taken to the web page as usual, albeit with much slower loading. The people hosting the server will see, at most, the IP address of the anonymous proxy, which probably won’t correspond to a place where any reviewers are located.

But I don’t recommend using just any proxy out there unless you really need to access something anonymously quickly and you have nothing else in place. After all, who knows what they do with your data, or whether your IP leaks somewhere. I would recommend using the Tor network instead. Tor is an anonymity network where volunteers spread all over the world provide their machines to act as anonymizing proxies. Oversimplifying, Tor encrypts your web traffic and relays it through several of these proxies on its way to the host, so that it becomes damn difficult to find out the original IP. When you use a browser that goes through the Tor network, the guys hosting the data will see different IPs from all over the world with no relation whatsoever to each other.

There are different ways to set up Tor, but if you want to use it without thinking too much, go to the Tor download page and get the Tor Browser Bundle. It comes with a Firefox browser that is already configured to access the Tor network. If the network policy in your institution or company is controlled by a fascist network administrator who blocks everything in the firewall regardless of the true danger to security, go to the settings and indicate that you are behind a firewall.
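If the supplemental files are too big or too many to click through in the browser, the same Tor connection can be used from a script. The following is only a sketch under stated assumptions: Tor Browser is running and exposing its local SOCKS proxy on 127.0.0.1:9150 (a standalone tor daemon defaults to 9050), the third-party requests library is installed with SOCKS support, and the helper names tor_proxies and fetch_via_tor are made up for this example.

```python
# Assumption: Tor Browser is running and listening for SOCKS connections
# on 127.0.0.1:9150. A standalone tor daemon uses port 9050 by default.
TOR_BROWSER_SOCKS_PORT = 9150


def tor_proxies(port=TOR_BROWSER_SOCKS_PORT):
    """Build a proxies mapping that sends HTTP and HTTPS through Tor."""
    # "socks5h" (with the trailing h) asks the proxy to resolve host names
    # as well, so the DNS lookup does not reveal which site you are visiting.
    proxy = f"socks5h://127.0.0.1:{port}"
    return {"http": proxy, "https": proxy}


def fetch_via_tor(url, port=TOR_BROWSER_SOCKS_PORT, timeout=60):
    """Download url through a locally running Tor and return the raw bytes."""
    # Third-party dependency, imported here so the helper above works
    # without it. Install with: pip install "requests[socks]"
    import requests

    response = requests.get(url, proxies=tor_proxies(port), timeout=timeout)
    response.raise_for_status()
    return response.content
```

A reviewer would call something like fetch_via_tor("http://example.org/supplement/proteins.xls"), where the address is a placeholder for whatever link the authors provided, and write the returned bytes to a file.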

If the link to the Tor Browser Bundle is given along with the supplemental material, the editor and the reviewers shouldn’t have any problem accessing self-hosted data. Downloading the Tor Browser Bundle shouldn’t be an obstacle: I haven’t seen anyone complain when asked to download a proprietary viewer to visualize closed formats for raw data, which is quite common in proteomics.

However, with this post I’m not advocating hosting your own data instead of sharing it. Hosting it yourself and sharing it are not mutually exclusive. I think sharing your scientific data is a moral imperative when you are funded with public money. I encourage sending scientific data to as many public repositories as possible, but I also think individual researchers have the right to host their own data if they want to.

I know it’s hard to believe, but many researchers who are funded with taxpayers’ money are reluctant to share their data, at least in the proteomics community where I work. If their data is stolen and other people find more interesting things that the original authors missed, those authors lose the relevance necessary to keep getting funded, especially when the analyzers don’t give enough credit to the generators of the data, which is quite frequent. But I will leave that for another post.