<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Danny Navarro's Blog]]></title>
  <link href="http://dannynavarro.net/atom.xml" rel="self"/>
  <link href="http://dannynavarro.net/"/>
  <updated>2012-03-03T12:55:55+01:00</updated>
  <id>http://dannynavarro.net/</id>
  <author>
    <name><![CDATA[Danny Navarro]]></name>
    <email><![CDATA[j@dannynavarro.net]]></email>
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Migrating to Octopress]]></title>
    <link href="http://dannynavarro.net/2011/09/20/migrating-to-octopress/"/>
    <updated>2011-09-20T19:07:00+02:00</updated>
    <id>http://dannynavarro.net/2011/09/20/migrating-to-octopress</id>
    <content type="html"><![CDATA[<p>I finally found some time to migrate my blog to <a href="http://octopress.org/">Octopress</a> from
<a href="http://wordpress.com/">Wordpress.com</a>. The critical reason to migrate from Wordpress has
been the support for nice code syntax highlighting, something I
couldn&#8217;t have on wordpress.com, at least for free. I know there are very
nice wordpress plugins for syntax highlighting but in order to use
them I would have to host it myself. I don&#8217;t want to go through the
hassle of maintaining a typical PHP/MySQL stack or to be worried about
being <a href="http://en.wikipedia.org/wiki/Slashdotted"><em>slashdotted</em></a>.</p>

<p>Having worked with an excellent documentation tool like <a href="http://sphinx.pocoo.org/">Sphinx</a>, I
started looking into static blog generators. It turned out that
<a href="http://blog.manuelviera.es/">Manu Viera</a>, a colleague working at <a href="http://www.yaco.es/">Yaco</a> with me, shared the
same itch and had already looked at several static web generators in
Python, which is our main language at <a href="http://www.yaco.es/">Yaco</a>. Manu found
<a href="https://github.com/ametaireau/pelican">pelican</a> the best candidate but I still found it a bit immature
compared with something like <a href="https://github.com/mojombo/jekyll">Jekyll</a>.</p>

<p>Then I found <a href="http://octopress.org/">Octopress</a>, a framework built on top of <a href="https://github.com/mojombo/jekyll">Jekyll</a>
with <a href="http://octopress.org/docs/plugins/">several plugins</a>, including syntax highlighting and automatic
support for <a href="http://disqus.com/">disqus</a> comments.</p>

<p>The migration from wordpress was not too painful. I used the default
Jekyll script to import wordpress posts and the disqus importer for the
comments. After some sed commands I got nicely formatted markdown
posts.</p>

<p>I had some trouble in the beginning configuring an isolated Ruby
runtime in <a href="http://archlinux.org/">Arch Linux</a> just for Octopress but after discovering
<a href="https://github.com/sstephenson/rbenv">rbenv</a>, everything went smoothly. (I prefer rbenv to RVM: with
rbenv I know at any moment what it&#8217;s doing.)</p>

<p>Deploying an Octopress generated site to <a href="http://pages.github.com/">github pages</a> is as
<a href="http://octopress.org/docs/deploying/index.html">easy</a> as pie.</p>

<p>Aside from nice Python syntax highlighting, I now have some extra
advantages I didn&#8217;t have with wordpress.com:</p>

<ul>
<li><p>Markdown syntax when writing my posts.</p></li>
<li><p>I can use the best text editor known to mankind: <a href="http://www.vim.org/">vim</a> :P</p></li>
<li><p>My blog data becomes more manageable. If at some point I don&#8217;t want
to host it on github, I could just push it somewhere else with no
modification.</p></li>
<li><p>I got a very nice default theme for free which, aside from looking
good, is also very easy to tweak and maintain.</p></li>
<li><p>Now I have a good excuse to learn Ruby outside of RoR influence.
Ruby is one of those languages I wish I were better at, even if
Python remains my main working language.</p></li>
</ul>


<p>In any case, I must say the service provided by wordpress.com has been
quite good but this is one of those cases where you have to say: <em>“Sorry,
it&#8217;s not you, it&#8217;s just me”.</em></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Using custom events in Pyramid]]></title>
    <link href="http://dannynavarro.net/2011/06/12/using-custom-events-in-pyramid/"/>
    <updated>2011-06-12T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2011/06/12/using-custom-events-in-pyramid</id>
    <content type="html"><![CDATA[<p><a href="http://docs.pylonsproject.org/docs/pyramid.html">Pyramid</a> is a <a href="http://webpython.codepoint.net/wsgi_tutorial">WSGI</a> application framework that primarily follows a <a href="http://docs.pylonsproject.org/projects/pyramid/1.0/narr/router.html">request-response mechanism</a>. However, if you need to work with events you can still <a href="http://docs.pylonsproject.org/projects/pyramid/1.0/narr/events.html">use them</a>. It comes with <a href="http://docs.pylonsproject.org/projects/pyramid/1.0/api/events.html#event-types">some default event types</a> that are emitted implicitly by Pyramid as long as you have a subscriber for them. For most applications the default event types are enough, but what if you want to write your custom event type and emit it explicitly from your code? It turns out that the <a href="http://docs.pylonsproject.org/projects/pyramid/1.0/glossary.html#term-application-registry">application registry</a> that Pyramid uses by default comes with a handy <a href="https://github.com/Pylons/pyramid/blob/master/pyramid/registry.py#L36"><em>notify</em> method</a>. Pyramid <a href="https://github.com/Pylons/pyramid/blob/master/pyramid/router.py#L77">uses this method internally</a> for its default events. Here is how you would take advantage of it:</p>

<pre>from pyramid.events import subscriber

class MyCustomEventType(object):
    def __init__(self, msg):
        self.msg = msg

@subscriber(MyCustomEventType)
def my_subscriber(event):
    print(event.msg)

def my_view(request):
    request.registry.notify(MyCustomEventType("Here it comes"))
    return {}
</pre>


<p>When running the application, every time a request goes through <em>my_view,</em> an event with a message is emitted, in this case, &#8220;Here it comes&#8221;. The subscriber then handles the event by printing the message, but it could do anything you want.</p>

<p>Notice that I&#8217;m using a <a href="http://docs.pylonsproject.org/projects/pyramid/1.0/narr/events.html#configuring-an-event-listener-using-a-decorator">decorator to hook</a> <em>my_subscriber</em>. In order for the decorator to work you have to make sure you <a href="http://docs.pylonsproject.org/projects/pyramid/1.0/narr/configuration.html#configuration-decorations-and-code-scanning">call the <em>scan</em> method when configuring</a> the application.</p>

<p>Be aware, though, that all these events are synchronous: because Pyramid is primarily a request-response framework, every emitted event blocks until its subscribers are done. If you want non-blocking events in Pyramid you could spawn a process from the subscriber or come up with <a href="http://blog.dannynavarro.net/2011/01/14/async-web-apps-with-pyramid/">some other solution</a>.</p>
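<p>As a rough sketch of the &#8220;spawn a process from the subscriber&#8221; idea, here is a standalone version that uses only the standard library. The names <em>slow_handler</em> and <em>done</em> are made up for illustration, and the <em>fork</em> start method is an assumption that holds on Linux and other POSIX systems. In a real application <em>my_subscriber</em> would still carry the <em>@subscriber</em> decorator shown above.</p>

<pre>import multiprocessing as mp

# Assumption: a POSIX system where the "fork" start method is available.
ctx = mp.get_context("fork")

# Hypothetical queue, only here so the example can observe the result.
done = ctx.Queue()

class MyCustomEventType(object):
    def __init__(self, msg):
        self.msg = msg

def slow_handler(msg, out):
    # Stand-in for whatever slow work the event should trigger.
    out.put("handled: " + msg)

def my_subscriber(event):
    # Hand the slow part to a child process and return right away,
    # so notify() does not block the request for the whole job.
    proc = ctx.Process(target=slow_handler, args=(event.msg, done))
    proc.start()
    return proc
</pre>

<p>The subscriber itself is still called synchronously; only the expensive part moves out of the request.</p>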

<p>But events are just one more piece of functionality that Pyramid offers. Pyramid is not an event-oriented framework; if you want to go all the way with async events you should look into <a href="http://twistedmatrix.com/trac/">Twisted</a> or <a href="http://www.tornadoweb.org/">Tornado</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Why Arch Linux]]></title>
    <link href="http://dannynavarro.net/2011/05/21/why-arch-linux/"/>
    <updated>2011-05-21T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2011/05/21/why-arch-linux</id>
    <content type="html"><![CDATA[<p>I have been using <a href="http://www.archlinux.org/">Arch Linux</a> for 3 years now. I still use Debian and Ubuntu for the servers I administer but I acknowledge Arch Linux has taught me many valuable lessons.</p>

<p>With Arch Linux there is very little in your system that you are not aware of. You have to configure everything yourself by editing config files. The process is not that hard because all those configuration files are meant to be tweaked. You can also count on an <a href="https://wiki.archlinux.org">excellent wiki</a> to help you.</p>

<p>The <a href="https://wiki.archlinux.org/index.php/The_Arch_Way">Arch Linux philosophy</a> doesn&#8217;t try to shield the user from complexity with extra layers. Instead it focuses on making direct configuration as simple as possible. For example, writing a proper boot script is much more straightforward than in other distros. At the same time, if you are not careful you have more chances to really screw everything up.</p>

<p>Arch Linux aggressively updates from upstream sources. This has the advantages and disadvantages of always being on the bleeding edge. I also like the idea of putting more responsibility for the stability of software on developers than on packagers, as long as you are aware of this as a user. As a user you have to assume the responsibility of being at the cutting edge. Things may not always go smoothly but you can count on excellent tools to manage chaos.</p>

<p>That brings me to the real killer feature that makes Arch Linux shine over the rest: the packaging system. <a href="https://wiki.archlinux.org/index.php/Pacman">Pacman</a>, <a href="https://wiki.archlinux.org/index.php/ABS">ABS</a>, <a href="https://wiki.archlinux.org/index.php/AUR">AUR</a>, <a href="https://wiki.archlinux.org/index.php/Makepkg">makepkg</a> and the <a href="https://wiki.archlinux.org/index.php/PKGBUILD">PKGBUILD</a> format are just great. You usually don&#8217;t have to mess with packaging that much: everything installs nicely and dependencies are correctly handled, especially if you stick to the official repository.</p>

<p>But if you don&#8217;t like something about a package or need another version you have all the tools in place for the creation and introspection of packages without disrupting pacman bookkeeping (pacman is the equivalent of dpkg/apt-get in Debian).  Let me illustrate all this with something I had to deal with this week.</p>

<p>I decide to use <a href="http://compass-style.org/blog/">Compass</a> to make my stormy relationship with CSS smoother. Compass is a Ruby gem, and the usual way to install gems is through Ruby&#8217;s packaging system, but I don&#8217;t want to mess with the Ruby libraries already installed in the system with pacman. If I install those gems as root, pacman will not be able to keep track of them; everything could break in the future and, most importantly, without an easy solution.</p>

<p><a href="http://rhodesmill.org/brandon/2011/adding-compass/">A way</a> to deal with this issue is to install the Compass gem in some directory and handle the runtime somehow. You usually end up with a new runtime environment for each project you start. There are excellent tools to manage runtimes in Ruby like <a href="http://rake.rubyforge.org/">Rake</a>, but boy, I already have enough <a href="http://www.doughellmann.com/projects/virtualenvwrapper/">managing</a> my Python <a href="http://pypi.python.org/pypi/virtualenv">virtualenvs</a>.</p>

<p>I see that Compass is already in AUR. <a href="http://aur.archlinux.org/">AUR</a> is a very liberal package repository where anyone can upload source packages. When you install from AUR you usually have to review the PKGBUILD and the comments of other users, and check how many users have voted for the package to be included in the official repositories. With tools like <a href="https://wiki.archlinux.org/index.php/Yaourt">yaourt</a> the whole process is very smooth.</p>

<p>Alright, the ruby-compass PKGBUILD looks good to me so I install it. Now compass is a good system citizen and can be updated, installed and uninstalled through pacman. Compass works as expected but it turns out that the most interesting feature I wanted to use is only available in the latest version of Compass, and the version in AUR is not the latest one.</p>

<p>No problem, it&#8217;ll probably be some version bumps and I&#8217;ll be done. I download the PKGBUILD, bump the versions and build the package again but then I realize that the new version depends on new Ruby gems that are not in AUR.</p>

<p>At this point I would avoid getting into a dependency hell and go for Rake, but wait, I&#8217;m using Arch Linux, let&#8217;s see what happens if I continue with the Arch flow.</p>

<p>I take the PKGBUILD of Compass as a template, which is generic enough for any Ruby gem, and use it for the Ruby dependencies. I update licences, versions and checksums, build them and done, everything works. They all come from <a href="http://rubyforge.org/">rubyforge</a> and follow the same building conventions, making my life easy as a packager.</p>

<p>I upload the PKGBUILDs to AUR with just one <a href="https://bbs.archlinux.org/viewtopic.php?id=97137">burp</a> command. Now I can install the latest version of compass through pacman without any issue. I then send my modified version of the PKGBUILD to the original Compass packager, who updates it. That&#8217;s it, now anyone can install the latest version of Compass with all its dependencies from AUR. I can now install Compass at home with just one command: <em>yaourt -Sy ruby-compass</em>.</p>

<p>Now I just have to keep an eye on new updates to the dependencies I&#8217;m now maintaining in AUR, but rubyforge offers an excellent notification system for gem updates.</p>

<p>That&#8217;s it. The whole thing took less than 30 minutes.</p>

<p>I don&#8217;t know if nowadays writing a DEB package spec is that hard, I acknowledge I never tried. The tutorials I found about them drove me away when I considered it some years ago.</p>

<p>It&#8217;s not only the packaging format itself, there are also the community and policy aspects. Editing your PKGBUILDs is something that every Arch Linux user does. For AUR there is very little regulation, which makes packaging a smoother process at the expense of shifting the trust in the packages to the user. In general, most packages in AUR are good enough but for production machines I still place more value on the trust I have in the Debian and Ubuntu package maintainers.</p>

<p>That&#8217;s where the open source community shines: you have many choices.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Commenting out in Chameleon templates]]></title>
    <link href="http://dannynavarro.net/2011/05/18/commenting-out-in-chamaleon-templates/"/>
    <updated>2011-05-18T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2011/05/18/commenting-out-in-chamaleon-templates</id>
    <content type="html"><![CDATA[<p>If you want to prevent <a href="http://chameleon.repoze.org/" title="Chameleon is an excellent HTML templating engine">Chameleon</a> from rendering some portions of an HTML template you might be tempted to do something like this:</p>

<pre>
&lt;!-- &lt;div&gt;${context.name}&lt;/div&gt; --&gt;
</pre>


<p>However Chameleon will still evaluate what&#8217;s inside the ${&#8230;} block even if it&#8217;s within an HTML comment. Chameleon must do this because you might want to insert conditional comments.</p>

<p>This dummy tal:condition block will do the job:</p>

<pre>
&lt;span tal:condition="None"&gt;
  &lt;div&gt;${context.name}&lt;/div&gt;
&lt;/span&gt;
</pre>


<p>Chameleon ignores anything inside the condition block.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Moving to Spain]]></title>
    <link href="http://dannynavarro.net/2011/05/15/moving-to-spain/"/>
    <updated>2011-05-15T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2011/05/15/moving-to-spain</id>
    <content type="html"><![CDATA[<p>After almost 3 years in The Netherlands working as a <a href="http://en.wikipedia.org/wiki/Proteomics">proteomics</a> informatician at <a href="http://bioms.chem.uu.nl/">Albert Heck&#8217;s lab</a>, I&#8217;m moving to <a href="http://en.wikipedia.org/wiki/Sevilla">Seville</a>, Spain, to work as a web developer for <a href="http://yaco.es/">Yaco Sistemas</a>, a fresh and dynamic open source friendly company.</p>

<p>This is an important shift in my career since I won&#8217;t be working on proteomics informatics and academic research anymore. I have mixed feelings about leaving proteomics. On one hand I like the area because there are plenty of tough challenges to be solved. But on the other hand I&#8217;m glad I can dedicate all my time to developing web applications that might not be as sophisticated as proteomics software but will be immediately useful for the <em>masses</em>. I love web development and the Python community but within proteomics I could only intersect with the Python web development community quite sporadically. Now I&#8217;ll have the chance to be part of it full time.</p>

<p>Personally, The Netherlands is the most comfortable and easy-going country I have ever lived in. Here I had the chance to work with very smart people and made friends that I will never forget. What I have learned during these years is priceless.</p>

<p>But I can&#8217;t deny my origins, Spain is where I feel at home even if sometimes I don&#8217;t find it too exciting because I&#8217;m too familiar with the culture. However Seville is quite far from <a title="Pamplona" href="http://en.wikipedia.org/wiki/Pamplona">my hometown</a>, in the North of Spain. The culture in the South is very different from the North, so in a way, I&#8217;ll be another foreigner excited about the peculiarities I discover about Andalusian culture.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Async Pyramid example done right]]></title>
    <link href="http://dannynavarro.net/2011/01/23/async-pyramid-example-done-right/"/>
    <updated>2011-01-23T00:00:00+01:00</updated>
    <id>http://dannynavarro.net/2011/01/23/async-pyramid-example-done-right</id>
    <content type="html"><![CDATA[<p>I spoke with <a href="http://mg.pov.lt/blog">Marius Gedminas</a> on freenode and he gave me enough hints to rewrite <a href="http://blog.dannynavarro.net/2011/01/14/async-web-apps-with-pyramid/">my previous async <i>view</i> example</a> with <i>locks</i> instead of <i>Value</i>, which is prone to race conditions. I also added a queue to allow jobs to wait to be processed.</p>


<pre>from multiprocessing import Process, Lock, Queue

job = 0
q = Queue(maxsize=3)
lock = Lock()

def work():
    import time; time.sleep(8)
    job = q.get()
    print("Job done: {0}".format(job))
    print("Queue size: {0}\n".format(q.qsize()))
    if not q.empty():
        work()
    else:
        lock.release()

def my_view(request):
    global job
    if not q.full():
        job += 1
        q.put(job)
        # Not running
        if lock.acquire(False):
            Process(target=work).start()
            print("Job {0} submitted and working on it".format(job))
        else:
            print("Job {0} submitted while working".format(job))
    else:
        print("Queue is full")
    print("Queue size: {0}\n".format(q.qsize()))
    return {'project':'asyncapp'}
</pre>


<p>With every request a job is sent. Here the queue accepts 3 jobs. The recursion in <i>work</i> makes sure there is only 1 process working at a time.</p>

<p>I will leave <a href="http://blog.dannynavarro.net/2011/01/14/async-web-apps-with-pyramid/">my previous example</a> with <i>Value</i> because it&#8217;s easier to understand but this version is much safer.</p>

<p><strong>Update:</strong> You can avoid the use of locks by <a href="http://blog.doughellmann.com/2009/04/pymotw-multiprocessing-part-2.html"> using 2 queues</a>.</p>
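<p>A minimal sketch of that two-queue idea, not taken from the linked post: one queue carries jobs to a single long-lived worker and the other carries results back, so no lock is needed. The names <i>run_jobs</i> and <i>STOP</i> and the doubling &#8220;work&#8221; are placeholders, and the <i>fork</i> start method assumes a POSIX system.</p>

<pre>import multiprocessing as mp

# Assumption: a POSIX system where the "fork" start method is available.
ctx = mp.get_context("fork")

STOP = None  # sentinel that tells the worker to exit

def work(jobs, results):
    # A single worker drains the job queue, so jobs never overlap.
    for job in iter(jobs.get, STOP):
        results.put((job, job * 2))  # placeholder for the real work

def run_jobs(numbers):
    jobs = ctx.Queue()
    results = ctx.Queue()
    worker = ctx.Process(target=work, args=(jobs, results))
    worker.start()
    for n in numbers:
        jobs.put(n)
    jobs.put(STOP)
    # Collect the results before join(), otherwise the worker can
    # hang while flushing a queue nobody is reading.
    done = [results.get() for _ in numbers]
    worker.join()
    return done
</pre>

<p>In a view, the web process would only ever <i>put</i> on the job queue and poll the result queue, while the worker stays alive between requests.</p>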
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Async web apps with Pyramid?]]></title>
    <link href="http://dannynavarro.net/2011/01/14/async-web-apps-with-pyramid/"/>
    <updated>2011-01-14T00:00:00+01:00</updated>
    <id>http://dannynavarro.net/2011/01/14/async-web-apps-with-pyramid</id>
    <content type="html"><![CDATA[<p>For the last few months I&#8217;ve been working on a kind of <a href="http://en.wikipedia.org/wiki/Content_management_system">CMS</a> for <a href="http://en.wikipedia.org/wiki/Proteomics">proteomics</a> results using <a href="http://docs.pylonsproject.org/projects/pyramid/dev/">pyramid</a> (in reality I started it with <a href="http://bfg.repoze.org/">repoze.bfg</a>, which became pyramid after joining the pylons project).</p>

<p>My experience with pyramid has been really smooth until I had to write a form to parse huge input files in order to build <em>experiment sites</em>.</p>

<p>In an average view for an <em>add form</em> I would return a response redirecting to the newly created <a title="models were renamed to resources after a long thread" href="http://www.mail-archive.com/pylons-devel@googlegroups.com/index.html#00910"><del>model</del> resource</a>. But in this case, the proteomics files can be quite large. I needed some way of having the view return a response while the files were being parsed. Here is a simplified representation of how I made it work:</p>

<pre><span id="LC1">from multiprocessing import Value, Process</span>

<span id="LC3">is_parsing = Value('B', 0)</span>

<span id="LC5">def parse():</span>
<span id="LC6">    print("Parsing started")</span>
<span id="LC7">    import time; time.sleep(10)</span>
<span id="LC8">    print("Parsing is done")</span>
<span id="LC9">    is_parsing.value = False</span>

<span id="LC11">def my_view(request):</span>
<span id="LC13">    if not is_parsing.value:</span>
<span id="LC14">        is_parsing.value = True</span>
<span id="LC15">        Process(target=parse).start()</span>
<span id="LC16">    else:</span>
<span id="LC17">        print("Still parsing...")</span>
<span id="LC18">    return {'project':'asyncapp',</span>
<span id="LC10">            'is_parsing': is_parsing.value}</span></pre>



<p>Here I&#8217;m launching another process to do the parsing while the web app returns the response without waiting for the parsing to be done. The <em>is_parsing</em> variable is shared between both processes (no need for a <em>global</em> statement). In my case I only want to run one parsing process at a time, so I didn&#8217;t have the need for locks or queues.</p>
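<p>One caveat with the snippet above: the check and the assignment in <em>my_view</em> are two separate steps, so two requests arriving together could both read 0 and both start a parser. A minimal sketch of one way to close that window, assuming the default <em>Value</em> (which carries its own lock, reachable through <em>get_lock()</em>); the helper names <em>try_start_parsing</em> and <em>finish_parsing</em> are made up for illustration:</p>

<pre>from multiprocessing import Value

is_parsing = Value('B', 0)

def try_start_parsing():
    # Test and set under the Value's own lock, so only one caller
    # can flip the flag from 0 to 1.
    with is_parsing.get_lock():
        if is_parsing.value:
            return False
        is_parsing.value = 1
        return True

def finish_parsing():
    # Meant to be called by the parser process when it is done.
    with is_parsing.get_lock():
        is_parsing.value = 0
</pre>

<p>The view would then start the process only when <em>try_start_parsing()</em> returns True.</p>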

<p>In the template of this view I can either offer a form to create <em>an experiment</em> or inform that there is already an experiment site being built. When building I can have the browser <a href="http://plugins.jquery.com/project/refresh">polling</a> to check if the parsing is done.</p>

<p>That&#8217;s enough in my case, I don&#8217;t need to scale to thousands of users parsing multiple files at the same time, but I was curious about how to deal with that problem if I had to. I played a bit with different ideas I was given in the always supportive #repoze channel at <a href="http://freenode.net/">freenode</a>.</p>

<p>First, instead of multiple processes I tried <strong>OS threads</strong>. I know about the <a href="http://blip.tv/file/2232410">infamous GIL</a> but wanted to see it with my own eyes. However I got some intimidating random errors from paster/pyramid that were enough to drive me off that path.</p>

<p>I also <a href="http://eventlet.net/doc/patching.html#monkeypatching-the-standard-library">monkeypatched</a> the standard library with <a href="http://eventlet.net/">eventlet</a> so that the OS threads would become <strong>green threads</strong>. The dummy example I show above seemed to run fine but when trying in my real application I ran into more cryptic thread errors from monkeypatched <a href="http://www.zodb.org/">ZODB</a>, which is what I use in my real app. I also tried <a href="http://www.gevent.org/">gevent</a> with similar results. <del datetime="2011-01-23T10:04:41+00:00">If you want to use eventlet or gevent you have to find another storage mechanism that works with green threads.</del> <strong>Update:</strong> I was monkeypatching incorrectly, <a href="http://braintrace.ru">Andrey Popp</a>&#8217;s <a href="http://blog.dannynavarro.net/2011/01/14/async-web-apps-with-pyramid/#comment-58">comment</a> explains how to do it.</p>

<p>Another potential source of problems when scaling is <strong>long polling</strong>, especially if you would like to add a nice responsive progress bar and a kind of log showing what is being done while parsing.</p>

<p><a href="http://en.wikipedia.org/wiki/WebSockets">WebSockets</a> are being regarded as the ultimate solution to deal with this kind of problem. First of all, let&#8217;s pretend websockets will be supported by all major browsers soon.</p>

<p>How can websockets be handled in pyramid? It turns out that dealing with websockets within <a href="http://www.python.org/dev/peps/pep-3333/">the WSGI protocol</a> is <a href="http://groups.google.com/group/paste-users/browse_thread/thread/2f3a5ba33b857c6c/2d63769fd9db6da3">messy</a>. However eventlet and gevent <a href="http://eventlet.net/doc/modules/websocket.html">have</a> <a href="http://www.gelens.org/code/gevent-websocket/">ways</a> to have websockets working within WSGI. Theoretically you could run a monkeypatched pyramid application behind <a href="http://gunicorn.org/">gunicorn</a> which can make the websockets <a href="https://github.com/benoitc/gunicorn/tree/master/examples/websocket">accessible</a> in the <em>request.environ</em> in pyramid. There are still some websocket protocol tasks (e.g. handshaking or closing the socket) which would make it hard to write something that looks like a normal pyramid view.</p>

<p>But it happens that <a href="https://github.com/boothead">Ben Ford</a> has already written a wrapper to take care of that problem: <a href="https://github.com/boothead/stargate">stargate</a> (as he says, <em>communication for pyramids</em>). With stargate, in your pyramid app you create a <em>websocket view</em> by subclassing from a base class that deals with the minutiae of the websocket protocol (up to v76, the latest version at the time of writing). In a websocket view, instead of returning anything from that view, you just write a handler to catch what is coming from the websocket. The great advantage of stargate is that you don&#8217;t need to run another process to deal with websockets, you can handle websockets from within pyramid. Additionally, stargate has 100% unit test coverage and some <a href="http://boothead.github.com/stargate/">documentation</a>.</p>

<p>While websockets look like the way forward I think it&#8217;s going to take some time for websockets to become mainstream in all browsers, especially after mozilla announced it won&#8217;t support websockets in the next release of firefox because of <a href="http://hacks.mozilla.org/2010/12/websockets-disabled-in-firefox-4/">security issues</a>.</p>

<p>However, with <a href="http://nodejs.org/">node.js</a>, it seems event driven web frameworks are finally becoming mainstream, bringing projects like <a href="http://socket.io/">Socket.IO</a>. Socket.IO provides an abstraction layer for writing event driven web applications: the developer writes the app the same way regardless of what the browser supports, be it websockets, long polling or Flash.</p>

<p>Although initially Socket.IO is meant to be used with node.js, there is something available for the server side in Python: <a href="https://github.com/SocketTornadIO/SocketTornad.IO">SocketTornad.IO</a>. It&#8217;s built on top of <a href="http://www.tornadoweb.org/">Tornado web framework</a>. In spite of Tornado having <a href="https://github.com/facebook/tornado/blob/master/tornado/wsgi.py#L194">some WSGI support</a>, I&#8217;m afraid it <a href="http://www.tornadoweb.org/documentation#wsgi-and-google-appengine">won&#8217;t be easy</a> to have the async features when in WSGI mode.</p>

<p>If I had to support many concurrent users in a highly responsive application right now, I would probably ditch pyramid and go directly with SocketTornad.IO. Perhaps I would still use pyramid for the non async part and have a front web server dispatching requests accordingly.</p>

<p>But it turns out that this is just a fun thought experiment, the multiprocess solution is fine for me because, like most web developers or bioinformaticians, for now I don&#8217;t need to write highly responsive applications for thousands of users.</p>

<p><strong>Update: </strong><a href="http://mg.pov.lt/blog">Marius Gedminas</a> pointed out a better way to do this with locks. I will leave the code snippet using Value because it is quite illustrative, but you shouldn&#8217;t use Value if you want to do something similar; instead check a better example I wrote in <a href="http://blog.dannynavarro.net/2011/01/23/async-pyramid-example-done-right/">another post</a>.</p>

<p><strong>Update: </strong> Check <a href="https://github.com/abourget/pyramid_socketio">pyramid_socketio</a> for a newer version of async apps.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The bioinformatics curse]]></title>
    <link href="http://dannynavarro.net/2011/01/11/the-bioinformatics-curse/"/>
    <updated>2011-01-11T00:00:00+01:00</updated>
    <id>http://dannynavarro.net/2011/01/11/the-bioinformatics-curse</id>
    <content type="html"><![CDATA[<p>As a bioinformatician you will be considered a programmer by biologists and a biologist by programmers. When talking with programmers you will suck at programming, when talking with biologists you will suck at biology. Biologists don&#8217;t want to know much about computing; understandably, they want to get their job done. Programmers might show some curiosity in biology but tend to shield themselves from the complexity of biology in order to get work done. As a bioinformatician you have to know enough biology to stay at the cutting edge so that what you research continues to be relevant, and keep improving your programming skills so you remain as productive as is expected of a programmer nowadays.</p>

<p>Some influential bioinformatics groups try to define the bioinformatics field as if it were precisely the research they are doing, frowning upon bioinformatics research not similar to theirs (or similar but superior to theirs). The followers of these groups try to imitate them so that they can some day become <em>experts</em> in the field. I also see other bioinformaticians gathering together just because they work in biology using a computer, regardless of how little overlap there is in the things they do. It&#8217;s like group therapy, sharing experiences with people marginalized for the same reason.</p>

<p>The bioinformatics field is still at its very beginning. The field is very broad and will eventually fragment into multiple <em>official</em> fields. Working in an emerging field can be very exciting because you don&#8217;t have the constraining rules of an established field. But if social recognition is important to you, think twice before getting into bioinformatics. You&#8217;d likely feel out of place wherever you go.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[ZSH prompt for virtualenv, git and bzr]]></title>
    <link href="http://dannynavarro.net/2010/10/16/zsh-prompt-for-virtualenv-git-and-bzr/"/>
    <updated>2010-10-16T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2010/10/16/zsh-prompt-for-virtualenv-git-and-bzr</id>
    <content type="html"><![CDATA[<p>Some of my colleagues are surprised I don&#8217;t use NetBeans or Eclipse or some other fat IDE. In fact, I believe the most <a href="http://mg.pov.lt/blog/unix-is-an-ide.html">powerful IDE is UNIX</a>.</p>

<p>I also acknowledge that fine-tuning all your UNIX tools to your exact requirements can be cumbersome. But you don&#8217;t need to customize everything from scratch. Open source is great because you can <em><a href="http://vimeo.com/4763707">steal</a></em> what other people are sharing and put it all together as you like.</p>

<p>Lately I have been reading <a href="http://briancarper.net/blog/570/git-info-in-your-zsh-prompt">some</a> <a href="http://tech.blog.aknin.name/2010/10/14/zsh-and-virtualenv/">great</a> <a href="http://stevelosh.com/blog/2010/02/my-extravagant-zsh-prompt/">posts</a> about customizing <a href="http://en.wikipedia.org/wiki/Zsh">ZSH</a>. I consider the shell prompt a fundamental part of the UNIX IDE. In the screenshot below I show how my customized ZSH prompt plays nicely with <a href="http://bazaar.canonical.com/">bzr</a>, <a href="http://git-scm.com/">git</a> and <a href="http://www.doughellmann.com/projects/virtualenvwrapper/">virtualenvwrapper</a>.</p>

<p><img class="alignnone size-full wp-image-109" title="zsh_demo" src="http://jdnavarro.files.wordpress.com/2010/10/zsh_demo1.png" alt="" width="536" height="754" /></p>

<p>I have barely done anything from scratch. I just stitched together configurations and tips from these sources:</p>

<ol>
    <li>Prompt decorations: <a href="http://aperiodic.net/phil/prompt/">http://aperiodic.net/phil/prompt/</a>, <a href="http://git.sysphere.org/dotfiles/tree/zshrc">http://git.sysphere.org/dotfiles/tree/zshrc</a></li>
    <li>Zenburn theme: <a href="http://git.sysphere.org/dotfiles/tree/Xdefaults">http://git.sysphere.org/dotfiles/tree/Xdefaults</a></li>
    <li>git prompt hack: <a href="http://briancarper.net/blog/570/git-info-in-your-zsh-prompt">http://briancarper.net/blog/570/git-info-in-your-zsh-prompt</a></li>
    <li>virtualenv prompt hack: <a href="http://www.doughellmann.com/docs/virtualenvwrapper/tips.html#zsh-prompt">http://www.doughellmann.com/docs/virtualenvwrapper/tips.html#zsh-prompt</a></li>
    <li>bzr prompt hack: from scratch imitating git hack.</li>
</ol>


<p>End results:</p>

<ol>
    <li><a href="http://github.com/jdnavarro/dotfiles/blob/master/.zshrc#L164">.zshrc</a></li>
    <li>virtualenvwrapper <a href="http://github.com/jdnavarro/dotfiles/blob/master/sandbox/virtualenvs/postactivate">postactivate</a> and <a href="http://github.com/jdnavarro/dotfiles/blob/master/sandbox/virtualenvs/postdeactivate">postdeactivate</a></li>
</ol>


<p>Isn&#8217;t open source great? I would be flattered if you stole my configuration files too ;)</p>

<p>In another blog post I&#8217;ll write about my customizations for Vim, <a href="http://software.schmorp.de/pkg/rxvt-unicode.html">urxvt</a> and <a href="http://awesome.naquadah.org/">awesomewm</a> to reach the ultimate UNIX IDE.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The value of 'wasting' your time]]></title>
    <link href="http://dannynavarro.net/2010/08/06/the-value-of-wasting-your-time/"/>
    <updated>2010-08-06T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2010/08/06/the-value-of-wasting-your-time</id>
    <content type="html"><![CDATA[<p>I&#8217;m the kind of guy who likes to learn just about everything just for fun. I have the feeling I don&#8217;t really grasp something until I have real experience with it. That&#8217;s why redoing something other people have done is the best way I can think of to learn what the matter is really about. But it seems some people have difficulty tolerating my views about learning. I sometimes hear things like:</p>

<p><em>Why do you reinvent the wheel? Why do you want to waste your time?</em></p>

<p>Well, if I hadn&#8217;t &#8216;wasted&#8217; so much time during all these years I would still be doing terrible things like <a href="http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454">using regex to parse HTML</a> with crippled Perl scripts. The skills I get paid for come mostly from wasted time. I would be nowhere with only my <em>formal</em> training.</p>

<p>Information technology nowadays is mostly about <strong>knowledge</strong> investment. Doing stuff is not as hard as learning how to do stuff orders of magnitude more efficiently. Learning is cumulative: the more you learn, the better you are at learning and the faster your efficiency grows. With computers it is really hard to hit a physical limit where learning can no longer bring significant improvement.</p>

<p>It&#8217;s <a href="http://en.wikipedia.org/wiki/The_Mythical_Man-Month">well known</a> that a few highly productive people can beat large corporations made of people who work linearly with the skills they once learned in order to get a job.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Sharing proteomics data, trickier than it seems]]></title>
    <link href="http://dannynavarro.net/2010/07/31/sharing-proteomics-data-trickier-than-it-seems/"/>
    <updated>2010-07-31T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2010/07/31/sharing-proteomics-data-trickier-than-it-seems</id>
    <content type="html"><![CDATA[<p>Reading the <a href="http://cameronneylon.net/blog/metrics-of-use-how-to-align-researcher-incentives-with-outcomes/">blog post</a> from <a href="http://cameronneylon.net">Cameron Neylon</a> about how research incentives should align with research outcomes made me see a clear relation with the problem of sharing data in proteomics.</p>

<p>Accessing genomics or transcriptomics data is more or less straightforward compared to proteomics data. You go to a data repository and download whatever you are looking for. Anyone who tries to do something similar in proteomics usually ends up downloading spreadsheets with incomplete data from supplemental material of published articles.</p>

<p>It&#8217;s common to hear proteomics informaticians complaining about experimentalists not making their data easily available, and asking how they can be so greedy as not to publish data they were funded with public money to generate. But jumping to conclusions about experimentalists being greedy or sloppy doesn&#8217;t seem fair to me. The real issue is far deeper than that.</p>

<p>First of all, sharing proteomics data requires the proper infrastructure. Lately there have been some projects aiming to provide it. I would say that the most popular proteomics repositories are <a href="http://www.peptideatlas.org/">PeptideAtlas</a>, <a href="http://www.humanproteinpedia.org/">Proteinpedia</a>, <a href="http://www.ebi.ac.uk/pride/">PRIDE</a> and <a href="https://proteomecommons.org/">ProteomeCommons</a>.</p>

<p>As far as I know, the main way Proteinpedia and PeptideAtlas handle data upload is with manual curators who study the format and the way the proteomics experiments were done. Then they come up with the best way to put the data in the repository. Usually it is a combination of custom parsing with some manual editing to make experiments consistent in the repository. Obviously this kind of method doesn&#8217;t scale to every proteomics experiment out there. In order to cope, Proteinpedia focuses on human submissions only, whereas the PeptideAtlas team is already looking into other <a href="http://www.peptideatlas.org/upload/">solutions</a> to facilitate data submission.</p>

<p>PRIDE, on the other hand, accepts data submissions only in the PRIDE XML format. Most of the proteomics tools experimentalists use can&#8217;t generate this XML format. However, the PRIDE team provides an <a href="http://code.google.com/p/pride-converter/">application</a> that tries to convert every format out there and every experiment design to the PRIDE XML format. But because they try to cover every single aspect of every experiment, I&#8217;ve seen experimentalists struggle when trying to fill in all the forms. Moreover, they also don&#8217;t like the rigidity of the model used to describe what the proteomics experiment was about. It&#8217;s true that there are some optional controlled vocabulary terms to give flexibility, but experimentalists still have difficulty wrapping their heads around how to use those terms.</p>

<p>This problem is not exclusive to PRIDE. Each proteomics lab uses its own mass spec terminology, and forms designed by developers with no first-hand experience running experiments frequently don&#8217;t match the terms the experimentalists would understand. After all, experimentalists like to spend their time on experiments; they don&#8217;t like spending it on things they consider bureaucracy. The PRIDE team is aware of this problem and keeps trying more <a href="http://www.ebi.ac.uk/pride/proteomeharvest/">familiar ways</a> for experimentalists to fill in form data.</p>

<p>ProteomeCommons seems to be the repository getting the most traction. It&#8217;s currently where most proteomics experimentalists are submitting their data to fend off journal editors&#8217; complaints about the lack of published data. ProteomeCommons is built on top of the <a href="https://trancheproject.org/">Tranche</a> network, a kind of global distributed filesystem that potentially offers unlimited scalability by just adding more nodes to the network. The Tranche network looks just like a big hard disk with several files. Everybody can upload anything they like. That&#8217;s where ProteomeCommons comes in: it&#8217;s the web gateway to upload the data. The web application offers some options to annotate the data, but they are not as complete as what you have in the other databases. That&#8217;s understandable because if they bothered the experimentalists with thousands of forms with mandatory fields, the experimentalists wouldn&#8217;t submit their data to ProteomeCommons. It&#8217;s also worth mentioning that ProteomeCommons is not required to use the Tranche network; any other proteomics repository could use it as a backend to store huge proteomics files. The other repositories are currently looking into using the Tranche network to store the huge amounts of proteomics data, especially the data derived from raw data.</p>

<p>You might have noticed the conspicuous absence of format standards in my description of the different repositories. If everybody used the same proteomics standards, the infrastructural problems of sharing data would be solved. Right?</p>

<p>The major effort to standardize proteomics formats is being carried out by <a href="http://www.psidev.info/">HUPO-PSI</a>. It&#8217;s a kind of consortium that holds regular meetings where representatives of different proteomics groups around the world agree on what has to go into the standards and how. You can follow the discussions and chip in with what you would like to have in the formats.</p>

<p>Aside from the typical problems of something <a href="http://www.codinghorror.com/blog/2005/06/the-pontiac-aztek-and-the-perils-of-design-by-committee.html">designed by committee</a>, it remains to be seen if mass spec vendors and proteomics software developers will fully embrace the standards. Proteomics data is highly heterogeneous by nature; there are very different kinds of proteomics experiments depending on the research being done. High quality proteomics experiments can&#8217;t be turned into an assembly line process where everything is easily fixed and standardized.</p>

<p>However the new stable <a href="http://www.psidev.info/index.php?q=node/257">format</a> <a href="http://www.psidev.info/index.php?q=node/403">releases</a> from HUPO PSI look good enough to me to at least start making the data exchange among repositories possible. There is also the promise from several mass spec vendors of future commitment to fully support the standards. I hope all those promises don&#8217;t end up in just that.</p>

<p>In my opinion all these infrastructural difficulties are going to be solved somehow relatively soon. The problem is that I don&#8217;t think that just by solving the infrastructural problems everybody will start sharing data transparently. There are other difficulties.</p>

<p>The main reason experimentalists are, at least, uploading their data to Tranche is to avoid being bugged by proteomics journals into making their proteomics data available. Many proteomics journals are getting really <a href="http://www.mcponline.org/site/misc/ParisReport_Final.xhtml">serious</a> about making the data available. Why are proteomics journals so interested in having the authors make their data available?</p>

<p>One could argue that editors of these journals believe in the moral imperative of making the data available, but I don&#8217;t buy ethics as the main reason. The majority of biomedical scientific journals are still for-profit companies, not academic institutions. In order to survive they have to make money selling something, like every other company. For a journal, publishing articles that get lots of citations from other journals, which in turn are highly cited themselves, is the best way to guarantee that companies and institutions will keep renewing their yearly subscriptions.</p>

<p>It&#8217;s not something that I can demonstrate with facts, but lately I&#8217;m getting the feeling that proteomics is being dismissed by people in other biological fields as <em>low quality research</em>. After all, proteomics is just a technology, a tool, to find biological insights. Mass spec research by itself wouldn&#8217;t get so much funding if it couldn&#8217;t be used for biological research. What I think is happening is that biologists are taking biological findings in pure proteomics journals less seriously. Most published proteomics experiments are irreproducible, and if you start digging into the published data you frequently find many false positives. That&#8217;s why proteomics journal editors are forcing the experimentalists to release their data and make it as transparent as possible. They hope they can gain more credibility and get more citations from non-proteomics journals.</p>

<p>But still one can see that most experimentalists are reluctant to make their best data fully available. Many informaticians trying to analyze the data think that this attitude is due to cultural resistance to change. Many of these informaticians try to evangelize the experimentalists about why it is so important to share data. Among the evangelists, the most notable group is the <a href="http://www.fixingproteomics.org/">Fix Proteomics Campaign</a>, which proposes some habits to make proteomics more credible.</p>

<p>But experimentalists are not dummies. I have seen them change any habit really quickly if they find something better. The problem is that making their data transparently available is worse for them, and here is where I think the campaigns miss the point. Let me explain something unique about mass spectrometry proteomics that many people seem to forget easily.</p>

<p>Mass spectrometers are really expensive instruments. Getting the skills to operate them takes several years of training. To make things more costly, these instruments become obsolete in a matter of a few years because new ones with better features are constantly coming out. When a new instrument arrives at the lab, a lot of time is spent optimizing it and learning how to troubleshoot it. If you don&#8217;t keep getting those new mass spectrometers you are left behind by the competitors, because they can take advantage of more powerful instruments.</p>

<p>How is it possible to maintain a high funding inflow in academia? In a company you have to sell a service or a product, but you can&#8217;t do so in an academic group. Usually you rely on granting agencies to provide funding. Granting agencies award money according to the scientific productivity of the group. Publications in <em>reputable</em> journals are the main tangible measure granting agencies use for scientific productivity.</p>

<p>But most journals, as I said before, have to operate like companies. Proteomics data by itself is not publishable: if there is no story with some biological insight or some novel way to improve results, what will you write in a proteomics paper with just high quality data? Generating high quality proteomics data is damn difficult, I would say even more difficult than coming up with a fancy analysis of the data. Let me explain.</p>

<p>People coming from the genomics and transcriptomics fields sometimes forget that the chemical nature of proteins is much more diverse than that of DNA or RNA. After all, DNA and RNA have more or less homogeneous chemical properties regardless of their sequence. Proteins, on the other hand, are chemically completely different from each other depending on the sequence - that&#8217;s why they can carry out so many molecular functions. Proteins from the nucleus are completely different from the proteins of the cytoplasmic membrane.</p>

<p>The proteome is also much more dynamic than the genome. The same cell under different conditions shows a completely different proteome profile. You also have to take chemical modifications of proteins into account, which are only detectable by probing the proteins directly. Chemical modifications like phosphorylation act as functional switches for proteins; a protein with a modification also has different chemical properties than the same protein without it. The heterogeneity of proteins makes protein purification, proper separation, and identification by mass spec an entire field by itself.</p>

<p>So an experimentalist, who has had years of training just to be able to identify proteins and chemical modifications, might, understandably, lack the skills for sophisticated analysis that will make the story sexier for proteomics journals. Software to analyze proteomics data as a <em><a href="http://en.wikipedia.org/wiki/Power_user">power user</a></em><em>, </em>without programming knowledge, is still in the early days. As a developer I can see how difficult it is to make analysis software that covers every kind of proteomics experiment with a <em>point and click</em> interface.</p>

<p>The most logical step for an experimentalist with good data would be to look for people specialized in analysis to come up with a powerful story. Usually the best analysts are independent proteomics informaticians who would do the analysis only if they get the credit for it. After all, they also have to get funded to keep doing research, and they can always claim they are the ones writing the paper. But even if proteomics informaticians give proper credit to the data generators in their papers - which I wouldn&#8217;t say is always the case - very few granting agencies will keep the experimental lab funded just to generate data that other people will use to write publications.</p>

<p>To tackle this problem the most powerful experimental proteomics labs are aggressively trying to hire programmers who can do the analysis <em>in-house</em> so that the credit remains within the group. The first problem these groups face is that there are almost no proteomics informaticians in the job market. They have to invest in people with programming skills who will eventually acquire the proteomics knowledge necessary to write useful analysis programs or to analyze the data themselves.</p>

<p>Also, because experimentalists lack programming knowledge, it can be tricky for them to judge which candidate programmers have the skills required to become good analysts, not to mention how to motivate the programmers with an excellent pitch for why they should join their field. There are also programmers, myself included, who actively seek out these highly experimental labs instead of pure informatics groups in order to get a closer understanding of experimental data by interacting directly with the experimentalists.</p>

<p>But this kind of setting, where data generators and informaticians try to work together, easily leads to <a href="http://en.wikipedia.org/wiki/Dilbert">Dilbert-like</a> situations. I would say that it&#8217;s mainly because of the misunderstandings generated by the technological gap.</p>

<p>I feel fortunate to work in <a href="http://bioms.chem.uu.nl/">my current group</a> because I think it&#8217;s one of the few experimental proteomics labs where people are aware of this problem and actively try to improve communication with the informaticians.</p>

<p>The final point I want to make in this post is that if the data generators were rewarded properly for what they are good at, generating unbeatable high quality data, then sharing proteomics data transparently, and as soon as it&#8217;s generated, would become mainstream. I see the reward systems of granting agencies as the main cause of data not being shared, because they usually don&#8217;t reward the generation of data adequately. They should also reward the labs that not only make the data available but make it as accessible as possible for data analysts, so that the proteomics research field would advance much faster than it currently does. I acknowledge granting agencies are slowly changing for the better, but there is still a long way to go.</p>

<p>But how granting agencies work and what their motivations are is still quite fuzzy to me. I think I&#8217;m still not old enough to understand the politics behind research funding.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[How to review supplemental material without revealing your identity]]></title>
    <link href="http://dannynavarro.net/2010/05/25/how-to-review-supplemental-material-without-revealing-your-identity/"/>
    <updated>2010-05-25T00:00:00+02:00</updated>
    <id>http://dannynavarro.net/2010/05/25/how-to-review-supplemental-material-without-revealing-your-identity</id>
    <content type="html"><![CDATA[<p>The <em>de facto</em> criterion for considering an article scientifically valid is whether the publisher carried out a <a href="http://en.wikipedia.org/wiki/Peer_review">peer review</a> process or not. The reviewers are people with proven expertise in the field, demonstrated by publishing peer reviewed articles in the area, who are capable of assessing the scientific value of an article. Because those reviewers can be direct competitors, who want to bog down the authors because they compete for funding, or colleagues of the <em>you scratch my back, I scratch yours</em> type, the editor doesn&#8217;t reveal who the reviewers are. In my opinion the reviewers should be made public, both to the authors and to the readers of the article, but I will leave that for another post.</p>

<p>However, keeping the reviewers anonymous is becoming more difficult these days because many scientific articles include supplemental material that can&#8217;t be attached directly to the article, e.g. a huge list of proteins identified in a proteomics experiment. This poses a problem for editors and reviewers when the authors host that material themselves: by watching which <a href="http://computer.howstuffworks.com/internet/basics/question549.htm">IPs</a> are accessing the machine hosting the data, the original authors can <a href="http://whatismyipaddress.com/">guess</a> who the reviewers are.</p>

<p>I have heard this issue described as a really huge problem for the peer review process, also by advocates of third party hosts with sophisticated technologies that anonymize reviewers. But it turns out that accessing any site on the web anonymously is not as obscure or complicated as it sounds.</p>

<p>A quick way - but maybe not so reliable - of anonymizing your web traffic is by googling for &#8217;<a href="http://www.google.com/search?q=browse+anonymously">browse anonymously.</a>&#8217; You can find many web <a href="http://en.wikipedia.org/wiki/HTTP_proxy">proxies</a> that claim to anonymize your identity. Usually you have to paste the address of the web site you want to access anonymously and then you&#8217;ll be redirected to the web page as usual, albeit with much slower loading. The people hosting the server will see, at most, the IP address of the anonymous proxy, which probably won&#8217;t correspond to a place where any reviewers are located.</p>

<p>But I don&#8217;t recommend using any proxy out there unless you really need to access something anonymously quickly and you have nothing in place. After all, who knows what they can do with your data, or if your IP leaks somewhere. I would recommend the use of the <a href="http://www.torproject.org/">Tor</a> network. Tor is an anonymity network where volunteers spread all over the world provide their machines to act as anonymizing proxies. Oversimplifying, Tor connects, encrypts and obfuscates the web traffic between you and the host with many of these proxies so that it becomes damn difficult to find out the original IP. When using a browser that goes through the Tor network the guys hosting the data will be seeing different random IPs all over the world with no relation whatsoever to each other.</p>

<p>There are different ways to set up Tor, but if you want to use it without thinking too much, go to the <a href="http://www.torproject.org/easy-download.html.en">download page</a> of Tor and get the Tor browser bundle. It comes with a Firefox browser that is already configured to access the Tor network. If the network policy in your institution or company is controlled by a <a href="http://en.wikipedia.org/wiki/Bastard_Operator_From_Hell">fascist network administrator</a> who blocks everything at the firewall regardless of the true security risk, go to <em>settings</em> and indicate you are behind a firewall.</p>

<p>Given a link to the Tor browser bundle alongside the supplemental material, the editor and the reviewers shouldn&#8217;t have any problem accessing self-hosted data. Downloading the Tor browser bundle shouldn&#8217;t be an obstacle; I haven&#8217;t seen anyone complain when asked to download a proprietary viewer to visualize closed raw data formats, which is quite common in proteomics.</p>

<p>However with this post I&#8217;m not advocating to host your own data instead of sharing it. Hosting it yourself and sharing are not mutually exclusive. I think sharing your scientific data is a moral imperative when you are funded with public money. I encourage sending scientific data to as many public repositories as  possible, but I also think individual researchers have the right to host their own data if they want to.</p>

<p>I know it&#8217;s hard to believe but many researchers who are funded with taxpayers&#8217; money are reluctant to share their data, at least in the <a href="http://en.wikipedia.org/wiki/Proteomics">proteomics</a> community where I work. If their data is <em>stolen</em> and other people find more interesting things the original authors missed, they lose the relevance necessary to keep getting funded, especially when the analysts don&#8217;t give enough credit to the generators of the data, which is quite frequent. But I will leave that for another post.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The blogging itch]]></title>
    <link href="http://dannynavarro.net/2010/03/24/the-blogging-itch/"/>
    <updated>2010-03-24T00:00:00+01:00</updated>
    <id>http://dannynavarro.net/2010/03/24/the-blogging-itch</id>
    <content type="html"><![CDATA[<p>Starting a blog has been on my personal todo list for years. Finally, after encouraging myself with <a href="http://www.codinghorror.com/blog/2006/03/users-dont-care-about-you.html">some</a> <a href="http://www.codinghorror.com/blog/2007/10/how-to-achieve-ultimate-blog-success-in-one-easy-step.html">old</a> <a title="post" href="http://www.codinghorror.com/blog/2006/02/fear-of-writing.html" target="_self">posts</a> from Jeff Atwood (yeah, I like reading <a title="Jeff Atwood" href="http://www.codinghorror.com/blog/" target="_self">Jeff Atwood</a> and I frequently agree with him), I decided to just kick it off with whatever I have on my mind: why has it been so difficult to get started? Why now? It was not only procrastination; there has been something else. Let me explain.</p>

<p>I started using a feed reader routinely around 4 years ago. Since then I&#8217;ve been constantly replacing the blog feeds I follow with others I found more interesting. Right now all the blogs I&#8217;m following are excellently written, and I&#8217;m not talking only about celebrity blogs, but also the kind of highly technical blogs that make it, for example, into <a href="http://planet.python.org/">Python planet</a>. When I think about the quality of writing I would like for my blog, several blogs from Python planet come to mind. But even if I set myself a much lower quality threshold to get started, it turns out that writing something of acceptable quality, just the writing, is indeed harder than it seems. I can <a href="http://zenhabits.net/2009/04/seven-productivity-tips-for-people-that-hate-gtd/" target="_self">allow myself to suck</a> but can&#8217;t go public with something I know is total crap. Recently I reached a writing level where I&#8217;m comfortable enough to go public.</p>

<p>But obviously publishing a decent blog is not only about writing properly. You need something to write about. For me it&#8217;s not worth writing superficially just for the sake of writing. In order to comment on something I need to have a deep understanding of the matter, otherwise I can&#8217;t confidently give a public opinion. But for that to happen I need something I didn&#8217;t get until recently: specialized knowledge.</p>

<p>When I started my Biochemistry degree back in Spain I liked computers as a hobby (I was a relatively early Linux user) but my real passion was molecular biology, the wet lab. Soon I realized how much successful biological experiments depended on tedious manual work and luck. On the other hand, <em>experiments</em> using the computer were quick and you could somehow understand much better what was going on.</p>

<p>Later I worked for <a href="http://pandeylab.igm.jhmi.edu/akhilesh.html">Akhilesh Pandey</a> developing <a href="http://hprd.org/">HPRD</a>, a human protein database that included (and still does) plenty of high quality data manually curated from scientific literature and not found in any other biological database. My role was a kind of bridge between the programmers, the curators and the biological requirements of the project. My research career started out totally different from what is expected of a young researcher: instead of specializing in something first and only later trying to understand how what you study contributes to science in general, I had to understand the global picture first before getting into the nitty-gritty. That didn&#8217;t mean I didn&#8217;t want to get deep into different areas. The problem was that I wanted to get into too many things at the same time. Over time the fields I really ended up focusing on have become fewer. Now, instead of working in large teams, I work in more isolation on very specialized projects within the area of proteomics informatics.</p>

<p>So if I finally ended up working on a very specialized topic, was the global view experience a waste of time? Absolutely not. My way of thinking has been critically shaped by it. I&#8217;m the kind of person who has to have a clear reason for everything I do. Now it&#8217;s clear to me what I want to focus on, without losing the context of everything I do. I know what I want to learn and what for. I still have more or less the same goals I had when I got into research; the difference now is how I want to reach them. But I&#8217;ll leave my goals for another post.</p>

<p>In the end the lack of specialized knowledge has been an important blocker to starting a blog. I didn&#8217;t feel I had enough authority to write acceptable posts on concrete topics. Now in <a href="http://en.wikipedia.org/wiki/Python_(programming_language)">Python</a> and <a href="http://en.wikipedia.org/wiki/Proteomics">proteomics</a> I&#8217;m more or less getting above the knowledge threshold where I can start writing something critically.</p>

<p>At the same time, proteomics informatics is a blend of very different specialized topics. In proteomics I still don&#8217;t totally understand how a <a href="http://www.chemguide.co.uk/analysis/masspec/howitworks.html">mass spectrometer</a> works, but after so much parsing, merging, querying, filtering, and plotting of mass spec data I have a good handle on what mass spec proteomics data is like. As for programming, I&#8217;m far from writing something I would qualify as <em>good code</em>. Even though my coding abilities have improved over the years, I rate my code worse than I used to. In reality my code is better now; it&#8217;s just that, with time, I&#8217;m becoming more critical about what I would call good code. However, I&#8217;m fully aware that being so critical of my own code is a <a href="http://www.codinghorror.com/blog/2009/07/nobody-hates-software-more-than-software-developers.html">symptom of programming competence</a>. I feel I&#8217;m on the right track: if I keep pushing the way I am, I&#8217;ll eventually get to write something I could qualify as good enough. For now I feel competent enough in Python to start saying something meaningful about programming publicly.</p>

<p>The bottom line is that I now have some specific knowledge and a decent ability to write. One would think that these factors provide the ideal scenario to start a blog. I know that having a blog is very important for my profession, but that hasn&#8217;t been the final push to start now. I&#8217;m writing this post right now because I need to. Let me rephrase that: I&#8217;m not forcing myself to write a blog, I need to write.</p>

<p>What is really happening is that blogging is just part of a bigger transformation I&#8217;m going through. I&#8217;ve been lurking in different open source communities for quite some time, initially just to learn from people who are way better than me at programming. But lately I have been developing a great admiration for certain communities and individuals. I need to give them something back. I&#8217;m gradually participating more in the community. Now I need to show who I am, how I see the world, what I want in life, and what I admire and why.</p>

<p>That&#8217;s why I&#8217;m blogging.</p>
]]></content>
  </entry>
  
</feed>
