Sunday, November 29, 2009

Those lying numbers


There is more than one way to contribute to open source projects. Unfortunately, eclipse dash only shows one aspect of it, the code; and even for this one aspect it does not capture the reality of things. Why is this? Dash counts CVS commits, and this misses the key aspect of who authored the code.


So why am I getting to this today? Because people are using dash as the scarecrow on diversity, but I believe it does not represent the reality of each project. Here is the case study of p2:

1% for Cloudsmith? 1% for EclipseSource? WTF? This does not look very diverse... Unfortunately, these numbers show exactly what I want to demonstrate: they are bogus. They do not represent the reality of the investment made by those two companies or the number of patches received from individuals. Indeed, Thomas H., Henrik L. and Ian B. have all been regular contributors to the project and know a lot of the code base. In fact, I'm sure that if IBM were to pull the plug on the project, it would carry on just fine (probably with even more freedom since I would be gone). Their companies have products based on p2 (which I believe is a sign of commitment, given the size of p2), they come to every call, they are not afraid of taking on big issues, etc.


So why are the numbers so low?

  • Patches committed for others. I have been committing a lot of patches either from the community or on behalf of Thomas H. Unfortunately, this again inflates the IBM numbers to the detriment of Cloudsmith or "individuals".
  • Lately the code has been very much in flux because of a large refactoring (package rename, etc.), which inflates the commit count and dilutes others' commits.
  • Number of IBM committers. IBM has more committers than others on the project, thus allowing for more code to be produced. However, if those companies were to increase their number of participants (wink, wink) to match IBM's, they would then be on par. Maybe we should compare the companies based on the average number of commits per committer (e.g. commitCount / committerCount).
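
To make the last point concrete, here is a minimal sketch of the commits-per-committer comparison. The organization and committer names are made up for illustration and are not real dash data; the input format (one pair per commit) is also an assumption.

```python
from collections import defaultdict


def commits_per_committer(commits):
    """Return the average number of commits per committer for each organization.

    `commits` is an iterable of (organization, committer) pairs, one per commit.
    This input shape is an assumption made for the sketch.
    """
    commit_count = defaultdict(int)
    committers = defaultdict(set)
    for org, committer in commits:
        commit_count[org] += 1
        committers[org].add(committer)
    return {org: commit_count[org] / len(committers[org]) for org in commit_count}


# Hypothetical data: OrgA has three committers, OrgB has one.
sample = [
    ("OrgA", "alice"), ("OrgA", "alice"),
    ("OrgA", "bob"), ("OrgA", "bob"),
    ("OrgA", "carol"), ("OrgA", "carol"),
    ("OrgB", "dave"), ("OrgB", "dave"),
]

averages = commits_per_committer(sample)
```

In this made-up sample, OrgA accounts for 6 of the 8 commits, yet both organizations average the same number of commits per committer, which is the kind of difference a raw commit share hides.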


I'm sure that I'm missing other factors about why those numbers are so low, but you get the point... Though I recognize that almost every project could use a little more diversity, we have to be careful about how numbers are being used. If we want to use dash as a reliable hint on activity and diversity, then we should revise how the numbers are computed to take into account: patch author instead of committer, activity in bugs, activity on mailing lists, number of people asking questions in forums, etc.
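
As a rough sketch of the first suggestion (crediting the patch author instead of the committer), assume each commit record already carries the committer's organization and, when the patch came from someone else, the author's organization. The field names and the data are hypothetical, and this is not how dash actually computes its numbers.

```python
from collections import Counter


def diversity_shares(commits, by_author=True):
    """Return each organization's share of commits.

    Each commit is a dict with a "committer_org" key and an optional
    "author_org" key (hypothetical field names). When by_author is True,
    a commit is credited to the patch author's organization if it is known.
    """
    counts = Counter()
    for commit in commits:
        org = commit["committer_org"]
        if by_author and commit.get("author_org"):
            org = commit["author_org"]
        counts[org] += 1
    total = sum(counts.values())
    return {org: n / total for org, n in counts.items()}


# Hypothetical data: one committer's organization commits every patch.
sample_commits = [
    {"committer_org": "OrgA"},
    {"committer_org": "OrgA"},
    {"committer_org": "OrgA", "author_org": "OrgB"},
    {"committer_org": "OrgA", "author_org": "individual"},
]

by_committer = diversity_shares(sample_commits, by_author=False)
by_patch_author = diversity_shares(sample_commits, by_author=True)
```

With this sample, counting by committer credits everything to OrgA, while counting by patch author splits the credit between OrgA, OrgB and an individual contributor.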

Project diversity

When a project is not diverse, I find it a bit too easy to blame it on the company that started the project, the people steering it, or how welcoming the project is to contributions. Though I don't want to underplay those points, I argue that there are other things that matter:

Relevance of the project: maybe the topic of your project does not interest anyone but you or your company. Sorry.


Timing of the project: you can have the coolest technology, if you are too late or too early, it will be harder to excite the crowds.


Pace of the project: the project goes too fast for others to follow or for committers to accept external contributions (e.g. the pace imposed by the company's internal schedule is such that it does not allow external contributions to be considered by committers). Conversely, the project goes too slow for anyone to be willing to bet on it.


Quality of the project: the project is running well, builds are regular, deliverables are on schedule, bugs and enhancements are dealt with quickly. The community gets what it needs, why should it care?


Amount of code: is there enough code to show the direction of the project? People are happy to work on a project, but I think they feel more comfortable starting from a working code base than from a whiteboard.


Now if I relate that to p2, where we have a diverse community of contributors (IBM, Cloudsmith, EclipseSource, University of Lille-Artois), I think we have been lucky because p2 came at the right time, solving a real pain point. Indeed, the same year we announced p2 (EclipseCon 2007), there were at least 5 talks on how to manage Eclipse, and Cisco announced the creation of the Mayinstall project. As far as code goes, we already had a functional prototype and we continued developing it in the open, holding public calls every week. All that said, I believe that p2 would still be a single-company project if the companies that joined forces had not had a business interest in contributing.


Where does this leave us? Luck. I would argue that, much like any other success, the success and diversity of a project just happen to be a combination of preparation and timing, with of course a zest of hard work and persistence.


Friday, November 27, 2009

Nesting categories

A recurring topic around categorization in p2 is the ability to nest categories.


As with the categorization of bundles, the trick consists of using the Eclipse feature editor to express the dependencies and thus construct the desired nesting.

Steps:
First phase, creation of the inner category.
  1. Create a feature project called InnerCategory1. In our example this will be the feature containing the elements to be shown categorized.
  2. Turn this feature into a category by creating a p2.inf and filling it with:
    properties.1.name=org.eclipse.equinox.p2.type.category
    properties.1.value=true
  3. Remove everything from the build.properties of the feature.
  4. Use the features and plug-ins tab of the feature editor to add content to be categorized.
This concludes the creation of the inner category. The following steps create the "top level" category.
  1. Create a feature project called TopLevelCategory
  2. Turn this feature into a category by creating a p2.inf and filling it with:
    properties.1.name=org.eclipse.equinox.p2.type.category
    properties.1.value=true
  3. Remove everything from the build.properties of the feature.
  4. In the features tab of the editor, add InnerCategory1
  5. Export the top level feature enabling metadata generation
You can find the code of this example on the wiki.
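
Separately from the wiki example, steps 2 and 3 (which are identical for both features) could be scripted along these lines. This is only a sketch: the helper name is made up, and it assumes p2.inf sits at the root of the feature project next to feature.xml. The demo runs against a throwaway directory rather than a real workspace.

```python
import tempfile
from pathlib import Path

# The two lines from the steps above that mark an installable unit as a category.
CATEGORY_P2_INF = (
    "properties.1.name=org.eclipse.equinox.p2.type.category\n"
    "properties.1.value=true\n"
)


def mark_feature_as_category(feature_dir):
    """Write p2.inf and empty build.properties in a feature project.

    Hypothetical helper: assumes the project root contains feature.xml and
    that p2.inf belongs next to it.
    """
    feature_dir = Path(feature_dir)
    if not (feature_dir / "feature.xml").is_file():
        raise FileNotFoundError(f"no feature.xml in {feature_dir}")
    (feature_dir / "p2.inf").write_text(CATEGORY_P2_INF, encoding="utf-8")
    # Step 3: remove everything from build.properties.
    (feature_dir / "build.properties").write_text("", encoding="utf-8")


# Demo on a temporary directory standing in for the InnerCategory1 project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "InnerCategory1"
    project.mkdir()
    (project / "feature.xml").write_text(
        "<feature id='InnerCategory1' version='1.0.0'/>\n", encoding="utf-8"
    )
    (project / "build.properties").write_text(
        "bin.includes = feature.xml\n", encoding="utf-8"
    )
    mark_feature_as_category(project)
    p2_inf_text = (project / "p2.inf").read_text(encoding="utf-8")
    build_properties_text = (project / "build.properties").read_text(encoding="utf-8")
```

Adding content in the feature editor and exporting with metadata generation still happen in the IDE; the script only covers the two file edits.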
Happy categorization, happy provisioning!

Categorizing plug-ins

Despite what is commonly believed and what I have repeated on several occasions, p2 does allow for the installation of anything, anything being for example just bundles. It just happens that today, with the practices adopted over the years, features have grown to be the primary way of delivering functionality.
I'm now showing how to create a p2 repository whose category refers to bundles.
Steps:
  1. Create a new feature project.
  2. Create a file named p2.inf in the feature project and paste in the following two lines. This adds the category property to the installable unit, thus allowing the UI to recognize the feature as a category to be displayed.
    properties.1.name=org.eclipse.equinox.p2.type.category
    properties.1.value=true

  3. Include the plug-ins you want to see being categorized. This is where you define what goes in the category. Each entry in the plug-ins list will be shown under the category.

  4. Remove everything from the build.properties included in the feature. This causes PDE to not generate a feature.jar.
  5. Export the feature enabling metadata generation.

You can find a zip of this example on the wiki.
Happy categorization, happy provisioning!

Sunday, November 01, 2009

p2 is going public

I mean API public. Therefore we are soliciting input from people who have been using our provisional API or people that are looking into using it. To do so, please open a bug report capturing your use cases as well as stating the problems you have been experiencing with the current provisional API.

Thanks in advance.