Nicolas Raoul's blog: 2011

Sunday, November 27, 2011

AnkiDroid 1.0 released!

Yesterday I released version 1.0 of AnkiDroid (a flashcards app for Android).
With 140,000 users already, why is AnkiDroid only at version 1.0?
We released many 0.x versions, and now we feel the app is ready to be called 1.0!
The number of users has grown eightfold in less than a year.

With 1.0, today is a good opportunity to describe how the AnkiDroid community of contributors works.

1) We do all we can to be friendly with all newcomers. We answer all questions and thank users for using the product.

2) We encourage everybody to participate, at every level. We introduce users to the bug tracker, and liberally give them rights to edit the Wiki, which makes them feel they are contributors rather than just users. Similarly, everyone is able to fork and edit the code without having to ask anyone for permission. Localization is the archetype of this spirit: anyone can translate strings, and the translations are included in AnkiDroid automatically. Like on Wikipedia, consensus is the rule.

Why do we need to keep involving more and more people? Well, contributors have day jobs or exams, so each individual's participation alternates between periods of activity and periods of inactivity. See for instance this timeline of contributions, one color per contributor:

The vast majority of contributors are volunteers, but some people are also paid to contribute. Notably, the first Simplified Chinese localization was sponsored by a Chinese company.

Geographically, developers have always come from a wide variety of countries: Egypt, Japan, Germany, Sweden, Spain, Brazil... Beta-testers come from virtually all over the globe, with even one in Antarctica. Thanks to this diversity, tricky issues with right-to-left languages or special characters are detected and fixed early. AnkiDroid is now available in 27 languages.

Feel free to comment if you have any questions!

Nicolas Raoul

Friday, November 18, 2011

Targeting China? Be prepared for additional web development costs

If you want your web application to tap into the huge Chinese market, I have bad news for you.

Until April 2011, IE6 was still the #1 browser in China. Now it is still at 37%.
That's bad news, because IE6 is an old browser with many bugs. Making your website usable on IE6 is very complex: it will typically double your development costs for custom UI widgets.

If you are not targeting China, then do as Google Apps, Facebook, Twitter and YouTube do: don't invest time in making your website compatible with IE6. In fact, 87% of the global top-30 websites offer a sub-optimal experience to IE6 users.

But if you are targeting China, be prepared. In particular:

- Don't use Bootstrap, Kendo UI, or most other web frameworks, as they don't work on IE6.
- Use web frameworks that are committed to supporting IE6: 960 Grid System, HTML5 Boilerplate, GWT, among others.

Chrome recently became the second most-used browser in China, but guess what the second most-used web browser was earlier this year? Not Firefox, not Safari. It was Maxthon, from Hong Kong. It is based on Trident, just like Internet Explorer. There is also 搜狗 (Sogou), which can use either IE or Chrome as its rendering engine. Finally, 360安全 (360 Secure Browser), which claims 172 million active users, is also based on Trident and shows the same faults as IE6.

Nicolas Raoul

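New open source project: Troois

Nicolas here.

Yesterday I released a new open source project called Troois. It is a clone of Amazon S3 that runs on Google App Engine.

Amazon S3 is very convenient for applications that write and read files. In particular:
- Applications running on multiple nodes in the cloud (in which case the file system cannot be used).
- Applications running on Heroku (a cloud PaaS). Heroku's file system is read-only, and Heroku's database is an expensive resource.

However, Amazon S3 is not free. It is cheap, but a free alternative has advantages in the following cases:
- You want to reduce complexity, such as invoices.
- You want zero running costs. Example: non-profit projects.
- You want to avoid the risk of a huge bill. With a paid service, a programming error or a deliberate attack can lead to huge charges.

There was no free version of Amazon S3, so I made one. It runs on Google App Engine. I released the source code as open source under the Affero General Public License, so everyone is free to use it, and you can also modify the source code as long as you keep your changes open.

I named the project "Troois": I took the "trois" from "S trois", the French pronunciation of S3, and inserted the "oo" of Google. In Japanese it would probably be pronounced 「トローワ」...

Like Amazon S3, files sent via REST HTTP POST can be downloaded via REST HTTP GET.

Please try it on your own Google App Engine instance and send me feedback. And please, please help with new features!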
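The POST-then-GET round trip described above might look roughly like the following Python sketch. The base URL, the path layout, and the content type are assumptions made for illustration, not Troois's documented API; check the project's own documentation for the real endpoints and any authentication it requires. The functions only prepare the requests, so nothing is sent until you call `urlopen` yourself.

```python
import urllib.parse
import urllib.request

# Hypothetical address of your own Troois deployment on Google App Engine.
BASE_URL = "https://your-app-id.appspot.com"


def object_url(key):
    """Build the URL of a stored file; this path layout is an assumption."""
    return f"{BASE_URL}/{urllib.parse.quote(key)}"


def build_upload_request(key, data):
    """Prepare (but do not send) an HTTP POST carrying the file bytes."""
    return urllib.request.Request(
        object_url(key),
        data=data,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def build_download_request(key):
    """Prepare (but do not send) the HTTP GET that fetches the file back."""
    return urllib.request.Request(object_url(key), method="GET")


if __name__ == "__main__":
    upload = build_upload_request("notes/hello.txt", b"hello")
    download = build_download_request("notes/hello.txt")
    print(upload.get_method(), upload.full_url)
    print(download.get_method(), download.full_url)
    # To really send them: urllib.request.urlopen(upload),
    # then urllib.request.urlopen(download).read()
```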

@nicolas_raoul

Friday, October 28, 2011

Where to run your background jobs for free?

Until now, Google App Engine was widely seen as the perfect solution for running jobs in the cloud for free.

It is indeed powerful and convenient, offering for instance cron and task queue APIs.

I have been using GAE a lot for not-for-profit projects I am involved in, and was enchanted. The dream will end in 3 days, as this email from Google told me:

"As part of Google's long-term commitment to App Engine, we are also updating our policies, pricing and support model to reflect its status as a fully supported Google product"

In 3 days, GAE will get more expensive. Many applications will move from the free zone to the paying zone, and their owners have 4 options:

- Pay.

- Make the jobs faster, by writing smarter code or reducing the scope.

- Leave jobs unprocessed once the quota is reached, which might be acceptable for some apps.

- Switch to an alternative service.

Under the new pricing, GAE offers 9 hours of backend instance time, but most jobs will run into another limit much sooner: only 50,000 database writes are allowed in the free zone. So, is it time to switch to the Celadon Cedar stack to benefit from Heroku's new pricing?

PaaS offer	CPU time	Database operations
Google App Engine	9 hours	50,000
Heroku	720 hours	Unlimited

The CPU time is not directly comparable, but that's still quite a difference. So, where's the catch? Well, on Heroku you get either a free frontend OR a free backend. If you want one worker dyno for free, you must use zero web dynos. The consequence is that implementing any kind of web interface to control your delayed_jobs is a real challenge.
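To see which GAE limit a given job hits first, here is a rough back-of-the-envelope sketch in Python. The quota figures come from the table above; the per-job figures in the example (2 seconds of CPU and 10 database writes per job) are made up for illustration, so plug in your own measurements.

```python
def jobs_before_quota(cpu_seconds_per_job, writes_per_job,
                      free_cpu_hours=9, free_writes=50_000):
    """Return (max_jobs, limiting_resource) within the free quota.

    Both per-job arguments must be positive integers.
    """
    jobs_by_cpu = (free_cpu_hours * 3600) // cpu_seconds_per_job
    jobs_by_writes = free_writes // writes_per_job
    if jobs_by_writes < jobs_by_cpu:
        return jobs_by_writes, "database writes"
    return jobs_by_cpu, "CPU time"


if __name__ == "__main__":
    # Hypothetical job: 2 seconds of CPU and 10 database writes each.
    print(jobs_before_quota(2, 10))
```

With these made-up figures the write quota runs out after 5,000 jobs, long before the 16,200 jobs that the CPU quota alone would allow.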

@nicolas_raoul

Tuesday, October 11, 2011

AnkiDroid presentation in Roppongi Hills

On Thursday I will give a presentation about AnkiDroid, in Japanese!

Place: Roppongi Hills, Mori Tower 2F, Hills Space

Time: 19:00 to 22:00, October 13th, 2011

Admission: 1000 JPY with one drink.

Other people will present various other creative projects; it should be a lot of fun!

On the morning of the same day, I will be at the eDocumentJapan conference as a Japanese/English interpreter for IT pioneer John Newton.

Monday, October 3, 2011

ECM presentations at eDocument Japan

We are giving two presentations at the eDocument Japan conference:

- The place of Open Source in the global ECM market

- Social Content Management with Open Source software Alfresco

The presentations will be in English, translated to Japanese.

Tokyo Big Sight, October 13th, 10:00 AM and 12:20 PM.

Alfresco's CTO in person is coming to Japan for the occasion.

Organized by JIIMA (Japan Image and Information Management Association)

Wednesday, August 24, 2011

Giving a presentation about Alfresco in Shinjuku

On Thursday I will give a presentation about Alfresco in Shinjuku, Tokyo.

I still have to think about the details, but I will probably be presenting the basics of Alfresco and how to set up content management rules.

I will be speaking in Japanese.

Place: Kanto IT Software Kenpo Kaikan (関東ITソフトウェア健保会館), 2-27-6 Hyakunincho, Shinjuku-ku, Tokyo
Time: 25th of August, 2011 (Thu) 19:00
Price: Free, with registration

Tuesday, July 19, 2011

AnkiDroid one of "The 100 best Android apps"!

AnkiDroid has been growing a lot recently: the number of users has quadrupled in the last 6 months!

Now at 75,000 installs, AnkiDroid has just been selected by makeuseof.com as one of The 100 Best Android Apps!
Congratulations to all of the team!

I will soon release AnkiDroid 0.8 with a lot of bug fixes, and a version with a totally re-engineered database and a much more efficient SRS algorithm should be out before the end of the year.

Tuesday, March 22, 2011

Alfresco accreditation

I just received the certificate for the Alfresco accreditation I passed a few weeks ago.

Three of my colleagues also managed to get it, so the whole company has just been declared an Alfresco Recognized Partner, allowing us to use the shiny green badge!

Tuesday, February 15, 2011

Applying Business Intelligence to Bug Tracking

Last week I released AnkiDroid 0.5.1, and judging by the Android Market's comments, people seem to love it :-)

For a few releases now, an opt-in feedback mechanism has been sending us a report every time a problem happens. The anonymous reports are automatically scanned by a Google App Engine application to determine whether each one is a new bug or just an additional occurrence of an already known bug. The data can be exploited in two ways:

1) An online application allows one to browse the reports and bugs, see which bugs happen most often for a given version, and associate them with issues in the bug tracker.

2) A set of Business Intelligence tools allows one to drill down into reports in a multidimensional OLAP cube, and generate reports to show any interesting findings. As a quick example, here is the distribution of crashes among Android versions. Those tools use the open source Pentaho Business Intelligence suite.
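As an illustration of the two ideas above (recognizing repeat occurrences of a known bug, and slicing crashes by Android version), here is a minimal Python sketch. It is not the code that runs on App Engine or in Pentaho: the fingerprinting rule (exception type plus the top three stack frames, ignoring messages and line numbers), the report fields, and the sample reports are all assumptions made up for the example.

```python
import hashlib
import re
from collections import Counter, defaultdict


def signature(stack_trace):
    """Reduce a stack trace to a short fingerprint that ignores messages and line numbers."""
    lines = [line.strip() for line in stack_trace.strip().splitlines()]
    exception_type = lines[0].split(":", 1)[0]
    # Keep the top three frames, with ":123)" line numbers stripped.
    frames = [re.sub(r":\d+\)", ")", line)
              for line in lines[1:] if line.startswith("at ")][:3]
    fingerprint = "\n".join([exception_type] + frames)
    return hashlib.sha1(fingerprint.encode("utf-8")).hexdigest()[:12]


def group_by_bug(reports):
    """Group reports so that each key corresponds to one (presumed) bug."""
    bugs = defaultdict(list)
    for report in reports:
        bugs[signature(report["trace"])].append(report)
    return bugs


def crashes_by_android_version(reports):
    """Count crash reports per Android version (one dimension of the cube)."""
    return Counter(report["android_version"] for report in reports)


# Made-up reports, for illustration only.
REPORTS = [
    {"android_version": "2.2",
     "trace": "java.lang.NullPointerException: deck is null\n"
              "    at com.example.Reviewer.show(Reviewer.java:120)"},
    {"android_version": "2.3",
     "trace": "java.lang.NullPointerException: card is null\n"
              "    at com.example.Reviewer.show(Reviewer.java:125)"},
    {"android_version": "2.2",
     "trace": "java.lang.IllegalStateException\n"
              "    at com.example.Deck.open(Deck.java:42)"},
]

if __name__ == "__main__":
    for bug_id, occurrences in group_by_bug(REPORTS).items():
        print(bug_id, len(occurrences))
    print(crashes_by_android_version(REPORTS))
```

The first two sample reports differ only in message and line number, so they land in the same group, while the third becomes a separate bug.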

Friday, February 4, 2011

Alfresco: Categories vs. Spaces

Managing huge amounts of documents requires knowing the limits of the ECM software you are using. Here is a study I performed on the limits of Alfresco and the best strategies for working within them.

Categories vs. Spaces

In Alfresco, documents are usually organized hierarchically in spaces (a kind of folder). But how about using Alfresco's "categories" feature instead of spaces?

Note: For all graphs in this article, horizontal axis = number of documents, vertical axis = time taken in milliseconds

This graph shows the time taken to show a space, based on the number of documents this space contains, and the same for a category. No sub-categories or sub-spaces are involved.
For the same number of documents, categories display faster than spaces. That is especially true above 100 documents. In a space, time is proportional to the number of documents. In a category, time grows roughly logarithmically.
For huge numbers of documents, categories display in less than 3 seconds, whereas spaces take a very long time, only to end up showing Java errors related to a shortage of memory.

Impact of the spaces hierarchy on performance

To measure the impact of hierarchy, I compared two strategies for organizing files into spaces:
(1) All files in about five spaces.
(2) Each file contained in its own 3 levels of sub-spaces (subspace1/subspace2/subspace3/file).

This graph shows the time taken by Alfresco's explorer to show a category, based on the number of documents that are shown.
Surprisingly, having the files scattered across a lot of different folders is more efficient.
Alfresco seems to have difficulties handling many files in the same space.
This has to be taken into account when analyzing the category performance tests: they use strategy (1), the slower one.

Impact of the number of categories applied

This graph shows the time taken by Alfresco's explorer to show a category, based on the number of documents in this category.
Two tests were done, with different numbers of categories per document.
As one would expect, display is faster when each document has 3 categories than when each document has 10 categories.

Impact of the size of the repository on performance

This graph shows the time taken by Alfresco's explorer to show a category, based on the number of categorized documents in the repository.
Each document has 3 categories randomly selected from a pool of 20 existing categories.
The different curves show different usages of the explorer:
- Navigation in the categories tree view, with subcategories inclusion checked/unchecked.
- With or without 10000 additional uncategorized documents.
- First click or after three clicks (to measure cache performance)
- Search in a root category (Software Document Classification) including subcategories
- Search in a leaf category (Configuration Description)

Display time seems to be proportional to the number of documents in the repository at first, and then becomes more stable after 6000 documents.
Cached requests don't take more than 3 seconds, even with a repository of 160000 documents, which means a result set of 24000 documents (each document carries 3 of the 20 categories, so a given category matches about 3/20 = 15% of the repository).
In contrast, search time grows steadily with the size of the repository.
The light blue curve's values are surprising and might be an artifact of the state of the database during the measurement. Usual values are expected to be closer to the brown curve.

Method

Using Google Chromium and its Speed Tracer extension, I measured the time between the DOM click and the end of the processing (excluding repaints that occur after the page is shown completely).

Conditions:
- Alfresco Enterprise 3.2 with a maximum heap size of 500MB
- Ubuntu 9.10 (Karmic Koala) with Sun Java HotSpot 1.6.0_15
- Laptop with Intel Core 2 Duo T9600 2.8GHz and 4GB RAM

Notes:
Empty categories are shown in about 600 milliseconds. But once, with 10000 categorized documents plus 10000 uncategorized documents, a particular empty category took 6 seconds to load, consistently.
Even for a well-defined operation, performance is not very predictable.

Conclusion

Categories display faster than spaces in the Alfresco explorer, especially when they contain large numbers of documents.
On huge repositories, performance is slow at first but gets better once requests have been cached: most pages then take less than 3 seconds to load.

Depending on the requirements, the performance of categories might be deemed acceptable: there is no bottleneck or operation that takes more than 10 seconds.

However, some features are not available to someone who would use Alfresco's “Categories” tree view exclusively, for instance:
- No permissions settings based on categories.
- No content rules settings based on categories.
- No "Add content" button when browsing categories.

Monday, January 24, 2011

Just passed the Alfresco accreditation

Back in 2009, I was in Milan designing the future Alfresco accreditation tests. Those tests will be offered to anyone, starting from summer 2011.

Because my company Aegif is an Alfresco partner, we have just been subjected to the test! Some of the questions were written by me (what? unfair?), but all in all it was a bit more difficult than I expected. There are 137 questions (some multiple choice, some multiple response) to answer in 60 minutes. Some questions are very specific (which file does what) and some more general (which feature is not available).

Now I can add "Alfresco Recognized Developer" to my titles ;-)
More importantly, my company becomes an "Alfresco Recognized Partner".

Wednesday, January 12, 2011

Automatically deploy Alfresco WCM content to an FTP server

In Alfresco WCM, deploying means generating "baked" web content from XML content and templates, into a local FSR directory. Here is how to take this further and also deploy the content to your web server via FTP automatically.

1) Make sure you have both Alfresco WCM and an FSR (now called File System Deployment Target) installed and working.
2) Edit the Deployment Server's deployment/default-target.xml file and add a "postCommit" section linking to a postcommit script.
3) Create the postcommit script, calling the lftp tool in mirror mode.
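The original example script is not reproduced here, but as a rough sketch, a postcommit script written in Python could wrap lftp's reverse mirror (upload) mode as shown below. The FSR directory, remote directory, host name and credentials are hypothetical placeholders, and the sketch assumes paths without spaces, since they are embedded in an lftp command string.

```python
import subprocess


def build_lftp_mirror_command(local_dir, remote_dir, host, user, password):
    """Return the lftp command line that uploads local_dir to remote_dir."""
    # "mirror --reverse" uploads (local to remote) instead of downloading.
    script = f"mirror --reverse {local_dir} {remote_dir}; quit"
    return ["lftp", "-u", f"{user},{password}", "-e", script, host]


def deploy(local_dir, remote_dir, host, user, password):
    """Run the upload; raises CalledProcessError if lftp exits with an error."""
    subprocess.run(
        build_lftp_mirror_command(local_dir, remote_dir, host, user, password),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical values: replace with your FSR target directory and FTP account.
    print(build_lftp_mirror_command(
        "/opt/fsr/www", "/public_html", "ftp.example.com", "deploy", "secret"))
```

Calling `deploy(...)` with the same arguments performs the actual transfer; keeping the command construction separate makes it easy to print and check the command before letting the Deployment Server run it.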

Unfortunately, the default Alfresco Deployment server does not report anything about the script's activity and potential errors. To see or log messages, please download Alfresco's source code, modify ProgramRunnable.java like this, recompile, overwrite alfresco-deployment-3.3.2.jar with the one you just generated, and then restart the server.
