Tuesday, November 16, 2010

Aggregation for Pentaho/InfoBright

Infobright is a column-oriented database, so it is efficient for Business Intelligence (InfiniDB is worth checking too). But my current BI project has 24 dimensions so I also needed aggregates to reach a good level of performance.

Infobright is usually more or less supported by most of the Pentaho suite, but sure not by the Aggregation Designer. I must be one of the first persons who tried, I have found many bugs and workarounds. It was a tough ride, but finally I have found a way to make it work, so here it is!

1) Open PAD and design your aggregate the usual way. Ignore the primary key errors.
2) When you're done, in "Export and Publish", execute the DDL.
3) Try executing the DML, but chances are it will fail. This is probably because Infobright does not support INSERT well.
4) So instead, click on "Preview" and copy-paste the DML SQL code.
5) Open PDI and create a "Table Input" step with the copied SQL's select-from-groupby portion.
6) Connect it to an Infobright Loader step that will write the data into the table created by the DDL.
7) Run the transformation. You can use cron and pan.sh to run it automatically every night.
8) Back to PAD's "Export and Publish", you would normally publish your updated schema, but it results in a NullPointerException.
9) So, export your updated schema
10) Open it with Schema Workbench, ignore the primary key errors, and publish from there.
11) That's all!

Tested with: pad-ce-1.2.1.RC1 biserver-ce-3.7.0.RC1 pdi-ce-4.1.0-RC1 infobright-3.4.2-0

1 comment:

  1. Hi Nicolas,

    I'm glad to see you were able to integrate Pentaho's Aggregation Designer. How has the performance measured for you thus far?

    I appreciate you working to resolve all the issues and being active in our community forums. Please feel free to reach out to me if you have any questions, comments, or suggestions.


    Jeff Kibler
    Community Manager