Tech Talk #10

Jaspersoft ETL Architecture

Alan Kavanagh & Ernesto Ongaro

Agenda


  • ETL Basics
  • Jaspersoft ETL Editions
  • Architecture
  • Installation
  • The Job Designer (Talend Studio)
  • The Admin Center (TAC)
  • Q&A

ETL Basics




[E]xtract 
[T]ransform
[L]oad 
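
The three steps can be sketched in a few lines of Python. This is only an illustration, not how Jaspersoft ETL implements a job: the source rows, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical source rows, standing in for an OLTP system.
SOURCE_ROWS = [
    {"order_id": 1, "qty": 2, "unit_price": 9.50},
    {"order_id": 2, "qty": 1, "unit_price": 20.00},
]

def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Transform: derive a sale amount for each row."""
    return [(r["order_id"], r["qty"] * r["unit_price"]) for r in rows]

def load(conn, rows):
    """Load: write the transformed rows to the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, sale_amt REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Run the three steps in order."""
    load(conn, transform(extract()))

if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    run_etl(target)
    print(target.execute("SELECT * FROM sales ORDER BY order_id").fetchall())
```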

Variations

  • [E][L] - replica of source systems
  • [E][L][T] - transformations occur in database
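
For the variation where transformations occur in the database, a minimal sketch might look like the following. The table names (`staging_orders`, `sales`) and the function name are assumptions made for illustration, and SQLite stands in for whatever target database is actually used.

```python
import sqlite3

def run_elt(conn, source_rows):
    """Load raw rows first, then let the database do the transform."""
    # Extract + Load: copy source rows unchanged into a staging table.
    conn.execute(
        "CREATE TABLE staging_orders (order_id INTEGER, qty INTEGER, unit_price REAL)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", source_rows)
    # Transform: the database derives the reporting table with SQL.
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT order_id, qty * unit_price AS sale_amt FROM staging_orders"
    )
    conn.commit()
    return conn.execute(
        "SELECT order_id, sale_amt FROM sales ORDER BY order_id"
    ).fetchall()
```

The staging table is also a replica of the source rows, so stopping after the load step gives the plain extract-and-load variation.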

Why ETL?

  • Speed - Source systems are usually optimized for writing small bits of data at once (OLTP)
  • Complexity - reporting and analytics from source data is usually too complex
  • Join multiple systems - correlate between sales and marketing for example
  • Data Quality - Take machine data and turn it into business data 

Types of Transforms

  • Selecting only columns you want
  • Translating values (1 = male, 2 = female)
  • Deriving calculated values (qty * unit_price = sale_amt)
  • Joining multiple systems
  • Aggregations (sales per day)
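
The transform types above can be sketched together in plain Python. The field names (`sale_date`, `gender_code`, `customer_id`, and so on) and the campaign lookup are hypothetical; in Jaspersoft ETL these steps would be built from components in the job designer rather than written by hand.

```python
from collections import defaultdict

GENDER_CODES = {1: "male", 2: "female"}  # value translation table

def transform(sales_rows, campaign_by_customer):
    """Select columns, translate codes, derive values, join a second system."""
    out = []
    for row in sales_rows:
        out.append({
            # Selecting only the columns we want (other fields are dropped).
            "sale_date": row["sale_date"],
            # Translating values.
            "gender": GENDER_CODES.get(row["gender_code"], "unknown"),
            # Deriving a calculated value.
            "sale_amt": row["qty"] * row["unit_price"],
            # Joining a second system (a marketing lookup keyed by customer).
            "campaign": campaign_by_customer.get(row["customer_id"]),
        })
    return out

def sales_per_day(rows):
    """Aggregation: total sale amount per day."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["sale_date"]] += row["sale_amt"]
    return dict(totals)
```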


Data Quality:
  • Duplicates
  • Missing fields
  • Standardization
  • Linking - fuzzy logic
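
A rough sketch of these four data quality checks, using only the Python standard library. The record fields (`name`, `email`), the choice of email as the duplicate key, and the 0.85 similarity threshold are assumptions for illustration; `difflib.SequenceMatcher` is used here as a simple stand-in for fuzzy matching, not as the technique the product uses.

```python
from difflib import SequenceMatcher

def standardize(record):
    """Standardization: normalize whitespace and letter case."""
    return {
        "name": " ".join((record.get("name") or "").split()).title(),
        "email": (record.get("email") or "").strip().lower(),
    }

def has_missing_fields(record, required=("name", "email")):
    """Missing fields: true if any required field is absent or empty."""
    return any(not record.get(field) for field in required)

def deduplicate(records):
    """Duplicates: keep the first record seen for each email."""
    seen, unique = set(), []
    for record in records:
        if record["email"] not in seen:
            seen.add(record["email"])
            unique.append(record)
    return unique

def is_fuzzy_match(a, b, threshold=0.85):
    """Linking: treat two names as the same entity if they are similar enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
```

Standardizing before deduplicating matters: "Jane@Example.COM" and "jane@example.com" only collapse into one record once both are lower-cased.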

Community vs Commercial


Biggest difference:
Once you design a job in the community edition, you have to export it as a JAR, and from there you're on your own: scheduling, failure handling, etc. are all up to you.

Commercial edition comes with a web app to manage all this.
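
To give a feel for what handling this yourself means in the community edition, here is a minimal sketch of a retry wrapper that an external scheduler such as cron could call. It is an assumption-laden illustration: the command passed in is whatever launches your exported job, and the stand-in command below just runs a trivial Python one-liner.

```python
import subprocess
import sys
import time

def run_with_retries(command, attempts=3, delay_seconds=0.0):
    """Run a command, retrying on a non-zero exit code.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        time.sleep(delay_seconds)
    raise RuntimeError(f"job failed after {attempts} attempts: {command}")

if __name__ == "__main__":
    # Stand-in for the real job launcher; replace with your exported job's command.
    stand_in_job = [sys.executable, "-c", "print('job ran')"]
    print("succeeded on attempt", run_with_retries(stand_in_job))
```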

There are also some differences in the designer...

(Studio)


  • CDC (Change Data Capture)
  • Data Viewer
  • Versioning/Shared Repo
  • Metadata wizards
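
As a conceptual illustration of the idea behind Change Data Capture (moving only what changed since the last run), the sketch below compares two snapshots keyed by primary key. This is an assumption for teaching purposes, not the mechanism the Studio's CDC feature uses; the function and field names are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two {primary_key: row} snapshots and report what changed."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return {"insert": inserts, "update": updates, "delete": deletes}
```

Only the rows in the returned dictionaries would need to be sent to the target, instead of reloading the full table.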


Architecture: lots of moving pieces!


The pieces:

  • Studio: Desktop application for designing jobs (analogous to iReport)
  • Admin Center: J2EE Web App for managing jobs, users (analogous to JasperReports Server)
    • CmdLine: generates and deploys processes to a JobServer
    • Database: Like our repo, database for internals
  • SVN Server: Code is checked in and out here automatically by Studio and Admin Center
  • JobServer(s): Run the actual ETL jobs




Installation

Like JasperReports Server


  • Bundled install with Tomcat + H2, JobServer and CommandLine

OR

  • Studio, AMC, JobServer, Admin Center downloaded separately

Link to install notes

Jaspersoft ETL Demo



Job Designer
Start Job Server + Command Line
Job workflows (publish)

Q&A




thank you!

Upcoming topics:
  • March 19 (GMT-7) OLAP vs Domains
  • March 26 (GMT) Linux Installation Tips


http://www.jaspersoft.com/external/jaspersoft-tech-talks
