Analyzing 100 million rows with Amazon Redshift


Ernesto Ongaro
June 2014

Agenda: 

9:30 - 10:00 am: Welcome breakfast and introduction
10:00 - 10:30 am: Analyzing 100 million rows of data in seconds [live demo]
10:30 - 11:00 am: Jaspersoft and the Business Intelligence market today
11:00 - 11:45 am: What's new in Jaspersoft version 5.6 [live demo]
11:45 am: Close of the event



What is Amazon Redshift?




  • Cloud hosted BI data-warehouse
  • Billed by the hour
  • Scales out
  • Performant
  • 100% compatible with Jaspersoft

Redshift Performance

[Bar chart: axis from 0 to 550; bars for redshift, impala - disk, impala - mem, shark - disk, shark - mem, hive 0.10]
Aggregation query with 253 million groups against Hadoop Hive, Impala, Shark

source

Redshift Price




Effective Price per TB per Year

Size          On-Demand   1yr RI    3yr RI
dw1.xlarge    $4,161      $2,739    $1,245
dw1.8xlarge   $4,161      $2,739    $1,245
dw2.large     $16,425     $10,710   $6,593
dw2.8xlarge   $19,163     $13,242   $6,593

source
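
To make the price comparison concrete, here is a minimal sketch that computes how much cheaper each reserved-instance (RI) option is than on-demand, using only the figures from the table above. The dictionary layout and the function names (`discount_pct`, `savings_table`) are illustrative assumptions, not part of any Amazon tooling.

```python
# Effective price per TB per year in dollars, copied from the table above.
PRICES = {
    "dw1.xlarge":  {"on_demand": 4161,  "ri_1yr": 2739,  "ri_3yr": 1245},
    "dw1.8xlarge": {"on_demand": 4161,  "ri_1yr": 2739,  "ri_3yr": 1245},
    "dw2.large":   {"on_demand": 16425, "ri_1yr": 10710, "ri_3yr": 6593},
    "dw2.8xlarge": {"on_demand": 19163, "ri_1yr": 13242, "ri_3yr": 6593},
}


def discount_pct(on_demand, reserved):
    """Percentage saved by the reserved price relative to on-demand."""
    return round(100 * (on_demand - reserved) / on_demand, 1)


def savings_table(prices=PRICES):
    """Map each node size to its 1-year and 3-year RI discount in percent."""
    return {
        size: {
            "ri_1yr": discount_pct(p["on_demand"], p["ri_1yr"]),
            "ri_3yr": discount_pct(p["on_demand"], p["ri_3yr"]),
        }
        for size, p in prices.items()
    }


if __name__ == "__main__":
    for size, saved in savings_table().items():
        print(f"{size:12s} 1yr RI: {saved['ri_1yr']}%  3yr RI: {saved['ri_3yr']}%")
```

With these figures, a 3-year reservation on the dw1 node types works out to roughly a 70% saving over on-demand.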

Demo Data


The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008.


 ~105 million records
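
As a rough idea of the kind of aggregation the demo runs over this data, the sketch below averages arrival and departure delays per month for one carrier. It is not the query used in the live demo: the table name `flights`, the columns `carrier` and `month`, and the sample rows are hypothetical, and an in-memory SQLite database stands in for Redshift so the example runs anywhere. Only the `arrdelay` and `depdelay` names come from the demo chart.

```python
import sqlite3

# Hypothetical schema: flights(carrier, month, arrdelay, depdelay).
QUERY = """
SELECT month,
       AVG(arrdelay) AS avg_arrdelay,
       AVG(depdelay) AS avg_depdelay
FROM flights
WHERE carrier = ?
GROUP BY month
ORDER BY month
"""


def make_sample_db():
    """Build a tiny in-memory table with made-up rows (not real flight data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights "
        "(carrier TEXT, month INTEGER, arrdelay REAL, depdelay REAL)"
    )
    rows = [
        ("XX", 9, -4.0, 1.0),
        ("XX", 9, 2.0, 0.0),
        ("XX", 10, 5.0, 3.0),
        ("YY", 9, 10.0, 8.0),
    ]
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", rows)
    return conn


def average_delays(conn, carrier):
    """Return (month, avg_arrdelay, avg_depdelay) tuples for one carrier."""
    return conn.execute(QUERY, (carrier,)).fetchall()


if __name__ == "__main__":
    for month, arr, dep in average_delays(make_sample_db(), "XX"):
        print(month, arr, dep)
```

Against a real cluster the same SQL would go through a PostgreSQL-compatible driver, where the parameter placeholder syntax may differ from SQLite's `?`.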

Demo time...

[Chart: arrdelay and depdelay plotted over 1-12; tooltip: Frontier Airlines Inc., 9, arrdelay: -0.9036237471087124, depdelay: 0.43]




Questions?



Thank you!
@not_a_poet

Analyze 100 million Rows with Amazon Redshift

By ernestoo