Lazy Pandas

PyOhio 2013
Lightning Talk

Ron DuPlain
ron.duplain@gmail.com



Motivation

  1. Preprocess some dataset into one or more tables.
  2. Load the data into pandas.
  3. ... without aggressively consuming memory.

Too large to fit into memory?


  1. Use a query to pre-filter data.
  2. Watch memory consumption in the middleware between the data store and pandas!
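A minimal sketch of both steps. It assumes the data sits in a SQLite table and uses a recent pandas release, newer than the v0.12.0 this talk targets. The table `readings`, its columns, and the helper `load_filtered` are hypothetical. The query does the pre-filtering inside the database. `chunksize` makes pandas fetch rows from the cursor a few at a time, so the whole result never exists as Python tuples at once:

```python
import sqlite3

import pandas as pd


def load_filtered(con, min_value, chunksize=1000):
    """Pre-filter rows in SQL, then build a DataFrame from bounded chunks."""
    # The WHERE clause runs inside the database, so unwanted rows never
    # reach Python. Table and column names here are hypothetical.
    query = "SELECT id, value FROM readings WHERE value >= ? ORDER BY id"
    # With chunksize set, read_sql_query returns an iterator of DataFrames,
    # fetching at most `chunksize` rows from the cursor per step.
    chunks = pd.read_sql_query(query, con, params=(min_value,), chunksize=chunksize)
    frames = list(chunks)
    if not frames:
        # No matching rows: return an empty frame with the expected columns.
        return pd.DataFrame(columns=["id", "value"])
    return pd.concat(frames, ignore_index=True)


# Usage with an in-memory SQLite database standing in for the real data store.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (id INTEGER, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?)", [(i, float(i)) for i in range(10)])
df = load_filtered(con, min_value=7.0, chunksize=2)
print(df)  # three rows: id 7, 8, 9
```

The final `pd.concat` still builds the full filtered result. The saving comes from never holding every row as a Python object, which is where row-at-a-time middleware tends to spend memory.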

Obstacle


I have tried ...


  • Solr's CSV response writer, fetched with pysolr,
    a direct urlopen, or requests.
  • Direct SQLAlchemy query iteration.


... and ended up consuming 2GB of memory
from a 40MB CSV file on disk.

(SQLAlchemy is still awesome!)
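When the source is a CSV file, one way to avoid this kind of blow-up is to let pandas' own parser read the file in fixed-size chunks and discard unneeded rows and columns as it goes. This is a sketch, not the approach from the talk. It assumes a recent pandas release, and the helper `read_csv_filtered`, the column names, and the `keep` predicate are hypothetical:

```python
import io

import pandas as pd


def read_csv_filtered(source, keep, usecols, dtype, chunksize=10_000):
    """Read a CSV in fixed-size chunks, keeping only rows where keep(chunk) is True."""
    kept = []
    # Each iteration parses at most `chunksize` rows; usecols drops unneeded
    # columns and dtype stores the rest in compact NumPy types.
    with pd.read_csv(source, usecols=usecols, dtype=dtype, chunksize=chunksize) as reader:
        for chunk in reader:
            # Filter before keeping; discarded rows are freed with the chunk.
            kept.append(chunk[keep(chunk)])
    if not kept:
        # No data rows: return an empty frame with the requested columns.
        return pd.DataFrame(columns=usecols)
    return pd.concat(kept, ignore_index=True)


# Usage with an in-memory CSV standing in for a file on disk.
data = io.StringIO("id,value,note\n1,0.5,a\n2,3.0,b\n3,7.5,c\n")
df = read_csv_filtered(
    data,
    keep=lambda c: c["value"] > 1.0,
    usecols=["id", "value"],
    dtype={"id": "int64", "value": "float64"},
    chunksize=2,
)
print(df)  # keeps the rows with id 2 and 3
```

This bounds the parsing overhead, not the size of the output: the filtered result is still held in full once `pd.concat` returns.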


Code
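A minimal sketch of lazy loading from any row iterator, for example rows yielded by a database cursor or a streamed HTTP response. It is not necessarily the approach in the talk. It assumes a recent pandas release, and `frame_from_rows` and `generate_rows` are hypothetical names. Rows are pulled in batches and converted to columnar storage immediately, so the full dataset never exists as a list of Python tuples:

```python
from itertools import islice

import pandas as pd


def frame_from_rows(rows, columns, dtypes, batch_size=5000):
    """Build a DataFrame from an iterator of row tuples, one batch at a time."""
    rows = iter(rows)
    frames = []
    while True:
        # Pull at most batch_size rows from the (possibly lazy) source.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        # Convert the batch to typed columnar storage right away, so only
        # about batch_size rows exist as Python tuples at any moment.
        frames.append(pd.DataFrame.from_records(batch, columns=columns).astype(dtypes))
    if not frames:
        # Empty source: return an empty frame with the requested dtypes.
        return pd.DataFrame(columns=columns).astype(dtypes)
    return pd.concat(frames, ignore_index=True)


def generate_rows(n):
    # Hypothetical lazy source standing in for a cursor or an HTTP stream.
    for i in range(n):
        yield (i, i * 0.5)


df = frame_from_rows(
    generate_rows(12),
    columns=["id", "value"],
    dtypes={"id": "int64", "value": "float64"},
    batch_size=5,
)
print(df)
```

`batch_size` is the main knob: smaller batches cap the Python-object overhead more tightly, while larger batches mean fewer intermediate frames for the final `pd.concat`.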






An experiment in lazily loading data into a Python pandas v0.12.0 DataFrame.