
I think you just left out the most important bit of the story... What exactly did you do in order to reduce it from 25 hours to 30 minutes?


Quick steps:

* Create new table on disk.

* Create temp table in RAM.

* Start chunked import.

* Import 50k rows into temp table.

* Dump temp table to new table.

* Clear temp table.

* Import next 50k rows.

* Repeat.

* Drop old table, rename new table on disk to regular name.

Each row being imported was updating the indexes. Just removing the indexes and then importing did speed it up, but not as dramatically. Importing each chunk into the RAM table first, then dumping it to the final table, was the key.
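
For anyone who wants to see the shape of it, here is a rough sketch of the steps above. It uses Python's built-in sqlite3 module only as a stand-in so it runs anywhere; it is not the database or code from the original job. The table names (items, items_new, staging), the two columns, and the staged_import helper are all made up for illustration, and index handling is left out because that depends on your schema.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def staged_import(conn, rows, chunk_size=50_000):
    """Load `rows` into a fresh table via an in-memory staging table.

    Hypothetical schema: items(id INTEGER, name TEXT). Call this with no
    transaction open on `conn`.
    """
    cur = conn.cursor()
    # Assumption: SQLite stand-in. This keeps TEMP tables in RAM.
    cur.execute("PRAGMA temp_store = MEMORY")

    # Create the new table on disk and the temp table in RAM.
    cur.execute("DROP TABLE IF EXISTS items_new")
    cur.execute("CREATE TABLE items_new (id INTEGER, name TEXT)")
    cur.execute("CREATE TEMP TABLE staging (id INTEGER, name TEXT)")

    for chunk in chunked(rows, chunk_size):
        # Import one chunk into the temp table.
        cur.executemany("INSERT INTO staging VALUES (?, ?)", chunk)
        # Dump the temp table to the new table in one statement.
        cur.execute("INSERT INTO items_new SELECT id, name FROM staging")
        # Clear the temp table before the next chunk.
        cur.execute("DELETE FROM staging")
        conn.commit()

    # Drop the old table and rename the new one to the regular name.
    cur.execute("DROP TABLE staging")
    cur.execute("DROP TABLE IF EXISTS items")
    cur.execute("ALTER TABLE items_new RENAME TO items")
    conn.commit()
```

You would call it with something like staged_import(conn, row_iterator), where row_iterator yields (id, name) tuples. How much this helps will depend heavily on the database engine and the indexes involved, so measure it against a plain bulk load before committing to it.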

I'd indicated early on "let's try temp memory tables". It was dismissed ("we tried that" and also "what's that?"). So I did my own tests, and it was pretty dramatic.



