Hadoop Worker Node Configuration Recommendations (1-2-10)

If you know how many variables are at play in a "typical" Hadoop cluster (including which components you use and which use cases matter most to you), it is easy to see why so few node sizing guides have been published.  That said, I'll go out on a limb and offer my personal sizing guide for worker nodes.

Spec out your worker nodes with multiples of the following logical building block.

1 Hard Drive (1-4TB in size)  –  2 CPU Cores  –  10 GB of RAM

With that approach, here are some typical worker node configurations based on these variables.

T-Shirt Size   Drives   Cores   Memory
Small          4        8       48GB
Medium         8        16      96GB
Large          12       24      128GB
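
As a rough sketch, the 1-2-10 building block can be turned into a small calculation. The names here (NodeSpec, worker_node_spec) are made up for illustration and don't come from any Hadoop tooling. Note that the raw multiples come out to 40, 80, and 120 GB of RAM, a little under the memory figures in the table. My assumption is that the table rounds up to memory sizes you can actually order, so treat the output as a floor rather than a purchase spec.

```python
from dataclasses import dataclass

# The 1-2-10 building block from the text: 1 drive, 2 cores, 10 GB of RAM.
DRIVES_PER_BLOCK = 1
CORES_PER_BLOCK = 2
RAM_GB_PER_BLOCK = 10


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical container for a worker node's raw sizing."""
    drives: int
    cores: int
    ram_gb: int


def worker_node_spec(blocks: int) -> NodeSpec:
    """Return the raw 1-2-10 multiple for the given number of building blocks."""
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    return NodeSpec(
        drives=blocks * DRIVES_PER_BLOCK,
        cores=blocks * CORES_PER_BLOCK,
        ram_gb=blocks * RAM_GB_PER_BLOCK,
    )


if __name__ == "__main__":
    # Block counts match the drive counts of the three T-shirt sizes.
    for name, blocks in [("Small", 4), ("Medium", 8), ("Large", 12)]:
        spec = worker_node_spec(blocks)
        print(f"{name}: {spec.drives} drives, {spec.cores} cores, "
              f"{spec.ram_gb} GB RAM (raw multiple)")
```

Since each block carries exactly one drive, the number of blocks is simply the number of drives you plan to put in the box.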

That said, what I see a LOT of is a box with twelve 2TB drives, dual 8-core processors, and 128GB of RAM.

What about master nodes?  Well... keep it simple and order the same boxes.  If you happen to have the 12-disk setup, pull out about 6 of the drives and keep them on hand for the inevitable failures that will occur across your cluster!