Site Map
All blog posts; most recent first.
Blog Posts
-
moving my tech blog (already missing confluence)
created by
Dec 18, 2020
-
hive delta file compaction (minor and major)
created by
Dec 23, 2019
-
hive acid transactions with partitions (a behind the scenes perspective)
created by
Dec 22, 2019
-
viewing the content of ORC files (using the Java ORC tool jar)
created by
Dec 12, 2019
-
topology supervision features of streaming frameworks (or lack thereof)
created by
Mar 21, 2019
-
are partially-written hdfs files accessible? (not exactly, but much more yes than I previously thought)
created by
Mar 21, 2019
-
use spark to calculate salary statistics for georgia educators (the fourth book of the trilogy)
created by
Mar 09, 2019
-
building a c# storm topology (yes, it is a jvm-based framework)
created by
Jun 23, 2018
-
learning something new every day (seems hdfs is not as immutable as i thought)
created by
Oct 03, 2017
-
accepting "best answer" on hcc (it is the ~right~ thing to do)
created by
Apr 10, 2017
-
NoClassDefFoundError for Log4jLoggerFactory on hdp 2.5.3 when running the KafkaSpout in your topology? (how's that for a title?)
created by
Mar 10, 2017
-
initial hbase grants on a new secure hadoop cluster (without ranger)
created by
Mar 04, 2017
-
opening up a port on centos 7 firewall (using firewalld)
created by
Mar 02, 2017
-
my talk at devnexus (links to video and preso)
created by
Feb 26, 2017
-
why you should be a supporting "member" of oss (hey, it works for npr)
created by
Dec 28, 2016
-
how to counteract ageism? (maintain relevance!)
created by
Dec 28, 2016
-
don't be in such a hurry to change your job (the suck continuum explained)
created by
Oct 20, 2016
-
the agile manifesto (it is still a good idea)
created by
Aug 12, 2016
-
joining multiple datasets with pig (i/o courtesy of hcatloader & hcatstorer)
created by
Aug 08, 2016
-
storing dynamically created file names with pig (piggybank's multistorage to the rescue)
created by
Jul 26, 2016
-
unboxing my new little box (my first intel nuc)
created by
Jun 01, 2016
-
why spark's mapPartitions transformation is faster than map (calls your function once/partition, not once/element)
created by
May 19, 2016
-
performing a non-root ambari install (with hortonworks admin 1 course)
created by
Apr 19, 2016
-
need an overview of hadoop? (i need some reviewers)
created by
Mar 17, 2016
-
some "mandatory" training really is MANDATORY (termination threat is a clue)
created by
Feb 04, 2016
-
novel thoughts on national pride (ask not...)
created by
Jan 14, 2016
-
never try (well... if your employer doesn't value you or your opinion)
created by
Jan 12, 2016
-
authoring presentations with markdown (deckset gets you pretty far)
created by
Dec 09, 2015
-
viewing diffs between powerpoint decks (with a little help from adobe)
created by
Oct 16, 2015
-
transitioned to training (and loving it)
created by
Oct 15, 2015
-
hadoop mount points (more art than science)
created by
Sept 02, 2015
-
trying out hive testbench on hdp sandbox (and packaging it up for deployment elsewhere)
created by
Jul 07, 2015
-
installing hdp 2.2 with ambari 2.0 (moving to the amazon cloud)
created by
Jun 30, 2015
-
presenting at hadoop summit (archiving evolving databases in hive)
created by
Jun 11, 2015
-
installing hdp 2.2 with ambari 2.0 (moving to the azure cloud)
created by
May 06, 2015
-
connecting dbvisualizer to hive (running on hdp 2.2)
created by
Apr 10, 2015
-
got a hadoop question? (then ask lester!)
created by
Mar 28, 2015
-
took the pig/hive test (got a shiny new certificate)
created by
Mar 28, 2015
-
declaring work is beneath you (probably not the best course of action)
created by
Feb 25, 2015
-
os patching your hadoop cluster (pre & post rolling upgrades)
created by
Feb 13, 2015
-
parameterizing mapred.* properties (cli vs oozie)
created by
Feb 05, 2015
-
hadoop mini smoke test (VERY mini)
created by
Jan 10, 2015
-
a lightning quick tutorial on pdsh (for when you need to run the same command on many machines)
created by
Jan 09, 2015
-
help me go to belgium (not asking for money, just a couple of votes)
created by
Jan 08, 2015
-
improving datanode resiliency (it's all about the settings)
created by
Dec 02, 2014
-
simple hadoop cluster user provisioning process (simple = w/o pam or kerberos)
created by
Nov 24, 2014
-
hadoop worker node configuration recommendations (1-2-10)
created by
Nov 13, 2014
-
a patent for the "idea" of ingesting data into hadoop (is it really "sponge worthy"?)
created by
Oct 17, 2014
-
changes to hive's decimal datatype (it could cost you lots of pennies)
created by
Oct 15, 2014
-
stinger.next to the rescue (but you do have stinger.NOW tuning options available, well, "now")
created by
Oct 10, 2014
-
a robust set of hadoop master nodes (it is hard to swing it with two machines)
created by
Sept 15, 2014
-
obtained hortonworks' apache hadoop administrator certification (finally)
created by
Sept 05, 2014
-
hadoop security (it's all about layering)
created by
Aug 25, 2014
-
hadoop superuser (you can have more than 'hdfs')
created by
Aug 13, 2014
-
hadoop demystified presentation (with atlanta's .net user group)
created by
Jul 30, 2014
-
hadoop streaming with .net map reduce api (executing on hdp for windows)
created by
Jul 28, 2014
-
installing hdp on windows (and then running something on it)
created by
Jul 23, 2014
-
small files and hadoop's hdfs (bonus: an inode formula)
created by
Jul 11, 2014
-
volvo thinks like me (they've got two of my mantras covered)
created by
Jun 17, 2014
-
setting up hdp 2.1 with non-standard users for hadoop services (why not use a non-standard user for ambari, too)
created by
May 07, 2014
-
using your mac to install a virtualized hadoop cluster? (then setup a local repo on it)
created by
May 06, 2014
-
use hive to calculate salary statistics for georgia educators (third of a three-part series)
created by
Apr 30, 2014
-
use pig to calculate salary statistics for georgia educators (second of a three-part series)
created by
Apr 30, 2014
-
use mapreduce to calculate salary statistics for georgia educators (first of a three-part series)
created by
Apr 30, 2014
-
hadoop component versions by distributions (the open source ones)
created by
Apr 29, 2014
-
feeling a bit prolific (or maybe i'm just a smart aleck)
created by
Apr 14, 2014
-
manually installing hue (on my virtualized 5-node cluster)
created by
Apr 08, 2014
-
create and share a hive udf (the cli is your friend)
created by
Mar 29, 2014
-
create and share a pig udf (anyone can do it)
created by
Mar 29, 2014
-
stopping oozie from limiting the number of reducers on your hive action (just add some more xml)
created by
Mar 20, 2014
-
how do i load a fixed-width formatted file into hive? (with a little help from pig)
created by
Mar 06, 2014
-
visiting the computer history museum (yes, i'm a geek)
created by
Mar 01, 2014
-
confluence column width hack (where have you been!?!?)
created by
Jan 30, 2014
-
what's after the hortonworks sandbox? (a 5-node cluster!)
created by
Jan 20, 2014
-
building a virtualized 5-node HDP 2.0 cluster (all within a mac)
created by
Jan 19, 2014
-
disruptive possibilities (the rise of platform architecture)
created by
Dec 30, 2013
-
foxtrot and java (it still cracks me up)
created by
Dec 18, 2013
-
hadoop world 2013 (reflections from nyc)
created by
Nov 05, 2013
-
sometimes the pig walks to slaughter because he knows it is better for the farmer (or the team)
created by
Oct 24, 2013
-
too big to ignore (too boring to read)
created by
Oct 23, 2013
-
cat herding (bringing together information, ideas, and technologies)
created by
Oct 09, 2013
-
agile = more meetings? (wtf!)
created by
Oct 03, 2013
-
taking sides (finally)
created by
Aug 22, 2013
-
scaled agile framework (please share your experiences)
created by
Aug 12, 2013
-
fancy yourself a data scientist? (then show me the money!)
created by
Aug 08, 2013
-
fruITion and recrEAtion (a double-header book review)
created by
Jul 07, 2013
-
hadoop yarn (in a nutshell)
created by
Jun 27, 2013
-
just reboot it (when was that ever a good idea?)
created by
Jun 19, 2013
-
assholes and prima donnas (you need a few)
created by
Jun 03, 2013
-
i'm a certified hadoop developer (so, what does that mean?)
created by
Apr 11, 2013
-
hey, i'm here for you (but, you need to show up)
created by
Apr 09, 2013
-
published again (well… not really…)
created by
Mar 24, 2013
-
what the world needs now is another nosql preso (like i need a hole in my head)
created by
Feb 07, 2013
-
do we really have bugs? (did he really ask that?)
created by
Feb 03, 2013
-
how projects really start (it's all about the money)
created by
Jan 27, 2013
-
leadership principles (shouldn't the rangers know?)
created by
Jan 03, 2013
-
generalizing specialists (or is it specializing generalists?)
created by
Dec 20, 2012
-
we need collaborators (not a chief collaboration officer)
created by
Dec 03, 2012
-
emailing manifestos (just don't do it)
created by
Nov 28, 2012
-
lucky (it doesn't mean privileged)
created by
Nov 26, 2012
-
are you a mort, elvis or einstein (or are these labels nonsense)?
created by
Nov 20, 2012
-
enterprise 2.0 book review (using web 2.0 technologies within organizations)
created by
Nov 18, 2012
-
give as few orders as possible (encourage autonomy and responsibility)
created by
Nov 03, 2012