As the title suggests, this post is something I came up with after I published the first three installments of my Open Georgia Analysis way back in 2014. And yes, you might have also noticed that I took a long break from blogging about Big Data technologies in 2018; I'm hoping to change that in 2019. On the other hand, my personal blog had a lot of fun entries thanks to a ton of international travel in 2018.

...

Now that looks a bit better than the output of the RDD’s take() method!!

Calculate Statistics via API

Trim down to just educator records for 2010.

>>> filtered = teachers.filter(teachers['orgType'] == 'LBOE').filter(teachers['year'] == 2010)
>>> filtered.show(2)
+---------------+--------------------+-------+--------+-------------------+------+----+
|           name|                 org|orgType|  salary|              title|travel|year|
+---------------+--------------------+-------+--------+-------------------+------+----+
|ABBOTT,DEEDEE W|ATLANTA INDEPENDE...|   LBOE| 52122.1|GRADES 9-12 TEACHER|   0.0|2010|
|  ABBOTT,RYAN V|ATLANTA INDEPENDE...|   LBOE|56567.24|    GRADE 4 TEACHER|   0.0|2010|
+---------------+--------------------+-------+--------+-------------------+------+----+

Calculate the minimum, maximum, and average salary for each title.

>>> from pyspark.sql.functions import min, max, avg, count, col
>>> expr = [min(col("salary")),max(col("salary")),avg(col("salary"))]
>>> filtered.groupBy("title").agg(*expr).sort("title").show(10)
+--------------------+-----------+-----------+------------------+               
|               title|min(salary)|max(salary)|       avg(salary)|
+--------------------+-----------+-----------+------------------+
|ADAPTED PHYS ED T...|   19384.24|   96320.19|56632.125714285714|
|ADULT EDUCATION D...|      182.4|  179041.16| 39572.89157894737|
|ADULT EDUCATION T...|      775.2|   60668.84|19230.814603174604|
|AFTER-SCHOOL PROG...|     624.04|   78493.98|39559.009999999995|
|ALTERNATIVE SCHOO...|   111199.8|  127149.12|119174.45999999999|
| ASSISTANT PRINCIPAL|    3418.53|  119646.31|  76514.0633944955|
|  ATHLETICS DIRECTOR|  122789.04|  122789.04|         122789.04|
|   ATTENDANCE WORKER|    7553.42|    23392.9|12826.486666666666|
|         AUDIOLOGIST|   36329.59|  102240.46| 73038.71333333333|
|             AUDITOR|    5380.63|   83811.88| 66145.27090909092|
+--------------------+-----------+-----------+------------------+
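
If it helps to see what the filter-then-groupBy/agg chain above is computing, here is a minimal plain-Python sketch of the same idea that runs without Spark. Everything in it is an assumption for illustration: the records are made up (they are not from the Open Georgia data), the "OTHER" org type is invented, and salary_stats is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration only
# (NOT taken from the Open Georgia dataset).
records = [
    {"title": "TEACHER", "orgType": "LBOE", "year": 2010, "salary": 50000.0},
    {"title": "TEACHER", "orgType": "LBOE", "year": 2010, "salary": 60000.0},
    {"title": "AUDITOR", "orgType": "LBOE", "year": 2010, "salary": 70000.0},
    {"title": "TEACHER", "orgType": "LBOE", "year": 2009, "salary": 10000.0},
    {"title": "AUDITOR", "orgType": "OTHER", "year": 2010, "salary": 99000.0},
]


def salary_stats(rows, org_type, year):
    """Return {title: (min, max, avg)} for rows matching org_type and year."""
    by_title = defaultdict(list)
    for row in rows:
        # Same idea as the two chained filter() calls: both conditions must hold.
        if row["orgType"] == org_type and row["year"] == year:
            by_title[row["title"]].append(row["salary"])
    # Same idea as groupBy("title").agg(min, max, avg).sort("title").
    return {
        title: (min(salaries), max(salaries), mean(salaries))
        for title, salaries in sorted(by_title.items())
    }


stats = salary_stats(records, "LBOE", 2010)
for title, (lowest, highest, average) in stats.items():
    print(f"{title:<10} {lowest:>10.2f} {highest:>10.2f} {average:>10.2f}")
```

One thing to watch for: this sketch uses Python's built-in min and max, while the PySpark session above imports min and max from pyspark.sql.functions, which shadows the built-ins for the rest of that session.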

WORK IN PROGRESS