As the title suggests, I came up with this post AFTER I published the first three installments of my Open Georgia Analysis way back in 2014. And yes, you might have also noticed that I took a long break from blogging about Big Data technologies in 2018; I’m hoping to change that in 2019. On the other hand, my personal blog picked up a LOT of fun entries thanks to a TON of international travel in 2018.
...
Now that looks a bit better than the output of the RDD’s take() method!
Calculate Statistics via API
Trim down to just educator records for 2010.
```python
>>> filtered = teachers.filter(teachers['orgType'] == 'LBOE').filter(teachers['year'] == 2010)
>>> filtered.show(2)
+---------------+--------------------+-------+--------+-------------------+------+----+
|           name|                 org|orgType|  salary|              title|travel|year|
+---------------+--------------------+-------+--------+-------------------+------+----+
|ABBOTT,DEEDEE W|ATLANTA INDEPENDE...|   LBOE| 52122.1|GRADES 9-12 TEACHER|   0.0|2010|
|  ABBOTT,RYAN V|ATLANTA INDEPENDE...|   LBOE|56567.24|    GRADE 4 TEACHER|   0.0|2010|
+---------------+--------------------+-------+--------+-------------------+------+----+
```
Calculate the minimum, maximum, and average salary for each title.
```python
>>> from pyspark.sql.functions import min, max, avg, count, col
>>> expr = [min(col("salary")),max(col("salary")),avg(col("salary"))]
>>> filtered.groupBy("title").agg(*expr).sort("title").show(10)
+--------------------+-----------+-----------+------------------+
|               title|min(salary)|max(salary)|       avg(salary)|
+--------------------+-----------+-----------+------------------+
|ADAPTED PHYS ED T...|   19384.24|   96320.19|56632.125714285714|
|ADULT EDUCATION D...|      182.4|  179041.16| 39572.89157894737|
|ADULT EDUCATION T...|      775.2|   60668.84|19230.814603174604|
|AFTER-SCHOOL PROG...|     624.04|   78493.98|39559.009999999995|
|ALTERNATIVE SCHOO...|   111199.8|  127149.12|119174.45999999999|
| ASSISTANT PRINCIPAL|    3418.53|  119646.31|  76514.0633944955|
|  ATHLETICS DIRECTOR|  122789.04|  122789.04|         122789.04|
|   ATTENDANCE WORKER|    7553.42|    23392.9|12826.486666666666|
|         AUDIOLOGIST|   36329.59|  102240.46| 73038.71333333333|
|             AUDITOR|    5380.63|   83811.88| 66145.27090909092|
+--------------------+-----------+-----------+------------------+
```
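If the filter-then-groupBy chain feels a bit like a black box, here is a rough plain-Python sketch of what it computes. It is not how Spark does the work internally, just the same logic spelled out by hand. The sample rows, the `SBOE` org type, and the `salary_stats_by_title` helper are all made up for illustration; none of them come from the Open Georgia dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows (invented values, NOT taken from the real data).
rows = [
    {"title": "GRADE 4 TEACHER", "orgType": "LBOE", "year": 2010, "salary": 50000.0},
    {"title": "GRADE 4 TEACHER", "orgType": "LBOE", "year": 2010, "salary": 60000.0},
    {"title": "AUDITOR", "orgType": "LBOE", "year": 2010, "salary": 70000.0},
    {"title": "AUDITOR", "orgType": "LBOE", "year": 2009, "salary": 10000.0},  # wrong year
    {"title": "AUDITOR", "orgType": "SBOE", "year": 2010, "salary": 90000.0},  # wrong orgType
]


def salary_stats_by_title(records, org_type="LBOE", year=2010):
    """Keep rows for one orgType and year, then return {title: (min, max, avg)}."""
    salaries = defaultdict(list)
    for r in records:
        # Same two conditions as the chained .filter() calls.
        if r["orgType"] == org_type and r["year"] == year:
            salaries[r["title"]].append(r["salary"])
    # Sorting by title mirrors .sort("title"); the tuple mirrors the three aggregates.
    return {t: (min(s), max(s), mean(s)) for t, s in sorted(salaries.items())}


stats = salary_stats_by_title(rows)
for title, (lowest, highest, average) in stats.items():
    print(f"{title}: min={lowest} max={highest} avg={average}")
```

The two rows that fail the filter never reach the per-title lists, which is why the order of operations matters: the statistics only describe 2010 LBOE records.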
WORK IN PROGRESS