Performance Metrics

Application metrics

How quickly can how many events be reduced? This depends on:
- the reduction factor
- the size per event
- how much of each event is accessed during reduction, both to make the selection decision (skimming) and to pass content on to the output (slimming)
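
As a rough illustration, the quantities above can be combined into a small summary for one reduction run. This is only a sketch: the class `ReductionRun`, its field names, and the function `summarize_run` are hypothetical and not taken from any existing tool, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class ReductionRun:
    """Hypothetical record of one reduction job (all names are assumptions)."""
    events_in: int        # events read from the input dataset
    events_out: int       # events kept after skimming
    bytes_in_total: int   # full size of the input dataset
    bytes_read: int       # bytes actually read for skimming and slimming
    wall_seconds: float   # wall-clock duration of the job


def summarize_run(run: ReductionRun) -> dict:
    """Derive the application metrics listed above from raw counters."""
    # A run that keeps no events has an unbounded reduction factor.
    if run.events_out > 0:
        reduction_factor = run.events_in / run.events_out
    else:
        reduction_factor = float("inf")
    return {
        "events_per_second": run.events_in / run.wall_seconds,
        "reduction_factor": reduction_factor,
        "bytes_per_event": run.bytes_in_total / run.events_in,
        "fraction_of_event_read": run.bytes_read / run.bytes_in_total,
    }


# Usage with invented numbers: 1M events, 1% kept, a quarter of the data read.
example = ReductionRun(
    events_in=1_000_000,
    events_out=10_000,
    bytes_in_total=2_000_000_000,
    bytes_read=500_000_000,
    wall_seconds=200.0,
)
summary = summarize_run(example)
```

Reporting the fraction of each event that is read separately from the reduction factor keeps the effect of skimming and slimming apart from the raw event rate.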

System metrics

- memory usage and caching strategy
- I/O metrics
- Spark built-in metrics
- CPU time of all executors
- time spent in garbage collection and in serialization
- rows and data volume read from HDFS
- network traffic, which is important when reading from EOS
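
One possible way to turn per-task measurements into the system metrics listed above is sketched below. It assumes the measurements have already been exported as plain dictionaries. The key names (`cpu_s`, `gc_s`, `ser_s`, `run_s`, `records_read`, `bytes_read`) are hypothetical and would need to be mapped from whatever Spark or the storage system actually reports, including any unit conversion.

```python
def summarize_tasks(tasks):
    """Sum per-task counters and derive CPU and GC fractions.

    Each task is a dict with hypothetical keys: cpu_s, gc_s, ser_s and run_s
    (times in seconds), plus records_read and bytes_read.
    Missing keys are treated as zero.
    """
    totals = {
        "cpu_s": 0.0,
        "gc_s": 0.0,
        "ser_s": 0.0,
        "run_s": 0.0,
        "records_read": 0,
        "bytes_read": 0,
    }
    for task in tasks:
        for key in totals:
            totals[key] += task.get(key, 0)

    run = totals["run_s"]
    # Fractions of total task run time; zero if nothing ran.
    totals["cpu_fraction"] = totals["cpu_s"] / run if run else 0.0
    totals["gc_fraction"] = totals["gc_s"] / run if run else 0.0
    return totals


# Usage with invented numbers for two tasks.
tasks = [
    {"cpu_s": 8.0, "gc_s": 1.0, "ser_s": 0.5, "run_s": 10.0,
     "records_read": 1000, "bytes_read": 4_000_000},
    {"cpu_s": 6.0, "gc_s": 1.0, "ser_s": 0.5, "run_s": 10.0,
     "records_read": 3000, "bytes_read": 6_000_000},
]
totals = summarize_tasks(tasks)
```

A low CPU fraction combined with a large data volume read may point to I/O or network limits, for example when reading from EOS, rather than to the reduction code itself.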