Application metrics
- How many events can be reduced, and how quickly? This depends on:
  - the reduction factor
  - the size per event
  - how much of each event is accessed during reduction, both to make the selection decision (skimming) and to pass content on to the output (slimming)
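As a rough illustration of how these application metrics could be derived from one reduction run, here is a minimal sketch. The `ReductionRun` record, its field names, and the definition of the reduction factor as input over output (by event count and by bytes) are assumptions made for this example, not something fixed by the notes.

```python
from dataclasses import dataclass


@dataclass
class ReductionRun:
    """Hypothetical bookkeeping record for one reduction job."""
    events_in: int       # events read from the input dataset
    events_out: int      # events surviving the selection (skimming)
    bytes_in: int        # total size of the input dataset
    bytes_accessed: int  # bytes actually read to decide and to copy out
    bytes_out: int       # total size of the output (after slimming)
    wall_seconds: float  # wall-clock duration of the job


def application_metrics(run: ReductionRun) -> dict:
    """Derive throughput and reduction figures from the raw counters."""
    def ratio(num: float, den: float) -> float:
        # An empty output gives an unbounded reduction factor.
        return num / den if den else float("inf")

    return {
        "events_per_second": run.events_in / run.wall_seconds,
        "reduction_factor_events": ratio(run.events_in, run.events_out),
        "reduction_factor_bytes": ratio(run.bytes_in, run.bytes_out),
        "bytes_per_event_in": run.bytes_in / run.events_in,
        "accessed_fraction": run.bytes_accessed / run.bytes_in,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = ReductionRun(
        events_in=1_000_000,
        events_out=10_000,
        bytes_in=2_000_000_000,
        bytes_accessed=500_000_000,
        bytes_out=5_000_000,
        wall_seconds=200.0,
    )
    for name, value in application_metrics(example).items():
        print(f"{name}: {value}")
```

Keeping the event-count and byte-count reduction factors separate makes it possible to tell how much of the reduction comes from skimming and how much from slimming.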
System metrics
- memory usage and caching strategy
- I/O metrics
- Spark built-in metrics
- CPU time of all executors
- time spent in garbage collection, time spent in serialization
- rows and data volume read from HDFS
- network traffic, which is important when reading from EOS
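One possible way to collect the Spark-side numbers is to sum per-stage counters over a whole job. The sketch below works on plain dictionaries; the field names (`executorRunTime`, `executorCpuTime`, `jvmGcTime`, `resultSerializationTime`, `inputBytes`, `inputRecords`) are assumed to match the stage entries returned by Spark's monitoring REST API, and the assumption that run time and GC time share the same unit should be checked against the Spark version in use. The sample values are invented.

```python
from typing import Dict, Iterable, Mapping

# Counters of interest; names are assumed to match Spark's stage-level
# REST payload and may differ between Spark versions.
FIELDS = (
    "executorRunTime",
    "executorCpuTime",
    "jvmGcTime",
    "resultSerializationTime",
    "inputBytes",
    "inputRecords",
)


def aggregate_stage_metrics(stages: Iterable[Mapping[str, float]]) -> Dict[str, float]:
    """Sum the selected counters over all stages; missing fields count as 0."""
    totals: Dict[str, float] = {field: 0 for field in FIELDS}
    for stage in stages:
        for field in FIELDS:
            totals[field] += stage.get(field, 0)
    return totals


def gc_fraction(totals: Mapping[str, float]) -> float:
    """Share of executor run time spent in garbage collection.

    Assumes both counters are reported in the same unit.
    """
    run_time = totals["executorRunTime"]
    return totals["jvmGcTime"] / run_time if run_time else 0.0


if __name__ == "__main__":
    # Invented sample payloads standing in for real stage records.
    sample_stages = [
        {"executorRunTime": 1000, "jvmGcTime": 50,
         "inputBytes": 100, "inputRecords": 10},
        {"executorRunTime": 3000, "jvmGcTime": 150,
         "inputBytes": 300, "inputRecords": 30},
    ]
    totals = aggregate_stage_metrics(sample_stages)
    print(totals)
    print("GC fraction:", gc_fraction(totals))
```

Network traffic is not covered by these counters and would have to be measured separately, for example at the host level.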