How big can a Spark Streaming window be?


I have data flows that need to be calculated, and I'm thinking of using a Spark Streaming job for this. But there is one thing I'm not sure about, and it worries me.

My requirements:

Data arrives as CSV files every 5 minutes, and I need reports over the data of the most recent 5 minutes, 1 hour, and 1 day. So if I set up a Spark Streaming calculation, the batch interval needs to be 5 minutes, and I need to set up two windows, one of 1 hour and one of 1 day, roughly as in the sketch below.
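
Here is a minimal sketch of the kind of setup I have in mind, using the DStream API. The input path, checkpoint directory, and the simple count() reports are placeholders for illustration, not my real pipeline:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, StreamingContext}

    object WindowedReports {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("csv-windowed-reports")
        // 5-minute batch interval to match the CSV arrival cadence.
        val ssc = new StreamingContext(conf, Minutes(5))
        // Checkpointing, required if stateful ops such as
        // reduceByKeyAndWindow (with an inverse function) are used later.
        ssc.checkpoint("/tmp/checkpoints")

        // Pick up new CSV files as they land in the input directory.
        val records = ssc.textFileStream("hdfs:///data/incoming").map(_.split(","))

        // Per-batch report: the most recent 5 minutes.
        records.count().print()

        // 1-hour window (12 batches), recomputed every 5 minutes.
        records.window(Minutes(60), Minutes(5)).count().print()

        // 1-day window (288 batches), recomputed every 5 minutes.
        records.window(Minutes(24 * 60), Minutes(5)).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Both window durations (60 and 1440 minutes) are multiples of the 5-minute batch interval, which Spark Streaming requires for windowed operations.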

About 1 GB of data arrives every 5 minutes, so the 1-hour window has to calculate over 12 GB of data (60/5 batches) and the 1-day window over 288 GB (24 × 60 / 5 batches).

I have no experience with Spark, and this worries me.

  1. Can Spark handle such a big window?

  2. How much RAM would the calculation over 288 GB of data need? More than 288 GB of RAM? (I know it may depend on disk I/O, CPU, and the calculation pattern; I just want a rough estimate based on experience.)

  3. If calculating over 1 day / 1 hour of data is too expensive in a streaming job, do you have a better suggestion?

