Slow PostgreSQL query with (incorrect?) indexes
I have an events table with 30 million rows. The following query returns in 25 seconds:

select distinct "events"."id", "calendars"."user_id"
from "events"
left join "calendars" on "events"."calendar_id" = "calendars"."id"
where "events"."deleted_at" is null
  and tstzrange('2016-04-21T12:12:36-07:00', '2016-04-21T12:22:36-07:00') @> lower(time_range)
  and ("status" is null or (status->>'pre_processed') is null)

status is a jsonb column with an index on status->>'pre_processed', and time_range is of type tstzrange. Here are the other indexes created on the events table:
create index events_time_range_idx on events using gist (time_range);
create index events_lower_time_range_index on events (lower(time_range));
create index events_upper_time_range_index on events (upper(time_range));
create index events_calendar_id_index on events (calendar_id);
I'm out of my comfort zone here and am trying to reduce the query time. Here's the output of explain analyze:
HashAggregate  (cost=7486635.89..7486650.53 rows=1464 width=48) (actual time=26989.272..26989.306 rows=98 loops=1)
  Group Key: events.id, calendars.user_id
  ->  Nested Loop Left Join  (cost=0.42..7486628.57 rows=1464 width=48) (actual time=316.110..26988.941 rows=98 loops=1)
        ->  Seq Scan on events  (cost=0.00..7475629.43 rows=1464 width=50) (actual time=316.049..26985.344 rows=98 loops=1)
              Filter: ((deleted_at IS NULL) AND ((status IS NULL) OR ((status ->> 'pre_processed'::text) IS NULL)) AND ('["2016-04-21 19:12:36+00","2016-04-21 19:22:36+00")'::tstzrange @> lower(time_range)))
              Rows Removed by Filter: 31592898
        ->  Index Scan using calendars_pkey on calendars  (cost=0.42..7.50 rows=1 width=48) (actual time=0.030..0.031 rows=1 loops=98)
              Index Cond: (events.calendar_id = (id)::text)
Planning time: 1.468 ms
Execution time: 26989.370 ms
And here is the explain analyze output with the events.deleted_at part of the query removed:
HashAggregate  (cost=7487382.57..7487398.33 rows=1576 width=48) (actual time=23880.466..23880.503 rows=115 loops=1)
  Group Key: events.id, calendars.user_id
  ->  Nested Loop Left Join  (cost=0.42..7487374.69 rows=1576 width=48) (actual time=16.612..23880.114 rows=115 loops=1)
        ->  Seq Scan on events  (cost=0.00..7475629.43 rows=1576 width=50) (actual time=16.576..23876.844 rows=115 loops=1)
              Filter: (((status IS NULL) OR ((status ->> 'pre_processed'::text) IS NULL)) AND ('["2016-04-21 19:12:36+00","2016-04-21 19:22:36+00")'::tstzrange @> lower(time_range)))
              Rows Removed by Filter: 31592881
        ->  Index Scan using calendars_pkey on calendars  (cost=0.42..7.44 rows=1 width=48) (actual time=0.022..0.023 rows=1 loops=115)
              Index Cond: (events.calendar_id = (id)::text)
Planning time: 0.372 ms
Execution time: 23880.571 ms
I've already added an index on the status column. What else is there? I'm unsure how to proceed from here. Any suggestions on how to get the query time down to a more manageable number?
The B-tree index on lower(time_range) can only be used for conditions involving the <, <=, =, >= and > operators. The @> operator may rely on these internally, but as far as the planner is concerned, the range containment check is a black-box operation, and it can't make use of the index.
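This is visible in the plans above: despite events_lower_time_range_index existing, both EXPLAIN outputs show a sequential scan that filters out all ~31.6 million rows. A quick way to confirm the containment form is what blocks the index (a sketch, reusing the table and column names from the question):

```sql
-- The planner treats the @> containment test as opaque, so it
-- cannot match it to events_lower_time_range_index and will plan
-- a sequential scan for this form of the predicate.
explain
select id
from events
where tstzrange('2016-04-21T12:12:36-07:00',
                '2016-04-21T12:22:36-07:00') @> lower(time_range);
```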
You need to reformulate the condition in terms of B-tree operators, i.e.:

lower(time_range) >= '2016-04-21T12:12:36-07:00' and lower(time_range) < '2016-04-21T12:22:36-07:00'
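Putting that back into the original query, a sketch of the rewritten version looks like this (same tables and columns as in the question):

```sql
select distinct "events"."id", "calendars"."user_id"
from "events"
left join "calendars" on "events"."calendar_id" = "calendars"."id"
where "events"."deleted_at" is null
  -- Equivalent to tstzrange(a, b) @> lower(time_range), because the
  -- tstzrange constructor defaults to the half-open '[)' bounds:
  and lower(time_range) >= '2016-04-21T12:12:36-07:00'
  and lower(time_range) <  '2016-04-21T12:22:36-07:00'
  and ("status" is null or (status->>'pre_processed') is null);
```

With the condition expressed this way, the planner can drive the query with an index scan over the existing events_lower_time_range_index instead of filtering every row.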