python - How to filter a Spark DataFrame by a boolean column


I created a DataFrame that has the following schema:

In [43]: yelp_df.printSchema()
root
 |-- business_id: string (nullable = true)
 |-- cool: integer (nullable = true)
 |-- date: string (nullable = true)
 |-- funny: integer (nullable = true)
 |-- id: string (nullable = true)
 |-- stars: integer (nullable = true)
 |-- text: string (nullable = true)
 |-- type: string (nullable = true)
 |-- useful: integer (nullable = true)
 |-- user_id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- full_address: string (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
 |-- neighborhoods: string (nullable = true)
 |-- open: boolean (nullable = true)
 |-- review_count: integer (nullable = true)
 |-- state: string (nullable = true)

Now I want to select only the records whose "open" column is "true". As shown below, lots of them are "open".

business_id          cool date       funny id                   stars text                 type     useful user_id              name               full_address         latitude      longitude      neighborhoods open review_count state
9ykzy9papeippouje... 2    2011-01-26 0     fwkvx83p0-ka4js3d... 4     wife took me h...    business 5      rltl8zkdx5vh5nax9... morning glory cafe 6106 s 32nd st ph... 33.3907928467 -112.012504578 []            true 116          az
zrjwvlyzejq1vaihd... 0    2011-07-27 0     ijz33sjrzxqu-0x6u... 4     have no idea wh...   business 0      0a2kyel0d3yb1v6ai... spinato's pizzeria 4848 e chandler b... 33.305606842  -111.978759766 []            true 102          az
6orac4uyjcsjl1x0w... 0    2012-06-14 0     ieslbzqucldszsqm0... 4     love gyro pla...     business 1      0ht2ktfliobpvh6cd... haji-baba          1513 e  apache bl... 33.4143447876 -111.913032532 []            true 265          az
_1qqzuf4zzoyfcvxc... 1    2010-05-27 0     g-wvgaisbqqamhlnn... 4     rosie, dakota, an... business 2      uzetl9t0ncrogoyff... chaparral dog park 5401 n hayden rd ... 33.5229454041 -111.90788269  []            true 88           az
6ozycu1rpktng2-1b... 0    2012-01-05 0     1ujfq2r5qfjg_6exm... 4     general manager s... business 0      vymm4ktsc8zfqbg-j... discount tire      1357 s power road... 33.3910255432 -111.68447876  []            true 5            az

However, the following command, run in PySpark, returns nothing:

yelp_df.filter(yelp_df["open"] == "true").collect() 

What is the right way to do it?

You're comparing data types incorrectly. The schema lists open as a boolean column, not a string column, so yelp_df["open"] == "true" is incorrect because "true" is a string.

Instead, you want to do:

yelp_df.filter(yelp_df["open"] == True).collect()

This correctly compares the values of open against the Python boolean True, rather than the non-boolean string "true".

