google refine - How to save only specific JSON elements in a new OpenRefine column


{
    "business_id": "sq0j7bgstazkvqlf5anqyq",
    "full_address": "214 e main st\ncarnegie\ncarnegie, pa 15106",
    "hours": {},
    "open": true,
    "categories": ["chinese", "restaurants"],
    "city": "carnegie",
    "review_count": 9,
    "name": "don don chinese restaurant",
    "neighborhoods": ["carnegie"],
    "longitude": -80.0849615,
    "state": "pa",
    "stars": 2.5,
    "latitude": 40.4083473,
    "attributes": {
        "take-out": true,
        "alcohol": "none",
        "noise level": "quiet",
        "parking": {
            "garage": false,
            "street": false,
            "validated": false,
            "lot": false,
            "valet": false
        },
        "delivery": true,
        "has tv": true,
        "outdoor seating": false,
        "attire": "casual",
        "waiter service": false,
        "accepts credit cards": true,
        "good kids": true,
        "good groups": false,
        "price range": 1
    },
    "type": "business"
}

With value.parseJson()['categories'] I can create a new column called 'categories' in OpenRefine. Is it possible to filter it so that only the 'chinese' value is kept and the other values are removed?

In the example above, the GREL expression:

value.parseJson()['categories']

results in an array containing 2 values:

["chinese", "restaurants"] 

You can manipulate this result with GREL expressions that act on arrays. For example, to select the first value in the array, use:

value.parseJson()['categories'][0]

This selects the first entry in the array. Increase the number in the square brackets at the end of the expression to select other entries (indexes start at 0).
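For readers who want to check the indexing logic outside OpenRefine, here is a rough Python equivalent. It is only a sketch, not GREL: the variable names are made up for illustration, and the record is a trimmed copy of the one in the question.

```python
import json

# Trimmed copy of the record from the question; only two keys are kept.
cell_value = '{"categories": ["chinese", "restaurants"], "city": "carnegie"}'

# Roughly what value.parseJson()['categories'] does in GREL.
categories = json.loads(cell_value)["categories"]

# Roughly what ...['categories'][0] and ...['categories'][1] do.
first_category = categories[0]
second_category = categories[1]

print(first_category, second_category)
```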

If you want to filter on a specific value in the array, use the 'filter' expression:

filter(value.parseJson()['categories'],v,v=="chinese")

This results in a new array with only the word "chinese" in it (in the above example). To store this in a new column, you need to convert the array to a string:

filter(value.parseJson()['categories'],v,v=="chinese").join("")
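The filter-then-join step can be sketched in Python to show what ends up in the new column. This is an illustration only, not GREL; the function name keep_category and its parameters are hypothetical.

```python
import json

def keep_category(cell_value, wanted):
    """Return the categories equal to `wanted`, joined into one string.

    Returns an empty string when nothing matches.
    """
    categories = json.loads(cell_value)["categories"]
    # Roughly what filter(..., v, v == "chinese") does: keep matching entries only.
    matches = [v for v in categories if v == wanted]
    # Roughly what .join("") does: turn the array into a string.
    return "".join(matches)

record = '{"categories": ["chinese", "restaurants"]}'
print(keep_category(record, "chinese"))
```

Note that a record without a matching category produces an empty string rather than an error, so such rows simply end up blank.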

To avoid issues with case-sensitivity, and with the possibility of 'chinese' appearing more than once in the 'categories' array, I'd convert the values to lowercase first and de-duplicate the array before converting it to a string, to end up with:

filter(forEach(value.parseJson()["categories"],v,v.toLowercase()),w,w=="chinese").uniques().join("")
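To see why the lowercase and de-duplication steps matter, the same pipeline can be sketched in Python. Again, this is an assumption-laden illustration rather than GREL: the function name keep_category_normalized is hypothetical, and the mixed-case input is invented to exercise the edge cases.

```python
import json

def keep_category_normalized(cell_value, wanted):
    """Lowercase every category, keep those equal to `wanted`, drop duplicates, join."""
    categories = json.loads(cell_value)["categories"]
    # Roughly forEach(..., v, v.toLowercase()).
    lowered = [v.lower() for v in categories]
    # Roughly filter(..., w, w == "chinese").
    matches = [w for w in lowered if w == wanted.lower()]
    # Roughly .uniques(): drop repeated entries, keeping first-seen order.
    unique = list(dict.fromkeys(matches))
    # Roughly .join("").
    return "".join(unique)

# Invented input with mixed case and a repeated value.
record = '{"categories": ["Chinese", "chinese", "Restaurants"]}'
print(keep_category_normalized(record, "chinese"))
```

Without the de-duplication step, the two matching entries would be joined into "chinesechinese".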
