google refine - How to save only specific JSON elements in a new OpenRefine column
{ "business_id": "sq0j7bgstazkvqlf5anqyq", "full_address": "214 e main st\ncarnegie\ncarnegie, pa 15106", "hours": {}, "open": true, "categories": ["chinese", "restaurants"], "city": "carnegie", "review_count": 9, "name": "don don chinese restaurant", "neighborhoods": ["carnegie"], "longitude": -80.0849615, "state": "pa", "stars": 2.5, "latitude": 40.4083473, "attributes": { "take-out": true, "alcohol": "none", "noise level": "quiet", "parking": { "garage": false, "street": false, "validated": false, "lot": false, "valet": false }, "delivery": true, "has tv": true, "outdoor seating": false, "attire": "casual", "waiter service": false, "accepts credit cards": true, "good kids": true, "good groups": false, "price range": 1 }, "type": "business" }
I used the expression
value.parseJson()['categories']
to create a new column called 'categories'.
In OpenRefine, is it possible to filter that value so it keeps only 'chinese' and removes the other values?
In the example above, the GREL expression:
value.parseJson()['categories']
results in an array containing 2 values:
["chinese", "restaurants"]
You can use GREL expressions that act on arrays. For example, to select the first value in the array, use:
value.parseJson()['categories'][0]
which will select the first entry in the array (increase the number in the square brackets at the end of the expression to select other entries in the array).
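For readers who find the indexing easier to follow outside OpenRefine, here is a minimal plain-Python sketch of the same two steps (parse the JSON, then pick an entry by position). It is only an illustration of the logic, not something to paste into the GREL expression box, and the `record` string is a trimmed-down copy of the example row with most keys left out.

```python
import json

# Trimmed-down copy of the example record (assumption: only a few keys kept).
record = (
    '{"business_id": "sq0j7bgstazkvqlf5anqyq", '
    '"categories": ["chinese", "restaurants"], '
    '"city": "carnegie"}'
)

# Rough equivalent of value.parseJson()['categories']
categories = json.loads(record)["categories"]

# Rough equivalent of value.parseJson()['categories'][0]
first = categories[0]

# A larger index selects a later entry, as with the GREL expression.
second = categories[1]

print(categories)
print(first)
print(second)
```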
If you want to filter on a specific value in the array, use the 'filter' expression:
filter(value.parseJson()['categories'],v,v=="chinese")
This will result in a new array containing only the word "chinese" (in the above example). To store it in a new column, you need to convert the array to a string:
filter(value.parseJson()['categories'],v,v=="chinese").join("")
To avoid issues with case sensitivity, and the possibility of 'chinese' appearing more than once in the 'categories' array, I'd convert the values to lowercase first and de-duplicate the array before converting it to a string, which ends with:
filter(forEach(value.parseJson()["categories"],v,v.toLowercase()),w,w=="chinese").uniques().join("")
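To make the order of operations in that last expression easier to see, here is a rough plain-Python sketch of the same chain: lowercase each entry, keep only the matches, de-duplicate, then join into a string. The function name `keep_category`, its `wanted` parameter, and the sample cell with a mixed-case duplicate are all made up for illustration; this is not OpenRefine code.

```python
import json


def keep_category(cell_value, wanted="chinese"):
    """Hypothetical helper mirroring the steps of the final GREL expression."""
    categories = json.loads(cell_value)["categories"]

    # forEach(..., v, v.toLowercase()): lowercase every entry
    lowered = [c.lower() for c in categories]

    # filter(..., w, w == "chinese"): keep only the wanted value
    matches = [c for c in lowered if c == wanted]

    # uniques(): drop repeats (dict.fromkeys keeps first-seen order)
    unique = list(dict.fromkeys(matches))

    # join(""): turn the array into a single string for the new column
    return "".join(unique)


# Made-up cell where the value appears twice with different casing.
cell = '{"categories": ["Chinese", "restaurants", "chinese"]}'
print(keep_category(cell))
```

In this sketch a cell whose 'categories' array has no match produces an empty string, so those rows would simply end up blank in the new column.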