Creating a table to query in Spark using Python


I'm trying to load a file directly from S3 and then run Spark SQL on it. Eventually, I plan on bringing in multiple files as multiple tables (a 1:1 mapping between files and tables).

So I've been following this tutorial, which is good at describing each step. I'm a bit stuck on declaring the proper schema and on which variable, if any, is being referred to in the FROM clause of the SQL statement.

Here's my code:

sqlContext.sql("CREATE TABLE IF NOT EXISTS region (region_id INT, name STRING, comment STRING)")
region = sc.textFile("s3n://thisisnotabucketname/region.tbl")

raw_data = sc.textFile("s3n://thisisnotabucketname/region.tbl")
csv_data = raw_data.map(lambda l: l.split("|"))
row_data = csv_data.map(lambda p: Row(
    region_id=int(p[0]),
    name=p[1],
    comment=p[2]
))

interactions_df = sqlContext.createDataFrame(row_data)
interactions_df.registerTempTable("interactions")

tcp_interactions = sqlContext.sql("""
    SELECT region_id, name, comment FROM region WHERE region_id > 1
""")
tcp_interactions = sqlContext.sql("""
    SELECT * FROM region
""")
tcp_interactions.show()

And here's the sample data. There is no header.

0|africa|lar deposits. blithely final packages cajole. regular waters final requests. regular accounts according |
1|america|hs use ironic, requests. s|
2|asia|ges. thinly pinto beans ca|

tcp_interactions.show() is returning nothing, just the header of region_id|name|comment|. What am I doing incorrectly? In the SQL statement, is region pointing to the region variable declared in the first line of code, or is it pointing to something else?
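
For what it's worth, a FROM clause looks up names registered with the SQL context, not Python variables, and the code above registers the data as "interactions" while querying "region". Below is a minimal sketch of loading one file and registering it under the name the query uses. It is not the tutorial's method: register_table and parse_region_line are hypothetical helper names, and it assumes sc and sqlContext already exist as in the code above and that the CREATE TABLE statement is dropped, so "region" is not also an empty table. Column names are passed as a list to createDataFrame instead of building Row objects.

```python
def parse_region_line(line):
    """Turn one pipe-delimited line into a (region_id, name, comment) tuple."""
    # The trailing "|" leaves an empty last field, so keep only the first three.
    region_id, name, comment = line.split("|")[:3]
    return int(region_id), name, comment


def register_table(sc, sqlContext, path, table_name, parse, columns):
    """Load one delimited file and expose it to SQL under table_name."""
    rows = sc.textFile(path).map(parse)
    df = sqlContext.createDataFrame(rows, columns)
    # The name given here is what a FROM clause resolves.
    # (In Spark 2.0+, createOrReplaceTempView is the non-deprecated equivalent.)
    df.registerTempTable(table_name)
    return df


# Usage sketch, assuming `sc` and `sqlContext` exist as in the code above:
# register_table(sc, sqlContext, "s3n://thisisnotabucketname/region.tbl",
#                "region", parse_region_line, ["region_id", "name", "comment"])
# sqlContext.sql("SELECT * FROM region WHERE region_id > 1").show()
```

For the 1:1 mapping between files and tables, the same helper could be called once per file, each time with its own table name, parser and column list.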

