
AnalysisException: u"cannot resolve 'name' given input columns: [ list] in sqlContext in spark


I tried a simple example like:

data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/databricks-datasets/samples/population-vs-price/data_geo.csv")
data.cache() # Cache data for faster reuse
data = data.dropna() # drop rows with missing values
data = data.select("2014 Population estimate", "2015 median sales price").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()

It works well, but when I try something very similar:

data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load('/mnt/%s/OnlineNewsTrainingAndValidation.csv' % MOUNT_NAME)
data.cache() # Cache data for faster reuse
data = data.dropna() # drop rows with missing values
data = data.select("timedelta", "shares").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()
display(data)

It raises an error: AnalysisException: u"cannot resolve 'timedelta' given input columns: [ data_channel_is_tech,...

Of course I imported LabeledPoint and LinearRegression.

What could be wrong?

Even the simpler case

df_cleaned = df_cleaned.select("shares")

raises the same AnalysisException.

Please note: df_cleaned.printSchema() works fine.

I found the issue: some of the column names contain white space before the name itself.

data = data.select(" timedelta", " shares").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()

worked. I could catch the white spaces using

assert " " not in ''.join(df.columns)  

Now I am thinking of a way to remove the white spaces. Any idea is much appreciated!

1) Code to read the file and query it:

spark = SparkSession.builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
df = spark.read.csv(r'test.csv', header=True, sep='^')
print("#################################################################")
df.printSchema()
df.createOrReplaceTempView("test")
re = spark.sql("select max_seq from test")
re.show()
print("################################################################")

2) Input file: here 'max_seq ' contains a trailing space, so we get the exception below:

Trx_ID^max_seq ^Trx_Type^Trx_Record_Type^Trx_Date
Traceback (most recent call last):
  File "D:/spark-2.1.0-bin-hadoop2.7/bin/test.py", line 14, in <module>
    re=spark.sql("select max_seq from test")
  File "D:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\session.py", line 541, in sql
  File "D:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "D:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u"cannot resolve '`max_seq`' given input columns: [Venue_City_Name, Trx_Type, Trx_Booking_Status_Committed, Payment_Reference1, Trx_Date, max_seq , Event_ItemVariable_Name, Amount_CurrentPrice, cinema_screen_count, Payment_IsMyPayment, r

3) Remove the space after the 'max_seq' column and it works fine:

Trx_ID^max_seq^Trx_Type^Trx_Record_Type^Trx_Date
17/03/20 12:16:25 INFO DAGScheduler: Job 3 finished: showString at <unknown>:0, took 0.047602 s
17/03/20 12:16:25 INFO CodeGenerator: Code generated in 8.494073 ms
  max_seq
only showing top 20 rows
##############################################################
I'm experiencing the same error with a Parquet file, and I cannot change the header separator. Your solution doesn't completely fix the issue (the title does not specify CSV files only). Any suggestions?
– Jérémy
                May 4, 2021 at 12:58
Since there were tabs in my input file, removing the tabs and spaces from the header fixed the problem.
My example:
saledf = spark.read.csv("SalesLTProduct.txt", header=True, inferSchema= True, sep='\t')
saledf.printSchema()
root
 |-- ProductID: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- ProductNumber: string (nullable = true)
saledf.describe('ProductNumber').show()
 +-------+-------------+
 |summary|ProductNumber|
 +-------+-------------+
 |  count|          295|
 |   mean|         null|
 | stddev|         null|
 |    min|      BB-7421|
 |    max|      WB-H098|
 +-------+-------------+

If you don't have whitespace in the headers, this error is also raised when you don't specify headers for the CSV at all, like this:

df = sqlContext.read.csv('data.csv')

So you need to change it to this:

df = sqlContext.read.csv('data.csv', header=True)

Recently I came across this issue while working on Azure Synapse Analytics; my error was the same.

AnalysisException: "cannot resolve '`xxxxxx`' given input columns: [];;
'Filter ('passenger_count > 0)
+- Relation[] csv"
Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1364, in filter
    jdf = self._jdf.filter(condition._jc)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 75, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)

This error came from an improperly named column in the code or in the CSV file. Use this code to read the CSV file:

df = spark.read.load("examples/src/main/resources/people.csv",
                     format="csv", sep=";", inferSchema="true", header="true")
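Since the traceback above shows `given input columns: []`, the read itself produced no columns, so it helps to fail fast with a clear message before filtering. A sketch (the `passenger_count` check mirrors the error above; the helper name is my own):

```python
def safe_filter_positive(df, column):
    """Filter df to rows where `column` > 0, failing fast if it is missing."""
    if column not in df.columns:
        raise ValueError(
            "column %r not found; available columns: %s" % (column, df.columns)
        )
    return df.filter(df[column] > 0)

# Usage, with df read as above:
#   df = safe_filter_positive(df, "passenger_count")
```

A bad read (wrong path, wrong separator, missing header) then fails with a message that names the missing column instead of an opaque AnalysisException.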

If you get stuck again in Synapse or PySpark, this site has error info: https://docs.actian.com/avalanche/index.html#page/User/Common_Data_Loading_Error_Messages.htm

For more, see the documentation: https://spark.apache.org/docs/latest/api/python/
