I've been debugging this for hours with no luck so far.

I have a standalone Spark cluster and a MinIO server, both running with Docker Compose. I set it up based on "Adding some MinIO to your standalone Apache Spark cluster" by Vasileios Anagnostopoulos.

My experience so far has been quite different from what that article describes. After fighting through several bugs, I'm down to one last problem: the Spark cluster does not pick up the credentials!

  • I am running everything locally; I am not using an EC2 instance.

  • I know the AWS SDK looks for credentials in ~/.aws/credentials, the Java system properties, and environment variables.

  • Since I am not using an EC2 instance, I opted for environment variables.

  • I have AWS_ACCESS_KEY_ID=theroot and AWS_SECRET_ACCESS_KEY=theroot123 in my docker-compose file for the master and worker nodes.

  • I have checked inside the containers, and I do have the environment variables set.

  • I am copying my custom spark-defaults.conf into the container's conf folder; the conf file looks like this (a probe sketch follows the listing):

    spark.ui.reverseProxy true
    spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
    spark.hadoop.fs.s3a.committer.name directory
    spark.hadoop.fs.s3a.committer.staging.tmp.path /tmp/spark_staging
    spark.hadoop.fs.s3a.buffer.dir /tmp/spark_local_buf
    spark.hadoop.fs.s3a.committer.staging.conflict-mode fail
    spark.hadoop.fs.s3a.access.key theroot
    spark.hadoop.fs.s3a.secret.key theroot123
    spark.hadoop.fs.s3a.endpoint http://my-minio-server:9000
    spark.hadoop.fs.s3a.connection.ssl.enabled false
    spark.hadoop.fs.s3a.path.style.access true
    spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.attempts.maximum 0
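As a sanity check, here is the probe sketch referenced above (my own, not from the article). It prints what the driver actually resolved. One detail worth knowing: spark-submit reads spark-defaults.conf from SPARK_CONF_DIR (or $SPARK_HOME/conf) on the machine where it is launched, and in the default client deploy mode the driver runs on that same machine, so a conf file or environment variables that exist only inside the master/worker containers are not visible to it:

    import os
    from pyspark.sql import SparkSession

    # Hypothetical probe script: submit it with the same spark-submit
    # command as the real job (shown below).
    spark = SparkSession.builder.appName("s3a-conf-probe").getOrCreate()

    # 1) What did the driver's Hadoop configuration resolve for S3A?
    hconf = spark.sparkContext._jsc.hadoopConfiguration()
    for key in ("fs.s3a.endpoint", "fs.s3a.access.key", "fs.s3a.path.style.access"):
        print(key, "=", hconf.get(key))  # None means the setting never arrived

    # 2) Does the driver process itself see the environment variables?
    print("driver AWS_ACCESS_KEY_ID:", os.environ.get("AWS_ACCESS_KEY_ID"))

    spark.stop()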

spark-submit command:

    spark-submit --packages org.apache.hadoop:hadoop-aws:3.3.4 --master spark://127.0.0.1:7077 spark-access-minio.py
    

Error log:

    Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
        Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
        java.nio.file.AccessDeniedException: s3a://mybucket/addresses.csv: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
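
For reference, that message lists the credential providers the S3A connector tries in order. Since fs.s3a.access.key and fs.s3a.secret.key are already set in the conf file, one way to rule the chain in or out (a sketch assuming the same keys as above, not something from the article) is to pin S3A to the simple key/secret provider so the env-var and instance-profile lookups are skipped entirely:

    from pyspark.sql import SparkSession

    # Sketch: restrict S3A to SimpleAWSCredentialsProvider, which only
    # consults fs.s3a.access.key / fs.s3a.secret.key.
    spark = (SparkSession.builder
        .appName("s3a-pinned-provider")
        .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
        .config("spark.hadoop.fs.s3a.access.key", "theroot")
        .config("spark.hadoop.fs.s3a.secret.key", "theroot123")
        .config("spark.hadoop.fs.s3a.endpoint", "http://my-minio-server:9000")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate())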
    

What am I missing?

Thanks!

I ended up making it work like this, but I still want to know how to set it up in the conf file.

    from pyspark.sql import SparkSession

    spark = SparkSession\
        .builder\
        .appName("Test json")\
        .config("spark.hadoop.fs.s3a.endpoint", "http://my-minio-server:9000")\
        .config("spark.hadoop.fs.s3a.access.key", "theroot")\
        .config("spark.hadoop.fs.s3a.secret.key", "theroot123")\
        .config("spark.hadoop.fs.s3a.path.style.access", "true")\
        .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")\
        .getOrCreate()

    # log through the JVM's log4j so messages land in the Spark logs
    log4jLogger = spark.sparkContext._jvm.org.apache.log4j
    LOGGER = log4jLogger.LogManager.getLogger(__name__)

    sourceBucket = "mybucket"
    inputPath = f"s3a://{sourceBucket}/addresses.csv"
    outputPath = f"s3a://{sourceBucket}/output_survey4.csv"

    # csv() fixes the reader format, so no separate format() call is needed
    df = spark.read.option("header", "true").csv(inputPath)
    df.write.mode("overwrite").parquet(outputPath)
    spark.stop()
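
For the conf-file route, my understanding (an assumption on my part, consistent with the probe above) is that the file has to be the spark-defaults.conf that spark-submit itself reads, i.e. the one on the submitting machine, not only the copies inside the containers. With the spark.hadoop.fs.s3a.* settings from above in that local file, a minimal sketch needs no explicit .config calls:

    from pyspark.sql import SparkSession

    # Minimal sketch: assumes the spark.hadoop.fs.s3a.* settings shown
    # earlier live in spark-defaults.conf on the machine running spark-submit.
    spark = SparkSession.builder.appName("Test json").getOrCreate()

    # prints the MinIO endpoint if the conf file was picked up, NOT SET otherwise
    print(spark.sparkContext.getConf().get("spark.hadoop.fs.s3a.endpoint", "NOT SET"))

    df = spark.read.option("header", "true").csv("s3a://mybucket/addresses.csv")
    df.write.mode("overwrite").parquet("s3a://mybucket/output_survey4.csv")
    spark.stop()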
Hi, if this is an update to your question, add it there. You can edit your question. Here it will be lost. – Martin54, Feb 12 at 5:12
