without haste but without rest

Spark Data Structures

JinungKim 2021. 12. 27. 00:31

RDD, DataFrame (DF), Dataset

To do: find out which differences in internal logic cause the speed differences between them.

 

https://www.analyticsvidhya.com/blog/2020/11/what-is-the-difference-between-rdds-dataframes-and-datasets/

 

Differences Between RDDs, Dataframes and Datasets in Spark

Apache spark continues to be the first choice for the data engineers. Understand the difference between RDDs, Dataframes and Datasets in spark

www.analyticsvidhya.com

 

https://github.com/apache/spark

 

GitHub - apache/spark: Apache Spark - A unified analytics engine for large-scale data processing

github.com

It will probably be faster to dig through the code and understand it directly.

 

 

https://community.cloudera.com/t5/Support-Questions/why-dataframes-are-faster-in-all-lnaguages/td-p/122507

 

why dataframes are faster in all lnaguages?

Why spark dataframes are faster in scala/python. the same is not the case with RDD's. RDD's created in scala are faster than the one in python.

community.cloudera.com

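A DataFrame goes through the Catalyst Optimizer no matter which language is used. The query is turned into a plan that Spark optimizes and runs itself, so Scala and Python end up executing the same optimized plan. RDD operations in Python, by contrast, run user functions in Python worker processes that the optimizer cannot inspect.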
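To make the idea concrete, here is a minimal toy sketch in plain Python. It is not Spark code and not how Catalyst is implemented; the names `Project`, `Filter`, `optimize`, and `execute` are hypothetical, made up for this note. The only point is that when a query is described as data (plan nodes) rather than as opaque functions, an engine can inspect and reorder it, for example by filtering before projecting, and that rewrite does not depend on the language that built the plan.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union

Row = Dict[str, Any]


@dataclass(frozen=True)
class Project:
    """Keep only the named columns (hypothetical plan node)."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Filter:
    """Keep rows where `column <op> value` holds (hypothetical plan node)."""
    column: str
    op: str
    value: Any


Node = Union[Project, Filter]
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def optimize(plan: List[Node]) -> List[Node]:
    """Toy rewrite rule: move a Filter in front of the Project that precedes it.

    The swap is only done when the filtered column is one the Project keeps,
    so the result of the plan does not change.
    """
    plan = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(plan)):
            prev, node = plan[i - 1], plan[i]
            if (isinstance(prev, Project) and isinstance(node, Filter)
                    and node.column in prev.columns):
                plan[i - 1], plan[i] = node, prev
                changed = True
    return plan


def execute(rows: List[Row], plan: List[Node]) -> List[Row]:
    """Run the plan nodes in order over a list of dict rows."""
    for node in plan:
        if isinstance(node, Filter):
            keep = OPS[node.op]
            rows = [r for r in rows if keep(r[node.column], node.value)]
        else:
            rows = [{c: r[c] for c in node.columns} for r in rows]
    return rows


rows = [
    {"name": "a", "age": 25, "city": "Seoul"},
    {"name": "b", "age": 41, "city": "Busan"},
    {"name": "c", "age": 33, "city": "Seoul"},
]

# The plan as the user wrote it: project first, then filter.
plan = [Project(("name", "age")), Filter("age", ">", 30)]
optimized = optimize(plan)
result = execute(rows, optimized)
```

An RDD-style pipeline would pass arbitrary lambdas instead of `Project` and `Filter` nodes, and a function like `optimize` would have nothing to inspect. Catalyst's real rules are far richer than this single swap, so treat the sketch only as an illustration of why a declarative plan can be optimized independently of the front-end language.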
