Spark Data Structures

without haste but without rest

JinungKim 2021. 12. 27. 00:31

RDD, DataFrame, Dataset

To look into: which differences in the internal logic cause the speed differences between them.

https://www.analyticsvidhya.com/blog/2020/11/what-is-the-difference-between-rdds-dataframes-and-datasets/

Differences Between RDDs, Dataframes and Datasets in Spark

Apache Spark continues to be the first choice for data engineers. Understand the difference between RDDs, DataFrames, and Datasets in Spark.

https://github.com/apache/spark

GitHub - apache/spark: Apache Spark - A unified analytics engine for large-scale data processing

Digging through the source code directly will probably be the faster way to understand this.

https://community.cloudera.com/t5/Support-Questions/why-dataframes-are-faster-in-all-lnaguages/td-p/122507

Why are DataFrames faster in all languages?

Why are Spark DataFrames fast in both Scala and Python? The same is not the case with RDDs: RDDs created in Scala are faster than those created in Python.

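A DataFrame goes through the Catalyst Optimizer regardless of which language is used, so the same optimized plan is executed whether the query was written in Scala or Python.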
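To make that point concrete for myself, here is a toy model in plain Python. It is not Spark code and does not use Spark's API. The names `Filter`, `optimize`, `run_plan`, and `run_functions` are hypothetical and exist only for this sketch. The idea it tries to show: when operations are described as data (a plan), an optimizer can inspect and rewrite them before execution, whatever language built the plan. When operations are opaque functions, as with RDD lambdas, there is nothing to inspect, so they can only be run as written.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass(frozen=True)
class Filter:
    """Hypothetical plan node: keep rows where row[column] >= min_value."""
    column: str
    min_value: int


def optimize(plan: List[Filter]) -> List[Filter]:
    """Toy rewrite rule: merge adjacent filters on the same column into one."""
    optimized: List[Filter] = []
    for step in plan:
        if optimized and optimized[-1].column == step.column:
            prev = optimized.pop()
            # Two lower bounds on one column collapse to the stricter bound.
            optimized.append(Filter(step.column, max(prev.min_value, step.min_value)))
        else:
            optimized.append(step)
    return optimized


def run_plan(plan: List[Filter], rows: List[Row]) -> List[Row]:
    """DataFrame-style execution: interpret the (possibly rewritten) plan."""
    out = rows
    for step in plan:
        out = [r for r in out if r[step.column] >= step.min_value]
    return out


def run_functions(functions: List[Callable[[Row], bool]], rows: List[Row]) -> List[Row]:
    """RDD-style execution: each function is a black box, so run them one by one."""
    out = rows
    for fn in functions:
        out = [r for r in out if fn(r)]
    return out


rows = [{"id": i} for i in range(10)]

# The same logic written twice: once as an inspectable plan, once as opaque lambdas.
plan = [Filter("id", 3), Filter("id", 7)]
functions = [lambda r: r["id"] >= 3, lambda r: r["id"] >= 7]

optimized_plan = optimize(plan)            # two passes over the data become one
plan_result = run_plan(optimized_plan, rows)
function_result = run_functions(functions, rows)

print(optimized_plan)
print(plan_result == function_result)
```

Both paths return the same rows, but only the plan could be simplified before running. The real Catalyst Optimizer applies far more rules than this single merge; the sketch only illustrates why a language-independent plan can be optimized in a way that per-language functions cannot.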
