Distributed Machine Learning and Big Data Analysis with PySpark

From cslt Wiki

Apache Spark is an open-source cluster-computing framework. Spark was originally developed at the University of California, Berkeley, and its codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism (the programmer expresses operations on a whole dataset, and Spark splits the work across the cluster) and fault tolerance.
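The idea of "implicit data parallelism and fault tolerance" can be made concrete with a small model. The sketch below is a hypothetical, standard-library-only toy, not Spark code: `ToyRDD`, `fail_partitions` and `max_attempts` are names invented for illustration. The user calls `map` and `reduce` on the whole dataset. The class splits the data into partitions and processes them on a thread pool. A partition whose task fails is rebuilt by replaying its recorded transformations, which loosely mirrors Spark's lineage-based recovery.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical toy model of a partitioned dataset (not Spark's RDD class)."""

    def __init__(self, data, num_partitions=4, lineage=()):
        self.data = list(data)          # source data, assumed always recoverable
        self.num_partitions = num_partitions
        self.lineage = tuple(lineage)   # transformations recorded so far

    def map(self, f):
        # Lazy: record the transformation instead of running it now.
        return ToyRDD(self.data, self.num_partitions, self.lineage + (f,))

    def _partitions(self):
        # Round-robin split of the source data; empty partitions are dropped.
        n = self.num_partitions
        return [p for p in (self.data[i::n] for i in range(n)) if p]

    def _compute(self, part):
        # Rebuild a partition from its source slice by replaying the lineage.
        for f in self.lineage:
            part = [f(x) for x in part]
        return part

    def reduce(self, g, fail_partitions=(), max_attempts=3):
        # g is assumed associative and commutative, because round-robin
        # partitioning does not preserve the original element order.
        parts = self._partitions()
        already_failed = set()

        def run_task(idx):
            for _ in range(max_attempts):
                try:
                    if idx in fail_partitions and idx not in already_failed:
                        already_failed.add(idx)  # simulate losing this task once
                        raise RuntimeError(f"partition {idx} lost")
                    return functools.reduce(g, self._compute(parts[idx]))
                except RuntimeError:
                    continue  # retry by recomputing the partition from lineage
            raise RuntimeError(f"partition {idx} failed {max_attempts} times")

        # Each partition is reduced independently by a worker thread,
        # then the per-partition results are combined on the "driver".
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(run_task, range(len(parts))))
        return functools.reduce(g, partials)


squares = ToyRDD(range(1, 101)).map(lambda x: x * x)
print(squares.reduce(lambda a, b: a + b, fail_partitions={1}))  # 338350
```

In PySpark itself, the same computation would be written roughly as `sc.parallelize(range(1, 101), 4).map(lambda x: x * x).reduce(lambda a, b: a + b)`, assuming `sc` is an existing `SparkContext`. There, Spark rather than user code decides where each partition runs and re-executes lost tasks.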