Spark ML Runs 10x Faster on GPUs, Databricks Says

October 27, 2016

Alex Woodie

Apache Spark machine learning workloads can run up to 10x faster when they are moved to a deep learning paradigm on GPUs, according to Databricks, which today announced that its hosted Spark service now runs on Amazon’s new GPU cloud.

Databricks, the primary commercial venture behind Apache Spark, today announced that it’s now supporting TensorFrames, the new Spark library based on Google’s (NASDAQ: GOOG) TensorFlow deep learning framework, on its hosted Spark service, which runs on Amazon Web Services (NASDAQ: AMZN). The deep learning service will be generally available within two weeks, the company says.

TensorFrames, which was unveiled this March as a technical preview, lets Spark harness TensorFlow for the purpose of programming deep neural networks, the primary computational method powering so-called “deep learning” algorithms. TensorFrames is also available to on-prem Spark users as a GitHub project, but it’s not yet available for download in the Apache Spark project, which limits its usefulness for the time being.

Earlier this month, AWS unveiled powerful GPU cloud instances based on the latest NVIDIA (NASDAQ: NVDA) Tesla K80 GPUs. With today’s announcement, Databricks is now officially supporting TensorFrames running on those AWS GPUs.


Common Spark machine learning tasks, such as image processing and text analysis, run up to 10 times faster using TensorFrames running on GPUs, Databricks said in a blog post today. What’s more, the code behind a simple numerical task like kernel density estimation was three times shorter using TensorFrames compared to using optimized Scala code, and it was four times less expensive to run, in terms of AWS resources (CPU vs. GPU).
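For readers unfamiliar with the benchmark task, kernel density estimation builds a smooth estimate of a probability distribution by placing a small "bump" (the kernel) on every data point and averaging the bumps. The following is a minimal sketch in plain Python using a Gaussian kernel. It is not the Databricks benchmark code, which is not reproduced in the article, and the function name, sample values, and bandwidth are hypothetical choices made for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a point x.

    Hypothetical helper for illustration only; this is not the
    TensorFrames or Scala code that Databricks benchmarked.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    n = len(samples)
    # Normalizing constant so the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centered on that sample.
        total = sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
        return norm * total

    return density


# Made-up data: two clusters, one near 1.0 and one near 3.0.
samples = [1.0, 1.2, 0.8, 3.0, 3.1]
density = gaussian_kde(samples, bandwidth=0.5)

# The estimate should be higher inside a cluster than between clusters.
print(density(1.0), density(2.0))
```

The inner sum touches every sample for every query point, which is the kind of dense, regular arithmetic that maps well onto a GPU.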

The addition of deep learning to the super-popular Spark framework is important, Databricks says, because it allows Spark developers to perform a range of data analysis tasks—including data wrangling, interactive queries, and stream processing—within a single framework. That helps avoid the complexity inherent in using multiple frameworks and libraries.

Practical uses for Spark-based deep learning include image recognition, handwriting recognition, and language translation. Medical researchers could use TensorFrames and GPUs to better detect tumors in pathology images, the company says, while linguists would benefit from language translation that’s nearly on par with humans.

Databricks is preconfiguring TensorFrames on AWS. The software side of the setup includes Apache Spark, the TensorFrames library and initialization scripts, and NVIDIA’s CUDA and cuDNN libraries (users can also use other deep learning libraries, such as Caffe). The hardware consists of Amazon EC2 g2.2xlarge (1 GPU) and g2.8xlarge (4 GPUs) instance types. Databricks says that p2 (1-16 GPUs) instance types are coming soon.


Spark machine learning jobs run considerably faster and cheaper with TensorFrames on GPUs than as optimized Scala code on CPUs, according to Databricks (image source: Databricks)

Databricks says the preconfiguration work saves each customer about 60% compared to configuring the setup themselves. The company further tweaks the Spark instance on the GPUs to prevent contention. “GPU context switching is expensive, and GPU libraries are generally optimized for running single tasks,” the company says in the blog. “Therefore, reducing Spark parallelism per executor results in higher throughput.”
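The article does not say which settings Databricks changes to reduce parallelism. As a rough sketch, in open source Spark the number of tasks an executor runs concurrently is its core count (`spark.executor.cores`) divided by the cores reserved per task (`spark.task.cpus`), so raising the latter to match the former leaves one task per executor. The helper names below are hypothetical, and the resulting dictionary is only an assumption about how one might express this, not the Databricks configuration.

```python
def tasks_per_executor(executor_cores, task_cpus):
    """Concurrent task slots per executor under standard Spark scheduling."""
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("core counts must be positive integers")
    return executor_cores // task_cpus


def single_task_conf(executor_cores):
    """Build settings that leave exactly one task slot per executor.

    Hypothetical helper: it reserves all of an executor's cores for
    each task, so only one task at a time competes for the GPU.
    """
    return {
        "spark.executor.cores": str(executor_cores),
        "spark.task.cpus": str(executor_cores),
    }


# Default-style setup: 8 cores, 1 core per task, so 8 tasks share the GPU.
print(tasks_per_executor(8, 1))

# Reduced parallelism: 8 cores reserved per task, so 1 task per executor.
conf = single_task_conf(8)
print(conf, tasks_per_executor(8, int(conf["spark.task.cpus"])))
```

The trade-off is that CPU-bound stages in the same job also lose parallelism, so a setting like this would typically be applied only to clusters dedicated to GPU work.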

While Databricks uses Apache Spark in its hosted service, the version of Spark that Databricks customers have access to is not available to the general public. So when will TensorFrames come to Apache Spark? That’s not yet clear.

At the Strata + Hadoop World conference last month, Databricks CEO Ali Ghodsi told Datanami that TensorFrames will eventually get its own library in the open source Apache Spark framework, right along with MLlib, Spark SQL, Spark Streaming, and GraphX. “It’s coming,” Ghodsi said.

Related Items:

Databricks CEO on Streaming Analytics, Deep Learning, and SQL

AWS Beats Azure to K80 General Availability
