Scalding Sources
Scalding sources are how you get data into and out of your scalding jobs. There are several useful sources baked into the project and a few more in the scalding-commons repository. Here are a few basic ones to get you started:
- To read a text file line by line, use `TextLine(filename)`. For every line in `filename`, this source creates a tuple with two fields:
  - `line` contains the text of the given line
  - `offset` contains the byte offset of the given line within `filename`
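- To read or write a tab-separated or comma-separated values file, use `TypedText.tsv` or `TypedText.csv`, respectively. Both are defined in `com.twitter.scalding.source.TypedText`. Example:

  ```scala
  import com.twitter.scalding.TypedPipe
  import com.twitter.scalding.source.TypedText

  // Read (name, age) rows from a tab-separated file and format each one
  TypedPipe.from(TypedText.tsv[(String, Int)]("input"))
    .map {
      case (name, age) =>
        s"$name is $age years old"
    }
  ```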
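As a rough sketch of how the text and delimited sources above fit together, the job below reads lines with `TextLine` and writes word counts with `TypedText.tsv`. The job name `WordCountSketch` and the `input` and `output` argument names are assumptions made for illustration. The sketch also assumes that `TypedPipe.from(TextLine(...))` yields only the line text, not the byte offset.

```scala
import com.twitter.scalding._
import com.twitter.scalding.source.TypedText

// Hypothetical job: counts words in a text file and writes (word, count) rows as TSV.
class WordCountSketch(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))        // one String per input line (assumed)
    .flatMap { line =>
      // Split on whitespace and drop empty tokens
      line.toLowerCase.split("\\s+").filter(_.nonEmpty).toList
    }
    .groupBy { word => word }                    // key each word by itself
    .size                                        // number of occurrences per word
    .toTypedPipe                                 // back to a TypedPipe[(String, Long)]
    .write(TypedText.tsv[(String, Long)](args("output")))
}
```

Swapping `TypedText.tsv` for `TypedText.csv` on the last line should produce comma-separated output instead.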
- To create a pipe from data in a Scala `Iterable`, use `IterableSource`. For example, `IterableSource(List(4,8,15,16,23,42), 'foo)` will create a pipe with a single field `'foo`. `IterableSource` is especially useful for unit testing.
- A `NullSource` is useful if you wish to create a pipe only for its side effects (e.g., printing out some debugging information). For example, although defining a pipe as `Csv("foo.csv").debug` without a sink will throw a `java.util.NoSuchElementException`, adding a write to a `NullSource` will work fine: `Csv("foo.csv").debug.write(NullSource)`.
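The in-memory and null sources can be sketched together in one small fields-based job. This is only an illustration: the job name `SideEffectSketch` and the `'doubled` field are hypothetical, and the job is assumed to run inside a regular `Job` so that the implicit flow definition and mode are in scope.

```scala
import com.twitter.scalding._

// Hypothetical job: builds a pipe from an in-memory list, prints each tuple
// for debugging, and discards the output by writing to NullSource.
class SideEffectSketch(args: Args) extends Job(args) {
  IterableSource(List(4, 8, 15, 16, 23, 42), 'foo)
    .read                                        // turn the source into a pipe
    .map('foo -> 'doubled) { x: Int => x * 2 }   // add an illustrative derived field
    .debug                                       // print tuples as they flow through
    .write(NullSource)                           // a sink is still required
}
```

Because the input lives in memory, a job like this needs no files on disk, which is what makes `IterableSource` convenient in tests.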