typelevel#385 - document chaining with and without using a thrush
chris-twiner committed Jun 14, 2023
1 parent 13701db commit 9e55b49
Showing 2 changed files with 31 additions and 1 deletion.
3 changes: 2 additions & 1 deletion build.sbt
@@ -151,7 +151,8 @@ lazy val docs = project
.settings(sparkMlDependencies(sparkVersion, Compile))
.settings(
addCompilerPlugin("org.typelevel" % "kind-projector" % "0.13.2" cross CrossVersion.full),
-scalacOptions += "-Ydelambdafy:inline"
+scalacOptions += "-Ydelambdafy:inline",
+libraryDependencies += "org.typelevel" %% "mouse" % "1.2.1"
)
.dependsOn(dataset, cats, ml)

29 changes: 29 additions & 0 deletions docs/FeatureOverview.md
@@ -59,6 +59,7 @@ val aptTypedDs2 = aptDs.typed
```

## Typesafe column referencing

This is how we select a particular column from a `TypedDataset`:

```scala mdoc
@@ -646,6 +647,34 @@ withCityInfo.select(
).as[AptPriceCity].show().run
```

### Chained Joins

Joins, or any similar operation, can be chained with a thrush combinator, which applies a function to a value (`a.thrush(f)` is `f(a)`) and so removes the need to name intermediate values. Instead of:

```scala mdoc
val withBedroomInfoInterim = aptTypedDs.join(citiInfoTypedDS).inner { aptTypedDs('city) === citiInfoTypedDS('name) }
val withBedroomInfo = withBedroomInfoInterim
.join(bedroomStats).left { withBedroomInfoInterim.col('_1).field('city) === bedroomStats('city)}

withBedroomInfo.show().run()
```

you can use thrush from [mouse](https://github.com/typelevel/mouse):

```scala
libraryDependencies += "org.typelevel" %% "mouse" % "1.2.1"
```

```scala mdoc
import mouse.all._

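// Named differently from the previous example: mdoc compiles all blocks together,
// so `withBedroomInfo` cannot be defined twice.
val withBedroomInfoChained = aptTypedDs.join(citiInfoTypedDS).inner { aptTypedDs('city) === citiInfoTypedDS('name) }
  .thrush( interim => interim.join(bedroomStats).left { interim.col('_1).field('city) === bedroomStats('city) } )

withBedroomInfoChained.show().run()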
```
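
To see what `thrush` does without Spark or mouse on the classpath, here is a minimal sketch using plain Scala collections. The `ThrushOps` class is a hypothetical local stand-in for the extension method that mouse provides, and the `prices` and `population` data are invented for illustration. As in the join above, the second step needs to refer to the result of the first step more than once, which is why a name for it (`interim`) is useful.

```scala
object ThrushSketch {
  // Hypothetical stand-in for mouse's `thrush`: apply `f` to the receiver.
  implicit class ThrushOps[A](a: A) {
    def thrush[B](f: A => B): B = f(a)
  }

  // Invented sample data: (city, price) rows and a city -> population lookup.
  val prices: List[(String, Double)] =
    List("Paris" -> 300000.0, "Lyon" -> 150000.0, "Nice" -> 200000.0)
  val population: Map[String, Int] =
    Map("Paris" -> 2100000, "Lyon" -> 500000)

  // Step 1: an inner-join-like lookup; cities without a population are dropped.
  def joined: List[(String, Double, Int)] =
    prices.flatMap { case (city, price) =>
      population.get(city).map(pop => (city, price, pop))
    }

  // Without thrush: the intermediate result needs its own top-level name.
  val joinedInterim: List[(String, Double, Int)] = joined
  val twoStep: List[(String, Double, Int)] = {
    val mean = joinedInterim.map(_._2).sum / joinedInterim.size
    joinedInterim.filter(_._2 >= mean)
  }

  // With thrush: `interim` is only in scope inside the lambda.
  val chained: List[(String, Double, Int)] =
    joined.thrush { interim =>
      val mean = interim.map(_._2).sum / interim.size
      interim.filter(_._2 >= mean)
    }
}
```

Both `twoStep` and `chained` keep only the joined rows priced at or above the mean, so they are equal; the chained form simply avoids leaving `joinedInterim` in the enclosing scope.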


```scala mdoc:invisible
spark.stop()
```
