Combine multiple rows that contain a duplicate value into a single row, you choose the row or features that pass through

Similar to DropRows , but the instruction block lets the chef differentiate among all the matching rows and select one or reconstruct a composite row.

Instruction file arguments can contain multiple columns to fold on!

Helpful for disambiguating multiple events that happen on a single day.

Advanced Instruction

Signature

( ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow] ) : Seq[IccnNamespaceRow]

Remember that FoldRows instructions return a Sequence of rows, enabling you to build Many to Many mappings.

Row.upsert is a helpful technique when consolidating different features from multiple rows in to a single result.

Combinations

Advanced Technique: AddColumn, then FoldRows! Adding a new identifier column, then folding it is a powerful combination you can use to synthesize or reduce a key space before deduplicating or sorting. A classic example is Social Security Number scrubbing: removing dashes, zero padding, the works.

Samples

FoldRows BenefitsDecisionDate.scala

// FoldRows BenefitsDecisionDate.scala Sample
// (ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow] ) : Seq[IccnNamespaceRow]

val sortBy = "BenefitsDecisionDate"

// Sort rows by time and use the last one
Seq(rows.sortBy( _.date(ns, sortBy) ).reverse.head)

FoldRows PersonnelNumber.scala

One of my data ingredients contained multiple rows per employee ID, and I needed to find if ANY of them contained a true flag.

// FoldRows FoldRows PersonnelNumber Sample
// Do any of the rows have 'reviewed' flag true? If not, return the first one.
// (ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow] ) : Seq[IccnNamespaceRow]

val bestRow = rows.find( _.stringEmpty("reviewed").size > 0 ).getOrElse( rows.head )

/*
// Println debugging example!
if ( rows.size > 1 ) {
  println(s"FoldRows personnelNumber ${foldValues} ->  ${rows.map{_.stringEmpty("reviewed")}.mkString(", ")}  to  ${bestRow}")
}
*/

// Fold Many to One
Seq(bestRow)