Combine multiple rows that share a duplicate value into a single row. You choose which row, or which features, pass through.

Similar to [[DropRows]], but the instruction block lets the chef differentiate among all the matching rows and select one or reconstruct a composite row.

Instruction file arguments can contain multiple columns to fold on!

Helpful for disambiguating multiple events that happen on a single day.
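To make folding on more than one column concrete, here is a minimal sketch in plain Scala. It is not the tool's implementation: `Row` is a hypothetical stand-in for the real row type, `foldRows` is an illustrative helper, and the column names are made up. It only shows the general idea of grouping rows by the combined values of the fold columns and handing each group to a block that decides what survives.

```scala
object FoldGroupingSketch {
  // Hypothetical stand-in for a row: column name -> value.
  // This is NOT the real IccnNamespaceRow type.
  type Row = Map[String, String]

  // Group rows by the values found in all fold columns, then let `fold`
  // decide which rows survive from each group.
  def foldRows(rows: Seq[Row], foldColumns: Seq[String])(
      fold: (Seq[String], Seq[Row]) => Seq[Row]): Seq[Row] =
    rows
      .groupBy(row => foldColumns.map(col => row.getOrElse(col, "")))
      .toSeq
      .flatMap { case (foldValues, group) => fold(foldValues, group) }

  // Two events for the same person on the same day, plus one on another day.
  val events: Seq[Row] = Seq(
    Map("personId" -> "1", "eventDate" -> "2020-01-01", "note" -> "a"),
    Map("personId" -> "1", "eventDate" -> "2020-01-01", "note" -> "b"),
    Map("personId" -> "1", "eventDate" -> "2020-01-02", "note" -> "c")
  )

  // Keep the last row seen for each (personId, eventDate) pair.
  val folded: Seq[Row] =
    foldRows(events, Seq("personId", "eventDate")) { (_, group) => Seq(group.last) }
}
```

In this sketch the two same-day rows collapse into one, so `folded` holds two rows. The order of the groups themselves is not guaranteed by `groupBy`.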

Advanced Instruction

Signature

( ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow] ) : Seq[IccnNamespaceRow]

Remember that FoldRows Instructions expect a Sequence of rows to be returned, which lets you build many-to-many mappings as well as many-to-one.
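As an illustration of a many-to-many fold, the sketch below takes every row in a group and returns one row per distinct event type. It is only shaped like the signature above: `Row` is a hypothetical stand-in for `IccnNamespaceRow`, `String` stands in for `Iccn`, and the `eventType` column is invented for the example.

```scala
object ManyToManySketch {
  // Hypothetical stand-in for a row; NOT the real IccnNamespaceRow.
  type Row = Map[String, String]

  // Shaped like the FoldRows signature, with String standing in for Iccn.
  // `ns`, `foldColumns` and `foldValues` are unused in this simple sketch.
  def fold(ns: String, foldColumns: Seq[String], foldValues: Seq[String],
           rows: Seq[Row]): Seq[Row] =
    rows
      .groupBy(_.getOrElse("eventType", "")) // bucket the group by event type
      .values
      .map(_.head)                           // keep the first row of each bucket
      .toSeq

  // Three rows that all matched the same fold value (the same day).
  val sameDay: Seq[Row] = Seq(
    Map("date" -> "2020-01-01", "eventType" -> "hire"),
    Map("date" -> "2020-01-01", "eventType" -> "hire"),
    Map("date" -> "2020-01-01", "eventType" -> "review")
  )

  // Three rows in, two rows out: one per distinct event type.
  val result: Seq[Row] = fold("hr", Seq("date"), Seq("2020-01-01"), sameDay)
}
```

Returning `Seq(oneRow)` gives the usual many-to-one fold. Returning several rows, as here, keeps more than one survivor per fold value.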

Combinations

Advanced Technique: AddColumn, then FoldRows! Adding a new identifier column and then folding on it is a powerful combination for synthesizing or reducing a key space before deduplicating or sorting. A classic example is Social Security Number scrubbing: removing dashes, zero padding, the works.
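The scrubbing step might look something like the sketch below. It covers only the value an added column could hold and does not show how AddColumn itself is wired up. `normalizeSsn` is a hypothetical helper, and left-padding to nine digits is an assumption about what "zero padding" means here.

```scala
object SsnNormalizeSketch {
  // Strip every non-digit character, then left-pad with zeros to 9 digits.
  // Assumption: short values lost their leading zeros somewhere upstream.
  def normalizeSsn(raw: String): String = {
    val digits = raw.filter(_.isDigit)
    ("0" * (9 - digits.length).max(0)) + digits
  }

  // Differently formatted inputs that should collapse onto the same key.
  val cleaned: Seq[String] =
    Seq("123-45-6789", "123456789", "001-23-4567", "1234567").map(normalizeSsn)
}
```

Once every row carries the normalized value in its own column, folding on that column treats `123-45-6789` and `123456789` as the same key.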

Examples

FoldRows BenefitsDecisionDate.scala

//(ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow]) : Seq[IccnNamespaceRow]

val sortBy = "BenefitsDecisionDate"

// Sort rows by date and keep the most recent one
Seq(rows.sortBy( _.date(ns, sortBy) ).last)

FoldRows personnelNumber.scala

One of my data ingredients contained multiple rows per employee ID, and I needed to find if ANY of them contained a true flag.

// Does any row have a non-empty 'reviewed' value? If so, keep that row; otherwise fall back to the first one.

val bestRow = rows.find( _.stringEmpty("reviewed").nonEmpty ).getOrElse( rows.head )

/*
// Println debugging example!
if ( rows.size > 1 ) {
  println(s"FoldRows personnelNumber ${foldValues} ->  ${rows.map{_.stringEmpty("reviewed")}.mkString(", ")}  to  ${bestRow}")
}
*/

// Fold Many to One
Seq(bestRow)