FoldRows
Similar to DropRows , but the instruction block lets the chef differentiate among all the matching rows and select one or reconstruct a composite row.
Instruction file arguments can contain multiple columns to fold on!
Helpful for disambiguating multiple events that happen on a single day.
Advanced Instruction
Signature
( ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow] ) : Seq[IccnNamespaceRow]
Remember that FoldRows instructions return a Sequence of rows, enabling you to build Many to Many mappings.
Row.upsert is a helpful technique when consolidating different features from multiple rows in to a single result.
Combinations
Advanced Technique: AddColumn, then FoldRows! Adding a new identifier column, then folding it is a powerful combination you can use to synthesize or reduce a key space before deduplicating or sorting. A classic example is Social Security Number scrubbing: removing dashes, zero padding, the works.
Samples
FoldRows BenefitsDecisionDate.scala
// FoldRows BenefitsDecisionDate.scala Sample
// (ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow] ) : Seq[IccnNamespaceRow]
val sortBy = "BenefitsDecisionDate"
// Sort rows by time and use the last one
Seq(rows.sortBy( _.date(ns, sortBy) ).reverse.head)
FoldRows PersonnelNumber.scala
One of my data ingredients contained multiple rows per employee ID, and I needed to find if ANY of them contained a true flag.
// FoldRows FoldRows PersonnelNumber Sample
// Do any of the rows have 'reviewed' flag true? If not, return the first one.
// (ns: String, foldColumns: Seq[Iccn], foldValues: Seq[String], rows: Seq[IccnNamespaceRow] ) : Seq[IccnNamespaceRow]
val bestRow = rows.find( _.stringEmpty("reviewed").size > 0 ).getOrElse( rows.head )
/*
// Println debugging example!
if ( rows.size > 1 ) {
println(s"FoldRows personnelNumber ${foldValues} -> ${rows.map{_.stringEmpty("reviewed")}.mkString(", ")} to ${bestRow}")
}
*/
// Fold Many to One
Seq(bestRow)