Overview
A Falcon recipe is a static process template with a parameterized workflow that realizes a specific use case. Recipes are defined in user space. Recipes do not support update or lifecycle management.
For example:
- Replicating directories from one HDFS cluster to another (not timed partitions)
- Replicating hive metadata (database, table, views, etc.)
- Replicating between HDFS and Hive, in either direction
- Data masking, etc.
Proposal
Falcon provides a Process abstraction that encapsulates the configuration for a user workflow with scheduling controls. All recipes can be modeled as a Process within Falcon, which executes the user workflow periodically. The process and its associated workflow are parameterized. The user provides a properties file with name-value pairs that Falcon substitutes before scheduling the process. Falcon translates a recipe into a process entity by replacing the parameters in the workflow definition.
Falcon CLI recipe support
Recipe command usage is defined here.
The CLI accepts the recipe option with a recipe name and an optional tool, and does the following:
- Validates the options; the name option is mandatory, and the tool option is optional and should be provided only if the user wants to override the base recipe tool
- Looks for the <name>-workflow.xml, <name>-template.xml and <name>.properties files in the path specified by falcon.recipe.path in client.properties. If any of these files cannot be found, the Falcon CLI fails
- Invokes a tool to substitute the properties in the templated process for the recipe. If the tool option is not passed, the base tool is invoked. The tool is responsible for generating the process entity at the path specified by FalconCLI
- Validates the generated entity
- Submits and schedules this entity
- Stores the generated process entity files in the tmp directory
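The file lookup step above can be sketched as follows. This is only an illustration, not Falcon's implementation: the function name find_recipe_files is hypothetical, and the recipe directory is passed in directly rather than read from falcon.recipe.path in client.properties.

```python
import os


def find_recipe_files(recipe_dir, name):
    """Return the paths of the three recipe files, or raise if any is missing.

    recipe_dir stands in for the value of falcon.recipe.path (assumption:
    the caller has already read it from client.properties).
    """
    expected = {
        "workflow": f"{name}-workflow.xml",
        "template": f"{name}-template.xml",
        "properties": f"{name}.properties",
    }
    paths = {kind: os.path.join(recipe_dir, fname) for kind, fname in expected.items()}
    # Fail as soon as the lookup is complete if any expected file is absent.
    missing = [path for path in paths.values() if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError("missing recipe files: " + ", ".join(missing))
    return paths
```

For a recipe named hdfs-dr, this looks for hdfs-dr-workflow.xml, hdfs-dr-template.xml and hdfs-dr.properties in the given directory.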
Base Recipe tool
Falcon provides a base tool that recipes can override. The base recipe tool does the following:
- Expects the recipe template file path, the recipe properties file path and the path where the process entity to be submitted should be generated, and validates these arguments
- Validates that the artifacts, i.e. the workflow and/or lib files specified in the recipe template, exist on the local filesystem or on HDFS at the specified path; otherwise it returns an error
- Copies the artifacts to HDFS if they exist on the local filesystem
- If the workflow is on the local FS, then falcon.recipe.workflow.path in the recipe properties file is mandatory for it to be copied to HDFS. If the templated process requires custom libs, the falcon.recipe.workflow.lib.path property is mandatory for them to be copied from the local FS to HDFS. The recipe tool copies the local artifacts only if these properties are set in the properties file
- Looks for the pattern ##[A-Za-z0-9_.]*## in the templated process and substitutes it with the properties. The process entity generated after the substitution is written to the empty file passed by FalconCLI
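The substitution step can be sketched as below. This is a minimal illustration and not the base tool's actual code: the function name substitute is hypothetical, the properties are assumed to be already loaded into a dict keyed by the full property name, and raising an error for a placeholder without a matching property is an assumption about the desired behavior.

```python
import re

# The placeholder pattern from the text, with a group around the name part.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.]*)##")
PREFIX = "falcon.recipe."


def substitute(template_text, properties):
    """Replace each ##name## with the value of the falcon.recipe.name property."""

    def replace(match):
        key = PREFIX + match.group(1)
        if key not in properties:
            # Assumption: a placeholder without a property is treated as an error.
            raise KeyError(f"no property {key!r} for placeholder {match.group(0)}")
        return properties[key]

    return PLACEHOLDER.sub(replace, template_text)
```

The returned text would then be written to the output file whose path FalconCLI passes to the tool.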
Recipe template file format
- Any templatized string should be in the format ##[A-Za-z0-9_.]*##.
- There should be a corresponding entry in the recipe properties file "falcon.recipe.<templatized-string> = <value to be substituted>"
Example: If the entry in the recipe template is <workflow name="##workflow.name##">, there should be a corresponding entry in the recipe properties file: falcon.recipe.workflow.name=hdfs-dr-workflow
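One way to check the rule that every templatized string has a corresponding properties entry is sketched here. The helper unresolved_placeholders is hypothetical and not part of Falcon; it assumes the properties are held in a dict keyed by the full property name.

```python
import re

PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.]*)##")


def unresolved_placeholders(template_text, properties):
    """Return the sorted templatized strings that have no falcon.recipe.* entry."""
    names = {match.group(1) for match in PLACEHOLDER.finditer(template_text)}
    return sorted(name for name in names if "falcon.recipe." + name not in properties)
```

An empty result means every placeholder in the template can be substituted.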
Recipe properties file format
- Regular key-value pair properties file
- Property key should be prefixed by "falcon.recipe."
Example: falcon.recipe.workflow.name=hdfs-dr-workflow
The recipe template will have <workflow name="##workflow.name##">. The recipe tool will look for the pattern ##workflow.name## and replace it with the property value "hdfs-dr-workflow". The substituted template will have <workflow name="hdfs-dr-workflow">
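Reading such a properties file can be sketched as follows. This is a simplified illustration: the function name load_recipe_properties is hypothetical, and the parser handles only key=value lines and comment lines, not the full Java properties syntax (no line continuations, escapes or ":" separators).

```python
PREFIX = "falcon.recipe."


def load_recipe_properties(text):
    """Map each templatized string to its value, using only falcon.recipe.* keys."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line
        key = key.strip()
        if key.startswith(PREFIX):
            # Drop the prefix so the key matches the text between the ## markers.
            props[key[len(PREFIX):]] = value.strip()
    return props
```

With the example entry above, the result maps workflow.name to hdfs-dr-workflow, which is the name that appears between the ## markers in the template.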
Metrics
HDFS DR and Hive DR recipes capture replication metrics such as TIMETAKEN, BYTESCOPIED and COPY (number of files copied) for an instance and populate them into the GraphDB.
Managing the scheduled recipe process
- A scheduled recipe process is managed like a regular process
- List : falcon entity -type process -name <recipe-process-name> -list
- Status : falcon entity -type process -name <recipe-process-name> -status
- Delete : falcon entity -type process -name <recipe-process-name> -delete
Sample recipes
- Sample recipes are published in addons/recipes
Packaging
- There is no packaging for recipes at this time, but it will be added soon.