When building a flow, you will need to test it at every stage to ensure data flows as intended through each processor. To do this, you will need to generate test data, ideally matching (at least roughly) the data your vendor expects. Fuusion supplies a processor for just this purpose: GenerateFlowFile. GenerateFlowFile is one of the most widely used processors because it lets you apply a model schema and generate very realistic data. It will be the starting point for most of your flows when you first build them out.
One key thing to remember is that GenerateFlowFile is a fairly powerful processor: it can generate thousands of flowfiles in moments and quickly back up your flow. So, particularly in the initial testing phases, you should limit the GenerateFlowFile processor to a very short run, such as 1 second, or set it to run only once.
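One way to throttle the processor, sketched here assuming the standard NiFi-style Scheduling tab and GenerateFlowFile properties (exact names may vary slightly by version):

```
# Configure > Scheduling
Scheduling Strategy : Timer driven
Run Schedule        : 60 sec   # at most one batch per minute while testing

# Configure > Properties
File Size  : 0B                # empty test flowfiles
Batch Size : 1                 # one flowfile per scheduled run
```

Alternatively, where available, right-click the processor and choose Run Once so that it fires a single time and then stops.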
While building your flow, start and stop it often, and use the list queue to check whether the flow is performing as expected.
Right-click the connection where you see the queued flowfiles, and select List queue.
Here you can see that the GenerateFlowFile processor has created a couple of 0-byte flowfiles.
In the next processor, in this example UpdateAttribute, we can see that some test attributes are applied: AttributeA, AttributeB, and AttributeD.
Starting and stopping a processor takes effect almost immediately, because NiFi processes flows very quickly, particularly when the flowfiles are small. Start and stop a processor by right-clicking it and selecting Start, then Stop.
At each connection, we can inspect the dataflow, and the changes made to it, by checking the list queue.
In the list queue, select View on any of the flowfiles. Here we see that the flowfile shows AttributeA, AttributeB, and AttributeD. However, the combined attribute simply shows "tests", where the intended result was the combination of attributes A and B.
So we can simply return to the UpdateAttribute processor and double-check the Expression Language entry for the combined attribute. For more information on Expression Language for NiFi, see the NiFi Expression Language Guide.
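As an illustrative sketch (the attribute and property names here are assumptions, not taken from a specific flow), a combined attribute in UpdateAttribute would typically be defined as a property whose value is an Expression Language entry such as:

```
CombinedAttribute : ${AttributeA:append(${AttributeB})}
```

A plain literal value in that property (for example, the text tests) would explain the result seen above, since literal text is copied verbatim rather than evaluated as an expression.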
But let us assume that the desired outcome was achieved. You can continue through the flow, stopping and starting each processor in turn and checking each for the desired results. You can also review the data provenance at any point, by right-clicking and selecting View data provenance, to see where changes to the data occurred and to track its progress through the flow.
Click the three-circled lineage icon on the right to track the data through the flow visually.
From within the diagram, you can see where the data was created and each point at which it was modified.
You can click any point in the diagram and review the attributes as they were at that stage of the flow.
Another common trick when building a flow is to send the flow to an unconfigured output port. This lets you test the flow right up to the delivery point, and verify that the output is as expected, before sending it on to another NiFi server, another flow, or a database for further processing. This practice gives you the flexibility to step through each and every one of your components.
You can also use the NiFi Summary to review NiFi's recent activity and relate it to the flow you are working on. For example, here we see 10 flowfiles of 0 bytes, generated based on Expression Language.
We can also drill down into the details of an entry from here, by clicking its chart icon.
One key rule to remember: any task you wish to perform should be done with a flow, rather than a single processor. Even though you can perform multiple tasks within a single processor, troubleshooting is far simpler when you can review each processor in turn, rather than digging through several settings within one processor to find the problem. The granularity of a flow is your best friend when troubleshooting: by separating each step, you can troubleshoot a defined point of failure rather than reviewing the full process inside a single processor.
The same logic holds for multiple branching flows that feed into one another. It can be very helpful to split them into separate process groups, and then troubleshoot each in turn.