Apache NiFi

 What is Apache NiFi?

 

Apache NiFi is a free and open-source application that automates and manages data flow across systems. It is a secure and dependable data processing and distribution system that incorporates a web-based user interface for the purpose of creating, monitoring, and controlling data flows.

Apache NiFi is a robust, scalable, and reliable system that is used to process and distribute data. It is built to automate the transfer of data between systems.

  • NiFi offers a web-based User Interface for creating, monitoring, and controlling data flows. NiFi stands for Niagara Files; it was originally developed by the National Security Agency (NSA) and is now maintained by the Apache Software Foundation. 
  • Apache NiFi provides a web-based UI in which users define sources, processors, and destinations for data collection, transformation, and delivery.
  • Each processor in NiFi has relationships that are used when connecting one processor to another.

 

Why do we use Apache NiFi?

 

Apache NiFi is open-source and therefore freely available. It supports several data formats and sources, such as social feeds, geographic location data, and logs.

Apache NiFi supports a wide variety of protocols and systems, such as SFTP, Kafka, and HDFS, which makes the platform popular in the IT industry. There are many reasons to choose Apache NiFi, including the following:

·        Apache NiFi helps organizations to integrate NiFi with their existing infrastructure.

·        It allows users to make use of Java ecosystem functions and existing libraries.

·        It provides real-time control that enables the user to manage the flow of data between any source, processor, and destination.

·        It helps to visualize DataFlow at the enterprise level.

·        It supports aggregating, transforming, routing, fetching, listening for, and splitting data through a drag-and-drop interface.

·        It allows users to start and stop components at individual and group levels.

·        NiFi enables users to pull the data from various sources to NiFi and allows them to create flow files.

·        It is designed to scale out in clusters that provide guaranteed delivery of data.

·        It lets users visualize and monitor performance and behavior via flow bulletins, which offer inline insight and documentation.

 

Features of Apache NiFi

 

The features of Apache NiFi are as follows:

·        Apache NiFi offers a web-based User Interface that provides a seamless experience for design, monitoring, control, and feedback.

·        It even provides a data provenance module that helps to track and monitor data from the source to the destination of the data flow.

·        Developers can create their customized processors and reporting tasks as per the requirements.

·        It supports troubleshooting and flow optimization.

·        It enables rapid and effective development and testing.

·        It provides content encryption and communication over a secure protocol.

·        It supports buffering of all queued data and applies backpressure when queues reach specified limits.

·        Apache NiFi provides system-to-system and user-to-system security features, along with multi-tenant authorization.

 

Apache NiFi Architecture

 

Apache NiFi's architecture includes a web server, a flow controller, and processors, all running inside a Java Virtual Machine (JVM). It has three repositories: the FlowFile Repository, the Content Repository, and the Provenance Repository.

  • Web Server

Web Server is used to host the HTTP-based command and control API.

  • Flow Controller

The flow controller is the brain of the operation: it allocates threads for extensions to run on and manages the schedule for when extensions receive resources to execute.

The Flow Controller functions as the engine, determining when a thread is assigned to a specific processor, and it acts as the broker facilitating the exchange of FlowFiles between processors.

 

  • Extensions

There are various types of NiFi extensions, which are described in other documents. The key point here is that extensions operate and execute within the JVM.

  • FlowFile Repository

The FlowFile Repository holds the current state and attributes of each FlowFile passing through the NiFi data flow; it keeps track of the state of every FlowFile currently active in the flow. The default approach is a persistent Write-Ahead Log located on a specified disk partition.

  • Content Repository

The Content Repository is used to store all the data present in the flow files. The default approach is a fairly simple mechanism that stores blocks of data in the file system.

To reduce contention on any single volume, more than one file system storage location can be specified so that different physical partitions are used.

  • Provenance Repository

The Provenance Repository is where all provenance event data is stored. The repository construct is pluggable; the default implementation makes use of one or more physical disk volumes. Within each location, event data is indexed and searchable.

 The repository tracks and stores the events of every FlowFile that flows through NiFi. There are two provenance repository implementations: the volatile provenance repository (in which all provenance data is lost after a restart) and the persistent provenance repository. Its default directory is in the NiFi root directory, and the implementation can be switched by setting the nifi.provenance.repository.implementation property to "org.apache.nifi.provenance.PersistentProvenanceRepository" or "org.apache.nifi.provenance.VolatileProvenanceRepository".

 

Since NiFi 1.0, a Zero-Leader Clustering pattern has been employed: every node in the cluster performs the same tasks on the data, but each operates on a different set of data.

Apache ZooKeeper elects a single node as the Cluster Coordinator, which handles connecting and disconnecting nodes. Every cluster also has one Primary Node.

 

Key concepts of Apache NiFi

The key concepts of Apache NiFi are as follows:

·        Flow: A flow connects different processors to move and modify data from a source to a destination.

·        Data Pipeline: the path along which data is transferred from source to destination.

  • Connection: A connection links processors and acts as a queue that holds data when required. In Flow-Based Programming (FBP) terms it is known as a bounded buffer, and it allows different processes to interact at different rates.
  • Processors: The Processor is the NiFi component that is used to listen for incoming data; pull data from external sources; publish data to external sources; and route, transform, or extract information from FlowFiles.

A processor is a Java module that fetches data from a source system or stores data in a destination system. Different processors can be used to add attributes or modify the content of a FlowFile. Processors are responsible for sending, merging, routing, transforming, processing, creating, splitting, and receiving FlowFiles.

  • FlowFile: A FlowFile is the basic unit of NiFi; it represents a single object of data picked up from the source system. Users can modify a FlowFile as it moves from the source processor to the destination. Various events, such as Create, Receive, and Clone, are performed on a FlowFile by the different processors in a flow.

       A FlowFile is made up of two components: FlowFile Attributes and FlowFile Content. Content is the data that is represented by the FlowFile. Attributes are characteristics that provide information or context about the data; they are made up of key-value pairs. All FlowFiles have the following Standard Attributes:

uuid: A Universally Unique Identifier that distinguishes the FlowFile from other FlowFiles in the system.

filename: A human-readable filename that may be used when storing the data to disk or in an external service.

path: A hierarchically structured value that can be used when storing data to disk or an external service so that the data is not stored in a single directory.
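The two-part FlowFile structure described above can be modeled as a small data class. This is a conceptual sketch, not NiFi's actual Java implementation; the field names simply mirror the standard attributes listed above:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class FlowFile:
    """Conceptual model of a NiFi FlowFile: attributes (metadata) plus content (bytes)."""
    content: bytes = b""
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every FlowFile carries the standard attributes described above.
        self.attributes.setdefault("uuid", str(uuid.uuid4()))
        self.attributes.setdefault("filename", self.attributes["uuid"])
        self.attributes.setdefault("path", "./")

ff = FlowFile(content=b'{"sensor": 42}', attributes={"filename": "reading.json"})
print(ff.attributes["filename"])  # reading.json
```

Note that the content is an opaque byte stream; everything a processor needs to route or inspect cheaply lives in the attribute map.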

  • Event: An event represents a modification to a FlowFile as it traverses the NiFi flow. These events are tracked in data provenance.
  • Data provenance: Data provenance is a repository that lets users inspect information about each FlowFile and helps with troubleshooting if any issue arises while processing a FlowFile.

·        Process group: A process group is a set of processors and their connections, which can receive data through input ports and send it out through output ports.

 Input port

An input port is used to receive data from processors that are outside the process group. Dragging the Input Port icon onto the canvas adds an input port to the dataflow.

 

 Output port

An output port is used to transfer data to processors that are outside the process group. Dragging the Output Port icon onto the canvas adds an output port.

 

Template

The Template icon is used to add a dataflow template to the NiFi canvas. Templates let users reuse a data flow in the same or a different instance. After dragging the icon, users can select an existing template for the data flow.

 

Label

Labels are used to add text to the NiFi canvas to annotate any component. Users can also set label colors for visual organization.

Relationship: Each Processor has zero or more Relationships defined for it. These Relationships are named to indicate the result of processing a FlowFile. After a Processor has finished processing a FlowFile, it will route (or "transfer") the FlowFile to one of the Relationships. A DFM is then able to connect each of these Relationships to other components in order to specify where the FlowFile should go next under each potential processing result. Each connection consists of one or more Relationships.

 

Reporting Task: Reporting Tasks run in the background to provide statistical reports about what is happening in the NiFi instance. The DFM adds and configures Reporting Tasks in the User Interface as desired. Common reporting tasks include the ControllerStatusReportingTask, MonitorDiskUsage reporting task, MonitorMemory reporting task, and the StandardGangliaReporter.

 

Parameter Provider: Parameter Providers can provide parameters from an external source to Parameter Contexts. The parameters of a Parameter Provider may be fetched and applied to all referencing Parameter Contexts.

Funnel: A funnel is a NiFi component that combines the data from several Connections into a single Connection.

Users can drag the Funnel icon onto the canvas to add a funnel to the dataflow.

 

Process Group: The Process Group icon adds a process group to the NiFi canvas. When the icon is dragged onto the canvas, NiFi prompts for a Process Group name and then adds the group to the canvas.

When a dataflow becomes complex, it often is beneficial to reason about the dataflow at a higher, more abstract level. NiFi allows multiple components, such as Processors, to be grouped together into a Process Group. The NiFi User Interface then makes it easy for a DFM to connect together multiple Process Groups into a logical dataflow, as well as allowing the DFM to enter a Process Group in order to see and manipulate the components within the Process Group.

 

Port: Dataflows that are constructed using one or more Process Groups need a way to connect a Process Group to other dataflow components. This is achieved by using Ports. A DFM can add any number of Input Ports and Output Ports to a Process Group and name these ports appropriately.

 

Remote Process Group: Just as data is transferred into and out of a Process Group, it is sometimes necessary to transfer data from one instance of NiFi to another. While NiFi provides many different mechanisms for transferring data from one system to another, Remote Process Groups are often the easiest way to accomplish this if transferring data to another instance of NiFi.

 

Bulletin: The NiFi User Interface provides a significant amount of monitoring and feedback about the current status of the application. In addition to rolling statistics and the current status provided for each component, components are able to report Bulletins. Whenever a component reports a Bulletin, a bulletin icon is displayed on that component. System-level bulletins are displayed on the Status bar near the top of the page. Using the mouse to hover over that icon will provide a tool-tip that shows the time and severity (Debug, Info, Warning, Error) of the Bulletin, as well as the message of the Bulletin. Bulletins from all components can also be viewed and filtered in the Bulletin Board Page, available in the Global Menu.

 

flow.xml.gz: Everything the DFM puts onto the NiFi User Interface canvas is written, in real time, to a single file called flow.xml.gz. This file is located in the nifi/conf directory by default. Any change made on the canvas is automatically saved to this file, without the user needing to click a "Save" button. In addition, NiFi automatically creates a backup copy of this file in the archive directory when it is updated. You can use these archived files to roll back the flow configuration: stop NiFi, replace flow.xml.gz with the desired backup copy, then restart NiFi. In a clustered environment, stop the entire NiFi cluster, replace the flow.xml.gz of one of the nodes, and restart that node. Remove flow.xml.gz from the other nodes. Once you have confirmed that the node starts up as a one-node cluster, start the other nodes; the replaced flow configuration will be synchronized across the cluster. The name and location of flow.xml.gz, as well as the auto-archive behavior, are configurable. See the System Administrator's Guide for further details.
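Because flow.xml.gz is just a gzip-compressed XML file, it can be inspected with standard tools. Below is a minimal sketch; the path in the comment and the element names searched for are assumptions, since the exact schema varies by NiFi version:

```python
import gzip
import xml.etree.ElementTree as ET

def list_processor_names(flow_path):
    """Decompress a flow.xml.gz file and list processor names found in the flow."""
    with gzip.open(flow_path, "rb") as f:
        root = ET.fromstring(f.read())
    # A processor element typically carries a <name> child; search the whole tree.
    return [p.findtext("name") for p in root.iter("processor")]

# Usage (the path below assumes a default install):
# print(list_processor_names("nifi/conf/flow.xml.gz"))
```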

 

DataFlow Manager: A DataFlow Manager (DFM) is a NiFi user who has permissions to add, remove, and modify components of a NiFi dataflow.

Describe MiNiFi

 

MiNiFi is a subproject of NiFi that extends NiFi's fundamental concepts by focusing on collecting data at its source of generation. Because MiNiFi is meant to operate at the source, it emphasizes a small footprint and low resource utilization.

 

Is it possible for a NiFi Flow file to contain complex data as well?

 

Yes. In NiFi, a FlowFile may contain both structured data (e.g., XML or JSON files) and complex binary data (e.g., images).



What specifically is a Processor Node?

 

A Processor Node is a shell around the Processor that manages the processor's state, including the positioning of the processor in the graph, the processor's configuration properties, and the processor's scheduling state.

 

What does the Reporting Task involve?

 

A Reporting Task is a NiFi extension point responsible for reporting and analyzing NiFi's internal statistics, in order to transmit the data to external destinations or display status information directly in the NiFi UI.

 

Is the processor capable of committing or rolling back the session?

 

Yes. The processor is the component that may commit or roll back a session. When a processor rolls back a session, all FlowFiles retrieved during the session are restored to their prior state. If, on the other hand, the processor commits the session, the FlowFile repositories are updated with the relevant information.

 

What does "Write-Ahead-Log" mean in the context of FlowFileRepository?

 

This means that any change to the FlowFile Repository is first written to a log and verified for consistency. Changes remain in the log to avoid data loss both before and during processing, and checkpoints are taken frequently to make recovery fast.
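The write-ahead pattern described above can be sketched in a few lines. This is a toy illustration, not NiFi's actual repository code: every state change is appended to a durable log before it is applied, so the state can be rebuilt by replaying the log after a crash.

```python
import json

class WriteAheadLog:
    """Toy write-ahead log: record each change before applying it; replay to recover."""
    def __init__(self):
        self.log = []    # stands in for an append-only file on disk
        self.state = {}  # in-memory state (e.g., FlowFile id -> queue name)

    def update(self, key, value):
        self.log.append(json.dumps({"key": key, "value": value}))  # 1. log first
        self.state[key] = value                                    # 2. then apply

    def recover(self):
        """Rebuild state purely from the log, as after a restart."""
        recovered = {}
        for entry in self.log:
            record = json.loads(entry)
            recovered[record["key"]] = record["value"]
        return recovered

wal = WriteAheadLog()
wal.update("flowfile-1", "queue-A")
wal.update("flowfile-1", "queue-B")
assert wal.recover() == {"flowfile-1": "queue-B"}
```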

 

Does the Reporting Task get access to the entire contents of the FlowFile?

 

No. A Reporting Task has no access to the content of any specific FlowFile. Instead, a Reporting Task has access to all Provenance Events, bulletins, and metrics associated with graph components, such as the amount of data read or written.

 

Apache NiFi Interview Questions For Experienced

 

What use does FlowFileExpiration serve?

 

It determines when a FlowFile should expire and be removed after a certain period of time. Suppose you set FlowFileExpiration to 1 hour: the countdown begins as soon as the FlowFile enters NiFi, and when the FlowFile reaches a connection, its age is checked; if it is older than 1 hour, the FlowFile is expired and dropped.
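The expiration check described above amounts to comparing a FlowFile's age against the configured threshold when it sits in a connection. A sketch with illustrative names (the function and constant are not NiFi APIs):

```python
import time

FLOWFILE_EXPIRATION_SECONDS = 3600  # e.g., a "1 hour" expiration setting

def is_expired(entry_timestamp, now=None, limit=FLOWFILE_EXPIRATION_SECONDS):
    """Return True if a FlowFile that entered the system at entry_timestamp
    is older than the configured expiration and should be dropped."""
    now = time.time() if now is None else now
    return (now - entry_timestamp) > limit

# A FlowFile created 2 hours ago is past a 1-hour expiration:
print(is_expired(entry_timestamp=0, now=7200))  # True
```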

 

What is the NiFi system's backpressure?

 

Backpressure can be based on the number of FlowFiles or the total size of the queued data. If either value reaches its configured limit, for example 10,000 queued FlowFiles for the object count, the NiFi controller will not schedule the feeding processor to run.

In other words, the connection applies backpressure to the producing processor, causing it to stop running. As a result, no more FlowFiles are created until the backpressure is relieved.
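The two backpressure thresholds described above can be sketched as a simple check performed before the upstream processor is scheduled. The class and field names are illustrative; real limits are configured per connection in the NiFi UI:

```python
class Connection:
    """Toy model of a NiFi connection queue with backpressure thresholds."""
    def __init__(self, max_objects=10_000, max_bytes=1_000_000_000):
        self.queue = []  # each entry stands in for a FlowFile's content
        self.max_objects = max_objects
        self.max_bytes = max_bytes

    def queued_bytes(self):
        return sum(len(content) for content in self.queue)

    def backpressure_engaged(self):
        # Reaching either threshold stops the feeding processor from being scheduled.
        return (len(self.queue) >= self.max_objects
                or self.queued_bytes() >= self.max_bytes)

conn = Connection(max_objects=3)
for payload in (b"a", b"b", b"c"):
    conn.queue.append(payload)
print(conn.backpressure_engaged())  # True: object count reached the limit
```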

 

Is it possible to alter the settings of a processor while it is running?

 

No. A processor's settings cannot be altered or modified while it is running. You must first stop it and wait for all in-flight FlowFile processing to complete; only then can you modify the processor's settings.

 

What use does RouteOnAttribute serve?

 

RouteOnAttribute lets the flow make routing decisions based on FlowFile attributes, allowing certain FlowFiles to be treated differently from others.
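Conceptually, RouteOnAttribute evaluates expressions against each FlowFile's attributes and transfers the FlowFile to the matching relationship. A sketch using plain dictionaries and predicates (the relationship names below are made up for illustration):

```python
def route_on_attribute(flowfile_attributes, routes):
    """Return the names of the relationships whose predicate matches the attributes;
    fall back to 'unmatched' when none do, mirroring RouteOnAttribute's behavior."""
    matched = [name for name, predicate in routes.items()
               if predicate(flowfile_attributes)]
    return matched or ["unmatched"]

routes = {
    "large":  lambda attrs: int(attrs.get("file.size", 0)) > 1_000_000,
    "errors": lambda attrs: attrs.get("status") == "error",
}
print(route_on_attribute({"file.size": "5000000"}, routes))  # ['large']
print(route_on_attribute({"status": "ok"}, routes))          # ['unmatched']
```

In real flows the predicates are written in the NiFi Expression Language rather than as code.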

 

What Is The NiFi Template?

 

A template is a reusable workflow that you can import and export across NiFi instances. It can save considerable time compared with building the same flow repeatedly. A template is produced as an XML file.

 

What does the term "Provenance Data" signify in NiFi?

 

NiFi maintains a data provenance repository that records all information about each FlowFile. As data flows through the system and is transformed, routed, split, merged, and delivered to various endpoints, all of this metadata is recorded in NiFi's Provenance Repository. Users can search for and inspect the processing history of every single FlowFile.

 

What is a FlowFile's "lineageStartDate"?

 

This FlowFile attribute indicates the date and time at which the FlowFile entered the NiFi system. Even when a FlowFile is cloned, merged, or split so that a child FlowFile is created, the lineageStartDate attribute carries the timestamp of the original ancestor FlowFile.

 

How to get data from a FlowFile's attributes?

 

Several processors are available, including ExtractText and EvaluateXQuery, that can extract data into FlowFile attributes. Furthermore, you can build your own custom processor to meet the same need if no off-the-shelf processor fits.
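The ExtractText pattern amounts to running regular expressions over FlowFile content and storing the captures as new attributes. A sketch follows; the `<name>.<group>` attribute naming mirrors ExtractText's convention, but the function itself is only an illustration:

```python
import re

def extract_text(content, patterns):
    """Apply named regex patterns to FlowFile content and return the
    captured groups as new attribute key/value pairs."""
    attributes = {}
    text = content.decode("utf-8", errors="replace")
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            for i, group in enumerate(match.groups(), start=1):
                attributes[f"{name}.{i}"] = group
    return attributes

attrs = extract_text(b"user=alice id=42",
                     {"user": r"user=(\w+)", "id": r"id=(\d+)"})
print(attrs)  # {'user.1': 'alice', 'id.1': '42'}
```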

 

What occurs to the ControllerService when a DataFlow is used to generate a template?

 

When a template is created from a DataFlow that has an associated Controller Service, a new instance of that controller service is generated during the import process.

 

What occurs if you save a passcode in a DataFlow and use it to generate a template?

 

A password is highly sensitive information, so when a DataFlow is published as a template, the password is removed. After importing the template into another NiFi system, whether the same instance or a different one, you must enter the password again.

 

Apache NiFi  FAQs

 

What is a bulletin and how does it benefit NiFi?

 

While you can review the logs for anything noteworthy, having notifications appear on the board is far handier. If a processor logs something as a WARNING, a "Bulletin Indicator" will appear on that processor. This indicator, which resembles a sticky note, is displayed for five minutes following the occurrence of the event. If the bulletin comes from a node in a cluster, it also specifies which node emitted it. Furthermore, the log level at which bulletins are generated can be changed.

 

What is a NiFi process group?

 

A process group helps you develop a sub-dataflow that can be included in a primary data flow. Output ports and input ports are used to send data out of and receive data into the process group, respectively.

 

What use does a flow controller serve?

 

The flow controller is the brain of the operation: it allocates threads for extensions and manages the schedule for when components receive resources to execute. The Flow Controller functions as the engine, determining when a thread is assigned to a specific processor.

 

How Does Nifi Handle Massive Payload Volumes in a Dataflow?

 

NiFi is designed to handle massive payloads. As data moves through NiFi it is referenced as a FlowFile; the FlowFile's content lives in the Content Repository and is read only when necessary, so large payloads are passed by reference rather than held in memory.

 

What is the distinction between NiFi's FlowFile and Content repositories?

 

The FlowFile Repository is where NiFi stores metadata about each FlowFile that is currently active in the flow. The Content Repository stores the actual bytes of a FlowFile's content.

 

What does "deadlock in backpressure" imply?

 

Suppose you're using a processor such as PublishJMS to publish data to a destination queue. If the destination queue is full, your FlowFiles are routed to the failure relationship. When you retry those failed FlowFiles, the incoming connection fills up and applies backpressure upstream, which can result in a backpressure deadlock.

 

What is the remedy for the "back pressure deadlock"?

 

There are several alternatives, including

The administrator can temporarily increase the failed connection's backpressure threshold.
Another option to explore in this scenario is to have Reporting Tasks monitor the flow for large queues.

 

How does NiFi ensure the delivery of messages?

 

This is accomplished through a persistent write-ahead log and a content repository, which together provide guaranteed delivery.

 

Can you utilize the fixed Ranger setup on the HDP to work with HDF?

 

Yes, you can manage HDF with a single Ranger instance installed on HDP. However, the Ranger that ships with HDP does not include the NiFi service definition, which must be installed manually.

 

Is NiFi capable of functioning as a master-slave design?

 

No. Starting with NiFi 1.0, a zero-master (Zero-Leader) clustering model is used, and every node in a NiFi cluster is identical. The cluster is managed by the ZooKeeper service: Apache ZooKeeper elects a single node as the Cluster Coordinator, and failover is handled by ZooKeeper automatically.

 

 

Processors Categorization in Apache NiFi

 

Apache NiFi processors fall into the following categories.

  • AWS Processors

AWS processors are responsible for communicating with Amazon Web Services. Processors in this category include PutSNS, FetchS3Object, GetSQS, PutS3Object, etc.

  • Attribute Extraction Processors

Attribute Extraction processors are responsible for extracting, changing, and analyzing FlowFile attributes as FlowFiles are processed in the NiFi data flow.

Examples are ExtractText, EvaluateJSONPath, AttributeToJSON, UpdateAttribute, etc.

  • Database Access Processors

The Database Access processors are used to select or insert data, or to prepare and execute other SQL statements against a database.

Such processors use the data connection controller settings of Apache NiFi. Examples are PutSQL, ListDatabaseTables, ExecuteSQL, PutDatabaseRecord, etc.

  • Data Ingestion Processors

The Data Ingestion processors are used to ingest data into a data flow; they typically serve as the starting point of any data flow in Apache NiFi. Examples are GetFile, GetFTP, GetKafka, GetHTTP, etc.

  • Data Transformation Processors

Data Transformation processors are used for altering the content of the FlowFiles.

For example, ReplaceText can rewrite FlowFile content before the FlowFile is sent to an HTTP endpoint via an HTTP processor. Examples are JoltTransformJSON, ReplaceText, etc. 

  • HTTP Processors

The HTTP processors work with the HTTP and HTTPS calls. Examples are InvokeHTTP, ListenHTTP, PostHTTP, etc.

  • Routing and Mediation Processors

Routing and Mediation processors are used to route the FlowFiles to different processors depending on the information in attributes of the FlowFiles.

It is responsible for controlling the NiFi data flows. Examples are RouteOnContent, RouteText, RouteOnAttribute, etc.

  • Sending Data Processors

Sending Data processors are the final processors in a data flow; they are responsible for storing or sending data to the destination.

After sending the data, the processor drops the FlowFile via a success relationship. Examples are PutKafka, PutFTP, PutSFTP, PutEmail, etc.

  • Splitting and Aggregation Processors

The Splitting and Aggregation processors are used to split and merge the content available in the Dataflow. Examples are SplitXML, SplitJSON, SplitContent, MergeContent, etc.
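The split-and-merge behavior of this category can be sketched as byte-level operations on FlowFile content. This is a toy illustration, not the actual SplitContent/MergeContent implementations, and the delimiter handling is simplified:

```python
def split_content(content, delimiter):
    """SplitContent-style: break one FlowFile's content into several FlowFiles."""
    return [part for part in content.split(delimiter) if part]

def merge_content(parts, delimiter):
    """MergeContent-style: bin several FlowFiles' content back into one."""
    return delimiter.join(parts)

pieces = split_content(b"red,green,blue", b",")
print(pieces)                       # [b'red', b'green', b'blue']
print(merge_content(pieces, b","))  # b'red,green,blue'
```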

  • System Interaction Processors

The System Interaction processors are used to run processes or scripts, in various languages, on the host operating system.

Examples are ExecuteScript, ExecuteStreamCommand, ExecuteGroovyScript, ExecuteProcess, etc.

 

 

1. What is virtual hosting?

Virtual hosting is crucial for many web administrators, especially those who work for big companies with many websites. An interviewer might ask you this question to gauge your knowledge of Apache's functions and business applications. When answering this question, define the term and give an example of when to use virtual hosting.

 Example: 'Virtual hosting is a process that allows you to run multiple websites from a single server. There are two kinds of virtual hosts. Name-based virtual hosting allows you to use a single IP address for multiple websites, while IP-based virtual hosting requires multiple IP addresses for the different sites. Both methods use a single server, and most of the time, an administrator uses name-based virtual hosting. You might use virtual hosting if your organisation has multiple websites connected to a single server, which can make data more secure and streamline the web administration process.'

 

2. Why is log analysis critical?

Web administrators use access log analysis to gain insights into a website's performance that they then share with marketing or sales teams. An interviewer may ask this question to confirm how familiar you are with the relationship between web development and other functions of a company. When answering this question, define the concept and give adequate examples.

 Example: 'Analysing access logs, which are records of the requests users make to a server, can tell you about who visits a company's website and what they do when they get there. You can learn how many individual IP addresses visit your site, how often a specific IP address visits, which resources users click on and other pieces of data that can help you see which resources are most useful to visitors. If a website has a lot of traffic, you can use a log analysis program that compiles data from server access logs and organises it for you.'
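The kind of per-IP analysis described in this answer can be done with a few lines of scripting. A sketch that counts visits per client IP from common-log-format lines (the log format shown is an assumption):

```python
from collections import Counter

def top_visitors(log_lines, n=3):
    """Count requests per client IP (the first field of a common-log-format line)."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return counts.most_common(n)

logs = [
    '203.0.113.5 - - [01/Jan/2024] "GET /index.html" 200',
    '203.0.113.5 - - [01/Jan/2024] "GET /about.html" 200',
    '198.51.100.7 - - [01/Jan/2024] "GET /index.html" 200',
]
print(top_visitors(logs))  # [('203.0.113.5', 2), ('198.51.100.7', 1)]
```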

 

3. How can you resolve a 503 HTTP error?

It is often the responsibility of IT experts to oversee the smooth running of an organisation's website and resolve server errors. An interviewer may ask questions to learn how you solve this type of problem. When answering, explain what a 503 HTTP error means and highlight the procedure to resolve it.

 Example: 'A 503 HTTP error message means the server is unavailable at the time of the user's request. This might happen when the system is overloaded or when there is an issue with an application running on the server. The first step I take is to restart the server, which often fixes the issue. If restarting the server does not work, then I might look at the server logs to identify the specific action that caused the overload. I might also check to see if any of the website's systems started an automatic update, as this can cause a system issue.'

 

9. How do you create a custom processor in NiFi?

Writing custom processors helps you perform different operations to transform flow file content according to specific needs. Interviewers ask this question to understand your knowledge of using processors in NiFi. Mention the steps you use to create a custom processor.

 Example: A custom processor is a user-defined processor built for a specific purpose. I create a custom processor by extending the AbstractProcessor class and overriding the onTrigger method, implementing the processor's logic there. By accessing the FlowFile object, I read and write the data, and I use the NiFi ProcessContext object to access properties and variables.

 

10. What is Bulletin?

Bulletin is a NiFi UI feature that surfaces information about noteworthy events, allowing developers to avoid combing through log messages to find errors. Interviewers ask this question to understand the potential benefits of bulletins. When answering, explain what a bulletin is and how it benefits developers.

 Example: Bulletin is a NiFi UI feature that provides meaningful feedback and monitors the status of an application. When a processor records a "Warning", a Bulletin indicator appears on the processor for roughly five minutes after the event's occurrence. When the Bulletin comes from a node in a cluster, it specifies which node released it. Developers can see system-level Bulletins on the status bar near the top of the page. By hovering the mouse over the icon, they can see the time and severity of the Bulletin. They can also view and filter all Bulletins on the Bulletin Board Page.

 

The following mapping relates NiFi terms to their Flow-Based Programming (FBP) equivalents.

FlowFile (FBP term: Information Packet)

A FlowFile represents each object moving through the system. For each one, NiFi keeps track of a map of key/value pair attribute strings and the associated content of zero or more bytes.

FlowFile Processor (FBP term: Black Box)

Processors actually perform the work. In [eip] terms, a processor does some combination of data routing, transformation, or mediation between systems. Processors have access to the attributes of a given FlowFile and its content stream. Processors can operate on zero or more FlowFiles in a given unit of work and either commit that work or roll it back.

Connection (FBP term: Bounded Buffer)

Connections provide the actual linkage between processors. They act as queues and allow various processes to interact at differing rates. These queues can be prioritized dynamically and can have upper bounds on load, which enables backpressure.

Flow Controller (FBP term: Scheduler)

The Flow Controller maintains the knowledge of how processes connect and manages the threads and allocations thereof that all processes use. The Flow Controller acts as the broker facilitating the exchange of FlowFiles between processors.

Process Group (FBP term: Subnet)

A Process Group is a specific set of processes and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow the creation of entirely new components simply by composition of other components.

 

 10. Can you explain how to create a custom processor in NiFi?

 

You can create a custom processor in NiFi by extending the AbstractProcessor class and overriding the onTrigger method. In the onTrigger method, you will need to implement the logic for your processor. You can access the NiFi FlowFile object to read and write data, and you can also use the NiFi ProcessContext object to access properties and variables.
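Real NiFi processors are written in Java against the nifi-api (AbstractProcessor, ProcessContext, ProcessSession), so a compilable example cannot be shown self-contained here. The onTrigger pattern the answer describes can, however, be modeled in a small Python sketch; all class and method names below merely mimic the Java API and are stand-ins:

```python
class AbstractProcessor:
    """Stand-in for org.apache.nifi.processor.AbstractProcessor."""
    def on_trigger(self, context, session):
        raise NotImplementedError

class UppercaseProcessor(AbstractProcessor):
    """Example custom processor: read FlowFile content, write it back uppercased."""
    def on_trigger(self, context, session):
        flowfile = session.get()  # pull the next FlowFile from the input queue
        if flowfile is None:
            return
        flowfile["content"] = flowfile["content"].upper()
        session.transfer(flowfile, "success")

class FakeSession:
    """Minimal session stub: one input queue; relationships collect transfers."""
    def __init__(self, queue):
        self.queue = list(queue)
        self.transferred = {}
    def get(self):
        return self.queue.pop(0) if self.queue else None
    def transfer(self, flowfile, relationship):
        self.transferred.setdefault(relationship, []).append(flowfile)

session = FakeSession([{"content": b"hello nifi"}])
UppercaseProcessor().on_trigger(context=None, session=session)
print(session.transferred["success"][0]["content"])  # b'HELLO NIFI'
```

The shape is the important part: the framework calls onTrigger, the processor pulls FlowFiles from the session, modifies them, and transfers each to a named relationship.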

 

11. What is provenance in the context of NiFi? Why is it important?

 

Provenance is the history of a given piece of data, and it is important in NiFi because it allows you to track where data came from and how it has been processed. This is useful for debugging purposes, as well as for understanding the data flow through a NiFi system.

 

12. What information is captured by NiFi Provenance Repository?

 

The NiFi Provenance Repository captures information about the dataflow through NiFi, including the data that is processed, the NiFi processors that are used, the NiFi connections that are used, and the NiFi parameters that are used.

 

14. What is a Connection Queue in NiFi?

 

A Connection Queue is a queue of FlowFiles that are waiting to be processed by the downstream component of a connection.

 

15. Can you tell me about the process used by NiFi to handle back pressure?

 

Back pressure is the name given to the process of slowing down or stopping the flow of data through a system when that system is becoming overwhelmed. This is done in order to prevent the system from becoming overloaded and crashing. NiFi uses a back pressure mechanism to automatically control the flow of data through the system in order to prevent data loss.

 

16. Is it possible to run NiFi as a cluster? If yes, then how?

 

Yes, it is possible to run NiFi as a cluster. In order to do so, you will need to start up multiple NiFi instances and then configure them to work together as a cluster. The specifics of how to do this will vary depending on your particular environment and setup.

 

 What’s the difference between FlowFileAttributes and FlowFileContent?

 

FlowFileAttributes are metadata associated with a FlowFile, while FlowFileContent is the actual data contained in the FlowFile.

 


 

 
