
Dataflow: Difference between revisions

From Wikipedia, the free encyclopedia
No edit summary
Added short description
 
(224 intermediate revisions by more than 100 users not shown)
{{Short description|Computing concept}}
'''Dataflow''' is a term used in [[computing]] that may have various shades of meaning. It is closely related to [[message passing]].
{{about|software engineering|the flow of data within a computer network|Traffic flow (computer networking)|the graphical representation of flow of data within an information system|data flow diagram|the hardware architecture|Dataflow architecture|the Dubai-based company|DataFlow Group}}
{{Multiple issues|
{{lead too short|date=November 2013}}
{{More citations needed |date= September 2016 }}
}}

In [[computing]], '''dataflow''' is a broad concept that has various meanings depending on the application and context. In the context of [[software architecture]], dataflow relates to [[stream processing]] or [[reactive programming]].


==Software architecture==
[[Dataflow programming|Dataflow computing]] is a software paradigm based on the idea of representing computations as a [[directed graph]], where nodes are computations and data flow along the edges.<ref name="sig">{{cite web |last1=Schwarzkopf |first1=Malte |title=The Remarkable Utility of Dataflow Computing |url=https://www.sigops.org/2020/the-remarkable-utility-of-dataflow-computing/ |website=ACM SIGOPS |access-date=31 July 2022 |date=7 March 2020}}</ref> Dataflow can also be called [[stream processing]] or [[reactive programming]].<ref>[http://www.jonathanbeard.io/blog/2015/09/19/streaming-and-dataflow.html A Short Intro to Stream Processing]</ref>
'''Dataflow''' is a [[software architecture]] based on the idea that changing the value of a variable should automatically force recalculation of the values of the variables that depend on it.
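The graph formulation can be illustrated with a minimal Python sketch. It assumes a hypothetical node table, GRAPH, that maps each node name to a function and to the names of the nodes whose outputs it consumes; run_dataflow is likewise an illustrative helper, not the API of any real dataflow system. A node fires as soon as all of its inputs have arrived, so the execution order is set by the data dependencies rather than by the order in which the nodes are listed.

```python
from collections import deque

# Hypothetical node table: node name -> (function, names of input nodes).
# Source nodes take no inputs and simply emit a constant.
GRAPH = {
    "a": (lambda: 7, []),
    "b": (lambda: 3, []),
    "sum": (lambda x, y: x + y, ["a", "b"]),
    "diff": (lambda x, y: x - y, ["a", "b"]),
    "prod": (lambda x, y: x * y, ["sum", "diff"]),
}


def run_dataflow(graph):
    """Fire each node once all of its input values are available."""
    values = {}
    # Number of inputs each node is still waiting for.
    waiting = {name: len(inputs) for name, (_, inputs) in graph.items()}
    # Outgoing edges: which nodes consume each node's output.
    consumers = {name: [] for name in graph}
    for name, (_, inputs) in graph.items():
        for src in inputs:
            consumers[src].append(name)

    ready = deque(name for name, count in waiting.items() if count == 0)
    while ready:
        name = ready.popleft()
        func, inputs = graph[name]
        values[name] = func(*(values[src] for src in inputs))
        # Sending the result along each outgoing edge may enable a consumer.
        for dst in consumers[name]:
            waiting[dst] -= 1
            if waiting[dst] == 0:
                ready.append(dst)
    return values


print(run_dataflow(GRAPH)["prod"])  # (7 + 3) * (7 - 3) = 40
```

Because sum and diff depend only on a and b, neither waits for the other; a parallel scheduler could run them at the same time, which is the kind of independence dataflow systems exploit.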


Multiple dataflow and stream processing languages of various forms have been developed (see [[Stream processing]]). Dataflow hardware (see [[Dataflow architecture]]) is an alternative to the classic [[von Neumann architecture]]. The most obvious example of dataflow programming is the subset known as [[reactive programming]], as found in spreadsheets: as a user enters new values, they are instantly transmitted to the next logical "actor" or formula for calculation.
[[Dataflow programming]] embodies these principles, with [[spreadsheet]]s perhaps the most widespread embodiment of dataflow. For example, in a spreadsheet a user can specify a cell formula that depends on other cells; whenever any of those cells is updated, the first cell's value is automatically recalculated. A single change can initiate a whole sequence of changes if one cell depends on another cell, which depends on yet another cell, and so on.
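The spreadsheet behavior described above can be sketched in Python. The Cell class below is a hypothetical illustration, not how any particular spreadsheet is implemented: a cell holds either a constant or a formula over other cells, and changing a constant pushes new values to every cell downstream of it, so a single edit can cascade through a chain of formulas.

```python
class Cell:
    """A minimal spreadsheet-like cell: either a constant or a formula over other cells."""

    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents = []
        # Register this cell with each input so changes can be pushed to it.
        for cell in self.inputs:
            cell.dependents.append(self)
        if formula is None:
            self.value = value
        else:
            self.value = formula(*(c.value for c in self.inputs))

    def set(self, value):
        """Change a constant cell and recalculate everything downstream."""
        self.value = value
        self._propagate()

    def _propagate(self):
        for dep in self.dependents:
            dep.value = dep.formula(*(c.value for c in dep.inputs))
            dep._propagate()  # one change can trigger a whole chain of recalculations


a1 = Cell(2)
b1 = Cell(formula=lambda a: a * 10, inputs=[a1])  # B1 = A1 * 10
c1 = Cell(formula=lambda b: b + 1, inputs=[b1])   # C1 = B1 + 1
a1.set(5)
print(b1.value, c1.value)  # 50 51
```

This sketch propagates changes depth-first. In a diamond-shaped dependency graph (two cells that both depend on one cell and feed a fourth), the fourth cell would be recomputed twice and would briefly hold a value computed from one stale input; recalculating cells in dependency (topological) order avoids this.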


[[Distributed data flow]]s have also been proposed as a programming abstraction that captures the dynamics of distributed multi-protocols. The data-centric perspective characteristic of data flow programming promotes high-level functional specifications and simplifies formal reasoning about system components.
The dataflow technique is not restricted to recalculating numeric values, as done in spreadsheets. For example, dataflow can be used to redraw a picture in response to mouse movements, or to make a robot turn in response to a change in light level.

One benefit of dataflow is that it can reduce the amount of [[Dependency (computer science)|coupling]]-related code in a program. For example, without dataflow, if a variable X depends on a variable Y, then whenever Y is changed X must be explicitly recalculated. This means that Y is coupled to X. Since X is also coupled to Y (because X's value depends on Y's value), the program ends up with a cyclic dependency between the two variables. Experienced programmers typically break this cycle by using an [[observer pattern]], but only at the cost of introducing a non-trivial amount of code. Dataflow improves this situation by making the recalculation of X automatic, thereby eliminating the coupling from Y to X. Dataflow makes implicit a significant amount of code that would otherwise have to be written out explicitly and tediously.
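The coupling that dataflow removes can be seen in a hand-written observer pattern. In this hypothetical Python sketch, Observable plays the role of Y and Derived the role of X = 2 * Y; none of these names come from a real library. X must subscribe to Y explicitly, and the formula for X appears both in its constructor and in its update callback, which is the kind of boilerplate a dataflow system makes implicit.

```python
class Observable:
    """Hand-written observer pattern: the subject keeps and notifies a list of callbacks."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        # Y must actively notify everything that depends on it.
        for callback in self._observers:
            callback(new_value)


class Derived:
    """X = 2 * Y, kept up to date only because it explicitly subscribes to Y."""

    def __init__(self, source):
        self.value = source.value * 2
        source.subscribe(self._on_change)  # wiring that dataflow would make implicit

    def _on_change(self, new_y):
        self.value = new_y * 2  # the formula is repeated here


y = Observable(3)
x = Derived(y)
y.value = 10
print(x.value)  # 20
```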

Dataflow is also sometimes referred to as reactive programming.

A few programming languages have been created specifically to support dataflow. In particular, many (if not most) [[visual programming language]]s are based on the idea of dataflow.


==Hardware architecture==
{{main|Dataflow architecture}}
Hardware architectures for dataflow were a major topic in [[computer architecture]] research in the 1970s and early 1980s. [[Jack Dennis]] of the [[Massachusetts Institute of Technology]] (MIT) pioneered the field of static dataflow architectures. Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routine to execute simultaneously, because the simple tags could not differentiate between them. Designs that use [[content-addressable memory]], which [[Arvind (computer scientist)|Arvind]] (also of MIT) called dynamic dataflow machines, use tags in memory to facilitate parallelism.
Data moves through the components of a computer: it enters through input devices and can leave through output devices such as printers.
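The difference between static and dynamic dataflow machines can be illustrated with a small Python simulation of tag matching. It is a software analogy under simplifying assumptions (a single two-operand add instruction, and tokens represented as hypothetical (instruction, tag, port, value) tuples), not a model of any specific machine. Operands are matched on the pair (instruction, tag), so two activations of the same instruction can be in flight at once without their operands being confused.

```python
def match_tokens(tokens):
    """Fire a two-operand 'add' whenever both operands with the same tag have arrived."""
    waiting = {}  # (instruction, tag) -> {port: value}, a stand-in for the matching store
    results = []
    for instr, tag, port, value in tokens:
        slot = waiting.setdefault((instr, tag), {})
        slot[port] = value
        if len(slot) == 2:  # both operands present: the instruction fires
            del waiting[(instr, tag)]
            results.append((tag, slot["left"] + slot["right"]))
    return results


# Two activations (tags 0 and 1) of the same instruction, with interleaved arrivals.
tokens = [
    ("add", 0, "left", 1),
    ("add", 1, "left", 100),
    ("add", 1, "right", 200),
    ("add", 0, "right", 2),
]
print(match_tokens(tokens))  # [(1, 300), (0, 3)]
```

If operands were matched on the instruction alone, as with the simple tags of a static machine, the left operand 100 from the second activation would collide with the waiting left operand 1 from the first; this is why static designs could not run multiple instances of the same routine simultaneously.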

==Diagrams==
The term dataflow may also be used to refer to the flow of data within a system, and is the name normally given to the arrows in a [[data flow diagram]] that represent the flow of data between external entities, processes, and data stores.


==Concurrency==
A dataflow network is a network of concurrently executing processes or automata that can communicate by sending data over ''channels'' (see [[message passing]]).
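A dataflow network of this kind can be sketched in Python, with threads as processes and queue.Queue objects as channels. The three processes and the DONE end-of-stream marker are illustrative assumptions of the sketch; each process only reads from its input channel and writes to its output channel, and shares no other state.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def producer(out, n):
    """Emit 0 .. n-1 on the output channel, then signal end of stream."""
    for i in range(n):
        out.put(i)
    out.put(DONE)


def squarer(inp, out):
    """Read numbers from one channel and send their squares on another."""
    while (item := inp.get()) is not DONE:
        out.put(item * item)
    out.put(DONE)


def collector(inp, results):
    """Gather everything that arrives until the end-of-stream marker."""
    while (item := inp.get()) is not DONE:
        results.append(item)


c1, c2 = queue.Queue(), queue.Queue()  # the channels
results = []
threads = [
    threading.Thread(target=producer, args=(c1, 5)),
    threading.Thread(target=squarer, args=(c1, c2)),
    threading.Thread(target=collector, args=(c2, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4, 9, 16]
```

Each process blocks until data arrives on its input channel, so the threads synchronize purely through the data they exchange.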


[[Kahn process networks]], named after [[Gilles Kahn]], one of the pioneers of dataflow networks, are a particularly important class of such networks. In a Kahn process network the processes are ''determinate'': each determinate process computes a [[continuous function]] from input streams to output streams, and a network of determinate processes is itself determinate, thus computing a continuous function. As a result, the behavior of such networks can be described by a set of recursive equations, which can be solved using [[fixed point theory]]. The movement and transformation of data through such a network can be represented graphically by a series of shapes and lines.
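The determinacy property can be illustrated with a small Kahn-style network in Python, using threads for processes and unbounded queue.Queue FIFOs for channels (so writes never block and reads always do); the process names and the DONE shutdown marker are hypothetical. Three processes form a feedback loop: init emits 0 and then forwards whatever comes back around the loop, fork copies each value to the output and back into the loop, and increment adds 1. Each channel has exactly one writer and one reader, so the output does not depend on how the threads are scheduled. The output corresponds to the recursive stream equation s = 0 followed by (s + 1), whose least fixed point is the stream 0, 1, 2, ...; in this sketch fork simply truncates it after n values.

```python
import queue
import threading

DONE = object()  # hypothetical marker used only to shut the loop down


def init(inp, out):
    """Emit 0, then forward every value received on the feedback channel."""
    out.put(0)
    while (v := inp.get()) is not DONE:
        out.put(v)


def fork(inp, loop, out, n):
    """Send the first n values both to the output and back around the loop."""
    for _ in range(n):
        v = inp.get()  # blocking read: a Kahn process never tests for emptiness
        out.put(v)
        loop.put(v)
    loop.put(DONE)


def increment(inp, out):
    """Add 1 to every value; pass the shutdown marker along."""
    while (v := inp.get()) is not DONE:
        out.put(v + 1)
    out.put(DONE)


def run_network(n):
    a, b, c, out = (queue.Queue() for _ in range(4))  # one FIFO per channel
    procs = [
        threading.Thread(target=init, args=(c, a)),
        threading.Thread(target=fork, args=(a, b, out, n)),
        threading.Thread(target=increment, args=(b, c)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [out.get() for _ in range(n)]


print(run_network(5))  # [0, 1, 2, 3, 4]
```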


== Other meanings ==
The concept of dataflow networks is closely related to another model of concurrency known as the [[Actor model]].
Dataflow can also refer to:
* [[Power BI]] Dataflow, a [[Power Query]] implementation in the cloud used for transforming source data into [[Data cleansing|cleansed]] Power BI Datasets to be used by Power BI report developers through the [[Microsoft Dataverse]] (formerly called Microsoft Common Data Service).
* [[Google Cloud Dataflow]], a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.


==See also==
{{Wiktionary-inline|dataflow}}
* [[Binary Modular Dataflow Machine]] (BMDFM)
* [[Communicating sequential processes]]
* [[Complex event processing]]
* [[Data-flow diagram]]
* [[Data-flow analysis]], a type of program analysis
* [[Data stream]]
* [[Dataflow programming]] (a programming language paradigm)
* [[Erlang (programming language)]]
* [[Flow-based programming]] (FBP)
* [[Flow control (data)]]
* [[Functional reactive programming]]
* [[Lazy evaluation]]
* [[Lucid (programming language)]]
* [[Oz (programming language)]]
* [[Packet flow]]
* [[Pipeline (computing)]]
* [[Pure Data]]
* [[State transition]]
* [[TensorFlow]]
* [[Theano_(software)|Theano]]
* [[Ward-Mellor methodology]]

== References ==
{{Reflist}}



==External links==
* [http://bmdfm.com BMDFM]: Binary Modular Dataflow Machine, [[BMDFM]].
* [http://greta.cs.ioc.ee/~khoros2/k2tools/cantata/cantata.html Cantata]: Dataflow Visual Language for [[image processing]].
* [http://common-lisp.net/project/cells/ Cells]: Dataflow extension to [[Common Lisp]] [[Common Lisp Object System|Object System]], CLOS.
** [http://pycells.pdxcb.net/ PyCells]: Python port.
* [http://sourceforge.net/projects/flow-based-pgmg JavaFBP]: Open source framework for Java and C#
* [http://www.pervasivedatarush.com/ DataRush]: Dataflow framework for Java.
* [http://www.iseesystems.com/softwares/Education/StellaSoftware.aspx Stella]: Dataflow Visual Language for dynamic dataflow [[Mathematical model|modeling]] and [[Computer simulation|simulation]].
* [http://opensource.adobe.com/group__asl__overview.html Adam and Eve]: Extension for C++, by [[Adobe]].
* [http://www.pointillistic.com/open-REBOL/moa/steel/liquid/index.html Liquid Rebol]


[[Category:Computer data]]
[[Category:Computer architecture]]
[[Category:Programming paradigms]]
[[Category:Computational models]]


<!-- [[Category:Computer data]] – Dataflow has nothing to do with this category! -->
[[de:Datenfluss]]
[[it:Dataflow]]
[[Category:Models of computation]]
[[ja:データフロー]]

Latest revision as of 13:49, 25 June 2024

In computing, dataflow is a broad concept that has various meanings depending on the application and context. In the context of software architecture, dataflow relates to stream processing or reactive programming.

Software architecture


Dataflow computing is a software paradigm based on the idea of representing computations as a directed graph, where nodes are computations and data flow along the edges.[1] Dataflow can also be called stream processing or reactive programming.[2]

Multiple dataflow and stream processing languages of various forms have been developed (see Stream processing). Dataflow hardware (see Dataflow architecture) is an alternative to the classic von Neumann architecture. The most obvious example of dataflow programming is the subset known as reactive programming, as found in spreadsheets: as a user enters new values, they are instantly transmitted to the next logical "actor" or formula for calculation.

Distributed data flows have also been proposed as a programming abstraction that captures the dynamics of distributed multi-protocols. The data-centric perspective characteristic of data flow programming promotes high-level functional specifications and simplifies formal reasoning about system components.

Hardware architecture


Hardware architectures for dataflow were a major topic in computer architecture research in the 1970s and early 1980s. Jack Dennis of the Massachusetts Institute of Technology (MIT) pioneered the field of static dataflow architectures. Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routine to execute simultaneously, because the simple tags could not differentiate between them. Designs that use content-addressable memory, which Arvind called dynamic dataflow machines, use tags in memory to facilitate parallelism. Data moves through the components of a computer: it enters through input devices and can leave through output devices such as printers.

Concurrency


A dataflow network is a network of concurrently executing processes or automata that can communicate by sending data over channels (see message passing).

In Kahn process networks, named after Gilles Kahn, the processes are determinate: each determinate process computes a continuous function from input streams to output streams, and a network of determinate processes is itself determinate, thus computing a continuous function. As a result, the behavior of such networks can be described by a set of recursive equations, which can be solved using fixed point theory. The movement and transformation of data through such a network can be represented graphically by a series of shapes and lines.

Other meanings


Dataflow can also refer to:

  • Power BI Dataflow, a Power Query implementation in the cloud used for transforming source data into cleansed Power BI Datasets to be used by Power BI report developers through the Microsoft Dataverse (formerly called Microsoft Common Data Service).
  • Google Cloud Dataflow, a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem.

See also


The dictionary definition of dataflow at Wiktionary

References

  1. Schwarzkopf, Malte (7 March 2020). "The Remarkable Utility of Dataflow Computing". ACM SIGOPS. Retrieved 31 July 2022.
  2. "A Short Intro to Stream Processing".