Introducing Graph Programming and Graph Runtime

Jun 15

In a previous blog, we made the observation that the mainstream programming languages and their stack driven state machine (SDSM) runtimes don’t support predictive optimization. This is one of the main reasons why enterprise systems become overly complex, costly to maintain, and inflexible. The billion dollar follow-up question is: what is a better alternative for building enterprise systems? Our answer is graph computing.

Graph is a data structure consisting of a set of connected nodes (i.e. more formally vertices connected by edges). Graphs can represent many problems in practice, such as networks (including social, communication or transportation networks) or plans (such as workflows, business or project plans).

Directed and undirected graphs represent the two main types of graphs. Social networks are usually undirected as the connections are not directional; workflow graphs are usually directed because of the dependencies and ordering among tasks.

One special case of directed graph to note is called a directed acyclic graph (DAG), which is a directed graph without any directed cycles. A DAG is an abstraction for any workflow or plan that can be completed by a single pass, where each task or step in the plan is carried out exactly once following their dependencies. Because of this single-pass property, a DAG is much easier to analyze, process, and optimize by a computer algorithm than other types of graphs.

y = x * x

x = x + 1

w = y - 1

z = x + y + w

In principle, the runtime information of any program can be represented by a DAG, regardless of its complexity. One sure way to create a DAG representation of a program is to keep track of every intermediate result during its execution as a separate node. Figure 2 is a small sample code and its equivalent DAG representation. The value of x is represented by two separate nodes in the DAG, for both before and after values of the highlighted increment statement. Such a duplication is necessary in order to remove the cycles in the DAG.

In general, the topology of a DAG may depend on values that are only known at runtime. For example, in the following loop:

double sum = 0.;

for(int i = 0; i < nloop; ++i) {

    // use the value i for some calculation

    sum += foo(i);

The resulting DAG representation needs to have nloop number of nodes for the value of i, in order to be acyclic. The DAG itself is therefore different if the program’s inputs lead to different runtime values of nloop, even from the exact same source code. Because of the runtime dependency, in general a DAG cannot be created fully at compile time; some runtime input data is needed to create the DAG.

In the examples so far, the DAG is created from conventional programming languages, similar to how Tensorflow and PyTorch create their DAGs in Python. But those are certainly neither the only choices, nor necessarily the best choices. A DAG can be defined and created in a number of ways. For example, some no-code or low-code solutions let users define and create DAGs from sophisticated GUIs; others have implemented special syntax, or domain specific languages (DSL), for defining and creating the DAG.

Let us now formally define graph programming and graph runtime: Graph programming (GP), or graph oriented programming (GOP), is the programming interface and/or tools (such as a GUI) to define and create the computational DAG. Graph runtime is the runtime environment that executes the computational DAG. Their relationship is shown in Figure 3.

The main benefit of graph runtime is that it enables much more sophisticated runtime optimizations than the traditional SDSM runtime. In the example of Figure 2, by having both before and after values of x in the DAG, the graph runtime is free to execute the two statements y = x * x and w = y - 1 in any order without compromising the correctness, as there is no direct dependency between them. Since the DAG captures only true dependencies, it leaves tremendous degrees of freedom to re-order individual nodes (or tasks) while respecting their logical dependencies. In a traditional SDSM runtime, that reordering is not possible because the old value of x is no longer accessible after executing the highlighted statement x = x + 1; subsequent reference to x only retrieves the after value. The freedom to reorder tasks can significantly reduce the wait and response time in a distributed system, as various inputs or intermediate results often arrive at different and unpredictable times.

The DAG also enables the accurate prediction of a program’s future resource consumption, making sophisticated predictive optimization possible. For example, a graph runtime can track and estimate the CPU, memory and data input/output sizes of every node in the DAG, then optimize the placement of individual nodes to available computers in the server farm at runtime. Such automated placement optimization could lead to significantly better CPU utilization, a smaller memory footprint, and less network data transfers. In theory, much of the manual analysis, design, and tuning of a system’s scale, topology and load-balancing could be replaced by automatic optimizations based on the DAG; as a result the system could achieve true auto scaling without much human intervention at run time.

Why are predicative optimizations feasible in a graph runtime, but not in a SDSM runtime? If we use a military analogy, the DAG represents the entire battle plan with all the missions and their dependencies in order to achieve the ultimate objective; while the SDSM only reveals the immediate sequence of actions without any information on the bigger picture. A commander with an order in DAG can organize and optimize the troop’s actions according to the available resources and evolving battlefield conditions; while the commander with an order in SDSM can’t really do much beyond strictly following the order step by step.

If we naively create the DAG by tracking all the intermediate results as shown in Figure 1, its size would be too large to process even for a simple application. To keep the DAG at a manageable size, nodes in the DAG can be entire function calls whose internal states are hidden from the DAG. In this case, the SDSM runtime is used when executing the function calls within individual nodes, while the top level dependencies and data feeds between nodes are managed by the DAG and graph runtime. The hybrid of graph and SDSM runtime is a very powerful technique. It leverages the traditional compiler and SDSM optimizations for the execution of low level functions, while at the same time benefits from the automated predictive optimization of the top level business logic from DAG and the graph runtime. Using the same military analogy, this hybrid runtime corresponds to a realistic command and control structure, where only the senior commanders have access to the full operation plan in DAG, while the low level commanders are only given orders of their units’ immediate actions in SDSM.

Even though graph runtime and graph programming offer great promise for enterprise systems, there are a few difficult technical challenges that have prevented its wide adoption so far. Here are some of the most difficult and critical ones:

How to create the DAG without actually running the entire program? As we mentioned earlier, the DAG itself may depend on certain runtime values. It would defeat the purpose if we have to run the entire program in SDSM in order to create the DAG.
How to determine the granularity of the node level functions of a DAG? We need to balance two competing objectives: limiting the size of the DAG, and leaving enough degree of freedom in the DAG for effective optimizations.
How to perform graph optimization? Even though in theory many predictive optimizations are feasible with the full dependency information in the DAG, in practice these graph optimization problems are often extremely difficult to solve, some of them are known to be NP-hard.
How to make graph programming easy to use, and at the same time versatile, powerful, and flexible? Should we adopt an existing programming language, design a new syntax/DSL, or use a GUI to construct the DAG?

We won’t be able to dive into the details of these topics in this article, as each of them deserves its own in-depth analysis, which we will discuss in future blogs. The good news is that we have made significant progress in resolving these technical challenges. Graph computing is emerging as a promising technology for practical use in building enterprise systems.

We believe graph programming is a new programming paradigm that offers tremendous potential, it could revolutionize enterprise computing. We’d love to hear your thoughts on graph computing, please email info@juliustech.co or leave your comments below.

Yadong Li

Introducing Graph Programming and Graph Runtime

Why Graph Computing is STELLAR

Why are enterprise systems so terrible?