Enhance Graph.update() and add whole-graph update tests #1843

Open
Andy-Jost wants to merge 4 commits into NVIDIA:main from Andy-Jost:graph-updates

@Andy-Jost Andy-Jost commented Mar 31, 2026

Extend tests of the existing Graph.update function and refactor existing graph code in preparation for further work.

Summary

  • Extends Graph.update() to accept both GraphBuilder and GraphDef as sources, giving users flexibility to update instantiated graphs from either the stream-capture or explicit-graph API
  • Surfaces detailed CUgraphExecUpdateResultInfo on update failure (reason enum + docstring) instead of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE
  • Splits the monolithic _graphdef.pyx (2000+ lines) into a _graph_def/ subpackage with three focused modules for maintainability
  • Reorganizes graph test files into thematic groups with module docstrings
  • Adds new tests for whole-graph update covering happy paths and error cases
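The dispatch described in the first bullet can be pictured with a minimal sketch. It uses hypothetical stand-in classes (`GraphBuilderStub`, `GraphDefStub`) and a hypothetical helper (`resolve_update_source`), not the real cuda.core types or the actual Cython implementation; only the accepted source kinds and the error types come from this PR's description.

```python
from dataclasses import dataclass


@dataclass
class GraphDefStub:
    """Hypothetical stand-in for an explicitly built graph definition."""
    handle: int


@dataclass
class GraphBuilderStub:
    """Hypothetical stand-in for a stream-capture graph builder."""
    handle: int
    is_building: bool = True  # True until capture has ended


def resolve_update_source(source):
    """Return the graph handle to update from, or raise on bad input."""
    if isinstance(source, GraphBuilderStub):
        # A builder that is still capturing has no complete graph yet.
        if source.is_building:
            raise ValueError("source GraphBuilder has not finished capturing")
        return source.handle
    if isinstance(source, GraphDefStub):
        return source.handle
    # Anything else is rejected up front, before any driver call would happen.
    raise TypeError(
        f"expected GraphBuilder or GraphDef, got {type(source).__name__}"
    )
```

Validating the source before touching the driver keeps the `ValueError` and `TypeError` cases separate from driver-level update failures.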

Changes

  • cuda/core/_graph/_graph_builder.pyx: Refactored Graph.update() to dispatch on GraphBuilder vs GraphDef, call cuGraphExecUpdate with a CUgraphExecUpdateResultInfo struct, and raise a descriptive CUDAError on failure
  • cuda/core/_graph/_graph_def/: Split _graphdef.pyx into _graph_def.pyx (Condition, GraphAllocOptions, GraphDef), _graph_node.pyx (GraphNode base class and builder methods with GN_* inline helpers), and _subclasses.pyx (all concrete node subclasses). Handle property annotations updated to use driver.* types consistently.
  • tests/graph/: Renamed test files to reflect their scope (test_graph_builder.py, test_graph_builder_conditional.py, test_graph_memory_resource.py, test_graph_update.py, test_graphdef*.py, test_device_launch.py); added module docstrings; moved tests to appropriate files
  • tests/graph/test_graph_update.py: Added parametrized test_graph_update_kernel_args (GraphBuilder + GraphDef), test_graph_update_conditional, test_graph_update_unfinished_builder, test_graph_update_topology_mismatch, test_graph_update_wrong_type
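As a rough illustration of raising a descriptive error from the update result, here is a pure-Python sketch. The enum member names echo a few members of the driver's CUgraphExecUpdateResult, but the descriptions are paraphrased, and `UpdateResultInfo`, `GraphUpdateError`, and `check_update` are hypothetical names; the PR itself raises CUDAError from Cython.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UpdateResult(Enum):
    """Toy subset of update outcomes; descriptions are paraphrased."""
    SUCCESS = "the update succeeded"
    ERROR_TOPOLOGY_CHANGED = "the graph topology changed"
    ERROR_NODE_TYPE_CHANGED = "the type of a node changed"
    ERROR_PARAMETERS_CHANGED = "a node's parameters changed in an unsupported way"


@dataclass
class UpdateResultInfo:
    """Simplified stand-in for the driver's result-info struct."""
    result: UpdateResult
    error_node: Optional[int] = None  # hypothetical: handle of the offending node


class GraphUpdateError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def check_update(info):
    """Raise a descriptive error unless the update succeeded."""
    if info.result is UpdateResult.SUCCESS:
        return
    where = f" (node {info.error_node:#x})" if info.error_node is not None else ""
    raise GraphUpdateError(
        f"graph update failed: {info.result.name}: {info.result.value}{where}"
    )
```

Including both the enum name and its description in the message gives a searchable identifier alongside a readable reason.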

Test Coverage

  • Parametrized happy path: kernel-only graph updated with new pointer args, tested via both GraphBuilder and GraphDef
  • Conditional switch update: existing test (renamed) exercising topology-compatible conditional graph updates
  • Unfinished builder: ValueError when source GraphBuilder hasn't finished capturing
  • Topology mismatch: CUDAError with descriptive reason from CUgraphExecUpdateResultInfo
  • Wrong type: TypeError for invalid argument types
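The happy path and the topology-mismatch case rest on the same rule: a whole-graph update may change node parameters but not the graph's shape. The toy model below illustrates that rule only. `ToyExecGraph` is a hypothetical class that compares a linear chain of kernel names, which is far simpler than the driver's real topology check, and it raises a plain `ValueError` rather than CUDAError.

```python
class ToyExecGraph:
    """Toy instantiated graph: an ordered chain of (kernel_name, args) nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def update(self, new_nodes):
        """Adopt new arguments if, and only if, the node sequence matches."""
        new_nodes = list(new_nodes)
        old_kernels = [kernel for kernel, _ in self.nodes]
        new_kernels = [kernel for kernel, _ in new_nodes]
        if old_kernels != new_kernels:
            # Leave the existing nodes untouched when the shapes differ.
            raise ValueError(
                f"topology mismatch: {old_kernels} vs {new_kernels}"
            )
        self.nodes = new_nodes


# Same shape, new pointer-like arguments: the update is accepted.
exec_graph = ToyExecGraph([("add_one", (0x1000,)), ("add_one", (0x1000,))])
exec_graph.update([("add_one", (0x2000,)), ("add_one", (0x2000,))])
```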

Related Work

Rename test files to reflect what they actually test:
- test_basic -> test_graph_builder (stream capture tests)
- test_conditional -> test_graph_builder_conditional
- test_advanced -> test_graph_update (moved child_graph and
  stream_lifetime tests into test_graph_builder)
- test_capture_alloc -> test_graph_memory_resource
- test_explicit* -> test_graphdef*

Made-with: Cursor
- Extend Graph.update() to accept both GraphBuilder and GraphDef sources
- Surface CUgraphExecUpdateResultInfo details on update failure instead
  of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE message
- Release the GIL during cuGraphExecUpdate via nogil block
- Add parametrized happy-path test covering both GraphBuilder and GraphDef
- Add error-case tests: unfinished builder, topology mismatch, wrong type

Made-with: Cursor
@Andy-Jost Andy-Jost added this to the cuda.core v1.0.0 milestone Mar 31, 2026
@Andy-Jost Andy-Jost added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels Mar 31, 2026
@Andy-Jost Andy-Jost self-assigned this Mar 31, 2026
@Andy-Jost Andy-Jost requested review from cpcloud, leofang, mdboom, rparolin and rwgk and removed request for leofang March 31, 2026 18:25

- Chain GraphDef kernel launches sequentially (n.launch instead of
  g.launch) to avoid concurrent writes to the same memory location
- Update GraphDef.handle and GraphNode.handle annotations to reflect
  that as_py returns driver types (CUgraph, CUgraphNode), not int

Made-with: Cursor
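The first item in the commit above, chaining launches with n.launch instead of g.launch, is easier to see in a toy dependency graph. The sketch assumes that launching from the graph adds a root node with no dependencies, while launching from a node adds a dependent node. `ToyGraph`, `ToyNode`, and `is_ordered_before` are hypothetical names that only mirror the `launch` naming in the commit message; they are not the cuda.core API.

```python
class ToyNode:
    """Toy graph node that records which nodes it depends on."""

    def __init__(self, graph, name, deps):
        self.graph = graph
        self.name = name
        self.deps = tuple(deps)

    def launch(self, name):
        """Add a node that depends on this one, so it runs afterwards."""
        return self.graph._add(name, deps=(self,))


class ToyGraph:
    """Toy graph that hands out nodes and tracks them in insertion order."""

    def __init__(self):
        self.nodes = []

    def _add(self, name, deps=()):
        node = ToyNode(self, name, deps)
        self.nodes.append(node)
        return node

    def launch(self, name):
        """Add a root node with no dependencies."""
        return self._add(name)


def is_ordered_before(a, b):
    """Return True if b transitively depends on a."""
    stack = list(b.deps)
    while stack:
        node = stack.pop()
        if node is a:
            return True
        stack.extend(node.deps)
    return False


# Two root launches: neither depends on the other, so they may overlap.
g = ToyGraph()
k1 = g.launch("k1")
k2 = g.launch("k2")
unordered = not is_ordered_before(k1, k2) and not is_ordered_before(k2, k1)

# Chained launches: the second node depends on the first, so they serialize.
h = ToyGraph()
c1 = h.launch("k1")
c2 = c1.launch("k2")
chained = is_ordered_before(c1, c2)
```

When both kernels write to the same memory location, only the chained form gives a defined order of writes.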
The monolithic _graphdef.pyx (2000+ lines) is split into three focused
modules under _graph_def/: _graph_def.pyx (Condition, GraphAllocOptions,
GraphDef), _graph_node.pyx (GraphNode base class and builder methods),
and _subclasses.pyx (all concrete node subclasses). Long method bodies
in GraphNode are factored into cdef inline GN_* helpers following
existing codebase conventions. Handle property annotations updated to
use driver.* types consistently.

Made-with: Cursor
@Andy-Jost
Contributor Author

_graphdef.pyx was broken into 3 parts under _graph_def/. No need to review those in detail.
