Desktop automation has reached an inflection point. For two decades, Robotic Process Automation (RPA) dominated enterprise workflow automation through deterministic scripting. Today, a fundamentally different architecture—vision-language-action (VLA) GUI agents—challenges the assumption that automation requires brittle, hand-coded selectors. These are not competing products on the same spectrum; they represent distinct architectural paradigms optimized for different problem classes.
This article dissects both architectures at the systems level, examines where each fails, and analyzes how Mano-P, an open-source GUI agent project by Mininglamp Technology, implements the VLA paradigm with on-device inference.
The Structural Fragility of RPA
RPA tools—UiPath, Automation Anywhere, Blue Prism—operate on a selector-action model. Each automation step identifies a UI element via DOM path, CSS selector, accessibility attribute, or pixel coordinate, then executes a predefined action. This architecture carries four compounding failure modes:
DOM Coupling and Selector Fragility. A single UI update (a renamed button ID, a restructured div hierarchy, a relocated modal) can break the entire downstream chain. Enterprise RPA deployments report that 30-40% of maintenance effort goes to selector repair after application updates. This is not a bug; it is the architectural consequence of coupling automation logic to implementation-specific element identifiers rather than to semantic intent.
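To make the coupling concrete, here is a minimal sketch in Python using only the standard library. The two markup snippets, the element IDs, and the helper names (`find_by_selector`, `find_by_label`, `run_step`) are all hypothetical and are not taken from any RPA product. The label-based lookup is only a crude stand-in for "semantic intent"; it is not how a VLA agent works, since such an agent would operate on screenshots rather than markup.

```python
import xml.etree.ElementTree as ET

# Two hypothetical releases of the same form. Only implementation details
# change: the button id is renamed and a wrapper div is introduced.
UI_V1 = """
<form>
  <input id="amount"/>
  <button id="submit-btn">Submit invoice</button>
</form>
"""

UI_V2 = """
<form>
  <input id="amount"/>
  <div class="actions">
    <button id="btn-primary-7f3a">Submit invoice</button>
  </div>
</form>
"""


def find_by_selector(root, element_id):
    """Selector-style lookup: coupled to an implementation-specific id."""
    return root.find(f".//button[@id='{element_id}']")


def find_by_label(root, label):
    """Intent-style lookup (assumed stand-in): match the visible text."""
    for button in root.iter("button"):
        if (button.text or "").strip() == label:
            return button
    return None


def run_step(ui_markup, locate):
    """Run one automation step; report whether the target was found."""
    root = ET.fromstring(ui_markup.strip())
    return "clicked" if locate(root) is not None else "selector broken"


def by_id(root):
    return find_by_selector(root, "submit-btn")


def by_label(root):
    return find_by_label(root, "Submit invoice")
```

Under these assumptions, `run_step(UI_V1, by_id)` succeeds, while `run_step(UI_V2, by_id)` reports a broken selector even though the button a user sees is unchanged. `run_step` with `by_label` succeeds on both versions, because it depends on what the element means to the user rather than on how it is identified in the markup.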









