Projects
Serpent Robotics: SR-01
2025
robotics · startup · full-stack product
A rope-climbing robot for tree care. I co-founded the company and built across hardware, controls, interface, business model, and website.
I co-founded Serpent Robotics and spearheaded product development, controls, and design for a robot that keeps tree care operators safely on the ground. We raised $270K+ in non-dilutive funding across eight competition wins including Wharton VIP-X, Pennovation, and the President's Engagement and Innovation Prize. 65+ arborist interviews and 12 shadowing days shaped the product before we built anything. Three signed pilot contracts. I also designed and built serpentrobotics.com end-to-end.
$270K+ raised, non-dilutive | 3 pilot contracts signed | 65+ arborist interviews | 15× higher fatality rate than avg.
Skills: Python, Flutter, C++, ESP32, Raspberry Pi, CAD, Customer Discovery, PRD
Monocular Grasp Estimation
2025
computer vision · ml pipeline
Robotic grasping from a single RGB camera, no depth sensor. ~80% accuracy on Cornell. Best Technical Excellence award.
Most robotic grasping systems need a depth camera. This one works from a single RGB image. I built the full pipeline: monocular pseudo-depth replaces the depth sensor, then a heuristic fusing edge saliency, center-of-gravity ranking, and ray-casting determines where and how wide to grasp. ~80% Top-1 accuracy on the Cornell Grasping Dataset, running at an estimated 5 FPS on a Raspberry Pi 3.
~80% Top-1 accuracy, Cornell | 5 FPS on Raspberry Pi 3 | #1 Best Technical Excellence
Role: Computer vision engineer · Grasp estimation pipeline
Team: Solo (portion shown)
Skills: Python, Computer Vision, DepthAnythingV2, OpenCV, PyTorch
- Upgraded depth backbone from MiDaS to DepthAnythingV2 Small after systematic evaluation
- Fused saliency map: Canny edge detection + depth gradient magnitude
- CoG-biased ray-casting for zero-shot grasp width without any training data
- Modular, swappable architecture for depth estimation and pose estimation modules
I built a robotic grasping pipeline that works from a single RGB image. The goal was simple: get usable grasp estimates on hardware that can't carry a depth camera.
System Architecture
A single RGB frame moves through three main stages: pseudo-depth estimation, grasp candidate generation, and ranking. Each part is modular, so I could swap depth backbones, heuristics, and pose-estimation components without rebuilding the rest of the stack. That mattered because most of the work was iterative. Every version broke in a slightly different way, and those failure modes drove the next design choice.
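The modular structure can be sketched as a pair of small interfaces. This is an illustrative sketch, not the project's actual code: the class and method names are assumptions.

```python
from typing import Protocol

import numpy as np


class DepthEstimator(Protocol):
    """Any monocular depth backbone: RGB image in, pseudo-depth map out."""

    def estimate(self, rgb: np.ndarray) -> np.ndarray: ...


class GraspRanker(Protocol):
    """Any ranking heuristic: candidate points plus depth in, best-first list out."""

    def rank(self, candidates: list, depth: np.ndarray) -> list: ...


class GraspPipeline:
    """Composes swappable stages, so replacing one depth backbone with another
    means changing a constructor argument, not rebuilding the stack."""

    def __init__(self, depth: DepthEstimator, ranker: GraspRanker):
        self.depth = depth
        self.ranker = ranker

    def run(self, rgb: np.ndarray, candidates: list) -> list:
        pseudo_depth = self.depth.estimate(rgb)          # stage 1: pseudo-depth
        return self.ranker.rank(candidates, pseudo_depth)  # stages 2-3: generate + rank
```

Swapping MiDaS for DepthAnythingV2 then touches exactly one argument; everything downstream of the `DepthEstimator` boundary stays untouched.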
Depth Estimation
I started with MiDaS and switched to DepthAnythingV2 Small after testing both. MiDaS was faster in raw inference, but DepthAnythingV2 gave cleaner silhouettes and steadier gradients around object boundaries. That mattered more for the downstream heuristic than squeezing out a bit of speed. The output is normalized rather than metric, which keeps the pipeline deployable without calibrated sensors.
Heuristic Grasp Estimation
Grasp candidates come from a fused saliency map combining Canny edges and depth-gradient magnitude. I then filter and rank them with a physics-informed score that favors positions closer to the object's center of gravity. A CoG-biased ray-casting step estimates grasp width from the object's geometry, which lets the system make stable zero-shot predictions without training data. That choice was deliberate. When the deployment setting is unknown, geometry can be more reliable than a model trained on the wrong distribution.
Real-object Inference & Tuning
Cornell was the benchmark, but I also used real objects to see whether the heuristic held up outside curated data. That meant inspecting how center-of-gravity boosting, contour direction, candidate count, and ray-casting width behaved on unfamiliar shapes. I built an internal tuning dashboard to expose the full decision chain, including latency, valid grasps, output count, and ray-casting parameters. It made debugging much easier because I could see why a prediction worked, not just whether it worked.
Evaluation
On the Cornell Grasping Dataset, the final pipeline reached roughly 80% Top-1 grasp accuracy while staying lightweight enough for embedded hardware. The full walkthrough shows the sequence from RGB input to pseudo-depth, masking, edge extraction, center-of-gravity estimate, grasp set, and final overlay against ground truth. Estimated performance was about 5 FPS on a Raspberry Pi 3. The project won Best Technical Excellence in a 510-student UPenn CIS course.
- Building a complete real-time vision pipeline from RGB input to grasp output showed how individual module decisions (depth backbone choice, saliency weighting) propagate through the entire system
- Geometry-based methods can outperform deep learning in constrained deployment scenarios; understanding when to use each approach is the actual engineering skill
- Evaluating against the Cornell dataset provided a clear benchmark but exposed the gap between dataset performance and real-world generalization, a gap that would need addressing before deployment
Machine Learning to Predict Traffic Accident Severity
2025
machine learning · model training & selection
A case study in model selection and honest framing. We predicted US traffic accident severity from weather, road, and Spotify Top-200 features, and then spent most of our energy deciding what the numbers actually meant.
A CIS 5200 final project at Penn with Mateo Taylor and Lucas Flahault. The interesting work wasn't training the final model. It was deciding which of twelve candidates deserved the tuning budget, figuring out why we bailed on neural nets after the first capacity sweep, explaining why a kernel approximation was worth trying even though it lost, and working out how to read a +4.8% R² lift without overclaiming any of it. The results sit at the end of the write-up, but the process is really the point.
12 model families compared head-to-head | 4× R² improvement: 0.11 OLS → 0.435 tuned XGBoost | +4.8% additional lift from Spotify features
Role: ML engineer · Model selection · Analysis
Team: Mateo Taylor, Lucas Flahault, Steyn Knollema. Built for CIS 5200 Machine Learning at UPenn.
Skills: Python, XGBoost, scikit-learn, TensorFlow, Pandas, Feature Engineering, Hyperparameter Tuning
- Framed severity as regression rather than classification so the loss could treat the 1 to 4 scale as ordinal instead of categorical
- Screened twelve candidate models on a 500k-row subsample before spending any of the tuning budget on the eventual winner
- Treated the Nyström-kernel detour as a cheap negative result that confirmed linear methods were not going to rescue this dataset
- Ran tuning on both the baseline and the song-augmented datasets independently, so the +4.8% lift could not be dismissed as a tuning artifact
- Read the feature-importance plot as a reframing tool, and rewrote the thesis from 'music predicts crashes' into 'music is a regional fingerprint'
The pipeline itself was a fairly standard five-stage machine learning workflow: prepare the data, screen candidate models on a subsample, evaluate them head-to-head, retrain the winner at full scale with proper tuning, and then read the result honestly. Where the project actually lived was in the decisions at each stage, the small calls about what to drop, what to keep, what to stop testing, and what the numbers were really telling us once we were willing to look at them carefully.
Stage 1. Data Preparation
The raw US-Accidents dataset held 7.73M records across 47 features and nine years. We joined it to the weekly Spotify Top-200 charts (74k rows, 40 features) by ISO-week and US region, and that join cut the usable set down to 4.08M rows. Four columns were dropped for missingness above 30% (End_Lat, End_Lng, Wind_Chill, Precipitation), since imputation at this scale was much more likely to manufacture false structure than to recover any real signal. Nine more columns went for redundancy or unusable variance (ID, Source, Description, Street, County, Zipcode, Country, Airport_Code, Weather_Timestamp). From there we expanded StartTime into engineered temporal features: accident duration, start hour, day, month, is-morning-rush, is-evening-rush, and is-night, on the theory that bucketing time would give the tree splitters cleaner cut points than a raw datetime ever could. Three of those engineered features ended up in the final model's top ten, which was a nice confirmation that the theory had been worth the effort.
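The temporal expansion looks roughly like this in pandas. The column names and the rush-hour and night windows are illustrative assumptions, not the project's exact schema:

```python
import pandas as pd


def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Expand raw start/end timestamps into the bucketed features the
    tree models can split on cleanly."""
    out = df.copy()
    start = pd.to_datetime(out["Start_Time"])
    end = pd.to_datetime(out["End_Time"])
    out["duration_min"] = (end - start).dt.total_seconds() / 60.0
    out["start_hour"] = start.dt.hour
    out["day_of_week"] = start.dt.dayofweek
    out["month"] = start.dt.month
    # Rush-hour and night windows are assumed, not the project's exact cutoffs.
    out["is_morning_rush"] = start.dt.hour.between(7, 9).astype(int)
    out["is_evening_rush"] = start.dt.hour.between(16, 18).astype(int)
    out["is_night"] = ((start.dt.hour >= 22) | (start.dt.hour < 5)).astype(int)
    return out
```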
Stage 2. Initial Training on a Subsample
Before spending any real compute on the full 4.1M rows, we screened twelve model families on a 500k-row subsample with a small GridSearchCV on each. The screening was deliberately shallow, with just enough tuning to rank the candidates fairly and not a minute more, because the point was to allocate budget rather than to find the winner on the first pass. We framed severity as a regression problem rather than a classification one so that the loss would respect the 1 to 4 ordinal scale, and then we let the subsample do the sorting. The linear and regularized models all clustered around R² 0.11. The Nyström-kernel RBF approximation, which was our cheap way to test for non-linearity without paying the O(N³) cost of full kernel methods, came in at 0.10, and that was a negative result, but a clean one. The tree ensembles pulled ahead almost immediately, with XGBoost out in front at 0.435.
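The screening loop itself is simple. Here is a sketch with stand-in families instead of all twelve; the deliberately tiny grids match the "shallow" intent, and severity stays a continuous regression target:

```python
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def screen_candidates(X, y, candidates, cv=3):
    """Shallow GridSearchCV per family on a subsample, ranked by held-out R².

    `candidates` maps a family name to (estimator, small_param_grid)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    scores = {}
    for name, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, cv=cv, scoring="r2")
        search.fit(X_tr, y_tr)  # just enough tuning to rank the family fairly
        scores[name] = r2_score(y_te, search.predict(X_te))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Called with, say, `{"ridge": (Ridge(), {"alpha": [0.1, 1, 10]}), ...}`, it returns the ranking that decides where the tuning budget goes.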
Stage 3. Testing and Selection
The screen gave us a ranking, and then the test stage decided which of those candidates actually deserved a full-scale retrain. Three decisions came out of it. First, the Nyström detour was over, because a higher-dimensional projection was not going to rescue the linear family, and the tree direction was clearly where the tuning budget belonged. Second, the neural nets were in trouble. We had tested three adaptive widths (Small, Medium, and Large, with hidden layers scaled to input dimension), and performance got steadily worse as capacity grew, with the Large variant going to negative R² before epoch 20. On fifty-odd semantic tabular features, capacity turned out to be a liability rather than an asset. Third, XGBoost, LightGBM, and Random Forest all survived the cut, with XGBoost the clear front-runner on both accuracy and training time.
Stage 4. Full Training and Tuning
With XGBoost selected, we ran a 25-iteration RandomizedSearchCV with 3-fold CV on a 1M-row training split, searching n_estimators, max_depth, learning_rate, and reg_alpha/lambda. The critical move was running the search twice, once on the baseline feature set and once on the song-augmented set, entirely independently, and then confirming that both runs had converged on near-identical hyperparameters. That convergence was the evidence that any lift from the song features could not be written off as a tuning-budget artifact. The final settings landed at n_estimators=3570, learning_rate=0.0196, max_depth=7, and colsample_bytree=0.709, and most of the gain over default XGBoost came out of the n_estimators by max_depth interaction rather than out of the learning rate.
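The twice-run control can be sketched like this. scikit-learn's GradientBoostingRegressor stands in for XGBoost here, and the search space is a toy version of the real one, which drew n_estimators, max_depth, learning_rate, and reg_alpha/lambda from much wider distributions:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV


def tune(X, y, n_iter=5, seed=0):
    """One randomized search; the point is to run it separately per feature set."""
    search = RandomizedSearchCV(
        GradientBoostingRegressor(random_state=seed),
        param_distributions={
            "n_estimators": [50, 100, 200],
            "max_depth": [2, 3, 4],
            "learning_rate": [0.05, 0.1, 0.2],
        },
        n_iter=n_iter,
        cv=3,
        scoring="r2",
        random_state=seed,
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_


# Run the identical search on the baseline and on the song-augmented features.
# If both converge on near-identical hyperparameters, a score gap between the
# two feature sets cannot be written off as a tuning-budget artifact.
```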
Stage 5. Results
The final tuned XGBoost on the song-augmented dataset hit R² 0.4352 with RMSE 0.3948 on the held-out 20%, while the baseline without songs landed at R² 0.4152, which meant the songs contributed a +4.8% relative lift that stayed consistent across tuning runs. For context, OLS topped out at 0.110, the best neural net reached about 0.29 before overfitting took over, and Random Forest got to 0.33. The pattern across models was useful on its own terms: the linear models saw the largest percentage gains from song features (+19%), the tree-based models saw the largest absolute gains, and the shallow trees (Decision Tree and Random Forest) actually got slightly worse once songs were added, which says something real about how those models handle uncorrelated noise.
Reading the Result
Traffic_Signal dominated the feature importance, followed by road geometry and the engineered temporal buckets. The song features that did register were oddly specific, with the track 'DÁKITI' by Bad Bunny, the artist 'The Weeknd', song speechiness, and song duration all showing up in the long tail. That pattern is genuinely hard to read as music causing crashes. It reads much more like music as a regional fingerprint, acting as a proxy for which demographic is on the road, in which part of the country, during which season of the year. The +4.8% lift is real and defensible, but the causal story simply isn't. We could have framed this project as 'Spotify predicts crashes' and it would have been a lot more shareable, but we didn't, because that is not what the data actually says. Reading the feature importance plot honestly and rewriting the thesis around what it showed was the last real decision the project asked us to make.
- Watching NN Adaptive Large drift into negative R² territory while the Small variant stayed perfectly healthy ended the neural-net branch of this project in a single afternoon, and the takeaway was blunt: on tabular data with around 50 semantic features, extra capacity is a liability, and the sooner you are willing to see that, the sooner the compute goes somewhere genuinely useful
- Running the hyperparameter search twice, once on the baseline and once on the song-augmented dataset, entirely independently, was more valuable than running any single search for longer, because the fact that both searches converged on near-identical parameters was the piece of evidence that turned a +0.02 R² reading from 'maybe noise' into 'real, but small'
- The feature-importance plot changed this project's thesis more than any headline metric ever did, and the real discipline was sitting with it, noticing that it didn't support the original story we'd hoped to tell, and then rewriting the claim around what the data actually said instead of around what we had set out to show
Autonomous BattleBot
2024
mechatronics · embedded · autonomous
Autonomous combat robot for live competition. Three control modes, custom perfboards, and a hard lesson in power isolation.
A compact battle robot built for live multi-team competition. I owned embedded software and electrical integration. Three control modes: manual WASD, wall-following via TOF sensors, and coordinate-based autonomous navigation using Vive positioning and an ESP32 browser calibration interface. The biggest lesson was practical: motors and logic need separate power domains, and I learned that the hard way.
3 autonomous operating modes | C/C++ from pin control to state behavior
Role: Embedded software lead · Electrical integration
Team: Matthew Rabin, Stan Han, Steyn Knollema
Skills: C, C++, ESP32, Circuit Design, PID Control, CAD
- Custom perfboards centralizing power distribution, motor control, sensing, and interconnects
- Separate power domains: resolved motor-induced voltage noise and logic instability
- Vive positioning module for global coordinate navigation; TOF for wall-following
- Oscilloscope-diagnosed PWM behavior and voltage drops to reach stable final system
A compact autonomous combat robot built for live competition. I handled embedded software and electrical integration, which meant everything from custom perfboards to autonomous navigation to learning, very concretely, why motors and logic need separate power domains.
Design Concepts
The project started as a full system, not just a control algorithm. Early sketches defined the layout, the schematic locked down how motors, sensing, and compute would share the chassis, and the drawings turned that into something we could actually build. Doing that work early made the robot legible as one system before we committed to fabrication.
Electrical Design
The electrical system centered on custom perfboards handling power distribution, motor control, sensing, and interconnects in a chassis with almost no spare room. The big lesson came early: motor noise destabilized the logic stack. Splitting motors and logic into separate power domains fixed the problem and turned out to be the decision everything else depended on. The same board-level work also had to integrate three TOF sensors and a top-hat switch without turning the robot into a wiring mess.
Software Architecture
The software supported three modes: direct WASD control, wall-following from the TOF array, and coordinate-based autonomous navigation using Vive positioning. An ESP32 access-point interface exposed live pose, heading, and corner calibration so the robot could build a usable arena frame before a match. The hard part was not writing each mode in isolation. It was making the system switch cleanly between local sensing, global positioning, and direct control without losing stability.
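The mode-switch discipline can be sketched as a small state machine. This is written in Python for readability; the actual firmware is C/C++ on the ESP32, and the gains, distances, and thresholds here are made-up illustrative values:

```python
from enum import Enum, auto


class Mode(Enum):
    MANUAL = auto()       # direct WASD teleop
    WALL_FOLLOW = auto()  # local sensing: TOF array
    WAYPOINT = auto()     # global positioning: Vive coordinates


class ModeController:
    """Every transition passes through a stop state, so a stale command
    from one mode never carries over into the next."""

    def __init__(self):
        self.mode = Mode.MANUAL
        self.drive_command = (0.0, 0.0)  # (left, right) motor effort

    def switch_to(self, mode: Mode):
        self.drive_command = (0.0, 0.0)  # halt before handing over control
        self.mode = mode

    def update(self, wasd=None, tof=None, pose=None, target=None):
        if self.mode is Mode.MANUAL and wasd is not None:
            self.drive_command = wasd
        elif self.mode is Mode.WALL_FOLLOW and tof is not None:
            # proportional wall-follow on a side TOF reading (illustrative gain)
            error = tof - 150.0  # assumed target distance in mm
            turn = max(-0.3, min(0.3, 0.002 * error))
            self.drive_command = (0.5 + turn, 0.5 - turn)
        elif self.mode is Mode.WAYPOINT and pose and target:
            # drive toward the target; the real code closes the loop on Vive pose
            dx, dy = target[0] - pose[0], target[1] - pose[1]
            self.drive_command = (0.5, 0.5) if abs(dx) + abs(dy) > 0.05 else (0.0, 0.0)
        return self.drive_command
```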
System Integration
Integration was the real project. Most of the interesting problems only showed up once the mechanical, electrical, and software layers were all alive at the same time. Debugging meant tracing timing, power stability, and signal integrity together, with an oscilloscope confirming voltage drops, PWM behavior, and noise instead of guessing. The final robot held together well enough to compete as one system, not just as a stack of separate subsystems.
- Separate power domains for motors and logic is a fundamental embedded systems principle: learning it through a live debugging failure made it permanent knowledge
- Cross-domain debugging (software timing + hardware power + signal integrity simultaneously) requires systematic isolation; the oscilloscope was essential, not optional
- Autonomous behaviors that work in isolation can fail at system integration: testing subsystems separately is necessary but not sufficient
Fresh Start: Habit App
2025
ux design · ai-native · behavior design
Habit app designed around why resolutions actually fail. 51 concepts narrowed to 5 features, tested with 25 users.
A habit app designed around why resolutions actually fail, not the reasons people give. Interviews with students, a Wharton habit-formation professor, and professional athletes pointed to three structural causes: vague goals, schedules that don't flex, and support that disappears after the first week. The AI companion (Berry) is deliberately quiet, a background presence rather than the main feature. Tested with 25 users across multiple iterations.
51 concepts sketched | 25 users tested | 7 interview subjects
Role: UX researcher · Interaction designer · Visual designer
Team: Keyu Zhu, Steyn Knollema
Skills: Figma, User Research, JTBD, Behavior Design, Prototyping
- Three root failure modes: vague goals, no schedule fit, support drop-off after early motivation
- Berry, AI designed as quiet background support, not a visible feature
- Full style guide: typography, color, spacing, component behavior
- 25 users tested across multiple iterations; 8 in initial low-fi usability sessions
A habit-building app for students and young professionals, designed around why resolutions actually fail in practice. The project moved from behavioral research through 51 concepts to a prototype tested with 25 users.
Behavior Research
We started by looking at why resolutions break down in real life. Interviews with students, a Wharton habit-formation professor, and professional athletes pointed to three recurring problems: goals stay vague, routines don't fit real schedules, and support disappears once the early motivation spike fades. The early research widened the problem before the product got narrower again.
Archetypes & Opportunity
Persona work and competitor analysis turned the interviews into design constraints. The product had to work for inconsistent, tired, easily derailed users, not just disciplined planners. Looking at adjacent products made the gap obvious: most habit apps reward streaks and reminders, but very few make recovery and rescheduling feel normal.
Concept Exploration
We explored broadly, sketching 51 concepts and narrowing them to five MVP features. Low-fidelity wireframes were then tested with eight users, which exposed navigation, hierarchy, and onboarding issues early enough to fix them cheaply. The product only started working once planning, execution, and recovery were treated as one loop instead of three separate features.
System Refinement
As the direction stabilized, we refined the product into something warmer but still restrained. Berry stayed deliberately quiet: supportive in tone, never the main event. The visual system defined typography, color, spacing, iconography, and component behavior so the app could feel calm without slipping into generic wellness-app language.
Final Result
The final product focuses on the features that directly answer the failure modes: onboarding that turns broad goals into realistic plans, weekly planning that flexes around real schedules, low-friction task completion, and progress feedback without guilt. The core screens work together as one loop: plan, do, recover.
- Designing an AI as quiet background support rather than a visible feature required constant restraint: every design review had proposals to make Berry more prominent
- Habit-driven products work best when failure and recovery are treated as normal states: the guilt loop created by streak-based apps is a design problem, not a user problem
- Sustainable behavior change depends on respecting real schedules and energy levels: a system that only works when users are at their best will fail most of the time
.Pixel: Morning Planner
2025
physical product · digital interface · solo
A physical device for phone-free mornings. Solo project, 3 months. 6 out of 8 testers said they'd buy it.
People reach for their phone first thing not because they want to, but because it's the only convenient way to check the time, weather, and schedule. .Pixel gives them those things without the rest. Dot-matrix display, touch-only interaction, no visible buttons. I prototyped the full interface in Figma before building hardware. Tested with 8 users in their actual morning routines: 7 out of 8 felt calmer, 6 out of 8 said they'd buy it.
7/8 users felt calmer | 6/8 would potentially buy | solo research → concept → prototype → test
Role: Designer · Fabricator · Engineer
Team: Solo
Skills: Physical Prototyping, CAD, Figma, Embedded Hardware, User Testing
- 51 ideation concepts and paper models before committing to a direction
- Calm, minimal physical form: clean white, dot-matrix display, touch-only interaction
- High-fidelity Figma interface prototyped before hardware to test all screens and flows
- 8 users tested in real morning routines, validated the concept before full build
.Pixel is a solo project about a small but stubborn problem: people start the morning on their phone because it's convenient, not because they want to. I tried to make the simplest object that could replace that habit.
Problem Framing
I framed the problem as a design constraint, not a moral one. People were reaching for their phone first thing because it bundled the time, weather, and schedule into one easy place. The challenge was to keep those useful functions and remove the rest of the cognitive drag that comes with opening a smartphone the moment you wake up.
Concept Exploration
I explored 51 concepts before committing to a direction. Most of them were deliberately quick: sketches, storyboards, and rough physical mockups that let me test the interaction without pretending the form was solved. Two ideas stood out, a music-focused device and a day planner, and users consistently pulled the project toward the planner. The catch was that it still had to feel lighter than reaching for the phone.
Physical Form
The final form is intentionally quiet: white housing, dot-matrix display, touch interaction, no visible buttons. I wanted it to feel more like a useful object on the table than another gadget asking for attention. Fast paper models helped me settle the proportions before moving into CAD and fabrication.
Interface System
Before building the full hardware, I prototyped the entire interface in Figma and tested it in context. That made it possible to refine the flows, reduce friction, and see whether the interactions still felt calm when someone was half awake and reaching across the table. The interface was designed around quick readability. Each screen had to make sense at a glance.
Final Result
The final concept was tested with eight users in real morning routines, not in a lab. That mattered because the device was not competing with an abstract problem. It was competing with habit. Seven out of eight users said they felt calmer and more focused. Six said they would buy it.
- Designing for subtraction is harder than designing for features: every addition had to justify itself against the cost of complexity, and most additions didn't survive that test
- Physical prototyping before digital saved time: proportions and hand feel that look fine in CAD feel wrong in hand, and paper models exposed that in an afternoon
- Testing in real context produced fundamentally different feedback than lab testing: the device was competing against a phone habit, not a neutral baseline
AR Machine Operating Instructions
2023
augmented reality · published research · hci
AR system on HoloLens that puts step-by-step instructions directly onto the machine. Peer-reviewed, published, adopted for training.
An AR instruction system on HoloLens that overlays procedural guidance directly onto a milling machine. The instructions appear on the component you're about to operate, not on a separate screen. I validated it with engineering students, workshop professionals, and HSE regulators: 47% engagement improvement, fewer errors, lower safety risk compared to paper manuals. Published as a peer-reviewed paper and adopted at the university for safety training.
47% engagement improvement | adopted at university for safety training | peer-reviewed publication
Role: Researcher · Designer · Developer
Team: Solo
Skills: Microsoft HoloLens, AR Prototyping, User Testing, Cost Modeling, HSE
- AR prototype on HoloLens: part-specific and task-specific holographic overlays aligned to machine components
- Sequential instruction flows with interactive checkpoints preventing unsafe actions
- Cost modeling demonstrating feasibility for educational and industrial deployment
- Human-centered validation: engineering students, workshop professionals, HSE experts
- Spatial instructions outperformed static manuals on every measured engagement metric
- User research aligned final design with regulatory and safety requirements
I designed and built an AR instruction system on Microsoft HoloLens that overlays guidance directly onto a PICOMAX 20 milling machine. It was tested with real users in a real workshop, published, and adopted for training.
System Architecture
The prototype runs on Microsoft HoloLens and anchors holographic overlays directly to the physical machine. Instructions appear on the component being referenced rather than on a separate screen. The system supports both student and professional instruction flows, with different levels of guidance depending on the operator.
Instruction Flow Design
Instructions are delivered step by step, with checkpoints between stages so the operator cannot advance without confirming the current action. That matters on safety-critical machinery, where skipped steps are the actual problem. The sequence is easy to follow because the guidance stays on the machine itself instead of forcing the user to keep matching a manual to a component.
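The checkpoint pattern reduces to a tiny state machine: you cannot advance past a step until it is explicitly confirmed. A minimal sketch, with illustrative step names rather than the actual milling procedure:

```python
class CheckpointFlow:
    """Sequential instructions gated by confirmation checkpoints."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.index = 0
        self.confirmed = False

    def current(self):
        return self.steps[self.index]

    def confirm(self):
        """Operator confirms the current action has actually been done."""
        self.confirmed = True

    def advance(self):
        """Move to the next step; refuses if the checkpoint is unconfirmed."""
        if not self.confirmed:
            raise RuntimeError("checkpoint not confirmed: " + self.current())
        if self.index < len(self.steps) - 1:
            self.index += 1
            self.confirmed = False
        return self.current()
```

Encoding the procedure this way makes skipping a step a system error rather than a human lapse, which is the whole point on safety-critical machinery.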
Cost Modeling
I also cost-modeled the system for larger deployment across educational and industrial settings. The analysis covered HoloLens hardware, content authoring, maintenance, and training. The useful finding was that content authoring, not hardware, is the main cost driver. That would matter a lot more than the headset itself in any commercial version.
- Interactive checkpoints are a design pattern, not a feature: they encode safe procedure as a constraint the system enforces rather than the user remembering
- Spatial AR instructions require precise spatial anchor calibration; misalignment by even a few centimeters breaks the instructional link between hologram and physical component
- Content authoring, not hardware, is the primary cost driver at scale; an insight that would change the product roadmap for any commercial version
The core question was whether spatial instructions could outperform manuals for safety-critical physical tasks. Testing across three user groups gave a clear answer.
Validation Framework
I validated the system with three groups: engineering students, workshop professionals, and HSE experts. Each group cared about something different. Students focused on learning, professionals on workflow, and HSE experts on compliance and risk. Testing happened on the actual PICOMAX 20, comparing AR against traditional documentation on engagement, confidence, and procedural error.
Validated Outcome
Operators using the AR system showed higher engagement and confidence, fewer procedural errors, and lower safety risk than with traditional manuals. Engagement improved by 47%. The advantage came from something straightforward: the instructions lived on the machine itself, so users did not have to translate between a page and a component.
Publication & Implications
The work was published as a peer-reviewed paper through the University of Twente and adopted for safety training at the university. Turning a practical prototype into a formal contribution meant building a defensible measurement approach, placing the work in the HCI and AR literature, and documenting it cleanly enough that someone else could evaluate it.
- Spatial instructions beat manuals for physical, safety-critical tasks: the advantage comes from eliminating the translation step between reading a description and finding the component
- Validation across multiple user groups (students, professionals, regulators) produces far more defensible results than single-group testing
- AR requires alignment of technology, user cognition, and regulatory requirements. A prototype that works technically can still fail validation if it doesn't match how operators actually work
For my bachelor's thesis, I designed, built, and validated an AR instruction system on Microsoft HoloLens for a PICOMAX 20 milling machine. It was tested in a real workshop, published as a peer-reviewed paper, and adopted for safety training.
The System
Part-specific holographic overlays are anchored directly to machine components, so the guidance appears where the operator is actually looking. The instruction flow includes checkpoints, which means the system will not advance until the current step is confirmed. It also supports different guidance depth for student and professional users.
Validation
I validated the system with engineering students, workshop professionals, and HSE experts. Task-matched testing on the real machine measured engagement, procedural error, and confidence while comparing AR against traditional documentation. Operators were more engaged, made fewer errors, and handled the procedure with more confidence.
Publication & Cost Modeling
The work was published as a peer-reviewed paper (full thesis) through the University of Twente and adopted for safety training. I also cost-modeled the system for larger deployment and found that content authoring, not hardware, is the main cost driver at scale.
SpatialPixel: Applied AI Workspaces
2025
spatial computing · research · co-design
Research and co-design for SpatialPixel's AI workspace platform. Fieldwork uncovered a third collaboration mode no one had named.
A client project with SpatialPixel on how their Procession platform could better support creative collaboration. Through historical research, field observation, and co-design sessions, we found that teams don't just work solo or together. They regroup: cycling between independent work and collective synthesis. That insight led to the Spatial Library, a template system that made the open-ended platform easier to actually adopt.
4 in-depth interviews | 7 professionals in group session | 4 intercept interviews | dozens of field observations
Role: Research lead · Co-design facilitator · Concept designer
Team: Matthew Rabin, Sofia El Amrani, Hsin Wang, Steyn Knollema
Skills: User Research, Co-Design, Spatial Computing, AI Prototyping, Concept Design
- Historical research on spatial computing framed the project against recurring patterns in projection, collaboration, and interface design
- Fieldwork revealed re-group, a third collaboration mode between solo and full-group work
- HMW development and co-design sessions exposed where Procession broke down: unclear starting points and reliance on outside tools
- Final direction: the Spatial Library, a template system that made Procession easier to adopt in real workflows
- Turned an open-ended client brief into a structured research program grounded in history, observation, and synthesis
- Translated ambiguous research into a client-ready product direction rather than a loose set of ideas
- Shifted the recommendation from raw AI capability toward lower-friction starting points teams could actually adopt
A client project with SpatialPixel on how Procession could better support creative collaboration. The work moved from research and field observation into co-design and, eventually, a much clearer product direction.
Historical Research
The project started with a historical look at spatial computing, collaboration, and makerspaces. We traced how projection, gesture-based interaction, and shared work environments had evolved over time. That gave the project some structure before we moved into the field and stopped it from becoming a loose collection of interesting ideas.
Observations & User Research
We looked at how collaboration actually unfolds in shared creative environments through interviews, group sessions, intercept interviews, and on-site observation. The main insight was that collaboration is not just solo work or group work. Teams constantly move between the two. We called that pattern re-group: independent work followed by moments of synthesis, alignment, and decision-making.
HMWs & Idea Development
The HMW phase turned the research into design direction. We framed questions around how Procession could support physical brainstorming while keeping the clarity and shareability of digital tools. Rapid sketching and AI-assisted mockups helped us explore broadly before narrowing down what was actually worth pursuing.
Co-Design Sessions
Co-design sessions pushed the ideas into real workflows. We brought projected cards, prompts, and shared surfaces into collaborative settings to see how teams actually used Procession. The friction became obvious very quickly: even informed users reached for outside tools, many did not know where to begin, and the platform needed stronger starting points if it was going to work in practice.
Concept Refinement
The evidence pointed toward generative and visualization-driven uses where projection could genuinely help creative teams. The concept moved away from open-ended capability demos and toward faster ways to explore options, align on ideas, and make decisions together.
Final Result
Through user testing and iteration, the interface shifted from a demo into something more practical. The final recommendation was the Spatial Library: a template system for brainstorming, mapping, and collaboration. It gives users a concrete place to start without removing the flexibility that made Procession interesting in the first place.
- Re-group appeared in the fieldwork, not in the brief. The most useful finding came from observation contradicting an assumption, not confirming one.
- Even users who understood Procession didn't know where to start. Entry friction matters more than depth of capability.
- Open-ended briefs need a structural frame early or the exploration produces interesting observations that never add up to a direction.
- Templates reduced adoption friction more effectively than new features would have. Sometimes the product problem is onboarding, not capability.
SpatialPixel came with an open-ended brief: figure out how Procession could better support collaboration in creative spaces. My role was to turn that ambiguity into a structured research process and a usable recommendation.
Framing The Work
The project started without a defined problem or solution space. I anchored the work in a historical investigation of spatial computing and collaboration so we could evaluate Procession against longer-running patterns instead of isolated feature ideas. That framing kept the work from drifting.
Research Program & Insight
I structured the evidence gathering around interviews, group sessions, intercept interviews, and on-site observation. The most important outcome was the re-group pattern: collaboration was not just solo work or group work, but a constant movement between the two. That gave the client a specific behavioral gap in Procession to address.
From HMWs To Direction
The HMW phase and co-design sessions translated the research into product direction. We learned that people still relied on outside tools for context, struggled to know where to begin in Procession, and fell into two camps: users who adopt tools as-is, and users who immediately reshape them. Those findings pushed the recommendation away from raw prompting power and toward stronger starting points.
Spatial Library Recommendation
The final deliverable was the Spatial Library, a template system that packages brainstorming, mapping, and collaboration flows into reusable starting points. Instead of asking teams to figure out how to prompt the system from scratch, it shifted the product toward what to create. That lowered adoption friction without flattening the tool.
- Open-ended client work needs a framing device early or the exploration stays interesting but directionless.
- Behavioral insights only create value when they are translated into something the client can actually build.
- In AI products, reducing starting friction can matter more than exposing more raw capability.
ChatIT: AI B2B Consultancy
2024
startup · b2b · go-to-market
AI consultancy I co-founded in the Netherlands. 15 clients, €15K revenue, 80% margins.
An AI consultancy I co-founded in the Netherlands. 15 clients across 5 industries, €15K revenue at 80% gross margins. Cold outreach didn't work, so we stopped and built a workshop-based channel instead: educate business owners first, sell second. 35% conversion rate at under €500 per customer. Pricing was tied to efficiency outcomes, not hours. Average client saw 40% operational improvement.
15 clients, 5 industries | €15K revenue, 80% margins | 35% workshop → customer
Role: Co-founder · Business development · Product
Team: Floris Vossebeld, Steyn Knollema
Skills: B2B Sales, GTM Strategy, Pricing, AI Products, Customer Acquisition
- Hybrid pricing: project-based engagement + recurring subscription model
- Workshop acquisition channel: educated prospects first, converted 35% at <€500 CAC
- Competitive analysis of 12 AI vendors to prioritize highest-impact roadmap features
- 40% improvement in client operational efficiency across 5 industries
An AI consultancy I co-founded in the Netherlands. We started without clients or a reliable channel and had to figure both out from scratch. The result was 15 clients across five industries, €15K in revenue, and 80% gross margins.
Business Model
The model combined project-based work for implementation with recurring subscription revenue for ongoing access and support. Project fees covered the custom work up front. Subscriptions made the revenue more predictable and kept the relationship active after the initial build. Pricing was tied to efficiency gains rather than time spent.
Acquisition Channel
Cold outreach didn't work, so we stopped doing it. Instead, we built a workshop-based channel that educated business owners on what AI could actually do before we tried to sell anything. That created a prospect pool that already understood the value and had self-selected for genuine interest. Workshop-to-customer conversion reached 35% at under €500 CAC, far better than the outbound approaches we had tried.
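The unit economics of the workshop channel can be sketched as a back-of-envelope calculation. Only the 35% conversion rate and the sub-€500 CAC come from the work above; the per-workshop cost and attendee count here are hypothetical illustrations.

```python
# Back-of-envelope CAC for an education-first acquisition channel.
# Conversion rate (35%) and the €500 CAC ceiling come from the text;
# workshop cost and attendee count are hypothetical.

def workshop_cac(cost_per_workshop, attendees, conversion_rate):
    """Customer acquisition cost = total channel spend / customers won."""
    customers_won = attendees * conversion_rate
    return cost_per_workshop / customers_won

cac = workshop_cac(cost_per_workshop=1400, attendees=10, conversion_rate=0.35)
print(f"CAC: €{cac:.0f} per customer")   # -> €400, under the €500 ceiling
```

The lever is visible in the arithmetic: a high conversion rate lets even a relatively expensive workshop undercut the cost of converting cold outbound leads.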
Product & Results
A competitive analysis of 12 AI vendors informed which tools and approaches to prioritize. We chose features based on two things: how much they helped the client's workflow, and whether we could implement them well with a small team. Across the five industries we served, clients reported an average 40% improvement in operational efficiency.
- Education-led acquisition works because it removes the trust problem: clients who understand what you're selling don't need to be convinced, they need to be organized
- Hybrid pricing (project + subscription) aligns incentives better than either model alone: project work gets done, recurring revenue keeps the relationship
- B2B sales cycle length tracks inversely with how clearly you can demonstrate ROI. The businesses that moved fastest had the clearest before/after metrics
Penn Wharton Innovation Fund
2024–2026
early-stage investing · due diligence · deal flow
Two years on Penn's student-run pre-seed fund. 300+ startups reviewed, $220K+ deployed annually.
Penn's student-run pre-seed fund. $220K+ deployed annually into early-stage founders across the university. I've been on the investment team for two consecutive years, one of the few members asked back. Four application cycles a year, weekly committee meetings, written due diligence on every assigned startup, and practicing VCs joining to challenge our reasoning. Real money, real founders, real consequences.
300+ startups reviewed per year | $220K+ deployed annually | 2 years on the team | 4 investment cycles per year
Role: Investment team member · Due diligence · Portfolio selection
Team: 20-person investment team across all Penn schools
Skills: Due Diligence, Market Sizing, Investment Thesis, Early-Stage Evaluation, Startup Assessment, Capital Allocation
- Reviewed 300+ startup applications per year: evaluated team quality, market sizing, problem validity, and early traction across hardware, software, and services
- Wrote structured due diligence feedback on every assigned venture: what works, what doesn't, and whether the risk profile justifies the investment
- Made real capital allocation decisions alongside students from Wharton, Engineering, and Law: diverse perspectives, actual consequences
- Learned deal evaluation from VCs who joined selected weekly sessions to pressure-test investment theses and challenge our reasoning
- One of the few team members selected for a second consecutive year: continuity of judgment across two full cycles of the Penn startup ecosystem
Penn's student-run pre-seed fund deploys $220K+ annually into early-stage founders across the university. I've been on the investment team for two consecutive years, one of the few members asked back, and have seen the full process up close.
How the Fund Works
The fund operates like a real one: four application cycles each year, weekly investment committee meetings, written due diligence on every assigned startup, and VC practitioners joining selected sessions to pressure-test investment theses. The cross-school mix of Wharton, Engineering, and Law gives the team a broader lens than most student investment groups. Every dollar deployed is a real decision with real consequences for founders.
Due Diligence
Due diligence on each assigned venture covered team quality, market sizing, problem validity, and early traction. I wrote structured feedback on every startup I reviewed: what was working, what wasn't, and whether the risk profile justified a pre-seed investment. The most common failure mode was not a bad idea. It was a team that had not talked to enough real customers to know whether they were solving the right problem.
What Two Years of Evaluation Taught Me
After reviewing 300+ startups a year across two full cycles, the pattern has been consistent across hardware, software, and services. The teams that do well usually ran customer discovery before they built. The teams that struggle usually polished the pitch before they validated the problem. At pre-seed, the important judgment call is less about the product itself and more about whether the team knows how to learn.
- Pre-seed investing is fundamentally a judgment call about learning speed: the best early-stage teams are the ones most willing to invalidate their own assumptions
- Market sizing methodology matters more than the number: a team that built their TAM bottom-up from validated customer segments is telling you something about how they think
- Being selected for a second year forced me to develop consistent judgment rather than situational opinions. You can't be ad hoc when you're accountable to prior reasoning
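The bottom-up market sizing mentioned in the takeaways can be illustrated with a toy calculation: sum reachable customers times contract value per validated segment, rather than taking a top-down slice of an industry report. All segment names and figures here are hypothetical.

```python
# Toy bottom-up TAM: sum (reachable customers x annual contract value)
# per validated customer segment. All figures are hypothetical.

segments = {
    # segment: (reachable customers, annual contract value in $)
    "segment A": (4_000, 6_000),
    "segment B": (1_200, 12_000),
    "segment C": (300, 40_000),
}

tam = sum(customers * acv for customers, acv in segments.values())
print(f"Bottom-up TAM: ${tam:,}")   # -> $50,400,000
```

The number itself matters less than the structure: each line forces the team to say which customers they have actually validated and what those customers would pay, which is exactly the thinking the takeaway above is pointing at.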