Projects
Serpent Robotics: SR-01
2025
robotics · startup · full-stack product
A rope-climbing robot for tree care. I co-founded the company and built across hardware, controls, interface, business model, and website.
I co-founded Serpent Robotics and spearheaded product development, controls, and design for a robot that keeps tree care operators safely on the ground. We raised $270K+ in non-dilutive funding across eight competition wins including Wharton VIP-X, Pennovation, and the President's Engagement and Innovation Prize. 65+ arborist interviews and 12 shadowing days shaped the product before we built anything. Three signed pilot contracts. I also designed and built serpentrobotics.com end-to-end.
$270K+ raised, non-dilutive | 3 pilot contracts signed | 65+ arborist interviews | 15× higher fatality rate than avg.
Skills: Python, Flutter, C++, ESP32, Raspberry Pi, CAD, Customer Discovery, PRD
Monocular Grasp Estimation
2025
computer vision · ml pipeline
Robotic grasping from a single RGB camera, no depth sensor. ~80% accuracy on Cornell. Best Technical Excellence award.
Most robotic grasping systems need a depth camera. This one works from a single RGB image. I built the full pipeline: monocular pseudo-depth replaces the depth sensor, then a heuristic fusing edge saliency, center-of-gravity ranking, and ray-casting determines where and how wide to grasp. ~80% Top-1 accuracy on the Cornell Grasping Dataset, running at an estimated 5 FPS on a Raspberry Pi 3.
~80% Top-1 accuracy, Cornell | 5 FPS on Raspberry Pi 3 | #1 Best Technical Excellence
Role: Computer vision engineer · Grasp estimation pipeline
Team: Solo (portion shown)
Skills: Python, Computer Vision, DepthAnythingV2, OpenCV, PyTorch
- Upgraded depth backbone from MiDaS to DepthAnythingV2 Small after systematic evaluation
- Fused saliency map: Canny edge detection + depth gradient magnitude
- CoG-biased ray-casting for zero-shot grasp width without any training data
- Modular, swappable architecture for depth estimation and pose estimation modules
I built a robotic grasping pipeline that works from a single RGB image. The goal was simple: get usable grasp estimates on hardware that can't carry a depth camera.
System Architecture
A single RGB frame moves through three main stages: pseudo-depth estimation, grasp candidate generation, and ranking. Each part is modular, so I could swap depth backbones, heuristics, and pose-estimation components without rebuilding the rest of the stack. That mattered because most of the work was iterative. Every version broke in a slightly different way, and those failure modes drove the next design choice.
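The modular structure can be sketched as a pair of small interfaces. This is an illustrative sketch, not the project's actual code: the class and method names are assumptions.

```python
from typing import Protocol

import numpy as np


class DepthEstimator(Protocol):
    """Any monocular depth backbone: RGB image in, pseudo-depth map out."""

    def estimate(self, rgb: np.ndarray) -> np.ndarray: ...


class GraspRanker(Protocol):
    """Any ranking heuristic: candidate points plus depth in, best-first list out."""

    def rank(self, candidates: list, depth: np.ndarray) -> list: ...


class GraspPipeline:
    """Composes swappable stages, so replacing one depth backbone with another
    means changing a constructor argument, not rebuilding the stack."""

    def __init__(self, depth: DepthEstimator, ranker: GraspRanker):
        self.depth = depth
        self.ranker = ranker

    def run(self, rgb: np.ndarray, candidates: list) -> list:
        pseudo_depth = self.depth.estimate(rgb)          # stage 1: pseudo-depth
        return self.ranker.rank(candidates, pseudo_depth)  # stages 2-3: generate + rank
```

Swapping MiDaS for DepthAnythingV2 then touches exactly one argument; everything downstream of the `DepthEstimator` boundary stays untouched.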
Depth Estimation
I started with MiDaS and switched to DepthAnythingV2 Small after testing both. MiDaS was faster in raw inference, but DepthAnythingV2 gave cleaner silhouettes and steadier gradients around object boundaries. That mattered more for the downstream heuristic than squeezing out a bit of speed. The output is normalized rather than metric, which keeps the pipeline deployable without calibrated sensors.
Heuristic Grasp Estimation
Grasp candidates come from a fused saliency map combining Canny edges and depth-gradient magnitude. I then filter and rank them with a physics-informed score that favors positions closer to the object's center of gravity. A CoG-biased ray-casting step estimates grasp width from the object's geometry, which lets the system make stable zero-shot predictions without training data. That choice was deliberate. When the deployment setting is unknown, geometry can be more reliable than a model trained on the wrong distribution.
Real-object Inference & Tuning
Cornell was the benchmark, but I also used real objects to see whether the heuristic held up outside curated data. That meant inspecting how center-of-gravity boosting, contour direction, candidate count, and ray-casting width behaved on unfamiliar shapes. I built an internal tuning dashboard to expose the full decision chain, including latency, valid grasps, output count, and ray-casting parameters. It made debugging much easier because I could see why a prediction worked, not just whether it worked.
Evaluation
On the Cornell Grasping Dataset, the final pipeline reached roughly 80% Top-1 grasp accuracy while staying lightweight enough for embedded hardware. The full walkthrough shows the sequence from RGB input to pseudo-depth, masking, edge extraction, center-of-gravity estimate, grasp set, and final overlay against ground truth. Estimated performance was about 5 FPS on a Raspberry Pi 3. The project won Best Technical Excellence in a 510-student UPenn CIS course.
- Building a complete real-time vision pipeline from RGB input to grasp output showed how individual module decisions (depth backbone choice, saliency weighting) propagate through the entire system
- Geometry-based methods can outperform deep learning in constrained deployment scenarios; understanding when to use each approach is the actual engineering skill
- Evaluating against the Cornell dataset provided a clear benchmark but exposed the gap between dataset performance and real-world generalization, a gap that would need addressing before deployment
Machine Learning to Predict Traffic Accident Severity
2025
machine learning · model training & selection
A case study in model selection and honest framing. We predicted US traffic accident severity from weather, road, and Spotify Top-200 features, and then spent most of our energy deciding what the numbers actually meant.
A CIS 5200 final project at Penn with Mateo Taylor and Lucas Flahault. The interesting work wasn't training the final model. It was deciding which of twelve candidates deserved the tuning budget, figuring out why we bailed on neural nets after the first capacity sweep, explaining why a kernel approximation was worth trying even though it lost, and working out how to read a +4.8% R² lift without overclaiming any of it. The results sit at the end of the write-up, but the process is really the point.
12 model families compared head-to-head | 4× R² improvement: 0.11 OLS → 0.435 tuned XGBoost | +4.8% additional lift from Spotify features
Role: ML engineer · Model selection · Analysis
Team: Mateo Taylor, Lucas Flahault, Steyn Knollema. Built for CIS 5200 Machine Learning at UPenn.
Skills: Python, XGBoost, scikit-learn, TensorFlow, Pandas, Feature Engineering, Hyperparameter Tuning
- Framed severity as regression rather than classification so the loss could treat the 1 to 4 scale as ordinal instead of categorical
- Screened twelve candidate models on a 500k-row subsample before spending any of the tuning budget on the eventual winner
- Treated the Nyström-kernel detour as a cheap negative result that confirmed linear methods were not going to rescue this dataset
- Ran tuning on both the baseline and the song-augmented datasets independently, so the +4.8% lift could not be dismissed as a tuning artifact
- Read the feature-importance plot as a reframing tool, and rewrote the thesis from 'music predicts crashes' into 'music is a regional fingerprint'
The pipeline itself was a fairly standard five-stage machine learning workflow: prepare the data, screen candidate models on a subsample, evaluate them head-to-head, retrain the winner at full scale with proper tuning, and then read the result honestly. Where the project actually lived was in the decisions at each stage, the small calls about what to drop, what to keep, what to stop testing, and what the numbers were really telling us once we were willing to look at them carefully.
Stage 1. Data Preparation
The raw US-Accidents dataset held 7.73M records across 47 features and nine years. We joined it to the weekly Spotify Top-200 charts (74k rows, 40 features) by ISO-week and US region, and that join cut the usable set down to 4.08M rows. Four columns were dropped for missingness above 30% (End_Lat, End_Lng, Wind_Chill, Precipitation), since imputation at this scale was much more likely to manufacture false structure than to recover any real signal. Nine more columns went for redundancy or unusable variance (ID, Source, Description, Street, County, Zipcode, Country, Airport_Code, Weather_Timestamp). From there we expanded StartTime into engineered temporal features: accident duration, start hour, day, month, is-morning-rush, is-evening-rush, and is-night, on the theory that bucketing time would give the tree splitters cleaner cut points than a raw datetime ever could. Three of those engineered features ended up in the final model's top ten, which was a nice confirmation that the theory had been worth the effort.
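The temporal expansion looks roughly like this in pandas. The column names and the rush-hour and night windows are illustrative assumptions, not the project's exact schema:

```python
import pandas as pd


def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Expand raw start/end timestamps into the bucketed features the
    tree models can split on cleanly."""
    out = df.copy()
    start = pd.to_datetime(out["Start_Time"])
    end = pd.to_datetime(out["End_Time"])
    out["duration_min"] = (end - start).dt.total_seconds() / 60.0
    out["start_hour"] = start.dt.hour
    out["day_of_week"] = start.dt.dayofweek
    out["month"] = start.dt.month
    # Rush-hour and night windows are assumed, not the project's exact cutoffs.
    out["is_morning_rush"] = start.dt.hour.between(7, 9).astype(int)
    out["is_evening_rush"] = start.dt.hour.between(16, 18).astype(int)
    out["is_night"] = ((start.dt.hour >= 22) | (start.dt.hour < 5)).astype(int)
    return out
```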
Stage 2. Initial Training on a Subsample
Before spending any real compute on the full 4.1M rows, we screened twelve model families on a 500k-row subsample with a small GridSearchCV on each. The screening was deliberately shallow, with just enough tuning to rank the candidates fairly and not a minute more, because the point was to allocate budget rather than to find the winner on the first pass. We framed severity as a regression problem rather than a classification one so that the loss would respect the 1 to 4 ordinal scale, and then we let the subsample do the sorting. The linear and regularized models all clustered around R² 0.11. The Nyström-kernel RBF approximation, which was our cheap way to test for non-linearity without paying the O(N³) cost of full kernel methods, came in at 0.10, and that was a negative result, but a clean one. The tree ensembles pulled ahead almost immediately, with XGBoost out in front at 0.435.
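The screening loop itself is simple. Here is a sketch with stand-in families instead of all twelve; the deliberately tiny grids match the "shallow" intent, and severity stays a continuous regression target:

```python
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def screen_candidates(X, y, candidates, cv=3):
    """Shallow GridSearchCV per family on a subsample, ranked by held-out R².

    `candidates` maps a family name to (estimator, small_param_grid)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    scores = {}
    for name, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, cv=cv, scoring="r2")
        search.fit(X_tr, y_tr)  # just enough tuning to rank the family fairly
        scores[name] = r2_score(y_te, search.predict(X_te))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Called with, say, `{"ridge": (Ridge(), {"alpha": [0.1, 1, 10]}), ...}`, it returns the ranking that decides where the tuning budget goes.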
Stage 3. Testing and Selection
The screen gave us a ranking, and then the test stage decided which of those candidates actually deserved a full-scale retrain. Three decisions came out of it. First, the Nyström detour was over, because a higher-dimensional projection was not going to rescue the linear family, and the tree direction was clearly where the tuning budget belonged. Second, the neural nets were in trouble. We had tested three adaptive widths (Small, Medium, and Large, with hidden layers scaled to input dimension), and performance got steadily worse as capacity grew, with the Large variant going to negative R² before epoch 20. On fifty-odd semantic tabular features, capacity turned out to be a liability rather than an asset. Third, XGBoost, LightGBM, and Random Forest all survived the cut, with XGBoost the clear front-runner on both accuracy and training time.
Stage 4. Full Training and Tuning
With XGBoost selected, we ran a 25-iteration RandomizedSearchCV with 3-fold CV on a 1M-row training split, searching n_estimators, max_depth, learning_rate, and reg_alpha/lambda. The critical move was running the search twice, once on the baseline feature set and once on the song-augmented set, entirely independently, and then confirming that both runs had converged on near-identical hyperparameters. That convergence was the evidence that any lift from the song features could not be written off as a tuning-budget artifact. The final settings landed at n_estimators=3570, learning_rate=0.0196, max_depth=7, and colsample_bytree=0.709, and most of the gain over default XGBoost came out of the n_estimators by max_depth interaction rather than out of the learning rate.
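The twice-run control can be sketched like this. scikit-learn's GradientBoostingRegressor stands in for XGBoost here, and the search space is a toy version of the real one, which drew n_estimators, max_depth, learning_rate, and reg_alpha/lambda from much wider distributions:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV


def tune(X, y, n_iter=5, seed=0):
    """One randomized search; the point is to run it separately per feature set."""
    search = RandomizedSearchCV(
        GradientBoostingRegressor(random_state=seed),
        param_distributions={
            "n_estimators": [50, 100, 200],
            "max_depth": [2, 3, 4],
            "learning_rate": [0.05, 0.1, 0.2],
        },
        n_iter=n_iter,
        cv=3,
        scoring="r2",
        random_state=seed,
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_


# Run the identical search on the baseline and on the song-augmented features.
# If both converge on near-identical hyperparameters, a score gap between the
# two feature sets cannot be written off as a tuning-budget artifact.
```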
Stage 5. Results
The final tuned XGBoost on the song-augmented dataset hit R² 0.4352 with RMSE 0.3948 on the held-out 20%, while the baseline without songs landed at R² 0.4152, which meant the songs contributed a +4.8% relative lift that stayed consistent across tuning runs. For context, OLS topped out at 0.110, the best neural net reached about 0.29 before overfitting took over, and Random Forest got to 0.33. The pattern across models was useful on its own terms: the linear models saw the largest percentage gains from song features (+19%), the tree-based models saw the largest absolute gains, and the shallow trees (Decision Tree and Random Forest) actually got slightly worse once songs were added, which says something real about how those models handle uncorrelated noise.
Reading the Result
Traffic_Signal dominated the feature importance, followed by road geometry and the engineered temporal buckets. The song features that did register were oddly specific, with the track 'DÁKITI' by Bad Bunny, the artist 'The Weeknd', song speechiness, and song duration all showing up in the long tail. That pattern is genuinely hard to read as music causing crashes. It reads much more like music as a regional fingerprint, acting as a proxy for which demographic is on the road, in which part of the country, during which season of the year. The +4.8% lift is real and defensible, but the causal story simply isn't. We could have framed this project as 'Spotify predicts crashes' and it would have been a lot more shareable, but we didn't, because that is not what the data actually says. Reading the feature importance plot honestly and rewriting the thesis around what it showed was the last real decision the project asked us to make.
- Watching NN Adaptive Large drift into negative R² territory while the Small variant stayed perfectly healthy ended the neural-net branch of this project in a single afternoon, and the takeaway was blunt: on tabular data with around 50 semantic features, extra capacity is a liability, and the sooner you are willing to see that, the sooner the compute goes somewhere genuinely useful
- Running the hyperparameter search twice, once on the baseline and once on the song-augmented dataset, entirely independently, was more valuable than running any single search for longer, because the fact that both searches converged on near-identical parameters was the piece of evidence that turned a +0.02 R² reading from 'maybe noise' into 'real, but small'
- The feature-importance plot changed this project's thesis more than any headline metric ever did, and the real discipline was sitting with it, noticing that it didn't support the original story we'd hoped to tell, and then rewriting the claim around what the data actually said instead of around what we had set out to show
Autonomous BattleBot
2024
mechatronics · embedded · autonomous
Autonomous combat robot for live competition. Three control modes, custom perfboards, and a hard lesson in power isolation.
A compact battle robot built for live multi-team competition. I owned embedded software and electrical integration. Three control modes: manual WASD, wall-following via TOF sensors, and coordinate-based autonomous navigation using Vive positioning and an ESP32 browser calibration interface. The biggest lesson was practical: motors and logic need separate power domains, and I learned that the hard way.
3 autonomous operating modes | C/C++ from pin control to state behavior
Role: Embedded software lead · Electrical integration
Team: Matthew Rabin, Stan Han, Steyn Knollema
Skills: C, C++, ESP32, Circuit Design, PID Control, CAD
- Custom perfboards centralizing power distribution, motor control, sensing, and interconnects
- Separate power domains: resolved motor-induced voltage noise and logic instability
- Vive positioning module for global coordinate navigation; TOF for wall-following
- Oscilloscope-diagnosed PWM behavior and voltage drops to reach stable final system
A compact autonomous combat robot built for live competition. I handled embedded software and electrical integration, which meant everything from custom perfboards to autonomous navigation to learning, very concretely, why motors and logic need separate power domains.
Design Concepts
The project started as a full system, not just a control algorithm. Early sketches defined the layout, the schematic locked down how motors, sensing, and compute would share the chassis, and the drawings turned that into something we could actually build. Doing that work early made the robot legible as one system before we committed to fabrication.
Electrical Design
The electrical system centered on custom perfboards handling power distribution, motor control, sensing, and interconnects in a chassis with almost no spare room. The big lesson came early: motor noise destabilized the logic stack. Splitting motors and logic into separate power domains fixed the problem and turned out to be the decision everything else depended on. The same board-level work also had to integrate three TOF sensors and a top-hat switch without turning the robot into a wiring mess.
Software Architecture
The software supported three modes: direct WASD control, wall-following from the TOF array, and coordinate-based autonomous navigation using Vive positioning. An ESP32 access-point interface exposed live pose, heading, and corner calibration so the robot could build a usable arena frame before a match. The hard part was not writing each mode in isolation. It was making the system switch cleanly between local sensing, global positioning, and direct control without losing stability.
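The mode-switch discipline can be sketched as a small state machine. This is written in Python for readability; the actual firmware is C/C++ on the ESP32, and the gains, distances, and thresholds here are made-up illustrative values:

```python
from enum import Enum, auto


class Mode(Enum):
    MANUAL = auto()       # direct WASD teleop
    WALL_FOLLOW = auto()  # local sensing: TOF array
    WAYPOINT = auto()     # global positioning: Vive coordinates


class ModeController:
    """Every transition passes through a stop state, so a stale command
    from one mode never carries over into the next."""

    def __init__(self):
        self.mode = Mode.MANUAL
        self.drive_command = (0.0, 0.0)  # (left, right) motor effort

    def switch_to(self, mode: Mode):
        self.drive_command = (0.0, 0.0)  # halt before handing over control
        self.mode = mode

    def update(self, wasd=None, tof=None, pose=None, target=None):
        if self.mode is Mode.MANUAL and wasd is not None:
            self.drive_command = wasd
        elif self.mode is Mode.WALL_FOLLOW and tof is not None:
            # proportional wall-follow on a side TOF reading (illustrative gain)
            error = tof - 150.0  # assumed target distance in mm
            turn = max(-0.3, min(0.3, 0.002 * error))
            self.drive_command = (0.5 + turn, 0.5 - turn)
        elif self.mode is Mode.WAYPOINT and pose and target:
            # drive toward the target; the real code closes the loop on Vive pose
            dx, dy = target[0] - pose[0], target[1] - pose[1]
            self.drive_command = (0.5, 0.5) if abs(dx) + abs(dy) > 0.05 else (0.0, 0.0)
        return self.drive_command
```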
System Integration
Integration was the real project. Most of the interesting problems only showed up once the mechanical, electrical, and software layers were all alive at the same time. Debugging meant tracing timing, power stability, and signal integrity together, with an oscilloscope confirming voltage drops, PWM behavior, and noise instead of guessing. The final robot held together well enough to compete as one system, not just as a stack of separate subsystems.
- Separate power domains for motors and logic is a fundamental embedded systems principle: learning it through a live debugging failure made it permanent knowledge
- Cross-domain debugging (software timing + hardware power + signal integrity simultaneously) requires systematic isolation; the oscilloscope was essential, not optional
- Autonomous behaviors that work in isolation can fail at system integration: testing subsystems separately is necessary but not sufficient
Fresh Start: Habit App
2025
ux design · ai-native · behavior design
Habit app designed around why resolutions actually fail. 51 concepts narrowed to 5 features, tested with 25 users.
A habit app designed around why resolutions actually fail, not the reasons people give. Interviews with students, a Wharton habit-formation professor, and professional athletes pointed to three structural causes: vague goals, schedules that don't flex, and support that disappears after the first week. The AI companion (Berry) is deliberately quiet, a background presence rather than the main feature. Tested with 25 users across multiple iterations.
51 concepts sketched | 25 users tested | 7 interview subjects
Role: UX researcher · Interaction designer · Visual designer
Team: Keyu Zhu, Steyn Knollema
Skills: Figma, User Research, JTBD, Behavior Design, Prototyping
- Three root failure modes: vague goals, no schedule fit, support drop-off after early motivation
- Berry, AI designed as quiet background support, not a visible feature
- Full style guide: typography, color, spacing, component behavior
- 25 users tested across multiple iterations; 8 in initial low-fi usability sessions
A habit-building app for students and young professionals, designed around why resolutions actually fail in practice. The project moved from behavioral research through 51 concepts to a prototype tested with 25 users.
Behavior Research
We started by looking at why resolutions break down in real life. Interviews with students, a Wharton habit-formation professor, and professional athletes pointed to three recurring problems: goals stay vague, routines don't fit real schedules, and support disappears once the early motivation spike fades. The early research widened the problem before the product got narrower again.
Archetypes & Opportunity
Persona work and competitor analysis turned the interviews into design constraints. The product had to work for inconsistent, tired, easily derailed users, not just disciplined planners. Looking at adjacent products made the gap obvious: most habit apps reward streaks and reminders, but very few make recovery and rescheduling feel normal.
Concept Exploration
We explored broadly, sketching 51 concepts and narrowing them to five MVP features. Low-fidelity wireframes were then tested with eight users, which exposed navigation, hierarchy, and onboarding issues early enough to fix them cheaply. The product only started working once planning, execution, and recovery were treated as one loop instead of three separate features.
System Refinement
As the direction stabilized, we refined the product into something warmer but still restrained. Berry stayed deliberately quiet: supportive in tone, never the main event. The visual system defined typography, color, spacing, iconography, and component behavior so the app could feel calm without slipping into generic wellness-app language.
Final Result
The final product focuses on the features that directly answer the failure modes: onboarding that turns broad goals into realistic plans, weekly planning that flexes around real schedules, low-friction task completion, and progress feedback without guilt. The core screens work together as one loop: plan, do, recover.
- Designing an AI as quiet background support rather than a visible feature required constant restraint: every design review had proposals to make Berry more prominent
- Habit-driven products work best when failure and recovery are treated as normal states: the guilt loop created by streak-based apps is a design problem, not a user problem
- Sustainable behavior change depends on respecting real schedules and energy levels: a system that only works when users are at their best will fail most of the time
.Pixel: Morning Planner
2025
physical product · digital interface · solo
A physical device for phone-free mornings. Solo project, 3 months. 6 out of 8 testers said they'd buy it.
People reach for their phone first thing not because they want to, but because it's the only convenient way to check the time, weather, and schedule. .Pixel gives them those things without the rest. Dot-matrix display, touch-only interaction, no visible buttons. I prototyped the full interface in Figma before building hardware. Tested with 8 users in their actual morning routines: 7 out of 8 felt calmer, 6 out of 8 said they'd buy it.
7/8 users felt calmer | 6/8 would potentially buy | solo research → concept → prototype → test
Role: Designer · Fabricator · Engineer
Team: Solo
Skills: Physical Prototyping, CAD, Figma, Embedded Hardware, User Testing
- 51 ideation concepts and paper models before committing to a direction
- Calm, minimal physical form: clean white, dot-matrix display, touch-only interaction
- High-fidelity Figma interface prototyped before hardware to test all screens and flows
- 8 users tested in real morning routines, validated the concept before full build
.Pixel is a solo project about a small but stubborn problem: people start the morning on their phone because it's convenient, not because they want to. I tried to make the simplest object that could replace that habit.
Problem Framing
I framed the problem as a design constraint, not a moral one. People were reaching for their phone first thing because it bundled the time, weather, and schedule into one easy place. The challenge was to keep those useful functions and remove the rest of the cognitive drag that comes with opening a smartphone the moment you wake up.
Concept Exploration
I explored 51 concepts before committing to a direction. Most of them were deliberately quick: sketches, storyboards, and rough physical mockups that let me test the interaction without pretending the form was solved. Two ideas stood out, a music-focused device and a day planner, and users consistently pulled the project toward the planner. The catch was that it still had to feel lighter than reaching for the phone.
Physical Form
The final form is intentionally quiet: white housing, dot-matrix display, touch interaction, no visible buttons. I wanted it to feel more like a useful object on the table than another gadget asking for attention. Fast paper models helped me settle the proportions before moving into CAD and fabrication.
Interface System
Before building the full hardware, I prototyped the entire interface in Figma and tested it in context. That made it possible to refine the flows, reduce friction, and see whether the interactions still felt calm when someone was half awake and reaching across the table. The interface was designed around quick readability. Each screen had to make sense at a glance.
Final Result
The final concept was tested with eight users in real morning routines, not in a lab. That mattered because the device was not competing with an abstract problem. It was competing with habit. Seven out of eight users said they felt calmer and more focused. Six said they would buy it.
- Designing for subtraction is harder than designing for features: every addition had to justify itself against the cost of complexity, and most additions didn't survive that test
- Physical prototyping before digital saved time: proportions and hand feel that look fine in CAD feel wrong in hand, and paper models exposed that in an afternoon
- Testing in real context produced fundamentally different feedback than lab testing: the device was competing against a phone habit, not a neutral baseline
AR Machine Operating Instructions
2023
augmented reality · published research · hci
AR system on HoloLens that puts step-by-step instructions directly onto the machine. Peer-reviewed, published, adopted for training.
An AR instruction system on HoloLens that overlays procedural guidance directly onto a milling machine. The instructions appear on the component you're about to operate, not on a separate screen. I validated it with engineering students, workshop professionals, and HSE regulators: 47% engagement improvement, fewer errors, lower safety risk compared to paper manuals. Published as a peer-reviewed paper and adopted at the university for safety training.
47% engagement improvement | adopted at university for safety training | peer-reviewed publication
Role: Researcher · Designer · Developer
Team: Solo
Skills: Microsoft HoloLens, AR Prototyping, User Testing, Cost Modeling, HSE
- AR prototype on HoloLens: part-specific and task-specific holographic overlays aligned to machine components
- Sequential instruction flows with interactive checkpoints preventing unsafe actions
- Cost modeling demonstrating feasibility for educational and industrial deployment
- Human-centered validation: engineering students, workshop professionals, HSE experts
- Spatial instructions outperformed static manuals on every measured engagement metric
- User research aligned final design with regulatory and safety requirements
I designed and built an AR instruction system on Microsoft HoloLens that overlays guidance directly onto a PICOMAX 20 milling machine. It was tested with real users in a real workshop, published, and adopted for training.
System Architecture
The prototype runs on Microsoft HoloLens and anchors holographic overlays directly to the physical machine. Instructions appear on the component being referenced rather than on a separate screen. The system supports both student and professional instruction flows, with different levels of guidance depending on the operator.
Instruction Flow Design
Instructions are delivered step by step, with checkpoints between stages so the operator cannot advance without confirming the current action. That matters on safety-critical machinery, where skipped steps are the actual problem. The sequence is easy to follow because the guidance stays on the machine itself instead of forcing the user to keep matching a manual to a component.
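The checkpoint pattern reduces to a tiny state machine: you cannot advance past a step until it is explicitly confirmed. A minimal sketch, with illustrative step names rather than the actual milling procedure:

```python
class CheckpointFlow:
    """Sequential instructions gated by confirmation checkpoints."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.index = 0
        self.confirmed = False

    def current(self):
        return self.steps[self.index]

    def confirm(self):
        """Operator confirms the current action has actually been done."""
        self.confirmed = True

    def advance(self):
        """Move to the next step; refuses if the checkpoint is unconfirmed."""
        if not self.confirmed:
            raise RuntimeError("checkpoint not confirmed: " + self.current())
        if self.index < len(self.steps) - 1:
            self.index += 1
            self.confirmed = False
        return self.current()
```

Encoding the procedure this way makes skipping a step a system error rather than a human lapse, which is the whole point on safety-critical machinery.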
Cost Modeling
I also cost-modeled the system for larger deployment across educational and industrial settings. The analysis covered HoloLens hardware, content authoring, maintenance, and training. The useful finding was that content authoring, not hardware, is the main cost driver. That would matter a lot more than the headset itself in any commercial version.
- Interactive checkpoints are a design pattern, not a feature: they encode safe procedure as a constraint the system enforces rather than the user remembering
- Spatial AR instructions require precise spatial anchor calibration; misalignment by even a few centimeters breaks the instructional link between hologram and physical component
- Content authoring, not hardware, is the primary cost driver at scale; an insight that would change the product roadmap for any commercial version
The core question was whether spatial instructions could outperform manuals for safety-critical physical tasks. Testing across three user groups gave a clear answer.
Validation Framework
I validated the system with three groups: engineering students, workshop professionals, and HSE experts. Each group cared about something different. Students focused on learning, professionals on workflow, and HSE experts on compliance and risk. Testing happened on the actual PICOMAX 20, comparing AR against traditional documentation on engagement, confidence, and procedural error.
Validated Outcome
Operators using the AR system showed higher engagement and confidence, fewer procedural errors, and lower safety risk than with traditional manuals. Engagement improved by 47%. The advantage came from something straightforward: the instructions lived on the machine itself, so users did not have to translate between a page and a component.
Publication & Implications
The work was published as a peer-reviewed paper through the University of Twente and adopted for safety training at the university. Turning a practical prototype into a formal contribution meant building a defensible measurement approach, placing the work in the HCI and AR literature, and documenting it cleanly enough that someone else could evaluate it.
- Spatial instructions beat manuals for physical, safety-critical tasks: the advantage comes from eliminating the translation step between reading a description and finding the component
- Validation across multiple user groups (students, professionals, regulators) produces far more defensible results than single-group testing
- AR requires alignment of technology, user cognition, and regulatory requirements. A prototype that works technically can still fail validation if it doesn't match how operators actually work
For my bachelor's thesis, I designed, built, and validated an AR instruction system on Microsoft HoloLens for a PICOMAX 20 milling machine. It was tested in a real workshop, published as a peer-reviewed paper, and adopted for safety training.
The System
Part-specific holographic overlays are anchored directly to machine components, so the guidance appears where the operator is actually looking. The instruction flow includes checkpoints, which means the system will not advance until the current step is confirmed. It also supports different guidance depth for student and professional users.
Validation
I validated the system with engineering students, workshop professionals, and HSE experts. Task-matched testing on the real machine measured engagement, procedural error, and confidence while comparing AR against traditional documentation. Operators were more engaged, made fewer errors, and handled the procedure with more confidence.
Publication & Cost Modeling
The work was published as a peer-reviewed paper (full thesis) through the University of Twente and adopted for safety training. I also cost-modeled the system for larger deployment and found that content authoring, not hardware, is the main cost driver at scale.
SpatialPixel: Applied AI Workspaces
2025
spatial computing · research · co-design
Research and co-design for SpatialPixel's AI workspace platform. Fieldwork uncovered a third collaboration mode no one had named.
A client project with SpatialPixel on how their Procession platform could better support creative collaboration. Through historical research, field observation, and co-design sessions, we found that teams don't just work solo or together. They regroup: cycling between independent work and collective synthesis. That insight led to the Spatial Library, a template system that made the open-ended platform easier to actually adopt.
4 in-depth interviews | 7 professionals in group session | 4 intercept interviews | dozens of field observations
Role: Research lead · Co-design facilitator · Concept designer
Team: Matthew Rabin, Sofia El Amrani, Hsin Wang, Steyn Knollema
Skills: User Research, Co-Design, Spatial Computing, AI Prototyping, Concept Design
- Historical research on spatial computing framed the project against recurring patterns in projection, collaboration, and interface design
- Fieldwork revealed re-group, a third collaboration mode between solo and full-group work
- HMW development and co-design sessions exposed where Procession broke down: unclear starting points and reliance on outside tools
- Final direction: the Spatial Library, a template system that made Procession easier to adopt in real workflows
- Turned an open-ended client brief into a structured research program grounded in history, observation, and synthesis
- Translated ambiguous research into a client-ready product direction rather than a loose set of ideas
- Shifted the recommendation from raw AI capability toward lower-friction starting points teams could actually adopt
A client project with SpatialPixel on how Procession could better support creative collaboration. The work moved from research and field observation into co-design and, eventually, a much clearer product direction.
Historical Research
The project started with a historical look at spatial computing, collaboration, and makerspaces. We traced how projection, gesture-based interaction, and shared work environments had evolved over time. That gave the project some structure before we moved into the field and stopped it from becoming a loose collection of interesting ideas.
Observations & User Research
We looked at how collaboration actually unfolds in shared creative environments through interviews, group sessions, intercept interviews, and on-site observation. The main insight was that collaboration is not just solo work or group work. Teams constantly move between the two. We called that pattern re-group: independent work followed by moments of synthesis, alignment, and decision-making.
HMWs & Idea Development
The HMW phase turned the research into design direction. We framed questions around how Procession could support physical brainstorming while keeping the clarity and shareability of digital tools. Rapid sketching and AI-assisted mockups helped us explore broadly before narrowing down what was actually worth pursuing.
Co-Design Sessions
Co-design sessions pushed the ideas into real workflows. We brought projected cards, prompts, and shared surfaces into collaborative settings to see how teams actually used Procession. The friction became obvious very quickly: even informed users reached for outside tools, many did not know where to begin, and the platform needed stronger starting points if it was going to work in practice.
Concept Refinement
The evidence pointed toward generative and visualization-driven uses where projection could genuinely help creative teams. The concept moved away from open-ended capability demos and toward faster ways to explore options, align on ideas, and make decisions together.
Final Result
Through user testing and iteration, the interface shifted from a demo into something more practical. The final recommendation was the Spatial Library: a template system for brainstorming, mapping, and collaboration. It gives users a concrete place to start without removing the flexibility that made Procession interesting in the first place.
- Re-group appeared in the fieldwork, not in the brief. The most useful finding came from observation contradicting an assumption, not confirming one.
- Even users who understood Procession didn't know where to start. Entry friction matters more than depth of capability.
- Open-ended briefs need a structural frame early or the exploration produces interesting observations that never add up to a direction.
- Templates reduced adoption friction more effectively than new features would have. Sometimes the product problem is onboarding, not capability.
SpatialPixel came with an open-ended brief: figure out how Procession could better support collaboration in creative spaces. My role was to turn that ambiguity into a structured research process and a usable recommendation.
Framing The Work
The project started without a defined problem or solution space. I anchored the work in a historical investigation of spatial computing and collaboration so we could evaluate Procession against longer-running patterns instead of isolated feature ideas. That framing kept the work from drifting.
Research Program & Insight
I structured the evidence gathering around interviews, group sessions, intercept interviews, and on-site observation. The most important outcome was the re-group pattern: collaboration was not just solo work or group work, but a constant movement between the two. That gave the client a specific behavioral gap in Procession to address.
From HMWs To Direction
The HMW phase and co-design sessions translated the research into product direction. We learned that people still relied on outside tools for context, struggled to know where to begin in Procession, and fell into two camps: users who adopt tools as-is, and users who immediately reshape them. Those findings pushed the recommendation away from raw prompting power and toward stronger starting points.
Spatial Library Recommendation
The final deliverable was the Spatial Library, a template system that packages brainstorming, mapping, and collaboration flows into reusable starting points. Instead of asking teams to figure out how to prompt the system from scratch, it shifted the product toward what to create. That lowered adoption friction without flattening the tool.
- Open-ended client work needs a framing device early or the exploration stays interesting but directionless.
- Behavioral insights only create value when they are translated into something the client can actually build.
- In AI products, reducing starting friction can matter more than exposing more raw capability.
ChatIT: AI B2B Consultancy
2024
startup · b2b · go-to-market
AI consultancy I co-founded in the Netherlands. 15 clients, €15K revenue, 80% margins.
An AI consultancy I co-founded in the Netherlands. 15 clients across 5 industries, €15K revenue at 80% gross margins. Cold outreach didn't work, so we stopped and built a workshop-based channel instead: educate business owners first, sell second. 35% conversion rate at under €500 per customer. Pricing was tied to efficiency outcomes, not hours. Average client saw 40% operational improvement.
15 clients, 5 industries | €15K revenue, 80% margins | 35% workshop → customer
Role: Co-founder · Business development · Product
Team: Floris Vossebeld, Steyn Knollema
Skills: B2B Sales, GTM Strategy, Pricing, AI Products, Customer Acquisition
- Hybrid pricing: project-based engagement + recurring subscription model
- Workshop acquisition channel: educated prospects first, converted 35% at <€500 CAC
- Competitive analysis of 12 AI vendors to prioritize highest-impact roadmap features
- 40% improvement in client operational efficiency across 5 industries
An AI consultancy I co-founded in the Netherlands. We started without clients or a reliable channel and had to figure both out from scratch. The result was 15 clients across five industries, €15K in revenue, and 80% gross margins.
Business Model
The model combined project-based work for implementation with recurring subscription revenue for ongoing access and support. Project fees covered the custom work up front. Subscriptions made the revenue more predictable and kept the relationship active after the initial build. Pricing was tied to efficiency gains rather than time spent.
Acquisition Channel
Cold outreach didn't work, so we stopped doing it. Instead, we built a workshop-based channel that educated business owners on what AI could actually do before we tried to sell anything. That created a prospect pool that already understood the value and had self-selected for genuine interest. Workshop-to-customer conversion reached 35% at under €500 CAC, far better than the outbound approaches we had tried.
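The unit economics of the workshop channel can be sketched as a back-of-envelope calculation. Only the 35% conversion rate and the sub-€500 CAC come from the work above; the per-workshop cost and attendee count here are hypothetical illustrations.

```python
# Back-of-envelope CAC for an education-first acquisition channel.
# Conversion rate (35%) and the €500 CAC ceiling come from the text;
# workshop cost and attendee count are hypothetical.

def workshop_cac(cost_per_workshop, attendees, conversion_rate):
    """Customer acquisition cost = total channel spend / customers won."""
    customers_won = attendees * conversion_rate
    return cost_per_workshop / customers_won

cac = workshop_cac(cost_per_workshop=1400, attendees=10, conversion_rate=0.35)
print(f"CAC: €{cac:.0f} per customer")   # -> €400, under the €500 ceiling
```

The lever is visible in the arithmetic: a high conversion rate lets even a relatively expensive workshop undercut the cost of converting cold outbound leads.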
Product & Results
A competitive analysis of 12 AI vendors informed which tools and approaches to prioritize. We chose features based on two things: how much they helped the client's workflow, and whether we could implement them well with a small team. Across the five industries we served, clients reported an average 40% improvement in operational efficiency.
- Education-led acquisition works because it removes the trust problem: clients who understand what you're selling don't need to be convinced, they need to be organized
- Hybrid pricing (project + subscription) aligns incentives better than either model alone: project work gets done, recurring revenue keeps the relationship
- B2B sales cycle length tracks inversely with how clearly you can demonstrate ROI. The businesses that moved fastest had the clearest before/after metrics
Penn Wharton Innovation Fund
2024–2026
early-stage investing · due diligence · deal flow
Two years on Penn's student-run pre-seed fund. 300+ startups reviewed, $220K+ deployed annually.
Penn's student-run pre-seed fund. $220K+ deployed annually into early-stage founders across the university. I've been on the investment team for two consecutive years, one of the few members asked back. Four application cycles a year, weekly committee meetings, written due diligence on every assigned startup, and practicing VCs joining to challenge our reasoning. Real money, real founders, real consequences.
300+ startups reviewed per year | $220K+ deployed annually | 2 years on the team | 4 investment cycles per year
Role: Investment team member · Due diligence · Portfolio selection
Team: 20-person investment team across all Penn schools
Skills: Due Diligence, Market Sizing, Investment Thesis, Early-Stage Evaluation, Startup Assessment, Capital Allocation
- Reviewed 300+ startup applications per year: evaluated team quality, market sizing, problem validity, and early traction across hardware, software, and services
- Wrote structured due diligence feedback on every assigned venture: what works, what doesn't, and whether the risk profile justifies the investment
- Made real capital allocation decisions alongside students from Wharton, Engineering, and Law: diverse perspectives, actual consequences
- Learned deal evaluation from VCs who joined selected weekly sessions to pressure-test investment theses and challenge our reasoning
- One of the few team members selected for a second consecutive year: continuity of judgment across two full cycles of the Penn startup ecosystem
Penn's student-run pre-seed fund deploys $220K+ annually into early-stage founders across the university. I've been on the investment team for two consecutive years, one of the few members asked back, and have seen the full process up close.
How the Fund Works
The fund operates like a real one: four application cycles each year, weekly investment committee meetings, written due diligence on every assigned startup, and VC practitioners joining selected sessions to pressure-test investment theses. The cross-school mix of Wharton, Engineering, and Law gives the team a broader lens than most student investment groups. Every dollar deployed is a real decision with real consequences for founders.
Due Diligence
Due diligence on each assigned venture covered team quality, market sizing, problem validity, and early traction. I wrote structured feedback on every startup I reviewed: what was working, what wasn't, and whether the risk profile justified a pre-seed investment. The most common failure mode was not a bad idea. It was a team that had not talked to enough real customers to know whether they were solving the right problem.
What Two Years of Evaluation Taught Me
After reviewing 300+ startups a year across two full cycles, the pattern has been consistent across hardware, software, and services. The teams that do well usually ran customer discovery before they built. The teams that struggle usually polished the pitch before they validated the problem. At pre-seed, the important judgment call is less about the product itself and more about whether the team knows how to learn.
- Pre-seed investing is fundamentally a judgment call about learning speed: the best early-stage teams are the ones most willing to invalidate their own assumptions
- Market sizing methodology matters more than the number: a team that built their TAM bottom-up from validated customer segments is telling you something about how they think
- Being selected for a second year forced me to develop consistent judgment rather than situational opinions. You can't be ad hoc when you're accountable to prior reasoning
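The bottom-up market sizing mentioned in the takeaways can be illustrated with a toy calculation: sum reachable customers times contract value per validated segment, rather than taking a top-down slice of an industry report. All segment names and figures here are hypothetical.

```python
# Toy bottom-up TAM: sum (reachable customers x annual contract value)
# per validated customer segment. All figures are hypothetical.

segments = {
    # segment: (reachable customers, annual contract value in $)
    "segment A": (4_000, 6_000),
    "segment B": (1_200, 12_000),
    "segment C": (300, 40_000),
}

tam = sum(customers * acv for customers, acv in segments.values())
print(f"Bottom-up TAM: ${tam:,}")   # -> $50,400,000
```

The number itself matters less than the structure: each line forces the team to say which customers they have actually validated and what those customers would pay, which is exactly the thinking the takeaway above is pointing at.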