Thesis defended — what worked, what I'd change, what comes next
Defended my BSc on June 6th. A few notes on what worked methodologically, what I'd change in retrospect, and the small career decisions on the other side.
I defended my BSc thesis last Friday. Title’s a long one:
Behavioural Anomaly Detection in Network Traffic via Suricata IDS and ELK Stack: A Sandbox-Based Approach to Identifying APT Lateral Movement Patterns.
The full abstract sits on the thesis page and the supporting code is at github.com/palianytsia-200/suricata-elk-lab. This post is the slightly more honest “what really happened during the year” supplement that doesn’t go in the academic version.
Final grade: 92/100 (A). Solid but not top-of-cohort — there are 4–5 ФТІ ‘25 graduates who scored higher and they all earned it. I’m okay with where I landed.
What worked
Picking a topic where I could actually replay traffic. Half the behavioural-detection literature feels theoretical because the authors don't have access to attacker traffic to test against. By deliberately constraining the empirical chapter to public Malware-Traffic-Analysis pcaps + a few synthetic captures from my home lab, I bounded the project to something I could finish. The temptation to expand to “and also test against real enterprise traffic” was strong; resisting it kept the deadline reachable.
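If you want the general shape of that replay loop, it's small enough to script. A minimal sketch, assuming a local Suricata install, a pcaps/ directory of downloaded captures and a runs/ output directory; none of these paths are the repo's actual layout:

```python
#!/usr/bin/env python3
"""Sketch of the replay loop: run Suricata in offline mode (-r) over a directory
of public pcaps and keep one log directory per capture. Paths and layout are
illustrative, not the repo's actual structure."""
import subprocess
from pathlib import Path

PCAP_DIR = Path("pcaps/malware-traffic-analysis")  # assumed download location
OUT_DIR = Path("runs")                             # one log directory per pcap
SURICATA_CONF = "suricata/suricata.yaml"           # assumed config path

def replay(pcap: Path) -> Path:
    """Analyse a single capture offline and return its log directory."""
    log_dir = OUT_DIR / pcap.stem
    log_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["suricata", "-c", SURICATA_CONF, "-r", str(pcap), "-l", str(log_dir)],
        check=True,
    )
    return log_dir

if __name__ == "__main__":
    for pcap in sorted(PCAP_DIR.glob("*.pcap")):
        log_dir = replay(pcap)
        print(f"{pcap.name}: alerts in {log_dir / 'eve.json'}")
```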
Docker Compose for the lab. Reproducibility is the boring word that academic supervisors keep saying, and I sort of nodded along without internalising it. Then around chapter 4 my lab broke, I had to rebuild it on a different laptop, and I lost a week. After that, the Compose setup was non-negotiable. The whole lab now spins up in 90 seconds on a fresh machine.
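And since "spins up on a fresh machine" is only true if you can check it, a smoke test is worth having. A rough sketch of one, assuming the stock Elasticsearch (9200) and Kibana (5601) endpoints rather than whatever the lab actually exposes:

```python
#!/usr/bin/env python3
"""Sketch of a fresh-machine smoke test: poll the stack after `docker compose up -d`
until the services answer HTTP. Assumes the stock Elasticsearch (9200) and
Kibana (5601) endpoints, not necessarily what the lab exposes."""
import time
import urllib.error
import urllib.request

SERVICES = {
    "elasticsearch": "http://localhost:9200/_cluster/health",
    "kibana": "http://localhost:5601/api/status",
}

def wait_for(name: str, url: str, timeout: int = 120) -> None:
    """Poll until the service answers HTTP at all; any response counts as up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                print(f"{name}: up")
                return
        except urllib.error.HTTPError:
            # An HTTP error (e.g. 401 with security enabled) still means the
            # service is listening, which is all this smoke test checks.
            print(f"{name}: up (responded with an HTTP error)")
            return
        except (urllib.error.URLError, OSError):
            time.sleep(3)
    raise SystemExit(f"{name}: not reachable within {timeout}s")

if __name__ == "__main__":
    for name, url in SERVICES.items():
        wait_for(name, url)
```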
Dropping a chapter early. My original plan had a chapter on Active Directory enumeration detection that turned out to require an AD environment I didn't have a clean way to build. My supervisor suggested cutting it during the chapter 3 review. I argued for keeping it. She was right; I should have cut it earlier. The trimmed thesis is more focused and the AD chapter would have been weak.
Starting the empirical chapter early. I ran rule-tuning loops in parallel with chapter 2 writing. By the time I needed to write up chapter 4 (empirical), I had three months of tuning iteration already done.
What I’d change
Wider replay corpus. I leaned heavily on Malware-Traffic-Analysis pcaps because they're well-documented, but they're also curated for clarity. A real enterprise capture would be messier and the ruleset's false-positive rate against it would have been more illustrative. If there's an MSc continuation, that's the v2 angle.
More time on the “why” of behavioural cumulative scoring. I described the technique well enough but didn’t go deep enough on the theoretical scaffolding (Bayesian belief updating? hidden Markov model framing?). My committee asked exactly this question during defence and I had a competent but not great answer.
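For future-me, one candidate piece of that scaffolding (my after-the-fact illustration, not something the thesis derives) is a naive sequential Bayesian update: treat each low-severity event observed for a host as evidence that the host is compromised.

```latex
% Illustrative only: C_h = "host h is compromised", e_1..e_n = the low-severity
% events observed for h. Assumes the events are conditionally independent given
% C_h, which real traffic does not really satisfy.
P(C_h \mid e_{1:n}) \;\propto\; P(C_h) \prod_{i=1}^{n} P(e_i \mid C_h)
```

The hidden-Markov framing would drop that independence assumption and model the host's state evolving over time, which is roughly the depth the committee was asking for.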
A baseline against Zeek. Suricata is a great IDS, but Zeek's logging-first model is conceptually different and would probably have been a fairer comparison point than a vanilla-Suricata baseline. I didn't have time. Future work.
Earlier supervisor review on the empirical chapter. I waited too long to show drafts. When I finally sent the empirical chapter four weeks before defence, my supervisor flagged a methodology issue (I was conflating “rule did fire” with “rule fired correctly” without clean ground-truth labels) that took two weeks to fix. Lesson: supervisor review every two weeks, not every two months.
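The shape of that fix, for anyone hitting the same wall: stop counting "an alert fired" as success and score fired rules against explicit per-capture ground truth. A minimal sketch, where the runs/ layout, the label format and the SID-level granularity are all assumptions rather than the thesis's actual labelling:

```python
#!/usr/bin/env python3
"""Sketch of scoring alerts against explicit ground truth instead of treating
'rule fired' as success. The runs/ layout, the labels structure and the
SID-level granularity are assumptions, not the thesis's actual labelling."""
import json
from pathlib import Path

def load_alert_sids(eve_path: Path) -> set:
    """Collect the signature IDs that actually fired in one Suricata eve.json."""
    sids = set()
    with eve_path.open() as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("event_type") == "alert":
                sids.add(event["alert"]["signature_id"])
    return sids

def score(run_dir: Path, labels: dict) -> None:
    """labels maps capture name -> set of SIDs that *should* fire for it."""
    for capture, expected in labels.items():
        fired = load_alert_sids(run_dir / capture / "eve.json")
        tp = fired & expected   # fired and should have
        fp = fired - expected   # fired but should not have
        fn = expected - fired   # should have fired but did not
        print(f"{capture}: TP={len(tp)} FP={len(fp)} FN={len(fn)}")

if __name__ == "__main__":
    # Illustrative capture name and SIDs only.
    score(Path("runs"), {"2025-01-example-capture": {2100001, 2100002}})
```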
Lessons that stuck
A few things that I think will follow me into actual SOC work:
- Detection engineering is a tuning loop, not a writing exercise. Writing a rule takes 30 minutes; tuning it to acceptable false-positive levels in a target environment takes weeks. A SOC that ships rules without budgeting for the tuning is going to drown in alerts. (Brights doesn't have a SOC yet; I'm walking in to build one, and I keep this as a principle.)
- Cumulative-scoring approaches outperform single-rule detection, but only with constant tuning. This is the conclusion of my empirical chapter and I believe it: the meta-rule “five low-severity events in a 60-second window from the same source” caught attacker patterns that any individual rule missed. But it requires continuous environment-specific tuning: turn that knob too tight and you false-positive on benign automation; too loose and you miss real attacks. The tuning loop is the work. (A rough sketch of the windowing idea follows after this list.)
- Replay-pcap-driven empirical work is wildly underutilised in academic cyber programmes. Most thesis-level work I've seen at KPI is theoretical or simulation-based. There's huge room for pure replay-pcap experimental work, and the public corpus (MTA, the various .pcap libraries on GitHub) is generous. If you're a future ФТІ student looking for a thesis topic, this is a mineable seam.
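Since the meta-rule in the second point is the bit people ask about, here is the windowing idea in miniature. A hedged sketch, not the thesis ruleset: the field names follow Suricata's eve.json, the 5-events/60-second thresholds are the ones quoted above, and the "low severity" mapping is an assumption; the real rules live in the repo.

```python
#!/usr/bin/env python3
"""Sketch of the cumulative-scoring meta-rule: flag a source IP once it produces
THRESHOLD low-severity alerts inside a sliding WINDOW_SECONDS window. Thresholds
are the ones quoted above; the severity mapping and file paths are assumptions,
not the thesis configuration."""
import json
from collections import defaultdict, deque
from datetime import datetime
from pathlib import Path

WINDOW_SECONDS = 60
THRESHOLD = 5
LOW_SEVERITY = {3, 4}  # assumption: Suricata severities 3-4 treated as "low"

def parse_ts(raw: str) -> datetime:
    # Suricata eve.json timestamps look like "2025-06-06T12:00:00.123456+0300"
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

def iter_alerts(eve_path: Path):
    """Yield (timestamp, src_ip, severity) for every alert event in eve.json."""
    with eve_path.open() as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("event_type") != "alert":
                continue
            yield parse_ts(event["timestamp"]), event.get("src_ip", "?"), event["alert"]["severity"]

def flag_sources(eve_path: Path) -> set:
    """Sliding-window count of low-severity alerts per source IP."""
    windows = defaultdict(deque)
    flagged = set()
    for ts, src, severity in sorted(iter_alerts(eve_path)):
        if severity not in LOW_SEVERITY:
            continue
        window = windows[src]
        window.append(ts)
        # Drop events that have fallen out of the 60-second window.
        while (ts - window[0]).total_seconds() > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= THRESHOLD:
            flagged.add(src)
    return flagged

if __name__ == "__main__":
    for src in sorted(flag_sources(Path("runs/example/eve.json"))):
        print(f"suspicious source: {src}")
```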
What’s next
Returning to Brights as a full-time Junior Cybersecurity Specialist in September. They've been an absurdly generous host to an academic-research-flavoured intern (me, summer 2024) and seem genuinely interested in growing the cybersec function around the ISO 27001 / 42001 work. I'll get to actually deploy some of the detection rules from the thesis against a real (small) production environment, which is exactly the kind of practical follow-on the thesis lacked.
The summer between defence and start date is for: catching up on sleep, reading the books I procrastinated on for six months (Secure by Design, Building Secure & Reliable Systems, the Anton Chuvakin newsletter archive), and a small Carpathian hiking trip.
Glory to Ukraine. 🇺🇦