<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://hashu-ch.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://hashu-ch.github.io/" rel="alternate" type="text/html" /><updated>2026-05-12T05:45:01+00:00</updated><id>https://hashu-ch.github.io/feed.xml</id><title type="html">Caleb Hsu</title><subtitle>Caleb Hsu&apos;s academic portfolio</subtitle><author><name>Caleb Hsu</name><email>chsu14@cs.washington.edu</email></author><entry><title type="html">Bayesian Inference</title><link href="https://hashu-ch.github.io/posts/bayesian-inference" rel="alternate" type="text/html" title="Bayesian Inference" /><published>2026-05-11T00:00:00+00:00</published><updated>2026-05-11T00:00:00+00:00</updated><id>https://hashu-ch.github.io/posts/bayesian-inference</id><content type="html" xml:base="https://hashu-ch.github.io/posts/bayesian-inference"><![CDATA[<h2 id="setup-and-review">Setup and Review</h2>
<p>This post will serve as a kind of contextual buildup for the next few posts that I plan to write on MCMC and Variational Inference. Without a good understanding about why Bayesian Inference is hard to do in practice, everything else is just a bunch of larp.</p>

<p>Recall Bayes’ rule:</p>

\[p(z \mid x) = \frac{p(x \mid z)p(z)}{p(x)}\]

<p>Intuitively, Bayes’ rule outlines the mechanics for updating our belief about some latent variable $z$. We call $p(z \mid x)$ our <em>posterior</em>: literally our belief about $z$ after observing some sequence of data $x$. We compute the posterior by taking the <em>likelihood</em> $p(x \mid z)$ of observing $x$ under a particular value of $z$, weighting it by our initial belief or <em>prior</em> $p(z)$, and dividing by the denominator $p(x)$, often called the <em>evidence</em>. You may think of the evidence as a measure of how likely our observation was overall, across every possible $z$. Without getting too lost in the sauce, I’d like to move to a concrete example to ground these concepts before moving on.</p>

<h2 id="example">Example</h2>
<p>Instead of the canonical coin flip example, I’d like to try framing it in a robotics context, specifically a simplified explanation of localization.</p>

<p>The overall question in localization is: given a known map of the world and some sensor readings about my surroundings, how do I know where I am? We can formulate this as a Bayesian inference problem where $x$ is a set of sensor readings and $z$ is my position in the room/world; my belief about where I am is then a distribution over $z$.</p>

<p>At the start, our robot could be anywhere in our room, so our prior might be a uniform distribution over the infinitely many points on our map, $z \sim \mathrm{Unif}$. After receiving some sensor readings $x_i$, we should then attempt to update our belief.</p>

<p>Recall the parts we need. We have our prior $p(z)$ and need both our <em>likelihood</em> and <em>evidence</em>. Let’s take a look at the likelihood, $p(x \mid z)$, first. The likelihood is a measure of “how likely are these sensor readings if I really were at position $z$?” Notice, then, that the numerator of Bayes’ rule can be seen as the likelihood of each candidate position weighted by our prior belief in that position. <em>I gloss over the specifics of calculating likelihoods since they aren’t relevant to the general conversation about Bayes. See <a href="https://docs.ufpr.br/~danielsantos/ProbabilisticRobotics.pdf">Chapter 6</a> for more details.</em></p>
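
<p>To make the update concrete, here is a minimal sketch on a toy problem that is not from the robot above: a hypothetical 1D hallway of length 10 with a wall at the far end, a single range reading, and a Gaussian noise model with a made-up standard deviation. The function names and all the numbers are assumptions for illustration, and the continuous map is replaced by a coarse grid of candidate positions.</p>

```python
import numpy as np

def gaussian_likelihood(reading, expected, sigma):
    """Gaussian density of a reading around the value a candidate position predicts."""
    return np.exp(-0.5 * ((reading - expected) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def grid_bayes_update(prior, likelihood):
    """One Bayes update on a discrete grid: posterior = likelihood * prior / evidence."""
    unnormalized = likelihood * prior   # numerator of Bayes' rule, one value per candidate z
    evidence = unnormalized.sum()       # discrete stand-in for p(x)
    return unnormalized / evidence, evidence

# Hypothetical hallway: candidate positions z on [0, 10], wall at 10.
positions = np.linspace(0.0, 10.0, 101)
prior = np.full(positions.shape, 1.0 / positions.size)   # uniform belief over the grid

reading = 3.0                    # assumed measured distance to the wall
expected = 10.0 - positions      # distance each candidate position would predict
likelihood = gaussian_likelihood(reading, expected, sigma=0.5)

posterior, evidence = grid_bayes_update(prior, likelihood)
best = positions[np.argmax(posterior)]
print(f"most likely position: {best:.1f}")
```

<p>On a grid this small the normalizing sum is trivial, and the belief concentrates around the position whose predicted distance matches the reading.</p>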

<p>Here comes the hard part, the <em>evidence</em>. Computing $p(x)$ is deceptively hard. With our example, let’s say each $x_i$ represents some distance to the walls that are in front of our robot. Computing $p(x)$ is then asking “how likely is $x_i$, given that under my uniform prior I could be ANYWHERE in the map?” Since position and angle are continuous values, there are infinitely many such poses, forcing an integral:</p>

\[p(x) = \int p(x \mid z) p(z) dz\]
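
<p>In one dimension this integral is still easy to approximate numerically. The sketch below assumes a hypothetical 1D hallway of length 10 with a wall at the far end, a uniform prior density over the hallway, and a Gaussian range-sensor model; <code>evidence_1d</code> and its parameters are illustrative names, not part of any library.</p>

```python
import numpy as np

def evidence_1d(reading, sigma=0.5, hallway_length=10.0, n_cells=10_000):
    """Midpoint-rule approximation of p(x) = integral of p(x | z) p(z) dz over the hallway."""
    dz = hallway_length / n_cells
    z = (np.arange(n_cells) + 0.5) * dz       # midpoint of each cell
    prior_density = 1.0 / hallway_length       # uniform prior over positions (assumed)
    expected = hallway_length - z              # distance to the wall predicted at each z
    likelihood = np.exp(-0.5 * ((reading - expected) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return float(np.sum(likelihood * prior_density) * dz)

p_x = evidence_1d(reading=3.0)
print(f"p(x) is approximately {p_x:.4f}")
```

<p>Ten thousand likelihood evaluations are plenty here, and the result comes out close to 0.1 because nearly all of the Gaussian’s mass falls inside the hallway. The trouble only starts once $z$ has more than a handful of dimensions.</p>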

<p>Yet even this is a simplification. Even in 2D, a pose is at least an x coordinate, a y coordinate, and a heading angle. So in reality we’d have something closer to:</p>

\[p(\vec{x}) = \int\int\int p(\vec{x} \mid \vec{z}) \, p(\vec{z}) \, dz_0 \, dz_1 \, dz_2\]

<p>In modern robots, and especially with deep learning, our latent vectors and observations have far more dimensions, resulting in something largely intractable like:</p>

\[p(\vec{x}) = \int \dots \int p(\vec{x} \mid \vec{z}) \, p(\vec{z}) \, dz_0 \dots dz_{t-1}\]
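
<p>One rough way to see why this becomes intractable: if we approximated the integral by brute force on a grid, the number of likelihood evaluations would grow exponentially with the number of latent dimensions. The resolution of 100 cells per axis and the dimension counts below are arbitrary choices for illustration.</p>

```python
def grid_evaluations(cells_per_axis, dims):
    """Number of grid points needed to cover a dims-dimensional latent space."""
    return cells_per_axis ** dims

# Hypothetical latent sizes: a 1D hallway, a 2D pose (x, y, heading),
# a 3D pose with orientation, and a small learned latent vector.
for dims in (1, 3, 6, 32):
    count = grid_evaluations(100, dims)
    print(f"{dims:2d} dims: {count:.3e} likelihood evaluations")
```

<p>Smarter quadrature rules help in low dimensions, but the exponential growth itself does not go away, which is why the evidence is rarely computed directly.</p>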

<h2 id="so-now-what">So Now What?</h2>
<p>In practice, we almost never compute the evidence directly. Instead, we lean on approximations. In robotics, common choices are sampling-based methods like <strong>particle filters</strong>, which represent the posterior as a dynamic set of weighted samples, and parametric methods like <strong>Kalman filters</strong> (UKF, EKF), which approximate the posterior as a Gaussian and propagate just its mean and covariance. Both sidestep the integral entirely: particle filters use sums over discrete samples, and Kalman filters use closed-form Gaussian updates.</p>
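
<p>As a rough sketch of the particle filter idea, here is a single measurement update on a hypothetical 1D hallway of length 10 with a wall at the far end and a Gaussian range sensor. The function name, noise level, and particle count are all assumptions, and a real filter would also include a motion update between measurements.</p>

```python
import numpy as np

def measurement_update(particles, reading, rng, sigma=0.5, hallway_length=10.0):
    """Weight each particle by its likelihood, normalize, and resample."""
    expected = hallway_length - particles               # predicted distance to the wall
    # The Gaussian's constant factor is dropped because it cancels on normalization.
    weights = np.exp(-0.5 * ((reading - expected) / sigma) ** 2)
    weights /= weights.sum()                            # a finite sum replaces the evidence integral
    idx = rng.choice(particles.size, size=particles.size, p=weights)
    return particles[idx]                               # equally weighted samples from the posterior

rng = np.random.default_rng(0)
particles = rng.uniform(0.0, 10.0, size=5_000)          # samples from a uniform prior
particles = measurement_update(particles, reading=3.0, rng=rng)
estimate = particles.mean()
print(f"estimated position: {estimate:.2f}")
```

<p>The normalization only ever touches the samples we actually drew, so its cost scales with the number of particles instead of the size of the latent space.</p>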

<p>In the next few posts, I hope to continue to tackle this intractability. Specifically, I’ll look at other sampling-based methods such as Markov chain Monte Carlo (MCMC), and at variational methods that approximate the true posterior with an optimized, tractable distribution.</p>

<p>As a TLDR, Bayesian inference is how we update beliefs with data. The evidence is hard to compute, so we resort to approximations. That ‘hardness’ is the foundation for the next few posts.</p>

<h2 id="questions">Questions</h2>
<p>I keep a list of questions that I had when learning the topic, which may be useful for others to think about:</p>

<p>How can we represent multimodal beliefs? Is it ever ideal to spread our belief?</p>

<p>For sampling, what is a sufficient number of samples to decently approximate our posterior? See <a href="https://proceedings.neurips.cc/paper_files/paper/2001/file/c5b2cebf15b205503560c4e8e6d1ea78-Paper.pdf">KLD-Sampling</a>.</p>

<p>We are sacrificing accuracy for tractability. How ‘good’ do our approximations need to be?</p>

<p>For localization, how can the topology of our environment limit our confidence in our beliefs (hallway vs asymmetric room)?</p>

<p>What happens if our prior has 0 probability mass assigned to the true values? Does our Bayes update allow us to recover on its own?</p>

<p>If you know about Markov properties, how can we incorporate our robot’s actions into our belief updates? What else do we need to know (motion???)</p>]]></content><author><name>Caleb Hsu</name><email>chsu14@cs.washington.edu</email></author><summary type="html"><![CDATA[Setup and Review This post will serve as a kind of contextual buildup for the next few posts that I plan to write on MCMC and Variational Inference. Without a good understanding about why Bayesian Inference is hard to do in practice, everything else is just a bunch of larp.]]></summary></entry><entry><title type="html">Hello (World)</title><link href="https://hashu-ch.github.io/posts/Hello/World" rel="alternate" type="text/html" title="Hello (World)" /><published>2026-05-10T00:00:00+00:00</published><updated>2026-05-10T00:00:00+00:00</updated><id>https://hashu-ch.github.io/posts/Hello/first-post</id><content type="html" xml:base="https://hashu-ch.github.io/posts/Hello/World"><![CDATA[<h2 id="this-page">This Page</h2>
<p>After some time as a TA, I’ve found teaching and explaining to be invaluable to my own learning process. 
Truthfully, these notes are more to complete my own understanding of many robotics and statistics concepts – as much of it I am self learning. Who knows though, maybe these will be useful to someone! That said, here’s some more about me.</p>

<h2 id="academics">Academics</h2>
<p>At the end of my freshman third quarter, I applied to the Computer Science Major mostly on a whim. I was coming from an Informatics major and my understanding of CS was limited to the few Intro to Java courses I had taken as prerequisites. To be frank, I think I applied 30% due to wanting a more technical math-y perspective and 70% because the people I enjoyed working with were either applying or already in the program.</p>

<p>As a junior now, I feel that I’ve developed a love-hate relationship with CS. I get the most intellectual satisfaction from doing hard problems. Consequently, I always seek out the next ‘harder’ thing, which has put me in a very fun and interesting but perpetual cycle of struggle.</p>

<p>I started as a TA after the intro to programming sequence. I found my way of comprehension to be a bit strange and thought it could be of use to other students. In my sophomore year, I grew really fond of algorithms, theory, and statistics. That has led to my current interests in probabilistic robotics and learning.</p>

<h2 id="interests">Interests</h2>
<p>When I’m not tackling my infinite backlog of papers to read, I’m really trying to live life outdoors with friends and family. I’ve played soccer (football) all my life and recently have been getting into the PNW sports: climbing, long distance running, hiking, etc. I had a two-ish year stint of volleyball, but I’m afraid that is most definitely not what I was put on Earth to do. Sometimes I go to the gym.</p>

<p>When I’m too tired, I’ve been trying to learn to play guitar and not sound like I’m dying while singing (shoutout my GOATS Dijon, Mk.gee, and Yorushika). My New Year’s resolution is to explore more live music: watching local bands, visiting jazz clubs, going to rager concerts, really anything.</p>]]></content><author><name>Caleb Hsu</name><email>chsu14@cs.washington.edu</email></author><summary type="html"><![CDATA[This Page After some time as a TA, I’ve found teaching and explaining to be invaluable to my own learning process. Truthfully, these notes are more to complete my own understanding of many robotics and statistics concepts – as much of it I am self learning. Who knows though, maybe these will be useful to someone! That said, here’s some more about me.]]></summary></entry></feed>