Nonparametric Review - Introduction


juhongyee 2026. 3. 5. 00:51

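Introduction

This is a review post for the first lecture of Nonparametric Function Estimation.

Honestly, there is not much to review, because I skipped the class.

Thank you.

I will quickly note only what looks important.

This post was written with reference to the Nonparametric Function Estimation lecture notes by Professor 박병욱 of Seoul National University.

Main Content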

1. Nonparametric regression models

With $Y_i \in \mathbb{R}$ and $X_i \in [0,1]^d$, assume

$$Y_i = f(X_i) + \epsilon_i, \quad 1 \le i \le n$$

- $f$ : unspecified, smooth

- $\epsilon_i$ : unobservable and $\mathbb{E}(\epsilon_i | X_i) =0$.

Objective : estimate $f$ using the observations $(Y_i,X_i), \quad 1 \le i \le n$

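The goal is to find an estimate that approximates $f$ well.

The restriction is that $f$ is smooth, i.e., differentiable enough times. (Infinite differentiability is the strongest reading; in practice only a few continuous derivatives are usually assumed.)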

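Ill-posed problem : the parameter dimension is larger than the sample size. Here the parameter is the whole function $f$, which is infinite-dimensional, while we only have $n$ observations.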

Kernel smoothing : solve locally finite-dimensional problems

2. Splines

Spline function : Let $\tau_1 < \tau_2 < \cdots < \tau_K$ be preselected points in the range of $x$, called knots.

We call $f$ a spline function of order $\ell$ if $f$ is a polynomial of order $\ell$ in each interval $[\tau_j, \tau_{j+1}]$ and has $(\ell -1)$ continuous derivatives at the knots.

In other words, a spline function is a polynomial of order $\ell$ on each piece of the partition and is $(\ell-1)$ times continuously differentiable overall.

How many parameters do we need to describe the class of spline functions?

$(K+1)(\ell +1) - K \cdot \ell = K + \ell +1$

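Each of the $(K+1)$ intervals has one coefficient for each degree $0, \dots, \ell$, giving $(K+1)(\ell+1)$ coefficients.

At each knot, the derivatives of orders $0$ through $\ell-1$ must agree, e.g. $f_j(\tau_j) = f_{j+1}(\tau_j)$, $f_j'(\tau_j) = f_{j+1}'(\tau_j)$, and so on. This gives $\ell$ constraints per knot and $K \cdot \ell$ in total.

Each constraint at the knots $\tau_j, \tau_{j+1}, \dots$ is a linear equation in the coefficients, so it seems we can simply count the constraints and subtract (this relies on the constraints being linearly independent).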
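The counting argument can be checked numerically. Below is a minimal sketch, assuming each piece is written in the monomial basis $1, x, \dots, x^\ell$; the function name `spline_space_dimension` and the knot values are made up for illustration. It builds the matrix of matching constraints and computes the dimension as the number of coefficients minus the rank of that matrix.

```python
import numpy as np
from math import factorial

def spline_space_dimension(knots, ell):
    """Dimension of the spline space: (#coefficients) - rank(constraint matrix).

    Assumes ell >= 1 and distinct knots (illustrative helper, not from the lecture notes).
    """
    K = len(knots)
    n_coef = (K + 1) * (ell + 1)      # ell + 1 coefficients on each of the K + 1 pieces
    rows = []
    for j, tau in enumerate(knots):   # this knot joins piece j and piece j + 1
        for m in range(ell):          # derivative orders 0, ..., ell - 1 must match
            row = np.zeros(n_coef)
            for p in range(m, ell + 1):
                # m-th derivative of x**p evaluated at tau
                d = factorial(p) / factorial(p - m) * tau ** (p - m)
                row[j * (ell + 1) + p] = d
                row[(j + 1) * (ell + 1) + p] = -d
            rows.append(row)
    A = np.array(rows)
    return n_coef - np.linalg.matrix_rank(A)

knots = [0.2, 0.5, 0.8]
for ell in (1, 2, 3):
    # second and third printed values should agree if the count K + ell + 1 is right
    print(ell, spline_space_dimension(knots, ell), len(knots) + ell + 1)
```

If the constraints were linearly dependent, the rank would be smaller than $K \cdot \ell$ and the two numbers would differ.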

Representation of spline functions:

$$\displaystyle f(x) = \beta_0 + \beta_1x + \cdots + \beta_{\ell}x^\ell + \sum_{j=1}^K \beta_{\ell + j}(x- \tau_j)^\ell_+$$

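Order of operations in $(x-\tau_j)^\ell_+$:

1) the max operation, $(x-\tau_j)_+ = \max(x-\tau_j, 0)$

2) the power $\ell$

We shift by the knot and apply a ReLU, so the order-$\ell$ term starts being added from the moment $x$ passes $\tau_j$.

It is easy to picture: one more order-$\ell$ term gets switched on at each knot.

Terms of order lower than $\ell$ must not be added, because the condition is to have continuous derivatives up to order $\ell-1$.

Suppose we added a lower-order term $(x-\tau_j)^k_+$ with $k < \ell$. Differentiating it $k$ times gives the constant $k!$ to the right of the knot and $0$ to the left, so the $k$-th derivative jumps at the knot and continuity breaks.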
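The truncated power representation of a spline can be sketched in code as follows. This is only an illustration: the helper name `truncated_power_basis`, the knot positions, and the evaluation grid are assumptions, and the sketch requires $\ell \ge 1$.

```python
import numpy as np

def truncated_power_basis(x, knots, ell):
    """Design matrix with columns 1, x, ..., x**ell, then (x - tau_j)_+**ell per knot.

    Illustrative helper; assumes ell >= 1.
    """
    x = np.asarray(x, dtype=float)
    poly = [x ** p for p in range(ell + 1)]
    # Order of operations: positive part (max with 0) first, then the power.
    trunc = [np.maximum(x - tau, 0.0) ** ell for tau in knots]
    return np.column_stack(poly + trunc)

knots = [0.25, 0.5, 0.75]   # hypothetical knots
ell = 3
x = np.linspace(0.0, 1.0, 5)
B = truncated_power_basis(x, knots, ell)
print(B.shape)   # number of columns is K + ell + 1
print(B[:, 4])   # column for the first knot: zero up to tau_1, then (x - tau_1)**3
```

Any spline of order $\ell$ with these knots is then `B @ beta` for some coefficient vector `beta` of length $K + \ell + 1$, matching the parameter count above.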

3. Local approximation by kernel function 1

One may approximate a function $f$ at a point $x$ by the local average of $f$ over the interval $[x-h,x+h]$ for a small $h>0$ called the bandwidth.

$$\begin{aligned} f^{\text{app}}(x) &:= \underset{\alpha}{\operatorname{argmin}} \int_{x-h}^{x+h} (f(u) - \alpha)^2 \, du \\ &= \int (2h)^{-1} I_{[-1,1]} \left( \frac{u-x}{h} \right) f(u) \, du \end{aligned}$$

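First fix a point $x$ and consider a window of radius $h$ around it. Within this local window, we look for the single value $\alpha$ that best explains the values $f(u)$, in other words, the $\alpha$ that minimizes the squared loss.

The second line is just a tidied-up form: write the integration range with the indicator $I_{[-1,1]}(\frac{u-x}{h})$, plug it in, differentiate with respect to $\alpha$, set the derivative to zero, and solve for $\alpha$.

Along the way we need to interchange differentiation with respect to $\alpha$ and integration once. Assuming $f$ is continuous, it is uniformly continuous (and bounded) on the compact set $[x-h, x+h]$, so the interchange is justified.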
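Written out, the first-order condition looks as follows (a sketch, assuming $f$ is continuous on $[x-h, x+h]$):

$$\begin{aligned} \frac{d}{d\alpha} \int_{x-h}^{x+h} (f(u) - \alpha)^2 \, du &= -2 \int_{x-h}^{x+h} (f(u) - \alpha) \, du = 0 \\ \Longrightarrow \quad \alpha &= \frac{1}{2h} \int_{x-h}^{x+h} f(u) \, du = \int (2h)^{-1} I_{[-1,1]} \left( \frac{u-x}{h} \right) f(u) \, du \end{aligned}$$

The second derivative with respect to $\alpha$ is $4h > 0$, so this critical point is indeed the minimizer.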

4. Local approximation by kernel function 2

More generally, one may approximate $f(x)$ by a weighted local average of $f$:

$$\begin{aligned} f^{\text{app}}(x) &:= \underset{\alpha}{\operatorname{argmin}} \int (f(u) - \alpha)^2 K\left( \frac{u-x}{h} \right) \, du \\ &= \left( \int K\left( \frac{u-x}{h} \right) \, du \right)^{-1} \int K\left( \frac{u-x}{h} \right) f(u) \, du \end{aligned}$$

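This approximator introduces a kernel function $K$ that gives large weights to nearby points and small weights to distant ones.

Replacing the integrals against $f(u)\,du$ by sums over the sample, $\hat f(x) = \sum_{i} K\left(\frac{X_i-x}{h}\right) Y_i \Big/ \sum_{i} K\left(\frac{X_i-x}{h}\right)$, gives what we call the Nadaraya-Watson estimator.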
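As an illustration, here is a minimal sketch of the sample version of this weighted local average, $\hat f(x) = \sum_i K(\frac{X_i-x}{h}) Y_i / \sum_i K(\frac{X_i-x}{h})$, for $d = 1$. The Gaussian kernel, the bandwidth `h=0.05`, the test function $\sin(2\pi x)$, and the noise level are all assumptions chosen for the example, not part of the lecture notes.

```python
import numpy as np

def nadaraya_watson(x0, X, Y, h, kernel=None):
    """Kernel-weighted local average of Y around each point in x0 (1-d covariate)."""
    if kernel is None:
        # Gaussian kernel (assumed choice); its normalizing constant cancels in the ratio.
        kernel = lambda t: np.exp(-0.5 * t ** 2)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    W = kernel((X[None, :] - x0[:, None]) / h)   # W[a, i] = K((X_i - x0_a) / h)
    return (W @ Y) / W.sum(axis=1)

# Simulated data from Y_i = f(X_i) + eps_i with a made-up smooth f
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, n)
f_true = lambda x: np.sin(2 * np.pi * x)
Y = f_true(X) + 0.1 * rng.standard_normal(n)

grid = np.linspace(0.1, 0.9, 9)
f_hat = nadaraya_watson(grid, X, Y, h=0.05)
print(np.round(f_hat - f_true(grid), 3))   # estimation error on the grid
```

A smaller `h` averages over fewer neighbors (less bias, more variance), while a larger `h` does the opposite.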

Conclusion

That was the review.

I tried my best to understand the material, but I still do not know it well.

We mainly covered splines and kernel functions.

The end!
