
Commit 5eb0404

Quartz sync: Mar 14, 2026, 5:06 PM
1 parent c347768 commit 5eb0404

File tree

5 files changed (+116, −24 lines changed)
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# Non-linear Regression

content/Mathematical Foundations of Generative AI/VDM and GANs.md

Lines changed: 48 additions & 11 deletions
@@ -100,6 +100,11 @@ $$
KL Divergence is asymmetric, meaning $\underbrace{D(P_X\,||\,P_\theta)}_\text{Forward KL} \ne \underbrace{D(P_\theta\,||\,P_X)}_\text{Reverse KL}$.

2. $f(u) = \frac{1}{2}\left(u\,log\,u-(u+1)\,log\left(\frac{u+1}{2}\right)\right)$ leads to the **JS (Jensen-Shannon) Divergence**.

$$
JS(P_X || P_\theta) = \frac{1}{2} KL(P_X || M) + \frac{1}{2} KL(P_\theta || M) \qquad \text{where,}\,\,M = \frac{P_X + P_\theta}{2}
$$

3. $f(u)=\frac{1}{2}|u-1|$ leads to the **Total Variation Distance** or TV Distance.
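The JS divergence above is easy to sanity-check numerically. A minimal sketch for two discrete distributions (numpy only; `p` and `q` are toy examples, not from the notes):

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence D(p || q); assumes matching supports."""
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """JS(p || q) = 0.5*KL(p || m) + 0.5*KL(q || m), m = (p + q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
```

Unlike KL, `js(p, q) == js(q, p)`, and its value always lies in $[0, \log 2]$.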

## Algorithm for f-divergence minimization
@@ -332,7 +337,7 @@ $$
&\approx \arg\min_{\theta} \Bigg[\cancel{\frac{1}{B_1}\sum_{i=1}^{B_1}log\,D_w(x_i)}^{\text{Independent of }\theta} + \frac{1}{B_2}\sum_{i=1}^{B_2}log\,(1-D_w(\hat{x}_i)) \Bigg] \\[8pt]
&= \arg\min_{\theta} \Bigg[\frac{1}{B_2}\sum_{i=1}^{B_2}log\,(1-D_w(\hat{x}_i)) \Bigg] \qquad\because \text{Second term stays as } \hat{x}_i=g_\theta(z_i) \in P_\theta \\[8pt]
\end{aligned}
$$
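The surviving minibatch term can be sketched as a Monte-Carlo estimate (numpy; the discriminator `d_w` and the fake minibatch are hypothetical stand-ins, since the real ones are neural networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def d_w(x):
    """Hypothetical discriminator: maps samples to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x_hat = rng.normal(size=32)                    # fake minibatch, x_hat_i = g_theta(z_i)
gen_loss = np.mean(np.log(1.0 - d_w(x_hat)))   # (1/B2) * sum_i log(1 - D_w(x_hat_i))
```

The generator update only needs gradients through `x_hat`; the dropped first term involves real data only.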

@@ -438,7 +443,7 @@ Given two distributions $P_X$ and $P_{\hat{X}}$,

$$
\begin{aligned}
W(P_X || P_{\hat{X}}) &= \min_{\lambda \in \Pi(X,\hat{X})} \Big[\underset{\lambda(x,\hat{x})}{\mathbb{E}}||X-\hat{X}||_2\Big] \\[8pt]
\lambda &: \text{Joint distribution b/w }P_X,P_{\hat{X}} \\[8pt]
\Pi(X,\hat{X}) &: \text{All Joint distributions such that -} \\[8pt]
&\int_X \Pi(X,\hat{X})\,dx = P_{\hat{X}} \\[8pt]
@@ -500,21 +505,21 @@ W(P_x || P_\theta) &= \max_{||T_w(x)||_L \lt 1} \Big[\underset{P_X}{\mathbb{E}} 
\end{aligned}
$$

Any function $f$ being 1-Lipschitz means that the function cannot change faster than the distance (the norm of its derivative is always less than or equal to 1) -

$$
\frac{||f(x_1)-f(x_2)||}{||x_1-x_2||} \le 1
$$

The $T_w$ in this case is a neural network and can be made 1-Lipschitz by normalizing the weights of $T_w$ such that $||w||_2=1$ after each gradient step.
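A small numerical illustration of this constraint (numpy; the two-layer critic is a hypothetical stand-in). Here each weight matrix is normalized to unit spectral norm, which, combined with the 1-Lipschitz ReLU, bounds the whole network's Lipschitz constant by 1; this is a slightly stronger recipe than normalizing each weight vector to $||w||_2=1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical critic: x -> relu(x @ W1) @ W2, weights normalized after a "step"
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))
W1 /= np.linalg.norm(W1, 2)   # ord=2 on a matrix = largest singular value
W2 /= np.linalg.norm(W2, 2)

def critic(x):
    return np.maximum(x @ W1, 0.0) @ W2   # ReLU is itself 1-Lipschitz

# Empirical check of the 1-Lipschitz property on random input pairs
x1 = rng.normal(size=(100, 4))
x2 = rng.normal(size=(100, 4))
ratios = (np.linalg.norm(critic(x1) - critic(x2), axis=1)
          / np.linalg.norm(x1 - x2, axis=1))
```

Every ratio stays at or below 1, matching the inequality above.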

$\theta^*$ has to be such that the Wasserstein's distance is minimized. The Kantorovich-Rubinstein duality enables us to express the Wasserstein's distance in terms of expectations over $P_X$ and $P_\theta$.

$$
\theta^*, w^*= \arg\min_\theta\max_{||T_w(x)||_L \lt 1} \Big[\underset{P_X}{\mathbb{E}}\, [T_w(x)] - \underset{P_\theta}{\mathbb{E}} \, [T_w(\hat{x})]\Big]
$$

The above objective is very similar to GANs. That's why this method of minimizing the Wasserstein's metric is called the **WGAN**. Training a WGAN is more stable than training a Naive-GAN, as the gradients will not saturate if the supports of the probability distributions misalign.
## Bi-Directional GAN (Bi-GAN)
### Inversion of GANs
We train a GAN specifically to allow us to sample $x$ from the dataset distribution $P_X$ by picking a random sample $z$ from an arbitrary distribution $Z$ and passing it through the generator $g_\theta$. But how can we get back $z$ if we know $x$?
@@ -531,7 +536,7 @@ The objective function is -

$$
\begin{aligned}
L_{BiGAN}(\theta,w,\phi) &= \underset{x\sim P_X}{\mathbb{E}}\Big[\underset{\hat{z}\sim P_\phi}{\mathbb{E}}[log\,D_w(x,E_\phi(x))]\Big] + \underset{z\sim Z}{\mathbb{E}}\Big[\underset{\hat{x}\sim P_\theta}{\mathbb{E}}[log\,\{1 - D_w(g_\theta(z),z)\}]\Big] \\[8pt]
\theta^*,w^*,\phi^* &= \arg\min_{\theta,\phi}\max_{w} L_{BiGAN}(\theta,w,\phi) \\[8pt]
\text{where, } &z\sim Z, \,\,\,\,\hat{x} \sim P_\theta, \,\,\,\,\hat{z} \sim P_\phi
\end{aligned}
@@ -546,6 +551,14 @@ P_{\hat{Z}X} &= \int_X P_X(x) \int_{\hat{Z}} P_\phi(\hat{z}|x)\,d\hat{z}\,dx \\[
P_{Z\hat{X}} &= \int_Z P_Z(z) \int_{\hat{X}} P_\theta(\hat{x}|z)\,d\hat{x}\,dz \\[8pt]
\end{aligned}
$$

### Latent Regression
$$
\begin{aligned}
L(\theta, w, \phi) = \underset{x\sim P_X}{\mathbb{E}}\operatorname{log}D_w(x) + \underset{\hat{x}\sim P_\theta}{\mathbb{E}}\operatorname{log}(1-D_w(\hat{x})) + \lambda\underset{\hat{x}\sim P_\theta}{\mathbb{E}}||z-E_\phi(\hat x)||^2_2
\end{aligned}
$$
where $\lambda$ is a hyperparameter. Here the discriminator remains the same as in the naive GAN. It has been found that
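A toy Monte-Carlo estimate of this loss can be written directly (numpy; the linear generator, pseudo-inverse encoder, and logistic discriminator are all hypothetical stand-ins for the real networks):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.1                                     # the hyperparameter lambda

G = rng.normal(size=(2, 3))                   # g_theta: z -> x_hat (linear stand-in)
E = np.linalg.pinv(G)                         # E_phi: here an exact inverse of G
w = rng.normal(size=3)

def d_w(x):
    """Logistic discriminator on raw samples, as in the naive GAN."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

x = rng.normal(size=(64, 3))                  # "real" samples
z = rng.normal(size=(64, 2))                  # latents
x_hat = z @ G                                 # generated samples

# lambda * E[||z - E_phi(x_hat)||^2]: zero here because E inverts G exactly
recon = np.mean(np.sum((z - x_hat @ E) ** 2, axis=1))
loss = (np.mean(np.log(d_w(x)))
        + np.mean(np.log(1.0 - d_w(x_hat)))
        + lam * recon)
```

With a perfect encoder the regression penalty vanishes, leaving only the usual GAN terms.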
## Domain Adversarial Networks
Suppose we have a source dataset and a target dataset, each drawn from a different distribution.

@@ -556,23 +569,47 @@ D_t &= \{(\hat{x}_j)\}_{j=1}^m &\sim P_t\\[8pt]
\end{alignedat}
$$

When the probability distributions of your training and testing datasets differ, we call this **domain shift**.

Any classifier/regressor trained solely on $D_s$ would fail to predict for the target items in $D_t$.

---
<h4 class="special">Example</h4>
Imagine you're training a model to identify images of dogs. The training dataset for your model has sketches, paintings, and cartoon representations of dogs, while your test dataset has actual photos of dogs. In such a case the distributions of the training and testing datasets differ. The approach to solving domain shift is called **Unsupervised Domain Adaptation**.

In the above example -
- The broader class of "animal representations" is a **semantic class.**
- The sketches, paintings, cartoons, and photos are called **domains**.

The network can be trained either on all four domains, or one of the domains can be left out. In the case where a domain is left out, our hope is that all domains share the same underlying semantic structure, just with different marginal distributions. This is called the **shared support assumption**. Under this assumption, an optimal encoder trained on the remaining three domains should be able to extract meaningful features from the unseen domain. This setting is called **domain generalization**.

---

So our objective with such a setup is for our model to learn the features/classifier in such a manner that it performs well on both $P_s$ and $P_t$.

We can use **Domain Adversarial Networks** here to train a classifier that is **domain agnostic** (able to classify independently of which domain an element belongs to).

In Domain Adversarial Networks we have -
1. An Encoder $\phi:X \rightarrow F$ to extract features from inputs regardless of which domain the inputs belong to (both $D_s$ and $D_t$).
2. A Discriminator $T_w:F \rightarrow [0,1]$ to distinguish between elements of $P_s$ and elements of $P_t$ (features of both source and target data).
3. A Classifier/Regressor $h_\psi: F_s \rightarrow y_s \sim P_s(y|x)$ which uses the features of the source inputs to make a prediction regarding their target. This works as a metric for the usefulness of the features.

A DANN cannot generate samples; its only job is to align the features of the different distributions.

The Discriminator makes the Encoder better at constructing features from the inputs (both source and target) in such a way that the features appear domain agnostic. But just having domain agnostic features isn't all; they need to be useful for predicting the target class. For this we include a Classifier/Regressor as well in the network so that the features learnt are both domain agnostic and useful.

$$
\begin{aligned}
\phi^*, w^* &= \arg\min_\phi\max_w \Bigg[\underset{P_{F_s}}{\mathbb{E}}\, [log\,D_w(F_s)] + \underset{P_{F_t}}{\mathbb{E}} \, [log\,(1-D_w(F_t))]\Bigg]& \\[8pt]
\psi^* &= \arg\min_\psi \operatorname{BCE}(y,h_\psi(F_s)) \qquad(\text{BCE=Binary Cross-Entropy})
\end{aligned}
$$

The encoder network has gradients flowing from both the discriminator and the classifier.
- $\phi$ would ensure that $P_{F_s} = P_{F_t}$.
- $h_\psi$ would ensure that the features are meaningful.
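The adversarial feature-alignment term can be sketched numerically (numpy; the 2-D data, linear encoder, and logistic discriminator are hypothetical stand-ins). In practice the discriminator ascends this quantity while the encoder descends it, usually implemented via a gradient-reversal layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source/target inputs exhibiting a domain shift
Xs = rng.normal(loc=0.0, size=(64, 2))   # source samples ~ P_s
Xt = rng.normal(loc=2.0, size=(64, 2))   # target samples ~ P_t

phi = rng.normal(size=(2, 3))            # encoder phi: X -> F (linear stand-in)
w = rng.normal(size=3)                   # discriminator D_w(f) = sigmoid(f . w)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def domain_loss(phi, w):
    """E_{P_Fs}[log D_w(F_s)] + E_{P_Ft}[log(1 - D_w(F_t))]."""
    Fs, Ft = Xs @ phi, Xt @ phi          # features of source and target data
    return (np.mean(np.log(sigmoid(Fs @ w)))
            + np.mean(np.log(1.0 - sigmoid(Ft @ w))))
```

Maximizing over `w` sharpens the domain discriminator; minimizing over `phi` pushes $P_{F_s}$ and $P_{F_t}$ together.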
## Evaluation of a GAN
Suppose we have some true and generated samples and we wish to evaluate whether the GAN is successful in generating samples from $P_X$. There are various methods for it, but we'll look at a widely used method of evaluation called the **Fréchet Inception Distance**. FID uses [[VDM and GANs#Wasserstein's Metric (Optimal Transport)|Wasserstein's Metric]] along with an **Inception Network trained on ImageNet** to do this evaluation.
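FID fits a Gaussian to the Inception features of each sample set and evaluates the closed-form Fréchet (2-Wasserstein) distance between the two Gaussians. A minimal sketch of that final formula (numpy + scipy; the Inception feature extraction itself is omitted):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):      # sqrtm can pick up tiny imaginary noise
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(sigma1 + sigma2 - 2.0 * covmean))

mu, cov = np.zeros(3), np.eye(3)      # toy feature statistics
```

Identical statistics give a distance of 0; shifting the mean by 1 in each of the 3 dimensions gives exactly 3.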

Let -

content/Software Testing.md

Lines changed: 22 additions & 5 deletions
@@ -35,6 +35,13 @@ Software testing is the process of examining the artifacts and behavior of a sof
- **Error -** An incorrect internal state during execution. This happens inside the memory.

A test case involves an input to the software and an output. If the actual output matches the expected output, we say that the test case passed.

Testing goals based on process maturity -
1. Level 0: There is no difference between testing and debugging.
2. Level 1: The purpose of testing is to show correctness.
3. Level 2: The purpose of testing is to show that the software doesn't work.
4. Level 3: The purpose of testing is not to prove anything specific, but to reduce the risk of using the software.
5. Level 4: Testing is a mental discipline that helps all IT professionals develop higher-quality software.
## Types of testing
1. **Unit Testing -** Testing of a singular component.
2. **Integration Testing** - Various components are put together and tested.
@@ -64,13 +71,11 @@ There are two broader methods of testing -
1. **Simple Path -** A path from one node to another is a simple path if no node appears more than once, except possibly the first and last node. No internal loops.
2. **Prime Path -** A simple path that is not a sub-path of any other simple path. Prime paths are thus the maximal simple paths. ^633cde
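These definitions can be sketched as code. A minimal enumeration over a hypothetical four-node control-flow graph (adjacency lists; not from the notes):

```python
graph = {1: [2, 3], 2: [4], 3: [4], 4: []}   # hypothetical CFG: 1 branches, rejoins at 4

def simple_paths(graph):
    """All simple paths: no repeated node, except possibly first == last."""
    paths = []
    def extend(path):
        paths.append(tuple(path))
        for nxt in graph[path[-1]]:
            if nxt == path[0]:                # closing a cycle is still simple
                paths.append(tuple(path) + (nxt,))
            elif nxt not in path:
                extend(path + [nxt])
    for node in graph:
        extend([node])
    return set(paths)

def is_subpath(p, q):
    """True iff p occurs as a contiguous subsequence of q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))

def prime_paths(graph):
    """Prime paths = simple paths that are maximal (not inside another)."""
    simple = simple_paths(graph)
    return {p for p in simple
            if not any(p != q and is_subpath(p, q) for q in simple)}
```

For this graph the prime paths are exactly the two branch-to-join paths `(1, 2, 4)` and `(1, 3, 4)`; every other simple path (e.g. `(2, 4)`) sits inside one of them.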
## Types of Tours
1. **Tours with side-trips** - A test path $p$ tours a sub-path $q$ with side-trips iff every edge in $q$ is also in $p$ in the same order. If a tour comes back to the same node it diverted from, we say the tour includes a side-trip.

![[Pasted image 20260225093645.png|450]]

2. **Tours with detours** - A test path $p$ tours a sub-path $q$ with detours iff every node in $q$ is also in $p$ in the same order. If a tour detours from some node $n$ and returns back to the prime path at a successor of $n$, we say the tour has a detour.

![[Pasted image 20260225093700.png|450]]
# Data flow Coverage
@@ -84,6 +89,18 @@ There are two broader methods of testing -

![[Pasted image 20260222112558.png]]
# Test Integration
## Scaffolding
When testing incomplete portions of software, we need extra software components, sometimes called scaffolding.
Two common types of scaffolding:
1. A **test stub** is a skeletal or special-purpose implementation of a software module, used to develop or test a component that calls the stub or otherwise depends on it.
2. A **test driver** is a software component or test tool that replaces the component that takes care of the control and/or the calling of a software component.
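A tiny illustration of both roles (the `checkout` component and its payment gateway are hypothetical, invented for this sketch):

```python
def checkout(cart_total, gateway_charge):
    """Component under test: charges the customer and returns a receipt."""
    if cart_total <= 0:
        raise ValueError("empty cart")
    ok = gateway_charge(cart_total)   # dependency we replace with a stub
    return {"charged": ok, "amount": cart_total}

def gateway_stub(amount):
    """Test stub: skeletal stand-in for the real payment gateway."""
    return True                       # always "succeeds"; no network calls

def driver():
    """Test driver: its only job is to call the component and check it."""
    receipt = checkout(42.0, gateway_stub)
    assert receipt == {"charged": True, "amount": 42.0}
    return "ok"
```

The stub replaces something `checkout` *calls*; the driver replaces whatever would normally *call* `checkout`.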
## Five approaches to integration testing
1. **Incremental -**
	1. Top-down - Create top-level modules first, using stubs in place of the lower-level ones.
	2. Bottom-up - Create bottom-level modules first, using test drivers to call them.
2. **Sandwich** - A mix of top-down and bottom-up.
3. **Big Bang** - All individually tested modules are put together to construct the entire system, which is tested as a whole.
## Coupling data flow
Coupling variables are variables that are defined in one unit and used in another.
There are different kinds of couplings based on the interfaces:
- **Parameter coupling:** Parameters are passed in calls.
@@ -96,7 +113,7 @@ Data flow coverage criteria can now be extended to coupling variables:
- **All-coupling-def coverage:** A path is to be executed from every last-def to at least one first-use.
- **All-coupling-use coverage:** A path is to be executed from every last-def to every first-use.
- **All-coupling-du-paths coverage:** Every simple path from every last-def to every first-use needs to be executed.
## Classical Coverage Criteria
Traditional terminologies -
- A **linearly independent path** of execution in the CFG of a program is a path that does not contain other paths within it. (very similar to prime paths)
- **Basic Block -** A series of nodes with no branching can be collapsed into one node called a basic block.
Lines changed: 19 additions & 8 deletions
@@ -1,12 +1,23 @@

| Operation     | RLs | CFLs | CSL | REL |
| :-----------: | :-: | :--: | :-: | :-: |
| Union         |  ✓  |  ✓   |  ✓  |  ✓  |
| Intersection  |  ✓  |  ✗   |  ✓  |  ✓  |
| Complement    |  ✓  |  ✗   |  ✓  |  ✗  |
| Concatenation |  ✓  |  ✓   |  ✓  |  ✓  |
| Kleene Star   |  ✓  |  ✓   |  ✓  |  ✓  |
| Positive Star |  ✓  |  ✓   |  ✓  |  ✓  |
| Difference    |  ✓  |  ✗   |  ✓  |  ✗  |
| Reversal      |  ✓  |  ✓   |  ✓  |  ✓  |
## Operator Precedence
The precedence of the regular operators is
$$
() \gg *,+,R, - \gg \circ \gg \cap, \backslash \gg \cup
$$
- Parentheses or grouping have the highest precedence.
- This is followed by the unary operators of **Kleene star, positive star, reversal**, and **complement**, which have the same precedence.
- This is followed by the binary operator of concatenation.
- Then we have the binary operators of intersection and difference, with the same precedence.
- Finally, the binary operator of the union has the lowest precedence.
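Why precedence matters can be shown with finite "languages" as Python sets of strings (the toy alphabets here are hypothetical): because concatenation binds tighter than union, $A \circ B \cup C$ means $(A \circ B) \cup C$, not $A \circ (B \cup C)$.

```python
A, B, C = {"a"}, {"b"}, {"c"}   # toy single-string languages

def concat(L1, L2):
    """Language concatenation: every u in L1 followed by every v in L2."""
    return {u + v for u in L1 for v in L2}

tight = concat(A, B) | C        # (A.B) U C   — what the precedence rules give
loose = concat(A, B | C)        # A.(B U C)   — what you'd get without them
```

The two readings produce different languages: `tight` is `{"ab", "c"}` while `loose` is `{"ab", "ac"}`.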
## Extras
- Intersection of CFLs and RLs is closed (the result is a CFL).

content/Theory of Computation/Finite Automata and Regular Languages.md

Lines changed: 26 additions & 0 deletions
@@ -34,4 +34,30 @@ It is a finite automata where each state has **exactly one transition** for **ev
2. For a language there can be multiple possible NFAs, but only one minimal DFA (up to renaming of states).
3. Grammars are inherently non-deterministic as a production rule can lead to multiple outcomes.
4. A recognizer is a computational device or algorithm that decides whether a given input string belongs to a language.
# Regular Expression
The precedence of regex operators is:
$$
() \gg *,+ \gg \cdot \gg |
$$

- Parentheses or grouping have the highest precedence.
- This is followed by the unary operators of Kleene star and positive star, which have the same precedence.
- This is followed by the binary operator of concatenation.
- Finally, the binary operator of the union has the lowest precedence.
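Python's `re` module follows the same precedence rules, so they can be checked directly (used here only to illustrate; the patterns are toy examples):

```python
import re

# Concatenation binds tighter than |, so "ab|cd" means (ab)|(cd), not a(b|c)d.
assert re.fullmatch(r"ab|cd", "ab") is not None
assert re.fullmatch(r"ab|cd", "cd") is not None
assert re.fullmatch(r"ab|cd", "abd") is None        # a(b|c)d *would* match "abd"
assert re.fullmatch(r"a(b|c)d", "abd") is not None

# * binds tighter than concatenation, so "ab*" means a(b*), not (ab)*.
assert re.fullmatch(r"ab*", "abbb") is not None
assert re.fullmatch(r"ab*", "abab") is None
```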
## Arden's Theorem
If $P$ and $Q$ are two Regular Expressions over $\Sigma$ and if $P$ does not contain $\epsilon$, then the equation $R = Q + RP$ has the unique solution $R = QP^*$ -

$$
\begin{aligned}
R &= Q + RP \\[8pt]
&= Q + (Q + RP)P \\[8pt]
&= Q + QP + RP^2 \\[8pt]
&= Q + QP + (Q + RP)P^2 \\[8pt]
&= Q + QP + QP^2 + RP^3 \\[8pt]
&= Q + QP + QP^2 + \dots \\[8pt]
&= Q(\epsilon + P + P^2 + \dots) \\[8pt]
&= QP^* \\[8pt]
\end{aligned}
$$

Similarly, if $R = Q + PR$ then $R = P^*Q$ is the solution.
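The identity above can be spot-checked on finite truncations (a sketch with toy one-letter languages, not from the notes): with $Q = \{b\}$ and $P = \{a\}$, the solution $R = QP^*$ should satisfy $R = Q \cup RP$ when both sides are restricted to strings of length at most $N$.

```python
N = 6
Q, P = {"b"}, {"a"}                      # toy languages; P does not contain epsilon

def concat(L1, L2, n=N):
    """Concatenation, truncated to strings of length <= n."""
    return {u + v for u in L1 for v in L2 if len(u + v) <= n}

# R = Q P* truncated to length N: "b", "ba", "baa", ...
R = {"b" + "a" * k for k in range(N)}

lhs = R                                  # left side of R = Q + RP
rhs = Q | concat(R, P)                   # right side, also truncated to length N
```

Both sides agree on every string up to the truncation length, as the derivation predicts.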
