content/Mathematical Foundations of Generative AI/VDM and GANs.md
KL divergence is asymmetric, meaning $\underbrace{D(P_X\,||\,P_\theta)}_\text{Forward KL} \ne \underbrace{D(P_\theta\,||\,P_X)}_\text{Reverse KL}$.
2. $f(u) = \frac{1}{2}\left(u\,\log u-(u+1)\log\left(\frac{u+1}{2}\right)\right)$ leads to the **JS (Jensen-Shannon) Divergence**.

$$
JS(P_X || P_\theta) = \frac{1}{2} KL(P_X || M) + \frac{1}{2} KL(P_\theta || M) \qquad \text{where } M = \frac{P_X + P_\theta}{2}
$$
3. $f(u)=\frac{1}{2}|u-1|$ leads to the **Total Variation Distance** or TV Distance.
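As a quick numerical check, the three divergences above can be computed directly for small discrete distributions. This is an illustrative sketch; the distributions `p` and `q` below are made up and are not from the notes.

```python
# Sketch (illustrative): KL, JS, and TV divergences between two small
# discrete distributions, confirming KL is asymmetric and JS is symmetric.
import math

def kl(p, q):
    """Forward KL divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence via the mixture M = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def tv(p, q):
    """Total variation distance: half the L1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.7, 0.2, 0.1]
q = [0.3, 0.4, 0.3]

print(kl(p, q), kl(q, p))   # asymmetric: forward and reverse KL differ
print(js(p, q), js(q, p))   # symmetric
print(tv(p, q))
```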
## Algorithm for f-divergence minimization
$$
\begin{aligned}
&\approx \arg\min_{\theta} \Bigg[\cancel{\frac{1}{B_1}\sum_{i=1}^{B_1}\log D_w(x_i)}^{\text{Independent of }\theta} + \frac{1}{B_2}\sum_{i=1}^{B_2}\log\,(1-D_w(\hat{x}_i)) \Bigg] \\[8pt]
&= \arg\min_{\theta} \Bigg[\frac{1}{B_2}\sum_{i=1}^{B_2}\log\,(1-D_w(\hat{x}_i)) \Bigg] \qquad\because \text{Second term stays as } \hat{x}_i=g_\theta(z_j) \sim P_\theta \\[8pt]
\end{aligned}
$$
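The surviving minibatch term can be sketched numerically. This is an illustrative toy only: the scalar `discriminator` and `generator` below stand in for $D_w$ and $g_\theta$, which in practice are neural networks.

```python
# Sketch (illustrative, names not from the notes): the generator update
# only needs the second term of the objective, since log D_w(x_i) on real
# samples does not depend on theta.
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    """Toy discriminator D_w: a logistic score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-w * x))

def generator(z, theta):
    """Toy generator g_theta: an affine map of the latent z."""
    return theta * z + 1.0

w, theta = 0.5, 2.0
z = rng.normal(size=64)        # minibatch of latents (B_2 = 64)
x_hat = generator(z, theta)    # fake samples, x_hat ~ P_theta

# Generator objective: (1/B_2) * sum_i log(1 - D_w(x_hat_i))
gen_loss = np.mean(np.log(1.0 - discriminator(x_hat, w)))
print(gen_loss)
```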
Any function $f$ being 1-Lipschitz means that the function cannot change faster than the distance between its inputs (the slope is always at most 1) -

$$
\frac{||f(x_1)-f(x_2)||}{||x_1-x_2||} \le 1
$$
The $T_w$ in this case is a neural network and can be made 1-Lipschitz by normalizing the weights of $T_w$ such that $||w||_2=1$ after each gradient step.
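The normalization step described above can be sketched for a linear critic. The linear form is an assumption made purely for illustration; the notes' $T_w$ is a general network.

```python
# Sketch (illustrative): after each gradient step, rescale the weight
# vector to unit L2 norm so a *linear* critic T_w stays 1-Lipschitz.
import numpy as np

w = np.array([3.0, -4.0])      # critic weights after a gradient step

def normalize_weights(w):
    """Project w back onto ||w||_2 = 1."""
    return w / np.linalg.norm(w)

w = normalize_weights(w)
print(np.linalg.norm(w))       # ~ 1.0

# For a linear map f(x) = w . x, |f(x1) - f(x2)| <= ||w|| * ||x1 - x2||,
# so ||w||_2 = 1 guarantees the 1-Lipschitz condition.
x1, x2 = np.array([1.0, 2.0]), np.array([-0.5, 0.3])
assert abs(w @ x1 - w @ x2) <= np.linalg.norm(x1 - x2) + 1e-12
```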
$\theta^*$ has to be chosen such that the Wasserstein distance is minimized. The Kantorovich-Rubinstein duality enables us to express the Wasserstein distance in terms of expectations over $P_X$ and $P_\theta$.
The above objective is very similar to GANs. That's why this method of minimizing the Wasserstein's metric is called the **WGAN**. Training a WGAN is more stable than training a naive GAN, as the gradients do not saturate when the supports of the two probability distributions do not overlap.
## Bi-Directional GAN (Bi-GAN)
### Inversion of GANs
We train a GAN specifically to allow us to sample $x$ from the dataset distribution $P_X$ by picking a random sample $z$ from an arbitrary distribution $Z$ and passing it through the generator $g_\theta$. But how can we get back $z$ if we know $x$?
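One simple way to get $z$ back is to search for it by gradient descent on the reconstruction error. This optimization-based inversion is an illustration on a toy differentiable $g_\theta$ (a Bi-GAN instead learns an encoder for this job); all names here are made up.

```python
# Sketch (illustrative toy): inverting a generator by finding the latent z
# whose output matches a given x, via gradient descent on ||g(z) - x||^2.
theta = 3.0

def g(z):
    """Toy differentiable generator g_theta."""
    return theta * z + 1.0

x = g(2.5)        # pretend x came from the dataset (true latent is 2.5)
z = 0.0           # initial latent guess
lr = 0.01
for _ in range(500):
    grad = 2 * (g(z) - x) * theta   # d/dz ||g(z) - x||^2
    z -= lr * grad

print(z)          # converges toward the true latent 2.5
```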
When the probability distributions of the training and testing datasets differ, we call this **domain shift**.

Any classifier/regressor trained solely on $D_s$ would fail to predict for the target items in $D_t$.
---
<h4 class="special">Example</h4>

Imagine you're training a model to identify images of dogs. The training dataset for your model has sketches, paintings, and cartoon representations of dogs, while your test dataset has actual photos of dogs. In such a case the distributions of the training and testing datasets differ. The method of solving domain shift is called **Unsupervised Domain Adaptation**.

In the above example -
- The broader class of "animal representations" is a **semantic class**.
- The sketches, paintings, cartoons, and photos are called **domains**.

The network can be trained either on all four domains, or one of the domains can be left unknown. In the case where a domain is left out, our hope is that all domains share the same underlying semantic structure, just with different marginal distributions. This is called the **shared support assumption**. Under this assumption, an optimal encoder trained on the other three domains should be able to extract meaningful features from the unseen domain. This setting is called **domain generalization**.

---
So our objective with such a setup is for the model to learn the features/classifier in such a manner that it performs well on both $P_s$ and $P_t$.

We can use **Domain Adversarial Networks** here to train a classifier that is **domain agnostic** (able to classify independent of which domain an element belongs to).
In Domain Adversarial Networks we have -
1. An Encoder $\phi:X \rightarrow F$ to extract features from inputs regardless of which domain the inputs belong to (both $D_s$ and $D_t$).
2. A Discriminator $T_w:F \rightarrow [0,1]$ to distinguish between elements of $P_s$ and elements of $P_t$ (Features of both source and target data).
3. A Classifier/Regressor $h_\psi: F_s \rightarrow y_s \sim P_s(y|x)$ which uses the features of the source inputs to make a prediction regarding their target. This works as a metric for the usefulness of the features.
A DANN cannot generate samples; its only job is to align the features of the different distributions.

The Discriminator makes the Encoder better at constructing features from the inputs (both source and target) in such a way that the features appear domain agnostic. But just having domain-agnostic features isn't enough; they also need to be useful for predicting the target class. For this we include a Classifier/Regressor in the network as well, so that the features learnt are both domain agnostic and useful.
The encoder network has gradients flowing from both the discriminator and the classifier.

- $\phi$ would ensure that $P_{F_s}$ = $P_{F_t}$.
- $h_\psi$ would ensure that the features are meaningful.
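The opposing gradients can be sketched with scalar parameters. This is a toy illustration of the gradient-reversal idea only, not the notes' implementation; every name and number below is made up.

```python
# Sketch (illustrative scalar toy): in a DANN, the encoder phi receives the
# classifier gradient as-is and the domain-discriminator gradient with its
# sign flipped, so phi helps the classifier while *fooling* the discriminator.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

phi = 0.5          # encoder parameter
w = 1.0            # domain discriminator parameter
psi = 1.0          # label classifier parameter
lam = 0.1          # trade-off between the two objectives
x, y, d = 2.0, 1.0, 0.0   # input, class label, domain label (0 = source)

f = phi * x                                       # feature
cls_loss_grad_f = (sigmoid(psi * f) - y) * psi    # d(classifier BCE)/df
dom_loss_grad_f = (sigmoid(w * f) - d) * w        # d(discriminator BCE)/df

# Gradient reversal: subtract the domain gradient instead of adding it.
grad_phi = (cls_loss_grad_f - lam * dom_loss_grad_f) * x
phi -= 0.01 * grad_phi
print(phi)
```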
## Evaluation of a GAN
Suppose we have some true and generated samples and we wish to evaluate whether the GAN is successful in generating samples from $P_X$. There are various methods for this, but we'll look at a popular method of evaluation called the **Fréchet Inception Distance** (FID). FID uses [[VDM and GANs#Wasserstein's Metric (Optimal Transport)|Wasserstein's Metric]] along with an **Inception Network trained on ImageNet** to do this evaluation.
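A minimal sketch of the Fréchet distance computation, shown in 1-D where the Wasserstein-2 distance between Gaussians has a closed form, $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. The feature vectors below are synthetic stand-ins for Inception features, made up for illustration.

```python
# Sketch (illustrative): the Frechet/Wasserstein-2 distance between two
# Gaussians, which is what FID computes on Inception features, in 1-D.
import numpy as np

rng = np.random.default_rng(0)
real_feats = rng.normal(loc=0.0, scale=1.0, size=10_000)
fake_feats = rng.normal(loc=0.5, scale=1.2, size=10_000)

def fid_1d(a, b):
    """1-D Frechet distance: (mu1 - mu2)^2 + (sigma1 - sigma2)^2."""
    mu1, mu2 = a.mean(), b.mean()
    s1, s2 = a.std(), b.std()
    return (mu1 - mu2) ** 2 + (s1 - s2) ** 2

print(fid_1d(real_feats, fake_feats))   # roughly 0.25 + 0.04 = 0.29
```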
---

content/Software Testing.md
Software testing is the process of examining the artifacts and behavior of the software under test.
- **Error -** An incorrect internal state during execution. This happens inside the memory.

A test case involves an input to the software and an output. If the actual output matches the expected output, we say that the test case passed.
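The pass/fail rule above can be sketched directly. The `add` function and its cases are hypothetical, chosen only to illustrate the input/expected-output pairing.

```python
# Sketch (illustrative): a test case pairs an input with an expected
# output; the case passes when the actual output matches the expected one.
def add(a, b):
    return a + b            # unit under test

test_cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
for args, expected in test_cases:
    actual = add(*args)
    assert actual == expected, f"FAIL: add{args} -> {actual}, expected {expected}"
print("all test cases passed")
```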
Testing goals based on process maturity -

1. Level 0: There is no difference between testing and debugging.
2. Level 1: The purpose of testing is to show correctness.
3. Level 2: The purpose of testing is to show that the software doesn't work.
4. Level 3: The purpose of testing is not to prove anything specific, but to reduce the risk of using the software.
5. Level 4: Testing is a mental discipline that helps all IT professionals develop higher quality software.

## Types of testing
1. **Unit Testing -** Testing of a singular component.
2. **Integration Testing** - Various components are put together and tested.
1. **Simple Path -** A path from one node to another is a simple path if no node appears more than once except the first and last node. No internal loops.
2. **Prime Path -** A simple path such that it's not a sub-path of another simple path. They are thus the maximal simple paths. ^633cde
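The two definitions can be checked mechanically on a small graph. This is a sketch; the graph and helper names are illustrative, not from the notes.

```python
# Sketch (illustrative): enumerate simple paths of a small control-flow
# graph, then keep only the maximal ones (the prime paths).
edges = {0: [1, 2], 1: [3], 2: [3], 3: []}

def simple_paths(edges):
    """All simple paths: no repeated node except possibly first == last."""
    paths = []
    def extend(path):
        paths.append(tuple(path))
        for nxt in edges[path[-1]]:
            if nxt == path[0]:                  # cycle back to the start
                paths.append(tuple(path) + (nxt,))
            elif nxt not in path:               # forbid other repeats
                extend(path + [nxt])
    for node in edges:
        extend([node])
    return set(paths)

def is_subpath(q, p):
    """True iff q occurs as a contiguous sub-path of p."""
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

paths = simple_paths(edges)
prime = {q for q in paths if not any(q != p and is_subpath(q, p) for p in paths)}
print(sorted(prime))    # the maximal simple paths of the graph
```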
## Types of Tours
1. **Tours with side-trips** - A test path $p$ tours a sub-path $q$ with side-trips iff every edge in $q$ is also in $p$ in the same order. If a tour comes back to the same node it diverted from, we say the tour includes a side-trip.

![[Pasted image 20260225093645.png|450]]
2. **Tours with detours** - A test path $p$ tours a sub-path $q$ with detours iff every node in $q$ is also in $p$ in the same order. If a tour detours from some node $n$ and returns back to the prime path at a successor of $n$, we say the tour has a detour.

![[Pasted image 20260225093700.png|450]]
# Data flow Coverage
![[Pasted image 20260222112558.png]]
# Test Integration
## Scaffolding
When testing incomplete portions of software, we need extra software components, sometimes called scaffolding.
Two common types of scaffolding:
1. **Test stub** is a skeletal or special purpose implementation of a software module, used to develop or test a component that calls the stub or otherwise depends on it.
2. **Test driver** is a software component or test tool that replaces a component that takes care of the control and/or the calling of a software component.
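A minimal sketch of both roles. All names here are hypothetical, invented only to show where a stub and a driver sit around the unit under test.

```python
# Sketch (illustrative): a stub stands in for a module the unit under test
# depends on; a driver stands in for the caller of the unit under test.
def payment_gateway_stub(amount):
    """Stub: skeletal replacement for a real payment service."""
    return {"status": "approved", "amount": amount}

def checkout(amount, gateway):
    """Unit under test: depends on some payment gateway."""
    result = gateway(amount)
    return result["status"] == "approved"

def driver():
    """Driver: takes care of calling the unit under test."""
    assert checkout(100, payment_gateway_stub)
    assert checkout(0, payment_gateway_stub)
    print("checkout tests passed")

driver()
```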
## Five approaches to integration testing
1. **Incremental -**
    1. Top-down - Create top level modules while using stubs.
    2. Bottom-up - Create bottom level modules while using test drivers to call them.
2. **Sandwich** - Mix of top-down and bottom-up.
3. **Big Bang** - All individually tested modules are put together to construct the entire system, which is tested as a whole.

## Coupling data flow
Coupling variables are variables that are defined in one unit and used in another.
There are different kinds of couplings based on the interfaces:
- **Parameter coupling:** Parameters are passed in calls.

Data flow coverage criteria can now be extended to coupling variables:

- **All-coupling-def coverage:** A path is to be executed from every last-def to at least one first-use.
- **All-coupling-use coverage:** A path is to be executed from every last-def to every first-use.
- **All-coupling-du-paths coverage:** Every simple path from every last-def to every first-use needs to be executed.

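The last-def/first-use pairing that these criteria cover can be sketched concretely. The functions below are hypothetical, invented only to mark where a coupling variable's last-def and first-use sit.

```python
# Sketch (illustrative): 'total' is a coupling variable, defined in one
# unit (compute_total) and used in another (format_receipt).
def compute_total(prices):
    total = sum(prices)       # last-def of the coupling variable
    return total

def format_receipt(total):
    return f"Total: {total}"  # first-use of the coupling variable

print(format_receipt(compute_total([2, 3, 5])))   # -> Total: 10
```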
## Classical Coverage Criteria
Traditional terminologies -
- **A linearly independent path** of execution in the CFG of a program is a path that does not contain other paths within it. (very similar to prime paths)
- **Basic Block -** A series of nodes with no branching can be collapsed into one node called the basic block.