<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://www.harsh-seth.com/feed.xml" rel="self" type="application/atom+xml" /><link href="http://www.harsh-seth.com/" rel="alternate" type="text/html" /><updated>2024-09-25T20:41:52+00:00</updated><id>http://www.harsh-seth.com/feed.xml</id><title type="html">knowhere</title><subtitle>A collection of notes about topics I&apos;m thinking about</subtitle><author><name>Harsh Seth</name></author><entry><title type="html">Preparing for Coding Interviews (A journey) - The Problem Solving Process</title><link href="http://www.harsh-seth.com/lc-process/" rel="alternate" type="text/html" title="Preparing for Coding Interviews (A journey) - The Problem Solving Process" /><published>2024-07-29T00:00:00+00:00</published><updated>2024-07-29T00:00:00+00:00</updated><id>http://www.harsh-seth.com/lc-process</id><content type="html" xml:base="http://www.harsh-seth.com/lc-process/"><![CDATA[<blockquote>
  <details><summary>About this series</summary>
    This series of posts documents the learnings, challenges I encountered and the decisions I made on my journey to hone the (algorithmic) tools of my trade. It's partly a tool for me to arrange my thoughts on this, partly a resource for me to revisit every now and then, and partly to serve as a guide for anyone who might find it useful. <br /><br />Check out the <a href="/lc-overview">meta post</a> for a list of all related posts. For any Hiring Managers/TA Partners seeing this, drop me an <a href="mailto: harshseth2006@gmail.com">email</a> if you like what you see! </details>
</blockquote>

<h2 id="the-typical-process">The Typical Process</h2>
<ul>
  <li>Create a notes document to accompany the question</li>
  <li>Select a language to solve the problem in, and write down the function prototype of the expected solution (to determine expected input and output formats)</li>
  <li>Write tests/asserts which call the function to verify that the solution works (AKA test-driven development)
    <ul>
      <li>Think of some test cases of your own before copying over the provided ones, and then think of some more after copying</li>
      <li>Use this as an opportunity to clarify details about the problem with the interviewer</li>
    </ul>
  </li>
  <li>In the notes document, list down assumptions, approach ideas and observations
    <ul>
      <li>This helps condense thoughts into atomic, verifiable claims</li>
    </ul>
  </li>
  <li>For every potential solution, write out its ‘approach’ pseudocode, even if it doesn’t meet some of the question’s limitations (like space or time complexity)
    <ul>
      <li>This helps develop ideas, and lets you fail fast, along with potentially getting helpful insights from the interviewer</li>
    </ul>
  </li>
</ul>
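
<p>A minimal sketch of the "prototype first, asserts next" steps above, in Python. The problem (merging two strings by alternating characters) and the name <code class="language-plaintext highlighter-rouge">merge_alternately</code> are picked purely for illustration; the point is the order in which things get written, not this particular solution.</p>

```python
# Step 1: the function prototype pins down the input and output formats.
def merge_alternately(word1: str, word2: str) -> str:
    """Merge two strings by alternating characters, appending any leftover tail."""
    merged = []
    for i in range(max(len(word1), len(word2))):
        if i < len(word1):
            merged.append(word1[i])
        if i < len(word2):
            merged.append(word2[i])
    return "".join(merged)


# Step 2: the asserts are written before the body above is filled in.
def run_tests() -> None:
    # Own test cases first (edge cases worth clarifying with the interviewer)
    assert merge_alternately("", "") == ""
    assert merge_alternately("a", "") == "a"
    assert merge_alternately("", "xyz") == "xyz"
    # Then the kind of cases a problem statement usually provides
    assert merge_alternately("abc", "pqr") == "apbqcr"
    assert merge_alternately("ab", "pqrs") == "apbqrs"


run_tests()
```

<p>Writing the empty-string cases down early is what usually prompts the "can the inputs be empty?" question for the interviewer.</p>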

<h2 id="while-practicing">While Practicing</h2>
<ul>
  <li>All of the steps from The Typical Process</li>
  <li>If a solution in a language feels like it can be optimized (either performance or readability), then attempt the same problem in a different language to inform the language selection process for future problems</li>
  <li>If a different (not necessarily better) solution is found after you’ve submitted your solution, analyse and write it down as an approach too</li>
</ul>]]></content><author><name>Harsh Seth</name></author><category term="blog" /><category term="leetcoding" /><summary type="html"><![CDATA[Documenting the steps I follow when I think through solutions for a programming problem]]></summary></entry><entry><title type="html">Preparing for Coding Interviews (A journey) - Noteworthy Data Structures and Algorithms</title><link href="http://www.harsh-seth.com/lc-notable-ds-and-a/" rel="alternate" type="text/html" title="Preparing for Coding Interviews (A journey) - Noteworthy Data Structures and Algorithms" /><published>2024-07-26T00:00:00+00:00</published><updated>2024-07-26T00:00:00+00:00</updated><id>http://www.harsh-seth.com/lc-notable-ds-and-a</id><content type="html" xml:base="http://www.harsh-seth.com/lc-notable-ds-and-a/"><![CDATA[<blockquote>
  <details><summary>About this series</summary>
    This series of posts documents the learnings, challenges I encountered and the decisions I made on my journey to hone the (algorithmic) tools of my trade. It's partly a tool for me to arrange my thoughts on this, partly a resource for me to revisit every now and then, and partly to serve as a guide for anyone who might find it useful. <br /><br />Check out the <a href="/lc-overview">meta post</a> for a list of all related posts. For any Hiring Managers/TA Partners seeing this, drop me an <a href="mailto: harshseth2006@gmail.com">email</a> if you like what you see! </details>
</blockquote>

<h2 id="fundamental-data-structures">Fundamental Data Structures</h2>
<ul>
  <li>Objects</li>
  <li>Arrays</li>
  <li>Linked Lists</li>
  <li>Hash Maps</li>
  <li>Stacks</li>
  <li>Queues</li>
  <li>Priority Queues</li>
  <li>Trees</li>
  <li>Graphs</li>
  <li>(Binary) Heaps</li>
</ul>

<blockquote>
  <p>NOTE: It is possible to build all of these data structures from the first two in the list</p>
  <ul>
    <li>Arrays -&gt; Stacks, Queues</li>
    <li>Objects -&gt; Linked Lists -&gt; Stacks, Queues</li>
    <li>Objects, Arrays -&gt; Hash Maps</li>
    <li>Objects -&gt; Trees, Graphs</li>
    <li>Trees -&gt; Heap -&gt; Priority Queue</li>
  </ul>
</blockquote>
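
<p>As a rough illustration of the "Objects -&gt; Linked Lists -&gt; Stacks" chain in the note above, here is a small Python sketch. The class names <code class="language-plaintext highlighter-rouge">Node</code> and <code class="language-plaintext highlighter-rouge">LinkedListStack</code> are made up for this example, and it is only one of several reasonable ways to do it.</p>

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """A plain object: a value plus a reference to the next node."""
    value: Any
    next: Optional["Node"] = None


class LinkedListStack:
    """A stack built on a singly linked list; the head of the list is the top."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.size = 0

    def push(self, value: Any) -> None:
        # The new node points at the old head and becomes the new head
        self.head = Node(value, self.head)
        self.size += 1

    def pop(self) -> Any:
        if self.head is None:
            raise IndexError("pop from empty stack")
        value = self.head.value
        self.head = self.head.next
        self.size -= 1
        return value

    def peek(self) -> Any:
        if self.head is None:
            raise IndexError("peek at empty stack")
        return self.head.value


stack = LinkedListStack()
for item in (1, 2, 3):
    stack.push(item)
```

<p>Both <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">pop</code> only touch the head, so they are constant time.</p>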

<p>Each data structure should allow for the following operations: <code class="language-plaintext highlighter-rouge">Access</code>, <code class="language-plaintext highlighter-rouge">Insert</code>, <code class="language-plaintext highlighter-rouge">Update</code>, <code class="language-plaintext highlighter-rouge">Delete</code>, <code class="language-plaintext highlighter-rouge">Sort</code> (if applicable)</p>

<h2 id="notable-algorithms">Notable Algorithms</h2>
<ul>
  <li>Longest Increasing Subsequence (LIS)</li>
  <li>Knapsack Problem</li>
  <li>Dijkstra’s Algorithm</li>
  <li>A* Search</li>
</ul>]]></content><author><name>Harsh Seth</name></author><category term="blog" /><category term="leetcoding" /><summary type="html"><![CDATA[Documenting a list of notable data structures and algorithms, that I can work towards mastering]]></summary></entry><entry><title type="html">Preparing for Coding Interviews (A journey) - Programming Language Selection</title><link href="http://www.harsh-seth.com/lc-language/" rel="alternate" type="text/html" title="Preparing for Coding Interviews (A journey) - Programming Language Selection" /><published>2024-07-24T00:00:00+00:00</published><updated>2024-07-24T00:00:00+00:00</updated><id>http://www.harsh-seth.com/lc-language</id><content type="html" xml:base="http://www.harsh-seth.com/lc-language/"><![CDATA[<blockquote>
  <details><summary>About this series</summary>
    This series of posts documents the learnings, challenges I encountered and the decisions I made on my journey to hone the (algorithmic) tools of my trade. It's partly a tool for me to arrange my thoughts on this, partly a resource for me to revisit every now and then, and partly to serve as a guide for anyone who might find it useful. <br /><br />Check out the <a href="/lc-overview">meta post</a> for a list of all related posts. For any Hiring Managers/TA Partners seeing this, drop me an <a href="mailto: harshseth2006@gmail.com">email</a> if you like what you see! </details>
</blockquote>

<h2 id="selecting-a-programming-language">Selecting a programming language</h2>

<table>
  <thead>
    <tr>
      <th>Factor</th>
      <th>Rationale</th>
      <th>Candidates</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Proficiency in the language</td>
      <td>Assessment time should not be spent second-guessing (or worse, debugging) syntax and language features.</td>
      <td>Javascript, Python, Java, C++</td>
    </tr>
    <tr>
      <td>Relevance of the language</td>
      <td>Selected language should ideally be relevant to the desired roles and the work they entail</td>
      <td>Javascript, Python, Java, Go, Rust</td>
    </tr>
    <tr>
      <td>Time constraints of the programming assessment</td>
      <td>Languages which allow for more intuitive expression of logic (such as dynamically typed languages and languages with syntactic sugar) save mental bandwidth for actually working on the problem</td>
      <td>Javascript, Python</td>
    </tr>
    <tr>
      <td>Support for useful features</td>
      <td>Assessment time shouldn’t be spent writing code to reinvent the standard data structures wheel or to figure out abstracted away memory management</td>
      <td>C++, Java, Python, Javascript (w/ external modules), Go (w/ external modules)</td>
    </tr>
  </tbody>
</table>

<p>Decision:</p>
<ul>
  <li>When building off known data structures to arrive at solutions, use Python</li>
  <li>When required to implement data structures from scratch, use Javascript</li>
  <li>Only when required to directly manipulate memory, use C++ (or more ideally, Rust)</li>
</ul>

<h2 id="stray-thoughts">Stray Thoughts</h2>
<ul>
  <li>A scripting language really allows for clarity of thought and lets the logic shine through
    <ul>
      <li>See my <a href="https://github.com/harsh-seth/code-katas/blob/main/leetcode/1768-merge-strings-alternatively/solution.py">solution</a> (and the <a href="https://github.com/harsh-seth/code-katas/blob/main/leetcode/1768-merge-strings-alternatively/community_solution.py">community solution</a>) for LC 1768 for an example</li>
    </ul>
  </li>
  <li>Switching between Python and Javascript can often cause “crossed wires”
    <ul>
      <li>Accessing <code class="language-plaintext highlighter-rouge">arr[-1]</code> yields <code class="language-plaintext highlighter-rouge">undefined</code> in JS and the last element in Python</li>
      <li>Accessing <code class="language-plaintext highlighter-rouge">str[len(str)]</code> yields <code class="language-plaintext highlighter-rouge">undefined</code> in JS and an error in Python</li>
    </ul>
  </li>
  <li>C++ and Java really bring with them a lot of syntax which detracts from the logic of the solution!</li>
  <li>However, in the memory manipulation cases, C++ does not have any competition (among the selected 3), allowing for crystal clear expression!</li>
</ul>]]></content><author><name>Harsh Seth</name></author><category term="blog" /><category term="leetcoding" /><summary type="html"><![CDATA[Documenting the factors that go into the selection of a programming language to write solutions in]]></summary></entry><entry><title type="html">Preparing for Coding Interviews (A journey)</title><link href="http://www.harsh-seth.com/lc-overview/" rel="alternate" type="text/html" title="Preparing for Coding Interviews (A journey)" /><published>2024-07-23T00:00:00+00:00</published><updated>2024-07-23T00:00:00+00:00</updated><id>http://www.harsh-seth.com/lc-overview</id><content type="html" xml:base="http://www.harsh-seth.com/lc-overview/"><![CDATA[<p>This series of posts documents the learnings, challenges I encountered and the decisions I made on my journey to hone the (algorithmic) tools of my trade. It’s partly a tool for me to arrange my thoughts on this, partly a resource for me to revisit every now and then, and partly to serve as a guide for anyone who might find it useful. <br /><br /> For any Hiring Managers/TA Partners seeing this, drop me an <a href="mailto: harshseth2006@gmail.com">email</a> if you like what you see!</p>

<h2 id="posts">Posts</h2>
<ol>
  <li><a href="/lc-language">Programming Language Selection</a></li>
  <li><a href="/lc-notable-ds-and-a">Noteworthy Data Structures and Algorithms</a></li>
  <li><a href="/lc-process">The Problem Solving Process</a></li>
</ol>]]></content><author><name>Harsh Seth</name></author><category term="blog" /><category term="leetcoding" /><category term="featured" /><summary type="html"><![CDATA[A meta-post containing a list of all posts in this series]]></summary></entry><entry><title type="html">Vision Language Models</title><link href="http://www.harsh-seth.com/vision-language-models/" rel="alternate" type="text/html" title="Vision Language Models" /><published>2024-04-22T00:00:00+00:00</published><updated>2024-04-22T00:00:00+00:00</updated><id>http://www.harsh-seth.com/vision-language-models</id><content type="html" xml:base="http://www.harsh-seth.com/vision-language-models/"><![CDATA[<h3 id="key-ideas">Key Ideas</h3>
<ul>
  <li>Images can be represented as a collection of visual “words” or patches, allowing the attention mechanism to be applied to them</li>
  <li>Architectures for Vision and Text models are converging, allowing for native multimodality</li>
</ul>

<h3 id="notes">Notes</h3>
<h4 id="vision-basics">Vision Basics</h4>
<p><strong>Representation</strong></p>
<ul>
  <li>Grayscale images are matrices, Color images are tensors</li>
  <li>Image pixel values exist in a fixed range, called a Color Space</li>
</ul>

<p><strong>Convolution Networks</strong></p>
<ul>
  <li>Given a <em>convolution mask</em> <code class="language-plaintext highlighter-rouge">k</code>, we can create a representation of an image
    <ul>
      <li><code class="language-plaintext highlighter-rouge">g(x, y) = Sum_{v} Sum_{u} k(u, v)f(x - u, y - v)</code> where <code class="language-plaintext highlighter-rouge">f</code> is the input representation, <code class="language-plaintext highlighter-rouge">g</code> is the new representation</li>
      <li>Can do things such as: “Sharpen”, “Find Edges”, “Blur”, etc.</li>
    </ul>
  </li>
  <li>Stack enough depth, and the network will learn more complex features (image -&gt; edges -&gt; groups of edges -&gt; collections of interesting features)</li>
</ul>
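
<p>The convolution formula above can be sketched directly in plain Python. This is a toy version under a few assumptions of mine: the image is a list of rows indexed as <code class="language-plaintext highlighter-rouge">f[y][x]</code>, the mask is square with odd size and its centre is <code class="language-plaintext highlighter-rouge">(u, v) = (0, 0)</code>, and pixels outside the image count as zero.</p>

```python
def convolve2d(f, k):
    """g(x, y) = sum_v sum_u k(u, v) * f(x - u, y - v), with zeros outside the image.

    f: image as a list of rows, indexed f[y][x]
    k: odd-sized square mask as a list of rows; its centre is (u, v) = (0, 0)
    """
    height, width = len(f), len(f[0])
    r = len(k) // 2
    g = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for v in range(-r, r + 1):
                for u in range(-r, r + 1):
                    yy, xx = y - v, x - u
                    if 0 <= yy < height and 0 <= xx < width:
                        total += k[v + r][u + r] * f[yy][xx]
            g[y][x] = total
    return g


# A 3x3 box blur: every output pixel is the average of its neighbourhood
box_blur = [[1 / 9] * 3 for _ in range(3)]
image = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
blurred = convolve2d(image, box_blur)
```

<p>In a convolution network the values in <code class="language-plaintext highlighter-rouge">k</code> are learned rather than hand-picked like the blur mask here.</p>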

<p><strong>Transformer Networks</strong></p>
<ul>
  <li>Apply self attention on pixel values
    <ul>
      <li>Problems around extracting 2D relation information from the image and local vs global attention</li>
    </ul>
  </li>
  <li>Use patches instead of pixels (dModel = 768!)</li>
</ul>
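
<p>A small sketch of what "use patches instead of pixels" means in practice, assuming an image stored as nested lists of shape H x W x C and a patch size that divides both dimensions. The helper name <code class="language-plaintext highlighter-rouge">image_to_patches</code> is hypothetical. A 16x16 RGB patch flattens to 16 * 16 * 3 = 768 values, which lines up with the 768 mentioned above.</p>

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches, row-major."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide both image dimensions")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for y in range(top, top + patch_size):
                for x in range(left, left + patch_size):
                    patch.extend(image[y][x])  # append the C channel values
            patches.append(patch)
    return patches


# A blank 32x32 RGB image becomes a sequence of 4 patch "words"
image = [[[0, 0, 0] for _ in range(32)] for _ in range(32)]
patches = image_to_patches(image, 16)
```

<p>Self attention then runs over 4 items instead of 1024 pixels, which is what makes the quadratic cost manageable.</p>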

<h4 id="multimodality">Multimodality</h4>
<p><strong>Contrastive Language–Image Pre-training (CLIP)</strong></p>
<ul>
  <li>Step 1: Train a model to maximise the similarity scores between image encoding and corresponding text encoding</li>
  <li>Step 2: Given an input in a mode, create encodings for all potential counterparts in the other mode</li>
  <li>Step 3: Find similarity scores between encoding of input with counterparts and select the highest one</li>
</ul>
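
<p>Steps 2 and 3 above boil down to a nearest-neighbour lookup by similarity. The sketch below assumes cosine similarity as the score and uses made-up 3-dimensional encodings in place of real encoder outputs, so only the selection logic should be taken from it.</p>

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_encoding, candidate_encodings):
    """Return the candidate label whose encoding is most similar to the query."""
    return max(
        candidate_encodings,
        key=lambda label: cosine_similarity(query_encoding, candidate_encodings[label]),
    )


# Made-up encodings standing in for the image and text encoder outputs
image_encoding = [0.9, 0.1, 0.2]
text_encodings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}
label = best_match(image_encoding, text_encodings)
```

<p>The same function works in the other direction (text query, image candidates) since both modes share one encoding space.</p>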

<p><strong>Fuyu</strong></p>
<ul>
  <li>Step 1: Create image patches and linearly project each one into a vector, so that the patches become “words”
    <ul>
      <li>Special mention: Image “newline” character!</li>
    </ul>
  </li>
  <li>Step 2: Concatenate the image patch vector sequence with the text word vectors and feed the result into the Transformer architecture</li>
  <li>Step 3: Only perform prediction on the output embeddings corresponding to the text</li>
</ul>
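
<p>A toy sketch of how the input sequence from the steps above might be assembled. The marker string and the function name are hypothetical, and strings stand in for the actual patch and word vectors.</p>

```python
IMAGE_NEWLINE = "[image-newline]"  # hypothetical marker for the end of a patch row


def build_sequence(patch_grid, text_tokens):
    """Flatten a grid of patch vectors row by row, mark the end of each row,
    then append the text tokens. Also returns the index where the text starts."""
    sequence = []
    for row in patch_grid:
        sequence.extend(row)
        sequence.append(IMAGE_NEWLINE)
    text_start = len(sequence)
    sequence.extend(text_tokens)
    return sequence, text_start


grid = [["p00", "p01"], ["p10", "p11"]]
sequence, text_start = build_sequence(grid, ["what", "is", "this"])
```

<p>The "newline" marker is what lets the model recover the 2D layout from a flat sequence, and <code class="language-plaintext highlighter-rouge">text_start</code> marks the positions that Step 3 would predict on.</p>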

<p><strong>Aside: V* Vision Search</strong>
Some features might be too insignificant in the larger context to be accurately pinpointed by traditional methods. The V* method might be helpful</p>
<ul>
  <li>Step 1: Use LLM to identify patches which may contain subject</li>
  <li>Step 2: Use high likelihood patches as inputs for actual query</li>
</ul>

<h3 id="resources">Resources</h3>
<ul>
  <li>Dr. Mohit Iyyer’s UMass CS685 S24 <a href="https://www.youtube.com/watch?v=ijqUUZI3osM">Lecture 19</a></li>
  <li><a href="https://arxiv.org/abs/2010.11929">An image is worth 16x16 words</a></li>
  <li>OpenAI’s <a href="https://openai.com/research/clip">CLIP</a></li>
  <li>AdeptAI’s <a href="https://www.adept.ai/blog/fuyu-8b">Fuyu</a></li>
</ul>]]></content><author><name>Harsh Seth</name></author><category term="NLP" /><category term="notes" /><summary type="html"><![CDATA[Images can be represented as a collection of visual "words" or patches, allowing the attention mechanism to be applied to them. This opened the doors for native multimodal capabilities to be developed in the form of VLMs]]></summary></entry><entry><title type="html">Scaling Laws for Large Language Models</title><link href="http://www.harsh-seth.com/scaling-laws/" rel="alternate" type="text/html" title="Scaling Laws for Large Language Models" /><published>2024-04-10T00:00:00+00:00</published><updated>2024-04-10T00:00:00+00:00</updated><id>http://www.harsh-seth.com/scaling-laws</id><content type="html" xml:base="http://www.harsh-seth.com/scaling-laws/"><![CDATA[<h3 id="key-ideas">Key Ideas</h3>
<ul>
  <li>There seems to be an upper cap on LLM performance if compute budget is kept fixed, i.e. capacity of a model</li>
  <li>Increasing either data or model parameters alone is not enough to improve performance</li>
  <li>Various studies have found quantified relationships between data size, model size and compute budget which can be used to inform how much resource to use when training an LLM.</li>
</ul>

<h3 id="notes">Notes</h3>
<h4 id="training-tradeoffs">Training Tradeoffs</h4>
<p>Factors informing training are</p>
<ul>
  <li>Dataset Size (<code class="language-plaintext highlighter-rouge">D</code> # of training tokens)</li>
  <li>Model Size (<code class="language-plaintext highlighter-rouge">N</code> # of model parameters)</li>
  <li>Compute Budget (<code class="language-plaintext highlighter-rouge">C = FLOPS(N, D)</code>)</li>
</ul>

<p>To solve: <code class="language-plaintext highlighter-rouge">argmin_{N, D} L(N, D) s.t. FLOPS(N, D) = C</code> where <code class="language-plaintext highlighter-rouge">L(N, D) = A/(N^alpha) + B/(D^beta) + E</code></p>
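
<p>To get a feel for the optimisation problem above, here is a sketch that solves it in closed form. It rests on two assumptions of mine: the common approximation <code class="language-plaintext highlighter-rouge">FLOPS(N, D) = 6 * N * D</code>, and placeholder values for <code class="language-plaintext highlighter-rouge">A, B, E, alpha, beta</code> that are there for illustration only. Substituting <code class="language-plaintext highlighter-rouge">D = C / (6N)</code> into <code class="language-plaintext highlighter-rouge">L</code> and setting the derivative with respect to <code class="language-plaintext highlighter-rouge">N</code> to zero gives the expression used below.</p>

```python
def loss(n, d, A, B, E, alpha, beta):
    """L(N, D) = A / N^alpha + B / D^beta + E"""
    return A / n**alpha + B / d**beta + E


def compute_optimal(c, A, B, alpha, beta, flops_per_param_token=6.0):
    """Minimise L(N, D) subject to flops_per_param_token * N * D = c.

    With K = c / flops_per_param_token, the stationary point is
    N = (alpha * A / (beta * B))^(1 / (alpha + beta)) * K^(beta / (alpha + beta)).
    """
    budget = c / flops_per_param_token  # this is N * D
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = g * budget ** (beta / (alpha + beta))
    d_opt = budget / n_opt
    return n_opt, d_opt


# Placeholder constants, for illustration only
A, B, E, ALPHA, BETA = 406.4, 410.7, 1.69, 0.34, 0.28
n_opt, d_opt = compute_optimal(1e21, A, B, ALPHA, BETA)
best_loss = loss(n_opt, d_opt, A, B, E, ALPHA, BETA)
```

<p>Moving <code class="language-plaintext highlighter-rouge">N</code> away from <code class="language-plaintext highlighter-rouge">n_opt</code> in either direction (and adjusting <code class="language-plaintext highlighter-rouge">D</code> to stay on budget) raises the loss, which is the "increasing one alone is not enough" idea in numbers.</p>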

<h4 id="kaplan-scaling-laws">Kaplan Scaling Laws</h4>
<p>Findings from Kaplan et al., 2020</p>
<ul>
  <li>Performance depends strongly on scale and weakly on model shape</li>
  <li>Increasing both dataset and model size is key to improved performance. Increasing one while keeping the other fixed leads to diminishing returns (assuming uncapped compute)</li>
  <li>Larger models are more sample efficient</li>
  <li>Prioritize increasing model size over data size</li>
</ul>

<p>Issue</p>
<ul>
  <li>Same learning rate schedule was used for all training runs, regardless of batch size</li>
</ul>

<h4 id="chinchilla-scaling-laws">Chinchilla Scaling Laws</h4>
<p>Findings from Hoffmann et al., 2022</p>
<ul>
  <li>Fixed the learning rate schedules</li>
  <li>Difference from Kaplan: increase the data and the model size by the same factor</li>
  <li>For a fixed compute budget, they found two power-law relationships (linear on a log-log plot) - one for model size and one for dataset size</li>
</ul>

<h3 id="resources">Resources</h3>
<ul>
  <li>Dr. Mohit Iyyer’s UMass CS685 S24 <a href="https://www.youtube.com/watch?v=lSBG_JuhbPE">Lecture 17</a></li>
</ul>]]></content><author><name>Harsh Seth</name></author><category term="NLP" /><category term="notes" /><summary type="html"><![CDATA[There seems to be an upper cap on LLM performance if compute budget is kept fixed, i.e. capacity of a model. Various studies have found quantified relationships between data size, model size and compute budget which can be used to inform how much resource to use when training an LLM]]></summary></entry><entry><title type="html">Position Embeddings and Efficient Attention</title><link href="http://www.harsh-seth.com/position-embedding/" rel="alternate" type="text/html" title="Position Embeddings and Efficient Attention" /><published>2024-04-08T00:00:00+00:00</published><updated>2024-04-08T00:00:00+00:00</updated><id>http://www.harsh-seth.com/position-embedding</id><content type="html" xml:base="http://www.harsh-seth.com/position-embedding/"><![CDATA[<h3 id="key-ideas">Key Ideas</h3>
<ul>
  <li>Position Encodings give the model a notion of order</li>
</ul>

<h3 id="notes">Notes</h3>
<h4 id="embedding-format">Embedding Format</h4>
<p><strong>Type 1: Absolute Positions</strong>
<code class="language-plaintext highlighter-rouge">q_1 = w_q . (c_1 + p_1)</code></p>

<p><strong>Fixed Format</strong></p>
<ul>
  <li>Allows for arbitrary length input sequences, esp. at test time</li>
  <li>In practice, however, the model does not effectively learn to do this</li>
</ul>

<p><strong>Learned</strong></p>
<ul>
  <li>Lets the model figure out the best format to encode this information</li>
  <li>Cannot be used at test time for sequences longer than the maximum length set at train time</li>
</ul>

<p><strong>Type 2: Relative Positions</strong></p>
<ul>
  <li>Represent every pair of tokens, and measure the relative position difference between them</li>
  <li>Could be better suited as input sequences might have variations, and text might be prepended/truncated - changing the absolute positions while keeping the relative position difference the same</li>
  <li>Cannot be directly added to input embedding (exception: RoPE). Instead, directly modifies the attention matrix</li>
</ul>

<p><strong>ALiBi</strong></p>
<ul>
  <li>Decay the <code class="language-plaintext highlighter-rouge">q.k</code> dot products in the attention calculation by the difference in positions
    <ul>
      <li>Mask = <code class="language-plaintext highlighter-rouge">[[0, -inf, -inf, -inf], [-1, 0, -inf, -inf], [-2, -1, 0, -inf], [-3, -2, -1, 0]] * m</code> where <code class="language-plaintext highlighter-rouge">m</code> is a ‘magnitude’/’slope’ which is a hyperparameter, and only varies between attention heads</li>
    </ul>
  </li>
  <li>Enables extrapolation beyond training sequence length</li>
  <li>Position information does not affect <code class="language-plaintext highlighter-rouge">v</code></li>
</ul>
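
<p>The mask above can be generated for any sequence length with a couple of lines. A sketch, assuming a single head with slope <code class="language-plaintext highlighter-rouge">m</code> and a causal setup (future positions get <code class="language-plaintext highlighter-rouge">-inf</code>); the function name is made up.</p>

```python
def alibi_bias(seq_len, m):
    """Bias added to the q.k scores before the softmax:
    -m * (i - j) for keys j at or before query i, -inf for future keys."""
    neg_inf = float("-inf")
    return [
        [-m * (i - j) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


bias = alibi_bias(4, 1)
```

<p>Since the bias only depends on <code class="language-plaintext highlighter-rouge">i - j</code>, nothing stops it from being built for a <code class="language-plaintext highlighter-rouge">seq_len</code> larger than anything seen in training.</p>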

<p><strong>Rotary Position Embeddings (RoPE)</strong></p>
<ul>
  <li>Rotate <code class="language-plaintext highlighter-rouge">q</code> by angle x. Rotate <code class="language-plaintext highlighter-rouge">k</code> by angle y. <code class="language-plaintext highlighter-rouge">q.k</code> will only have information about the relative position difference encoded, not the absolute positions</li>
  <li>i.e. <code class="language-plaintext highlighter-rouge">f_q(c_4, 4) = q_4</code>, <code class="language-plaintext highlighter-rouge">f_k(c_1, 1) = k_1</code>, <code class="language-plaintext highlighter-rouge">q_4.k_1 = g(c_4, c_1, 4-1)</code>. Find <code class="language-plaintext highlighter-rouge">f_q, f_k, g</code></li>
  <li><code class="language-plaintext highlighter-rouge">f_q(c_t, t) = R_{theta, t} . w_q . c_t</code> with <code class="language-plaintext highlighter-rouge">R_{theta, t} = [[cos(t*theta), -sin(t*theta)], [sin(t*theta), cos(t*theta)]]</code> where <code class="language-plaintext highlighter-rouge">theta</code> is a hyperparameter</li>
</ul>
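
<p>A quick numerical check of the claim that the rotated dot product only sees the relative offset. This is a 2-dimensional sketch that applies the rotation <code class="language-plaintext highlighter-rouge">R_{theta, t}</code> to an already-projected <code class="language-plaintext highlighter-rouge">q</code> and <code class="language-plaintext highlighter-rouge">k</code>; the vectors and the value of <code class="language-plaintext highlighter-rouge">theta</code> are arbitrary choices of mine.</p>

```python
import math


def rotate(vec, t, theta):
    """Apply the 2-d rotation R_{theta, t} (angle t * theta) to vec = (x, y)."""
    angle = t * theta
    x, y = vec
    return (
        x * math.cos(angle) - y * math.sin(angle),
        x * math.sin(angle) + y * math.cos(angle),
    )


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def rope_score(q, k, t_q, t_k, theta=0.1):
    """q.k after rotating q by its position t_q and k by its position t_k."""
    return dot(rotate(q, t_q, theta), rotate(k, t_k, theta))


q, k = (1.0, 2.0), (0.5, -1.0)
near_start = rope_score(q, k, 4, 1)   # offset of 3
further_on = rope_score(q, k, 10, 7)  # same offset of 3, shifted by 6 positions
```

<p>The two scores agree up to floating point error, while changing the offset (say positions 4 and 2) gives a different value.</p>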

<h4 id="optimized-attention-computation-strategies">Optimized Attention Computation Strategies</h4>
<p>Attention calculation is quadratic in the sequence length. Its practical cost can be reduced by being careful about how the computation is laid out in memory and across devices.</p>

<p><strong>Flash Attention</strong></p>
<ul>
  <li>Rather than writing the result of each intermediate operation back to memory and reading it again for the next one, create a new operation which does all these steps in one go, saving on wasteful memory I/O operations</li>
</ul>
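
<p>One ingredient that makes the "all in one go" operation possible is computing the softmax in a streaming fashion, so the full row of scores never has to be stored. The sketch below shows only that arithmetic for a single query in plain Python (no scaling by the key dimension, no GPU specifics); it is not the actual kernel, and the function names are mine.</p>

```python
import math


def attention_full(q, keys, values):
    """Reference: materialise all scores, softmax, then weight the values."""
    scores = [sum(a * b for a, b in zip(q, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def attention_streaming(q, keys, values, block_size=2):
    """Same result, but keys/values are consumed block by block with running statistics."""
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        block_k = keys[start:start + block_size]
        block_v = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, key)) for key in block_k]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new running maximum
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, block_v):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
full = attention_full(q, keys, values)
streamed = attention_streaming(q, keys, values, block_size=2)
```

<p>Both functions return the same output; the streaming one only ever holds one block of scores at a time, which is the part that saves on memory I/O.</p>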

<p><strong>Ring Attention</strong></p>
<ul>
  <li>Break the attention computation down into chunks, and assign each chunk of the sequence (a subsequence) to its own dedicated GPU</li>
  <li>Forward results to the next GPU; the GPUs are all arranged in a ring</li>
  <li>Eventually, after <code class="language-plaintext highlighter-rouge">n</code> forwards (where <code class="language-plaintext highlighter-rouge">n</code> is the number of GPUs), every GPU’s memory will have the full attention score for its own subsequence</li>
</ul>

<h3 id="resources">Resources</h3>
<ul>
  <li>Dr. Mohit Iyyer’s UMass CS685 S24 <a href="https://www.youtube.com/watch?v=cG3PQX64rKE">Lecture 16</a></li>
</ul>]]></content><author><name>Harsh Seth</name></author><category term="NLP" /><category term="notes" /><summary type="html"><![CDATA[Natural language stores a great portion of its information in the ordering of its constituents. Positional encodings are key to making this information available to the self-attention mechanism]]></summary></entry><entry><title type="html">Evaluating LLM-generated text</title><link href="http://www.harsh-seth.com/llm-evaluation/" rel="alternate" type="text/html" title="Evaluating LLM-generated text" /><published>2024-04-03T00:00:00+00:00</published><updated>2024-04-03T00:00:00+00:00</updated><id>http://www.harsh-seth.com/llm-evaluation</id><content type="html" xml:base="http://www.harsh-seth.com/llm-evaluation/"><![CDATA[<h3 id="key-ideas">Key Ideas</h3>
<ul>
  <li>Human judgement can be learnt by LLMs to serve as replacements for text quality evaluation tasks</li>
  <li>However, this runs into problems of generalization and LLM-specific bias</li>
</ul>

<h3 id="notes">Notes</h3>
<h4 id="fixed-scope-task-evaluation---human-evaluation">Fixed Scope Task Evaluation - Human Evaluation</h4>
<p>Generally indicated by scores based on subjective measures on a 5 point scale</p>
<ul>
  <li>Adequacy: Is the meaning correct?</li>
  <li>Fluency: Is it easy to read?</li>
</ul>

<p>Cons</p>
<ul>
  <li>Subjective!</li>
  <li>Difficult to calibrate</li>
  <li>Expensive and time consuming</li>
</ul>

<h4 id="fixed-scope-task-evaluation---automatic-metrics">Fixed Scope Task Evaluation - Automatic Metrics</h4>
<p><strong>Precision, Recall and F-Scores</strong></p>
<ul>
  <li>Use the precision (<code class="language-plaintext highlighter-rouge">common words in y_cap and y_pred/y_pred length</code>), the recall (<code class="language-plaintext highlighter-rouge">common words in y_cap and y_pred/y_cap length</code>) and F-Score (<code class="language-plaintext highlighter-rouge">precision * recall/((precision + recall)/2)</code>)</li>
</ul>
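
<p>The three formulas above in runnable form. A sketch under my own assumptions: words are split on whitespace, "common words" is counted as a multiset overlap (a repeated word only counts as often as it appears in both strings), <code class="language-plaintext highlighter-rouge">y_pred</code> is the model output and <code class="language-plaintext highlighter-rouge">y_cap</code> is the reference.</p>

```python
from collections import Counter


def precision_recall_f1(y_pred, y_cap):
    """Word-overlap precision, recall and F-score between a prediction and a reference."""
    pred_words, ref_words = y_pred.split(), y_cap.split()
    common = sum((Counter(pred_words) & Counter(ref_words)).values())
    if common == 0:
        return 0.0, 0.0, 0.0
    precision = common / len(pred_words)
    recall = common / len(ref_words)
    f_score = precision * recall / ((precision + recall) / 2)
    return precision, recall, f_score


precision, recall, f_score = precision_recall_f1(
    "the cat sat", "the cat sat on the mat"
)
```

<p>Shuffling the words of the prediction leaves all three numbers unchanged, which is the "does not handle order" problem.</p>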

<p>Pros</p>
<ul>
  <li>Quick, easy and cheap</li>
</ul>

<p>Cons</p>
<ul>
  <li>Does not handle synonyms</li>
  <li>Does not handle order</li>
  <li>We may not always have a reference</li>
  <li>Does not take into account meaning of the constructed sentence</li>
</ul>

<p><strong>Bilingual Evaluation Understudy (BLEU)</strong></p>
<ul>
  <li>Based off n-gram overlap between y_pred and y_cap</li>
  <li>Computes precision for n-grams of size 1 to 4.</li>
  <li>Has a brevity penalty</li>
  <li><code class="language-plaintext highlighter-rouge">BLEU = min(1, output-length/ref-length) * (PROD_{i=1..4} precision_i)^(1/4)</code></li>
  <li>Allows use of multiple references, and we can match against all refs, so recall will not be very useful
    <ul>
      <li>Closest reference length is used</li>
    </ul>
  </li>
</ul>
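
<p>A sketch of the simplified BLEU formula from these notes, for a single reference. It follows the formula as written here rather than any particular library's implementation; the clipping of n-gram counts (an n-gram in the output is credited at most as often as it appears in the reference) and the function names are my additions.</p>

```python
from collections import Counter


def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_precision(pred_words, ref_words, n):
    """Fraction of the prediction's n-grams that also appear in the reference (clipped)."""
    pred_counts = Counter(ngrams(pred_words, n))
    ref_counts = Counter(ngrams(ref_words, n))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    overlap = sum((pred_counts & ref_counts).values())
    return overlap / total


def simple_bleu(y_pred, y_cap, max_n=4):
    """min(1, output-length / ref-length) * (product of n-gram precisions)^(1 / max_n)"""
    pred_words, ref_words = y_pred.split(), y_cap.split()
    brevity = min(1.0, len(pred_words) / len(ref_words))
    product = 1.0
    for n in range(1, max_n + 1):
        product *= ngram_precision(pred_words, ref_words, n)
    return brevity * product ** (1.0 / max_n)


score = simple_bleu("the cat sat on", "the cat sat on the mat")
```

<p>Here every n-gram of the output appears in the reference, so the score is decided entirely by the brevity penalty (4 words against 6).</p>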

<p>Cons</p>
<ul>
  <li>All words/n-grams are treated equally</li>
  <li>Human translations also score lower than machines</li>
  <li>The score on its own does not give any indication of quality, and cannot be compared across different test strings</li>
</ul>

<p><strong>ROUGE</strong></p>
<ul>
  <li>Based off n-gram overlap between y_cap and y_pred</li>
  <li>Computes recall for n-grams of size 1 to 4</li>
  <li>Used for text summarization systems</li>
</ul>

<p>Cons</p>
<ul>
  <li>All words/n-grams are treated equally</li>
  <li>Human translations also score lower than machines</li>
  <li>The absolute score is not meaningful on its own</li>
  <li>The score can be gamed by just repeating the string n times</li>
</ul>

<h4 id="fixed-scope-task-evaluation---learned-metrics">Fixed Scope Task Evaluation - Learned Metrics</h4>
<ul>
  <li>Finetune a model directly on scores from human evaluations to perform evaluation</li>
</ul>

<p><strong>BLEURT</strong></p>
<ul>
  <li>Finetune a pretrained BERT model on synthetic tasks with perturbed data and automatic metrics. Then finetune again with human evaluation metrics.</li>
</ul>

<p><strong>COMET</strong></p>
<ul>
  <li>SOTA LLM-as-a-judge for similarity scores</li>
</ul>

<h4 id="open-ended-task-evaluations---human-evaluations">Open Ended Task Evaluations - Human Evaluations</h4>
<p>Challenges when using humans</p>
<ul>
  <li>Subjective</li>
  <li>Needs experts to evaluate</li>
  <li>Annotators might not do a good job</li>
</ul>

<p><strong>Long Eval</strong></p>
<ol>
  <li>Split long form text into atomic claims</li>
  <li>Get each claim verified for support from the long form text
    <ul>
      <li>(optional) Send only subset of claims to each individual annotator to reduce workload</li>
    </ul>
  </li>
  <li>Calculate the percentage of claims supported by the text</li>
</ol>

<h4 id="open-ended-task-evaluations---llm-evaluations">Open Ended Task Evaluations - LLM Evaluations</h4>
<p><strong>GPTEval</strong></p>
<ul>
  <li>Create a prefix/context for LLM with instructions on how to evaluate and to give a score</li>
  <li>Aggregate scores w/ probabilities to calculate evaluation</li>
</ul>

<p><strong>Win Rate</strong></p>
<ul>
  <li>Ask LLM to select one of two outputs and use that as a score</li>
  <li>One of the two outputs should ideally come from the same base model to ensure fair comparison between two models</li>
  <li>Caveat: The selected annotator model might prefer a specific class of model (responses created by OpenAI models may be preferred by OpenAI models)</li>
</ul>

<p><strong>Decompose, Eval and Aggregate</strong></p>
<ol>
  <li>Use an LLM to break down a text into claims</li>
  <li>Verify each claim with an LLM + an evidence source</li>
  <li>Calculate the retrieval score</li>
</ol>

<h3 id="resources">Resources</h3>
<ul>
  <li>Dr. Mohit Iyyer’s UMass CS685 S24 <a href="https://www.youtube.com/watch?v=Um9gf-U0o1Q">Lecture #15</a></li>
  <li><a href="https://chat.lmsys.org">Chatbot Arena</a></li>
</ul>]]></content><author><name>Harsh Seth</name></author><category term="NLP" /><category term="notes" /><summary type="html"><![CDATA[As LLM capabilities improve, the sophistication of generative text evaluation methods also needs to increase. We look at some of the most common methods used thus far, both human annotated and automated]]></summary></entry><entry><title type="html">AI - Adversarial Search</title><link href="http://www.harsh-seth.com/adversarial-search/" rel="alternate" type="text/html" title="AI - Adversarial Search" /><published>2024-04-01T00:00:00+00:00</published><updated>2024-04-01T00:00:00+00:00</updated><id>http://www.harsh-seth.com/adversarial-search</id><content type="html" xml:base="http://www.harsh-seth.com/adversarial-search/"><![CDATA[<h3 id="in-a-nutshell">In a nutshell</h3>

<h3 id="key-ideas">Key Ideas</h3>

<h3 id="notes">Notes</h3>
<h4 id="topic-1">Topic 1</h4>

<h4 id="misc">Misc</h4>

<h4 id="needs-exploration">Needs Exploration</h4>

<h3 id="resources">Resources</h3>]]></content><author><name>Harsh Seth</name></author><category term="Artificial Intelligence" /><category term="wip" /><category term="notes" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">AI - Constraint Satisfaction Problems</title><link href="http://www.harsh-seth.com/constraint-satisfaction-problems/" rel="alternate" type="text/html" title="AI - Constraint Satisfaction Problems" /><published>2024-04-01T00:00:00+00:00</published><updated>2024-04-01T00:00:00+00:00</updated><id>http://www.harsh-seth.com/constraint-satisfaction-problems</id><content type="html" xml:base="http://www.harsh-seth.com/constraint-satisfaction-problems/"><![CDATA[<h3 id="in-a-nutshell">In a nutshell</h3>

<h3 id="key-ideas">Key Ideas</h3>

<h3 id="notes">Notes</h3>
<h4 id="constraint-satisfaction-problems-csps">Constraint Satisfaction Problems (CSPs)</h4>
<p><strong>Basic Properties</strong></p>
<ul>
  <li>Variables (<code class="language-plaintext highlighter-rouge">X = X_1, X_2, X_3, ..., X_n</code>) are all linear rational values
    <ul>
      <li><code class="language-plaintext highlighter-rouge">X_i</code> belongs to domain <code class="language-plaintext highlighter-rouge">D_i</code></li>
    </ul>
  </li>
  <li>Constraints (<code class="language-plaintext highlighter-rouge">C</code>) are all linear
    <ul>
      <li>Constraints list which variables are involved and how</li>
    </ul>
  </li>
  <li>Effective solvers reduce search space significantly and quickly w/ use of variable dependencies</li>
  <li>Objective: Find a legal assignment of values (<code class="language-plaintext highlighter-rouge">y = y_1, y_2, y_3, ..., y_n</code>) to variables such that all constraints are satisfied
    <ul>
      <li><strong>Complete</strong>: All variables are set</li>
      <li><strong>Consistent</strong>: No constraint is violated</li>
    </ul>
  </li>
  <li>States are partial assignments of the variables</li>
  <li>Can be encoded as a <strong>Constraint Graph</strong> where Nodes are variables, Edges are constraints</li>
</ul>

<p><strong>An example</strong><br />
Variables: <code class="language-plaintext highlighter-rouge">X = {WA, NT, Q, NSW, V, SA, T}</code><br />
Domains: <code class="language-plaintext highlighter-rouge">D_i = {R, G, B}</code><br />
Constraints: If <code class="language-plaintext highlighter-rouge">(X_i, X_j)</code> in edges (<code class="language-plaintext highlighter-rouge">E</code>), then <code class="language-plaintext highlighter-rouge">color(X_i) =/= color(X_j)</code></p>

<p><img src="../../../images/posts/graph-coloring.png" alt="Graph coloring of the territories in Australia, with no adjacent territory sharing the same color" />
<em>Graph coloring of the territories in Australia, with no adjacent territory sharing the same color</em></p>
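
<p>The example above can be written down as data plus two checks, one for <strong>Consistent</strong> and one for <strong>Complete</strong>. The edge list below is my reading of which territories share a border (Tasmania has no neighbours), so treat it as an assumption. The plain backtracking at the end is only there to produce one legal assignment, not to be an efficient solver.</p>

```python
VARIABLES = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
DOMAIN = ["R", "G", "B"]
# Assumed adjacency between territories (T has no neighbours)
EDGES = [
    ("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
    ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"), ("NSW", "V"),
]


def is_consistent(assignment):
    """No constraint is violated among the variables assigned so far."""
    return all(
        assignment[a] != assignment[b]
        for a, b in EDGES
        if a in assignment and b in assignment
    )


def is_complete(assignment):
    """All variables are set."""
    return all(variable in assignment for variable in VARIABLES)


def backtrack(assignment):
    """States are partial assignments; extend one variable at a time."""
    if is_complete(assignment):
        return assignment
    variable = next(v for v in VARIABLES if v not in assignment)
    for color in DOMAIN:
        candidate = {**assignment, variable: color}
        if is_consistent(candidate):
            result = backtrack(candidate)
            if result is not None:
                return result
    return None


solution = backtrack({})
```

<p>Each recursive call works on a partial assignment, which matches the "states are partial assignments" framing above.</p>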

<p><strong>Variations</strong></p>
<ul>
  <li>Variable type
    <ul>
      <li>Discrete
        <ul>
          <li>Generally considered computationally intractable problems</li>
        </ul>
      </li>
      <li>Continuous
        <ul>
          <li>Generally considered easier</li>
          <li>Linear programming problems are solvable in polynomial time</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Domain type
    <ul>
      <li>Finite Domains: e.g. 8-queens</li>
      <li>Infinite Domains: e.g. Job-Shop Scheduling</li>
    </ul>
  </li>
  <li>Constraint type
    <ul>
      <li>Unary: One variable e.g. SA =/= G</li>
      <li>Binary: Two variables e.g. SA =/= WA</li>
      <li>Global (higher order): 3 or more variables e.g. X_1 + X_2 - 4*X_7 &lt;= 15</li>
    </ul>
  </li>
</ul>

<h4 id="backtracking-search-for-cps">Backtracking Search for CSPs</h4>

<h4 id="local-consistency">Local Consistency</h4>

<h4 id="local-search">Local Search</h4>

<h4 id="misc">Misc</h4>

<h4 id="needs-exploration">Needs Exploration</h4>

<h3 id="resources">Resources</h3>]]></content><author><name>Harsh Seth</name></author><category term="Artificial Intelligence" /><category term="wip" /><category term="notes" /><summary type="html"><![CDATA[]]></summary></entry></feed>