EP187: Why is DeepSeek-OCR such a BIG DEAL?

Source: ByteByteGo

👋 Goodbye low test coverage and slow QA cycles (Sponsored)

Bugs sneak out when less than 80% of user flows are tested before shipping. However, getting that kind of coverage (and staying there) is hard and pricey for any team.

QA Wolf’s AI-native solution provides high-volume, high-speed test coverage for web and mobile apps, reducing your organization’s QA cycle to minutes.

The benefit? No more manual E2E testing. No more slow QA cycles. No more bugs reaching production.

With QA Wolf, Drata’s team of engineers achieved 4x more test cases and 86% faster QA cycles.

⭐ Rated 4.8/5 on G2

Schedule a demo to learn more


This week’s system design refresher:

  • 10 Key Data Structures We Use Every Day

  • 🚀 New Launch: Become an AI Engineer | Learn by Doing | Cohort 2!

  • IP Address Cheat Sheet Every Engineer Should Know

  • Which Protocols Run on TCP and UDP

  • Why is DeepSeek-OCR such a BIG DEAL?

  • SPONSOR US


10 Key Data Structures We Use Every Day

Image
  • list: keep your Twitter feeds

  • stack: support undo/redo in a word processor

  • queue: keep printer jobs, or send user actions in-game

  • hash table: caching systems

  • array: math operations

  • heap: task scheduling

  • tree: represent the HTML document, or model AI decisions

  • suffix tree: search for a string in a document

  • graph: for tracking friendship, or path finding

  • r-tree: for finding the nearest neighbor

  • vertex buffer: for sending data to GPU for rendering
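
As a rough illustration of three of the items above, here is a minimal Python sketch: two stacks for undo/redo, a queue for printer jobs, and a heap for task scheduling. The names `UndoRedo`, `drain_print_queue`, and `run_by_priority` are hypothetical and chosen only for this example. Real editors and schedulers are far more involved.

```python
import heapq
from collections import deque


class UndoRedo:
    """Undo/redo built on two stacks (plain Python lists)."""

    def __init__(self, text=""):
        self.text = text
        self._undo = []  # previous states, most recent on top
        self._redo = []  # states that were undone, most recent on top

    def edit(self, new_text):
        self._undo.append(self.text)
        self._redo.clear()  # a fresh edit invalidates the redo history
        self.text = new_text

    def undo(self):
        if self._undo:
            self._redo.append(self.text)
            self.text = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.text)
            self.text = self._redo.pop()


def drain_print_queue(jobs):
    """Process printer jobs first-in, first-out using a queue."""
    queue = deque(jobs)
    done = []
    while queue:
        done.append(queue.popleft())
    return done


def run_by_priority(tasks):
    """Return task names in priority order; a lower number runs first."""
    heap = list(tasks)  # items are (priority, name) tuples
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


editor = UndoRedo()
editor.edit("a")
editor.edit("ab")
editor.undo()
print(editor.text)  # a
print(run_by_priority([(2, "backup"), (1, "email"), (3, "report")]))
```

The stack gives last-in, first-out access, which is exactly the order in which edits should be undone. The queue and the heap differ only in which item comes out next: the oldest one or the most urgent one.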

Over to you: Which additional data structures have we overlooked?


🚀 New Launch: Become an AI Engineer | Learn by Doing | Cohort 2!

After the incredible success of our first cohort (almost 500 people attended), I’m thrilled to announce the launch of Cohort 2 of Become an AI Engineer!

Image

This is not just another course about AI frameworks and tools. Our goal is to help engineers build the foundation and end-to-end skill set needed to thrive as AI engineers.

Here’s what makes this cohort special:

  • Learn by doing: Build real-world AI applications instead of just watching videos.

  • Structured, systematic learning path: Follow a carefully designed curriculum that takes you step by step, from fundamentals to advanced topics.

  • Live feedback and mentorship: Get direct feedback from instructors and peers.

  • Community-driven: Learning alone is hard. Learning with a community is easier!

We are focused on skill building, not just theory or passive learning. Our goal is for every participant to walk away with a strong foundation for building AI systems.

If you missed Cohort 1, now’s your chance to join us for Cohort 2.

Check it out here


IP Address Cheat Sheet Every Engineer Should Know

Image

Which Protocols Run on TCP and UDP

Every message sent over the internet has two layers of communication, one that carries the data (transport) and one that defines what the data means (application). TCP and UDP sit at the transport layer, but they serve completely different purposes.

Image

TCP is connection-oriented. It guarantees delivery, maintains order, and handles retransmission when packets get lost.

  • HTTP runs on TCP. The browser opens a TCP connection, sends an HTTP request, waits for the HTTP response, and closes the connection (or keeps it alive for subsequent requests). Web pages loaded over HTTP/1.1 or HTTP/2 follow this pattern.

  • HTTPS adds TLS over TCP. The TCP connection happens first. Then comes the TLS handshake with public key exchange, session key negotiation, and finally encrypted data transfer.

  • SMTP uses TCP for email. Messages flow from sender to SMTP server to receiver over TCP connections. Email can’t afford to lose data mid-transmission, so TCP’s reliability is essential.

UDP is connectionless. No handshake. No guaranteed delivery. No order preservation. Just fire datagrams into the network and hope they arrive. Sounds chaotic, but it’s fast.

  • HTTP/3 runs over QUIC, which uses UDP. This seems backwards until you realize QUIC reimplements the reliability features of TCP inside UDP, but with better performance. Multiple streams over one connection. Built-in TLS 1.3. Faster connection establishment. The numbered streams in the diagram show parallel data flows that don’t block each other.
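
To make the contrast concrete, here is a minimal Python sketch using the standard `socket` module on the loopback interface. The helper names `tcp_round_trip` and `udp_round_trip` are hypothetical. The TCP version needs a listen/connect/accept sequence before any data moves, while the UDP version just sends a datagram to an address. On loopback the datagram will almost always arrive, but over a real network nothing guarantees that.

```python
import socket

LOOPBACK = "127.0.0.1"


def tcp_round_trip(payload: bytes) -> bytes:
    """Send payload over a TCP connection and return what the server side read."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((LOOPBACK, 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        # connect() performs the TCP handshake before any data is sent
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of stream
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:  # empty read means the peer finished sending
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


def udp_round_trip(payload: bytes, timeout: float = 1.0) -> bytes:
    """Send one UDP datagram and return what the receiver got."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((LOOPBACK, 0))
        receiver.settimeout(timeout)  # UDP gives no delivery guarantee, so do not wait forever
        sender.sendto(payload, receiver.getsockname())  # no handshake, just send
        data, _ = receiver.recvfrom(65535)
        return data


print(tcp_round_trip(b"hello over tcp"))
print(udp_round_trip(b"hello over udp"))
```

TCP is a byte stream, so the reader loops until the sender closes its side. UDP preserves message boundaries, so a single `recvfrom` returns the whole datagram or nothing at all.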

Over to you: What tools do you use to analyze transport layer performance?


Why is DeepSeek-OCR such a BIG DEAL?

Existing LLMs struggle with long inputs because they can only handle a fixed number of tokens, known as the context window, and the cost of standard self-attention grows quadratically with the number of tokens.

DeepSeek-OCR takes a new approach.

Image

Instead of sending long context directly to an LLM, it turns it into an image, compresses that image into visual tokens, and then passes those tokens to the LLM.

Fewer tokens lead to lower computational cost from attention and a larger effective context window. This makes chatbots and document models more capable and efficient.
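
A back-of-the-envelope Python sketch of why fewer tokens matter. It assumes standard self-attention, where every token attends to every other token, so the work scales with the square of the token count. The 10,000-token document and the 10x compression ratio are illustrative assumptions, not measurements, and `attention_savings` is a hypothetical helper.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token interactions in full self-attention."""
    return num_tokens * num_tokens


def attention_savings(text_tokens: int, compression_ratio: float):
    """Return (vision_tokens, reduction factor in attention pairs).

    compression_ratio is an assumed text-to-vision token ratio.
    """
    vision_tokens = max(1, round(text_tokens / compression_ratio))
    reduction = attention_pairs(text_tokens) / attention_pairs(vision_tokens)
    return vision_tokens, reduction


vision_tokens, reduction = attention_savings(10_000, 10)
print(vision_tokens)  # 1000
print(reduction)      # 100.0
```

Under these assumptions, shrinking the input by 10x cuts the attention work by roughly 100x, which is why token compression pays off most on long documents.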

How is DeepSeek-OCR built? The system has two main parts:

  1. Encoder: It processes an image of text, extracts the visual features, and compresses them into a small number of vision tokens.

  2. Decoder: A Mixture of Experts language model that reads those tokens and generates text one token at a time, similar to a standard decoder-only transformer.

When to use it?

DeepSeek-OCR shows that text can be efficiently compressed using visual representations.

It is especially useful for handling very long documents that exceed standard context limits. You can use it for context compression, standard OCR tasks, or deep parsing, such as converting tables and complex layouts into text.

Over to you: What do you think about using visual tokens to handle long-context problems in LLMs? Could this become the next standard for large models?


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing sponsorship@bytebytego.com.