Gemma C++ Tutorial (gemma.cpp)

gemma.cpp is a lightweight, pure C++ inference runtime implementation of Gemma models.

For more information about Gemma, see the model card. Model weights, including gemma.cpp-specific artifacts, are available on Kaggle.

Who is this project for?

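Modern LLM inference engines are sophisticated systems, often with bespoke capabilities that extend beyond traditional neural network runtimes. This creates opportunities for research and innovation through the co-design of high-level algorithms and low-level computation. However, there is a gap between deployment-oriented C++ inference runtimes, which are not designed for experimentation, and Python-centric ML research frameworks, which abstract away low-level computation through compilation.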

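gemma.cpp provides a minimalist implementation of the Gemma 2B and 7B models, favoring simplicity and directness over full generality. It is inspired by vertically integrated C++ model implementations such as ggml, llama.c, and llama.rs.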

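gemma.cpp targets experimentation and research use cases, in particular exploring the design space of CPU inference and inference algorithms using portable SIMD via the Google Highway library. It is intended to be easy to embed in other projects with minimal dependencies, and easy to modify thanks to a small core implementation of about 2K lines of code (plus about 4K lines of supporting utilities).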

For production-oriented edge deployments, we recommend the standard deployment pathways using mature Python frameworks such as JAX, Keras, PyTorch, and Transformers (all model variations).

Community contributions large and small are welcome. This project follows Google's Open Source Community Guidelines.

Quickstart

To complete this quickstart, you must clone or download gemma.cpp.

System requirements

Before starting, install the following:

CMake
Clang C++ compiler
tar, for extracting archives from Kaggle

Step 1: Obtain the model weights and tokenizer from Kaggle

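Visit the Gemma model page on Kaggle and select "Model Variations > Gemma C++". On this tab, the "Variation" drop-down menu includes the options below. Note that the bfloat16 weights are higher fidelity, while the 8-bit switched floating point weights enable faster inference.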

2B instruction-tuned (it) and pre-trained (pt) models:

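Model name	Description
`2b-it`	2 billion parameter instruction-tuned model, bfloat16
`2b-it-sfp`	2 billion parameter instruction-tuned model, 8-bit switched floating point
`2b-pt`	2 billion parameter pre-trained model, bfloat16
`2b-pt-sfp`	2 billion parameter pre-trained model, 8-bit switched floating point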

7B instruction-tuned (it) and pre-trained (pt) models:

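Model name	Description
`7b-it`	7 billion parameter instruction-tuned model, bfloat16
`7b-it-sfp`	7 billion parameter instruction-tuned model, 8-bit switched floating point
`7b-pt`	7 billion parameter pre-trained model, bfloat16
`7b-pt-sfp`	7 billion parameter pre-trained model, 8-bit switched floating point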

Note: We recommend starting with 2b-it-sfp to get up and running.
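The precision trade-off in the tables comes down to bits per weight. bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 float32, so it is essentially the upper half of a float32 and takes 2 bytes per weight; the 8-bit switched floating point variants take 1 byte per weight. The minimal C++ sketch below illustrates float32-to-bfloat16 conversion (round to nearest even, NaN handling omitted) and a rough size estimate. The helper names are hypothetical and not part of gemma.cpp; the sketch does not model the switched floating point encoding itself, only its one-byte size, and the estimate uses the nominal parameter counts while ignoring file overhead and any tensors stored at other precisions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: float32 -> bfloat16 bits, keeping the upper 16 bits
// with round-to-nearest-even. NaN inputs are not handled in this sketch.
uint16_t FloatToBF16Bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // Bias of 0x7FFF, plus 1 if the lowest kept bit is set, gives ties-to-even.
  const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Hypothetical helper: bfloat16 bits -> float32 (this direction is exact).
float BF16BitsToFloat(uint16_t h) {
  const uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Rough weight storage in GB (1e9 bytes): parameters x bytes per parameter.
// Example: 2B parameters at 2 bytes (bfloat16) vs. 1 byte (8-bit SFP).
double ApproxWeightGB(double params_billions, double bytes_per_param) {
  return params_billions * bytes_per_param;
}
```

For a nominal 2B model this gives about 4 GB of weights in bfloat16 versus about 2 GB at one byte per weight, which is why the sfp files are roughly half the size.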

Step 2: Extract the files

After you fill out the consent form, the download should proceed and retrieve a tar archive, archive.tar.gz. Extract the files from archive.tar.gz (this can take a few minutes):

tar -xf archive.tar.gz

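This should produce a model weights file (such as 2b-it-sfp.sbs) and a tokenizer file (tokenizer.spm). You may want to move these files to a convenient directory, such as the build/ directory in this repository.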
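Because the run step uses relative file names from inside build/, it can help to confirm that the extracted files actually landed there. The C++17 sketch below reports which expected files are missing from a directory. `MissingFiles` is a hypothetical helper, not part of gemma.cpp, and the file names are the ones from the example above; substitute the variant you downloaded.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the names in `names` that are not regular
// files inside `dir` (for example build/).
std::vector<std::string> MissingFiles(const fs::path& dir,
                                      const std::vector<std::string>& names) {
  std::vector<std::string> missing;
  for (const std::string& name : names) {
    std::error_code ec;  // non-throwing overload; errors count as "missing"
    if (!fs::is_regular_file(dir / name, ec)) missing.push_back(name);
  }
  return missing;
}
```

For example, `MissingFiles("build", {"tokenizer.spm", "2b-it-sfp.sbs"})` returns an empty vector once both files are in place.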

Step 3: Build

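The build system uses CMake. To build the gemma inference runtime, create a build directory and generate the build files using cmake from the top-level project directory:

mkdir -p build
(cd build && cmake ..)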

Then run make to build the ./gemma executable:

cd build
make -j [number of parallel threads to use] gemma

For example: make -j 8 gemma. If this succeeds, you should now have a gemma executable in the build/ directory.

Step 4: Run

You can now run gemma from inside the build/ directory.

gemma has the following required arguments:

Argument	Description	Example value
`--model`	The model type.	`2b-it`, `2b-pt`, `7b-it`, `7b-pt`, ... (see above)
`--compressed_weights`	The compressed weights file.	`2b-it-sfp.sbs`, ... (see above)
`--tokenizer`	The tokenizer file name.	`tokenizer.spm`

gemma is invoked as follows:

./gemma \
--tokenizer [tokenizer file] \
--compressed_weights [compressed weights file] \
--model [2b-it or 2b-pt or 7b-it or 7b-pt]

Example invocation for the following configuration:

Compressed weights file 2b-it-sfp.sbs (2B instruction-tuned model, 8-bit switched floating point).
Tokenizer file tokenizer.spm.

./gemma \
--tokenizer tokenizer.spm \
--compressed_weights 2b-it-sfp.sbs \
--model 2b-it
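The --model value has to describe the same variant as the weights file, so a mismatched pair (for example 7b-it with 2B weights) will not work as intended. The C++ sketch below assembles the three required flags and checks the pairing. Both helpers are hypothetical, and the naming rule (weights named `<model>.sbs` or `<model>-sfp.sbs`) is an assumption inferred from the 2b-it-sfp.sbs / 2b-it example, not a documented convention.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical check, assuming weights are named "<model>.sbs" or
// "<model>-sfp.sbs" (e.g. 2b-it-sfp.sbs pairs with --model 2b-it).
bool ModelMatchesWeights(const std::string& model,
                         const std::string& weights_path) {
  // Compare only the file name so full paths (as in an alias) also work.
  const std::string name =
      std::filesystem::path(weights_path).filename().string();
  return name == model + ".sbs" || name == model + "-sfp.sbs";
}

// Hypothetical helper: builds the argument list for the required flags.
std::vector<std::string> GemmaCommand(const std::string& tokenizer,
                                      const std::string& weights,
                                      const std::string& model) {
  return {"./gemma",
          "--tokenizer", tokenizer,
          "--compressed_weights", weights,
          "--model", model};
}
```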

Usage

gemma has different usage modes, controlled by the verbosity flag.

All usage modes are interactive: text generation is triggered whenever a newline is entered.

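Verbosity	Usage mode	Details
`--verbosity 0`	Minimal	Prints only the generated output. Suitable as a CLI tool.
`--verbosity 1`	Default	Standard user-facing terminal UI.
`--verbosity 2`	Detailed	Shows additional developer and debug information.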
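Since every mode treats each newline as the end of a prompt, the interaction is line-at-a-time. The C++ sketch below illustrates that pattern, including the %Q quit command shown in the terminal banner. It is an illustration only, not gemma.cpp's actual loop; the `generate` callback is a hypothetical stand-in for model inference.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads input one line at a time; each newline submits one prompt.
// Stops at end of input or on "%Q". Returns the number of prompts handled.
int RunPromptLoop(std::istream& in,
                  const std::function<void(const std::string&)>& generate) {
  int handled = 0;
  std::string line;
  while (std::getline(in, line)) {
    if (line == "%Q") break;  // quit command from the banner
    generate(line);           // hypothetical stand-in for generation
    ++handled;
  }
  return handled;
}

// Convenience wrapper: runs the loop over a string and records the prompts.
std::vector<std::string> CollectPrompts(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> prompts;
  RunPromptLoop(in, [&](const std::string& p) { prompts.push_back(p); });
  return prompts;
}
```

One consequence of this design is that multi-line input is split into several separate prompts, so input meant as a single prompt has to be flattened onto one line first.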

Interactive terminal app

By default, verbosity is set to 1, which brings up a terminal-based interactive interface when gemma is invoked with the required arguments:

$ ./gemma [...]
  __ _  ___ _ __ ___  _ __ ___   __ _   ___ _ __  _ __
 / _` |/ _ \ '_ ` _ \| '_ ` _ \ / _` | / __| '_ \| '_ \
| (_| |  __/ | | | | | | | | | | (_| || (__| |_) | |_) |
 \__, |\___|_| |_| |_|_| |_| |_|\__,_(_)___| .__/| .__/
  __/ |                                    | |   | |
 |___/                                     |_|   |_|

tokenizer                     : tokenizer.spm
compressed_weights            : 2b-it-sfp.sbs
model                         : 2b-it
weights                       : [no path specified]
max_tokens                    : 3072
max_generated_tokens          : 2048

*Usage*
  Enter an instruction and press enter (%Q quits).

*Examples*

-   Write an email to grandma thanking her for the cookies.
-   What are some historical attractions to visit around Massachusetts?
-   Compute the nth fibonacci number in javascript.
-   Write a standup comedy bit about WebGPU programming.

> What are some outdoorsy places to visit around Boston?

[ Reading prompt ] .....................

**Boston Harbor and Islands:**

*   **Boston Harbor Islands National and State Park:** Explore pristine beaches, wildlife, and maritime history.
*   **Charles River Esplanade:** Enjoy scenic views of the harbor and city skyline.
*   **Boston Harbor Cruise Company:** Take a relaxing harbor cruise and admire the city from a different perspective.
*   **Seaport Village:** Visit a charming waterfront area with shops, restaurants, and a seaport museum.

**Forest and Nature:**

*   **Forest Park:** Hike through a scenic forest with diverse wildlife.
*   **Quabbin Reservoir:** Enjoy boating, fishing, and hiking in a scenic setting.
*   **Mount Forest:** Explore a mountain with breathtaking views of the city and surrounding landscape.

...

Usage as a command-line tool

To use the gemma executable as a command-line tool, it can be useful to create an alias for gemma.cpp with all arguments fully specified:

alias gemma2b="~/gemma.cpp/build/gemma -- --tokenizer ~/gemma.cpp/build/tokenizer.spm --compressed_weights ~/gemma.cpp/build/2b-it-sfp.sbs --model 2b-it --verbosity 0"

Replace the paths above with the paths to the model weights and tokenizer you downloaded.

Here is an example of prompting gemma with truncated input (using the gemma2b alias defined above):

cat configs.h | tail -35 | tr '\n' ' ' | xargs -0 echo "What does this C++ code do: " | gemma2b

Note: CLI usage of gemma.cpp is experimental and should take context length limitations into account.
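The shell pipeline keeps only the last 35 lines of configs.h (keeping the input short, in line with the note on context length), replaces every newline with a space so the whole file becomes a single prompt line, and prepends the question; echo's final newline then submits it. The C++ sketch below mirrors those steps to make the transformation explicit. `BuildSingleLinePrompt` is a hypothetical helper, and it assumes the input file ends with a newline (so every line contributes a trailing space, as tr would produce).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <sstream>
#include <string>

// Sketch of: tail -<last_n> | tr '\n' ' ' | xargs -0 echo "<prefix>"
// Returns the single line gemma would read, without echo's final newline.
std::string BuildSingleLinePrompt(const std::string& file_contents,
                                  std::size_t last_n,
                                  const std::string& prefix) {
  std::istringstream in(file_contents);
  std::deque<std::string> tail;  // keeps only the last `last_n` lines
  std::string line;
  while (std::getline(in, line)) {
    tail.push_back(line);
    if (tail.size() > last_n) tail.pop_front();
  }
  std::string joined;
  for (const std::string& l : tail) joined += l + ' ';  // tr '\n' ' '
  // echo separates its two arguments (prefix, joined text) with one space.
  return prefix + ' ' + joined;
}
```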

The output of the command above should look like this:

$ cat configs.h | tail -35 | tr '\n' ' ' | xargs -0 echo "What does this C++ code do: " | gemma2b
[ Reading prompt ] ......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
The code defines two C++ structs, `ConfigGemma7B` and `ConfigGemma2B`, which are used for configuring a deep learning model.

**ConfigGemma7B**:

*   `seq_len`: Stores the length of the sequence to be processed. It's set to 7168.
*   `vocab_size`: Stores the size of the vocabulary, which is 256128.
*   `n_layers`: Number of layers in the deep learning model. It's set to 28.
*   `dim_model`: Dimension of the model's internal representation. It's set to 3072.
*   `dim_ffw_hidden`: Dimension of the feedforward and recurrent layers' hidden representations. It's set to 16 * 3072 / 2.

**ConfigGemma2B**:

*   `seq_len`: Stores the length of the sequence to be processed. It's also set to 7168.
*   `vocab_size`: Size of the vocabulary, which is 256128.
*   `n_layers`: Number of layers in the deep learning model. It's set to 18.
*   `dim_model`: Dimension of the model's internal representation. It's set to 2048.
*   `dim_ffw_hidden`: Dimension of the feedforward and recurrent layers' hidden representations. It's set to 16 * 2048 / 2.

These structs are used to configure a deep learning model with specific parameters for either Gemma7B or Gemma2B architecture.
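The sample response gives the feedforward hidden dimension as a formula rather than a number. The C++ sketch below evaluates it at compile time. The struct is hypothetical and the values are copied from the model-generated response above, which may not match the real definitions; check configs.h in your checkout for those.

```cpp
#include <cassert>

// Hypothetical struct holding values as reported in the sample response
// (model-generated; not the actual configs.h definitions).
struct ReportedConfig {
  int n_layers;
  int dim_model;
  // The response describes the hidden width as 16 * dim_model / 2.
  constexpr int dim_ffw_hidden() const { return 16 * dim_model / 2; }
};

constexpr ReportedConfig kReported7B{28, 3072};
constexpr ReportedConfig kReported2B{18, 2048};

static_assert(kReported7B.dim_ffw_hidden() == 24576, "16 * 3072 / 2");
static_assert(kReported2B.dim_ffw_hidden() == 16384, "16 * 2048 / 2");
```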