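OCR HTTP API Documentation (systemcmd invocation version)

Basic Information

Base URL: http://ocr.dtns.top/api
Invocation: run wget or curl commands through systemcmd's /systemcmd/run-sync endpoint
Response format: JSON
Character encoding: UTF-8
Default port: 1033 (local server)
Supported languages: eng (English), chi_sim (Simplified Chinese)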

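Invocation Method

All OCR APIs are called by running a wget or curl command through systemcmd's /systemcmd/run-sync endpoint:

GET request template:

wget -qO- --timeout=30 --tries=1 "http://ocr.dtns.top/api/[endpoint path]?[parameters]"

POST request template:

wget -qO- --timeout=30 --tries=1 --post-data="[parameters]" "http://ocr.dtns.top/api/[endpoint path]"

File upload template (wget sends the file as the raw request body; for multipart/form-data uploads use curl -F, as in section 2.1):

wget -qO- --timeout=30 --tries=1 --post-file="[file path]" "http://ocr.dtns.top/api/[endpoint path]"

Calling through systemcmd:

{
  "cmd": "wget -qO- --timeout=30 --tries=1 'http://ocr.dtns.top/api/[endpoint path]?[parameters]'",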
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}
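
The request body above can also be assembled programmatically. The following is a minimal sketch, not part of the OCR server or of systemcmd: the helper name build_get_payload and its timeout_s argument are hypothetical, and it only covers GET endpoints such as /health and /info.

```python
import json
import shlex
from urllib.parse import urlencode

BASE_URL = "http://ocr.dtns.top/api"  # base URL given in this document


def build_get_payload(endpoint, params=None, timeout_s=30):
    """Return a run-sync request body for a GET call to the OCR API.

    Hypothetical helper: it only builds the JSON body, it does not send it.
    """
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    # shlex.quote keeps characters such as '&' and '?' away from the shell.
    cmd = f"wget -qO- --timeout={timeout_s} --tries=1 {shlex.quote(url)}"
    return {
        "cmd": cmd,
        "cwd": "/tmp",
        "ostype": "linux",
        "timeout": timeout_s * 1000,  # run-sync takes milliseconds
    }


if __name__ == "__main__":
    print(json.dumps(build_get_payload("health"), indent=2))
```

Quoting the URL with shlex.quote rather than hand-written quotes avoids breaking the command when a parameter value contains shell metacharacters.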

API Call Examples

1. System Status

1.1 Health Check

Endpoint: /health
Method: GET
Description: Check whether the OCR server is running

systemcmd call example:

{
  "cmd": "wget -qO- --timeout=30 --tries=1 'http://ocr.dtns.top/api/health'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Response example:

{
  "status": "ok",
  "initialized": true,
  "languages": ["eng", "chi_sim"],
  "uptime": 123.45
}

1.2 Server Information

Endpoint: /info
Method: GET
Description: Get detailed server information

systemcmd call example:

{
  "cmd": "wget -qO- --timeout=30 --tries=1 'http://ocr.dtns.top/api/info'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Response example:

{
  "name": "Tesseract OCR Server",
  "version": "1.0.0",
  "tesseractVersion": "4.0.2",
  "languages": ["eng", "chi_sim"],
  "maxFileSize": 10485760,
  "initialized": true
}

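2. OCR Recognition Endpoints

2.1 Single Image Recognition (File Upload)

Endpoint: /ocr
Method: POST
Description: Upload an image file for OCR recognition

Parameters:

image: image file (multipart/form-data)
lang: language (default: eng+chi_sim)
psm: page segmentation mode (1-13, default: 3)
oem: OCR engine mode (0-3, default: 3)
dpi: DPI setting (default: 300)
preserve_interword_spaces: preserve spaces between words (default: 1)
tessedit_char_whitelist: character whitelist
tessedit_char_blacklist: character blacklist

systemcmd call example: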

{
  "cmd": "wget -qO- --timeout=30 --tries=1 --post-file='/tmp/test.png' 'http://ocr.dtns.top/api/ocr?lang=chi_sim&psm=3'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

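systemcmd call using curl (curl -F sends the file as a multipart/form-data image field, which wget --post-file does not do):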

{
  "cmd": "curl -X POST -F 'image=@/tmp/test.png' -F 'lang=chi_sim' 'http://ocr.dtns.top/api/ocr'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Response example:

{
  "success": true,
  "text": "成功识别了示例内容\n这是示例内容\n示例内容\n",
  "confidence": 85,
  "processingTime": "4898ms",
  "language": "chi_sim",
  "details": {
    "blocks": 1,
    "paragraphs": 1,
    "lines": 21,
    "words": 185
  },
  "hocr": "<div class='ocr_page' id='page_1' title='image \"unknown\"; bbox 0 0 1020 2250; ppageno 0; scan_res 70 70'>\n <div class='ocr_carea' id='block_1_1' title=\"bbox 45 42 1020 2227\">\n  <p class='ocr_par' id='par_1_1' lang='chi_sim' title=\"bbox 45 42 1020 2227\">\n   <!-- detailed hOCR structure (abridged) -->\n  </p>\n </div>\n</div>",
  "tsv": "1\t1\t0\t0\t0\t0\t0\t0\t1020\t2250\t-1\t\n2\t1\t1\t0\t0\t0\t45\t42\t975\t2185\t-1\t\n<!-- detailed TSV data (abridged) -->",
  "box": null
}

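OCR API Response Fields

Field	Type	Description	Example
`success`	boolean	Whether recognition succeeded	`true`
`text`	string	Recognized text	`"11:30 回回口国 @ Q 宗 l 匿小..."`
`confidence`	integer	Overall recognition confidence (0-100)	`85`
`processingTime`	string	Processing time	`"4898ms"`
`language`	string	Recognition language code	`"chi_sim"` (Simplified Chinese)
`details`	object	Recognition statistics
`details.blocks`	integer	Number of text blocks	`1`
`details.paragraphs`	integer	Number of paragraphs	`1`
`details.lines`	integer	Number of lines	`21`
`details.words`	integer	Number of words/characters	`185`
`hocr`	string	Structured recognition result in hOCR format (HTML)	`<div class='ocr_page'...>`
`tsv`	string	Structured recognition result in TSV format	`"1\t1\t0\t0\t0\t0\t0\t0\t1020\t2250\t-1\t\n..."`
`box`	null/array	Bounding-box information (currently null)	`null`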

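Notes

Text format: the text field preserves the original line breaks and spaces
Confidence: the confidence field indicates overall recognition accuracy; higher values mean a more reliable result
Language code: the language field uses standard Tesseract language codes
Structured data: the hocr and tsv fields provide detailed layout and position information
Processing time: processingTime covers the total time from image processing to text output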
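
As an illustration of how a caller might consume these fields, here is a minimal sketch. The function name summarize_ocr_response and the min_confidence threshold are hypothetical; the API itself defines no cut-off for a "reliable" result.

```python
import json
import re


def summarize_ocr_response(raw_json, min_confidence=60):
    """Parse an /ocr response body and flag low-confidence results.

    min_confidence is an illustrative threshold, not something the API defines.
    """
    data = json.loads(raw_json)
    if not data.get("success"):
        raise ValueError("OCR request did not succeed")
    # processingTime arrives as a string such as "4898ms".
    match = re.fullmatch(r"(\d+(?:\.\d+)?)ms", data.get("processingTime", ""))
    elapsed_ms = float(match.group(1)) if match else None
    return {
        "lines": data["text"].splitlines(),  # text keeps its line breaks
        "confidence": data["confidence"],
        "reliable": data["confidence"] >= min_confidence,
        "elapsed_ms": elapsed_ms,
    }
```

Splitting text on line breaks relies on the note above that the original line breaks are preserved.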

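2.2 Base64 Image Recognition

Endpoint: /ocr/base64
Method: POST
Description: Run OCR on a Base64-encoded image

Parameters:

image: Base64-encoded image data
lang: recognition language (default: eng+chi_sim)
Other OCR parameters are the same as in 2.1

systemcmd call example: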

{
  "cmd": "wget -qO- --timeout=30 --tries=1 --post-data='image=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg==&lang=eng' 'http://ocr.dtns.top/api/ocr/base64'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}
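
Base64 output can contain '+', '/' and '=', and a form parser normally decodes a bare '+' as a space. The sketch below shows one way to percent-encode the body before handing it to wget. It assumes the server parses the application/x-www-form-urlencoded body that wget --post-data sends; the function name build_base64_command is hypothetical.

```python
import base64
import shlex
from urllib.parse import urlencode


def build_base64_command(image_bytes, lang="eng", mime="image/png"):
    """Build a wget command that posts an image as a Base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{mime};base64,{encoded}"
    # urlencode percent-encodes '+', '/' and '=' so the form parser does not
    # turn '+' into a space and corrupt the image data.
    body = urlencode({"image": data_uri, "lang": lang})
    return (
        "wget -qO- --timeout=30 --tries=1 "
        f"--post-data={shlex.quote(body)} "
        "'http://ocr.dtns.top/api/ocr/base64'"
    )
```

The returned string can be used as the cmd value of a run-sync request. The same encoding concern applies to a lang value such as eng+chi_sim.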

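2.3 URL Image Recognition

Endpoint: /ocr/url
Method: POST
Description: Fetch an image from a URL and run OCR on it

Parameters:

url: image URL
lang: recognition language (default: eng+chi_sim)
Other OCR parameters are the same as in 2.1

systemcmd call example: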

{
  "cmd": "wget -qO- --timeout=30 --tries=1 --post-data='url=https://example.com/image.png&lang=eng+chi_sim' 'http://ocr.dtns.top/api/ocr/url'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

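2.4 Batch Image Recognition

Endpoint: /ocr/batch
Method: POST
Description: Upload multiple images for OCR recognition in one request (at most 10 images)

Parameters:

images: multiple image files (multipart/form-data)
lang: recognition language (default: eng+chi_sim)
Other OCR parameters are the same as in 2.1

systemcmd call example: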

{
  "cmd": "curl -X POST -F 'images=@/tmp/image1.png' -F 'images=@/tmp/image2.png' -F 'lang=chi_sim' 'http://ocr.dtns.top/api/ocr/batch'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 60000
}

Response example:

{
  "success": true,
  "total": 2,
  "results": [
    {
      "filename": "image1.png",
      "success": true,
      "text": "第一张图片的文本",
      "confidence": 92.3,
      "processingTime": "567ms"
    },
    {
      "filename": "image2.png",
      "success": true,
      "text": "第二张图片的文本",
      "confidence": 88.7,
      "processingTime": "623ms"
    }
  ]
}
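
A batch response carries one entry per image, so callers usually need to separate the entries that succeeded from those that did not. The following sketch assumes that a failed entry still carries filename together with "success": false; the document only shows successful entries, so that shape is a guess. The function name split_batch_results is hypothetical.

```python
import json


def split_batch_results(raw_json):
    """Return (ok, failed): ok maps filename to text, failed lists filenames.

    Assumes failed entries keep their "filename" and set "success" to false.
    """
    data = json.loads(raw_json)
    ok, failed = {}, []
    for item in data.get("results", []):
        if item.get("success"):
            ok[item["filename"]] = item["text"]
        else:
            failed.append(item["filename"])
    return ok, failed
```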

3. File Upload Endpoints

3.1 Upload Directory Access

Endpoint: /uploads/[filename]
Method: GET
Description: Access a previously uploaded file

systemcmd call example:

{
  "cmd": "wget -q --timeout=30 --tries=1 -O /tmp/downloaded.png 'http://ocr.dtns.top/api/uploads/test.png'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

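System Command Execution API Reference

systemcmd/run-sync Endpoint

HTTP Method	Route	Description	Request Parameters	Response Format	Status Code
ALL	`/systemcmd/run-sync`	Run a system command synchronously (the result is returned in a single response)	`cmd`: command string `cwd`: working directory `ostype`: operating system type `timeout`: timeout (milliseconds, default 300000)	`{ret: boolean, msg: string, data: {output: string, error: string, exitCode: number, success: boolean}}`	200

Request example: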

{
  "cmd": "wget -qO- --timeout=30 --tries=1 'http://ocr.dtns.top/api/health'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Response example:

{
  "ret": true,
  "msg": "success",
  "data": {
    "output": "{\"status\":\"ok\",\"initialized\":true,\"languages\":[\"eng\",\"chi_sim\"],\"uptime\":123.45}",
    "error": "",
    "exitCode": 0,
    "success": true
  }
}
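
Note that the OCR server's JSON is nested as a string inside data.output, so it has to be decoded a second time. A minimal sketch of unwrapping the response follows; the names unwrap_run_sync and SystemcmdError are hypothetical, and the field layout is taken from the response format shown above.

```python
import json


class SystemcmdError(RuntimeError):
    """Raised when run-sync or the wrapped command reports a failure."""


def unwrap_run_sync(response):
    """Return the OCR server's JSON from an already-decoded run-sync response."""
    if not response.get("ret"):
        raise SystemcmdError(response.get("msg", "run-sync call failed"))
    data = response.get("data") or {}
    if not data.get("success") or data.get("exitCode") != 0:
        raise SystemcmdError(
            f"exit code {data.get('exitCode')}: {data.get('error', '')}"
        )
    # data.output holds the command's stdout, which is itself a JSON document.
    return json.loads(data["output"])
```

Checking ret, data.success and data.exitCode separately distinguishes a failure of the run-sync call from a failure of the wget or curl command it ran.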

Invocation Summary

All OCR APIs are called in one of the following ways:

GET requests (health check, server information):

{
  "cmd": "wget -qO- --timeout=30 --tries=1 'http://ocr.dtns.top/api/[endpoint path]'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

POST requests (Base64 and URL recognition):

{
  "cmd": "wget -qO- --timeout=30 --tries=1 --post-data='[parameters]' 'http://ocr.dtns.top/api/[endpoint path]'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

File upload requests (curl recommended):

{
  "cmd": "curl -X POST -F 'image=@/tmp/test.png' -F 'lang=chi_sim' 'http://ocr.dtns.top/api/ocr'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Batch file upload:

{
  "cmd": "curl -X POST -F 'images=@/tmp/image1.png' -F 'images=@/tmp/image2.png' -F 'lang=eng+chi_sim' 'http://ocr.dtns.top/api/ocr/batch'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 60000
}

Usage Examples

Example 1: Check the OCR server status

{
  "cmd": "wget -qO- --timeout=30 --tries=1 'http://ocr.dtns.top/api/health'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Example 2: Recognize a local image file

{
  "cmd": "curl -X POST -F 'image=@/tmp/document.png' -F 'lang=chi_sim' -F 'psm=6' 'http://ocr.dtns.top/api/ocr'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Example 3: Recognize an image from the web

{
  "cmd": "wget -qO- --timeout=30 --tries=1 --post-data='url=https://example.com/ocr-image.jpg&lang=eng&dpi=300' 'http://ocr.dtns.top/api/ocr/url'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

Example 4: Batch-recognize multiple images

{
  "cmd": "curl -X POST -F 'images=@/tmp/receipt1.jpg' -F 'images=@/tmp/receipt2.jpg' -F 'images=@/tmp/receipt3.jpg' -F 'lang=eng' 'http://ocr.dtns.top/api/ocr/batch'",
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 90000
}

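Example 5: Recognize via Base64 (the post-data is double-quoted so that the shell expands the $(base64 ...) substitution)

{
  "cmd": "wget -qO- --timeout=30 --tries=1 --post-data=\"image=$(base64 -w0 /tmp/small.png)&lang=eng+chi_sim\" 'http://ocr.dtns.top/api/ocr/base64'",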
  "cwd": "/tmp",
  "ostype": "linux",
  "timeout": 30000
}

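Advanced Parameter Configuration

Tesseract Parameters

Page segmentation mode (psm):
- 0: Orientation and script detection (OSD) only
- 1: Automatic page segmentation with OSD
- 2: Automatic page segmentation, no OSD
- 3: Fully automatic page segmentation, no OSD (default)
- 4: Single column of text of variable sizes
- 5: Single uniform block of vertically aligned text
- 6: Single uniform block of text
- 7: Single text line
- 8: Single word
- 9: Single word in a circle
- 10: Single character
- 11: Sparse text
- 12: Sparse text with OSD
- 13: Raw line
OCR engine mode (oem):
- 0: Legacy engine only
- 1: Neural network LSTM engine only
- 2: Legacy + LSTM engines
- 3: Default (based on what is available)
Language codes:
- eng: English
- chi_sim: Simplified Chinese
- chi_tra: Traditional Chinese
- jpn: Japanese
- kor: Korean
- Multiple languages can be combined: eng+chi_sim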
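
The value ranges listed above can be checked on the client before a request is sent. This is a minimal sketch with a hypothetical function name, validate_ocr_params. It accepts psm values 0-13 as listed here, whereas section 2.1 states 1-13, so the server may reject 0. The language set is limited to the codes named in this document.

```python
KNOWN_LANGS = {"eng", "chi_sim", "chi_tra", "jpn", "kor"}  # codes listed in this document


def validate_ocr_params(lang="eng+chi_sim", psm=3, oem=3):
    """Check lang, psm and oem, and return them as form fields (strings)."""
    unknown = [code for code in lang.split("+") if code not in KNOWN_LANGS]
    if unknown:
        raise ValueError(f"unknown language code(s): {', '.join(unknown)}")
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    return {"lang": lang, "psm": str(psm), "oem": str(oem)}
```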

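Notes

File size limit: 10 MB maximum by default, adjustable via the maxFileSize parameter
Supported formats: JPEG, JPG, PNG, GIF, BMP, TIFF, WEBP
Timeouts: set an appropriate timeout (30 seconds for a single image, 60-90 seconds for a batch)
Language models: the first use of a language may require downloading its model, which adds latency
Error handling: check the success field and the error message in the response
Working directory: /tmp is recommended; make sure it is writable
Network access: make sure the server can reach external URLs (needed for URL recognition)
Memory usage: watch memory consumption during batch processing; splitting the work into smaller batches is recommended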
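
The size, format and batching limits above can be applied before any upload is attempted. The sketch below is one possible approach with a hypothetical function name, plan_batches. It takes (path, size) pairs so that it stays independent of the file system, and uses the 10-image batch limit from section 2.4.

```python
import os

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, the default limit noted above
MAX_BATCH_SIZE = 10               # /ocr/batch accepts at most 10 images
ALLOWED_EXT = {".jpeg", ".jpg", ".png", ".gif", ".bmp", ".tiff", ".webp"}


def plan_batches(files, batch_size=MAX_BATCH_SIZE):
    """Split (path, size_in_bytes) pairs into upload batches.

    Returns (batches, rejected): batches is a list of path lists, rejected
    lists paths that are too large or have an unsupported extension.
    """
    accepted, rejected = [], []
    for path, size in files:
        ext = os.path.splitext(path)[1].lower()
        if ext in ALLOWED_EXT and size <= MAX_FILE_SIZE:
            accepted.append(path)
        else:
            rejected.append(path)
    batches = [
        accepted[i:i + batch_size] for i in range(0, len(accepted), batch_size)
    ]
    return batches, rejected
```

Each returned batch can then be turned into one curl command with a repeated -F 'images=@...' argument per file.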

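Error Codes

Status Code	Error Message	Description
400	请上传图片文件	No image file was provided
400	只支持图片文件 (jpeg, jpg, png, gif, bmp, tiff, webp)	Unsupported file format
400	请提供base64编码的图片	The Base64 endpoint is missing its image parameter
400	请提供图片URL	The URL endpoint is missing its url parameter
503	服务器正在初始化，请稍后重试	The server has not finished initializing; retry later
500	服务器内部错误	Internal server error during processing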

Metadata

{
  "sourceFile": "server.js",
  "integrationWith": "systemcmd-skill.md",
  "apiEndpointCount": 6,
  "callMethod": "wget/curl via systemcmd/run-sync",
  "supportedLanguages": ["eng", "chi_sim"],
  "maxFileSize": "10MB",
  "maxBatchSize": 10,
  "hasHttpApis": true,
  "ocrEngine": "Tesseract.js v4.0.2",
  "serverPort": 1033
}

This document is adapted from the OCR server code to fit the systemcmd invocation method.
