Extracting and Live-Translating Image Text with Python, OCR, and OpenCV

This post covers extracting text from images and translating it, using a handful of Python modules. It is mainly a record of ideas and problems I ran into in practice, along with some implementation code. The technical bar here is honestly not very high, but if you are just tinkering, it could earn you a decent placing in a casual competition, or pad out a student report…

For lack of time, much of this post serves only as a record of ideas.

Extracting image text with OCR

The Python module used here is pytesseract; for a brief download-and-setup introduction, see: Python - Text Recognition - Tesseract

The code:

import pytesseract
import cv2
image = cv2.imread('/Users/junjieliu/Desktop/1.png')
text = pytesseract.image_to_string(image)
print(text)

Here are the problems I hit along the way:

Problem 1:

Error: [Errno 2] No such file or directory using pytesser

I then consulted: https://stackoverflow.com/questions/35609773/oserror-errno-2-no-such-file-or-directory-using-pytesser

I tried the approaches from the first few answers there, which led to the following error…

Problem 2:

PermissionError: [Errno 13] Permission denied

I then consulted: https://github.com/madmaze/pytesseract/issues/62

but that still did not solve it.

Solution:

On the command line, run:

which tesseract

which revealed its location (apparently a copy was already on the Mac?):

/usr/local/bin/tesseract

After pointing pytesseract at that path (i.e. tesseract_cmd = "/usr/local/bin/tesseract"), the code did run, but that copy is awkward to keep using because it is hard to extend.

To have the freshly installed version shadow the original one on the PATH:

vi ~/.bash_profile

and add:

#tesseract
export PATH="/usr/local/Cellar/tesseract/4.0.0/bin:$PATH"

Apply it immediately:

source ~/.bash_profile

Running which tesseract again now shows the new location. After changing tesseract_cmd = "/usr/local/Cellar/tesseract/4.0.0/bin/tesseract", the program runs successfully and is open to further extension, such as choosing language packs.
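The effect of prepending a directory to PATH can be checked from Python as well. Below is a small sketch using only the standard library; resolve_after_prepend is a helper name of my own, not part of pytesseract:

```python
import os
import shutil

def resolve_after_prepend(name, extra_dir):
    """Resolve `name` the way `which` would after `extra_dir` is prepended to PATH."""
    search = extra_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(name, path=search)

# e.g. resolve_after_prepend("tesseract", "/usr/local/Cellar/tesseract/4.0.0/bin")
# returns the path of the binary that would now win, or None if it is not installed
```

This mirrors what the ~/.bash_profile edit does: whichever copy sits earliest on the PATH wins the lookup.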

A small image-to-text tool

I will not go into the details of the extraction step itself; here is a quick note of what could be built by combining it with other techniques:

Combine it with a PyQt5 GUI: type in the path to an image, and the extracted text appears below.

On top of that, add a crawler to translate the extracted text.

See an earlier article of mine: python3 crawler and GUI: a small dictionary tool based on Youdao Dictionary

With that, the little tool is done. I will leave it here; time is short and the implementation is not hard, so I will not elaborate.

  • For better extraction accuracy, consider the more powerful deep_ocr.
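The tool described above is really just two stages glued together. A minimal sketch of that glue, with the OCR and translation steps injected as plain callables (the names extract_and_translate, ocr, and translate are mine, not from any library):

```python
def extract_and_translate(image_path, ocr, translate):
    """Run the tool's two stages: OCR the image, then translate the text."""
    text = ocr(image_path)
    return text, translate(text)

# In the real tool, `ocr` would wrap pytesseract.image_to_string and
# `translate` would issue the Youdao request; a GUI would call this on
# the path the user types in and display both return values.
```

Keeping the two stages as parameters makes it easy to swap in deep_ocr or a different translation backend later.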

Real-time translation with OpenCV

This idea came from reading the article "Recognizing QR codes and barcodes with OpenCV and Python" and adapting it to my own needs.

Here is my modified version of that code (a few lines added and changed):

from imutils.video import VideoStream
from pyzbar import pyzbar
import datetime
import imutils
import time
import cv2

# initialize the video stream and allow the camera sensor to warm up
# (VideoStream already owns the camera, so no separate cv2.VideoCapture is needed)
vs = VideoStream(src=0).start()
time.sleep(2.0)

# open the output CSV file for writing and initialize the set of
# barcodes found thus far
csv = open("barcodes.csv", "w")
found = set()

# loop over the frames from the video stream
while True:
    # grab the frame from the threaded video stream and resize it to
    # have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)
    # find the barcodes in the frame and decode each of the barcodes
    barcodes = pyzbar.decode(frame)
    # loop over the detected barcodes
    for barcode in barcodes:
        # extract the bounding box location of the barcode and draw
        # the bounding box surrounding the barcode on the image
        (x, y, w, h) = barcode.rect
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
        # the barcode data is a bytes object, so convert it to a string
        # before drawing it on the output image
        barcodeData = barcode.data.decode("utf-8")
        barcodeType = barcode.type
        # draw the barcode data and barcode type on the image
        text = "{} ({})".format(barcodeData, barcodeType)
        cv2.putText(frame, text, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        # if the barcode text is not yet in our CSV file, write the
        # timestamp + barcode to disk and update the set
        if barcodeData not in found:
            csv.write("{},{}\n".format(datetime.datetime.now(), barcodeData))
            csv.flush()
            found.add(barcodeData)
    # show the output frame
    cv2.imshow("Barcode Scanner", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# close the output CSV file and do a bit of cleanup
print("[INFO] cleaning up...")
csv.close()
cv2.destroyAllWindows()
vs.stop()

Performance improved slightly, with a bit less code. The visible behavior is unchanged.
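One detail worth pulling out of the loop above is the write-each-result-once pattern: the found set guarding the CSV writes. A standalone sketch, with log_once being my own name for it:

```python
import datetime

def log_once(stream, seen, value):
    """Write value with a timestamp only the first time it is seen."""
    if value in seen:
        return False
    stream.write("{},{}\n".format(datetime.datetime.now(), value))
    seen.add(value)
    return True
```

At 30 frames per second the same barcode is decoded many times in a row; the set keeps the CSV to one line per distinct result.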

For the real-time translation effect, combine the Youdao crawler above with OpenCV. A few modifications are enough; the implementation is not too hard, and consulting the official docs and other people's work will get you there.
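The only fiddly part of the Youdao request is the salt/sign pair. Below is that scheme factored into a function, as a sketch; the client name and secret constant were lifted from the site's JS at the time of writing and may well have rotated since:

```python
import hashlib
import random
import time

def youdao_sign(word, client='fanyideskweb', secret='ebSeFb%=XZ%T[KZ)c(sy!'):
    """Build the salt and md5 sign that the translate_o endpoint expects."""
    # salt mimics the site's JS: millisecond timestamp plus a small random offset
    salt = str(int(time.time() * 1000 + random.randint(1, 10)))
    sign = hashlib.md5((client + word + salt + secret).encode('utf-8')).hexdigest()
    return salt, sign
```

Both values then go into the POST body alongside the word itself.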

A rough code sample:

# coding:utf-8
from imutils.video import VideoStream
import datetime
import imutils
import time
import cv2
import hashlib
import pytesseract
import requests
import json
import random


def youdao_translate(words):
    """Send recognized text to Youdao's web endpoint and return the translation."""
    r = str(int(time.time() * 1000 + random.randint(1, 10)))  # salt, mimicking the site's JS
    S = 'fanyideskweb'
    D = "ebSeFb%=XZ%T[KZ)c(sy!"  # constant found in the site's full JS
    o = hashlib.md5((S + words + r + D).encode('utf-8')).hexdigest()
    data = {
        'i': words,
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': S,
        'salt': r,
        'sign': o,
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_REALTIME',
        'typoResult': 'false'
    }
    url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    # the cookies must be sent along, otherwise the API returns an error code
    header = {
        'Cookie': 'OUTFOX_SEARCH_USER_ID=432464843@10.168.8.76; _ntes_nnid=25aff2b1480f17471ca1585f6f2f4293,1512024136653; OUTFOX_SEARCH_USER_ID_NCOO=132154936.07902834; JSESSIONID=aaa3TFIg-JJJN4xEog6mw; ___rl__test__cookies=1525691300664',
        'Referer': 'http://fanyi.youdao.com/',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
    }
    response = requests.post(url=url, headers=header, data=data)
    response.encoding = 'utf-8'
    return json.loads(response.text)["translateResult"][0][0]['tgt']


# initialize the video stream and allow the camera sensor to warm up
print("starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)
csv = open("barcodes.csv", "w")
found = set()

while True:
    # grab the frame from the threaded video stream and resize it to
    # have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)
    # OCR the frame first, then translate whatever text was recognized
    words = pytesseract.image_to_string(frame).strip()
    if words:
        translateResult = youdao_translate(words)
        # draw the translation on the frame
        cv2.putText(frame, translateResult, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        # if the text is not yet in our CSV file, write the
        # timestamp + text to disk and update the set
        if translateResult not in found:
            csv.write("{},{}\n".format(datetime.datetime.now(), translateResult))
            csv.flush()
            found.add(translateResult)
    # show the output frame
    cv2.imshow("Translate Discern", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# close the output CSV file and do a bit of cleanup
print("cleaning up...")
csv.close()
cv2.destroyAllWindows()
vs.stop()


Finally

A note of the message printed after installing tesseract, which may come in handy one day:

icu4c is keg-only, which means it was not symlinked into /usr/local,
because macOS provides libicucore.dylib (but nothing else).
If you need to have icu4c first in your PATH run:
echo 'export PATH="/usr/local/opt/icu4c/bin:$PATH"' >> ~/.bash_profile
echo 'export PATH="/usr/local/opt/icu4c/sbin:$PATH"' >> ~/.bash_profile
For compilers to find icu4c you may need to set:
export LDFLAGS="-L/usr/local/opt/icu4c/lib"
export CPPFLAGS="-I/usr/local/opt/icu4c/include"
For pkg-config to find icu4c you may need to set:
export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig"
--------------- End of article ---------------

Author: 刘俊

Last updated: January 2, 2019, 14:01

License: please retain the original link and author when reposting.