https://www.quora.com/Is-Google-Translator-the-best-among-all
Does it work and how?
Whatever technology is used to build Google Translate, the quality of the output depends on several factors.
Language pair or language combination. Why? Because Google does not really generate clever translations out of nothing. It has a database of ready-made translations. Where were they all taken from? From work done by real translators. The more translators are available for a particular language pair, the higher the translation quality can be. English to German or English to French are good examples of better quality in Google Translate. Other language pairs can leave much to be desired, as with English to Russian or English to Turkish.
Subject matter. The quality of a specialized translation also depends on the language pair. The more professional translators work in a language combination, the more professional the translation can be. And the more sophisticated the subject, the lower the chance of a good translation. Google Translate can deliver a good general technical translation, but it will not be professional quality for engineering or scientific texts.
Literary translation. This can also be a problem for Google Translate. Why? Because literary translation is creative work and should offer more than two or three options to choose from. If the text is fiction, the software will just produce the translation it has accumulated so far and will not offer any alternatives. There will be a single translation that reads as technical rather than creative.
Legal translation. This can also be trouble for Google Translate. Many legal terms can have different meanings, which may cause misunderstandings in the final Google translation. So any machine translation is followed by post-editing to make up for the errors Google made.
Prices:
Basic:
Language detection: $20 per million characters*
Text translation (NMT general models): $20 per million characters*
Text translation (PBMT general models): $20 per million characters*
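To make the pricing concrete, here is a small Python sketch of the arithmetic. The $20 per million characters rate comes from the list above; the function name and the sample job size are made up for illustration, and whatever the asterisk footnote says (free tier, minimums) is ignored.

```python
from decimal import Decimal

# Rate from the price list above: $20 per million characters.
USD_PER_MILLION_CHARS = Decimal("20")


def estimate_cost_usd(char_count: int) -> Decimal:
    """Return the estimated cost in USD for processing char_count characters."""
    if char_count < 0:
        raise ValueError("char_count must not be negative")
    return USD_PER_MILLION_CHARS * Decimal(char_count) / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical job: 50,000 dictionary entries of about 200 characters each.
    print(estimate_cost_usd(50_000 * 200))
```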
There are plenty of VA and EV (Vietnamese-English and English-Vietnamese) dictionary apps out there, but they all seem pretty poor.
Even GTrans needs a lot of community contributions.
I have a good old dictionary on carbon copy, so I decided to make my own dictionary. But data entry takes a damn long time, so I tried to find out whether someone had already converted it to a digital version.
And it seems some apps have done this, or at least have data close to my carbon version.
So I decided to use their data but with a different presentation, i.e. offline options. I will update the database when I have time and money, and when it is necessary.
https://semisignal.com/wiktionary-definitions-database/
https://stackoverflow.com/questions/35202766/how-to-embed-wiktionary-for-offline-access-in-android-app
Oh, https://semisignal.com/ is an interesting blog.
Sample web API call:
curl -X GET \
'https://ontario.com/search.html?style=word&content=focus&type=english-vietnamese&t=dict&RequestVerificationToken=rw6mQWJP_j3RCpS5zFzrNdj2I7uSEaUsOSjEHELH-5ux_rQjXZ0wnLrcl4EZTG3EbQ8S0QzwrtsJyBDj1ssWtyyltHeM72er5b98w6tnARprBbOiouEhE9u8Mb99ZYOoqniffUaQGcmTMCIRVc1g0g2%3AFgWcbbhtLXx5gfGqdRazx71WNSyxNalygNDs_H7lvMJjIypTj_obMmjptTEjjebT6JKolYDrTR4KG-d3QPJuv8bzbywh5vZY1iy52ODhF1ftATXSnYEegJHwL2hJ9oAH7HpFmVyyvJZRXx12SDMmZ0QfbTgBLiFK-AJBOcF8P_g1' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'Postman-Token: 04ebf3ca-b952-4119-b4b3-60621d638c69'
The mobile API call may be different; I will try to find out which API the mobile app uses.
English word list
https://github.com/dwyl/english-words.git
It seems we only care about alphabetic words.
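A rough Python sketch of that filter, assuming the word list is a plain text file with one word per line. The function name and the sample lines are mine, not from the repository.

```python
def alpha_words(lines):
    """Yield unique lowercase words made of the letters a-z only."""
    seen = set()
    for line in lines:
        word = line.strip().lower()  # strip() also drops a trailing "\r"
        if word and word.isascii() and word.isalpha() and word not in seen:
            seen.add(word)
            yield word


if __name__ == "__main__":
    sample = ["Focus\r\n", "3-D\n", "focus\n", "e-mail\n", "\n", "zoo\n"]
    print(list(alpha_words(sample)))
```

For the real list, pass an open file instead of the sample, e.g. `alpha_words(open(path, encoding="utf-8"))` where `path` points at a local copy.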
I suggest implementing the online API access, so a small app can be downloaded and used, plus adding a button somewhere that downloads the offline part. Also check the network connection, and if it's not Wi-Fi, warn the user so the mobile data plan will not be abused for downloading a 100 MB dictionary.
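The warning logic itself is tiny. Here is a platform-neutral sketch in Python; the real app would get the connection type from the mobile OS, and the `Connection` enum and function name are hypothetical.

```python
from enum import Enum


class Connection(Enum):
    OFFLINE = "offline"
    WIFI = "wifi"
    MOBILE = "mobile"


def download_warning(connection, size_mb):
    """Return a warning to show before the offline download, or None if none is needed."""
    if connection is Connection.OFFLINE:
        return "No network connection. The offline dictionary cannot be downloaded now."
    if connection is Connection.MOBILE:
        # Ask first so the data plan is not spent without the user knowing.
        return f"You are on mobile data. Download the {size_mb} MB dictionary anyway?"
    return None  # Wi-Fi: download without asking
```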
Clone/crawl DB
Step 1
- sleep between requests to avoid a ban for too many requests
- strip unneeded parts of each page before processing
- the simplest way is to store the HTML in the DB
- create a sample app to experiment and expose conditions.
- the DB should be crawled in full in case the API goes down or blocks further access.
- simple API that responds with HTML
- the simple app may be React Native
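Step 1 could look roughly like the Python sketch below: fetch one page per word, sleep between requests, and store the raw HTML in SQLite. The `BASE_URL`, table name and delay are assumptions. The `content` and `type` query parameters are copied from the sample call above, and I have not checked whether the site answers without the verification token.

```python
import sqlite3
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the real search URL of the site being crawled.
BASE_URL = "https://example.com/search.html"


def fetch_html(word):
    """Download the result page for one word (default fetcher, needs network)."""
    query = urllib.parse.urlencode({"content": word, "type": "english-vietnamese"})
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(words, db_path="dict_raw.db", delay_s=2.0, fetch=fetch_html):
    """Store the raw HTML for each word and return how many pages were fetched."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_page (word TEXT PRIMARY KEY, html TEXT NOT NULL)"
    )
    fetched = 0
    for word in words:
        # Skip words already stored so an interrupted crawl can resume.
        if conn.execute("SELECT 1 FROM raw_page WHERE word = ?", (word,)).fetchone():
            continue
        if fetched:
            time.sleep(delay_s)  # pause before every request after the first
        html = fetch(word)
        conn.execute("INSERT INTO raw_page (word, html) VALUES (?, ?)", (word, html))
        conn.commit()  # commit per page so a crash loses at most one request
        fetched += 1
    conn.close()
    return fetched
```

Passing `fetch` as a parameter keeps the crawler testable without network access.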
Step 2
- process the DOM to get structured translation data
- design the DB; it should be relational (with a schema) if possible, as M. Fowler has suggested. On mobile it may be SQLite.
- allow offline download
- notify the user when on a mobile network (not Wi-Fi)
- refine the app
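A minimal Python sketch of the DOM processing and the relational schema from Step 2. The real page markup is not known here, so the `<li class="meaning">` selector, the table names and the sample HTML are all hypothetical; nested lists inside a meaning are not handled.

```python
import sqlite3
from html.parser import HTMLParser

SCHEMA = """
CREATE TABLE IF NOT EXISTS entry (
    id       INTEGER PRIMARY KEY,
    headword TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS sense (
    id       INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    position INTEGER NOT NULL,
    meaning  TEXT NOT NULL
);
"""


class MeaningParser(HTMLParser):
    """Collect the text of every <li class="meaning"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.meanings = []
        self._buffer = None  # text chunks while inside a meaning <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "meaning") in attrs:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.meanings.append(text)
            self._buffer = None


def store_entry(conn, headword, html):
    """Parse one stored page and write its meanings; return how many were found."""
    parser = MeaningParser()
    parser.feed(html)
    parser.close()
    conn.execute("INSERT OR IGNORE INTO entry (headword) VALUES (?)", (headword,))
    (entry_id,) = conn.execute(
        "SELECT id FROM entry WHERE headword = ?", (headword,)
    ).fetchone()
    # Replace old senses so re-processing a page does not duplicate rows.
    conn.execute("DELETE FROM sense WHERE entry_id = ?", (entry_id,))
    conn.executemany(
        "INSERT INTO sense (entry_id, position, meaning) VALUES (?, ?, ?)",
        [(entry_id, i, m) for i, m in enumerate(parser.meanings, start=1)],
    )
    conn.commit()
    return len(parser.meanings)


def lookup(conn, headword):
    """Return the meanings of a headword in their original order."""
    rows = conn.execute(
        "SELECT s.meaning FROM sense s JOIN entry e ON e.id = s.entry_id "
        "WHERE e.headword = ? ORDER BY s.position",
        (headword,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    page = '<ul><li class="meaning">tiêu điểm</li><li class="meaning">tập trung</li><li>ad</li></ul>'
    store_entry(conn, "focus", page)
    print(lookup(conn, "focus"))
```

The same SQLite file could then be shipped as the offline download for the mobile app.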
Step 3
- Word/PDF or page version of the dict; out of sight, out of mind.
- compression and other refinements
- contribute to GTrans (can it be automated, or can some tool make suggestions for faster contribution?)
https://thedecisionlab.com/co2-out-of-sight-not-out-of-mind-perception-of-carbon-capture-and-storage-risks/
http://www.environmentandsociety.org/sites/default/files/2016_i1.pdf
https://www.theverge.com/transportation/2018/4/19/17204044/tesla-waymo-self-driving-car-data-simulation
https://electrek.co/2016/04/06/tesla-autopilot-comma-ai-geohot-elon-musk/
https://github.com/kiitPK/DictionaryDemo
https://github.com/kodycode/React-Native-Dictionary-App
https://stackoverflow.com/questions/19583956/read-line-and-remove-newline-character-using-shell-script
https://unix.stackexchange.com/questions/1519/how-do-i-delete-a-file-whose-name-begins-with-hyphen-a-k-a-dash-or-minus
Aha, Chinese:
http://www.cantonese.sheik.co.uk/scripts/wordlist.htm
Yeah, this German site has a lot of cool stuff:
https://www.informatik.uni-leipzig.de/~duc/Dict/
http://www.denisowski.org/Vietnamese/vnedict.txt
www.denisowski.org
http://viet.jnlp.org/nghien-cuu-cua-tac-gia/bai-toan-them-dau-cho-tieng-viet
http://box.jnlp.org/arc/12/12IALP-anh.pdf
For JSON handling, remember jq and python3.
https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools
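As a reminder of the python3 side, here is a small sketch that pulls the meanings out of a JSON response. The payload shape (`word`, `senses`, `meaning`) is made up, since I do not know yet what the real API returns.

```python
import json


def extract_meanings(json_text):
    """Return the list of meanings from a (hypothetical) dictionary API response."""
    data = json.loads(json_text)
    return [sense["meaning"] for sense in data.get("senses", [])]


if __name__ == "__main__":
    sample = '{"word": "focus", "senses": [{"meaning": "tiêu điểm"}, {"meaning": "tập trung"}]}'
    for meaning in extract_meanings(sample):
        print(meaning)
```

With jq, the same extraction on that payload would be `jq -r '.senses[].meaning'`.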
Strip a trailing carriage return from a line read in a shell script:
URL=${URL%$'\r'}  # bash
word2=$(printf '%s' "$word" | sed "s/$(printf '\r')//")  # OSX