
Dictionary / Translator app design

Does it work and how?

Whatever technology is used to develop Google Translate, the quality of the output depends on several factors.

    Language pair (language combination). Why? Because Google does not generate tricky, fancy translations out of nothing. It relies on a database of ready-made translations. Where were they all taken from? From work done by real translators. So the more translators there are for a particular language pair, the higher the quality of the translation can be. English to German and English to French are good examples of better quality in Google Translate. Other languages can leave much to be desired, as with English to Russian or English to Turkish.
    Subject matter. The quality of a specific translation also depends on the subject. The more professional translators have worked on that subject in a given language combination, the more professional the translation can be. So the more specialized the subject, the lower the chance of a good-quality translation. Even though Google Translate can deliver a good technical translation, it will not be professional-grade for engineering or scientific translation.
    Literature translation. This can also be a problem for Google Translate. Why? Because literary translation is creative, and there should be more than two or three options to choose from. If the text is a fiction story, the software will just generate the translation it has accumulated so far and will not offer you any alternatives. There will be just one translation, and it will read more technical than creative.
    Legal translation. This can also be a trouble spot for Google Translate. Many legal terms have different meanings depending on the jurisdiction, which may cause misunderstandings in the final Google translation. That is why any machine translation should be followed by post-editing to correct the errors Google makes.

Google Translate API pricing:
Language detection                        $20 per million characters*
Text translation (NMT general models)     $20 per million characters*
Text translation (PBMT general models)    $20 per million characters*
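As a quick sanity check on what the API route would cost, the listed rate turns into a one-line calculation. A minimal Python sketch; it assumes every character sent is billed at the flat $20 per million shown above (the footnote behind the `*` is not reproduced here), and the 100-million-character input is a hypothetical figure for a roughly 100 MB, mostly ASCII dictionary dump.

```python
# Rough cost estimate at the listed rate of $20 per million characters.
# Assumption: every character sent is billed; check the current pricing
# footnote before relying on this number.

PRICE_PER_MILLION_CHARS = 20.0  # USD, from the pricing list above


def estimate_cost(num_chars: int, price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Return the estimated cost in USD for sending num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical input: ~100 MB of mostly ASCII text, i.e. ~100 million characters.
    print(f"${estimate_cost(100_000_000):,.2f}")  # $2,000.00
```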

There are plenty of VA and EV (Vietnamese-English and English-Vietnamese) dictionary apps out there, but they all seem shitty.
Even GTrans still needs a lot of community contributions.
I have a good old dictionary on paper, so I decided to make my own dictionary. But data entry takes a damn long time, so I tried to find out whether someone had already converted it to a digital version.
And it seems some apps have done this, or at least have something close to my paper version.
So I decided to use their data but with a different presentation, i.e. offline options. I will update the database when I have the time and money, and when it is necessary.

Oh, interesting blog.
(otta: I suggest implementing online API access first, so a small app can be downloaded and used right away, plus adding a button somewhere that downloads the offline part. Also check the network connection, and if it's not Wi-Fi, warn the user so their mobile data plan is not abused by downloading a 100 MB dictionary.)
Sample web API call:
curl -X GET \
  '' \
  -H 'Content-Type: application/x-www-form-urlencoded'

The mobile app's API call may be different; I will try to find out which API the mobile app uses.
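The suggestion to ship the small online app first and only pull the offline part on request comes down to a small decision made before starting the download. A sketch in Python, assuming a hypothetical `Connection` value supplied by the platform (in a React Native app it would come from whatever network-info module is used) and an arbitrary 10 MB warning threshold:

```python
from enum import Enum


class Connection(Enum):
    """Hypothetical connection types; a real app gets these from the platform."""
    NONE = "none"
    WIFI = "wifi"
    CELLULAR = "cellular"


# Assumed threshold: ask before downloading more than 10 MB on mobile data.
WARN_THRESHOLD_BYTES = 10 * 1024 * 1024


def download_decision(connection: Connection, size_bytes: int) -> str:
    """Return 'offline' (no network), 'warn' (ask the user first) or 'download'."""
    if connection is Connection.NONE:
        return "offline"  # nothing to download; keep using local data
    if connection is Connection.CELLULAR and size_bytes > WARN_THRESHOLD_BYTES:
        return "warn"  # e.g. the ~100 MB offline dictionary on mobile data
    return "download"
```

With these assumptions, `download_decision(Connection.CELLULAR, 100 * 1024 * 1024)` returns `"warn"`, while the same size on `Connection.WIFI` returns `"download"`.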

English word list

It seems we only care about alphabetic words.
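Keeping only alphabetic words from a word list is a short filter. A minimal Python sketch, assuming a hypothetical one-word-per-line list and taking "alpha" to mean ASCII letters only, so hyphenated words, apostrophes and digits are dropped; lower-casing and de-duplication are also assumptions:

```python
import re
from typing import Iterable, List

# Assumption: an "alpha word" is one or more ASCII letters and nothing else.
ALPHA_WORD = re.compile(r"[A-Za-z]+")


def alpha_words(lines: Iterable[str]) -> List[str]:
    """Return unique, lower-cased alphabetic words in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        word = line.strip()  # also removes a trailing '\r' from CRLF files
        if ALPHA_WORD.fullmatch(word):
            lower = word.lower()
            if lower not in seen:
                seen.add(lower)
                result.append(lower)
    return result


if __name__ == "__main__":
    sample = ["apple\r\n", "Apple\n", "e-mail\n", "don't\n", "x2\n", "zebra\n"]
    print(alpha_words(sample))  # ['apple', 'zebra']
```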

Clone/crawl DB
Step 1
- sleep between requests to avoid getting banned for sending too many
- split the work into many parts for processing
- the simplest way is to store the raw HTML in the DB
- create a sample app to experiment with and to expose the conditions
- the DB should be crawled in full in case the API goes down or blocks further access
- a simple API that responds with HTML
- the simple app may be built with React Native
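The Step 1 crawl (sleep between requests, store the raw HTML, be able to resume) could look roughly like this in Python. It is a sketch under assumptions: `BASE_URL`, the `raw_page` table and the 2-second delay are hypothetical, and `fetch` is a parameter so the crawler can be tested without touching the network.

```python
import sqlite3
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable

# Hypothetical lookup URL: the real endpoint is not given in these notes.
BASE_URL = "https://dictionary.example.com/lookup?word="
DELAY_SECONDS = 2.0  # assumed politeness delay between requests


def fetch_html(word: str) -> str:
    """Download the HTML page for one word (hypothetical endpoint)."""
    url = BASE_URL + urllib.parse.quote(word)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(words: Iterable[str], db: sqlite3.Connection,
          fetch: Callable[[str], str] = fetch_html,
          delay: float = DELAY_SECONDS) -> int:
    """Store raw HTML per word, skipping words already stored; return how many were fetched."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS raw_page (word TEXT PRIMARY KEY, html TEXT NOT NULL)"
    )
    fetched = 0
    for word in words:
        if db.execute("SELECT 1 FROM raw_page WHERE word = ?", (word,)).fetchone():
            continue  # already stored, so a crawl can resume after a ban or crash
        html = fetch(word)
        db.execute("INSERT INTO raw_page (word, html) VALUES (?, ?)", (word, html))
        db.commit()  # commit per page so progress survives interruptions
        fetched += 1
        time.sleep(delay)  # sleep to avoid being banned for too many requests
    return fetched
```

Because words already in `raw_page` are skipped, the word list can be split into parts and crawled in separate runs, e.g. `crawl(batch, sqlite3.connect("dict_raw.sqlite3"))` per batch.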
Step 2
- process the DOM to extract structured translation data
- design the DB; it should be relational (with a schema) if possible, as M. Fowler has suggested. On mobile it may be SQLite.
- Allow offline download
- Notify the user when on a mobile network (not Wi-Fi)
- refine app
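For Step 2, the stored HTML has to be turned into structured rows in a relational schema (SQLite on mobile). A Python sketch using the standard-library `html.parser` and `sqlite3`; the `pos`/`meaning` CSS classes and the two-table schema are hypothetical placeholders, since the real page markup is not described in these notes:

```python
import sqlite3
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical relational schema: one row per headword, one row per meaning.
SCHEMA = """
CREATE TABLE IF NOT EXISTS word (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS meaning (
    id      INTEGER PRIMARY KEY,
    word_id INTEGER NOT NULL REFERENCES word(id),
    pos     TEXT,          -- part of speech, e.g. 'noun'
    gloss   TEXT NOT NULL  -- the translated meaning
);
"""


class EntryParser(HTMLParser):
    """Collect (part_of_speech, meaning) pairs from hypothetical markup such as
    <h2 class="pos">noun</h2><li class="meaning">quả táo</li>."""

    def __init__(self) -> None:
        super().__init__()
        self.pos: Optional[str] = None
        self.field: Optional[str] = None      # 'pos' or 'meaning' while inside one
        self.field_tag: Optional[str] = None  # tag name that opened the field
        self.buffer: List[str] = []
        self.meanings: List[Tuple[Optional[str], str]] = []

    def handle_starttag(self, tag, attrs):
        if self.field is not None:
            return  # nested tags such as <b> inside a meaning keep collecting text
        classes = (dict(attrs).get("class") or "").split()
        for name in ("pos", "meaning"):
            if name in classes:
                self.field, self.field_tag, self.buffer = name, tag, []
                break

    def handle_data(self, data):
        if self.field is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.field is None or tag != self.field_tag:
            return
        text = "".join(self.buffer).strip()
        if self.field == "pos":
            self.pos = text
        elif text:
            self.meanings.append((self.pos, text))
        self.field = self.field_tag = None


def store_entry(db: sqlite3.Connection, word: str, html: str) -> None:
    """Parse one stored HTML page and insert its meanings into the schema."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    db.execute("INSERT OR IGNORE INTO word (text) VALUES (?)", (word,))
    word_id = db.execute("SELECT id FROM word WHERE text = ?", (word,)).fetchone()[0]
    db.executemany(
        "INSERT INTO meaning (word_id, pos, gloss) VALUES (?, ?, ?)",
        [(word_id, pos, gloss) for pos, gloss in parser.meanings],
    )
    db.commit()
```

Create the tables once with `db.executescript(SCHEMA)`, then call `store_entry(db, word, html)` for each stored page.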
Step 3
- a Word/PDF or printed-page version of the dictionary (out of sight, out of mind)
- compression and other refinements
- contribute to GTrans (can it be automated, or is there a tool that makes contributing faster?)
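For the compression step, one simple option is to gzip the SQLite file before offering it as the offline download and unpack it on the device. A Python sketch with the standard `gzip` module; the file names are hypothetical and the actual ratio depends on the data:

```python
import gzip
import shutil
from pathlib import Path


def compress_file(src: Path, dst: Path, level: int = 9) -> float:
    """Gzip src into dst; return compressed size as a fraction of the original."""
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files are not read into memory
    original = src.stat().st_size
    return dst.stat().st_size / original if original else 0.0


def decompress_file(src: Path, dst: Path) -> None:
    """Reverse of compress_file, e.g. run on the device after the offline download."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```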
Aha, Chinese.
Yeah, this Dutch site has lots of cool stuff.

For handling JSON, remember jq and python3.
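On the python3 side, the standard `json` module covers most of what jq is used for here. A small sketch with a hypothetical response shape (`results`, `pos` and `text` are made-up field names); `get_path(doc, ["results", 0, "text"])` plays the role of `jq -r '.results[0].text'`:

```python
import json
from typing import Any, Iterable, Union

Key = Union[str, int]


def get_path(data: Any, path: Iterable[Key]) -> Any:
    """Follow dict keys / list indexes in order; return None if any step is missing."""
    for key in path:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            return None
    return data


if __name__ == "__main__":
    # Hypothetical API response.
    raw = '{"word": "apple", "results": [{"pos": "noun", "text": "quả táo"}]}'
    doc = json.loads(raw)
    print(get_path(doc, ["results", 0, "text"]))         # quả táo
    print([r["text"] for r in doc.get("results", [])])  # ['quả táo'], similar to jq '.results[].text'
```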

word2=$(echo "$word" | sed "s/$(printf '\r')//")  # strip the CR from CRLF input; OSX (BSD) sed does not understand \r, so printf inserts a literal CR

