
Dictionary / Translator app design

https://www.quora.com/Is-Google-Translator-the-best-among-all

Does it work and how?

Whatever technology is used to develop Google Translate, the quality of its output depends on several factors.

    Language pair (language combination). Why? Because Google does not so much generate clever translations out of nothing; it has a database of ready-made translations. Where did they come from? They were taken from work done by real translators. The more translators available for a particular language pair, the higher the translation quality can be. English to German or English to French are good examples of better quality in Google Translate. Other pairs can leave much to be desired, as with English to Russian or English to Turkish.
    Subject matter. The quality of a specialized translation also depends on the language pair. The more professional translators work in that language combination, the more professional the translation can be. And the more sophisticated the subject, the lower the chance of a good-quality translation. Even though Google Translate can deliver a good general technical translation, it will not reach a professional level for engineering or scientific translation.
    Literary translation. This can also be a problem for Google Translate. Why? Because literary translation is creative work and should offer more than two or three options to choose from. If the text is fiction, the software will just produce the translation it has accumulated so far and will not offer any alternatives. There will be just one translation, and it will read as technical rather than creative.
    Legal translation. This can also be trouble for Google Translate. Many legal terms can have different meanings, which may cause misunderstandings in the final Google translation. So after any machine translation, post-editing is done to make up for the errors Google makes.

Prices:
Basic:
Language detection                      $20 per million characters*
Text translation (NMT general models)   $20 per million characters*
Text translation (PBMT general models)  $20 per million characters*
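
To get a feel for what that rate means for a dictionary-sized job, here is a rough sketch of the arithmetic. It only uses the $20 per million characters figure listed above; the function name and the example sizes are my own assumptions, and it ignores whatever the asterisk footnote covers.

```python
# Rate taken from the price list above; everything else is hypothetical.
PRICE_PER_MILLION_CHARS_USD = 20.0


def translation_cost_usd(char_count: int,
                         price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Return the cost in USD of translating `char_count` characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Assumed workload: 100,000 entries of about 200 characters each.
    total_chars = 100_000 * 200
    print(f"{total_chars} characters -> ${translation_cost_usd(total_chars):.2f}")
```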



There are plenty of VA and EV (Vietnamese-English and English-Vietnamese) dictionary apps out there, but they all seem pretty bad.
Even Google Translate needs a lot of community contributions.
I have some good old dictionaries in hard copy, so I decided to make my own dictionary. But data entry takes a huge amount of time, so I tried to find out whether someone had already converted them to a digital version.
It seems some apps have done this, or at least have data close to my hard-copy version.
So I decided to use their data but with a different presentation, i.e. offline options. I will update the database when I have time and money, and when it is necessary.

https://semisignal.com/wiktionary-definitions-database/
https://stackoverflow.com/questions/35202766/how-to-embed-wiktionary-for-offline-access-in-android-app

Oh, https://semisignal.com/ is an interesting blog.
[otta] I suggest implementing online API access, so that a small app can be downloaded and used, plus a button somewhere that downloads the offline part. Also check the network connection and, if it is not Wi-Fi, warn the user so the mobile data plan is not used up downloading a 100 MB dictionary.

Sample web API call:
curl -X GET \
  'https://ontario.com/search.html?style=word&content=focus&type=english-vietnamese&t=dict&RequestVerificationToken=rw6mQWJP_j3RCpS5zFzrNdj2I7uSEaUsOSjEHELH-5ux_rQjXZ0wnLrcl4EZTG3EbQ8S0QzwrtsJyBDj1ssWtyyltHeM72er5b98w6tnARprBbOiouEhE9u8Mb99ZYOoqniffUaQGcmTMCIRVc1g0g2%3AFgWcbbhtLXx5gfGqdRazx71WNSyxNalygNDs_H7lvMJjIypTj_obMmjptTEjjebT6JKolYDrTR4KG-d3QPJuv8bzbywh5vZY1iy52ODhF1ftATXSnYEegJHwL2hJ9oAH7HpFmVyyvJZRXx12SDMmZ0QfbTgBLiFK-AJBOcF8P_g1' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Postman-Token: 04ebf3ca-b952-4119-b4b3-60621d638c69'

The mobile API call may be different; I will try to find out which API the mobile app uses.
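
For scripting the lookup instead of pasting curl commands, the web call above could be built in Python roughly like this. This is only a sketch: the host and query parameters are copied from the curl example, I am assuming that `content` carries the search term, and I do not know whether the `RequestVerificationToken` is actually required, so it is optional here. `build_lookup_request` is a name I made up.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Host and path exactly as they appear in the curl call above.
BASE_URL = "https://ontario.com/search.html"


def build_lookup_request(word, token=None):
    """Build (but do not send) a GET request for one dictionary lookup."""
    params = {
        "style": "word",
        "content": word,  # assumption: this parameter is the search term
        "type": "english-vietnamese",
        "t": "dict",
    }
    if token:
        # Assumption: the token may be needed; pass it through unchanged.
        params["RequestVerificationToken"] = token
    return Request(f"{BASE_URL}?{urlencode(params)}", method="GET")


if __name__ == "__main__":
    request = build_lookup_request("focus")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=10)
```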

English word list
https://github.com/dwyl/english-words.git

It seems we only care about alphabetic words.
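
A minimal sketch of that filtering, assuming the word list is a plain text file with one word per line (possibly with Windows line endings). The function names are hypothetical, and "alphabetic" is taken here to mean ASCII letters only.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)


def is_alpha_word(word):
    """True if the word is non-empty and made of ASCII letters only."""
    return bool(word) and all(ch in ASCII_LETTERS for ch in word)


def alpha_words(lines):
    """Yield lowercased alphabetic words from an iterable of raw lines."""
    for line in lines:
        word = line.strip()  # also drops a trailing "\r" from CRLF files
        if is_alpha_word(word):
            yield word.lower()


if __name__ == "__main__":
    sample = ["focus\r\n", "3-D\n", "it's\n", "\n", "Hanoi\n"]
    print(list(alpha_words(sample)))
```

With a real file this would be used as `alpha_words(open(path, encoding="utf-8"))`.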

Clone/crawl DB
Step 1
- sleep between requests to avoid being banned for sending too many
- strip unneeded parts of each page before processing
- the simplest way is to store the raw HTML in the DB
- create a sample app to experiment with and expose edge conditions
- the DB should be crawled in full in case the API goes down or blocks further access
- a simple API that responds with HTML
- the simple app may be React Native
Step 2
- process the DOM to get structured translation data
- design the DB; it should be relational (with a schema) if possible, as Martin Fowler has suggested. On mobile it may be SQLite.
- allow offline download
- notify the user when on a mobile network (not Wi-Fi)
- refine the app
Step 3
- Word/PDF or page version of the dictionary: out of sight, out of mind.
- compression and other refinements
- contribute to Google Translate (can it be automated, or can some tool suggest entries for faster contribution?)
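
Step 1 above could look roughly like the sketch below: fetch each word, store the raw HTML in SQLite, and sleep between requests. The table name `raw_pages`, the function name, and the one-second default delay are my own assumptions. The `fetch` callable is passed in so the loop can be tried without hitting the real site.

```python
import sqlite3
import time


def crawl_to_sqlite(words, fetch, conn, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each word once, store the raw HTML, and pause between requests.

    `fetch` is any callable taking a word and returning HTML text.
    Words already stored are skipped, so an interrupted crawl can resume.
    Returns the number of newly stored pages.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_pages ("
        "word TEXT PRIMARY KEY, html TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    saved = 0
    for word in words:
        exists = conn.execute(
            "SELECT 1 FROM raw_pages WHERE word = ?", (word,)
        ).fetchone()
        if exists:
            continue
        html = fetch(word)
        conn.execute(
            "INSERT INTO raw_pages (word, html, fetched_at) VALUES (?, ?, ?)",
            (word, html, time.time()),
        )
        conn.commit()  # commit per page so progress survives a crash
        saved += 1
        sleep(delay_seconds)  # be polite: avoid a ban for too many requests
    return saved


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    fake_fetch = lambda w: f"<html><body>{w}</body></html>"  # stand-in for HTTP
    print(crawl_to_sqlite(["focus", "lens"], fake_fetch, db, delay_seconds=0))
```

Storing the untouched HTML first keeps the crawl independent of the parsing in Step 2, so the pages do not need to be fetched again if the parser changes.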
https://thedecisionlab.com/co2-out-of-sight-not-out-of-mind-perception-of-carbon-capture-and-storage-risks/
http://www.environmentandsociety.org/sites/default/files/2016_i1.pdf
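
For the relational DB in Step 2, one possible shape is an `entries` table for headwords and a `senses` table for their translations. This is only a sketch under my own assumptions: the table and column names are hypothetical, and the real schema depends on what the parsed DOM actually contains.

```python
import sqlite3

# Hypothetical schema: one row per headword and direction, many senses each.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id        INTEGER PRIMARY KEY,
    headword  TEXT NOT NULL,
    direction TEXT NOT NULL,          -- e.g. 'en-vi' or 'vi-en'
    UNIQUE (headword, direction)
);
CREATE TABLE IF NOT EXISTS senses (
    id             INTEGER PRIMARY KEY,
    entry_id       INTEGER NOT NULL REFERENCES entries(id),
    part_of_speech TEXT,
    translation    TEXT NOT NULL
);
"""


def open_dictionary(path=":memory:"):
    """Open (or create) the dictionary database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, headword, direction, senses):
    """Insert a headword with its (part_of_speech, translation) pairs."""
    conn.execute(
        "INSERT OR IGNORE INTO entries (headword, direction) VALUES (?, ?)",
        (headword, direction),
    )
    entry_id = conn.execute(
        "SELECT id FROM entries WHERE headword = ? AND direction = ?",
        (headword, direction),
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO senses (entry_id, part_of_speech, translation) VALUES (?, ?, ?)",
        [(entry_id, pos, text) for pos, text in senses],
    )
    conn.commit()
    return entry_id


def lookup(conn, headword, direction):
    """Return the (part_of_speech, translation) rows for a headword."""
    return conn.execute(
        "SELECT s.part_of_speech, s.translation FROM senses s "
        "JOIN entries e ON e.id = s.entry_id "
        "WHERE e.headword = ? AND e.direction = ? ORDER BY s.id",
        (headword, direction),
    ).fetchall()
```

The same SQLite file could then be shipped as the offline download on mobile.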

https://www.theverge.com/transportation/2018/4/19/17204044/tesla-waymo-self-driving-car-data-simulation
https://electrek.co/2016/04/06/tesla-autopilot-comma-ai-geohot-elon-musk/

https://github.com/kiitPK/DictionaryDemo
https://github.com/kodycode/React-Native-Dictionary-App

https://stackoverflow.com/questions/19583956/read-line-and-remove-newline-character-using-shell-script
https://unix.stackexchange.com/questions/1519/how-do-i-delete-a-file-whose-name-begins-with-hyphen-a-k-a-dash-or-minus
Aha, Chinese:
http://www.cantonese.sheik.co.uk/scripts/wordlist.htm
Yeah, this German site has a lot of cool stuff:
https://www.informatik.uni-leipzig.de/~duc/Dict/

http://www.denisowski.org/Vietnamese/vnedict.txt

http://viet.jnlp.org/nghien-cuu-cua-tac-gia/bai-toan-them-dau-cho-tieng-viet
http://box.jnlp.org/arc/12/12IALP-anh.pdf

JSON handling: remember jq and python3?
https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools
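
As a reminder of the python3 route, here is a small sketch that pulls one value out of a JSON string by following a path of keys and indexes, similar in spirit to a jq path like `.senses[0].pos`. The helper name and the sample JSON shape are made up for illustration.

```python
import json


def extract_field(raw_json, *path):
    """Parse JSON text and walk a path of dict keys and list indexes."""
    node = json.loads(raw_json)
    for key in path:
        node = node[key]  # raises KeyError/IndexError if the path is wrong
    return node


if __name__ == "__main__":
    # Hypothetical response shape, not taken from any real API.
    sample = '{"word": "focus", "senses": [{"pos": "noun"}, {"pos": "verb"}]}'
    print(extract_field(sample, "word"))
    print(extract_field(sample, "senses", 1, "pos"))
```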

URL=${URL%$'\r'}   # strip one trailing carriage return (bash)
word2=$(echo "$word" | sed "s/$(printf '\r')//")   # same idea via sed, OSX


