
Dictionary / Translator app design

https://www.quora.com/Is-Google-Translator-the-best-among-all

Does it work and how?

However, whatever technology is used to develop Google Translate, the quality of the output depends on several factors.

    Language pair or language combination. Why? Because Google does not generate fancy translations out of nothing. It has a database of ready-made translations, all taken from work done by real translators. The more translators available for a particular language pair, the higher the translation quality can be. English to German or English to French are good examples of better quality in Google Translate. Other language pairs can leave much to be desired, as with English to Russian or English to Turkish.
    Subject matter. The quality of a specialized translation also depends on the language pair. The more professional translators work in that language combination, the more professional the translation can be. So the more sophisticated the subject, the lower the chance of a good-quality translation. Even though Google Translate can deliver a good technical translation, it will not be professional-grade for engineering or scientific translation.
    Literature translation. This can also be a problem for Google Translate. Why? Because literary translation is creative work and should offer more than two or three options to choose from. If the text is a fiction story, the software will just produce the translation it has accumulated so far and will not offer any alternatives. There will be only one translation, and it will read as technical rather than creative.
    Legal translation. This can also be trouble for Google Translate. Many legal terms may have different meanings, which may cause misunderstandings in the final Google translation. So after any machine translation, post-editing is done to make up for errors made by Google.

Prices:
Basic:
Language detection                        $20 per million characters*
Text translation (NMT general models)     $20 per million characters*
Text translation (PBMT general models)    $20 per million characters*
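
To get a feel for what that price means for a dictionary-sized job, here is a minimal sketch of the arithmetic. It assumes the flat $20 per million characters listed above and ignores whatever the asterisk footnote covers (free tiers, volume discounts and so on); the function name `estimate_cost` is made up for this note.

```python
from decimal import Decimal

# Price quoted in the notes above: $20 per million characters.
PRICE_PER_MILLION_CHARS = Decimal("20")


def estimate_cost(num_chars: int) -> Decimal:
    """Return the estimated cost in USD for processing num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    cost = PRICE_PER_MILLION_CHARS * num_chars / Decimal(1_000_000)
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"))
```

For example, `estimate_cost(2_500_000)` works out to 50 dollars at the flat rate.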



There are plenty of VA and EV (Vietnamese-English and English-Vietnamese) dictionary apps out there, but they all seem pretty poor.
Even Google Translate needs a lot of community contributions.
I have a good old dictionary in hard copy, so I decided to make my own dictionary. But data entry takes a huge amount of time, so I tried to find out whether someone has already converted it to a digital version.
It seems some apps have done this, or at least have something close to my hard-copy version.
So I decided to use their data but present it differently, i.e. with offline options. I will update the database when I have time and money, and when it is necessary.

https://semisignal.com/wiktionary-definitions-database/
https://stackoverflow.com/questions/35202766/how-to-embed-wiktionary-for-offline-access-in-android-app

oh https://semisignal.com/ interesting blog.
Sample web API call. (Suggestion I noted: implement online API access, so a small app can be downloaded and used, plus add a button somewhere that downloads the offline part. Also check the network connection, and if it is not Wi-Fi, warn the user so the mobile data plan is not used up downloading a 100 MB dictionary.)
curl -X GET \
  'https://ontario.com/search.html?style=word&content=focus&type=english-vietnamese&t=dict&RequestVerificationToken=rw6mQWJP_j3RCpS5zFzrNdj2I7uSEaUsOSjEHELH-5ux_rQjXZ0wnLrcl4EZTG3EbQ8S0QzwrtsJyBDj1ssWtyyltHeM72er5b98w6tnARprBbOiouEhE9u8Mb99ZYOoqniffUaQGcmTMCIRVc1g0g2%3AFgWcbbhtLXx5gfGqdRazx71WNSyxNalygNDs_H7lvMJjIypTj_obMmjptTEjjebT6JKolYDrTR4KG-d3QPJuv8bzbywh5vZY1iy52ODhF1ftATXSnYEegJHwL2hJ9oAH7HpFmVyyvJZRXx12SDMmZ0QfbTgBLiFK-AJBOcF8P_g1' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Postman-Token: 04ebf3ca-b952-4119-b4b3-60621d638c69'
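
The same request URL can be put together in Python, which makes it easier to loop over a word list later. This is only a sketch: the parameter names are copied from the curl call above, but treating `content` as the looked-up word and passing the verification token in as a plain argument are my assumptions, and `build_search_url` is a made-up name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Host and path exactly as they appear in the curl call above.
BASE_URL = "https://ontario.com/search.html"


def build_search_url(word: str, token: str) -> str:
    """Build the search URL for one word (assumes 'content' holds the word)."""
    params = {
        "style": "word",
        "content": word,
        "type": "english-vietnamese",
        "t": "dict",
        # The token looks session-bound, so it has to be supplied by the caller.
        "RequestVerificationToken": token,
    }
    # urlencode percent-escapes reserved characters such as ':' in the token.
    return f"{BASE_URL}?{urlencode(params)}"
```

The resulting string can be handed to `urllib.request.urlopen` or any HTTP client; nothing here touches the network.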

The mobile API call may be different; I will try to find out which API the mobile app uses.

English word list
https://github.com/dwyl/english-words.git

It seems we only care about the alphabetic words.
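
A minimal sketch of that filter, assuming "alphabetic" means ASCII letters a-z only, so entries with digits, hyphens or apostrophes are dropped. The helper name `alpha_words` is made up.

```python
def alpha_words(lines):
    """Yield lowercase words that consist of ASCII letters only."""
    for line in lines:
        # strip() also removes a trailing "\r\n" from Windows-style files.
        word = line.strip().lower()
        if word and word.isascii() and word.isalpha():
            yield word
```

It takes any iterable of lines, so an open file object from the word list can be passed in directly.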

Clone/crawl DB
Step 1
- sleep between requests to avoid a ban for sending too many
- strip the unneeded parts of each page before processing
- the simplest way is to store the HTML in the DB
- create a sample app to experiment with and to expose edge conditions
- the DB should be crawled in full in case the API goes down or blocks further access
- simple API that responds with HTML
- the simple app may be React Native
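
Step 1 could look roughly like the sketch below: fetch each word, store the raw HTML in SQLite, and sleep between requests. The `fetch` callable is a placeholder for whatever HTTP call ends up being used, and the table name `raw_pages` and the one-second default delay are assumptions, not anything the site documents.

```python
import sqlite3
import time


def crawl(words, fetch, db_path=":memory:", delay_seconds=1.0):
    """Fetch each word's page and store the raw HTML; return the open connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_pages "
        "(word TEXT PRIMARY KEY, html TEXT NOT NULL)"
    )
    for word in words:
        # Skip words already stored so an interrupted crawl can resume.
        row = conn.execute(
            "SELECT 1 FROM raw_pages WHERE word = ?", (word,)
        ).fetchone()
        if row is not None:
            continue
        html = fetch(word)
        conn.execute(
            "INSERT INTO raw_pages (word, html) VALUES (?, ?)", (word, html)
        )
        conn.commit()
        # Pause between requests to avoid a ban for too many requests.
        time.sleep(delay_seconds)
    return conn
```

Passing the fetcher in as an argument keeps the crawl loop testable offline, and committing after every word means a ban halfway through loses nothing already downloaded.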
Step 2
- process the DOM to get structured translation data
- design the DB; it should be relational (with a schema) if possible, as M. Fowler has suggested. On mobile it may be SQLite.
- allow offline download
- notify the user when on a mobile network (not Wi-Fi)
- refine the app
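
For the relational design in Step 2, one possible starting point is a two-table SQLite schema: one row per headword and one row per sense. The table and column names below are my guesses at what the structured data will need, not something derived from the crawled pages.

```python
import sqlite3

# Hypothetical schema: one headword has many senses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS words (
    id INTEGER PRIMARY KEY,
    headword TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS senses (
    id INTEGER PRIMARY KEY,
    word_id INTEGER NOT NULL REFERENCES words(id),
    part_of_speech TEXT,
    translation TEXT NOT NULL
);
"""


def open_dictionary(path=":memory:"):
    """Open (or create) the dictionary database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, headword, senses):
    """Add a headword; senses is a list of (part_of_speech, translation) tuples."""
    conn.execute("INSERT OR IGNORE INTO words (headword) VALUES (?)", (headword,))
    word_id = conn.execute(
        "SELECT id FROM words WHERE headword = ?", (headword,)
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO senses (word_id, part_of_speech, translation) VALUES (?, ?, ?)",
        [(word_id, pos, text) for pos, text in senses],
    )
    conn.commit()


def lookup(conn, headword):
    """Return the (part_of_speech, translation) rows for a headword, in insert order."""
    rows = conn.execute(
        "SELECT s.part_of_speech, s.translation FROM senses s "
        "JOIN words w ON w.id = s.word_id WHERE w.headword = ? ORDER BY s.id",
        (headword,),
    )
    return rows.fetchall()
```

The same file can ship as the offline download, since SQLite is available on mobile too.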
Step 3
- Word/PDF or page version of the dictionary; out of sight, out of mind.
- compression and other refinements
- contribute to Google Translate (can it be automated, or can some tool suggest entries for faster contribution?)
https://thedecisionlab.com/co2-out-of-sight-not-out-of-mind-perception-of-carbon-capture-and-storage-risks/
http://www.environmentandsociety.org/sites/default/files/2016_i1.pdf

https://www.theverge.com/transportation/2018/4/19/17204044/tesla-waymo-self-driving-car-data-simulation
https://electrek.co/2016/04/06/tesla-autopilot-comma-ai-geohot-elon-musk/

https://github.com/kiitPK/DictionaryDemo
https://github.com/kodycode/React-Native-Dictionary-App

https://stackoverflow.com/questions/19583956/read-line-and-remove-newline-character-using-shell-script
https://unix.stackexchange.com/questions/1519/how-do-i-delete-a-file-whose-name-begins-with-hyphen-a-k-a-dash-or-minus
Aha chinese
http://www.cantonese.sheik.co.uk/scripts/wordlist.htm
yeah, this German site has a lot of cool stuff
https://www.informatik.uni-leipzig.de/~duc/Dict/

http://www.denisowski.org/Vietnamese/vnedict.txt

http://viet.jnlp.org/nghien-cuu-cua-tac-gia/bai-toan-them-dau-cho-tieng-viet
http://box.jnlp.org/arc/12/12IALP-anh.pdf

JSON handling: remember jq and python3?
https://stackoverflow.com/questions/1955505/parsing-json-with-unix-tools
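
On the python3 side, the standard `json` module is enough for pulling one field out of a response. A small sketch, where `extract_field` is a made-up helper and the path is whatever keys and indexes the actual response uses:

```python
import json


def extract_field(json_text, *path):
    """Parse json_text and walk it along path (dict keys or list indexes)."""
    value = json.loads(json_text)
    for key in path:
        value = value[key]
    return value
```

So `extract_field(text, "data", "items", 0, "word")` plays the same role as the jq filter `.data.items[0].word`.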

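# bash: strip a trailing carriage return (e.g. from a CRLF file)
URL=${URL%$'\r'}
# same idea with sed, works on OSX
word2=$(printf '%s' "$word" | sed "s/$(printf '\r')//")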
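
If the word list is processed in Python instead of the shell, the same carriage-return cleanup is one call. A sketch with a made-up helper name:

```python
def clean_lines(lines):
    """Yield each line without its trailing newline or carriage return."""
    for line in lines:
        # rstrip with "\r\n" removes any trailing mix of CR and LF characters.
        yield line.rstrip("\r\n")
```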


