I recently went through an interesting critical incident involving lost live data. Losing live data is a common problem, and there are plenty of myths around it.
Outline:
Bash: ls ordered by time
ls default ordering (by name)
ls ordering by both time and name (name as fallback)
Simulating timestamps when creating files
Simulating timestamps when moving files (mv is faster, so duplicate timestamps are more likely)
Bash: assigning a command's output to a variable
Using cat /dev/urandom to generate random strings, numbers, ...
Why /dev/urandom seemed to return only numbers inside the bash shell (Windows Git Bash)
...
Our live server lost about 7,000 images. More precisely, they were renamed: the original file contents are intact, but the filenames changed and the database was never updated to reflect that.
Let's keep it simple.
In the DB a record looks like:
ID Image_URL ...
1 image_product_001.jpg
In the actual web data folder, the file is named something like:
vuETHeuteuehteecCHUE
And the function that led to the disaster looked roughly like this (pseudocode):
loop through product_image (updated after 1 May, for example):
    for each image:
        new_name = random_str() . time() . uniqid();
        rename_image(old_name, new_name);
        // forgot to update the DB :)
end
And boom!
When I started working on the problem, I knew we had to reverse the faulty rename operation. As many people would quickly realize, we could use the timestamps of the new image files and the order of the loop to reverse the renames.
The reasoning: if a file was processed later in the loop, the renamed file should have a later (newer) timestamp.
But isn't UNIX time only second precision?
At first I tried to display file timestamps with microsecond precision and had no luck.
https://jason.pureconcepts.net/2013/09/php-convert-uniqid-to-timestamp/
ls -ltc
reverse
ls -ltr
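For reference, these GNU ls flags do different things, so -ltc and -ltr are not simply reverses of each other:
ls -lt --time-style=full-iso     # sort by modification time (mtime), newest first
ls -ltr --time-style=full-iso    # same key, reversed: oldest first
ls -ltc --time-style=full-iso    # sort by status-change time (ctime), newest first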
A few notes before I finish this post.
First, when I was asked for a solution, my own project was behind schedule, so I figured I should not spend too much time on this.
Second, while trying to work out a solution, I knew that Linux/UNIX can list a file's modified timestamp with microsecond precision, and I thought that would be enough to restore the original image names.
Third, when I skimmed the documentation for uniqid(), I hit the word "entropy" (I do not remember whether it was a parameter name or part of the description). Seeing that word, I immediately thought of chaos and randomness, so I gave up and refused to read any further about uniqid(). Damn psychology.
I do not remember which site I was reading about uniqid() on (maybe php.net, w3schools or Stack Overflow), but that one word, "entropy", led me to guess that uniqid()'s output is as chaotic and random as a random-string function's.
Fourth, after another team member solved the problem precisely by using the uniqid() value, I felt quite anxious.
When I was trying to figure out the situation, I was (as I think many of you would be) already tired and stressed...
Fifth, while writing this document and reproducing the steps, re-running the commands I had used, I realized that one of my commands may have been wrong:
ls -ltc (suggested by a colleague) sorts in a different order than
ls -ltr
I went along with that output (ls -ltc) without blinking; I told myself I would trust this guy just this once, while the backend developer who created the problem kept pushing me because the customer might discover it at any moment...
=> And boom.
I kept investigating this case study and made sure every piece of information and every command was tested carefully, so that the conclusions are worth something.
Since I was not on this project or on the rescue team, I could not easily access the live server to verify everything ... All I had left were some log files I had grabbed.
There are some other interesting technical things to learn here (microtime and so on), and some psychological ones to think about; maybe some JK ideas apply here.
Imagine a world without the order of time (the arrow of time), or more precisely without the almost-always-increasing entropy.
I think even in this "chaos" there is an order: the almost-always increase. If entropy (or time) increased and decreased at random, things would be more chaotic than ever.
Bash
- Append or prepend a word to each line in a file.
- Paste two text files together line by line.
- File1 has about 7k lines, each line the filename of an image. File2 is the full ls listing of all images in full ISO date format, about 200k lines.
=> How to get the full ls entries for the 7k images out of File2?
Loop over each line in File1, grep for that filename in File2, and append the result.
- Sort by hex value.
- Cut the nth column of a file, or the nth characters of a word/string.
Details are explained below.
Tools, apps and notes.
- Git Bash seems to limit the number of characters or lines you can paste into it; maybe it is a Windows buffer limit...
When I pasted more than 1,750 lines into Git Bash, it hung and stopped running commands from the next one (the 1,751st) onward.
Each command was a single line like this:
grep WAigXu15253467215aeaf1a1801b7 list_ls_iso1.txt >> new_img_filtered
- The other commands and tools are mostly Linux/UNIX tools run in Git Bash on Windows. For some (e.g. Perl) I had to use Linux (Ubuntu in VirtualBox). You can install Python, Perl, etc. on Windows, but it is fussier, e.g. setting ENV variables ...
Sort by column
sort -k2 -n yourfile
Perl script to sort by HEX
https://www.systutorials.com/241274/how-to-sort-a-file-by-hexadecimal-numbers-on-linux-using-sort-command/
Bash - Take nth column in a text file
cat filename.txt | awk '{ print $2 $4 }'
or
awk '{ print $2 $4 }' filename.txt
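When the columns are separated by a single known character, cut can do a similar job (a small aside; note that ls output has variable-width spacing, so awk is the safer choice there):
cut -d' ' -f2,4 filename.txt     # fields 2 and 4, using a space as the delimiter
cut -c1-7 filename.txt           # characters 1 through 7 of each line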
Bash Read lines from a file then search a different file using those values
#!/bin/bash
while read name; do
grep "$name" list_to_grep.txt
done
run : ./match.sh < name.txt > Filtered.txt
This loop did not seem to work for me (possibly a Windows CRLF line-ending issue in name.txt). The original is here on SO: https://stackoverflow.com/questions/17954908/read-lines-from-a-file-then-search-a-different-file-using-those-values
I simplified it for my use: I just echoed the ~7k grep commands into a new file, then copied and ran those commands.
./match.sh
#!/bin/bash
while read name; do
echo "grep $name list_ls_iso1.txt >> new_img_grep "
done
The result looks like this:
grep 2Ddn2j15253467225aeaf1a2867af list_ls_iso1.txt >> new_img_filtered
grep 3dohOw15253467225aeaf1a286f74 list_ls_iso1.txt >> new_img_filtered
grep 77srwb15253467225aeaf1a286932 list_ls_iso1.txt >> new_img_filtered
grep 7qHiFn15253467225aeaf1a2863ad list_ls_iso1.txt >> new_img_filtered
grep Bi5MVr15253467225aeaf1a286aca list_ls_iso1.txt >> new_img_filtered
grep dNHa6o15253467225aeaf1a286556 list_ls_iso1.txt >> new_img_filtered
grep E42vya15253467225aeaf1a2862e3 list_ls_iso1.txt >> new_img_filtered
grep g6kTwr15253467225aeaf1a286eac list_ls_iso1.txt >> new_img_filtered
grep JLrRAL15253467225aeaf1a286487 list_ls_iso1.txt >> new_img_filtered
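In hindsight, grep can read all the patterns from a file in one go, which avoids both the generated command list and the Git Bash paste limit (a sketch, assuming name.txt holds one new filename per line as above):
grep -F -f name.txt list_ls_iso1.txt > new_img_filtered
Here -f reads the patterns from name.txt and -F treats them as fixed strings rather than regexes.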
Bash: list files with timestamps in full ISO format:
ls -l --time-style=full-iso
ls --full-time
ls -la --time-style=full-iso blah
Output:
-rw-r--r-- 1 www-data www-data 123680 2018-05-03 08:52:54.110420627 +0000 2Ddn2j15253467225aeaf1a2867af
-rw-r--r-- 1 www-data www-data 127518 2018-05-03 08:52:54.130420528 +0000 3dohOw15253467225aeaf1a286f74
-rw-r--r-- 1 www-data www-data 167100 2018-05-03 08:52:54.114420607 +0000 77srwb15253467225aeaf1a286932
-rw-r--r-- 1 www-data www-data 118122 2018-05-02 19:22:03.466981823 +0000 7qHiFn15253467225aeaf1a2863ad
-rw-r--r-- 1 www-data www-data 158697 2018-05-03 08:52:54.114420607 +0000 Bi5MVr15253467225aeaf1a286aca
-rw-r--r-- 1 www-data www-data 116740 2018-05-02 19:22:03.470981802 +0000 dNHa6o15253467225aeaf1a286556
-rw-r--r-- 1 www-data www-data 133141 2018-05-02 19:22:03.466981823 +0000 E42vya15253467225aeaf1a2862e3
-rw-r--r-- 1 www-data www-data 131789 2018-05-03 08:52:54.130420528 +0000 g6kTwr15253467225aeaf1a286eac
-rw-r--r-- 1 www-data www-data 104003 2018-05-02 19:22:03.466981823 +0000 JLrRAL15253467225aeaf1a286487
// TODO: rewrite this doc to be easier to understand.
RANDOM is a special built-in variable in bash, so if you mistakenly use RANDOM as your own variable (or expand $RANDOM expecting a random string), you will only ever get integers.
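A quick check in any bash shell shows what $RANDOM really is:
echo "$RANDOM" "$RANDOM" "$RANDOM"    # three integers in the range 0-32767
So a filename built from $RANDOM can only ever contain digits; that is most likely where my "/dev/urandom only returns numbers" confusion came from.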
A bash loop to simulate creating files with a random name plus an index, to test for duplicate timestamps.
$ cat loop_create_file.sh
#!/bin/bash
for i in {1..111}; do
    # 12 random alphanumeric characters from /dev/urandom
    RANDOM_STR="$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 12 | head -n 1)"
    # the filename carries both the random part and the loop index
    echo "${RANDOM_STR}" > "file_${RANDOM_STR}_${i}.txt"
    # touch "file_$RANDOM_$i"    # earlier attempt: $RANDOM only gives integers
done
The number of files (111 here), the disk (HDD vs SSD) and the random-string generation are all variables to investigate when trying to produce duplicate timestamps.
When creating files, the timestamps appear to follow the creation order, with a small difference between consecutive files:
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:08.623305200 +0700 file_FLhZS8blzkXd_111.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:08.570516900 +0700 file_ZjuhlP7fcr47_110.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:08.512840300 +0700 file_pBvOAiolTK9I_109.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:08.455163300 +0700 file_9DX3uCSFVEtR_108.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:08.399442100 +0700 file_azv52wiUTS12_107.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:08.335899200 +0700 file_r3g6gmdluXEF_106.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:08.280178200 +0700 file_PmUyAYB0hVIP_105.txt
So the speed of the machine and of the disk could affect how close together the timestamps of newly created files land.
To test this we could use a VPS (AWS, Google Cloud, ...).
The filenames have to be random so that ls cannot silently fall back to ordering by name.
The index is there to make it easy to track which file was created first in the loop; if the index itself influenced the ls result, we could write it into the file contents instead and avoid any order-by-name effect.
Bash loop to simulate the mv command.
Is mv faster than creating a file? Mostly yes, because a move within the same filesystem is just a rename and no data is copied ...
We need a large number of files (5k+): the window of interest is only a few thousand milliseconds long, so we need thousands of moves to have a real chance of two files landing on the same millisecond.
Does mv change modified time ?
Inspecting the full-ISO ls log shows that the mtimes of the 7k renamed images do not match the UNIX time embedded in their new filenames.
The UNIX time in the filenames suggests that the 7k images were renamed within a range of only a couple of seconds:
15253467215 to 15253467225
At first this looked strange: the timestamps seemed to be 10 seconds apart rather than a tight sequence.
~/Work/priz/simulation1
$ date -d @15253467215
13 May 2453 01:13:35
~/Work/priz/simulation1
$ date -d @15253467225
13 May 2453 01:13:45
=> Does PHP have some batching or buffering mechanism?
Actually the explanation is simpler: I had grabbed 11 digits. The time() part of the filename is only 10 digits, 1525346721 and 1525346722; the trailing '5' is the first hex digit of the uniqid() that follows (0x5aeaf1a1 = 1525346721, the same second in hex). date -d @1525346721 lands on 2018-05-03, so the renames really did span only about 2 seconds.
I had heard that the task that caused the error ran for only about 2 seconds, and 2 seconds for renaming 7,000 files is plausible. The mtimes still do not match the rename time, though; the reason turns up further down (mv does not change a file's modified time).
So within 2-3 seconds we have about 7,000 renames, and that window contains at most 2,000-3,000 distinct millisecond values. By pigeonhole, at millisecond resolution thousands of files are forced to share a timestamp.
But the ISO timestamps (and PHP's uniqid()) carry extra precision that can still tell which file is newer even when the millisecond is the same.
For example:
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.787023300 +0700 file_4WsRZvZEX8Py_10.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.731303300 +0700 file_7xn0KulxmE3b_9.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.673625400 +0700 file_0Baia6B1gDgF_8.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.620836700 +0700 file_lDRNg7pEOBy2_7.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.564137800 +0700 file_sGk2pkdDFvzY_6.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.507438200 +0700 file_VDvO0oHYp1wl_5.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.456604500 +0700 file_fwxwh5ZY3x36_4.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.403816500 +0700 file_aL98h5rxT3n6_3.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 17:46:02.349072200 +0700 file_G3NbEDu4uhZ8_2.txt
You can see that 02.787023300 has nine fractional digits: far finer than millisecond resolution.
The same bash loop, creating more files (I tested with only 1,111 new files; to be thorough we should create several thousand).
Just for testing.
$ cat loop_create_file.sh
#!/bin/bash
for i in {1..1111}; do
RANDOM_STR="$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 12 | head -n 1)"
# echo "${RANDOM_STR}"
echo "${RANDOM_STR}" > "file_${RANDOM_STR}_${i}.txt"
# touch "file_$RANDOM_$i"
done
Same as above, just more files.
And ls -la --time-style=full-iso did not show the order I expected:
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:47.537382200 +0700 file_xXRLkygRC3ln_319.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:13.004108400 +0700 file_xZ42Jh6Yz9jg_694.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:33.447486100 +0700 file_XZzhL50Ou3Ly_114.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:27.623019800 +0700 file_y0007iSylqIf_909.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:07.436842500 +0700 file_y46U6KRzFL43_612.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:29.959512100 +0700 file_Y4MtjAf3daYk_65.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:53.770374000 +0700 file_y98A1Ul3IBKa_415.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:39.916951200 +0700 file_YALgST9BC8lQ_1083.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:42.643615400 +0700 file_YatMCkjlZxl5_247.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:32.609609100 +0700 file_yBhNgsmrq3TF_983.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:34.257793400 +0700 file_YBUr6nr2y3zS_1003.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:15.843290800 +0700 file_ycaPJQ1fmgfs_734.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:52.992227600 +0700 file_Ye09jzkAWKZQ_404.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:22:16.912752900 +0700 file_YfTkdOsAK8Xf_751.txt
You can see that the file order does not follow the index order of the creation loop, which at first made it look like we cannot use the (sub-)millisecond timestamp to sort files by creation/modification time.
Oops! I forgot -t in ls -la, so that listing was sorted by name. It should be ls -lt --time-style=full-iso:
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:27.594768800 +0700 file_1w6tCC0zfCBG_27.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:27.537091500 +0700 file_wVsrwC588yql_26.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:27.482347900 +0700 file_HDs1CngB6j7R_25.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:27.422715600 +0700 file_blCXYGNaGXPH_24.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.824187000 +0700 file_iKxwfgYyBULB_15.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.766510400 +0700 file_oLeaJXs1WyuH_14.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.713722700 +0700 file_PfuXfiFJYm6X_13.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.658000400 +0700 file_Fs7tGXTASH76_12.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.600323500 +0700 file_KaMcCISYkGU6_11.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.547534800 +0700 file_XvhwolxityTy_10.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.477150700 +0700 file_gX3JxoovdolH_9.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.423383700 +0700 file_TID3U1R8CSgg_8.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.275165800 +0700 file_f1UoBwdxPz31_7.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.217486200 +0700 file_cROyfduE4QqA_6.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.159810200 +0700 file_x3v4ZfGMCX9H_5.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.104088300 +0700 file_kh2B4NfTc39Q_4.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:26.049345000 +0700 file_NHyF664fe1pt_3.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:25.991667700 +0700 file_vgc80znAn29e_2.txt
-rw-r--r-- 1 Admin 197121 13 2018-05-21 18:21:25.937903000 +0700 file_d7Q9VSimfh46_1.txt
Now let's test timestamps when moving. mv seems to run faster, so we would need thousands of file moves to test this properly.
Or we can simply test with a few hundred files: if they all complete within the same second, we know duplicate timestamps are possible.
Bash loop for moving files:
$ cat simulate_mv.sh
#!/bin/bash
for i in {10..99}; do
    # new 12-character random alphanumeric name for this index
    RANDOM_STR="$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 12 | head -n 1)"
    # match the existing file for this index: 12 random characters between file_ and _$i.txt
    # (file_????????????_${i}.txt would be an equivalent, shorter glob)
    mv "file_"[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]"_${i}.txt" "file_${RANDOM_STR}_${i}.txt"
done
Does mv change the modified timestamp of the files?
Noooooooooo :)
But the containing folder's does change :)
Result compared with the previous listing (only the random part of the filename changed); the timestamps are preserved:
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_Nbc3K8hJ7f9G_31.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_UYUv7t1hHrev_30.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_POVac8cnNEiN_29.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_X5V8ibllF6yB_28.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_UkGcD798H6vw_27.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_T6cjrZfxIUrT_26.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_VJ0g77hOmYbV_25.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_caiEhbYEkalV_24.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_xn834EKhlwbW_23.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_otx27cFvmSF8_22.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_pbhGmExXrg4p_21.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_2h2weCdzckmG_20.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_8hko0iOu63De_19.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_e5OqiZt9V8kf_18.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_1xZ6RMjMeBt5_17.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_I5UgYRXwjJXZ_16.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_F6xTVRl7pBgn_15.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_JGWnUgkmwbCk_14.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_bt1ObtEroczk_13.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_gFBDCvoJezEN_12.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_ICJ16JcUlNBI_11.txt
-rw-r--r-- 1 Admin 197121 13 May 21 18:21 file_7aMA2B0Nntf1_10.txt
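You can also confirm this directly with stat from GNU coreutils (a sketch; the filename is just one picked from the listing above):
stat -c '%y  %n' file_Nbc3K8hJ7f9G_31.txt    # modification time (mtime): unchanged by the rename
stat -c '%z  %n' file_Nbc3K8hJ7f9G_31.txt    # status-change time (ctime): on Linux a rename does update this
stat -c '%y  %n' .                           # the directory's mtime, which the rename also updates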
A side note: why do about 500 files show owner www-data (-rw-r--r-- 1 www-data www-data)
instead of ubuntu?
Who created them? If a PHP script caused the error (as it did), would the files it touched be owned by www-data (the web server user) or by ubuntu?
If www-data, why are some of the 7,000 new files owned by ubuntu? What happened to change the owner?
Back to our main problem.
It seems that, despite mv's speed, the timestamps still follow the order of the mv loop.
The only remaining concern is the speed of the machine and HDD vs SSD; this could be tested on a VPS or a real SSD...
I will test this later.
Let's assume that, with full-ISO timestamps, we can sort files by time at finer than millisecond resolution.
This is very important, because it would mean WE CAN USE THE ISO TIMESTAMP to sort files even when thousands of them were moved within a few seconds.
Now let's test this.
The basic idea is to compare the list sorted by the Linux ls full-ISO timestamp against a proven one.
The proven one here is the recovery that was already done correctly using the PHP uniqid() hex value.
When my colleague renamed the files, he used random_str() . time() . uniqid() as the new filename.
time() is UNIX time, with only second resolution.
random_str() is random and cannot be used to order anything or to trace a rollback...
Only uniqid() is ordered by time: it encodes the current second plus the microseconds, so it is finer than millisecond scale and we can use it.
You can see the unique uniqid() value at the end of each filename.
For example:
AjDecS15253467215aeaf1a11e679
bjRNzx15253467215aeaf1a11daf2
cmih0k15253467215aeaf1a11e4df
CN93DM15253467215aeaf1a11e594
CurWPt15253467215aeaf1a11e009
cxZjNm15253467215aeaf1a11e342
dRVe4V15253467215aeaf1a11e8fb
DTRrKE15253467215aeaf1a11e1a7
e4nND915253467215aeaf1a11e0da
hKTtW515253467215aeaf1a11e276
I3iWjt15253467215aeaf1a11e9cc
kkbbyt15253467215aeaf1a11e40f
KwYgCo15253467215aeaf1a11dbd0
nlZ9ad15253467215aeaf1a11df3c
tr79J115253467215aeaf1a11e754
Z5BjHr15253467215aeaf1a11dd9a
enTjG315253467215aeaf1a11d86d
F9lTkf15253467215aeaf1a11da0d
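A quick sanity check that the uniqid() part really is time-ordered: its first 8 hex characters are just the UNIX second in hex, so they can be reproduced with printf:
printf '%x\n' 1525346721    # prints 5aeaf1a1, the prefix that follows the 10-digit time() in the names above
The remaining characters encode the microseconds within that second, which is exactly what gives us a finer-than-millisecond ordering.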
The hex characters at the end of each filename (the 13-character uniqid(), of which I take the last 12) form a unique, time-ordered hex number. We can use it to order the 7,000 files and roll each mv back to its original name.
We already have the list of original filenames from an SQL query, and the for loop that renamed them processed the files in that query order.
So the query result gives us the original filenames in the same order in which the loop executed the renames.
And we can use this proven approach (the one actually applied to recover the lost images) to verify another way of achieving the same result.
That other way is using the Linux ls full-ISO time ordering.
What we have to do now is: 1) order the files by the appended hex number, to get them in the order in which each file was created (moved);
and 2) order the new files by the Linux full-ISO timestamp, then finish by comparing the two lists.
Sorting by hex is worked out below.
Ordering the new files by full-ISO time:
the list of new files looks like this:
-rw-r--r-- 1 www-data www-data 123680 2018-05-03 08:52:54.110420627 +0000 2Ddn2j15253467225aeaf1a2867af
-rw-r--r-- 1 www-data www-data 127518 2018-05-03 08:52:54.130420528 +0000 3dohOw15253467225aeaf1a286f74
-rw-r--r-- 1 www-data www-data 167100 2018-05-03 08:52:54.114420607 +0000 77srwb15253467225aeaf1a286932
-rw-r--r-- 1 www-data www-data 118122 2018-05-02 19:22:03.466981823 +0000 7qHiFn15253467225aeaf1a2863ad
-rw-r--r-- 1 www-data www-data 158697 2018-05-03 08:52:54.114420607 +0000 Bi5MVr15253467225aeaf1a286aca
-rw-r--r-- 1 www-data www-data 116740 2018-05-02 19:22:03.470981802 +0000 dNHa6o15253467225aeaf1a286556
-rw-r--r-- 1 www-data www-data 133141 2018-05-02 19:22:03.466981823 +0000 E42vya15253467225aeaf1a2862e3
-rw-r--r-- 1 www-data www-data 131789 2018-05-03 08:52:54.130420528 +0000 g6kTwr15253467225aeaf1a286eac
-rw-r--r-- 1 www-data www-data 104003 2018-05-02 19:22:03.466981823 +0000 JLrRAL15253467225aeaf1a286487
-rw-r--r-- 1 www-data www-data 75559 2018-05-03 08:52:54.114420607 +0000 KYOMSU15253467225aeaf1a2869ff
-rw-r--r-- 1 www-data www-data 142770 2018-05-03 10:43:27.411887315 +0000 lO2g8v15253467225aeaf1a28703c
Now we need to sort this list by columns $6 and $7: $6 is the date, $7 is the time of day (with fractional seconds).
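Since the full-ISO date and time columns are zero-padded, a plain lexical sort on those two columns is already chronological (a sketch, assuming the filtered 7k listing sits in new_img_filtered as produced by the grep step above):
sort -k6,6 -k7,7 new_img_filtered > new_img_filtered.by_time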
First we sort the files by the hex number contained in their filenames.
Guide here: https://unix.stackexchange.com/questions/139801/sort-by-hex-value
The list:
AjDecS15253467215aeaf1a11e679
bjRNzx15253467215aeaf1a11daf2
cmih0k15253467215aeaf1a11e4df
...
Basically we can first grab the last 12 characters of each name into a new list, sort that list, and finally use grep (or a bit of bash) to get the full filenames ordered by that trailing 12-character hex number.
Or we can convert the hex to decimal first and then sort.
Grab the last 12 hex characters:
Getting the last n characters of a line: https://stackoverflow.com/questions/19858600/accessing-last-x-characters-of-a-string-in-bash
$ cat grab_last_hex.sh
#!/bin/bash
while read name; do
echo "${name:(-12)}"
# echo "grep $name list_ls_iso1.txt >> new_img_grep "
done
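The same extraction can also be done in one line, without the helper script (new_img.one is the list of new filenames that the script is run against just below):
grep -oE '.{12}$' new_img.one > new_img.hex
-o prints only the matched part, and the anchored pattern matches exactly the last 12 characters of each line.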
Note that uniqid() is not simply a microsecond counter: it concatenates the hex of the current second with the hex of the microsecond part, so read as one number the hex value is huge. You can see that 12 hex digits is already far beyond 32-bit integer range (you would need a 64-bit / big integer).
$ ./grab_last_hex.sh < new_img.one > new_img.hex
Now we have a list of unique hex values; we need to either sort it by value directly or convert it to decimal first and then sort.
eaf1a2867af
eaf1a286f74
eaf1a286932
eaf1a2863ad
eaf1a286aca
eaf1a286556
eaf1a2862e3
eaf1a286eac
eaf1a286487
Converting hex to decimal: https://stackoverflow.com/questions/13280131/hexadecimal-to-decimal-in-shell-script
Bash script to convert hex to decimal:
Admin@Minkho MINGW64 ~/Work/priz/simulation1
$ cat hex_to_decimal.sh
#!/bin/bash
while read name; do
echo "$((16${name}))"
done
$ ./hex_to_decimal.sh < new_img.hex
./hex_to_decimal.sh: line 4: 16eaf1a2867af: value too great for base (error token is "16eaf1a2867af")
So we need either a cleverer way to convert hex to decimal, or to simply sort the hex values directly, since we only want a sorted list, not the decimal numbers themselves.
The first 5 characters of the hex values are all the same, so we only care about the remaining 7 characters (of 12).
Admin@Minkho MINGW64 ~/Work/priz/simulation1
$ grn "eaf1a" new_img.one |wc -l
6936
Admin@Minkho MINGW64 ~/Work/priz/simulation1
$ wc -l new_img.one
6935 new_img.one
'grn' is my alias for grep -rn. (The counts differ by one, 6936 vs 6935, most likely because the file's last line has no trailing newline, so wc -l under-counts by one.)
So we only need to cut the last 7 of the 12 hex characters from each filename.
Use the previous script that cuts the last 12 characters of each line; simply change 12 to 7 to get:
2867af
286f74
286932
2863ad
286aca
286556
2862e3
286eac
286487
2869ff
28703c
2866e6
286c4e
Now try to convert these to decimal:
$ ./hex_to_decimal.sh < new_img.hex7
./hex_to_decimal.sh: line 4: 162867af: value too great for base (error token is "162867af")
WTF? Same error.
Maybe the way I wrote the base-16 prefix is the problem. Let's try it directly in bash first:
Admin@Minkho MINGW64 ~/Work/priz/simulation1
$ echo $((16#2867af))
2647983
That works. So the culprit was the missing '#': bash wants the base written as 16#value, and my script had $((16${name})) instead of $((16#${name})).
Try a new way to convert:
$ cat hex_to_decimal.sh
#!/bin/bash
while read name; do
#echo "$((16${name}))"
echo "ibase=16; ${name}" | bc
done
./hex_to_decimal.sh: line 5: bc: command not found
There is no bc in Git Bash, so I tried on Ubuntu:
(standard_in) 1: syntax error
:) (bc only understands uppercase hex digits; lowercase a-f are read as variable names, so 2867af is a syntax error. Uppercasing the input first, e.g. with tr 'a-f' 'A-F', would have fixed it.)
Tired of trying and searching, I fell back to the simple way: generate n commands and paste them in to get the result.
Prepend a word to each line of a file; there are many ways to do this: https://serverfault.com/questions/72744/command-to-prepend-string-to-each-line
Prepend and append in one pass:
sed 's/^/echo $((16#/; s/$/))/' new_img.hex7
which turns every hex value into an echo $((16#...)) command:
echo $((16#2867af))
echo $((16#286f74))
echo $((16#286932))
echo $((16#2863ad))
echo $((16#286aca))
echo $((16#286556))
echo $((16#2862e3))
echo $((16#286eac))
echo $((16#286487))
echo $((16#2869ff))
Then run them, remembering the Windows Git Bash paste limit of roughly 1,750-2,000 lines mentioned above; better to use a Linux box for this.
First, append a redirect so the results are collected in a file:
echo $((16#2867af)) >> new_img.sorted_hex
echo $((16#286f74)) >> new_img.sorted_hex
echo $((16#286932)) >> new_img.sorted_hex
echo $((16#2863ad)) >> new_img.sorted_hex
echo $((16#286aca)) >> new_img.sorted_hex
echo $((16#286556)) >> new_img.sorted_hex
echo $((16#2862e3)) >> new_img.sorted_hex
echo $((16#286eac)) >> new_img.sorted_hex
echo $((16#286487)) >> new_img.sorted_hex
Get the current time in nanoseconds:
date +%H:%M:%S:%N
Oh, so the earlier timestamp values were in nanoseconds, a million times finer than a millisecond.
That explains why ls can order files so precisely even when the time differences are tiny.
Now we have the decimal (integer) value of each filename's hex part.
Sort it numerically:
sort -n new_img.sorted_dec > new_img.sorted_dec2
...
2648167
2648370
2648575
2648778
2648978
2649166
2649372
2649573
2649772
2649972
2650172
The result looks OK: unique and ordered.
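Looking back, the whole hex-to-decimal detour could have been one small script; the original only needed the '#' in the base prefix (a sketch; ${name^^} needs bash 4+ and is only there for the bc variant, since bc wants uppercase hex):
#!/bin/bash
# hex_to_decimal.sh - read hex values on stdin, print their decimal values
while read name; do
    echo "$((16#${name}))"
    # or, with bc: echo "ibase=16; ${name^^}" | bc
done
Run it as before: ./hex_to_decimal.sh < new_img.hex7 > new_img.sorted_dec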
Now to get the ordered list of full filenames... and it is essentially the same problem again.
Back to ordering the list by hex value.
A plain lexical sort is actually enough, since all the hex strings have the same length :((
while read name; do echo "${name:(-7)}"; done < new_img.one | sort -u
Damn it! mv does not change the modified time, so it seems there is no way for us to use the mtime ordering to restore the data.
But I kept going, in case we ever do have that information, e.g. if a copy command had been executed instead of mv.
Now we have one list of files ordered by their rename time (via the hex order embedded in their filenames).
We need a second list ordered by the ls ISO time, and we have to sort it ourselves.
Unfortunately I do not have that listing pre-sorted, only sorted by name (?), so I have to order it myself.
-rw-r--r-- 1 www-data www-data 123680 2018-05-03 08:52:54.110420627 +0000 2Ddn2j15253467225aeaf1a2867af
-rw-r--r-- 1 www-data www-data 127518 2018-05-03 08:52:54.130420528 +0000 3dohOw15253467225aeaf1a286f74
-rw-r--r-- 1 www-data www-data 167100 2018-05-03 08:52:54.114420607 +0000 77srwb15253467225aeaf1a286932
-rw-r--r-- 1 www-data www-data 118122 2018-05-02 19:22:03.466981823 +0000 7qHiFn15253467225aeaf1a2863ad
-rw-r--r-- 1 www-data www-data 158697 2018-05-03 08:52:54.114420607 +0000 Bi5MVr15253467225aeaf1a286aca
-rw-r--r-- 1 www-data www-data 116740 2018-05-02 19:22:03.470981802 +0000 dNHa6o15253467225aeaf1a286556
-rw-r--r-- 1 www-data www-data 133141 2018-05-02 19:22:03.466981823 +0000 E42vya15253467225aeaf1a2862e3
-rw-r--r-- 1 www-data www-data 131789 2018-05-03 08:52:54.130420528 +0000 g6kTwr15253467225aeaf1a286eac
-rw-r--r-- 1 www-data www-data 104003 2018-05-02 19:22:03.466981823 +0000 JLrRAL15253467225aeaf1a286487
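Once both lists are sorted, the final rollback step is just pasting them together line by line and emitting mv commands (a sketch with assumed file names: new_names.by_time is the hex-ordered list of new filenames, originals.by_loop is the SQL-ordered list of original filenames):
paste new_names.by_time originals.by_loop | awk '{ print "mv " $1 " " $2 }' > rollback.sh
# review rollback.sh carefully before running it: bash rollback.sh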
I will come back to this later.
TODO: edit and combine the bash snippets into a shorter, cleverer version.