1 change: 1 addition & 0 deletions inference/BatchBase
Original file line number Diff line number Diff line change
@@ -59,6 +59,7 @@ time $DirTransfer \
echo "`date +%s.%N` #mpiexec"
Exec $FJSVXTCLANGA/bin/mpiexec -np ${NumProc} \
--mca orte_abort_print_stack 1 \
--mca common_tofu_use_memory_pool 1 \
--of-proc ${LogDir}/output/%/1000r/out \
-mca plm_ple_cpu_affinity 0 \
-x ParameterFile="$ParamFile" \
28 changes: 10 additions & 18 deletions inference/README.md
@@ -29,31 +29,26 @@

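### Multi-node version

1. Use `filter_fasta.sh` to filter the sequences to be processed from the input fasta file
- `./filter_fasta.sh $InputFasta $OutputFasta $AlignmentDir [$PredictedDir [$NGList]]`
- `InputFasta`: path to the input fasta file
- `OutputFasta`: path to the output fasta file
- `AlignmentDir`: path to the alignment directory produced by preprocessing
- [optional] `$PredictedDir`: for the second and later jobs, path to the directory containing the inference results
- [optional] `$NGList`: for the second and later jobs, path to the list of names of sequences that failed

1. Set the following required items in `inference/parameters_multi`
- `MMCIFCache`: path to the mmcif cache created during preparation
- `InputFasta`: path to the (filtered) fasta file of input sequences
- `AlignmentDir`: path to the alignment directory produced by preprocessing
- `OutputDir`: path to the output directory. If set to `$LOGDIR`, the log directory is used.
- `AlignmentLogDir`: path to the log directory produced by preprocessing
- `OutputDir`: path to the output directory. If set to `$LOGDIR`, the log directory generated for each job is used.
- `Timeout`: timeout per input sequence [seconds]

1. If necessary, change the following items in `inference/parameters_multi`
2. If necessary, change the following items in `inference/parameters_multi`
- `--jax_param_path`: the AlphaFold2 parameters to use
- `max_template_date`: use protein structures dated on or before the specified date as templates
- `--ignore_timeout_chain_history`: ignore the history of chains that failed by timeout in jobs up to the specified jobid, and rerun them
- `--ignore_failed_chain_history`: ignore the history of chains that failed for reasons such as insufficient memory in jobs up to the specified jobid, and rerun them

1. Decide the number of nodes and the time limit
3. Decide the number of nodes and the time limit
- Number of nodes: at most the number of input sequences
- Time limit: any duration. You may estimate the processing time with `estimate_time.awk` and choose values so that (number of nodes) × (time limit) is roughly equal to the estimated processing time.
- `./estimate_time.awk $InputFasta`

1. Submit the inference job with `Submit_inference_multi`
4. Submit the inference job with `Submit_inference_multi`
- `./Submit_inference_multi {$NumNodes|$NodeShape} $TimeLimit`
- `NumNodes`: number of nodes
- `NodeShape`: 3D node shape (XxYxZ)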
@@ -62,11 +57,8 @@
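- To run on 16 nodes with a 1-hour time limit: `./Submit_inference_multi 16 1:00:00`
- To run with a 2x3x8 node shape and a 6-hour time limit: `./Submit_inference_multi 2x3x8 6:00:00`

1. The results are written to `log/<number of nodes>/Submit_inference_multi.*`
5. The execution log is written to `log/<number of nodes>/Submit_inference_multi.*`; check it as needed
- The output of each process is written to `output/0/out.1.*`
- The per-sequence results are written to `$OutputDir/result/*.csv`

1. If unprocessed sequences remain, run the job again
- Create the list of sequences that failed (NG list)
- `find_ng.sh $LogDir > $NGList` generates the list of names of sequences that could not be inferred successfully
- Filter the fasta file again and change `InputFasta` in `inference/parameters_multi`
- Submit the job
6. If unprocessed sequences remain, repeat step 4 (job submission)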
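
The sizing rule in the "number of nodes and time limit" step, (number of nodes) × (time limit) ≈ estimated processing time, can be sketched as a small helper. This is only an illustration: the function name `suggest_time_limit` is hypothetical, and it assumes the estimated total processing time is already available as a number of seconds (the output format of `estimate_time.awk` is not shown here). The result uses the `H:MM:SS` form seen in the `Submit_inference_multi` examples.

```python
import math


def suggest_time_limit(estimated_total_seconds: float, num_nodes: int) -> str:
    """Return an H:MM:SS limit so that num_nodes x limit covers the estimate.

    Hypothetical helper: `estimated_total_seconds` is assumed to be the total
    processing time in seconds, however it was obtained.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Split the total evenly across nodes and round up to a whole second.
    per_node = math.ceil(estimated_total_seconds / num_nodes)
    hours, rem = divmod(per_node, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # 16 node-hours of estimated work spread over 16 nodes.
    print(suggest_time_limit(16 * 3600, 16))
```

In practice some headroom above this value is probably sensible, since the per-sequence `Timeout` and uneven sequence lengths mean the work does not divide perfectly across nodes.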
@@ -0,0 +1,5 @@
short_0
short_1
short_2
short_3

@@ -0,0 +1,4 @@
short_0
short_1
short_2
short_3
@@ -0,0 +1,4 @@
short_0
short_2
short_1
short_3
@@ -0,0 +1,4 @@
short_0,0
short_1,0
short_2,1
short_3,1
@@ -0,0 +1,5 @@
short_0
short_1
short_2
short_3

@@ -0,0 +1,4 @@
short_0
short_1
short_2
short_3
@@ -0,0 +1,4 @@
short_0
short_2
short_1
short_3
@@ -0,0 +1,4 @@
short_0,0
short_1,0
short_2,1
short_3,1
@@ -0,0 +1,5 @@
short_0
short_1
short_2
short_3

@@ -0,0 +1,4 @@
short_0
short_1
short_2
short_3
@@ -0,0 +1,4 @@
short_0
short_2
short_1
short_3
@@ -0,0 +1,4 @@
short_0,0
short_1,0
short_2,1
short_3,1
@@ -0,0 +1,5 @@
short_0
short_1
short_2
short_3

@@ -0,0 +1,4 @@
short_0
short_1
short_2
short_3
@@ -0,0 +1,4 @@
short_0
short_2
short_1
short_3
@@ -0,0 +1,4 @@
short_0,0
short_1,0
short_2,1
short_3,1
1 change: 1 addition & 0 deletions inference/example/alignment_subdir/0/short_0
1 change: 1 addition & 0 deletions inference/example/alignment_subdir/0/short_1
1 change: 1 addition & 0 deletions inference/example/alignment_subdir/1/short_2
1 change: 1 addition & 0 deletions inference/example/alignment_subdir/1/short_3
8 changes: 8 additions & 0 deletions inference/example/input/short_4.fasta
@@ -0,0 +1,8 @@
>short_0
MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH
>short_1
MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH
>short_2
MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH
>short_3
MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH
11 changes: 7 additions & 4 deletions inference/parameters_multi
@@ -12,9 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

source $OPENFOLDDIR/scripts/setenv

MMCIFCache=example/mmcif_cache.json
InputFasta=example/input/short.fasta
AlignmentDir=example/alignment
InputFasta=example/input/short_4.fasta
AlignmentDir=example/alignment_subdir
AlignmentLogDir=example/alignment_log_subdir
OutputDir=$LOGDIR
Timeout=3600

@@ -35,10 +38,10 @@ PARAMS=(
--max_template_date 2021-10-10
--release_dates_path $MMCIFCache
--timeout $Timeout
--max_memory $OPENFOLD_MAX_MEM
--alignment_log_dir ${AlignmentLogDir}
)

source $OPENFOLDDIR/scripts/setenv

# for Torch Extensions
export TORCH_EXTENSIONS_DIR=$TMPDIR

124 changes: 124 additions & 0 deletions inference/status.py
@@ -0,0 +1,124 @@
#!/usr/bin/env python3

import os
import re
import sys
import argparse
from datetime import datetime
from typing import List, Dict, Any

# Matches per-job CSV file names such as job_123.csv.
pat = re.compile(r'job_([0-9]+)\.csv')

try:
    import pandas as pd
except ImportError:
    print("Error: Importing the pandas package failed. Make sure that pandas is already installed, or just run `pip install pandas`.")
    sys.exit(1)


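def get_chains(csv_file):
    # Return the set of chain names listed one per line in csv_file,
    # or None if the file does not exist.
    if not os.path.isfile(csv_file):
        return None

    with open(csv_file) as f:
        chains = f.read().strip().split("\n")

    # Drop empty entries (an empty file yields a single empty string).
    return set(filter(None, chains))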

def get_log_info(path: str) -> pd.DataFrame:
    # The job ID is the name of the per-job result directory.
    job_id = os.path.basename(path)

    # Default to the directory's modification time.
    last_update = datetime.fromtimestamp(os.path.getmtime(path))

    complete_path = os.path.join(path, 'before_complete.csv')
    incomplete_path = os.path.join(path, 'before_incomplete.csv')
    noalign_path = os.path.join(path, 'before_noalign.csv')
    skip_path = os.path.join(path, 'before_skip.csv')
    processed_path = os.path.join(path, 'processed.csv')

    complete_chains = get_chains(complete_path)
    incomplete_chains = get_chains(incomplete_path)
    noalign_chains = get_chains(noalign_path)
    skip_chains = get_chains(skip_path)

    # If any of the pre-run lists is missing, report the counts that are
    # available and leave the rest empty.
    if (complete_chains is None) or \
       (incomplete_chains is None) or \
       (noalign_chains is None) or \
       (skip_chains is None):
        data = {'Job ID'      : job_id,
                'Last update' : last_update,
                '#Compl.(b)'  : len(complete_chains) if complete_chains is not None else None,
                '#Incompl.(b)': len(incomplete_chains) if incomplete_chains is not None else None,
                '#NoAlign.'   : len(noalign_chains) if noalign_chains is not None else None,
                '#Skip'       : len(skip_chains) if skip_chains is not None else None,
                '#Success'    : None,
                '#Failure'    : None}
        return pd.DataFrame([data])

    statuses = ['OK', 'NG_timeout', 'NG_unknown', 'NG_noalignment']
    if os.path.isfile(processed_path):
        # Prefer the modification time of processed.csv when it exists.
        last_update = datetime.fromtimestamp(os.path.getmtime(processed_path))

        df = pd.read_csv(processed_path,
                         names=['chain', 'seq_len', 'status', 'time_all', 'time_infer', 'time_relax'],
                         usecols=['chain', 'seq_len', 'status'],
                         dtype={'chain': 'str', 'seq_len': 'int32', 'status': 'str'})

        status_count = {st: (df['status'] == st).sum() for st in statuses}
    else:
        status_count = {st: 0 for st in statuses}

    n_compl = len(complete_chains)
    n_incompl = len(incomplete_chains)
    data = {'Job ID'      : job_id,
            'Last update' : last_update,
            '#Compl.(b)'  : n_compl,
            '#Incompl.(b)': n_incompl,
            '#NoAlign.'   : len(noalign_chains) + status_count['NG_noalignment'],
            '#Skip'       : len(skip_chains),
            '#Success'    : status_count['OK'],
            '#Failure'    : status_count['NG_timeout'] + status_count['NG_unknown'],
            }

    return pd.DataFrame([data])



if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--root-dir",
        type=str,
        default="log",
        help="Path to the root log directory",
    )

    args = parser.parse_args()

    log_dir = os.path.join(args.root_dir, 'result')

    if not os.path.isdir(log_dir):
        raise Exception('There is no result directory in the specified root directory. Please check the --root-dir option.')

    # Each subdirectory of <root-dir>/result holds the records of one job.
    job_dirs = [x for x in os.listdir(log_dir) if os.path.isdir(os.path.join(log_dir, x))]

    df = None
    for job_dir in job_dirs:
        ret = get_log_info(os.path.join(log_dir, job_dir))
        if ret is None:
            continue

        if df is None:
            df = ret
        else:
            df = pd.concat([df, ret])

    if df is not None:
        df = df.sort_values('Job ID')
        # Progress = chains completed before the job plus chains that succeeded in it,
        # relative to all chains that were complete or incomplete before the job.
        df['Progress[%]'] = ((df['#Compl.(b)'] + df['#Success']) * 100.0 / (df['#Compl.(b)'] + df['#Incompl.(b)']))
        df['Progress[%]'] = df['Progress[%]'].astype('float').round(1)
        df = df.fillna('-')
        print(df.to_string(index=False))
    else:
        print("No data!")
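
The per-job summary computed by `status.py` can be illustrated without pandas. The sketch below counts statuses in `processed.csv`-style text and applies the same progress formula, (completed before + succeeded) × 100 / (completed before + incomplete before). The function name `summarize` and the sample rows are made up for illustration; only the column order and the status names are taken from the script above.

```python
import csv
import io
from collections import Counter


def summarize(processed_csv_text, n_complete_before, n_incomplete_before):
    """Count statuses and compute progress the way status.py does (sketch)."""
    # Columns: chain, seq_len, status, time_all, time_infer, time_relax
    rows = csv.reader(io.StringIO(processed_csv_text))
    counts = Counter(row[2] for row in rows if len(row) >= 3)

    success = counts["OK"]
    failure = counts["NG_timeout"] + counts["NG_unknown"]
    total = n_complete_before + n_incomplete_before
    # Avoid dividing by zero when nothing was scheduled at all.
    progress = round((n_complete_before + success) * 100.0 / total, 1) if total else None
    return {"success": success, "failure": failure, "progress": progress}


# Made-up sample rows, not real results.
sample = (
    "short_0,73,OK,120.5,100.1,20.4\n"
    "short_1,73,NG_timeout,3600.0,3600.0,0.0\n"
    "short_2,73,OK,118.2,99.0,19.2\n"
)

if __name__ == "__main__":
    print(summarize(sample, n_complete_before=1, n_incomplete_before=3))
    # prints {'success': 2, 'failure': 1, 'progress': 75.0}
```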
10 changes: 6 additions & 4 deletions inference/worker.sh
@@ -38,6 +38,8 @@ LogDir=${LOGDIR}

. "$ParameterFile"

export LD_PRELOAD=/usr/lib/FJSVtcs/ple/lib64/libpmix.so:$LD_PRELOAD

ulimit -s 16384
ulimit -c 0

@@ -50,12 +52,12 @@ if [ $RANK -eq "0" ]; then
fi

# strace -ff -e trace=open,openat -o ${LogDir}/strace.${PMIX_RANK}
time -p numactl --cpunodebind 4-7 --membind 4-7 \
numactl --cpunodebind 4-7 --membind 4-7 \
"${PARAMS[@]}"

if [ $RANK -eq "0" ]; then
#kill -9 $PID_VMSTAT
:
if [ $? -ne 0 ]; then
echo "Program terminated abnormally"
exit 1
fi

unset LD_PRELOAD