PBSControllerLauncher: Unable to connect_client_sync() #619
The program hangs in the while loop at ipyparallel/ipyparallel/cluster/launcher.py Line 339 in 527d7b2
The connection files in PROFILE/security are named ipcontroller-client.json and ipcontroller-engine.json, whereas the program expects the filenames to include cluster_id, e.g. ipcontroller-1635416202-ou8n-client.json, ipcontroller-1635416202-ou8n-engine.json. Is this also fixed in #606?
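For readers following along, the filename mismatch described above can be sketched roughly like this. The helper `connection_files` is hypothetical (it is not an ipyparallel function), and the naming rule is an assumption inferred only from the filenames quoted in this comment:

```python
from pathlib import Path


def connection_files(profile_dir, cluster_id=""):
    """Return the (client, engine) connection file paths for a cluster id.

    Assumed naming, inferred from the filenames quoted in this issue:
    ipcontroller-<cluster_id>-client.json when a cluster id is set,
    ipcontroller-client.json when it is empty.
    """
    security = Path(profile_dir) / "security"
    prefix = f"ipcontroller-{cluster_id}" if cluster_id else "ipcontroller"
    return security / f"{prefix}-client.json", security / f"{prefix}-engine.json"


# "PROFILE" stands in for the real profile directory.
client, engine = connection_files("PROFILE", "1635416202-ou8n")
print(client.name)
print(engine.name)
```

If the controller writes the files without the cluster id while the caller waits for the names that include it, the wait never ends, which would match the hang reported here.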
|
This is why I need to get CI tests for all the non-slurm batch launchers (#604)! I do believe the issue is fixed in dev, but those custom templates will still reintroduce the problem. If you add I believe these templates will work:
c.PBSControllerLauncher.batch_template = '''
#PBS -N ipcontroller
#PBS -V
#PBS -j oe
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1
cd $PBS_O_WORKDIR
conda activate ipp7
{program_and_args}
'''
c.PBSEngineSetLauncher.batch_template = '''
#PBS -N ipengine
#PBS -j oe
#PBS -V
#PBS -l walltime=01:00:00
#PBS -l nodes={n//4}:ppn=4
cd $PBS_O_WORKDIR
conda activate ipp7
module load intel
mpiexec -n {n} {program_and_args}
'''
The next release uses environment variables to pass things like the cluster id, which means you must add |
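A note on the `{n//4}` placeholder in the engine template: it is an expression, not a plain field name, so plain `str.format` would look for a key literally named `n//4`. The sketch below shows one way such a template can be rendered. `ExprFormatter` is a hypothetical stand-in written for this illustration, not ipyparallel's own formatter, and the value passed for `program_and_args` is a placeholder:

```python
from string import Formatter


class ExprFormatter(Formatter):
    """Hypothetical stand-in: evaluate each {field} as a Python expression."""

    def get_field(self, field_name, args, kwargs):
        # Evaluate the text between the braces, using the keyword
        # arguments as the available variable names.
        value = eval(field_name, {}, dict(kwargs))
        return value, field_name


engine_template = """#PBS -N ipengine
#PBS -l nodes={n//4}:ppn=4
mpiexec -n {n} {program_and_args}
"""

rendered = ExprFormatter().format(
    engine_template,
    n=8,
    program_and_args="ipengine",  # placeholder command, for illustration only
)
print(rendered)
```

With `n=8` and `ppn=4`, the node request comes out as two nodes, so the engine count and the PBS resource request stay in step when `n` changes.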
Thank you for the quick reply! The general method works. 👍 Is it possible to instantiate a
Pseudocode:
controller_template='''
#PBS -N ipcontroller
#PBS -j oe
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1
##PBS -q {queue}
cd $PBS_O_WORKDIR
conda activate ipp7
{program_and_args}
'''
engine_template = '''
#PBS -N ipengine
#PBS -j oe
#PBS -l walltime=01:00:00
#PBS -l nodes={n//4}:ppn=4
##PBS -q {queue}
cd $PBS_O_WORKDIR
conda activate ipp7
module load intel
mpiexec -n {n} {program_and_args}
'''
cluster=ipp.Cluster(
n=4,
controller_ip='*',
profile='pbs-2021-10-28',
extra_options={
'c.PBSControllerLauncher.batch_template':controller_template,
'c.PBSEngineSetLauncher.batch_template':engine_template
}) |
Yes! You populate the
cluster=ipp.Cluster(
n=4,
controller_ip='*',
profile='pbs-2021-10-28',
)
# this is the same config object you would configure in ipcluster_config.py
# you don't have to call it `c`, but if you do, the rest will look familiar
c = cluster.config
c.PBSControllerLauncher.batch_template = controller_template
c.PBSEngineSetLauncher.batch_template = engine_template
await cluster.start_cluster() |
Fantastic! Thank you! |
Adding lots of examples to my documentation todo list... |
I'll make an 8.0 beta tomorrow. It would be great if you could test it out! |
Okay, great, I will test it. |
I've got another question and potential point for the documentation todo list: How do you configure the controller dynamically in Python / Jupyter Notebook (replacement for |
That can be Cluster(controller_ip="*")
The ambiguity is because there are really two things you are configuring:
Some common options for configuring the controller itself can be done on the Cluster, but for the most part ipcontroller is configured directly through either |
I just published |
Okay, I'm currently in the middle of something but I will give it a try this afternoon/evening. |
No rush! I've only got a few more minutes of work before the weekend. I'll probably aim to do a release around the end of next week. |
I've installed the new beta version
import time
start = time.time()
cluster.start_cluster_sync()
end = time.time()
print(end - start)
|
That makes sense. It's the new
cluster = Cluster(send_engines_connection_env=False, engines='pbs', controller='pbs', controller_ip='*')
then the engine and controller jobs should both be submitted immediately. |
@lukas-koschmieder I just published 8.0.0rc1. Can you test and then close here if you think everything is resolved? |
I have been using ipyparallel 6 for a while and would like to migrate to ipyparallel 7, mainly because the new Cluster API lets you manage the entire process from a Jupyter Notebook. Unfortunately, I am having difficulty connecting a client to my cluster.
I have created a new IPython profile, adding a custom ipcluster_config.py, which is a modified version of my existing/working config for ipp 6 (see below). I can successfully start a cluster spawning two PBS jobs (controller and engine).
But if I run the following line, the notebook will only show that the kernel is busy and it will never actually finish.
Am I using the API incorrectly? What might be the problem?
ipcluster_config.py