AAA Access
Reading files via proxy
Files can be read directly from a WLCG site using a proxy. In this case, further configuration is needed. If you want to read them from the JupyterLab instance, you only have to set these two environmental variables:
export X509_USER_PROXY=<path_to_your_proxyfile>
export X509_CERT_DIR=/cvmfs/grid.cern.ch/etc/grid-security/certificates/
Instead, if you want to use distributed Daskz, you need to properly configure each Dask worker: - First of all, you need your proxyfile to be trasferred to each worker. This can be done via a Dask Plugin:
from dask.distributed import Client
from distributed.diagnostics.plugin import UploadFile
client = Client(address="tcp://127.0.0.1:"+str(sched_port))
client.register_worker_plugin(UploadFile("<path_to_your_proxyfile>"))
def set_proxy(dask_worker):
import os
import shutil
working_dir = dask_worker.local_directory
proxy_name = 'proxy'
os.environ['X509_USER_PROXY'] = working_dir + '/' + proxy_name
os.environ['X509_CERT_DIR']="/cvmfs/grid.cern.ch/etc/grid-security/certificates/"
return os.environ.get("X509_USER_PROXY"), os.environ.get("X509_CERT_DIR")
client.run(set_proxy)
Moving files via Rucio
You can also replicate data on the grid into a desired site, as long as you have the necessary permissions and available storage. This can be done via Rucio:
from rucio.client.client import Client
import os
os.environ["RUCIO_HOME"] = "/cvmfs/cms.cern.ch/rucio/current/"
os.environ['X509_CERT_DIR'] = "/cvmfs/grid.cern.ch/etc/grid-security/certificates/"
os.environ['X509_USER_PROXY'] = "<path_to_your_proxyfile>"
RUCIO_SCOPE = "cms"
ACCOUNT = "<username>" #e.g. "ttedesch"
SITENAME = "<site>" #e.g. "T2_IT_Legnaro"
c = Client(account=ACCOUNT, auth_type="x509_proxy")
CMS_DATASET_NAME = "<dataset_name>" #e.g. "/ZZTo4L_M-1toInf_TuneCP5_13TeV_powheg_pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v1/NANOAODSIM"
To get file replicas at a desired site:
files = c.list_files(scope=RUCIO_SCOPE, name=CMS_DATASET_NAME)
lfns = []
for f in files:
lfns.append(RUCIO_SCOPE + ':' + f['name'])
pfns_dicts = c.lfns2pfns(SITENAME, lfns, scheme='davs', operation='read')
pfns = []
for _,i in pfns_dicts.items():
pfns.append(i)
print(len(pfns))
To replicate a dataset in a desired site:
did = {'scope': RUCIO_SCOPE,
'name': CMS_DATASET_NAME,
'type': 'DATASET'}
c.add_replication_rule([did], 1, SITENAME, lifetime=15778800)
blocks_gen = c.list_content(scope=RUCIO_SCOPE, name=CMS_DATASET_NAME)
blocks = []
for bl in blocks_gen:
blocks.append(bl)
c.add_replication_rule(blocks, 1, SITENAME, lifetime=15778800)
rucio list-rules cms:<dataset_name> #e.g. cms:/VBS_SSWW_TT_polarization_TuneCP5_13TeV-madgraph-pythia8/RunIISummer20UL17NanoAODv9-106X_mc2017_realistic_v9-v1/NANOAODSIM